An analysis based on the misuse of multithreading in an application

  • 2020-05-10 18:43:19
  • OfStack

1. Requirements and preliminary realization
Simple 1 windows service: the client connects to the mail server, downloads the mail (including attachments), saves it as.eml, and then deletes the mail from the server. The pseudo-code of the implementation is roughly as follows:


      public void Process()
        {
            var recordCount = 1000;// Number of mail records taken out each time 
            while (true)
            {
                using (var client = new Pop3Client())
                {
                    //1 Establish a connection and authenticate your identity 
                    client.Connect(server, port, useSSL);
                    client.Authenticate(userName, pwd);
                    var messageCount = client.GetMessageCount(); //  The number of messages in the mailbox 
                    if (messageCount > recordCount)
                    {
                        messageCount = recordCount;
                    }
                    if (messageCount < 1)
                    {
                        break;
                    }
                    var listAllMsg = new List<Message>(messageCount); // Use to temporarily save retrieved mail 
                    //2 After taking out the email, fill it into the list, at most each time recordCount email 
                    for (int i = 1; i <= messageCount; i++) // Mailbox indexes are based on 1 Initial, index range : [1, messageCount]
                    {
                        listAllMsg.Add(client.GetMessage(i)); // Take the message to the list 
                    }
                    //3 , traverse and save to the client, in the format of .eml
                    foreach (var message in listAllMsg)
                    {
                        var emlInfo = new System.IO.FileInfo(string.Format("{0}.eml", Guid.NewGuid().ToString("n")));
                        message.SaveToFile(emlInfo);// Save the message as .eml Format file 
                    }
                    //4 , traverse and delete 
                    int messageNumber = 1;
                    foreach (var message in listAllMsg)
                    {
                        client.DeleteMessage(messageNumber); // Delete messages (essentially, just type before closing the connection DELETE Tag, not really deleted)  
                        messageNumber++;
                    }
                    //5 , disconnect the connection and truly complete the deletion 
                    client.Disconnect();
                    if (messageCount < recordCount)
                    {
                        break;
                    }
                }
            }
        }

The open source component Mail.Net (actually a union of OpenSMTP.Net and OpenPop projects) was used to receive mail in development, and the interface implementation was simple to invoke. After writing the code, I found that the basic functions were satisfied. Based on the principle of being faster and more efficient on a stable basis, I finally performed performance tuning.

2. Performance tuning and BUG analysis
For the moment, whether the time-consuming operations here are computationally intensive or IO intensive, someone 1 sees a set that needs to be traversed sequentially, 1 by 1, and can't help but have the urge for multithreaded asynchronous parallel operations. Asynchronous conditional asynchronous induction as far as possible, without conditions, asynchronous, create conditions also want to really take advantage of multithreaded, powerful processing capacity to make full use of the server, and is confident the compasses wrote many multi-threaded programs, this business logic is simple and exception handling also relatively easy to control (even if there is a problem also have compensation measures, can improve it in the post processing), in theory need to check the number of emails every day will not too much, CPU were it not for a long time and memory killer, the multi-thread asynchronous service implementation should be acceptable. Moreover, based on the analysis, it is obvious that this is a typical frequently accessed IO intensive application, of course, from IO processing.

1. Get emails
As you can see from the sample code for Mail.Net, fetching mail requires an index starting at 1 and must be in order. If multiple requests are made asynchronously, how is the index passed in? I was a little hesitant at first about having to order this one, but if you go through Lock or Interlocked or something like that, you obviously lose the advantage of multithreading, and I guess it's probably not as fast as sequential synchronization.

Analyze, let's write some code to see how efficient it is.

Write a quick asynchronous method to pass the integer parameters, and control the change in the total number of extracted messages by Interlocked. After each asynchronous method is acquired, Message can be added to the listAllMsg list by Lock.

Mail server test mail is not much, the test to get 1 two mail, well, very good, the extraction of mail success, the initial adjustment of the harvest, gratifying to congratulate.

2. Save your email
The tuning process goes like this: the implementation code for traversing and saving as.eml is changed to use multithreading, and the message.SaveToFile save operation is processed in parallel.

3. Delete emails
Tune again: modify the code that iterates through deleted messages as if it were a multithreaded save operation, and also handle the deletion in parallel with multiple threads. Ok, very good, very good, at this time I was thinking what Thread ah, ThreadPool ah, CCR ah, TPL ah, EAP ah, APM ah, I know all can be used to it once, choose the best to use the optimal efficiency of the 1, very technical content, wahaha.

Then, I quickly wrote an asynchronous delete method to start the test. In the case of not much mail, such as 3 two letters, can work normally, it seems to be pretty fast.

By this point my heart was ready to celebrate.

4. Cause analysis of BUG
From the above 1, 2, 3 independent effects look, it seems that each thread can run independently without the need for mutual communication or data sharing, and the use of asynchronous multi-threading technology, the quick access to the fast delete is also fast, it seems that mail processing will enter the best state. But finally, extract, save, and delete the integrated debugging test. After running the log for 1 period of time, tragedy happened:

When there are many emails in the test, such as about 2310, we can see PopServerException abnormal in the log. It seems that there are some garbled codes, and the garbled codes seem to be different every time. Then test 3 two letters, found that sometimes can work normally, sometimes also throw PopServerException exception, or there is a messy code, analysis error stack, is in the place of deleting the mail.

I am kao, what kind of trouble is this? Did you not get along well with the email server? Why is PopServerException always abnormal?

Is there, is there something wrong with the asynchronous delete method? Asynchronous delete, index number 1, uh, index problem? I'm still not sure.

At this point can you see why multithreaded delete operations throw exceptions? You already know why? OK, this is not going to make any sense to you, so you don't have to read it.

Tell me about my search.

Looking at the log, I initially suspect that there is something wrong with the method of deleting emails, but it is still reliable after a visual inspection. Then it is estimated that the message encoding is incorrect when deleting, and then it is unlikely that the same message synchronization code will be checked, saved and deleted, and no exception will be thrown. I was not quite assured, and I tested several emails separately for several times. Those with attachments and those without, html's plain text, and the synchronization code handled well.

To find out, open the Mail.NET source code, from the DeleteMessage method tracking to Mail.Net Pop3Client SendCommand class SendCommand method, 1 immediately feel a clue. DeleteMessage delete the message source as follows:


        public void DeleteMessage(int messageNumber)
        {
            AssertDisposed();
            ValidateMessageNumber(messageNumber);
            if (State != ConnectionState.Transaction)
                throw new InvalidUseException("You cannot delete any messages without authenticating yourself towards the server first");
            SendCommand("DELE " + messageNumber);
        }

The last line of SendCommand needs to submit an DELE command. Follow along to see how it works:

        private void SendCommand(string command)
        {
            // Convert the command with CRLF afterwards as per RFC to a byte array which we can write
            byte[] commandBytes = Encoding.ASCII.GetBytes(command + "\r\n");
            // Write the command to the server
            OutputStream.Write(commandBytes, 0, commandBytes.Length);
            OutputStream.Flush(); // Flush the content as we now wait for a response
            // Read the response from the server. The response should be in ASCII
            LastServerResponse = StreamUtility.ReadLineAsAscii(InputStream);
            IsOkResponse(LastServerResponse);
        }

Note the InputStream and OutputStream attributes, which are defined as follows (the magic private modified attribute, which is rare) :

   /// <summary>
        /// This is the stream used to read off the server response to a command
        /// </summary>
        private Stream InputStream { get; set; }
        /// <summary>
        /// This is the stream used to write commands to the server
        /// </summary>
        private Stream OutputStream { get; set; }

It is assigned by calling the public void Connect(Stream inputStream, Stream outputStream) method in Pop3Client class, and the Connect method finally called by the Connect method is as follows:

      /// <summary>
        /// Connects to a remote POP3 server
        /// </summary>
        /// <param name="hostname">The <paramref name="hostname"/> of the POP3 server</param>
        /// <param name="port">The port of the POP3 server</param>
        /// <param name="useSsl">True if SSL should be used. False if plain TCP should be used.</param>
        /// <param name="receiveTimeout">Timeout in milliseconds before a socket should time out from reading. Set to 0 or -1 to specify infinite timeout.</param>
        /// <param name="sendTimeout">Timeout in milliseconds before a socket should time out from sending. Set to 0 or -1 to specify infinite timeout.</param>
        /// <param name="certificateValidator">If you want to validate the certificate in a SSL connection, pass a reference to your validator. Supply <see langword="null"/> if default should be used.</param>
        /// <exception cref="PopServerNotAvailableException">If the server did not send an OK message when a connection was established</exception>
        /// <exception cref="PopServerNotFoundException">If it was not possible to connect to the server</exception>
        /// <exception cref="ArgumentNullException">If <paramref name="hostname"/> is <see langword="null"/></exception>
        /// <exception cref="ArgumentOutOfRangeException">If port is not in the range [<see cref="IPEndPoint.MinPort"/>, <see cref="IPEndPoint.MaxPort"/> or if any of the timeouts is less than -1.</exception>
        public void Connect(string hostname, int port, bool useSsl, int receiveTimeout, int sendTimeout, RemoteCertificateValidationCallback certificateValidator)
        {
            AssertDisposed();
            if (hostname == null)
                throw new ArgumentNullException("hostname");
            if (hostname.Length == 0)
                throw new ArgumentException("hostname cannot be empty", "hostname");
            if (port > IPEndPoint.MaxPort || port < IPEndPoint.MinPort)
                throw new ArgumentOutOfRangeException("port");
            if (receiveTimeout < -1)
                throw new ArgumentOutOfRangeException("receiveTimeout");
            if (sendTimeout < -1)
                throw new ArgumentOutOfRangeException("sendTimeout");
            if (State != ConnectionState.Disconnected)
                throw new InvalidUseException("You cannot ask to connect to a POP3 server, when we are already connected to one. Disconnect first.");
            TcpClient clientSocket = new TcpClient();
            clientSocket.ReceiveTimeout = receiveTimeout;
            clientSocket.SendTimeout = sendTimeout;
            try
            {
                clientSocket.Connect(hostname, port);
            }
            catch (SocketException e)
            {
                // Close the socket - we are not connected, so no need to close stream underneath
                clientSocket.Close();
                DefaultLogger.Log.LogError("Connect(): " + e.Message);
                throw new PopServerNotFoundException("Server not found", e);
            }
            Stream stream;
            if (useSsl)
            {
                // If we want to use SSL, open a new SSLStream on top of the open TCP stream.
                // We also want to close the TCP stream when the SSL stream is closed
                // If a validator was passed to us, use it.
                SslStream sslStream;
                if (certificateValidator == null)
                {
                    sslStream = new SslStream(clientSocket.GetStream(), false);
                }
                else
                {
                    sslStream = new SslStream(clientSocket.GetStream(), false, certificateValidator);
                }
                sslStream.ReadTimeout = receiveTimeout;
                sslStream.WriteTimeout = sendTimeout;
                // Authenticate the server
                sslStream.AuthenticateAsClient(hostname);
                stream = sslStream;
            }
            else
            {
                // If we do not want to use SSL, use plain TCP
                stream = clientSocket.GetStream();
            }
            // Now do the connect with the same stream being used to read and write to
            Connect(stream, stream); //In/OutputStream Property initialization 
        }

1 now see TcpClient object, this is not based on Socket, Socket programming to achieve POP3 protocol operation instructions? There is no doubt that you need to initiate an TCP connection, what 3 handshakes, send the command to operate the server... It all came back to me.

We know that an TCP connection is a session (Session), and sending commands (such as get and delete) requires communication with the mail server via an TCP connection. If it is a multithreaded on one session sending commands (such as acquisition (TOP or RETR), delete (DELE)) server operation, the operation of these commands are not thread-safe, it is likely to appear OutputStream fight with InputStream data does not match each other, this is probably the reason why we see the log of the code. Speaking of thread safety, I suddenly realized that it might be a problem to check email. To verify my idea, I checked the source code of GetMessage method:


        public Message GetMessage(int messageNumber)
        {
            AssertDisposed();
            ValidateMessageNumber(messageNumber);
            if (State != ConnectionState.Transaction)
                throw new InvalidUseException("Cannot fetch a message, when the user has not been authenticated yet");
            byte[] messageContent = GetMessageAsBytes(messageNumber);
            return new Message(messageContent);
        }

The internal GetMessageAsBytes method finally followed the SendCommand method:

      if (askOnlyForHeaders)
            {
                // 0 is the number of lines of the message body to fetch, therefore it is set to zero to fetch only headers
                SendCommand("TOP " + messageNumber + " 0");
            }
            else
            {
                // Ask for the full message
                SendCommand("RETR " + messageNumber);
            }

According to my tracking, the scrambled code that threw an exception in the test was LastServerResponse(This is response response server back a issued to to it IsOKResponse IsOKResponse was to to it IsOKResponse IsOKResponse IsOKResponse back back IsOKResponse IsOKResponse back to to to it IsOKResponse IsOKResponse back PopServerException exception, which does not start with "+OK" :

    /// <summary>
        /// Tests a string to see if it is a "+OK" string.<br/>
        /// An "+OK" string should be returned by a compliant POP3
        /// server if the request could be served.<br/>
        /// <br/>
        /// The method does only check if it starts with "+OK".
        /// </summary>
        /// <param name="response">The string to examine</param>
        /// <exception cref="PopServerException">Thrown if server did not respond with "+OK" message</exception>
        private static void IsOkResponse(string response)
        {
            if (response == null)
                throw new PopServerException("The stream used to retrieve responses from was closed");
            if (response.StartsWith("+OK", StringComparison.OrdinalIgnoreCase))
                return;
            throw new PopServerException("The server did not respond with a +OK response. The response was: \"" + response + "\"");
        }

At this point, it turns out that the biggest pitfall is that Pop3Client is not thread-safe. Finally found the reason, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha, ha.

After a while, I finally calm down and reflect on myself. I made a very low-level mistake and felt dizzy. How could I forget TCP and thread safety? Ah, ah, ah, ah, ah, ah, ah, ah, ah, ah, ah, ah, ah, ah.

By the way, eml is saved through the SaveToFile method of the Message object, which does not need to communicate with the mail server, so there is no exception to the asynchronous saving (RawMessage in binary array does not have data mismatch). Its source code is as follows:


      /// <summary>
        /// Save this <see cref="Message"/> to a file.<br/>
        /// <br/>
        /// Can be loaded at a later time using the <see cref="LoadFromFile"/> method.
        /// </summary>
        /// <param name="file">The File location to save the <see cref="Message"/> to. Existent files will be overwritten.</param>
        /// <exception cref="ArgumentNullException">If <paramref name="file"/> is <see langword="null"/></exception>
        /// <exception>Other exceptions relevant to file saving might be thrown as well</exception>
        public void SaveToFile(FileInfo file)
        {
            if (file == null)
                throw new ArgumentNullException("file");
            File.WriteAllBytes(file.FullName, RawMessage);
        }

To sum up, let's look at how bug came into being: not being sensitive enough to TCP and thread safety, tuning performance when you see for loop, insufficient test data, and accidentally hitting a mine. Ultimately, the cause of the error is a poor choice of thread-safe asynchronous scenarios, and there are many such improper USES, typically the misuse of database connections. I read an article about the misuse of database connection objects, such as this article "explaining why to close the database connection, can you not close the problem in detail", which I also summarized at that time, so it is very impressive. Now we still need to be worshipful 1, for using1 Pop3Client or SqlConnection this way of sharing a connection to access the network may not be suitable for the use of multi-threading, especially when intensive communication with the server, even if the use of multi-threading technology, performance is not necessarily improved.

We often use some Libray or 1. NET client, such as FastDFS, Memcached, RabbitMQ, Redis, MongDB, Zookeeper etc, they all have to access the network and server communication and resolution protocol, and analyzed several client source code, remember FastDFS, Memcached and Redis within the client are 1 Pool implementation, the impression they are not thread safe risk. You must keep in accordance with the personal experience, the use of their fear of heart, maybe you use language and library programming experience are very friendly, API instructions easy to understand, call up looks easy, but want to use to use to also is not so easy, all the best fast 1 times the source code to understand roughly the implementation approach, otherwise such as over here are not familiar with internal implementation principle in the box is likely to fall into the trap of and don't even know it. When we refactor or tune using multithreading techniques, there is a profound problem that must not be overlooked. It is important to be aware of scenarios that are suitable for asynchronous processing, as well as cache scenario 1, which I think is even more important than how to write code. I have been particularly impressed by the fact that refactoring or tuning has to be done carefully, and that the data that the test relies on has to be well prepared. Many business system can run when the data is good, but in high concurrency data volume larger environment is easy to appear all sorts of puzzling problems, such as described in this article, the test multi-thread asynchronous access and delete the email, mail server on only one or two small email content and attachment, through asynchronous access and delete normal operation, no abnormal log, but the data more than 1, the abnormal log, screening, debugging, see the source code, and then screen... This article is out there.


Related articles: