The difference between CreateThread and _beginthread, explained in detail

2020-04-02 01:33:45
OfStack

There are two ways to create a thread on Windows. One is to call the Windows API function CreateThread(); the other is to call _beginthread() or _beginthreadex() from the MSVC CRT. Likewise, there are two ways to exit a thread: ExitThread() from the Windows API and _endthread() (or _endthreadex()) from the CRT. Both sets of functions create and exit threads, so what is the difference?

Many developers are unaware of how the two sets relate, so they pick one at random, see that nothing goes wrong, and move on to more pressing work without digging deeper. Then one day they discover that a long-running program has a small memory leak, and it never occurs to them that mixing the two sets of functions is the cause.

Given the relationship between the Windows API and the MSVC CRT, it is no surprise that _beginthread() is a wrapper around CreateThread() and ultimately calls it to create the thread. So what does _beginthread() do before it calls CreateThread()? Its source is in thread.c in the CRT source code. Before calling CreateThread(), it allocates a structure called _tiddata, initializes it with _initptd(), and passes it to its own thread entry function, _threadstart(). _threadstart() first stores the pointer to that _tiddata structure in the thread's explicit TLS array, then calls the user's thread entry function to do the real work. When the user function returns, _threadstart() calls _endthread() to end the thread. _threadstart() also wraps the call to the user's entry function in __try/__except so that any unhandled signals are caught and handed to the CRT for processing.

So apart from signal handling, the main reason the CRT wraps the Windows thread API is clearly _tiddata. What does this thread-private structure hold? Its definition is in mtdll.h: CRT-related, thread-private information such as the thread ID, the thread handle, errno, the position where the previous strtok() call stopped, the seed for rand(), exception-handling state, and so on. In other words, the MSVC CRT does not define its thread-private variables with __declspec(thread) (implicit TLS) to keep library functions working under multiple threads. Instead, it allocates a _tiddata structure on the heap, keeps the thread-private variables inside it, and stores the pointer to that structure in explicit TLS.

With this in mind, one might expect that creating a thread with CreateThread() and then calling the CRT's strtok() would fail, because the _tiddata that strtok() needs does not exist. Yet we never seem to run into this problem. The reason is visible in strtok() itself: it first calls _getptd() to get the thread's _tiddata structure, and if _getptd() finds that the thread does not have one yet, it allocates and initializes one. So whichever function creates the thread, every function that needs _tiddata can be called safely, because the structure is created on demand.
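The create-on-first-use behaviour can be modelled in a few lines of portable C. This is a simplified model and not the CRT's code: ptd_block, tls_slot, get_ptd() and free_ptd() are hypothetical names, and the thread's TLS slot is passed in explicitly so the sketch runs without any Windows API (on Windows, an explicit TLS slot is read and written with TlsGetValue() and TlsSetValue()).

```c
#include <stdlib.h>

/* Hypothetical stand-in for _tiddata; the real structure has many more fields. */
typedef struct ptd_block {
    unsigned long thread_id;
    int           saved_errno;   /* per-thread errno */
    char         *strtok_next;   /* where the previous strtok() call stopped */
    unsigned long rand_seed;     /* per-thread rand() state */
} ptd_block;

/* Stand-in for one thread's explicit TLS slot. */
typedef struct tls_slot {
    void *value;
} tls_slot;

/* Models _getptd(): return the thread's block, creating it on first use. */
ptd_block *get_ptd(tls_slot *slot, unsigned long thread_id)
{
    ptd_block *ptd = (ptd_block *)slot->value;

    if (ptd == NULL) {
        ptd = (ptd_block *)calloc(1, sizeof *ptd);
        if (ptd == NULL)
            return NULL;
        ptd->thread_id = thread_id;
        ptd->rand_seed = 1;      /* same starting state as after srand(1) */
        slot->value = ptd;
    }
    return ptd;
}

/* Models the cleanup that has to happen when the thread ends. */
void free_ptd(tls_slot *slot)
{
    free(slot->value);
    slot->value = NULL;
}
```

The first call to get_ptd() for a slot allocates the block and every later call returns the same pointer. Nothing in get_ptd() schedules the matching free_ptd() call, so some other code has to make it when the thread ends.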

So when is _tiddata freed? ExitThread() certainly does not free it, because it knows nothing about a structure called _tiddata. The obvious candidate is _endthread(), and that is exactly what the CRT does. Often, though, we use CreateThread() and ExitThread() (returning from the thread function has the same effect as calling ExitThread()) and see no memory leak. Why? On closer inspection, the secret is in the CRT DLL's entry function, DllMain. Whenever a process or thread starts or exits, each DLL's DllMain is called, which gives the dynamically linked CRT a chance to free the thread's _tiddata there. But this only works when the CRT is dynamically linked: the statically linked CRT has no DllMain! That is the case in which using CreateThread() causes a memory leak, because _tiddata is not freed when the thread ends.

We can test this with this little program:


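#include <windows.h>
#include <process.h>
#include <string.h>

DWORD WINAPI thread(LPVOID a)
{
    char buf[] = "aaa";            // strtok() needs a writable buffer, not a string literal
    char *r = strtok(buf, "b");    // the first CRT call creates this thread's _tiddata
    (void)a;
    (void)r;
    ExitThread(0);                 // It doesn't matter whether the function is called or not
}

int main(int argc, char *argv[])
{
    while (1) {
        HANDLE h = CreateThread(NULL, 0, thread, NULL, 0, NULL);
        if (h != NULL)
            CloseHandle(h);        // close the handle so that only _tiddata can leak
        Sleep(5);
    }
    return 0;
}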

With the dynamically linked CRT (/MD, /MDd) there is no problem. With the statically linked CRT (/MT, /MTd), however, running the program and watching it in Task Manager shows memory usage growing steadily. If we replace ExitThread() in the thread() function with _endthread(), the problem disappears, because _endthread() frees _tiddata.

To sum up: when a program uses the CRT (and almost all programs do), prefer the _beginthread()/_beginthreadex()/_endthread()/_endthreadex() family for creating and exiting threads. MFC has a similar pair of functions, AfxBeginThread() and AfxEndThread(). By analogy with the above, these are thread wrappers at the MFC level that maintain the structures tying a thread to MFC, so when using the MFC class library, prefer the thread wrappers it provides to make sure the program runs correctly.