Introduction to strong and weak symbols in C language

  • 2020-04-02 03:01:32
  • OfStack

The concept of a symbol, already mentioned in the article on extern "C" usage, refers to the name the compiler generates for a variable or function. The compiler also uses different rules to generate symbols for C and C++ code.

Let's start with a simple piece of code


 
void hello();
int main()
{
    hello();
    return 0;
}

Obviously this code will not link: the linker reports undefined reference to hello, because we only declared the function hello and never defined it. Now let's change the code slightly:

__attribute__((weak)) void hello();
int main()
{
    hello();
    return 0;
}

This time the code compiles and links, but it fails at run time. Because hello is declared as a weak symbol, the linker treats its address as 0 when no definition is found, and calling a function at address 0 crashes the program. If we check the address before calling, as in the following version, the program runs without error, although it produces no output:

__attribute__((weak)) void hello();
int main()
{
    if (hello)
        hello();
    return 0;
}

By default, the compiler treats functions and initialized global variables as strong symbols, and uninitialized global variables as weak symbols. The linker applies the following rules when handling strong and weak symbols:

1. A strong symbol with the same name may not be defined in more than one object file
2. If a symbol is strong in one object file and weak in another, the strong symbol is selected
3. If a symbol is weak in all object files, the one that occupies the most space is selected. For example, with double global_var in object file A and int global_var in object file B, the double is chosen, because it takes 8 bytes, more than the 4 bytes of the int

We can simply verify this with the following two files


 
/* 1.c */
char global_var;
int main()
{
    return 0;
}

/* 2.c */
int global_var;

The global variable global_var is not initialized in either file, so it is a weak symbol in both. Compile with gcc 1.c 2.c (GCC 10 and later default to -fno-common and reject the duplicate definition, so add -fcommon there), then view the symbol table with readelf -s a.out:


Num:    Value          Size Type    Bind   Vis      Ndx Name 
62: 0000000000600818     4 OBJECT  GLOBAL DEFAULT   25 global_var 
63: 0000000000400474    11 FUNC    GLOBAL DEFAULT   13 main 
64: 0000000000400358     0 FUNC    GLOBAL DEFAULT   11 _init 

Here the size of the symbol global_var is 4, which means the linker selected int global_var, the definition that occupies more space. Now modify the example slightly and initialize the global variable in 1.c:


 
/* 1.c */
char global_var = 1;
int main()
{
    return 0;
}

/* 2.c */
int global_var;

Now global_var is a strong symbol in 1.c and a weak symbol in 2.c. After compiling again, readelf -s a.out shows the following symbol table:


Num:    Value          Size Type    Bind   Vis      Ndx Name 
62: 0000000000600818     1 OBJECT  GLOBAL DEFAULT   25 global_var 
63: 0000000000400474    11 FUNC    GLOBAL DEFAULT   13 main 
64: 0000000000400358     0 FUNC    GLOBAL DEFAULT   11 _init 

The size of the symbol global_var is now 1, indicating that the linker selected the strong symbol.

When writing code, avoid defining the same symbol with different types in different files; doing so can cause strange and subtle errors. To prevent it, you can take the following measures:

1. Best option: eliminate global variables altogether
2. Middle option: declare the global variable as static and provide an interface for accessing it
3. Fallback: always initialize global variables, even if only to 0
4. Essential: enable GCC's -fno-common option, which turns such conflicting definitions into link errors

After all this, it may seem that strong symbols should be used wherever possible, so what are weak symbols good for? They exist for a reason, and sometimes we even need to define weak symbols explicitly. This is especially useful in libraries. A weak symbol in a library can be overridden by a user-defined symbol, which lets the user customize the library's behavior. Alternatively, a program can declare the functions of an optional extension module as weak: when the module is linked in, its features work normally, and when the module is removed, the program still links and simply lacks those features. For example, the following code determines whether the program was linked against the pthread library and decides what to do accordingly:


 
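#include <stdio.h>
#include <pthread.h>

/* Redeclare pthread_create as weak: its address is 0 if the library is not linked in. */
__attribute__((weak)) int pthread_create(
    pthread_t*,
    const pthread_attr_t*,
    void*(*)(void*),
    void*);

int main()
{
    if (pthread_create)
    {
        printf("This is multi-thread version!\n");
    }
    else
    {
        printf("This is single-thread version!\n");
    }
    return 0;
}

The results of compiling and running are as follows. (This output assumes a system where pthread_create lives in a separate libpthread; glibc 2.34 and later provide it in libc itself, so both builds would report the multi-thread version there.)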


$ gcc test.c 
$ ./a.out 
This is single-thread version! 
$ gcc test.c -lpthread 
$ ./a.out
This is multi-thread version! 

