Deep understanding of C memory alignment

  • 2020-04-02 01:58:00
  • OfStack

One. A preliminary explanation of memory alignment

Memory alignment can be summarized in one sentence:

"Data items can only be stored in memory locations where the address is an integer multiple of the size of the item."

For example, an int takes up four bytes, so its address can only be a multiple of 4, such as 0, 4, or 8.

Example 1:


#include <stdio.h>
struct xx{
        char b;
        int a;
        int c;
        char d;
};
int main(void)
{
        struct xx bb;
        /* %p expects a void *, and sizeof yields a size_t (printed with %zu) */
        printf("&a = %p\n", (void *)&bb.a);
        printf("&b = %p\n", (void *)&bb.b);
        printf("&c = %p\n", (void *)&bb.c);
        printf("&d = %p\n", (void *)&bb.d);
        printf("sizeof(xx) = %zu\n", sizeof(struct xx));
        return 0;
}

The output is as follows:

&a = ffbff5ec
&b = ffbff5e8
&c = ffbff5f0
&d = ffbff5f4
sizeof(xx) = 16

You will find that three bytes are left empty between b and a: the bytes after b at 0xffbff5e9, 0xffbff5ea, and 0xffbff5eb are unused, and a is stored at 0xffbff5ec, because a has a size of 4 and can only be stored at a multiple of 4. If you print the size of xx, you'll see that it is 16. Some of you might ask: the 3 empty bytes after b bring the total to 13, so where do the other three come from? This becomes clearer as you read on, but in short, the three bytes after d are also padding, and they too are counted as part of the structure.

Simply reordering the members of the structure reduces its memory use. For example, the structure can be defined as:


struct xx{
        char b; 
        char d;
        int a;          
        int c;                  
};

The size of this structure is 12, which saves 4 bytes. As you can see, when defining a structure we should take the effect of memory alignment into account so that the program takes up less memory.

Two. Operating system default alignment coefficient

Each operating system has its own default memory alignment coefficient (in practice it is set by the compiler for that platform). On newer systems the default is commonly 8, because the largest basic storage type defined there is 8 bytes, such as long long (the third section explains why it has to be this way), and none of the common basic types exceeds 8 bytes (for example, int is 4, char is 1, and long is 4 with a 32-bit compiler and 8 with many 64-bit compilers). When the default alignment coefficient conflicts with the alignment rule described in the first section, the default alignment coefficient takes precedence.

For example:

Assuming that the default alignment coefficient is 4, a variable of type long long no longer has to follow the rule described in the first section: it can be stored at any address divisible by 4, whether or not that address is also divisible by 8.

You can change the default alignment coefficient with the #pragma pack(n) directive, and restore it with #pragma pack(). Changing the default alignment coefficient is not recommended when writing programs.

Example 2:


#include <stdio.h>
#pragma pack(4)
struct xx{
        char b;
        long long a;
        int c;
        char d;
};
#pragma pack()
int main(void)
{
        struct xx bb;
        /* %p expects a void *, and sizeof yields a size_t (printed with %zu) */
        printf("&a = %p\n", (void *)&bb.a);
        printf("&b = %p\n", (void *)&bb.b);
        printf("&c = %p\n", (void *)&bb.c);
        printf("&d = %p\n", (void *)&bb.d);
        printf("sizeof(xx) = %zu\n", sizeof(struct xx));
        return 0;
}

The printed result is:

&a = ffbff5e4
&b = ffbff5e0
&c = ffbff5ec
&d = ffbff5f0
sizeof(xx) = 20

Notice that a, which takes 8 bytes, is stored at an address (0xffbff5e4) that is divisible by 4 but not by 8: the alignment coefficient of 4 set by #pragma pack(4) was used instead of the size of a.

Three. Reasons for memory alignment

Memory alignment is a strategy for fast memory access; in short, it prevents a variable from needing a second access. When the system accesses memory, it reads a fixed length at a time (this length is the default alignment coefficient, or an integer multiple of it). Without memory alignment, reading a single variable can require two accesses over the bus.

For example, if there were no memory alignment, the members of structure xx would be placed as follows:


struct xx{
        char b;         //0xffbff5e8
        int a;          //0xffbff5e9
        int c;          //0xffbff5ed
        char d;         //0xffbff5f1
};

The system first reads memory 0xffbff5e8-0xffbff5ef, and then reads memory 0xffbff5f0-0xffbff5f7. To get the value of c, which spans both blocks (0xffbff5ed-0xffbff5f0), the two reads have to be merged, which severely reduces the efficiency of memory access. (This brings us to the age-old question: which is more important, space or efficiency? We won't discuss it here.)

This is also why the first member of a structure, regardless of its type, is placed at an address divisible by 8: memory is read starting from an integer multiple of 8, which makes the read more efficient.

The key to memory alignment lies in understanding how composite types such as structs are laid out in memory.

First, understand the concept of memory alignment.
Many real computer systems restrict where basic data types can be stored in memory. They require the starting address of the data to be a multiple of some number k (usually 4 or 8). This is called memory alignment.

This k differs between CPUs and compilers, for example between a computer with a 32-bit word length and one with a 16-bit word length. That is somewhat removed from everyday work, though. Our development mainly involves two major platforms, Windows and Linux (Unix), and the compilers involved are mainly Microsoft compilers (such as cl) and GCC.

The goal of memory alignment is to make the starting address of each basic data type a multiple of its corresponding k; this is the single most useful rule for understanding memory alignment. The other thing to keep in mind is that compilers differ. Understanding these two points will solve pretty much every memory alignment problem.

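The value of k in different compilers:
1. For Microsoft compilers, k is the size of each basic type. In general, char is 8 bits, int is 32 bits, long is 32 bits, and double is 64 bits, so their k values are 1, 4, 4, and 8 bytes.
2. For the GCC compiler under Linux, if the size of the type is less than or equal to 2 bytes, k is its size; if the size is greater than or equal to 4 bytes, k is 4. (This describes 32-bit x86 Linux; on x86-64, GCC aligns 8-byte types such as double to 8.)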

With the above in mind, the memory layout of composite types such as structs should be clear.

Let's start with the simplest case, where all struct members are basic data types, for example:


struct test1
{
        char a;
        short b;
        int c;
        long d;
        double e;
};

Under Windows platform, Microsoft compiler:

Suppose the structure starts at address 0. First, a has a k value of 1, so it can start at any address and occupies the first byte, address 0. Next, b has a k value of 2, so its starting address must be a multiple of 2 and cannot be 1; the byte at address 1 is padding, and b starts at address 2, occupying addresses 2 and 3. Then comes c, whose k value is 4; its starting address must be a multiple of 4, so it starts at 4 and occupies addresses 4, 5, 6, and 7. Then comes d, whose k value is 4, so it starts at 8 and occupies addresses 8, 9, 10, and 11. Finally, e has a k value of 8, so its starting address must be a multiple of 8; addresses 12, 13, 14, and 15 are padding, and e starts at 16, occupying addresses 16-23. The structure therefore has a size of 24.

This is how test1 is distributed in memory.

Validation:
We create a variable of type test1 and assign 2, 4, 8, 16, and 32 to a, b, c, d, and e, respectively. The hexadecimal value of each byte in memory is then printed, starting from the low address:
2, 0, 4, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 40

The inference is clearly correct.

Under the Linux platform, GCC compiler:
Suppose the structure starts at address 0. First, a has a k value of 1, so it can start at any address and occupies the first byte, address 0. Next, b has a k value of 2, so its starting address must be a multiple of 2 and cannot be 1; the byte at address 1 is padding, and b starts at address 2, occupying addresses 2 and 3. Then comes c, whose k value is 4; its starting address must be a multiple of 4, so it starts at 4 and occupies addresses 4, 5, 6, and 7. Then comes d, whose k value is 4, so it starts at 8 and occupies addresses 8, 9, 10, and 11. Finally comes e, and this is where GCC differs from the Microsoft compiler: its k value is not 8 but still 4, so e starts at 12 and occupies addresses 12-19. The structure therefore has a size of 20.

Validation:
We create a variable of type test1 and assign 2, 4, 8, 16, and 32 to a, b, c, d, and e, respectively. The hexadecimal value of each byte in memory is then printed, starting from the low address:
2, 0, 4, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 40, 40

The inference is clearly correct.

Next, let's look at a few special cases. To avoid the hassle of describing the full memory layout, we only calculate the structure size.

The first: nested structures


struct test2
{
        char f;
        struct test1 g;
};

Under Windows platform, Microsoft compiler:

If we expanded the second member of test2 in place to study the layout, f would occupy address 0 and g.a would occupy address 1, the rest of the layout would stay unchanged, every basic member would still start at a multiple of its k, and the size of test2 would still be 24. However, the size of test2 is actually 32, because the memory layout of test1 must not change just because it is nested inside test2. For every member of test1 to still meet its alignment requirement, a certain number of padding bytes must be inserted after member f (here 7, so that g starts at address 8). test2 is therefore 8 bytes larger than test1, for a size of 32.

Under the Linux platform, GCC compiler:

Likewise, if we expanded the second member of test2 in place to study the layout, f would occupy address 0 and g.a would occupy address 1, the rest of the layout would stay unchanged, every basic member would still start at a multiple of its k, and the size of test2 would still be 20. But the size of test2 is actually 24, again because the memory layout of test1 must not change just because it is nested inside test2. For every member of test1 to still meet its alignment requirement, padding bytes must be inserted after member f, and it is not hard to see that 3 bytes are needed to keep test1 aligned. test2 is therefore 4 bytes larger than test1, for a size of 24.

The second: bit-field alignment


struct test3
{
        unsigned int a:4;
        unsigned int b:4;
        char c;
};

or

struct test3
{
        unsigned int a:4;
        int b:4;
        char c;
};

Under Windows platform, Microsoft compiler:

Adjacent bit-fields of the same type (signed or unsigned does not matter; as long as the base type is the same, they count as the same type) can be treated as a single unit if together they occupy no more than the size of the base type. Bit-fields of different types each follow their own alignment.
For example, in test3, a and b can be treated as one int, so the size of test3 is 8 bytes. The values of a and b are arranged in memory starting from the low bits, in bits 0-3 and bits 4-7 of the 4-byte region.

If test4 has the following format:


struct test4
{
        unsigned int a:30;
        unsigned int b:4;
        char c;
};

then the size of test4 is 12 bytes: a and b together need 34 bits, more than one int, so they are placed in the first 30 bits of the first 4 bytes and the first 4 bits of the second 4 bytes, respectively.

test5 is as follows:


struct test5
{
        unsigned int a:4;
        unsigned char b:4;
        char c;
};

Since int and char are different types, each is aligned in its own way, so the size of test5 is 8 bytes, with the values of a and b in the first 4 bits of the first 4 bytes and the first 4 bits of the fifth byte, respectively.

Under the Linux platform, GCC compiler:
Take the same test3 structure as above.
Under GCC, when the total number of bits occupied by adjacent members, whether or not they are of the same type, exceeds the size of the first member, alignment inside the structure uses a k value of 1, while outside the structure k is the value for the basic type. When the total does not exceed that size, the members are arranged in order in memory.
For test3, for example, the size is 4 bytes. The values of a and b are arranged sequentially in bits 0-3 and bits 4-7 of the first four bytes.

If test4 has the following format:


struct test4
{
        unsigned int a:20;
        unsigned char b:4;
        char c;
};

then the size of test4 is 4 bytes: the values of a and b are in bits 0-19 and bits 20-23 of the first 4 bytes, and c is stored in the fourth byte.
test5 is as follows:

struct test5
{
        unsigned int a:10;
        unsigned char b:4;
        short c;
};

The size of test5 is 4 bytes: the values of a and b are in bits 0-9 and bits 10-13, and c is stored in the last two bytes. If the width of a becomes 20, the size of the structure becomes 8 bytes, namely:

struct test6
{
        unsigned int a:20;
        unsigned char b:4;
        short c;
};

Here a and b of test6 together occupy three bytes (bytes 0, 1, and 2), and c has a k value of 2, so c can in fact start at byte 4. Outside the structure, however, a must be aligned as an int. That is, if two test6 objects are stored consecutively in memory, the starting address of a in the second object must be a multiple of 4, so two more bytes must be padded after c. The size of test6 is therefore eight bytes.

Bit-field structures are more complicated. That's all I know for now.

