Member variable offsets in C structures, discussed in detail

  • 2020-05-10 18:30:19
  • OfStack

The rules for member offsets in C structures are simple but easy to forget. Taking notes is a good way to remember them.

There are three principles:

a. The offset of each member from the start of the structure must be an integer multiple of the size of that member's data type; the offset of the first member is 0.

For example, if the second member is an int, its offset must be a multiple of 4; otherwise padding is inserted in front of it. The same applies to every later member.

b. The total number of bytes occupied by the structure, that is, the value returned by sizeof, must be an integer multiple of the size of its largest member; otherwise padding is added at the end.

c. If structure A has structure B as a member, the offset of B within A must be an integer multiple of the size of the largest member contained in B.

If the members of B are int, double, and char, the offset of B must be an integer multiple of 8; otherwise padding is inserted in the middle.

I believe you have used structures when developing programs in C, so how do you understand member variable offsets within a structure? In this article I share my recent thoughts and conclusions on structure offsets in C.

Example 1

Let's define the requirements first:

The known structure type is defined as follows:


struct node_t{
 char a;
 int b;
 int c;
};

The structure is packed with 1-byte alignment (the directive must appear before the structure definition to take effect):


#pragma pack(1)

Find:

The offset of the member variable c in the structure struct node_t.

Note: the offset here refers to the offset relative to the starting position of the structure.

Faced with this problem, different people will come up with different solutions. Let's analyze some possible ones:

Method 1

If you are familiar with the C library, the first thing that comes to mind is offsetof (strictly a macro, but let's call it a function here). Its prototype, as given by man 3 offsetof, is:


 #include <stddef.h>

  size_t offsetof(type, member);

With this library facility, one line of code is enough:


offsetof(struct node_t, c);

Of course, that's not the point of this article, so read on.


Method 2

Even if we are not familiar with the C library functions, we can still solve the problem with our own method.

The most straightforward idea is: [address of struct member variable c] minus [address of struct start]

Let's first define a structure variable node:


struct node_t node;

Then calculate the offset of the member variable c:


(unsigned long)(&(node.c)) - (unsigned long)(&node)

&(node.c) is the address of the member variable c, cast to unsigned long;

&node is the starting address of the structure, also cast to unsigned long;

Subtracting the second value from the first gives the offset of the member variable c.

Method 3

With method 2 we can get the offset of the member variable c without any library function. But as programmers we should keep thinking: can the code above be improved and made more concise? Before making concrete improvements, let's analyze the problem with method 2.

You have probably noticed it already: the main problem with method 2 is that we had to define a structure variable, node. The problem statement does not forbid defining variables, but if a stricter version of the problem did forbid them, we would need a new solution.

Before exploring the new solution, let's look at a small problem related to offsets:

A small problem

This is a simple geometry problem. Suppose we move from point A to point B on a coordinate axis. What is the offset of B relative to A? The question is so easy that most of us will blurt out the answer: B - A.

Can the answer be simplified further? Yes: when A is the origin, A = 0, and B - A reduces to B.

What does this simple little question tell us?

If we combine the idea of method 2 with this small problem, we quickly see the correspondence between


(unsigned long)(&(node.c)) - (unsigned long)(&node)

and

B - A

The point of the small problem is that when A is the origin, B - A simplifies to B. Applied to method 2: when the memory address of node is 0, i.e. &node == 0, the code above simplifies to:


(unsigned long)(&(node.c))

Since the memory address of node is 0, the expression


node.c  // member variable c of the structure node

can be written another way, as follows:


((struct node_t *)0)->c

The code above should be easy to understand. Since the structure is assumed to start at address 0, we can access its member variables directly through that address. The expression means: the member variable c of the struct node_t located at address 0.

Note: this only uses the compiler to calculate the member's offset; nothing at memory address 0 is actually read or written. Some readers may have doubts about this. For a detailed explanation, see the article "Some thoughts on how C structure member variables are accessed".

At this point, our offset solution eliminates the custom variable struct node_t node and solves it with 1 line of code:


(unsigned long)&(((struct node_t *)0)->c)

Is the above code more concise than method 2?

Here we wrap the code above in a macro that calculates the offset of a member variable in a structure (the macro is used again in later examples):


#define OFFSET_OF(type, member) ((unsigned long)&(((type *)0)->member))

Using the macro above, we can directly get the offset of the member variable c in the structure struct node_t as follows:


OFFSET_OF(struct node_t, c)

Example 2

We define the requirements as follows:

The known structure type is defined as follows:


struct node_t{
 char a;
 int b;
 int c;
};

int *p_c is a pointer to the member variable c of a struct node_t variable x.

The structure is packed with 1-byte alignment:

#pragma pack(1)

Find:

The value of the member variable b of the structure x.

Let's analyze the problem briefly: it asks how to find the value of one member variable of a structure given a pointer to another member variable of the same structure.

Then the possible solutions are:

Method 1

Since we know that the structure uses 1-byte alignment, the simplest solution to this problem is:

*(int *)((unsigned long)p_c - sizeof(int))

The code is very simple: sizeof(int) is subtracted from the address of the member variable c to get the address of the member variable b, which is cast to int * and then dereferenced to obtain the value of b.

Method 2

The code for method 1 is simple but does not generalize well. We would rather obtain a pointer p_node to the structure itself from p_c, and then access any member variable of the structure through p_node.

The starting address p_node of the structure can be calculated as follows:

[address of member variable c, p_c] minus [offset of c in the structure]

From example 1, we get that the offset of the member variable c in struct node_t is:


(unsigned long)&(((struct node_t *)0)->c)

Therefore, the starting address pointer p_node of the structure is:


(struct node_t *)((unsigned long)p_c - (unsigned long)&(((struct node_t *)0)->c))

We can also directly use the OFFSET_OF macro defined in example 1, and the code above becomes:


(struct node_t *)((unsigned long)p_c - OFFSET_OF(struct node_t, c))

Finally, we can use the following code to get the values of the member variables a and b:


p_node->a
p_node->b

We also wrap the code above in the following macro:

#define STRUCT_ENTRY(ptr, type, member) ((type *)((unsigned long)(ptr) - OFFSET_OF(type, member)))

This macro obtains a pointer to the enclosing structure from a pointer to any of its member variables.

Using the macro, the previous code becomes:


STRUCT_ENTRY(p_c, struct node_t, c)

p_c is the pointer to the member variable c of struct node_t;

struct node_t is the structure type;

c is the member variable that p_c points to.

Note:

Some notes on address operation in the example above:


int *p_a;

Suppose:


p_a == 0x95734104;

The following are the relevant results calculated by the compiler:


p_a + 10 == 0x95734104 + sizeof(int)*10 = 0x95734104 + 4*10 = 0x9573412C

(unsigned long)p_a + 10 == 0x95734104 + 10 = 0x9573410E

(char *)p_a + 10 == 0x95734104 + sizeof(char)*10 = 0x9573410E

These three cases should make the point clear: adding to a pointer advances it in units of the pointed-to type, while adding to an integer advances it byte by byte. (Note: a later blog post will discuss this in more detail from the compiler's perspective.)

Conclusion

This article used a few examples to describe some interesting aspects of C structures; I hope it helps. Some of you may already see where this line of thinking leads, which is the topic of the following posts.

If there are any errors in the text, please point them out.

