C# String Memory Allocation and the Intern Pool: Study Notes

  • 2020-05-26 10:00:17
  • OfStack

When I first learned C#, I heard that the CLR has a special memory management mechanism for the String class: sometimes two String variables are declared, yet they point to the same instance. For example:


String s1 = "Hello";
String s2 = "Hello";
// s1 and s2 both hold the value "Hello".
bool same = (object) s1 == (object) s2;
// This compares whether s1 and s2 reference the same object instance.
// You cannot write bool same = s1 == s2; for this purpose,
// because String overloads the == operator to compare the values the strings contain.

Here same is assigned true: s1 and s2 really do reference the same String object. Note that this happens because both s1 and s2 are assigned the same string literal, "Hello".

This gives us a preliminary conclusion: when several string variables hold the same value, the CLR may not allocate memory for each of them, but instead point them all at the same String instance. (I say "may" because in some cases multiple copies of the same string value do exist in memory at the same time. Read on.)

As we know, the String class has several special characteristics, one of which is that it is immutable. Each time we operate on a String object (for example with Trim or Replace), we do not modify that instance; the operation leaves it untouched and returns a new String instance as its result. Once a String instance has been created, it never changes.
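
The immutability described above can be checked directly. The following is a minimal sketch; the class and method names (ImmutabilityDemo, ReplaceLeavesOriginalIntact) are made up for illustration and are not part of any library.

```csharp
using System;

static class ImmutabilityDemo
{
    // Hypothetical helper: returns true when Replace produced a separate
    // instance and left the original string untouched.
    public static bool ReplaceLeavesOriginalIntact()
    {
        string original = "Hello";
        string replaced = original.Replace('H', 'J'); // result is "Jello"

        return original == "Hello"                           // original value unchanged
            && replaced == "Jello"                           // result holds the new value
            && !object.ReferenceEquals(original, replaced);  // two distinct instances
    }

    static void Main()
    {
        Console.WriteLine(ReplaceLeavesOriginalIntact()); // prints True
    }
}
```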

Given this immutability, it makes perfect sense for the CLR to let variables that represent the same string value point to the same String instance. No operation performed through any one reference can change the state of that instance, so it cannot affect the value seen through the other references to it. By managing String allocation this way, the CLR optimizes memory usage and avoids keeping redundant data in memory.

To implement this mechanism, the CLR silently maintains a table called the intern pool, or resident pool. This table records references to all string instances declared with literals in the code. Strings declared with literals therefore go into the pool, while strings created any other way do not, and so do not automatically benefit from the CLR's mechanism for avoiding redundant strings. This is the situation I mentioned above, in which multiple copies of the same string value exist in memory at the same time. Consider this example:


// Requires: using System.Text;
StringBuilder sb = new StringBuilder();
sb.Append("He").Append("llo");

string s1 = "Hello";
string s2 = sb.ToString(); // built at run time, not declared with a literal
bool same = (object) s1 == (object) s2;

This time same is false. Although s1 and s2 represent the same string, s2 was not declared with a literal. When the CLR allocates memory for the return value of sb.ToString(), it does not check the pool for an existing "Hello" string, so naturally it does not make s2 point to the object in the pool.

To let programmers force the CLR to check the pool and avoid redundant copies of a string, the designers of the String class provided a static method called Intern. Here is an example of this method:


// Requires: using System.Text;
StringBuilder sb = new StringBuilder();
sb.Append("He").Append("llo");

string s1 = "Hello";
string s2 = String.Intern(sb.ToString()); // returns the pooled "Hello" instance
bool same = (object) s1 == (object) s2;

Now same is true again. The Intern method takes a string as its argument and checks whether the pool already contains a string with that value. If it does, Intern returns a reference to the pooled string. Otherwise it adds the string to the pool and returns a reference to it. Note, however, that even when Intern finds a string with the same value in the pool, it does not save you the allocation, because the string passed as the argument has already been allocated. The advantage of Intern is this: when it finds a matching string in the pool, there are briefly two copies of the string in memory (one is the argument, the other lives in the pool), but in time the copy referenced by the argument is garbage collected, so the string is no longer duplicated in memory.
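
The "check the pool, add if missing" behavior can be observed with String.IsInterned, a framework method that returns null when no string with the given value is in the pool. This is only a sketch: the class name InternPoolDemo and the helper IsInPool are invented for illustration, and a random GUID is used on the assumption that such a value has never been interned before.

```csharp
using System;

static class InternPoolDemo
{
    // Hypothetical helper: true when a string with this value is already in the pool.
    public static bool IsInPool(string s) => string.IsInterned(s) != null;

    static void Main()
    {
        // Built at run time from a random GUID, so it is not a literal
        // and is assumed not to be in the pool yet.
        string fresh = "key-" + Guid.NewGuid();

        Console.WriteLine(IsInPool(fresh));   // False: not pooled yet

        string pooled = string.Intern(fresh); // no match found, so the string is added

        Console.WriteLine(IsInPool(fresh));   // True: the value is now pooled
        Console.WriteLine(pooled == fresh);   // True: same value
    }
}
```
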
If your program has a method that creates and returns long strings depending on context, and it often returns the same string over the program's lifetime, consider using the Intern method to reduce memory usage.
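
As a rough illustration of that advice, the sketch below interns the result of a method that builds its return value at run time. The names InternDemo and BuildGreeting are hypothetical, and the short greeting stands in for the long strings a real program would produce.

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Hypothetical method: builds a string at run time and returns the
    // pooled instance, so repeated calls share a single copy.
    public static string BuildGreeting(string name)
    {
        var sb = new StringBuilder();
        sb.Append("Hello, ").Append(name).Append('!');
        return string.Intern(sb.ToString());
    }

    static void Main()
    {
        string a = BuildGreeting("world");
        string b = BuildGreeting("world");

        // Both calls return the same pooled instance.
        Console.WriteLine(object.ReferenceEquals(a, b)); // prints True
    }
}
```

Each call still allocates a temporary string inside the method; what Intern avoids is keeping many identical copies alive afterwards.
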
However, keeping a string alive in the pool with the Intern method has a side effect: even when no other reference to the pooled string exists, the string will still not be garbage collected. This means that a string in the pool that is no longer useful may not be destroyed until the CLR terminates. You should take this behavior into account when using the Intern method.

