Why C++ does not add a garbage collection mechanism

  • 2020-05-10 18:35:29
  • OfStack

Fans of Java often criticize C++ for not providing a garbage collection (garbage collector) mechanism like Java's. This is normal; fans of C++ sometimes attack Java for lacking this or that feature as well. Without it, they say, dynamic storage in C++ is a programmer's nightmare, isn't it? You often hear about memory leaks and illegal pointer accesses, which are a real headache, yet you cannot give up the flexibility that pointers bring.

In this article, I do not intend to expose the inherent flaws of Java's garbage collection mechanism; instead I want to show that introducing garbage collection into C++ is feasible. Readers should note that the approach presented here builds on the current standard and on library design, and does not require changing the language definition or extending the compiler.

What is garbage collection?

As a programming language that supports pointers, C++ gives programmers the convenience of managing memory resources dynamically. When objects are used through pointers (note that because a reference cannot be rebound to a different target after initialization, polymorphism in most cases still depends on pointers), programmers must allocate, use, and release the memory themselves. The language itself provides no help in this process, except perhaps for cooperating closely with the operating system to carry out the actual memory management as you request. The standard mentions "undefined" behavior many times, and most of those cases involve pointers.

Some languages provide garbage collection: the programmer is only responsible for allocating memory and using it, and the language itself releases memory that is no longer used, freeing the programmer from the unpleasant task of memory management. C++, however, provides no such mechanism. Bjarne Stroustrup, the designer of C++, devotes a section to this feature in The Design and Evolution of C++, the only book I know of that presents the ideas and philosophy behind the language's design. In short, Bjarne himself says:

"I deliberately designed C++ so that it does not rely on automatic garbage collection (usually just called garbage collection). This is based on my own experience with garbage collection systems: I was afraid of the severe space and time overheads, and of the complexity of implementing and porting garbage collection systems. Garbage collection would also make C++ unsuitable for many low-level tasks, which are among its design goals. But I like the idea of garbage collection as a mechanism that simplifies design and eliminates many sources of errors.

The fundamental reasons for wanting garbage collection are easy to understand: it is easier for users and more reliable than user-supplied storage management schemes. The arguments against garbage collection are many, but none is fundamental; they concern implementation and efficiency.

There are ample arguments against the claim that every application would be better off with garbage collection. Likewise, there are ample arguments against the claim that no application would be better off with garbage collection.

Not every program needs to run forever; not all code is foundational library code; for many applications, a small memory leak is acceptable; and many applications can manage their own storage without garbage collection or related techniques such as reference counting.

My conclusion is that garbage collection is desirable in principle and feasible in practice. But for today's users, and for general usage and hardware, we simply cannot afford to build the semantics of C++ and its basic libraries on top of a garbage collection system."

In my opinion, a single, unified automatic garbage collection system cannot serve a wide variety of application environments without imposing an implementation burden. Later I will design an optional garbage collector for a specific type; it clearly always carries some efficiency overhead, and forcing every C++ user to accept that overhead may not be desirable.

On why C++ has no garbage collection, and on the efforts that could be made toward it in C++, the book mentioned above gives the most comprehensive account I have seen. The section is short, but it covers a great deal; being concise yet rich in content is a consistent characteristic of Bjarne's writing.

Below, I will introduce my own home-brewed garbage collection system step by step. It can be adopted selectively, as needed, without affecting other code.

Constructors and destructors

The constructors and destructors provided by C++ address the need to release resources automatically. Bjarne has a famous saying: "resource acquisition is initialization" (Resource Acquisition Is Initialization, RAII).

Therefore, we can acquire resources in the constructor and release them in the destructor. As soon as the object's lifetime ends, the resources it acquired are released automatically.
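
The rule just described can be sketched as follows. This is only a minimal illustration: IntBuffer, its live counter, and use_buffer are hypothetical names introduced here, and the counter exists solely to make the release observable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical RAII wrapper for illustration: owns a heap array of ints.
class IntBuffer
{
public:
  explicit IntBuffer(std::size_t n)
    : _data(new int[n]()), _size(n) { ++live; } // acquire in the constructor
  ~IntBuffer() { delete [] _data; --live; }     // release in the destructor

  int& operator[](std::size_t i) { assert(i < _size); return _data[i]; }

  static int live; // number of buffers currently alive (demonstration only)

private:
  IntBuffer(const IntBuffer&);            // copying disabled: declared, not defined
  IntBuffer& operator=(const IntBuffer&);

  int* _data;
  std::size_t _size;
};

int IntBuffer::live = 0;

// Returns the number of live buffers observed while buf is in scope.
int use_buffer()
{
  IntBuffer buf(16);      // resource acquired here
  buf[0] = 42;
  return IntBuffer::live; // the return value is read before buf is destroyed
}                         // buf's destructor runs here and frees the array
```

Calling use_buffer() leaves no buffer behind: the array is freed when buf goes out of scope, without any delete at the call site.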

That leaves only one problem. If the object itself is created dynamically in the free store (the so-called "heap") and managed through a pointer (I'm sure you already know why), you still have to write code that explicitly invokes the destructor, by applying a delete expression to the pointer, of course.

Smart Pointers

Fortunately, for certain reasons, the C++ standard library includes at least one kind of smart pointer. Its uses are limited, but it solves exactly our problem. This is the one and only smart pointer in the C++98 standard library: ::std::auto_ptr<>.

It wraps a pointer in a class and overloads the dereference operator operator* and the member selection operator operator-> to mimic the behavior of a pointer. For details about auto_ptr<>, refer to The C++ Standard Library (also available in a Chinese translation).

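For example, consider the following code:

// Note: auto_ptr was deprecated in C++11 and removed in C++17; compile as C++98/03.
#include <cstring>
#include <memory>
#include <iostream>

class string
{
public:
  string(const char* cstr)
  {
    _data = new char[::std::strlen(cstr) + 1]; // +1 for the terminating '\0'
    ::std::strcpy(_data, cstr);
  }
  ~string() { delete [] _data; }
  const char* c_str() const { return _data; }
private:
  string(const string&);            // copying disabled: a copy would delete _data twice
  string& operator=(const string&);
  char* _data;
};

void foo()
{
  ::std::auto_ptr<string> str(new string("hello"));
  ::std::cout << str->c_str() << ::std::endl;
} // str goes out of scope here: the string is deleted and its buffer freed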

Since str is a local object of the function, its lifetime ends at the function's exit point. At that moment the destructor of auto_ptr<string> deletes the string object held by its internal pointer (the one allocated earlier by the new expression and passed to the constructor), which in turn runs the destructor of string and frees the memory that string actually allocated dynamically. string could manage other kinds of resources in the same way, such as synchronization resources in a multithreaded environment. The following diagram illustrates the process.

Enter function foo                        Exit function
        |                                       A
        V                                       |
auto_ptr<string>::auto_ptr<string>()      auto_ptr<string>::~auto_ptr<string>()
        |                                       A
        V                                       |
string::string()                          string::~string()
        |                                       A
        V                                       |
_data = new char[]                        delete [] _data
        |                                       A
        V                                       |
Use resources            ---->            Release resources

Now we have the simplest possible garbage collection mechanism. (I glossed over one point: inside string, you still have to write the code that controls dynamic creation and destruction yourself. But the rule in this case is very simple: allocate resources in the constructor and release them in the destructor, just as a pilot must check the landing gear after takeoff and before landing.) Even if an exception occurs in foo, the lifetime of str still ends. C++ guarantees that everything that happens on a normal exit also happens when an exception is thrown.
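
The exception guarantee can be checked with a small sketch. Guard, risky, and run_both_paths are hypothetical names used only for this illustration; the static counter simply records how often the destructor has run.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard type: counts how many times its destructor has run.
struct Guard
{
  static int released;
  ~Guard() { ++released; }
};
int Guard::released = 0;

void risky(bool fail)
{
  Guard g; // stands in for any resource-owning local object
  if (fail)
    throw std::runtime_error("something went wrong");
} // g is destroyed on both the normal and the exceptional path

// Runs risky() once normally and once with an exception.
// Returns true if the exception was caught.
bool run_both_paths()
{
  risky(false);
  assert(Guard::released == 1); // destructor ran on the normal exit
  try {
    risky(true);
  } catch (const std::runtime_error&) {
    return true;                // by now the destructor has run a second time
  }
  return false;
}
```

The destructor runs during stack unwinding, before the handler is entered, so the resource is released on both paths without any extra cleanup code.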

auto_ptr<> is just one kind of smart pointer. Its copy behavior has ownership-transfer semantics: when an auto_ptr is copied, ownership of the raw pointer it holds internally is transferred to the copy. For example:

auto_ptr<string> str1(new string("str1"));
cout << str1->c_str();
auto_ptr<string> str2(str1); // str1's internal pointer no longer points to the original object
cout << str2->c_str();
cout << str1->c_str(); // undefined behavior: str1's internal pointer is no longer valid
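
Because auto_ptr was removed from the standard in C++17, the same ownership-transfer idea is sketched below with std::unique_ptr, its C++11 replacement. This is a substitution made for illustration, not the article's original code: with unique_ptr the transfer has to be requested explicitly through std::move, and the source is guaranteed to be null afterwards. The function name transfer_ownership is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Returns true if ownership moved from the first pointer to the second.
bool transfer_ownership()
{
  std::unique_ptr<std::string> str1(new std::string("str1"));
  std::unique_ptr<std::string> str2(std::move(str1)); // explicit transfer

  assert(str2->size() == 4); // str2 now owns the string
  // str1 holds a null pointer after the move; dereferencing it would be
  // undefined behavior, so only the raw pointer is inspected here.
  return str1.get() == nullptr && *str2 == "str1";
}
```

Unlike auto_ptr, a plain copy such as `std::unique_ptr<std::string> str2(str1);` does not compile, which turns the silent transfer shown above into a compile-time error.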

Sometimes you need to share the same object, and auto_ptr is not enough for that. For historical reasons, the C++98 standard library provides no other kind of smart pointer; std::shared_ptr and std::unique_ptr arrived only with C++11.

Another smart pointer

However, we can build another kind of smart pointer ourselves: one with value-copy semantics that shares the underlying value.

When several objects of the same class need to hold a copy of the same object at once, we can use reference counting (also called use counting). This technique was once widely used in C++ to improve efficiency, most notably in COW (copy-on-write) implementations. It later turned out that, in order to guarantee correct behavior in multithreaded programs, COW actually reduces efficiency. (Herb Sutter discusses this issue in his Guru column in C++ Report magazine and, at length, in More Exceptional C++.)
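
To make the COW idea concrete, here is a minimal single-threaded sketch. CowText and cow_demo are hypothetical names, and std::shared_ptr (C++11) is used as the reference-counting mechanism purely for brevity; real COW strings were implemented differently, and this sketch ignores exactly the thread-safety cost mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write text class; single-threaded only.
class CowText
{
public:
  explicit CowText(const std::string& s)
    : _data(std::make_shared<std::string>(s)) {}

  // The compiler-generated copy operations share the same buffer.

  const std::string& get() const { return *_data; }

  void append(const std::string& s)
  {
    if (_data.use_count() > 1)                        // buffer is shared:
      _data = std::make_shared<std::string>(*_data);  // make a private copy first
    _data->append(s);
  }

  bool shares_with(const CowText& other) const { return _data == other._data; }

private:
  std::shared_ptr<std::string> _data;
};

// Returns true if writing to a copy leaves the original untouched.
bool cow_demo()
{
  CowText a("abc");
  CowText b(a);             // cheap: only the reference count changes
  assert(a.shares_with(b));
  b.append("def");          // triggers the deferred copy
  return !a.shares_with(b) && a.get() == "abc" && b.get() == "abcdef";
}
```

The copy is deferred until the first write; a thread-safe version would have to synchronize the count check and the copy, which is where the efficiency loss comes from.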

For our current problem, however, reference counting itself is not a big issue, because no copy-on-write is involved. Ensuring correctness in a multithreaded environment would not cost too much efficiency either, but to keep things simple, thread safety is ignored here.

First, we design a class template modeled on auto_ptr (adapted from Herb Sutter's More Exceptional C++).


#include <cstddef> // size_t

template <typename T>
class shared_ptr
{
private:
  class implement // implementation class: holds the pointer and the reference count
  {
  public:
    implement(T* pp) : p(pp), refs(1) {}

    ~implement() { delete p; }

    T* p;                // pointer to the actual object
    ::std::size_t refs;  // reference count
  };
  implement* _impl;

public:
  explicit shared_ptr(T* p)
    : _impl(new implement(p)) {}

  ~shared_ptr()
  {
    decrease(); // decrement the count
  }

  shared_ptr(const shared_ptr& rhs)
    : _impl(rhs._impl)
  {
    increase(); // increment the count
  }

  shared_ptr& operator=(const shared_ptr& rhs)
  {
    if (_impl != rhs._impl) // guard against self-assignment
    {
      decrease();        // decrement the count: stop sharing the old object
      _impl = rhs._impl; // share the new object
      increase();        // increment the count to keep it correct
    }
    return *this;
  }

  T* operator->() const
  {
    return _impl->p;
  }

  T& operator*() const
  {
    return *(_impl->p);
  }

private:
  void decrease()
  {
    if (--(_impl->refs) == 0)
    { // no longer shared by anyone: destroy the object
      delete _impl;
    }
  }

  void increase()
  {
    ++(_impl->refs);
  }
};

This class template is simple enough that the code needs little explanation. Here is a simple usage example showing shared_ptr<> as a stand-in for a simple garbage collector.


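#include <iostream>

void foo1(shared_ptr<int>& val)
{
  shared_ptr<int> temp(val); // temp shares the same int as val
  *temp = 300;
}

void foo2(shared_ptr<int>& val)
{
  val = shared_ptr<int>(new int(200)); // val now shares a new int; the old one is released
}

int main()
{
  shared_ptr<int> val(new int(100));
  ::std::cout << "val=" << *val << ::std::endl; // val=100
  foo1(val);
  ::std::cout << "val=" << *val << ::std::endl; // val=300
  foo2(val);
  ::std::cout << "val=" << *val << ::std::endl; // val=200
}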

In main(), foo1(val) is called first. The function uses a local object temp, which shares the same data as val, and modifies the actual value. After the function returns, the value held by val has changed as well, even though val itself was never modified.

foo2(val) is called next. The function creates a new value through an anonymous temporary object and assigns it to val, so that val and the temporary share the same value. When the function returns, val still holds the correct value.

Finally, throughout the whole process, apart from the new expressions used to create the objects passed to the shared_ptr<int> constructor, there is not a single delete of a pointer, yet all memory is managed correctly. This is thanks to the careful design of shared_ptr<>.
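
The claim that nothing leaks can be made observable with an instrumented element type. The sketch below replays foo1 and foo2 using std::shared_ptr from C++11 as a stand-in for the homemade class, since it follows the same reference-counting scheme and exposes the count through use_count(). Tracked and run are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical instrumented type: tracks how many instances are alive.
struct Tracked
{
  static int alive;
  int value;
  explicit Tracked(int v) : value(v) { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

void foo1(std::shared_ptr<Tracked>& val)
{
  std::shared_ptr<Tracked> temp(val); // count 1 -> 2, same object
  temp->value = 300;
}                                     // temp destroyed, count 2 -> 1

void foo2(std::shared_ptr<Tracked>& val)
{
  val = std::shared_ptr<Tracked>(new Tracked(200)); // old object released here
}

// Returns the final value; no delete expression appears anywhere.
int run()
{
  std::shared_ptr<Tracked> val(new Tracked(100));
  foo1(val);
  assert(val->value == 300 && val.use_count() == 1);
  foo2(val);
  assert(Tracked::alive == 1); // the first object has already been destroyed
  return val->value;
}
```

After run() returns, Tracked::alive is back to zero: both objects were destroyed by the smart pointers alone.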

With auto_ptr<> and shared_ptr<> in hand, we should be able to handle garbage collection in most cases. If you need smart pointers with more complex semantics (mainly copy semantics), see the source code of Boost, which provides a variety of smart pointers.

Standard containers

When a program needs many objects of the same type, making good use of the container classes provided by the standard library minimizes explicit memory management. However, the standard containers are not well suited to storing raw pointers, so support for polymorphism remains difficult.

One option is to use smart pointers as the container's element type. However, most standard containers and algorithms require elements with value-copy semantics, and neither the ownership-transferring auto_ptr described above nor the homemade object-sharing shared_ptr provides true value-copy semantics. In More Exceptional C++, Herb Sutter designs ValuePtr, a smart pointer with full copy semantics, which solves the problem of using pointers in standard containers.
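
What "full copy semantics" means for a smart pointer can be sketched as follows. This is a minimal deep-copying pointer written for this illustration in the spirit of ValuePtr; it is not Sutter's actual code, and the names value_ptr and independent_copies are hypothetical. Note that `new T(*rhs._p)` copies through the static type T, so a derived object would be sliced.

```cpp
#include <cassert>
#include <vector>

// Hypothetical deep-copying smart pointer: every copy owns its own object.
template <typename T>
class value_ptr
{
public:
  explicit value_ptr(T* p = 0) : _p(p) {}
  ~value_ptr() { delete _p; }

  // Copying the pointer copies the pointed-to object as well.
  value_ptr(const value_ptr& rhs) : _p(rhs._p ? new T(*rhs._p) : 0) {}

  value_ptr& operator=(const value_ptr& rhs)
  {
    value_ptr temp(rhs); // copy first, then swap: safe for self-assignment
    T* old = _p; _p = temp._p; temp._p = old;
    return *this;        // temp's destructor deletes the old object
  }

  T& operator*() const { assert(_p != 0); return *_p; }
  T* operator->() const { assert(_p != 0); return _p; }

private:
  T* _p;
};

// Returns true if copying a container yields independent elements.
bool independent_copies()
{
  std::vector<value_ptr<int> > v;
  v.push_back(value_ptr<int>(new int(1)));
  std::vector<value_ptr<int> > w(v); // each element is copied deeply
  *w[0] = 2;                         // does not affect v[0]
  return *v[0] == 1 && *w[0] == 2;
}
```

Because each element owns a separate object, containers and algorithms can copy elements freely, at the price of an allocation per copy.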

However, polymorphism remains unresolved, and I'll focus on the use of container-managed polymorphic objects in another article.

Language support

Why not add garbage collection support to the C++ language?

From the discussion above, we can see that different application environments may call for different garbage collectors. A one-size-fits-all garbage collector would have to integrate all these different kinds of collectors into one, and even if that succeeded (which I doubt), it would increase the efficiency cost.

This goes against the design philosophy of C++, "don't pay for features you don't need"; forcing users to accept garbage collection is not an option.

By contrast, choosing your own garbage collector as needed involves rules that are much simpler to master and less error-prone than managing memory explicitly.

Most importantly, C++ is not a "foolproof" programming language; it favors programmers who enjoy thinking and are good at it. Designing a garbage collector suited to one's own needs is a welcome challenge for programmers who like C++.

