How C++ and Java differ in implementing polymorphism and multiple inheritance

2020-08-22 22:16:44
OfStack

The polymorphism question

In an interview I was asked the famous question "How do C++ and Java implement polymorphism?" and stumbled. The question is so well known that I had not bothered to prepare for it; all I knew was that it had something to do with virtual tables. After the interview I compared how C++ and Java implement polymorphism, and I record the similarities and differences here.

How C++ implements polymorphism with virtual pointers

Let's start with C++. Polymorphism means that a subclass overrides a member function of its superclass; when a pointer to a subclass object is assigned to a superclass pointer and the member function is called through that superclass pointer, the subclass's overriding version is the one that runs. A simple example:


#include <cstdio>

class Parent1 {
  public:
  virtual void sayHello() { printf("Hello from parent1!\n"); }
};

class Child : public Parent1 {
  public:
  // Overrides Parent1::sayHello; `override` makes the compiler verify that.
  void sayHello() override { printf("Hello from child!\n"); }
};

int main() {
  Parent1 *p = new Child();
  p->sayHello();  // prints "Hello from child!"
}

The first thing to understand is that, at the implementation level, a member function is just a function whose first argument is a pointer to the object: the compiler silently adds this pointer parameter and names it this. Apart from that, it is no different from an ordinary function. A non-polymorphic member function call works essentially the same way as a non-member call: the function actually called is determined at compile time from the function name and the parameter list, which includes the type of the object pointer.

Polymorphism, however, means the function to call cannot be determined from the type of the object pointer alone. In the example, the line p->sayHello() cannot tell, from the type of p alone, whether it should call Parent1::sayHello or Child::sayHello. Under the polymorphic mechanism, every object of such a class, parent or child, carries an extra pointer in its data layout that points to its class's virtual function table (vtable).

A class's virtual function table holds pointers to all of its functions that can be overridden, and which table an object's vtable pointer refers to is set when the object is created, according to its actual type. In the example above, the vtables of Parent1 and Child each contain a single function, Parent1::sayHello and Child::sayHello respectively. At compile time, the compiler translates the call into an instruction like "call the Nth function in the vtable"; here, "call the first function in the vtable". At run time, the real function pointer is read from the vtable. The run-time CPU cost is essentially one pointer dereference plus one indexed table lookup.

Neither Parent1 nor Child declares any data members of its own. Running the following code confirms that objects of both classes are 8 bytes on a 64-bit machine, that is, just the vtable pointer. Reading the first 8 bytes of each object as a 64-bit integer shows that p1 and p2 hold the same value while p3 holds a different one. That value is the address of the corresponding class's vtable. (Mainstream compilers such as GCC, Clang and MSVC put the vtable pointer at the start of the object; the C++ standard does not mandate it.)


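// Inside main(), after the Parent1/Child definitions above;
// needs <cinttypes>, <cstdio> and <cstring>.
Parent1* p1 = new Parent1();
Parent1* p2 = new Parent1();
Parent1* p3 = new Child();
printf("sizeof Parent1: %zu, sizeof Child: %zu\n",
  sizeof(Parent1), sizeof(Child));
// Copy the first 8 bytes of each object (the vtable pointer on common 64-bit ABIs).
uint64_t v1, v2, v3;
memcpy(&v1, p1, 8); memcpy(&v2, p2, 8); memcpy(&v3, p3, 8);
printf("val on p1: %" PRIu64 "\n", v1);
printf("val on p2: %" PRIu64 "\n", v2);
printf("val on p3: %" PRIu64 "\n", v3);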

C++ Polymorphism and multiple inheritance

An interesting question is how C++ supports polymorphism under multiple inheritance. As mentioned earlier, the compiler turns a virtual call into "call the Nth function in the vtable", and both the position of the vtable pointer within the object and the index N are fixed at compile time. If a multiply-inherited object had only one vtable, the slot positions used by the different parent classes would conflict. If it had several vtables, it would seem hard to know at compile time where each vtable pointer sits in the object. C++ takes a subtle approach: it lays out the data of every parent class (including each parent's vtable pointer) one after another inside the object, and a pointer to the object normally points to its start. When a pointer is converted to a parent class type, the compiler adjusts its value so that it points to the part of the object that corresponds to that parent type. In other words, the value of the pointer changes during the conversion.

For example, suppose Child inherits from both Parent1 and Parent2. Converting a Child pointer to Parent1* leaves its value unchanged, because Parent1 is Child's first parent class. Converting a Child pointer to Parent2*, however, adds the size of the Parent1 part so that the pointer lands at the start of the Parent2 part. Here Parent1 contains only a vtable pointer, which is 8 bytes on a 64-bit machine, so the pointer value grows by 8 when a Child pointer is converted to Parent2*.


#include <cstdio>

class Parent1 {
  public:
  virtual void sayHello() { printf("Hello from parent1!\n"); }
};

class Parent2 {
  public:
  virtual void sayHi() { printf("Hi from Parent2!\n"); }
};

class Child : public Parent1, public Parent2 {
  public:
  void sayHello() override { printf("Hello from child!\n"); }
  void sayHi() override { printf("Hi from child!\n"); }
};

int main() {
  Child *p = new Child();
  printf("size of Child: %zu\n", sizeof(Child));
  // %p prints each address; the Parent2* value differs from the other two.
  printf("pointer val as Child*: %p\n", static_cast<void*>(p));
  printf("pointer val as Parent1*: %p\n", static_cast<void*>(static_cast<Parent1*>(p)));
  printf("pointer val as Parent2*: %p\n", static_cast<void*>(static_cast<Parent2*>(p)));
}

Running this code shows that Child is now 16 bytes, that is, two pointers. The pointer values printed for the last two conversions differ by 8 bytes on a 64-bit machine, which is the size of Parent1. Moreover, if p is first converted to void* and then to Parent2*, the compiler cannot apply this offset, and using the resulting pointer is undefined behavior.

This behavior points to an interesting fact: since the C++ compiler can work out the pointer offset at compile time, it must know, at the point of conversion, the actual type of the object the pointer refers to. So why have a vtable at all, if the real type is known at compile time? Couldn't the compiler pick the right function directly? The problem is that resolving polymorphic calls at compile time would mean generating different machine code for different object types: the same line of code would have to become different function calls depending on where the pointer came from. It would also mean that third-party libraries had to ship their source code so that the compiler could make these inferences, as template libraries do. None of this is acceptable, so the vtable is still necessary. With it, code that uses a pointer compiles to one uniform piece of machine code.
Put another way, when compiling a complete application the compiler could often infer the true type of every variable, but that requires far too much context. It is not acceptable for compiling a piece of code to depend on context beyond the types of its parameters, and to produce different binaries depending on that context.

Comparison with Java polymorphism

Java's polymorphism is simpler than C++'s, so in theory the C++ mechanism could be used to implement it. But C++ differs from Java in one crucial respect: in C++, a superclass must mark a member function with the virtual keyword for it to be overridable. So when compiling the superclass, the compiler knows which functions may be overridden: calls to functions that cannot be overridden are bound to a specific function at compile time, and only the overridable ones go through the vtable. Java methods are overridable by default (unless they are static, private or final), so it can be assumed that a Java method call needs a vtable lookup, one more indirection than a call to a non-virtual C++ function. (In practice, a JIT compiler can often remove this overhead by devirtualizing calls.)

Java does not support multiple inheritance of classes, but it does support interfaces, which resemble multiple inheritance and cannot be dispatched through a single vtable. A class needs a separate method table for each interface it implements, much as in C++. The OpenJDK documentation describes the way a class finds the table for a given interface as crude: it walks the list of all interfaces the class implements. The documentation notes that true multiple inheritance is rare and usually reduces to single inheritance. There may be various optimizations of this search, which I do not understand in depth.

Consider another difference between Java and C++. C++ does not check types at run time for ordinary pointer conversions; instead, the compiler tries to make sure at compile time that a pointer points to the right part of the object's data layout. When a subclass pointer is assigned to a superclass pointer variable, the compiler makes the adjustment, but once a conversion goes through something like void*, the compiler can no longer guarantee that the pointer refers to a correctly laid-out object. As long as the code is syntactically valid, no error is reported and the compiler cannot detect the problem; it only surfaces, as undefined behavior, when the pointer is actually dereferenced. Java, by contrast, keeps type information at run time: casting an object to another type is checked when the program runs, and if there is no valid inheritance relationship the cast fails immediately with a ClassCastException.

Finally, comparing Java interfaces with C++ multiple inheritance: under the lookup scheme described above, an interface call costs more at run time than a call through C++ multiple inheritance, where the pointer adjustment is worked out at compile time. On the other hand, C++ multiple inheritance puts one vtable pointer per polymorphic parent class in each object, and the compiler has more work to do at compile time. Overall, Java is a much more strongly typed language than C++.