Comprehensive analysis of memory usage by static_cast and dynamic_cast to C++ objects

2020-05-10 18:35:57
OfStack

static_cast and dynamic_cast are C++ type conversion operators. Any conversion the compiler can perform implicitly can also be written explicitly with static_cast, and static_cast can also convert between parent and child class pointers in either direction. dynamic_cast, on the other hand, can only be used for conversions between class types (through pointers or references). So what is the point of dynamic_cast? It provides an important feature: run-time type checking, which ensures that the conversion is safe.
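As a minimal sketch of "making an implicit conversion explicit", the two helpers below (upcast and to_double, both hypothetical names chosen for illustration) spell out conversions the compiler would also perform on its own. The Base and Derived structs here are stripped-down stand-ins, not the classes used later in the article.

```cpp
struct Base { int m_b = 4; };
struct Derived : Base { int m_d = 3; };

// Derived* -> Base*: allowed implicitly, written explicitly here.
Base* upcast(Derived* d) { return static_cast<Base*>(d); }

// int -> double: another implicit conversion, spelled out.
double to_double(int n) { return static_cast<double>(n); }
```

Both functions would compile unchanged with the static_cast removed; the cast only makes the intent visible.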

The danger of converting with static_cast

We know that a base class pointer can point to a base class object or a derived class object without explicit conversion. Such as:


class Base{
  // ... 
};
class Derived : public Base{
  // ... 
};
int main(){
  Base *p1 = new Base();    // OK
  Base *p2 = new Derived(); // OK: implicit conversion from Derived* to Base*
}

Both of the above definitions are correct. But what if you want to do the opposite and have a subclass pointer point to a superclass object? Consider the following code:


class Base{
  // ... 
};
class Derived : public Base{
  // ... 
};
int main(){
  Derived *p1 = new Base();                       // error: no implicit conversion from Base* to Derived*
  Derived *p2 = static_cast<Derived*>(new Base()); // compiles
}

If you directly assign a pointer of type Base* to a pointer of type Derived*, the compiler reports an error. If you add static_cast to the conversion, it compiles cleanly. However, this approach is dangerous and can cause unpredictable, hard-to-find errors at run time. Consider the following code:


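#include <iostream>
using namespace std;

class Base{
  public:
    Base():m_b(4){};
    int m_b;
    void m_funcB(){cout << "base" << endl;};
};
class Derived:public Base{
  public:
    Derived():m_d(3){};
    int m_d;
    void m_funcD(){cout << "derived" << endl;};
};
int main(){
  // Compiles, but p really points to a Base object: using it is undefined behavior
  Derived* p = static_cast<Derived*>(new Base());
  cout << p->m_d << endl;  // reads memory that is not part of the Base object
  p->m_funcD();
}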

Although p is a pointer of type Derived*, it actually points to a Base object, and a Base object has no data member m_d, so the output is unpredictable (it consistently printed 0 on my machine). This unpredictability is exactly what makes the bug hard to track down. If the code crashed every time it ran, the problem would be easy to spot. But it does not necessarily crash, and the result may or may not match what we expect, which can set off a chain of logic errors. It is like digging a pit for yourself or planting a time bomb.

The odd thing is that the call p->m_funcD() actually prints "derived". What is going on here? m_funcD is clearly a member function of class Derived and does not exist in class Base. We will explain this later.
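For contrast, here is a minimal sketch of the one situation where a static_cast downcast is well defined: the Base pointer really does point at a Derived object. The helper read_m_d is a hypothetical name, and the structs are simplified stand-ins for the classes above. The precondition is an assumption the caller must guarantee, because static_cast does not check it.

```cpp
struct Base { int m_b = 4; };
struct Derived : Base { int m_d = 3; };

// Precondition (assumed, NOT checked): b points at an object whose
// real type is Derived. Passing a plain Base object is undefined behavior.
int read_m_d(Base* b) {
    return static_cast<Derived*>(b)->m_d;
}
```

Called with the address of a Derived object, read_m_d reads that object's own m_d. Nothing in the code stops a caller from passing a plain Base, which is the hazard described above.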

Secure the conversion with dynamic_cast

In principle, we should not have a subclass pointer to a superclass object. But if we write code like the one above, we would like to have a check mechanism to help us find the problem, so that we can avoid the unpredictable consequences of manipulating the converted pointer.

C++ supports run-time type identification (RTTI), a mechanism that works with polymorphic classes and also enables safety checks during type conversion. Going back to the code above, let's modify it a little:


#include <iostream>
using namespace std;

class Base{
  public:
    Base():m_b(4){};
    int m_b;
    virtual void m_funcB(){cout << "base" << endl;};  // virtual: Base is now polymorphic
};

class Derived:public Base{
  public:
    Derived():m_d(3){};
    int m_d;
    void m_funcD(){cout << "derived" << endl;};
};
int main(){
  Derived* p = dynamic_cast<Derived*>(new Base());  // the object is not a Derived, so p is null
  cout << p->m_d << endl;  // dereferences a null pointer
  p->m_funcD();
}

What will the result be? The program crashes. The reason is that we evaluate p->m_d while p is a null pointer. dynamic_cast performs a safety check on the conversion: here we are converting a pointer to a Base object into a Derived pointer, which is not a valid conversion because the object is not a Derived, so dynamic_cast returns NULL and p becomes a null pointer. So when we convert with dynamic_cast, we can tell whether the conversion succeeded simply by checking the resulting pointer. static_cast provides no such check, which is why dynamic_cast is safer than static_cast.
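The check on the resulting pointer can be sketched as follows. The helper m_d_or and its fallback parameter are hypothetical names introduced for illustration, and the classes are trimmed-down versions of the ones above (a virtual destructor stands in for the virtual function that makes Base polymorphic).

```cpp
class Base {
public:
    virtual ~Base() = default;  // any virtual member makes Base polymorphic
    int m_b = 4;
};

class Derived : public Base {
public:
    int m_d = 3;
};

// Returns m_d when b really points at a Derived, otherwise `fallback`.
int m_d_or(Base* b, int fallback) {
    if (Derived* d = dynamic_cast<Derived*>(b)) {
        return d->m_d;      // cast succeeded: safe to use d
    }
    return fallback;        // cast failed: d was a null pointer
}
```

Declaring the pointer inside the if condition keeps the possibly-null result from being used outside the branch where it is known to be valid.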

As a brief digression: if you comment out the line that prints m_d, the call p->m_funcD() still prints "derived". We will discuss this phenomenon after summarizing the difference between dynamic_cast and static_cast.

The difference between dynamic_cast and static_cast:

dynamic_cast is the safer choice because it performs a run-time type check, but it works only on polymorphic types and only for pointer or reference conversions. static_cast applies to a much wider range of types and does not require the type to be polymorphic. static_cast is more widely used, but dynamic_cast offers stronger safety guarantees.
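The reference form behaves a little differently from the pointer form: since there is no null reference to return, a failed dynamic_cast to a reference type throws std::bad_cast. A minimal sketch, in which is_derived is a hypothetical helper and the classes are simplified stand-ins:

```cpp
#include <typeinfo>  // std::bad_cast

class Base {
public:
    virtual ~Base() = default;  // makes Base polymorphic
};

class Derived : public Base {
public:
    int m_d = 3;
};

// Reports whether b refers to an object that is a Derived.
bool is_derived(Base& b) {
    try {
        Derived& d = dynamic_cast<Derived&>(b);  // throws on failure
        (void)d;                                 // only the success matters here
        return true;
    } catch (const std::bad_cast&) {
        return false;
    }
}
```

This assumes the program is built with RTTI and exceptions enabled, which is the default for mainstream compilers.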

Object footprint analysis:

Let's return to the phenomenon we have mentioned twice: why does calling a subclass method through a subclass pointer that does not actually point to a subclass object (it points to a base class object, or is null) produce no error, with the call appearing to work?

A class contains just two kinds of members: data and methods. So when we instantiate an object, what does it contain, and how much memory does it actually take up? Let's write a piece of code and try it:


#include <iostream>
using namespace std;

class Base{
  public:
    Base():m_b(4){};
    int m_b;
    virtual void m_funcB(){cout << "base" << endl;};
};
class Derived:public Base{
  public:
    Derived():m_d(3){};
    int m_d;
    void m_funcD(){cout << "derived" << endl;};
};
int main(){
  cout << sizeof(Base) << endl;     // m_b + virtual table pointer
  cout << sizeof(Derived) << endl;  // Base part + m_d
}

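The printed results are 8 and 12, respectively. (These values imply a build where int and pointers are both 4 bytes; a 64-bit build will generally print larger numbers.)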

So how do you calculate the memory footprint of a class or object? Take Base as an example. First, the member variable m_b takes up 4 bytes. Second, because m_funcB is a virtual function, the class has a virtual function table, and each object stores a pointer to that table, which takes another 4 bytes here, for a total of 8. Derived inherits m_b from Base, also holds the virtual table pointer, and adds its own member variable m_d, so it takes up 12 bytes.
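The per-object cost of the virtual table pointer can be measured rather than assumed. In this sketch, PlainBase, PolyBase and vptr_overhead are hypothetical names; the exact numbers depend on the platform's pointer size and padding rules, so no specific output is claimed.

```cpp
#include <cstddef>

struct PlainBase { int m_b; };                      // data only
struct PolyBase  { int m_b; virtual void f() {} };  // same data plus one virtual function

// Extra bytes each object pays for having a virtual function on this platform.
// Typically the size of one pointer, possibly plus alignment padding.
constexpr std::size_t vptr_overhead() {
    return sizeof(PolyBase) - sizeof(PlainBase);
}
```

Printing vptr_overhead() alongside sizeof(void*) shows how much of the difference is the pointer itself and how much is padding.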

Someone might ask: what about constructors? What about the virtual function itself and the function bodies? Don't they count? Indeed, member functions are not stored in instantiated objects. Just think: the function implementation is the same for every object. If every instantiated object stored its own copy of the function body, wouldn't that be unnecessary and a huge waste of memory?
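A quick way to confirm that non-virtual member functions cost nothing per object is to compare two types with the same data. DataOnly and WithMethods below are hypothetical types made up for this sketch.

```cpp
struct DataOnly {
    int m_b;
};

struct WithMethods {
    int m_b;
    int  get() const { return m_b; }   // code lives in the code segment,
    void set(int v)  { m_b = v; }      // not inside each object
    void reset()     { m_b = 0; }
};

// Both types hold a single int, so on mainstream compilers they have the same size.
static_assert(sizeof(WithMethods) == sizeof(DataOnly),
              "non-virtual member functions add no per-object storage");
```

Adding more non-virtual functions to WithMethods leaves the static_assert passing; only data members and the virtual table pointer contribute to sizeof.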

After compilation, a function is placed in the code segment as part of the program's code. So once we have a Derived pointer, it does not matter what object the pointer actually points to: the compiler resolves a non-virtual call from the pointer's static type, so as long as the pointer has the right type, the correct function entry is found. That is why the following code usually still runs, even though calling a member function through a null pointer is formally undefined behavior:


void * p2 = (int*)0;
Derived* p3= (Derived*)p2;
p3->m_funcD(); // m_funcD prints by itself and returns void

No matter what address you assign to p2, m_funcD runs as expected in practice, because it is non-virtual and never reads any member data through the pointer. Of course, if p3 were declared as a type that has no m_funcD member, compilation would fail.
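One way to picture why the call goes through is to think of a non-virtual member function as an ordinary function that receives the object's address as a hidden parameter. The sketch below is a conceptual model only, not what any particular compiler emits: m_funcD_as_free_function is a hypothetical name, and it returns the text instead of printing it so the result is easy to inspect.

```cpp
#include <string>

class Derived;  // only the type name is needed; no object is ever touched

// `self` plays the role of the hidden `this` pointer.
std::string m_funcD_as_free_function(Derived* self) {
    (void)self;        // the body never reads through the pointer...
    return "derived";  // ...so its value does not affect the result
}
```

Because this version is a genuine free function, passing a null pointer to it is well defined, which is a reasonable intuition for why the member-function call tends to work even though the standard does not guarantee it.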

If you execute the following code:


void * p2 = (int*)0;
Derived* p3= (Derived*)p2;
cout << p3->m_d << endl;

Then the program fails, typically with a crash. Unlike member functions, member variables are stored in each object's own memory at a real address, and here p3 does not point to a valid object. In short, member functions belong to the class, while member variables belong to each individual object.
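The "variables belong to the object" half of that statement can be checked directly. In this sketch, Derived is reduced to a standalone class without a base (an assumption made to keep the example short), and objects_have_separate_m_d is a hypothetical helper.

```cpp
class Derived {
public:
    int m_d = 3;
    void m_funcD() {}
};

// Each object carries its own m_d at its own address.
bool objects_have_separate_m_d() {
    Derived a, b;
    a.m_d = 10;                        // modifies only a's copy
    return b.m_d == 3                  // b is unaffected
        && &a.m_d != &b.m_d;           // the two copies live at different addresses
}
```

Reading m_d therefore always requires a valid object to read from, which is exactly what the null p3 above lacks.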