An analysis of eight types of errors that the C++ compiler cannot catch

  • 2020-04-02 02:43:37
  • OfStack

This article analyzes eight kinds of errors that the C++ compiler cannot catch. Studying them helps build a deeper understanding of how C++ works. The analysis follows:

As we all know, C++ is a complex programming language full of subtle pitfalls. There are almost endless ways to mess things up in C++. Fortunately, today's compilers are smart enough to detect quite a few of these programming pitfalls and notify programmers of compilation errors or compilation warnings. In the end, any errors that the compiler can detect are not a big problem if handled properly, because they are caught at compile time and resolved before the program actually runs. At worst, an error that a compiler can catch will only cost the programmer some time, as they look for ways to fix the compilation error.

The most dangerous errors are those that the compiler cannot catch. Such errors are not easy to detect, but can lead to serious consequences such as incorrect output, data corruption, and program crashes. As a project grows, the complexity of the code's logic and the multitude of execution paths can mask these bugs, causing them to appear only intermittently and making them difficult to track down and debug. Although the list in this article is mostly a review for experienced programmers, the consequences of such bugs tend to be amplified by the size of the project and the nature of the business.

All of these examples were tested on Visual Studio 2005 Express using the default warning level. Depending on which compiler you choose, you may get different results. I strongly recommend that all programmers use the highest warning level! Some issues are not flagged as potential problems at the default warning level, but are caught at the highest warning level!

1) Uninitialized variables

Uninitialized variables are one of the most common mistakes in C++ programming. In C++, the memory allocated for a variable is not "clean": it is not automatically cleared when the space is allocated. As a result, an uninitialized variable will contain a value, but there is no way to know what that value is. In addition, the value may change each time the program is executed. This can lead to intermittent failures, which are particularly difficult to track down. Take a look at the following code snippet:


if (bValue)
   // do A
else
   // do B

If bValue is an uninitialized variable, the result of the if condition cannot be predicted, and either branch may be executed. Compilers generally warn about uninitialized variables. The following code snippet triggers a warning on most compilers.


int foo()
{
  int nX;
  return nX;
}

However, there are some simple examples that do not generate warnings:


void increment(int &nValue)
{
  ++nValue;
}
int foo()
{
  int nX;
  increment(nX);
  return nX;
}

The code snippet above probably doesn't generate a warning, because the compiler doesn't typically keep track of whether the function increment() has assigned a value to nValue.

Uninitialized variables are more common in classes, and initialization of members is generally done through the implementation of constructors.


class Foo
{
private:
  int m_nValue;
public:
  Foo();
  int GetValue() { return m_nValue; }
};
 
Foo::Foo()
{
  //Oops, we forgot to initialize m_nValue
}
 
int main()
{
  Foo cFoo;
  if (cFoo.GetValue() > 0)
    // do something
  else
    // do something else
}

Note that m_nValue is never initialized. As a result, GetValue() returns a garbage value, and either branch of the if statement may execute.
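One way to close this gap is a member initializer list. The following is a minimal sketch of a corrected Foo; the starting value 0 is an assumption chosen for illustration, since the original does not say what m_nValue should hold.

```cpp
// Corrected sketch of the Foo class from the text.
class Foo
{
private:
    int m_nValue;
public:
    // The member initializer list gives m_nValue a defined value
    // before the constructor body runs. The value 0 is an assumption.
    Foo() : m_nValue(0) {}

    int GetValue() const { return m_nValue; }
};
```

With this constructor, GetValue() returns the same value on every run, so the branch taken by the if statement no longer depends on leftover memory contents.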

Novice programmers often make the following mistake when defining multiple variables:


int nValue1, nValue2 = 5;

The idea here is that both nValue1 and nValue2 are initialized to 5, but actually only nValue2 is initialized, and nValue1 is never initialized.

Since an uninitialized variable can hold any value, it can cause the program to behave differently each time it executes, and the root cause is hard to find. On one run the program might work, on the next it might crash, and on the one after that it might produce incorrect output. In some environments, memory is filled with zeros or a fixed pattern when the program runs under a debugger or in a debug build. This means that your program may work fine under the debugger every time, yet crash intermittently in the release build! If you run into this kind of oddity, the culprit is often an uninitialized variable.
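The habit that avoids these problems is to give every variable a value at the point of definition. A minimal sketch follows; the function names fooInitialized and sumOfPair are hypothetical and used only for illustration.

```cpp
void increment(int &nValue)
{
    ++nValue;
}

// Hypothetical name: mirrors foo() from the text, with nX initialized.
int fooInitialized()
{
    int nX = 0;      // defined starting value
    increment(nX);
    return nX;       // always 1
}

// Hypothetical name: shows that each declarator needs its own initializer.
int sumOfPair()
{
    int nValue1 = 5, nValue2 = 5;
    return nValue1 + nValue2;
}
```

Because nX starts from a known value, the result of fooInitialized() no longer depends on whatever happened to be in memory.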

2) Integer division

Most binary operations in C++ require two operands of the same type. If the operands are of different types, one of them is converted to match the other. The division operator can be thought of as two different operations: one for integers and one for floating-point numbers. If both operands are floating point, the division returns a floating-point value:


float fX = 7;
float fY = 2;
float fValue = fX / fY; // fValue = 3.5

If the operand is an integer type, the division operation discards any fractional part and returns only the integer part.


int nX = 7;
int nY = 2;
int nValue = nX / nY;  // nValue = 3

If one operand is an integer and the other is a floating point, the integer is promoted to a floating point:


float fX = 7.0;
int nY = 2;
float fValue = fX / nY;
 
// nY is promoted to floating point. The division operation returns a floating point value
// fValue = 3.5

Many novice programmers try to write the following code:


int nX = 7;
int nY = 2;
float fValue = nX / nY; // fValue = 3 (not 3.5!)

The idea here is that nX / nY will be a floating-point division because the result is assigned to a floating-point variable. But that is not the case. nX / nY is first computed as an integer division, and only then is the result converted to floating point and assigned to fValue. By the time of the assignment, the fractional part has already been discarded.

To force two integers to use floating point division, one of the operands needs to be converted to a floating point:


int nX = 7;
int nY = 2;
float fValue = static_cast<float>(nX) / nY; // fValue = 3.5

Because nX is explicitly converted to float, nY is implicitly promoted to float, so the division operator performs floating-point division, resulting in 3.5.
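The difference between the two forms can be sketched as a pair of small functions. The names quotient and truncatedQuotient are hypothetical, and double is used instead of float purely as an illustrative choice.

```cpp
// Hypothetical helper: floating-point quotient of two ints.
double quotient(int nX, int nY)
{
    // Converting one operand is enough; the other is converted implicitly.
    return static_cast<double>(nX) / nY;
}

// For comparison: the division is done in int, and only the
// already-truncated result is converted to double on return.
double truncatedQuotient(int nX, int nY)
{
    return nX / nY;
}
```

Calling quotient(7, 2) gives 3.5, while truncatedQuotient(7, 2) gives 3.0 even though its return type is double.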

It is often hard to tell at a glance whether a division operator performs integer or floating point division:


z = x / y; //Is this integer division or floating point division?

But adopting Hungarian nomenclature can help to dispel this confusion and prevent mistakes:


int nZ = nX / nY;   //Integer division
double dZ = dX / dY; //Floating point division

Another interesting thing about integer division involves negative operands. Before C++11, the C++ standard did not specify how to truncate the result when an operand is negative, so the compiler was free to round up or down! For example, -5 / 2 could evaluate to either -3 or -2, depending on whether the compiler rounded down or rounded toward 0. Most compilers rounded toward 0, and since C++11 the standard requires it.
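As a small sketch of the behavior with negative operands, assuming a compiler in C++11 mode or later (where truncation toward zero is required), the hypothetical helpers below wrap the two operators:

```cpp
// Assumes C++11 or later, where integer division truncates toward zero.
int quotientOf(int nX, int nY)
{
    return nX / nY;
}

// The remainder takes the sign of the dividend under the same rule.
int remainderOf(int nX, int nY)
{
    return nX % nY;
}
```

Under that assumption, quotientOf(-5, 2) is -2 and remainderOf(-5, 2) is -1, so quotient * divisor + remainder reproduces -5.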

3) = vs. ==

It's an old problem, but one worth revisiting. Many C++ novices confuse the assignment operator (=) with the equality operator (==). But even programmers who know the difference between the two operators can make typos, which can lead to unintended results.


#include <iostream>

//If nValue is 0, return 1, otherwise return nValue
int foo(int nValue)
{
  if (nValue = 0) //This is a typo! It should be ==
    return 1;
  else
    return nValue;
}
 
int main()
{
  std::cout << foo(0) << std::endl;
  std::cout << foo(1) << std::endl;
  std::cout << foo(2) << std::endl;
 
  return 0;
}

The function foo() is supposed to return 1 if nValue is 0, and nValue otherwise. But because the assignment operator is inadvertently used instead of the equality operator, the program produces unexpected results:


0
0
0

When the if statement in foo() is executed, nValue is assigned 0, so if (nValue = 0) effectively becomes if (nValue) with nValue equal to 0. The condition is therefore false, the else branch runs, and the function returns nValue, which is the 0 that was just assigned. So this function always returns 0.

Set the compiler's warning level to the highest. The compiler will then warn when an assignment operator is used in a conditional statement. It will also warn about the opposite mistake, an equality test used outside a condition where an assignment was intended, by pointing out that the statement does nothing. As long as you use a high warning level, the problem is essentially avoidable. There is also a technique that some programmers like to use to avoid confusing = and ==: write the constant on the left side of the comparison. If you mistakenly write == as =, you get a compilation error because a constant cannot be assigned to.
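The constant-on-the-left technique can be sketched as follows. The name fooFixed is hypothetical; it is the foo() function from the text with the comparison corrected.

```cpp
// Corrected sketch of foo(): returns 1 if nValue is 0, otherwise nValue.
int fooFixed(int nValue)
{
    // With the constant on the left, mistyping this as "0 = nValue"
    // is a compilation error instead of a silent bug.
    if (0 == nValue)
        return 1;
    else
        return nValue;
}
```

With this version, the calls fooFixed(0), fooFixed(1), and fooFixed(2) return 1, 1, and 2.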

4) Mixing signed and unsigned numbers

As mentioned in the integer division section, most binary operators in C++ require both operands to be of the same type. If the operands are of different types, one of them is converted to match the other. This can lead to unexpected results when signed and unsigned numbers are mixed! Consider the following example:


cout << 10 - 15u; //15u is an unsigned integer

Some people would say the answer is -5. But 10 is a signed integer and 15u is an unsigned integer, so the type promotion rules apply here. The type promotion hierarchy in C++ looks like this:

long double (highest)
double
float
unsigned long int
long int
unsigned int
int (lowest)

Because int is lower than unsigned int, int is promoted to unsigned int. Fortunately, 10 is already a positive integer, so the type promotion does not change the way the value is interpreted. Therefore, the above code is equivalent to:


cout << 10u - 15u;

Now comes the tricky part. Since both operands are unsigned integers, the result of the operation is also an unsigned integer: 10u - 15u = -5u. But unsigned integers cannot represent negative numbers, so -5 wraps around and is interpreted as 4,294,967,291 (assuming 32-bit integers). Therefore, the code above prints 4,294,967,291 instead of -5.

This can take a more confusing form:


int nX;
unsigned int nY;
if (nX - nY < 0)
  // do something

Because of the type conversion, this if statement will always be judged false, which is clearly not the programmer's original intention!
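One way to express the intended test without the wraparound is to compare the operands directly and handle the negative case first. This is a minimal sketch; the helper names isLess and wrappedDifference are hypothetical.

```cpp
// Hypothetical helper: is the signed nX smaller than the unsigned nY?
bool isLess(int nX, unsigned int nY)
{
    // A negative nX is smaller than any unsigned value.
    if (nX < 0)
        return true;

    // nX is non-negative here, so converting it to unsigned keeps its value.
    return static_cast<unsigned int>(nX) < nY;
}

// Shows the wraparound from the text: the subtraction is done in
// unsigned arithmetic, so the result is a large positive number.
unsigned int wrappedDifference()
{
    return 10 - 15u;
}
```

Adding 5u to wrappedDifference() wraps back around to 0, which confirms that the stored value is the unsigned counterpart of -5 rather than -5 itself.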

5) delete vs. delete[]

Many C++ programmers forget that the new and delete operators actually come in two forms: versions for a single object and versions for an array of objects. The new operator is used to allocate the memory space of a single object on the heap. If the object is of a class type, the constructor for that object is called.


Foo *pScalar = new Foo;

The delete operator is used to reclaim memory space allocated by the new operator. If the object being destroyed is a class type, the destructor of that object is called.


delete pScalar;

Now consider the following code snippet:


Foo *pArray = new Foo[10];

This line allocates memory for an array of 10 Foo objects. Because the subscript [10] follows the type name, the allocation is actually performed by operator new[], not new, which many C++ programmers don't realize. The new[] operator ensures that the constructor of the class is called once for each object created. Correspondingly, to delete an array, use the delete[] operator:


delete[] pArray;

This ensures that the destructor is called for each object in the array. What happens if the delete operator is applied to an array? The behavior is undefined. On many implementations only the first object in the array is destructed, and the heap may be corrupted!
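The pairing can be sketched with a class that counts its live objects. The counting Foo below is a hypothetical stand-in for the article's Foo, and using std::vector is offered only as one common way to avoid writing delete[] by hand, not as the article's own recommendation.

```cpp
#include <vector>

// Hypothetical stand-in for Foo that counts how many objects are alive.
struct Foo
{
    static int s_nLive;
    Foo() { ++s_nLive; }
    Foo(const Foo &) { ++s_nLive; }
    ~Foo() { --s_nLive; }
};
int Foo::s_nLive = 0;

// Returns the number of live Foo objects while the array exists.
int liveWhileArrayExists()
{
    Foo *pArray = new Foo[10];   // new[]: one constructor call per element
    int nLive = Foo::s_nLive;
    delete[] pArray;             // delete[]: one destructor call per element
    return nLive;
}

// Same idea, but the vector releases its elements automatically.
int liveWhileVectorExists()
{
    std::vector<Foo> vFoo(10);
    return Foo::s_nLive;
}
```

Both functions report 10 live objects while the storage exists, and the counter is back at 0 after each call returns.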

6)   Side effects of compound expressions or function calls

A side effect is anything an operator, expression, statement, or function does beyond producing its result, such as modifying a variable whose new value persists afterward. Side effects are sometimes useful:


x = 5;

The side effect of the assignment operator is that it permanently changes the value of x. Other C++ operators with side effects include *=, /=, %=, +=, -=, <<=, >>=, &=, |=, ^=, and the infamous ++ and -- operators. However, there are several places in C++ where the order of evaluation is not fixed by the language, which results in inconsistent behavior. For example:


#include <iostream>

void multiply(int x, int y)
{
  using namespace std;
  cout << x * y << endl;
}
 
int main()
{
  int x = 5;
  multiply(x, ++x); // x and ++x may be evaluated in either order
}

Because the order in which the arguments to multiply() are evaluated is not fixed by the language, the above program may print either 30 or 36, depending on whether x or ++x is evaluated first.

A slightly odder example involves the operands of an operator:


#include <iostream>

int foo(int x)
{
  return x;
}
 
int main()
{
  int x = 5;
  std::cout << foo(x) * foo(++x);
}

Because the order in which the operands of a C++ operator are evaluated is not fixed by the language (this is true for most operators, with some exceptions), the above example might also print either 30 or 36, depending on whether the left or the right operand is evaluated first.

In addition, consider the following compound expression:


if (x == 1 && ++y == 2)
  // do something

The programmer's intent might be to say, "if x is 1 and y's pre-increment is 2, do something." However, if x is not equal to 1, C++ will take the short-circuit evaluation rule, which means that ++y will never be evaluated! So y only increases when x is equal to 1. This is probably not what the programmer intended! A good rule of thumb is to put any possible side effects into their own statements.
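The rule of thumb can be sketched as follows. The function names multiplyWithNext and incrementAndTest are hypothetical, and multiply() here returns its product (unlike the printing version in the text) so the result is easy to check.

```cpp
// Returns the product instead of printing it, for illustration.
int multiply(int x, int y)
{
    return x * y;
}

// The increment gets its own statement, so the evaluation order of
// the arguments no longer affects the result.
int multiplyWithNext(int x)
{
    int nNext = x + 1;
    return multiply(x, nNext);
}

// y is always incremented, whether or not x equals 1, because the
// increment is no longer subject to short-circuit evaluation.
bool incrementAndTest(int x, int &y)
{
    ++y;
    return x == 1 && y == 2;
}
```

With x equal to 5, multiplyWithNext(x) gives 30 regardless of the compiler's argument evaluation order.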

7) switch statement without break

Another classic mistake novice programmers make is forgetting to add a break to the switch block:


switch (nValue)
{
  case 1: eColor = Color::BLUE;
  case 2: eColor = Color::PURPLE;
  case 3: eColor = Color::GREEN;
  default: eColor = Color::RED;
}

When the switch expression matches the value of a case label, execution begins at the first statement after that label and continues until it either reaches the end of the switch block or encounters a return, goto, or break statement. Any other case labels encountered along the way are ignored!

Consider what happens in the code above if nValue is 1. Case 1 matches, so eColor is set to Color::BLUE. Execution continues to the next statement, which sets eColor to Color::PURPLE. The next statement sets it to Color::GREEN, and finally the default statement sets it to Color::RED. In fact, the code snippet above sets eColor to Color::RED regardless of the value of nValue!

The correct way is to write it as follows:


switch (nValue)
{
  case 1: eColor = Color::BLUE; break;
  case 2: eColor = Color::PURPLE; break;
  case 3: eColor = Color::GREEN; break;
  default: eColor = Color::RED; break;
}

The break statement terminates the execution of the switch block, so the value of eColor will be what the programmer expects. While this is very basic switch/case logic, it's easy to miss a break statement and create an unintended fall-through ("waterfall") execution flow.
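Another way to rule out accidental fall-through is to put the switch in a function and return from each case. This is a minimal sketch assuming C++11 (for enum class); the function name colorFor and the enumerator order are hypothetical.

```cpp
// Assumed definition of Color; the text does not show one.
enum class Color { BLUE, PURPLE, GREEN, RED };

// Each case returns immediately, so no case can fall through
// into the next one.
Color colorFor(int nValue)
{
    switch (nValue)
    {
        case 1: return Color::BLUE;
        case 2: return Color::PURPLE;
        case 3: return Color::GREEN;
        default: return Color::RED;
    }
}
```

A caller would then write eColor = colorFor(nValue); and get a different color for each of the values 1, 2, and 3.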

8) Calling virtual functions in constructors

Consider the following program:


#include <iostream>

class Base
{
private:
  int m_nID;
public:
  Base()
  {
    m_nID = ClassID();
  }
 
  //ClassID returns a class - dependent ID number
  virtual int ClassID() { return 1;}
 
  int GetID() { return m_nID; }
};
 
class Derived: public Base
{
public:
  Derived()
  {
  }
 
  virtual int ClassID() { return 2;}
};
 
int main()
{
  Derived cDerived;
  std::cout << cDerived.GetID(); //Prints 1, not 2!
  return 0;
}

In this program, the programmer calls a virtual function in the constructor of the base class, expecting the call to resolve to Derived::ClassID(). But it doesn't: the program prints 1 instead of 2. When a derived class is instantiated, the base class part of the object is constructed before the derived class part. This is done because members of the derived class may depend on base class members that are already initialized. The result is that when the base class constructor executes, the derived part of the object has not been constructed yet! Therefore, any call to a virtual function at this point resolves to the base class version, not the derived class version.

In this example, when the Base part of cDerived is constructed, its Derived part does not yet exist. Therefore, the call to ClassID() resolves to Base::ClassID() (not Derived::ClassID()), which sets m_nID to 1. Once the Derived part of cDerived has also been constructed, any call to ClassID() on the cDerived object resolves to Derived::ClassID() as expected.

Note that other programming languages, such as C# and Java, dispatch virtual function calls to the most-derived class, even if the derived class has not yet been initialized! C++ does things differently, for the sake of safety. This is not to say that one approach is necessarily better than the other, but simply that different programming languages may behave differently on the same issue.
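If the goal is for m_nID to reflect the derived class, one common workaround is to pass the ID up to the base class constructor instead of calling a virtual function there. The following is a minimal sketch of that idea, not the original author's design; the protected constructor taking nID is an assumption introduced for illustration.

```cpp
class Base
{
private:
    int m_nID;
protected:
    // Hypothetical constructor: derived classes supply their own ID,
    // so no virtual call is needed during construction.
    explicit Base(int nID) : m_nID(nID) {}
public:
    Base() : m_nID(1) {}
    virtual ~Base() {}

    int GetID() const { return m_nID; }
};

class Derived : public Base
{
public:
    // The ID is passed as a plain value before any part of
    // Derived needs to exist.
    Derived() : Base(2) {}
};
```

With these definitions, GetID() on a Derived object returns 2, and on a plain Base object it returns 1.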

Conclusion:

I think it is appropriate to start with the basic problems that novice programmers may encounter. Regardless of a programmer's level of experience, errors are inevitable, whether due to lack of knowledge, typos, or just general carelessness. Being aware of the problems that are most likely to cause trouble can help reduce the likelihood that they will cause trouble. While there is no substitute for experience and knowledge, good unit tests can help us catch these bugs before they become embedded in our code.

I hope this article is of some value to everyone learning C++ programming.

