C++ tip: Don't do the compiler's job

  • 2020-06-12 10:15:23
  • OfStack

For C++ programming veterans, there are times when they like to rewrite things in the way the compiler works in order to make the code run more efficiently. It's smart and a good programmer, but it's also risky. Because sometimes you can make a simple slip of the pen and make a very subtle mistake. This article gives a similar example.

This Bug appears in the MySQL source code.

Error code:


static int rr_cmp(uchar *a,uchar *b)
{
 if (a[0] != b[0])
  return (int) a[0] - (int) b[0];
 if (a[1] != b[1])
  return (int) a[1] - (int) b[1];
 if (a[2] != b[2])
  return (int) a[2] - (int) b[2];
 if (a[3] != b[3])
  return (int) a[3] - (int) b[3];
 if (a[4] != b[4])
  return (int) a[4] - (int) b[4];
 if (a[5] != b[5])
  return (int) a[1] - (int) b[5];   <<<<====
 if (a[6] != b[6])
  return (int) a[6] - (int) b[6];
 return (int) a[7] - (int) b[7];
}

Description:

This is a typical error that occurs when copying and pasting code snippets. Programmers are likely to put" if (a[1] != b[1]) (int) a[1] � (int) b[1]; This code was copied several times (and then manually changed the array index) to implement a loop. But the programmer forgot to change the subscript 1 of one of the rows to 5. As a result, the function sometimes returns the correct value (and sometimes not), an error that is difficult to detect. In fact, the error was really hard to catch, and all the other tests failed to find it before we scanned the MySQL source code with ES14en-ES15en.

Correct code:


if (a[5] != b[5])
 return (int) a[5] - (int) b[5];

Although the previous code looks clean and easy to read, it is still very likely that the programmer missed the error. Because the internal structure of the code block is similar, you instinctively skim through it without paying much attention to the code.

The reason the code is written this way is probably because the programmer wants to optimize the code as much as possible. He or she wants to "unroll the loop" manually. But I don't think it's a good idea to be here.

First of all, I doubt that programmers can really do it this way. You know, modern compilers are pretty smart, and if you can actually optimize program performance, it automatically optimizes the unwinding loop.

Second, the attempt to optimize resulted in bug in the code. If programmer 1 starts writing a simple loop, the chances of making a mistake are much lower.

I suggest writing the method as follows:


static int rr_cmp(uchar *a,uchar *b)
{
 for (size_t i = 0; i < 7; ++i)
 {
  if (a[i] != b[i])
   return a[i] - b[i];
 }
 return a[7] - b[7];
}

This has two advantages:

1. This function is easier to read and understand. 2. Reduce your chances of making mistakes when writing code.

As for performance, I can say that this version is no slower than the previous long-written version.

This recommended approach actually says the following: Code should be simple and easy to read. Simple code is usually the right code. Don't do the compiler's job -- for example, expand the loop (manually). The compiler knows exactly what to do and doesn't need your help. Manual code optimization works only for certain critical code, and may only be appropriate after the analyzer has confirmed that the code is a bottleneck.

conclusion


Related articles: