Three mechanisms of Go language defer statements

  • 2020-10-07 18:43:17
  • OfStack

Go 1.13 and 1.14 each optimized defer, significantly reducing its performance overhead in most scenarios. How was this achieved?

Each version added a new mechanism to defer, which lets the compiler select, at compile time, a different mechanism for each defer statement depending on the version and the circumstances, so the deferred call can run in a lighter-weight way.

Heap allocation

Before Go 1.13, every defer was allocated on the heap. This mechanism takes two steps at compile time:

Insert runtime.deferproc at the position of the defer statement. When executed, the deferred call is saved as a _defer record: the entry address of the deferred function and its arguments are copied into a linked list on the goroutine.
Insert runtime.deferreturn before the function returns. When executed, the deferred calls are taken off the goroutine's linked list and run; multiple deferred calls execute one after another as tail calls through jmpdefer.

The main performance costs of this mechanism are the memory allocation for the _defer record that every defer statement produces, and the run-time overhead of copying the arguments when the record is created and again when the call finally runs.
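The LIFO behaviour of that linked list can be observed from ordinary Go code. The sketch below (plain Go, not runtime internals) records the order in which deferred calls fire; a defer inside a loop is exactly the shape that forces a _defer record per iteration:

```go
package main

import "fmt"

// runOrder returns the order in which deferred calls fire. Each defer in
// the loop pushes a record onto the goroutine's defer list, and they are
// popped and executed last-in, first-out when the closure returns.
func runOrder() []int {
	var order []int
	func() {
		for i := 1; i <= 3; i++ {
			i := i // capture the loop variable (pre-Go 1.22 semantics)
			defer func() { order = append(order, i) }()
		}
	}()
	return order
}

func main() {
	fmt.Println(runOrder()) // LIFO order: [3 2 1]
}
```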

Stack allocation

Go 1.13 introduced deferprocStack, which allocates the _defer record on the stack instead; the record is released when the function returns. This eliminates the memory-allocation overhead, leaving only the maintenance of the _defer linked list.

The compiler has its own logic for choosing between deferproc and deferprocStack; in most cases the latter is used, improving performance by about 30%. However, heap allocation is still used if the defer statement appears in a loop, if certain compiler optimizations cannot be applied, or if too many defers appear in the same function.
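The compiler's choice is invisible at the source level, but the code shapes it reacts to can be sketched. In the example below the comments reflect the rules described above; they are my annotations, not the output of any tool:

```go
package main

import "os"

// stackDefer: unconditional defers outside any loop — the shape that is
// eligible for stack allocation (deferprocStack) in Go 1.13+.
func stackDefer() error {
	f, err := os.CreateTemp("", "demo")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name()) // _defer record can live on this frame's stack
	defer f.Close()
	_, err = f.WriteString("hello")
	return err
}

// heapDefer: a defer inside a loop — the number of records is unknown at
// compile time, so each one still falls back to heap allocation.
func heapDefer(paths []string) {
	for _, p := range paths {
		defer os.Remove(p) // heap-allocated _defer on every iteration
	}
}

func main() {
	_ = stackDefer()
	heapDefer(nil)
}
```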

Open coding

Go 1.14 adds open coding (open-coded defers): the deferred calls are inserted directly before the function returns, so no deferproc or deferprocStack call is needed at run time, and deferreturn no longer makes tail-recursive calls but simply runs the remaining deferred functions in a loop.

This mechanism makes the overhead of defer almost negligible: the only run-time cost is storing the information that the deferred call needs. However, there are conditions for using this mechanism:

Compiler optimizations are not disabled, i.e., the -N compiler flag is not set;
The function has no more than 8 defers, and the product of the number of return statements and the number of defer statements is at most 15;
No defer appears inside a loop statement.

The mechanism also introduces the defer bit, which is used at run time to record whether each defer was actually reached (in particular, defers inside conditional branches), making it possible to determine which deferred functions must run when the function returns.

How the defer bit works:

Each defer in a function is assigned one bit: it is set to 1 if that defer executes, and stays 0 otherwise. Before the function returns, when the deferred calls must be decided, a mask is used to test each bit in turn.

To stay lightweight, the defer bits are limited to a single byte, i.e., 8 bits, which is why a function cannot use more than 8 defers with this mechanism. Beyond that, stack allocation is chosen instead, but in most cases 8 is plenty.

This is demonstrated in code as follows:


deferBits = 0 // initial value of the defer bits: 00000000

deferBits |= 1<<0 // the first defer is registered: 00000001
_f1 = f1 // the deferred function
_a1 = a1 // the deferred function's argument
if cond {
  // if the second defer executes, the bits become 00000011; otherwise they stay 00000001
  deferBits |= 1<<1
  _f2 = f2
  _a2 = a2
}
...
exit:
// Before the function returns, the defer bits are checked in reverse order,
// masking one bit at a time to decide whether each deferred function is called.

// If deferBits is 00000011, then 00000011 & 00000010 != 0, so f2 is called;
// otherwise 00000001 & 00000010 == 0, and f2 is skipped.
if deferBits & 1<<1 != 0 {
  deferBits &^= 1<<1 // clear the bit before the next check
  _f2(_a2)
}
// Similarly, 00000001 & 00000001 != 0, so f1 is called.
if deferBits & 1<<0 != 0 {
  deferBits &^= 1<<0
  _f1(_a1)
}
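The pseudocode above can be simulated in ordinary Go with a uint8. The sketch below mirrors that bookkeeping (deferBits, f1, f2 are names from the pseudocode, not compiler identifiers):

```go
package main

import "fmt"

// runWithDeferBits simulates open-coded defers: each conditional defer
// sets its bit when "registered", and the exit path tests the bits in
// reverse order to decide which deferred calls actually run.
func runWithDeferBits(cond bool) []string {
	var calls []string
	var deferBits uint8

	deferBits |= 1 << 0 // first defer is always registered: 00000001
	f1 := func() { calls = append(calls, "f1") }

	var f2 func()
	if cond {
		deferBits |= 1 << 1 // second defer registered: 00000011
		f2 = func() { calls = append(calls, "f2") }
	}

	// exit: check the bits in reverse registration order.
	if deferBits&(1<<1) != 0 {
		deferBits &^= 1 << 1
		f2()
	}
	if deferBits&(1<<0) != 0 {
		deferBits &^= 1 << 0
		f1()
	}
	return calls
}

func main() {
	fmt.Println(runWithDeferBits(true))  // [f2 f1]
	fmt.Println(runWithDeferBits(false)) // [f1]
}
```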

Conclusion

The release of Go 1.14 has finally put an end to the long-running controversy over the performance of defer. Outside the special cases above, we no longer need to worry about its overhead.

