node uses the async_hooks module for request tracing

2021-10-24 18:59:23
OfStack

The async_hooks module is an experimental API officially added to Node. js in v version 8.0. 0. We are also in production under v8. x. x.

So what is async_hooks?

async_hooks provides API for tracking asynchronous resources, which are objects with associated callbacks.

In short, the async_hooks module can be used to trace asynchronous callbacks. So how to use this tracking ability, and what are the problems in the process of using it?

Understanding async_hooks

async_hooks under v8.x. x has two main parts, one is createHook to track the life cycle, and the other is AsyncResource to create asynchronous resources.


const { createHook, AsyncResource, executionAsyncId } = require('async_hooks')

const hook = createHook({
 init (asyncId, type, triggerAsyncId, resource) {},
 before (asyncId) {},
 after (asyncId) {},
 destroy (asyncId) {}
})
hook.enable()

function fn () {
 console.log(executionAsyncId())
}

const asyncResource = new AsyncResource('demo')
asyncResource.run(fn)
asyncResource.run(fn)
asyncResource.emitDestroy()

The meaning and execution result of the above code is:

Creates an hooks instance containing the hook function executed during each init, before, after, destroy declaration cycle of the asynchronous operation. Enable this hooks instance. Manually create an asynchronous resource of type demo. The init hook is triggered, the asynchronous resource id is asyncId, the type is type (that is, demo), the creation context of the asynchronous resource id is triggerAsyncId, and the asynchronous resource is resource. Execute the fn function twice with this asynchronous resource, which will trigger before twice and after twice. The asynchronous resource id is asyncId, and this asyncId is the same as the value obtained through executionAsyncId in the fn function. Manually trigger the destroy lifecycle hook.

Asynchronous operations like async, await, promise syntax or request are all backed by asynchronous resources, which will also trigger these lifecycle hook functions.

Then, in the init hook function, we can connect asynchronous calls in series through the pointing relationship between the asynchronous resource creating context triggerAsyncId (parent) and the current asynchronous resource asyncId (child), and get the asyncId of the asynchronous resource executing the current callback through executionAsyncId () in the callback function (that is, fn of the above code), and trace the source of the call from the call chain.

At the same time, we also need to note that init is a hook created by asynchronous resources, not a hook created by asynchronous callback functions, and will only be executed once when asynchronous resources are created. What problems will this bring in actual use?

Request tracking

For the purpose of exception troubleshooting and data analysis, it is hoped that in our Node. js service of Ada architecture, request-id in the request header of the request sent by the client received by the server will be automatically added to the request header of each request sent to the middle and back office services.

The simple design of the function implementation is as follows:

The init hook enables asynchronous resources on the same call chain to share one storage object. request-id in the request header is parsed and added to the corresponding store of the current asynchronous call chain. Rewrite the request method of http and https modules, and obtain the request-id in the corresponding storage of the current call chain when the request is executed.

The sample code is as follows:


const http = require('http')
const { createHook, executionAsyncId } = require('async_hooks')
const fs = require('fs')

//  Trace the call chain and create a call chain storage object 
const cache = {}
const hook = createHook({
 init (asyncId, type, triggerAsyncId, resource) {
  if (type === 'TickObject') return
  //  Because in  Node.js  Medium  console.log  Is also an asynchronous behavior, which will trigger  init  Hook, so we can only log by synchronous method 
  fs.appendFileSync('log.out', `init ${type}(${asyncId}: trigger: ${triggerAsyncId})\n`);
  //  Determines whether the call chain storage object has been initialized 
  if (!cache[triggerAsyncId]) {
   cache[triggerAsyncId] = {}
  }
  //  Share the parent node's storage with the current asynchronous resource by reference 
  cache[asyncId] = cache[triggerAsyncId]
 }
})
hook.enable()

//  Rewrite  http
const httpRequest = http.request
http.request = (options, callback) => {
 const client = httpRequest(options, callback)
 //  Object of the store corresponding to the asynchronous resource to which the current request  request-id  Write  header
 const requestId = cache[executionAsyncId()].requestId
 console.log('cache', cache[executionAsyncId()])
 client.setHeader('request-id', requestId)

 return client
}

function timeout () {
 return new Promise((resolve, reject) => {
  setTimeout(resolve, Math.random() * 1000)
 })
}
//  Create a service 
http
 .createServer(async (req, res) => {
  //  Object for the current request  request-id  Write storage 
  cache[executionAsyncId()].requestId = req.headers['request-id']
  //  Simulation 1 Some other time-consuming operations 
  await timeout()
  //  Send 1 Request 
  http.request('http://www.baidu.com', (res) => {})
  res.write('hello\n')
  res.end()
 })
 .listen(3000)

Execute the code and perform a send test, and find that request-id can be obtained correctly.

Trap

However, the above code is problematic. As demonstrated in the previous code when introducing async_hooks module, an asynchronous resource can continuously execute different functions, that is, asynchronous resources may be reused. In particular, for asynchronous resources like TCP, which are created by the C/C + + section, multiple requests may use the same TCP asynchronous resource, so that in this case, the initial init hook function will only be executed once when multiple requests arrive at the server, resulting in call chain tracing of multiple requests to the same triggerAsyncId, thus referencing the same store.

We will make the following modifications to the previous code for 1 verification. The storage initialization section saves triggerAsyncId for easy observation of the trace relationship of asynchronous calls:


  if (!cache[triggerAsyncId]) {
   cache[triggerAsyncId] = {
    id: triggerAsyncId
   }
  }

The timeout function is changed to perform one long time-consuming operation and then one short time-consuming operation:


function timeout () {
 return new Promise((resolve, reject) => {
  setTimeout(resolve, [1000, 5000].pop())
 })
}

After restarting the service, send two requests in a row using postman (curl is not used because curl closes the connection at the end of each request, which makes it impossible to reproduce), and you can observe the following output:

{id: 1, requestId: 'Second request id'}
{id: 1, requestId: 'id for second request'}

It can be found that in the case of multiple concurrent operations and other operations with variable time consumption between write-read storage operations, the value of the request storage that arrives at the server first will be replicated by the request execution that arrives at the server later, so that the previous request reads the wrong value. Of course, you can guarantee that you don't insert other time-consuming operations between writing and reading, but this kind of mental maintenance guarantee is obviously unreliable in complex services. At this time, we need to make JS enter a brand-new asynchronous resource context before each read and write, that is, get a brand-new asyncId, so as to avoid this reuse. The part of the call chain store needs to be modified in the following ways:


const http = require('http')
const { createHook, executionAsyncId } = require('async_hooks')
const fs = require('fs')
const cache = {}

const httpRequest = http.request
http.request = (options, callback) => {
 const client = httpRequest(options, callback)
 const requestId = cache[executionAsyncId()].requestId
 console.log('cache', cache[executionAsyncId()])
 client.setHeader('request-id', requestId)

 return client
}

//  Extract the stored initialization as 1 Independent methods 
async function cacheInit (callback) {
 //  Utilization  await  Operation causes  await  The code after entering 1 A new asynchronous context 
 await Promise.resolve()
 cache[executionAsyncId()] = {}
 //  Use  callback  Execute in such a way that subsequent operations belong to this new asynchronous context 
 return callback()
}

const hook = createHook({
 init (asyncId, type, triggerAsyncId, resource) {
  if (!cache[triggerAsyncId]) {
   // init hook  No more initialization 
   return fs.appendFileSync('log.out', ` Not used  cacheInit  Method to initialize `)
  }
  cache[asyncId] = cache[triggerAsyncId]
 }
})
hook.enable()

function timeout () {
 return new Promise((resolve, reject) => {
  setTimeout(resolve, [1000, 5000].pop())
 })
}

http
.createServer(async (req, res) => {
 //  Subsequent actions are used as  callback  Incoming  cacheInit
 await cacheInit(async function fn() {
  cache[executionAsyncId()].requestId = req.headers['request-id']
  await timeout()
  http.request('http://www.baidu.com', (res) => {})
  res.write('hello\n')
  res.end()
 })
})
.listen(3000)

It is worth mentioning that this organization using callback is 10% different from the middleware pattern of koajs.


async function middleware (ctx, next) {
 await Promise.resolve()
 cache[executionAsyncId()] = {}
 return next()
}

NodeJs v14

This way of using await Promise. resolve () to create a new asynchronous context always seems "crooked". Fortunately, the official implementation of creating asynchronous context asyncResource. runInAsyncScope is provided in NodeJs v 9.x. x. Even better, NodeJs v14. x. x version directly provides the official implementation of asynchronous call chain data storage, which will directly help you complete the three tasks of asynchronous call relationship tracking, creating new asynchronous online text and managing data! The API will not be described in detail, and we will directly use the implementation before the new API transformation


const { AsyncLocalStorage } = require('async_hooks')
//  Create directly 1 A  asyncLocalStorage  Stores instances, no longer requires management  async  Lifecycle hook 
const asyncLocalStorage = new AsyncLocalStorage()
const storage = {
 enable (callback) {
  //  Use  run  Method to create a brand new store, and you need to let subsequent operations act as  run  Method to use the new asynchronous resource context 
  asyncLocalStorage.run({}, callback)
 },
 get (key) {
  return asyncLocalStorage.getStore()[key]
 },
 set (key, value) {
  asyncLocalStorage.getStore()[key] = value
 }
}

//  Rewrite  http
const httpRequest = http.request
http.request = (options, callback) => {
 const client = httpRequest(options, callback)
 //  Object of the asynchronous resource store  request-id  Write  header
 client.setHeader('request-id', storage.get('requestId'))

 return client
}

//  Use 
http
 .createServer((req, res) => {
  storage.enable(async function () {
   //  Object for the current request  request-id  Write storage 
   storage.set('requestId', req.headers['request-id'])
   http.request('http://www.baidu.com', (res) => {})
   res.write('hello\n')
   res.end()
  })
 })
 .listen(3000)

As you can see, the official implementation of asyncLocalStorage. run API and our version 2 implementation are also structurally identical.

Thus, under Node. js v 14. x. x, the function of request tracing using async_hooks module is easily realized.