node uses the async_hooks module for request tracing
- 2021-10-24 18:59:23
- OfStack
The async_hooks module is an experimental API officially added to Node. js in v version 8.0. 0. We are also in production under v8. x. x.
So what is async_hooks?
async_hooks provides API for tracking asynchronous resources, which are objects with associated callbacks.
In short, the async_hooks module can be used to trace asynchronous callbacks. So how to use this tracking ability, and what are the problems in the process of using it?
Understanding async_hooks
async_hooks under v8.x. x has two main parts, one is createHook to track the life cycle, and the other is AsyncResource to create asynchronous resources.
const { createHook, AsyncResource, executionAsyncId } = require('async_hooks')
const hook = createHook({
init (asyncId, type, triggerAsyncId, resource) {},
before (asyncId) {},
after (asyncId) {},
destroy (asyncId) {}
})
hook.enable()
function fn () {
console.log(executionAsyncId())
}
const asyncResource = new AsyncResource('demo')
asyncResource.run(fn)
asyncResource.run(fn)
asyncResource.emitDestroy()
The meaning and execution result of the above code is:
Creates an hooks instance containing the hook function executed during each init, before, after, destroy declaration cycle of the asynchronous operation. Enable this hooks instance. Manually create an asynchronous resource of type demo. The init hook is triggered, the asynchronous resource id is asyncId, the type is type (that is, demo), the creation context of the asynchronous resource id is triggerAsyncId, and the asynchronous resource is resource. Execute the fn function twice with this asynchronous resource, which will trigger before twice and after twice. The asynchronous resource id is asyncId, and this asyncId is the same as the value obtained through executionAsyncId in the fn function. Manually trigger the destroy lifecycle hook.Asynchronous operations like async, await, promise syntax or request are all backed by asynchronous resources, which will also trigger these lifecycle hook functions.
Then, in the init hook function, we can connect asynchronous calls in series through the pointing relationship between the asynchronous resource creating context triggerAsyncId (parent) and the current asynchronous resource asyncId (child), and get the asyncId of the asynchronous resource executing the current callback through executionAsyncId () in the callback function (that is, fn of the above code), and trace the source of the call from the call chain.
At the same time, we also need to note that init is a hook created by asynchronous resources, not a hook created by asynchronous callback functions, and will only be executed once when asynchronous resources are created. What problems will this bring in actual use?
Request tracking
For the purpose of exception troubleshooting and data analysis, it is hoped that in our Node. js service of Ada architecture, request-id in the request header of the request sent by the client received by the server will be automatically added to the request header of each request sent to the middle and back office services.
The simple design of the function implementation is as follows:
The init hook enables asynchronous resources on the same call chain to share one storage object. request-id in the request header is parsed and added to the corresponding store of the current asynchronous call chain. Rewrite the request method of http and https modules, and obtain the request-id in the corresponding storage of the current call chain when the request is executed.The sample code is as follows:
const http = require('http')
const { createHook, executionAsyncId } = require('async_hooks')
const fs = require('fs')
// Trace the call chain and create a call chain storage object
const cache = {}
const hook = createHook({
init (asyncId, type, triggerAsyncId, resource) {
if (type === 'TickObject') return
// Because in Node.js Medium console.log Is also an asynchronous behavior, which will trigger init Hook, so we can only log by synchronous method
fs.appendFileSync('log.out', `init ${type}(${asyncId}: trigger: ${triggerAsyncId})\n`);
// Determines whether the call chain storage object has been initialized
if (!cache[triggerAsyncId]) {
cache[triggerAsyncId] = {}
}
// Share the parent node's storage with the current asynchronous resource by reference
cache[asyncId] = cache[triggerAsyncId]
}
})
hook.enable()
// Rewrite http
const httpRequest = http.request
http.request = (options, callback) => {
const client = httpRequest(options, callback)
// Object of the store corresponding to the asynchronous resource to which the current request request-id Write header
const requestId = cache[executionAsyncId()].requestId
console.log('cache', cache[executionAsyncId()])
client.setHeader('request-id', requestId)
return client
}
function timeout () {
return new Promise((resolve, reject) => {
setTimeout(resolve, Math.random() * 1000)
})
}
// Create a service
http
.createServer(async (req, res) => {
// Object for the current request request-id Write storage
cache[executionAsyncId()].requestId = req.headers['request-id']
// Simulation 1 Some other time-consuming operations
await timeout()
// Send 1 Request
http.request('http://www.baidu.com', (res) => {})
res.write('hello\n')
res.end()
})
.listen(3000)
Execute the code and perform a send test, and find that request-id can be obtained correctly.
Trap
At the same time, we also need to note that init is a hook created by asynchronous resources, not a hook created by asynchronous callback functions, and will only be executed once when asynchronous resources are created.
However, the above code is problematic. As demonstrated in the previous code when introducing async_hooks module, an asynchronous resource can continuously execute different functions, that is, asynchronous resources may be reused. In particular, for asynchronous resources like TCP, which are created by the C/C + + section, multiple requests may use the same TCP asynchronous resource, so that in this case, the initial init hook function will only be executed once when multiple requests arrive at the server, resulting in call chain tracing of multiple requests to the same triggerAsyncId, thus referencing the same store.
We will make the following modifications to the previous code for 1 verification. The storage initialization section saves triggerAsyncId for easy observation of the trace relationship of asynchronous calls:
if (!cache[triggerAsyncId]) {
cache[triggerAsyncId] = {
id: triggerAsyncId
}
}
The timeout function is changed to perform one long time-consuming operation and then one short time-consuming operation:
function timeout () {
return new Promise((resolve, reject) => {
setTimeout(resolve, [1000, 5000].pop())
})
}
After restarting the service, send two requests in a row using postman (curl is not used because curl closes the connection at the end of each request, which makes it impossible to reproduce), and you can observe the following output:
{id: 1, requestId: 'Second request id'}
{id: 1, requestId: 'id for second request'}
It can be found that in the case of multiple concurrent operations and other operations with variable time consumption between write-read storage operations, the value of the request storage that arrives at the server first will be replicated by the request execution that arrives at the server later, so that the previous request reads the wrong value. Of course, you can guarantee that you don't insert other time-consuming operations between writing and reading, but this kind of mental maintenance guarantee is obviously unreliable in complex services. At this time, we need to make JS enter a brand-new asynchronous resource context before each read and write, that is, get a brand-new asyncId, so as to avoid this reuse. The part of the call chain store needs to be modified in the following ways:
const http = require('http')
const { createHook, executionAsyncId } = require('async_hooks')
const fs = require('fs')
const cache = {}
const httpRequest = http.request
http.request = (options, callback) => {
const client = httpRequest(options, callback)
const requestId = cache[executionAsyncId()].requestId
console.log('cache', cache[executionAsyncId()])
client.setHeader('request-id', requestId)
return client
}
// Extract the stored initialization as 1 Independent methods
async function cacheInit (callback) {
// Utilization await Operation causes await The code after entering 1 A new asynchronous context
await Promise.resolve()
cache[executionAsyncId()] = {}
// Use callback Execute in such a way that subsequent operations belong to this new asynchronous context
return callback()
}
const hook = createHook({
init (asyncId, type, triggerAsyncId, resource) {
if (!cache[triggerAsyncId]) {
// init hook No more initialization
return fs.appendFileSync('log.out', ` Not used cacheInit Method to initialize `)
}
cache[asyncId] = cache[triggerAsyncId]
}
})
hook.enable()
function timeout () {
return new Promise((resolve, reject) => {
setTimeout(resolve, [1000, 5000].pop())
})
}
http
.createServer(async (req, res) => {
// Subsequent actions are used as callback Incoming cacheInit
await cacheInit(async function fn() {
cache[executionAsyncId()].requestId = req.headers['request-id']
await timeout()
http.request('http://www.baidu.com', (res) => {})
res.write('hello\n')
res.end()
})
})
.listen(3000)
It is worth mentioning that this organization using callback is 10% different from the middleware pattern of koajs.
async function middleware (ctx, next) {
await Promise.resolve()
cache[executionAsyncId()] = {}
return next()
}
NodeJs v14
This way of using await Promise. resolve () to create a new asynchronous context always seems "crooked". Fortunately, the official implementation of creating asynchronous context asyncResource. runInAsyncScope is provided in NodeJs v 9.x. x. Even better, NodeJs v14. x. x version directly provides the official implementation of asynchronous call chain data storage, which will directly help you complete the three tasks of asynchronous call relationship tracking, creating new asynchronous online text and managing data! The API will not be described in detail, and we will directly use the implementation before the new API transformation
const { AsyncLocalStorage } = require('async_hooks')
// Create directly 1 A asyncLocalStorage Stores instances, no longer requires management async Lifecycle hook
const asyncLocalStorage = new AsyncLocalStorage()
const storage = {
enable (callback) {
// Use run Method to create a brand new store, and you need to let subsequent operations act as run Method to use the new asynchronous resource context
asyncLocalStorage.run({}, callback)
},
get (key) {
return asyncLocalStorage.getStore()[key]
},
set (key, value) {
asyncLocalStorage.getStore()[key] = value
}
}
// Rewrite http
const httpRequest = http.request
http.request = (options, callback) => {
const client = httpRequest(options, callback)
// Object of the asynchronous resource store request-id Write header
client.setHeader('request-id', storage.get('requestId'))
return client
}
// Use
http
.createServer((req, res) => {
storage.enable(async function () {
// Object for the current request request-id Write storage
storage.set('requestId', req.headers['request-id'])
http.request('http://www.baidu.com', (res) => {})
res.write('hello\n')
res.end()
})
})
.listen(3000)
As you can see, the official implementation of asyncLocalStorage. run API and our version 2 implementation are also structurally identical.
Thus, under Node. js v 14. x. x, the function of request tracing using async_hooks module is easily realized.