Some methods for getting the most performance out of Node.js programs

  • 2020-06-19 09:46:12
  • OfStack

A Node.js process runs on a single physical core, which is why you need to take extra care when designing scalable servers.

Thanks to a set of stable APIs, plus the development of native extensions for managing processes, there are many different ways to design a parallel Node.js application. In this post, we compare these possible architectures.

This article also introduces the compute-cluster module: a small Node.js library that makes it easy to manage processes and spread computation across them with only a couple of lines of code.

Problems encountered

In our Mozilla Persona project we needed to be able to handle a large number of requests with different characteristics, so we tried using Node.js.

In order not to hurt the user experience, we designed "interactive" requests to require only lightweight computation but provide a fast response time, so the UI never feels stuck. In contrast, a "batch" operation takes about half a second of processing time and may have longer latency for other reasons.


To arrive at a better design, we looked at many ways of meeting our needs.
With scalability and cost in mind, we listed the following key requirements:

  • Efficiency: make full use of all idle processors
  • Responsiveness: our "application" responds quickly and in real time
  • Grace: when there are more requests than we can handle, we handle as many as we can; for the rest, we give clear feedback
  • Simplicity: our solution must be simple and convenient to use

With these points in mind, we can screen the candidate designs clearly and purposefully.

Scheme 1: Process directly in the main thread

When the main thread deals directly with data, the results are poor:

You can't take full advantage of multi-core CPUs, and every interactive request has to wait until the current request (or response) finishes processing, which is anything but graceful.

The only advantage of this scheme is its simplicity:


function myRequestHandler(request, response) {
 // Let's bring everything to a grinding halt for half a second.
 var results = doComputationWorkSync(request.somesuch);
}

In a Node.js program you want to process multiple requests in parallel, but this scheme forces them to be handled one at a time.

Scheme 2: Use asynchronous processing

Would there be a big performance improvement if the work were executed in the background using asynchronous methods?

Not necessarily. It depends on whether anything actually runs in the background.

For example, if an "asynchronous" function is implemented in JavaScript or native code that still runs on the main thread, performance is no better than with synchronous processing, and the asynchronous wrapper gains nothing.

Consider the code below:


function doComputationWork(input, callback) {
 // Because the internal implementation of this asynchronous
 // function is itself synchronously run on the main thread,
 // you still starve the entire process.
 var output = doComputationWorkSync(input);
 process.nextTick(function() {
  callback(null, output);
 });
}
 
function myRequestHandler(request, response) {
 // Even though this *looks* better, we're still bringing everything
 // to a grinding halt.
 doComputationWork(request.somesuch, function(err, results) {
  // ... do something with results ...
 });

}
The key point is that using Node.js's asynchronous APIs does not, by itself, make your application multi-process.

Scheme 3: Use a thread library for asynchronous processing

Properly implemented, a library written in native code can break through Node.js's single-thread limitation and achieve real multithreading.

There are many such libraries; the bcrypt library written by Nick Campbell is one of the best.

If you run this library's tests on a 4-core machine, you'll be delighted to see roughly four times the throughput, with almost all resources consumed! But if you test on a 24-core machine, the results barely change: four cores run at almost 100%, while the rest sit mostly idle.

The problem is that this library uses Node.js's internal thread pool, which was never intended for this kind of computation. Moreover, the pool's size is hard-coded: it can run at most 4 threads.

Beyond the hard-coded limit, there are deeper reasons for the problem:

Flooding Node.js's internal thread pool with computation interferes with its file and network operations, making the whole program seem slow to respond. It is also hard to handle the waiting queue properly: imagine a queue that already holds five minutes' worth of computation. Do you really want to add more requests to it?

Component libraries with built-in threading cannot effectively take advantage of many cores in this scenario; they reduce the program's responsiveness and make it perform worse as the load increases.


Scheme 4: Use Node.js's cluster module

Node.js 0.6.x and above provide a cluster module, which lets you create a set of processes that share the same socket in order to spread the load.

What if you combine one of the above schemes with the cluster module?

The resulting design would inherit the drawbacks of synchronous processing or of the built-in thread pool: slow responses and no grace.

Simply adding new running instances does not solve the problem.

Scheme 5: Introduce the compute-cluster module

In Persona, our solution is to maintain a cluster of single-purpose (but varied) computation processes.

Along the way, we wrote the compute-cluster library.

The library automatically starts and manages child processes on demand, giving your code a simple way to run work across a cluster of local child processes.

A usage example:


const computecluster = require('compute-cluster');
 
// allocate a compute cluster
var cc = new computecluster({ module: './worker.js' });
 
// run work in parallel
cc.enqueue({ input: "foo" }, function (error, result) {
 console.log("foo done", result);
});
cc.enqueue({ input: "bar" }, function (error, result) {
 console.log("bar done", result);
});

The worker file, worker.js, responds to message events and processes incoming requests:


process.on('message', function(m) {
 // do lots of work here, and we don't care that we're blocking the
 // main thread because this process is intended to do one thing at a time.
 var output = doComputationWorkSync(m.input);
 process.send(output);
});
 

Without changing the calling code, the compute-cluster module can be integrated with the existing asynchronous API, allowing true multicore parallelism with minimal code.
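To illustrate that integration, here is a hypothetical adapter (the function names are mine, not the library's) that gives the compute-cluster version the exact same signature as the doComputationWork function from scheme 2:

```javascript
// Hypothetical adapter: wrap a compute-cluster instance behind the same
// (input, callback) signature used by the earlier main-thread version.
function makeComputationWorker(cc) {
  return function doComputationWork(input, callback) {
    // cc.enqueue hands the job to a child process and invokes
    // callback(error, result) when the child sends its answer back.
    cc.enqueue({ input: input }, callback);
  };
}

// Usage (assumes compute-cluster is installed):
// var cc = new (require('compute-cluster'))({ module: './worker.js' });
// var doComputationWork = makeComputationWorker(cc);
```

Callers keep calling doComputationWork exactly as before; only the construction of the cluster changes.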

Let's look at how this scheme measures up against our four requirements.

Multicore parallelism: each child process gets a full core to itself.

Responsiveness: since the management process only starts child processes and passes messages, it is idle most of the time and can handle more interactive requests.

Even when the machine is under heavy load, we can still use the operating system's scheduler to raise the priority of the management process.

Simplicity: the asynchronous API hides the implementation details, so we can easily integrate this module into an existing project without even changing the calling code.

Now let's see whether we can find a way to keep the system's efficiency from collapsing when the load suddenly surges.

The ideal, of course, is for the system to keep running efficiently and to handle as many requests as possible even under a surge of pressure.


To help achieve this, compute-cluster does more than manage child processes and deliver messages; it also tracks additional information.

It records the number of child processes currently running and the average time each one takes to finish a job.

With these records, we can predict how long a newly arriving job will take before it even starts.

Combined with a user-set parameter (max_request_time), this lets us reject requests that would probably time out, without processing them at all.

This feature makes it easy to configure your code in terms of user experience. For example, "users should not wait more than 10 seconds to log in" translates roughly into setting max_request_time to 7 seconds (leaving time for network transfer).
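A sketch of that bookkeeping might look as follows. This is an illustration of the idea, not compute-cluster's actual implementation; the 7-second budget comes from the example above:

```javascript
// Keep a running average of job duration, then decide whether a newly
// arrived job could still finish inside the time budget given the
// current backlog. (Illustrative only, not the library's real code.)
function makeQueueGovernor(maxRequestMs) {
  var count = 0;
  var avgMs = 0;
  return {
    // Call when a job finishes, with its measured duration.
    record: function (ms) {
      count++;
      avgMs += (ms - avgMs) / count; // incremental running mean
    },
    // Would a new job, queued behind `backlog` others, finish in time?
    admit: function (backlog) {
      return (backlog + 1) * avgMs <= maxRequestMs;
    }
  };
}

// "No more than 10 seconds for the user" minus roughly 3s of network
// overhead gives the 7-second computation budget from the example above.
var governor = makeQueueGovernor(7000);
governor.record(400);
governor.record(600);              // running average is now 500ms
console.log(governor.admit(10));   // (10+1)*500 = 5500ms <= 7000 -> true
console.log(governor.admit(20));   // (20+1)*500 = 10500ms > 7000 -> false
```

A job that cannot be admitted is failed immediately with an error, which is exactly the "clear feedback" behavior required above.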

The results of our stress tests on the Persona service were very satisfying.

Under extremely high pressure, we could still serve authenticated users, while blocking some unauthenticated users and showing them a relevant error message.

