A record of a MongoDB performance issue from a MySQL-to-MongoDB migration

  • 2020-06-15 10:27:39
  • OfStack

The company provisioned several servers for this project; the high-performance ones all have dual-socket quad-core hyper-threaded CPUs plus 32 GB of memory. After the operations staff installed MongoDB and handed the machines over to me, I followed my usual habit of reading a new server's logs before putting it to use, to get a feel for its basic state. While browsing the MongoDB log, I found some warning messages:

WARNING: You are running on a NUMA machine. We suggest launching mongod like this to avoid performance problems: numactl --interleave=all mongod [other options]

At the time I didn't know what NUMA was, so I didn't deal with it; I just reported the warning to the operations staff. Later I found that they hadn't paid attention to it either.

The migration required importing the old data. MongoDB ships with a mongoimport tool, but it only accepts source files in formats such as JSON and CSV, which didn't fit my needs, so instead of using it I wrote an import script in PHP. After it ran smoothly for a while, I noticed the import speed dropping, and PHP threw an exception:

cursor timed out (timeout: 30000, time left: 0:0, status: 0)

I couldn't pinpoint the problem right away, so as a first attempt I increased the cursor timeout in the PHP script:


<?php
// -1 disables the client-side cursor timeout (legacy PECL mongo driver)
MongoCursor::$timeout = -1;
?>

Unfortunately, this did not solve the problem; the error just surfaced in a different form:

max number of retries exhausted, couldn't send query, couldn't send query: Broken pipe

I then used strace to trace the PHP script and found the process stuck on a recvfrom call:


shell> strace -f -r -p <PID>
recvfrom(<FD>,

The meaning of recvfrom can be looked up with apropos:


shell> apropos recvfrom
receive a message from a socket

Or confirm what the file descriptor refers to in the following way:


shell> lsof -p <PID>
shell> ls -l /proc/<PID>/fd/<FD>

At this point, if you query the current operation of MongoDB, you will find that almost every operation consumes a large amount of time:


mongo> db.currentOp()

At the same time, mongostat showed a very high locked value.

...

I found an article online about speeding up data loading and importing in MongoDB. It looked very similar to my problem, but that author's slowdown was actually caused by chunk migrations triggered by auto-sharding, and his solution was to shard manually.

...

I asked a few friends, and one reported having hit a similar problem. In his scenario, the root cause was that system IO was busy: preallocation of data files blocked other operations and produced an avalanche effect.
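One way to check the busy-IO theory while an import runs is `iostat -x` from the sysstat package. A minimal sketch without extra packages, sampling the kernel's raw per-disk counters (Linux-specific; the /tmp paths are just examples):

```shell
# Snapshot the per-disk counters twice, one second apart; growth in the
# per-device numbers (e.g. the "time spent doing I/Os" field) indicates
# how busy each disk is while the import is running.
cat /proc/diskstats > /tmp/diskstats.before
sleep 1
cat /proc/diskstats > /tmp/diskstats.after
diff /tmp/diskstats.before /tmp/diskstats.after
rm -f /tmp/diskstats.before /tmp/diskstats.after
```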

To check this possibility, I grepped the MongoDB log:


shell> grep FileAllocator /path/to/log
[FileAllocator] allocating new datafile ... filling with zeroes...
[FileAllocator] done allocating datafile ... took ... secs

The file system I use is ext4 (xfs would also be fine); creating data files on it is very fast, so this was not the cause here. Anyone on ext3, however, might run into this kind of problem.
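As a quick illustration of why extent-based file systems allocate so quickly (the path and size here are examples for demonstration, not what mongod itself does):

```shell
# On ext4/xfs, fallocate(1) reserves extents without writing data blocks,
# so even multi-gigabyte files appear near-instantly; ext3 lacks extents
# and must fill the file with real zero blocks instead.
fallocate -l 1048576 /tmp/prealloc_demo   # 1 MiB demo; mongod's files reach ~2 GB
stat -c %s /tmp/prealloc_demo             # prints 1048576
rm -f /tmp/prealloc_demo
```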

MongoDB allocates data files on demand: first <DB>.0 at 64 MB, then <DB>.1 doubled to 128 MB, and so on until <DB>.5 reaches 2 GB, after which every new data file stays at 2 GB. To avoid possible problems, you can adopt the strategy of creating the data files manually in advance:


#!/bin/bash
# Preallocate MongoDB data files for the given database.
# Files <DB>.0 through <DB>.4 are small and grow quickly;
# <DB>.5 and up are all ~2 GB, so those are the ones worth preallocating.

DB_NAME=$1

cd "/path/to/$DB_NAME" || exit 1

for INDEX_NUMBER in {5..50}; do
  FILE_NAME="$DB_NAME.$INDEX_NUMBER"

  if [ ! -e "$FILE_NAME" ]; then
    head -c 2146435072 /dev/zero > "$FILE_NAME"
  fi
done

Note: 2146435072 bytes is not a full 2 GB; the odd value comes from the range of a signed 32-bit integer.
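The arithmetic is easy to check in the shell: 2146435072 bytes is exactly 2047 MiB, while a full 2 GiB equals 2^31 and would exceed the largest signed 32-bit value:

```shell
echo $((2047 * 1024 * 1024))         # 2146435072 -- the size used in the script above
echo $((2 * 1024 * 1024 * 1024))     # 2147483648 -- a full 2 GiB (2^31)
echo $((2 * 1024 * 1024 * 1024 - 1)) # 2147483647 -- the signed 32-bit maximum
```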

...

As a last resort I asked for help on the official forum, where someone suggested checking whether the problem was caused by a poor index. It felt like a long shot, but I enabled the profiler to record slow operations:


mongo> use <DB>
mongo> db.setProfilingLevel(1);

But it turned out that the slow operations were mostly inserts (not surprising, since I was mainly importing data), which don't rely on a query index:


mongo> use <DB>
mongo> db.system.profile.find().sort({$natural:-1})

...

With the problem still unsolved, I had to help myself. I repeated the old-data migration several times and naturally got the same result each time, but I noticed that whenever the problem appeared, a process called irqbalance always showed high CPU usage. A quick search revealed that many articles about irqbalance mention NUMA, which reminded me of the warning message in the log. Good grief, what a long way around! Settling down to read the documentation carefully, I found the official manual already covers this. Apply the following settings:


shell> echo 0 > /proc/sys/vm/zone_reclaim_mode
shell> numactl --interleave=all mongod [options]

For a description of the zone_reclaim_mode kernel parameter, refer to the official documentation.

Note: starting from MongoDB 1.9.2, MongoDB automatically sets zone_reclaim_mode at startup.
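To keep both settings across reboots, they can go into the boot configuration. A minimal sketch, assuming a SysV-style init script; /usr/bin/mongod and /etc/mongod.conf are assumed paths, adjust to your install:

```shell
# /etc/sysctl.conf -- persist the kernel setting across reboots
vm.zone_reclaim_mode = 0

# /etc/init.d/mongod (fragment) -- always start mongod under numactl
numactl --interleave=all /usr/bin/mongod --config /etc/mongod.conf --fork
```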

As for what NUMA means: briefly, on a machine with multiple physical CPUs, NUMA divides memory into local and remote. Each physical CPU has its own local memory, which it accesses faster than remote memory. By default, each physical CPU allocates from its own local memory first, which for a memory-hungry application like MongoDB can exhaust one node while memory remains free elsewhere. For a detailed introduction to NUMA, see the more thorough articles on the subject.

In theory, MySQL, Redis, Memcached, and the like may be affected by NUMA as well, and deserve the same attention.

