Three solutions to the MongoDB disk IO problem

  • 2020-08-22 23:00:03
  • OfStack

IO concepts

In database optimization and storage planning, a few important IO concepts come up again and again. I will record them here in detail, since how well you understand these concepts largely determines how well you understand database and storage optimization.

Read/write IO. A read IO, the most common kind, is an instruction to read sectors from disk. The instruction typically tells the disk the starting sector position, the number of consecutive sectors to read after that starting sector, and whether the action is a read or a write. The disk receives the instruction and reads or writes the data as requested. One command from the controller, plus its data transfer, constitutes one IO, either a read or a write.

Large/small block IO refers to the number of consecutive sectors requested in the controller's instruction. If the number is large, such as 64 or 128, it is considered large-block IO; if it is small, such as 1, 4, or 8, it is considered small-block IO. There is no sharp boundary between the two.

Continuous/random IO. Whether an IO is continuous or random depends on the distance between its starting sector and the ending sector of the previous IO. If they are contiguous or very close, the IO is considered continuous; if the gap is large, it is considered random. For continuous IO, because the starting sector is very close to the previous ending sector, the head hardly needs to change tracks, or the track change is very fast. If the gap is large, the head needs a long time to change tracks; with many random IOs, the head keeps changing tracks, which greatly reduces efficiency.

Sequential/concurrent IO refers to whether the disk controller issues one instruction at a time or a set of instructions to the disk group. If only one is issued at a time, the IO queue in the controller cache must be drained one by one: sequential IO. If the controller can issue instructions to multiple disks in the group simultaneously, multiple IOs execute at once: concurrent IO. Concurrent IO improves both efficiency and speed.

IO concurrency probability. For a single disk, the concurrency probability is 0, because one disk can only perform one IO at a time. For a 2-disk RAID 0, there is a 1/2 probability of 2 concurrent IOs, provided the stripe is deep enough (if the stripe is too small, a single IO spans both disks and cannot run concurrently with another, as described below). Other configurations can be worked out the same way.

IOPS. Time taken for one IO = seek time + data transfer time, so IOPS = IO concurrency coefficient / (seek time + data transfer time). Since seek time is typically several orders of magnitude larger than transfer time, the key factor limiting IOPS is seek time. For continuous IO, seek time is very short, needed only when changing tracks; under that premise, the shorter the transfer time, the higher the IOPS.

IO throughput per second. Clearly, IO throughput per second = IOPS × average IO size, so at a given IOPS, the larger the IO size, the higher the throughput. Let the head's sustained read/write speed be V, a constant. Then IOPS = IO concurrency coefficient / (seek time + IO size / V), and therefore IO throughput per second = IO concurrency coefficient × IO size × V / (V × seek time + IO size). The two factors that most affect throughput are thus IO size and seek time: the larger the IO size and the shorter the seek time, the higher the throughput. For IOPS itself, the one factor with a significant effect is seek time.
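The relationships above can be put into numbers with a small model. All figures below are illustrative assumptions for a mechanical disk, not measurements:

```python
# Back-of-the-envelope model of the IOPS and throughput formulas above.
# SEEK_TIME and V are assumed, typical-looking values, not measured ones.

SEEK_TIME = 0.005            # 5 ms average seek time
V = 100 * 1024 * 1024        # 100 MiB/s sustained head transfer speed
CONCURRENCY = 1.0            # single disk: no concurrent IO

def iops(io_size):
    """IOPS = concurrency coefficient / (seek time + IO size / V)."""
    return CONCURRENCY / (SEEK_TIME + io_size / V)

def throughput(io_size):
    """IO throughput per second = IOPS * IO size (bytes/s)."""
    return iops(io_size) * io_size

small, large = 4 * 1024, 1024 * 1024
# Small IOs: rate is dominated by seek time, throughput is poor.
# Large IOs: far fewer IOs per second, but far higher throughput.
print(f"4 KiB : {iops(small):7.1f} IOPS, {throughput(small) / 2**20:6.2f} MiB/s")
print(f"1 MiB : {iops(large):7.1f} IOPS, {throughput(large) / 2**20:6.2f} MiB/s")
```

With these assumed numbers, 4 KiB IOs give almost 200 IOPS but under 1 MiB/s, while 1 MiB IOs give only about 67 IOPS yet roughly 67 MiB/s: exactly the trade-off the formula predicts.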

Three solutions to the MongoDB disk IO problem

1. Use composite large documents

We know that MongoDB is a document database in which each record is a JSON-style document. For example, suppose one statistic like the following is generated every day:

  { metric: "content_count", client: 5, value: 51, date: ISODate("2012-04-01T13:00:00Z") }
  { metric: "content_count", client: 5, value: 49, date: ISODate("2012-04-02T13:00:00Z") }

With composite large documents, all of a month's data can instead be stored in a single record, like this:

  { metric: "content_count", client: 5, month: "2012-04", "1": 51, "2": 49, ... }

Comparing the two layouts on a data set of about 7 GB (on a machine with only 1.7 GB of memory, so the data cannot fit in RAM), a test reading one year's worth of data shows a significant difference in read performance:

Type 1: 1.6 seconds

Type 2: 0.3 seconds

So what's the problem?

The reason is that composite storage reads far fewer documents. When the documents cannot all be held in memory, the cost is dominated by disk seeks. Under the first layout, fetching a year's data requires reading many more documents, hence many more disk seeks, hence it is slower.

Foursquare, a well-known MongoDB user, uses this approach extensively to improve read performance.
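As a rough illustration of the difference between the two layouts, the fold from per-day documents into one per-month composite document can be sketched in plain Python. Field names follow the article's examples; the `to_monthly` helper itself is hypothetical:

```python
from datetime import date

# Per-day records, as in the article's first layout (values from the example).
daily = [
    {"metric": "content_count", "client": 5, "value": 51, "date": date(2012, 4, 1)},
    {"metric": "content_count", "client": 5, "value": 49, "date": date(2012, 4, 2)},
]

def to_monthly(records):
    """Fold per-day documents into one composite document per (metric, client, month)."""
    out = {}
    for r in records:
        key = (r["metric"], r["client"], r["date"].strftime("%Y-%m"))
        doc = out.setdefault(key, {"metric": key[0], "client": key[1], "month": key[2]})
        doc[str(r["date"].day)] = r["value"]   # day-of-month becomes a field name
    return list(out.values())

monthly = to_monthly(daily)
# A month of reads now touches 1 document instead of up to 31.
```

A year of data then means 12 documents instead of 365, which is exactly where the seek savings come from.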

2. Use a special index structure

As we know, MongoDB, like traditional databases, uses B-tree-based data structures for its indexes. For a tree index, the more concentrated in storage the index entries for hot data are, the less memory the index wastes. So let's compare the following two index structures:


  db.metrics.ensureIndex({ metric: 1, client: 1, date: 1 })
  db.metrics.ensureIndex({ date: 1, metric: 1, client: 1 })

With these two different structures, the difference in insertion performance is also obvious.

With the first structure, the insertion rate holds at roughly 10k/s while the collection stays below 20 million documents, but as the volume grows the rate gradually drops to 2.5k/s, and it may fall even lower as the data keeps growing.

When the second structure is adopted, the insertion speed can be basically stable at 10k/s.

The reason is that the second structure puts the date field in the leading position of the index, so when new data arrives, index updates happen at the end of the index rather than in the middle; entries inserted earlier rarely need to be modified by later inserts. In the first case, because date is not the leading field, index updates constantly land in the middle of the tree, causing frequent, large-scale reorganization of the index structure.
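The effect can be mimicked with a sorted list standing in for the index (a rough analogy, not MongoDB's actual B-tree code): keys whose leading field increases monotonically always land at the tail, a pure append, while keys led by a non-monotonic field keep landing in the middle:

```python
import bisect

# Keys shaped like {date: 1, metric: 1, client: 1}: the leading field
# (the date) only ever grows, so every insert is at the tail.
increasing = []
for day in range(1, 11):
    key = (f"2012-04-{day:02d}", "content_count", 5)
    pos = bisect.bisect_left(increasing, key)
    assert pos == len(increasing)        # always an append
    increasing.insert(pos, key)

# Keys shaped like {metric: 1, client: 1, date: 1}: the leading field
# alternates between metrics, so inserts keep hitting the middle.
mixed = []
middle_inserts = 0
for day in range(1, 11):
    metric = "a_metric" if day % 2 else "z_metric"
    pos = bisect.bisect_left(mixed, (metric, day))
    if pos < len(mixed):                 # not at the tail: a mid-structure insert
        middle_inserts += 1
    mixed.insert(pos, (metric, day))
```

In the second loop, every insert of an `a_metric` key after the first `z_metric` key lands mid-list; in a real B-tree those are the inserts that touch pages deep inside the tree and force rebalancing.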

3. Reserve space

As with point 1, this point exploits the fact that a traditional mechanical hard disk spends most of its operating time on seek operations.

Take the example from point 1 again. When inserting data, we preallocate the space required for the whole year's data in advance. This guarantees that the 12 monthly records of a year are stored contiguously on disk, so reading a year's worth of data may need only one sequential disk read, compared with 12 scattered reads before.


  db.metrics.insert([
    { metric: "content_count", client: 3, date: "2012-01", 0: 0, 1: 0, 2: 0, ... },
    // ... 11 more zero-filled documents, one per month through "2012-12" ...
  ])

Results:

Without reserved space, reading a year's records takes 62 ms.

With reserved space, reading a year's records takes only 6.6 ms.
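Building those zero-filled monthly documents can be sketched as follows. The `preallocate_year` helper is hypothetical; it mirrors the field layout of the article's example (day numbers as 0-based field names) and its output would be handed to a single insert call:

```python
import calendar

def preallocate_year(metric, client, year):
    """Build 12 zero-filled monthly documents so that inserting them
    together lays a whole year's data out contiguously on disk."""
    docs = []
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]   # days in this month
        doc = {"metric": metric, "client": client, "date": f"{year}-{month:02d}"}
        doc.update({str(d): 0 for d in range(days)}) # 0-based day fields, all zeroed
        docs.append(doc)
    return docs

docs = preallocate_year("content_count", 3, 2012)
# docs would then go to the database in one call, e.g. db.metrics.insert(docs),
# so all 12 records are allocated together.
```

Daily updates then overwrite fields in place rather than growing the documents, so the records never need to be moved after allocation.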

Conclusion

All three techniques attack the same bottleneck: disk seeks. Composite documents and preallocated space reduce the number of documents, and therefore seeks, needed to satisfy a read, while leading the index with a monotonically increasing field keeps index updates at the tail of the B-tree instead of scattered through its middle.

