Optimization of the MongoDB index

  • 2020-06-01 11:14:49
  • OfStack

MongoDB is a database based on distributed file storage, written in C++ and designed to provide a scalable, high-performance data storage solution for web applications. MongoDB indexes work much like those of a relational database: MongoDB's query optimizer can use these data structures to quickly find and sort the documents in a collection. From the command line, you can build an index by calling the ensureIndex() function, specifying one or more fields to index. Here is how to optimize MongoDB indexes.

1. Introduction to the index

For example, the following data


db.refactor.insert({"username":"refactor","age":24,"isactive":true})
db.refactor.insert({"username":"refactor","age":30,"isactive":false})
db.refactor.insert({"username":"aaaaa","age":24,"isactive":false})
db.refactor.insert({"username":"aaaaa","age":34,"isactive":true})
db.refactor.insert({"username":"sssssss","age":24,"isactive":true})
db.refactor.insert({"username":"tttttt","age":24,"isactive":true})
db.refactor.insert({"username":"tttttt","age":54,"isactive":true})
db.refactor.insert({"username":"bbbbb","age":24,"isactive":false})
db.refactor.insert({"username":"rrrrr","age":24,"isactive":true})
db.refactor.insert({"username":"rrrrr","age":54,"isactive":false})

To speed up searches on the username key, build an index on it:


db.refactor.ensureIndex({"username":1})

To search on both the age and username keys, build a compound index on them:


db.refactor.ensureIndex({"age":1,"username":1})

The document passed to ensureIndex() is a set of keys whose values are 1 or -1, indicating the direction of the index: 1 for ascending, -1 for descending. If the index has only one key, the direction is irrelevant. (In current MongoDB versions, ensureIndex() is deprecated in favor of createIndex(), which takes the same arguments.)

If you have multiple keys, you have to consider the direction of the index.
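As a sketch of why direction matters (assuming a running mongod and the refactor collection above), a compound index can be traversed forwards or backwards, so it serves a sort only when the sort directions match the index or are all reversed:

```javascript
// Assumes a running mongod and the "refactor" collection above.
// A compound index with mixed directions:
db.refactor.ensureIndex({"age": 1, "username": -1})

// This index can serve both of these sorts (forward and reverse traversal):
db.refactor.find().sort({"age": 1, "username": -1})
db.refactor.find().sort({"age": -1, "username": 1})

// But it cannot serve a sort whose directions match neither traversal:
db.refactor.find().sort({"age": 1, "username": 1})
```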

2. MongoDB's built-in monitoring can be used as a basis for optimization

1. explain execution plan

MongoDB provides an explain command that shows how the system handles a query request. With explain, we can see whether and how indexes are used to speed up retrieval, and optimize the indexes accordingly.

Descriptions of several key fields:

cursor: the cursor type returned (BasicCursor for a full collection scan, BtreeCursor when an index is used)
nscanned: number of documents (or index entries) scanned
n: number of documents returned
millis: query time, in milliseconds
indexBounds: the index bounds used

For example,

Shell code


>db.order.ensureIndex({"user.uid":1})

>db.order.find({ "status": 1.0, "user.uid": { $gt: 2663199.0 } }).explain() 
{
    "cursor" : "BtreeCursor user.uid_1",
    "nscanned" : 337800,
    "nscannedObjects" : 337800,
    "n" : 337800,
    "millis" : 1371,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "user.uid" : [
            [
                2663199,
                1.7976931348623157e+308
            ]
        ]
    }
}
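Here nscanned equals n, so no scanned documents were discarded; when the two differ, a compound index covering both filter fields can close the gap. A sketch, assuming the same order collection and a running mongod:

```javascript
// Assumes a running mongod and the "order" collection above.
// Equality field first, then the range field:
db.order.ensureIndex({"status": 1, "user.uid": 1})

// Re-run explain(); with the compound index the plan should show
// cursor "BtreeCursor status_1_user.uid_1" and nscanned close to n.
db.order.find({"status": 1.0, "user.uid": {$gt: 2663199.0}}).explain()
```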

2. Optimizer profile

In MySQL, the slow query log is often used as a basis for database optimization. Does MongoDB have a similar feature? Yes: the MongoDB Database Profiler, which records even more detail than MySQL's slow query log.
MongoDB writes slow statements to the db.system.profile collection. As with MySQL's slow log, the profiler only records slow statements once the parameters are set. Two parameters control the profiler's output:
db.setProfilingLevel(level,slowms);

The first parameter is the profiling level: 0 (the default) disables profiling; 1 records operations slower than the second parameter's time threshold (in milliseconds); 2 records all operations. We usually enable profiling while tuning in a test environment; it is generally left off in production.
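For example, a sketch assuming a 100 ms threshold, run against a live mongod:

```javascript
// Record operations slower than 100 ms (level 1) in the current database
db.setProfilingLevel(1, 100)

// Check the current profiler settings
db.getProfilingStatus()

// Record every operation (level 2) -- verbose; test environments only
db.setProfilingLevel(2)

// Turn the profiler off again
db.setProfilingLevel(0)
```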

Such as


> db.system.profile.find({millis:{$gt:1000}})

This outputs slow statements whose query time exceeded 1 second.

The profile output fields mean:

ts: the time the command executed
info: the content of the command; query indicates a query, order.order is the database and collection queried, and the query conditions follow
reslen: size of the returned result set, in bytes
nscanned: number of records scanned
nreturned: number of records returned
millis: time spent, in milliseconds

If the time spent (millis) is large, the query needs optimization. For example:

(1) If nscanned is very large, or close to the total number of records, the query probably did not use an index.

(2) If reslen is very large, the query may be returning unneeded fields.

(3) If nreturned is very large, the query may be missing a limit.
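Sketches of fixes for (2) and (3), assuming the order collection above: a projection trims reslen, and a limit caps nreturned:

```javascript
// (2) Return only the fields you need (the second argument is a
//     projection), shrinking reslen:
db.order.find({"status": 1.0}, {"user.uid": 1, "_id": 0})

// (3) Cap the number of documents returned, bounding nreturned:
db.order.find({"status": 1.0}).limit(100)
```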

3. Index selection mechanism of MongoDB

MongoDB's optimizer selects the better index by comparison.

First, it makes a preliminary guess at the "best index" for the query.

Second, if that best index does not exist, it tries candidate plans to find the best one.

Finally, the optimizer remembers its choice for similar queries (until the collection changes substantially or the indexes change).

So how does the optimizer define the "best index" for a query? The best index must contain all fields in the query that can be used for filtering and sorting. In addition, any field used for a range scan or a sort must come after the fields used for equality tests. If several indexes are equally good, MongoDB picks one of them.
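Applied to the refactor collection above, a sketch of an index that follows this rule: the equality-test field first, then the sort field, then the range field (assumes a running mongod):

```javascript
// Assumes a running mongod and the "refactor" collection above.
// Equality test on isactive, sort on username, range filter on age:
db.refactor.ensureIndex({"isactive": 1, "username": 1, "age": 1})

// This index serves the query below: isactive is an equality test,
// username is the sort, and the age range comes last.
db.refactor.find({"isactive": true, "age": {$gt: 20}}).sort({"username": 1})
```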

4. Index summary of MongoDB

1. Equality tests

Add all fields that need equality tests to the index, in any order.

2. Sort fields (mind ascending/descending when there are multiple sort fields)

Add the sort fields to the index in the order they appear in the query.

3. Range filtering

Add range-filter fields to the index ordered from low to high cardinality (the number of distinct values the field takes in the collection).
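Cardinality can be estimated directly. A toy sketch in plain JavaScript (not a shell command), using the sample documents from the start of the article:

```javascript
// The sample documents inserted into the "refactor" collection earlier.
const docs = [
  {username: "refactor", age: 24, isactive: true},
  {username: "refactor", age: 30, isactive: false},
  {username: "aaaaa",    age: 24, isactive: false},
  {username: "aaaaa",    age: 34, isactive: true},
  {username: "sssssss",  age: 24, isactive: true},
  {username: "tttttt",   age: 24, isactive: true},
  {username: "tttttt",   age: 54, isactive: true},
  {username: "bbbbb",    age: 24, isactive: false},
  {username: "rrrrr",    age: 24, isactive: true},
  {username: "rrrrr",    age: 54, isactive: false},
];

// Cardinality of a field: the number of distinct values it takes.
function cardinality(docs, field) {
  return new Set(docs.map(d => d[field])).size;
}

console.log(cardinality(docs, "isactive")); // 2 (true, false)
console.log(cardinality(docs, "age"));      // 4 (24, 30, 34, 54)
console.log(cardinality(docs, "username")); // 6 distinct usernames
```

By this rule, among range-filter fields isactive would come before age, and age before username.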

4. If an equality or range field in the index does not filter out at least 90% of the documents in the collection, it is probably better to remove it from the index.

5. Indexes let queries and updates locate data quickly via key fields. Note, however, that indexes add some overhead on insert and delete: when a document is inserted into a collection, each indexed field must also be added to a B-tree. Indexes therefore suit datasets that are read far more often than they are written. For frequently written collections, indexes can have side effects in some cases; but since most collections are read-heavy, indexes are useful in most cases.

6. If the result set is small (typically under 4 MB), sort() can return the data without an index. In that case, combine limit() with sort().
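For instance, a sketch of a small-result-set query where sort() plus limit() avoids the need for an index on the sort field (assumes the refactor collection above):

```javascript
// Small result set: sort in memory and cap it; no index on age required.
db.refactor.find({"isactive": true}).sort({"age": -1}).limit(10)
```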

That's all on optimizing MongoDB indexes; I hope it helps!
