MongoDB index usage in detail

  • 2020-06-03 08:40:44
  • OfStack

An index is like the table of contents of a book. Without it, finding a particular topic means reading through the whole book, which is very slow. With it, you can jump straight to the relevant section, and lookups become much faster.

Index overview

First open a terminal and run mongo. By default the shell connects to a database named test.


➜ ~ mongo
MongoDB shell version: 2.4.9
connecting to: test
> show collections
> 

Running show collections (or show tables) shows that the database is empty.

Then run the following code in the mongo shell:


> for(var i=0;i<100000;i++) {
... db.users.insert({username:'user'+i})
... }
> show collections
system.indexes
users
> 

The database now contains two collections: system.indexes, which stores index information, and users, the newly created collection.
The users collection now holds 100,000 documents.


> db.users.find()
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e4"), "username" : "user0" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e5"), "username" : "user1" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e6"), "username" : "user2" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e7"), "username" : "user3" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e8"), "username" : "user4" }
{ "_id" : ObjectId("5694d5da8fad9e319c5b43e9"), "username" : "user5" }

Now look up any one of them, for example:


> db.users.find({username: 'user1234'})
{ "_id" : ObjectId("5694d5db8fad9e319c5b48b6"), "username" : "user1234" }

The document is found, but to see how the query was executed, append the explain method:


> db.users.find({username: 'user1234'}).explain()
{
  "cursor" : "BasicCursor",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 100000,
  "nscanned" : 100000,
  "nscannedObjectsAllPlans" : 100000,
  "nscannedAllPlans" : 100000,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 30,
  "indexBounds" : {
    
  },
  "server" : "root:27017"
}

There are many fields; for now we focus on "nscanned" : 100000 and "millis" : 30.

nscanned is the total number of documents (or index entries, when an index is used) that MongoDB examined while running the query. Here every document in the collection was scanned, which took 30 milliseconds.

With 10 million documents, scanning the whole collection on every query would take a long time.

Indexes are a good solution for such queries.


> db.users.ensureIndex({"username": 1})

Then look for user1234


> db.users.find({username: 'user1234'}).explain()
{
  "cursor" : "BtreeCursor username_1",
  "isMultiKey" : false,
  "n" : 1,
  "nscannedObjects" : 1,
  "nscanned" : 1,
  "nscannedObjectsAllPlans" : 1,
  "nscannedAllPlans" : 1,
  "scanAndOrder" : false,
  "indexOnly" : false,
  "nYields" : 0,
  "nChunkSkips" : 0,
  "millis" : 0,
  "indexBounds" : {
    "username" : [
      [
        "user1234",
        "user1234"
      ]
    ]
  },
  "server" : "root:27017"
}

The query now completes almost instantly, because with the index MongoDB examines a single entry instead of all 100,000 documents.
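To make the difference concrete, here is a minimal plain-JavaScript sketch that runs in Node.js without a database. It is only a model under simplifying assumptions: a sorted array with binary search stands in for the B-tree, and the names scanFind, indexFind and usernameIndex are made up for illustration. The model counts search steps, which is not the same quantity as nscanned in the explain output.

```javascript
// Plain-JavaScript model of a collection scan versus an index lookup.
// This is NOT how MongoDB is implemented; it only illustrates the idea.
const users = [];
for (let i = 0; i < 100000; i++) {
  users.push({ username: 'user' + i });
}

// Full scan: examine every document, like the BasicCursor plan.
function scanFind(docs, username) {
  let scanned = 0;
  const found = [];
  for (const doc of docs) {
    scanned++;
    if (doc.username === username) found.push(doc);
  }
  return { found, scanned };
}

// "Index": usernames kept in sorted order, each pointing at its document.
const usernameIndex = users
  .map((doc, position) => ({ key: doc.username, position }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

// Binary search over the sorted keys, counting how many keys are compared.
function indexFind(docs, index, username) {
  let lo = 0;
  let hi = index.length - 1;
  let steps = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    steps++;
    if (index[mid].key === username) {
      return { found: [docs[index[mid].position]], steps };
    }
    if (index[mid].key < username) lo = mid + 1;
    else hi = mid - 1;
  }
  return { found: [], steps };
}

const full = scanFind(users, 'user1234');
const viaIndex = indexFind(users, usernameIndex, 'user1234');
console.log('scan examined:', full.scanned, 'index search steps:', viaIndex.steps);
```

The scan always touches all 100,000 documents, while the sorted lookup needs at most 17 comparisons for this collection size.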

Using an index has a cost, of course: every index you add makes each write (insert, update, delete) take longer, because when data changes MongoDB must update not only the document but also every index on the collection. For this reason MongoDB limits each collection to a maximum of 64 indexes. In general, you should not have more than a couple of indexes on any one collection.
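The write cost can be sketched with a small plain-JavaScript model. Everything here (makeCollection, indexWrites, the unsorted entries arrays) is hypothetical and far simpler than a real storage engine; it only shows that the number of index updates grows with the number of indexes.

```javascript
// Hypothetical model: each insert adds one entry to every index on the collection.
const MAX_INDEXES = 64; // per-collection limit mentioned in the text

function makeCollection(indexedFields) {
  if (indexedFields.length > MAX_INDEXES) {
    throw new Error('a collection may have at most ' + MAX_INDEXES + ' indexes');
  }
  const docs = [];
  const indexes = indexedFields.map((field) => ({ field, entries: [] }));
  let indexWrites = 0;
  return {
    insert(doc) {
      docs.push(doc);
      for (const index of indexes) {
        // Entries are left unsorted for brevity; a real index keeps them ordered.
        index.entries.push({ key: doc[index.field], position: docs.length - 1 });
        indexWrites++;
      }
    },
    stats() {
      return { documents: docs.length, indexWrites };
    },
  };
}

const noIndexes = makeCollection([]);
const threeIndexes = makeCollection(['username', 'age', 'email']);
for (let i = 0; i < 1000; i++) {
  const doc = { username: 'user' + i, age: 18 + (i % 50), email: 'user' + i + '@example.com' };
  noIndexes.insert(doc);
  threeIndexes.insert(doc);
}
console.log(noIndexes.stats(), threeIndexes.stats());
```

With three indexes, 1,000 inserts cause 3,000 additional index writes in this model, compared with none for the unindexed collection.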

Tip

If a query is very common or is becoming a performance bottleneck, indexing the queried field, such as username, is a good choice. A field should not be indexed, however, if it is only used by queries that administrators run occasionally, where it does not matter much how long the query takes.

Composite indexes

An index keeps its values in sorted order, so sorting documents by an indexed key is very fast.


db.users.find().sort({'age': 1, 'username': 1})

This sorts by age first and then by username, so an index on username alone does not help much. To optimize this sort, create an index on both age and username.

db.users.ensureIndex({'age':1, 'username': 1})
This creates a composite index (an index on multiple fields), which is useful if the query criteria include multiple keys.

Once the composite index is created, each index entry contains an age value and a username value and points to where the document is stored on disk.
The entries are ordered strictly by ascending age, and entries with the same age are ordered by ascending username.
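The ordering of such an index can be sketched as a comparator in plain JavaScript. The function name compareEntries and the sample entries are invented for illustration; this mimics only the ordering rule, not MongoDB's storage format.

```javascript
// Order entries the way an {age: 1, username: 1} index would:
// ascending age first, then ascending username among equal ages.
function compareEntries(a, b) {
  if (a.age !== b.age) return a.age - b.age;
  if (a.username < b.username) return -1;
  if (a.username > b.username) return 1;
  return 0;
}

const entries = [
  { age: 22, username: 'user3' },
  { age: 21, username: 'user9' },
  { age: 21, username: 'user1' },
  { age: 20, username: 'user7' },
];

entries.sort(compareEntries);
const ordered = entries.map((e) => e.age + ':' + e.username).join(' ');
console.log(ordered); // prints "20:user7 21:user1 21:user9 22:user3"
```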

Query types

Point query

A point query looks for a single value (although several documents may contain that value).


db.users.find({'age': 21}).sort({'username': -1})

We have already built the composite index {'age': 1, 'username': 1}, with both keys ascending. For the point query {age: 21}, MongoDB jumps directly to the index entries with age 21; among 100,000 documents there may be many such users, so more than one document is usually returned. sort({'username': -1}) asks for those results in descending username order. The index stores usernames in ascending order, but that is not a problem: MongoDB simply starts at the last entry with age 21 and walks the index backwards.

The sort direction does not matter, because MongoDB can traverse an index in either direction.
In short, the composite index is very efficient for point queries: MongoDB jumps straight to the matching age and returns the results without an extra sort.
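The backward traversal can be sketched as follows, assuming a small hand-written array that stands in for the {age: 1, username: 1} index. The function pointQueryDescending is hypothetical, and a real index would seek to the end of the age range instead of walking from the end of the whole index.

```javascript
// Entries of a hypothetical {age: 1, username: 1} index, already in index order.
const index = [
  { age: 20, username: 'user7' },
  { age: 21, username: 'user1' },
  { age: 21, username: 'user4' },
  { age: 21, username: 'user9' },
  { age: 22, username: 'user3' },
];

// Walk the index from the end: entries with the requested age appear in
// descending username order, so no separate sort is needed.
function pointQueryDescending(entries, age) {
  const usernames = [];
  for (let i = entries.length - 1; i >= 0; i--) {
    if (entries[i].age < age) break; // walked past the matching range
    if (entries[i].age === age) usernames.push(entries[i].username);
  }
  return usernames;
}

const descending = pointQueryDescending(index, 21);
console.log(descending); // [ 'user9', 'user4', 'user1' ]
```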

Multi-value query


db.users.find({'age': {"$gte": 21, "$lte": 30}})

This finds documents that match a range of values; a multi-value query can be thought of as several point queries.
The query above looks for ages between 21 and 30. MongoDB uses the first key in the index, age, to find the matching entries, and the results are usually returned in index order.


db.users.find({'age': {"$gte": 21, "$lte": 30}}).sort({'username': 1})

This is the same range query as before, but now the results must be sorted.
Without a sort, results come back in index order: first the documents with age 21, then those with age 22, and so on, and documents with the same age are ordered by ascending username. With sort({'username': 1}), however, all results must be ordered by username across ages, which the index order does not provide. MongoDB has to sort the results in memory before returning them, so this query is less efficient than the previous one.

Of course, sorting doesn't take much time when there are very few documents.
If the result set is large, say more than 32MB, MongoDB will refuse to sort so much data.
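A small plain-JavaScript sketch shows why the extra sort is needed. The array below is a made-up stand-in for the {age: 1, username: 1} index, and rangeScan is a hypothetical helper, not a MongoDB API.

```javascript
// Entries of a hypothetical {age: 1, username: 1} index, in index order.
const index = [
  { age: 20, username: 'user5' },
  { age: 21, username: 'user2' },
  { age: 21, username: 'user8' },
  { age: 25, username: 'user1' },
  { age: 30, username: 'user4' },
  { age: 31, username: 'user0' },
];

// Range scan: entries are ordered by age, so the matches are contiguous
// and the scan can stop at the first age above the upper bound.
function rangeScan(entries, min, max) {
  const matches = [];
  for (const entry of entries) {
    if (entry.age > max) break;
    if (entry.age >= min) matches.push(entry);
  }
  return matches;
}

// Index order is by age first, so the usernames are NOT globally ordered.
const inIndexOrder = rangeScan(index, 21, 30).map((e) => e.username);
console.log(inIndexOrder); // [ 'user2', 'user8', 'user1', 'user4' ]

// Ordering by username therefore needs a separate in-memory sort.
const sortedByUsername = [...inIndexOrder].sort();
console.log(sortedByUsername); // [ 'user1', 'user2', 'user4', 'user8' ]
```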

There is another solution

Another option is to create the index {'username': 1, 'age': 1}. Because username comes first, results are already in username order and no in-memory sort is needed. However, MongoDB must scan the entire index to find the entries whose age is between 21 and 30, so the lookup itself takes longer.

But which is more efficient?

If there is more than one index, which one should be used?
It depends. Without a limit, avoiding the sort but scanning the entire index usually takes much longer than using the first index and sorting in memory. When only part of the result set is returned (for example with limit(1000)), the second index can become the winner.


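db.users.find({'age': {"$gte": 21, "$lte": 30}}).sort({'username': 1}).limit(1000).hint({'age': 1, 'username': 1}).explain()
db.users.find({'age': {"$gte": 21, "$lte": 30}}).sort({'username': 1}).limit(1000).hint({'username': 1, 'age': 1}).explain()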

hint() can be used to specify which index a query should use.
The {'username': 1, 'age': 1} approach has real advantages: in many scenarios we do not retrieve all matching data but only the first few results, and in that case it is more efficient.
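The trade-off can be sketched with a plain-JavaScript model on synthetic data. All names here (ageFirst, usernameFirst, viaAgeFirst, viaUsernameFirst) and the data set are assumptions made for illustration, and the age-first lookup uses a simple filter where a real index would seek directly to the range.

```javascript
// Synthetic data: 1000 users whose ages cycle through 18..37.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ username: 'user' + String(i).padStart(4, '0'), age: 18 + (i % 20) });
}

const byUsername = (a, b) =>
  a.username < b.username ? -1 : a.username > b.username ? 1 : 0;

// Two sorted arrays stand in for the two composite indexes.
const ageFirst = [...docs].sort((a, b) => a.age - b.age || byUsername(a, b));
const usernameFirst = [...docs].sort((a, b) => byUsername(a, b) || a.age - b.age);

// {age, username}: collect every match, sort all of them in memory, then cut.
function viaAgeFirst(min, max, limit) {
  const matched = ageFirst.filter((e) => e.age >= min && e.age <= max);
  matched.sort(byUsername);
  return { results: matched.slice(0, limit), sortedInMemory: matched.length };
}

// {username, age}: walk in username order and stop once the limit is reached.
function viaUsernameFirst(min, max, limit) {
  const results = [];
  let scanned = 0;
  for (const entry of usernameFirst) {
    if (results.length === limit) break;
    scanned++;
    if (entry.age >= min && entry.age <= max) results.push(entry);
  }
  return { results, scanned };
}

const a = viaAgeFirst(21, 30, 10);
const b = viaUsernameFirst(21, 30, 10);
console.log('sorted in memory:', a.sortedInMemory, 'entries scanned:', b.scanned);
```

Both strategies return the same ten documents, but the first sorts all 500 matches in memory while the second stops after scanning 13 entries. With a large limit or no limit at all, the second strategy would have to walk the whole index.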

Index types

Unique index

A unique index guarantees that the specified key has a unique value across all documents in the collection.

db.users.ensureIndex({'username': 1}, {unique: true})
With a framework such as mongoose, you can specify unique: true when defining the schema.
If you insert two documents with the same username, the second insert fails. The index on _id is itself a unique index and cannot be dropped.

Sparse index

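A sparse index only contains entries for documents that have the indexed field. Create one with the sparse option; combined with unique, it enforces uniqueness only among documents that actually have the field.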

> db.users.ensureIndex({'email': 1}, {'unique': true, 'sparse': true})
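The difference between a plain unique index and a unique sparse index can be sketched in plain JavaScript. makeUniqueIndex and tryInsert are hypothetical names, and the model assumes the usual behaviour that a non-sparse index treats a missing field as null.

```javascript
// Hypothetical model of a unique index, optionally sparse.
function makeUniqueIndex(field, { sparse = false } = {}) {
  const seen = new Set();
  return {
    // Returns true if the document is accepted, false on a duplicate key.
    tryInsert(doc) {
      const hasField = Object.prototype.hasOwnProperty.call(doc, field);
      if (!hasField && sparse) return true; // sparse: document is not indexed at all
      const key = hasField ? doc[field] : null; // non-sparse: missing field counts as null
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
  };
}

const uniqueEmail = makeUniqueIndex('email');
const sparseUniqueEmail = makeUniqueIndex('email', { sparse: true });

const outcomes = {
  // Two documents without an email collide on null in the plain unique index...
  uniqueFirstMissing: uniqueEmail.tryInsert({ username: 'user1' }),
  uniqueSecondMissing: uniqueEmail.tryInsert({ username: 'user2' }),
  // ...but both are accepted by the sparse variant.
  sparseFirstMissing: sparseUniqueEmail.tryInsert({ username: 'user1' }),
  sparseSecondMissing: sparseUniqueEmail.tryInsert({ username: 'user2' }),
  // Real duplicates are still rejected.
  sparseFirstEmail: sparseUniqueEmail.tryInsert({ username: 'user3', email: 'a@example.com' }),
  sparseDuplicateEmail: sparseUniqueEmail.tryInsert({ username: 'user4', email: 'a@example.com' }),
};
console.log(outcomes);
```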

Index management

The system.indexes collection contains the details of each index

db.system.indexes.find()

ensureIndex() creates an index (newer MongoDB versions use createIndex() instead)

db.users.ensureIndex({'username': 1})
To keep the database able to serve read and write requests while an index is being built, create the index in the background by specifying the background option.

db.users.ensureIndex({"username":1},{"background":true})

getIndexes() lists the indexes


> db.users.getIndexes()

In the output, the v field is used internally only to identify the index version.

dropIndex() drops an index


> db.users.dropIndex("username_1")
{ "nIndexesWas" : 2, "ok" : 1 }

or

> db.users.dropIndex({"username":1})
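The two forms refer to the same index because the default index name is built from the key pattern: each field name and its direction are joined with underscores. The helper below, defaultIndexName, is a hypothetical plain-JavaScript sketch of that naming rule and not a function provided by the mongo shell.

```javascript
// Build the default index name from a key pattern, e.g. {username: 1} -> "username_1".
function defaultIndexName(keyPattern) {
  return Object.entries(keyPattern)
    .map(([field, direction]) => field + '_' + direction)
    .join('_');
}

const singleName = defaultIndexName({ username: 1 });
const compositeName = defaultIndexName({ age: 1, username: -1 });
console.log(singleName);    // prints "username_1"
console.log(compositeName); // prints "age_1_username_-1"
```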

