Explanation of the difference between stored and non-stored fields in Elasticsearch

  • 2021-10-25 06:52:06
  • OfStack

Differences between stored and non-stored fields in Elasticsearch

When defining the mapping for an index, we can specify whether a field should be stored (by default, fields are not stored).

So what's the difference between them?


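The request below uses the typeless mapping syntax of Elasticsearch 7.x and later (older versions nested "properties" under a type name such as "my_type" and used "string" instead of "text"):

PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "store": true
      },
      "date": {
        "type": "date",
        "store": true
      },
      "content": {
        "type": "text"
      }
    }
  }
}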

In fact, whether you set store to true or false, Elasticsearch keeps the field values for us (as long as _source is enabled). The difference is:

When store is false (the default), the field values are kept only inside the "_source" field. When store is true, the value of the field is also stored as a separate field, at the same level as _source. Because the value is still in _source as well, there are two copies.
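To make the "two copies" idea concrete, here is a toy Python model. It is only an illustration under assumptions: the `index_document` helper, the flattened per-field mapping, and the dict-based layout are hypothetical and do not reflect how Elasticsearch or Lucene actually lay data out on disk.

```python
import json

# Hypothetical per-field mapping mirroring the example above:
# title and date are stored, content is not.
MAPPING = {
    "title": {"type": "text", "store": True},
    "date": {"type": "date", "store": True},
    "content": {"type": "text"},
}


def index_document(mapping, doc):
    """Toy model: return (source, stored) for one document.

    source - the whole original JSON, playing the role of _source
    stored - a separate copy of only the fields mapped with store: true
    """
    source = json.dumps(doc)  # _source keeps the original JSON as one blob
    stored = {
        name: value
        for name, value in doc.items()
        if mapping.get(name, {}).get("store", False)  # store defaults to false
    }
    return source, stored


doc = {"title": "A Book", "date": "2021-10-25", "content": "a very long body of text"}
source, stored = index_document(MAPPING, doc)
print(sorted(stored))             # ['date', 'title']
print(json.loads(source) == doc)  # True: every field is still inside _source
```

Title and date end up in both places, while content lives only in the _source copy.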

So under what circumstances do you need to set store on a field? There are generally two situations:

1. _source is disabled in the index mapping. In this case, a field that is not defined with store=true will not appear in the returned query results at all.
2. The content of _source is very large. Extracting the value of a single field from the returned _source document is then expensive (you can also use source filtering to reduce the network overhead). For example, suppose each document stores one book, with the fields title, date and content. If we only want the title and date of the book without parsing the whole (very large) _source, we can consider setting title and date to store=true.

Note that although stored fields appear to reduce query overhead, they also increase disk access. If a document has 10 fields and you set store on all of them, querying those fields requires 10 disk seeks, while returning _source requires only one. This is a trade-off to balance when designing the mapping.
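The two situations translate into request bodies like the ones below, built as Python dicts so they can be printed and sent with any HTTP client (the mapping would go to `PUT /my_index`, the searches to `GET /my_index/_search`). `"_source": {"enabled": false}`, `"store": true`, `stored_fields` and `_source` filtering are standard Elasticsearch options; the index name, field names and the match query are assumptions carried over from the example above.

```python
import json

# Case "_source disabled": only stored fields can be returned.
mapping_without_source = {
    "mappings": {
        "_source": {"enabled": False},
        "properties": {
            "title": {"type": "text", "store": True},
            "date": {"type": "date", "store": True},
            "content": {"type": "text"},  # searchable, but cannot be returned
        },
    }
}

# Case "large _source": ask only for the small stored fields,
# so the big content value is not extracted from _source.
search_stored_only = {
    "query": {"match": {"content": "elasticsearch"}},
    "stored_fields": ["title", "date"],
}

# The source-filtering alternative: _source is still read,
# but only title and date are sent back over the network.
search_source_filtered = {
    "query": {"match": {"content": "elasticsearch"}},
    "_source": ["title", "date"],
}

for body in (mapping_without_source, search_stored_only, search_source_filtered):
    print(json.dumps(body, indent=2))
```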

The store attribute and the _source field in Elasticsearch

As we all know, the _source field stores the original JSON of each document, so what is the store property for? Why does ES set store to false by default? Wouldn't setting it to true mean storing the data twice?

We write a field's value into ES either to search on that field (when we do not know the document id) or to retrieve it (by id). However, even if you do not explicitly set store to true, you can still get the value of the field as long as _source is enabled. This means that, in some cases, it still makes sense for a field to be neither indexed nor stored.
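A minimal sketch of such a field, assuming a hypothetical `notes` field: the standard `"index": false` mapping option makes it unsearchable, and since store defaults to false it has no separate stored copy either. The search hit below is hand-written to show where the value would still appear (inside _source), not real output.

```python
import json

# "notes" is neither indexed ("index": false) nor stored (store defaults to false).
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "notes": {"type": "text", "index": False},
        }
    }
}

notes = mapping["mappings"]["properties"]["notes"]
print(notes.get("index", True), notes.get("store", False))  # False False
print(json.dumps(mapping))

# Hand-written shape of a search hit: notes still comes back via _source.
hit = {"_id": "1", "_source": {"title": "A Book", "notes": "internal remark"}}
print(hit["_source"]["notes"])  # internal remark
```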

When you set the store property of a field to true, this is handled at the Lucene level. Lucene is built around an inverted index, which performs fast full-text search and returns the list of ids of the documents that match the search criteria. Besides full-text indexing, Lucene can also store field values, so that once you have a document id you can fetch the original values. Normally, the field values we store at the Lucene level are the ones we want returned together with the ids in the search results. ES does not need to store every field you want returned, though, because by default the complete original of every document is already kept (in _source), and ES can return any field values you request from it.
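The two steps described above (the inverted index yields ids, then stored values are looked up by id) can be illustrated with a toy Python model. This is a simplified sketch with made-up data and hypothetical `search`/`fetch_titles` helpers; it is not how Lucene implements either structure.

```python
from collections import defaultdict

# Toy corpus: document id -> field values (made-up data).
docs = {
    1: {"title": "Elasticsearch basics", "content": "store and source fields"},
    2: {"title": "Lucene internals", "content": "inverted index and stored fields"},
}

# Inverted index over the content field: term -> set of document ids.
inverted = defaultdict(set)
for doc_id, fields in docs.items():
    for term in fields["content"].lower().split():
        inverted[term].add(doc_id)

# Separate "stored field" table keyed by id, as if title had store: true.
stored_title = {doc_id: fields["title"] for doc_id, fields in docs.items()}


def search(term):
    """Step 1: the inverted index only yields the ids of matching documents."""
    return sorted(inverted.get(term.lower(), set()))


def fetch_titles(ids):
    """Step 2: use the ids to look up the stored values."""
    return [stored_title[i] for i in ids]


ids = search("fields")
print(ids)                 # [1, 2]
print(fetch_titles(ids))   # ['Elasticsearch basics', 'Lucene internals']
print(search("inverted"))  # [2]
```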

There are a few situations where explicitly storing certain fields is necessary: when _source is disabled, or when you do not want field values to be parsed out of _source (even though that happens automatically). Keep in mind that getting the value of each stored field requires one disk I/O, so getting values from multiple fields requires multiple disk I/Os, whereas getting values from _source requires only one, because _source is a single field. Therefore, in most cases, getting values from _source is fast and efficient.

By default, _source is enabled in ES and holds the values of the entire document, so a search can return the whole document. If you don't want the whole document, you can specify just the fields you need, and ES will automatically extract their values from _source and return them (this is also how highlighting gets the text of a field that is not stored).
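A sketch of a search body that asks for only some fields through source filtering and also requests highlighting, built as a Python dict. `_source.includes` and `highlight.fields` are standard search options; the field names and the query are assumptions carried over from the earlier example.

```python
import json

search_body = {
    "query": {"match": {"content": "elasticsearch"}},
    # Return only title and date; ES extracts them from _source.
    "_source": {"includes": ["title", "date"]},
    # content is not stored, so the text to highlight is taken from _source.
    "highlight": {"fields": {"content": {}}},
}

print(json.dumps(search_body, indent=2))
```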

You can set store to true on a field, which means its data is stored separately. If you then ask for field1 (store: true) to be returned, ES recognizes that field1 is stored and loads it from field1's own stored data instead of from _source.
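A minimal Python sketch of requesting field1 through the search API's `stored_fields` parameter and reading it back. The field name field1 comes from the text; the match_all query, the value "some value", and the `stored_value` helper are hypothetical. The response hit is hand-written to show the documented shape (stored values under `fields`, always as arrays), not captured from a real cluster.

```python
# Request only the stored field; when stored_fields is given,
# _source is not returned unless you also ask for it.
search_body = {
    "query": {"match_all": {}},
    "stored_fields": ["field1"],
}

# Hand-written example of a response hit (not real output).
example_hit = {
    "_index": "my_index",
    "_id": "1",
    "fields": {"field1": ["some value"]},
}


def stored_value(hit, name):
    """Return the first stored value of a field, or None if it was not returned."""
    values = hit.get("fields", {}).get(name)
    return values[0] if values else None


print(stored_value(example_hit, "field1"))  # some value
print(stored_value(example_hit, "field2"))  # None
```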

When do you need to set the store attribute explicitly? In most cases, you don't. Getting values from _source is fast and efficient. If your documents are very long, and storing _source or extracting fields from _source is expensive, you can explicitly set store to true for some fields. The drawback is the one described above: if you store 10 fields and want the values of all 10, you need several I/Os, whereas getting them from _source needs only one, and _source is compressed.

There is another case: reindexing based on only some of the fields. Reading the whole _source and then reindexing is clearly more expensive than reading just those fields, so it is appropriate to set store to true for them.

Summary:

If a field is indexed, it can be queried. If store is true, its value can be returned (displayed).

But if the document's original data is kept (_source enabled), you can still get the field's value even when store is false (the value is parsed out of _source).

Therefore, for a field with store set to false, if _source is disabled the field can only be searched, not returned.
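The summary's rules can be condensed into a small truth table. The `field_capabilities` function below is a hypothetical helper that encodes only the rules stated in this article; it deliberately ignores other retrieval paths that Elasticsearch offers, such as doc values.

```python
def field_capabilities(indexed, stored, source_enabled):
    """Hypothetical helper encoding the summary's rules.

    Returns (searchable, returnable).
    """
    searchable = indexed
    # A value can be returned from its own stored copy or parsed out of _source.
    returnable = stored or source_enabled
    return searchable, returnable


for indexed in (True, False):
    for stored in (True, False):
        for source_enabled in (True, False):
            print(indexed, stored, source_enabled,
                  field_capabilities(indexed, stored, source_enabled))
```

The row with indexed=True, stored=False, source_enabled=False gives (True, False): the field can be searched but not returned, which is the last point of the summary.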

