MongoDB in practice: full-text search

  • 2020-06-23 02:10:42
  • OfStack

Preface

It is well known that in a traditional relational database we structure data into a series of tables, then join and aggregate those tables to query the results we want. Unstructured data has no such predefined structure, so quickly locating the results we need is not an easy task.

As a NoSQL database, MongoDB is well suited to storing and managing unstructured data, such as text from the Internet. If we use MongoDB to store a large number of blog posts, how do we quickly find all the posts on the topic of "nodejs"? MongoDB's built-in full-text search can help with this.

In this post, I'll cover some of my experience with MongoDB text search.

What is MongoDB text search?

MongoDB text search is the part of MongoDB that searches text stored in the database, much like a search engine built into the database. You may wonder why a database needs a search engine at all: why not simply query with a condition? Take the topic search from the preface. It is impractical to extract the topic of every article in advance and store it in a dedicated field, so an ordinary conditional query has nothing to match on. In addition, the same subject can be written in many different ways. For example, "node" and "nodejs" can be regarded as the same subject.

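MongoDB text search automatically tokenizes large bodies of text, drops language-specific stop words and reduces the remaining words to their stems, so a search for one form of a word also matches its other forms. It does not do fuzzy matching or synonym matching, so spelling variants of a subject have to be listed as separate search terms.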

Building a text index

To run full-text searches in MongoDB, you first need to create a text index on the fields being searched. The index type keyword is text, and you can build a text index on a single field or a compound text index across several fields. Note that each collection can have only one text index, and only fields whose value is a string or an array of strings can be text-indexed.

We can create a text index with the following command:


db.collection.createIndex({ subject: "text", content: "text" })

In mongoose, we can create a text index with the following code:


schema.index({ subject: "text", content: "text" })
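
The same index can also be created from application code. The following is a minimal sketch, assuming the Node.js driver's collection.createIndex(keys, options). The index name posts_text, the weight values and the helper ensureTextIndex are illustrative assumptions, not part of the original setup.

```javascript
// Key pattern from the article: both fields go into one text index.
const textIndexKeys = { subject: "text", content: "text" };

// Assumed options: an explicit name makes the index easy to find and drop later;
// the weights (illustrative values) make a match in subject count more
// towards the score than a match in content.
const textIndexOptions = {
  name: "posts_text",
  weights: { subject: 5, content: 1 },
};

// `collection` is expected to be a Node.js driver Collection
// (with mongoose, Model.collection exposes one).
async function ensureTextIndex(collection) {
  // Resolves to the index name once the index exists.
  return collection.createIndex(textIndexKeys, textIndexOptions);
}
```

Keeping the key pattern and the options in plain objects means the same definition can be reused in a migration script or checked in a unit test without a running database.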

Note: since each collection supports only one text index, adding or removing text-indexed fields in the schema often has no effect, because the old text index is still in place. In that case, you need to go to the database and manually drop the existing text index so that the new one can be created.
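
The manual step can also be scripted. The following is a rough sketch, assuming a Node.js driver Collection with its indexes(), dropIndex() and createIndex() methods, and assuming the collection already exists. The helper names findTextIndexName and replaceTextIndex are hypothetical.

```javascript
// Returns the name of the existing text index, or null if there is none.
// In the index list, a text index reports its key as { _fts: "text", _ftsx: 1 }.
function findTextIndexName(indexes) {
  const found = indexes.find((ix) => ix.key && ix.key._fts === "text");
  return found ? found.name : null;
}

// Drops the current text index (if any) and creates a new one from `keys`,
// for example { subject: "text", content: "text", tags: "text" }.
async function replaceTextIndex(collection, keys) {
  const existing = findTextIndexName(await collection.indexes());
  if (existing !== null) {
    // The old index has to go first: a collection can hold only one text index.
    await collection.dropIndex(existing);
  }
  return collection.createIndex(keys);
}
```

Text searches on the collection will fail between the drop and the end of the rebuild, so this is best run during a maintenance window.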

Text search example

The syntax for text search is:


{
 $text:
  {
   $search: <string>,
   $language: <string>,
   $caseSensitive: <boolean>,
   $diacriticSensitive: <boolean>
  }
}

In mongoose, we can do a text search with the following statement:


var query = model.find({ $text: { $search: "hello world" } })

The string after $search can hold more than one keyword. Keywords can be separated by various characters, such as spaces, underscores, commas and plus signs, but not by - or \", because those two symbols have other uses: a leading - excludes a term, and \" delimits an exact phrase. Multiple keywords are combined with OR unless a keyword is prefixed with -. For example, hello world matches all text containing hello or world, while hello -world matches only text that contains hello and does not contain world.
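
To make the OR and exclusion rules concrete, here is a small sketch that assembles a $search string from two lists. The helper buildSearchString is hypothetical and does no escaping, so it assumes the terms themselves contain no spaces, quotes or leading hyphens.

```javascript
// Joins OR-ed terms with spaces and prefixes each excluded term with "-".
function buildSearchString(include, exclude = []) {
  return [...include, ...exclude.map((term) => "-" + term)].join(" ");
}

// Matches text containing hello or world.
const anyOf = { $text: { $search: buildSearchString(["hello", "world"]) } };

// Matches text containing hello but not world.
const without = { $text: { $search: buildSearchString(["hello"], ["world"]) } };

console.log(anyOf.$text.$search);   // hello world
console.log(without.$text.$search); // hello -world
```

Either object can be passed directly as the filter, for example model.find(without).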

$language sets the language of the search, which determines the stop words and the stemming rules that are applied. Support for searching Chinese text was added in MongoDB 3.2 Enterprise.

$caseSensitive sets whether the search is case sensitive; it is off by default.

$diacriticSensitive sets whether diacritical marks are distinguished. For example, CAFÉ and Cafe mean the same thing, but differ in the accent.
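
The options above can be combined in one filter. The sketch below uses a hypothetical helper textFilter that adds an option only when the caller sets it, so anything left out falls back to the server default.

```javascript
// Builds a { $text: ... } filter; unset options are simply omitted.
function textFilter(search, { language, caseSensitive, diacriticSensitive } = {}) {
  const $text = { $search: search };
  if (language !== undefined) $text.$language = language;
  if (caseSensitive !== undefined) $text.$caseSensitive = caseSensitive;
  if (diacriticSensitive !== undefined) $text.$diacriticSensitive = diacriticSensitive;
  return { $text };
}

// Treat "café" and "cafe" as different words, using French stemming rules.
const strictFilter = textFilter("café", {
  language: "french",
  diacriticSensitive: true,
});
```

As with the earlier examples, the result is a plain object and can be handed to model.find(strictFilter).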

We can also sort the search results by degree of match:


db.posts.find(
  { $text: { $search: "hello world" } },
  { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
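
From Node.js, the same sorted search might look like the sketch below. It assumes a Node.js driver Collection and its find(filter, options) form with the projection, sort and limit options. The helper searchPosts and the default limit of 10 are hypothetical.

```javascript
// Returns the best-matching documents first, each with its text score attached.
async function searchPosts(collection, search, limit = 10) {
  const score = { $meta: "textScore" };
  return collection
    .find(
      { $text: { $search: search } },
      { projection: { score }, sort: { score }, limit }
    )
    .toArray();
}
```

Capping the result with a limit keeps the sort cheap when a common keyword matches a large share of the collection.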

Things to note

When MongoDB builds a text index, it extracts and indexes every keyword in the text, which carries some performance cost. For structured fields, therefore, ordinary conditional queries are recommended; consider full-text search only when you need to search large blocks of text.
