How to use JOIN operation details in MongoDB

  • 2020-06-23 02:12:31
  • OfStack

preface

MongoDB is one written by C + + language document-oriented non-relational database (implementation) is one kind of NoSql database, also between relational database and relational database data storage products, and it is well known SQL unlike NoSQL maximum 1 is not supported JOIN, in the traditional database, SQL JOIN clause allows you to use common fields, in the combination of two or more tables in the table data in each row. For example, if you have tables books and publishers, you can write commands like this:


SELECT book.title, publisher.name
FROM book
LEFT JOIN book.publisher_id ON publisher.id;

In other words, the publisher_id field in the book table refers to the id dictionary in the publishers table. These are common examples: there are thousands of books available for each publisher, and if you want to update publisher's information, we only need to change one record. The data redundancy is minimal because we don't need to update his publisher information repeatedly for each book, and this technique is basically a normalized thing. The SQL database provides 1 column of specifications and constraints to ensure data relevance.

--------------------------------------------------------------------------------

NoSQL == No JOIN?

That's not always the case...

--------------------------------------------------------------------------------

Document-oriented databases, such as MongoDB, are designed to store unstructured data that is ideally unrelated to each other in a data set; if a piece of data contains two or more times, the data is duplicated. Since most of the time we still need data association, and only rarely we don't, NoSQL's features seem to be disappointing. Fortunately, MongoDB 3.2 introduces a new $lookup operation that can provide an operation similar to LEFT OUTER JOIN under two or more conditions.

--------------------------------------------------------------------------------

MongoDB Aggregation

$lookup is only allowed in aggregation operations, think of it as a pipe operation: query, filter, combine results. The output of one operation is used as input to the next. Aggregation is more difficult to understand than simple query operations, and these operations are usually slow, yet they are efficient. Aggregation can be explained with a good example. Suppose we use the user data set to create a social platform, storing each user's information in a separate document, for example:


{
 "_id": ObjectID("45b83bda421238c76f5c1969"),
 "name": "User One",
 "email: "userone@email.com",
 "country": "UK",
 "dob": ISODate("1999-09-13T00:00:00.000Z")
}

We can add enough users to the user collection, but each MongoDB document must have a value for the _id field, which is like the key in SQL and is automatically added to the document if we don't specify _id explicitly. Our social network now needs an post collection, which in combination stores user comments, a document that stores plain text, times, ratings, and a player reference written to the user_id field.


{
 "_id": ObjectID("17c9812acff9ac0bba018cc1"),
 "user_id": ObjectID("45b83bda421238c76f5c1969"),
 "date: ISODate("2016-09-05T03:05:00.123Z"),
 "text": "My life story so far",
 "rating": "important"
}

We now want to display 210 recent important comments from all users, sorted by time. Each returned document should contain the text of the comment, when the comment was posted, and the name and country of the relevant user.

The aggregate query for the MongoDB database is an array of pass pipe operations in which each operation is sequentially specified. First, we need to extract all the documents from all the post collections, which are filtered using $match with an accurate memory.


{ "$match": { "rating": "important" } }

We now need to sort the filtered documents by time, using the $sort operation.


{ "$sort": { "date": -1 } }

Since we are returning only 210 pieces of data, we can use $limit to limit the number of documents we need to process.


{ "$limit": 20 }

We now use the $lookup operation to connect data from the user collection, which requires an object of 4 parameters:

localField: Find the field in the input document

from: A collection that needs to be connected

3. foreignField: The field that needs to be looked up in the from collection

4, as: Output field name

So here's what we do:


{ "$lookup": {
 "localField": "user_id",
 "from": "user",
 "foreignField": "_id",
 "as": "userinfo"
} }

In our output, we will create a new field called userinfo, which is an array where each element is a matching element in the user collection.


"userinfo": [
 { "name": "User One", ... }
]

Between ES117en.user_id and ES120en._id, we have a one-to-one relationship because there is only one user for each post. So our userinfo array will contain only one element, and we can say use the $unwind operation to deconstruct it and insert it into a self document.


{ "$unwind": "$userinfo" }

The output will now be converted to a more common structure:


"userinfo": {
 "name": "User One",
 "email: "userone@email.com",
  ... 
}

Finally, we can use the $project operation in the pipeline to return the comment information, the time of the comment, the user name of the comment, the country, and so on.


{
 "_id": ObjectID("45b83bda421238c76f5c1969"),
 "name": "User One",
 "email: "userone@email.com",
 "country": "UK",
 "dob": ISODate("1999-09-13T00:00:00.000Z")
}
0

Merge all the above operations

Our final aggregate query matched comments, sorted by order, limited the latest 210 pieces of information, connected user data, flat user array, and finally returned only the necessary data we needed, the total command is as follows:


{
 "_id": ObjectID("45b83bda421238c76f5c1969"),
 "name": "User One",
 "email: "userone@email.com",
 "country": "UK",
 "dob": ISODate("1999-09-13T00:00:00.000Z")
}
1

The result is a collection of 210 documents, such as:


{
 "_id": ObjectID("45b83bda421238c76f5c1969"),
 "name": "User One",
 "email: "userone@email.com",
 "country": "UK",
 "dob": ISODate("1999-09-13T00:00:00.000Z")
}
2

MongoDB's $lookup is nice and efficient, but the above basic example is just a combined collection query. It is not a substitute for the more efficient JOIN clause in SQL. MongoDB also provides a restriction that if the user collection is deleted, the post document remains.

Ideally, this $lookup operation should not be used very often. If you need to use it often, you are using the wrong data store (database) : if you have associated data, you should use the associated database (SQL).

That is, $lookup is a new addition to MongoDB 3.2, which addresses some of the frustrations of using 1 small associated data query in the Nosql database.

conclusion


Related articles: