Use a personal summary

  • 2020-06-03 08:42:20
  • OfStack

Recently 1 has been using mongodb, sometimes need to use statistics, on the Internet to look up some information, the most suitable for use is to use aggregate, the following 1 to introduce my experience in use.

MongoDB polymerization
In MongoDB, aggregation (aggregate) is mainly used to process data (such as statistical mean, sum, etc.) and return calculated data results. This is similar to count(*) in the sql statement.
aggregate () method
The aggregated method in MongoDB USES aggregate().
grammar
The basic syntax format for the aggregate() method is as follows:

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)
The instance

The data in the collection is as follows:


{
  _id: ObjectId(7df78ad8902c)
  title: 'MongoDB Overview', 
  description: 'MongoDB is no sql database',
  by_user: 'ofstack.com',
  url: 'https://www.ofstack.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100
},
{
  _id: ObjectId(7df78ad8902d)
  title: 'NoSQL Overview', 
  description: 'No sql database is very fast',
  by_user: 'ofstack.com',
  url: 'https://www.ofstack.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 10
},
{
  _id: ObjectId(7df78ad8902e)
  title: 'Neo4j Overview', 
  description: 'Neo4j is no sql database',
  by_user: 'Neo4j',
  url: 'http://www.neo4j.com',
  tags: ['neo4j', 'database', 'NoSQL'],
  likes: 750
},

Now we use the above set to calculate the number of articles written by each author, using aggregate() to calculate the following results:


> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
  "result" : [
   {
     "_id" : "w3cschool.cc",
     "num_tutorial" : 2
   },
   {
     "_id" : "Neo4j",
     "num_tutorial" : 1
   }
  ],
  "ok" : 1
}
>

The above examples are similar to sql statements: select by_user, count(*) from mycol by by_user
In the above example, we grouped the data with the field by_user field and calculated the sum of the same values for the by_user field.
The following table shows the expressions for 1 of the aggregations:

表达式 描述 实例
$sum 计算总和。 db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg 计算平均值 db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min 获取集合中所有文档对应值得最小值。 db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max 获取集合中所有文档对应值得最大值。 db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push 在结果文档中插入值到1个数组中。 db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet 在结果文档中插入值到1个数组中,但不创建副本。 db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first 根据资源文档的排序获取第1个文档数据。 db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last 根据资源文档的排序获取最后1个文档数据 db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

Concept of pipeline
Pipes are used in Unix and Linux 1 to take the output of the current command as an argument to the next command.
The aggregation pipeline for MongoDB passes the MongoDB document to the next pipeline after processing on one. Pipeline operations can be repeated.
Expression: Process input document and output. Expressions are stateless and can only be used to evaluate documents in the current aggregation pipeline, not to process other documents.
Here we introduce a few common operations in the aggregation framework:
$project: Modify the structure of the input document. It can be used to rename, add, or delete fields, or to create computed results and nested documents.
$match: Used to filter data and output only eligible documents. $match USES the standard query operations of MongoDB.
$limit: Used to limit the number of documents returned by the MongoDB aggregation pipeline.
$skip: Skips the specified number of documents in the aggregation pipeline and returns the remaining documents.
$unwind: Breaks an array-type field in the document into multiple rows, each containing a value from the array.
$group: Grouping documents in the collection for statistical results.
$sort: Sort the input document and output it.
$geoNear: Outputs ordered documents close to a geographic location.

Pipeline operator instance

1. Example of $project


db.article.aggregate(
  { $project : {
    title : 1 ,
    author : 1 ,
  }}
 );

This leaves only _id,tilte, and author3 fields in the result. By default the _id field is included.


db.article.aggregate(
  { $project : {
    _id : 0 ,
    title : 1 ,
    author : 1
  }});

2. $match instance


db.articles.aggregate( [
            { $match : { score : { $gt : 70, $lte : 90 } } },
            { $group: { _id: null, count: { $sum: 1 } } }
            ] );

$match is used to get records with scores greater than 70 and less than or equal to 90, and then send eligible records to the next phase of the $group pipeline operator for processing.

3. $skip instance


db.article.aggregate(
  { $skip : 5 });

After being processed by the $skip pipe operator, the first five documents are "filtered" out.

I can't describe more than what others have written. If you can find one more sample of N, I will write my summary.

Basic knowledge of

Please find more on your own. Here are the key documents.

Introduction to operation:

$project: Include, exclude, rename, and display fields
$match: Query requiring parameters similar to find()1
$limit: Limit the number of results
$skip: Ignores the number of results
$sort: Sort the results by the given field
$group: Combine the results with the given expression
$unwind: Split the embedded array into its own top-level file


Documentation: MongoDB official aggregate description.

Related use:

db.collection.aggregate([array]);

array is any one or more operators.
The usage of group and match, sqlserver, group is easy to understand. You can count the number of groups or the sum or average of the groups according to the specified column.
match before group is to query the source data, while match after group is to filter the data after group.

Same thing with sort, skip, limit;


 {_id:1,name:"a",status:1,num:1}
 {_id:2,name:"a",status:0,num:2}
 {_id:3,name:"b",status:1,num:3}
 {_id:4,name:"c",status:1,num:4}
 {_id:5,name:"d",status:1,num:5}

Here is an example:
Application 1: Count the amount and total amount of name;


db.collection.aggregate([
  {$group:{_id:"$name",count:{$sum:1},total:{$sum:"$num"}}
]);

Application 2: Count the number of name =1;


db.collection.aggregate([
  {$match:{status:1}},
  {$group:{_id:"$name",count:{$sum:1}}}
]);

Application 3: Count the number of name, and the number is less than 2;


db.collection.aggregate([
  {$group:{_id:"$name",count:{$sum:1}},
  {$match:{count:{$lt:2}}}
]);

Application 4: Count the number of name =1, and the number is 1;


> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
  "result" : [
   {
     "_id" : "w3cschool.cc",
     "num_tutorial" : 2
   },
   {
     "_id" : "Neo4j",
     "num_tutorial" : 1
   }
  ],
  "ok" : 1
}
>
0

Multi-column group, multi-column according to name and status


> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
  "result" : [
   {
     "_id" : "w3cschool.cc",
     "num_tutorial" : 2
   },
   {
     "_id" : "Neo4j",
     "num_tutorial" : 1
   }
  ],
  "ok" : 1
}
>
1

The $project operator is very simple,


> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
  "result" : [
   {
     "_id" : "w3cschool.cc",
     "num_tutorial" : 2
   },
   {
     "_id" : "Neo4j",
     "num_tutorial" : 1
   }
  ],
  "ok" : 1
}
>
2

The result is table data for only _id,name,status3 fields, equivalent to sql expressions select _id,name,status from collection

$unwind
This operator can split an array of documents into multiple documents, useful in special conditions, I have not done much research.

Above basically can realize most of the statistics, group pre-condition, group post-condition, is the focus.


Related articles: