Querying Data with PyMongo

  • 2021-11-13 08:27:46
  • OfStack

Contents

  • Query data
  • Set query criteria
  • More query operations
  • PS: pymongo maximum query limit
  • Solutions

Query data

Data stored in MongoDB has to be retrieved whenever it needs to be read.
Reads are rarely plain dumps, though. You may want the results sorted by a particular field such as score; you may want to filter, say, to a certain period of time and a certain class; or you may want aggregate figures such as the average score of each class.
The starting point for all of these is find_one() and find():


ret2find = collect.find_one()
# {'_id': ObjectId('5ea780bf747e3e128470e485'), 'class_name': 'Senior 3 (1) Class', 'student_name': 'Zhang 3', 'subject': 'English', 'score': 100, 'date': '20200301'}

ret2find = collect.find()
# <pymongo.cursor.Cursor object at 0x0000024BBEBE15C8>

As the results show, find_one() returns a single dictionary, while find() returns a Cursor object, an iterable whose documents can be read one by one with for val in ret2find:
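As a small illustration of the traversal described above, the sketch below consumes an iterable of result documents, which is what find() hands back. The helper name scores_by_student and the sample documents are hypothetical; a plain list stands in for the cursor so the snippet runs without a database.

```python
from collections import defaultdict


def scores_by_student(documents):
    """Group scores by student name from any iterable of result documents."""
    grouped = defaultdict(list)
    for doc in documents:  # a pymongo cursor can be looped over the same way
        grouped[doc["student_name"]].append(doc["score"])
    return dict(grouped)


# A list of dicts stands in for `collect.find()` here (sample data is made up).
sample = [
    {"student_name": "Zhang 3", "subject": "English", "score": 100},
    {"student_name": "Li 4", "subject": "English", "score": 85},
    {"student_name": "Zhang 3", "subject": "Math", "score": 92},
]
result = scores_by_student(sample)
print(result)
```

With a live collection the call would be scores_by_student(collect.find()). A cursor is consumed as it is iterated, so wrap it in list(...) first if the results are needed more than once.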

Set query criteria

Retrieving everything is rarely enough, though. Queries usually carry conditions, sometimes complicated ones. For example, how do you find the subjects in which Zhang 3 or Li 4 of Senior 3 (1) Class scored above 90?


ret2find = collect.find({
    "class_name": "Senior 3 (1) Class",
    "score": {"$gt": 90},
    "$or": [{"student_name": "Zhang 3"}, {"student_name": "Li 4"}],
})

for val in ret2find:
    print(val)

Two parts of the query above are worth a closer look:

{"class_name": "Senior 3 (1) Class", "score": {"$gt": 90}}

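This part means "class is Senior 3 (1) Class AND score > 90"; the fields of one filter document are combined with an implicit AND.
$gt is a comparison operator meaning "greater than". The other comparison operators include:

Operator  Meaning
$lt       less than
$lte      less than or equal to
$gt       greater than
$gte      greater than or equal to
$ne       not equal to
$in       equal to any value in the given list
$nin      equal to none of the values in the given list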
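To make the comparison operators concrete, here is a minimal sketch that builds a filter document combining $gte, $lte and $in. The helper name score_range_filter and its parameters are hypothetical, and the subject values are sample data; the snippet only constructs the dictionary, so it runs without a MongoDB server.

```python
def score_range_filter(class_name, low, high, subjects):
    """Build a find() filter: class match, low <= score <= high, subject in list."""
    return {
        "class_name": class_name,
        # Two operators on the same field must both hold (implicit AND).
        "score": {"$gte": low, "$lte": high},
        # $in matches when the field equals any value in the list.
        "subject": {"$in": list(subjects)},
    }


query = score_range_filter("Senior 3 (1) Class", 80, 90, ["English", "Math"])
print(query)
# With a live collection, the filter would be passed as collect.find(query).
```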

{"$or": [{"student_name": "Zhang 3"}, {"student_name": "Li 4"}]}

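This part means "the student's name is Zhang 3 or Li 4".
$or is a logical operator; logical operators express how several conditions relate to each other. The full set is:

Operator  Meaning
$and      intersection: every condition must match
$not      negation of a single condition
$nor      negation of several conditions: none of them may match
$or       union: at least one condition must match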
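The logical operators can be nested inside one another. The following sketch, under the assumption of the same student documents used earlier, builds a filter that keeps one class, excludes a list of students with $nor, and negates a comparison with $not. The helper name exclude_students_filter and the passing score of 60 are hypothetical.

```python
def exclude_students_filter(class_name, excluded_names, passing_score):
    """Build a filter: given class, none of the excluded names, score not below passing."""
    return {
        "$and": [
            {"class_name": class_name},
            # $nor takes a list of conditions and matches documents that fail all of them.
            {"$nor": [{"student_name": name} for name in excluded_names]},
            # $not wraps an operator expression; note that it also matches
            # documents in which the score field is missing altogether.
            {"score": {"$not": {"$lt": passing_score}}},
        ]
    }


logic_query = exclude_students_filter("Senior 3 (1) Class", ["Zhang 3", "Li 4"], 60)
print(logic_query)
# With a live collection, the filter would be passed as collect.find(logic_query).
```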

More query operations

Beyond these general-purpose operators, the following are also commonly used:

Operator  Meaning                                    Example                                      Example meaning
$regex    regular-expression match                   {"student_name": {"$regex": ".*3$"}}         student name ends with "3"
$expr     allows aggregation expressions in a query  {"$expr": {"$gt": ["$spent", "$budget"]}}    records where spent exceeds budget (over budget)
$exists   whether a field exists                     {"date": {"$exists": True}}                  the date field exists
$type     type check                                 {"score": {"$type": "int"}}                  score is of type int
$mod      modulo operation                           {"score": {"$mod": [5, 0]}}                  score divided by 5 leaves remainder 0
For more query operators, see the official MongoDB documentation.
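To show what the $regex and $mod examples select, the sketch below pairs the two filter documents with a pure-Python check that approximates them for simple cases. The function matches_locally is a hypothetical illustration, not part of pymongo; the real matching is done by the MongoDB server.

```python
import re

# Filter documents as they would be passed to find().
name_filter = {"student_name": {"$regex": ".*3$"}}  # name ends with "3"
mod_filter = {"score": {"$mod": [5, 0]}}            # score % 5 == 0


def matches_locally(doc):
    """Approximate both filters in plain Python (illustration only)."""
    pattern = name_filter["student_name"]["$regex"]
    name_ok = re.search(pattern, doc["student_name"]) is not None
    divisor, remainder = mod_filter["score"]["$mod"]
    return name_ok and doc["score"] % divisor == remainder


print(matches_locally({"student_name": "Zhang 3", "score": 100}))
print(matches_locally({"student_name": "Li 4", "score": 100}))
```

On a live collection the two conditions could be merged into one filter, for example collect.find({**name_filter, **mod_filter}).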

PS: pymongo Maximum Query Limit

When iterating over MongoDB data with Python, the query appeared to stall at 101 rows, as follows:


    lista_a = []
    for info in db.get_collection("dbs").find():
        lista_a.append(info)
    # Count the documents collected, not the fields of the last document
    print("info nums=", len(lista_a))

    # Reported result:
    # info nums=101

Analysis: find() returns a cursor, and the server delivers results to it in batches. The first batch holds 101 documents by default, which matches the number observed. The documentation reads as follows:

Original:

The MongoDB server returns the query results in batches. The amount of data in the batch will not exceed the maximum BSON document size. To override the default size of the batch, see batchSize() and limit().

New in version 3.4: Operations of type find(), aggregate(), listIndexes, and listCollections return a maximum of 16 megabytes per batch. batchSize() can enforce a smaller limit, but not a larger one.

find() and aggregate() operations have an initial batch size of 101 documents by default. Subsequent getMore operations issued against the resulting cursor have no default batch size, so they are limited only by the 16 megabyte message size.

For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.

Solutions


    lista_a = []
    for info in db.get_collection("dbs").find().batch_size(5000):  # raise the batch size to 5000 documents
        lista_a.append(info)
    print("info nums=", len(lista_a))

With this approach the cursor fetches up to 5,000 documents per batch, and the loop still walks through every result. What if you only want the first 50,000 documents? Call next() on the cursor that many times:


    lista_a = []
    cursor = db.get_collection("dbs").find().batch_size(5000)
    for i in range(50000):  # fetch exactly 50,000 documents
        # next() raises StopIteration if the collection holds fewer documents
        lista_a.append(next(cursor))
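Calling next() a fixed number of times fails if the cursor runs out early. A possible alternative, sketched here, is itertools.islice, which stops quietly when the iterator is exhausted. The helper name take is hypothetical, and a plain iterator of made-up documents stands in for the pymongo cursor so the snippet runs without a database.

```python
from itertools import islice


def take(cursor, n):
    """Return up to n items from an iterator without raising StopIteration."""
    return list(islice(cursor, n))


# Stand-in for a cursor: an iterator over 120 made-up documents.
fake_cursor = iter({"_id": i} for i in range(120))

first_part = take(fake_cursor, 50)     # exactly 50 documents
remainder = take(fake_cursor, 50000)   # only 70 are left, and no error is raised
print(len(first_part), len(remainder))
```

pymongo cursors also provide limit(), for example find().limit(50000), which lets the server cap the result set instead of the client.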
