MongoDB tutorial series (8): GridFS storage details
- 2020-05-27 07:27:15
MongoDB stores documents in BSON format, which supports a binary data type, so binary data can be saved directly in a MongoDB document. However, the size of a single BSON document is limited (16 MB), so large files such as pictures and videos cannot always be stored this way. For such files MongoDB provides a specification for storing large files: GridFS.
How GridFS is implemented
By default, GridFS stores files in two collections, fs.files and fs.chunks: fs.files holds the information (metadata) of each file, and fs.chunks holds the file's binary data. A document in fs.files, that is, the information for one file, looks like this:
{
    "_id" : ObjectId("4f4608844f9b855c6c35e298"),       // unique ID; can be a user-defined type
    "filename" : "CPU.txt",                             // file name
    "length" : 778,                                     // file length in bytes
    "chunkSize" : 262144,                               // size of each chunk in bytes
    "uploadDate" : ISODate("2012-02-23T09:36:04.593Z"), // upload time
    "md5" : "e2c789b036cfb3b848ae39a24e795ca6",         // MD5 hash of the file
    "contentType" : "text/plain",                       // MIME type of the file
    "metadata" : null                                   // other information about the file; absent by default, the user may set it to any BSON object
}
Each file has one or more corresponding chunk (data block) documents in fs.chunks, for example:
{
    "_id" : ObjectId("4f4608844f9b855c6c35e299"),      // ID of this chunk
    "files_id" : ObjectId("4f4608844f9b855c6c35e298"), // _id of the file in fs.files, i.e. a foreign key into fs.files
    "n" : 0,                                           // sequence number of the chunk, starting at 0; a file larger than chunkSize is split into several chunks
    "data" : BinData(0,"QGV...")                       // binary data of the chunk, truncated here
}
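The default chunk size is 255 KB (261120 bytes); older versions, such as the one that produced the record above, used 256 KB (262144 bytes). When a file is saved to GridFS, it is split into chunks of at most chunkSize bytes (so a file larger than chunkSize becomes several chunks), the chunks are saved in fs.chunks, and finally the file's information is saved in fs.files.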
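The write order described above can be sketched in plain Python. This is only an illustration under stated assumptions: two Python lists stand in for the fs.files and fs.chunks collections, integer ids replace ObjectIds, and `put_file` is a hypothetical helper, not the pymongo/gridfs API.

```python
import hashlib
from datetime import datetime, timezone
from itertools import count

# Plain Python lists stand in for fs.files and fs.chunks; this only
# illustrates the write order and is not the pymongo/gridfs API.
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261120 bytes

_ids = count(1)  # hypothetical integer ids instead of ObjectIds


def put_file(files, chunks, data: bytes, filename: str,
             chunk_size: int = DEFAULT_CHUNK_SIZE):
    """Split data into chunk_size pieces, store the chunks, then the file record."""
    file_id = next(_ids)
    # Step 1: write every chunk to "fs.chunks", numbered from n = 0.
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({
            "_id": next(_ids),
            "files_id": file_id,
            "n": n,
            "data": data[start:start + chunk_size],
        })
    # Step 2: only after all chunks are written, add the file document to "fs.files".
    files.append({
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "md5": hashlib.md5(data).hexdigest(),
    })
    return file_id
```

With the 255 KB default, the 778-byte file from the example fits in a single chunk with n = 0, while a 600000-byte file becomes three chunks (n = 0, 1, 2), the last one holding the remaining 77760 bytes.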
When reading a file, GridFS finds the matching document in fs.files according to the query conditions and takes its "_id" value. It then finds all chunks in fs.chunks whose files_id equals that _id, sorts them by "n", and concatenates their data in that order to rebuild the file.
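The read path can be sketched the same way. Again, the lists standing in for fs.files and fs.chunks and the helper name `get_file` are assumptions for illustration; a real driver issues these queries against the database.

```python
def get_file(files, chunks, **query):
    """Find a file document matching query, then reassemble its chunks in n order."""
    # Step 1: look up the file document in "fs.files" by simple equality matching.
    meta = next((f for f in files
                 if all(f.get(k) == v for k, v in query.items())), None)
    if meta is None:
        raise FileNotFoundError(query)
    # Step 2: collect every chunk whose files_id equals the file's _id.
    parts = [c for c in chunks if c["files_id"] == meta["_id"]]
    # Step 3: sort by n and concatenate the binary data.
    parts.sort(key=lambda c: c["n"])
    data = b"".join(c["data"] for c in parts)
    # A length mismatch means chunks are missing or duplicated.
    if len(data) != meta["length"]:
        raise ValueError("missing or extra chunks for file %r" % meta["_id"])
    return data
```

Sorting by "n" matters because chunks are not guaranteed to come back in insertion order; checking the total against "length" is a cheap way to detect an incomplete file.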
Notes:
1. GridFS does not automatically deduplicate files that have the same MD5 value. If files with the same MD5 should be stored only once in GridFS, the application has to handle this itself, and the MD5 value is computed on the client side.
2. Because GridFS saves the file data to fs.chunks before saving the file information to fs.files, a failed upload can leave orphaned chunks in fs.chunks (chunks with no matching fs.files document). These can be cleaned up periodically.
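For note 1, client-side deduplication could look like the following minimal sketch. It assumes the same in-memory lists for fs.files and fs.chunks, uses uuid hex strings as a hypothetical id scheme, and `put_unique` is an illustrative name, not a driver function. It also compares the length, which cheaply rules out some false matches.

```python
import hashlib
import uuid


def put_unique(files, chunks, data: bytes, filename: str,
               chunk_size: int = 255 * 1024):
    """Store data unless a file with the same MD5 and length already exists."""
    digest = hashlib.md5(data).hexdigest()  # MD5 computed on the client side
    for f in files:
        if f.get("md5") == digest and f["length"] == len(data):
            return f["_id"], False  # duplicate: reuse the stored copy
    file_id = uuid.uuid4().hex  # hypothetical id scheme instead of ObjectId
    # Chunks first, then the file document, as in a normal GridFS upload.
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunks.append({"_id": uuid.uuid4().hex, "files_id": file_id,
                       "n": n, "data": data[start:start + chunk_size]})
    files.append({"_id": file_id, "filename": filename, "length": len(data),
                  "chunkSize": chunk_size, "md5": digest})
    return file_id, True
```

In this sketch a duplicate upload under a new filename simply returns the existing _id; an application that needs both names would have to record the second name elsewhere.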
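For note 2, a periodic cleanup can be sketched as removing every chunk whose files_id has no document in fs.files. The in-memory lists and the helper `remove_orphan_chunks` are assumptions for illustration. The `active_uploads` parameter reflects one caveat: an upload still in progress also has chunks without a file document yet, so its id must be excluded.

```python
def remove_orphan_chunks(files, chunks, active_uploads=()):
    """Delete chunks with no matching fs.files document; return how many were removed."""
    live_ids = {f["_id"] for f in files}
    # Ids of uploads still in progress are treated as live so their chunks survive.
    keep_ids = live_ids | set(active_uploads)
    before = len(chunks)
    # Rewrite the list in place so callers holding a reference see the change.
    chunks[:] = [c for c in chunks if c["files_id"] in keep_ids]
    return before - len(chunks)
```

Running it only when no uploads are in flight, or passing the ids of in-progress uploads, avoids deleting chunks that are about to get their fs.files document.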