Introduction to the use of GridFS

  • 2020-05-17 06:51:47
  • OfStack

GridFS overview

GridFS is a built-in MongoDB feature for storing files in the database. It can hold a large number of small files, and it handles files larger than the chunk size by splitting them into chunks.

Using GridFS

MongoDB provides a command-line tool, mongofiles, for working with GridFS.

List all files:


mongofiles list

Upload a file:

mongofiles put xxx.txt

Download a file:

mongofiles get xxx.txt

Find files:

# Find all files whose names contain "xxx"
mongofiles search xxx
# Find all files whose names start with the prefix "xxx"
mongofiles list xxx

Parameter description:
-d specifies the database, for example: mongofiles list -d testGridfs (mongofiles uses the test database when -d is omitted)
-u and -p specify the username and password
-h specifies the host
--port specifies the host port
-c specifies the collection name; the default is fs
-t specifies the MIME type of the file; it is not set by default

How GridFS works

By default, GridFS stores files in two collections, fs.files and fs.chunks.
fs.files holds the file information (metadata), and fs.chunks holds the file data.

A document in the fs.files collection, describing one file, looks like this:


{
"_id" : ObjectId("4f4608844f9b855c6c35e298"),       // Unique id; may be a user-defined type
"filename" : "CPU.txt",      // File name
"length" : 778,      // File length in bytes
"chunkSize" : 262144,    // Chunk size in bytes
"uploadDate" : ISODate("2012-02-23T09:36:04.593Z"), // Upload time
"md5" : "e2c789b036cfb3b848ae39a24e795ca6",      // MD5 hash of the file
"contentType" : "text/plain",     // MIME type of the file
"metadata" : null    // Additional file information; this key is absent by default, and the user can set it to any BSON object
}

The corresponding chunk in fs.chunks looks like this:


{
"_id" : ObjectId("4f4608844f9b855c6c35e299"),    // Chunk id
"files_id" : ObjectId("4f4608844f9b855c6c35e298"),  // Id of the file; matches _id in fs.files, acting as a foreign key into that collection
"n" : 0,     // Index of this chunk within the file; a file larger than chunkSize is split into multiple chunks
"data" : BinData(0,"QGV...")     // Binary data of the file (truncated here)
}

The default chunk size here is 256 KB (the 262144 bytes shown in chunkSize above); more recent MongoDB versions default to 255 KB.
When a file is saved to GridFS and it is larger than chunkSize, it is split into multiple chunks. The chunks are saved to fs.chunks first, and the file information is saved to fs.files last.
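
The save procedure can be sketched in plain Python. This is only an in-memory illustration, not the code of any MongoDB driver: the lists fs_files and fs_chunks stand in for the two collections, and the function names split_into_chunks and store_file are hypothetical.

```python
import hashlib
from datetime import datetime, timezone

CHUNK_SIZE = 262144  # 256 KB, the chunkSize shown in the example document


def split_into_chunks(data, files_id, chunk_size=CHUNK_SIZE):
    """Build fs.chunks-style documents for data, numbered from n = 0."""
    return [
        {"files_id": files_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def store_file(fs_files, fs_chunks, files_id, filename, data,
               chunk_size=CHUNK_SIZE):
    """Save the chunks first, then the file information (hypothetical helper)."""
    fs_chunks.extend(split_into_chunks(data, files_id, chunk_size))
    fs_files.append({
        "_id": files_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
        "md5": hashlib.md5(data).hexdigest(),  # computed on the client side
    })


# In-memory stand-ins for the two collections (an assumption of this sketch)
fs_files, fs_chunks = [], []
store_file(fs_files, fs_chunks, 1, "big.bin", b"x" * (2 * CHUNK_SIZE + 10))
```

Under these assumptions, a file of 2 * 262144 + 10 bytes produces three chunks with n = 0, 1, 2, where the last chunk holds only the remaining 10 bytes. A 778-byte file such as CPU.txt fits in a single chunk.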

When a file is read, GridFS first finds the matching document in fs.files according to the query conditions and takes its "_id". It then looks up the chunks in fs.chunks whose "files_id" equals that "_id", sorts them by "n", and concatenates their data to rebuild the file.
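
The read path can be illustrated the same way. The sketch below is an in-memory approximation, not a driver API: the sample documents are made up, the integer ids stand in for ObjectIds, and read_file is a hypothetical name.

```python
def read_file(fs_files, fs_chunks, filename):
    """Rebuild a file's bytes from its chunks (hypothetical helper)."""
    # Step 1: find the file document and take its _id
    file_doc = next((f for f in fs_files if f["filename"] == filename), None)
    if file_doc is None:
        raise FileNotFoundError(filename)
    # Step 2: collect the chunks whose files_id matches, sorted by n
    chunks = sorted(
        (c for c in fs_chunks if c["files_id"] == file_doc["_id"]),
        key=lambda c: c["n"],
    )
    # Step 3: concatenate the chunk data in order
    return b"".join(c["data"] for c in chunks)


# Made-up sample data; the chunks are deliberately stored out of order
fs_files = [{"_id": 1, "filename": "CPU.txt", "length": 11}]
fs_chunks = [
    {"files_id": 1, "n": 1, "data": b"world"},
    {"files_id": 2, "n": 0, "data": b"other file"},
    {"files_id": 1, "n": 0, "data": b"hello "},
]

content = read_file(fs_files, fs_chunks, "CPU.txt")
```

Sorting by "n" is what makes the result independent of the order in which the chunks happen to be stored.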

Things to watch out for

1. GridFS does not deduplicate files that have the same MD5. If you want identical files to be stored only once, you must handle that yourself. The MD5 value is computed by the client.
2. During an upload, GridFS saves the file data to fs.chunks first and the file information to fs.files afterwards, so a failed upload can leave orphaned chunks in fs.chunks. This junk data can be cleaned up periodically.
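
Both points can be sketched against in-memory stand-ins for the two collections. The helper names find_by_md5 and remove_orphan_chunks are hypothetical, and a real cleanup job would probably also need to skip chunks that belong to uploads still in progress, which this sketch ignores.

```python
import hashlib


def find_by_md5(fs_files, data):
    """Return an existing file document with the same MD5 as data, or None."""
    digest = hashlib.md5(data).hexdigest()
    return next((f for f in fs_files if f.get("md5") == digest), None)


def remove_orphan_chunks(fs_files, fs_chunks):
    """Drop chunks whose files_id has no document in fs_files.

    Returns the number of chunks removed.
    """
    known_ids = {f["_id"] for f in fs_files}
    kept = [c for c in fs_chunks if c["files_id"] in known_ids]
    removed = len(fs_chunks) - len(kept)
    fs_chunks[:] = kept  # modify the list in place
    return removed


# Made-up sample data: file 1 is complete, the chunk with files_id 9 is an orphan
fs_files = [{"_id": 1, "filename": "a.txt", "md5": hashlib.md5(b"abc").hexdigest()}]
fs_chunks = [
    {"files_id": 1, "n": 0, "data": b"abc"},
    {"files_id": 9, "n": 0, "data": b"left over from a failed upload"},
]

duplicate = find_by_md5(fs_files, b"abc")            # check before uploading again
removed = remove_orphan_chunks(fs_files, fs_chunks)  # periodic cleanup
```

Checking find_by_md5 before an upload is one way to store identical content only once. Whether to reuse the existing file or reject the upload is left to the application.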

