Several common ways to free up disk space in MongoDB

  • 2020-10-07 18:55:41
  • OfStack

Preface

When we delete documents or collections from MongoDB, MongoDB does not release the disk space they occupied. It keeps the data files at their current size, even though they may now contain lists of empty records of various sizes (the empty record list). When a client program inserts documents again, MongoDB allocates storage for them from the empty record list. To use disk space more efficiently, we need to defragment MongoDB's data files and reclaim the unused space. There are essentially two approaches:

1. Reorganize the original data

2. Copy only the data to form a fresh, complete copy of it

The following are some common implementation methods:

1. compact

2. db.repairDatabase()

3. secondary node resynchronization

4. db.copyDatabase()

1. compact

The official documentation defines this command as rewriting and defragmenting all data and indexes in a collection.

Method of use


use yourdatabase;
db.runCommand({ compact : 'yourCollection' });
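Before compacting, it can help to estimate how much space a collection could give back. The following is a minimal sketch, assuming the WiredTiger storage engine, where db.yourCollection.stats() reports a "file bytes available for reuse" figure under wiredTiger["block-manager"]. The helper names reusableBytes and worthCompacting, and the 20% threshold in the usage comment, are illustrative assumptions and not part of MongoDB.

```javascript
// Read how many bytes WiredTiger reports as reusable for a collection.
// `stats` is the document returned by db.yourCollection.stats().
// Returns 0 when the field is missing (for example on MMAPv1).
function reusableBytes(stats) {
  const blockManager = (stats.wiredTiger || {})["block-manager"] || {};
  return blockManager["file bytes available for reuse"] || 0;
}

// Hypothetical rule of thumb: compact only when at least `minRatio`
// of the on-disk storage size is reported as reusable.
function worthCompacting(stats, minRatio) {
  if (!stats.storageSize) return false;
  return reusableBytes(stats) / stats.storageSize >= minRatio;
}

// In the mongo shell (not run here):
//   const s = db.yourCollection.stats();
//   if (worthCompacting(s, 0.2)) db.runCommand({ compact: 'yourCollection' });
```

Keeping the decision in a pure function makes it easy to try against saved stats output before touching a live node.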

Matters needing attention

1. Make sure you have a relatively new backup before executing the command

2. On MongoDB using the MMAPv1 storage engine, compact requires at least 2 GB of free space on the partition where the data files are located

3. On MongoDB using the WiredTiger storage engine, compact rewrites the collection and its indexes and releases unused space. On MongoDB using the MMAPv1 storage engine, it only defragments the collection's data files and rebuilds its indexes; no space is released. To reclaim space on MMAPv1, the third method, "secondary node resynchronization", is recommended.

4. Capped collections cannot be compacted on MongoDB using the MMAPv1 storage engine, but they are compacted when compact runs on MongoDB using the WiredTiger storage engine.

5. When running the command on the replica set, execute it on each node separately

6. This command can only be run on a mongod instance, not on a mongos instance. That is, for a sharded cluster, compact must be run separately on each shard.

7. Generally, this command is run on a secondary node. While it runs, the node is forced into the RECOVERING state, and reads and writes on an instance in the RECOVERING state are blocked

8. If you need to stop the command under special circumstances, look up the operation with db.currentOp() and then kill it with db.killOp()

9. compact may increase the total size and number of data files, especially on the first run. However, this does not increase the total storage space used by the collection, because storage size is the amount of data allocated within the database files, not the size or number of files on the file system

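The db.currentOp() and db.killOp() step mentioned in the notes above can be sketched as follows. This is a hedged example: it assumes each entry of the inprog array carries the command document in a command field (newer servers) or a query field (older servers), and findCompactOpIds is a hypothetical helper name.

```javascript
// Pick out the opids of running compact operations from a
// db.currentOp() result document.
function findCompactOpIds(currentOpResult) {
  return (currentOpResult.inprog || [])
    .filter(function (op) {
      // Assumption: the command document lives in `command` or `query`.
      const cmd = op.command || op.query || {};
      return Object.prototype.hasOwnProperty.call(cmd, "compact");
    })
    .map(function (op) {
      return op.opid;
    });
}

// In the mongo shell (not run here):
//   findCompactOpIds(db.currentOp()).forEach(function (id) { db.killOp(id); });
```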

2. db.repairDatabase()

This command rebuilds the database and its indexes, discarding invalid or corrupt data. It is similar to the file system repair command fsck, so it is mainly used to repair data.

Method of use


use yourdatabase;
db.repairDatabase();

Matters needing attention

1. db.repairDatabase() is mainly used to repair data. If you have an intact copy of the data and can access it, use the third method, "secondary node resynchronization", instead

2. Make sure you have a relatively new backup before executing the command

3. This command completely blocks reads and writes on the database, so use it with care

4. Running this command requires free space equal to the combined size of all data files plus 2 GB, on the partition where the data files are located

5. Running this command on a secondary node that uses the MMAPv1 storage engine compacts the collection data

6. On a MongoDB database that uses the WiredTiger storage engine, it has no compaction effect

7. If you need to stop the command under special circumstances, look up the operation with db.currentOp() and then kill it with db.killOp()

8. The command is time-consuming
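The free-space requirement in the notes above (all data files plus 2 GB) is simple arithmetic, sketched below. The sizes are plain numbers that the operator is assumed to collect from the file system; requiredFreeBytes and canRepair are hypothetical helper names, and no MongoDB API is involved.

```javascript
const TWO_GB = 2 * 1024 * 1024 * 1024;

// Free space needed before a repair: the sum of all data file sizes
// (in bytes) plus 2 GB.
function requiredFreeBytes(dataFileBytes) {
  const total = dataFileBytes.reduce(function (sum, n) {
    return sum + n;
  }, 0);
  return total + TWO_GB;
}

// True when the partition's free space covers the requirement.
function canRepair(dataFileBytes, freeBytes) {
  return freeBytes >= requiredFreeBytes(dataFileBytes);
}
```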

3. secondary node resynchronization

The main idea is to delete the data on a secondary node and let it resynchronize from the primary. Resynchronization can also be used when a replica set member's data is too stale. Unlike copying the data files directly, MongoDB synchronizes only the data itself, so the data files contain no empty record lists after resynchronization, and the disk space is reclaimed.

Method of use

You must first ensure that the data is backed up completely.

1. If the node is the primary, first force it to step down to a secondary; otherwise, skip this step:


 rs.stepDown(120);

2. Then remove the secondary node from the replica set by running this on the primary:


 rs.remove("IP:port");

3. Stop the mongod process on the secondary node, then delete all files under its dbpath.

4. Restart the node and add it back to the replica set so that it automatically resynchronizes the data:


 rs.add("IP:port");

5. After synchronization completes, repeat steps 1-4 on each of the other nodes to free up disk space on every node in the cluster
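To decide when a re-added node has finished synchronizing, one option is to inspect the output of rs.status(). The sketch below assumes the member is listed under its "IP:port" name and that a fully synchronized member reports the state string "SECONDARY"; memberState and isResynced are hypothetical helper names.

```javascript
// Return the state string that rs.status() reports for the member
// named `host`, or null if the member is not listed.
function memberState(status, host) {
  const members = status.members || [];
  for (let i = 0; i < members.length; i++) {
    if (members[i].name === host) return members[i].stateStr;
  }
  return null;
}

// Assumption: the node counts as resynchronized once it is SECONDARY.
function isResynced(status, host) {
  return memberState(status, host) === "SECONDARY";
}

// In the mongo shell (not run here):
//   isResynced(rs.status(), "IP:port");
```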

In special cases where the secondary node cannot be taken offline, a new node can be added to the replica set instead; the new node will then start synchronizing data automatically.

In general, the resynchronization method is the better choice. First, it does not block reads and writes on the replica set; second, it takes less time than the first two methods.

4. db.copyDatabase()

MongoDB also supports copying data online with db.copyDatabase("from", "to", "IP:port"). This also frees up space, because db.copyDatabase copies the data itself rather than the on-disk representation of the data files. However, this command has been deprecated since version 4.0; it can still be used in 3.x.

For example:


 db.copyDatabase("sourceDB","DistDB");

This copies the source database sourceDB to DistDB.

Of course, this command supports remote replication.

The full syntax for this command is:

db.copyDatabase(<source database name>, <target database name>, <source MongoDB IP:port>, <account for the source database connection>, <password>, <mechanism>)

Note: the command must be executed on the target database server. If the source and target databases are on the same MongoDB server, <source MongoDB IP:port>, <account for the source database connection>, and <password> can all be omitted. <mechanism> is the authentication type and is optional.

Matters needing attention

1. db.copyDatabase() does not block reads and writes on the source or target database, so the two copies of the data may not be identical

2. Copying the index data locks the database, and this operation also affects other databases

3. db.copyDatabase() should not be used in mongos instances

4. db.copyDatabase() should not be used to copy databases containing sharded collections

5. Changed in version 4.0: db.copyDatabase() supports only SCRAM for the <mechanism> authentication option.

6. Copying between some different MongoDB versions is not supported; see: https://docs.mongodb.com/manual/reference/method/db.copyDatabase/
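Because the copy does not block writes, it may be worth checking the result. The following is a rough sketch that compares per-collection document counts between the two databases. It relies on the shell methods getCollectionNames(), getCollection(), and count(); diffCollectionCounts is a hypothetical helper name, and matching counts only suggest, rather than prove, that the copies are identical.

```javascript
// Compare document counts for every collection in the source database
// and report the collections whose counts differ in the target.
function diffCollectionCounts(sourceDb, targetDb) {
  const diffs = [];
  sourceDb.getCollectionNames().forEach(function (name) {
    const src = sourceDb.getCollection(name).count();
    const dst = targetDb.getCollection(name).count();
    if (src !== dst) {
      diffs.push({ collection: name, source: src, target: dst });
    }
  });
  return diffs;
}

// In the mongo shell (not run here):
//   diffCollectionCounts(db.getSiblingDB("sourceDB"), db.getSiblingDB("DistDB"));
```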

In addition, there are some other methods, such as export/import with mongodump/mongorestore. This approach is not suitable for very large data volumes, because it works on a full copy of the data, so there must be enough free space to hold that copy.
