Solving rsync's high memory usage when backing up a massive number of files

  • 2020-05-15 03:04:05
  • OfStack

Most Linux distributions ship with rsync, but usually an older version, typically 2.6.X.
In the 2.X versions, rsync first builds the complete file list and only then starts backing up (adding or removing files), which can consume a lot of memory when there are a large number of files.
During a backup, each file rsync scans (a directory counts as one entry too) takes about 100 bytes of memory in the file list, and even more if the --delete option is used.
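As a back-of-the-envelope check, the per-entry figure above can be turned into a rough estimate. This is only a sketch: it assumes the roughly 100 bytes per entry quoted above, ignores the extra cost of --delete, and the helper name estimate_filelist_mb is made up for illustration.

```shell
#!/bin/sh
# Rough estimate of the rsync 2.X file-list memory, assuming ~100 bytes per
# scanned entry (the figure quoted above). estimate_filelist_mb is a
# hypothetical helper, not part of rsync.
estimate_filelist_mb() {
    entries=$1
    bytes_per_entry=${2:-100}
    # Integer arithmetic: total bytes converted to MiB, rounded down
    echo $(( entries * bytes_per_entry / 1024 / 1024 ))
}

# To estimate for a real tree, count its files and directories first, e.g.:
#   estimate_filelist_mb "$(find /path/to/images | wc -l)"
estimate_filelist_mb 8000000
```

For 8 million entries this prints 762 (MiB). Treat it as a lower bound, since real usage can be considerably higher.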
For example, one of our servers holds about 8 million images that are updated frequently, and the file count grows quickly, by about 100,000 files a day. During backups, rsync used roughly 2 GB of memory. This exhausted the server's physical memory and forced it into swap, which pushed up iowait, made rsync's file listing even slower, and affected the services running on the server.
Before rsync 3.X, the common advice in this situation was to break the backup into smaller operations. For example, instead of backing up 10 image directories in one operation, split the job into 10 backup operations and run them one at a time. It was also recommended to reduce the directory depth, since fewer directories means a smaller memory footprint for rsync. There is also a program called digisync, which is designed for backing up file counts at the giga (G) scale.
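The splitting approach described above could be sketched as a loop that runs one rsync per top-level subdirectory. The function name backup_per_subdir, the RSYNC override variable, and the paths in the usage line are all assumptions made for illustration; only the -a (archive) option is a real rsync flag.

```shell
#!/bin/sh
# Sketch: back up each top-level subdirectory in its own rsync run, one at a
# time, so each run only holds that subdirectory's file list in memory.
# backup_per_subdir and the paths below are hypothetical.
backup_per_subdir() {
    src=$1
    dest=$2
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue      # skip if the glob matched nothing
        name=$(basename "$dir")
        # RSYNC can be overridden (e.g. RSYNC=echo) to preview the commands
        ${RSYNC:-rsync} -a "$dir" "$dest/$name/" || return 1
    done
}

# Usage (hypothetical paths):
#   backup_per_subdir /data/images backup-host:/backup/images
```

Note that this sketch skips files sitting directly in the source directory and expects the destination parent directory to exist already.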

rsync 3.X uses an incremental file list: unlike 2.X, it backs up (adds or removes) files while it is still building the list. This saves a great deal of time when backing up a large number of files.
In testing, a backup with rsync 3.0.4 used approximately 4 MB of memory, about the same as a single Apache process.

The rsync homepage is at http://samba.anu.edu.au/rsync/ and the latest stable version of rsync is 3.1.1.


cd /usr/src/
wget http://samba.anu.edu.au/ftp/rsync/src/rsync-3.1.1.tar.gz
tar xzvf rsync-3.1.1.tar.gz
cd rsync-3.1.1
./configure --prefix=/usr
make
make install

Then run rsync --version to check the version number.

It is important to note that both the source and destination hosts must be upgraded to rsync 3.X to use the new features of rsync 3.X.
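To confirm that both ends are on 3.X, the major version can be pulled out of the rsync --version output on each host. This sketch assumes the first line of that output looks like "rsync  version 3.1.1  protocol version 31" (some builds prefix the number with "v"). The helper name rsync_major and the host name backup-host are placeholders.

```shell
#!/bin/sh
# Sketch: print the major version found on the first line of
# `rsync --version` output, read from stdin.
# rsync_major is a hypothetical helper, not part of rsync.
rsync_major() {
    # Match "rsync  version 3.1.1 ..." (optionally "v3.1.1") and keep the "3"
    sed -n '1s/^rsync  *version v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

# Usage (backup-host is a placeholder):
#   rsync --version | rsync_major
#   ssh backup-host rsync --version | rsync_major
```

Both commands should print 3 or higher; if either side prints 2, that host still needs to be upgraded.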

It is worth mentioning that version 2.6.9 was released on November 6, 2006, and 3.0 did not follow until March 2008, so for a long time people had to find various workarounds for backing up large numbers of files.