Sync directories using rsync under Linux
- 2020-05-13 04:16:08
- OfStack
This article describes one-way synchronization of a directory between two machines using rsync under Linux. With rsync, the target directory can be kept consistent with the source directory, including deletions.
Data synchronization mode
1. Pull data from the host
The process is started on the standby machine.
Sync command:
rsync -avzP --delete root@{remoteHost}:{remoteDir} {localDir}
Parameter description:
-a: archive mode, equivalent to -rlptgoD
-r: recurse into directories
-l: copy symbolic links as symbolic links
-p: preserve file permissions
-t: preserve file modification times
-g: preserve the group of each file
-o: preserve the owner of each file
-D: preserve device files and special files
-z: compress data during transfer
-P: show transfer progress and keep partially transferred files (same as --partial --progress)
-v: verbose output, listing the files as they are transferred
--delete: delete files on the receiving side that no longer exist on the sending side
Example:
rsync -avzP --delete root@192.168.1.100:/tmp/rtest1 /tmp/
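One detail worth checking before running the example is where the files land. rsync treats a source path with a trailing slash as "the contents of this directory" and a source path without one as "this directory itself". The following is a minimal sketch of that rule in Python; `landing_dir` is a hypothetical helper written for illustration, and it does not call rsync.

```python
import posixpath


def landing_dir(src, dst):
    """Predict which directory holds the contents of src after `rsync -a src dst`.

    Hypothetical helper for illustration only; it mirrors rsync's
    trailing-slash rule and does not run rsync.
    """
    # Drop an optional "user@host:" prefix from a remote source.
    path = src.split(":", 1)[1] if ":" in src else src
    if path.endswith("/"):
        # "dir/" means the contents of dir: they go straight into dst.
        return dst.rstrip("/") or "/"
    # "dir" means dir itself: rsync creates dst/dir.
    return posixpath.join(dst, posixpath.basename(path))


# Both calls describe the same result: the files end up in /tmp/rtest1.
print(landing_dir("root@192.168.1.100:/tmp/rtest1", "/tmp/"))
print(landing_dir("root@192.168.1.100:/tmp/rtest1/", "/tmp/rtest1"))
```

With --delete this matters: a source of /tmp/rtest1/ combined with a destination of /tmp/ would make /tmp itself mirror rtest1 and remove everything else in it.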
2. Push data to the standby machine
The process is started on the host.
Sync command:
rsync -avzP --delete {localDir} root@{remoteHost}:{remoteDir}
Example:
rsync -avzP --delete /tmp/rtest1 root@192.168.1.101:/tmp/
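Because --delete removes files on the receiving side, it can help to preview a run first. The sketch below builds the same command with rsync's -n (--dry-run) and -i (--itemize-changes) options added and collects the paths that rsync reports it would delete. The function names are illustrative, and the parsing assumes the `*deleting` prefix that rsync prints for deletions in itemized output.

```python
import subprocess


def build_preview_cmd(src, dst):
    """Return the sync command with dry-run and itemize flags added."""
    # -n (--dry-run) changes nothing; -i (--itemize-changes) prints one line per change.
    return ["rsync", "-azni", "--delete", src, dst]


def parse_deletions(output):
    """Extract the paths that itemized rsync output marks as deletions."""
    paths = []
    for line in output.splitlines():
        if line.startswith("*deleting"):
            paths.append(line[len("*deleting"):].strip())
    return paths


def preview_deletions(src, dst):
    """Run the dry run and return the paths that would be deleted."""
    result = subprocess.run(
        build_preview_cmd(src, dst), capture_output=True, text=True, check=True
    )
    return parse_deletions(result.stdout)
```

For example, `preview_deletions("/tmp/rtest1", "root@192.168.1.101:/tmp/")` would list what the push example above would remove on the standby, assuming the login does not prompt for a password.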
Automatic sync configuration
This section describes how to run the synchronization without typing a password.
1. Use an ssh key
With this method, the rsync command can be run directly without entering a password.
Generate ssh key on the host:
ssh-keygen -t rsa
Add the public key to the standby:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.101
Or manually add:
On the host, execute the following command to get pubkey:
cat ~/.ssh/id_rsa.pub
Append the key content to this file on the standby:
vi ~/.ssh/authorized_keys
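The manual step can also be scripted. The following is a minimal sketch, meant to run on the standby, that appends a public key line to authorized_keys without duplicating it and sets the restrictive permissions that sshd checks under its default StrictModes setting. `add_authorized_key` is a hypothetical helper, not part of any ssh tooling.

```python
import os


def add_authorized_key(pubkey_line, ssh_dir=os.path.expanduser("~/.ssh")):
    """Append pubkey_line to ssh_dir/authorized_keys unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    key = pubkey_line.strip()
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    auth_file = os.path.join(ssh_dir, "authorized_keys")

    content = ""
    if os.path.exists(auth_file):
        with open(auth_file) as fh:
            content = fh.read()
    if key in (line.strip() for line in content.splitlines()):
        return False

    with open(auth_file, "a") as fh:
        # Start on a fresh line if the file lacks a trailing newline.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    # Keep the directory and the key file private to the owner.
    os.chmod(ssh_dir, 0o700)
    os.chmod(auth_file, 0o600)
    return True
```

The argument is the single line printed by `cat ~/.ssh/id_rsa.pub` on the host.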
2. Enter the password automatically using pexpect
The sample code is as follows:
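#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import traceback

import pexpect


def doRsync(user, passwd, ip, srcDir, dstDir, timeout=3600):
    """Push srcDir to dstDir on the remote host, answering the ssh prompts."""
    cmd = "rsync -azPq --delete {srcDir} {rUser}@{rHost}:{dstDir}".format(
        rUser=user, rHost=ip, srcDir=srcDir, dstDir=dstDir
    )
    try:
        ssh = pexpect.spawn(cmd, timeout=timeout)
        print(cmd)
        # expect() patterns are regular expressions, so match plain text only
        # instead of the literal "(yes/no)?".
        i = ssh.expect(['password:', 'continue connecting'], timeout=5)
        if i == 1:
            # First connection: accept the host key, then wait for the password prompt.
            ssh.sendline('yes')
            ssh.expect('password:')
        ssh.sendline(passwd)
        # Block until rsync exits or the spawn timeout expires.
        ssh.expect(pexpect.EOF)
        ssh.close()
    except Exception:
        # Report the failure instead of silently ignoring it.
        traceback.print_exc()


if __name__ == '__main__':
    doRsync("root", "123456", "192.168.1.101", "/tmp/rtest1", "/tmp")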
The code above is written in Python; the same approach can be implemented in other languages as needed.
Other notes
1. What happens if rsync is killed while it is running?
It is safe to kill an rsync process and run the whole thing again; it will continue where it left off. It may be a little inefficient, particularly if you haven't passed --partial (included in -P), because rsync will check all files again and process the file it was interrupted on from scratch.
In short, rsync can be killed safely and will work properly the next time it is started.
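Since an interrupted run can simply be started again, a wrapper can retry until rsync exits cleanly. The following is a minimal sketch, assuming key-based login so that no password prompt appears. `sync_with_retry` and its `runner` parameter are illustrative names; `runner` exists only so that the loop can be exercised without rsync installed.

```python
import subprocess
import time


def sync_with_retry(src, dst, attempts=3, delay=5.0, runner=subprocess.call):
    """Run rsync until it exits with status 0 or the attempts are used up.

    --partial keeps a half-transferred file so the next attempt can reuse it.
    Returns the number of attempts that were needed, or 0 if all of them failed.
    """
    cmd = ["rsync", "-az", "--partial", "--delete", src, dst]
    for attempt in range(1, attempts + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < attempts:
            # Wait a little before starting rsync again.
            time.sleep(delay)
    return 0
```

Each retry still rescans all files, so the cost noted above applies to every attempt.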
2. rsync cannot be restricted to a time period
1) Kill the rsync process when the allowed period ends
2) Or use the timeout parameter of pexpect to limit the run time
3) Or use find to select the files or folders to synchronize, and then synchronize only those with rsync. The selection criteria depend on the existing business; for example, by modification time:
find /tmp -name '*' -newermt '2016-03-08' ! -newermt '2016-03-20'
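Options 1) and 3) can be sketched in Python. One function kills a command once a time limit has passed (using the timeout of subprocess.run rather than pexpect), and the other reproduces the modification-time window of the find command so that the result can be handed to rsync through its --files-from option. Both function names are illustrative.

```python
import os
import subprocess
from datetime import datetime


def files_modified_between(root, start, end):
    """Return file paths under root, relative to root, with start < mtime <= end.

    This is the same window as: find root -newermt START ! -newermt END
    (find also lists directories; only non-directory entries are returned here).
    """
    lo, hi = start.timestamp(), end.timestamp()
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, like find without -L.
            if lo < os.lstat(full).st_mtime <= hi:
                selected.append(os.path.relpath(full, root))
    return sorted(selected)


def run_with_time_limit(cmd, seconds):
    """Run cmd; kill it and return False if it exceeds the time limit.

    Raises subprocess.CalledProcessError if cmd exits with a non-zero status.
    """
    try:
        subprocess.run(cmd, timeout=seconds, check=True)
        return True
    except subprocess.TimeoutExpired:
        return False
```

The relative paths can be written one per line to a file and passed to rsync as --files-from=FILE, with root as the source directory.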
3. Running rsync while a file is still being written (e.g. during recording)
In testing, rsync copies the part of the file that exists at that moment. Running rsync again after the write has finished brings the two copies back in sync.
4. Once there are more than a million files, scanning for changed files takes rsync a long time.