Sync directories using rsync under Linux

  • 2020-05-13 04:16:08
  • OfStack

This article describes one-way synchronization of a directory between two machines using rsync under Linux. With rsync, the two directories can be kept consistent with each other, including deletions.

Data synchronization mode

1. Pull data from the host

The process is started on the standby machine.

Sync command:


rsync -avzP --delete root@{remoteHost}:{remoteDir} {localDir}

Parameter description:

-a: archive mode, equivalent to -rlptgoD:
  -r recurses into directories;
  -l copies symbolic links as links;
  -p preserves file permissions;
  -t preserves modification times;
  -g preserves the group;
  -o preserves the owner;
  -D preserves device files and special files.
-z: compresses data during transfer.
-P: equivalent to --partial --progress; keeps partially transferred files and shows transfer progress.
-v: verbose output.
--delete: deletes files in the destination that no longer exist in the source.

Example:


rsync -avzP --delete root@192.168.1.100:/tmp/rtest1 /tmp/

2. Push data to the standby machine

The process is started on the host.

Sync command:


rsync -avzP --delete {localDir} root@{remoteHost}:{remoteDir}

Example:


rsync -avzP --delete /tmp/rtest1 root@192.168.1.101:/tmp/
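
The pull and push forms differ only in which side carries the user@host: prefix. As a minimal sketch, the helper below builds both command lines as argument lists that could be passed to subprocess.run(). The name build_rsync_cmd and its parameters are hypothetical and chosen for illustration; they are not part of rsync.

```python
import shlex


def build_rsync_cmd(local_dir, remote_host, remote_dir, user="root", pull=True):
    """Return the rsync argument list for a one-way sync.

    build_rsync_cmd is a hypothetical helper for illustration only.
    pull=True  copies remote -> local (run on the standby machine).
    pull=False copies local -> remote (run on the host).
    """
    remote = "{}@{}:{}".format(user, remote_host, remote_dir)
    src, dst = (remote, local_dir) if pull else (local_dir, remote)
    return ["rsync", "-avzP", "--delete", src, dst]


if __name__ == "__main__":
    # Rebuild the two example commands from the text and print them.
    pull_cmd = build_rsync_cmd("/tmp/", "192.168.1.100", "/tmp/rtest1", pull=True)
    push_cmd = build_rsync_cmd("/tmp/rtest1", "192.168.1.101", "/tmp/", pull=False)
    print(shlex.join(pull_cmd))
    print(shlex.join(push_cmd))
```

Because the source path /tmp/rtest1 has no trailing slash, rsync copies the directory rtest1 itself into /tmp/. With a trailing slash it would copy only the contents of the directory.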

Automatic sync configuration

This section describes how to run the synchronization without typing a password each time.

1. Use ssh key

With this method, the rsync command can be run directly without being asked for a password.

Generate ssh key on the host:


ssh-keygen -t rsa

Add the public key to the standby:


ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.1.101

Or manually add:

On the host, run the following command to print the public key:


cat ~/.ssh/id_rsa.pub

On the standby, append the key content to the authorized keys file:


vi ~/.ssh/authorized_keys
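
Editing the file by hand makes it easy to paste the key twice or to leave loose permissions behind. The sketch below shows one way the manual step could be scripted on the standby. The helper name add_authorized_key is hypothetical, and the permission values (700 for the directory, 600 for the file) are the commonly used ones, assumed here rather than taken from the article.

```python
import os


def add_authorized_key(pubkey_line, ssh_dir):
    """Append pubkey_line to ssh_dir/authorized_keys unless it is already there.

    add_authorized_key is a hypothetical helper; ssh_dir would normally be
    os.path.expanduser("~/.ssh") on the standby machine.
    Returns True if the key was added, False if it was already present.
    """
    pubkey_line = pubkey_line.strip()
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # assumed conventional permissions for ~/.ssh
    path = os.path.join(ssh_dir, "authorized_keys")

    content = ""
    if os.path.exists(path):
        with open(path, "r") as f:
            content = f.read()
    if pubkey_line in [line.strip() for line in content.splitlines()]:
        return False

    with open(path, "a") as f:
        # Make sure the new key starts on its own line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(pubkey_line + "\n")
    os.chmod(path, 0o600)  # assumed conventional permissions for the key file
    return True
```

Calling it a second time with the same key returns False and leaves the file unchanged, so the step is safe to repeat.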

2. Enter the password automatically using pexpect

The sample code is as follows:


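#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import traceback

import pexpect


def doRsync(user, passwd, ip, srcDir, dstDir, timeout=3600):
  # Push srcDir to dstDir on the remote host, answering the ssh prompts.
  cmd = "rsync -azPq --delete {srcDir} {rUser}@{rHost}:{dstDir}".format(
    rUser=user, rHost=ip, srcDir=srcDir, dstDir=dstDir
  )
  print(cmd)
  try:
    ssh = pexpect.spawn(cmd, timeout=timeout)
    # expect() patterns are regular expressions, so match only the literal
    # part of the host-key question instead of "(yes/no)?".
    i = ssh.expect(['password:', 'continue connecting'], timeout=5)
    if i == 1:
      # First connection to this host: accept the key, then wait for the prompt.
      ssh.sendline('yes')
      ssh.expect('password:')
    ssh.sendline(passwd)
    # Read until rsync exits; raises pexpect.TIMEOUT after `timeout` seconds.
    ssh.read()
    ssh.close()
  except pexpect.ExceptionPexpect:
    # Raised on timeout or if rsync exits before asking for a password.
    traceback.print_exc()


if __name__ == '__main__':
  # The password is hard-coded only to keep the sample short.
  doRsync("root", "123456", "192.168.1.101", "/tmp/rtest1", "/tmp")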

The code above is written in Python; the same approach can be implemented in other languages as needed.

Other notes

1. What happens if rsync is killed while it is running

It is safe to kill an rsync process and run the whole thing again; it will continue where it left off. It may be a little inefficient, particularly if you haven't passed --partial (included in -P), because rsync will check all files again and process the file it was interrupted on from scratch.

In short, rsync can safely be killed and will work properly the next time it is started.
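
Since an interrupted run can simply be repeated, a wrapper may stop rsync after a fixed time and leave the rest to the next run. The following is a minimal sketch using only the Python standard library; run_with_time_limit is a hypothetical name, and sending SIGTERM through terminate() is an assumption that mirrors a plain kill of the process.

```python
import subprocess


def run_with_time_limit(cmd, seconds):
    """Run cmd and stop it if it is still running after `seconds`.

    run_with_time_limit is a hypothetical helper for illustration.
    Returns the exit code of cmd, or None if the time limit was reached
    and the process was terminated.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.terminate()  # sends SIGTERM, like a plain `kill <pid>`
        proc.wait()       # reap the child so it does not linger
        return None
```

For example, run_with_time_limit(["rsync", "-azP", "--delete", "/tmp/rtest1", "root@192.168.1.101:/tmp/"], 3600) would give one run at most an hour, assuming key-based login is already configured so that no password prompt appears.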

2. Restricting rsync to a time period

1) stop rsync with kill when the period ends;

2) or use the timeout parameter of pexpect to limit how long it runs;

3) or use find to select the files and folders to transfer, for example by modification time, and then synchronize only those with rsync. Choose the filter to suit the existing business, such as:

find /tmp -name '*' -newermt '2016-03-08' ! -newermt '2016-03-20'
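
The same selection can be sketched in Python when the file list needs further processing before it is given to rsync. The function name files_modified_between is hypothetical. Unlike the find command above, this sketch lists regular files and other non-directory entries only, and it returns paths relative to the root directory.

```python
import os
from datetime import datetime


def files_modified_between(root, start, end):
    """Return paths under root (relative to root) with start < mtime <= end.

    files_modified_between is a hypothetical helper. The window matches
    `-newermt start ! -newermt end`: newer than start, not newer than end.
    """
    lo, hi = start.timestamp(), end.timestamp()
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            mtime = os.lstat(full).st_mtime  # lstat: do not follow symlinks
            if lo < mtime <= hi:
                selected.append(os.path.relpath(full, root))
    return sorted(selected)


if __name__ == "__main__":
    # Same date range as the find example, interpreted in local time.
    for rel in files_modified_between("/tmp", datetime(2016, 3, 8), datetime(2016, 3, 20)):
        print(rel)
```

The resulting list can be written to a file, one path per line, and handed to rsync through its --files-from option, with the root directory given as the source.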

3. Running rsync while a file is still being written (e.g., during recording)

In testing, rsync copied only the part of the file that had been written so far. Running rsync again after the write had finished made the two copies identical.
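
If transferring half-written files is undesirable, one option is to check whether a file has stopped changing before starting rsync. The sketch below is a heuristic only, not something rsync provides: is_stable is a hypothetical helper, and a writer that pauses for longer than the interval would still look finished.

```python
import os
import time


def is_stable(path, interval=2.0):
    """Return True if the size and mtime of path stay the same for `interval` seconds.

    is_stable is a hypothetical helper; it only compares two snapshots of
    the file metadata and cannot know whether the writer is really done.
    """
    before = os.stat(path)
    time.sleep(interval)
    after = os.stat(path)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A caller could skip files for which is_stable() returns False and let the next scheduled run pick them up.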

4. When the number of files exceeds one million, rsync spends a long time scanning for changed files on every synchronization.

