Copying large files (single process and multi-process) using mmap

  • 2020-06-19 11:15:47
  • OfStack

This article shows how to copy large files with mmap, for your reference. The details are as follows.

A typical file copy proceeds as follows:

1. Read the contents of the source file (fread).
2. Write them to the new file (fwrite).
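
For comparison, the two steps above can be sketched roughly as follows. The helper name copy_with_stdio and the 64 KiB buffer size are assumptions made for illustration; they do not come from the original article.

```c
#include <stdio.h>

/* Hypothetical helper: copy src to dst through a fixed-size buffer.
 * Returns 0 on success and -1 on any error. */
int copy_with_stdio(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[64 * 1024];            /* buffer size is an arbitrary choice */
    size_t n;
    int rc = 0;
    /* Step 1: read a block; step 2: write that block to the new file. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)           /* fclose flushes, so it can fail too */
        rc = -1;
    return rc;
}
```

Every block passes through a user-space buffer on its way from one file to the other, which is the extra copy that the mmap approach avoids.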

Copying a file with mmap proceeds as follows:

1. Map both the source file and the new file into memory with mmap.
2. Copy the contents of the memory mapped from the source file to the memory mapped from the new file.

With the basics in mind, let's look at how to do it. This article covers only copying large files with mmap.

Specific approach

Here are some details to note when using mmap:

The file must be at least as large as the memory-mapped area; accessing a mapped page that lies entirely beyond the end of the file raises SIGBUS. A newly created file is empty, so use the file truncation function (ftruncate) to make its size the same as the size of the source file. The new file and the source file can then both be mapped with areas of the same size.
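
The size requirement can be illustrated with a minimal sketch. The helper name create_and_fill is hypothetical, and the sketch assumes size is greater than zero, because mmap rejects a zero length.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, grow it to `size` bytes with
 * ftruncate, map it, and fill the mapping with `fill`.
 * Returns the resulting file size, or -1 on error. */
long create_and_fill(const char *path, size_t size, char fill)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* A freshly created file is 0 bytes long. Without this call the
     * memset below would touch pages past the end of the file. */
    if (ftruncate(fd, (off_t)size) == -1) {
        close(fd);
        return -1;
    }

    char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(mem, fill, size);        /* writes land in the file (MAP_SHARED) */
    munmap(mem, size);

    struct stat st;
    long result = (fstat(fd, &st) == 0) ? (long)st.st_size : -1;
    close(fd);
    return result;
}
```

Calling create_and_fill("out.bin", 4096, 'x') should leave a 4096-byte file filled with 'x'; the file name here is only an example.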
Below, mmap is used to copy a file first with a single process and then with multiple processes.

Single process mmap file copy


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
/*  Command-line usage: 
 *   mycopy <source file path> <destination file path>
 *
 *  Functions that may be needed: open\read\write\close
 *  Basic flow: open the source file, read its content, write it to another file, close the files 
 *  Core approach: use mmap (the fastest method of interprocess communication) 
 *  How to get the file size: see get_file_byte_num below 
 *
 */
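unsigned long get_file_byte_num(const char * filename)
{
 FILE *fp = fopen(filename, "rb");
 if (fp == NULL)
 {
  perror("fopen");
  exit(EXIT_FAILURE);
 }
 fseek(fp, 0, SEEK_END);
 long size = ftell(fp);
 fclose(fp); //  Close the stream so the descriptor is not leaked 
 return size < 0 ? 0 : (unsigned long)size;
}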
int main(int argc, char ** argv)
{
 //  Parse the arguments to obtain the source and destination file names 
 if (argc < 3)
 {
  fprintf(stderr, "usage: %s <source file> <destination file>\n", argv[0]);
  return 1;
 }
 const char *sname = argv[1];
 const char *tname = argv[2];
 //  Calculate the size of the source file in bytes 
 unsigned long byte_num = get_file_byte_num(sname);
 printf("The length of the file is %lu bytes\n", byte_num);
 //-------- Establish the mmap mapping areas --------------
 //  The source is only read; the destination is created if it doesn't exist 
 int fd = open(sname, O_RDONLY);
 int tfd = open(tname, O_RDWR|O_CREAT, 0644);
 if (fd == -1 || tfd == -1)
 {
  perror("open");
  return 1;
 }
 //  The file must be at least as large as the mapped area 
 if (ftruncate(tfd, byte_num) == -1)
 {
  perror("ftruncate");
  return 1;
 }
 if (byte_num == 0) //  mmap rejects a zero length; the empty destination is already complete 
  return 0;

 char *mem =(char*) mmap(NULL, byte_num, PROT_READ, MAP_SHARED, fd, 0);
 if (mem == MAP_FAILED)
 {
  perror("mmap err");
  return 1;
 }
 char *tmem =(char*) mmap(NULL, byte_num, PROT_WRITE|PROT_READ, MAP_SHARED, tfd, 0);
 if (tmem == MAP_FAILED)
 {
  perror("mmap err");
  return 1;
 }

 close(fd); //  Once the memory map is established, the file descriptor can be closed 
 close(tfd);

 memcpy(tmem, mem, byte_num);

 //  Release both mappings once the copy is finished 
 munmap(mem, byte_num);
 munmap(tmem, byte_num);
 return 0;
}

Multi-process mmap file copy

Multi-process mmap file copying simply divides the copy among several processes; the core idea remains the same.
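
How the bytes might be divided can be sketched on its own before looking at the full program. The helper chunk_range is a hypothetical name introduced here; it gives every process an equal share and lets the last one also take the remainder, which is one simple way to split the work.

```c
#include <stddef.h>

/* Hypothetical helper: compute the byte range handled by process `i`
 * (0-based) when `total` bytes are split among `num_proc` processes.
 * Every process gets total / num_proc bytes; the last one also takes
 * whatever is left over when the division is not exact. */
void chunk_range(size_t total, int num_proc, int i,
                 size_t *offset, size_t *len)
{
    size_t each = total / (size_t)num_proc;

    *offset = each * (size_t)i;
    if (i == num_proc - 1)
        *len = total - each * (size_t)(num_proc - 1);
    else
        *len = each;
}
```

For a 103-byte file and 5 processes, the first four processes copy 20 bytes each and the last one copies the remaining 23 bytes, starting at offset 80.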


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
/*  Command-line usage: 
*   mycopy <source file path> <destination file path>
*
*  Functions that may be needed: open\read\write\close
*  Basic flow: open the source file, read its content, write it to another file, close the files 
*  Core approach: use mmap (the fastest method of interprocess communication) 
*  How to get the file size: see get_file_byte_num below 
*
*/
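//  Gets the number of bytes in the file 
unsigned long get_file_byte_num(const char * filename)
{
 FILE *fp = fopen(filename, "rb");
 if (fp == NULL)
 {
  perror("fopen");
  exit(EXIT_FAILURE);
 }
 fseek(fp, 0, SEEK_END);
 long size = ftell(fp);
 fclose(fp); //  Close the stream so the descriptor is not leaked 
 return size < 0 ? 0 : (unsigned long)size;
}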

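//  SIGCHLD signal handler: reaps every child that has already exited 
void sigchld_handle(int a)
{
 pid_t pid;
 //  WNOHANG keeps the handler from blocking; several exits may share one signal 
 while ((pid = waitpid(-1, NULL, WNOHANG)) > 0)
 {
  //  printf is not async-signal-safe; it is kept here for demonstration only 
  printf("Reaped child process %d\n", pid);
 }
}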

int main(int argc, char ** argv)
{
 //  Parse the arguments to obtain the source and destination file names 
 if (argc < 3)
 {
  fprintf(stderr, "usage: %s <source file> <destination file>\n", argv[0]);
  return 1;
 }
 const char *sname = argv[1];
 const char *tname = argv[2];
 //  Calculate the size of the source file in bytes 
 unsigned long byte_num = get_file_byte_num(sname);
 printf("The length of the file is %lu bytes\n", byte_num);
 //-------- Establish the mmap mapping areas --------------
 //  The source is only read; the destination is created if it doesn't exist 
 int fd = open(sname, O_RDONLY);
 int tfd = open(tname, O_RDWR|O_CREAT, 0644);
 if (fd == -1 || tfd == -1)
 {
  perror("open");
  return 1;
 }
 if (ftruncate(tfd, byte_num) == -1) //  Set the size of the file tfd refers to to byte_num 
 {
  perror("ftruncate");
  return 1;
 }
 if (byte_num == 0) //  mmap rejects a zero length; the empty destination is already complete 
  return 0;

 char *mem =(char*) mmap(NULL, byte_num, PROT_READ, MAP_SHARED, fd, 0);
 if (mem == MAP_FAILED)
 {
  perror("mmap err");
  return 1;
 }
 char *tmem = (char*)mmap(NULL, byte_num, PROT_WRITE|PROT_READ, MAP_SHARED, tfd, 0);
 if (tmem == MAP_FAILED)
 {
  perror("mmap err");
  return 1;
 }

 close(fd); //  Once the memory map is established, the file descriptor can be closed 
 close(tfd);

 //  Both mappings are inherited across fork, so every process uses the same mem and tmem pointers 
 //  The number of processes is fixed at 5. 
 //  Each process copies byte_num/num_proc bytes; the division might not be exact, 
 //  so the last process (the parent) also copies the remaining bytes. 
 const int num_proc = 5;
 const unsigned long each_proc_byte = byte_num/num_proc;
 const unsigned long last_proc_byte = byte_num - each_proc_byte*(num_proc - 1);

 //  Block SIGCHLD until the parent has installed its handler 
 sigset_t set;
 sigemptyset(&set); //  Initialize the set 
 sigaddset(&set, SIGCHLD);
 sigprocmask(SIG_BLOCK, &set, NULL);

 //  Create the child processes in a loop 
 int i;
 pid_t pid;
 for(i = 0; i < num_proc - 1; ++i)
 {
  if ((pid = fork())==-1)
  {
   perror("fork error");
   return 1;
  }
  if (pid == 0)
   break; //  A child leaves the loop with its own index i 
 }

 // ------- Specific copy process ---------
 if (i == num_proc - 1) //  The parent process 
 {
  //  Install the SIGCHLD handler, then unblock the signal 
  struct sigaction act;
  act.sa_handler = sigchld_handle;
  act.sa_flags = 0;
  sigemptyset(&act.sa_mask); //  No other signals are blocked while the handler runs 
  sigaction(SIGCHLD, &act, NULL);
  sigprocmask(SIG_UNBLOCK, &set, NULL);

  memcpy(tmem + each_proc_byte*i, mem + each_proc_byte*i, last_proc_byte);
 }else
 {
  memcpy(tmem + each_proc_byte*i, mem + each_proc_byte*i, each_proc_byte);
 }
 //  The parent does not block waiting for the children: those that finish 
 //  first are reaped by the handler, and any that outlive it are adopted by init. 
 //  Every process releases its own mappings. 
 munmap(mem, byte_num);
 munmap(tmem, byte_num);
 return 0;
}

A few questions

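1. When the main process calls munmap, does it affect the copying that the other processes do through mmap?

No. After fork, the user-space memory of the parent and child processes is shared for reading and copied on writing, but each process has its own address space. An area mapped with mmap is user-space memory in that address space, so munmap in the main process removes only the main process's own mapping and leaves the children's mappings intact. I think the best way to deal with it is for each process to call munmap itself.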
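
This behaviour can be checked with a small experiment, sketched below under a few assumptions: the helper name parent_unmap_then_child_writes is hypothetical, a one-byte file stands in for the copied file, and a pipe is used only to make sure the child writes after the parent has already unmapped.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent unmaps a shared file mapping first,
 * then the child writes through its own inherited mapping.
 * Returns the byte found in the file afterwards, or -1 on error. */
int parent_unmap_then_child_writes(const char *path)
{
    int go[2];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1 || ftruncate(fd, 1) == -1 || pipe(go) == -1)
        return -1;

    char *mem = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                       /* child */
        char c;
        close(go[1]);
        if (read(go[0], &c, 1) != 1)      /* wait until the parent has unmapped */
            _exit(1);
        mem[0] = 'C';                     /* the child's mapping is still valid */
        munmap(mem, 1);
        _exit(0);
    }

    close(go[0]);
    munmap(mem, 1);                       /* removes only the parent's mapping */
    if (write(go[1], "g", 1) != 1)        /* tell the child to go ahead */
        return -1;
    close(go[1]);
    waitpid(pid, NULL, 0);

    char byte = 0;
    ssize_t n = pread(fd, &byte, 1, 0);   /* read back what the child wrote */
    close(fd);
    return n == 1 ? byte : -1;
}
```

If munmap in the parent also tore down the child's mapping, the child's write would fault and the file would not contain the byte 'C'.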

2. Notes on the sleep() function:

sleep() makes the calling thread sleep until seconds seconds have elapsed or a signal arrives which is not ignored.

That is to say, there are two ways for sleep to end: 1. The sleep time is up. 2. The process calling sleep receives a signal that is neither blocked nor ignored. Once the signal handler returns, sleep returns as well, without waiting for the remaining time.

3. A child process may finish later than the main process, and the main process should not block waiting for its children (they are reclaimed through signal capture). In this case, how are the child processes reclaimed?

First, the main process installs the signal handler that reclaims child processes before it starts copying. Second, reclaiming is optional: if the main process exits first, the remaining child processes are handed over to the init process, which reclaims them. So you don't have to worry about reclaiming them.

