PHP's solutions for concurrent file read/write conflicts

  • 2020-09-28 08:50:13
  • OfStack

For applications where the daily traffic and concurrency are not high, there is generally nothing to worry about: reading and writing a file directly works fine. However, when concurrency is high and we read and write a file, it is very likely that several processes will be operating on the same file at once, and if access to the file is not made exclusive at that point, data can easily be lost.
For example, in an online chat room (assuming the chat content is written to a file), user A and user B may both be saving the file at the same time. A opens the file first and updates the data in it, but B happens to open the same file at that moment and also prepares to update it. By the time A saves the written file, B has already opened it with the old contents, so when B saves the file back, data is lost: B has no idea that the file it opened was changed by A in the meantime, and when B saves its changes, A's update is overwritten.
The solution to this kind of problem is: when one process operates on a file, it first locks the file exclusively, which means that only this process has the right to modify the file. Other processes can still read it without any problem, but any process that tries to update it at that moment is refused. Once the process holding the lock has finished updating the file, it releases the exclusive flag and the file returns to a modifiable state. Likewise, if a process finds the file unlocked when it wants to operate on it, it can safely lock the file and have it to itself.
A typical solution looks like this:


$fp = fopen('/tmp/lock.txt', 'w+');
if (flock($fp, LOCK_EX)) {
    fwrite($fp, "Write something here\n");
    flock($fp, LOCK_UN);
} else {
    echo 'Couldn\'t lock the file!';
}
fclose($fp);

But in PHP, flock does not always work that well. Under heavy concurrency it often seems that the lock is held exclusively and not released promptly, or not released at all, resulting in deadlocks that drive the server's CPU usage up and sometimes even bring the server down completely. This appears to happen on many Linux/Unix systems. So think carefully before using flock.
Is there no solution, then? Not at all. If we use flock() properly, the deadlock problem can certainly be solved. And even if we decide not to use the flock() function, there are still good ways to solve our problem. After collecting and summarizing from my own experience, I have come up with the following approaches.
Scenario 1: Set a timeout when locking the file. A general implementation looks like this:

if ($fp = fopen($fileName, 'a')) {
    $startTime = microtime(true);
    do {
        // LOCK_NB keeps flock() from blocking, so the loop can time out
        $canWrite = flock($fp, LOCK_EX | LOCK_NB);
        if (!$canWrite) {
            usleep(rand(0, 100) * 1000);   // back off 0-100 ms before retrying
        }
    } while (!$canWrite && (microtime(true) - $startTime) < 1);
    if ($canWrite) {
        fwrite($fp, $dataToSave);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}

The timeout here is set to 1 second. If the lock is not acquired, the process keeps retrying (with a short random sleep between attempts) until it gets the right to operate on the file; once the time limit is reached, it must give up immediately and leave the file to other processes.
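For convenience, the snippet above can be wrapped in a small helper. The following is only a sketch: the function name write_with_timeout(), its default 1-second timeout and the example path are my own assumptions, not part of the original article.


// Hedged sketch: the same timeout-lock logic as above, wrapped in a reusable function.
// write_with_timeout() and its defaults are illustrative, not from the article.
function write_with_timeout($fileName, $dataToSave, $timeoutSeconds = 1.0)
{
    if (!($fp = fopen($fileName, 'a'))) {
        return false;
    }
    $startTime = microtime(true);
    do {
        $canWrite = flock($fp, LOCK_EX | LOCK_NB);   // non-blocking attempt
        if (!$canWrite) {
            usleep(rand(0, 100) * 1000);             // 0-100 ms back-off
        }
    } while (!$canWrite && (microtime(true) - $startTime) < $timeoutSeconds);
    if ($canWrite) {
        fwrite($fp, $dataToSave);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $canWrite;
}

// Example call with placeholder values:
write_with_timeout('/tmp/chat.log', "hello world\n");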

Scenario 2: Do not use the flock function; use a temporary file to resolve read/write conflicts instead. The general idea is as follows:
(1) Copy the file that needs to be updated into a temporary-file directory, save the file's last modification time into a variable, and give the temporary file a random name that is unlikely to repeat.
(2) After updating the temporary file, check whether the original file's last modification time is still the same as the time saved earlier.
(3) If the two modification times are the same, rename the modified temporary file to the original file name. To keep the file status in sync, clear the file status cache.
(4) If the two modification times differ, however, the original file was modified in the meantime; in that case delete the temporary file and return false, indicating that another process was operating on the file.
The implementation code is as follows:


$dir_fileopen = 'tmp';

function randomid() {
    return time() . substr(md5(microtime()), 0, rand(5, 12));
}

// Open $filename indirectly: copy it to a uniquely named temporary file,
// remember its last modification time, and work on the copy.
function cfopen($filename, $mode) {
    global $dir_fileopen;
    clearstatcache();
    do {
        $id = md5(randomid());
        $tempfilename = $dir_fileopen . '/' . $id . md5($filename);
    } while (file_exists($tempfilename));
    if (file_exists($filename)) {
        $newfile = false;
        copy($filename, $tempfilename);
    } else {
        $newfile = true;
    }
    $fp = fopen($tempfilename, $mode);
    return $fp ? array($fp, $filename, $id, @filemtime($filename), $newfile) : false;
}

function cfwrite($fp, $string) {
    return fwrite($fp[0], $string);
}

// Close the temporary file and, if the original has not changed in the meantime,
// move the temporary file over the original.
function cfclose($fp, $debug = 'off') {
    global $dir_fileopen;
    $success = fclose($fp[0]);
    clearstatcache();
    $tempfilename = $dir_fileopen . '/' . $fp[2] . md5($fp[1]);
    if ((@filemtime($fp[1]) == $fp[3]) || ($fp[4] == true && !file_exists($fp[1]))) {
        rename($tempfilename, $fp[1]);
    } else {
        unlink($tempfilename);
        // another process modified the target file while we were writing,
        // so the current process's update is discarded
        $success = false;
    }
    return $success;
}

$fp = cfopen('lock.txt', 'a+');
cfwrite($fp, "welcome to beijing.\n");
cfclose($fp, 'on');

The functions used in the above code need some explanation:
(1) rename(): renames a file or directory, much like mv on Linux, and is a convenient way to change a file's or directory's path or name. However, when I tested the code above on Windows, a notice was raised whenever the new file name already existed, complaining that the file exists; on Linux it worked fine. (A small workaround sketch follows after these notes.)
(2) clearstatcache(): PHP caches file attribute information to improve performance, but when several processes are deleting or updating files, PHP may not have refreshed the cached attributes in time, which easily leads to reading a last-modification time that is no longer accurate. So this function is needed to clear the stale cache.
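If the Windows behaviour of rename() mentioned in point (1) is a concern, one common workaround is to remove the destination before renaming. The sketch below is only an illustration; the helper name safe_rename() is made up and is not used by the code above.


// Minimal sketch of an "overwrite-friendly" rename for Windows:
// delete the destination first if it exists, then rename.
// safe_rename() is an illustrative name, not part of the code above.
function safe_rename($from, $to)
{
    if (file_exists($to)) {
        @unlink($to);   // some Windows/PHP combinations refuse to rename over an existing file
    }
    return @rename($from, $to);
}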

Scenario 3: Read and write randomly chosen files, to reduce the probability of concurrent access to any single file.
This approach seems most common when logging user access. First we define a random space: the larger the space, the lower the chance of a collision. Suppose the random read/write space is [1, 500]; then the log files are spread over log1 through log500. On each user visit, the data is written at random to one of the files between log1 and log500. If two processes are logging at the same time, process A may be updating log32; for process B to also pick log32, the probability is roughly 1 in 500, which is close to zero. When the access logs need to be analysed, we simply merge the log files first and then analyse them. One benefit of logging this way is that processes rarely have to queue for the same file, which lets each operation finish quickly.
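As an illustration of this scheme, a minimal sketch follows; the ./logs directory, the [1, 500] range taken from the text above, and the function names write_random_log() and merge_logs() are my own assumptions.


// Sketch of scenario 3: spread writes at random over log1 ... log500,
// then merge the pieces when the logs need to be analysed.
// Assumes the directory ./logs already exists; names are illustrative.
function write_random_log($line, $dir = './logs', $buckets = 500)
{
    $file = $dir . '/log' . rand(1, $buckets);
    // FILE_APPEND | LOCK_EX appends one line; collisions are rare because
    // the writes are spread over 500 different files.
    return file_put_contents($file, $line, FILE_APPEND | LOCK_EX) !== false;
}

function merge_logs($dir = './logs', $target = './merged.log')
{
    $out = fopen($target, 'a');
    foreach (glob($dir . '/log*') as $file) {
        fwrite($out, file_get_contents($file));
    }
    fclose($out);
}

// One call per user visit, for example:
write_random_log(date('c') . " 127.0.0.1 /index.php\n");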

Scenario 4: Put all the operations to be performed into a queue, and let a single service carry out the actual file operations. Each entry in the queue describes one specific operation, so the service only needs to take entries from the queue one at a time and execute them. If many processes want to operate on the file, that is no problem: they simply join the end of the queue, and as long as they are willing to wait, it does not matter how long the queue gets.
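One simple way to realise such a queue without extra infrastructure is a spool directory: each writer drops its request into a uniquely named file, and a single worker process is the only one that ever touches the target file. The following sketch is only illustrative; the ./queue directory and the names queue_push() and queue_worker() are assumptions of mine.


// Sketch of scenario 4: a spool directory acts as the queue.
// Writers only create small, uniquely named request files; one worker
// appends them to the target file, so the target is never written concurrently.
// Assumes the directory ./queue already exists; names are illustrative.
function queue_push($data, $queueDir = './queue')
{
    $tmp = $queueDir . '/' . uniqid('job_', true) . '.tmp';
    file_put_contents($tmp, $data);
    // rename() is atomic, so the worker never sees a half-written job file
    rename($tmp, substr($tmp, 0, -4) . '.job');
}

function queue_worker($target, $queueDir = './queue')
{
    while (true) {
        $jobs = glob($queueDir . '/*.job');
        sort($jobs);                 // uniqid()-based names sort roughly by creation time
        foreach ($jobs as $job) {
            file_put_contents($target, file_get_contents($job), FILE_APPEND);
            unlink($job);
        }
        usleep(100000);              // idle 100 ms between scans
    }
}

// Writers call queue_push("one line of data\n");
// a single long-running process calls queue_worker('lock.txt').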

Each of the schemes above has its own advantages. They fall roughly into two categories:
(1) those that require queuing (slower), such as schemes 1, 2 and 4;
(2) those that do not require queuing (faster), such as scheme 3.
When designing a caching system, we generally do not use scheme 3, because in scheme 3 the writer and the program that reads and analyses the data are out of step: at write time nothing considers how hard reading will be later, the data is simply written. Imagine updating a cache with random file writes: reading the cache back would then require touching many files, adding a lot of overhead. Schemes 1 and 2 are quite different: although writing may have to wait (retrying when the lock cannot be obtained), reading the file is very convenient. Since the whole point of adding a cache is to reduce the data-read bottleneck and improve system performance, this matters.
The above is a summary of my personal experience and some collected material; if anything is wrong or missing, corrections from fellow developers are welcome.

