Details of stable and reliable file operations using Python


Consider the following Python snippet: it performs some operation on the data in a file and then saves the result back to the file:


with open(filename) as f:
    data = f.read()           # read the current contents
output = do_something(data)   # transform the data
with open(filename, 'w') as f:
    f.write(output)           # write the result back

Seems simple, right? It is not as simple as it first appears. While debugging applications on production servers, I have repeatedly run into strange behavior.
Here are examples of the failure modes I have seen:
An out-of-control server process floods the log and the disk fills up. write() raises an exception after the file has already been truncated, leaving the file empty.
Several instances of the application run in parallel, and when they finish, the file contains gibberish because the output of the instances got interleaved.
After the write completes, the application triggers follow-up actions elsewhere. A few seconds later the power goes out. After we restart the server, we see the old file contents again: the data already handed to other applications no longer matches what we see in the file.
There's nothing new here. The purpose of this article is to provide common methods and techniques for Python developers who are inexperienced in system programming. I will provide code examples so that developers can easily apply these methods to their own code.
What does "reliability" mean?
Broadly speaking, reliability means that an operation performs its required function under all specified conditions. For file operations, that function is creating, replacing, or appending to the contents of a file. We can draw inspiration from database theory here: the ACID properties of the classical transaction model serve as a guide for improving reliability.
Before we get started, let's look at how our example relates to the four ACID properties:
Atomicity requires that a transaction either succeeds completely or fails completely. In the example above, a full disk may cause only part of the content to be written to the file. In addition, if other programs read the file while it is being written, they may get a partially completed version or even cause write errors.
Consistency means that the operation takes the system from one valid state to another. Consistency can be split into two parts: internal and external consistency. Internal consistency means that the file's own data structures are consistent. External consistency means that the file's content agrees with the other data related to it. Since we don't know anything about the application here, it is hard to reason about external consistency. But because consistency requires atomicity, we can at least say that internal consistency is not guaranteed.
Isolation is violated if concurrently executing transactions can produce results that differ from every serial execution. It is clear that the code above offers no protection against lost updates or other isolation failures.
Durability means that changes are permanent. Before we tell the user that an operation succeeded, we must be sure that the data has actually reached persistent storage and not just a write cache. The code above assumes that the data has been written successfully once write() returns, i.e. that disk I/O happens immediately. But the POSIX standard gives no such guarantee.
Use database systems whenever possible
If we can achieve all four ACID properties, we have come a long way toward better reliability. But it takes a lot of coding effort. Why reinvent the wheel? Most database systems already provide ACID transactions.
Reliable data storage is a solved problem. If you need reliable storage, use a database. Chances are that, short of decades of work, you will not solve this as well on your own as those who have been working on it for years. If you don't want to install a big database server, you can use SQLite: it has ACID transactions, is small and free, and is included in Python's standard library as the sqlite3 module.
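As a small illustration, here is a minimal sketch of reliable storage via the standard library's sqlite3 module (the state.db file name and the kv table are made up for this example):


import sqlite3

con = sqlite3.connect('state.db')
with con:  # as a context manager, the connection commits on success or rolls back on error
    con.execute('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')
    con.execute('INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)', ('answer', '42'))
con.close()
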
The article could end there, but there are good reasons not to use a database. They usually come down to constraints on file format or file location, neither of which is easy to control in a database system. Among the reasons:
We have to process files in a fixed format or at a fixed location that are produced by other applications;
We have to write files for other applications to consume (under the same constraints);
Our files must be human-readable or easy to modify.

Here are some programming techniques to consider if you are going to implement reliable file updates yourself. I will show four common patterns for updating files. After that, I will discuss what steps you can take to satisfy the ACID properties within each update pattern.
File update patterns
Files can be updated in many ways, but I think there are at least four common patterns. These four patterns will form the basis for the rest of this article.
Truncate and write
This is probably the most basic pattern. In the following example, the hypothetical domain model code reads the data, performs some calculations, and then re-opens the existing file in write mode:


with open(filename, 'r') as f:
   model.read(f)
model.process()
with open(filename, 'w') as f:
   model.write(f)

A variant of this pattern opens the file in read-write mode (one of Python's "plus" modes, such as 'a+'), seeks to the start, explicitly calls truncate(), and rewrites the contents of the file:


with open(filename, 'a+') as f:
    f.seek(0)              # 'a+' positions us at the end of the file; go back to the start
    model.input(f.read())  # read the current contents
    model.compute()
    f.seek(0)
    f.truncate()           # discard the old contents
    f.write(model.output())  # in append mode every write lands at the end of the
                             # file, which is position 0 right after the truncate

The advantage of this variant is that the file is opened only once and kept open the whole time. This simplifies locking, for example.
Write and replace
Another widely used pattern is to write new content to a temporary file and then replace the original file:


import os
import tempfile

with tempfile.NamedTemporaryFile(
        'w', dir=os.path.dirname(filename), delete=False) as tf:
    tf.write(model.output())      # write the new content to a temporary file
    tempname = tf.name            # created in the same directory, so the rename
                                  # below never crosses file system boundaries
os.rename(tempname, filename)     # atomically replace the old file

This method is more robust against errors than the truncate-write method; see the discussion of atomicity and consistency below. Many applications use it.
These two patterns are so common that the ext4 file system in the Linux kernel even detects them automatically and fixes some reliability shortcomings on its own. But don't rely on this feature: you are not always on ext4, and administrators may turn it off.
Append

The third pattern is to append new data to an existing file:


with open(filename, 'a') as f:
   f.write(model.output())

This pattern is used for writing log files and for other cumulative data-processing tasks. Technically, it is remarkably simple. An interesting extension is to perform only append operations during normal use and to compact the file periodically by rewriting it, as sketched below.
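A minimal sketch of such a compaction step, reusing the write-replace pattern from above (merge_records is a hypothetical function that folds the accumulated records into their minimal equivalent):


import os
import tempfile

def compact(filename, merge_records):
    with open(filename) as f:
        records = f.readlines()     # all appended records so far
    with tempfile.NamedTemporaryFile(
            'w', dir=os.path.dirname(filename), delete=False) as tf:
        tf.writelines(merge_records(records))   # write the compacted form
        tempname = tf.name
    os.rename(tempname, filename)   # atomically replace the long file
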
Spooldir
Here we use the directory as a logical data store and create a new uniquely named file for each record:


with open(unique_filename(), 'w') as f:
   f.write(model.output())

This pattern has the same cumulative character as the append pattern. A big advantage is that we can put small amounts of metadata into the file name; for example, this can be used to convey information about processing status. A particularly clever implementation of the spooldir pattern is the maildir format. Maildir uses a naming scheme, plus additional subdirectories, to perform updates in a reliable, lock-free manner. The md and gocept.filestore libraries provide convenient wrappers for maildir operations.
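For illustration, here is a sketch of the kind of unique name generation this pattern depends on, loosely inspired by maildir's scheme (this is my own simplified variant, not the actual maildir algorithm):


import os
import socket
import time

_counter = 0

def unique_filename():
    # Combine timestamp, process id, a per-process counter, and the host
    # name so that concurrent writers practically cannot collide.
    global _counter
    _counter += 1
    return '{:.6f}.{}_{}.{}'.format(
        time.time(), os.getpid(), _counter, socket.gethostname())
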
If your file name generation does not guarantee uniqueness, it may even be appropriate to demand that the file really is new. In that case, call the lower-level os.open() with the appropriate flags:


import os

# O_EXCL together with O_CREAT makes the call fail if the file already exists
fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
with os.fdopen(fd, 'w') as f:
    f.write(...)

After opening the file with O_EXCL, we use os.fdopen() to turn the raw file descriptor into an ordinary Python file object.
Apply ACID properties to file updates
Next, I will take each of the file update patterns in turn and look at what can be done to satisfy the ACID properties. I will keep things as simple as possible, since we are not aiming to write a complete database system. Note that the material in this section is not exhaustive, but it can serve as a good starting point for your own experiments.
Atomicity
The write-replace pattern gives you atomicity, because the underlying os.rename() is atomic. That means that at any given point in time, a process sees either the old file or the new file. This pattern is also naturally robust against write errors: if the write operation raises an exception, the rename is never performed, so there is no risk of overwriting a correct old file with a corrupted new one.
The append pattern is not atomic, because there is the risk of appending incomplete records. But there is a trick to make updates atomic anyway: annotate each written record with a checksum. When reading the log later, ignore all records that do not carry a valid checksum. In this way, only complete records are processed. In the following example, an application makes periodic measurements and appends one line of JSON per measurement to a log. We compute the CRC32 checksum over the byte representation of the record and append it to the same line:


import json
import random
import time
import zlib

with open(logfile, 'ab') as f:
    for i in range(3):
        measure = {'timestamp': time.time(), 'value': random.random()}
        record = json.dumps(measure).encode()
        checksum = '{:8x}'.format(zlib.crc32(record)).encode()
        f.write(record + b' ' + checksum + b'\n')

The example code simulates the measurement by creating a random value each time.


$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.232021160265939} d229d937

To process the log file, we read one record per line, split off the checksum, and compare it against the record we read:


import json
import zlib

with open(logfile, 'rb') as f:
    for line in f:
        record, checksum = line.strip().rsplit(b' ', 1)
        if checksum.decode() == '{:8x}'.format(zlib.crc32(record)):
            print('read measure: {}'.format(json.loads(record.decode())))
        else:
            print('checksum error for record {}'.format(record))


Now we simulate a truncated write by cutting off the last line:


$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.23202

When reading the log, the last incomplete line is rejected:


$ read_checksummed_log.py log
read measure: {'timestamp': 1373396987.258189, 'value': 0.9360123151217828}
read measure: {'timestamp': 1373396987.25825, 'value': 0.40429005476999424}
checksum error for record b'{"timestamp": 1373396987.258291, "value":'

This technique of adding checksums to log records is used by a number of applications, including many database systems.
Individual files in a spooldir can also carry a checksum. Another, probably simpler, approach is to borrow from the write-replace pattern: first write the file to one side, then move it to its final location. Design a naming scheme that protects files that are still being written from being picked up by consumers. In the example below, all files ending in .tmp are ignored by readers and can therefore be used safely during write operations:


newfile = generate_id()
with open(newfile + '.tmp', 'w') as f:
   f.write(model.output())
os.rename(newfile + '.tmp', newfile)

Finally, truncate-write is not atomic. I am unfortunately unable to offer a variant that satisfies atomicity. Right after the truncation, the file is empty and the new content has not yet been written. If a concurrent program reads the file at that moment, or if an exception occurs and the program aborts, we see neither the old version nor the new version.
Consistency
Much of what I said about atomicity also applies to consistency. In fact, atomic updates are a prerequisite for internal consistency. External consistency means updating several files in step with each other. This is not easy to get right; file locks can be used to make sure that read and write access do not interfere. Consider files in one directory that need to be consistent with each other. A common pattern is to designate a lock file that controls access to the whole directory.
An example of a writer:


import fcntl
import os

with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_EX)   # exclusive lock: only one writer at a time
    model.update(dirname)

An example of a reader:


with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_SH)   # shared lock: readers may run concurrently
    model.readall(dirname)

This approach only works if all readers cooperate. Also, since only one writer can be active at a time (an exclusive lock blocks all shared locks), the scalability of this method is limited.
Going one step further, we can apply the write-replace pattern to whole directories: create a new directory for each update, and switch a symbolic link once the update is complete. For example, imagine a mirroring application that maintains a directory of archives plus an index file listing file names, sizes, and checksums. When the upstream mirror is updated, it is not enough to atomically update the archives and the index file in isolation; instead, we must offer matching versions of the archives and the index file together, to avoid checksum mismatches. To solve this, we maintain one subdirectory per generation and switch a symbolic link to activate a generation:


mirror
|-- 483
|   |-- a.tgz
|   |-- b.tgz
|   `-- index.json
|-- 484
|   |-- a.tgz
|   |-- b.tgz
|   |-- c.tgz
|   `-- index.json
`-- current -> 483

Generation 484 is currently being prepared. When all archives are ready and the index file has been updated, we can switch the current symbolic link in one atomic step: we create the new link under a temporary name and rename it over the old one (os.symlink() alone cannot replace an existing link). Other applications then always see either the completely old or the completely new generation. Readers need to os.chdir() into the current directory and must not refer to files by their full path names; otherwise there is a race condition when a reader opens current/index.json and then current/a.tgz just after the symbolic link has been switched.
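A sketch of the switch itself, using the generation names from the listing above (the temporary link name is my own choice):


import os

os.symlink('484', 'mirror/current.new')            # new link under a temporary name
os.rename('mirror/current.new', 'mirror/current')  # atomically replace the old link
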
Isolation
Isolation means that concurrent updates to the same file are serializable: there is a serial schedule that produces the same result as the parallel schedule that actually ran. "Real" database systems use advanced techniques such as MVCC to stay serializable while allowing high degrees of parallelism. Back in our world of files, we end up using locks to serialize file updates.
Locking truncate-write updates is easy: just acquire an exclusive lock before all file operations. The following example code reads an integer from a file, increments it, and finally updates the file:


import fcntl

def update():
    with open(filename, 'r+') as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # serialize updates: one process at a time
        n = int(f.read())
        n += 1
        f.seek(0)
        f.truncate()
        f.write('{}\n'.format(n))

Locking updates in write-replace mode is a little trickier. Using the lock the same way as in truncate-write can lead to update conflicts. A naive implementation might look like this:


import fcntl
import os
import tempfile

def update():
    with open(filename) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        n = int(f.read())
        n += 1
        with tempfile.NamedTemporaryFile(
                'w', dir=os.path.dirname(filename), delete=False) as tf:
            tf.write('{}\n'.format(n))
            tempname = tf.name
        os.rename(tempname, filename)

What's wrong with this code? Imagine two processes racing to update a file. The first process gets in first, but the second blocks in the fcntl.flock() call. When the first process replaces the file and releases its lock, the file descriptor the second process holds open now points to a "ghost" file (one no longer reachable by any path name) containing the old content. To avoid this conflict, we must check, after acquiring the lock, that the file we opened is still the same file that the path name refers to. The LockedOpen context manager below replaces the built-in open context to make sure we actually have the right file open:


import fcntl
import os

class LockedOpen(object):

    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self.open_args = args
        self.open_kwargs = kwargs
        self.fileobj = None

    def __enter__(self):
        f = open(self.filename, *self.open_args, **self.open_kwargs)
        while True:
            fcntl.flock(f, fcntl.LOCK_EX)
            # Re-open the path and check that we locked the file the path
            # name currently points to; if not, the file was replaced while
            # we waited for the lock, so retry with the new file.
            fnew = open(self.filename, *self.open_args, **self.open_kwargs)
            if os.path.sameopenfile(f.fileno(), fnew.fileno()):
                fnew.close()
                break
            else:
                f.close()
                f = fnew
        self.fileobj = f
        return f

    def __exit__(self, _exc_type, _exc_value, _traceback):
        self.fileobj.close()

Locking append updates is as simple as locking truncate-write updates: acquire an exclusive lock, then append. Long-running processes that keep the file open permanently can release the lock between updates to let others in; see the sketch below.
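A minimal sketch of such a locked append (the function name and arguments are made up for illustration):


import fcntl

def append_record(filename, data):
    with open(filename, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # serialize concurrent appenders
        f.write(data)                  # the lock is released when the file is closed
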
One nice property of the spooldir pattern is that it does not require any locking. On the other hand, it builds on flexible naming conventions and robust file name generation. The maildir specification is a good example of the spooldir pattern, and it can easily be adapted to situations other than handling email.
Durability
Durability is a bit special, because it depends not only on the application but also on the OS and hardware configuration. In theory, we can assume that os.fsync() or os.fdatasync() do not return before the data has reached persistent storage. In practice, we may run into several problems: we may face incomplete fsync implementations or awkward disk controller configurations, neither of which offers any guarantee of durability. There is a detailed discussion by the MySQL developers about where things can go wrong. Some database systems, such as PostgreSQL, even offer a choice of durability mechanisms so that the administrator can select the best one at run time. The rest of us, however, can only use os.fsync() and hope that it is implemented correctly.
In truncate-write mode, we need to sync the file after writing and before closing it. Note that this usually involves another layer of write caching: the glibc buffer holds written data inside the process before it is even passed to the kernel. To empty the glibc buffer as well, call flush() before syncing:


import os

with open(filename, 'w') as f:
    model.write(f)
    f.flush()          # move the data from the user-space buffer into the kernel
    os.fdatasync(f)    # make the kernel write the data to stable storage

Alternatively, you can invoke Python with the -u flag, although that only disables buffering for the standard streams; for an ordinary file, you can pass buffering=0 to open() (in binary mode) to bypass user-space buffering entirely.
Most of the time I prefer os.fdatasync() over os.fsync() in order to avoid synchronous updates of the metadata (ownership, size, mtime, ...). Metadata updates can cause extra disk seek I/O, which slows everything down quite a bit.
Applying the same technique to write-replace style updates is only half the battle. We make sure the newly written file's content is on non-volatile storage before replacing the old file, but what about the replace operation itself? We have no guarantee that the directory update is persisted right away. There is much discussion on the net about how to sync a directory update, but in our case, with the old and new file in the same directory, we can get away with a simple solution:


os.rename(tempname, filename)
dirfd = os.open(os.path.dirname(filename), os.O_DIRECTORY)
os.fsync(dirfd)     # persist the directory entry for the renamed file
os.close(dirfd)

We use the low-level os.open() to open the directory (Python's built-in open() does not support opening directories), and then call os.fsync() on the directory's file descriptor.
Append updates are treated the same way as truncate-write above; a sketch follows.
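Spelled out, a minimal sketch of a durable append, mirroring the truncate-write example:


import os

with open(filename, 'a') as f:
    f.write(model.output())
    f.flush()          # empty the user-space buffer
    os.fdatasync(f)    # force the appended data to stable storage
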
The spooldir pattern has the same directory sync problem as the write-replace pattern. Fortunately, the same solution applies: first sync the file, then sync the directory.

Conclusion
Updating files reliably is possible. I have shown how to satisfy all four ACID properties. The example code shown here serves as a toolkit; pick and combine the techniques that best fit your needs. Sometimes you do not need to satisfy all the ACID properties, maybe just one or two. I hope this article helps you make a well-informed decision about what to implement and what to leave out.

