Write a simple FUSE file system tutorial with Python

  • 2020-05-05 11:23:48
  • OfStack

If you're a long-time reader of mine, you should know that I'm looking for a perfect backup program, and I ended up writing my own layer of encryption based on bup.

While writing encbup, I wasn't happy with the fact that I had to download an entire huge archive just to restore a file, but I still wanted to use EncFS with rdiff-backup for remote mount, encrypt, deduplication, and version backup.

After trying obnam again (it's still surprisingly slow, to paraphrase), I noticed that it had an mount command. After further research, I found fuse-python and fusepy, and felt that writing an FUSE filesystem with Python should be easy.

Smart readers may have realized what I was going to do next: I decided to write an encrypted file system layer in Python! It will be very similar to EncFS, but with some important differences:

      runs in reverse mode by default, receiving normal files and exposing an encrypted directory. Any backup program will find (and back up) these encrypted directories without any additional storage.       also accepts configuration files composed of a list of directories and exposes these directories at the mount point. In this case, all backup scripts need to back up the hardpoints, and the various directories are backed up immediately.       will favor backup over encrypted storage. It should be fun to write.

an FUSE file system example

The first step in writing this script is to write a purely pass-through file system. It simply takes a directory and exposes it at the mount point, ensuring that any changes at the mount point are mirrored to the source data.

fusepy requires you to write a class that defines various operating system-level methods. You can choose to define the methods that your filesystem wants to support and leave the rest undefined, but I need to define all the methods because my filesystem is a pass-through filesystem that should behave as much as possible as the original filesystem.

Writing this code is a lot of fun and easy, because most of the methods are simple encapsulations of the os module (sure, you can assign them directly, open= os.open, etc., but my module needs some path extensions). Unfortunately, fuse-python has an bug (as far as I know) that when a file is opened and read, it cannot send the file handle back to the file system. My script fails because it does not know which file handle corresponds to when an application performs a read and write operation. With very few changes to fusepy, it works just fine. It only has one file, so you can put it directly into your project.
code

Here, I'm happy to give you this code to refer to when you're going to implement your own file system. This code provides a good starting point where you can copy the class directly into your project and override some of its methods as needed.

Here's the real code:


#!/usr/bin/env python
 
from __future__ import with_statement
 
import os
import sys
import errno
 
from fuse import FUSE, FuseOSError, Operations
 
class Passthrough(Operations):
  def __init__(self, root):
    self.root = root
 
  # Helpers
  # =======
 
  def _full_path(self, partial):
    if partial.startswith("/"):
      partial = partial[1:]
    path = os.path.join(self.root, partial)
    return path
 
  # Filesystem methods
  # ==================
 
  def access(self, path, mode):
    full_path = self._full_path(path)
    if not os.access(full_path, mode):
      raise FuseOSError(errno.EACCES)
 
  def chmod(self, path, mode):
    full_path = self._full_path(path)
    return os.chmod(full_path, mode)
 
  def chown(self, path, uid, gid):
    full_path = self._full_path(path)
    return os.chown(full_path, uid, gid)
 
  def getattr(self, path, fh=None):
    full_path = self._full_path(path)
    st = os.lstat(full_path)
    return dict((key, getattr(st, key)) for key in ('st_atime', 'st_ctime',
           'st_gid', 'st_mode', 'st_mtime', 'st_nlink', 'st_size', 'st_uid'))
 
  def readdir(self, path, fh):
    full_path = self._full_path(path)
 
    dirents = ['.', '..']
    if os.path.isdir(full_path):
      dirents.extend(os.listdir(full_path))
    for r in dirents:
      yield r
 
  def readlink(self, path):
    pathname = os.readlink(self._full_path(path))
    if pathname.startswith("/"):
      # Path name is absolute, sanitize it.
      return os.path.relpath(pathname, self.root)
    else:
      return pathname
 
  def mknod(self, path, mode, dev):
    return os.mknod(self._full_path(path), mode, dev)
 
  def rmdir(self, path):
    full_path = self._full_path(path)
    return os.rmdir(full_path)
 
  def mkdir(self, path, mode):
    return os.mkdir(self._full_path(path), mode)
 
  def statfs(self, path):
    full_path = self._full_path(path)
    stv = os.statvfs(full_path)
    return dict((key, getattr(stv, key)) for key in ('f_bavail', 'f_bfree',
      'f_blocks', 'f_bsize', 'f_favail', 'f_ffree', 'f_files', 'f_flag',
      'f_frsize', 'f_namemax'))
 
  def unlink(self, path):
    return os.unlink(self._full_path(path))
 
  def symlink(self, target, name):
    return os.symlink(self._full_path(target), self._full_path(name))
 
  def rename(self, old, new):
    return os.rename(self._full_path(old), self._full_path(new))
 
  def link(self, target, name):
    return os.link(self._full_path(target), self._full_path(name))
 
  def utimens(self, path, times=None):
    return os.utime(self._full_path(path), times)
 
  # File methods
  # ============
 
  def open(self, path, flags):
    full_path = self._full_path(path)
    return os.open(full_path, flags)
 
  def create(self, path, mode, fi=None):
    full_path = self._full_path(path)
    return os.open(full_path, os.O_WRONLY | os.O_CREAT, mode)
 
  def read(self, path, length, offset, fh):
    os.lseek(fh, offset, os.SEEK_SET)
    return os.read(fh, length)
 
  def write(self, path, buf, offset, fh):
    os.lseek(fh, offset, os.SEEK_SET)
    return os.write(fh, buf)
 
  def truncate(self, path, length, fh=None):
    full_path = self._full_path(path)
    with open(full_path, 'r+') as f:
      f.truncate(length)
 
  def flush(self, path, fh):
    return os.fsync(fh)
 
  def release(self, path, fh):
    return os.close(fh)
 
  def fsync(self, path, fdatasync, fh):
    return self.flush(path, fh)
 
def main(mountpoint, root):
  FUSE(Passthrough(root), mountpoint, foreground=True)
 
if __name__ == '__main__':
  main(sys.argv[2], sys.argv[1])

If you want to run it, just install fusepy, put the code in a file (like myfuse.py) and run python myfuse.py/your directory/mount point directory. You will find that all files in the/your directory path go to the/hardpoint directory and can be manipulated as if they were a native file system.
conclusion

In general, I don't think writing a file system is that easy. The next step is to add the ability to encrypt/decrypt the script, as well as some helper methods. My goal is to make it a complete alternative to EncFS in addition to being more extensible (because it's written in Python) and including some additional features for backup files.

If you would like to follow up on this script, please subscribe to my mailing list below or follow me at Twitter. As always, feedback is welcome (comment below).


Related articles: