In-depth analysis: efficient I/O for RandomAccessFile with a 1 KB memory buffer

  • 2020-04-01 01:59:04
  • OfStack

Subject:
The most widely used J2SDK today is still the 1.3 series. Developers on this version who need random access to files have no choice but the RandomAccessFile class, whose I/O performance lags far behind that of other common development languages and seriously hurts program efficiency.
Developers urgently need better performance. Below we analyse the source code of RandomAccessFile and the related stream classes, find the bottleneck, and improve on it, producing a cost-effective random file access class, BufferedRandomAccessFile.
Before improving anything, run a baseline test: copy a 12 MB file byte by byte (so both reading and writing are involved).
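For reference, here is a minimal sketch of the kind of byte-by-byte copy used for the RandomAccessFile row of the test below; the file names and timing scaffolding are illustrative assumptions, not the author's original benchmark code:

import java.io.IOException;
import java.io.RandomAccessFile;

public class RafCopyTest {
    public static void main(String[] args) throws IOException {
        // hypothetical source/destination names; any ~12 MB file will do
        RandomAccessFile in = new RandomAccessFile("src.dat", "r");
        RandomAccessFile out = new RandomAccessFile("des.dat", "rw");
        long len = in.length();
        long start = System.currentTimeMillis();
        for (long i = 0; i < len; i++) {
            out.write(in.read()); // one native disk call per byte, for the read and for the write
        }
        in.close();
        out.close();
        System.out.println("Spend: " + (System.currentTimeMillis() - start) / 1000.0 + "s");
    }
}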

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935

The difference is roughly 32x: RandomAccessFile is far too slow. Let's look at the key parts of both implementations and compare them to find the reason.
1.1. [RandomAccessFile]


public class RandomAccessFile implements DataOutput, DataInput {
 public final byte readByte() throws IOException {
  int ch = this.read();
  if (ch < 0)
   throw new EOFException();
  return (byte)(ch);
 }
 public native int read() throws IOException; 
 public final void writeByte(int v) throws IOException {
  write(v);
 } 
 public native void write(int b) throws IOException; 
}


As the source shows, every single-byte read or write in RandomAccessFile goes straight to a native call, that is, one disk I/O operation per byte.
1.2. [BufferedInputStream]

public class BufferedInputStream extends FilterInputStream {
 private static int defaultBufferSize = 2048; 
 protected byte buf[]; //Create a read cache
 public BufferedInputStream(InputStream in, int size) {
  super(in);        
  if (size <= 0) {
   throw new IllegalArgumentException("Buffer size <= 0");
  }
  buf = new byte[size];
 }
 public synchronized int read() throws IOException {
  ensureOpen();
  if (pos >= count) {
   fill();
   if (pos >= count)
    return -1;
  }
  return buf[pos++] & 0xff; //Read directly from BUF[]
 } 
 private void fill() throws IOException {
  if (markpos < 0)
   pos = 0;
  else if (pos >= buf.length)
   if (markpos > 0) {
    int sz = pos - markpos;
    System.arraycopy(buf, markpos, buf, 0, sz);
    pos = sz;
    markpos = 0;
   } else if (buf.length >= marklimit) {
    markpos = -1;
    pos = 0;
   } else {
    int nsz = pos * 2;
    if (nsz > marklimit)
     nsz = marklimit;
    byte nbuf[] = new byte[nsz];
    System.arraycopy(buf, 0, nbuf, 0, pos);
    buf = nbuf;
   }
  count = pos;
  int n = in.read(buf, pos, buf.length - pos);
  if (n > 0)
   count = n + pos;
 }
}

1.3. [BufferedOutputStream]

public class BufferedOutputStream extends FilterOutputStream {
   protected byte buf[]; //Create a write cache
   public BufferedOutputStream(OutputStream out, int size) {
  super(out);
  if (size <= 0) {
   throw new IllegalArgumentException("Buffer size <= 0");
  }
  buf = new byte[size];
    } 
public synchronized void write(int b) throws IOException {
  if (count >= buf.length) {
      flushBuffer();
  }
  buf[count++] = (byte)b; // write directly into buf[] in memory
   }
   private void flushBuffer() throws IOException {
  if (count > 0) {
   out.write(buf, 0, count);
   count = 0;
  }
   }
}


As you can see, when BufferedInputStream/BufferedOutputStream read or write a byte, they first check whether the requested data is mapped in buf[]. If it is, they read or write buf[] directly in memory; if not, they refill buf[] from the corresponding position on disk and then operate on buf[] in memory, so the vast majority of read/write operations touch only memory.
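For comparison, the fast row in the table above simply wraps the streams in buffers; a minimal sketch of such a copy follows (file names and the 2048-byte buffer size are assumptions, the latter chosen to match defaultBufferSize):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedCopyTest {
    public static void main(String[] args) throws IOException {
        long len = new File("src.dat").length(); // hypothetical file names
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream("src.dat"), 2048));
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream("des.dat"), 2048));
        long start = System.currentTimeMillis();
        for (long i = 0; i < len; i++) {
            out.writeByte(in.readByte()); // almost every call is served from buf[] in memory
        }
        in.close();
        out.close(); // flushes whatever is still sitting in the write buffer
        System.out.println("Spend: " + (System.currentTimeMillis() - start) / 1000.0 + "s");
    }
}
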
1.4. Summary
Memory access times are on the order of nanoseconds (10^-9 s), while disk access times are on the order of milliseconds (10^-3 s), so for the same operation memory is roughly a million times faster than disk. It is therefore predictable that even tens of thousands of extra in-memory operations will cost far less than the disk I/O they save. The buffered classes work exactly this way: they add a buf[] in memory and cut the number of disk I/O operations, improving access efficiency at the price of a little buffer-management overhead. Concretely, the 12 MB byte-by-byte copy needs about 12.6 million per-byte disk reads plus as many writes through RandomAccessFile, but only on the order of 12,000 transfers of 2 KB blocks through the buffered streams. In practice, access efficiency improved about 32-fold.
Following the conclusion of the summary above, let's try adding the same buffered read/write mechanism to the RandomAccessFile class.
A random access class differs from the sequential ones: RandomAccessFile is built by implementing the DataInput/DataOutput interfaces, while the buffered stream classes are extensions of FilterInputStream/FilterOutputStream, so their buffering code cannot simply be copied over.
2.1. Allocate a buffer buf[] (default: 1024 bytes) to serve as a shared read/write buffer; a sketch of the fields that support it is given below.
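The article only lists the key methods below; the fields those methods rely on might be declared roughly as follows. This is a sketch inferred from the names used in the code, not the author's exact declarations, and the constructor shown here (file name, mode, bufbitlen) is an assumption:

import java.io.IOException;
import java.io.RandomAccessFile;

public class BufferedRandomAccessFile extends RandomAccessFile {
    private byte buf[];          // the shared read/write buffer
    private int bufsize;         // buffer capacity, 1024 bytes by default
    private int bufbitlen;       // log2(bufsize), e.g. 10 when bufsize is 1024
    private long bufmask;        // ~((long)bufsize - 1), used later to align positions
    private boolean bufdirty;    // true when buf[] holds data not yet written to disk
    private int bufusedsize;     // number of valid bytes currently in buf[]
    private long bufstartpos;    // file offset mapped to buf[0]
    private long bufendpos;      // file offset mapped to buf[bufsize - 1]
    private long curpos;         // logical file pointer maintained by this class
    private long fileendpos;     // offset of the last byte of the file (-1 if empty)
    private long initfilelen;    // length of the file when it was opened

    public BufferedRandomAccessFile(String name, String mode, int bufbitlen)
            throws IOException {
        super(name, mode);
        this.bufbitlen = bufbitlen;
        this.bufsize = 1 << bufbitlen;            // e.g. pass 10 to get the 1024-byte default
        this.buf = new byte[this.bufsize];
        this.bufmask = ~((long) this.bufsize - 1);
        this.initfilelen = super.length();
        this.fileendpos = this.initfilelen - 1;   // -1 for an empty file
        this.curpos = 0;
        this.bufstartpos = this.bufendpos = -1;   // forces the first access to fill buf[]
        this.bufusedsize = 0;
        this.bufdirty = false;
    }
}
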
2.2. Implement the read buffer first.
Basic principle of the read-buffer logic:
a. We want to read the byte at position pos of the file.
b. Check whether that position is currently mapped in buf[]. If it is, read the byte directly from buf[] and return it.
c. If it is not, relocate buf[] to the block containing pos, fill it with the bufsize bytes of the file around that position, and go back to step b.
The following is the key part of the code and its description:

public class BufferedRandomAccessFile extends RandomAccessFile {
//  byte read(long pos): reads the byte at position pos of the current file.
//  bufstartpos and bufendpos are the first/last file offsets mapped into buf[].
//  curpos is the logical file-pointer offset maintained by this class.
    public byte read(long pos) throws IOException {
        if (pos < this.bufstartpos || pos > this.bufendpos ) {
            this.flushbuf();
            this.seek(pos);
            if ((pos < this.bufstartpos) || (pos > this.bufendpos)) 
                throw new IOException();
        }
        this.curpos = pos;
        return this.buf[(int)(pos - this.bufstartpos)];
    }
// void flushbuf(): if bufdirty is true, writes the data in buf[] that has not yet reached the disk.
    private void flushbuf() throws IOException {
        if (this.bufdirty == true) {
            if (super.getFilePointer() != this.bufstartpos) {
                super.seek(this.bufstartpos);
            }
            super.write(this.buf, 0, this.bufusedsize);
            this.bufdirty = false;
        }
    }
// void seek(long pos): moves the file pointer to pos and maps into buf[] the file block
// that contains pos.
    public void seek(long pos) throws IOException {
        if ((pos < this.bufstartpos) || (pos > this.bufendpos)) { // seek pos not in buf
            this.flushbuf();
            if ((pos >= 0) && (pos <= this.fileendpos) && (this.fileendpos != 0)) {
                // seek pos is inside the existing file (file length > 0)
                this.bufstartpos = pos / this.bufsize * this.bufsize;
                this.bufusedsize = this.fillbuf();
            } else if (((pos == 0) && (this.fileendpos == 0)) || (pos == this.fileendpos + 1)) {
                // seek pos is the append position
                this.bufstartpos = pos;
                this.bufusedsize = 0;
            }
            this.bufendpos = this.bufstartpos + this.bufsize - 1;
        }
        this.curpos = pos;
    }
// int fillbuf(): fills buf[] starting from bufstartpos.
    private int fillbuf() throws IOException {
        super.seek(this.bufstartpos);
        this.bufdirty = false;
        return super.read(this.buf);
    }
}

With buffered reads basically implemented, copy the 12 MB file byte by byte again (reading and writing are both involved; BufferedRandomAccessFile is used on the read side only, to test read speed):

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedRandomAccessFile                      BufferedOutputStream + DataOutputStream       2.813
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935

The speedup is dramatic: read performance is now comparable to BufferedInputStream + DataInputStream.
2.3. Implement the write buffer.
Basic principle of the write-buffer logic:
a. We want to write a byte to position pos of the file.
b. Check whether that position is currently mapped in buf[]. If it is, write the byte directly into buf[] and return true.
c. If it is not, relocate buf[] to the block containing pos, fill it with the bufsize bytes of the file around that position, and go back to step b.
The following is the key part of the code and its description:


// boolean write(byte bw, long pos): writes byte bw to position pos of the current file.
// Depending on where pos falls relative to buf[] and the end of the file, this is a modification,
// an append inside buf[], or an append outside buf[]. The most likely case is tested first,
// to improve speed.
// fileendpos is the offset of the last byte of the current file, maintained mainly for appends.
    public boolean write(byte bw, long pos) throws IOException {
        if ((pos >= this.bufstartpos) && (pos <= this.bufendpos)) { // write pos in buf
            this.buf[(int)(pos - this.bufstartpos)] = bw;
            this.bufdirty = true;
            if (pos == this.fileendpos + 1) { // write pos is append pos
                this.fileendpos++;
                this.bufusedsize++;
            }
        } else { // write pos not in buf
            this.seek(pos);
            if ((pos >= 0) && (pos <= this.fileendpos) && (this.fileendpos != 0)) {
                // write pos modifies an existing byte of the file
                this.buf[(int)(pos - this.bufstartpos)] = bw;
            } else if (((pos == 0) && (this.fileendpos == 0))
                    || (pos == this.fileendpos + 1)) { // write pos is append pos
                this.buf[0] = bw;
                this.fileendpos++;
                this.bufusedsize = 1;
            } else {
                throw new IndexOutOfBoundsException();
            }
            this.bufdirty = true;
        }
        this.curpos = pos;
        return true;
    }

With buffered writes basically implemented as well, copy the 12 MB file byte by byte once more (combining the buffered reads, so both reads and writes go through BufferedRandomAccessFile):

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935
BufferedRandomAccessFile                      BufferedOutputStream + DataOutputStream       2.813
BufferedRandomAccessFile                      BufferedRandomAccessFile                      2.453
The combined read/write speed now beats BufferedInputStream/BufferedOutputStream + DataInputStream/DataOutputStream.
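For illustration, a rough sketch of a byte-by-byte copy that uses BufferedRandomAccessFile on both sides; the constructor form and file names follow the assumptions made in the field sketch above:

import java.io.IOException;

public class BrafCopyTest {
    public static void main(String[] args) throws IOException {
        BufferedRandomAccessFile in = new BufferedRandomAccessFile("src.dat", "r", 10);
        BufferedRandomAccessFile out = new BufferedRandomAccessFile("des.dat", "rw", 10);
        long len = in.length();
        long start = System.currentTimeMillis();
        for (long pos = 0; pos < len; pos++) {
            out.write(in.read(pos), pos); // both calls normally stay inside buf[] in memory
        }
        in.close();  // close() calls flushbuf(), so the last dirty buffer reaches the disk
        out.close();
        System.out.println("Spend: " + (System.currentTimeMillis() - start) / 1000.0 + "s");
    }
}
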
Optimize BufferedRandomAccessFile.
Optimization principles:
• Statements that are executed most frequently are the ones most worth optimizing, and optimizing them pays off the most.
• When logical decisions are nested, the case most likely to occur should be tested in the outermost layer.
• Avoid unnecessary object creation (new).
Here's a typical example:

public void seek(long pos) throws IOException {
  ...
  this.bufstartpos = pos / this.bufsize * this.bufsize;
  // bufbitlen is the bit length of bufsize; for example, if bufsize = 1024 then bufbitlen = 10.
  ...
}

The seek method is used by nearly every other method and is called very frequently. The highlighted statement above computes, from pos and bufsize, where buf[] should be mapped in the current file; computing it with "*" and "/" is clearly not the best approach.
Optimization 1: this.bufstartpos = (pos >> bufbitlen) << bufbitlen;
Optimization 2: this.bufstartpos = pos & bufmask; // where this.bufmask = ~((long)this.bufsize - 1)
Both are faster than the original, but the second is clearly better: the first needs two shift operations, while the second needs only a single logical AND, and bufmask can be computed once in advance (this works because bufsize is a power of two).
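A tiny self-check of that claim, assuming bufsize is a power of two (1024 here), shows that all three forms compute the same aligned position:

public class AlignCheck {
    public static void main(String[] args) {
        int bufsize = 1024;                              // must be a power of two
        int bufbitlen = 10;                              // log2(bufsize)
        long bufmask = ~((long) bufsize - 1);
        long pos = 123456789L;                           // arbitrary file position
        long byDiv = pos / bufsize * bufsize;            // original "*" and "/" form
        long byShift = (pos >> bufbitlen) << bufbitlen;  // optimization 1
        long byMask = pos & bufmask;                     // optimization 2
        System.out.println(byDiv + " " + byShift + " " + byMask); // all three print 123456512
    }
}
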
With the optimization in place, copy the 12 MB file byte by byte once again (reads and writes both go through the optimized BufferedRandomAccessFile):

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935
BufferedRandomAccessFile                      BufferedOutputStream + DataOutputStream       2.813
BufferedRandomAccessFile                      BufferedRandomAccessFile                      2.453
BufferedRandomAccessFile (optimized)          BufferedRandomAccessFile (optimized)          2.197
The optimization's effect is modest but measurable, and would likely be more noticeable on older machines.
The comparisons above are for sequential access, but even random access usually involves more than a single byte at a time, so the buffering mechanism still pays off. Conversely, turning a purely sequential access class into a random access one is not nearly as easy.
What still needs to be added
Provide file append function:

public boolean append(byte bw) throws IOException {
        return this.write(bw, this.fileendpos + 1);
    }

Provide a write at the current file position:

public boolean write(byte bw) throws IOException {
        return this.write(bw, this.curpos);
    }

Return the file length (this differs from the original RandomAccessFile because reads and writes go through buf[]):

public long length() throws IOException {
        return this.max(this.fileendpos + 1, this.initfilelen);
    }

Return the current file pointer (again different from the original RandomAccessFile because reads and writes go through buf[]):

public long getFilePointer() throws IOException {
        return this.curpos;
    }

Provide buffered writes of multiple bytes at the current position:

public void write(byte b[], int off, int len) throws IOException {
        long writeendpos = this.curpos + len - 1;
        if (writeendpos <= this.bufendpos) { // b[] in cur buf
            System.arraycopy(b, off, this.buf, (int)(this.curpos - this.bufstartpos), len);
            this.bufdirty = true;
            this.bufusedsize = (int)(writeendpos - this.bufstartpos + 1);
        } else { // b[] not in cur buf
            super.seek(this.curpos);
            super.write(b, off, len);
        }
        if (writeendpos > this.fileendpos)
            this.fileendpos = writeendpos;
        this.seek(writeendpos+1);
}
    public void write(byte b[]) throws IOException {
        this.write(b, 0, b.length);
    }


Provide buffered reads of multiple bytes at the current position:

public int read(byte b[], int off, int len) throws IOException {
    long readendpos = this.curpos + len - 1;
    if (readendpos <= this.bufendpos && readendpos <= this.fileendpos) { // read in buf
        System.arraycopy(this.buf, (int)(this.curpos - this.bufstartpos), b, off, len);
    } else { // read b[] size > buf[]
        if (readendpos > this.fileendpos) { // read b[] part in file
            len = (int)(this.length() - this.curpos + 1);
        }
        super.seek(this.curpos);
        len = super.read(b, off, len);
        readendpos = this.curpos + len - 1;
    }
    this.seek(readendpos + 1);
    return len;
}
   public int read(byte b[]) throws IOException {
        return this.read(b, 0, b.length);
   }
public void setLength(long newLength) throws IOException {
        if (newLength > 0) {
            this.fileendpos = newLength - 1;
        } else {
            this.fileendpos = 0;
        }
        super.setLength(newLength);
}

public void close() throws IOException {
        this.flushbuf();
        super.close();
    }

The work is now essentially complete. Try the new multi-byte read/write methods by copying the 12 MB file in 1024-byte blocks; a sketch of that block copy is shown below.
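A sketch of the block copy, again assuming the constructor form used earlier and illustrative file names:

import java.io.IOException;

public class BrafBlockCopyTest {
    public static void main(String[] args) throws IOException {
        BufferedRandomAccessFile in = new BufferedRandomAccessFile("src.dat", "r", 10);
        BufferedRandomAccessFile out = new BufferedRandomAccessFile("des.dat", "rw", 10);
        byte[] block = new byte[1024];
        long start = System.currentTimeMillis();
        int n;
        while ((n = in.read(block)) > 0) { // buffered multi-byte read at the current position
            out.write(block, 0, n);        // buffered multi-byte write at the current position
        }
        in.close();
        out.close();
        System.out.println("Spend: " + (System.currentTimeMillis() - start) / 1000.0 + "s");
    }
}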

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935
BufferedRandomAccessFile                      BufferedOutputStream + DataOutputStream       2.813
BufferedRandomAccessFile                      BufferedRandomAccessFile                      2.453
BufferedRandomAccessFile (optimized)          BufferedRandomAccessFile (optimized)          2.197
BufferedRandomAccessFile (completed)          BufferedRandomAccessFile (completed)          0.401
How does it compare with JDK 1.4's new MappedByteBuffer + RandomAccessFile?
JDK 1.4 introduced the NIO classes, among them MappedByteBuffer, which maps a buffer onto a file and also supports random file access. Clearly the JAVA designers saw the RandomAccessFile problem too and improved on it. How do you copy a file with MappedByteBuffer + RandomAccessFile? Here is the main part of the test program:

RandomAccessFile rafi = new RandomAccessFile(SrcFile, "r");
RandomAccessFile rafo = new RandomAccessFile(DesFile, "rw");
FileChannel fci = rafi.getChannel();
FileChannel fco = rafo.getChannel();
long size = fci.size();
MappedByteBuffer mbbi = fci.map(FileChannel.MapMode.READ_ONLY, 0, size);
MappedByteBuffer mbbo = fco.map(FileChannel.MapMode.READ_WRITE, 0, size);
long start = System.currentTimeMillis();
for (int i = 0; i < size; i++) {
    byte b = mbbi.get(i);
    mbbo.put(i, b);
}
fci.close();
fco.close();
rafi.close();
rafo.close();
System.out.println("Spend: " + (double)(System.currentTimeMillis() - start) / 1000 + "s");

Using JDK 1.4's mapped-buffer read/write, copy the same 12 MB file byte by byte (reads and writes involved):

read                                          write                                         Time consumed (seconds)
RandomAccessFile                              RandomAccessFile                              95.848
BufferedInputStream + DataInputStream         BufferedOutputStream + DataOutputStream       2.935
BufferedRandomAccessFile                      BufferedOutputStream + DataOutputStream       2.813
BufferedRandomAccessFile                      BufferedRandomAccessFile                      2.453
BufferedRandomAccessFile (optimized)          BufferedRandomAccessFile (optimized)          2.197
BufferedRandomAccessFile (completed)          BufferedRandomAccessFile (completed)          0.401
MappedByteBuffer + RandomAccessFile           MappedByteBuffer + RandomAccessFile           1.209
JDK 1.4 is indeed a big improvement over 1.3 here. If you develop with version 1.4 or later and need random file access, MappedByteBuffer + RandomAccessFile is the recommended choice. But given that the vast majority of existing programs still target JDK 1.3 or earlier, if your JAVA program uses the RandomAccessFile class for random file access and its poor performance draws complaints from users, try the BufferedRandomAccessFile class presented here: there is no need to rewrite anything, simply import the class and replace every RandomAccessFile with BufferedRandomAccessFile, and your program's performance will improve dramatically. That is all you need to do.
Future considerations
Building on this class, readers can add a multi-page cache and a page-eviction policy to cope with applications that perform highly intensive random access.
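As a rough illustration only (not part of the article's class), a multi-page cache could keep several pages in an access-ordered map and write dirty pages back when they are evicted; the page size, page count, and all names below are assumptions:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

public class PagedRandomAccessFile extends RandomAccessFile {
    private static final int PAGE_SIZE = 1024; // bytes per cached page (assumption)
    private static final int MAX_PAGES = 16;   // pages kept in memory (assumption)

    private static final class Page {
        final byte[] data = new byte[PAGE_SIZE];
        int used;        // number of valid bytes in data
        boolean dirty;   // true if data must be written back before eviction
    }

    // Access-ordered map: the eldest entry is the least recently used page.
    // Read/write methods (not shown) would look up the page containing pos here,
    // filling it from disk on a miss, instead of keeping a single buf[].
    private final LinkedHashMap<Long, Page> pages =
        new LinkedHashMap<Long, Page>(MAX_PAGES, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Page> eldest) {
                if (size() > MAX_PAGES) {
                    try {
                        writeBack(eldest.getKey(), eldest.getValue()); // flush before evicting
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                    return true;
                }
                return false;
            }
        };

    public PagedRandomAccessFile(String name, String mode) throws IOException {
        super(name, mode);
    }

    private void writeBack(long pageStart, Page p) throws IOException {
        if (p.dirty && p.used > 0) {
            super.seek(pageStart);
            super.write(p.data, 0, p.used);
            p.dirty = false;
        }
    }
}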

