Explain the difference between character stream and byte stream in Java

2020-05-09 18:32:15
OfStack

This article analyzes the difference between character streams and byte streams in Java, for your reference. The details are as follows:

1. What is a stream

A stream in Java is an abstraction over a sequence of bytes. Imagine a water pipe, except that a sequence of bytes flows through it instead of water. Like water, a stream in Java has a direction of flow. An object from which a sequence of bytes can be read is called an input stream. An object to which a sequence of bytes can be written is called an output stream.
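To make the two directions concrete, here is a minimal sketch that moves bytes from an input stream to an output stream. The class name StreamDirectionDemo and the helper roundTrip are made up for this illustration; in-memory streams stand in for files so the example needs no setup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class StreamDirectionDemo {
    // Copies every byte from the input stream to the output stream, one byte at a time.
    static void copy(InputStream in, OutputStream out) {
        try {
            int b;
            // read() returns the next byte as an int in 0..255, or -1 at end of stream
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: pushes the given bytes through copy() and returns what arrived.
    static byte[] roundTrip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), sink);
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] result = roundTrip(new byte[] {1, 2, 3});
        System.out.println(Arrays.toString(result)); // prints [1, 2, 3]
    }
}
```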

2. Byte stream

The basic unit of byte stream processing in Java is a single byte, and byte streams are usually used to process binary data. The two most basic byte stream classes in Java are InputStream and OutputStream, which represent the basic input byte stream and output byte stream, respectively. Both InputStream and OutputStream are abstract classes, so in practice we usually use one of the subclasses provided in the Java class library. Let's take the InputStream class as an example to introduce byte streams in Java.

The InputStream class defines a basic method, read, for reading bytes from a byte stream. This method is defined as follows:

public abstract int read() throws IOException;
This is an abstract method, so every input stream class derived from InputStream must implement it. It reads one byte from the stream and returns it as an int in the range 0 to 255, or returns -1 when the end of the stream is reached. Note that the method blocks until a byte is available, the end of the stream is detected, or an exception is thrown. In addition, byte streams do not buffer by default: for a file-backed stream such as FileInputStream, each call to read() asks the operating system for one byte, which usually means a system call and possibly disk I/O, so it is inefficient. You might think that the InputStream overload that takes a byte array as a parameter reads many bytes at once and so avoids frequent I/O. Is that so? Let's look at the source of this method:


public int read(byte b[]) throws IOException {
  return read(b, 0, b.length);
}

It calls another overload of read, so let's continue:


  public int read(byte b[], int off, int len) throws IOException {
    if (b == null) {
      throw new NullPointerException();
    } else if (off < 0 || len < 0 || len > b.length - off) {
      throw new IndexOutOfBoundsException();
    } else if (len == 0) {
      return 0;
    }

    int c = read();
    if (c == -1) {
      return -1;
    }
    b[off] = (byte)c;

    int i = 1;
    try {
      for (; i < len ; i++) {
        c = read();
        if (c == -1) {
          break;
        }
        b[off + i] = (byte)c;
      }
    } catch (IOException ee) {
    }
    return i;
  }

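As the code above shows, the default read(byte[]) implementation in InputStream fills the array by calling read() in a loop, so on its own it does not use a memory buffer and saves no I/O. Note that this is only the default: the InputStream documentation encourages subclasses to provide a more efficient implementation, and FileInputStream, for example, overrides it to fetch a whole block per call. Even so, a plain byte stream keeps no internal buffer between calls. To get buffered, efficient reads regardless of how the caller reads, wrap the stream in a BufferedInputStream.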
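The looping behavior can be observed directly. The sketch below uses a made-up CountingStream that implements only read() and counts its calls, so it inherits the default bulk read quoted above; the names CountingReadDemo and readAndCount are hypothetical too.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CountingReadDemo {
    // Hypothetical stream that implements only read() and counts how often it is called.
    static class CountingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        int readCalls = 0;

        CountingStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() {
            readCalls++;
            // Mask with 0xFF so the byte is returned as 0..255; -1 signals end of stream
            return pos < data.length ? (data[pos++] & 0xFF) : -1;
        }
    }

    // Returns {bytes read, number of single-byte read() calls made}.
    static int[] readAndCount(byte[] data, int bufferSize) {
        CountingStream in = new CountingStream(data);
        try {
            int n = in.read(new byte[bufferSize]); // bulk read inherited from InputStream
            return new int[] {n, in.readCalls};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = readAndCount(new byte[] {'d', 'e', 'm', 'o'}, 4);
        System.out.println("bytes read = " + r[0] + ", read() calls = " + r[1]);
        // prints: bytes read = 4, read() calls = 4
    }
}
```

One bulk read of four bytes costs four single-byte reads here, which is why an inherited read(byte[]) alone does not reduce the number of underlying reads.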

3. Character stream

The basic unit of character stream processing in Java is the Unicode code unit (a 16-bit char, 2 bytes in size), and character streams are usually used to process text data. A code unit ranges from 0x0000 to 0xFFFF. Most characters correspond to a single code unit, while characters outside the Basic Multilingual Plane are represented by a pair of code units (a surrogate pair). Java's String type represents text in memory as a sequence of these UTF-16 code units. Data stored on disk, however, is usually encoded in one of many encodings, and the same character has a different binary representation under different encodings. Here is how a character stream actually works:

Output character stream: converts the sequence of characters to be written (actually a sequence of Unicode code units) into a sequence of bytes in the specified encoding, and then writes those bytes to the file;
Input character stream: decodes the byte sequence being read, using the specified encoding, into the corresponding sequence of characters (again a sequence of Unicode code units) so that it can be held in memory.
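The two directions can be sketched as a round trip through in-memory streams. This is only an illustration: the class name CharsetRoundTripDemo and its helpers are made up, and UTF-8 is chosen arbitrarily as the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CharsetRoundTripDemo {
    // Output direction: characters are encoded into UTF-8 bytes.
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Input direction: UTF-8 bytes are decoded back into code units.
    static String decode(byte[] encoded) {
        StringBuilder chars = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(encoded), StandardCharsets.UTF_8)) {
            int c;
            // Reader.read() returns one code unit as an int, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                chars.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chars.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // "héllo": 5 code units
        byte[] encoded = encode(text);
        System.out.println(text.length() + " chars -> " + encoded.length + " bytes");
        // prints: 5 chars -> 6 bytes
        System.out.println(decode(encoded).equals(text)); // prints true
    }
}
```

The character count and the byte count differ because é takes two bytes in UTF-8, which is exactly the conversion the character stream performs on our behalf.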
Let's use a demo to deepen our understanding of this process. The sample code is as follows:


import java.io.FileWriter;
import java.io.IOException;

public class FileWriterDemo {
  public static void main(String[] args) {
    // try-with-resources closes the writer even if write() throws,
    // and avoids calling close() on null when the constructor fails
    try (FileWriter fileWriter = new FileWriter("demo.txt")) {
      // No charset is specified here, so the default charset is used
      fileWriter.write("demo");
    } catch (IOException e) {
      e.printStackTrace();
    }
  }
}

In the code above, we used FileWriter to write the four characters "demo" to demo.txt. We then used the hex editor WinHex to inspect the contents of demo.txt:

The hex view shows that the "demo" we wrote was encoded as the bytes 64, 65, 6D, 6F, even though the code above never specifies an encoding. When no encoding is specified, FileWriter encodes the characters with the platform's default charset (historically taken from the operating system's settings; UTF-8 by default since JDK 18).
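To avoid depending on the default, the charset can be passed explicitly. The following is a minimal sketch, not the article's original demo: the class ExplicitCharsetDemo and the helper writeAndDump are made up, and a temporary file replaces demo.txt so that nothing is left behind.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
    // Writes text to a temporary file in the given charset and returns the file's bytes as hex.
    static String writeAndDump(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("demo", ".txt");
            try (Writer writer = Files.newBufferedWriter(file, charset)) {
                writer.write(text);
            }
            byte[] bytes = Files.readAllBytes(file);
            Files.delete(file);

            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%02X", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The charset used when none is specified; the value depends on the JDK and platform
        System.out.println(Charset.defaultCharset());
        System.out.println(writeAndDump("demo", StandardCharsets.UTF_8));
        // prints: 64 65 6D 6F
        System.out.println(writeAndDump("demo", StandardCharsets.UTF_16BE));
        // prints: 00 64 00 65 00 6D 00 6F
    }
}
```

The same four characters produce different bytes under UTF-8 and UTF-16BE, so code that must be portable is usually better off naming its charset.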

Because a character stream must convert the sequence of Unicode code units into a byte sequence in the target encoding before output, it uses a memory buffer to hold the converted bytes and writes them to the disk file together, once the buffer fills or the stream is flushed or closed.
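The buffering can be observed with a small experiment. In this sketch (the names WriterBufferDemo and sizesAroundFlush are made up), an in-memory sink stands in for the disk file so that we can check how many bytes have arrived before and after flush().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class WriterBufferDemo {
    // Returns {bytes in the sink before flush(), bytes in the sink after flush()}.
    static int[] sizesAroundFlush(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStreamWriter writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
            int before = sink.size(); // encoded bytes are still held in the writer's buffer
            writer.flush();
            int after = sink.size();  // flush() pushes the buffered bytes to the sink
            return new int[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush("demo");
        System.out.println("before flush = " + sizes[0] + ", after flush = " + sizes[1]);
        // prints: before flush = 0, after flush = 4
    }
}
```

This is also why forgetting to close or flush a writer can leave a file shorter than expected: the last bytes may still be sitting in the buffer.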

4. Difference between character stream and byte stream

From the above description, we can know that the main differences between the byte stream and the character stream are as follows:

The basic unit of a byte stream operation is the byte. The basic unit of a character stream operation is the Unicode code unit.
Byte streams do not use a buffer by default; character streams use a buffer.
A byte stream is usually used to process binary data. In fact, it can process any type of data, but it does not support writing or reading Unicode code units directly. A character stream usually deals with text data, and it supports writing and reading Unicode code units.

The above is my understanding of character streams and byte streams in Java. Please correct me if anything in the description is unclear or inaccurate. Thank you.