How to read and write files in Java without garbled text

  • 2020-04-01 04:07:00
  • OfStack

Garbled text often appears when reading a file stream. It can have more than one cause; this article covers the case where it is caused by the file's encoding. First, it helps to be clear about what text files and binary files are and how they differ.

Text files are files based on character encodings; common ones include ASCII, the Unicode encodings, and the ANSI code pages. Binary files are files based on value encoding: the application decides what each value means, which can be regarded as a custom encoding.
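To make the role of the character encoding concrete, here is a minimal sketch showing that the same bytes produce different text depending on the charset used to decode them. The class and method names (`MojibakeDemo`, `recode`) are illustrative and not part of the original article, and ISO-8859-1 is chosen only because every JVM is guaranteed to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode text to bytes with one charset, then decode those bytes with another.
    static String recode(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // the last character is e with an acute accent
        // Same charset on both sides: the text survives unchanged.
        System.out.println(recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(original));
        // Written as UTF-8 but read as ISO-8859-1: the two bytes of the accented
        // letter are decoded as two separate characters, so the text is garbled.
        System.out.println(recode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).length());
    }
}
```

The bytes on disk never change; only the interpretation does, which is why the reader has to know the encoding the writer used.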

Text files therefore mostly use fixed-length encodings (although there are also variable-length encodings such as UTF-8). Binary files, on the other hand, can be thought of as variable-length encoded, because the values are application-defined and it is up to you how many bits represent each value.
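The difference between fixed-length and variable-length text encodings can be checked directly by counting encoded bytes. The following is a small sketch; `EncodedLengthDemo` and its methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as UTF-16 (big endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // A Latin letter, an accented letter, a Chinese character, and an emoji.
        String[] samples = {"A", "\u00e9", "\u4e2d", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, " + utf16Length(s) + " bytes in UTF-16");
        }
    }
}
```

In UTF-8 the four samples take 1, 2, 3, and 4 bytes. In UTF-16 the first three take 2 bytes each and the emoji takes 4, so UTF-16 is fixed-length only for characters in the Basic Multilingual Plane.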

For binary files, never go through strings: constructing a String from bytes without naming a charset uses the platform default encoding, and that conflicts with the file's custom, fixed-format encoding. Binary files should therefore be read, processed, and written only as byte streams.
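A short sketch of why binary data should not pass through a String: bytes that are not valid in the chosen charset are replaced during decoding, so the original values cannot be recovered. UTF-8 is used here as an assumption (the platform default may differ), and `BinaryThroughStringDemo` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryThroughStringDemo {
    // Decode the bytes into a String and encode them back, as careless code might do.
    static byte[] roundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first four bytes of a typical JPEG file; not valid UTF-8.
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        // false: the invalid bytes were replaced while decoding.
        System.out.println(Arrays.equals(binary, roundTrip(binary)));

        // Plain ASCII bytes happen to survive, which is why the bug can go unnoticed.
        byte[] ascii = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.equals(ascii, roundTrip(ascii)));
    }
}
```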

For text files, the encoding is fixed, so as long as you determine the file's encoding before reading it, read the bytes, and then construct the string with that encoding, the resulting text is not garbled. The same detection can be run on a binary file, but the result is meaningless, so the two cases are not comparable.
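The read-bytes-then-decode idea can be sketched in a few lines, assuming the encoding is already known. `ExplicitCharsetDemo` and `readText` are hypothetical names, and the temporary file exists only to make the example self-contained.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ExplicitCharsetDemo {
    // Read the raw bytes first, then decode them with the charset the file was written in.
    static String readText(Path path, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, charset);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            String text = "\u4f60\u597d"; // two Chinese characters
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_16LE));
            // true: decoding with the charset used for writing restores the text.
            System.out.println(readText(tmp, StandardCharsets.UTF_16LE).equals(text));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```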

The specific operation is as follows:

1) Get the encoding of the text file


public static String getFileEncode(String path) {
    String charset ="asci"; //Placeholder meaning "no encoding detected"; not a real charset name
    byte[] first3Bytes = new byte[3];
    BufferedInputStream bis = null;
    try {
      boolean checked = false;
      bis = new BufferedInputStream(new FileInputStream(path));
      bis.mark(3); //The read limit must cover the 3 BOM bytes read before reset()
      int read = bis.read(first3Bytes, 0, 3);
      if (read == -1)
        return charset;
      if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
        charset = "Unicode";//UTF-16LE
        checked = true;
      } else if (first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
        charset = "Unicode";//UTF-16BE
        checked = true;
      } else if (first3Bytes[0] == (byte) 0xEF && first3Bytes[1] == (byte) 0xBB && first3Bytes[2] == (byte) 0xBF) {
        charset = "UTF8";
        checked = true;
      }
      bis.reset();
      if (!checked) {
        int len = 0;
        int loc = 0;
        while ((read = bis.read()) != -1) {
          loc++;
          if (read >= 0xF0)
            break;
          if (0x80 <= read && read <= 0xBF) //A lone byte in 0x80-0xBF is not valid UTF-8, so treat the file as GBK
            break;
          if (0xC0 <= read && read <= 0xDF) {
            read = bis.read();
            if (0x80 <= read && read <= 0xBF) 
            //Two-byte pattern (0xC0-0xDF)(0x80-0xBF): also possible in GB encodings, so keep scanning
              continue;
            else
              break;
          } else if (0xE0 <= read && read <= 0xEF) { //Misdetection is possible here, but unlikely
            read = bis.read();
            if (0x80 <= read && read <= 0xBF) {
              read = bis.read();
              if (0x80 <= read && read <= 0xBF) {
                charset = "UTF-8";
                break;
              } else
                break;
            } else
              break;
          }
        }
        //TextLogger.getLogger().info(loc + " " + Integer.toHexString(read));
      }
    } catch (Exception e) {
      e.printStackTrace();
    } finally {
      if (bis != null) {
        try {
          bis.close();
        } catch (IOException ex) {
        }
      }
    }
    return charset;
  }
 
  private static String getEncode(int flag1, int flag2, int flag3) {
    String encode="";
    //Some text files start with a byte order mark (BOM): FF FE (UTF-16 little endian),
    //FE FF (UTF-16 big endian), EF BB BF (UTF-8)
    if (flag1 == 255 && flag2 == 254) {
      encode="Unicode";
    }
    else if (flag1 == 254 && flag2 == 255) {
      encode="UTF-16";
    }
    else if (flag1 == 239 && flag2 == 187 && flag3 == 191) {
      encode="UTF8";
    }
    else {
      encode="asci";//ASCII
    }
    return encode;
  }
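The byte-pattern heuristic above stops at the first three-byte sequence it finds. As a stricter alternative (a swapped-in technique, not the author's method), the JDK's own decoder can be asked to reject any malformed input, which validates the whole content as UTF-8. `Utf8Check` and `isValidUtf8` are hypothetical names; this is a sketch for files small enough to hold in memory.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the whole byte array decodes as UTF-8 without errors.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // At least one byte sequence is not legal UTF-8.
            return false;
        }
    }
}
```

A result of `true` does not prove the file was written as UTF-8, since pure ASCII and some GBK byte sequences are also valid UTF-8, but a result of `false` rules UTF-8 out.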

2) Read the file using its detected encoding



  public static String readFile(String path){
    String data = null;
    //Determine if the file exists
    File file = new File(path);
    if(!file.exists()){
      return data;
    }
    //Gets the file encoding format
    String code = FileEncode.getFileEncode(path);
    InputStreamReader isr = null;
    try{
      //Parse the file according to the encoding format
      if("asci".equals(code)){
        //GBK is used here rather than the JVM default (file.encoding), because the JVM default is not necessarily the operating system's encoding
        // code = System.getProperty("file.encoding");
        code = "GBK";
      }
      isr = new InputStreamReader(new FileInputStream(file),code);
      //Read file contents
      int length = -1 ;
      char[] buffer = new char[1024];
      StringBuilder sb = new StringBuilder();
      while((length = isr.read(buffer, 0, 1024) ) != -1){
        sb.append(buffer,0,length);
      }
      data = new String(sb);
    }catch(Exception e){
      e.printStackTrace();
      log.info("getFile IO Exception:"+e.getMessage());
    }finally{
      try {
        if(isr != null){
          isr.close();
        }
      } catch (IOException e) {
        e.printStackTrace();
        log.info("getFile IO Exception:"+e.getMessage());
      }
    }
    return data;
  }
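One detail the reading code above does not handle: when a file starts with a UTF-8 byte order mark, Java's UTF-8 decoder keeps it as a leading U+FEFF character in the decoded text. A minimal sketch of removing it after reading; `BomStrip` and `stripBom` are hypothetical names not in the original.

```java
import java.nio.charset.StandardCharsets;

class BomStrip {
    // Remove a leading U+FEFF (the decoded byte order mark) if one is present.
    static String stripBom(String text) {
        if (text != null && !text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // EF BB BF is the UTF-8 BOM, followed by the ASCII bytes for "hi".
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(stripBom(decoded)); // prints "hi"
    }
}
```

Applying such a step to the value returned by `readFile` would keep the invisible mark from breaking string comparisons or parsers further down the line.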

3) Write the file in the specified encoding



  public static boolean writeFile(byte data[], String path , String code){
    boolean flag = true;
    OutputStreamWriter osw = null;
    try{
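      File file = new File(path);
      //Create the parent directory if it does not exist yet;
      //getParentFile() can be null for a bare file name, so check it first
      File parent = file.getAbsoluteFile().getParentFile();
      if(parent != null && !parent.exists()){
        parent.mkdirs();
      }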
      if("asci".equals(code)){
        code = "GBK";
      }
      osw = new OutputStreamWriter(new FileOutputStream(path),code);
      osw.write(new String(data,code));
      osw.flush();
    }catch(Exception e){
      e.printStackTrace();
      log.info("toFile IO Exception:"+e.getMessage());
      flag = false;
    }finally{
      try{
        if(osw != null){
          osw.close();
        }
      }catch(IOException e){
        e.printStackTrace();
        log.info("toFile IO Exception:"+e.getMessage());
        flag = false;
      }
    }
    return flag;
  }
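For comparison, here is a sketch of the same write step using the `java.nio.file` API and try-with-resources, taking the text as a String rather than as bytes. This is an alternative under stated assumptions, not the author's code: `TextWriteDemo` and `writeText` are hypothetical names, and the caller is assumed to pass a real `Charset` instead of the "asci" placeholder.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class TextWriteDemo {
    // Write text to path in the given charset, creating parent directories as needed.
    static void writeText(Path path, String text, Charset charset) throws IOException {
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // try-with-resources flushes and closes the writer even if write() throws.
        try (Writer writer = Files.newBufferedWriter(path, charset)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-demo");
        writeText(dir.resolve("out.txt"), "\u4f60\u597d", StandardCharsets.UTF_8);
        System.out.println(Files.size(dir.resolve("out.txt")) + " bytes written");
    }
}
```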

4) For small binary files, such as Word documents, you can read and write the file as follows



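  public static byte[] getFile(String path) throws IOException {
    File file = new File(path);
    //Size the buffer from the file length; available() is only an estimate
    byte data[]=new byte[(int) file.length()];
    try (FileInputStream stream=new FileInputStream(file)) {
      int offset=0;
      //read() may return fewer bytes than requested, so loop until the buffer is full
      while (offset < data.length) {
        int n=stream.read(data, offset, data.length - offset);
        if (n == -1) break;
        offset += n;
      }
    }
    return data;
  }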
 
 
 
  
  public static boolean toFile(byte data[], String path) throws Exception {
    //try-with-resources closes the stream even if write() throws
    try (FileOutputStream out=new FileOutputStream(path)) {
      out.write(data);
      out.flush();
    }
    return true;
  }
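On Java 7 and later, the same byte-level read and write can also be done with `java.nio.file.Files`, which handles buffer sizing and closing internally. A minimal sketch; `BinaryFileDemo`, `readAll`, and `writeAll` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class BinaryFileDemo {
    // Read the whole file into memory as raw bytes; suitable for small files only.
    static byte[] readAll(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }

    // Write raw bytes to the file, replacing any existing content.
    static void writeAll(byte[] data, String path) throws IOException {
        Files.write(Paths.get(path), data);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("binary-demo", ".bin");
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0x00, (byte) 0xFF};
        writeAll(data, tmp.toString());
        // true: no charset is involved, so every byte is preserved.
        System.out.println(Arrays.equals(data, readAll(tmp.toString())));
        Files.deleteIfExists(tmp);
    }
}
```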

That is all for this article; I hope it helps with your study.

