Fixing Garbled Chinese Characters When Reading a File in Java

  • 2021-09-16 07:10:39
  • OfStack

Some Chinese characters come out garbled when Java reads a file

I read a .txt file and printed its contents from code, and some of the Chinese characters in the output were garbled.

This was my first attempt, and it is wrong. A Chinese character takes more than one byte (two in GBK, usually three in UTF-8). If you read a fixed number of bytes at a time and decode each chunk separately, a character can be cut in half at a chunk boundary.

The truncated characters come out garbled.


package susq.path;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * @author susq
 * @since 2018-05-18-19:28
 */
public class WrongMethodReadTxt {
    public static void main(String[] args) throws IOException {
        ClassLoader classLoader = WrongMethodReadTxt.class.getClassLoader();
        String filePath = classLoader.getResource("").getPath() + "/expect1.txt";

        System.out.println(filePath);

        File file = new File(filePath);
        try (FileInputStream in = new FileInputStream(file)) {
            byte[] bytes = new byte[1024];
            StringBuilder sb = new StringBuilder();
            int len;
            while ((len = in.read(bytes)) != -1) {
                // Wrong: each 1024-byte chunk is decoded on its own, so a character
                // that straddles two chunks is decoded as two broken halves.
                sb.append(new String(bytes, 0, len));
            }
            System.out.println(sb.toString());
        }
    }
}
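The truncation in the read loop above can be reproduced without a file. The following is a minimal sketch, assuming UTF-8 text; `SplitCharacterDemo` and `decodeInTwoChunks` are hypothetical names used only for illustration. It decodes the same six bytes in two pieces, once split on a character boundary and once split in the middle of a character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original program.
public class SplitCharacterDemo {

    /** Decodes data[0..splitAt) and data[splitAt..) separately, as a chunked read loop would. */
    public static String decodeInTwoChunks(byte[] data, int splitAt) {
        String first = new String(data, 0, splitAt, StandardCharsets.UTF_8);
        String second = new String(data, splitAt, data.length - splitAt, StandardCharsets.UTF_8);
        return first + second;
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes so the source file encoding does not matter.
        String text = "\u4e2d\u6587";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 length in bytes: " + data.length);

        // Splitting on a character boundary is harmless.
        System.out.println("split at 3 intact: " + decodeInTwoChunks(data, 3).equals(text));

        // Splitting inside the second character corrupts it.
        String broken = decodeInTwoChunks(data, 4);
        System.out.println("split at 4 intact: " + broken.equals(text));
        System.out.println("contains replacement char: " + (broken.indexOf('\uFFFD') >= 0));
    }
}
```

With a 1024-byte buffer the split point falls wherever byte 1024 happens to land, which is why only some characters in the file are affected.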

If the file contains Chinese characters, read it as characters through a Reader, which decodes multi-byte characters for you:


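package susq.path;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/**
 * @author susq
 * @since 2018-05-18-17:39
 */
public class SysPath {
    public static void main(String[] args) throws IOException {
        ClassLoader classLoader = SysPath.class.getClassLoader();
        String filePath = classLoader.getResource("").getPath() + "/expect1.txt";

        System.out.println(filePath);

        File file = new File(filePath);
        // Name the charset explicitly (UTF-8 is assumed here). FileReader(File)
        // uses the platform default charset, which may not match the file.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            // readLine() returns null at end of file and strips the line terminator,
            // so the terminator is added back to keep lines from running together.
            while ((line = br.readLine()) != null) {
                sb.append(line).append(System.lineSeparator());
            }
            System.out.println(sb);
        }
    }
}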
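When the file is small enough to hold in memory, another option is to read all of its bytes first and decode them once, so no character can be split between chunks. This is a sketch under stated assumptions, not the author's method: the file is assumed to be UTF-8, and `ReadWholeFile`, `readUtf8`, and `roundTrips` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; assumes the file is encoded as UTF-8.
public class ReadWholeFile {

    /** Reads the entire file and decodes it as UTF-8 in a single step. */
    public static String readUtf8(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temporary file as UTF-8, reads it back, and reports whether it survived. */
    public static boolean roundTrips(String text) {
        try {
            Path tmp = Files.createTempFile("read-whole-file-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readUtf8(tmp).equals(text);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("\u4e2d\u6587 line 1\n\u4e2d\u6587 line 2\n"));
    }
}
```

For large files the line-by-line Reader approach is the better fit, since it does not load everything at once.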

Fixing garbled Chinese, including the occasional garbled character, when reading data from a Java IO stream

Situation: when data is read from an IO stream without specifying the encoding, the decoded text may not be what we expect.

Solution: specify the encoding when reading the data.
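A small sketch of what a mismatched encoding does: the same UTF-8 bytes are decoded once with the wrong charset and once with the right one. ISO-8859-1 only stands in for "some other charset" here, and `CharsetMismatchDemo` and `transcode` are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; ISO-8859-1 is an arbitrary stand-in for a wrong charset.
public class CharsetMismatchDemo {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    public static String transcode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as Unicode escapes.
        String text = "\u4e2d\u6587";

        String wrong = transcode(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String right = transcode(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Each of the six UTF-8 bytes becomes its own Latin-1 character.
        System.out.println("mismatched charset intact: " + wrong.equals(text));
        System.out.println("mismatched charset length: " + wrong.length());
        System.out.println("matching charset intact: " + right.equals(text));
    }
}
```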

Code: (the encoding is passed to the String constructor, but individual Chinese characters can still come out garbled this way, because each byte[] chunk is decoded separately and a multi-byte character can be split across two chunks)


// Here the stream comes from a socket, and the data is read from it.
// Proxy settings are configured externally: apply them if configured, otherwise skip them.
// `href` (the request path) is defined elsewhere.
Socket socket = new Socket("192.168.99.100", 80);
String url = "GET " + href + " HTTP/1.1\r\n\r\n";
socket.getOutputStream().write(url.getBytes());
InputStream is = socket.getInputStream();
byte[] bs = new byte[1024];
int i;
StringBuilder str = new StringBuilder();
while ((i = is.read(bs)) > 0) {
    // 1. Always pass the encoding; otherwise some data is garbled when written to a file.
    //    Each chunk is still decoded on its own, so a character split across two reads
    //    can come out garbled.
    String chunk = new String(bs, 0, i, "UTF-8");
    str.append(chunk);
    // The server keeps the connection open, so the read never signals end of stream;
    // stop once the closing tag has been seen.
    if (chunk.contains("</html>")) {
        break;
    }
}

Fixing the occasional garbled Chinese character:

Code:


// Proxy settings are configured externally: apply them if configured, otherwise skip them.
// `href` (the request path) is defined elsewhere.
Socket socket = new Socket("192.168.99.100", 80);
String url = "GET " + href + " HTTP/1.1\r\n\r\n";
socket.getOutputStream().write(url.getBytes());
InputStream is = socket.getInputStream();

// Fix for the occasional garbled character: let a Reader decode the stream,
// so a character split across two reads is still decoded as a whole.
StringBuilder str = new StringBuilder();
InputStreamReader isr = new InputStreamReader(is, "UTF-8");
BufferedReader br = new BufferedReader(isr);
String line;
while ((line = br.readLine()) != null) {
    str.append(line).append("\n");
    // The server keeps the connection open, so stop once the closing tag has been seen.
    if (line.contains("</html>")) {
        break;
    }
}
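The difference between the two socket loops can be checked without a network connection. The sketch below is an illustration under assumptions, not the original program: `trickle` is a hypothetical helper that simulates a socket by returning only a few bytes per read, and `StreamDecodeDemo`, `decodePerChunk`, and `decodeWithReader` are likewise made-up names. UTF-8 input is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes the incoming bytes are UTF-8.
public class StreamDecodeDemo {

    /** Simulates a socket: every read returns at most maxPerRead bytes. */
    public static InputStream trickle(byte[] data, int maxPerRead) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxPerRead));
            }
        };
    }

    /** Decodes every chunk on its own, like the first socket loop. */
    public static String decodePerChunk(InputStream in) {
        try {
            byte[] buf = new byte[1024];
            StringBuilder sb = new StringBuilder();
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Lets an InputStreamReader decode, like the second socket loop. */
    public static String decodeWithReader(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            char[] buf = new char[1024];
            StringBuilder sb = new StringBuilder();
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Four Chinese characters (12 bytes in UTF-8), written as Unicode escapes.
        String text = "\u4e2d\u6587\u4e2d\u6587";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        // Reads of 4 bytes cut the 3-byte characters in the middle.
        System.out.println("per-chunk intact: " + decodePerChunk(trickle(data, 4)).equals(text));
        System.out.println("reader intact: " + decodeWithReader(trickle(data, 4)).equals(text));
    }
}
```

The buffer size is the same in both methods; what changes is that the Reader keeps the bytes of an incomplete character until the rest arrives, while the per-chunk loop decodes whatever each read happened to return.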
