How to Read Large Files Efficiently in Java: An Example Analysis
- 2020-12-13 18:58:39
- OfStack
1. Overview
This tutorial demonstrates how to read large files efficiently in Java, processing them line by line rather than loading them whole.
2. Read in memory
The standard way to read a file's lines is to load them all into memory; both Guava and Apache Commons IO provide a one-liner for this:
Files.readLines(new File(path), Charsets.UTF_8);
FileUtils.readLines(new File(path));
The problem with this approach is that every line of the file is kept in memory, so a sufficiently large file quickly drives the program into an OutOfMemoryError.
For example, reading a file of approximately 1 GB:
@Test
public void givenUsingGuava_whenIteratingAFile_thenWorks() throws IOException {
String path = ...
Files.readLines(new File(path), Charsets.UTF_8);
}
At the start of the test, very little memory is in use:
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 128 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 116 Mb
However, once the whole file has been read into memory, we see (approximately 2 GB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 2666 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 490 Mb
This means the process consumes roughly 2.1 GB of memory -- for the simple reason that every line of the file is now held in memory.
Keeping the entire contents of a file in memory will quickly exhaust the available memory, no matter how much is actually available.
Moreover, we usually don't need all the lines of the file in memory at once -- instead, we only need to walk through them one at a time, process each line, and discard it. So that's exactly what we'll do: iterate through the lines instead of holding them all in memory.
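On Java 8 and later, this kind of lazy, line-by-line iteration is also available directly in the standard library via `Files.lines`. The sketch below (the class name and temp-file setup are illustrative, not from the original article) counts lines through a stream that pulls one line at a time, so memory use stays flat regardless of file size:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamLinesDemo {

    // Iterate the file lazily: the stream yields one line at a time,
    // so only the current line is held in memory.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("large-file-demo", ".txt");
        Files.write(tmp, Arrays.asList("first line", "second line", "third line"));
        System.out.println(countLines(tmp)); // prints 3
        Files.deleteIfExists(tmp);
    }
}
```

Note the try-with-resources around the stream: `Files.lines` keeps the underlying file handle open until the stream is closed.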
3. Streaming through the file
Now let's take a look at this solution -- we'll scan the contents of the file using the java.util.Scanner class, reading line by line:
FileInputStream inputStream = null;
Scanner sc = null;
try {
    inputStream = new FileInputStream(path);
    sc = new Scanner(inputStream, "UTF-8");
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        // System.out.println(line);
    }
    // note that Scanner suppresses exceptions
    if (sc.ioException() != null) {
        throw sc.ioException();
    }
} finally {
    if (inputStream != null) {
        inputStream.close();
    }
    if (sc != null) {
        sc.close();
    }
}
This solution iterates over all the lines in the file, allowing each line to be processed without keeping a reference to it -- no lines are retained in memory (approximately 150 MB consumed):
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Total Memory: 763 Mb
[main] INFO org.baeldung.java.CoreJavaIoUnitTest - Free Memory: 605 Mb
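The same constant-memory iteration can also be written with a plain BufferedReader and try-with-resources (Java 7+), which sidesteps Scanner's exception suppression entirely. This variant is a sketch, not code from the original article; the class and method names are illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class BufferedReadDemo {

    // Read line by line through a fixed-size buffer; only the current
    // line is live, and the reader is closed automatically on exit.
    static long processLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                count++; // replace with real per-line processing
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffered-demo", ".txt");
        Files.write(tmp, Arrays.asList("one", "two"));
        System.out.println(processLines(tmp)); // prints 2
        Files.deleteIfExists(tmp);
    }
}
```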
4. Streaming with Apache Commons IO
The same thing can be done with the Commons IO library, using the custom LineIterator it provides:
LineIterator it = FileUtils.lineIterator(theFile, "UTF-8");
try {
    while (it.hasNext()) {
        String line = it.nextLine();
        // do something with line
    }
} finally {
    LineIterator.closeQuietly(it);
}
Since the entire file is not held in memory, memory consumption again stays modest (approximately 150 MB):
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Total Memory: 752 Mb
[main] INFO o.b.java.CoreJavaIoIntegrationTest - Free Memory: 564 Mb
5. Conclusion
This article showed how to process a large file line by line instead of reading it whole and exhausting memory -- a useful approach whenever large files must be handled.
The implementation and code snippets for all these examples are available on my github project -- it's an Eclipse-based project, so it should be easy to import and run.
That is the entire content of this article on reading large files efficiently in Java; I hope it helps you. Interested readers can browse the other related topics on this site, and if anything is lacking, please point it out in the comments. Thank you for your support!