Performance comparison of three different ways to compute MD5 in Go

  • 2020-06-01 10:02:46
  • OfStack

preface

This article introduces three different ways to compute an MD5 checksum. The real difference between them is how the file is read, that is, the disk I/O, so the same lessons carry over to network I/O. Let's take a look.

ReadFile

So let's look at the first one, which is pretty straightforward:


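import (
	"crypto/md5"
	"fmt"
	"io/ioutil"
)

// md5sum1 reads the whole file into memory, then hashes it.
// Errors are swallowed and reported as an empty string.
func md5sum1(file string) string {
	data, err := ioutil.ReadFile(file)
	if err != nil {
		return ""
	}
	return fmt.Sprintf("%x", md5.Sum(data))
}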

This approach is crude. Under the hood ReadFile calls readAll, which reads the entire file into memory at once, so this method allocates the most memory of the three.

Here is the benchmark:


var test_path = "/path/to/file"

func BenchmarkMd5Sum1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum1(test_path)
	}
}

$ go test -test.run=none -test.bench="^BenchmarkMd5Sum1$" -benchtime=10s -benchmem

BenchmarkMd5Sum1-4 300 43704982 ns/op 19408224 B/op 14 allocs/op
PASS
ok tmp 17.446s

First, note that the file is 19405028 bytes. That is very close to the 19408224 B/op above, because readAll allocates a buffer as large as the file. The code is as follows:

ReadFile source


// ReadFile reads the file named by filename and returns the contents.
// A successful call returns err == nil, not err == EOF. Because ReadFile
// reads the whole file, it does not treat an EOF from Read as an error
// to be reported.
func ReadFile(filename string) ([]byte, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// It's a good but not certain bet that FileInfo will tell us exactly how much to
	// read, so let's try it but be prepared for the answer to be wrong.
	var n int64

	if fi, err := f.Stat(); err == nil {
		// Don't preallocate a huge buffer, just in case.
		if size := fi.Size(); size < 1e9 {
			n = size
		}
	}
	// As initial capacity for readAll, use n + a little extra in case Size is zero,
	// and to avoid another allocation after Read has filled the buffer. The readAll
	// call will read into its allocated internal buffer cheaply. If the size was
	// wrong, we'll either waste some space off the end or reallocate as needed, but
	// in the overwhelmingly common case we'll get it just right.

	// Note: the second argument to readAll is the capacity of the buffer it creates.
	return readAll(f, n+bytes.MinRead)
}

func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	// Note: this buffer's capacity is the file size + bytes.MinRead.
	buf := bytes.NewBuffer(make([]byte, 0, capacity))
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}

io.Copy

Next, the second method:


func md5sum2(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()

	h := md5.New()

	// Stream the file into the hash instead of reading it all first.
	_, err = io.Copy(h, f)
	if err != nil {
		return ""
	}

	return fmt.Sprintf("%x", h.Sum(nil))
}

The key feature of the second method is that it uses io.Copy. In the general case (the special case is covered below), io.Copy allocates a 32 * 1024-byte buffer, i.e. 32 KB, for each copy. Here is the benchmark:


func BenchmarkMd5Sum2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum2(test_path)
	}
}

$ go test -test.run=none -test.bench="^BenchmarkMd5Sum2$" -benchtime=10s -benchmem

BenchmarkMd5Sum2-4 500 37538305 ns/op 33093 B/op 8 allocs/op
PASS
ok tmp 22.657s

32 * 1024 = 32768, which is close to the 33093 B/op above.

io.Copy + bufio.Reader

And then let's look at the third case.

This time bufio.Reader is used in addition to io.Copy. As its name implies, bufio provides buffered I/O, which can perform better. By default, bufio.NewReader creates a 4096-byte buffer.


func md5sum3(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()
	// Wrap the file in a bufio.Reader (4096-byte buffer by default).
	r := bufio.NewReader(f)

	h := md5.New()

	_, err = io.Copy(h, r)
	if err != nil {
		return ""
	}

	return fmt.Sprintf("%x", h.Sum(nil))
}

Here is the benchmark:


func BenchmarkMd5Sum3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum3(test_path)
	}
}

$ go test -test.run=none -test.bench="^BenchmarkMd5Sum3$" -benchtime=10s -benchmem
BenchmarkMd5Sum3-4 300 42589812 ns/op 4507 B/op 9 allocs/op
PASS
ok tmp 16.817s

The 4507 B/op above is close to 4096. So why does io.Copy + bufio.Reader use less memory than plain io.Copy? As mentioned above, io.Copy usually allocates 32 * 1024 bytes per call. What is the special case? The answer is in the source code.

Take a look at the relevant io.Copy source code:


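// io.Copy calls copyBuffer(dst, src, nil). Abridged:
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}
	if buf == nil {
		buf = make([]byte, 32*1024) // abridged: smaller for a short *LimitedReader
	}
	// ... read/write loop using buf
}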

As the source above shows, bufio.Reader implements io.WriterTo. io.Copy therefore returns early by calling its WriteTo method instead of taking the default path that allocates a 32 KB buffer. The copy then runs through the buffer bufio.Reader already created, which is why the bufio.Reader version allocates less memory.

You can also make io.Copy itself allocate less memory. Use io.CopyBuffer and pass it a 4096-byte []byte as the buffer; it then allocates about the same as bufio.Reader.

A benchmark of this io.CopyBuffer variant shows that the memory it allocates is not much different from the bufio.Reader version. The two implementations are not identical, so the numbers cannot match exactly.

The next time you write a program that downloads a large file, will you still reach for ioutil.ReadAll(resp.Body)?

Finally, here are the three benchmark results from above, side by side:

BenchmarkMd5Sum1-4 300 43704982 ns/op 19408224 B/op 14 allocs/op
BenchmarkMd5Sum2-4 500 37538305 ns/op 33093 B/op 8 allocs/op
BenchmarkMd5Sum3-4 300 42589812 ns/op 4507 B/op 9 allocs/op

summary

All three ways of computing MD5 take roughly the same time. The biggest difference is how much memory they allocate.

bufio has a clear advantage for I/O here: it reuses its own small buffer, so it allocated the least memory of the three.

Avoid ReadAll where you can, especially for large inputs.

conclusion

That is all for this article. I hope it helps with your study or work. If you have questions, feel free to leave a message.

