Performance comparison of three different md5 computing methods in the Go language
- 2020-06-01 10:02:46
- OfStack
preface
This article introduces three different ways of computing an md5 checksum. The real difference between them lies in how the file is read, that is, in the disk I/O, so the same conclusions carry over to network I/O as well. Let's take a look.
ReadFile
So let's look at the first one, which is pretty straightforward:
func md5sum1(file string) string {
	// Read the whole file into memory, then hash it in one call.
	data, err := ioutil.ReadFile(file)
	if err != nil {
		return ""
	}
	return fmt.Sprintf("%x", md5.Sum(data))
}
This approach is crude: ReadFile actually calls readAll, which allocates the most memory of the three methods.
Here is the benchmark:
var test_path = "/path/to/file"
func BenchmarkMd5Sum1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum1(test_path)
	}
}
$ go test -test.run=none -test.bench="^BenchmarkMd5Sum1$" -benchtime=10s -benchmem
BenchmarkMd5Sum1-4 300 43704982 ns/op 19408224 B/op 14 allocs/op
PASS
ok tmp 17.446s
First of all, the file is 19405028 bytes in size, which is very close to the 19408224 B/op above, because readAll allocates a buffer the size of the whole file. The code is as follows:
ReadFile source
// ReadFile reads the file named by filename and returns the contents.
// A successful call returns err == nil, not err == EOF. Because ReadFile
// reads the whole file, it does not treat an EOF from Read as an error
// to be reported.
func ReadFile(filename string) ([]byte, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// It's a good but not certain bet that FileInfo will tell us exactly how much to
	// read, so let's try it but be prepared for the answer to be wrong.
	var n int64
	if fi, err := f.Stat(); err == nil {
		// Don't preallocate a huge buffer, just in case.
		if size := fi.Size(); size < 1e9 {
			n = size
		}
	}
	// As initial capacity for readAll, use n + a little extra in case Size is zero,
	// and to avoid another allocation after Read has filled the buffer. The readAll
	// call will read into its allocated internal buffer cheaply. If the size was
	// wrong, we'll either waste some space off the end or reallocate as needed, but
	// in the overwhelmingly common case we'll get it just right.
	// The second argument to readAll is the size of the buffer to be created.
	return readAll(f, n+bytes.MinRead)
}
func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	// The size of this buffer is the file size + bytes.MinRead.
	buf := bytes.NewBuffer(make([]byte, 0, capacity))
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}
io.Copy
Now for the second method:
func md5sum2(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()
	// Stream the file into the hash instead of loading it all at once.
	h := md5.New()
	_, err = io.Copy(h, f)
	if err != nil {
		return ""
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}
The second method is characterized by its use of io.Copy. In the general case (the special case is discussed below), io.Copy allocates a buffer of 32 * 1024 bytes, that is, 32 KB, on each call. Now let's look at the benchmark:
func BenchmarkMd5Sum2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum2(test_path)
	}
}
$ go test -test.run=none -test.bench="^BenchmarkMd5Sum2$" -benchtime=10s -benchmem
BenchmarkMd5Sum2-4 500 37538305 ns/op 33093 B/op 8 allocs/op
PASS
ok tmp 22.657s
32 * 1024 = 32768, close to 33093 B/op above.
io.Copy + bufio.Reader
And then let's look at the third case.
This time we use not only io.Copy but also bufio.Reader. bufio, as its name implies, provides buffered I/O, which performs better. By default, bufio.Reader creates a 4096-byte buffer.
func md5sum3(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()
	// Wrap the file in a buffered reader before copying it into the hash.
	r := bufio.NewReader(f)
	h := md5.New()
	_, err = io.Copy(h, r)
	if err != nil {
		return ""
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}
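The 4096-byte default mentioned above can be checked directly. The following is a minimal sketch, not part of the original benchmarks: the helper names defaultBufSize and customBufSize are made up for illustration, and an in-memory strings.Reader stands in for the file. It relies on the Size method of bufio.Reader, which reports the size of the internal buffer.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// defaultBufSize (hypothetical helper) returns the internal buffer size
// of a reader created with bufio.NewReader.
func defaultBufSize() int {
	r := bufio.NewReader(strings.NewReader("example data"))
	return r.Size()
}

// customBufSize (hypothetical helper) returns the internal buffer size
// of a reader created with an explicit size via bufio.NewReaderSize.
func customBufSize(n int) int {
	r := bufio.NewReaderSize(strings.NewReader("example data"), n)
	return r.Size()
}

func main() {
	fmt.Println(defaultBufSize())         // 4096
	fmt.Println(customBufSize(64 * 1024)) // 65536
}
```

If a different trade-off between memory and the number of read calls is wanted, bufio.NewReaderSize is the knob to turn.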
Take a look at Benchmark:
func BenchmarkMd5Sum3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum3(test_path)
	}
}
$ go test -test.run=none -test.bench="^BenchmarkMd5Sum3$" -benchtime=10s -benchmem
BenchmarkMd5Sum3-4 300 42589812 ns/op 4507 B/op 9 allocs/op
PASS
ok tmp 16.817s
Isn't the 4507 B/op above close to 4096? So why does the io.Copy + bufio.Reader approach use less memory than plain io.Copy? As mentioned above, in the general case io.Copy allocates a 32 * 1024-byte buffer on each call. What is the special case? The answer is in the source code.
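Take a look at the relevant io.Copy source code. io.Copy delegates to an internal copyBuffer function, which first checks whether the source implements io.WriterTo and, if so, calls its WriteTo method and returns. It then checks whether the destination implements io.ReaderFrom and, if so, calls its ReadFrom method and returns. Only when neither check succeeds does it allocate the default 32 * 1024-byte buffer and copy in a loop.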
As the source code shows, a reader backed by bufio.Reader does not go down the default buffer-creation path. Because bufio.Reader implements io.WriterTo, io.Copy returns early through its WriteTo method and uses the buffer that bufio.Reader has already created. That is why less memory is allocated when bufio.Reader is used.
Of course, if you want plain io.Copy to allocate less memory as well, that is also possible: use io.CopyBuffer instead and pass it a 4096-byte []byte as buf, which ends up not much different from bufio.Reader.
Benchmarking this variant confirms it: the difference in allocated memory is small. The two implementations are not identical, after all, so the numbers cannot match exactly.
The next time you write a program that downloads a large file, will you still use ioutil.ReadAll(resp.Body)?
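Finally, compare the benchmark results as a whole. These are the figures reported above for the three methods:
BenchmarkMd5Sum1-4 300 43704982 ns/op 19408224 B/op 14 allocs/op
BenchmarkMd5Sum2-4 500 37538305 ns/op 33093 B/op 8 allocs/op
BenchmarkMd5Sum3-4 300 42589812 ns/op 4507 B/op 9 allocs/op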
summary
The three different calculation methods of md5 are all similar in execution time. The biggest difference is the allocation of memory.
bufio has a strong advantage in handling I/O.
Try to avoid the use of ReadAll.
conclusion
That is all for this article. I hope it is of some help in your study or work. If you have any questions, feel free to leave a comment.