Enlightenment of dead loop based on Java HashMap

  • 2020-04-01 01:51:39
  • OfStack

One, the transformation of a single thread to multithreading is also a technical work

As we saw in uncle mouse's blog, it was originally a single-threaded application, "later, our program performance problems, so we need to become multi-threaded, so, after becoming multi-threaded to the line, found that the program often took up 100% of the CPU.

Considering that the problem was exposed by taobao's engineers, their technical foundation is generally very solid, even they are wrong, so the transformation of single thread to multi-threaded is not as simple as imagined, I think.

You may be very unconvinced to ask in reply, the engineer of treasure treasure how again, single thread changes multithreading to have what difficulty? I know about synchronization, deadlocks, race conditions, lock free and thread-safe containers, and all kinds of thread-safe synchronization constructs... Can't you write a thread-safe application?

The reality is that thread-safe applications don't have to be written because you have a solid thread-safety foundation and development experience.

Try two examples:

1. Fetch data through index using thread safe container

Many people know that thread-safe containers are not always bug-free in practice. The following code is typical:


        static int GetFirstOrDefault(ThreadSafeList<int> list)
        {
            if (list.Count > 0)
            {
                return list[0];
            }
            return 0;
        }

The function argument list above if you pass in a list of elements with a total of 1 at the beginning, can you figure out what's wrong with the code above?

I happen to have summed up an earlier article on thread-safe containers. (link: #) > . A thread-safe container is not really safe, and that's where the code in question comes from.

 

2, multi-threaded operation of mail errors

In addition, the analysis of multi-threaded application scenarios may not be correct. Once, due to the performance problems of a mail program, I also boldly modified the application program, and a major BUG appeared after changing.

You can see what I've summed up in my harrowing review. (link: #) > .

 

With these two examples, I just want to show that in multi-threaded applications, the bugs that are caused by thread safety are actually quite subtle, and the possibility of an ill-considered or poorly understood problem is simply impossible to prevent.

Second, the cost of ReHash

The first point above is mostly about thread safety, and then we'll move on to hash tables to get a better understanding of expensive rehashes.

A hash table, as we commonly understand it, is "a data structure that trades space for time." I've been saying this for too long, and you might get the illusion that the hash table is sacrificing space and gaining time.

However, the process of ReHash is actually a double loss of space and time, because the analysis of the source code, we know that the process of ReHash is actually a dynamic expansion process, and the expansion of the hash table is an internal operation that consumes a lot of space and time.

Why is a ReHash an internal operation that consumes a tremendous amount of space and time?

1. It turns out that when we expand the hashed container, inside the hash table we create a new, larger array, then copy the contents of the original array into the new array and rehash it.

2, The size of the larger new array is also a matter of science. In general, the size of the new array is set to a similar prime number that doubles the size of the original array.

From these two points, 1 and 2, you can see that ReHash is very expensive indeed. I happened to write about the dynamic scaling of.net containers not long ago. (link: #) > , which is also a simple summary of the expansion of the.net HashTable mechanism, now compared with the Java HashMap source code, see the familiar ReHash function named, look again in the implementation of.net, there is indeed comparison to improve.

As for what we usually understand as "space for time", it actually refers to the data retrieval efficiency with O(1) complexity of hash, but it is affected by the filling factor, the space cost is usually large, and the space utilization is not high.

So we often say that hash tables are good for scenarios where reads are frequent and writes are rare, such as using a hash table as a cache container.

"Someone reported this to Sun, but Sun didn't think it was a problem. Because HashMap does not support concurrency in the first place. ConcurrentHashmap..."

According to the actual development experience, the thread-safe container is not really thread-safe, will use ConcurrentHashmap is only entering the initial stage, and can not help but feel the Sun in the sky.


Related articles: