Java lock-free HashMap principle and implementation details

  • 2020-04-01 02:42:25
  • OfStack

There are three main options for using a HashMap in a multi-threaded Java environment: use the thread-safe java.util.Hashtable as a replacement; wrap an existing HashMap into a thread-safe one with the java.util.Collections.synchronizedMap method; or use the java.util.concurrent.ConcurrentHashMap class, which offers very good performance.
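
For reference, here is a minimal sketch of the three alternatives mentioned above; the class and method names come from the standard library, while the variable names and example entries are purely illustrative:

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeMaps {
    public static void main(String[] args) {
        // Option 1: the legacy Hashtable, which synchronizes every method
        Map<String, Integer> table = new Hashtable<>();

        // Option 2: wrap an existing HashMap in a synchronized view
        Map<String, Integer> synced = Collections.synchronizedMap(new HashMap<>());

        // Option 3: ConcurrentHashMap, designed for concurrent access
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();

        table.put("a", 1);
        synced.put("b", 2);
        concurrent.put("c", 3);
    }
}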
All of the above approaches rely, to varying degrees, on mutexes in their implementation details. A mutex can block threads, reducing throughput, and may lead to deadlock, priority inversion, and a series of other problems.

CAS (Compare And Swap) is a capability provided by the underlying hardware that checks and updates a value as a single atomic operation.

Atomic operations in Java

In the java.util.concurrent.atomic package, Java provides many convenient atomic types whose implementations are built entirely on CAS operations.

For example, if we want to implement a globally common counter, we can:


private AtomicInteger counter = new AtomicInteger(3);

public void addCounter() {
    for (;;) {
        int oldValue = counter.get();
        int newValue = oldValue + 1;
        if (counter.compareAndSet(oldValue, newValue))
            return;
    }
}

Here, the compareAndSet method checks whether the current value of counter is oldValue; if so, it sets it to newValue, the operation succeeds, and it returns true. Otherwise the operation fails and it returns false.

If another thread changes the value of counter after we compute the new value, compareAndSet fails. In that case we simply wrap the whole process in a loop and retry until it eventually succeeds, at which point counter has been incremented by 1. (AtomicInteger already defines incrementAndGet and decrementAndGet methods for the common +1/-1 operations; we simply call them below.)

Besides AtomicInteger, the java.util.concurrent.atomic package also provides AtomicReference and AtomicReferenceArray, which represent an atomic reference and an atomic array of references, respectively.
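
As a small illustration (not part of this article's HashMap code), an AtomicReference can be updated with the same retry-on-CAS-failure pattern as the counter above; the class and field names here are made up for the example:

import java.util.concurrent.atomic.AtomicReference;

public class AtomicReferenceDemo {
    private final AtomicReference<String> latest = new AtomicReference<>("v1");

    public void updateTo(String newValue) {
        for (;;) {
            String old = latest.get();
            // Only succeed if no other thread has replaced the reference meanwhile
            if (latest.compareAndSet(old, newValue))
                return;
        }
    }
}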

Implementation of a lock-free linked list
Before implementing a lock-free HashMap, let's look at the comparatively simpler implementation of a lock-free linked list.

Take the insert operation as an example:

First we find node A in front of the insertion position and node B behind it.
Then we create a new node C and point its next pointer at node B. (see figure 1)
Finally, we point node A's next pointer at node C. (see figure 2)

[Figures 1 and 2: //files.jb51.net/file_images/article/201401/201401031034409.jpg]

However, while this is in progress, another thread may insert nodes (say, node D) directly between A and B. If we make no check, the node inserted by the other thread may be lost. (see figure 3) We can use a CAS operation to verify, at the moment of the assignment, that node A's next pointer still points at B, and retry the whole insert operation if it has changed. The code looks roughly like this:


private void listInsert(Node head, Node c) {
    for (;;) {
        Node a = findInsertionPlace(head), b = a.next.get();
        c.next.set(b);
        if (a.next.compareAndSet(b, c))
            return;
    }
}

(The next field of the Node class has type AtomicReference<Node>, i.e., an atomic reference to a Node.)

Lookup on the lock-free list is no different from lookup on an ordinary linked list. For deletion, we find node A in front of the node to be deleted and node B behind it, then use a CAS operation to verify and update node A's next pointer so that it points at node B.
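
A deletion on this simplified lock-free list can be sketched in the same style as listInsert above. The helper findDeletionPlace is assumed, by analogy with findInsertionPlace; note that production lock-free lists (for example Harris's algorithm) usually also mark a node as logically deleted before unlinking it, a detail this simplified model leaves out:

private void listDelete(Node head, Node victim) {
    for (;;) {
        Node a = findDeletionPlace(head, victim); // predecessor of the node to delete
        Node b = victim.next.get();               // successor of the node to delete
        // Unlink victim only if a still points at it; otherwise retry
        if (a.next.compareAndSet(victim, b))
            return;
    }
}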

Difficulties and breakthroughs of a lock-free HashMap
A HashMap has four basic operations: insert, delete, find, and ReHash. A typical HashMap implementation uses an array in which each element is a linked list of nodes. For these lists we can perform insert, delete, and find with the operations described above, but the ReHash operation is more difficult.

[Figure 4: //files.jb51.net/file_images/article/201401/2014010310344010.jpg]

As figure 4 shows, a typical ReHash walks through every node in the old table, computes its position in the new table, and then moves it there. During the move we must manipulate pointers three times:

Point A's next pointer at D
Point B's next pointer at C
Point C's next pointer at E
All three pointer operations must be completed together to keep the move atomic. But a CAS operation can only verify and update a single variable atomically at a time, so it cannot verify and update three pointers at once.

So let's think about it the other way around. Since moving nodes is so hard, we can instead keep all nodes in a fixed order at all times, which avoids moving nodes altogether. In a typical HashMap implementation the array length is always a power of two, 2^i, and mapping a hash value to an array index is simply a modulo by the array length (that is, keeping only the lowest i bits of the hash). When a ReHash happens, the array length doubles to 2^(i+1), and each node in the j-th bucket's linked list of the old array moves either to the j-th bucket of the new array or to the (j+2^i)-th bucket; which one depends only on the (i+1)-th lowest bit of its hash (if that bit is 0 it stays in bucket j, otherwise it goes to bucket j+2^i).
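
The split can be expressed with a couple of bit operations; the helper below is purely illustrative and not part of the article's code:

// Which bucket does a hash fall into after the table doubles from 2^i to 2^(i+1)?
static int bucketAfterDoubling(int hash, int i) {
    int j = hash & ((1 << i) - 1);       // old bucket: the lowest i bits of the hash
    int bitI = hash & (1 << i);          // the (i+1)-th lowest bit decides the split
    return bitI == 0 ? j : j + (1 << i); // stays in bucket j, or moves to bucket j + 2^i
}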

[Figure 5: //files.jb51.net/file_images/article/201401/2014010310344011.jpg]

As shown in figure 5, we keep all nodes sorted from small to large by the bit-reversed value of their hash (e.g. 1101 -> 1011). When the array size is 8, keys 2 and 18 fall into one group, and keys 3, 11, and 27 into another. At the head of each group we insert a sentinel node to simplify later operations. To make sure the sentinel sorts to the front of its group, we set the highest bit of a normal node's hash (which becomes the lowest bit after reversal) to 1, while sentinel nodes leave that bit at 0.
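
The key encoding just described can be sketched as follows (the method names are illustrative; comparisons between such keys should be unsigned, e.g. via Integer.compareUnsigned):

// A regular node's hash has its highest bit set before reversal, so after
// Integer.reverse it carries a 1 in its lowest bit and sorts after its sentinel.
static int regularKey(int hashCode) {
    return Integer.reverse(hashCode | 0x80000000);
}

// A sentinel node's key is simply the reversed bucket index; its lowest bit stays 0,
// so it always precedes the regular nodes of its group in reverse-bit order.
static int sentinelKey(int bucketIdx) {
    return Integer.reverse(bucketIdx);
}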

When the array grows to 16 (see figure 6), the second group splits into one group containing only 3 and another containing 11 and 27, but the relative order of the nodes does not change. So when we ReHash, no node has to be moved.

[Figure 6: //files.jb51.net/file_images/article/201401/2014010310344012.jpg]

Implementation details

Because copying the whole array during an expansion would take a long time, we take a lazy approach and split the array into blocks. When an index is accessed, we only have to check whether the block containing that index has been created yet, and create it if it has not.
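
One possible shape for this lazily allocated block structure is sketched below; BLOCK_SIZE, MAX_BLOCKS and the class layout are assumptions made for the example, not the article's actual code:

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

class BucketBlocks {
    // Node as described in this article: a hash plus an atomic next reference
    static class Node {
        int hash;
        final AtomicReference<Node> next = new AtomicReference<>();
    }

    private static final int BLOCK_SIZE = 1024;
    private static final int MAX_BLOCKS = 1 << 16;

    // Outer array of blocks; a null block has not been created yet
    private final AtomicReferenceArray<AtomicReferenceArray<Node>> blocks =
            new AtomicReferenceArray<>(MAX_BLOCKS);

    Node getBucket(int idx) {
        AtomicReferenceArray<Node> block = blocks.get(idx / BLOCK_SIZE);
        return block == null ? null : block.get(idx % BLOCK_SIZE);
    }

    void setBucket(int idx, Node sentinel) {
        int b = idx / BLOCK_SIZE;
        if (blocks.get(b) == null) {
            // Create the block lazily; CAS so concurrent creators agree on a single block
            blocks.compareAndSet(b, null, new AtomicReferenceArray<>(BLOCK_SIZE));
        }
        blocks.get(b).set(idx % BLOCK_SIZE, sentinel);
    }
}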

In addition, size is defined as the range of indices currently in use, with an initial value of 2, and count is the total number of nodes (not counting sentinel nodes) currently stored in the HashMap.

Initially, every entry in the array is null except the 0th one. The 0th entry points to a list containing only one sentinel node, which marks the start of the whole list. The initial state is shown in figure 7, where light green marks the index range not yet in use and the dotted arrows mark blocks that exist logically but have not actually been created.

Initializing a bucket index

A null entry in the array is considered uninitialized; initializing an index means creating its sentinel node. Initialization is recursive: if the parent index has not been initialized, it is initialized first. (The parent of an index is the index obtained by clearing its highest set binary bit.)


private void initializeBucket(int bucketIdx) {
    int parentIdx = bucketIdx ^ Integer.highestOneBit(bucketIdx);
    if (getBucket(parentIdx) == null)
        initializeBucket(parentIdx);
    Node dummy = new Node();
    dummy.hash = Integer.reverse(bucketIdx);
    dummy.next = new AtomicReference<>();
    setBucket(bucketIdx, listInsert(getBucket(parentIdx), dummy));
}

getBucket is a wrapper method for reading the array entry at a given index, and setBucket likewise for writing one. listInsert searches forward from the given node for a suitable insertion position and inserts the given node there; if a node with the same hash already exists in the list, it returns that existing node, otherwise it returns the newly inserted node.

Insert operation

First, take the key's hashCode modulo the HashMap's size to get the array index to insert into.
Then check whether that entry is null; if it is, initialize the index.
Construct a new node and insert it at the proper position; note that the hash stored in the node must be the bit-reversed value of the original hashCode with the lowest bit set to 1.
Finally, increment the node counter by 1; if the map has become too dense after the increment, simply set size to size*2 to mark the array as expanded (ReHash). A sketch of this path follows below.
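
Putting the steps above together, the insert path might look roughly like this. The field size, the counter count (assumed here to be an AtomicInteger), and the helpers getBucket, initializeBucket, and listInsert follow the earlier description; the key/value fields on Node and the LOAD_FACTOR constant are assumptions added for the sketch:

public void put(Object key, Object value) {
    int hashCode = key.hashCode();
    int bucketIdx = hashCode & (size - 1);              // size is a power of two, so this is hashCode mod size
    if (getBucket(bucketIdx) == null)
        initializeBucket(bucketIdx);                    // lazily create the bucket's sentinel

    Node node = new Node();
    node.hash = Integer.reverse(hashCode | 0x80000000); // reversed hash with the lowest bit set to 1
    node.key = key;
    node.value = value;
    node.next = new AtomicReference<>();
    listInsert(getBucket(bucketIdx), node);

    if (count.incrementAndGet() > size * LOAD_FACTOR)   // too dense: mark the array as doubled
        size = size * 2;
}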

Find operation

Compute the array index of the node to be found.
Check whether that entry is null; if it is, the lookup fails.
Enter the linked list at that position and search forward until the target node is found or the search moves past this group of nodes. A sketch follows below.
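
A corresponding sketch of the lookup path, under the same assumptions as the insert sketch above (Integer.compareUnsigned is used because the reverse-bit keys must be compared as unsigned values):

public Object get(Object key) {
    int hashCode = key.hashCode();
    int bucketIdx = hashCode & (size - 1);
    Node sentinel = getBucket(bucketIdx);
    if (sentinel == null)
        return null;                                    // bucket never initialized: key absent
    int target = Integer.reverse(hashCode | 0x80000000);
    Node cur = sentinel.next.get();
    // The list is sorted by reverse-bit key, so we can stop once we pass the target
    while (cur != null && Integer.compareUnsigned(cur.hash, target) <= 0) {
        if (cur.hash == target && key.equals(cur.key))
            return cur.value;
        cur = cur.next.get();
    }
    return null;
}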

Delete operation

Compute the array index of the node to be deleted.
Check whether that entry is null; if it is, initialize the index.
Find the node to be deleted and remove it from the list. (Note that thanks to the sentinel nodes, every normal node is referenced only by its single predecessor; it is never referenced by both a predecessor node and an array pointer at the same time, so there is no need to modify several pointers at once.)
Decrement the node counter by 1. A sketch follows below.
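
Finally, a sketch of the delete path under the same assumptions, reusing the CAS unlink idea from the linked-list section (as noted there, a production implementation would also mark nodes as logically deleted before unlinking):

public boolean remove(Object key) {
    int hashCode = key.hashCode();
    int bucketIdx = hashCode & (size - 1);
    if (getBucket(bucketIdx) == null)
        initializeBucket(bucketIdx);                    // make sure the sentinel chain exists
    int target = Integer.reverse(hashCode | 0x80000000);

    for (;;) {
        // Walk from the sentinel to find the predecessor a and the matching node
        Node a = getBucket(bucketIdx);
        Node victim = a.next.get();
        while (victim != null && Integer.compareUnsigned(victim.hash, target) <= 0
                && !(victim.hash == target && key.equals(victim.key))) {
            a = victim;
            victim = victim.next.get();
        }
        if (victim == null || victim.hash != target || !key.equals(victim.key))
            return false;                               // key not present
        // Unlink via CAS; if a concurrent update changed a.next, retry from the sentinel
        if (a.next.compareAndSet(victim, victim.next.get())) {
            count.decrementAndGet();
            return true;
        }
    }
}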

