A Comprehensive Analysis of HashMap Expansion and Treeification

  • 2021-09-12 01:09:30
  • OfStack

1. Treeification


// Threshold for converting a bucket's linked list to a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Threshold for converting a red-black tree back to a linked list
static final int UNTREEIFY_THRESHOLD = 6;
/**
 * Minimum table capacity for treeification: a bucket's linked list may be
 * converted to a red-black tree only when the table's capacity is at least
 * this value. Otherwise, when a bucket holds too many elements, the map is
 * resized instead of treeified.
 * To avoid conflicts between resizing and treeification, this value must be
 * at least 4 * TREEIFY_THRESHOLD.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

The first two constants are straightforward; the key is the third: does it mean a linked list can only be treeified when the array length is at least 64?

In fact, the two thresholds apply at different points.

When a bucket's linked list grows longer than 8, the treeifyBin method is called to convert it into a red-black tree. But treeifyBin itself contains a check: only when the array length is at least 64 does it treeify; otherwise it just calls resize to expand the array.

Why?

Because a long linked list combined with a short array means hash collisions are frequent. Treeifying at that point only treats the symptom: the root cause of the long list is that the array is too short. So the array length is checked before treeifying, and if it is less than 64 the array is expanded instead.

So expansion occurs in two situations:

The number of entries exceeds the threshold.

A bucket's linked list is longer than 8, but the array length is less than 64.
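The decision described above can be sketched as follows. This is a simplified illustration of the JDK 8 treeifyBin logic, not the real method (which also builds TreeNode chains); the constant names follow the JDK source, but onLongBucket is a hypothetical helper for demonstration:

```java
// Simplified sketch of the JDK 8 decision in HashMap.treeifyBin:
// a bucket is only converted to a red-black tree when the table is
// already at least MIN_TREEIFY_CAPACITY; otherwise the map resizes.
public class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    /** What happens when a bucket's list reaches the given size. */
    static String onLongBucket(int tableLength, int bucketSize) {
        if (bucketSize < TREEIFY_THRESHOLD) return "keep linked list";
        if (tableLength < MIN_TREEIFY_CAPACITY) return "resize";  // expand first
        return "treeify";                                         // convert to tree
    }

    public static void main(String[] args) {
        System.out.println(onLongBucket(16, 9));  // table too small: resize
        System.out.println(onLongBucket(64, 9));  // table big enough: treeify
    }
}
```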

2. Expansion mechanism

The internal creation process of HashMap:

The constructor only initializes parameters; the array and linked lists are built lazily, when data is first added. Calling put then triggers resize when the table is empty or when the number of entries exceeds the threshold.

How HashMap expands

1. Setting the threshold in HashMap

Initially, the threshold field is 0.

When no initial capacity is given, the threshold becomes the default capacity 16 * the default load factor 0.75 = 12.

When an initial capacity is given, the constructor first calls a function that stores, in the threshold field, the smallest power of 2 greater than or equal to the requested size (for example, for a requested size of 1000 it stores 1024). Later, in the resize method, that stored value is used as the new capacity, and the new threshold becomes capacity * 0.75.
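The rounding step is performed by JDK 8's tableSizeFor method, reproduced here with a small driver added for illustration. It smears the highest set bit of (cap - 1) into all lower bits, then adds 1 to reach the next power of two:

```java
// JDK 8's HashMap.tableSizeFor: rounds cap up to the nearest power of two
// by smearing the highest set bit of (cap - 1) into every lower bit.
public class TableSize {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(1000)); // 1024, as in the example above
        System.out.println(tableSizeFor(16));   // 16: already a power of two
    }
}
```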

The relevant part of the resize method is as follows:


Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
if (oldCap > 0) {
    if (oldCap >= MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return oldTab;
    }
    else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
             oldCap >= DEFAULT_INITIAL_CAPACITY)
        newThr = oldThr << 1; // double threshold
}
else if (oldThr > 0) // initial capacity was placed in threshold
    newCap = oldThr;
else {               // zero initial threshold signifies using defaults
    newCap = DEFAULT_INITIAL_CAPACITY;
    newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
if (newThr == 0) {
    float ft = (float)newCap * loadFactor;
    newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
              (int)ft : Integer.MAX_VALUE);
}
threshold = newThr;

2. Data transfer

When the array is null, a new array is created

When the array is not empty, both the capacity and the threshold are doubled: a new array twice the old capacity is created, and the entries of the old array are transferred into it.

Assuming the table size before expansion is 2^N, an element's table index is determined by the low N bits of its hash value.

After expansion the table size is 2^(N+1), so the index is determined by the low N+1 bits of the hash, one bit more than before.

Unlike JDK 1.7, there is no need to recompute every element's position from scratch (which costs a lot of time); it is enough to look at that one extra bit, i.e. hash & oldCap, and check whether it is 1 or 0.

If it is 0, the element keeps the same index in the new array.

If it is 1, the new index is simply the old index plus oldCap.
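A small sketch of this rule, using illustrative hash values (newIndex is a hypothetical helper; the real resize applies the same hash & oldCap test while walking each bucket's list):

```java
// Demonstrates how resize redistributes entries: with oldCap = 16 the old
// index uses the low 4 bits of the hash; after doubling, bit 4 (hash & oldCap)
// decides whether an entry stays at its index or moves to index + oldCap.
public class ResizeSplit {
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        // (hash & oldCap) == 0 -> same index; otherwise old index + oldCap
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int h1 = 0b0_0101;  // bit 4 is 0 -> stays at index 5
        int h2 = 0b1_0101;  // bit 4 is 1 -> moves to 5 + 16 = 21
        System.out.println(newIndex(h1, oldCap)); // 5
        System.out.println(newIndex(h2, oldCap)); // 21
    }
}
```

Note that the result always matches recomputing hash & (newCap - 1) directly; the trick just avoids touching the rest of the hash.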

3. Capacity calculation formula

Expansion is an especially performance-costly operation, so when using a HashMap, estimate the number of entries the map will hold and pass a rough initial capacity, to avoid frequent resizing.

The capacity calculation formula for HashMap is expectedSize / 0.75 + 1.

The principle is to guarantee that the threshold (array length * 0.75) is greater than the actual number of entries.
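As a sketch of this formula (initialCapacity is a hypothetical helper name, not a JDK method):

```java
// Sketch: choose an initial capacity so that the resulting threshold
// (capacity rounded up to a power of two, times 0.75) covers the
// expected number of entries without triggering a resize.
public class CapacityHint {
    static int initialCapacity(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    public static void main(String[] args) {
        // For 1000 entries: 1000 / 0.75 + 1 = 1334 -> HashMap rounds this up
        // to 2048, giving threshold 1536 >= 1000, so inserting 1000 entries
        // never resizes. Passing 1000 directly would round to 1024 with
        // threshold 768, forcing a resize partway through.
        System.out.println(initialCapacity(1000)); // 1334
        java.util.HashMap<Integer, Integer> m =
                new java.util.HashMap<>(initialCapacity(1000));
        for (int i = 0; i < 1000; i++) m.put(i, i);
        System.out.println(m.size()); // 1000
    }
}
```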

Why is the maximum capacity of HashMap 2 to the 30th power (1 shifted left by 30)?

While reading the HashMap source code, I noticed the limit on the maximum capacity and had a question.


    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

Why is the maximum capacity 1 << 30?

Exploration 1: why the shift is 30

First, understand the << operator: it is the left-shift operator, which shifts the binary representation to the left. For an operand of 1, 1 << x equals 2^x.

Now look at 1 << 30. It shifts 1 left by 30 bits, giving the binary value 0100 0000 0000 0000 0000 0000 0000 0000 (only bit 30 set).

Look at this one piece of code:


public class ShiftDemo {
    public static void main(String[] args) {
        for (int i = 30; i <= 33; i++) {
            System.out.println("1 << " + i + " = " + (1 << i));
        }
        System.out.println("1 << -1 = " + (1 << -1));
    }
}

The output is:

1 << 30 = 1073741824
1 << 31 = -2147483648
1 << 32 = 1
1 << 33 = 2
1 << -1 = -2147483648

Analysis of the results:

The int type is a 32-bit integer occupying 4 bytes, and Java's primitive types have no unsigned variants, so the highest bit is the sign bit: 0 for positive, 1 for negative. Java stores integers in two's complement, so shifting 1 left by 31 bits gives 0x80000000, which represents -2147483648. The largest positive power of two therefore uses a shift of 30. (The remaining results follow from Java using only the low 5 bits of the shift distance for int shifts: 1 << 32 is effectively 1 << 0 = 1, and 1 << -1 is effectively 1 << 31.)

Exploration 2: why 1 << 30 and not something larger

After exploration 1, it should be clear why the shift is 30. But why is the constant 1 << 30 rather than 0x7fffffff, i.e. Integer.MAX_VALUE?

Let's first look at the comment on the constant:


 /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<30.
     */
    static final int MAXIMUM_CAPACITY = 1 << 30;

If the value passed to the constructor is greater than this number, it is replaced by this number.

OK, now let's look at the constructor:


public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

Note these lines:


if (initialCapacity > MAXIMUM_CAPACITY)
    initialCapacity = MAXIMUM_CAPACITY;

Seeing this raises a doubt: if I want to store more than MAXIMUM_CAPACITY entries, will my capacity really be clamped down to MAXIMUM_CAPACITY?

Don't worry, and keep looking: the resize() method contains this:


if (oldCap >= MAXIMUM_CAPACITY) {
    threshold = Integer.MAX_VALUE;
    return oldTab;
}

Here we can see that the practical "maximum capacity" of HashMap is Integer.MAX_VALUE: once the table reaches MAXIMUM_CAPACITY, the threshold is set to Integer.MAX_VALUE and the table is never resized again.

Summary

MAXIMUM_CAPACITY is the largest power of 2 representable as a positive int, and it is used in many places. One important point: HashMap guarantees the capacity is always a power of 2. Even if the initial capacity you pass in is not a power of 2, the tableSizeFor() method rounds it up to one, and MAXIMUM_CAPACITY is the cap on that value.

The other point is the threshold. Anyone with a passing knowledge of HashMap knows that threshold = capacity * load factor, i.e. the trigger for expansion, which roughly corresponds to the capacity actually usable. Expansion always doubles the capacity, and doubling MAXIMUM_CAPACITY would be 1 << 31, an integer overflow; that is why resize clamps the threshold to Integer.MAX_VALUE instead of growing further.

