In depth understanding of the implementation mechanism of HashMap in Java

  • 2020-04-01 03:57:49
  • OfStack

If anyone asks me to describe how HashMap works, my simple answer is "rules based on Hash." It's a very simple statement, but before we can understand it, we have to understand what a hash is, right?

What is a hash
Hashing is simply a unique string that is created by applying an algorithm to the properties of a variable/object and using this string to determine the uniqueness of the variable/object. A proper hash function must conform to this criterion.

When the hash function is applied to the same object or to an object equal, it should return the same value every time it is executed. In other words, two equal objects should have the same hashcode.

Note: all Java objects inherit a default hashCode() method from the Object class. This method returns the address of the object in memory as an integer, which is a good hash implementation and ensures that different objects have different hashcodes.

A little introduction to the Entry class
A map is defined as an object that maps a key to a value. Pretty simple, right?

So there must be some mechanism to store these key-value pairs in a HashMap. So, HashMap has an inner class Entry that looks like this.
 


static class Entry<K,V> implements
 
Map.Entry<K,V>
{
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    ...//More code goes here
}

Of course, the Entry class has properties to store key-value pair mappings. The key is marked with the final tag. In addition to the key and value, we can also see two variables, next and hash. Now let's try to understand what these variables mean.

What does the put() method actually do
Before we take a closer look at the implementation of the put method, it's worth looking at the storage of the Entry instance in the array, as defined in the HashMap:
 



  transient Entry[] table;
 Now let's see put Method implementation. 
 
/**
* Associates the specified value with the specified key in this map.
* If the map previously contained a mapping for the key, the old
* value is replaced.
*
* @param key key with which the specified value is to be associated
* @param value value to be associated with the specified key
* @return the previous value associated with <tt>key</tt>, or
*        
 
<tt>null</tt> if there was no mapping for <tt>key</tt>.
*         (A
 
<tt>null</tt> return can also indicate that the map
*         previously
 
associated <tt>null</tt> with <tt>key</tt>.)
*/
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
 
modCount++;
addEntry(hash, key, value, i);
return null;
}

Let's look at it step by step
First, check whether the key is null, if the key is a null value is stored in the table[0] position, because the hashcode for null is always 0
Next, the key's hash value is computed using the key's hashCode() method, which is used to calculate the position in the array in which the Entry object is stored. The JDK's designers assumed that some people would write a very bad hashCode () method, with some very large or very small hash values. To solve this problem, they introduced another hash function that takes the hashCode() of the object and converts it to the appropriate size for the array.

Next is the indexFor(hash,table,length) method, which calculates exactly where the entry object is stored.
Now for the main part, we all know that two different objects might have the same hashCode value, how can two different objects be stored in the same place [called a bucket]?
The answer is LinkedList. If you remember, the Entry class has a next variable that always points to the next variable in the chain, which is exactly what a linked list is.

So, at the time of collision, entry objects will be stored in the form of a linked list, when an entry object needs to be stored, hashmap check if the location is near with an entry object, if not there, if there is a check her next attribute, if it is null, the current entry objects as already stored next node entry objects, so on.

What if we put another value into an existing key? Logically, the old value will be replaced. After checking the storage location of the Entry object, the hashmap traverses the Entry list for that location, calling the equals method for each Entry, all objects in the list having the same hashCode() and the equals method not equal. If the equals method is found to be equal, the substitution is performed.

In this way the HashMap guarantees the uniqueness of the key.

How the get method works
Now we understand the mechanism for storing key-value pairs in a HashMap. The next question is how to query the results from a HashMap.
The logic is the same as put, which returns the value of the key if it matches, or null if it does not.
 


/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null :
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}.  (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}

The code above looks a lot like the put() method, except if (e.hash == hash && ((k = e.keey) == key |, | key.equals(k)).

Pay attention to the point

      The data structure that stores the Entry object is an array of tables called the Entry type.       A specific index position in an array is called a bucket because it can hold the object of the first element of a LinkedList.       The Key object's hashCode() needs to be used to calculate the Entry object's storage location.       The equals() method for the Key object needs to be used to maintain the uniqueness of the object in the Map.       The get() and put() methods are independent of the Value object's hashCode and equals methods.       The null hashCode is always 0, so the Entry object is always stored in the first place of the array


Related articles: