Details and simple examples of Java hash storage

  • 2020-06-01 09:53:03
  • OfStack

Java hash storage

The data structures stored in the hash in Java mainly refer to HashSet, HashMap, LinkedHashSet, LinkedHashMap, HashTable and so on. To understand the hash storage mechanism in Java, we must first understand two methods: equals() and hashCode(). We explained the equals() method and how it differs from the "==" relationship operator in another article. For hashCode(), it is a method defined in the Object class:


public native int hashCode();

This is a local method that returns an int value that is not implemented in the Object class. This method is mainly applied to data structures that use hash, and works well with hashed set 1. For example, when inserting an object into a container (let's say HashMap), how can you tell if the object already exists in the container? Since there can be thousands of elements in a container, it is very inefficient to compare them one by one using the equals() method. The value of a hash is speed, which keeps the key somewhere so it can be found quickly. The fastest data structure for storing a set of elements is an array, so use it to store information about the keys (note that it is the information about the keys, not the keys themselves). But because arrays cannot be scaled, there is a problem: we want to store an indefinite number of values in Map, but what if the number of keys is limited by the size of the array?

The answer is that the array does not hold the key itself, but instead generates a number from the key object as the index of the array, which is the hash code (hashcode), generated by the hashCode() method defined in Object and possibly overwritten by your class. To solve the problem of array capacity being fixed, different keys can produce the same subscript, a phenomenon known as conflict. Thus, the process of querying a value in the container is to calculate the hash code of the object to be inserted through hashCode(), and then query the array using the hash code. Conflicts are often handled by external linking, which means that the array does not hold the values directly, but the list values, and then the linear query of the values in list will naturally be slow. However, if the hash function is good enough, each position in the array will have fewer values. As a result, the hash mechanism can quickly jump to a location in the array and compare only a few elements. This is why HashMap is so fast, as we can see from the HashMap.put () method:


public V put(K key, V value) {
  if (table == EMPTY_TABLE) {
   inflateTable(threshold);
  }
  if (key == null)
   return putForNullKey(value);
  int hash = hash(key);
  int i = indexFor(hash, table.length);
  for (Entry<K,V> e = table[i]; e != null; e = e.next) {
   Object k;
   if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
   }
  }

  modCount++;
  addEntry(hash, key, value, i);
  return null;
 }

The main idea is to get the hash code hash according to the key object when the key is not empty, and then get the index i of the array through the hash code. Iteration is carried out in list represented by table[i] to determine whether the key exists or not through equals(). If so, update the old value with the new value and return the old value; Otherwise add the new key-value pair to HashMap. As you can see here, the hashCode method exists to reduce the number of calls to the equals method and thus improve program efficiency.

It is important to note here that hashCode() does not always need to be able to return a one-only identifier, but the equals() method must strictly determine whether two objects are the same.

Thank you for reading, I hope to help you, thank you for your support of this site!


Related articles: