Discussion on the hashcode method in Java (recommended)

  • 2020-05-19 04:59:25
  • OfStack

Hash tables are a data structure that most people are familiar with, and hash tables are used in many places to improve lookup efficiency. There is one method in Java's Object class:


public native int hashCode(); 

According to the declaration of this method, this method returns a value of type int and is a local method, so no implementation is given in the Object class.

Why does the Object class need a method like this? What does it do? Today we will discuss the hashCode method in detail.

1. The role of hashCode method

For programming languages that contain container types, hashCode is basically involved. In Java, as in Java, the main purpose of the hashCode method is to work with hashed collections (HashSet, HashMap, and HashTable).

Why do you say that? Consider a case where, when you insert an object into a collection, how do you check if the object already exists in the collection? (note: duplicate elements are not allowed in collections)

Most people would probably think of calling the equals method to compare one by one, and it works. However, if there are already 10,000 pieces of data or more in the set, if the equals method is used to compare them one by one, the efficiency is bound to be a problem. Show the effect of hashCode method is, when set to add a new object, call the object hashCode method first, get the corresponding hashcode value, in fact in the concrete implementation of HashMap will use 1 table hashcode value has been put into object saved, if not the hashcode value in table, can be directly put into it, don't make any more; If there is the hashcode value, call it equals method comparing with the new elements, the same is not saved, not the same hash other address, so there is a problem of conflict resolution, so that 1 to the number of actual call equals method is greatly reduced, said popular point: 1 Java hashCode method is based on a set of rules will be associated with the object of information (such as object storage address, object field, etc.) mapping into a numerical value, this value according as hash values. The following code is the implementation of the put method in java.util.HashMap:


public V put(K key, V value) {

    if (key == null)

      return putForNullKey(value);

    int hash = hash(key.hashCode());

    int i = indexFor(hash, table.length);

    for (Entry<K,V> e = table[i]; e != null; e = e.next) {

      Object k;

      if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {

        V oldValue = e.value;

        e.value = value;

        e.recordAccess(this);

        return oldValue;

      }

    }

    modCount++;

    addEntry(hash, key, value, i);

    return null;

  } 

put method is used to add new elements to the HashMap, according to the specific implementation in the put method, will call hashCode method to get the first element of hashCode value, then the existence of the view table hashCode value, if there is a call the equals method to determine whether there is the element, if present, is updated value values, or add new elements to the HashMap. It can be seen from here that the hashCode method exists to reduce the number of calls to the equals method, thus improving the efficiency of the program.

Some friends mistakenly believe that by default, hashCode returns the storage address of the object. In fact, this view is not comprehensive. It is true that some JVM directly returns the storage address of the object in the implementation, but most of the time it is not. The following is the implementation of generating hash hash values in HotSpot JVM:


static inline intptr_t get_next_hash(Thread * Self, oop obj) {

 intptr_t value = 0 ;

 if (hashCode == 0) {

   // This form uses an unguarded global Park-Miller RNG,

   // so it's possible for two threads to race and generate the same RNG.

   // On MP system we'll have lots of RW access to a global, so the

   // mechanism induces lots of coherency traffic.

   value = os::random() ;

 } else

 if (hashCode == 1) {

   // This variation has the property of being stable (idempotent)

   // between STW operations. This can be useful in some of the 1-0

   // synchronization schemes.

   intptr_t addrBits = intptr_t(obj) >> 3 ;

   value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;

 } else

 if (hashCode == 2) {

   value = 1 ;      // for sensitivity testing

 } else

 if (hashCode == 3) {

   value = ++GVars.hcSequence ;

 } else

 if (hashCode == 4) {

   value = intptr_t(obj) ;

 } else {

   // Marsaglia's xor-shift scheme with thread-specific state

   // This is probably the best overall implementation -- we'll

   // likely make this the default in future releases.

   unsigned t = Self->_hashStateX ;

   t ^= (t << 11) ;

   Self->_hashStateX = Self->_hashStateY ;

   Self->_hashStateY = Self->_hashStateZ ;

   Self->_hashStateZ = Self->_hashStateW ;

   unsigned v = Self->_hashStateW ;

   v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;

   Self->_hashStateW = v ;

   value = v ;

 }

 

 value &= markOopDesc::hash_mask;

 if (value == 0) value = 0xBAD ;

 assert (value != markOopDesc::no_hash, "invariant") ;

 TEVENT (hashCode: GENERATE) ;

 return value;

} 

The implementation in hotspot src/share vm/runtime/synchronizer cpp file.

So one might say, can you tell if two objects are equal based directly on the hashcode value? Definitely not, because different objects might generate the same hashcode value. Although it is not possible to judge whether two objects are equal or not based on hashcode value, it is possible to judge two objects are unequal based on hashcode value directly. If two objects have unequal hashcode value, they must be two different objects. To determine whether two objects are truly equal, you must use the equals method.

That is to say, for two objects, if the equals method is called and the result is true, the hashcode values of the two objects must be the same.

If the result obtained by the equals method is false, the hashcode values of the two objects are definitely not 1 different.

If the hashcode values of two objects are different, the equals method must yield false.

If the hashcode values of two objects are equal, the result obtained by the equals method is unknown.

2.equals method and hashCode method

In some cases, programmers will need to override the equals method when designing a class, such as the String class, but be aware that you must override the hashCode method as well as the equals method. Why do you say that?

Here's an example:


package com.cxh.test1;

import java.util.HashMap;

import java.util.HashSet;

import java.util.Set;

class People{

  private String name;

  private int age;  

  public People(String name,int age) {

    this.name = name;

    this.age = age;

  }   

  public void setAge(int age){

    this.age = age;

  }    

  @Override

  public boolean equals(Object obj) {

    // TODO Auto-generated method stub

    return this.name.equals(((People)obj).name) && this.age== ((People)obj).age;

  }

}

public class Main {

  public static void main(String[] args) {  

    People p1 = new People("Jack", 12);

    System.out.println(p1.hashCode());    

    HashMap<People, Integer> hashMap = new HashMap<People, Integer>();

    hashMap.put(p1, 1);

    System.out.println(hashMap.get(new People("Jack", 12)));

  }

} 

Here I just rewrote the equals method, which says that if two People objects have the same name and age, they are considered the same person.

This code was meant to output "1", but in fact it printed "null". Why is that? The reason is that you forgot to override the hashCode method when you overwrote the equals method.

Although the equals method is overridden so that two objects with the same logical name and age are judged to be equal objects (similar to the String class), it is important to remember that by default the hashCode method maps the object's storage address. No wonder the output of the code above is "null". The reason is very simple, p1 points to the object and

System. out. println (hashMap get (new People (" Jack ", 12))); new People("Jack", 12) generates two objects that must have different storage addresses. The following is the concrete implementation of HashMap's get method:


 public V get(Object key) {

    if (key == null)

      return getForNullKey();

    int hash = hash(key.hashCode());

    for (Entry<K,V> e = table[indexFor(hash, table.length)];

       e != null;

       e = e.next) {

      Object k;

      if (e.hash == hash && ((k = e.key) == key || key.equals(k)))

        return e.value;

    }

    return null;

  } 

So in hashmap get operation, with different values because get hashcdoe (note that the above code may in some cases the same hashcode value, but the probability is small, because even though two objects store address different are likely to get the same hashcode value), so lead to for cycle will not perform in the get method, direct return null.

So if you want the output of the above code to be "1", it's easy to rewrite the hashCode method so that the equals and hashCode methods are always logically 1 neutral.


package com.cxh.test1;

import java.util.HashMap;

import java.util.HashSet;

import java.util.Set; 

class People{

  private String name;

  private int age;

  public People(String name,int age) {

    this.name = name;

    this.age = age;

  }  

 public void setAge(int age){

    this.age = age;

  }  

  @Override

  public int hashCode() {

    // TODO Auto-generated method stub

    return name.hashCode()*37+age;

  } 

  @Override

  public boolean equals(Object obj) {

    // TODO Auto-generated method stub

    return this.name.equals(((People)obj).name) && this.age== ((People)obj).age;

  }

}

public class Main {

  public static void main(String[] args) {   

    People p1 = new People("Jack", 12);

    System.out.println(p1.hashCode());   

    HashMap<People, Integer> hashMap = new HashMap<People, Integer>();

    hashMap.put(p1, 1);  

    System.out.println(hashMap.get(new People("Jack", 12)));

  }

} 

When 1 comes, the output will be "1".

The following passage is taken from Effective Java1:

The same object is called multiple times during the execution of the program as long as the information used in the equals method comparison has not been modified, and the hashCode method must always return the same integer as 1.

If two objects are equal according to the equals method comparison, then calls to the hashCode method of both objects must return the same integer result.

If two objects are unequal according to the equals comparison, the hashCode method does not necessarily return different integers.

It's easy to understand 2 and 3, but it's easy to ignore 1. On page 194en 495 of Java programming ideas 1, there is also a paragraph similar to article 1:

"The most important factor in designing hashCode() is that whenever hashCode() is called on the same object, it should produce the same value. If you generate an hashCdoe value when you add HashMap to an object with put(), and another hashCode value when you pull it out with get(), you will not be able to retrieve the object. So if your hashCode method relies on volatile data in an object, users should be careful, because when that data changes, the hashCode() method generates a different hash code.

Here's an example:


package com.cxh.test1; 

import java.util.HashMap;

import java.util.HashSet;

import java.util.Set;

class People{

  private String name;

  private int age; 

  public People(String name,int age) {

    this.name = name;

    this.age = age;

  }  

  public void setAge(int age){

    this.age = age;

  } 

  @Override

  public int hashCode() {

    // TODO Auto-generated method stub

    return name.hashCode()*37+age;

  }  

  @Override

  public boolean equals(Object obj) {

    // TODO Auto-generated method stub

    return this.name.equals(((People)obj).name) && this.age== ((People)obj).age;

  }

}

public class Main {

  public static void main(String[] args) {  

    People p1 = new People("Jack", 12);

    System.out.println(p1.hashCode()); 

    HashMap<People, Integer> hashMap = new HashMap<People, Integer>();

    hashMap.put(p1, 1);

    p1.setAge(13);  

    System.out.println(hashMap.get(p1));

  }

} 

The output of this code is "null", and it should be clear why.

Therefore, when designing the hashCode and equals methods, if the data in the object is volatile, it is best not to rely on this field in the equals and hashCode methods.


Related articles: