An in depth understanding of the HashCode approach in Java

2020-12-13 18:59:26
OfStack

About hashCode, Wikipedia:


In the Java programming language, every class implicitly or explicitly 
provides a hashCode() method, which digests the data stored in an 
instance of the class into a single hash value (a 32-bit signed 
integer).

hashCode extracts a 32-bit integer based on all data stored in an object instance. The integer is intended to indicate the uniqueness of the instance. Similar to the MD5 code, each file can generate a unique MD5 code through the MD5 algorithm. However, hashCode in Java is not really implemented to generate a unique hashCode for each object, and there is still a definite repetition rate of 1.

Looking at the Object class first, we know that the Object class is the direct or indirect parent of all the classes in the java program, at the highest point of the class hierarchy. Many of our common methods are defined in the Object class, including the hashCode method we are going to talk about, as follows


public final native Class<?> getClass(); 
public native int hashCode(); 
public boolean equals(Object obj) { 
 return (this == obj); 
}  
public String toString() { 
 return getClass().getName() + "@" + Integer.toHexString(hashCode()); 
}

Note that the hashCode method is preceded by an native modifier, which indicates that the hashCode method is implemented in a non-ES25en language, the concrete method implementation is external, and returns the address of the memory object.

The equals and hashCode methods are overridden in many of the java classes. Why? The most common String class, for example, if I define two strings of the same character, then when I compare them, I want the result to be the same. If you do not override the equals and hashCode methods, they will definitely not be the same, because the memory addresses of the two objects are not the same.


public int hashCode() { 
  int h = hash; 
  if (h == 0) { 
    int off = offset; 
    char val[] = value; 
    int len = count; 

      for (int i = 0; i < len; i++) { 
        h = 31*h + val[off++]; 
      } 
      hash = h; 
    } 
    return h; 
  }

This code is actually the implementation of this mathematical expression


s[0]*31^(n-1) + s[1]*31^(n-2) +  ...  + s[n-1]

s[i] is the i character of string, and n is the length of String. So why is it 31 instead of something else? #63; Effective Java says that 31 was chosen because it is an odd prime number, and if the multiplier is even and the multiplication overflows, the information will be lost, because multiplying by 2 is equivalent to a shift operation. The benefits of using primes are not obvious, but it is customary to use primes to calculate hash results. A nice feature of 31 is that it uses shift and subtraction instead of multiplication to get better performance: 31*i==(i) < < 5) - i. VM now automates this optimization.

As you can see, the String class uses its value value as a parameter to calculate hashCode, that is, the same value 1 must have the same hashCode value. This is also easy to understand, since the value of value is the same, the comparison with equals is the same, and the comparison with equals method is the same, so hashCode1 must be the same. The other way around doesn't have to be 1. It does not guarantee that the same hashCode1 will have the same object.

A good hash function would look like this: generate unequal hashCode for different objects.

Ideally, the hash function should distribute unequal instances of the set evenly over all possible hashCode, which is very difficult to achieve, at least not java. Because we can see that hashCode is not randomly generated, it has a definite law, is the above mathematical equation, we can construct 1 with the same hashCode value value is not one, for example: Aa and BB hashCode is one.

The following code:


public class Main {
  public static void main(String[] args) {
    Main m = new Main();
    System.out.println(m);
    System.out.println(Integer.toHexString(m.hashCode()));
    String a = "Aa";
    String b = "BB";
    System.out.println(a.hashCode());
    System.out.println(b.hashCode());
  }
}

Output results:


Main@2a139a55 
2a139a55 
2112 
2112

1 When you override the equal function, you usually override the hashCode function. Why?

Taking a look at this example, let's create a simple class Employee


public class Employee
{
  private Integer id;
  private String firstname;
  private String lastName;
  private String department;

  public Integer getId() {
    return id;
  }
  public void setId(Integer id) {
    this.id = id;
  }
  public String getFirstname() {
    return firstname;
  }
  public void setFirstname(String firstname) {
    this.firstname = firstname;
  }
  public String getLastName() {
    return lastName;
  }
  public void setLastName(String lastName) {
    this.lastName = lastName;
  }
  public String getDepartment() {
    return department;
  }
  public void setDepartment(String department) {
    this.department = department;
  }
}

The Employee class above just has 1 very basic property and getter, setter. Now consider a case where you need to compare two employee.


public class EqualsTest {
  public static void main(String[] args) {
    Employee e1 = new Employee();
    Employee e2 = new Employee();

    e1.setId(100);
    e2.setId(100);
    //Prints false in console
    System.out.println(e1.equals(e2));
  }
}

No doubt, the above program will output false, but in fact, the above two objects represent passing through 1 employee. The real business logic wants us to return true.

To do this, we need to rewrite the equals method.


public boolean equals(Object o) {
    if(o == null)
    {
      return false;
    }
    if (o == this)
    {
      return true;
    }
    if (getClass() != o.getClass())
    {
      return false;
    }
    Employee e = (Employee) o;
    return (this.getId() == e.getId());
}

Add this method to the above class, and EauqlsTest will output true.

So are we done? No, let's try a different test.


import java.util.HashSet;
import java.util.Set;
public class EqualsTest
{
	public static void main(String[] args)
	  {
		Employee e1 = new Employee();
		Employee e2 = new Employee();
		e1.setId(100);
		e2.setId(100);
		//Prints 'true'
		System.out.println(e1.equals(e2));
		Set<Employee> employees = new HashSet<Employee>();
		employees.add(e1);
		employees.add(e2);
		//Prints two objects
		System.out.println(employees);
	}

The above program outputs two results. If two employee objects, equals, return true, then only one object should be stored in Set. What's the problem?

We forgot the second important method. As stated in Javadoc's Javadoc, if the equals() method is overridden, the hashCode() method must be overridden. We add the following method, and the program will perform correctly.


public final native Class<?> getClass(); 
public native int hashCode(); 
public boolean equals(Object obj) { 
 return (this == obj); 
}  
public String toString() { 
 return getClass().getName() + "@" + Integer.toHexString(hashCode()); 
}

Things to keep in mind

Try to use the same property of the object to generate both hashCode() and equals() methods. In our case, we used employee id.
The eqauls method must be guaranteed to be 1 (if the object has not been modified, equals should return the same value)
Whenever a.equals (b), a.hashCode () must be equal to b.hashCode ().
Both must be rewritten simultaneously.

conclusion

That is the end of this article on an in-depth understanding of the HashCode method, which I hope will be helpful to you. Interested friends can continue to refer to other related topics in this site, if there is any deficiency, welcome to comment out. Thank you for your support!