Reasons for using the 31 coefficient when defining hashcode

2020-12-13 18:59:06
OfStack

Hashing computations calculate which elements should be placed in the array. In which list, exactly. According to Java rules, if you want to put an object in HashMap, your object's class must provide the hashcode method, which returns an integer value. The String class, for example, has the following method:


public int hashCode() { 
    int h = hash; 
    int len = count; 
    if (h == 0 && len > 0) { 
      int off = offset; 
      char val[] = value; 
 
      for (int i = 0; i < len; i++) { 
        h = 31*h + val[off++]; 
      } 
      hash = h; 
    } 
    return h; 
  }

Notice the for loop up here, isn't that crazy? Let me give you an example to make it easy for you to understand what's going on. For example, if you have a string "abcde" and use base 31 to calculate the sum of this string, you would write the following calculation:

a*31^4+b*31^3+c*31^2+d*31^1+e*31^0. Note the a, b c, d or e ASCII value refers to them. It's an interesting loop that can be used to calculate the N base. This loop can be pulled out as a good tool for counting the base:


public static void main(String[] args) { 
    int[] a={1,0}; 
    System.out.println(calculate(2,a)); 
  } 
 
  private static int calculate(int radix,int[] a){ 
    int sum = 0; 
    for(int i=0;i<a.length;++i){ 
      sum = sum*radix+a[i]; 
    } 
    return sum; 
  }

The static method caculate accepts radix as the base base, and the array a simulates the number in the base to be calculated, just note that the surface order needs to be 1. For example, the base 01 2 string is arranged in an array as {0,1}. The output above is 1, which corresponds to the true value of 01.

So why use 31 as the base? First understand why HashCode is needed. Each object calculates HashCode by value. The code size is not too large to be 1 (because it is usually very slow to calculate), but it should not be repeated as much as possible, so the cardinality should be as large as possible. In addition, 31*N can be optimized by the compiler to
After moving 5 bits to the left, subtracting 1 bit has higher performance. In fact, the choice of 31 is controversial, see here.

Think that this thing will still lead to more repetition, should use a larger number. So maybe Java will change in the future. The following article introduces two conclusions:

1. Use prime numbers for cardinality

The nature of a prime number (in which only 1 and itself are factors) makes it more likely than others to produce a result that is only 1, i.e. hash code, with the least chance of conflict.

2. Choice 31 is the first choice after observing the distribution results. It is not clear why, but it is beneficial.

In addition, String.hashCode internally caches the value calculated for the first time because this is an final (immutable) class and the contents of the String object will not change. This improves performance on multiple put to HashMap occasions, but seems to be of little use.

conclusion

That's it for the 31 factor in defining hashcode, and I hope you found it helpful. Those who are interested can continue to see this site:

Rewriting hashCode() and equals() Method Details

The Essential Differences and Connections between hashCode() and equals()

Analysis of HashMap Usage from the Perspective of Java Source Code

If there is any deficiency, please let me know. Thank you for your support!