Analysis of why the number of hashtable buckets is usually a prime number

  • 2020-05-26 08:26:09
  • OfStack

Why would a hashtable have a prime number of buckets?

Let's say I have a hash function

H(c) = c % N;

When N is a composite number, the simplest example is a power of 2, say 2 to the third, which equals 8:

H(11100(base 2)) = H(28) = 4
H(10100(base 2)) = H(20) = 4

At this point, the fourth bit (counting from right to left) of c "fails": whatever the value of the fourth bit of c is, the value of H(c) stays the same. The fourth bit of c (and every bit above the third) does not participate in the computation of H(c) at all, so H(c) cannot fully reflect the characteristics of c, which increases the probability of collision.
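
Here is a minimal C sketch of the example above (the helper name h and the concrete values are just illustrative): it confirms that with N = 8, flipping the fourth bit of c, or any higher bit, leaves H(c) unchanged.

#include <stdio.h>

/* H(c) = c % N with N = 8 = 2^3: only the lowest 3 bits of c matter. */
static unsigned h(unsigned c, unsigned n) { return c % n; }

int main(void) {
    unsigned n = 8;
    printf("H(28) = %u\n", h(28, n));            /* 11100 -> 4 */
    printf("H(20) = %u\n", h(20, n));            /* 10100 -> 4: the fourth bit flipped, same result */
    printf("H(28 ^ 16) = %u\n", h(28 ^ 16, n));  /* 01100 -> 4: a higher bit flipped, still the same */
    return 0;
}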

When N is some other composite number, certain bits of c "fail" to varying degrees, which leads to collisions in common applications.

Taking N to be a prime, however, essentially ensures that every bit of c participates in the computation of H(c), which reduces the chance of collision in common applications.
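
For comparison, a sketch of the same example with a prime modulus (N = 7 here is just a small illustrative choice): the collision from the N = 8 case disappears, and the higher bits of c now affect H(c).

#include <stdio.h>

static unsigned h(unsigned c, unsigned n) { return c % n; }

int main(void) {
    unsigned n = 7;                              /* a prime modulus */
    printf("H(28) = %u\n", h(28, n));            /* 0 */
    printf("H(20) = %u\n", h(20, n));            /* 6: the collision from the N = 8 case is gone */
    printf("H(28 ^ 16) = %u\n", h(28 ^ 16, n));  /* 12 % 7 = 5: the high bit now changes the result */
    return 0;
}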

(Personal opinion: sometimes not picking a prime is not that bad... but there is no doubt that picking a prime is safe.)

That's how I understand it.

Let me add one more point: in common applications the data to be stored are often similar to one another, and that is when it is better to use a prime. For example, if the data you want to store are compressed states, or a table describing the current search state, then a hash whose modulus is not prime is much more likely to collide.

If the keys were randomly distributed integers, then any modulus, as long as it is large enough, would be equally good in terms of probability; but that situation is obviously divorced from practice.

The situation you mentioned is rather special: when a larger prime is chosen as N, only certain digits of c's base-N representation "fail". Considering the characteristics of computer systems, the base-N digits of a key are usually not the critical ones, while the commonly used base-2^n digits are, so collisions can be avoided.

In fact, I have tested some fairly large moduli for storing an adjacency matrix compressed into base 2. When the modulus is large enough, even composite numbers can come very close to primes. However, with some composite numbers (a few dozen of them) the efficiency drops severely, so primes are the safe choice.

You might as well run the experiment yourself: do not pick random integers, but consider data from some common applications, try both prime and composite moduli, and mainly examine the average load factor. You will probably reach the same conclusion I did: most of the time the results look fine either way, but some of the composite numbers perform surprisingly badly, while almost all of the primes perform very well. A sketch of such an experiment is given below.
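
The following is a minimal sketch of such an experiment, assuming the "common application" data are structured keys with a fixed stride (multiples of 4 here, a hypothetical stand-in for real data); it compares a composite table size with a nearby prime and reports how many buckets are actually used and the longest chain.

#include <stdio.h>
#include <string.h>

#define KEYS 10000

/* Hash KEYS structured keys (multiples of 4, a stand-in for "real" data)
   into a table of n buckets and report bucket usage and the longest chain. */
static void test(unsigned n) {
    static unsigned count[1024];   /* large enough for the table sizes tested below */
    memset(count, 0, sizeof count);
    unsigned used = 0, longest = 0;
    for (unsigned i = 0; i < KEYS; i++) {
        unsigned key = i * 4;      /* structured, non-random keys */
        unsigned b = key % n;
        if (count[b]++ == 0) used++;
        if (count[b] > longest) longest = count[b];
    }
    printf("n = %4u: %4u of %4u buckets used, longest chain = %u\n", n, used, n, longest);
}

int main(void) {
    test(1024);   /* composite (a power of 2): only 256 of 1024 buckets get used */
    test(1021);   /* prime: all 1021 buckets get used and chains stay short */
    return 0;
}

On this kind of data the power-of-2 size wastes three quarters of the table while the prime uses all of it; with genuinely random keys the two sizes would look nearly identical, which matches the point above about randomly distributed integers.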

My personal, more common-sense understanding is this: if you do not pick a prime, there is one class of danger. The danger occurs when the chosen modulus is a composite m = x * y and the keys to be hashed happen to have some relationship with x; then you are in trouble. In the worst case, suppose the keys are all multiples of x: you can imagine that the hash results cover only y distinct values rather than all m buckets. But if the size of the bucket array is prime, you will not have this problem.
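
A small sketch of that worst case, with illustrative values x = 5, y = 7, m = 35: when every key is a multiple of x, the hash results land in only y of the m buckets.

#include <stdio.h>

int main(void) {
    unsigned m = 35, x = 5;              /* composite modulus m = x * y with x = 5, y = 7 */
    unsigned seen[35] = {0}, distinct = 0;
    for (unsigned k = 0; k < 1000; k++) {
        unsigned key = k * x;            /* worst case: every key is a multiple of x */
        unsigned b = key % m;
        if (seen[b]++ == 0) distinct++;
    }
    printf("distinct buckets hit: %u of %u\n", distinct, m);   /* prints 7 of 35 */
    return 0;
}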

Thank you for reading. I hope this helps you, and thank you for your support of this site!

