php algorithm hash introduction

2021-01-06 00:29:17
OfStack

Hash Table is the core of PHP, and it's not too much to say.

PHP containers are used for arrays, associative arrays, object properties, function tables, symbol tables, and so on.

HashTable of PHP zipper method to solve conflicts, this since needless to say, my main focus today is PHP Hash algorithm, and this algorithm itself revealed some ideas.

Hash of PHP is the most common DJBX33A (Daniel J. Bernstein, Times 33 with Addition), this algorithm is widely used with a number of software projects,Apache, Perl and Berkeley DB, etc. This is the best known hash algorithm for strings because it is very fast and has very good classification (small collisions, uniform distribution).

The core idea of the algorithm is:


hash(i) = hash(i-1) * 33 + str[i]

In ES36en_ES37en.ES38en, we can find the algorithm in ES39en:


static inline ulong zend_inline_hash_func(char *arKey, uint nKeyLength)
{
    register ulong hash = 5381;

    /* variant with the hash unrolled eight times */
    for (; nKeyLength >= 8; nKeyLength -=  {
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
        hash = ((hash << 5) + hash) + *arKey++;
    }
    switch (nKeyLength) {
        case 7: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 6: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 5: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 4: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 3: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 2: hash = ((hash << 5) + hash) + *arKey++; /* fallthrough... */
        case 1: hash = ((hash << 5) + hash) + *arKey++; break;
        case 0: break;
EMPTY_SWITCH_DEFAULT_CASE()
    }
    return hash;
}

Compared with the classical Times 33 algorithm directly used in Apache and Perl:


hashing function used in Perl 5.005:
  # Return the hashed value of a string: $hash = perlhash("key")
  # (Defined by the PERL_HASH macro in hv.h)
  sub perlhash
  {
      $hash = 0;
      foreach (split //, shift) {
          $hash = $hash*33 + ord($_);
      }
      return $hash;
  }

In the hash algorithm of PHP, we can see a very subtle difference.

First of all, the most unusual thing is that ES57en does not use direct multiplication by 33. Instead, it uses:


hash << 5 + hash

This will certainly be faster than multiplying.

I read an article about the caching mechanism of ES65en a few days ago. One of the articles said that ES66en uses different caching strategies according to the popularity of posts. According to users' habits, ES66en only caches the first page of posts (because few people will read posts).

In a similar way, PHP encourages character indexing under 8-bit 1. It uses unrolled in units of 8 to improve efficiency, which has to be said is also a very detailed, very detailed place.

There are also inline, register variables... As you can see, the developers of ES75en also put a lot of effort into the optimization of ES76en

Finally, the initial value of hash is set to 5381. Compared to the times algorithm in Apache and Hash algorithm in Perl (both using initial hash as 0), why choose 5381? I don't know exactly why, but I have found some features of 5381:


Magic Constant 5381:
1. odd number
2. prime number
3. deficient number

Looking at this, I have reason to believe that the choice of initial values provides a better classification.