An in-depth analysis of Redis's LRU eviction strategy


Preface

When Redis is used as a cache, memory consumption has to be considered in some scenarios. Redis removes expired keys to free up space, using two deletion strategies for expired keys:

Lazy deletion: every time a key is fetched from the keyspace, it is checked for expiration; if it has expired it is deleted, otherwise it is returned (see the sketch below).
Periodic deletion: at regular intervals, the program scans the databases and deletes the expired keys it finds.
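
The lazy strategy is easy to picture in code. Below is a minimal, self-contained sketch of the idea only (not the actual Redis source, which implements this in expireIfNeeded): every lookup first checks the expiration time and deletes the entry on the spot if it has passed.

#include <stddef.h>

/* Toy entry type for illustration only; Redis uses dictEntry/robj. */
typedef struct entry {
 void *val;
 long long expire_at_ms; /* -1 means no expiration set */
 int deleted;
} entry;

/* Lazy deletion: expired entries are removed at lookup time and the
 * lookup reports a miss, exactly as if the key had never existed. */
void *lookup_with_lazy_expire(entry *e, long long now_ms) {
 if (e == NULL || e->deleted) return NULL;
 if (e->expire_at_ms != -1 && e->expire_at_ms <= now_ms) {
  e->deleted = 1; /* expired: delete and report a miss */
  return NULL;
 }
 return e->val; /* not expired: a normal hit */
}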

In addition, when memory runs out Redis can automatically evict keys using an LRU policy.

LRU algorithm

When it comes to evicting data from a cache, we would like to evict data that will not be used again and keep the data that will be accessed frequently, but a cache cannot predict the future. One solution is LRU: assume that data accessed recently is more likely to be accessed again. Cached data typically follows an access distribution in which a small part of the data receives the majority of the traffic. When the access pattern rarely changes, we can record the last access time of each item, and the item with the smallest idle time can be considered the most likely to be accessed again.

For example, suppose A is accessed once every 5s, B once every 2s, and C and D once every 10s; | marks the cut-off point at which idle time is computed:

~~~~~A~~~~~A~~~~~A~~~~A~~~~~A~~~~~A~~|
~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~~B~|
~~~~~~~~~~C~~~~~~~~~C~~~~~~~~~C~~~~~~|
~~~~~D~~~~~~~~~~D~~~~~~~~~D~~~~~~~~~D|


It can be seen that LRU works well for A, B and C: the idle times correctly rank the probability of future access as B > A > C. For D, however, the prediction is wrong: D happens to have the smallest idle time at the cut-off, even though it is accessed no more often than C.

Overall, though, LRU is a good enough algorithm in practice.

LRU configuration parameters

There are three Redis configurations related to LRU:

maxmemory: the maximum amount of memory Redis may use for data, e.g. 100mb. When the cached data exceeds this limit, eviction is triggered. When set to 0 there is no memory limit, i.e. LRU eviction does not take effect. The default is 0 on 64-bit systems, while 32-bit systems have an implicit 3GB limit.
maxmemory-policy: the eviction policy applied once the limit is hit.
maxmemory-samples: the precision of the random sampling, i.e. the number of keys sampled at a time. The larger the value, the closer the behavior is to true LRU, but also the higher the cost, which affects performance. The default is 5.
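
As a concrete example, a minimal redis.conf fragment that enables approximate LRU eviction could look like this (the values are illustrative, not recommendations):

maxmemory 100mb
maxmemory-policy allkeys-lru
maxmemory-samples 5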

Eviction policies

maxmemory-policy accepts the following values:

noeviction: when the cached data exceeds the maxmemory limit, return an error to the client for any command that would allocate memory (most write commands; DEL and a few others are exceptions).
allkeys-lru: evict any key using approximated LRU.
volatile-lru: evict using approximated LRU, but only among keys that have an expiration time set.
allkeys-random: evict random keys among all keys.
volatile-random: evict random keys among keys that have an expiration time set.
volatile-ttl: evict only keys that have an expiration time set, preferring keys with a smaller remaining time to live (TTL).

The volatile-lru, volatile-random and volatile-ttl policies do not consider the full keyspace and may fail to free enough memory: when there are no keys with an expiration time set, they behave like noeviction.

Rules of thumb:

Use allkeys-lru when you expect requests to follow a power-law distribution (the 80/20 rule and the like), i.e. when a small subset of elements is accessed much more often than the rest.
Use allkeys-random when keys are scanned in a continuous loop, or when the expected request distribution is uniform (all elements equally likely to be accessed).
Use volatile-ttl when the cached objects have different TTL values that hint at which keys are good eviction candidates.

The volatile-lru and volatile-random policies are useful when you want a single Redis instance to act both as an evicting cache and as a store for a set of persistent, frequently used keys: keys without an expiration time are persisted, while keys with one participate in eviction. That said, running two separate instances is usually the better way to solve this problem.

Note that setting an expiration time on a key itself costs memory, so a policy such as allkeys-lru is more memory-efficient: under memory pressure a key can be evicted without ever having had an expiration time set.
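
The policy can also be changed at runtime without restarting the server; CONFIG SET and CONFIG GET are standard Redis commands:

redis-cli CONFIG SET maxmemory-policy allkeys-lru
redis-cli CONFIG GET maxmemory-policy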

Approximate LRU algorithm

A true LRU implementation needs a doubly linked list to record the order in which data was accessed, but to save memory Redis does not implement LRU exactly. Rather than always selecting the least recently used key, Redis runs an approximation: it samples a small number of keys and evicts the one that has been idle the longest. The precision of the approximation can be tuned by adjusting the number of samples taken per eviction, maxmemory-samples.

According to the Redis author, 24 bits could be squeezed out of each Redis object. 24 bits is not enough to store two pointers, but it is enough for a low-resolution timestamp: each object stores the Unix time in seconds at which it was created or last updated, i.e. the LRU clock. A 24-bit counter of seconds only overflows every 194 days, and since cached data is updated far more often than that, it is good enough.
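
The 194-day figure is simple arithmetic, which this standalone snippet verifies:

#include <stdio.h>

/* 2^24 seconds / 86400 seconds-per-day ~= 194.18 days, which is why a
 * 24-bit seconds clock is considered good enough for cache objects. */
int main(void) {
 double days = (double)(1u << 24) / 86400.0;
 printf("24-bit seconds clock wraps after %.2f days\n", days);
 return 0;
}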

Redis's keyspace lives in a hash table, and selecting the single key that has gone unaccessed the longest would require yet another data structure to track the LRU ordering, which is clearly not cost-effective. Initially Redis simply picked 3 keys at random and evicted the least recently used of them; later the algorithm was generalized to sampling N keys, with a default of 5.

Redis 3.0 improved the algorithm further by maintaining a pool of eviction candidates, 16 entries by default, sorted by idle time. On each pass, N keys are randomly sampled from the keyspace and their idle times computed; a key enters the pool only when the pool is not yet full or when its idle time is greater than the smallest idle time in the pool. The key with the largest idle time is then taken from the pool and evicted (the implementation appears in evictionPoolPopulate below).

The real LRU algorithm and the approximate LRU algorithm can be compared in the images from the Redis documentation:

In those images, light gray bands are evicted objects, gray bands are objects that were not evicted, and green bands are newly added objects. With maxmemory-samples set to 5, Redis 3.0 performs visibly better than Redis 2.8; with a sample size of 10, Redis 3.0 approaches the performance of the theoretical LRU algorithm.

The approximate LRU algorithm works well when the data access pattern is close to a power-law distribution, i.e. most accesses are concentrated on a few keys.

Simulation experiments show that, under a power-law access pattern, there is little difference between the true LRU algorithm and the approximation.

LRU source code analysis

Every value in the Redis keyspace is wrapped in a redisObject structure:


typedef struct redisObject {
 unsigned type:4;
 unsigned encoding:4;
 unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
       * LFU data (least significant 8 bits frequency
       * and most significant 16 bits access time). */
 int refcount;
 void *ptr;
} robj;

The 24-bit lru bit field (LRU_BITS is 24) records the object's LRU time.
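
The related constants are defined in server.h; as of Redis 5.x they look like this:

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1) /* Max value of obj->lru */
#define LRU_CLOCK_RESOLUTION 1000 /* LRU clock resolution in ms */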

When the Redis command accesses cached data, it calls the function lookupKey:


robj *lookupKey(redisDb *db, robj *key, int flags) {
 dictEntry *de = dictFind(db->dict,key->ptr);
 if (de) {
  robj *val = dictGetVal(de);

  /* Update the access time for the ageing algorithm.
   * Don't do it if we have a saving child, as this will trigger
   * a copy on write madness. */
  if (server.rdb_child_pid == -1 &&
   server.aof_child_pid == -1 &&
   !(flags & LOOKUP_NOTOUCH))
  {
   if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
    updateLFU(val);
   } else {
    val->lru = LRU_CLOCK();
   }
  }
  return val;
 } else {
  return NULL;
 }
}

This function updates the object's lru value when the policy is LRU (not LFU), setting it to the current LRU_CLOCK() value:


/* Return the LRU clock, based on the clock resolution. This is a time
 * in a reduced-bits format that can be used to set and check the
 * object->lru field of redisObject structures. */
unsigned int getLRUClock(void) {
 return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}

/* This function is used to obtain the current LRU clock.
 * If the current resolution is lower than the frequency we refresh the
 * LRU clock (as it should be in production servers) we return the
 * precomputed value, otherwise we need to resort to a system call. */
unsigned int LRU_CLOCK(void) {
 unsigned int lruclock;
 if (1000/server.hz <= LRU_CLOCK_RESOLUTION) {
  atomicGet(server.lruclock,lruclock);
 } else {
  lruclock = getLRUClock();
 }
 return lruclock;
}

LRU_CLOCK() depends on LRU_CLOCK_RESOLUTION (1000 ms by default), the resolution of the LRU clock, i.e. how long one LRU tick lasts. server.hz is the frequency at which the server cron runs. If the server refreshes its cached clock at least as often as the LRU resolution (1000/server.hz <= LRU_CLOCK_RESOLUTION), LRU_CLOCK() returns the precomputed server.lruclock directly, saving a system call.
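
That cached clock is refreshed periodically from serverCron(); in the Redis 5 source the relevant lines are essentially the following:

/* Refresh the shared, cached LRU clock so that LRU_CLOCK() can
 * usually avoid a system call on every lookup. */
unsigned long lruclock = getLRUClock();
atomicSet(server.lruclock,lruclock);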

The entry to the Redis processing command is processCommand:


int processCommand(client *c) {

 /* Handle the maxmemory directive.
  *
  * Note that we do not want to reclaim memory if we are here re-entering
  * the event loop since there is a busy Lua script running in timeout
  * condition, to avoid mixing the propagation of scripts with the
  * propagation of DELs due to eviction. */
 if (server.maxmemory && !server.lua_timedout) {
  int out_of_memory = freeMemoryIfNeededAndSafe() == C_ERR;
  /* freeMemoryIfNeeded may flush slave output buffers. This may result
   * into a slave, that may be the active client, to be freed. */
  if (server.current_client == NULL) return C_ERR;

  /* It was impossible to free enough memory, and the command the client
   * is trying to execute is denied during OOM conditions or the client
   * is in MULTI/EXEC context? Error. */
  if (out_of_memory &&
   (c->cmd->flags & CMD_DENYOOM ||
    (c->flags & CLIENT_MULTI && c->cmd->proc != execCommand))) {
   flagTransaction(c);
   addReply(c, shared.oomerr);
   return C_OK;
  }
 }
}

Only the part that frees memory is shown above. freeMemoryIfNeededAndSafe is the entry point for actually freeing memory:


int freeMemoryIfNeeded(void) {
 /* By default replicas should ignore maxmemory
  * and just be masters exact copies. */
 if (server.masterhost && server.repl_slave_ignore_maxmemory) return C_OK;

 size_t mem_reported, mem_tofree, mem_freed;
 mstime_t latency, eviction_latency;
 long long delta;
 int slaves = listLength(server.slaves);

 /* When clients are paused the dataset should be static not just from the
  * POV of clients not being able to write, but also from the POV of
  * expires and evictions of keys not being performed. */
 if (clientsArePaused()) return C_OK;
 if (getMaxmemoryState(&mem_reported,NULL,&mem_tofree,NULL) == C_OK)
  return C_OK;

 mem_freed = 0;

 if (server.maxmemory_policy == MAXMEMORY_NO_EVICTION)
  goto cant_free; /* We need to free memory, but policy forbids. */

 latencyStartMonitor(latency);
 while (mem_freed < mem_tofree) {
  int j, k, i, keys_freed = 0;
  static unsigned int next_db = 0;
  sds bestkey = NULL;
  int bestdbid;
  redisDb *db;
  dict *dict;
  dictEntry *de;

  if (server.maxmemory_policy & (MAXMEMORY_FLAG_LRU|MAXMEMORY_FLAG_LFU) ||
   server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL)
  {
   struct evictionPoolEntry *pool = EvictionPoolLRU;

   while(bestkey == NULL) {
    unsigned long total_keys = 0, keys;

    /* We don't want to make local-db choices when expiring keys,
     * so to start populate the eviction pool sampling keys from
     * every DB. */
    for (i = 0; i < server.dbnum; i++) {
     db = server.db+i;
     dict = (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) ?
       db->dict : db->expires;
     if ((keys = dictSize(dict)) != 0) {
      evictionPoolPopulate(i, dict, db->dict, pool);
      total_keys += keys;
     }
    }
    if (!total_keys) break; /* No keys to evict. */

    /* Go backward from best to worst element to evict. */
    for (k = EVPOOL_SIZE-1; k >= 0; k--) {
     if (pool[k].key == NULL) continue;
     bestdbid = pool[k].dbid;

     if (server.maxmemory_policy & MAXMEMORY_FLAG_ALLKEYS) {
      de = dictFind(server.db[pool[k].dbid].dict,
       pool[k].key);
     } else {
      de = dictFind(server.db[pool[k].dbid].expires,
       pool[k].key);
     }

     /* Remove the entry from the pool. */
     if (pool[k].key != pool[k].cached)
      sdsfree(pool[k].key);
     pool[k].key = NULL;
     pool[k].idle = 0;

     /* If the key exists, is our pick. Otherwise it is
      * a ghost and we need to try the next element. */
     if (de) {
      bestkey = dictGetKey(de);
      break;
     } else {
      /* Ghost... Iterate again. */
     }
    }
   }
  }

  /* volatile-random and allkeys-random policy */
  else if (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM ||
     server.maxmemory_policy == MAXMEMORY_VOLATILE_RANDOM)
  {
   /* When evicting a random key, we try to evict a key for
    * each DB, so we use the static 'next_db' variable to
    * incrementally visit all DBs. */
   for (i = 0; i < server.dbnum; i++) {
    j = (++next_db) % server.dbnum;
    db = server.db+j;
    dict = (server.maxmemory_policy == MAXMEMORY_ALLKEYS_RANDOM) ?
      db->dict : db->expires;
    if (dictSize(dict) != 0) {
     de = dictGetRandomKey(dict);
     bestkey = dictGetKey(de);
     bestdbid = j;
     break;
    }
   }
  }

  /* Finally remove the selected key. */
  if (bestkey) {
   db = server.db+bestdbid;
   robj *keyobj = createStringObject(bestkey,sdslen(bestkey));
   propagateExpire(db,keyobj,server.lazyfree_lazy_eviction);
   /* We compute the amount of memory freed by db*Delete() alone.
    * It is possible that actually the memory needed to propagate
    * the DEL in AOF and replication link is greater than the one
    * we are freeing removing the key, but we can't account for
    * that otherwise we would never exit the loop.
    *
    * AOF and Output buffer memory will be freed eventually so
    * we only care about memory used by the key space. */
   delta = (long long) zmalloc_used_memory();
   latencyStartMonitor(eviction_latency);
   if (server.lazyfree_lazy_eviction)
    dbAsyncDelete(db,keyobj);
   else
    dbSyncDelete(db,keyobj);
   latencyEndMonitor(eviction_latency);
   latencyAddSampleIfNeeded("eviction-del",eviction_latency);
   latencyRemoveNestedEvent(latency,eviction_latency);
   delta -= (long long) zmalloc_used_memory();
   mem_freed += delta;
   server.stat_evictedkeys++;
   notifyKeyspaceEvent(NOTIFY_EVICTED, "evicted",
    keyobj, db->id);
   decrRefCount(keyobj);
   keys_freed++;

   /* When the memory to free starts to be big enough, we may
    * start spending so much time here that is impossible to
    * deliver data to the slaves fast enough, so we force the
    * transmission here inside the loop. */
   if (slaves) flushSlavesOutputBuffers();

   /* Normally our stop condition is the ability to release
    * a fixed, pre-computed amount of memory. However when we
    * are deleting objects in another thread, it's better to
    * check, from time to time, if we already reached our target
    * memory, since the "mem_freed" amount is computed only
    * across the dbAsyncDelete() call, while the thread can
    * release the memory all the time. */
   if (server.lazyfree_lazy_eviction && !(keys_freed % 16)) {
    if (getMaxmemoryState(NULL,NULL,NULL,NULL) == C_OK) {
     /* Let's satisfy our stop condition. */
     mem_freed = mem_tofree;
    }
   }
  }

  if (!keys_freed) {
   latencyEndMonitor(latency);
   latencyAddSampleIfNeeded("eviction-cycle",latency);
   goto cant_free; /* nothing to free... */
  }
 }
 latencyEndMonitor(latency);
 latencyAddSampleIfNeeded("eviction-cycle",latency);
 return C_OK;

cant_free:
 /* We are here if we are not able to reclaim memory. There is only one
  * last thing we can try: check if the lazyfree thread has jobs in queue
  * and wait... */
 while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
  if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
   break;
  usleep(1000);
 }
 return C_ERR;
}

/* This is a wrapper for freeMemoryIfNeeded() that only really calls the
 * function if right now there are the conditions to do so safely:
 *
 * - There must be no script in timeout condition.
 * - Nor we are loading data right now.
 *
 */
int freeMemoryIfNeededAndSafe(void) {
 if (server.lua_timedout || server.loading) return C_OK;
 return freeMemoryIfNeeded();
}

All of the maxmemory-policy eviction strategies are implemented in this function.

When an LRU policy is in effect, you can see that the function walks every database (16 by default, starting from database 0) and, depending on the policy, samples either dict (all keys) or expires (keys with an expiration time) to update the candidate pool. The pool update logic is evictionPoolPopulate:


void evictionPoolPopulate(int dbid, dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
 int j, k, count;
 dictEntry *samples[server.maxmemory_samples];

 count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
 for (j = 0; j < count; j++) {
  unsigned long long idle;
  sds key;
  robj *o;
  dictEntry *de;

  de = samples[j];
  key = dictGetKey(de);

  /* If the dictionary we are sampling from is not the main
   * dictionary (but the expires one) we need to lookup the key
   * again in the key dictionary to obtain the value object. */
  if (server.maxmemory_policy != MAXMEMORY_VOLATILE_TTL) {
   if (sampledict != keydict) de = dictFind(keydict, key);
   o = dictGetVal(de);
  }

  /* Calculate the idle time according to the policy. This is called
   * idle just because the code initially handled LRU, but is in fact
   * just a score where an higher score means better candidate. */
  if (server.maxmemory_policy & MAXMEMORY_FLAG_LRU) {
   idle = estimateObjectIdleTime(o);
  } else if (server.maxmemory_policy & MAXMEMORY_FLAG_LFU) {
   /* When we use an LRU policy, we sort the keys by idle time
    * so that we expire keys starting from greater idle time.
    * However when the policy is an LFU one, we have a frequency
    * estimation, and we want to evict keys with lower frequency
    * first. So inside the pool we put objects using the inverted
    * frequency subtracting the actual frequency to the maximum
    * frequency of 255. */
   idle = 255-LFUDecrAndReturn(o);
  } else if (server.maxmemory_policy == MAXMEMORY_VOLATILE_TTL) {
   /* In this case the sooner the expire the better. */
   idle = ULLONG_MAX - (long)dictGetVal(de);
  } else {
   serverPanic("Unknown eviction policy in evictionPoolPopulate()");
  }

  /* Insert the element inside the pool.
   * First, find the first empty bucket or the first populated
   * bucket that has an idle time smaller than our idle time. */
  k = 0;
  while (k < EVPOOL_SIZE &&
    pool[k].key &&
    pool[k].idle < idle) k++;
  if (k == 0 && pool[EVPOOL_SIZE-1].key != NULL) {
   /* Can't insert if the element is < the worst element we have
    * and there are no empty buckets. */
   continue;
  } else if (k < EVPOOL_SIZE && pool[k].key == NULL) {
   /* Inserting into empty position. No setup needed before insert. */
  } else {
   /* Inserting in the middle. Now k points to the first element
    * greater than the element to insert. */
   if (pool[EVPOOL_SIZE-1].key == NULL) {
    /* Free space on the right? Insert at k shifting
     * all the elements from k to end to the right. */

    /* Save SDS before overwriting. */
    sds cached = pool[EVPOOL_SIZE-1].cached;
    memmove(pool+k+1,pool+k,
     sizeof(pool[0])*(EVPOOL_SIZE-k-1));
    pool[k].cached = cached;
   } else {
    /* No free space on right? Insert at k-1 */
    k--;
    /* Shift all elements on the left of k (included) to the
     * left, so we discard the element with smaller idle time. */
    sds cached = pool[0].cached; /* Save SDS before overwriting. */
    if (pool[0].key != pool[0].cached) sdsfree(pool[0].key);
    memmove(pool,pool+1,sizeof(pool[0])*k);
    pool[k].cached = cached;
   }
  }

  /* Try to reuse the cached SDS string allocated in the pool entry,
   * because allocating and deallocating this object is costly
   * (according to the profiler, not my fantasy. Remember:
   * premature optimizbla bla bla bla. */
  int klen = sdslen(key);
  if (klen > EVPOOL_CACHED_SDS_SIZE) {
   pool[k].key = sdsdup(key);
  } else {
   memcpy(pool[k].cached,key,klen+1);
   sdssetlen(pool[k].cached,klen);
   pool[k].key = pool[k].cached;
  }
  pool[k].idle = idle;
  pool[k].dbid = dbid;
 }
}

Redis randomly samples maxmemory_samples keys and computes the idle time of each. A key can enter the pool only when it qualifies (its idle time is larger than that of some key already in the pool, or the pool still has empty slots). Once the pool has been updated, the key with the largest idle time in the pool is evicted.

estimateObjectIdleTime computes the idle time of a Redis object:


/* Given an object returns the min number of milliseconds the object was never
 * requested, using an approximated LRU algorithm. */
unsigned long long estimateObjectIdleTime(robj *o) {
 unsigned long long lruclock = LRU_CLOCK();
 if (lruclock >= o->lru) {
  return (lruclock - o->lru) * LRU_CLOCK_RESOLUTION;
 } else {
  return (lruclock + (LRU_CLOCK_MAX - o->lru)) *
     LRU_CLOCK_RESOLUTION;
 }
}

The idle time is essentially the difference between the global LRU_CLOCK() and the object's lru, multiplied by LRU_CLOCK_RESOLUTION to turn clock ticks into milliseconds; the else branch accounts for the 24-bit clock wrapping around.
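
To make the wraparound case concrete, here is a standalone version of the same computation with two worked examples (the constants mirror server.h):

#include <stdio.h>

#define LRU_BITS 24
#define LRU_CLOCK_MAX ((1<<LRU_BITS)-1)
#define LRU_CLOCK_RESOLUTION 1000 /* ms per LRU tick */

/* Same arithmetic as estimateObjectIdleTime: if the 24-bit clock has
 * wrapped since the object was touched, lruclock < obj_lru and the
 * distance is measured across the wrap point. */
unsigned long long idle_ms(unsigned int lruclock, unsigned int obj_lru) {
 if (lruclock >= obj_lru)
  return (unsigned long long)(lruclock - obj_lru) * LRU_CLOCK_RESOLUTION;
 return (unsigned long long)(lruclock + (LRU_CLOCK_MAX - obj_lru)) *
    LRU_CLOCK_RESOLUTION;
}

int main(void) {
 printf("%llu ms idle\n", idle_ms(1000, 400)); /* normal case: 600000 ms */
 printf("%llu ms idle\n", idle_ms(5, LRU_CLOCK_MAX)); /* wrapped clock: 5000 ms */
 return 0;
}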

References

Random notes on improving the Redis LRU algorithm
Using Redis as an LRU cache
