An example of how Redis implements distributed locking

  • 2020-05-30 21:16:27
  • OfStack

The timing task we used before was only deployed on a single machine. In order to solve the problem of single point, in order to ensure that one task was only executed by one machine, the problem of locking needed to be considered, so we spent time on this problem. How do you implement a distributed lock?

The essence of locks is mutual exclusion, to ensure that at any time there can be a client holding the same lock, if you consider using redis to achieve a distributed lock, the simplest solution is to create a key value in the instance, when the lock is released, the key value will be deleted. However, a reliable and complete distributed lock needs to consider more details, let's take a look at how to write a correct distributed lock.

Single edition distributed lock SETNX

So we implemented a simple lock directly based on redis's setNX (SET if Not eXists) command. Direct up the pseudo code

Lock acquisition:


SET resource_name my_random_value NX PX 30000

Lock release:


 if redis.call("get",KEYS[1]) == ARGV[1] then
  return redis.call("del",KEYS[1])
 else
  return 0
 end

A few details to note:

First we need to set the timeout when we get the lock. The timeout is set to prevent the client from crashing, or to prevent the lock from being held after a network problem occurs. The whole system is deadlocked.

Use the setNX command to ensure that the query and write steps are atomic

When the lock is released, we judge KEYS[1]) == ARGV[1]. Here, KEYS[1] is value taken from redis, and ARGV[1] is my_random_value generated above. The above judgment is made to secure the release of the lock holder. We assume that this step is not performed:

The client A takes the lock, and the post thread hangs. The time is greater than the expiration time of the lock. After the lock expires, the client B acquires the lock. After the client A is restored, after processing the relevant events, issue the del command to redis. The lock is released The client C gets the lock. This is when two clients on one system hold a lock at the same time.

The key to this problem is that the lock held by client B is released by client A.

The release of the lock must be done using the lua script to keep the operation atomic. The release of the lock consists of three steps: get, judge, del. If the atomicity of the three steps is not guaranteed, distributed locking will have concurrency problems.

With the above details in mind, a distributed lock for a single redis node is achieved.

There is still a single point of redis in this distributed lock. You might say that Redis is the architecture of master-slave, and it is good to switch to slave in case of failure, but the replication of Redis is asynchronous.

If on the client side A gets a lock on master. Before master synchronizes data to slave, master goes down. The client B gets the lock again from slave.

This caused multiple locks to be held at the same time due to the outage of Master. If your system can be used for short periods of time, there are more than one person holding the lock. This simple solution will solve the problem.

But if you solve this problem. The Redis official offers an Redlock solution.

The realization of the RedLock

To solve the problem, Redis single point. The authors of Redis propose a solution for RedLock. The scheme is ingenious and simple.
The core idea of RedLock is to use multiple Redis Master to be redundant at the same time, and these nodes are completely independent and do not need to synchronize the data between them.

If we have N of Redis nodes, N should be an odd number greater than 2. Implementation steps of RedLock:

Get the current time Obtain the Redis locks of the N nodes in sequence using the method mentioned above. If the number of locks obtained is greater than (N/2+1) and the acquisition time is less than the effective time of the lock (lock validity time), a valid lock is considered to have been obtained. The automatic lock release time is the initial lock release time minus the time spent acquiring the lock. If the number of locks obtained is less than (N/2+1), or if sufficient information is not obtained during the lock's validity period (lock validity time), the lock acquisition is considered a failure. At this point you need to send a message to all the nodes to release the lock.

The implementation of releasing the lock is simple. Initiate the release operation for all Redis nodes, regardless of whether the lock was successfully acquired or not.

A few details need to be noted:

The interval between reattempts to acquire a lock should be a random range rather than a fixed time. This prevents multiple clients from simultaneously sending lock acquisition operations to the Redis cluster and avoids simultaneous contention. The condition in which the same number of locks are acquired at the same time. (although the probability is very low)

If an master node fails, the recovery interval should be greater than the lock's effective time.

Let's say I have three Redis nodes: A, B, C. The client foo acquired two locks, A and B. At this point, B goes down and all the data in memory is lost. B node replies. At this time, the client bar reacquires the lock and obtains the two nodes B and C. At this point, two more clients have acquired the lock.

This can be avoided if the recovery time is greater than the effective time of the lock. You can even turn on the persistence option of Redis if performance is not high.

conclusion

After understanding the implementation of Redis distribution, I actually think that the principle of most distributed systems is very simple, but in order to ensure the reliability of distributed systems, we need to pay attention to a lot of details, trivial and abnormal.

The distributed locking implemented by RedLock algorithm is simple, efficient and ingenious.

But is RedLock safe for 1? I will also write an article on this issue. Please look forward to it.


Related articles: