Implementing a high-performance counter in Django

  • 2020-04-02 13:51:22
  • OfStack

A counter is a very common functional component. Taking the number of unread messages as an example, this post walks through the basics of implementing a high-performance counter in Django.

The beginning of the story: .count()

Suppose you have a Notification model class that holds all on-site notifications:


class Notification(models.Model):
    """A simplified Notification class with two fields:

    - `user_id`: ID of the user who owns the message
    - `has_readed`: whether the message has been read
    """
    user_id = models.IntegerField(db_index=True)
    has_readed = models.BooleanField(default=False)

Of course, you'll start with a query like this to get the number of unread messages from a user:

# Get the number of unread messages for the user whose ID is 3074
Notification.objects.filter(user_id=3074, has_readed=False).count()

This is fine while your Notification table is small. But as the business grows, the message table accumulates hundreds of millions of rows, and many lazy users have thousands of unread messages.

At that point you need a counter that tracks the number of unread messages per user, so that instead of an expensive count() query we can fetch the number in real time with a simple primary-key lookup (or something even faster).

Better solution: set up a counter

First, let's create a new table to store the number of unread messages per user.


class UserNotificationsCount(models.Model):
    """This model holds the number of unread messages per user."""
    user_id = models.IntegerField(primary_key=True)
    unread_count = models.IntegerField(default=0)

    def __str__(self):
        return '<UserNotificationsCount %s: %s>' % (self.user_id, self.unread_count)

We give every registered user a corresponding UserNotificationsCount record to hold the number of their unread messages. Whenever we need that number, a simple `UserNotificationsCount.objects.get(pk=user_id).unread_count` is all it takes.

Now, the big question is, how do we know when to update our counters? Does Django offer any shortcuts to this?

The challenge: update your counter in real time

In order for our counter to work properly, we must update it in real time. This includes:

1. When a new unread message arrives, the counter is +1
2. When a message is deleted, if it was still unread, the counter is -1
3. When an unread message is read, the counter is -1

Let's tackle these cases one by one.

Before presenting the solution, we need to introduce a Django feature: Signals. Signals are an event-notification mechanism provided by Django that lets you listen for certain custom or built-in events and, when those events occur, invoke the handlers you have registered.

For example, `django.db.models.signals.pre_save` and `django.db.models.signals.post_save` fire before and after a model's save() method runs; functionally they are a bit like database triggers.

For more on Signals, see the official documentation. Here, let's look at what Signals can do for our counter.
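To make the mechanism concrete, here is a toy, Django-free sketch of the idea behind Signals. This is an illustration of the concept only, not Django's actual implementation (the real Signal class lives in django.dispatch and does considerably more):

```python
# A toy event-notification mechanism in the spirit of Django's Signals.
# Conceptual sketch only; not Django's real implementation.

class Signal:
    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Register a callback to be invoked when the signal fires
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Invoke every registered callback with the event's payload
        return [receiver(sender, **kwargs) for receiver in self._receivers]


post_save = Signal()
log = []
post_save.connect(lambda sender, **kw: log.append((sender, kw.get('created'))))

# Firing the event calls every connected receiver
post_save.send('Notification', created=True)
```

The pattern is simply "a list of callbacks invoked when an event fires"; Django pre-defines such events around model save and delete operations.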

1. When a new message arrives, the counter is +1

This is probably the best situation to handle. Using Django Signals, we can update the counter in this case with only a few lines of code:


from django.db.models.signals import post_save, post_delete


def incr_notifications_counter(sender, instance, created, **kwargs):
    # Only update when this instance is newly created and has_readed is
    # still the default False
    if not (created and not instance.has_readed):
        return
    # Call update_unread_count to bump the counter by +1
    NotificationController(instance.user_id).update_unread_count(1)


# Listen to the Notification model's post_save signal
post_save.connect(incr_notifications_counter, sender=Notification)

Now every time a new Notification is created via Notification.objects.create() or .save(), our NotificationController is notified and the counter goes +1.

Note, however, that since our counter is driven by Django signals, it will not be notified if some code inserts notifications with raw SQL instead of going through the Django ORM. It is therefore best to funnel all notification creation through a single code path, such as one shared API.

2. When a message is deleted, if it was still unread, the counter is -1

With the experience from the first case, this one is also easy to handle: just listen to the Notification's post_delete signal. Here is an example:


def decr_notifications_counter(sender, instance, **kwargs):
    # When the deleted message was still unread, decrement the counter
    if not instance.has_readed:
        NotificationController(instance.user_id).update_unread_count(-1)


post_delete.connect(decr_notifications_counter, sender=Notification)

At this point, deleting a Notification also updates our counter correctly.

3. When an unread message is read, the counter is -1

Next, when the user reads an unread message, we also need to update the counter. You might ask: what's so hard about that? Just update the counter manually in the method that marks a message as read.

Like this:


class NotificationController(object):

    ...

    def mark_as_readed(self, notification_id):
        notification = Notification.objects.get(pk=notification_id)
        # No need to mark an already-read notification again
        if notification.has_readed:
            return
        notification.has_readed = True
        notification.save()
        # Update our counter here; feels great
        self.update_unread_count(-1)

After a few simple tests you might think the counter works just fine, but this implementation has a fatal flaw: it does not handle concurrent requests correctly.

For example, suppose there is an unread notification with id 100, and two requests arrive at the same time, both trying to mark it as read:


# Two concurrent requests: assume these two calls happen almost simultaneously
NotificationController(user_id).mark_as_readed(100)
NotificationController(user_id).mark_as_readed(100)

Obviously both calls will mark the notification as read, because under concurrency a check like `if notification.has_readed` provides no protection. As a result our counter is incorrectly decremented twice, even though only one message was actually read.
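The lost update is easy to reproduce without a database. In this plain-Python sketch (a dict stands in for the Notification row), both "requests" read the row before either one writes, so the naive check passes twice:

```python
# Simulate the race: a dict stands in for the Notification record,
# and unread_count for our counter. Both requests read before either writes.
row = {'has_readed': False}
unread_count = 1

# Request A and request B both load the notification "at the same time"
seen_by_a = row['has_readed']   # False
seen_by_b = row['has_readed']   # False

for seen in (seen_by_a, seen_by_b):
    if not seen:                 # the naive check passes for BOTH requests
        row['has_readed'] = True
        unread_count -= 1        # the counter is decremented twice

# Only one message was read, yet the counter ends at -1 instead of 0
```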

So how can such problems be solved?

Fundamentally, there is only one way to resolve data conflicts caused by concurrent requests: locking. Here are two relatively simple solutions:

Use a select ... for update database query

`select ... for update` is a database-level solution for fetching and modifying data under concurrency. Major relational databases such as MySQL and PostgreSQL support it, and Django's ORM provides a shortcut for it, `select_for_update()`. For details, see the documentation of the database you are using.

Using select for update, our code might look like this:


from django.db import transaction


class NotificationController(object):

    ...

    def mark_as_readed(self, notification_id):
        # Run the select_for_update and the update in a single transaction
        with transaction.atomic():
            # select_for_update() locks the row, so only one request is
            # handled at a time while the others wait for the lock
            notification = Notification.objects.select_for_update().get(pk=notification_id)
            # No need to mark an already-read notification again
            if notification.has_readed:
                return
            notification.has_readed = True
            notification.save()
            self.update_unread_count(-1)

Besides `select for update`, there is an even simpler way to solve this problem.

Use update for atomic modification

In fact, the simpler way is to collapse the check and the write into a single update statement, which resolves the concurrency problem by itself:


def mark_as_readed(self, notification_id):
    affected_rows = Notification.objects.filter(pk=notification_id, has_readed=False) \
                                        .update(has_readed=True)
    # affected_rows is the number of rows the update statement modified,
    # so we decrement the counter by exactly that amount
    self.update_unread_count(-affected_rows)

This way, even concurrent mark-as-read operations affect our counter correctly.
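The reason this is safe can be sketched in plain Python: filter(...).update(...) behaves like an atomic test-and-set that reports how many rows it actually changed, so a second call on the same message affects zero rows and the counter moves by exactly the right amount. In this dict-backed sketch the function mimics that behavior (the return value mirrors Django's QuerySet.update()):

```python
# A dict stands in for the Notification table; mark_as_readed mimics
# filter(pk=..., has_readed=False).update(has_readed=True) and returns
# the number of rows it modified, like Django's QuerySet.update().
notifications = {100: {'has_readed': False}}
unread_count = 1

def mark_as_readed(notification_id):
    global unread_count
    row = notifications.get(notification_id)
    # Atomic test-and-set: only change the row if it is still unread
    affected_rows = 0
    if row is not None and not row['has_readed']:
        row['has_readed'] = True
        affected_rows = 1
    # Decrement by however many rows were actually updated (0 or 1)
    unread_count -= affected_rows
    return affected_rows

first = mark_as_readed(100)   # updates the row, returns 1
second = mark_as_readed(100)  # row already read, returns 0
```

Even if both calls raced, only one of them would win the update, so the counter ends at 0 rather than -1.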

High performance?

So far we have seen how to keep an unread-message counter correct. The naive way to apply a change is to modify the counter directly with an UPDATE statement, like this:


from django.db.models import F


def update_unread_count(self, count):
    # Use an UPDATE statement to modify our counter
    UserNotificationsCount.objects.filter(pk=self.user_id) \
                                  .update(unread_count=F('unread_count') + count)

In a production environment, however, this approach can cause serious performance problems: if the counter is updated frequently, the flood of UPDATEs puts heavy pressure on the database. To build a high-performance counter, we need to buffer the changes and write them to the database in batches.

Using redis's sorted set, we can do this quite easily.

Use a sorted set to buffer counter changes

Redis is an excellent in-memory database, and sorted set is one of the data types it provides: an ordered set. With it we can easily buffer all counter changes and write them back to the database in batches.


RK_NOTIFICATIONS_COUNTER = 'ss_pending_counter_changes'


def update_unread_count(self, count):
    """The modified update_unread_count method."""
    # With redis-py >= 3.0 the argument order is (name, amount, value)
    redisdb.zincrby(RK_NOTIFICATIONS_COUNTER, count, str(self.user_id))

# The method that returns a user's unread count must also be modified to
# include the buffered changes in redis that have not yet been written
# back to the database. That code is omitted here.
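The whole scheme (buffer deltas, read them together with the persisted value, flush periodically) can be sketched with a plain dict standing in for the sorted set. No real redis is involved here, and the numbers are made up:

```python
# Dict-backed sketch of the buffering scheme: `pending` plays the role of
# the redis sorted set, `db_counts` the UserNotificationsCount table.
db_counts = {3074: 10}   # persisted unread counts
pending = {}             # buffered, not-yet-flushed deltas

def update_unread_count(user_id, count):
    # Equivalent of ZINCRBY: accumulate the delta in the buffer
    pending[user_id] = pending.get(user_id, 0) + count

def get_unread_count(user_id):
    # Real count = persisted value + buffered delta not yet written back
    return db_counts.get(user_id, 0) + pending.get(user_id, 0)

def flush():
    # Equivalent of the management command below: write the deltas back
    # to the "database" and empty the buffer
    for user_id, count in list(pending.items()):
        db_counts[user_id] = db_counts.get(user_id, 0) + count
        del pending[user_id]

update_unread_count(3074, 2)
update_unread_count(3074, -1)
before_flush = get_unread_count(3074)   # correct even before any writeback
flush()
after_flush = (db_counts[3074], get_unread_count(3074))
```

The key property is that get_unread_count stays accurate at all times, while the database only sees one write per user per flush interval.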

With the code above, counter updates are buffered in redis. We still need a script that periodically writes the buffered data back to the database.

By writing a custom Django management command, we can do this very easily:


# File: management/commands/notification_update_counter.py
# -*- coding: utf-8 -*-
import logging

from django.core.management.base import BaseCommand
from django.db.models import F

from notification.models import UserNotificationsCount
from notification.utils import RK_NOTIFICATIONS_COUNTER
from base_redis import redisdb

logger = logging.getLogger('stdout')


class Command(BaseCommand):
    help = 'Update UserNotificationsCount objects, writing changes from redis to the database'

    def handle(self, *args, **options):
        # First, use the zrange command to fetch the IDs of all users
        # whose counters have buffered changes
        for user_id in redisdb.zrange(RK_NOTIFICATIONS_COUNTER, 0, -1):
            # To keep "read the score, then remove the member" atomic,
            # we use a redis pipeline
            pipe = redisdb.pipeline()
            pipe.zscore(RK_NOTIFICATIONS_COUNTER, user_id)
            pipe.zrem(RK_NOTIFICATIONS_COUNTER, user_id)
            count, _ = pipe.execute()
            # zscore returns None if the member vanished in the meantime
            count = int(count or 0)
            if not count:
                continue
            logger.info('Updating unread count of user %s: %s', user_id, count)
            UserNotificationsCount.objects.filter(pk=user_id) \
                                          .update(unread_count=F('unread_count') + count)

After that, running python manage.py notification_update_counter writes the buffered changes back to the database in one batch. You can also schedule this command in crontab for periodic execution.

Conclusion

And that's it: a simple "high-performance" unread message counter is done. To recap, the main points:

1. Use Django's signals to react to model creation and deletion
2. Use the database's select for update to handle concurrent modification correctly
3. Use redis's sorted set to buffer changes to the counters

I hope this is of some help to you. :)

