Implementation of C++ language based on consistency hash algorithm

  • 2020-04-01 23:29:14
  • OfStack

      There are two key problems to be solved in the implementation of consistent hash algorithm, one is the choice of data structure for node storage and lookup, the other is the choice of node hash algorithm.

      First, let's talk about the data structure used to store nodes in a consistent hash algorithm. By understanding the principle of consistent hash, we know that nodes can be thought of as being stored on A ring data structure (as shown in the figure below), and nodes A, B, C, and D are ordered in the ring distribution by hash value, that is, nodes can be stored in an ordered queue by hash value. As shown in the following figure, when a request point P with a hash value of -2^20 looks up the routing node, the consistent hash algorithm will route clockwise to the first node (B) according to the hash value, which is equivalent to finding the node with the smallest value greater than the key value in the ordered structure of the storage node by the key value of the query. Therefore, we should choose a data structure, which should effectively support frequent addition and deletion of nodes, but also must have ideal query efficiency. So, the red-black tree satisfies these requirements. The red-black tree is an approximately balanced binary search tree, because the worst-case time required for operations such as insert, delete, and find a value is proportional to the height of the tree, and this theoretical upper limit on height allows the red-black tree to be efficient in the worst case, unlike a normal binary search tree. Therefore, we choose to use the red-black tree as the storage structure of the node. In addition to the basic functions of inserting, deleting and searching the red-black tree, we should also add another lookup function for finding the node that is larger than the smallest in the key.

< img style = "border = 0 BACKGROUND - IMAGE: none; BORDER - RIGHT - WIDTH: 0 px; PADDING - LEFT: 0 px; PADDING - RIGHT: 0 px; DISPLAY: inline. BORDER - TOP - WIDTH: 0 px; BORDER - BOTTOM - WIDTH: 0 px; BORDER - LEFT - WIDTH: 0 px; PADDING - TOP: 0 px "title = image border = 0 Alt = image SRC =" / / files.jb51.net/file_images/article/201305/2013050710373329.png "width = 572 height = 326 >

    Next, let's talk about the choice of hash algorithm. The consistent hash algorithm was originally proposed to solve the problem of load balancing. Each entity node contains many virtual nodes, and virtual nodes are the key to balancing the load. We hope that the virtual nodes can be evenly hashed on the whole "ring", so that not only can the routing requests with different hash values be loaded, but also when a node is down, the requests originally routed to the down node can be evenly routed to other nodes without causing a large number of load requests to a node. Here, we chose to use the MD5 algorithm. Through MD5 algorithm, an identifier string (used for marking virtual nodes) can be converted into a 16-byte character array, and then the array is processed to obtain an integer hash value. Because MD5 is highly discrete, the resulting hash values are also very discrete, being evenly hashed onto the "ring."

    The author implemented the consistency hash algorithm in C++ language, and I will describe some key details below.

1. Firstly, entity node class and virtual node class are defined. An entity node corresponds to multiple virtual nodes.

  Entity node CNode_s:



class CNode_s 
{ 
public: 
    
    CNode_s(); 
    CNode_s(char * pIden , int pVNodeCount , void * pData); 

    
    const char * getIden(); 

    
    int getVNodeCount(); 

    
    void setData(void * data); 

    
    void * getData(); 
private: 
    void setCNode_s(char * pIden, int pVNodeCount , void * pData); 
    char iden[100];
    int vNodeCount; 
    void * data;
};

CVirtualNode_s: a virtual node has a pointer to an entity node


class CVirtualNode_s 
{ 
public: 
    
    CVirtualNode_s(); 
    CVirtualNode_s(CNode_s * pNode); 

    
    void setNode_s(CNode_s * pNode); 

    
    CNode_s * getNode_s(); 

    
    void setHash(long pHash); 

    
    long getHash(); 
private: 
    long hash; 
    CNode_s * node; 
}; 

2. Hash algorithm is optional. A hash algorithm interface is defined to facilitate the expansion of other algorithms in the future.

          Here, the MD5hash class is created, and the interface is inherited. The MD5 algorithm is used to calculate the hash value.

Class diagram:

< img style = "border = 0 BACKGROUND - IMAGE: none; BORDER - RIGHT - WIDTH: 0 px; PADDING - LEFT: 0 px; PADDING - RIGHT: 0 px; DISPLAY: inline. BORDER - TOP - WIDTH: 0 px; BORDER - BOTTOM - WIDTH: 0 px; BORDER - LEFT - WIDTH: 0 px; PADDING - TOP: 0 px "title = image border = 0 Alt = image SRC =" / / files.jb51.net/file_images/article/201305/2013050710373330.png "width = 167 height = 209 >    

CHashFun interface:




class CHashFun 
{ 
public: 
    virtual long getHashVal(const char *) = 0; 
}; 

CMD5HashFun class inherits the CHashFun interface and implements the getHashVal function to get the hash value:


class CMD5HashFun : public CHashFun 
{ 
public: 
    virtual long getHashVal (const char * ); 
}; 

long CMD5HashFun::getHashVal(const char * instr) 
{ 
    int i; 
    long hash = 0; 
    unsigned char digest[16]; 

    
    md5_state_t md5state; 
    md5_init(&md5state); 
    md5_append(&md5state, (const unsigned char *)instr, strlen(instr)); 
    md5_finish(&md5state, digest); 

    
    for(i = 0; i < 4; i++) 
    { 
        hash += ((long)(digest[i*4 + 3]&0xFF) << 24) 
            | ((long)(digest[i*4 + 2]&0xFF) << 16) 
            | ((long)(digest[i*4 + 1]&0xFF) <<  8) 
            | ((long)(digest[i*4 + 0]&0xFF)); 
    } 
    return hash; 
}

3. Expand the search function in the red-black tree structure to find the smallest node in the red-black tree that is larger than the key value.

util_rbtree_node_t* util_rbtree_lookup(util_rbtree_t *rbtree, long key) 
{ 
    if((rbtree != NULL) && !util_rbtree_isempty(rbtree)) 
    { 
        util_rbtree_node_t *node = NULL; 
        util_rbtree_node_t *temp = rbtree->root; 
        util_rbtree_node_t *null = _NULL(rbtree); 
        while(temp != null) 
        { 
            if(key <= temp->key) 
            { 
                node = temp; 
                temp = temp->left; 
            } 
            else if(key > temp->key) 
            { 
                temp = temp->right; 
            } 
        } 
        
        return ((node != NULL) ? node : util_rbtree_min(rbtree)); 
    } 
    return NULL; 
} 

4. Create a consistent hash class. Make it has the function of inserting, deleting and finding entity node.

The specific algorithm and operation process have been described in the code comments.


class CConHash 
{ 
public: 
    
    CConHash(CHashFun * pFunc); 

    
    void setFunc(CHashFun * pFunc); 

    
    int addNode_s(CNode_s * pNode); 

    
    int delNode_s(CNode_s * pNode); 

    
    CNode_s * lookupNode_s(const char * object); 

    
    int getVNodes(); 
private: 
    
    CHashFun * func; 
    
    int vNodes; 
    
    util_rbtree_t * vnode_tree; 
}; 

util_rbtree_node_t * vNode2RBNode(CVirtualNode_s * vnode); 

  
CConHash::CConHash(CHashFun * pFunc) 
{ 
    
    assert(pFunc!=NULL); 
    this->func = pFunc; 
    this->vNodes = 0; 
    
    vnode_tree = new util_rbtree_s(); 
    util_rbtree_init(vnode_tree); 
} 

int CConHash::addNode_s(CNode_s * pNode) 
{ 
    if(pNode==NULL) return -1; 
    int vCount = pNode->getVNodeCount(); 
    if(vCount<=0) return -1; 
    CVirtualNode_s * virtualNode ; 
    util_rbtree_node_t * rbNode; 
    char str [100]; 
    char num[10]; 
    strcpy(str,pNode->getIden()); 
    long hash = 0; 
    
    for(int i=0;i<vCount;i++) 
    { 
        virtualNode = new CVirtualNode_s(pNode); 
        
        itoa(i,num,10); 
        strcat(str,num); 
        hash = func->getHashVal(str); 
        virtualNode->setHash(hash); 
        if(!util_rbtree_search(vnode_tree,hash)) 
        { 
            
            rbNode = vNode2RBNode(virtualNode);  
            if(rbNode!=NULL) 
            { 
                
                util_rbtree_insert(vnode_tree,rbNode); 
                this->vNodes++; 
            } 
        } 
    } 
    return 0; 
} 

int CConHash::delNode_s(CNode_s * pNode) 
{ 
    if(pNode==NULL) return -1; 
    util_rbtree_node_t * rbNode; 
    char str [100]; 
    char num [10]; 
    strcpy(str,pNode->getIden());  
    int vCount = pNode->getVNodeCount(); 
    long hash = 0; 
    CVirtualNode_s * node = NULL; 
    
    for(int i=0;i<vCount;i++) 
    { 
        itoa(i,num,10); 
        strcat(str,num);
        hash = func->getHashVal(str); 
        rbNode = util_rbtree_search(vnode_tree,hash); 
        if(rbNode!=NULL) 
        { 
            node = (CVirtualNode_s *) rbNode->data; 
            if(node->getNode_s()==pNode && node->getHash()==hash) 
            { 
                this->vNodes--; 
                
                util_rbtree_delete(vnode_tree,rbNode); 
                delete rbNode; 
                delete node; 
            } 
        } 
    } 
    return 0; 
} 

CNode_s * CConHash::lookupNode_s(const char * object) 
{ 
    if(object==NULL||this->vNodes==0) return NULL; 
    util_rbtree_node_t * rbNode; 
    int key = this->func->getHashVal(object); 
    
    rbNode = util_rbtree_lookup(vnode_tree,key); 
    if(rbNode!=NULL) 
    { 
        return ((CVirtualNode_s *) rbNode->data)->getNode_s(); 
    } 
    return NULL; 
} 

int CConHash::getVNodes() 
{ 
    return this->vNodes; 
} 

  
util_rbtree_node_t * vNode2RBNode(CVirtualNode_s * vnode) 
{ 
    if(vnode==NULL) return NULL; 
    util_rbtree_node_t *rbNode = new util_rbtree_node_t();  
    rbNode->key = vnode->getHash(); 
    rbNode->data = vnode; 
    return rbNode; 
} 

5. Create a client class to test the consistency hash algorithm.

          Write a getIP function to simulate the random IP string.


#include<iostream> 
#include"CNode_s.h" 
#include"CVirtualNode_s.h" 
#include"CHashFun.h" 
#include"CMD5HashFun.h" 
#include"CConHash.h" 
#include<string.h> 
#include<time.h> 

using namespace std; 

void getIP(char * IP) 
{ 
    int a=0, b=0 , c=0 , d=0; 
    a = rand()%256; 
    b = rand()%256; 
    c = rand()%256; 
    d = rand()%256; 
    char aa[4],bb[4],cc[4],dd[4]; 
    itoa(a, aa, 10); 
    itoa(b, bb, 10); 
    itoa(c, cc, 10); 
    itoa(d, dd, 10); 
    strcpy(IP,aa); 
    strcat(IP,"."); 
    strcat(IP,bb); 
    strcat(IP,"."); 
    strcat(IP,cc); 
    strcat(IP,"."); 
    strcat(IP,dd); 
} 

int main() 
{ 
    srand(time(0)); 
    freopen("out.txt","r",stdin); 
    
    CHashFun * func = new CMD5HashFun(); 
    
    CConHash * conhash = new CConHash(func); 

    
    CNode_s * node1 = new CNode_s("machineA",50,"10.3.0.201"); 
    CNode_s * node2 = new CNode_s("machineB",80,"10.3.0.202"); 
    CNode_s * node3 = new CNode_s("machineC",20,"10.3.0.203"); 
    CNode_s * node4 = new CNode_s("machineD",100,"10.3.0.204"); 

    conhash->addNode_s(node1); 
    conhash->addNode_s(node2); 
    conhash->addNode_s(node3); 
    conhash->addNode_s(node4); 

    
//  node1->setData("99999999"); 

    int ans1 ,ans2 ,ans3 ,ans4; 
    ans1=ans2=ans3=ans4=0; 

    char object[100]; 
    CNode_s * node ; 
    
    //conhash->delNode_s(node2); 
    for(int i =0;i<30;i++) 
    { 
    //  getIP(object); 
    //  cout<<object<<endl; 
        cin>>object; 
        node = conhash->lookupNode_s(object); 
        if(node!=NULL) 
        { 
            cout<<object<<"----->t"<<node->getIden()<<" t "<<(char *)node->getData()<<endl; 
            if(strcmp(node->getIden(),"machineA")==0) ans1++; 
            if(strcmp(node->getIden(),"machineB")==0) ans2++; 
            if(strcmp(node->getIden(),"machineC")==0) ans3++; 
            if(strcmp(node->getIden(),"machineD")==0) ans4++; 
        } 
    } 

    cout<<"Total test cases : "<<ans1+ans2+ans3+ans4<<endl; 
    cout<<"Map to MachineA : "<<ans1<<endl; 
    cout<<"Map to MachineB : "<<ans2<<endl; 
    cout<<"Map to MachineC : "<<ans3<<endl; 
    cout<<"Map to MachineD : "<<ans4<<endl; 
    fclose(stdin); 
    return 0; 
}

6. Test the influence of deleting nodes on hash routing

< img style = "border = 0 BACKGROUND - IMAGE: none; BORDER - RIGHT - WIDTH: 0 px; PADDING - LEFT: 0 px; PADDING - RIGHT: 0 px; DISPLAY: inline. BORDER - TOP - WIDTH: 0 px; BORDER - BOTTOM - WIDTH: 0 px; BORDER - LEFT - WIDTH: 0 px; PADDING - TOP: 0 px "title = image border = 0 Alt = image SRC =" / / files.jb51.net/file_images/article/201305/2013050710373331.png "width = 445 height = 103 >

Screenshot of test results:

< img style = "border = 0 BACKGROUND - IMAGE: none; BORDER - RIGHT - WIDTH: 0 px; PADDING - LEFT: 0 px; PADDING - RIGHT: 0 px; DISPLAY: inline. BORDER - TOP - WIDTH: 0 px; BORDER - BOTTOM - WIDTH: 0 px; BORDER - LEFT - WIDTH: 0 px; PADDING - TOP: 0 px "title = image border = 0 Alt = image SRC =" / / files.jb51.net/file_images/article/201305/2013050710373332.png "width = 387 height = 506 > < img style =" border = 0 BACKGROUND - image: none; BORDER - RIGHT - WIDTH: 0 px; PADDING - LEFT: 0 px; PADDING - RIGHT: 0 px; DISPLAY: inline. BORDER - TOP - WIDTH: 0 px; BORDER - BOTTOM - WIDTH: 0 px; BORDER - LEFT - WIDTH: 0 px; PADDING - TOP: 0 px "title = image border = 0 Alt = image SRC =" / / files.jb51.net/file_images/article/201305/2013050710373333.png "width = 392 height = 506 >

Analysis: In the above two figures, the left side shows the routing situation of the original four entity nodes, and the latter shows the routing situation after Node2 (Node2) is deleted. It is not difficult to find that after MachineB down, the original routing request is more evenly loaded to other machine nodes and has no impact on the original routing to other nodes. For example, 139.149.184.125 this request will still be routed to MachineD and will not be affected by the reduction of nodes. However, if the entity node is added, it may cause the phenomenon of inconsistent routing before and after the increase, because the route interval is narrower, but it will not have a particularly big impact. On the other hand, it can be found that the proportion distribution of the number of virtual nodes of the entity node greatly affects the load routing of the node, and the proportion is roughly consistent with the number of virtual nodes.

Conclusion:

   This paper firstly introduces the key algorithms to implement the consistent hash algorithm and the selection and analysis of the data structure, selects the red-black tree as the storage structure of the virtual node, and the MD5 algorithm as the hash function to calculate the hash value of the node. In addition, C++ language is used to implement the consistency hash algorithm, and the basic functions of adding, deleting and searching the entity nodes of consistency hash are realized. Because the author's level is limited, there are many areas for improvement, so this paper is only for your reference, discussion and study.


Related articles: