Using C language to implement HashTable

  • 2020-04-02 01:29:16
  • OfStack

HashTable is a very important structure in practice, and we'll discuss a simple implementation that is simple, but has all the pieces.

One, access interface
Create a hashtable.
Hashtable hashtable_new (int size)/where size represents the number of contacts contained.

Store the key-value into the hashtable.
Void hashtable_put (hashtable h,const char* key,void *val);

Fetch the value from the hashtable based on the key.
Void * hashtable_get (hashtable h,const char *key);

Release the hashtable.
Void hashtable_free (hashtable h);

Release a single hash contact
Void hashtable_delete_node (hashtable h, const char *key);

Two, data structure
Hash contact structure:


typedef struct hashnode_struct{ 
struct hashnode_struct *next; 
const char *key; 
void *val; 
}*hashnode,_hashnode; 

This structure is easy to understand and includes a linked list structure for conflicts in addition to the required key-value.
The data structure of hashtable:

typedef struct hashtable_struct{ 
pool_t p; 
int size; 
int count; 
struct hashnode_struct *z; 
}*hashtable,_hashtable; 

The structure is explained as follows:
Pool_t: the memory pool structure manages the memory used by hashtable. Structure reference "C language memory pool usage model"
Size: the size of the contact space for the current hash.
Count: represents the number of hash contacts available in the current contact space
Z: used to store contacts in the contact space.

Three, create hashtable
The code is as follows:


hashtable hashtable_new ( int size )  
{ 
hashtable ht; 
pool_t p; 
p = _pool_new_heap ( sizeof ( _hashnode ) *size + sizeof ( _hashtable ));  
ht= pool_malloc ( p, sizeof ( _hashtable ));  
ht->size = size; 
ht->p = p; 
ht->z = pool_malloc ( p, sizeof ( _hashnode ) *prime );  
return ht; 
} 

This function is relatively simple, first define and initialize a memory pool, the size is determined by the size, so in the actual use, our size should be allocated to a relatively large point, it is better.

4. Store key-value value
Before you do this, you define a function that calculates hashcode based on the KEY value.


static int hashcode ( const char *s, int len )  
{ 
const unsigned char *name =  ( const unsigned char * ) s; 
unsigned long h = 0, g; 
int i; 
for ( i=0;i 
{ 
h =  ( h  "  4 )  +  ( unsigned long ) ( name[i] );  //Hash moved 4 bits to the left, the current character ASCII into the hash
if  (( g =  ( h & 0xF0000000UL ))! =0 )  
h ^=  ( g  "  24 );  
h &= ~g; //Clear 28-31 bits.
} 
return  ( int ) h; 
} 

This function USES the classic ELF hash function.
The code is as follows:

void hashtable_put ( hashtable h, const char *key, void *val )  
{ 
if ( h == NULL || key == NULL )  
return; 
int len = strlen ( key );  
int index = hashcode ( key,len );  
hashtable node; 
h->dirty++; 
if (( node = hashtable_node_get ( h, key,len, index ))  != NULL )  //If it already exists, replace it with the current value, because it's new.
{ 
n->key = key; 
n->val = val; 
return; 
} 
node = hashnode_node_new ( h, index );  //Create a new HASH NODE contact.
node->key = key; 
node->val = val; 
} 
hashtable_node_get To find the KEY Whether in HASH In already exists, the implementation is very simple, as follows:  
static hashnode hashtable_node_get ( hashtable h, const char *key, int len, int index )  
{ 
hashnode node; 
int i = index % h->size; 
for ( node = &h->z[i]; node != NULL; node = node->next )  //The search is traversed over the HASH bucket corresponding to the index value [HASH value]
if ( node->key != NULL &&  ( strlen ( node->key ) ==len )  &&  ( strncmp ( key, node->key, len )  == 0 ))  
return node; 
return NULL; 
} 

Create a new HASH NODE contact as follows:

static hashnode hashnode_node_new ( hashtable h, int index )  
{ 
hashnode node; 
int i = index % h->size; 
h->count++; 
for ( node = &h->z[i]; node != NULL; node = node->next )  
if ( node->key == NULL )  //Here's the deal: if a value exists in the HASH bucket and the KEY is empty, the value is no longer useful, and it is replaced with the new contact that is now ready to be written.
return node; 
node = pool_malloc ( h->p, sizeof ( _hashnode ));  //Create a new contact
node->next = h->z[i].next; //To add to the bucket is to add to the first contact of the list.
h->z[i].next = node; 
return node; 
} 

Five, get the contacts from the HASHTABLE
To get a contact from a hashtable based on the KEY, calculate the hash value based on the KEY, and then find the specified contact or a linked list of contacts from the hashtable. As follows:

void *hashtable_get ( hashtable h, const char *key )  
{ 
if ( h == NULL || key == NULL )  
return NULL; 
hashnode node; 
int len = strlen ( key );  
if ( h == NULL || key == NULL || len <= 0 ||  ( node = hashtable_node_get ( h, key, len, hashcode ( key,len )))  == NULL )  
{ 
return NULL; 
} 
return node->val; 
} 

This function is easy to understand.

Six, release the HASHTABLE
The release of hashtable is relatively simple, since all of our memory requests are completed on the memory pool, we only need to release the memory pool, as follows:


void hashtable_free ( hashtable h )  
{ 
if ( h != NULL )  
pool_free ( h->p );  
} 

Seven, release a single hash contact
The code is as follows:

void hashtable_delete_node ( hashtable h, const char *key )  
{ 
if ( h == NULL || key == NULL )  
return; 
hashnode node; 
int len = strlen ( key );  
if ( h == NULL || key == NULL ||  ( node = hashtable_node_get ( h, key, len, hashcode ( key,len )))  == NULL )  //There is no such contact
return; 
node->key = NULL; 
node->val = NULL; 
h->count--; 
} 

This implemented a simple HASHTABLE structure, of course, there are still shortcomings, such as traversing the HASHTABLE, if the array method to traverse, the efficiency is certainly very low, the following discussion of an implementation plan, used to traverse the HASHTABLE.

Eight, the traversal discussion of hashtable
The struct hashnode_struct array in hashtable can be traversed directly by arrays. However, if only one contact is included, all arrays should be traversed as follows:


void hashtable_traverse ( hashtable h )  
{ 
int i; 
hashnode node; 
if ( h == NULL )  
return; 
for ( i = 0; i < h->prime; i++ )  
for ( node = &h->z[i]; node != NULL; node = node->next )  
if ( node->key != NULL && node->val != NULL )  
XXXXXXXXXXXXXXXXX //Here are some operations.
} 

This is inefficient, and actually contains the next field in the contact, which you can use to traverse.
We need to make a simple change to the previous hashtable data structure by adding two fields:

typedef struct hashtable_struct{ 
pool_t p; 
int size; 
int count; 
struct hashnode_struct *z; 
int bucket; 
hashnode node; 
}*hashtable,_hashtable; 

This is to add two domains, bucket and node. The idea of adding these two domains is as follows:
Node represents the cursor that is currently being traversed. During traversal, the node is constantly moved to the point that the node is pointing to.
A bucket is associated with a node and is used to record which bucket the current node is in.
Firstly, the connection is established, that is, all the contacts are connected. By convention, the XXX_iter_first function is also used to initialize, as follows:

int hashtable_iter_first ( hashtable h )  { 
if ( h == NULL )  
return 0; 
h->bucket = -1; 
h->node = NULL; 
return hashtable_iter_next ( h );  
} 
hashtable_iter_next To get the next contact, if the cursor has been determined, the next contact will be determined quickly, as follows:  
int xhash_iter_next ( xht h )  { 
if ( h == NULL )  return 0; 
while ( h->node != NULL )  { 
h->node = h->node->next; //Move to the next contact and return success if the contact is valid
if ( h->node != NULL && h->node->key != NULL && h->node->val != NULL )  
return 1; 
} 
for ( h->bucket++; h->bucket < h->prime; h->bucket++ )  { 
h->node = &h->z[h->bucket]; 
while ( h->node != NULL )  { 
if ( h->node->key != NULL && h->node->val != NULL )  
return 1; 
h->node = h->node->next; 
} 
} 
h->bucket = -1; //There is no next contact.
h->node = NULL; 
return 0; 
} 

With the above two methods, the traversal operation is as follows:

hashtable ht 
if ( hashtable_iter_first ( ht ))  //Take the first contact.
do{ 
//Ht -> Node, which represents the current contact.
}while ( hashtable_iter_next ( ht ));  //Take the next contact

In this way, it is not much more efficient. Of course, on the first pass, you still need to go through the entire array and the buckets under the array. But after that, when you delete a node, you have to do something. When deleting a contact, you need to consider the current h- > Node is not the currently deleted node, and if so, h- > Node is called to the next contact. After deleting namely, should make following processing, if deleted.

If the node is deleted, the following processing is required:
If (h - > The node = = n)
Hashtable_iter_next (h);
The h - > Node moves to the next node.


Related articles: