C language B tree in depth understanding

  • 2020-04-01 21:30:54
  • OfStack

A B tree is a balanced lookup tree designed for disks or other direct storage devices. As shown in the following figure. Each node that the arrow points to is called the entry degree, and the one that points to is called the exit degree. The degree of tree structure is 1, otherwise it becomes a graph, so we generally say that the degree of the tree refers to the degree of the tree, that is, the number of children of a node. With the concept of degree, we simply define B tree (suppose the minimum degree of a tree is M) :
1. Each node has at least m-1 key codes and at most 2M-1 key codes;
2. Except the root node and leaf node, each node has at least M children nodes and at most 2M children nodes;
3. The root node has at least two children, the only exception is the case of only the root node, there is no child;
4. All leaves are on the same layer.

(link: #)

Let's take a look at the structure of its nodes, as shown in the following figure:

(link: #)
Each node holds a key and a pointer to a child, and it is easy to see that there is one more pointer than the key.

From the definition of B tree, we can see some characteristics of it:
1. Tree height equilibrium, all leaves are in the same layer;
2. The keyword is not repeated and sorted in ascending order. The key code of the parent node is the boundary of the child node.
3.B tree makes use of the principle of access locality by placing related records with similar values on the same disk page;
4.B tree ensures that a certain proportion of nodes are full, which can improve the space utilization.

How do we determine the size of the B tree? To minimize disk operations, the node size is usually set to the size of a disk page. The height of the tree is typically no more than three levels, which means that finding a key requires only three disk operations.
In the implementation, I refer to the content of "introduction to algorithms" and assume:
1. The root node of B tree is always in main memory, no need to read the disk operation; However, a write disk operation is performed after the root node changes;
2. When any node is passed as a parameter, read the disk once.

In the implementation of the time also made a simplification, each node in addition to the key code and pointer, should also have the key code corresponding to the record of the file information, such as the file offset, otherwise how to find this record. This additional data is not placed in the node during the implementation. Below is the structure of the tree defined. The file name is btrees.

 
 
# define M 2 
 
typedef int bool ; 
struct btnode{  
int keyNum;  
int k[2*M-1];  
struct btnode * p[2*M];  
bool isleaf; 
}; 
struct searchResult{ 
struct btnode *ptr;  
int pos;  
}; 

Here is the code to create an empty tree. The file name is btree.c:
 
# include <stdio.h> 
# include <stdlib.h> 
# include "btrees.h" 
 
struct btnode * allocateNode( struct btnode *ptr){ 
int i,max; 
ptr = ( struct btnode *) malloc ( sizeof ( struct btnode)); 
if (!ptr){ 
printf ( "allocated error!/n" ); 
exit (1); 
} 
max = 2*M; 
for (i=0; i<max; i++) 
ptr->p[i] = NULL;  
memset (ptr->k, 0, (max-1)* sizeof ( int ));  
return ptr; 
} 
 
struct btnode * btreeCreate( struct btnode *root){ 
root = allocateNode(root); 
root->keyNum = 0; 
root->isleaf = 1; 
return root; 
} 
(link: #) (# link:) B tree is inserted in leaf nodes. Because the number of key codes in nodes of B tree is limited, the number of nodes of B tree with minimum degree M is from m-1 to 2M-1. For example, the following figure is a B tree with a minimum degree of 2 (also known as a 2-3 tree). As shown in the following figure, the number of nodes is 1-3.
(link: #)
First, locate the position to be inserted. If the number of key codes of leaf nodes has not reached the upper limit, for example, insert 32, it is relatively simple to insert directly. If the critical code of the leaf reaches the upper limit, it will split into two children and put the middle key up into the parent node. But in the extreme case, even if the parent is full, you have to split again, and maybe you end up splitting the root. But this algorithm is not easy to implement.
In the introduction to algorithms, another idea is implemented, which is to split first. In the process of finding the insertion location, if a node is full, it will split first, which ensures that when data is inserted at the last leaf, the parent of the leaf is always dissatisfied. Here's an example:

We use the method of node by node insertion to create a B tree, the node order is {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20}, let's see the specific process:
1. Create an empty B tree;
2. Insert 18, which is non-full, as shown below:

(link: #)
3. Similarly, it is simpler to insert 31 and 12, as shown below:

(link: #)
4. Insert 10, at this time, the root node is full, it will split, because the root node is special, there is no parent, it will be treated separately, Mr. Into a nil as the new root node, and then split, as shown below:

(link: #)
5. Insert 15, 48 and 45 again. Because it is not full, insert directly, as shown below:

(link: #)
6. Insert 47. When the leaf is full, split and insert, as shown below:

(link: #)

Other are the same reason, will not be repeated, the following is the source code, added to btree

(link: http://hi.baidu.com/kurt023/blog/item/4c368d8b51c59ed3fc1f10cc.html)

He did it on his blog, except that the number of Pointers and key yards were the same when he defined the B tree, so I rewrote it myself.

 
//Function purpose: to split the number of storage to the largest node
void btreeSplitChild( struct btnode *parent, int pos, struct btnode *child){ 
struct btnode *child2; 
int i; 
//Allocate space for newly split nodes
child2 = allocateNode(child2); 
//Is the same as the split point
child2->isleaf = child->isleaf; 
//Set the number of nodes
child2->keyNum = M-1; 
//Copy the data
for (i=0; i<M-1; i++) 
child2->k[i] = child->k[i+M]; 
//If it is not a leaf node, copy the pointer
if (!child->isleaf) 
for (i=0; i<M; i++) 
child2->p[i] = child->p[i+M]; 
child->keyNum = M-1; 
//Inserts the middle number into the parent node as an index
//The key and pointer after the insertion point are moved back one position
for (i=parent->keyNum; i>pos; i--){ 
parent->k[i] = parent->k[i-1]; 
parent->p[i+1] = parent->p[i]; 
} 
parent->k[pos] = child->k[M-1]; 
parent->keyNum++; 
parent->p[pos+1] = child2; 
} 
 
void btreeInsertNoneFull( struct btnode *ptr, int data){ 
int i; 
struct btnode *child; //The child of the node to insert
i = ptr->keyNum; 
//If it is a leaf node, insert the data directly
if (ptr->isleaf){ 
while ((i>0) && (data<ptr->k[i-1])){ 
ptr->k[i] = ptr->k[i-1]; 
i--; 
} 
//Insert data
ptr->k[i] = data; 
ptr->keyNum++; 
} 
else { //Instead of a leaf node, find the child node where the data should be inserted and insert it
while ((i>0) && (data<ptr->k[i-1])) 
i--; 
child = ptr->p[i]; 
if (child->keyNum == 2*M-1){ 
btreeSplitChild(ptr, i, child); 
if (data > ptr->k[i]) 
i++; 
} 
child = ptr->p[i]; 
btreeInsertNoneFull(child, data); //Recursion in the subtree
} 
} 
 
struct btnode * btreeInsert( struct btnode *root, int data){ 
struct btnode * new ; 
 
if (root->keyNum == 2*M-1){ 
new = allocateNode( new ); 
new ->isleaf = 0; 
new ->keyNum = 0; 
new ->p[0] = root; 
btreeSplitChild( new , 0, root); 
btreeInsertNoneFull( new , data); 
return new ; 
} 
else { //Not to the maximum number of data, just insert
btreeInsertNoneFull(root, data); 
return root; 
} 
} 
//Function purpose: breadth-first display tree
void btreeDisplay( struct btnode *root){ 
int i, queueNum=0; 
int j; 
struct btnode *queue[20]; 
struct btnode *current; 
//Join the queue
queue[queueNum] = root; 
queueNum++; 
while (queueNum>0){ 
//Out of the team
current = queue[0]; 
queueNum--; 
//After the first element is removed, the next element moves forward one position
for (i=0; i<queueNum; i++) 
queue[i] = queue[i+1]; 
//According to the node
j = current->keyNum; 
printf ( "[ " ); 
for (i=0; i<j; i++){ 
printf ( "%d " , current->k[i]); 
} 
printf ( "] " ); 
//Child node enqueue
if (current!=NULL && current->isleaf!=1){ 
for (i=0; i<=(current->keyNum); i++){ 
queue[queueNum] = current->p[i]; 
queueNum++; 
} 
} 
} 
printf ( "/n" ); 
} 
int main() 
{ 
struct btnode *root; 
int a[13] = {18, 31, 12, 10, 15, 48, 45, 47, 50, 52, 23, 30, 20}; 
int i; 
root = btreeCreate(root); 
for (i=0; i<13; i++){ 
root = btreeInsert(root, a[i]); 
btreeDisplay(root); 
} 
return 0; 
} 
(link: #) (# link:)


(link: #)
B trees generated by the same batch of keys with different algorithms may be different. For example, when the nodes of four keys [1,2,3,4] are split, it is ok to put 2 or 3 up; The same algorithm may be inserted in different order.
The attachment is the source code, compiled in Linux through.

Related articles: