The Python Trie tree implements dictionary sort
- 2020-04-02 13:29:51
- OfStack
What is a Trie tree
Trie trees, also known as dictionary trees, word lookup trees, or prefix trees, are a multi-tree structure for quick retrieval. The dictionary of the figure is a 10-fork tree:
< img border = 0 SRC = "/ / files.jb51.net/file_images/article/201403/201403282342282.jpg" >
Similarly, the dictionary number of lowercase or uppercase English letters is a 26-fork tree. As can be seen from the figure above, the root node of the Trie tree does not store data, and all data is stored in its child node. There are strings go, golang, PHP, python, perl, and its Trie tree can be constructed as shown in the following figure:< img border = 0 SRC = "/ / files.jb51.net/file_images/article/201403/201403282342283.jpg" >
So let's look at the top picture. Except for the root node, each child node stores only one character. Go and golang share the go prefix, while PHP, perl, and python share only the p prefix. In order to implement lexicographical sort, the characters stored on each layer of nodes are stored in lexicographical sort (depending on how they are traversed). Let's first look at how to lexicographically sort individual characters. This article only considers lowercase letters, otherwise similar. 'a' comes before 'b', and 'a' has less ASCII than 'b', so you can get the dictionary order by subtracting their ASCII. And python has built-in lexicographical sort apis, such as:
#!/usr/bin/env python
#coding: utf8
if __name__ == '__main__':
arr = [c for c in 'python']
arr.sort()
print arr
It can also be implemented using bitmap, which I introduced in a previous article: (link: #). The implementation code is as follows:
#!/usr/bin/env python
#coding: utf8
class Bitmap(object):
def __init__(self, max):
self.size = self.calcElemIndex(max, True)
self.array = [0 for i in range(self.size)]
def calcElemIndex(self, num, up=False):
'''up for True Is rounded up , Otherwise, it is rounded down '''
if up:
return int((num + 31 - 1) / 31) # Take up the whole
return num / 31
def calcBitIndex(self, num):
return num % 31
def set(self, num):
elemIndex = self.calcElemIndex(num)
byteIndex = self.calcBitIndex(num)
elem = self.array[elemIndex]
self.array[elemIndex] = elem | (1 << byteIndex)
def clean(self, i):
elemIndex = self.calcElemIndex(i)
byteIndex = self.calcBitIndex(i)
elem = self.array[elemIndex]
self.array[elemIndex] = elem & (~(1 << byteIndex))
def test(self, i):
elemIndex = self.calcElemIndex(i)
byteIndex = self.calcBitIndex(i)
if self.array[elemIndex] & (1 << byteIndex):
return True
return False
if __name__ == '__main__':
MAX = ord('z')
suffle_array = [c for c in 'python']
result = []
bitmap = Bitmap(MAX)
for c in suffle_array:
bitmap.set(ord(c))
for i in range(MAX + 1):
if bitmap.test(i):
result.append(chr(i))
print ' The original array is : %s' % suffle_array
print ' The sorted array is : %s' % result
The bitmap sort cannot have duplicate characters. In fact, there are already many sophisticated algorithms for sorting dictionaries based on ASCII subtraction, such as insertion sort, hill sort, bubble sort and heap sort. For the sake of simplicity, this article USES Python's built-in sorted method to do single-character dictionary sorting. This is also possible if the reader implements the sorting of a single-character array, and this will allow you to customize the sorting of strings.
Implementation approach
The entire implementation consists of two classes: the Trie class and the Node class. The Node class represents the nodes in the Trie tree and is organized by the Trie class into a Trie tree. Let's start with the Node class:
#!/usr/bin/env python
#coding: utf8
class Node(object):
def __init__(self, c=None, word=None):
self.c = c # A single character stored by a node
self.word = word # The node stores the words
self.childs = [] # Child of this node
Node contains three member variables. C is the character stored on each node. Word means a complete word, in this case a string. Childs contains all children of this node. Since c is stored in each node, what is the use of storing word? And which node should this word be on? Let's use the same example: go and golang, for example, share the go prefix, but it's a good idea to do a string search, because it will provide the original string, just search the Trie tree by the path. However, for sorting, no input is provided, so there is no way to know where the word boundary is, and word in the Node class is the word boundary. Specifically, it is stored on the last node of the word, as shown in the figure:
< img border = 0 SRC = "/ / files.jb51.net/file_images/article/201403/201403282342284.jpg" >
The c member of the Node class can be undefined if the tree is not searched, because it is not required in the sort.
Next let's look at the definition of the Trie class:
#!/usr/bin/env python
#coding: utf8
'''Trie Tree implements string array dictionary sort '''
class Trie(object):
def __init__(self):
self.root = Node() # Trie The tree root Node reference
def add(self, word):
''' Add string '''
node = self.root
for c in word:
pos = self.find(node, c)
if pos < 0:
node.childs.append(Node(c))
# For the sake of simplicity, use it directly here Python The built-in sorted To sort
#pos Question, because sort After the pos Get off , So again find To get real pos
# Custom single character array sort way can achieve arbitrary rules of the string array sort
node.childs = sorted(node.childs, key=lambda child: child.c)
pos = self.find(node, c)
node = node.childs[pos]
node.word = word
def preOrder(self, node):
''' First order output '''
results = []
if node.word:
results.append(node.word)
for child in node.childs:
results.extend(self.preOrder(child))
return results
def find(self, node, c):
''' Find the position where the character is inserted '''
childs = node.childs
_len = len(childs)
if _len == 0:
return -1
for i in range(_len):
if childs[i].c == c:
return i
return -1
def setWords(self, words):
for word in words:
self.add(word)
Trie contains 1 member variable and 4 methods. Root is used to refer to a root node. It does not store specific data, but it has child nodes. The setWords method is used for initialization, and the add method is called to initialize the Trie tree on a per-string basis. The add method adds each character to the child node, shares it if it exists and looks for the next child node, and so on. Find is used to find if a child node has been established to store a character, while preOrder is used to get the stored word in order first. There are three ways to traverse a tree: in order, in order, and in order, if you don't understand, you can Google to understand. Let's test it out:
#!/usr/bin/env python
#coding: utf8
'''Trie Tree implements string array dictionary sort '''
class Trie(object):
def __init__(self):
self.root = Node() # Trie The tree root Node reference
def add(self, word):
''' Add string '''
node = self.root
for c in word:
pos = self.find(node, c)
if pos < 0:
node.childs.append(Node(c))
# For the sake of simplicity, use it directly here Python The built-in sorted To sort
#pos Question, because sort After the pos Get off , So again find To get real pos
# Custom single character array sort way can achieve arbitrary rules of the string array sort
node.childs = sorted(node.childs, key=lambda child: child.c)
pos = self.find(node, c)
node = node.childs[pos]
node.word = word
def preOrder(self, node):
''' First order output '''
results = []
if node.word:
results.append(node.word)
for child in node.childs:
results.extend(self.preOrder(child))
return results
def find(self, node, c):
''' Find the position where the character is inserted '''
childs = node.childs
_len = len(childs)
if _len == 0:
return -1
for i in range(_len):
if childs[i].c == c:
return i
return -1
def setWords(self, words):
for word in words:
self.add(word)
class Node(object):
def __init__(self, c=None, word=None):
self.c = c # A single character stored by a node
self.word = word # The node stores the words
self.childs = [] # Child of this node
if __name__ == '__main__':
words = ['python', 'function', 'php', 'food', 'kiss', 'perl', 'goal', 'go', 'golang', 'easy']
trie = Trie()
trie.setWords(words)
result = trie.preOrder(trie.root)
print ' Raw string array : %s' % words
print 'Trie Tree sorted : %s' % result
words.sort()
print 'Python the sort After ordering : %s' % words
conclusion
There are so many kinds of trees. In the implementation of tree structure, tree traversal is difficult and requires more practice. The above code was written in a hurry, without any optimization, but on this basis you can achieve any kind of string sorting, string search, and so on.