FP Growth algorithm Java implementation + concrete implementation ideas + code
- 2021-10-11 18:22:15
- OfStack
Principle of FP-Growth algorithm
Explanations from other bosses
Detailed Explanation of FP-Growth Algorithm
Java Implementation of FP-Growth Algorithm
This article focuses on the realization of 1. If you look at the explanation given above, you can see that you need two scans to build the FP tree
First scan
The first scan filters out all items that do not meet the minimum support; Items that meet the minimum support are sorted in descending order of global support.
With this requirement, the possible difficulty is how to sort item in each transaction according to global support. My realization ideas
Scan the original data set and save it in the 2D list sourceData Maintain 1 Table to save the global support TotalSup of each item Items below threshold minSup are filtered out in Table Convert Table to List and sort it in descending order of TotalSup Create a new 2D list freqSource that retains the frequent entries in sourceData and sorts each transaction in descending order of global supportCode
/**
* Scan the original data set , Generate transaction set
* @param path Dataset path
* @throws IOException
*/
private void scanDataSet(String path) throws IOException {
if(path.equals("")){
path = filePath;
}
FileReader fr = new FileReader(path);
BufferedReader bufferedReader = new BufferedReader(fr);
String str;
// int maxLength = 0;
while ( (str = bufferedReader.readLine())!=null){
ArrayList<Integer> transaction = new ArrayList<>();
String[] tempEntry ;
tempEntry = str.split(" ");
for(int i =0;i< tempEntry.length;i++){
if(!tempEntry[i].equals("")){
int itemValue = Integer.parseInt(tempEntry[i]);
transaction.add(itemValue);
if(!similarSingleItemLinkedListHeadsTable.containsKey(itemValue)){
similarSingleItemLinkedListHeadsTable.put(itemValue, new SimilarSingleItemLinkedListHead(itemValue,null,1));
}else{
// Set the global support of the item +1
similarSingleItemLinkedListHeadsTable.get(itemValue).addSupTotal();
}
}
}
// if(tempEntry.length>maxLength){
// maxLength = tempEntry.length;
// }
sourceDataSet.add(transaction);
}
// System.out.println(maxLength);
deleteNonFreqInSSILLHTAndSort();
deleteNonFreqInSDSAndSort();
bufferedReader.close();
fr.close();
}
/**
* Remove the list of similar items (similarSingleItemLinkedListHeadsTable) Infrequent items of , And according to global support for similarSingleItemLinkedListHeads Descending sort
*/
private void deleteNonFreqInSSILLHTAndSort() {
Hashtable<Integer,SimilarSingleItemLinkedListHead> copyOfSSILLHT =
(Hashtable<Integer, SimilarSingleItemLinkedListHead>) similarSingleItemLinkedListHeadsTable.clone();
Set<Integer> keySet = copyOfSSILLHT.keySet();
// Delete infrequent items
for(int key: keySet){
if(similarSingleItemLinkedListHeadsTable.get(key).getSupTotal()<minSupCnt){// Below the support threshold
similarSingleItemLinkedListHeadsTable.remove(key);
}
}
// Sort by Global Support
similarSingleItemLinkedListHeadList = new ArrayList<>(similarSingleItemLinkedListHeadsTable.values());
similarSingleItemLinkedListHeadList.sort(new Comparator<SimilarSingleItemLinkedListHead>() {
@Override
public int compare(SimilarSingleItemLinkedListHead o1, SimilarSingleItemLinkedListHead o2) {
return o2.getSupTotal() - o1.getSupTotal();
}
});
}
/**
* Remove transaction set (sourceDataSet) Infrequent items of , And for each transaction according to global support item Sort in descending order
* The results are stored in the freqSourceSortedDataSet
*/
private void deleteNonFreqInSDSAndSort(){
freqSourceSortedDataSet = (ArrayList<ArrayList<Integer>>) sourceDataSet.clone();
for(int i =0;i<sourceDataSet.size();i++){
for(int j = 0;j<sourceDataSet.get(i).size();j++){
int item = sourceDataSet.get(i).get(j);
// Because at this time SSILLHT Items in are frequent items , Just make sure item Whether it exists in it or not , Existence means frequency .
if(visitSupTotal(item)==-1){
// Mark infrequent items as the smallest integer value
freqSourceSortedDataSet.get(i).set(j,Integer.MIN_VALUE);
}
}
// Remove marked items .
freqSourceSortedDataSet.get(i).removeIf(e->e == Integer.MIN_VALUE);
insertSort(freqSourceSortedDataSet.get(i));
}
freqSourceSortedDataSet.removeIf(e->e.size() == 0);
}
Second scan
In the second scan, the FP tree is constructed. Participating in scanning is filtered data. If a data item is encountered for the first time, create the node and add a pointer to the node in headTable; Otherwise, find the node corresponding to the item according to the path, and modify the node information
It is relatively simple here, because there is already filtered and sorted data freqSourceSortedDataSet. We just need to
Traverse every transaction trans of freqSourceSortedDataSet and every item in trans to build FP tree and similar necklace list If an item is encountered for the first time, you need to create the node and link it in a similar necklace list. Needless to say, the linked list. The number of children of FP tree is indefinite, so special data structure is needed. I use HashTable here
/**
* Build FP Trees
*/
private void buildFPTree(){
for(ArrayList<Integer>trans:freqSourceSortedDataSet){
Node curTreeNode = fpTree.root;
for(int item :trans){
if(!curTreeNode.children.containsKey(item)){
Node node = new Node(item,1);
curTreeNode.children.put(item,node);
node.father = curTreeNode;
buildSimilarSingleItemLinkedList(item,curTreeNode);
}else{
curTreeNode.children.get(item).sup++;
}
curTreeNode=curTreeNode.children.get(item);
}
}
}
/**
* Build a list of similar necklaces
*/
private void buildSimilarSingleItemLinkedList(int item,Node curTreeNode){
// Find the item Position in similar necklace list
int index = searchForItemInHeadsList(item,
(ArrayList<SimilarSingleItemLinkedListHead>) similarSingleItemLinkedListHeadList);
if(similarSingleItemLinkedListHeadList.get(index).next == null){
similarSingleItemLinkedListHeadList.get(index).next = curTreeNode.children.get(item);
}else{
Node visitNode = similarSingleItemLinkedListHeadList.get(index).next;
while (visitNode.nextSimilar!=null){
visitNode = visitNode.nextSimilar;
}
if(visitNode != curTreeNode.children.get(item))
visitNode.nextSimilar = curTreeNode.children.get(item);
}
}
/**
* In HeadList Search for an item in the
* @param item Items
* @param similarSingleItemLinkedListHeads Head node linked list
* @return Position ,-1 Indicates that it was not found
*/
private int searchForItemInHeadsList(int item, ArrayList<SimilarSingleItemLinkedListHead> similarSingleItemLinkedListHeads) {
for(int i =0;i<similarSingleItemLinkedListHeads.size();i++){
if(similarSingleItemLinkedListHeads.get(i).getItemValue() == item){
return i;
}
}
return -1;
}
Mining frequent itemsets
I think this part is the most difficult part to realize. But I speak very quickly when it comes to this place at B or somewhere else (B doesn't have two videos about this thing, vomit). There are different concepts, such as prefix paths in the Black Book, and conditional schema bases in other places. The following code is implemented according to the prefix path.
Let's stroke our thinking and what we need to do to dig out frequent item sets.
First, you need to traverse back-to-front every 1 item of the list of similar necklaces (this 1 list has been sorted by global support in the first scan). Recursively perform the following steps for each item:① Record the prefix path. The method I use is to record all nodes appearing in the prefix path with one HashSet.
The support degree of every 1item of the FP tree is recorded. Similar to the previous first scan.
③ According to the recorded support degree, if item is frequent, the item and the current suffix are frequent itemsets.
Then, according to record, the list of similar necklaces of the FP tree is constructed, and the infrequent items (similar to the first scan) and the current item constitute the conditional FP tree are removed. There is no need to re-establish the structure of an FP tree to form a conditional FP tree, because recording prefix paths only needs to access similar entries and parent entries.
The remaining items in the list of similar necklaces are further executed with step ① until there are no items in the list of similar necklaces, which is terminated.
/**
* Algorithm execution function
* @param minSupCnt Minimum support count
* @param path File path
* @param pT Threshold of itemset size for output results
*/
public void run(int minSupCnt,String path,int pT) throws IOException {
this.printThreshold = pT;
this.minSupCnt = minSupCnt;
scanDataSet(path);
buildFPTree();
for(int i = similarSingleItemLinkedListHeadList.size()-1;i>=0;i--){
genFreqItemSet(similarSingleItemLinkedListHeadList.get(i).getItemValue()
,fpTree,similarSingleItemLinkedListHeadList,new TreeSet<>());
}
//genFreqItemSet(14,fpTree,similarSingleItemLinkedListHeadList,new TreeSet<>());
System.out.println(" Number of frequent itemsets :\t"+cntOfFreqSet);
}
/**
* Generate frequent itemsets
* @param last Last item
* @param fPTree Condition FP Trees
* @param fatherSimilarSingleItemLinkedListHeads Linked list of similar header nodes of parent tree
* @param freqItemSet Frequent itemset
*/
private void genFreqItemSet(int last,FPTree fPTree,
List<SimilarSingleItemLinkedListHead>fatherSimilarSingleItemLinkedListHeads,TreeSet<Integer>freqItemSet) {
FPTree conditionalFPTree = new FPTree();
List<SimilarSingleItemLinkedListHead>similarSingleItemLinkedListHeads = new ArrayList<>();
TreeSet<Integer>localFreqItemSet = (TreeSet<Integer>) freqItemSet.clone();
int index ;
index = searchForItemInHeadsList(last,
(ArrayList<SimilarSingleItemLinkedListHead>) fatherSimilarSingleItemLinkedListHeads);
Node firstNode = fatherSimilarSingleItemLinkedListHeads.get(index).next;
HashSet<Node>record = new HashSet<>(); // Used to record the nodes appearing on the prefix path
// Record prefix path
if(firstNode!=null){
record.add(firstNode);
Node nodeToVisitFather = firstNode;
Node nodeToVisitSimilar = firstNode;
while (nodeToVisitSimilar!=null){
nodeToVisitSimilar.supInCFP = nodeToVisitSimilar.sup;
nodeToVisitFather = nodeToVisitSimilar;
while (nodeToVisitFather!=null){
// Calculation supInCFT
if(nodeToVisitFather!=nodeToVisitSimilar)
nodeToVisitFather.supInCFP += nodeToVisitSimilar.supInCFP;
record.add(nodeToVisitFather);
nodeToVisitFather = nodeToVisitFather.father;
}
nodeToVisitSimilar = nodeToVisitSimilar.nextSimilar;
}
// Record support in subtree
Hashtable<Integer,Integer> supRecord = new Hashtable<>();
record.forEach(new Consumer<Node>() {
@Override
public void accept(Node node) {
int item = node.item;
if(item == -1 ){ // Root node
return;
}
if(supRecord.containsKey(item)){
supRecord.put(item,supRecord.get(item)+ node.supInCFP);
}else{
supRecord.put(item,node.supInCFP);
}
}
});
// Output result
if(supRecord.get(last)>=minSupCnt){
localFreqItemSet.add(last);
if(localFreqItemSet.size()>=printThreshold && !result.contains(localFreqItemSet)){
cntOfFreqSet++;
// for(int i = localFreqItemSet.size()-1;i>=0;i--){
// System.out.print(localFreqItemSet.get(i)+" ");
// }
localFreqItemSet.forEach(new Consumer<Integer>() {
@Override
public void accept(Integer integer) {
System.out.print(integer+" ");
}
});
result.add(localFreqItemSet);
System.out.println("");
}
}
// Build a list of similar necklaces
record.forEach(new Consumer<Node>() {
@Override
public void accept(Node node) {
if(node.item == -1){ // Root node
Node visitNode = node;
buildConditionalFPTree(conditionalFPTree.root, visitNode,record,
(ArrayList<SimilarSingleItemLinkedListHead>) similarSingleItemLinkedListHeads,supRecord,last);
}
}
});
// Sort in descending order of support
similarSingleItemLinkedListHeads.sort(new Comparator<SimilarSingleItemLinkedListHead>() {
@Override
public int compare(SimilarSingleItemLinkedListHead o1, SimilarSingleItemLinkedListHead o2) {
return o2.getSupTotal() - o1.getSupTotal();
}
});
if(similarSingleItemLinkedListHeads.size()>=1){
// Recursive search for frequent items
for(int i =similarSingleItemLinkedListHeads.size()-1;i>=0;i--){
genFreqItemSet(similarSingleItemLinkedListHeads.get(i).getItemValue(),
conditionalFPTree,similarSingleItemLinkedListHeads,localFreqItemSet);
// similarSingleItemLinkedListHeads.remove(i);
}
}
}
}
/**
* Recursive build condition FP Trees
* @param rootNode Establish a condition down with this node as the root FP Trees
* @param originalNode rootNode Corresponding to nodes in the original tree
* @param record Prefix path
* @param similarSingleItemLinkedListHeads Similar item header linked list
* @param supRecord Record of support count
* @param last Last item
*/
private void buildConditionalFPTree(Node rootNode,Node originalNode,HashSet<Node>record
,ArrayList<SimilarSingleItemLinkedListHead>similarSingleItemLinkedListHeads,Hashtable<Integer,Integer>supRecord,int last){
if(originalNode.children!=null){
for(int key:originalNode.children.keySet()){ // Traversal originalNode All son nodes of , Check whether it is in the prefix path
Node tempNode = originalNode.children.get(key);
if(record.contains(tempNode)){
Node addedNode = new Node(tempNode.item, tempNode.supInCFP);
if(last == key){ // Removal last All nodes of the
tempNode.supInCFP = 0;
continue;
}
if(supRecord.get(key)>=minSupCnt){
//addedNode Copy tempNode Attributes other than son nodes
addedNode.supInCFP = tempNode.supInCFP;
rootNode.children.put(tempNode.item, addedNode);
addedNode.father = rootNode;
// Build a table of similar items
int i = searchForItemInHeadsList(tempNode.item,similarSingleItemLinkedListHeads);
if(i==-1){
similarSingleItemLinkedListHeads.add(new SimilarSingleItemLinkedListHead(key,addedNode, addedNode.supInCFP));
}else{
similarSingleItemLinkedListHeads.get(i).setSupTotal(similarSingleItemLinkedListHeads.get(i).getSupTotal()+addedNode.supInCFP);
Node visitNode = similarSingleItemLinkedListHeads.get(i).next;
while (visitNode.nextSimilar!=null){
visitNode = visitNode.nextSimilar;
}
if(visitNode!=addedNode){
visitNode.nextSimilar= addedNode;
}
}
buildConditionalFPTree(addedNode,originalNode.children.get(key),record,similarSingleItemLinkedListHeads,supRecord,last);
addedNode.supInCFP = 0; // Will supInCFP Reset to 0;
}else{
buildConditionalFPTree(rootNode,originalNode.children.get(key),record,similarSingleItemLinkedListHeads,supRecord,last);
}
}
}
}
}
Summarize
That's all for this article. I hope I can help you and I hope you can pay more attention to other wonderful contents of this site!