In this post we will review the frequest pattern growth algorithm (aka FP Growth).
The goal of the algorithm is to find frequent sets in a large database. This goal is similar to the Apriori algorithm, but it requires only a single scan of the database. To achieve it goal, FP Growth builds a FP tree that is used to find the relations.
Let's assume that we have a database of transactions, each containing list of items.
First we count occurrences for each item, and notice the order of items by count.
The transactions are treated as if items were sorted by the occurrences. For simplicity, let update the transactions:
Now we construct the FP tree, keeping counters on each node. Notice that the items are added according to the ordered transaction.
After the 1st transaction:
After the 5th transaction:
After the 6th transaction:
After the 7th transaction:
After the 8th transaction:
Next, for each item, we mark what are the paths that lead to the item, and keep the item node score.
For example, item A can be reached by :
- C, where the A node has count of 1
- D,E, where the A node has count of 1
- D,C,E, where the A node has count of 1
Now let us use a minimum support of 2 occurrences, which configures of common patterns we are searching.
For each item, we find sets of items which are common in the conditional paths, and have count at least as the minimum support.
Finally, we create frequent pattern rules where the conditional FP tree is used with the item:
No comments:
Post a Comment