Today, I'm going to explain in plain English the top 10 most influential data mining algorithms as voted on by 3 separate panels in this survey paper.
Once you know what they are, how they work, what they do and where you can find them, I hope you'll use this blog post as a springboard to learn even more about data mining.
What are we waiting for? Let's get started!
C4.5 constructs a classifier in the form of a decision tree. To do this, C4.5 is given a set of data representing things that are already classified.
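At the heart of C4.5 is choosing which attribute to split on at each node. C4.5 picks the attribute with the highest gain ratio, which is the information gain divided by the "split information." Here's a minimal Python sketch of just that selection step, assuming categorical attributes. The function names and the toy weather data are made up for illustration, and a full C4.5 also handles continuous attributes, missing values, recursion and pruning.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain from splitting on `attr`, divided by the split information."""
    total = len(rows)
    # Group the class labels by the value each row has for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that shatter the data into many small groups.
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """The attribute to split on next: the one with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical toy data: outlook predicts the label perfectly, windy tells us nothing.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

C4.5 would then split on the chosen attribute and repeat this on each subset until the subsets are pure (or pruning says stop).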
k-means creates k groups from a set of objects so that the members of each group are more similar to one another than to members of other groups. It’s a popular cluster analysis technique for exploring a dataset.
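The standard way to run k-means is Lloyd's algorithm: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing moves. Here's a small Python sketch using Euclidean distance and 2-D points. The random initialization (picking k input points) is one simple choice among several, and the sample points are made up.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # nothing moved, so the assignments won't change either
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Because the result depends on the starting centroids, it's common to run k-means several times with different seeds and keep the tightest clustering.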
Support vector machine (SVM) learns a hyperplane to classify data into 2 classes. At a high level, SVM performs a task similar to C4.5's, except that SVM doesn’t use decision trees at all.
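To make "learns a hyperplane" concrete, here's a minimal linear SVM in Python. It's a sketch that assumes linearly separable-ish data and uses Pegasos-style stochastic subgradient descent on the regularized hinge loss. That's one simple training method, not the SMO solver most SVM libraries use, and it has no kernels. Appending a constant 1.0 feature to act as the bias (so the bias is regularized too) is a simplifying choice of this sketch.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos-style stochastic subgradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Gradient step on the L2 regularizer: shrink the weights toward zero.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                # Hinge loss is active (point inside the margin or misclassified):
                # move the hyperplane so this point lands further on its own side.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict_svm(w, x):
    """Which side of the hyperplane is x on? Returns +1 or -1."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(w)
print(predict_svm(w, (5, 5)), predict_svm(w, (-5, -4)))
```

The learned `w` defines the hyperplane `w · x = 0`; the "support vectors" are the training points that keep triggering the `margin < 1` update because they sit closest to it.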
The Apriori algorithm learns association rules and is applied to a database containing a large number of transactions.
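Apriori works in two phases: find the frequent itemsets level by level (pairs are built only from frequent singles, triples only from frequent pairs, and so on), then turn those itemsets into rules. Here's a compact Python sketch of both. It assumes `min_support` is an absolute transaction count and measures rule strength by confidence; the function names and the shopping-basket data are illustrative.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset appearing in at least `min_support` transactions, with its count."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}  # size-1 candidates
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (the Apriori property): every (k-1)-subset of a frequent itemset
        # must itself be frequent, so drop candidates that violate this.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs from frequent itemsets, keeping those with enough confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]  # P(rhs | lhs)
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(transactions, min_support=3)
print(association_rules(frequent, min_confidence=1.0))  # [({'beer'}, {'diapers'}, 1.0)]
```

The prune step is what keeps Apriori practical on large transaction databases: most candidate itemsets are discarded before their support is ever counted.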
In data mining, expectation-maximization (EM) is generally used as a clustering algorithm (like k-means) for knowledge discovery.
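A common way to see EM used for clustering is fitting a Gaussian mixture model. The E-step computes how responsible each cluster is for each point; the M-step re-estimates each cluster's mean, variance and weight from those soft assignments. Here's a minimal 1-D, two-component sketch in Python. The initial guesses, the small variance floor and the sample data are assumptions chosen for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given initial guesses."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: each component's responsibility for each point (posterior probability).
        resp = []
        for x in xs:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from the responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # keep a component from collapsing onto one point
    return means, variances, weights

xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
means, variances, weights = em_gmm_1d(xs, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
print(means, variances, weights)

# Used as a clustering algorithm: assign each point to its most responsible component.
def assign(x):
    scores = [w * gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    return scores.index(max(scores))

print([assign(x) for x in xs])
```

Unlike k-means, EM gives soft assignments: each point gets a probability of belonging to each cluster, and the hard labels above are just the most likely ones.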
PageRank is a link analysis algorithm designed to determine the relative importance of an object within a network of linked objects.
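PageRank can be computed by power iteration: every node repeatedly passes its current score along its outgoing links, with a damping factor modelling a "random surfer" who sometimes jumps to a random node instead. Here's a short Python sketch. The damping value of 0.85 is the commonly cited default, and spreading a dangling node's score evenly over all nodes is one common convention; the graph is made up.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank. `links` maps each node to the list of nodes it links to."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Every node gets the "random jump" share; the rest flows along links.
        new_rank = {v: (1 - damping) / n for v in nodes}
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for u in targets:
                    new_rank[u] += share
            else:
                # Dangling node (no outgoing links): spread its score over all nodes.
                for u in nodes:
                    new_rank[u] += damping * rank[v] / n
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Here C comes out on top because three nodes link to it, while D, which nothing links to, only keeps the random-jump share.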
AdaBoost is a boosting algorithm which constructs a classifier. As you probably remember, a classifier takes a bunch of data and attempts to predict or classify which class a new data element belongs to.
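AdaBoost builds its classifier out of many weak learners. Each round it trains one on reweighted data, gives it a vote based on its accuracy, then increases the weight of the points it got wrong so the next learner focuses on them. Here's a minimal Python sketch using 1-D threshold "decision stumps" as the weak learners (a common choice, but an assumption here). The toy labels are chosen so that no single stump can classify them all correctly.

```python
import math

def train_adaboost(X, y, rounds=10):
    """AdaBoost with 1-D threshold stumps. X: list of numbers; y: labels in {-1, +1}."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Find the stump (threshold, polarity) with the lowest weighted error.
        best = None
        for threshold in sorted(set(X)):
            for polarity in (1, -1):
                preds = [polarity if x >= threshold else -polarity for x in X]
                err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, threshold, polarity, preds)
        err, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid dividing by zero or log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get a bigger vote
        ensemble.append((alpha, threshold, polarity))
        # Reweight: misclassified points get heavier, correctly classified ones lighter.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_adaboost(ensemble, x):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]  # no single threshold stump can separate these
ensemble = train_adaboost(X, y, rounds=3)
print([predict_adaboost(ensemble, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```

On this data three rounds are enough: each stump alone gets two points wrong, but their weighted vote classifies every training point correctly.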
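kNN, or k-Nearest Neighbors, is a classification algorithm. However, it differs from the classifiers previously described because it’s a lazy learner: rather than building a model from the training data up front, it simply stores that data and does its work only when asked to classify a new data element.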
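Because kNN is lazy, the whole algorithm fits in a single function that runs at query time: find the k closest training examples and take a majority vote. Here's a Python sketch assuming Euclidean distance and a plain (unweighted) vote; both are common choices rather than the only ones, and the data is illustrative.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_vector, label) pairs. There is no training step:
    all the work happens here, at query time.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # most_common breaks ties in favor of the label encountered first.
    return votes.most_common(1)[0][0]

train = [
    ((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
    ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue"),
]
print(knn_classify(train, (2, 2)))  # red
print(knn_classify(train, (7, 7)))  # blue
```

The trade-off of being lazy is visible here: classifying one point sorts the entire training set, so real implementations use spatial indexes such as k-d trees to find neighbors faster.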
Naive Bayes is not a single algorithm, but a family of classification algorithms that share one common assumption: Every feature of the data being classified is independent of all other features given the class.
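Here's one member of that family, sketched in Python: Naive Bayes for categorical features with Laplace (add-one) smoothing. The independence assumption is what lets us score a class as P(class) times a product of per-feature probabilities, summed in log space to avoid underflow. The function names and the weather data are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] = count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature, for smoothing
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[label][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, row):
    """Return the class with the highest log P(class) + sum of log P(feature | class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, v in enumerate(row):
            # Features are treated as independent given the class, so their
            # log-probabilities simply add. Add-one smoothing keeps unseen values from
            # zeroing out the whole product.
            score += math.log((feature_counts[label][i][v] + 1) / (count + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("rainy", "mild")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))  # no
print(predict_naive_bayes(model, ("rainy", "mild")))  # yes
```

Other members of the family, such as Gaussian Naive Bayes for continuous features, keep the same structure and only change how P(feature | class) is estimated.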
CART stands for classification and regression trees. It is a decision tree learning technique that outputs either classification or regression trees. Like C4.5, CART is a classifier.
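One difference from C4.5 is that CART always makes binary splits, and for classification it picks the split that minimizes Gini impurity. Here's a Python sketch of that split search over numeric features. It's only the split-selection step; a full CART grows the tree recursively and then prunes it, and regression trees use a squared-error criterion instead of Gini. The toy data is illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: chance that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted Gini, feature index, threshold) of the best binary split.

    Rows with row[feature] <= threshold go left; the rest go right.
    """
    n = len(rows)
    best = (float("inf"), None, None)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # everything on one side is not a real split
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[0]:
                best = (score, f, threshold)
    return best

rows = [(2.0, 1.0), (3.0, 5.0), (10.0, 2.0), (11.0, 6.0)]
labels = ["a", "a", "b", "b"]
print(best_split(rows, labels))  # (0.0, 0, 3.0)
```

A weighted Gini of 0.0 means the split produces two pure groups, so feature 0 at threshold 3.0 separates the classes perfectly.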
Now it's your turn...
Now that I've shared my thoughts and research around these data mining algorithms, I want to turn it over to you.
- Are you going to give data mining a try?
- Which data mining algorithms have you heard of but weren't on the list?
- Or maybe you have a question about an algorithm?
Let me know what you think by leaving a comment below right now.