CART data mining algorithm in plain English

The CART data mining algorithm is part of a longer article about many more data mining algorithms.

What does it do?

CART stands for classification and regression trees. It is a decision tree learning technique that outputs either classification or regression trees. Like C4.5, CART can act as a classifier.

Is a classification tree like a decision tree?

A classification tree is a type of decision tree. The output of a classification tree is a class.

For example, given a patient dataset, you might attempt to predict whether the patient will get cancer. The class would either be "will get cancer" or "won't get cancer."

What's a regression tree?

Unlike a classification tree, which predicts a class, a regression tree predicts a numeric or continuous value, e.g. a patient's length of stay or the price of a smartphone.

Here's an easy way to remember...

Classification trees output classes, regression trees output numbers.
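
To make that concrete, here's a minimal sketch of what a single leaf does in each kind of tree. It assumes the usual convention that a classification leaf predicts the most common class among its training rows and a regression leaf predicts the mean of its training targets. The function names and the tiny patient data are made up for illustration.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common class among the training labels in a leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the mean of the training targets in a leaf."""
    return mean(values)


# Hypothetical training rows that ended up in one leaf of each kind of tree.
cancer_labels = ["won't get cancer", "will get cancer", "won't get cancer"]
stay_days = [3.0, 5.0, 4.0]

print(classification_leaf(cancer_labels))  # a class
print(regression_leaf(stay_days))          # a number
```

The first call returns a class label and the second returns a number, which is the whole difference between the two tree types at prediction time.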

Since we've already covered how decision trees are used to classify data, let's jump right into things...

How does this compare with C4.5?

C4.5	CART
Uses information gain (more precisely the gain ratio, a normalized form of information gain) to segment data during decision tree generation.	Uses Gini impurity (not to be confused with the Gini coefficient) for classification trees. A good discussion of the differences between the impurity and the coefficient is available on Stack Overflow.
Uses a single-pass pruning process to mitigate over-fitting.	Uses the cost-complexity method of pruning. Starting at the bottom of the tree, CART evaluates the misclassification cost with the node vs. without the node. If the improvement from keeping the node doesn't meet a threshold, the node is pruned away.
The decision nodes can have 2 or more branches.	The decision nodes have exactly 2 branches.
Probabilistically distributes missing values to children.	Uses surrogate splits to distribute the missing values to children.
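
Two of the CART rows above are easy to sketch in code: Gini impurity and the "exactly 2 branches" rule. The snippet below is a simplified illustration, not the original CART code. It assumes a single numeric feature and tries a threshold halfway between each pair of neighboring values; `gini`, `best_binary_split` and the age data are hypothetical names and values chosen for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best two-way split on one feature.

    Returns (None, parent_gini) if no candidate threshold lowers the impurity.
    """
    n = len(values)
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each child's impurity by its share of the rows.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical patient ages and labels.
ages = [25, 32, 47, 51, 62, 70]
labels = ["no", "no", "no", "yes", "yes", "yes"]

print(gini(labels))                     # 0.5
print(best_binary_split(ages, labels))  # (49.0, 0.0)
```

Here the unsplit node is as mixed as a two-class node can be (impurity 0.5), and splitting at age 49 produces two pure children, so that threshold wins.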

Is this supervised or unsupervised?

CART is a supervised learning technique, since it is provided a labeled training dataset in order to construct the classification or regression tree model.

Why use CART?

Many of the reasons you'd use C4.5 also apply to CART, since they are both decision tree learning techniques. Ease of interpretation and explanation, for example, carries over to CART.

Like C4.5, CART is also quite fast and quite popular, and its output is human readable.

Where is it used?

scikit-learn implements an optimized version of CART in its decision tree classifier. R's tree package has an implementation of CART. Weka and MATLAB also have implementations.

Finally, Salford Systems has the only implementation of the original proprietary CART code based on the theory introduced by world-renowned statisticians at Stanford University and the University of California at Berkeley.
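
If you want to try the scikit-learn implementation mentioned above, here's a short sketch. It assumes scikit-learn is installed and uses the bundled iris dataset purely as a stand-in. The `ccp_alpha` value of 0.02 is an arbitrary choice to show cost-complexity pruning, not a recommended setting.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Classification tree: Gini impurity is scikit-learn's default criterion.
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Same tree with minimal cost-complexity pruning switched on.
# 0.02 is an arbitrary illustrative value.
pruned = DecisionTreeClassifier(
    criterion="gini", ccp_alpha=0.02, random_state=0
).fit(X, y)

# Regression tree: predict petal width (a number) from the other three columns.
reg = DecisionTreeRegressor(random_state=0).fit(X[:, :3], X[:, 3])

print(clf.get_n_leaves(), pruned.get_n_leaves())  # pruning never adds leaves
print(clf.predict(X[:1]))                         # a class index
print(reg.predict(X[:1, :3]))                     # a numeric value
```

The classifier returns a class and the regressor returns a number, and the pruned tree ends up with no more leaves than the unpruned one.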

Check out how I used CART

Or see more data mining algorithms on the main list...

About the Author

Ray Li

Ray is a software engineer and data enthusiast who has been blogging for over a decade. He loves to learn, teach and grow. You’ll usually find him wrangling data, programming and lifehacking.