For Classification Trees

There are many criteria by which node impurity can be measured in a classification problem; four commonly used metrics are described below, with a short computational sketch after the list:

Misclassification error. The misclassification error is simply the proportion of observations in the node that are not members of the majority class in that node.

Gini index. Suppose there are a total of K classes, each indexed by k. Let $p_{mk}$ be the proportion of class $k$ observations in node $m$. The Gini index can then be written as $\sum_{k=1}^{K} p_{mk}(1 - p_{mk})$. This measure is frequently used in practice and is more sensitive than the misclassification error to changes in node probability.

Entropy index. Also called the cross-entropy or deviance measure of impurity, the entropy index can be written as $-\sum_{k=1}^{K} p_{mk}\log p_{mk}$. This too is more sensitive than the misclassification error to changes in node probability.

Twoing. Designed for multiclass problems, this approach favors separation between classes rather than node heterogeneity: at each split the classes are grouped into two superclasses, and the split is then evaluated as a two-class problem. Splits that keep related classes together are favored. The approach offers the advantage of revealing similarities between classes and can be applied to ordered classes as well.
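
As a concrete illustration (not part of the original text), the sketch below computes the three formula-based impurity measures for a single node from its class proportions, plus the twoing criterion for a candidate split. The twoing formula follows the form given in Breiman et al.'s CART, which the article does not spell out, and the function and variable names (`node_impurities`, `twoing`, `labels`) are illustrative.

```python
# Minimal sketch (assumed implementation, not from the article) of the
# impurity measures described above. Names are illustrative.
from collections import Counter
from math import log


def node_impurities(labels):
    """Misclassification error, Gini index, and entropy for one node m."""
    n = len(labels)
    p = [c / n for c in Counter(labels).values()]     # p_mk for each class k

    misclassification = 1 - max(p)                     # 1 - max_k p_mk
    gini = sum(p_k * (1 - p_k) for p_k in p)           # sum_k p_mk (1 - p_mk)
    entropy = -sum(p_k * log(p_k) for p_k in p)        # -sum_k p_mk log p_mk
    return misclassification, gini, entropy


def twoing(left_labels, right_labels):
    """Twoing criterion for a candidate binary split (CART form, assumed):
    (p_L * p_R / 4) * (sum_k |p(k|L) - p(k|R)|)**2; larger is better."""
    n_left, n_right = len(left_labels), len(right_labels)
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n
    classes = set(left_labels) | set(right_labels)
    cl, cr = Counter(left_labels), Counter(right_labels)
    diff = sum(abs(cl[k] / n_left - cr[k] / n_right) for k in classes)
    return (p_left * p_right / 4) * diff ** 2


# Example node with 6 observations of class "a", 3 of "b", and 1 of "c".
node = ["a"] * 6 + ["b"] * 3 + ["c"]
print(node_impurities(node))                # (0.4, 0.54, ~0.90)

# A candidate split of that node: twoing rewards splits that separate the
# classes into two clean groups.
print(twoing(["a"] * 6, ["b"] * 3 + ["c"]))  # 0.24
```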
