For Classification Trees

There are many criteria by which node impurity can be measured in a classification problem, but four commonly used metrics are described below (and illustrated in the short code sketch that follows the list):

Misclassification error. The misclassification error is simply the proportion of observations in the node that do not belong to the node's majority class.

Gini index. Suppose there are a total of K classes, each indexed by k. Let $p_{mk}$ be the proportion of class k observations in node m. The Gini index can then be written as $\sum_{k=1}^{K} p_{mk}(1 - p_{mk})$. This measure is frequently used in practice, and is more sensitive than the misclassification error to changes in node probability.

Entropy index. Also called the cross-entropy or deviance measure of impurity, the entropy index can be written as $-\sum_{k=1}^{K} p_{mk} \log p_{mk}$. This too is more sensitive than misclassification error to changes in node probability.

Twoing. Designed for multiclass problems, this approach favors splits that separate the classes from one another rather than splits that merely reduce node heterogeneity. Every multiclass split is treated as a binary problem, and splits that keep related classes together are favored. The approach offers the advantage of revealing similarities between classes and can be applied to ordered classes as well.
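As a rough numerical illustration of these criteria, the sketch below computes the misclassification error, Gini index, and entropy for a hypothetical node's class proportions, and evaluates a twoing score for a candidate split. The function names and example proportions are invented for illustration, and the twoing formula follows the usual CART formulation, which the text above does not spell out; treat the block as a sketch rather than a definitive implementation.

```python
import numpy as np

def misclassification_error(p):
    """Proportion of observations not in the node's majority class: 1 - max_k p_mk."""
    p = np.asarray(p, dtype=float)
    return 1.0 - p.max()

def gini_index(p):
    """Gini index: sum_k p_mk * (1 - p_mk)."""
    p = np.asarray(p, dtype=float)
    return np.sum(p * (1.0 - p))

def entropy_index(p):
    """Cross-entropy (deviance): -sum_k p_mk * log(p_mk), treating 0*log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))

def twoing_criterion(p_left, p_right, n_left, n_right):
    """Twoing score for a candidate binary split (standard CART form, an assumption here):
    (P_L * P_R / 4) * (sum_k |p(k|left) - p(k|right)|)**2.
    Larger values indicate splits that separate the classes more cleanly."""
    p_left = np.asarray(p_left, dtype=float)
    p_right = np.asarray(p_right, dtype=float)
    P_L = n_left / (n_left + n_right)
    P_R = n_right / (n_left + n_right)
    return (P_L * P_R / 4.0) * np.sum(np.abs(p_left - p_right)) ** 2

# Class proportions in a hypothetical node m with K = 3 classes.
p_node = [0.5, 0.3, 0.2]
print(misclassification_error(p_node))  # 0.5
print(gini_index(p_node))               # 0.62
print(entropy_index(p_node))            # ~1.03 (natural log)

# Twoing for a hypothetical split sending 60 observations left and 40 right.
p_left, p_right = [0.8, 0.1, 0.1], [0.05, 0.6, 0.35]
print(twoing_criterion(p_left, p_right, n_left=60, n_right=40))  # 0.135
```

Note how the example node reflects the sensitivity remark above: both the Gini index and the entropy change if the minority proportions shift (say, from 0.3/0.2 to 0.25/0.25), while the misclassification error stays at 0.5 as long as the majority class keeps half of the observations.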
