For Classification Trees

Many criteria can be used to measure node impurity in a classification problem, with splits chosen to reduce it. Four commonly used ones are:

Misclassification error. The misclassification error is simply the proportion of observations in the node that are not members of the majority class in that node.

Gini index. Suppose there are a total of $K$ classes, each indexed by $k$. Let $p_{mk}$ be the proportion of class $k$ observations in node $m$. The Gini index can then be written as $\sum_{k=1}^{K} p_{mk}(1 - p_{mk})$. This measure is frequently used in practice, and is more sensitive than the misclassification error to changes in the node class probabilities.

Entropy index. Also called the cross-entropy or deviance measure of impurity, the entropy index can be written $-\sum_{k=1}^{K} p_{mk} \log p_{mk}$. This too is more sensitive than misclassification error to changes in the node class probabilities.
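The three impurity measures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the `weighted_impurity` helper are hypothetical, the natural logarithm is assumed for the entropy, and the example counts (two classes of 400 observations each, split two different ways) are illustrative numbers chosen to show the difference in sensitivity.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Proportion of each class present in a (non-empty) node."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def misclassification_error(labels):
    # Share of observations not in the node's majority class.
    return 1.0 - max(class_proportions(labels))


def gini_index(labels):
    # sum_k p_mk * (1 - p_mk)
    return sum(p * (1.0 - p) for p in class_proportions(labels))


def entropy_index(labels):
    # -sum_k p_mk * log(p_mk); only classes present in the node contribute,
    # so log(0) never occurs. Natural log is an assumption of this sketch.
    return -sum(p * math.log(p) for p in class_proportions(labels))


def weighted_impurity(children, impurity):
    """Hypothetical helper: size-weighted average impurity of child nodes."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * impurity(child) for child in children)


# Illustrative data: 400 observations of class "a" and 400 of class "b".
# Split A yields two mixed nodes; split B yields one mixed and one pure node.
split_a = [["a"] * 300 + ["b"] * 100, ["a"] * 100 + ["b"] * 300]
split_b = [["a"] * 200 + ["b"] * 400, ["a"] * 200]

for name, measure in [
    ("misclassification", misclassification_error),
    ("gini", gini_index),
    ("entropy", entropy_index),
]:
    score_a = weighted_impurity(split_a, measure)
    score_b = weighted_impurity(split_b, measure)
    print(f"{name}: split A = {score_a:.4f}, split B = {score_b:.4f}")
```

Under these assumptions, both splits have the same weighted misclassification error, while the Gini and entropy indices are lower for split B, which produces a pure node. This is the sense in which the latter two are more sensitive to changes in the node class probabilities.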

Twoing. Designed for multiclass problems, this approach favors separation between classes rather than node heterogeneity. Every multiclass split is treated as a binary problem. Splits that keep related classes together are favored. The approach offers the advantage of revealing similarities between classes and can be applied to ordered classes as well.
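The text does not give a formula for twoing. The sketch below assumes the form in which the criterion is commonly stated for a binary split, $\frac{P_L P_R}{4}\left(\sum_k |p_{Lk} - p_{Rk}|\right)^2$, where $P_L$ and $P_R$ are the fractions of observations sent to the left and right child and $p_{Lk}$, $p_{Rk}$ are the class $k$ proportions in each child. Larger values indicate better separation. The function name `twoing_value` and the example labels are hypothetical.

```python
from collections import Counter


def twoing_value(left, right):
    """Twoing score of a binary split (assumed form; higher is better).

    `left` and `right` are non-empty lists of class labels in each child.
    """
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    p_left, p_right = n_left / n, n_right / n

    left_counts, right_counts = Counter(left), Counter(right)
    classes = set(left_counts) | set(right_counts)

    # Sum over classes of the absolute difference in class proportions
    # between the two children (Counter returns 0 for absent classes).
    spread = sum(
        abs(left_counts[k] / n_left - right_counts[k] / n_right)
        for k in classes
    )
    return p_left * p_right / 4.0 * spread ** 2


# Four classes. The first split sends {a, b} left and {c, d} right;
# the second mixes every class evenly across both children.
separated = twoing_value(["a", "a", "b", "b"], ["c", "c", "d", "d"])
mixed = twoing_value(["a", "b", "c", "d"], ["a", "b", "c", "d"])
partial = twoing_value(["a", "a", "b", "c"], ["b", "c", "d", "d"])
print(separated, partial, mixed)
```

In this illustration the split that groups the classes into two disjoint sets scores highest and the evenly mixed split scores zero, which matches the description of twoing as favoring separation between groups of classes.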
