For Regression Trees

For regression trees, two common impurity measures are:

Least squares. This method is analogous to least-squares fitting in a linear model: splits are chosen to minimize the sum of squared errors between each observation and the mean of its node.
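The least-squares criterion can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure of any particular tree package: the function names (sse, best_least_squares_split), the midpoint rule for candidate thresholds, and the elevation and response values are all assumptions made up for the example.

```python
def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_least_squares_split(x, y):
    """Return (threshold, cost) for the split 'x < threshold' that
    minimizes the summed SSE of the two resulting child nodes."""
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        # Assumption: candidate thresholds are midpoints between neighbours.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


# Hypothetical data: elevation (m) and a continuous response.
elev = [2100, 2300, 2400, 2600, 2900, 3100]
resp = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
threshold, cost = best_least_squares_split(elev, resp)
print(threshold, round(cost, 4))
```

With these made-up values the chosen threshold falls between 2400 m and 2600 m, where the response jumps, because that split leaves the smallest squared error around the two node means.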

[Figure 1: tree diagram. Root node: N = 1544, present 21%, absent 79%. Splits shown include "Is ELEV < 2954 m?" and "Is ELEV < 2444 m?", with terminal nodes labeled "Classify as Present" or "Classify as Absent".]

Figure 1 A simple example of a classification tree describing the relationship between presence/absence of P. menziesii and explanatory factors of elevation (ELEV) and aspect (ASP) in the mountains of northern Utah. Thin-lined boxes indicate a node from which a split emerges. Thick-lined boxes indicate a terminal node.

Least absolute deviations. This method minimizes the mean absolute deviation from the median within a node. Its advantage over least squares is that it is less sensitive to outliers and therefore provides a more robust model. Its disadvantage is insensitivity when dealing with data sets containing a large proportion of zeros.
