For Regression Trees

For regression trees, two common impurity measures are:

Least squares. This method is similar to minimizing least squares in a linear model: splits are chosen to minimize the sum of squared errors between each observation and the mean of the node to which it belongs.
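The least-squares criterion can be illustrated with a minimal sketch. It assumes a single numeric predictor, considers only splits of the form `x < t`, and takes midpoints between consecutive distinct predictor values as candidate thresholds (a common convention, assumed here). The function names `sse` and `best_split_least_squares` and the example data are hypothetical and not taken from any CART package.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations of values from their mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_least_squares(
    x: List[float], y: List[float]
) -> Tuple[Optional[float], float]:
    """Return (threshold, cost) for the split 'x < threshold' that minimizes the
    summed SSE of the two child nodes. Returns (None, parent SSE) if no split
    improves on the unsplit node."""
    xs = sorted(set(x))
    best_t, best_cost = None, sse(y)  # baseline: cost of leaving the node unsplit
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # assumed convention: midpoint between distinct values
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical data: the response jumps between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
print(best_split_least_squares(x, y))  # threshold 3.5 separates the two groups
```

Each child's squared error is measured around that child's own mean, so the chosen split is the one that makes the two children as internally homogeneous as possible in the least-squares sense.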

[Figure 1: classification tree. Root node N = 1544 (Present 21%, Absent 79%); splits include ELEV < 2954 m and ELEV < 2444 m; terminal nodes classify as Present or Absent.]

Figure 1 A simple example of a classification tree describing the relationship between presence/absence of P. menziesii and explanatory factors of elevation (ELEV) and aspect (ASP) in the mountains of northern Utah. Thin-lined boxes indicate a node from which a split emerges. Thick-lined boxes indicate a terminal node.


Least absolute deviations. This method minimizes the mean absolute deviation from the median within a node. Its advantage over least squares is that it is less sensitive to outliers and therefore yields a more robust model. Its disadvantage is reduced sensitivity when the data set contains a large proportion of zeros: when more than half of the observations in a node are zero, the node median is zero regardless of the remaining values.
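A minimal sketch of the least-absolute-deviations criterion follows, under the same assumptions as any simple split search: one numeric predictor, splits of the form `x < t`, and midpoints between distinct predictor values as candidate thresholds. Summing absolute deviations over both children is equivalent to weighting each child's mean absolute deviation by its size. The function names and the zero-heavy example data are hypothetical; the example is chosen to show the insensitivity to zeros described above.

```python
from statistics import median
from typing import List, Optional, Tuple


def abs_dev(values: List[float]) -> float:
    """Sum of absolute deviations from the node median (0 for an empty node)."""
    if not values:
        return 0.0
    m = median(values)
    return sum(abs(v - m) for v in values)


def split_cost(x: List[float], y: List[float], t: float) -> float:
    """Total absolute deviation of the two children produced by the split x < t."""
    left = [yi for xi, yi in zip(x, y) if xi < t]
    right = [yi for xi, yi in zip(x, y) if xi >= t]
    return abs_dev(left) + abs_dev(right)


def best_split_lad(x: List[float], y: List[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, cost) minimizing total absolute deviation, or
    (None, parent cost) if no candidate split improves on the unsplit node."""
    xs = sorted(set(x))
    best_t, best_cost = None, abs_dev(y)  # baseline: unsplit node
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # assumed convention: midpoint between distinct values
        cost = split_cost(x, y, t)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical zero-heavy response: six zeros and two positive values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 0, 0, 3, 5]
print(abs_dev(y))             # parent node: median is 0, cost is 8.0
print(split_cost(x, y, 2.5))  # both child medians are still 0: cost stays 8.0
print(best_split_lad(x, y))   # (6.5, 2.0)
```

In this example the candidate splits at 1.5, 2.5, 3.5 and 4.5 all leave the total absolute deviation unchanged at 8.0, so the criterion cannot tell them apart from leaving the node unsplit, even though they separate some of the zeros from the nonzero values.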
