A tree can be grown very large, almost to the point where it fits the training data perfectly, sometimes with just one observation in each leaf. However, this results in overfitting and poor predictions on independent test sets. Conversely, a tree that is too small fails to extract all the useful relationships that exist in the data. Appropriate tree size can be determined in a number of ways. One way is to set a threshold for the reduction in the impurity measure, below which no split will be made. A preferred approach is to grow an overly large tree until some minimum node size is reached, and then prune the tree back to an optimal size. Optimal size can be determined using an independent test set or cross-validation (described below). In either case, the result is a tree of optimal size accompanied by an independent measure of its error rate.
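The grow-then-prune idea can be illustrated with a minimal Python sketch. It is not taken from the text: it assumes a single numeric predictor, two classes, the Gini index as the impurity measure, and synthetic noisy data. For pruning it uses reduced-error pruning against a held-out set, a simpler stand-in for the pruning procedures used in practice. All function names (`grow`, `prune`, `make_data`, and so on) are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(points, min_node_size=1):
    """Grow a tree on (x, label) pairs until nodes are pure or too small to split."""
    labels = [y for _, y in points]
    node = {"label": Counter(labels).most_common(1)[0][0]}
    if len(points) < 2 * min_node_size or gini(labels) == 0.0:
        return node
    pts = sorted(points, key=lambda p: p[0])
    best = None
    for i in range(min_node_size, len(pts) - min_node_size + 1):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pts)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return node
    i = best[1]
    node["threshold"] = (pts[i - 1][0] + pts[i][0]) / 2
    node["left"] = grow(pts[:i], min_node_size)
    node["right"] = grow(pts[i:], min_node_size)
    return node

def predict(node, x):
    """Follow splits down to a leaf and return its majority label."""
    while "threshold" in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["label"]

def error_rate(tree, points):
    return sum(predict(tree, x) != y for x, y in points) / len(points)

def count_leaves(node):
    if "threshold" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def prune(node, held_out):
    """Reduced-error pruning: working bottom-up, collapse a subtree into a leaf
    when the leaf makes no more errors on the held-out points reaching it.
    Returns a new tree and leaves the input tree unchanged."""
    if "threshold" not in node:
        return node
    t = node["threshold"]
    candidate = {
        "label": node["label"],
        "threshold": t,
        "left": prune(node["left"], [p for p in held_out if p[0] <= t]),
        "right": prune(node["right"], [p for p in held_out if p[0] > t]),
    }
    subtree_errors = sum(predict(candidate, x) != y for x, y in held_out)
    leaf_errors = sum(node["label"] != y for _, y in held_out)
    return {"label": node["label"]} if leaf_errors <= subtree_errors else candidate

def make_data(n, rng):
    """Synthetic data (an assumption): class 1 when x > 0.5, with 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train, held_out = make_data(200, rng), make_data(200, rng)

full_tree = grow(train, min_node_size=1)   # overly large tree
pruned_tree = prune(full_tree, held_out)   # pruned back using held-out data

for name, tree in [("full", full_tree), ("pruned", pruned_tree)]:
    print(name, "leaves:", count_leaves(tree),
          "train error:", error_rate(tree, train),
          "held-out error:", error_rate(tree, held_out))
```

Because the held-out set in this sketch is used to choose the pruned tree, its error on that set is somewhat optimistic. A further independent sample would be needed for an unbiased estimate of the pruned tree's error rate.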
