A tree can be grown very large, almost to the point where it fits the training data perfectly, sometimes with just one observation in each leaf. However, this results in overfitting and poor predictions on independent test sets. Conversely, a tree that is too small fails to extract all the useful relationships in the data. Appropriate tree size can be determined in a number of ways. One way is to set a threshold for the reduction in the impurity measure, below which no split is made. A preferred approach is to grow an overly large tree until some minimum node size is reached, and then prune it back to an optimal size. The optimal size can be determined using an independent test set or cross-validation (described below). In either case, the result is a tree of optimal size accompanied by an independent measure of its error rate.
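The grow-then-prune idea can be illustrated with a minimal sketch. The text does not say which pruning method is meant, so this example assumes reduced-error pruning, one simple variant: a subtree is collapsed into a leaf whenever doing so does not increase the error on a held-out set. The single-feature data, the 20% label noise, and all function names (`grow`, `prune`, `make_data`, and so on) are hypothetical and chosen only for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def grow(points, min_node_size=1):
    """Grow a tree on (x, y) pairs until nodes are pure or too small to split."""
    labels = [y for _, y in points]
    node = {"pred": majority(labels), "thr": None, "left": None, "right": None}
    if len(points) < 2 * min_node_size or gini(labels) == 0.0:
        return node
    xs = sorted({x for x, _ in points})
    best = None
    for a, b in zip(xs, xs[1:]):  # candidate thresholds between distinct values
        thr = (a + b) / 2.0
        left = [p for p in points if p[0] <= thr]
        right = [p for p in points if p[0] > thr]
        if len(left) < min_node_size or len(right) < min_node_size:
            continue
        score = (len(left) * gini([y for _, y in left])
                 + len(right) * gini([y for _, y in right])) / len(points)
        if best is None or score < best[0]:
            best = (score, thr, left, right)
    if best is None:
        return node
    _, node["thr"], left, right = best
    node["left"] = grow(left, min_node_size)
    node["right"] = grow(right, min_node_size)
    return node

def predict(node, x):
    while node["thr"] is not None:
        node = node["left"] if x <= node["thr"] else node["right"]
    return node["pred"]

def errors(node, points):
    """Number of misclassified points."""
    return sum(predict(node, x) != y for x, y in points)

def count_leaves(node):
    if node["thr"] is None:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def prune(node, held_out):
    """Reduced-error pruning, bottom-up and in place (an assumed variant)."""
    if node["thr"] is None:
        return
    prune(node["left"], [p for p in held_out if p[0] <= node["thr"]])
    prune(node["right"], [p for p in held_out if p[0] > node["thr"]])
    leaf_err = sum(node["pred"] != y for _, y in held_out)
    if leaf_err <= errors(node, held_out):  # collapsing does not hurt
        node["thr"] = node["left"] = node["right"] = None

def make_data(n, rng):
    """Hypothetical data: label is x > 0.5, flipped with probability 0.2."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train, held_out, test = make_data(200, rng), make_data(200, rng), make_data(200, rng)

tree = grow(train, min_node_size=1)  # deliberately overgrown
full_leaves = count_leaves(tree)
full_train_err = errors(tree, train)
full_held_err = errors(tree, held_out)
full_test_err = errors(tree, test)

prune(tree, held_out)  # prune back using the held-out set
pruned_leaves = count_leaves(tree)
pruned_held_err = errors(tree, held_out)
pruned_test_err = errors(tree, test)

print("leaves:", full_leaves, "->", pruned_leaves)
print("test errors out of", len(test), ":", full_test_err, "->", pruned_test_err)
```

In this sketch the held-out set guides the pruning, so the error reported on it is optimistic. The third set (`test`) plays the role of the independent error estimate. Cross-validation would replace the single held-out split when data are too scarce to set one aside.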