If the sample size is sufficiently large, the data can be split randomly into two subsets: one for training and the other for testing. What counts as sufficiently large is problem specific, but one rule of thumb for classification problems is to allow a minimum of 200 observations for a binary classification model, plus an additional 100 observations for each additional class. An overly large tree is grown on the training data. Then, using the test set, error rates are computed for the full tree and for every smaller subtree (i.e., trees with fewer terminal nodes than the full tree). For classification trees, the error rate is typically the overall misclassification rate; for regression trees, mean squared error or mean absolute deviation from the median is used to rank trees of different sizes. The subtree with the smallest error rate on the independent test set is then chosen as the optimal tree.
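The procedure can be sketched in a few lines of Python. This is an illustrative approximation, not the exact subtree-pruning algorithm: rather than enumerating pruned subtrees of one large tree, it grows trees of increasing size (via scikit-learn's `max_leaf_nodes` parameter) on a random training half and picks the size with the lowest misclassification rate on the held-out test half. The dataset and the range of candidate sizes are assumptions for the example.

```python
# Sketch: select tree size by holdout test-set error (illustrative, not
# exact subtree pruning). Dataset and size grid are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A binary classification problem with 569 observations, comfortably
# above the ~200-observation rule of thumb mentioned in the text.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

best_size, best_err = None, float("inf")
for leaves in range(2, 41):  # candidate sizes, up to an "overly large" tree
    tree = DecisionTreeClassifier(max_leaf_nodes=leaves, random_state=0)
    tree.fit(X_train, y_train)
    err = 1.0 - tree.score(X_test, y_test)  # misclassification rate
    if err < best_err:
        best_size, best_err = leaves, err

print(f"chosen tree size: {best_size} leaves, test error: {best_err:.3f}")
```

For a regression tree, one would instead rank the candidate trees by mean squared error (or mean absolute deviation) on the test set.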