Independent Test

If the sample size is sufficiently large, the data can be randomly divided into two subsets, one for training and the other for testing. What counts as sufficiently large is problem specific, but one rule of thumb for classification problems is to allow a minimum of 200 observations for a binary classification model, plus 100 observations for each additional class. An overly large tree is first grown on the training data. Then, using the test set, error rates are calculated for the full tree and for all of its smaller subtrees (i.e., trees with fewer terminal nodes than the full tree). For classification trees, the error rate is typically the overall misclassification rate; for regression trees, mean squared error or mean absolute deviation from the median is the criterion used to rank trees of different sizes. The subtree with the smallest error rate on the independent test set is then chosen as the optimal tree.
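
The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full tree-growing implementation: the function names (`min_sample_size`, `random_split`, `select_subtree`), the default 50/50 split, and the rule of breaking ties in favor of the smaller tree are all hypothetical choices that the text does not specify. The candidate subtrees are represented by simple stand-in prediction functions keyed by their number of terminal nodes.

```python
import random


def min_sample_size(n_classes):
    """Rule of thumb from the text: 200 observations for two classes,
    plus 100 for each additional class."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    return 200 + 100 * (n_classes - 2)


def random_split(rows, test_fraction=0.5, seed=0):
    """Randomly divide rows into (train, test).
    The default test_fraction is an assumption, not from the text."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def misclassification_rate(y_true, y_pred):
    """Fraction of test observations assigned to the wrong class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def select_subtree(subtrees, X_test, y_test, error_fn=misclassification_rate):
    """Score every candidate subtree on the test set and return the best.

    subtrees maps number of terminal nodes -> prediction function.
    Ties go to the smaller tree (an assumption made for this sketch).
    """
    errors = {}
    for n_leaves, predict in subtrees.items():
        errors[n_leaves] = error_fn(y_test, [predict(x) for x in X_test])
    best = min(errors, key=lambda n: (errors[n], n))
    return best, errors


# Toy stand-ins for a full tree and its subtrees (hypothetical).
subtrees = {
    1: lambda x: 0,                                # root only
    2: lambda x: 1 if x > 5 else 0,                # one split
    3: lambda x: 1 if (x > 5 or x == 3) else 0,    # extra, overfit split
}
X_test = [1, 2, 3, 4, 6, 7, 8, 9]
y_test = [0, 0, 0, 0, 1, 1, 1, 1]

best_size, test_errors = select_subtree(subtrees, X_test, y_test)
print(best_size, test_errors)
```

For a regression tree, the same `select_subtree` call could be given a mean squared error function as `error_fn` in place of the misclassification rate.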
