In regression trees, heteroscedasticity (non-constant variance of the response, most often seen as greater variation among higher-valued responses) can be problematic. Because regression trees choose splits that minimize within-node impurity, typically the sum of squared deviations from the node mean, they tend to keep splitting nodes with high variance, even when the observations within such a node in fact belong together. The remedy is to apply a variance-stabilizing transformation to the response, as one would in a linear regression problem. Regression trees are invariant to monotonic transformations of the explanatory variables, so transforming predictors does not change the splits. They are not invariant to transformations of the response, however, so a natural log or square root transformation of the response may be appropriate.
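The effect can be illustrated with a minimal sketch in plain Python. The data, the node names (`low_node`, `high_node`), and the helper functions `sse` and `best_split_gain` are all hypothetical and chosen only for illustration. The sketch assumes that impurity is the sum of squared deviations from the node mean and that the noise is multiplicative, which is the situation a log transformation is suited to.

```python
import math

def sse(values):
    """Sum of squared deviations from the mean, used here as node impurity."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_gain(values):
    """Largest impurity reduction from cutting an ordered node into two parts."""
    parent = sse(values)
    return max(
        parent - sse(values[:i]) - sse(values[i:])
        for i in range(1, len(values))
    )

# Hypothetical data: two nodes with identical multiplicative noise but
# very different mean responses (about 10 versus about 1000).
noise = [1.0, 0.8, 1.25, 0.9, 1.1]
low_node = [10 * f for f in noise]
high_node = [1000 * f for f in noise]

# Raw scale: the high-mean node offers a far larger impurity reduction,
# so a tree would prefer to keep splitting it.
raw_gain_low = best_split_gain(low_node)
raw_gain_high = best_split_gain(high_node)
raw_ratio = raw_gain_high / raw_gain_low  # about 10,000 (the scale factor 100, squared)

# Log scale: the multiplicative noise becomes additive, so both nodes
# now offer essentially the same impurity reduction.
log_gain_low = best_split_gain([math.log(v) for v in low_node])
log_gain_high = best_split_gain([math.log(v) for v in high_node])

print(f"raw-scale gain ratio (high/low): {raw_ratio:.1f}")
print(f"log-scale gain ratio (high/low): {log_gain_high / log_gain_low:.6f}")
```

On the raw scale the high-mean node dominates the splitting criterion purely because of its scale. After the log transformation both nodes look equally homogeneous, so neither is split merely for having larger responses. A square root transformation would be the analogous choice when the variance grows in proportion to the mean rather than to its square.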