As mentioned in the introduction, one of the most important uses of statistical models is to predict future values of the response variable for given values of the predictors. If we know that the form of the model equation is correct, predictions for the expected value of the response usually follow naturally from the statement of the model, especially for regression-based methods (see, e.g., eqns , , and ). Information on the uncertainty in predictions can be derived in two ways. When we are interested in uncertainty in the mean prediction, we use estimates of the standard errors of the model parameters. When we are interested in uncertainty associated with a single new measurement, we combine the parameter standard errors with the error variance (see Figure 1).
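The distinction between the two kinds of prediction uncertainty can be made concrete with a minimal sketch for simple linear regression. The function name `predict_with_uncertainty` and the data values are hypothetical, chosen only for illustration; the formulas are the standard least-squares ones and assume independent errors with constant variance.

```python
import math

def predict_with_uncertainty(x, y, x0):
    """Fit y = a + b*x by least squares and predict the response at x0.

    Returns (prediction, se_mean, se_new), where se_mean is the standard
    error of the mean prediction and se_new is the standard error for a
    single new measurement.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept

    # Error variance estimated from the residuals, with n - 2 degrees of freedom
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)

    # Parameter uncertainty only: uncertainty in the mean prediction
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)

    # Parameter uncertainty plus error variance: a single new measurement
    se_new = math.sqrt(s2 * (1.0 + leverage))
    return a + b * x0, se_mean, se_new

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
pred, se_mean, se_new = predict_with_uncertainty(x, y, 6.0)
print(pred, se_mean, se_new)
```

The standard error for a new measurement is always the larger of the two, because it adds the error variance to the parameter uncertainty. Interval estimates would follow by multiplying each standard error by the appropriate t quantile with n - 2 degrees of freedom.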
While regression-based methods only provide estimates of summary statistics, such as the predictive mean and variance of y, Bayesian methods provide estimates of the full predictive distribution. Once the posterior distribution of the model parameters has been estimated, the predictive distribution is easily calculated by integrating the product of the posterior and the likelihood function over all parameter values. For nonlinear or nonnormal models, the predictive distribution can have a shape that would not have been easily anticipated. A full representation of this shape is often of considerable value for many of the purposes of model prediction.
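As a rough illustration of this integration, the sketch below computes a predictive distribution numerically for a deliberately simple case: a binomial model with a flat prior on the success probability. The model, the counts, and the function names are assumptions made for the example, not taken from the text. Here the "likelihood function" in the integral is read as the sampling distribution of the new observation given the parameter.

```python
import math

def binom_pmf(k, n, theta):
    """Probability of k successes in n trials with success probability theta."""
    return math.comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)

def posterior_predictive(k_obs, n_obs, m_new, grid_size=2000):
    """Predictive distribution of the number of successes in m_new future trials.

    Integrates (sampling distribution of the new count) x (posterior of theta)
    over theta with a midpoint rule on a regular grid over (0, 1).
    """
    step = 1.0 / grid_size
    thetas = [(i + 0.5) * step for i in range(grid_size)]

    # Posterior density on the grid: likelihood x flat prior, then normalised
    unnormalised = [binom_pmf(k_obs, n_obs, t) for t in thetas]
    norm = sum(unnormalised) * step
    posterior = [u / norm for u in unnormalised]

    predictive = []
    for y_new in range(m_new + 1):
        integrand = [binom_pmf(y_new, m_new, t) * p
                     for t, p in zip(thetas, posterior)]
        predictive.append(sum(integrand) * step)
    return predictive

# Hypothetical data: 3 successes in 10 trials; predict 5 future trials
predictive = posterior_predictive(k_obs=3, n_obs=10, m_new=5)
for y_new, prob in enumerate(predictive):
    print(y_new, round(prob, 4))
```

For this conjugate case the result can be checked against the closed-form beta-binomial distribution. For nonlinear or nonnormal models no closed form is usually available, and the integral is typically approximated by simulation from the posterior instead of a grid.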
Of course, in most real-world ecological applications, we do not know the correct functional form of the model. In these cases, the model is often chosen empirically from among many possibilities based on fit to the data. Over a limited range of predictor variables, most reasonably well-fitting models will provide similar predictions. However, there can be substantial differences when using the models to extrapolate to new sets of conditions. Various methods have been suggested to address this situation. One possibility is to use alternative model selection criteria, such as cross-validation statistics, when choosing the 'best'-fitting model. In cross-validation, the data are split into two or more subsets. The model parameters are estimated from one subset, and then used together with the predictors from the other subset to generate predicted values of the response. The error that results from comparing these predictions against the observed values is then used as the basis for model selection and for describing the uncertainty that can be expected when applying the model to new data.
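The cross-validation procedure described above can be sketched as follows. The two candidate models (a constant mean and a straight line), the interleaved fold assignment, and the data are all assumptions chosen to keep the example short; any fitting function that returns a predictor could be substituted.

```python
def fit_mean(x, y):
    """Candidate model 1: constant mean, ignoring the predictor."""
    mean_y = sum(y) / len(y)
    return lambda x0: mean_y

def fit_line(x, y):
    """Candidate model 2: straight line fitted by least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x0: a + b * x0

def cv_mse(x, y, fit, k=5):
    """Mean squared prediction error from k-fold cross-validation."""
    n = len(x)
    squared_error = 0.0
    for fold in range(k):
        # Every k-th observation is held out in this fold
        held_out = set(range(fold, n, k))
        train_x = [x[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        model = fit(train_x, train_y)  # estimate parameters on one subset
        # Predict the held-out responses and compare with the observations
        squared_error += sum((y[i] - model(x[i])) ** 2 for i in held_out)
    return squared_error / n

# Hypothetical data: a linear trend plus small fixed deviations
x = [float(i) for i in range(10)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.1]
y = [1.0 + 2.0 * xi + e for xi, e in zip(x, noise)]

cv_mean = cv_mse(x, y, fit_mean)
cv_line = cv_mse(x, y, fit_line)
print(cv_mean, cv_line)
```

The model with the smaller cross-validated error would be preferred, and that error also gives a rough indication of the accuracy to expect on new data drawn from similar conditions. It says little about extrapolation beyond the range of the observed predictors.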
A second approach explicitly acknowledges that all models are approximations and that there is unlikely ever to be a single correct model. Thus, we want predictions that are 'unconditional' on any one model. In the regression setting, this viewpoint suggests fitting multiple candidate models to the same data set and then using a weighted combination of the parameter estimates from each model as the overall, unconditional estimate. The weights can be derived according to the relative fit of each model, as measured by the AIC. This procedure of 'model averaging' can be applied to both the means and standard errors of the parameters. Some have suggested adding a component to parameter uncertainty that accounts for model selection uncertainty, and specific methods for estimating this component have been proposed.
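A minimal sketch of AIC-based model averaging is given below. The weights are the commonly used Akaike weights, which are proportional to exp(-Δ/2), where Δ is each model's AIC minus the smallest AIC in the candidate set. The standard error shown is one proposal for including model selection uncertainty (the form given by Burnham and Anderson); the text does not say which method it has in mind, so this choice is an assumption, as are the function names and the input values.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to one."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

def model_average(estimates, std_errors, weights):
    """Model-averaged estimate and an 'unconditional' standard error.

    The standard error adds, for each model, the squared deviation of its
    estimate from the averaged estimate to that model's sampling variance.
    """
    averaged = sum(w * e for w, e in zip(weights, estimates))
    unconditional_se = sum(
        w * math.sqrt(se ** 2 + (e - averaged) ** 2)
        for w, e, se in zip(weights, estimates, std_errors)
    )
    return averaged, unconditional_se

# Hypothetical results from three candidate models fitted to the same data
aic = [100.0, 102.0, 110.0]
estimates = [0.50, 0.62, 0.90]    # estimate of the same parameter in each model
std_errors = [0.10, 0.12, 0.20]   # conditional standard errors

weights = akaike_weights(aic)
averaged, unconditional_se = model_average(estimates, std_errors, weights)
print(weights, averaged, unconditional_se)
```

Averaging a parameter across models only makes sense when it has the same interpretation in every model; where it does not, averaging the predictions themselves is the safer choice.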
From a Bayesian perspective, model averaging can occur as a natural application of Bayes' theorem. One simply needs to replace the continuous parameter term, θ, in eqn  by a discrete term, M, representing alternative model formulations. By combining one's prior belief in the appropriateness of a model with the likelihood of the observed data conditional on that model, the posterior belief in the model can be calculated. Prediction then consists of integrating, over all models and all parameter values, the product of the likelihood function and the combined posterior of each model and its parameters. Essentially, this amounts to weighting each model's predictive distribution by the posterior belief in that model, thereby yielding a combined distribution of predictions, unconditional on any one model formulation.
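In symbols, the calculation just described can be written as follows. The notation is an assumption introduced here for illustration: $M_k$ denotes the $k$-th candidate model, $\theta_k$ its parameters, $y$ the observed data, and $\tilde{y}$ a future observation. The posterior belief in each model is

$$
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_j p(y \mid M_j)\, p(M_j)},
\qquad
p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k ,
$$

and the model-averaged predictive distribution is

$$
p(\tilde{y} \mid y) = \sum_k p(M_k \mid y) \int p(\tilde{y} \mid \theta_k, M_k)\, p(\theta_k \mid y, M_k)\, d\theta_k .
$$

Each integral on the right of the second expression is the predictive distribution of a single model, so the sum is a mixture of those distributions with the posterior model probabilities as mixing weights.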