Perhaps the biggest assumption so far is that the statistical model being used is known, and correct. Typically, however, the statistical model employed is only one of many that could be fit to the observed patterns. For example, Table 4 summarizes the results of seven statistical models fit to the same algal dynamics. Given that all of these models can explain the observations to some extent, how do we select the best one? The approach depends on which part of the statistical model is more uncertain. If the overall structure of the model is known with some certainty but the functions that describe the population rates are not, then semimechanistic models provide a framework for estimating those functional relationships from data. Semimechanistic models (also called semiparametric or partially specified models) replace a specified mathematical function with a flexible function built from local polynomials, such as a spline. For example, if we were uncertain about the form of density dependence in the algal dynamics of Figure 1, we could use the following semimechanistic model:
where s(N_t) is a spline function of algal density N_t. The shape of the spline is determined by a set of parameters, and the statistical model is fit to data using the same procedure as for eqn . Once the maximum likelihood parameter estimates are found, the shape of the fitted spline can be used to suggest a parametric functional form for the density dependence.
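A minimal sketch of this idea in Python with NumPy, under stated assumptions. The sketch assumes, for illustration, the semimechanistic form $\ln(N_{t+1}/N_t) = s(N_t) + \varepsilon_t$ with normally distributed errors, and represents $s$ as a cubic regression spline in truncated-power form with three interior knots placed at quantiles of density. Under Gaussian errors, least squares gives the maximum likelihood spline coefficients. The data are simulated from a Ricker model with hypothetical values $r = 1$ and $K = 100$, so the fitted spline can be checked against the known density dependence $r(1 - N_t/K)$. The function names `spline_basis` and `fit_spline_growth` are illustrative, not from the source.

```python
import numpy as np


def spline_basis(x, knots):
    """Cubic regression spline, truncated-power form: 1, x, x^2, x^3, (x - knot)^3_+."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.maximum(x - kn, 0.0) ** 3 for kn in knots]
    return np.column_stack(cols)


def fit_spline_growth(n_t, n_next, n_knots=3):
    """Fit ln(N_{t+1}/N_t) = s(N_t) + e, with e ~ Normal(0, sigma^2) (an assumed model form).

    With Gaussian errors, least squares gives the maximum likelihood spline
    coefficients. Density is rescaled to [0, 1] for numerical stability.
    """
    n_t = np.asarray(n_t, dtype=float)
    y = np.log(np.asarray(n_next, dtype=float) / n_t)
    scale = n_t.max()
    x = n_t / scale
    # Interior knots at evenly spaced quantiles of the observed densities.
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    X = spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = np.mean(resid**2)  # ML estimate of the error variance
    loglik = -0.5 * y.size * (np.log(2.0 * np.pi * sigma2) + 1.0)
    k = coef.size + 1  # spline coefficients plus sigma

    def s(n):
        """Fitted spline s(N), evaluated in the original density units."""
        return spline_basis(np.asarray(n, dtype=float) / scale, knots) @ coef

    return s, loglik, k


# Simulated data: 30 hypothetical cultures, each followed for 8 time steps
# under a Ricker model with lognormal process noise.
rng = np.random.default_rng(1)
r, K = 1.0, 100.0
n_t, n_next = [], []
for _ in range(30):
    n = rng.uniform(2.0, 20.0)
    for _ in range(8):
        nxt = n * np.exp(r * (1.0 - n / K) + rng.normal(0.0, 0.1))
        n_t.append(n)
        n_next.append(nxt)
        n = nxt

s, loglik, k = fit_spline_growth(n_t, n_next)
grid = np.array([10.0, 50.0, 100.0])
print(np.round(s(grid), 2))  # compare with r * (1 - N/K) = 0.9, 0.5, 0.0
```

A nearly straight, declining spline here would suggest a linear (logistic- or Ricker-type) density dependence; curvature would point toward a nonlinear form.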
Apart from semimechanistic models, the usual approach to model uncertainty is to select the best model from a suite of candidate models. If the models are nested (so that a model with fewer parameters can be recovered from one with more), then the likelihood ratio test provides a hypothesis-testing method for model selection. The method is similar to estimating confidence intervals: asymptotically, twice the log of the ratio of the maximized likelihoods of two nested models follows a chi-squared distribution. If $\ell(A)$ and $\ell(B)$ are the maximized log-likelihoods of model A and of a simpler model B nested within it, both fit to the same data, then model A is significantly better than model B at significance level $\alpha$ if

$$2\left[\ell(A) - \ell(B)\right] > \chi^2_{d,\alpha},$$

where $\chi^2_{d,\alpha}$ is the upper-$\alpha$ critical value of a chi-squared distribution whose degrees of freedom $d$ equal the difference in the number of parameters between the two models.
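The test can be sketched in Python using only the standard library. The chi-squared upper-tail probability is computed from the series expansion of the regularized lower incomplete gamma function; this is a swapped-in numerical method (`scipy.stats.chi2.sf` serves the same purpose where SciPy is available). Checking whether the p-value falls below $\alpha$ is equivalent to checking whether the statistic exceeds $\chi^2_{d,\alpha}$. The helper names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the log-likelihood values are made-up numbers.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function,
    P(a, z) = exp(-z) * z**a * sum_n z**n / Gamma(a + n + 1), with a = df/2, z = x/2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    if term == 0.0:  # underflow: x lies far out in one tail
        return 0.0 if z > a else 1.0
    total, n = term, 0
    while term > 1e-17 * total and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, max(0.0, 1.0 - total))


def likelihood_ratio_test(loglik_a, loglik_b, df, alpha=0.05):
    """Model A (more parameters) nests model B; inputs are maximized log-likelihoods."""
    stat = 2.0 * (loglik_a - loglik_b)
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha


# Hypothetical maximized log-likelihoods; A has one more parameter than B.
stat, p, reject = likelihood_ratio_test(loglik_a=-120.3, loglik_b=-124.1, df=1)
print(f"LR = {stat:.2f}, p = {p:.4f}, A preferred: {reject}")
```

With these made-up values the statistic is 7.6 on 1 degree of freedom, which exceeds the 5% critical value of about 3.84, so model A would be preferred over model B.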
An alternative framework for model selection is based on the Akaike information criterion (AIC). Whereas the likelihood framework uses hypothesis testing to seek the model with the highest likelihood given the data, the information criterion framework uses the expected Kullback-Leibler distance to select the model closest to the true model that generated the data. Candidate models need not be nested in the AIC framework. For large sample sizes, AIC is given by

$$\mathrm{AIC} = -2\ln \mathcal{L}(\hat{\theta} \mid y) + 2k,$$

where $\mathcal{L}(\hat{\theta} \mid y)$ is the maximized likelihood of the model given the observed data $y$ (evaluated at the maximum likelihood estimates $\hat{\theta}$), and $k$ is the number of estimated parameters in the model. The model with the lowest AIC value has the smallest estimated expected Kullback-Leibler distance, but models whose AIC values lie within a small margin of the lowest AIC value should also be considered to have empirical support.
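A sketch of AIC-based ranking in Python. It computes AIC from each candidate's maximized log-likelihood and parameter count, then reports each model's difference from the lowest AIC (ΔAIC). The cutoff of 2 AIC units used to flag models with empirical support is a commonly used rule of thumb and an assumption of this sketch; the model names `M1` to `M3` and their log-likelihoods are hypothetical. All models must be fit to the same data for their AIC values to be comparable.

```python
def aic(loglik, k):
    """AIC = -2 ln L(theta_hat | y) + 2k, where loglik is the maximized log-likelihood."""
    return -2.0 * loglik + 2.0 * k


def rank_by_aic(models, delta_cutoff=2.0):
    """Return (name, AIC, delta AIC, supported) tuples sorted from lowest to highest AIC.

    delta_cutoff = 2 is an assumed rule-of-thumb threshold for 'empirical support'.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    rows = [(name, a, a - best, a - best <= delta_cutoff) for name, a in scores.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
candidates = {
    "M1": (-130.2, 2),
    "M2": (-121.0, 3),
    "M3": (-120.4, 4),
}
for name, a, d, ok in rank_by_aic(candidates):
    print(f"{name}  AIC={a:7.1f}  dAIC={d:5.1f}  supported={ok}")
```

In these made-up numbers, M3 has the highest log-likelihood, but its extra parameter costs 2 AIC units, so M2 ranks first while M3 remains within the cutoff; M1 is far behind.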
See also: Spatial Distribution; Spatial Distribution Models.