## Info

Figure 2.7. Profile likelihood ratio for the logit-scale effect of sex on mortality of moths exposed to cypermethrin. Dashed vertical lines indicate 95% confidence limits. The horizontal line is drawn at $\chi^2_{0.95}(1)$.

and alternative models, respectively. Given these priors, the posterior densities are defined in the usual way (using Bayes' rule) and have normalizing constants that depend on $f(y \mid \theta)$, the model-specific likelihood of the observed data $y$. An additional requirement for Bayesian analysis is that we assign prior probabilities to $H_0$ and $H_1$. Let $q_0 = \Pr(H_0) = \Pr(\theta \in \Theta_0)$ quantify our prior opinion of whether the null model should be accepted. Since only two models are involved in the comparison, $\Pr(H_1) = \Pr(\theta \in \Theta_0^{c}) = 1 - q_0$. As an example, we might use $q_0 = 0.5$ to specify an impartial prior for the two models.

A Bayesian solution of the model selection problem is obtained by computing the posterior probability of $H_0$ given $y$, which follows from a single application of Bayes' rule. Because there are only two models involved, the posterior probability of $H_1$ must be $\Pr(H_1 \mid y) = 1 - \Pr(H_0 \mid y)$. It is worth mentioning that this posterior probability, Eq. (2.5.8), could have been derived by specifying the prior density of $\theta$ as a finite mixture of model-specific priors as follows: $\pi(\theta) = q_0\,\pi(\theta \mid H_0) + (1 - q_0)\,\pi(\theta \mid H_1)$.
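For reference, the quantities this passage relies on can be sketched as follows. The symbol $m(y \mid H_k)$ for the normalizing constants and the symbol $\pi(\theta \mid H_k)$ for the model-specific priors are assumed notation here, and the last line is simply what a single application of Bayes' rule gives, which is presumably the content of Eq. (2.5.8):

$$
\begin{aligned}
m(y \mid H_0) &= \int_{\Theta_0} f(y \mid \theta)\,\pi(\theta \mid H_0)\,d\theta, \\
m(y \mid H_1) &= \int_{\Theta_0^{c}} f(y \mid \theta)\,\pi(\theta \mid H_1)\,d\theta, \\
\Pr(H_0 \mid y) &= \frac{q_0\,m(y \mid H_0)}{q_0\,m(y \mid H_0) + (1 - q_0)\,m(y \mid H_1)}.
\end{aligned}
$$

With $q_0 = 0.5$ the prior probabilities cancel, and the posterior probability of the null model depends only on the ratio of the two normalizing constants.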

How should a Bayesian use posterior model probabilities in model selection? One possibility is to reject the null model (and accept the alternative) if $\Pr(H_0 \mid y) < \Pr(H_1 \mid y)$ or, equivalently, if $\Pr(H_0 \mid y) < 0.5$. On the other hand, a Bayesian who wants to guard against falsely rejecting the null model may wish to reject $H_0$ only under a more conservative (lower) threshold (e.g., $\Pr(H_0 \mid y) < 0.1$).
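The decision rule can be illustrated with a toy problem in which both normalizing constants are available in closed form. The sketch below is not taken from the text: it assumes a hypothetical binomial experiment with a point null ($p = 0.5$) and a Uniform(0, 1) prior under the alternative, and the function names are illustrative only.

```python
import math


def posterior_prob_null(m0, m1, q0=0.5):
    """Pr(H0 | y) from the two normalizing constants and the prior probability q0."""
    return q0 * m0 / (q0 * m0 + (1.0 - q0) * m1)


def reject_null(p0, threshold=0.5):
    """Reject H0 when its posterior probability falls below the threshold."""
    return p0 < threshold


# Hypothetical data: y successes in n Bernoulli trials.
n, y = 20, 15

# H0: p = 0.5 exactly, so the normalizing constant is the binomial likelihood at 0.5.
m0 = math.comb(n, y) * 0.5 ** n
# H1: p ~ Uniform(0, 1); integrating the binomial likelihood over p gives 1 / (n + 1).
m1 = 1.0 / (n + 1)

p0 = posterior_prob_null(m0, m1, q0=0.5)
print(f"Pr(H0 | y) = {p0:.3f}")
print("reject at 0.5:", reject_null(p0, 0.5))
print("reject at 0.1:", reject_null(p0, 0.1))
```

With these made-up numbers the null model is rejected under the 0.5 threshold but retained under the more conservative 0.1 threshold, which is the distinction drawn in the paragraph above.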

The Bayesian approach to model selection possesses some appealing features. Unlike the classical approach, the analyst does not have to choose a particular statistic for testing. Furthermore, decisions to reject or accept H0 do not require samples to be large enough for the application of asymptotic distributions. That said, the Bayesian approach to model selection is not without its problems. For one thing, the posterior model probabilities can be sensitive to the priors assumed for each model's parameters. In addition, the model-specific normalizing constants needed for computing posterior model probabilities can be difficult to calculate when models contain many parameters. Recall that this difficulty impeded routine use of Bayesian statistics for many years and motivated the development of MCMC sampling algorithms.

In Section 2.5.4 we will examine some alternative Bayesian methods of model selection. For now, however, we illustrate a clever approach developed by Kuo and Mallick (1998), which avoids the calculation of normalizing constants but is conceptually equivalent to the approach described earlier in this section. Kuo and Mallick introduce a latent, binary inclusion parameter, say $w$, that is used as a multiplier on those parameters that are present in the alternative model and absent from the null model. By assuming $w \sim \mathrm{Bern}(\psi)$, the prior probability of the alternative model is specified implicitly because

$$\Pr(H_1) = \Pr(w = 1) = \psi.$$

Therefore, we have the identity $q_0 = 1 - \psi$. To specify prior impartiality in the selection of null and alternative models, we simply assume $\psi = 0.5$.

Kuo and Mallick showed that Gibbs sampling may be used to fit this elaboration of the alternative model. The Gibbs output yields a sample from the marginal posterior distribution of $w \mid y$, which provides a direct solution to the model selection problem because the posterior probability of the null model, $\Pr(H_0 \mid y) = \Pr(w = 0 \mid y)$, is easy to estimate from the sample.

The model-selection methodology developed by Kuo and Mallick (1998) applies equally well to much more difficult problems. For example, suppose selection is required among regression models that may include as many as $p$ distinct predictors. In this problem a $p \times 1$ vector $w$ of inclusion parameters may be used to select among the $2^p$ possible regression models by computing posterior probabilities for the $2^p$ values of $w$. Alternatively, these posterior probabilities can be used to compute some quantity of scientific interest by averaging over model uncertainty. Thus, model-averaging can be done at practically no additional computational expense once the posterior is calculated. We conclude this section with an illustration of this approach based on a comparison of two logistic regression models described in a previous example (see Section 2.5.1.3).
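To make the bookkeeping concrete, the following sketch tabulates posterior model probabilities from sampled inclusion vectors and forms a model-averaged estimate. The draws and the quantity of interest are hypothetical values invented for illustration, and the function names are not from the text.

```python
from collections import Counter, defaultdict


def model_probabilities(w_draws):
    """Estimate Pr(w | y) for each visited inclusion vector by its sample frequency."""
    counts = Counter(tuple(w) for w in w_draws)
    return {w: c / len(w_draws) for w, c in counts.items()}


def model_averaged_mean(w_draws, values):
    """Weight each model's posterior mean of the quantity by Pr(w | y)."""
    by_model = defaultdict(list)
    for w, v in zip(w_draws, values):
        by_model[tuple(w)].append(v)
    probs = model_probabilities(w_draws)
    return sum(probs[w] * sum(vs) / len(vs) for w, vs in by_model.items())


# Hypothetical sampler output for p = 2 predictors (2**2 = 4 candidate models).
w_draws = [(1, 0), (1, 0), (1, 1), (0, 0), (1, 0), (1, 1), (1, 0), (1, 0)]
# Hypothetical draws of some quantity of interest, one per iteration.
quantity = [0.62, 0.58, 0.71, 0.40, 0.60, 0.69, 0.61, 0.59]

probs = model_probabilities(w_draws)
averaged = model_averaged_mean(w_draws, quantity)
print(probs)
# The weighted form equals the plain average over all draws.
print(averaged, sum(quantity) / len(quantity))
```

Because the weighted sum reduces to the ordinary mean of the draws, averaging over model uncertainty requires nothing beyond the sampler output, which is the point made above about computational expense.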

#### 2.5.3.1 Example: Mortality of moths exposed to cypermethrin

Recall that the logistic-regression model of moth mortalities contains 3 parameters: an intercept $\alpha$, the effect of cypermethrin $\beta$, and the effect of sex $\gamma$. In the test of $H_0$ we wish to compare a null model for which $\gamma = 0$ against an alternative for which $\gamma \neq 0$. To apply the approach of Kuo and Mallick, we elaborate the alternative model using the binary inclusion parameter $w$ as follows:

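$$
\begin{aligned}
y_i \mid N, p_i &\sim \mathrm{Bin}(N, p_i) \\
\mathrm{logit}(p_i) &= \alpha + \beta x_i + w \gamma z_i \\
w &\sim \mathrm{Bern}(0.5) \\
\alpha &\sim \mathrm{N}(0, \sigma^2) \\
\beta &\sim \mathrm{N}(0, \sigma^2) \\
\gamma &\sim \mathrm{N}(0, \sigma^2),
\end{aligned}
$$

where $\sigma$ is chosen to be sufficiently large to specify vague priors for the regression parameters.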

We fit this model using WinBUGS and estimated the posterior probability of the null model $\Pr(w = 0 \mid y)$ to be 0.13. Since 0.13 < 0.5, we reject $H_0$ and accept the alternative model. We can estimate the parameters of the alternative model by computing summary statistics from the portion of the Gibbs output for which $w = 1$. This yields the following posterior means and standard errors (in parentheses) of the regression parameters: $\alpha = -3.53$ (0.47), $\beta = 1.08$ (0.13), and $\gamma = 1.11$ (0.36). Given the sample size and our choice of priors for this model's parameters, it is not surprising that the posterior means are similar in magnitude to the MLEs that we reported earlier.
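For readers without WinBUGS, the structure of such a fit can be sketched with a small hand-written sampler. This is a Metropolis-within-Gibbs scheme, swapped in for the WinBUGS fit and not the authors' code. The data are simulated from assumed parameter values (they are not the moth data), so the output will not reproduce the estimates above. All function names, the prior scale `sigma = 10`, the step size, and the chain length are illustrative assumptions.

```python
import math
import random


def log1pexp(u):
    """Numerically stable log(1 + exp(u))."""
    return u + math.log1p(math.exp(-u)) if u > 0 else math.log1p(math.exp(u))


def inv_logit(u):
    """Numerically stable inverse logit."""
    return math.exp(-log1pexp(-u))


def loglik(theta, w, x, z, y, n):
    """Binomial log-likelihood (constant dropped), logit(p) = a + b*x + w*g*z."""
    a, b, g = theta
    total = 0.0
    for xi, zi, yi in zip(x, z, y):
        eta = a + b * xi + w * g * zi
        total -= yi * log1pexp(-eta) + (n - yi) * log1pexp(eta)
    return total


def kuo_mallick(x, z, y, n, sigma=10.0, psi=0.5, iters=4000, burn=1000,
                step=0.3, seed=1):
    """Return (estimate of Pr(w = 0 | y), posterior mean of gamma given w = 1)."""
    rng = random.Random(seed)
    theta, w = [0.0, 0.0, 0.0], 1
    w_draws, g_draws = [], []

    def logpost(th):
        # Independent N(0, sigma^2) priors on alpha, beta, gamma.
        log_prior = sum(-0.5 * (v / sigma) ** 2 for v in th)
        return loglik(th, w, x, z, y, n) + log_prior

    for t in range(iters):
        for j in range(3):
            if j == 2 and w == 0:
                # gamma drops out of the likelihood, so its full conditional is the prior.
                theta[2] = rng.gauss(0.0, sigma)
                continue
            prop = list(theta)
            prop[j] += rng.gauss(0.0, step)  # random-walk Metropolis proposal
            if math.log(1.0 - rng.random()) < logpost(prop) - logpost(theta):
                theta = prop
        # The full conditional of w is Bernoulli; sample it directly (Gibbs step).
        log_odds = (loglik(theta, 1, x, z, y, n) - loglik(theta, 0, x, z, y, n)
                    + math.log(psi / (1.0 - psi)))
        w = 1 if rng.random() < inv_logit(log_odds) else 0
        if t >= burn:
            w_draws.append(w)
            if w == 1:
                g_draws.append(theta[2])
    p_null = 1.0 - sum(w_draws) / len(w_draws)
    g_mean = sum(g_draws) / len(g_draws) if g_draws else float("nan")
    return p_null, g_mean


def simulate(a, b, g, n=20, seed=7):
    """Simulate deaths for 6 hypothetical dose levels in each sex (z = 1 for one sex)."""
    rng = random.Random(seed)
    x = [0, 1, 2, 3, 4, 5] * 2
    z = [0] * 6 + [1] * 6
    y = [sum(rng.random() < inv_logit(a + b * xi + g * zi) for _ in range(n))
         for xi, zi in zip(x, z)]
    return x, z, y


if __name__ == "__main__":
    # Assumed "true" values used only to generate illustrative data.
    x, z, y = simulate(a=-3.5, b=1.0, g=1.0)
    p_null, g_mean = kuo_mallick(x, z, y, n=20)
    print(f"Pr(w = 0 | y) ~ {p_null:.2f}; mean of gamma given w = 1: {g_mean:.2f}")
```

One design point deserves a caution: when $w = 0$ the sampler draws $\gamma$ from its vague prior, so most of those draws lie far from values supported by the data and the chain can be slow to switch back to $w = 1$. Longer runs, or a less diffuse prior, may be needed before the estimate of $\Pr(w = 0 \mid y)$ stabilizes.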

### 2.5.4 Assessment and Comparison of Models

In this chapter we have described both classical and Bayesian approaches for conducting model-based inference. In applications of either approach, the role of the approximating model of the data is paramount, as noted in Section 2.2. The approximating model provides an unambiguous specification of the data-gathering process and of one or more ecological processes that are the targets of scientific interest.

Statistical theory assures us that inferences computed using either classical or Bayesian approaches are valid, provided the model correctly specifies the processes that have generated the data (i.e., nature and sampling). However, since we never know all of the processes that might have influenced the data, we are never really able to determine the accuracy of an approximating model. What we can do is define, in precise terms, the operating characteristics of an 'acceptable' model. This definition of acceptability is necessarily subjective, but it provides an honest basis for assessing a candidate model's adequacy. This approach is also useful in the comparison and selection of alternative approximating models. For example, if we are asked questions such as, "Is model A better than model B?" or "Should model A be selected in favor of model B?", a clear definition is needed for what 'better than' means in the context of the scientific problem.

In defining the operating characteristics of an acceptable model, one clearly must consider how the model is to be used. Will the model be used simply to compute inferences for a single data set, or will the model be used to make decisions that depend on the model's predictions? In either case, decision theory may be used to define acceptability formally in terms of a utility function that specifies the benefits and costs of accepting a particular model (Draper, 1996). Similarly, utility functions may be used to specify the benefits and costs of selecting a particular model in favor of others. Such functions provide a basis for deciding whether a model should be simplified, expanded, or left alone (that is, assuming the model is acceptable as is) (Spiegelhalter, 1995; Key et al., 1999). In ecology a popular method of model selection involves the comparison of a scalar-valued criterion, such as the Akaike information criterion (AIC) (Burnham and Anderson, 2002) or the deviance information criterion (DIC) (Spiegelhalter et al., 2002). This approach is an application of decision theory, though the utility function on which the criterion is based is not always stated explicitly.

The need for a unified theory or set of procedures for assessing and comparing statistical models seems to be more important than ever, given the complexity of models that can be fitted with today's computing algorithms (e.g., MCMC samplers) and software. Unfortunately, no clear solution or consensus seems to have emerged regarding this problem. There does seem to be a growing appreciation among many statisticians that, while inference and prediction are best conducted using Bayes' theorem, the evaluation and comparison of alternative models are best accomplished from a frequentist perspective (Box, 1980; Rubin, 1984; Draper, 1996; Gelman et al., 1996; Little, 2006). The idea here is that if a model's inferences and predictions are to be well-calibrated, they should have good operating characteristics in a sequence of hypothetical repeated samples.

In the absence of a unified theory for assessing and comparing statistical models, a variety of approaches are currently practiced. We do not attempt to provide an exhaustive catalog of these approaches because recent reviews (Claeskens and Hjort, 2003; Kadane and Lazar, 2004) and books (Zellner et al., 2001; Burnham and Anderson, 2002; Miller, 2002) on the subject are available. Instead we list a few of the more commonly used methods, noting some of their advantages and disadvantages.

In Section 2.5.1 we described the likelihood ratio test for comparing nested models and for assessing the goodness of fit of models of counts. These are frequentist procedures based on the calculation of MLEs and on their asymptotic distributions in repeated samples. We also described a Bayesian procedure for model selection that is based on the calculation of posterior model probabilities. An advantage of the Bayesian procedure is that it can be used to select among non-nested models; however, the posterior model probabilities can be extremely sensitive to the form of priors assumed for model parameters when such priors are intended to convey little or no information (Kass and Raftery, 1995; Kadane and Lazar, 2004). Several remedies have been proposed for this deficiency (e.g., intrinsic Bayes factors, fractional Bayes factors, etc.), but none is widely used or is without criticism, as noted by Kadane and Lazar (2004).

Some approaches to model selection involve the comparison of an omnibus criterion, which typically values a model's goodness of fit and penalizes a model's complexity in the interest of achieving parsimony. Examples of such criteria include the Akaike, the Bayesian, and the deviance information criteria (i.e., AIC, BIC, and DIC) (Burnham and Anderson, 2002; Spiegelhalter et al., 2002), but there are many others (Claeskens and Hjort, 2003).
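As a minimal sketch of how two of these criteria trade fit against complexity, the snippet below computes AIC and BIC from a maximized log-likelihood using their standard definitions. The log-likelihood values, parameter counts, and sample size are hypothetical numbers chosen for illustration. DIC is omitted because it requires posterior draws of the deviance.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 log L + 2k, with k estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k log(n), with n observations."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical maximized log-likelihoods and parameter counts for two models.
candidates = {"null": (-112.4, 2), "alternative": (-106.9, 3)}
n_obs = 240  # assumed sample size

for name, (ll, k) in candidates.items():
    print(name, round(aic(ll, k), 1), round(bic(ll, k, n_obs), 1))

# Smaller values are preferred under either criterion.
best = min(candidates, key=lambda m: aic(*candidates[m]))
print("lowest AIC:", best)
```

The two criteria differ only in the penalty term, so BIC favors the smaller model more strongly than AIC once $\log(n) > 2$.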

Alternative approaches to model selection recognize that some components of a model's specification may not be adequately summarized in a single omnibus criterion. For example, scientific interest may be focused on a particular quantity that requires predictions of the model. In this instance, the operating characteristics of a model's predictions of the scientifically relevant estimand should be used to define a basis for comparing models. Examples of this approach include the use of posterior-predictive checks (Laud and Ibrahim, 1995; Gelman et al., 1996; Gelfand and Ghosh, 1998; Chen et al., 2004) and the focused information criterion (Claeskens and Hjort, 2003).

At this point, the reader may realize that the list of approaches for assessing and comparing statistical models is long. In fact, the published literature on this subject is vast. That the field of Statistics has not produced a unified theory or set of procedures for assessing and comparing models may seem disconcerting; however, given the wide variety of problems to which models are applied, the absence of unified procedures is perhaps understandable. At present, the construction and evaluation of statistical models necessarily include elements of subjectivity, and perhaps that is as it should be. One cannot automate clear thinking and the subjective inputs required for a principled analysis of data.