As with any human endeavor, the process of science shares many characteristics with "everyday" activities. For example, observations of recurring events - a fundamental attribute of science - are used to infer general patterns in shopping, cooking, and donning clothing: individuals and institutions rely on their observations and previous experience to make decisions about purchasing items, preparing food, and selecting clothing. This discussion, however, focuses on features that are unique to science. It assumes that science is obliged in part to offer explanatory and predictive power about the natural world. An additional assumption is that the scientific method, which includes explicit hypothesis testing, is the most efficient technique for acquiring reliable knowledge. The scientific method should be used to elucidate mechanisms underlying observed patterns; such elucidation is the key to predicting and understanding natural systems (Levin 1992; but see Pickett et al. 1994). In other words, we can observe patterns in nature and ask why a pattern occurs, and then design and conduct experiments to try to answer that question. The answer to the question "why" not only gives us insight into the system in which we are interested, but also gives us direction for the manipulation and management of that resource (Gavin 1989, 1991).
From a modern scientific perspective, a hypothesis is a candidate explanation for a pattern observed in nature (Medawar 1984; Matter and Mannan 1989); that is, a hypothesis is a potential reason for the pattern and it should be testable and falsifiable (Popper 1981). Hypothesis testing is a fundamental attribute of science that is absent from virtually all other human activities. Science is a process by which competing hypotheses are examined, tested, and rejected. Failure to falsify a hypothesis with an appropriately designed test is interpreted as confirmatory evidence that the hypothesis is accurate, although it should be recognized that alternative and perhaps as yet unformulated hypotheses could be better explanations.
A hypothesis is not merely a statement likely to be factual, which is then "tested" by observation (McPherson 2001a). If we accept any statement (e.g., one involving a pattern) as a hypothesis, then the scientific method need not be invoked - we can merely look for the pattern. Such statements are not hypotheses (although the term is frequently applied to them); they are more appropriately called predictions. Indeed, if observation is sufficient to develop reliable knowledge, then science has little to offer beyond everyday activities. Much ecological research is terminated after the discovery of a pattern and the cause of the pattern is not determined (Romesburg 1981; Willson 1981). For example, multiple petitions to list the northern goshawk (Accipiter gentilis atricapillus) under the Endangered Species Act of 1973 as a Threatened or Endangered Species in the western United States prompted several studies of its nesting habitat (Kennedy 1997; DeStefano 1998). One pattern that emerged from these studies is that goshawks, across a broad geographical range from southeastern Alaska to the Pacific Northwest to the southwestern United States, often build their nests in forest stands with old-growth characteristics, i.e., stands dominated by large trees and dense cover formed by the canopy of these large trees (Daw et al. 1998). This pattern has been verified, and the existence of the pattern is useful information for the conservation and management of this species and its nesting habitat. However, because these studies were observational and not experimental, we do not know why goshawks nest in forest stands with this kind of structure. Some likely hypotheses include protection offered by old-growth forests against predators, such as great horned owls (Bubo virginianus), or unfavorable weather in secondary forests, such as high ambient temperatures during the summer nesting season.
An astute naturalist with sufficient time and energy could have detected and described this pattern, but the scientific method (including hypothesis testing) is required to answer the question of why. Knowledge of the pattern increases our information base; knowledge of the mechanism underlying the pattern increases our understanding (Figure 1.1).
Some researchers have questioned the use of null hypothesis testing as a valid approach in science. The crux of the argument is aimed primarily at: (1) the development of trivial or "strawman" null hypotheses that we know a priori will be false; and (2) the selection of an arbitrary α-level or P-value, such as 0.05 (Box 1.1). We encourage readers to peruse and consider the voluminous and growing literature on this topic (e.g., Harlow et al. 1997; Cherry 1998; Johnson 1999; Anderson et al. 2000). Researchers such as Burnham and Anderson (1998) argue that we should attempt to estimate the magnitude of differences between or among experimental groups (an estimation problem) and then decide if these differences are large enough to justify inclusion in a model (a model selection problem). Inference would thus be based on multiple model building and would use information theoretic techniques, such as Akaike's Information Criterion (AIC) (Burnham and Anderson 1998), as an objective means of selecting models from which to derive estimates and variances of parameters of interest (Box 1.2). In addition, statistical hypothesis testing can, and should, go beyond simple tests of significance at a predetermined P-value, especially when the probability of rejecting the null hypothesis is high. For example, to test the null hypothesis that annual survival rates for male and female mule deer do not differ is to establish a "strawman" hypothesis (D. R. Anderson, personal communication; Harlow et al. 1997). Enough is known about the demography of deer to realize that the annual survival of adult females differs from that of adult males. Thus, rejecting this null hypothesis does not advance our knowledge. In this and many other cases, it is time to advance beyond a simple rejection of the null hypothesis and to seek accurate and precise estimates of parameters of interest (e.g., survival) that will indicate whether and by how much the survival rates differ for these age-and-sex cohorts. Another approach is to design an experiment rather than an observational study, and to craft more interesting hypotheses: for example, does application of a drug against avian cholera improve survival in snow geese? In this case, determining the size of the difference would be important, but even a simple rejection of the null hypothesis would be interesting and informative.
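The move from a bare significance test to estimation can be sketched with the mule deer example. The counts below are hypothetical, invented only for illustration, and the Wald interval is a deliberately simple stand-in for the survival models a real study would use (it assumes independent animals with known fates). The point is only that the output is an estimate of how different the survival rates are, together with its precision, rather than a yes-or-no rejection.

```python
import math


def survival_difference(survived_a, n_a, survived_b, n_b, z=1.96):
    """Estimate the difference between two survival proportions.

    Returns (difference, lower, upper), where lower and upper bound an
    approximate 95% Wald confidence interval when z = 1.96.
    """
    p_a = survived_a / n_a
    p_b = survived_b / n_b
    diff = p_a - p_b
    # Standard error of the difference between two independent proportions
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


# Hypothetical data: (animals surviving the year, animals monitored)
females = (170, 200)
males = (90, 150)

diff, lower, upper = survival_difference(*females, *males)
print(f"Estimated difference in annual survival: {diff:.3f}")
print(f"Approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero carries the same information as rejecting the null hypothesis, but it also reports the magnitude of the difference and the uncertainty around it, which is what a manager would need.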
The testing of null hypotheses has been a major approach used by ecologists to examine questions about natural systems (Cherry 1998; Anderson et al. 2000). Simply stated, null hypotheses are phrased to state that there is no difference between two or more populations or between treatment and control groups. The researcher then hopes to find that there is indeed a difference at some prescribed probability level - often P ≤ 0.05, sometimes P ≤ 0.1. Criticism of the null hypothesis approach has existed in some scientific fields for a while, but is relatively new to ecology. Recent criticism of null hypothesis testing and the reporting of P-values in ecology has ranged from suggested overuse and abuse to absolute frivolity and nonsensicality, and null hypotheses have been termed strawman hypotheses (i.e., statements that the scientist knows from the outset are not true) by some authors. Opponents of null hypothesis testing also complain that this approach often confuses the interpretation of data, adds very little to the advancement of knowledge, and is not even a part of the scientific method (Cherry 1998; Johnson 1999; Anderson et al. 2000).
Alternatives to the testing of null hypotheses and the reporting of P-values tend to focus on the estimation of parameters of interest and their associated measures of variability. The use of confidence interval estimation or Bayesian inference has been suggested as a superior approach (Cherry 1996). Possibly the most compelling alternative is the use of information theoretic approaches, which use model building and selection, coupled with intimate knowledge of the biological system of interest, to estimate parameters and their variances (Burnham and Anderson 1998). The questions then focus on the values of parameters of interest, confidence in the estimates, and how estimates vary among the populations of interest. Before any of these approaches is practiced, however, the establishment of clear questions and research hypotheses, rather than null hypotheses, is essential.
These arguments against the use of statistical hypotheses are compelling and important, but are different, in our view, from the development of research hypotheses and the testing of these hypotheses in an experimental framework. It is the latter that we suggest is fundamental to advancing our knowledge of ecological processes and our ability to apply that knowledge to management problems.
Inference from models can take many forms, some of which are misleading. For example, collection of large amounts of data as fodder for multivariate models without a clear purpose can lead to spurious results (Rexstad et al. 1988; Anderson et al. 2001). A relatively new wave of model selection and inference, however, is based on information theoretic approaches. Burnham and Anderson (1998:1) describe this as "making valid inferences from scientific data when a meaningful analysis depends on a model." This approach is based on the concept that the data, no matter how large the data set, will only support limited inference. Thus, a proper model has: (1) the full support of the data, (2) enough parameters to avoid bias, and (3) not too many parameters (so that precision is not lost). The latter two criteria combine to form the "Principle of Parsimony" (Burnham and Anderson 1992): a trade-off between the extremes of underfitting (not enough parameters) and overfitting (too many parameters) the model, given a set of a priori alternative models for the analysis of a given data set.
One objective method of evaluating a related set of models is "Akaike's Information Criterion" (AIC), based on the pioneering work of mathematician Hirotugu Akaike (Parzen et al. 1998). A simplified version of the AIC equation can be written as:
AIC = DEV + 2K
where DEV is deviance and K is the number of parameters in the model. The first component, DEV, measures lack of fit: as more parameters (structure) are added to the model, the fit improves and DEV decreases. If model selection were based only on this component, one would always end up selecting the model with the most parameters, which usually results in overfitting, especially with complex data sets. The second component, 2K, serves as a "penalty" that increases as the number of parameters increases. AIC thus strikes a balance between overfitting and underfitting. Many software packages now compute AIC. In very general terms, the model with the lowest AIC value is the "best" model, although other approaches such as model averaging can be applied.
The development of models within this protocol depends on the a priori knowledge of both ecologists and analysts working together, rather than the blind use of packaged computer programs. Information theoretic approaches allow for the flexibility to develop a related set of models, based on empirical data, and to select among or weight those models based on objective criteria. Parameters of interest, such as survival rates or abundance, and their related measures of variance can be computed under a unified framework, thereby giving the researcher confidence that these estimates were determined in an objective manner.
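The simplified formula AIC = DEV + 2K can be illustrated with a minimal sketch. The three candidate models, their deviances, and their parameter counts are hypothetical values chosen for illustration; in practice they would come from fitting a set of a priori models to the same data set. The Akaike weights computed at the end are not part of the simplified equation; they are the standard rescaling used in the information theoretic literature when models are weighted or averaged rather than a single one being selected.

```python
import math


def aic(deviance, k):
    """Simplified AIC: deviance (lack of fit) plus a penalty of 2 per parameter."""
    return deviance + 2 * k


def akaike_weights(aic_values):
    """Rescale AIC values to weights that sum to 1 (relative support per model)."""
    best = min(aic_values.values())
    relative = {name: math.exp(-0.5 * (value - best))
                for name, value in aic_values.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical candidate set: model name -> (deviance, number of parameters K)
candidates = {
    "constant survival": (412.0, 1),
    "survival by sex": (398.5, 2),
    "survival by sex and age": (397.9, 4),
}

aic_values = {name: aic(dev, k) for name, (dev, k) in candidates.items()}
best_model = min(aic_values, key=aic_values.get)
weights = akaike_weights(aic_values)

for name, value in sorted(aic_values.items(), key=lambda item: item[1]):
    print(f"{name:26s} AIC = {value:6.1f}  weight = {weights[name]:.3f}")
print(f"Lowest AIC: {best_model}")
```

In this hypothetical set the most heavily parameterized model has the lowest deviance, but its small gain in fit does not offset the penalty for two extra parameters, so the intermediate model has the lowest AIC.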
Use of sophisticated technological (e.g., microscopes) or methodological (e.g., statistical) tools does not imply that hypothesis testing is involved, if these tools are used merely to detect a pattern. Pattern recognition (i.e., assessment of statements likely to be factual) often involves significant technological innovation. In contrast, hypothesis testing is a scientific activity that need not involve state-of-the-art technology.