Ecologists often use variables that are neither quantitative nor ordered (Table 1.2). Variables of this type may be of physical or biological nature. Examples of qualitative physical descriptors are the colour, locality, geological substrate, or nature of surface deposits. Qualitative biological descriptors include the captured or observed species; the different states of this nonordered descriptor are the different possible species. Likewise, the presence or absence of a species cannot, in most cases, be analysed as a quantitative variable; it must be treated as a semiquantitative or qualitative descriptor. A third group of qualitative descriptors includes the results of classifications — for example, the biological associations to which the zooplankton of various lakes belong, or the chemical groups describing soil cores. Such classifications, obtained or not by clustering (Chapter 8), define qualitative descriptors and, as such, they are amenable to numerical interpretation (see Chapter 10).

The present Chapter discusses the analysis of qualitative descriptors; methods appropriate for bivariate or multivariate analysis are presented. Information theory is an intuitively appealing way of introducing these methods of analysis. Section 6.1 shows how the amount of information in a qualitative descriptor may be measured. This paradigm is then used in the following sections.

The comparison of qualitative descriptors is based on contingency tables. In order to compare pairs of qualitative descriptors, the objects are first allocated to the cells of a table with two criteria (i.e. the rows and columns). In a two-way contingency table, the number of rows is equal to the number of states of the first descriptor and the number of columns to that of the second descriptor. Any cell in the table, at the intersection of a row and a column, corresponds to one state of each descriptor; the number of objects with these two states is recorded in this cell. The analysis of two-way contingency tables is described in Section 6.2. When there are more than two descriptors, multiway (or multidimensional) contingency tables are constructed as extensions of two-way tables. Their analysis is discussed in Section 6.3. Finally, Section 6.4 deals with the correspondence between descriptors in a contingency table.
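As an illustration of the allocation step described above, the following minimal sketch cross-tabulates two qualitative descriptors observed on the same objects. The function name `contingency_table` and the substrate and species data are hypothetical, chosen only for the example; ordering the states alphabetically is likewise an arbitrary convention.

```python
from collections import Counter


def contingency_table(a, b):
    """Cross-tabulate two qualitative descriptors observed on the same objects.

    Returns the row states, the column states, and a list of rows in which
    each cell holds the number of objects with that pair of states.
    """
    if len(a) != len(b):
        raise ValueError("both descriptors must be observed on the same objects")
    row_states = sorted(set(a))  # states of the first descriptor (rows)
    col_states = sorted(set(b))  # states of the second descriptor (columns)
    counts = Counter(zip(a, b))  # frequency of each (row state, column state) pair
    table = [[counts[(r, c)] for c in col_states] for r in row_states]
    return row_states, col_states, table


# Hypothetical observations: one substrate and one species per object
substrate = ["sand", "sand", "rock", "mud", "rock", "sand", "mud", "rock"]
species = ["sp1", "sp2", "sp1", "sp2", "sp1", "sp2", "sp2", "sp1"]

rows, cols, table = contingency_table(substrate, species)
for state, row in zip(rows, table):
    print(state, row)
```

The table has as many rows as the first descriptor has states and as many columns as the second, and the cell counts sum to the number of objects.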

Contingency table analysis is the qualitative equivalent of both correlation analysis and analysis of variance; in the particular case of a two-way contingency table, the analysis is the equivalent of a one-way Anova. It involves the computation of X² (chi-square) statistics or related measures, instead of correlation or F statistics. Two types of null hypotheses (H0) may be tested. The first one is the independence of the two descriptors, which is the usual null hypothesis in correlation analysis (H0: the correlation coefficient ρ = 0 in the statistical population). The second type of hypothesis is similar to that of the analysis of variance. In a two-way contingency table, the classification criterion of the analysis of variance corresponds to the states of one of the descriptors. The null hypothesis says that the distributions of frequencies among the states of the second descriptor (dependent variable) are the same among the groups defined by the states of the first descriptor. In other words, the observations form a homogeneous group. For example, if the groups (classification criterion) form the columns whereas the dependent variable is in the rows, H0 states that the frequency distributions in all columns are the same. These two types of hypotheses require the calculation of the same expected values and the same test statistics. In multiway tables, the hypotheses tested are often quite complex because they take into account interactions among the descriptors (Section 6.3).
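The shared computation behind both hypotheses can be sketched as follows. This is an illustration under stated assumptions, not the chapter's own procedure: it uses the Pearson form of the chi-square statistic, which is only one of the "related measures" mentioned above, and it assumes that no row or column total is zero. The function names and the frequencies are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts: (row total * column total) / grand total.

    The same values arise under H0 of independence and under H0 of
    homogeneity of the frequency distributions among groups.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_chi_square(table):
    """Pearson chi-square statistic and its degrees of freedom.

    Assumes every row and column total is positive, so that no
    expected value is zero.
    """
    expected = expected_frequencies(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2 x 2 table of observed frequencies
observed = [[10, 20], [30, 40]]
expected = expected_frequencies(observed)
stat, dof = pearson_chi_square(observed)
print(expected, stat, dof)
```

The statistic would then be compared with a chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of rows and columns.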

Considering species data, the various species observed at a sampling site are the states of a qualitative multi-state descriptor. Section 6.5 will discuss species diversity as a measure of dispersion of this qualitative descriptor.
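As a rough illustration of measuring the dispersion of a qualitative descriptor, the sketch below computes Shannon's entropy of the species observed at a site. Taking Shannon's formula as the information measure is an assumption made for this example; the passage above does not itself name the measure. The function name and the species lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (in bits) of the states of a qualitative descriptor.

    `states` holds one state per observed object, for instance the species
    name of each captured individual.
    """
    counts = Counter(states)
    n = sum(counts.values())
    # Sum of -p * log2(p) over the states actually observed
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical sites: four equally frequent species versus a single species
even_site = ["sp1", "sp2", "sp3", "sp4"] * 5
single_species_site = ["sp1"] * 20

h_even = shannon_entropy(even_site)
h_single = shannon_entropy(single_species_site)
print(h_even, h_single)
```

Under this measure, dispersion is greatest when the individuals are spread evenly among the species and is zero when all individuals belong to the same species.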

The mathematics used throughout this chapter are quite simple and require no prior knowledge other than the intuitive notion of probability. Readers interested in applications only may skip Section 6.1 and come back to it when necessary. To simplify the notation, the following conventions are followed throughout this chapter. When a single descriptor is considered, this descriptor is called a and its states have subscripts i going from 1 to q, as in Fig. 1.1. In two-way contingency tables, the descriptors are called a and b. The states of a are denoted a_i, with subscripts i varying from 1 to r (number of rows), while the states of b are denoted b_j, with subscripts j varying from 1 to c (number of columns).
