The meaningful components

The successive principal components correspond to progressively smaller fractions of the total variance. One problem is therefore to determine how many components are meaningful in ecological terms or, in other words, what the number of dimensions of the reduced space should be. The best approach may be to study the representativeness of the projections in reduced space for two, three, or more dimensions, using Shepard diagrams (Fig. 9.1). However, principal component analysis being a form of variance partitioning, researchers may wish to test the significance of the variance associated with the successive principal axes.

There are a number of classical statistical approaches to this question, such as Bartlett's (1950) test of sphericity. These approaches have been reviewed by Burt (1952) and Jackson (1993). The problem is that these formal tests require normality of all descriptors, a condition which is rarely met by ecological data.

Kaiser-Guttman criterion

There is an empirical rule suggesting that one should only interpret a principal component if the corresponding eigenvalue λ is larger than the mean of the λ's. In the particular case of standardized data, where S is a correlation matrix, the mean of the λ's is 1 so that, according to the rule, only the components whose λ's are larger than 1 should be interpreted. This is the so-called Kaiser-Guttman criterion. Ibanez (1973) has provided a theoretical framework for this empirical rule. He showed that, if a variable made of randomly selected numbers is introduced among the descriptors, it is impossible to interpret the eigenvectors that follow the one on which this random-number variable has the highest loading. One can show that this random-number variable, which has covariances near zero with all the other descriptors, introduces in the analysis an eigenvalue equal to 1 if the descriptors have been standardized. For non-standardized descriptors, this eigenvalue is equal to the mean of the λ's if the variance of the random-number variable is made equal to the mean variance of the other descriptors.
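The rule can be sketched in a few lines of Python. The function name `kaiser_guttman` and the example eigenvalues are hypothetical, chosen only for illustration; the eigenvalues are assumed to have been obtained beforehand from a principal component analysis.

```python
from statistics import mean

def kaiser_guttman(eigenvalues):
    """Return the 0-based positions of the eigenvalues larger than their mean."""
    # The threshold is the mean eigenvalue; it equals 1 when the analysis
    # is carried out on a correlation matrix (standardized descriptors).
    threshold = mean(eigenvalues)
    return [i for i, lam in enumerate(eigenvalues) if lam > threshold]

# Hypothetical eigenvalues for 4 standardized descriptors (they sum to 4).
example = [2.2, 1.1, 0.5, 0.2]
print(kaiser_guttman(example))  # prints [0, 1]
```

With these illustrative values, only the first two components would be retained for interpretation.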

Broken stick

Frontier (1976) proposed to compare the list of decreasing eigenvalues to the decreasing values of the broken stick model (Subsection 6.5.2). This comparison is based on the following idea. Consider the variance shared among the principal axes to be a resource embedded in a stick of unit length. If principal component analysis had divided the variance at random among the principal axes, the fractions of total variation explained by the various axes would be about the same as the relative lengths of the pieces obtained by breaking the unit stick at random into as many pieces as there are axes. If a unit stick is broken at random into p = 2, 3, ... pieces, the expected values (E) of the relative lengths of the successively smaller pieces (j) are given by eq. 6.49:

$$E(\text{piece } j) = \frac{1}{p} \sum_{x=j}^{p} \frac{1}{x} \qquad (6.49)$$

The expected values are equal to the mean lengths that would be obtained by breaking the stick at random a large number of times and calculating the mean length of the longest pieces, the second longest pieces, etc. A stick of unit length may be broken at random into p pieces by placing on the stick (p - 1) random break points selected using a uniform [0, 1] random number generator. Frontier (1976) has computed the percentage of variance associated with successive eigenvalues, under the broken stick null model, for 2 to 20 eigenvalues (Table D, end of this book).
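As an illustration, the following Python sketch computes the expected lengths both ways. It assumes that eq. 6.49 is the usual broken stick formula, E(piece j) = (1/p) Σ 1/x for x = j to p, and it approximates the same values by the random breaking procedure described above. The function names, the number of trials and the seed are arbitrary choices made for this example.

```python
import random

def broken_stick_expected(p):
    """Expected relative lengths, longest first (formula assumed for eq. 6.49)."""
    return [sum(1.0 / x for x in range(j, p + 1)) / p for j in range(1, p + 1)]

def broken_stick_simulated(p, n_trials=20000, seed=0):
    """Mean lengths of the longest, second longest, ... pieces over many breaks."""
    rng = random.Random(seed)
    totals = [0.0] * p
    for _ in range(n_trials):
        # Place (p - 1) uniform [0, 1] break points on the unit stick.
        cuts = sorted(rng.random() for _ in range(p - 1))
        edges = [0.0] + cuts + [1.0]
        # Lengths of the p pieces, sorted from longest to shortest.
        pieces = sorted((b - a for a, b in zip(edges, edges[1:])), reverse=True)
        for rank, length in enumerate(pieces):
            totals[rank] += length
    return [t / n_trials for t in totals]

print(broken_stick_expected(3))
print(broken_stick_simulated(3))
```

For p = 3 the formula gives 11/18, 5/18 and 2/18, and the simulated means should lie close to these values when the number of trials is large.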

Coming back to the eigenvalues, it would be meaningless to interpret the principal axes that explain a fraction of the variance as small as or smaller than that predicted by the broken stick null model. The test may be carried out in two ways. One may compare individual eigenvalues to individual predictions of the broken stick model (Table D) and select for interpretation only the eigenvalues that are larger than the values predicted by the model. Or, to decide whether eigenvalue λ_k should be interpreted, one may compare the sum of the eigenvalues from 1 to k to the sum of the values from 1 to k predicted by the model. This test usually recognizes the first two or three principal components as meaningful; this corresponds to the experience of ecologists.
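The two ways of carrying out the comparison may be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the example eigenvalues are hypothetical, the eigenvalues are expressed as fractions of their sum before being compared to the model, the broken stick values are computed from the formula assumed for eq. 6.49 rather than read from Table D, and a small tolerance (an assumption of this sketch) treats ties as "not larger".

```python
def broken_stick_expected(p):
    """Expected relative lengths, longest first (formula assumed for eq. 6.49)."""
    return [sum(1.0 / x for x in range(j, p + 1)) / p for j in range(1, p + 1)]

def compare_to_broken_stick(eigenvalues, tol=1e-12):
    """Return two lists of booleans, one per axis: individual and cumulative tests."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    observed = [lam / total for lam in ordered]  # fractions of total variance
    expected = broken_stick_expected(len(ordered))

    # First way: each eigenvalue against its own broken stick prediction.
    individual = [o > e + tol for o, e in zip(observed, expected)]

    # Second way: sum of eigenvalues 1..k against the sum of predictions 1..k.
    cumulative = []
    obs_sum = exp_sum = 0.0
    for o, e in zip(observed, expected):
        obs_sum += o
        exp_sum += e
        cumulative.append(obs_sum > exp_sum + tol)
    return individual, cumulative

# Hypothetical eigenvalues, for illustration only.
individual, cumulative = compare_to_broken_stick([4.0, 2.0, 1.0, 0.6, 0.4])
print(individual)
print(cumulative)
```

With these illustrative values the individual comparison retains only the first axis among the leading ones, whereas the cumulative comparison retains the first three. The two procedures need not agree, and how to treat an isolated late axis that exceeds its prediction is left to the analyst.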

After an empirical study covering a variety of matrix types, with both simulated and real ecological data, Jackson (1993) concluded that two methods consistently pointed to the correct number of ecologically meaningful components in data sets: the broken-stick model and a bootstrapped eigenvalue-eigenvector method proposed in his paper.

Chapter 10 will discuss how to use explanatory variables to ecologically interpret the first few principal components that are considered to be meaningful according to one of the criteria mentioned in the present Subsection.