Adequacy of the PCA Solution

PCA generates as many PC axes as there are variables. Ideally, the axes with larger eigenvalues describe trends in the data, whereas the axes with smaller eigenvalues represent only random variation. There are no authoritative rules for deciding how many PCs are interpretable. Initial recommendations were based on the cumulative percentage of variation explained by the eigenvalues. However, ecological data sets differ in their correlation structure, so an arbitrary level of variation (e.g., 75-95%) is not a biologically relevant criterion, and this approach has been widely abandoned. A plot of the eigenvalues against axis order (a scree plot) can guide the identification of 'important' PCs (Figure 5). Scree plots can be used to visually identify breaks between PCs that potentially explain trends and those that represent statistical noise. Typically, the trivial eigenvalues on the right of the scree plot form a linear series, and major changes in magnitude on the left may represent trends. The efficacy of this visual determination of breaks depends on the underlying data structure.

Figure 5 Scree plot of the proportion of variation explained by each successive principal component (PC) of an analysis on the covariance matrix of $x^{0.25}$-transformed triplefin data (filled circle; x-axis: principal component axis number). The Kaiser-Guttman criterion (average proportion explained by eigenvalues) suggests that three axes should be retained for analysis. The intersection of the broken stick model (open circle) with the scree plot suggests that two axes should be retained for analysis. The inflexion point of the scree plot also suggests that three axes should be retained. While it is clear that at least two axes should be retained, most users of PCA would also examine the third axis to ascertain whether it describes some ecologically interpretable pattern.

The Kaiser-Guttman criterion states that eigenvalues exceeding the average eigenvalue should be retained. In a PCA of the correlation matrix, all variables have equal variances, and hence the sum of the eigenvalues is equal to the number of variables and the average eigenvalue is 1. Consequently, the Kaiser-Guttman criterion for a PCA of the correlation matrix is that eigenvalues greater than 1 should be interpreted. While the Kaiser-Guttman criterion seems intuitively reasonable, there is sampling variability in ecological data sets, and so the average expectation may not be a suitable null model. An alternative approach is to use the 'broken stick model' to identify the null distribution of eigenvalues that would be expected if there were no structure in the data (Figure 5). The expected eigenvalue for a given axis under the broken stick model can be calculated as $b_k = \frac{1}{p}\sum_{i=k}^{p}\frac{1}{i}$, where $p$ is the number of variables and $b_k$ is the expected proportional eigenvalue for the $k$th component. Computationally intensive randomization tests such as bootstrap confidence intervals can also be used to identify which eigenvalues are nontrivial. Formal statistical tests such as Bartlett's test of sphericity, and both Bartlett's and Lawley's tests of homogeneity of the correlation matrix, have generally fared poorly in simulations. A general recommendation would be to use the broken stick model to identify nontrivial PC axes if bootstrapping is not available.
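The two retention rules described above can be compared with a few lines of code. The following is a minimal sketch in Python with NumPy, not a reference implementation: the function names and the example eigenvalues are hypothetical and chosen only for illustration (five eigenvalues summing to 5, as would arise from a correlation-matrix PCA of five variables).

```python
import numpy as np


def broken_stick(p):
    """Expected proportional eigenvalues b_k = (1/p) * sum_{i=k}^{p} 1/i."""
    inverse = 1.0 / np.arange(1, p + 1)
    # A reversed cumulative sum gives sum_{i=k}^{p} 1/i for k = 1..p.
    return np.cumsum(inverse[::-1])[::-1] / p


def n_axes_kaiser_guttman(eigenvalues):
    """Count eigenvalues that exceed the average eigenvalue."""
    ev = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(ev > ev.mean()))


def n_axes_broken_stick(eigenvalues):
    """Count leading axes whose observed proportion exceeds b_k."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    observed = ev / ev.sum()
    exceeds = observed > broken_stick(len(ev))
    if exceeds.all():
        return len(ev)
    # Index of the first axis that fails = number of leading axes retained.
    return int(np.argmin(exceeds))


# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [2.5, 1.2, 0.6, 0.4, 0.3]
print("Kaiser-Guttman retains:", n_axes_kaiser_guttman(example_eigenvalues))
print("Broken stick retains:  ", n_axes_broken_stick(example_eigenvalues))
```

For these made-up values the second axis has an eigenvalue above the average (1.2 > 1) but explains less than the broken stick expectation (0.24 < 0.257), so the broken stick model is the more conservative of the two rules here.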

Similar issues exist for interpreting which eigenvectors are important in a PCA. When eigenvectors are normalized, their total length is 1. Consequently, if an eigenvector on a particular PC axis has a value close to 1 for a given variable, then that variable is well represented on that axis and less well represented on other axes. However, if a variable is not strongly associated with any PC, the eigenvectors for that variable should be equal across axes. The expected eigenvector for a variable that is not associated with any PC is known as the equilibrium contribution, and is given by $\sqrt{d/p}$, where $d$ is the number of dimensions of interest and $p$ is the number of variables. This follows because a variable that contributes equally to all $p$ axes has a value of magnitude $\sqrt{1/p}$ on each axis, so its projection onto $d$ axes has length $\sqrt{d/p}$. Eigenvectors with values larger than the equilibrium contribution for a single axis can be considered to be associated with that axis. Similarly, eigenvectors with values larger than the equilibrium contribution for two axes could be considered to be associated with forming a 2D space. If the eigenvectors are not normalized, then the equilibrium contribution must be calculated separately for each variable and is given by $s_j\sqrt{d/p}$, where $s_j$ is the standard deviation of the $j$th variable. If the eigenvectors are normalized, the equilibrium contribution can be presented on a graph as a circle (e.g., Figure 2c).
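The equilibrium contribution can be computed directly from the formulas above. The sketch below is one possible Python/NumPy rendering under stated assumptions: the function names are hypothetical, the eigenvector matrix is assumed to hold variables in rows and PC axes in columns, and the simulated data exist only to make the example runnable.

```python
import numpy as np


def equilibrium_contribution(d, p, s=None):
    """Return sqrt(d/p), or s_j * sqrt(d/p) when standard deviations s are given."""
    radius = np.sqrt(d / p)
    if s is None:
        return radius
    return np.asarray(s, dtype=float) * radius


def exceeds_equilibrium(eigenvectors, axes=(0, 1)):
    """Flag variables whose length in the chosen axes exceeds sqrt(d/p).

    Assumes `eigenvectors` is a p x p array of normalized eigenvectors,
    with one row per variable and one column per PC axis.
    """
    u = np.asarray(eigenvectors, dtype=float)
    axes = list(axes)
    # Length of each variable's projection onto the d selected axes.
    lengths = np.sqrt((u[:, axes] ** 2).sum(axis=1))
    return lengths > equilibrium_contribution(len(axes), u.shape[0])


# Simulated data (hypothetical): 50 observations of 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))
data[:, 1] += 2.0 * data[:, 0]  # induce some correlation structure

# eigh returns eigenvalues in ascending order, so reverse to put PC1 first.
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data, rowvar=False))
eigenvectors = eigenvectors[:, ::-1]

print("Equilibrium radius for d=2, p=4:", equilibrium_contribution(2, 4))
print("Variables exceeding it on PC1-PC2:", exceeds_equilibrium(eigenvectors))
```

On a biplot of normalized eigenvectors, the value returned by `equilibrium_contribution(2, p)` would be the radius of the equilibrium circle.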
