## Biplot Scaling

In addition to data-scaling considerations, biplots themselves can be scaled in different ways, which may in turn alter their interpretation. Two biplot scalings are commonly used to display a PCA. A superimposed plot of the PC scores (F) and the normalized eigenvectors (U) is known as a distance or Euclidean biplot. In this biplot, the PC scores are scaled so that their sum of squares equals the eigenvalue (Λ) for a given axis, the positions of samples in ordination space approximate their distances in Euclidean space, and the eigenvectors represent the projection of the dependent-variable axes into the ordination space (Figure 4a). The length of an eigenvector indicates the contribution of the variable to the space - an eigenvector approaching a length of 1 indicates that the variable contributes strongly to defining the ordination space. In addition, the approximate values of the dependent variables in each sample can be reconstructed by projecting the sample points at right angles onto the eigenvector axes. Another common scaling is to scale the eigenvectors to equal their standard deviations (UΛ^0.5) and to standardize the PC scores to unit sum of squares (G = FΛ^-0.5). This is the covariance (or correlation, if the data have been standardized) biplot. Unlike in the Euclidean biplot, the distances between samples in the reduced space do not approximate their Euclidean distances - they have been standardized by a variance measure. In the covariance biplot the eigenvectors are rescaled to equal the square root of the eigenvalue (cf. normalized to 1 in the Euclidean biplot). This effectively rescales the eigenvectors to standard deviation units, and has some interesting properties. The length of a vector approximates the standard deviation of the variable, not its contribution to the ordination space. The angle between dependent-variable vectors provides a measure of their covariance: covariance ≈ cos θ, where θ is the angle between the dependent-variable vectors (Figure 4b).
If the PCA has been carried out on standardized data, then this angle will represent the correlation. These angles will only provide a good covariance or correlation estimate if the number of samples is large, the vectors are well represented in the analysis, and the variation explained by the axes is large. Both biplots have the property that the centered data can be reconstructed from the sample scores and the variable vectors: FU' = G(UΛ^0.5)' = Y.
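The two scalings and the reconstruction identity FU' = G(UΛ^0.5)' = Y can be checked numerically. The following is a minimal NumPy sketch on synthetic centered data (the data and variable counts are illustrative, not from the triplefin example):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 4))
Y = Y - Y.mean(axis=0)                # column-centre the data

# Eigendecomposition of the sample covariance matrix S = U diag(L) U'
S = np.cov(Y, rowvar=False)
eigvals, U = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]     # sort axes by decreasing eigenvalue
L, U = eigvals[order], U[:, order]    # L: eigenvalues; U: normalized eigenvectors

# Distance (Euclidean) biplot: PC scores F plotted with normalized eigenvectors U
F = Y @ U

# Covariance biplot: eigenvectors in s.d. units, scores standardized
V = U * np.sqrt(L)                    # U Lambda^0.5, one column per axis
G = F / np.sqrt(L)                    # F Lambda^-0.5

# Both scalings reconstruct the centred data: F U' = G (U Lambda^0.5)' = Y
print(np.allclose(F @ U.T, Y), np.allclose(G @ V.T, Y))   # prints True True

# Covariance-biplot vector lengths equal the variables' standard deviations,
# and the vector inner products recover the full covariance matrix
print(np.allclose(np.linalg.norm(V, axis=1), np.sqrt(np.diag(S))))  # True
print(np.allclose(V @ V.T, S))                                      # True
```

Note that these identities are exact only when all axes are retained; in a two-axis biplot they hold approximately, which is why a large explained variance is needed for the angles to be trustworthy.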

The correlations between the original variables and the values of the samples on the Euclidean PC axes may also be used to project dependent variables into a reduced-space plot. These values are often termed factor loadings or factor patterns, but their usage should be treated with caution. If the PCA has been carried out on standardized data (i.e., on the correlation matrix), then the covariance biplot eigenvector scaling (UΛ^0.5) is equal to the factor

[Figure 3 panels: (a) raw data, covariance matrix; (b) raw data, correlation matrix; (c) fourth-root transformed data, covariance matrix. Principal component 1 explains 66.9% of the variance in the covariance analysis and 32.0% in the correlation analysis; the lower panels plot the species eigenvectors.]

Figure 3 Effects of data transformation and standardization on a PCA of triplefin assemblages. The top row of graphs shows the reduced-space plots; the bottom row shows the corresponding eigenvector plots with equilibrium contribution circles. (a) The untransformed covariance matrix analysis is strongly influenced by the two numerically dominant species, Notoclinops segmentatus and Forsterygion varium. (b) The untransformed correlation matrix analysis reduces the influence of the numerically dominant species and increases the weight of the rarer species. (c) Covariance analysis of fourth-root transformed data also reduces the influence of the numerically dominant species and increases the weight of the rarer species.

[Figure 4 panels: (a) normalized eigenvectors; (b) eigenvectors scaled to the square root of the eigenvalue (UΛ^0.5); (c) factor loadings.]
Figure 4 Effects of eigenvector scaling on a PCA of the raw covariance matrix. (a) Normalized eigenvectors associated with distance biplots are scaled to a length of 1. (b) Covariance biplots scale the eigenvectors so that the length of each vector approximates its variable's standard deviation, and the cosine of the angle between variables approximates their covariance. (c) Factor loadings rescale the covariance biplot scaling by the standard deviation of the variable. This gives an estimate of the importance of the axis in explaining the variance of the variable - it does not represent the importance of the variable in explaining the axis. Note that if the PCA was carried out on the correlation matrix, the factor loadings would equal the covariance biplot scaling because each variable's standard deviation is made equal to 1. In this example the ordination space is defined primarily by two species - Notoclinops segmentatus and Forsterygion varium.

loadings. Differences may occur when the analysis has been conducted on a covariance matrix. Factor loadings for each variable are calculated by dividing the covariance-biplot-scaled eigenvectors by each variable's standard deviation. This has important consequences for interpretation of the plot. A variable could be very highly correlated with a PC axis, for example, but if it had a small variance then the factor loading might designate it as unimportant. Conversely, variables with large variances might appear to be strongly associated with an axis when in fact they contribute nothing to its construction. Factor loadings describe how important an axis is to a variable, not how important a variable is to an axis (Figure 4c). The rationale for this approach comes from a related method - 'factor analysis' - which considers the measured variables as a function of a hypothesized causal process represented by the PCs, rather than the variables defining a reduced ordination space.
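To make the calculation concrete, here is a NumPy sketch (synthetic data with deliberately unequal variances; all names are illustrative) that derives factor loadings from the covariance-biplot scaling and confirms that they equal the correlations between each variable and the PC scores - i.e., they measure how well an axis explains a variable:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three synthetic variables with very different variances
Y = rng.normal(size=(50, 3)) * np.array([1.0, 5.0, 0.1])
Y = Y - Y.mean(axis=0)

S = np.cov(Y, rowvar=False)
eigvals, U = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
L, U = eigvals[order], U[:, order]

# Covariance-biplot scaling, then divide by each variable's standard deviation
V = U * np.sqrt(L)                      # U Lambda^0.5
sd = np.sqrt(np.diag(S))
loadings = V / sd[:, None]              # factor loadings (factor pattern)

# The loadings are exactly the correlations between the variables and the
# PC scores computed from the same sample
F = Y @ U
corr = np.corrcoef(np.hstack([Y, F]), rowvar=False)[:3, 3:]
print(np.allclose(loadings, corr))      # prints True
```

For a correlation-matrix PCA, `sd` would be all ones, so `loadings` and the covariance-biplot vectors `V` would coincide, as noted in the caption to Figure 4.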