Like most (if not all) multivariate methods, PCA can be sensitive to data transformation and standardization. Because PCA is an eigenanalysis of a variance-covariance matrix, which depends on the numerical scale of the data, variables with large absolute values (and hence, usually, large variances) will dominate the data structure. If the data table consists of variables measured on different scales (e.g., abundance, kilograms, milliliters, pH), this scale dependency can exert unwanted effects on the analysis. Similarly, a volume recorded in milliliters in one sample would exert more effect than the same volume recorded in liters in another sample, even though both samples contain the same volume. In the triplefin example, the PCA of the covariance matrix of untransformed triplefin abundance data was dominated by the numerically dominant species, N. segmentatus and F. varium (Figure 3a), and was largely insensitive to less-common species. This may be a problem if the intent of the analysis is to retain information on less-abundant species or, more generally, on variables with small but biologically important values.
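A minimal NumPy sketch of this scale dependency, using hypothetical Poisson-distributed abundances rather than the triplefin data. PCA is computed as an eigenanalysis of the covariance matrix (the helper name `pca_cov` is ours, not from any package), first on the raw table and then after multiplying one uncommon species' values by 1000, as if its unit of measurement had changed.

```python
import numpy as np

def pca_cov(X):
    """PCA as an eigenanalysis of the variance-covariance matrix of X
    (rows = samples, columns = variables). Hypothetical helper."""
    S = np.cov(X, rowvar=False)           # p x p covariance matrix (np.cov centers each column)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh handles symmetric matrices; eigenvalues ascending
    order = np.argsort(eigvals)[::-1]     # reorder so PC1 comes first
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(42)
n = 50
# Hypothetical abundances: one numerically dominant species and two uncommon ones.
X = np.column_stack([
    rng.poisson(200, n),  # common species
    rng.poisson(3, n),    # uncommon species A
    rng.poisson(2, n),    # uncommon species B
]).astype(float)

vals, vecs = pca_cov(X)
pc1_share = vals[0] / vals.sum()  # proportion of total variance on PC1
print(f"PC1 explains {pc1_share:.1%} of the variance")
print("PC1 loadings:", np.round(vecs[:, 0], 3))  # eigenvector signs are arbitrary

# Same data, but species A re-expressed on a 1000x larger numerical scale
# (as if a volume were recorded in milliliters instead of liters).
X_rescaled = X.copy()
X_rescaled[:, 1] *= 1000
vals2, vecs2 = pca_cov(X_rescaled)
print("PC1 loadings after rescaling:", np.round(vecs2[:, 0], 3))
```

On the raw table PC1 loads almost entirely on the common species; after rescaling, PC1 loads almost entirely on species A, even though nothing about the organisms has changed.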
There are two ways to reduce the effect of variables with large absolute values and increase the effect, or weight, of variables with small values. First, the data can be centered and scaled to standard deviation units. This process is called standardization and is implicit in many software implementations of PCA. If data are standardized to unit variance prior to the analysis, PCA becomes an eigenanalysis of a correlation matrix rather than a covariance matrix. Standardization gives all variables equal weight in the analysis, and it is commonly used and recommended in ecological applications. In contrast with the covariance-matrix PCA, an analysis of the correlation matrix of untransformed triplefin abundance data yields an ordination in which both common and uncommon species are important in defining the ordination space (Figure 3b). Second, the data can be numerically transformed using functions such as the square root, fourth root, or logarithm. Transformations are often used to improve linearity between variables or to reduce the effect of variables with large values. However, different transformations alter the relative importance of variables in defining the ordination space, and hence may alter the ecological interpretation. In general, increasingly strong transformations (e.g., $x^{0.5}$, $x^{0.25}$, $\log(x)$) progressively shift the analytical emphasis from abundance to compositional aspects of the data. For example, less-abundant triplefin species assume more importance in a covariance-matrix PCA of fourth-root-transformed data ($x^{0.25}$), although not to the same extent as in a PCA on the correlation matrix (Figure 3c).
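The sketch below illustrates both routes with NumPy on hypothetical data, not the triplefin data. It assumes lognormally distributed abundances for three species whose means differ tenfold, so that $\log(x)$ is defined without an offset. It checks that the covariance matrix of standardized data equals the correlation matrix, then reports the abundant species' share of total variance (the trace of the covariance matrix, which equals the sum of the PCA eigenvalues) under increasingly strong transformations. The helpers `variance_shares` and `standardize` are illustrative names, not functions from any PCA package.

```python
import numpy as np

def variance_shares(X):
    """Each variable's share of the total variance (trace of the covariance matrix).
    In a covariance-matrix PCA the trace equals the sum of all eigenvalues, so these
    shares indicate how much weight each variable carries in the ordination."""
    S = np.cov(X, rowvar=False)
    return np.diag(S) / np.trace(S)

def standardize(X):
    """Center each column and divide by its sample standard deviation (ddof=1, as in np.cov)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
n = 200
means = np.array([1000.0, 100.0, 10.0])  # hypothetical abundant, common and rare species
# Hypothetical lognormal abundances: strictly positive, so log(x) is defined.
X = means * np.exp(0.5 * rng.standard_normal((n, means.size)))

# The covariance matrix of standardized data is the correlation matrix of X.
Z = standardize(X)
print("cov(Z) equals corr(X):", np.allclose(np.cov(Z, rowvar=False), np.corrcoef(X, rowvar=False)))

shares = {
    "raw": variance_shares(X),
    "square root": variance_shares(np.sqrt(X)),
    "fourth root": variance_shares(X ** 0.25),
    "log": variance_shares(np.log(X)),
    "correlation": variance_shares(Z),
}
for name, s in shares.items():
    print(f"{name:>12}: abundant species carries {s[0]:.2f} of the total variance")
```

Under this lognormal assumption the abundant species' expected share falls from about 0.99 (raw) to roughly 0.90, 0.71 and 0.33 for the square-root, fourth-root and log transforms, while the correlation matrix gives each species exactly one third. How far a given transformation moves toward equal weighting depends on the data's mean-variance relationship, so these numbers say nothing specific about Figure 3.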