Mathematically, PCA can be calculated from a mean-centered data matrix, Y (i.e., the mean of each variable is subtracted from all values of that variable). From Y, the covariance matrix is calculated using the formula S = Y'Y/(n - 1) (i.e., the sums of squares and cross-products matrix, rescaled by the n - 1 degrees of freedom). This square, symmetric matrix can be decomposed by an eigenanalysis or 'singular value decomposition' into eigenvalues (L) and eigenvectors (U), with the eigenvectors normalized (scaled) to a length of 1. The eigenvalues represent the amount of variation explained by each axis and are usually expressed as a proportion or percentage of the sum of all the eigenvalues. The PCs (F) are calculated by projecting the mean-centered data into the ordination space by postmultiplying the centered data by the eigenvectors: F = YU. An important point to note is that the value of a sample on a PC is a linear combination of the values of the variables in that sample, each multiplied by its corresponding eigenvector coefficient. The eigenvectors represent the projection of the old species axes into the new ordination space.
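The eigenanalysis route can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not a reference implementation: the function name pca_eigen and the small data matrix X are hypothetical, and the symbols Y, S, L, U and F follow the notation used above.

```python
import numpy as np


def pca_eigen(X):
    """PCA by eigenanalysis of the covariance matrix (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Mean-center each variable (column) to obtain Y.
    Y = X - X.mean(axis=0)

    # Covariance matrix: sums of squares and cross-products over n - 1.
    S = (Y.T @ Y) / (n - 1)

    # eigh is intended for symmetric matrices; it returns eigenvalues in
    # ascending order and unit-length eigenvectors as columns.
    L, U = np.linalg.eigh(S)

    # Reorder so that the first axis explains the most variation.
    order = np.argsort(L)[::-1]
    L, U = L[order], U[:, order]

    # Project the centered data into the ordination space: F = YU.
    F = Y @ U

    # Proportion of the total variation explained by each axis.
    explained = L / L.sum()
    return L, U, F, explained


# Hypothetical data: 6 samples (rows) by 3 variables (columns).
X = np.array([
    [2.5, 2.4, 1.0],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 1.1],
    [3.1, 3.0, 0.2],
    [2.3, 2.7, 0.9],
])
L, U, F, explained = pca_eigen(X)
```

With this scaling, the variance of each column of F (computed with n - 1 in the denominator) equals the corresponding eigenvalue, which is a convenient check on the result.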
An alternative method of calculating PCA is to use an iterative method such as the two-way weighted summation (TWWS) algorithm. This method starts with a mean-centered data matrix, to which arbitrary initial sample scores on the first PC axis are assigned. Eigenvector estimates are calculated from the current sample scores, and new sample PC scores are then calculated from these eigenvector estimates and rescaled to a length of 1. An estimate of the eigenvalue is obtained from the standard deviation divided by the number of samples, and the procedure is repeated until the eigenvalue no longer changes with further iterations. Upon convergence, the eigenvectors are scaled to a length of 1, and the PCs are scaled to the eigenvalue. Subsequent axes are calculated in a similar way, except that the PC score estimates at each iteration are made uncorrelated with those of the previous axes using the Gram-Schmidt orthogonalization procedure. Both methods yield the same result (within iterative tolerance limits). The eigenanalysis method is easier to program in languages that support matrix operations, whereas the TWWS algorithm can be more efficient for very large data sets because the PC axes are calculated one at a time, so that only the first few axes of interest need to be extracted.
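The iterative procedure can be sketched as follows in Python with NumPy. This is a hedged illustration rather than the canonical TWWS code: the function name pca_twws, its parameters (n_axes, tol, max_iter, seed) and the random starting scores are all assumptions. In particular, the eigenvalue estimate used here is the length of the un-rescaled score vector divided by n - 1, a choice made so that the result matches the eigenvalues of a covariance matrix computed with n - 1 degrees of freedom.

```python
import numpy as np


def pca_twws(X, n_axes=2, tol=1e-12, max_iter=10000, seed=0):
    """Iterative PCA in the spirit of two-way weighted summation (sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Y = X - X.mean(axis=0)              # mean-centered data matrix
    rng = np.random.default_rng(seed)

    eigenvalues, eigenvectors, unit_scores = [], [], []
    for _ in range(n_axes):
        f = rng.standard_normal(n)      # arbitrary initial sample scores
        f /= np.linalg.norm(f)
        lam_old = 0.0
        for _ in range(max_iter):
            u = Y.T @ f                 # variable weights from sample scores
            f = Y @ u                   # sample scores from variable weights
            for g in unit_scores:       # Gram-Schmidt against earlier axes
                f -= (f @ g) * g
            length = np.linalg.norm(f)
            if length == 0.0:
                raise ValueError("no variation left for another axis")
            lam = length / (n - 1)      # eigenvalue estimate (assumed scaling)
            f /= length                 # rescale scores to a length of 1
            if abs(lam - lam_old) <= tol * max(lam, 1.0):
                break
            lam_old = lam
        u = Y.T @ f
        u /= np.linalg.norm(u)          # eigenvector scaled to a length of 1
        eigenvalues.append(lam)
        eigenvectors.append(u)
        unit_scores.append(f)

    L = np.array(eigenvalues)
    U = np.column_stack(eigenvectors)
    F = Y @ U                           # PC scores; their variance is about L
    return L, U, F
```

A call such as L, U, F = pca_twws(X, n_axes=2) extracts only the first two axes. Convergence slows down when two successive eigenvalues are nearly equal, so tol and max_iter may need adjusting for such data.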