## Correspondence analysis (CA)

Correspondence analysis is distinct from PCA and PCOA due to the intrinsic assumptions and the corresponding transformations applied. In CA the data table is assumed to be a contingency table; that is, a table containing counts. Some of the very many alternative names for CA reflect this fact:

• Contingency table analysis (Fisher 1940)

• Analyse factorielle des correspondances (Benzecri 1969)

• Reciprocal averaging (Hill 1973)

• Reciprocal ordering (Orloci 1978)

• Dual scaling (Nishisato 1980)

The elements of the initial data table, $f_{ij}$, are frequency counts. These are relativized by the row and column sums, and from them the deviations from expectation, $u_{ij}$, are derived according to:

$$u_{ij} = \frac{f_{ij}}{\sqrt{f_{i.}\,f_{.j}}} - \frac{\sqrt{f_{i.}\,f_{.j}}}{f_{..}}$$

The notation used is shown in Table 5.1 (a). Examples (b) and (c) illustrate the typical effects of this transformation. Example (b) shows that the scores analysed are deviations from expectation and not the raw information. The first element, $f_{11} = 3$, turns out to be a relatively low score: the resulting $u_{11} = -0.05$ is negative because it lies below expectation. It belongs to a row with a fairly high marginal total ($f_{1.} = 4$) and to a column with a high marginal total ($f_{.1} = 4$). The elements $f_{12} = 1$ and $f_{21} = 1$, in contrast, are above expectation. It is important to note that CA uses these derived values and not the original scores!

A typical effect of the adjustment by rows and columns is demonstrated in Table 5.1 (c). What matters is the proportions of the elements of the data vectors. CA causes species 1 and species 2 to reflect the same pattern, since the proportions of their scores are the same. Similarly, relevés 1 and 2 are rated identical. In terms of CA, the two relevés and the two species are identical, so all deviations are zero and no usable information remains to be analysed.
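The two numerical examples can be checked with a few lines of Python. This is a minimal sketch, assuming the usual CA transformation $u_{ij} = f_{ij}/\sqrt{f_{i.} f_{.j}} - \sqrt{f_{i.} f_{.j}}/f_{..}$, which reproduces the values listed in Table 5.1; the function name `deviations` is illustrative only, and the sketch assumes that no row or column total is zero.

```python
from math import sqrt

def deviations(F):
    """Deviations from expectation for a count table F (rows = species, columns = relevés).

    Assumed transformation: u_ij = f_ij / sqrt(f_i. * f_.j) - sqrt(f_i. * f_.j) / f..
    Row and column totals must all be positive.
    """
    row_tot = [sum(row) for row in F]          # f_i.
    col_tot = [sum(col) for col in zip(*F)]    # f_.j
    total = sum(row_tot)                       # f..
    return [
        [f / sqrt(r * c) - sqrt(r * c) / total for f, c in zip(row, col_tot)]
        for row, r in zip(F, row_tot)
    ]

table_b = [[3, 1], [1, 0]]   # Table 5.1 (b)
table_c = [[3, 6], [1, 2]]   # Table 5.1 (c): rows and columns are proportional

u_b = deviations(table_b)
u_c = deviations(table_c)

for row in u_b:
    print([round(u, 2) for u in row])
# All entries of u_c vanish up to floating-point rounding.
print(max(abs(u) for row in u_c for u in row) < 1e-12)
```

For example (b) the rounded deviations agree with the table, and for example (c) every deviation is zero up to rounding, which is why that table carries no information for CA.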

As in some variants of PCA, the calculations are now based on a matrix of product moments, $\mathbf{S} = \mathbf{U}\mathbf{U}'$, computed for $p$ attributes with the characteristic element:

$$s_{hi} = \sum_{j} u_{hj}\,u_{ij}$$

From this similarity matrix the non-zero Eigenvalues, $\lambda_1, \ldots, \lambda_t$, and the associated Eigenvectors, $\mathbf{a}_1, \ldots, \mathbf{a}_t$, are extracted. The Eigenvalues have the form of correlation coefficients: the $m$th Eigenvalue is the square of the $m$th canonical correlation, $\lambda_m = R_m^2$. The Eigenvector matrix $\mathbf{A}$, after the adjustment shown below, gives ordination scores for the attributes.
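The eigenanalysis step can be sketched with NumPy. The count table below is hypothetical (it is not taken from the text), the helper name `deviations` is illustrative, and the transformation is assumed to be the standard CA one, $u_{ij} = f_{ij}/\sqrt{f_{i.} f_{.j}} - \sqrt{f_{i.} f_{.j}}/f_{..}$. The sketch extracts the non-zero Eigenvalues of $\mathbf{S} = \mathbf{U}\mathbf{U}'$, takes their square roots as canonical correlations, and compares them with the Eigenvalues of the transposed problem $\mathbf{U}'\mathbf{U}$.

```python
import numpy as np

def deviations(F):
    """Assumed CA transformation: f_ij / sqrt(f_i. f_.j) - sqrt(f_i. f_.j) / f.."""
    F = np.asarray(F, dtype=float)
    root = np.sqrt(np.outer(F.sum(axis=1), F.sum(axis=0)))  # sqrt(f_i. * f_.j)
    return F / root - root / F.sum()

# Hypothetical counts: 3 species (rows) x 4 relevés (columns).
F = np.array([[10, 6, 2, 0],
              [3, 8, 5, 1],
              [0, 2, 7, 9]])

U = deviations(F)
S = U @ U.T                              # p x p product moments between species

eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending Eigenvalues
order = np.argsort(eigvals)[::-1]        # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

nonzero = eigvals > 1e-12                # drop the trivial zero Eigenvalue
lam = eigvals[nonzero]                   # lambda_1 ... lambda_t
A = eigvecs[:, nonzero]                  # Eigenvectors a_1 ... a_t as columns
R = np.sqrt(lam)                         # canonical correlations R_m

# Same non-zero Eigenvalues from the relevé-by-relevé matrix U'U.
lam_T = np.sort(np.linalg.eigvalsh(U.T @ U))[::-1][:lam.size]

print("Eigenvalues:", lam)
print("Canonical correlations:", R)
print("Transposed problem agrees:", np.allclose(lam, lam_T))
```

The tolerance `1e-12` used to decide which Eigenvalues count as non-zero is an arbitrary choice for this sketch and may need adjusting for other data.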

Table 5.1 (a) Notation used in correspondence analysis. (b) An illustrative numerical example. (c) An example with no information content in terms of CA. In each panel the frequencies are on the left (first three value columns) and the deviations from expectation on the right (last three value columns).

| (a) | rel 1 | rel 2 | total | rel 1 | rel 2 | total |
|---|---|---|---|---|---|---|
| sp 1 | $f_{11}$ | $f_{12}$ | $f_{1.}$ | $u_{11}$ | $u_{12}$ | $u_{1.}$ |
| sp 2 | $f_{21}$ | $f_{22}$ | $f_{2.}$ | $u_{21}$ | $u_{22}$ | $u_{2.}$ |
| total | $f_{.1}$ | $f_{.2}$ | $f_{..}$ | $u_{.1}$ | $u_{.2}$ | $u_{..}$ |

| (b) | rel 1 | rel 2 | total | rel 1 | rel 2 | total |
|---|---|---|---|---|---|---|
| sp 1 | 3 | 1 | 4 | -0.05 | 0.10 | 0.05 |
| sp 2 | 1 | 0 | 1 | 0.10 | -0.20 | -0.10 |
| total | 4 | 1 | 5 | 0.05 | -0.10 | -0.05 |

| (c) | rel 1 | rel 2 | total | rel 1 | rel 2 | total |
|---|---|---|---|---|---|---|
| sp 1 | 3 | 6 | 9 | 0.00 | 0.00 | 0.00 |
| sp 2 | 1 | 2 | 3 | 0.00 | 0.00 | 0.00 |
| total | 4 | 8 | 12 | 0.00 | 0.00 | 0.00 |

As explained in Legendre & Legendre (1998, p. 456), there are different ways to scale the scores. For ecological applications it is most appropriate to choose an adjustment that allows a joint plot of row (relevé) and column (species) scores. From the Eigenvectors, the species scores $\mathbf{X}$ are derived directly by weighting with the square root of the inverse of the marginal totals:

This formula also involves standardization of the Eigenvectors to fulfil the following conditions:

To compute the relevé matrix $\mathbf{Y}$, one could transpose the data matrix $\mathbf{F}$ and repeat all the computations. If so, one would observe that the resulting Eigenvalues are exactly the same. There is, however, a direct way to derive the relevé scores from the species scores:

$$y_{jm} = \frac{\sum_{h=1}^{p} f_{hj}\,x_{hm}}{f_{.j}\,R_m}$$

where $R_m$ is the $m$th canonical correlation. If all the calculations are done on the transposed data matrix then the results will be entirely identical. When analysing large data sets, carrying out the analysis on the smaller of the two similarity matrices (species by species or relevé by relevé) will save computation time.
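The relation between species and relevé scores can be illustrated as follows. This is a sketch under stated assumptions, not the definitive scaling: the counts are hypothetical, the function name `ca_scores` is illustrative, the deviations are assumed to be $u_{ij} = f_{ij}/\sqrt{f_{i.} f_{.j}} - \sqrt{f_{i.} f_{.j}}/f_{..}$, and the species scores are assumed to be $x_{hm} = a_{hm}\sqrt{f_{..}/f_{h.}}$, one common choice that matches the description "weighting with the square root of the inverse of the marginal totals". The relevé scores are then obtained in two ways: from the species scores via $y_{jm} = \sum_h f_{hj} x_{hm} / (f_{.j} R_m)$, and by repeating the whole analysis on the transposed table.

```python
import numpy as np

# Hypothetical counts: 3 species (rows) x 4 relevés (columns).
F = np.array([[10, 6, 2, 0],
              [3, 8, 5, 1],
              [0, 2, 7, 9]], dtype=float)

def ca_scores(F):
    """Non-zero Eigenvalues and scores for the row objects of F.

    Assumed scaling (not confirmed by the text): score = a_hm * sqrt(f.. / f_h.).
    """
    row_tot, col_tot, total = F.sum(axis=1), F.sum(axis=0), F.sum()
    root = np.sqrt(np.outer(row_tot, col_tot))
    U = F / root - root / total                    # deviations from expectation
    lam, A = np.linalg.eigh(U @ U.T)               # ascending Eigenvalues
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    keep = lam > 1e-12                             # discard zero Eigenvalues
    lam, A = lam[keep], A[:, keep]
    scores = A * np.sqrt(total / row_tot)[:, None]
    return lam, scores

# Species scores X and canonical correlations R_m.
lam, X = ca_scores(F)
R = np.sqrt(lam)

# Direct route: weighted average of species scores, divided by R_m.
Y = (F.T @ X) / (F.sum(axis=0)[:, None] * R)

# Indirect route: full analysis of the transposed table.
lam_t, Y_t = ca_scores(F.T)

print("Same Eigenvalues:", np.allclose(lam, lam_t))
# Eigenvector signs are arbitrary, so scores are compared in absolute value.
print("Same relevé scores up to sign:", np.allclose(np.abs(Y), np.abs(Y_t)))
```

Because the sign of each Eigenvector is arbitrary, an axis may come out mirrored in one of the two routes; this does not change the ordination pattern.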

The difference between PCA (or PCOA) and CA is considerable in terms of the content and shape of the point cloud. For comparison, a CA and a PCA ordination of the same data set are displayed in Figure 5.7, with the same classification of relevés superimposed on both. As in many other cases, the gradient displayed in CA is v-shaped, whereas the gradient in PCA is u-shaped. Also, the two-dimensional resolution of the classification (the distinction of groups) is usually somewhat better in PCA than in CA. The order of the relevés along the main gradient is roughly the same, and I therefore conclude that both methods reveal the underlying pattern.

Figure 5.7 Comparison of CA and PCA. Data points are identically classified. 'Schlaenggli' data set used (Appendix B).

CA has some unpleasant properties to be kept in mind when using it. First, it is more sensitive to outliers (see Section 11.3) than other ordination methods. Second, unlike in PCA and PCOA, the range of the coordinates increases with higher dimension (and lower Eigenvalue). It is therefore good practice to adjust the scales of only those ordination axes that are actually used for plotting.