Canonical correlation analysis (CCorA)

Canonical correlation analysis (CCorA; Hotelling, 1936) differs from redundancy analysis (RDA) in the same way as linear correlation differs from simple linear regression. In CCorA, the two matrices under consideration are treated in a symmetric way whereas, in RDA, the Y matrix is considered to be dependent on an explanatory matrix X. The algebraic consequence is that, in CCorA, the matrix whose eigenvalues and eigenvectors are sought (eq. 11.22) is constructed from all four parts of eq. 11.2 whereas, in the asymmetric RDA, eq. 11.3 does not contain the $\mathbf{S}_{YY}$ portion.
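The correlation/regression analogy can be made concrete with a small numerical sketch. It uses synthetic data and hypothetical variable names (`x`, `y`, `b_y_on_x`, `b_x_on_y`), none of which come from the text: the correlation coefficient is the same whichever variable is listed first, whereas the regression slope depends on which variable is treated as the response.

```python
import numpy as np

rng = np.random.default_rng(3)           # synthetic data, for illustration only
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)

# Correlation is symmetric: swapping the roles of x and y changes nothing.
r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]

# Regression is asymmetric: the slope depends on which variable is the response.
cov_xy = np.cov(x, y)[0, 1]
b_y_on_x = cov_xy / np.var(x, ddof=1)    # slope of y regressed on x
b_x_on_y = cov_xy / np.var(y, ddof=1)    # slope of x regressed on y

print(r_xy, r_yx)
print(b_y_on_x, b_x_on_y)
```

The product of the two slopes equals the squared correlation, which is one way of seeing that correlation summarizes the relationship without designating either variable as dependent.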

The brief discussion below is only meant to show the general principles of the CCorA method. Ecologists interested in delving deeper into the method will find detailed accounts of the theory in Kendall & Stuart (1966), and computation procedures in Anderson (1958). Gittins (1985) presents a comprehensive review of the theory and applications of CCorA in ecology. Now that RDA (Section 11.1) and CCA (Section 11.2) are available, CCorA has limited applications; RDA and CCA correspond better than CCorA to the way most two-matrix problems are formulated.

In CCorA, the objects (sites) under study are described by two sets of quantitative descriptors; for example, a first set $\mathbf{Y}_1$ of $p$ chemical descriptors and a second set $\mathbf{Y}_2$ of $m$ geomorphological descriptors of the sampling sites, or a first set $\mathbf{Y}_1$ of $p$ species and a second set $\mathbf{Y}_2$ of $m$ descriptors of the physical environment. The dispersion matrix $\mathbf{S}$ of these $p + m$ descriptors is therefore made of four blocks, as in eq. 11.2:

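$$\mathbf{S} = \begin{bmatrix} \mathbf{S}_{11} & \mathbf{S}_{12} \\ \mathbf{S}'_{12} & \mathbf{S}_{22} \end{bmatrix}$$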

The algebra that follows applies equally well to $\mathbf{S}$ matrices defined as variance-covariance matrices (e.g. $\mathbf{S}_{YY} = (1/(n - 1))\,\mathbf{Y}'\mathbf{Y}$, with $\mathbf{Y}$ centred by columns) or as matrices of sums of squares and cross products (e.g. $\mathbf{S}_{YY} = \mathbf{Y}'\mathbf{Y}$). Submatrices $\mathbf{S}_{11}$ (order $p \times p$) and $\mathbf{S}_{22}$ (order $m \times m$) each describe one of the two sets of descriptors, whereas $\mathbf{S}_{12}$ (order $p \times m$) and its transpose $\mathbf{S}'_{12} = \mathbf{S}_{21}$ (order $m \times p$) account for the interactions between the two sets of descriptors. Numbers (1, 2) are used here to designate matrices, instead of letters (X, Y) as in eq. 11.2, to emphasize the fact that the two data matrices ($\mathbf{Y}_1$, $\mathbf{Y}_2$) play equivalent roles in CCorA.
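The block structure described above can be sketched in a few lines of NumPy. The data are random and the sizes `n`, `p` and `m` are arbitrary choices made for illustration; only the partitioning into four blocks follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)      # synthetic data, purely illustrative
n, p, m = 50, 3, 2                  # n sites, p and m descriptors (made-up sizes)
Y1 = rng.normal(size=(n, p))
Y2 = rng.normal(size=(n, m))

# Centre each descriptor on its column mean.
Y1c = Y1 - Y1.mean(axis=0)
Y2c = Y2 - Y2.mean(axis=0)

# Variance-covariance form; drop the 1/(n - 1) factor for the
# sums-of-squares-and-cross-products form.
S11 = Y1c.T @ Y1c / (n - 1)         # p x p, within set 1
S22 = Y2c.T @ Y2c / (n - 1)         # m x m, within set 2
S12 = Y1c.T @ Y2c / (n - 1)         # p x m, between sets
S21 = S12.T                         # m x p, transpose of S12

# Assemble the full (p + m) x (p + m) dispersion matrix from its four blocks.
S = np.block([[S11, S12], [S21, S22]])
print(S.shape)
```

Assembling `S` this way gives the same matrix as computing the covariance of the two tables placed side by side, which is a convenient check on the block bookkeeping.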

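The problem consists in maximizing the between-set dispersion with respect to the within-set dispersion. The expression to be optimized is $\mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}\mathbf{S}_{11}^{-1}$, since the ratio $\mathbf{S}_{12}\mathbf{S}'_{12} / \mathbf{S}_{11}\mathbf{S}_{22}$ does not exist in matrix algebra. Finding solutions to this optimization problem calls for eigenvalues and eigenvectors. Canonical correlations are obtained by solving the characteristic equation:

$$\left| \mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}\mathbf{S}_{11}^{-1} - \lambda_k \mathbf{I} \right| = 0 \tag{11.22}$$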

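which corresponds to one of the following equations, resulting from the multiplication of both members of eq. 11.22 by either $\mathbf{S}_{11}$ or $\mathbf{S}_{22}$:

$$\left| \mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12} - \lambda_k \mathbf{S}_{11} \right| = 0$$

or

$$\left| \mathbf{S}'_{12}\mathbf{S}_{11}^{-1}\mathbf{S}_{12} - \lambda_k \mathbf{S}_{22} \right| = 0$$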
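As a numerical illustration, the sketch below forms the two matrices whose eigenvalues solve these determinantal equations, written here as $\mathbf{S}_{11}^{-1}\mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}$ (order $p \times p$) and $\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}\mathbf{S}_{11}^{-1}\mathbf{S}_{12}$ (order $m \times m$), and compares their leading eigenvalues. The data are synthetic and the names `A1`, `A2`, `lam1` and `lam2` are hypothetical; this is one straightforward way to solve the problem, not necessarily the computation procedure given in the references cited above.

```python
import numpy as np

rng = np.random.default_rng(1)      # synthetic data, for illustration only
n, p, m = 60, 3, 2
Y1 = rng.normal(size=(n, p))
# Make set 2 depend partly on set 1 so that the correlations are not trivial.
Y2 = Y1[:, :m] + 0.5 * rng.normal(size=(n, m))

Y1c = Y1 - Y1.mean(axis=0)
Y2c = Y2 - Y2.mean(axis=0)
S11 = Y1c.T @ Y1c / (n - 1)
S22 = Y2c.T @ Y2c / (n - 1)
S12 = Y1c.T @ Y2c / (n - 1)

# |S12 S22^-1 S12' - lambda S11| = 0 has the eigenvalues of S11^-1 S12 S22^-1 S12'.
A1 = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S12.T))    # p x p
# |S12' S11^-1 S12 - lambda S22| = 0 has the eigenvalues of S22^-1 S12' S11^-1 S12.
A2 = np.linalg.solve(S22, S12.T @ np.linalg.solve(S11, S12))    # m x m

# At most min(p, m) eigenvalues are non-zero; keep those, largest first.
k = min(p, m)
lam1 = np.sort(np.linalg.eigvals(A1).real)[::-1][:k]
lam2 = np.sort(np.linalg.eigvals(A2).real)[::-1][:k]

# Canonical correlations are the square roots of the eigenvalues.
r = np.sqrt(np.clip(lam1, 0.0, 1.0))
print(lam1, lam2, r)
```

When $p \neq m$, the larger of the two matrices has extra eigenvalues that are zero up to rounding error, which is why only the first $\min(p, m)$ values are compared.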

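Canonical correlations $r_k$ are the square roots of the eigenvalues $\lambda_k$ ($\lambda_k = r_k^2$). The same $\lambda_k$ values are found using either equation. The next step is to calculate the eigenvectors of the two equation systems, corresponding to each eigenvalue. The two eigenvectors give the linear combinations of the two sets of original descriptors ($\mathbf{Y}_1$, $\mathbf{Y}_2$) corresponding to each eigenvalue. For eigenvalue $\lambda_k$, the eigenvectors $\mathbf{u}_k$ and $\mathbf{v}_k$ are computed using the following matrix equations:

$$\left( \mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12} - \lambda_k \mathbf{S}_{11} \right) \mathbf{u}_k = \mathbf{0}$$

and

$$\left( \mathbf{S}'_{12}\mathbf{S}_{11}^{-1}\mathbf{S}_{12} - \lambda_k \mathbf{S}_{22} \right) \mathbf{v}_k = \mathbf{0}$$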
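The eigenvector step can be checked numerically with a minimal sketch, again on synthetic data. It assumes that each $\mathbf{u}_k$ is taken as an eigenvector of $\mathbf{S}_{11}^{-1}\mathbf{S}_{12}\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}$ and that the matching $\mathbf{v}_k$ is obtained, up to scaling, as $\mathbf{S}_{22}^{-1}\mathbf{S}'_{12}\mathbf{u}_k$; this shortcut is a choice made for the example, and the names `U`, `V`, `T1` and `T2` are hypothetical. The correlation between the two resulting linear combinations should then equal $\sqrt{\lambda_k}$.

```python
import numpy as np

rng = np.random.default_rng(2)      # synthetic data, for illustration only
n, p, m = 80, 3, 2
Y1 = rng.normal(size=(n, p))
Y2 = Y1[:, :m] + 0.7 * rng.normal(size=(n, m))

Y1c = Y1 - Y1.mean(axis=0)
Y2c = Y2 - Y2.mean(axis=0)
S11 = Y1c.T @ Y1c / (n - 1)
S22 = Y2c.T @ Y2c / (n - 1)
S12 = Y1c.T @ Y2c / (n - 1)

# Eigenvectors u_k of S11^-1 S12 S22^-1 S12' satisfy
# (S12 S22^-1 S12' - lambda_k S11) u_k = 0.
A1 = np.linalg.solve(S11, S12 @ np.linalg.solve(S22, S12.T))
lam, U = np.linalg.eig(A1)
lam, U = lam.real, U.real
order = np.argsort(lam)[::-1][:min(p, m)]   # keep the non-zero eigenvalues
lam, U = lam[order], U[:, order]

# v_k proportional to S22^-1 S12' u_k satisfies
# (S12' S11^-1 S12 - lambda_k S22) v_k = 0.
V = np.linalg.solve(S22, S12.T @ U)

# Linear combinations of the original descriptors, one column per eigenvalue.
T1 = Y1c @ U
T2 = Y2c @ V
r_check = np.array(
    [np.corrcoef(T1[:, k], T2[:, k])[0, 1] for k in range(lam.size)]
)
print(np.sqrt(lam), r_check)
```

Eigenvectors are defined only up to a scaling factor, so the columns of `U` and `V` would normally be rescaled (for instance to give the linear combinations unit variance) before interpretation; the correlations are unaffected by that choice.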
