Association coefficients

The most common approach to assessing the resemblance among objects or descriptors is to first condense all (or the relevant part of) the information available in the ecological data matrix (Section 2.1) into a square matrix of association among the objects or descriptors (Section 2.2). In most instances, the association matrix is symmetric. Non-symmetric matrices can be decomposed into symmetric and skew-symmetric components, as described in Section 2.3; the components may then be analysed separately. In Chapters 8 and 9, objects or descriptors will be clustered or represented in reduced space after analysing an association matrix. It follows that the structure resulting from the numerical analysis is that of the association matrix; the results of the analysis do not necessarily reflect all the information originally contained in the ecological data matrix. This stresses the importance of choosing an appropriate measure of association, because this choice determines the outcome of the analysis. Hence, the choice must take into account the following considerations:

• The nature of the study (i.e. the initial question and the hypothesis) determines the kind of ecological structure to be brought out through an association matrix, and consequently the type of measure of resemblance to be used.

• The various measures available are subject to different mathematical constraints. The methods of analysis to which the association matrix will be subjected (clustering, ordination) often require measures of resemblance with specific mathematical properties.

• One must also consider the computational aspect, and thus preferably choose a measure which is available in a computer package (Section 7.7) or can easily be programmed.

* Program 3WAYPACK (Kroonenberg, 1996) for three-way principal component analysis is available from Pieter M. Kroonenberg, Department of Educational Sciences, Leiden University, Wassenaarseweg 52, NL-2333 AK Leiden, The Netherlands. Other three-mode software is described on the WWWeb page: <http://www.fsw.leidenuniv.nl/~kroonenb/>.

Ecologists are, in principle, free to define and use any measure of association suitable to the ecological phenomenon under study; mathematics imposes few constraints on this choice. This is why so many association coefficients are found in the literature. Some of them are of wide applicability whereas others have been created for specific needs. Several coefficients have been rediscovered by successive authors and may be known under various names. Reviews of some coefficients may be found in Cole (1949, 1957), Goodman & Kruskal (1954, 1959, 1963), Dagnelie (1960), Sokal & Sneath (1963), Williams & Dale (1965), Cheetham & Hazel (1969), Sneath & Sokal (1973), Clifford & Stephenson (1975), Orlóci (1978), Daget (1976), Blanc et al. (1976), Gower (1985), and Gower & Legendre (1986).

In the following sections, association will be used as a general term to describe any measure or coefficient used to quantify the resemblance or difference between objects or descriptors, as proposed by Orlóci (1975). With dependence coefficients, used in the R mode, zero corresponds to no association. In Q-mode studies, similarity coefficients between objects will be distinguished from distance (or dissimilarity) coefficients. Similarities are maximum when the two objects are identical and minimum when the two objects are completely different; distances follow the opposite rule. Figure 7.2 (left) clearly shows the difference between the two types of measures: the length of the line between two objects is a measure of their distance, whereas its thickness, which decreases as the two objects get farther apart, is proportional to their similarity. If needed, a similarity can be transformed into a distance, for example by computing its one-complement. For a similarity measure varying between 0 and 1, as is generally the case, the corresponding distance may be computed as:

$$D = 1 - S, \qquad D = \sqrt{1 - S}, \qquad \text{or} \qquad D = \sqrt{1 - S^2}$$
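As an illustration, the conversion of a similarity into a distance can be sketched in Python. This is a minimal sketch that assumes similarities bounded in [0, 1]; the function name and the `method` labels are hypothetical, and the two square-root variants are included as commonly used alternatives to the one-complement.

```python
import math


def similarity_to_distance(s, method="complement"):
    """Convert a similarity s in [0, 1] into a distance in [0, 1].

    The method labels are illustrative names, not taken from the text.
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    if method == "complement":
        # One-complement: D = 1 - S
        return 1.0 - s
    if method == "sqrt":
        # D = sqrt(1 - S)
        return math.sqrt(1.0 - s)
    if method == "sqrt_squared":
        # D = sqrt(1 - S^2)
        return math.sqrt(1.0 - s * s)
    raise ValueError(f"unknown method: {method}")


# Identical objects (S = 1) are at distance 0;
# completely different objects (S = 0) are at distance 1.
similarities = [1.0, 0.75, 0.0]
distances = [similarity_to_distance(s) for s in similarities]
```

Whatever the variant, the two end points are preserved: S = 1 always maps to D = 0 and S = 0 to D = 1. The variants differ only in how intermediate similarities are spread over the distance scale.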

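Distances, which in some cases are not bounded by a predetermined upper value, may be normalized using eqs. 1.10 or 1.11:

$$D_{\text{norm}} = \frac{D}{D_{\max}} \qquad \text{or} \qquad D_{\text{norm}} = \frac{D - D_{\min}}{D_{\max} - D_{\min}}$$

where $D_{\text{norm}}$ is the distance normalized to the interval [0, 1], and $D_{\max}$ and $D_{\min}$ are the maximum and minimum values taken by the distance coefficient, respectively. Normalized distances can be used to compute similarities, by reversing the transformations given above:

$$S = 1 - D_{\text{norm}}, \qquad S = 1 - D_{\text{norm}}^2, \qquad \text{or} \qquad S = \sqrt{1 - D_{\text{norm}}^2}$$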
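The normalization of distances and the return to a similarity can be sketched as follows. This is a hedged example, not a definitive implementation: the function names are hypothetical, and the maximum and minimum are taken from the observed values in the list, which is an assumption (the bounds could instead be the theoretical limits of the coefficient).

```python
def normalize_distances(distances, use_min=False):
    """Rescale distances to the interval [0, 1].

    use_min=False applies D / Dmax; use_min=True applies
    (D - Dmin) / (Dmax - Dmin). Dmax and Dmin are the observed
    extremes of the input list (an assumption of this sketch).
    """
    d_max = max(distances)
    d_min = min(distances) if use_min else 0.0
    span = d_max - d_min
    if span == 0:
        raise ValueError("cannot normalize: the distance range is zero")
    return [(d - d_min) / span for d in distances]


def distance_to_similarity(d_norm):
    """Reverse the one-complement: S = 1 - Dnorm."""
    return 1.0 - d_norm


raw = [2.0, 4.0, 8.0]                             # unbounded distances
by_max = normalize_distances(raw)                 # D / Dmax
ranged = normalize_distances(raw, use_min=True)   # (D - Dmin) / (Dmax - Dmin)
sims = [distance_to_similarity(d) for d in by_max]
```

Dividing by the maximum keeps the ratios among distances, whereas ranging also forces the smallest observed distance to 0, so the two forms generally give different similarities for the same pair of objects.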

The following three sections describe the coefficients that are most useful with ecological data. Criteria to be used as guidelines for choosing a coefficient are discussed in Section 7.6. Computer programs are briefly reviewed in Section 7.7.