## Projecting data sets in a few dimensions

Ordination (from the Latin ordinatio and German Ordnung) is the arrangement of units in some order (Goodall, 1954). This operation is well-known to ecologists. It consists in plotting object-points along an axis representing an ordered relationship, or forming a scatter diagram with two or more axes. The ordered relationships are usually quantitative, but it would suffice for them to be of the type "larger than", "equal to", or "smaller than" (semiquantitative descriptors) to permit meaningful ordinations. Gower (1984) points out that the term ordination, used in multivariate statistics, actually comes from ecology where it refers to the representation of objects (sites, stations, relevés, etc.) as points along one or several axes of reference.

In ecology, several descriptors are usually observed for each object under study. In most instances, ecologists are interested in characterizing the main trends of variation of the objects with respect to all descriptors, not only a few of them. Looking at scatter plots of the objects with respect to all possible pairs of descriptors is a tedious approach, which generally does not shed much light on the problem at hand. In contrast, the multivariate approach consists in representing the scatter of objects in a multidimensional diagram, with as many axes as there are descriptors in the study. Such a diagram is a perfectly valid mathematical construct, but it cannot be drawn on paper with more than two, or possibly three, dimensions. For the purpose of analysis, ecologists therefore project the multidimensional scatter diagram onto bivariate graphs whose axes are known to be of particular interest. The axes of these graphs are chosen to represent a large fraction of the variability of the multidimensional data matrix, in a space with reduced (i.e. lower) dimensionality relative to the original data set. Methods for ordination in reduced space also allow one to derive quantitative information on the quality of the projections and to study the relationships among descriptors as well as among objects.
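The idea of projecting a multidimensional scatter onto a few axes that carry a large fraction of the variability can be illustrated with a minimal sketch. It assumes a principal component projection computed by singular value decomposition with NumPy; the function name `pca_project`, the simulated data matrix `Y`, and the choice d = 2 are illustrative assumptions, not part of the text.

```python
import numpy as np

def pca_project(Y, d=2):
    """Project the rows of Y (objects x descriptors) onto the first d principal axes.

    Returns the object coordinates in reduced space and the fraction of the
    total variance represented by these d axes.
    """
    Yc = Y - Y.mean(axis=0)                      # centre each descriptor
    _, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    eigvals = s**2 / (Y.shape[0] - 1)            # variance carried by each axis
    F = Yc @ Vt[:d].T                            # coordinates on the first d axes
    frac = eigvals[:d].sum() / eigvals.sum()     # quality of the projection
    return F, frac

# Hypothetical data: 30 objects, 5 descriptors driven by one main gradient.
rng = np.random.default_rng(0)
gradient = rng.normal(size=(30, 1))
loadings = np.array([[1.0, 0.8, -0.6, 0.4, 0.2]])
Y = gradient @ loadings + 0.1 * rng.normal(size=(30, 5))

F, frac = pca_project(Y, d=2)
print(F.shape)                                   # objects x reduced dimensions
print(f"fraction of variance in 2 axes: {frac:.3f}")
```

The returned fraction is one example of the quantitative information on projection quality mentioned above; with real data it may be much lower than in this simulated, single-gradient case.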

| Method | Distance preserved | Variables |
|---|---|---|
| Principal component analysis (PCA) | Euclidean distance | Quantitative data, linear relationships (beware of double-zeros) |
| Principal coordinate analysis (PCoA), metric (multidimensional) scaling, classical scaling | Any distance measure | Quantitative, semiquantitative, qualitative, or mixed |
| Nonmetric multidimensional scaling (NMDS, MDS) | Any distance measure | Quantitative, semiquantitative, qualitative, or mixed |
| Correspondence analysis (CA) | χ² distance | Non-negative, dimensionally homogeneous quantitative or binary data; species abundance or presence/absence data |
| Factor analysis sensu stricto | Euclidean distance | Quantitative data, linear relationships (beware of double-zeros) |

Ordination in reduced space is often referred to as factor (or inertia) analysis since it is based on the extraction of the eigenvectors or factors of the association matrix. In the present book, the expression factor analysis will be restricted to the methods discussed in Section 9.5. Factor analysis sensu stricto is mainly used in the social sciences; it aims at representing the covariance structure of the descriptors in terms of a hypothetical causal model.

The domains of application of the techniques discussed in the present chapter are summarized in Table 9.1. Section 9.1 is devoted to principal component analysis, a powerful technique for ordination in reduced space which is, however, limited to sets of quantitative descriptors. Results are also sensitive to the presence of double-zeros. Sections 9.2 and 9.3 are concerned with principal coordinate analysis (metric scaling) and nonmetric multidimensional scaling, respectively. Both methods project, in reduced space, the distances among objects computed using some appropriate association measure (S or D; Chapter 7); the descriptors may be of any mathematical kind. Section 9.4 discusses correspondence analysis, a most useful ordination technique for species presence/absence or abundance data. Finally, and as mentioned above, Section 9.5 summarizes the principles of factor analysis sensu stricto. The presentation of the various forms of canonical analysis, which are also eigenvector-based techniques (like PCA, PCoA, and CA), is deferred to Chapter 11.

It often happens that the structure of the objects under study is not continuous. In such a case, an ordination in reduced space, or a scatter diagram produced using two important variables, may be sufficient to make the grouping structure of the objects obvious. Ordination methods may thus be used, sometimes, to delineate clusters of objects (Section 8.1); see however the remarks of Section 8.9 about the use of ordination methods in the study of species associations. More generally, ordinations may always be used as complements to cluster analyses. The reason is that clustering investigates pairwise distances among objects, looking for fine relationships, whereas ordination in reduced space considers the variability of the whole association matrix and thus brings out general gradients. Different methods for superimposing the results of clustering onto ordinations of the same objects are described in Section 10.1.

Ecologists generally use ordination methods to study the relative positions of objects in reduced space. An important aspect to consider is the representativeness of the representation in reduced space, which usually has d = 2 or 3 dimensions. To what extent does the reduced space preserve the distance relationships among objects? To answer this, one can compute the distances between all pairs of objects, both in the multidimensional space of the original p descriptors and in the reduced d-dimensional space. The resulting values are plotted in a scatter diagram such as Fig. 9.1. When the projection in reduced space accounts for a high fraction of the variance, the distances between projections of the objects in reduced space are quite similar to the original distances in multidimensional space (case a). When the projection is less efficient, the distances between objects in reduced space are much smaller than in the original space. Two situations may then occur. When the objects are at proportionally similar distances in the two spaces (case b), the projection is still useful even if it accounts for a small fraction of the variance. When, however, the relative positions of objects are not the same in the two spaces (case c), the projection is useless. Ecologists often disregard the interpretation of ordinations when the reduced space does not account for a high fraction of the variance. This is not entirely justified, since a projection in reduced space may be informative even if that space only accounts for a small fraction of the variance (case b).
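The comparison of distances described above can be sketched as follows. This is a minimal example under stated assumptions: the reduced space is obtained by a principal component projection, distances are Euclidean in both spaces, and the data matrix `Y` is simulated. The function names and the correlation `r` used as a numerical summary are illustrative choices, not part of the text.

```python
import numpy as np

def pairwise_euclidean(X):
    """Euclidean distances for all pairs of rows of X, as a vector (i < j)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff**2).sum(axis=-1))
    i, j = np.triu_indices(X.shape[0], k=1)
    return D[i, j]

def shepard_points(Y, d=2):
    """Distances in the original p-dimensional space and in the d-dimensional
    reduced space (here, assumed to be the first d principal axes)."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    F = Yc @ Vt[:d].T                      # object coordinates in reduced space
    return pairwise_euclidean(Yc), pairwise_euclidean(F)

# Hypothetical data: 20 objects described by p = 6 descriptors.
rng = np.random.default_rng(1)
Y = rng.normal(size=(20, 6))

d_full, d_red = shepard_points(Y, d=2)     # abscissa and ordinate of the diagram
r = np.corrcoef(d_full, d_red)[0, 1]       # illustrative summary of agreement
print(len(d_full), f"correlation between the two sets of distances: {r:.2f}")
```

Plotting `d_red` against `d_full` gives a diagram of the kind discussed above. Because the reduced space is an orthogonal projection in this sketch, no distance in reduced space can exceed the corresponding original distance, so all points fall on or below the diagonal.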

The scatter diagram of Fig. 9.1, which is often referred to as a Shepard diagram (Shepard, 1962; diagrams in Shepard's paper had their axes transposed relative to Fig. 9.1), may be used to estimate the representativeness of ordinations obtained using any reduced-space ordination method. In principal component analysis (Section 9.1), the distances among objects, in both the multidimensional space of original descriptors and the reduced space, are calculated using Euclidean distances (D1, eq. 7.34). The F matrix of principal components (eq. 9.4 below) gives the coordinates of the objects in the reduced space. In principal coordinate analysis (Section 9.2) and nonmetric multidimensional scaling (Section 9.3), Euclidean distances among the objects in reduced space are compared to distances Dhi found in matrix D used as the basis for computing the ordination. In correspondence analysis (Section 9.4), it is the χ² distance (D16, eq. 7.54) among objects which is used on the abscissa (Table 9.1). Shepard-like diagrams can also be constructed for cluster analysis (Fig. 8.23).

Figure 9.1 Shepard diagram. Three situations encountered when comparing distances among objects, in the p-dimensional space of the p original descriptors (abscissa) versus the d-dimensional reduced space (ordinate). The figure only shows the contours of the scatters of points. (a) The projection in reduced space accounts for a high fraction of the variance; the relative positions of objects in the d-dimensional reduced space are similar to those in the p-dimensional space. (b) The projection accounts for a small fraction of the variance, but the relative positions of the objects are similar in the two spaces. (c) Same as (b), but the relative positions of the objects differ in the two spaces. Adapted from Rohlf (1972). Compare to Fig. 8.23.

The following sections discuss the ordination methods most useful to ecologists. They are written to be easily understood by ecologists, so that they may not entirely fulfil the expectations of statisticians. Many programs are available to carry out ordination analysis; several of them are described by Michael Palmer*. For detailed discussions on the theory or computing methods, one may refer to ter Braak (1987c) and Morrison (1990), among other works. Important references about correspondence analysis are Benzécri and coll. (1973), Hill (1974), Greenacre (1983), and ter Braak (1987c). Gower (1984 and 1987) reviewed the ordination methods described in this chapter, plus a number of other techniques developed by psychometricians. Several of these are progressively finding their way into numerical ecology. They include methods of metric scaling other than principal coordinate analysis, multidimensional unfolding, orthogonal Procrustes analysis (the Procrustes statistic m² is described in Subsection 10.5.4) and its generalized form, scaling methods for several distance matrices, and a method for ordination of non-symmetric matrices.

* WWWeb site: <http://www.okstate.edu/artsci/botany/ordinate/software.htm>.