The smallest number in this matrix is 4.47, which indicates that sample 3 joins the previously formed group. The subsequent matrix is:

The smallest number in this matrix is 5.83, which indicates that samples 4 and 6 join to form a group. This leaves the following distances between entities:

Figure 3.14 Dendrogram for the data shown in Figure 3.13, based on single-linkage clustering.

The smallest number in this matrix is 8.94, which indicates that sample 5 joins the group composed of quadrats 1, 2, and 3. Finally, the two remaining groups are joined at a distance of 9.85. The resulting dendrogram summarizes the sequence of fusions (Figure 3.14).
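The fusion sequence described above can be retraced with a short script. This is a minimal sketch, not the authors' software: it assumes the six quadrats have the coordinates that appear in the centroid calculation later in this section, and the names `QUADRATS` and `single_linkage` are introduced here purely for illustration.

```python
import math
from itertools import combinations

# Quadrat coordinates, assumed from the centroid calculation in the text.
QUADRATS = {1: (15, 9), 2: (12, 8), 3: (17, 13), 4: (0, 7), 5: (8, 0), 6: (3, 12)}


def single_linkage(points):
    """Return the fusions as (group_a, group_b, distance), in order."""
    groups = [frozenset([label]) for label in points]
    fusions = []
    while len(groups) > 1:
        # Nearest-neighbor rule: the distance between two groups is the
        # smallest distance between any pair of their members.
        def group_distance(pair):
            a, b = pair
            return min(math.dist(points[i], points[j]) for i in a for j in b)

        a, b = min(combinations(groups, 2), key=group_distance)
        fusions.append((sorted(a), sorted(b), round(group_distance((a, b)), 2)))
        # Replace the two fused groups with their union.
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return fusions


for a, b, d in single_linkage(QUADRATS):
    print(a, "+", b, "at", d)
```

Under these assumptions the fusion distances come out as 3.16, 4.47, 5.83, 8.94, and 9.85, in that order, which agrees with the values quoted in the text.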

Single-linkage clustering is rarely used to analyze ecological data. This algorithm ignores group properties because it calculates the distance between groups solely from the distances between individual samples. In addition, single-linkage clustering is said to be "space contracting": as a group grows, it becomes more similar to other groups. The space-contracting nature of single-linkage clustering tends to cause samples to be added to existing groups one at a time (this is sometimes termed "chaining"), which hampers interpretation of the resulting dendrogram.

Complete-linkage (farthest-neighbor) clustering (Sneath and Sokal 1973) is identical to single-linkage clustering except that the distance between entities is defined as the maximum distance between samples in the groups being compared. For example, the distance between group (1,2) and sample 3 is the maximum distance involved in the comparison (7.07) rather than the minimum distance used in single-linkage clustering (4.47).
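The difference between the two rules can be checked for the comparison named above. The sketch below assumes the quadrat coordinates given in the centroid calculation later in this section; the helper names `pairwise`, `single_link`, and `complete_link` are illustrative and are not taken from the source.

```python
import math

# Quadrat coordinates, assumed from the centroid calculation in the text.
QUADRATS = {1: (15, 9), 2: (12, 8), 3: (17, 13), 4: (0, 7), 5: (8, 0), 6: (3, 12)}


def pairwise(group_a, group_b, points=QUADRATS):
    """All sample-to-sample distances between two groups."""
    return [math.dist(points[i], points[j]) for i in group_a for j in group_b]


def single_link(group_a, group_b):
    """Nearest-neighbor rule: smallest between-group distance."""
    return min(pairwise(group_a, group_b))


def complete_link(group_a, group_b):
    """Farthest-neighbor rule: largest between-group distance."""
    return max(pairwise(group_a, group_b))


print(round(single_link([1, 2], [3]), 2))    # 4.47 (quadrat 1 to quadrat 3)
print(round(complete_link([1, 2], [3]), 2))  # 7.07 (quadrat 2 to quadrat 3)
```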

Complete-linkage clustering overcomes the "chaining" produced by single-linkage clustering. In fact, complete-linkage clustering is "space dilating": as a group grows, it becomes less similar to other groups. As with single-linkage clustering, group properties are ignored because distances are calculated on the basis of distances between individual samples.

Figure 3.15 Dendrogram with reversals. Reproduced with permission from Gauch and Whittaker (1981).

Average-linkage clustering and centroid clustering (Sokal and Michener 1958) are similar to single-linkage and complete-linkage clustering, except that average distances form the basis for joining entities. Therefore, the properties of groups are used to assess similarity, an approach that is viewed as advantageous relative to single-linkage and complete-linkage clustering. However, these techniques can behave in ways that complicate interpretation, which has limited their adoption. Specifically, these algorithms have the potential to exhibit "reversals," a situation in which entities are joined at a shorter distance than a previous fusion (Figure 3.15). A reversal implies that the entities joined in the later fusion are more similar than those joined in the earlier one, which is at odds with the nested structure a dendrogram is meant to convey.
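A reversal is easy to produce with centroid clustering when three samples are almost equidistant. The coordinates below are hypothetical, chosen only to trigger the effect; they do not come from the example data set, and the variable names are illustrative.

```python
import math

# Hypothetical coordinates, chosen only to produce a reversal.
a, b, c = (0.0, 0.0), (2.0, 0.0), (1.0, 1.8)

# First fusion: a and b are the closest pair (distance 2.0, versus
# roughly 2.06 from either of them to c).
first_fusion = math.dist(a, b)
assert first_fusion < min(math.dist(a, c), math.dist(b, c))

# Centroid rule: the new group is represented by the mean of its members.
centroid_ab = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Second fusion: c joins the group at its distance from the centroid,
# which is about 1.8 and therefore shorter than the first fusion.
second_fusion = math.dist(centroid_ab, c)

print("first fusion: ", round(first_fusion, 2))
print("second fusion:", round(second_fusion, 2))
print("reversal:", second_fusion < first_fusion)
```

The second fusion happens at a smaller distance than the first because the centroid of (a, b) lies closer to c than either a or b does.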

Single-linkage, complete-linkage, average-linkage, and centroid clustering optimize the route along which structure is sought in the data. These algorithms focus on the distance between entities, not on the properties of the groups being formed. In contrast, other methods optimize some property of the group being formed (e.g., the increase in sums of squares), so group properties are their explicit focus. Both kinds of algorithm are hierarchical in the usual sense: each builds a nested series of fusions that can be drawn as a dendrogram.

Minimum-variance clustering (syn. Ward's method, Orloci's method) (Ward 1963) is an agglomerative, hierarchical clustering algorithm that uses the properties of groups to assess similarity. Thus, it incorporates information about groups, not merely about individual samples. Like the clustering algorithms previously described, minimum-variance clustering employs Euclidean distance as a distance measure. The concept underlying minimum-variance clustering is that the distance between members of a group and the group's centroid (i.e., center) can be used as an indicator of group heterogeneity. Specifically, the fusion rule used by minimum-variance clustering is: join two groups only if the resulting increase in the sum of squared distances to the group centroid is smaller for that pair than for any other pair. Thus, minimum-variance clustering minimizes heterogeneity within groups and therefore favors the formation of small clusters of approximately equal size.
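The fusion rule can be illustrated for the first step of the analysis, when every quadrat is still its own group. This is a sketch under assumptions: the coordinates are those used in the centroid calculation in this section, and `sum_of_squares` and `increase_on_fusion` are names made up for this example rather than part of any published implementation.

```python
from itertools import combinations

# Quadrat coordinates, assumed from the centroid calculation in the text.
QUADRATS = {1: (15, 9), 2: (12, 8), 3: (17, 13), 4: (0, 7), 5: (8, 0), 6: (3, 12)}


def sum_of_squares(labels, points=QUADRATS):
    """Sum of squared distances from each member to the group centroid."""
    n = len(labels)
    cx = sum(points[i][0] for i in labels) / n
    cy = sum(points[i][1] for i in labels) / n
    return sum((points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2 for i in labels)


def increase_on_fusion(group_a, group_b):
    """Growth in within-group sum of squares if the two groups were joined."""
    merged = sum_of_squares(group_a + group_b)
    return merged - sum_of_squares(group_a) - sum_of_squares(group_b)


# Start with each quadrat in its own group and score every candidate fusion.
groups = [[label] for label in QUADRATS]
costs = {
    (tuple(a), tuple(b)): increase_on_fusion(a, b)
    for a, b in combinations(groups, 2)
}
best = min(costs, key=costs.get)
print("first fusion:", best, "increase:", round(costs[best], 2))
```

With these coordinates the cheapest first fusion is quadrats 1 and 2, at an increase of 5.0. Repeating the scoring after each fusion, with the merged group in place of its two parts, would yield the full sequence.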

This algorithm lends itself to a measure of classification efficiency. The total sum of squares is the sum of the squared distances between each quadrat and the overall centroid. In our example, the overall centroid is given by ((15 + 12 + 17 + 0 + 8 + 3)/6, (9 + 8 + 13 + 7 + 0 + 12)/6), or (9.167, 8.167). At any point in the analysis, a sum of squares can be calculated for each group, and this measure represents within-group heterogeneity. The ratio of a group's sum of squares to the total (i.e., SSgroup/SStotal) expresses the share of total variability associated with that group and serves as an indicator of that cluster's importance in the data set.
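The quantities in this paragraph can be computed directly from the coordinates in the centroid expression. The sketch below is illustrative only: the function names are hypothetical, and the group (1, 2, 3) is borrowed from the single-linkage example simply to have a group to evaluate, not because minimum-variance clustering necessarily forms it.

```python
# Quadrat coordinates, read from the centroid expression in the text.
QUADRATS = {1: (15, 9), 2: (12, 8), 3: (17, 13), 4: (0, 7), 5: (8, 0), 6: (3, 12)}


def centroid(labels, points=QUADRATS):
    """Mean coordinate of the listed quadrats."""
    n = len(labels)
    return (
        sum(points[i][0] for i in labels) / n,
        sum(points[i][1] for i in labels) / n,
    )


def sum_of_squares(labels, points=QUADRATS):
    """Sum of squared distances from each quadrat to the group centroid."""
    cx, cy = centroid(labels, points)
    return sum((points[i][0] - cx) ** 2 + (points[i][1] - cy) ** 2 for i in labels)


all_quadrats = list(QUADRATS)
cx, cy = centroid(all_quadrats)
print(round(cx, 3), round(cy, 3))         # 9.167 8.167

ss_total = sum_of_squares(all_quadrats)
ss_group = sum_of_squares([1, 2, 3])      # illustrative group only
print(round(ss_total, 2))                 # 333.67
print(round(ss_group, 2))                 # 26.67
print(round(ss_group / ss_total, 3))      # 0.08
```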

TWINSPAN is a divisive clustering method that relies on an ordination algorithm, RA. A crude dichotomy is formed in the data, with the RA centroid serving as the dividing line between two groups. This dichotomy is refined by a process comparable to iterative character weighting (Hogeweg 1976), a summary of which is provided by Jongman et al. (1987:194-5). Dichotomies are then "ordered" so that similar clusters are near each other. The TWINSPAN algorithm ensures that dichotomies are determined by relatively large groups, so these dichotomies depend on general relations rather than on single observations, which may be atypical.

Unlike the clustering algorithms described above, TWINSPAN also produces a classification of species. This classification is based on the fidelity of species to specific samples or clusters of samples. Thus, in addition to a dendrogram, a structured table is produced (Table 3.3). However, a structured table produced from a large data set is rarely presented because patterns in a large table are not readily discernible.

Table 3.3 Structured table from TWINSPAN
