Gower

This measure, also called stress 1 (Kendall, 1938), takes values in the interval [0, 1]; it is used as a measure of goodness-of-fit in nonmetric multidimensional scaling (eq. 9.28). Small values indicate a good fit. Like the cophenetic correlation, this measure only has relative value, for comparing clustering results obtained from the same original similarity matrix. Several other such functions are listed in Rohlf (1974).

Other measures have been proposed for comparing different partitions of the same objects. Consider in turn all pairs of objects and determine, for each pair, whether or not the partition places the two objects in the same group. One can construct a 2 × 2 contingency table, similar to the one shown at the beginning of Subsection 7.3.1, comparing the pair assignments made by two partitions. The simple matching coefficient (eq. 7.1), computed on this contingency table, is often called the Rand index (Rand, 1971). Hubert & Arabie (1985) have suggested a modified form that corrects the Rand index for chance: if the relationship between two partitions is comparable to that of partitions picked at random, the corrected Rand index returns a value near 0. The modified Rand index is widely used for comparing partitions.
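The pair-counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the source: the function names and the two example partitions are hypothetical, and the corrected index is written in the pair-count form usually attributed to Hubert & Arabie (1985), which is assumed here to be the modified form the text refers to.

```python
from itertools import combinations

def pair_table(p1, p2):
    """Build the 2 x 2 table of object pairs for two partitions.

    p1 and p2 are sequences of group labels, one label per object.
    Returns (a, b, c, d): a = pairs in the same group in both partitions,
    b = same group in p1 only, c = same group in p2 only,
    d = pairs in different groups in both partitions.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(p1)), 2):
        same1 = p1[i] == p1[j]
        same2 = p2[i] == p2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_index(p1, p2):
    """Simple matching coefficient computed on the pair table."""
    a, b, c, d = pair_table(p1, p2)
    return (a + d) / (a + b + c + d)

def corrected_rand_index(p1, p2):
    """Chance-corrected Rand index (assumed Hubert & Arabie form)."""
    a, b, c, d = pair_table(p1, p2)
    total = a + b + c + d
    expected = (a + b) * (a + c) / total   # expected value of a under random pairing
    maximum = ((a + b) + (a + c)) / 2      # largest value a can reach
    if maximum == expected:
        return 1.0                         # degenerate case: both partitions are trivial
    return (a - expected) / (maximum - expected)

# Hypothetical partitions of six objects (labels are arbitrary).
first = [0, 0, 0, 1, 1, 1]
second = [0, 0, 1, 1, 2, 2]
print(pair_table(first, second))
print(rand_index(first, second), corrected_rand_index(first, second))
```

For these two partitions the pair table is (2, 4, 1, 8), so the Rand index is 10/15 while the corrected index is only about 0.24, which shows how much of the raw agreement comes from pairs that both partitions separate.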

A Shepard diagram is a scatter plot comparing distances in a space of reduced dimension, obtained by ordination methods, to distances in the original association matrix (Fig. 9.1). This type of diagram was proposed by Shepard (1962) in the paper where he first described nonmetric multidimensional scaling (Section 9.3). Shepard-like diagrams can be constructed to compare the similarities (or distances) of the cophenetic matrix (Section 8.3) to the similarities (or distances) of the original resemblance matrix (Fig. 8.23). Such a plot may help choose between parametric and nonparametric cophenetic correlation coefficients. If the relationship between original and cophenetic similarities is curvilinear in the Shepard-like diagram, as is the case in Figs. 8.23a and c, a nonparametric correlation coefficient should be used.

Figure 8.23 Shepard-like diagrams comparing cophenetic similarities to original similarities (S15, abscissa) for 21 lakes clustered using (a) single linkage (Co = 0, cophenetic r = 0.64, τ = 0.45), (b) proportional link linkage (Co = 0.5, cophenetic r = 0.75, τ = 0.58), and (c) complete linkage clustering (Co = 1, cophenetic r = 0.68, τ = 0.51). Co is the connectedness of the linkage clustering method (Subsection 8.5.3). There are 210 points (i.e. 210 similarity pairs) in each graph. The diagonal lines are visual references.

Fig. 8.23 also helps understand the space-contraction effect of single linkage clustering, where the cophenetic similarities are always larger than or equal to the original similarities; the space-conservation effect of intermediate linkage clustering with connectedness values around Co = 0.5; and the space-dilation effect of complete linkage clustering, in which cophenetic similarities can never exceed the original similarities. There are (n - 1) clustering levels in a dendrogram. This limits to (n - 1) the number of different values that can be found in a cophenetic matrix and, hence, along the ordinate of a Shepard-like diagram. This is why points form horizontal bands in Fig. 8.23.
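The two properties stated for single linkage (cophenetic similarities never smaller than the original ones, and at most n - 1 distinct cophenetic values) can be checked on a small example. The sketch below is an illustration only: the similarity matrix is hypothetical, and the cophenetic similarities are obtained with a Floyd-Warshall style "best weakest link" pass, which is one convenient way of getting the single linkage result without building the dendrogram; it is not presented as the procedure used in the text.

```python
from itertools import combinations

def single_linkage_cophenetic(sim):
    """Cophenetic similarities of single linkage clustering.

    Under single linkage, two objects are fused at the highest similarity
    level at which a chain of links connects them. Their cophenetic
    similarity is therefore the largest, over all chains, of the weakest
    link in the chain.
    """
    n = len(sim)
    coph = [row[:] for row in sim]          # work on a copy of the matrix
    for k in range(n):                      # allow chains passing through object k
        for i in range(n):
            for j in range(n):
                via_k = min(coph[i][k], coph[k][j])
                if via_k > coph[i][j]:
                    coph[i][j] = via_k
    return coph

# Hypothetical symmetric similarity matrix for n = 5 objects.
S = [
    [1.0, 0.9, 0.3, 0.2, 0.1],
    [0.9, 1.0, 0.6, 0.4, 0.2],
    [0.3, 0.6, 1.0, 0.7, 0.3],
    [0.2, 0.4, 0.7, 1.0, 0.5],
    [0.1, 0.2, 0.3, 0.5, 1.0],
]
C = single_linkage_cophenetic(S)
pairs = list(combinations(range(len(S)), 2))

# Space contraction: no cophenetic similarity is smaller than the original one.
never_smaller = all(C[i][j] >= S[i][j] for i, j in pairs)

# Distinct cophenetic values, i.e. the horizontal bands of a Shepard-like diagram.
distinct_levels = {C[i][j] for i, j in pairs}

print(never_smaller, sorted(distinct_levels))
```

With these five objects the 10 pairs share only four cophenetic values (one per fusion level), which is the banding pattern described for Fig. 8.23.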

Following are three measures of goodness-of-fit between the single linkage clustering results and the original similarity matrix, for the pond example:

Pearson r cophenetic correlation = 0.941
Kendall τb cophenetic correlation = 0.774
Gower distance = 0.191
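As a rough guide to how such values are obtained, the sketch below computes the three measures from two lists holding the original and cophenetic similarities of the same object pairs. The data are hypothetical (they are not the pond example, so the outputs do not match the values above), the function names are illustrative, and the Gower distance is taken here to be the sum of squared differences between original and cophenetic similarities, which is an assumption since this section does not restate its definition.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """Kendall rank correlation with the tau-b correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        if x1 == x2 and y1 == y2:
            continue                 # tied on both variables: enters neither term
        elif x1 == x2:
            ties_x += 1              # tied on x only
        elif y1 == y2:
            ties_y += 1              # tied on y only
        elif (x1 - x2) * (y1 - y2) > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))

def gower_distance(x, y):
    """Sum of squared differences (assumed definition, see lead-in)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical values for the 10 pairs of a 5-object example.
original = [0.9, 0.3, 0.2, 0.1, 0.6, 0.4, 0.2, 0.7, 0.3, 0.5]
cophenetic = [0.9, 0.6, 0.6, 0.5, 0.6, 0.6, 0.5, 0.7, 0.5, 0.5]

print(pearson_r(original, cophenetic))
print(kendall_tau_b(original, cophenetic))
print(gower_distance(original, cophenetic))
```

Because a cophenetic matrix contains many tied values, the tau-b form, which corrects for ties, is the natural rank correlation to use here.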
