Average-linkage clustering: UPGMA, WPGMA, UPGMC and WPGMC

In many fields of science an intermediate solution to clustering is preferred as standard. The term 'average-linkage clustering' has been used for many different methods (e.g. by Sneath & Sokal 1973), such as those abbreviated in the title of this section. What they have in common is that group similarity is defined on the basis of all members of a group, not just one member as in single- and complete-linkage clustering. In this regard they are related to minimum-variance clustering (Section 6.3).

The four methods are discussed in some detail in Legendre & Legendre (1998), where a small numerical example is given for illustration and comparison. I will continue using the abbreviations and refer to Table 6.1 for the full names. The methods are distinguished by two alternative criteria (Table 6.1). UPGMA and WPGMA use the average resemblance of all group members as a criterion for between-group resemblance. In UPGMC and WPGMC, a group centroid is established: in geometrical terms this is the centre of gravity of any one group. Between-group resemblance is thus the distance or similarity of any two centroids. As with many groups of methods, the results depend on the data analysed and the method used, and they may be identical or differ considerably.
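The difference between the two criteria can be illustrated with a minimal sketch. It assumes points in the plane and Euclidean distance as the resemblance measure; the function names and the two small groups are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def average_distance(group_a, group_b):
    """Mean of all pairwise distances between members of two groups."""
    pairs = list(product(group_a, group_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def centroid(group):
    """Centre of gravity of a group: coordinate-wise mean of its members."""
    n = len(group)
    return tuple(sum(coords) / n for coords in zip(*group))


def centroid_distance(group_a, group_b):
    """Distance between the centroids of two groups."""
    return dist(centroid(group_a), centroid(group_b))


# Two hypothetical groups of two points each.
group_a = [(0.0, 0.0), (0.0, 2.0)]
group_b = [(3.0, 0.0), (3.0, 2.0)]

avg = average_distance(group_a, group_b)   # about 3.30
cen = centroid_distance(group_a, group_b)  # 3.0
print(avg, cen)
```

For these points the average pairwise distance is larger than the distance between the centroids, so the two criteria can assign different resemblance values to the same pair of groups.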

The second criterion of distinction concerns weighting. 'U' (UPGMA, UPGMC) signifies 'unweighted', which, however, may be somewhat misleading. When computing average resemblance as well as centroids, group sizes are taken into account. 'Unweighted' means that the weight of the original set of resemblances is retained. 'W' (WPGMA, WPGMC) signifies 'weighted'. This means that groups of different sizes get the same weight when fused. In this case, the weight of each member is effectively inversely proportional to the size of its group.
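The effect of weighting can be sketched with the usual update rules for the distance between a newly fused group and a third group. In this sketch, d_ac and d_bc are the distances of groups A and B to a third group C, and n_a and n_b are the sizes of A and B; the function names and the numbers are hypothetical.

```python
def upgma_update(d_ac, d_bc, n_a, n_b):
    """Unweighted average: each original object counts equally,
    so the larger group contributes more to the new distance."""
    return (n_a * d_ac + n_b * d_bc) / (n_a + n_b)


def wpgma_update(d_ac, d_bc):
    """Weighted average: the two fused groups count equally,
    whatever their sizes."""
    return (d_ac + d_bc) / 2


# Hypothetical case: group A has three members, group B has one.
d_ac, d_bc = 2.0, 6.0
n_a, n_b = 3, 1

print(upgma_update(d_ac, d_bc, n_a, n_b))  # 3.0
print(wpgma_update(d_ac, d_bc))            # 4.0
```

In the weighted variant the single member of B counts as much as the three members of A taken together, which is why the result is pulled towards the distance of the smaller group.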

Table 6.1 Properties of four popular clustering methods (adapted from Legendre & Legendre (1998)).

| Properties | Consider the average similarities or distances of all members of a cluster as candidates for further fusions | Consider the centroid of all members of a cluster as candidates for further fusions |
|---|---|---|
| Give equal weight to the original resemblances (weight of groups proportional to group size) | UPGMA (unweighted arithmetic average clustering) | UPGMC (unweighted centroid clustering) |
| Give equal weight to any two branches of the dendrogram (weight of groups identical irrespective of size) | WPGMA (weighted arithmetic average clustering) | WPGMC (weighted centroid clustering) |

The relationship between the four methods is shown in Table 6.1, an extension of Table 8.2 in Legendre & Legendre (1998). Even though these methods seem to be popular (partly due to the appealing abbreviations), it must be noted that they all represent special cases rather than a justified standard. Furthermore, especially when using centroid clustering, reversals may occur in the dendrograms: a subsequent fusion may take place at a lower level than the previous one. Dendrograms of this kind are both difficult to draw and difficult to interpret.
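A reversal is easy to reproduce with three points forming a nearly equilateral triangle. The following is a naive sketch of unweighted centroid clustering, assuming Euclidean distance between centroids; the function name and the coordinates are hypothetical, and the code is not meant as an efficient implementation.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid_clustering(points):
    """Naive unweighted centroid clustering; returns the fusion levels."""
    # Each cluster is stored as (centroid, number of members).
    clusters = [(tuple(p), 1) for p in points]
    levels = []
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (c_i, n_i), (c_j, n_j) = clusters[i], clusters[j]
        levels.append(dist(c_i, c_j))
        # The size-weighted mean of two centroids is the centroid of all members.
        merged = tuple(
            (n_i * a + n_j * b) / (n_i + n_j) for a, b in zip(c_i, c_j)
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((merged, n_i + n_j))
    return levels


# The two base points are 2.0 apart; the apex is about 2.06 from each of them.
levels = centroid_clustering([(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)])
print(levels)  # the second fusion level is lower than the first
```

The first fusion joins the two base points at level 2.0. Their centroid, (1.0, 0.0), lies only 1.8 from the apex, so the second fusion takes place at a lower level than the first.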
