Efficient polythetic hierarchical divisive clustering can be obtained by partitioning the objects according to the axes of an ordination space. Using principal component analysis (PCA, Section 91), the set of objects may be partitioned in two groups: those that have positive values along the first PCA axis and those that have negative values. The PCA analysis is repeated for each of the groups so obtained and a new partition of each group is performed. The process is repeated until the desired level of resolution is obtained (Williams, j976b).
Following a similar suggestion by Piazza & Cavalli-Sforza G975), Lefkovitch Q976) developed a hierarchical classification method for very large numbers of objects, based on principal coordinate analysis (PCoA, Section 9.2). The dendrogram is constructed from the successive principal coordinate axes, the signs of the objects on the coordinate axes indicating their membership in one of the two groups formed at each branching step. The objects are partitioned in two groups according to their signs along the first PCoA axis; each group is then divided according to the positions of the objects along the second axis; and so on. This differs from the method used with PCA above, where the analysis is repeated for each group before a new division takes place.
To calculate the principal coordinates of a large number of objects, Lefkovitch proposed to first measure the similarity among objects by an equation which, like the covariance or correlation, is equivalent to the product of a matrix with its transpose. He described such a measure, applicable if necessary to combinations of binary, semiquantitative, and quantitative descriptors. The association matrix among objects is obtained by the matrix product YY' (order n x n). In many problems where there are many more objects than descriptors, computation of the eigenvalues and eigenvectors of the association matrix among descriptors, Y'Y, represents an important saving of computer time because Y'Y (order p x p) is much smaller than YY' (order n x n). After Rao (1964) and Gower (1966), Lefkovitch showed that the principal coordinates V of the association matrix among objects can then be found, using the relation V = YU where U is the matrix of the principal coordinates among descriptors. The principal coordinates thus calculated allow one to position the objects, numerous as they may be, in the reduced space. Principal coordinates can be used for the binary hierarchical divisive classification procedure that was Lefkovitch's goal.
Another way of obtaining Lefkovitch's classification is to compute the principal coordinates from the (n x n) similarity matrix among objects, using the TWWS algorithm described in Subsection 9.2.6. This algorithm allows one to quickly obtain a small number of principal coordinates for a fairly large number of objects. The first few principal coordinates, calculated in this way, could be used as the basis for Lefkovitch's clustering method.
A divisive algorithm of the same type is used in Twinspan (below). It is based upon an ordination obtained by correspondence analysis instead of PCA or PCoA.
Was this article helpful?