One appeal of molecular clocks is that they are relatively easy to use once they have been correctly calibrated. With a bit more work, however, a great deal more information on the evolutionary relationships of genetic lineages can be obtained from DNA sequences through the reconstruction of phylogenies. Traditionally, most phylogenetic inferences have been depicted as hierarchical bifurcating trees, in other words trees that reflect a series of branching events in which one lineage splits into two descendant lineages. These trees can be based on morphological characters, although in this book we will limit our discussion to phylogenetic trees that are inferred from genetic characters. The positioning of organisms on a tree is generally based on their genetic similarity to one another.
[Figure 5.1 tree. Tip labels as recovered, by family. Cordulegastridae: Cordulegaster dorsalis. Aeshnidae: Anax junius, Aeshna multicolor, Aeshna californica. Libellulidae: Tramea lacerata, Tramea onusta, Libellula saturata, Libellula luctuosa, Pachydiplax longipennis, Sympetrum illotum, Perithemis tenera, Erythemis simplicicollis.]
Figure 5.1 A phylogeny of 13 dragonfly species based on the mitochondrial 12S ribosomal DNA gene. First species names, and then family names, are shown to the right of the tree. Note that congeneric species are closest together on the tree because they are genetically most similar to one another. Adapted from Saux, Simon and Spicer (2003).
This is illustrated in Figure 5.1, which shows a tree that portrays the evolutionary relationships of some dragonfly species, genera and families. Congeneric species that diverged from a common ancestor relatively recently, such as Libellula saturata and L. luctuosa, will be close to each other on the tree. Confamilial genera, such as Libellula and Erythemis (Figure 5.2), are further apart on the tree because their common ancestor was more remote, and members of different families are even more widely spaced.
There are many different ways in which phylogenies can be reconstructed from genetic data, but most of them fall into one of four categories: distance, parsimony, likelihood and Bayesian methods. Note that the following discussion will focus on the phylogenies of closely related populations and species, and the limitations outlined below are not necessarily relevant to the phylogenies of more distantly related taxa.
Distance methods are based on measures of evolutionary distinctiveness between all pairs of taxa (Figure 5.3). These metrics may be calculated from the number of nucleotide differences if based on DNA sequence data, or from estimates such as Nei's D (Chapter 4) if based on allele frequency data, such as those provided by allozymes or microsatellites. There are many different algorithms that can be used to reconstruct trees from genetic distances, the most common being the neighbour-joining method (Saitou and Nei, 1987). Details of these various methods are beyond the scope of this book; suffice it to say that the goal is to build a tree that accurately reflects how much genetic change has occurred, and therefore roughly how much time has passed, since lineages split from one another. Because branch lengths reflect the evolutionary distance between two points on a tree, this approach should ensure that neighbouring branches on a tree are occupied by those lineages that have descended most recently from a common ancestor. When applied to closely related lineages, distance-based trees may be poorly resolved because a number of different lineages may be separated by the same distance, in which case decisions as to which lineages should be closest to each other on the tree are arbitrary.
Figure 5.3 A general distance method for reconstructing phylogenies. (a) The pairwise genetic distances between species A-D are provided in a matrix format, with each number referring to the percentage difference between a pair of species, e.g. the sequence from species A differs from that of species B by 2%. (b) The genetic distances are then used to reconstruct a tree in which species that are separated by the smallest genetic distances are grouped together. Note that the branch lengths are proportional to the amount of genetic change that has occurred, and these add up to the total genetic distances that are given in (a).
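The general distance approach can be sketched in a few lines of Python. This is a minimal illustration, not the neighbour-joining method cited above: it swaps in UPGMA, a simpler clustering algorithm that assumes roughly clock-like rates of change, because it shows the same core idea of repeatedly grouping the closest lineages. The sequences and the function names (`p_distance`, `distance_matrix`, `upgma`) are hypothetical and are not taken from Figure 5.3.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def distance_matrix(seqs):
    """Pairwise p-distances, keyed by a sorted (name, name) tuple."""
    return {tuple(sorted((a, b))): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}


def upgma(seqs):
    """Join the two closest clusters until one remains (UPGMA).

    Returns the tree topology as a Newick-like string without
    branch lengths. Ties are broken by insertion order, which is
    the arbitrary choice the text warns about.
    """
    dist = distance_matrix(seqs)
    size = {name: 1 for name in seqs}  # cluster label -> number of taxa
    while len(size) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged = f"({a},{b})"
        new_size = size[a] + size[b]
        for c in size:
            if c in (a, b):
                continue
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            # Size-weighted average distance from the new cluster to c
            dist[tuple(sorted((merged, c)))] = (
                size[a] * d_ac + size[b] * d_bc) / new_size
        # Drop every distance that involves the two merged clusters
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del size[a], size[b]
        size[merged] = new_size
    return next(iter(size))


# Hypothetical aligned sequences for four species (illustration only)
seqs = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGA",
    "C": "TCGTACGAACGTACGTACGT",
    "D": "TCGTACGAACGTTCGTACGT",
}

print(upgma(seqs))
```

For these made-up sequences, A and B differ at one site in 20, as do C and D, so the sketch groups A with B and C with D. Because both pairs are separated by the same distance, the order in which they are joined depends only on the tie-breaking rule.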
Figure 5.4 A maximum parsimony (MP) phylogenetic analysis based on the DNA sequences shown in (a) of species a, b, c and d. Three possible trees are shown in (b). Vertical bars on branches represent the mutations that must have occurred at particular sequence sites. The tree that requires six mutations is more parsimonious than the trees that require seven mutations and therefore under MP analysis would be considered the correct tree.
A maximum parsimony tree is the tree that contains the minimum number of steps possible, in other words the smallest number of mutations that can explain the distribution of lineages on the tree (Fitch, 1971; Figure 5.4). Parsimony is based on Ockham's Razor, the principle proposed by William of Ockham in the 14th century, which states that the best hypothesis for explaining a process is the one that requires the fewest assumptions. A maximum parsimony tree will maximize the agreement between characters on a tree. However, although intuitively appealing, parsimony trees may remain unresolved if data are insufficiently polymorphic, which is often the case in the recently diverged lineages that are typically found within and among populations. The small number of mutational changes that differentiate many conspecific haplotypes may mean that multiple, equally parsimonious trees exist, once again leading to a situation in which it may be impossible to determine which haplotypes should be adjacent to one another on the tree.
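Counting the minimum number of mutations that a given tree requires can be sketched with the small-parsimony procedure of Fitch (1971). The code below is a minimal illustration under stated assumptions: trees are written as nested tuples of species names, the sequences are hypothetical (they are not the data of Figure 5.4), and the helper names `fitch_changes` and `parsimony_score` are invented for this example.

```python
def fitch_changes(tree, states):
    """Fitch's bottom-up pass for one site on a rooted binary tree.

    tree   : a leaf name (str) or a tuple (left_subtree, right_subtree)
    states : dict mapping leaf name -> nucleotide at this site
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra mutation
        return shared, left_cost + right_cost
    # No shared state: at least one mutation is needed at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs):
    """Total minimum number of changes over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch_changes(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical aligned sequences for four species (illustration only)
seqs = {"a": "ACAGTA", "b": "ACTCTC", "c": "GTACTC", "d": "GTTGTC"}

# The three possible groupings of four species
trees = {
    "ab|cd": (("a", "b"), ("c", "d")),
    "ac|bd": (("a", "c"), ("b", "d")),
    "ad|bc": (("a", "d"), ("b", "c")),
}

scores = {label: parsimony_score(t, seqs) for label, t in trees.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

With these made-up sequences the grouping of a with b requires seven changes and the other two groupings require eight, so it would be the most parsimonious tree. If two trees tied for the lowest score, `min` would simply return the first, which mirrors the unresolved situation described above.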
The third and fourth categories of phylogenetic analysis are maximum likelihood (ML; Chapter 3) and Bayesian approaches, both of which are based on explicit models that describe the evolution of individual characters. Each model makes a particular set of assumptions, for example that all nucleotide substitutions are equally likely or, alternatively, that each nucleotide is replaced by each alternative nucleotide at its own rate. Models can be complex: they can, for example, accommodate different rates of transitions and transversions, and substitution rates that vary along a stretch of DNA. Once the assumptions have been established, ML evaluates each candidate tree by calculating its likelihood, that is, the probability of observing the data given that tree and the specified evolutionary model, and selects the tree with the highest likelihood (Felsenstein, 1981). Although similar in some respects, the more recently developed and increasingly popular Bayesian approach differs in an important way: it seeks the tree with the highest posterior probability, in other words the probability that a particular tree is the correct one, given the evolutionary model and the data that are being analysed (Huelsenbeck et al., 2001). In both of these approaches all variable sites are informative, and these methods can be powerful if the parameters of the model can be set with confidence.
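The likelihood calculation can be illustrated for the simplest possible case: two aligned sequences and a model in which all nucleotide substitutions are equally likely (the Jukes-Cantor model). The sketch below is an assumption-laden toy, not a tree search: it evaluates the likelihood of the data over a grid of candidate distances and picks the best one, whereas real ML software evaluates whole trees and their branch lengths. The function names and the example counts are hypothetical.

```python
import math


def jc69_log_likelihood(n_sites, n_diffs, d):
    """Log-likelihood of two aligned sequences under Jukes-Cantor.

    d is the candidate distance in expected substitutions per site
    and must be greater than zero when n_diffs > 0.
    """
    decay = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * decay  # site shows the same nucleotide
    p_diff = 0.25 - 0.25 * decay  # site shows one specific other nucleotide
    # 0.25 is the equilibrium frequency of the nucleotide in sequence 1
    return ((n_sites - n_diffs) * math.log(0.25 * p_same)
            + n_diffs * math.log(0.25 * p_diff))


def jc69_ml_distance(n_sites, n_diffs):
    """Closed-form ML estimate of the Jukes-Cantor distance."""
    p = n_diffs / n_sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical data: 10 differences observed across 100 aligned sites
n_sites, n_diffs = 100, 10

# Evaluate the likelihood on a grid of candidate distances
candidates = [i / 1000 for i in range(1, 1001)]
best = max(candidates,
           key=lambda d: jc69_log_likelihood(n_sites, n_diffs, d))

print(best, jc69_ml_distance(n_sites, n_diffs))
```

The grid search and the closed-form estimate should agree to within the grid spacing. Both are slightly larger than the raw proportion of differences (0.10), because the model allows for sites that have changed more than once.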
Traditional phylogenetic analyses have been invaluable in evolutionary biology. However, although bifurcating trees are appropriate for taxonomic groups at the species level and beyond, which have experienced a period of reproductive isolation long enough to allow for the fixation of different alleles, a hierarchical bifurcating tree will not always be appropriate for population studies. This is partly because, as outlined above, there may be insufficient polymorphism in comparisons of conspecific sequences. In addition, bifurcating trees allow for neither the co-existence of ancestors and descendants nor the rejoining of lineages through hybridization or recombination (reticulated evolution), two processes that occur commonly at the population level. As a result, traditional phylogenetic trees are not always the most appropriate method for analysing the genealogies within and among conspecific populations, and in these cases can result in poorly resolved and sometimes misleading phylogenetic trees (Posada and Crandall, 2001). In recent years, this limitation has provided the impetus for researchers to develop a number of methods for phylogenetic analysis that are specifically tailored to accommodate the similar sequences that often emerge from comparisons of populations and closely related species.