Nested Clade Phylogeographic Analysis and Statistical Phylogeography

Once we have established the genealogical relationships among haplotypes, the next step in phylogeography is to identify which historical and geographical factors may have influenced the current distributions of haplotypes. Traditionally, phylogeography has been based on the practice of gathering genetic data from samples collected across a geographical range and then looking for possible explanations for the genealogical patterns that are inferred; for example, a founder effect may explain pronounced genetic divergence between an island and a mainland population, and a mountain range in a north-south orientation may explain why eastern and western populations show independent evolutionary histories. This approach of seeking post hoc explanations for the current distribution of genetic variation has been an integral part of phylogeography since its inception, and may provide a useful initial assessment; at the same time, it is a largely descriptive approach that does not provide a rigorous framework within which specific hypotheses can be tested. For one thing, there is no way to determine whether or not the sample size of individuals and populations is large enough to rule out the possibility that the current distribution of genotypes resulted from chance alone.

In recent years, a number of increasingly rigorous methods based on statistical analyses and coalescent theory have been developed. One of these is nested clade phylogeographic analysis (NCPA; Templeton, Routman and Phillips, 1995), also known as nested clade analysis (NCA). The first step in NCPA is to construct a network such as the statistical parsimony network outlined in the previous section. NCPA then uses explicit rules to define a series of hierarchically nested clades within this network. The first level is made up of the clades that are formed by haplotypes that are separated by only one mutation. These one-step clades are then nested into two-step clades that contain haplotypes that are separated by two mutations, and so on. This is continued until the point when the next highest nesting level would result in a single clade encompassing the entire network. From our previous discussion on statistical parsimony networks we know that the oldest haplotypes should be central to the network and the newest haplotypes should be peripheral. As a result, the nested arrangement corresponds to evolutionary time, with higher nested levels corresponding to earlier coalescent events.

Figure 5.7 A North American bullfrog (Rana catesbeiana). This species is native to a wide area across eastern North America and is the largest true frog on that continent, weighing up to 0.5 kg. Photograph provided by Jim Austin and reproduced with permission

The next step is to superimpose geography over the clades, which allows us to calculate two distance measures: the clade distance Dc, which is the mean distance of clade members from the geographical centre of that clade; and the nested clade distance Dn, which is the mean distance of clade members from the geographical centre of the higher-level clade within which that clade is nested. Permutation tests are then used to determine whether there is a non-random association between genetic lineages and geographical locations. If the null hypothesis of no association between genotypes and geography can be rejected, an a posteriori inference key is used to determine which of several alternative scenarios, such as range expansion or allopatric fragmentation, is the most likely explanation for the patterns that have been revealed (Templeton, 2004).
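The two distance measures and the permutation test can be illustrated with a minimal Python sketch. It is not the procedure implemented in any particular NCPA program. The sample data, the function names, the use of flat x/y coordinates in kilometres (rather than great-circle distances), and the one-sided test on Dc are all assumptions made for illustration.

```python
import random
from math import hypot

def centre(points):
    """Geographical centre of a set of (x, y) points: the mean coordinate."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def mean_distance(points, origin):
    """Mean straight-line distance of the points from a given origin."""
    return sum(hypot(x - origin[0], y - origin[1]) for x, y in points) / len(points)

def clade_distances(samples):
    """samples: list of (clade_label, (x, y)), one entry per individual,
    covering every clade nested within one higher-level clade.
    Returns {clade_label: (Dc, Dn)}."""
    nesting_centre = centre([xy for _, xy in samples])
    result = {}
    for clade in sorted({label for label, _ in samples}):
        pts = [xy for label, xy in samples if label == clade]
        dc = mean_distance(pts, centre(pts))      # spread around the clade's own centre
        dn = mean_distance(pts, nesting_centre)   # spread around the nesting clade's centre
        result[clade] = (dc, dn)
    return result

def permutation_p_value(samples, clade, n_perm=999, seed=0):
    """One-sided permutation test (an assumed design): how often does shuffling
    clade labels among sampled individuals give a Dc as small as the observed one?"""
    rng = random.Random(seed)
    observed = clade_distances(samples)[clade][0]
    labels = [label for label, _ in samples]
    coords = [xy for _, xy in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted_dc = clade_distances(list(zip(labels, coords)))[clade][0]
        if permuted_dc <= observed + 1e-12:       # tolerance for floating-point ties
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: two one-step clades nested in the same two-step clade.
samples = [
    ("1-1", (0.0, 0.0)), ("1-1", (0.0, 0.0)), ("1-1", (2.0, 0.0)),
    ("1-2", (10.0, 0.0)), ("1-2", (10.0, 0.0)), ("1-2", (12.0, 0.0)),
]

for clade, (dc, dn) in clade_distances(samples).items():
    print(clade, round(dc, 3), round(dn, 3))
print("p(Dc of clade 1-1):", permutation_p_value(samples, "1-1"))
```

In this toy data set each clade is geographically compact (Dc of about 0.89 km) but lies 5 km on average from the centre of the nesting clade, so Dc is small relative to Dn. With only six individuals the permutation test has very little power, which echoes the point that conclusions depend on sample size.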

An NCPA based on 41 haplotypes was used to test the hypothesis that the current distribution of genetic diversity in the North American bullfrog (Rana catesbeiana; Figure 5.7) was influenced by changing environmental conditions throughout the last Ice Age. Figure 5.8 shows the three nesting levels that were identified. Most haplotypes differed by a single mutation, although a notable exception was the connection between the eastern and western lineages (clades 3-1 and 3-2), which spanned at least five mutations. This greater-than-average divergence, together with the geographical distributions of these lineages on either side of the Mississippi River, was interpreted as evidence for an early Pleistocene (last Ice Age) isolation of eastern and western populations. At the same time, widespread haplotypes within each of the two most divergent clades suggest that more recent levels of gene flow have been reasonably high on either side of the river (Austin, Lougheed and Boag, 2004).

NCPA is increasing in popularity because it allows researchers to test specific hypotheses about the geographical distribution of lineages based on both mitochondrial and nuclear sequence data. The power of nested analyses will, of course, be limited by the sampling regime, because the network upon which NCPA is based may be inaccurate if based on too few individuals or populations.

Figure 5.8 A nested clade phylogeographic analysis based on DNA sequences from part of the mitochondrial cytochrome b gene of the North American bullfrog (Rana catesbeiana). The 41 haplotypes are labelled a-z and aa-oo. The size of the font is proportional to the frequency of the haplotype. One-step clades are prefixed with 1 (e.g. 1-1, 1-2) and are bounded by solid lines. Two-step clades are prefixed with 2 (e.g. 2-1, 2-2) and are bounded by dashed lines. The total network is divided into two three-step clades: clade 3-1, which occurs east of the Mississippi River, and clade 3-2, which occurs west of the river. Each line represents a single mutational change, and dark circles represent unsampled or extinct haplotypes. Redrawn by J. Austin from Austin, Lougheed and Boag (2004)

Nevertheless, a recent review of the performance of NCPA was conducted using 150 data sets that had strong a priori expectations based on known events such as post-glacial expansions or human-mediated introductions. The method generally performed well, although in a few cases it failed to detect an expected event (Templeton, 2004). Despite this track record, NCPA has been criticized for failing to provide any estimate of uncertainty along with its conclusions, because the a posteriori inference key provides only yes or no answers that have no confidence limits attached (Knowles and Maddison, 2002). This failing may be at least partially redressed by a suite of recently developed analytical methods that are known as statistical phylogeography (Rosenberg and Nordborg, 2002; Knowles, 2004).

The general approach of statistical phylogeography is to start with the development of specific hypotheses that may explain the current distribution of species. Models based on coalescent theory are then used for statistically testing these hypotheses by comparing the actual data set to the frequencies and distributions of alleles that we would expect to find under a variety of historical and ongoing scenarios. By using the coalescent to build models that reflect the complex demographic processes associated with alternative hypotheses, we should be able to accommodate all possible scenarios and hopefully identify specific historical events such as founder effects, geographical barriers to gene flow, and the relative roles of selection and drift.
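The logic of comparing observed data with model expectations can be sketched with a very small coalescent simulation in Python. This is only an illustration under strong assumptions: a single, constant-size, randomly mating population with no selection, time measured in coalescent units (each pair of lineages coalesces at rate 1), and a hypothetical observed value. Real analyses use far richer models and dedicated software; the function names below are invented for this sketch.

```python
import random

def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled lineages under the
    standard neutral coalescent. While k lineages remain, the waiting time to
    the next coalescent event is exponential with rate k(k-1)/2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)
    return t

def null_distribution(n, replicates, seed=1):
    """Simulated values of the statistic under the null (single-population) model."""
    rng = random.Random(seed)
    return [simulate_tmrca(n, rng) for _ in range(replicates)]

def upper_tail_p(observed, null):
    """Proportion of simulated values at least as large as the observed one."""
    return (sum(v >= observed for v in null) + 1) / (len(null) + 1)

null = null_distribution(n=10, replicates=20000)
print("mean simulated TMRCA:", sum(null) / len(null))

# Hypothetical observed value, e.g. a genealogy much deeper than the null predicts.
observed_tmrca = 6.0
print("p-value under the null model:", upper_tail_p(observed_tmrca, null))
```

For a sample of n lineages the expected time to the most recent common ancestor under this model is 2(1 - 1/n) coalescent units, so the simulated mean for n = 10 should be close to 1.8. An observed genealogy that is much deeper than almost all simulated ones would count as evidence against the simple single-population hypothesis, for example in favour of long-term isolation across a barrier.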

At the moment, statistical phylogeography has great promise but is a newly emerging field that needs further development before applications become widespread. One difficulty lies with defining hypotheses that are simple enough to be tested but can nevertheless accommodate the complexities that are often associated with a species' evolutionary history. Parameters as varied as mutation rates, fluctuating population sizes, asymmetric migration, and geographical affiliations will often need to be accounted for. Models therefore may be highly complex, and detailed descriptions are beyond the scope of this textbook. This is nevertheless an area of investigation that should feature much more prominently in phylogeographic analysis in the coming years, and researchers in this field should be aware of the need to follow future developments in statistical phylogeography.
