## F-statistics

Perhaps the most common method for quantifying the genetic differentiation between populations is based on F-statistics, which were developed by Wright (1951). F-statistics use inbreeding coefficients to describe the partitioning of genetic variation within and among populations and can be calculated at three different levels. Note that in describing these levels we will follow the convention set by the literature underlying the theory of F-statistics, and refer to spatially discrete samples as subpopulations instead of populations; elsewhere we will revert to the more common practice of referring to discrete breeding units as populations. The first F-statistic, FIS, measures the degree of inbreeding within individuals relative to the rest of their subpopulation. This reflects the probability that two alleles within the same individual are identical by descent, and is the same as the inbreeding coefficient F that was introduced in Chapter 3. It is calculated as:

$$
F_{IS} = \frac{H_S - H_I}{H_S} \tag{4.4}
$$

where HI is the observed heterozygosity in a subpopulation at the time of investigation (individual heterozygosity) and HS is the heterozygosity that would be expected if the subpopulation were in HWE (subpopulation heterozygosity).

The second F-statistic is FST (also known as the fixation index), which provides an estimate of the genetic differentiation between subpopulations. It is a measure of the degree of inbreeding within a subpopulation relative to the total population (total population here meaning all of the subpopulations combined), and reflects the probability that two alleles drawn at random from within a subpopulation are identical by descent. It is calculated as:

$$
F_{ST} = \frac{H_T - H_S}{H_T} \tag{4.5}
$$

where HS is the same as in Equation 4.4 and HT is the expected heterozygosity of the total population. The third F-statistic, FIT, is used much less frequently than the other two. It provides an overall inbreeding coefficient for an individual by measuring the heterozygosity of an individual relative to the total population. FIT is therefore influenced by both non-random mating within a subpopulation (FIS) and population subdivision (FST), and is calculated as:

$$
F_{IT} = \frac{H_T - H_I}{H_T}
$$

where HI and HT are as defined in Equations 4.4 and 4.5, respectively. The relationship between the three statistics is:

$$
1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})
$$

Since FST measures the extent to which populations have differentiated from one another, this is the F-statistic with which we are most concerned in this chapter (see Box 4.2).
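The three definitions translate directly into code. The short Python sketch below is illustrative only: the function name `f_statistics` and the input heterozygosities are hypothetical, chosen to make the arithmetic easy to follow. It also checks the identity (1 − FIT) = (1 − FIS)(1 − FST), which holds because HI/HT = (HI/HS)(HS/HT).

```python
def f_statistics(h_i, h_s, h_t):
    """Return Wright's FIS, FST and FIT from three heterozygosities.

    h_i: observed heterozygosity within subpopulations (HI)
    h_s: heterozygosity expected under HWE within subpopulations (HS)
    h_t: expected heterozygosity of the total population (HT)
    """
    if h_s <= 0 or h_t <= 0:
        raise ValueError("HS and HT must be positive")
    f_is = (h_s - h_i) / h_s   # inbreeding within subpopulations
    f_st = (h_t - h_s) / h_t   # differentiation among subpopulations
    f_it = (h_t - h_i) / h_t   # overall inbreeding
    return {"FIS": f_is, "FST": f_st, "FIT": f_it}


# Hypothetical heterozygosities chosen for easy arithmetic
stats = f_statistics(h_i=0.30, h_s=0.40, h_t=0.50)
print({k: round(v, 3) for k, v in stats.items()})
# {'FIS': 0.25, 'FST': 0.2, 'FIT': 0.4}

# The identity (1 - FIT) = (1 - FIS)(1 - FST) holds up to floating-point error
lhs = 1 - stats["FIT"]
rhs = (1 - stats["FIS"]) * (1 - stats["FST"])
print(abs(lhs - rhs) < 1e-12)  # True
```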

Table 4.2 Frequencies of alternative wing genotypes in two hypothetical subpopulations of the scarlet tiger moth

| Subpopulation | No. of AA | No. of Aa | No. of aa | Frequency of AA | Frequency of Aa | Frequency of aa |
|---|---|---|---|---|---|---|
| 1 | 352 | 63 | 12 | 0.824 | 0.148 | 0.028 |
| 2 | 312 | 77 | 27 | 0.750 | 0.185 | 0.065 |

### Box 4.2 Calculating F-statistics

We will use some hypothetical data from two scarlet tiger moth subpopulations to calculate F-statistics. Recall from Chapter 3 that moths with white spotting were assigned the genotype AA, moths with little spotting were aa and moths with intermediate spotting were the heterozygotes, Aa. The numbers and frequencies of each genotype in the two subpopulations are given in Table 4.2, and from these data we need to calculate HI, HS and HT.

First we calculate HI, which is the average observed heterozygosity across subpopulations:

$$
H_I = \frac{0.148 + 0.185}{2} \approx 0.167
$$

Next we calculate the subpopulation heterozygosity, HS, which is the heterozygosity that would be expected if the subpopulations were in HWE. We know that under HWE heterozygosity is equal to 2pq, so we first need to calculate the allele frequencies p and q. Subpopulation 1 contains 427 moths and therefore 854 alleles, so:

$$
p = \frac{2(352) + 63}{854} = 0.898 \qquad q = \frac{2(12) + 63}{854} = 0.102
$$

In subpopulation 2 there are 416 moths and therefore 832 alleles, so:

$$
p = \frac{2(312) + 77}{832} = 0.843 \qquad q = \frac{2(27) + 77}{832} = 0.157
$$
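The gene-counting step can be sketched in Python as follows; the function name `allele_frequencies` is a hypothetical choice for illustration. Each AA homozygote contributes two A alleles, each heterozygote one A and one a, and the total number of alleles is twice the number of individuals.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q), the frequencies of alleles A and a, by gene counting."""
    n_alleles = 2 * (n_AA + n_Aa + n_aa)   # two alleles per diploid individual
    p = (2 * n_AA + n_Aa) / n_alleles      # A alleles: 2 per AA, 1 per Aa
    q = (2 * n_aa + n_Aa) / n_alleles      # a alleles: 2 per aa, 1 per Aa
    return p, q


# Genotype counts from Table 4.2
p1, q1 = allele_frequencies(352, 63, 12)   # subpopulation 1
p2, q2 = allele_frequencies(312, 77, 27)   # subpopulation 2
print(f"{p1:.3f} {q1:.3f}")  # 0.898 0.102
print(f"{p2:.3f} {q2:.3f}")  # 0.843 0.157
```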

The expected heterozygosity is therefore:

$$
2pq = 2(0.898)(0.102) = 0.183 \text{ in subpopulation 1}
$$

and

$$
2pq = 2(0.843)(0.157) = 0.265 \text{ in subpopulation 2.}
$$

This means that:

$$
H_S = \frac{0.183 + 0.265}{2} = 0.224
$$

Next, we calculate HT by using the average allele frequencies from the two subpopulations to calculate the expected heterozygosity if the total population was in HWE. In this case:

$$
p = \frac{0.898 + 0.843}{2} = 0.871 \qquad q = \frac{0.102 + 0.157}{2} = 0.130
$$

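Therefore:

$$
H_T = 2(0.871)(0.130) = 0.226
$$

We can now calculate inbreeding within subpopulations as:

$$
F_{IS} = \frac{H_S - H_I}{H_S} = \frac{0.224 - 0.167}{0.224} = 0.254
$$

and inbreeding due to differentiation among subpopulations as:

$$
F_{ST} = \frac{H_T - H_S}{H_T} = \frac{0.226 - 0.224}{0.226} = 0.009
$$

with overall inbreeding (both within subpopulations and due to differentiation among them) as:

$$
F_{IT} = \frac{H_T - H_I}{H_T} = \frac{0.226 - 0.167}{0.226} = 0.261
$$
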
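The whole of Box 4.2 can be reproduced from the raw counts in Table 4.2 with a short Python sketch like the one below (the function name `heterozygosities` is hypothetical). Unlike the hand calculation, it keeps full precision instead of rounding each intermediate value to three decimal places; note, for instance, that the rounded average allele frequencies 0.871 and 0.130 sum to 1.001, which slightly inflates HT. Because FST is the small difference between two similar heterozygosities, it is the statistic most sensitive to this rounding.

```python
def heterozygosities(counts):
    """Return (HI, HS, HT) for biallelic genotype counts, following Box 4.2.

    counts: list of (n_AA, n_Aa, n_aa) tuples, one per subpopulation.
    All averages are unweighted means across subpopulations, as in the box.
    """
    obs, exp, p_list = [], [], []
    for n_AA, n_Aa, n_aa in counts:
        n = n_AA + n_Aa + n_aa
        p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
        q = 1 - p                         # frequency of allele a
        obs.append(n_Aa / n)              # observed heterozygosity
        exp.append(2 * p * q)             # heterozygosity expected under HWE
        p_list.append(p)
    k = len(counts)
    h_i = sum(obs) / k
    h_s = sum(exp) / k
    p_bar = sum(p_list) / k               # average allele frequency
    h_t = 2 * p_bar * (1 - p_bar)         # expected heterozygosity, total population
    return h_i, h_s, h_t


counts = [(352, 63, 12), (312, 77, 27)]   # Table 4.2
h_i, h_s, h_t = heterozygosities(counts)
f_is = (h_s - h_i) / h_s
f_st = (h_t - h_s) / h_t
f_it = (h_t - h_i) / h_t
print(f"FIS={f_is:.3f} FST={f_st:.3f} FIT={f_it:.3f}")
# FIS=0.258 FST=0.007 FIT=0.263
```

At full precision FIS and FIT move only in the third decimal place relative to the hand calculation, while FST falls to roughly 0.007, which is still very small.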
### Interpreting FST

If two populations have identical allele frequencies they will not be genetically differentiated and therefore FST will be zero. At the other extreme, if two populations are fixed for different alleles then FST will be equal to one (see Table 4.3 for some examples of FST values and Box 4.3 for analogues of FST). Within that range, FST values of 0–0.05 are generally considered to indicate little genetic differentiation, values of 0.05–0.25 moderate genetic differentiation, and values >0.25 pronounced genetic differentiation. However, these are only approximate guidelines, and in reality even very low levels of FST may represent important levels of genetic differentiation. In the European eel example referred to at the beginning of this section, the global FST value was extremely low at 0.0017 but was nevertheless significant (P = 0.0014). The significance of FST estimates is based on a permutation procedure that shuffles genotypes among populations thousands of times and calculates an FST value for each permutation. The P value of the test is the proportion of permutations that yield an FST value equal to or larger than that calculated from the actual data set.
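A permutation test of this kind can be sketched in Python as below. This is an illustrative, single-locus version under simplifying assumptions: the locus is biallelic, each individual is coded as its number of A alleles (0, 1 or 2), FST is the simple (HT − HS)/HT ratio used in this chapter rather than a sample-size-corrected estimator, and the names `fst_biallelic` and `permutation_test` are hypothetical.

```python
import random


def fst_biallelic(pops):
    """FST = (HT - HS) / HT at one biallelic locus.

    pops: list of subpopulations, each a list of per-individual counts of
    allele A (0, 1 or 2). HS and HT are 2pq expected heterozygosities,
    averaged without weighting, as in Box 4.2.
    """
    p_list = [sum(pop) / (2 * len(pop)) for pop in pops]
    h_s = sum(2 * p * (1 - p) for p in p_list) / len(p_list)
    p_bar = sum(p_list) / len(p_list)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_test(pops, n_perm=1000, seed=1):
    """Return (observed FST, P) by shuffling individuals among subpopulations."""
    rng = random.Random(seed)              # seeded so results are repeatable
    observed = fst_biallelic(pops)
    sizes = [len(pop) for pop in pops]
    pooled = [g for pop in pops for g in pop]
    n_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                # random reassignment of genotypes
        perm_pops, start = [], 0
        for size in sizes:                 # keep the original sample sizes
            perm_pops.append(pooled[start:start + size])
            start += size
        if fst_biallelic(perm_pops) >= observed:
            n_as_large += 1
    # P = proportion of permutations with FST >= the observed value
    return observed, n_as_large / n_perm


# Usage: expand the Table 4.2 counts into individual genotypes
pop1 = [2] * 352 + [1] * 63 + [0] * 12
pop2 = [2] * 312 + [1] * 77 + [0] * 27
fst_obs, p_value = permutation_test([pop1, pop2])
print(f"FST = {fst_obs:.4f}, P = {p_value:.3f}")
```

Some implementations instead report (count + 1)/(n_perm + 1), treating the observed arrangement as one of the permutations so that P is never exactly zero.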

Table 4.3 Examples of FST values (or analogues) reported for a range of species

| Species | Distance (km)ᵃ | Molecular marker | FST (or analogue)ᵇ | Reference |
|---|---|---|---|---|
| Stonefly (Peltoperla tarteri) | 3.5 | Mitochondrial sequence | 0.004–0.21 | Schultheis, Weigt and Hendricks (2002) |
| Nematode (Heterodera schachtii) | 175 | Microsatellites | 0–0.107 | Plantard and Porte (2004) |
| Collembola (Orchesella cincta) | 10 | AFLPs | 0–0.09 | van der Wurff et al. (2003) |
| Red seaweed (Gracilaria gracilis) | 5 | Microsatellites | 0–0.031 | Engel, Destombe and Valero (2004) |
| Canada thistle (Cirsium arvense) | 5 | AFLPs | 0.63 | Sole et al. (2004) |
| Common frog (Rana temporaria) | 1600 | Microsatellites | 0.24 | Palo et al. (2003) |
| Plunkett Mallee tree (Eucalyptus curtisii) | 500 | Microsatellites | 0.22 | Smith et al. (2003) |
| Island fox (Urocyon littoralis) | 13 | Microsatellites | 0.11 | Roemer et al. (2001) |

ᵃ Maximum distance between pairs of populations.

ᵇ Single values represent the FST values averaged across all populations. See Box 4.3 for analogues of FST.

### Box 4.3 Analogues of FST

FST, which was developed by Wright (1951), was the original method for estimating population differentiation from allele frequencies. Since then several variations have been developed, each measuring population differentiation in a slightly different way. One of these is GST, developed by Nei (1973), which is equivalent to FST when there are only two alleles at a locus. In the case of multiple alleles, GST is equivalent to the weighted average of FST for all alleles. A similar measure is θ (Weir and Cockerham, 1984), which is often preferred because it takes into account the effects of uneven sample sizes and the number of sampled populations. More recently, RST was developed by Slatkin (1995) specifically for the analysis of microsatellite data. RST differs from the other measures because it assumes a stepwise mutation model (SMM; Chapter 2), which may be more realistic for microsatellite data than the infinite alleles model (IAM).
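For a locus with more than two alleles, Nei's GST can be computed from gene diversities, H = 1 − Σpᵢ², where pᵢ is the frequency of the i-th allele. The Python sketch below (the function name `nei_gst` and the example frequencies are hypothetical) takes HS as the unweighted mean gene diversity within subpopulations and HT as the gene diversity of the allele frequencies averaged across subpopulations. With two alleles, 1 − p² − q² = 2pq, so the result matches the biallelic calculation in Box 4.2. It does not implement the sample-size corrections of θ or the stepwise-mutation basis of RST.

```python
def nei_gst(freqs):
    """Nei's GST for one locus with any number of alleles.

    freqs: one list per subpopulation giving each allele's frequency,
    with alleles in the same order in every list (each list sums to 1).
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    # HS: mean within-subpopulation gene diversity, 1 - sum(p_i^2)
    h_s = sum(1 - sum(p * p for p in pop) for pop in freqs) / k
    # HT: gene diversity of the averaged allele frequencies
    p_bar = [sum(pop[j] for pop in freqs) / k for j in range(n_alleles)]
    h_t = 1 - sum(p * p for p in p_bar)
    if h_t == 0:
        return 0.0  # every subpopulation fixed for the same allele
    return (h_t - h_s) / h_t


# Two alleles: same as (HT - HS)/HT with 2pq heterozygosities
print(nei_gst([[0.9, 0.1], [0.7, 0.3]]))            # approximately 0.0625
# Three alleles, hypothetical frequencies
print(nei_gst([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]))  # approximately 0.2
```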

It is beyond the scope of this text to provide a comprehensive discussion of the many different measures of population differentiation. The theoretical literature surrounding these measures is copious and often opaque; interested readers are referred to the recommended reading at the end of this chapter. Nevertheless, even if few of us can understand all of the mathematics that underlie estimates of genetic differentiation, it is important to understand that these estimates often vary depending on which method is used. This may be particularly important if we are comparing the results of multiple studies. For example, we may be interested in whether populations of perennial plants show greater differentiation than populations of annual plants, but such a comparison will be valid only if it is based on estimates of genetic differentiation that were calculated in the same way. This was illustrated by a study that compared 227 measurements of GST that had been calculated using both the method of Nei (1973) and the method of Hamrick and Godt (1990). The main difference between the two methods is that in the former GST is calculated from the mean values of HT and HS after they have been averaged across all loci, whereas in the latter GST values are calculated separately for each locus and then averaged. A comparison of the two methods revealed that, although results were often comparable, GST values in 15 per cent of all studies differed by more than 0.10 (Culley et al., 2002; Figure 4.1).

Figure 4.1 Range of differences between GST values calculated using both Nei's (1973) and Hamrick and Godt's (1990) methods (H/G), based on 227 comparisons (x-axis: difference between Nei and H/G calculations). Data from Culley et al. (2002)
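The effect of the two averaging schemes can be seen in a small Python sketch. The per-locus HT and HS values are hypothetical, chosen to exaggerate the effect, and the function names are our own. The first function averages HT and HS across loci before forming a single GST, as described for Nei (1973); the second computes GST for each locus and then averages, as described for Hamrick and Godt (1990).

```python
def gst_ratio_of_means(loci):
    """Average HT and HS across loci first, then compute one GST."""
    mean_h_t = sum(h_t for h_t, _ in loci) / len(loci)
    mean_h_s = sum(h_s for _, h_s in loci) / len(loci)
    return (mean_h_t - mean_h_s) / mean_h_t


def gst_mean_of_ratios(loci):
    """Compute GST separately for each locus, then average the values."""
    return sum((h_t - h_s) / h_t for h_t, h_s in loci) / len(loci)


# Hypothetical (HT, HS) pairs: one highly variable locus, one low-diversity locus
loci = [(0.80, 0.76), (0.10, 0.05)]
print(round(gst_ratio_of_means(loci), 3))   # 0.1
print(round(gst_mean_of_ratios(loci), 3))   # 0.275
```

Here the low-diversity second locus has a large per-locus GST (0.5) but contributes little to the averaged heterozygosities, so the mean of ratios is much higher than the ratio of means. Real data sets are rarely this extreme, but the example shows how the two methods can disagree.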

It can be extremely difficult to predict the level of differentiation to expect between any pair of populations, because population divergence varies remarkably both within and among species. FST values are affected by many aspects of a species' ecology and demographic history. Later in this chapter we shall look at how genetic drift and natural selection influence population differentiation, but we shall start with a discussion of gene flow, a process that is of fundamental importance because without gene flow a group of populations could not be maintained as a cohesive species.