## Estimates of genetic diversity

Now that we have a better understanding of allele and genotype frequencies, we will look at some ways to quantify genetic diversity within populations. One of the simplest estimates is allelic diversity (often designated A), which is the average number of alleles per locus. In a population that has four alleles at one locus and six alleles at another locus, A = (4 + 6)/2 = 5. Although straightforward, this method is very sensitive to sample size, meaning that the number of alleles identified will depend in part on how many individuals are screened. A second measure of genetic diversity is the proportion of polymorphic loci (often designated P). If a population is screened at ten loci and six of these are variable, then P = 6/10 = 0.60. This can be of some utility in studies based on relatively invariant loci such as allozymes, although it is also sensitive to sample size. Furthermore, it is often a completely uninformative measure of genetic diversity in studies based on variable markers such as microsatellites, which tend to be chosen for analysis only if they are polymorphic and therefore will often have P values of 1.00 in all populations. A third measure of genetic diversity that is also influenced by the number of individuals sampled is observed heterozygosity (Ho), which is obtained by dividing the number of heterozygotes at a particular locus by the total number of individuals sampled. The observed heterozygosity of the scarlet tiger moth based on the data in Table 3.1 is 138/1612 ≈ 0.086.
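The three estimates above are simple ratios, and a minimal Python sketch can make the arithmetic explicit. The function names are hypothetical and chosen only for illustration; the input numbers are the ones quoted in the paragraph above.

```python
def allelic_diversity(alleles_per_locus):
    """A: mean number of alleles per locus."""
    return sum(alleles_per_locus) / len(alleles_per_locus)


def proportion_polymorphic(n_polymorphic, n_loci):
    """P: fraction of screened loci that are variable."""
    return n_polymorphic / n_loci


def observed_heterozygosity(n_heterozygotes, n_individuals):
    """Ho: heterozygotes at a locus divided by individuals sampled."""
    return n_heterozygotes / n_individuals


# Values taken from the worked examples in the text
A = allelic_diversity([4, 6])             # two loci with 4 and 6 alleles
P = proportion_polymorphic(6, 10)         # 6 of 10 loci are variable
Ho = observed_heterozygosity(138, 1612)   # scarlet tiger moth counts

print(A, P, round(Ho, 4))
```

All three values depend directly on which individuals and loci happen to be sampled, which is why the text treats them as sensitive to sample size.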

Although one or more of the estimates outlined in the preceding paragraph are often included in studies of genetic diversity, they are generally supplemented with an alternative measure known as gene diversity (h; Nei, 1973). The advantage of gene diversity is that it is much less sensitive than the other methods to sampling effects. Gene diversity is calculated as:

$$h = 1 - \sum_{i=1}^{m} x_i^2 \tag{3.6}$$

where $x_i$ is the frequency of allele i, and m is the number of alleles that have been found at that locus. Note that the only data required for calculating gene diversity are the allele frequencies within a population. For any given locus, h represents the probability that two alleles randomly chosen from the population will be different from one another. In a randomly mating population, h is equivalent to the expected heterozygosity (He), and represents the frequency of heterozygotes that would be expected if a population is in HWE; for this reason, h is often presented as He. Most calculations of He will be based on multiple loci, in which case He is calculated for each locus and then averaged over all loci to present a single estimate of diversity for each population (see Box 3.3).
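A short derivation may help show why h coincides with the heterozygosity expected under HWE. It assumes random mating, so that the two alleles in a genotype are drawn independently; the symbols p and q for the two-allele case are introduced here only for illustration. The expected frequency of homozygotes for allele i is then $x_i^2$, and everything that is not a homozygote is a heterozygote:

$$
\begin{aligned}
\Pr(\text{homozygote}) &= \sum_{i=1}^{m} x_i^2 \\
H_e = \Pr(\text{heterozygote}) &= 1 - \sum_{i=1}^{m} x_i^2 = h
\end{aligned}
$$

For a locus with two alleles at frequencies p and q, where p + q = 1, this reduces to the familiar Hardy-Weinberg heterozygote frequency:

$$1 - p^2 - q^2 = (p + q)^2 - p^2 - q^2 = 2pq$$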

### Box 3.3 Calculating He

In the following example, we will use Equation 3.6 to calculate He from some data that were generated by a study of the southern house mosquito (Culex quinquefasciatus) in the Hawaiian Islands (Fonseca, LaPointe and Fleischer, 2000). This is an introduced species that has caused considerable devastation on the Hawaiian archipelago because it is the vector for avian malaria. Table 3.2 shows the allele frequencies at one locus calculated from two populations.

Following Equation 3.6 and using the data from Table 3.2, He from the Midway population can be calculated as:

$$H_e = 1 - (0.250^2 + 0.200^2 + 0.550^2) = 1 - (0.0625 + 0.04 + 0.3025) = 0.595$$

Similarly, He from the Kauai population can be calculated as:

$$H_e = 1 - (0.022^2 + 0.333^2 + 0.333^2 + 0.311^2) = 1 - (0.000484 + 0.111 + 0.111 + 0.0967) = 0.68$$

In this case, He is higher in Kauai than Midway, which is not surprising since the former population has a greater number of alleles and also a more even distribution of allele frequencies than the latter.

Table 3.2 Allele frequency data for one microsatellite locus characterized in two Hawaiian populations of C. quinquefasciatus. Data are from Fonseca, LaPointe and Fleischer (2000).

| Microsatellite allele (bp) | Allele frequency, Midway population | Allele frequency, Kauai population |
|---|---|---|
| 212 | 0 | 0.022 |
| 216 | 0.250 | 0.333 |
| 218 | 0.200 | 0.333 |
| 224 | 0.550 | 0.311 |
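The two calculations in this box can be reproduced with a few lines of Python. This is a minimal sketch: the function name `gene_diversity` is hypothetical, and the frequencies are transcribed from Table 3.2 in the order 212, 216, 218 and 224 bp.

```python
def gene_diversity(allele_freqs):
    """One minus the sum of squared allele frequencies at a single locus."""
    return 1.0 - sum(x * x for x in allele_freqs)


# Allele frequencies from Table 3.2 (alleles 212, 216, 218, 224 bp)
midway = [0.0, 0.250, 0.200, 0.550]
kauai = [0.022, 0.333, 0.333, 0.311]

he_midway = gene_diversity(midway)  # approximately 0.595
he_kauai = gene_diversity(kauai)    # approximately 0.681

print(round(he_midway, 3), round(he_kauai, 3))
```

An allele with frequency 0 contributes nothing to the sum, so including allele 212 in the Midway list does not change the result. For a multi-locus data set, the same function would be applied to each locus and the results averaged.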

Research papers typically report several different calculations of a population's genetic diversity, and these often include both Ho and He. By comparing these two values, we can determine whether or not the heterozygosity within a population is significantly different from that expected under HWE. If Ho is lower than He, we may first have to rule out the possibility of null alleles. Although potentially applicable to a range of markers, this term is used most commonly to describe microsatellite alleles that do not amplify during PCR. The most common cause of this is a mutation in one or both of the primer-binding sequences. If only one allele from a heterozygote is amplified, then the individual will be genotyped erroneously as a homozygote. When Ho is significantly less than He we should also be open to the possibility of a Wahlund effect, which, as noted earlier, will decrease Ho. If neither null alleles nor a Wahlund effect has caused an observed heterozygosity deficit, then we may conclude that the population is not in HWE. As noted earlier, this deviation could result from one or more of a number of factors, including non-random mating (e.g. inbreeding), natural selection or a small population size.

It can be difficult to determine just what is responsible for disparities between Ho and He. In one study, estimates of He and Ho were obtained for twelve European populations of the common ash (Fraxinus excelsior) based on microsatellite data from five loci. Deviations from HWE were apparent in ten of these populations, which is an unusual finding in forest tree populations (Morand et al., 2002). These deviations were caused by Ho deficits at all five loci (Table 3.3), a consistent result that was unlikely to be attributable to natural selection acting on all five putatively neutral loci. Inbreeding also seemed unlikely in this wind-pollinated species, because long-distance dispersal of pollen should minimize mating between relatives. A comparison of microsatellite genotypes between parents and offspring suggested that null alleles were unlikely to be the cause but, because no plausible explanation for the observed heterozygote deficit has been found, the authors could not conclusively rule out either null alleles or a possible Wahlund effect.

Table 3.3

| Population | Measure | Locus 1 | Locus 2 | Locus 3 | Locus 4 | Locus 5 |
|---|---|---|---|---|---|---|
| 1 | No. of alleles | 12 | 13 | 12 | 9 | 16 |
| 1 | He | 0.938 | 0.888 | 0.905 | 0.833 | 0.937 |
| 1 | Ho | 0.385 | 0.895 | 0.571 | 0.750 | 0.737 |
| 2 | No. of alleles | 12 | 12 | 12 | 11 | 9 |
| 2 | He | 0.938 | 0.825 | 0.936 | 0.892 | 0.917 |
| 2 | Ho | 0.462 | 0.647 | 0.333 | 0.526 | 0.500 |
| 3 | No. of alleles | 16 | 12 | 13 | 12 | 12 |
| 3 | He | 0.932 | 0.905 | 0.859 | 0.862 | 0.918 |
| 3 | Ho | 0.667 | 0.875 | 0.750 | 0.556 | 0.882 |
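The comparison of Ho and He in Table 3.3 can be summarized per population with a short Python sketch. The helper names are hypothetical, and the summary is purely descriptive: it averages each measure over loci and counts the loci at which Ho falls below He, but it is not a test of whether a deviation from HWE is statistically significant.

```python
# He and Ho per locus (Locus 1 to Locus 5), transcribed from Table 3.3
table_3_3 = {
    "Population 1": {"He": [0.938, 0.888, 0.905, 0.833, 0.937],
                     "Ho": [0.385, 0.895, 0.571, 0.750, 0.737]},
    "Population 2": {"He": [0.938, 0.825, 0.936, 0.892, 0.917],
                     "Ho": [0.462, 0.647, 0.333, 0.526, 0.500]},
    "Population 3": {"He": [0.932, 0.905, 0.859, 0.862, 0.918],
                     "Ho": [0.667, 0.875, 0.750, 0.556, 0.882]},
}


def mean(values):
    return sum(values) / len(values)


def summarize(he, ho):
    """Average He and Ho over loci and count loci where Ho < He."""
    deficits = [e - o for e, o in zip(he, ho)]
    return {
        "mean_He": mean(he),
        "mean_Ho": mean(ho),
        "loci_with_deficit": sum(d > 0 for d in deficits),
    }


summaries = {pop: summarize(d["He"], d["Ho"]) for pop, d in table_3_3.items()}

for pop, s in summaries.items():
    print(f"{pop}: mean He = {s['mean_He']:.3f}, "
          f"mean Ho = {s['mean_Ho']:.3f}, "
          f"Ho < He at {s['loci_with_deficit']} of 5 loci")
```

In the rows shown here, mean Ho is well below mean He in every population, although Ho at Locus 2 in Population 1 is marginally above He.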

### Haploid diversity

Gene diversity (h) can also be calculated for haploid data. Estimates of genetic diversity based on mitochondrial data, for example, often use h as a measure of haplotype diversity. In this context, h describes the numbers and frequencies of different mitochondrial haplotypes and is essentially the heterozygosity equivalent for haploid loci. However, the haplotype diversity of relatively rapidly evolving genomes such as animal mtDNA will often approach 1.0 within a population if a high proportion of individuals have unique haplotypes. It can be more informative, therefore, to consider the number of nucleotide differences between any two sequences as opposed to simply determining whether or not they are different. This can be done by calculating nucleotide diversity (π; Nei, 1987), which quantifies the mean divergence between sequences. Nucleotide diversity is calculated as:

$$\pi = \sum_{i,j} f_i \, f_j \, p_{ij}$$

where $f_i$ and $f_j$ represent the frequencies of the ith and jth haplotypes in the population, and $p_{ij}$ represents the sequence divergence between these haplotypes. By factoring in both the frequencies and the pairwise divergences of the different sequences, π gives the probability that two randomly chosen homologous nucleotides will be different.
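The contrast between haplotype diversity and nucleotide diversity can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the three aligned sequences and their frequencies are invented, sequence divergence is taken as the simple proportion of differing sites, and the sum runs over all ordered pairs of haplotypes with a divergence of zero between a haplotype and itself. No small-sample correction is applied.

```python
def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nucleotide_diversity(haplotype_freqs):
    """Sum of f_i * f_j * p_ij over all ordered pairs of haplotypes."""
    haps = list(haplotype_freqs.items())
    return sum(fi * fj * p_distance(si, sj)
               for si, fi in haps
               for sj, fj in haps)


def haplotype_diversity(haplotype_freqs):
    """Gene diversity applied to haplotype frequencies."""
    return 1.0 - sum(f * f for f in haplotype_freqs.values())


# Hypothetical aligned haplotypes (10 sites) and their population frequencies
haplotypes = {
    "ACGTACGTAC": 0.5,
    "ACGTACGTAT": 0.3,
    "ACGAACGTAT": 0.2,
}

h = haplotype_diversity(haplotypes)    # about 0.62
pi = nucleotide_diversity(haplotypes)  # about 0.082

print(round(h, 3), round(pi, 3))
```

In this invented example most pairs of sequences are different haplotypes, yet they differ at only one or two sites, so h is fairly high while π remains low.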