An Unlimited Source of Data

Even very small organisms have extremely complex genomes. The unicellular yeast Saccharomyces cerevisiae, despite being so small that around four billion of them can fit in a teaspoon, has a genome size of around 12 megabases (Mb; 1 Mb = 1 million base pairs) (Goffeau et al., 1996). The genome of the considerably larger nematode worm Caenorhabditis elegans, which is 1 mm long, is approximately 97 Mb (Caenorhabditis elegans Sequencing Consortium, 1998), and that of the flowering plant Arabidopsis thaliana is around 157 Mb (Arabidopsis Genome Initiative, 2000). The relatively enormous mouse Mus musculus contains somewhere in the region of 2600 Mb (Waterston et al., 2002), which is not too far off the human genome size of around 3200 Mb (International Human Genome Mapping Consortium, 2001). Within each genome there is a tremendous diversity of DNA. This diversity is partly attributable to the incredible range of functional products that are encoded by different genes. Furthermore, not all DNA codes for a functional product: the International Human Genome Sequencing Consortium has suggested that the human genome contains only around 20 000-25 000 genes, not many more than the ~19 500 found in the far smaller C. elegans genome (International Human Genome Sequencing Consortium, 2004). Non-coding DNA includes introns (intervening sequences) and pseudogenes (sequences derived from functional genes that have since acquired mutations preventing them from producing a functional product).
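The point that gene number does not keep pace with genome size can be checked with simple arithmetic on the figures quoted above. The following minimal Python sketch divides gene count by genome size to give an average gene density; the human value uses the midpoint (22 500) of the quoted 20 000-25 000 range, which is an assumption made purely for illustration, and the function names are hypothetical.

```python
# Genome sizes (Mb) and approximate gene counts quoted in the text.
# The human gene count is given as a range (20 000-25 000); the midpoint
# used here is an illustrative assumption, not a published figure.
GENOMES = {
    "Homo sapiens": {"size_mb": 3200, "genes": 22_500},
    "Caenorhabditis elegans": {"size_mb": 97, "genes": 19_500},
}


def gene_density(size_mb: float, genes: int) -> float:
    """Return the average number of genes per megabase of genome."""
    return genes / size_mb


def density_table(genomes: dict) -> dict:
    """Map each species name to its average gene density (genes per Mb)."""
    return {
        name: gene_density(data["size_mb"], data["genes"])
        for name, data in genomes.items()
    }


if __name__ == "__main__":
    densities = density_table(GENOMES)
    for name, density in densities.items():
        print(f"{name}: {density:.1f} genes per Mb")
    # How many times denser the worm genome is in genes than the human one.
    ratio = densities["Caenorhabditis elegans"] / densities["Homo sapiens"]
    print(f"Gene density ratio (C. elegans / human): {ratio:.1f}")
```

On these figures the worm genome carries roughly 200 genes per Mb against about 7 in humans, which is consistent with the text's point that most of the much larger human genome does not code for a functional product.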

Many stretches of nucleotide sequence are repeated anywhere from several times to several million times throughout the genome. Short, highly repetitive sequences include minisatellites (motifs of 10-100 bp repeated many times in succession) and microsatellites (repeated motifs of 1-6 bp). Another class of repetitive DNA that has occasionally been used in molecular ecology is middle-repetitive DNA. These are sequences of hundreds or thousands of base pairs that occur anywhere from dozens to hundreds of times in the genome. One example is the composite region that codes for nuclear ribosomal DNA (Figure 1.4). In contrast, single-copy nuclear DNA (scnDNA) occurs only once in a genome, and it is within scnDNA that most transcribed genes are located. The proportion of scnDNA varies greatly between species: for example, it comprises approximately 95 per cent of the genome in the midge Chironomus tentans but only 12 per cent of the genome in the mudpuppy salamander Necturus maculosus (John and Miklos, 1988).

Figure 1.4 Diagram showing the arrangement of the nuclear ribosomal DNA gene family as it occurs in animals. The regions coding for the 5.8S, 18S and 28S subunits of rRNA are shown by bars; NTS, non-transcribed spacer; ETS, external transcribed spacer; ITS, internal transcribed spacer. The entire array is repeated many times.
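The distinction between microsatellites and minisatellites described above rests on the length of the repeated motif, which makes it easy to express in code. The Python sketch below is illustrative only: it classifies a motif length using the ranges given in the text and performs a simple greedy scan for perfect tandem repeats of 1-6 bp motifs. The names `classify_motif` and `find_microsatellites`, and the threshold of five consecutive copies, are hypothetical choices rather than a standard; dedicated microsatellite-detection software typically uses more elaborate criteria, such as tolerating imperfect repeats.

```python
from typing import List, NamedTuple


class TandemRepeat(NamedTuple):
    start: int   # 0-based position where the repeat begins
    motif: str   # the repeated unit
    copies: int  # number of consecutive, perfect copies of the motif


def classify_motif(motif_length: int) -> str:
    """Classify a repeat by motif length, using the ranges quoted in the text."""
    if 1 <= motif_length <= 6:
        return "microsatellite"
    if 10 <= motif_length <= 100:
        return "minisatellite"
    return "unclassified"


def count_copies(seq: str, start: int, k: int) -> int:
    """Count consecutive copies of seq[start:start + k] beginning at start."""
    if start + k > len(seq):
        return 0  # not enough sequence left for a full motif
    motif = seq[start:start + k]
    copies, pos = 0, start
    while seq[pos:pos + k] == motif:
        copies += 1
        pos += k
    return copies


def find_microsatellites(seq: str, min_copies: int = 5,
                         max_motif: int = 6) -> List[TandemRepeat]:
    """Greedy left-to-right scan for perfect tandem repeats of short motifs.

    min_copies=5 is an arbitrary, hypothetical threshold for illustration.
    """
    seq = seq.upper()
    repeats: List[TandemRepeat] = []
    i = 0
    while i < len(seq):
        best = None  # (span in bp, motif length, copies)
        for k in range(1, max_motif + 1):
            copies = count_copies(seq, i, k)
            if copies >= min_copies:
                span = copies * k
                # Keep the longest span; because k increases, ties keep the
                # shorter motif (so "AAAAAA" is reported as (A)6, not (AA)3).
                if best is None or span > best[0]:
                    best = (span, k, copies)
        if best is None:
            i += 1
        else:
            span, k, copies = best
            repeats.append(TandemRepeat(i, seq[i:i + k], copies))
            i += span  # skip past the repeat just recorded
    return repeats


if __name__ == "__main__":
    example = "GG" + "CA" * 6 + "T" + "A" * 7 + "GC"
    for rep in find_microsatellites(example):
        print(rep, classify_motif(len(rep.motif)))
```

A greedy scan like this is easy to follow but will miss repeats interrupted by a single substitution, which is one reason real tools use more tolerant search strategies.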

Although the structure and function of genes vary between species, they are typically conserved among members of the same species. This does not, however, mean that all members of the same species are genetically alike. Variations in both coding and non-coding DNA sequences mean that, with the possible exception of clones, no two individuals have exactly the same genome. This is because DNA is altered by events during replication that include recombination, duplication and mutation. It is worth examining in some detail how these occur, because if we remain ignorant about the mechanisms that generate DNA variation then our understanding of genetic diversity will be incomplete.
