Insect comparative genomics

The unrivalled evolutionary diversity of the insect orders can be exploited to unravel the complexities of interactions between insects and their environments, including infectious micro-organisms. These ongoing interactions have allowed the Hexapoda to flourish in a myriad of ecological niches by successfully defending against microbial challenges. Comparative analysis of insect genomes provides an exceptional opportunity to build an extensive knowledge base combining genomic, molecular, and biological information: a powerful approach to elucidating trends and features that shape and distinguish this diverse animal group. These approaches can take advantage of multi-species comparisons to explore the evolutionary processes acting on genes and genomes, and to understand how these processes translate into new functions and phenotypes. At the same time, however, insect comparative genomics faces a special challenge: to develop robust methodologies to handle high levels of sequence divergence.

* Fotis Kafatos dedicates this chapter to Tom Eisner, his mentor and friend who taught him to love all of biology.

Much of the progress in insect genomics has been fuelled by the positive and negative impacts of insects on the environment, agriculture, health, and the economy across the globe. Limiting the damaging effects of insects has traditionally involved their control through the use of pesticides, but with variable and seemingly declining success. Novel approaches to insect control require a detailed understanding of insect biology, to facilitate highly targeted interventions that address specific pests while limiting possible ecological knock-on effects. Elucidating the molecular mechanisms that underpin the key processes of insect innate immunity, and the metabolism of drugs and xenobiotics, is therefore of utmost importance. The recent rapid progress in the genomics of innate immunity in disease-vector insects reflects the great social relevance of the diseases they transmit.

As a prominent model organism subjected to decades of genetic and molecular research, the fruit fly Drosophila melanogaster was the obvious first target for insect genome sequencing. The release of its approximately 120 Mb euchromatic genome in 2000 validated the process of whole-genome shotgun sequencing for eukaryotes, and made the fruit fly the first insect, indeed the second multicellular organism, to have its genome sequenced completely (Adams et al., 2000). Since then, the fruit fly genome has played a pioneering role in genomics research, including the development of analytical techniques for the interpretation of an ever-increasing volume and variety of data. This genome has served as the logical framework upon which to build a comprehensive biological knowledge base (Ashburner and Bergman, 2005). Next, as the prime vector of human malaria in Africa, the Anopheles gambiae mosquito was prioritized for genome sequencing. The availability of its complete genome sequence (Holt et al., 2002), just 2 years after the fruit fly genome, provided the very first opportunity for extensive comparative genomics studies between two insect species (Christophides et al., 2002; Zdobnov et al., 2002).

In subsequent years, multiple species from several insect orders were chosen for genome sequencing because of their agricultural, economic, or health impacts. The genomes of the silk moth, Bombyx mori (Lepidoptera; Xia et al., 2004), the honey bee, Apis mellifera (Hymenoptera; HGSC, 2006), the mosquito vector of arboviruses, Aedes aegypti (Diptera; Nene et al., 2007), and a stored-food pest, the flour beetle, Tribolium castaneum (Coleoptera; TGSC, 2008), have provided an evolutionary perspective of holometabolous insects spanning over 300 million years of divergence. This wealth of data has presented new opportunities for expanding and even revising our understanding of insect biology and evolution, through multidimensional comparative analyses on a genomic scale. Indeed, phylogenomic approaches have placed Hymenoptera at the base of the holometabolous insect radiation (Savard et al., 2006; Zdobnov and Bork, 2007) (Figure 6.1a), whereas morphological and molecular marker analyses had previously failed to provide a confident resolution. Newly sequenced genomes that are currently being analysed include a third mosquito species, Culex pipiens quinquefasciatus, the parasitoid Nasonia wasps, and outgroup species to the holometabolous insects: the human body louse, Pediculus humanus, and the pea aphid, Acyrthosiphon pisum. Genome projects that are under way include additional insects of relevance to global health, such as the tsetse, sand, and house flies, and the Hemipteran bug, Rhodnius prolixus, an important vector of Chagas' disease.

The increasing wealth of available genomic data has facilitated a quantitative approach to describing the incredible diversity of insects in the perspective of metazoan evolution as a whole. Assorted metrics of evolutionary diversification have been utilized, such as changing numbers of gene family members in different species, sequence divergence of orthologous proteins, and the extent of genome shuffling that disrupts ancestral gene arrangements. These metrics consistently reveal strikingly faster rates of genomic evolution among insects compared to vertebrates (Figure 6.1b) (Wyder et al., 2007; Zdobnov and Bork, 2007). Insect genomics, therefore, represents a comprehensive resource that spans tremendous diversity and provides a broad framework to support and orient research in insect biology.
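Two of the metrics named above, orthologue sequence identity and retention of gene neighbourhoods, can be illustrated with a minimal Python sketch. It assumes pre-aligned protein sequences and a single ordered list of orthologue identifiers per genome. The function names and the neighbour-based definition of synteny are illustrative simplifications, not the procedure used in the cited studies.

```python
from typing import Dict, List, Set


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def _neighbours(order: List[str]) -> Dict[str, Set[str]]:
    """Map each gene to the set of genes immediately adjacent to it."""
    result: Dict[str, Set[str]] = {}
    for i, gene in enumerate(order):
        adjacent: Set[str] = set()
        if i > 0:
            adjacent.add(order[i - 1])
        if i < len(order) - 1:
            adjacent.add(order[i + 1])
        result[gene] = adjacent
    return result


def synteny_percent(order_a: List[str], order_b: List[str]) -> float:
    """Percent of shared orthologues that keep at least one neighbour in both genomes.

    Assumption: each list gives orthologue IDs in chromosomal order on a
    single chromosome; real genomes need per-chromosome handling.
    """
    nb_a = _neighbours(order_a)
    nb_b = _neighbours(order_b)
    shared = [gene for gene in nb_a if gene in nb_b]
    if not shared:
        return 0.0
    kept = sum(1 for gene in shared if nb_a[gene] & nb_b[gene])
    return 100.0 * kept / len(shared)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    print(percent_identity("MKT-LV", "MRTALV"))
    print(synteny_percent(
        ["g1", "g2", "g3", "g4", "g5", "g6"],
        ["g1", "g2", "g4", "g6", "g3", "g5"],
    ))
```

Averaging `percent_identity` over all single-copy orthologue pairs and plotting it against `synteny_percent` for each species pair would give a plot in the spirit of Figure 6.1b, under the simplifying assumptions stated above.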

[Figure 6.1. Panel (a): phylogenetic tree with tips including Drosophila melanogaster, D. erecta, D. ananassae, D. pseudoobscura, D. mojavensis, D. virilis, D. grimshawi, Anopheles gambiae, Aedes aegypti, Bombyx mori, and Tribolium castaneum. Panel (b): x-axis, orthologues in synteny (%); labelled comparisons include Amel-Tcas, Bmor-all, Agam-Aaeg, fish-human, Agam-Drosophilids, and Aaeg-Drosophilids.]

Figure 6.1 Insect phylogeny and insect genome divergence. (a) Phylogenetic tree based on the protein sequences of 2302 single-copy orthologues from the 12 insect genomes as described in Zdobnov and Bork (2007). (b) The genomic diversity across the insect orders is highlighted by two different measures of genomic evolution as described in Zdobnov and Bork (2007): the sequence divergence of orthologous proteins and the extent of genome shuffling, which disrupts ancestral gene arrangements. The pairwise average protein sequence identity of single-copy orthologues correlates well with the fraction of these genes remaining in synteny (maintaining ancestral gene arrangements). This diversity is markedly elevated compared with corresponding measurements of vertebrate data from human-chicken and human-fish genomic comparisons. Strikingly, measures of orthologue sequence identity and synteny show that despite diverging only about 250 million years ago, mosquitoes are more divergent from fruit flies than humans are to pufferfish, which are separated by some 450 million years. Aaeg, Aedes aegypti; Agam, Anopheles gambiae; Amel, Apis mellifera; Bmor, Bombyx mori; Dere, Drosophila erecta; Dmel, Drosophila melanogaster; Tcas, Tribolium castaneum. Both panels are adapted from Zdobnov and Bork (2007).

Even before the genome-sequencing era, pioneering work in insects had highlighted the exceptional power of multi-species comparative sequence analysis. Sequence comparisons of several chorion (eggshell) genes from diverse Drosophila species led to the recognition of conserved, short, putatively functional regulatory sequence motifs (Martinez-Cruzado et al., 1988). Experimental testing using transgenic and nucleotide substitution technologies identified one of these motifs, TCACGT, as essential for chorion gene expression in both Drosophila and silk moths (Fenerjian and Kafatos, 1994).
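As a minimal illustration of how a short motif such as TCACGT can be located in promoter sequences from several species, the Python sketch below scans both DNA strands for exact matches. The function names and the toy sequences are hypothetical; the original studies relied on sequence comparison and experimental testing, not on this code.

```python
from typing import List, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "TCACGT") -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match.

    A '-' hit means the reverse complement of the motif occurs on the given
    strand at that position. Overlapping matches are reported.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rc))
    hits: List[Tuple[int, str]] = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequences invented for illustration; not real promoter data.
    toy_promoters = {
        "species_a": "GGATCACGTTTA",
        "species_b": "CCACGTGATT",
    }
    for name, sequence in toy_promoters.items():
        print(name, find_motif(sequence))
```

A motif found in every species, irrespective of strand, is a candidate conserved element; whether it is functional still has to be established experimentally, as was done for TCACGT.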

Despite major differences in promoter sequence and architecture, a short, bidirectionally active promoter region from a silk moth chorion gene pair was shown to direct proper gene expression in Drosophila, where chorion genes are totally different and unidirectionally oriented (Mitsialis et al., 1987).

Only 7 years after the release of the first fruit fly genome, continued advances in sequencing technologies have fuelled the growth of genome resources. The community is already able to take advantage of the fully sequenced genomes of 12 Drosophila species (Clark et al., 2007) that span about 40 million years of divergence and inhabit a wide range of ecologies including rainforests, deserts, and islands, with generalist as well as specialist feeders. The combination of fast sequence diversification in insects and the plethora of closely related species opened new research territories. Comparative analysis of the 12 fruit fly genomes demonstrated how identification and characterization of evolutionary sequence signatures can accurately define encoded functional elements, improving protein-coding gene prediction, as well as discovering novel functional elements such as microRNA genes (Stark et al., 2007; Lin et al., 2008).

This analysis revealed the remarkable power of comparative genomic approaches to make revisions even to the 'gold standard' of D. melanogaster annotations, despite many years of intensive expert curation and experimental validation. Manual curation cannot possibly be scaled up to keep pace with the accelerating sequencing revolution. Thus, insect genomics has firmly established the comparative approach as immensely valuable for genome annotation and mining, despite the high rates of insect genome diversification. Annotation techniques have evolved along with the rapid increase in available sequence data: from initial single-genome ab initio methods, to dual- and then multi-species approaches, to full-genome alignments and the discovery of characteristic conservation patterns. Indeed, the de novo discovery of functional elements through analysis of evolutionary signatures across 12 Drosophila genomes represents a methodological milestone. Where high sequence divergence precludes reliable alignment at the DNA level, feature annotation must rely on: (1) single-genome ab initio methods that analyse sequence composition properties to recognize gene features and (2) knowledge-based approaches that utilize homology to the growing universe of known proteins, primary expressed sequence tags (ESTs) and cDNAs to recognize coding regions. These complementary approaches are integrated through strategies that evaluate all the available evidence and yield sets of consensus gene models that balance sensitivity and specificity. The resulting high-quality genome annotations are the basis for higher-level comparative analyses.
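The trade-off between sensitivity and specificity when integrating evidence can be illustrated with a simple voting sketch in Python. It assumes each evidence source (for example ab initio prediction, protein homology, and EST alignment) is reduced to a list of inclusive coordinate intervals, and it calls a position coding when at least `min_support` sources agree. The voting rule, the function names, and the toy coordinates are assumptions made for illustration; real annotation pipelines weigh evidence in more elaborate ways. Specificity is computed here as the fraction of predicted positions that are correct, a convention common in gene-prediction evaluation.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) coordinates


def covered(intervals: Iterable[Interval]) -> Set[int]:
    """Expand intervals into the set of positions they cover."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def consensus_positions(tracks: List[List[Interval]], min_support: int) -> Set[int]:
    """Positions supported by at least min_support evidence tracks."""
    support: Counter = Counter()
    for track in tracks:
        support.update(covered(track))  # each track votes once per position
    return {pos for pos, votes in support.items() if votes >= min_support}


def sensitivity_specificity(predicted: Set[int], reference: Set[int]) -> Tuple[float, float]:
    """Nucleotide-level sensitivity (TP / reference) and specificity (TP / predicted)."""
    true_pos = len(predicted & reference)
    sensitivity = true_pos / len(reference) if reference else 0.0
    specificity = true_pos / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy coordinates invented for illustration only.
    reference = covered([(10, 19)])
    tracks = [
        [(8, 19)],   # hypothetical ab initio prediction
        [(10, 17)],  # hypothetical protein-homology match
        [(12, 21)],  # hypothetical EST alignment
    ]
    for k in (1, 2, 3):
        sens, spec = sensitivity_specificity(consensus_positions(tracks, k), reference)
        print(f"min_support={k}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case a low threshold over-predicts, lowering specificity, while a high threshold misses true coding positions, lowering sensitivity; an intermediate threshold balances the two.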
