Concepts and methods in comparative genomics

The volume of data from genome sequencing presents a wealth of opportunities, but also immense challenges: to identify meaningful encoded elements, elucidate their functions, and interpret broad principles of genome evolution. Comparative methodologies have been instrumental for understanding important generators of diversity such as alternative splicing, and the extent and importance of non-protein-coding elements. It is now understood that recognizable biological functions are encoded by the interaction of a variety of elements: protein-coding genes, non-protein-coding RNA genes, and conserved non-coding functional elements. The insights gained from comparative genomics, in combination with functional data, can extend comparative analysis stepwise to the systems level: from macromolecular complexes to regulatory networks, signalling pathways, and coordinated physiological reactions to environmental stimuli, such as the responses and modulation of the immune system. To this end, large-scale comparative analyses employ an array of methodologies, often with a focus on characterizing evolutionary relationships among genes and genomes. Here we outline the key concepts and methods applied to the analysis of gene families, defining orthology and paralogy, and exploring the dynamics of genome shuffling. Additional approaches, particularly those that use DNA evolutionary signatures to identify traces of selection, become more valuable as the genomes of more closely related species become available.

Comparative analyses of protein-coding genes often aim to trace the evolutionary histories of genes and to infer their putative functions, whether highly specific or widely shared. Pairwise sequence comparisons, such as those of the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1997), are complemented by profiles from multiple-sequence alignments, like those employing Hidden Markov Model methods (Eddy, 1998). Together they provide a plethora of analysis tools for the detection of homology, the shared ancestry of biological sequences. Curated or semi-curated alignments form the basis of many protein-domain recognition profiles, which capture patterns of defining amino acids, as exemplified by the several resources integrated through the InterProScan application (Zdobnov and Apweiler, 2001). Protein families are based on recognizable domains, but may also be defined by particular domain combinations or indeed by sequence relationships that have not yet been described by current protein-domain resources. Therefore, clustering of scores from all-against-all full-length sequence comparisons provides complementary protein family definitions. Importantly, domain-based approaches classify a multi-domain protein into several groups, whereas clustering techniques usually produce mutually exclusive groups of proteins. Comparative analysis of gene family dynamics can identify major differences between families: it may reveal expansions and contractions that depart from a random gene birth/death model, or may even document family extinctions and the appearance of novelties reminiscent of the concept of punctuated equilibrium. For example, the Anopheles-Drosophila comparison identified several prominent genomic features, including a major mosquito expansion of fibrinogen-related proteins (FREPs), potentially implicated in antibacterial immune responses (Zdobnov et al., 2002). Honey bees exhibit an expansion of the major royal jelly proteins, important dietary components that function in caste differentiation (HGSC, 2006). In the flour beetle the odorant receptor family has expanded, with a concomitant reduction in the number of opsin genes, probably reflecting adaptation to low light conditions and increased reliance on smell in the evolutionary lineage of this stored-food pest insect (TGSC, 2008).
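The contrast drawn above between domain-based classification and score clustering can be illustrated with a minimal Python sketch. It assumes single-linkage clustering with an arbitrary score threshold, which is only one of many possible clustering procedures and is not the method of any resource cited here. The protein identifiers, scores, and domain labels are invented toy values, and the function names are hypothetical.

```python
from collections import defaultdict


def cluster_families(scores, threshold):
    """Single-linkage clustering of pairwise scores (an assumed, simplified scheme).

    scores: dict mapping (protein_a, protein_b) -> similarity score.
    Two proteins end up in the same family if a chain of pairs scoring
    at or above the threshold connects them.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)  # also registers both proteins
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    return sorted(sorted(members) for members in clusters.values())


def group_by_domain(domains):
    """Place each protein in one group per domain it carries."""
    groups = defaultdict(set)
    for protein, protein_domains in domains.items():
        for domain in protein_domains:
            groups[domain].add(protein)
    return {domain: sorted(members) for domain, members in groups.items()}


# Toy all-against-all scores and domain annotations (hypothetical values).
scores = {
    ("P1", "P2"): 250.0,
    ("P2", "P3"): 180.0,
    ("P3", "P4"): 40.0,
    ("P4", "P5"): 300.0,
}
domains = {
    "P1": ["kinase"],
    "P2": ["kinase", "SH2"],
    "P3": ["SH2"],
    "P4": ["FBG"],
    "P5": ["FBG"],
}

families = cluster_families(scores, threshold=100.0)
domain_groups = group_by_domain(domains)
print(families)       # [['P1', 'P2', 'P3'], ['P4', 'P5']]
print(domain_groups)  # P2 appears under both 'kinase' and 'SH2'
```

In this toy case the two-domain protein P2 belongs to two domain groups but to exactly one cluster, which is the difference the paragraph describes. Single linkage also chains P1 and P3 into one family although they share no domain, a known weakness of this simple scheme.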

The homology that defines protein families implies common ancestry, which can be further refined to distinguish between orthologous and paralogous genes (Koonin, 2005). Orthologues derive from a single gene in the last common ancestor and therefore most likely retain the ancestral function, especially if they have remained as single-copy genes over a long evolutionary period. Paralogues arise from gene-duplication events in a given lineage; these copies may share the ancestral function, or may have acquired new, often related, functions. Accurate delineation of orthologues and paralogues is vital for confident functional inferences (Figure 6.2). The functional annotation of D. melanogaster genes, accumulated over many decades, is an invaluable resource for inferring putative gene functions in other species. As orthology is defined relative to the last common ancestor, classification is inherently hierarchical. Analysing distantly related species produces large gene groups, potentially all the descendants of an ancestral gene; analysis of closely related species identifies the one-to-one orthologous relations. All-against-all sequence comparisons are widely employed in genome-scale orthology analysis as a means to identify genes that are best reciprocal hits (Koonin, 2005). Phylogenetic methods that provide estimates of evolutionary distances calculated from refined models of amino acid substitutions generally produce more accurate orthology assignments, for example TreeFam (Ruan et al., 2008). However, these methods are computationally challenging and can be error-prone when scaled up to the level of whole-genome analysis. Better assignments are achievable through the development of hybrid methodologies, together with finely tuned distance measurements and clustering procedures, for example OrthoDB (Kriventseva et al., 2008), or through the incorporation of additional evidence for orthology such as conserved gene neighbourhoods, for example SYNERGY (Wapinski et al., 2007).
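The best-reciprocal-hit criterion mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by TreeFam, OrthoDB, or SYNERGY. It assumes hits are supplied as (query, subject, score) tuples, for instance parsed from tabular search output. The gene names and scores are invented, and ties are resolved in favour of the first hit encountered.

```python
def best_hits(hits):
    """Map each query gene to its single highest-scoring subject gene.

    hits: iterable of (query, subject, score) tuples.
    On equal scores the first hit seen is kept (a simplifying assumption).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy searches between two species; B2b stands for a lineage-specific duplicate of B2.
a_vs_b = [
    ("A1", "B1", 500), ("A1", "B2", 90),
    ("A2", "B2", 410), ("A2", "B2b", 380),
    ("A3", "B2", 120),
]
b_vs_a = [
    ("B1", "A1", 495),
    ("B2", "A2", 405), ("B2", "A3", 110),
    ("B2b", "A2", 385),
]

rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)  # [('A1', 'B1'), ('A2', 'B2')]
```

The example also shows a limitation of the criterion: the duplicate B2b is left unpaired even though both B2 and B2b would be co-orthologous to A2. This is one reason the more refined approaches described above are preferred for families with lineage-specific duplications.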

Figure 6.2 Phylogenetic tree of the Toll-like receptors from Drosophila melanogaster (Dm), Anopheles gambiae (Ag), Aedes aegypti (Aa), and Apis mellifera (Am), where orthologous groups can be clearly distinguished that together describe the evolutionary history of this gene family. The Toll-1/5 group shows expansions in both the mosquito species, creating groups of paralogous genes such as the Anopheles Toll-5A-5B-1B group. The Toll-8s and Toll-6s on the other hand remain as single-copy orthologues. Duplications appear to have occurred in D. melanogaster in the Toll-2/7 group and in A. aegypti in the Toll-9 group. The Toll-10/11 group might appear as a mosquito-specific duplicated group if only the dipterans were compared; however, the presence of AmToll-10 rather points to the loss of this gene from D. melanogaster. The Toll-9s form a clearly distinct group which in fact shows more similarities with the mammalian Toll-like receptors than with other insect Toll family members.

High levels of sequence divergence common among insects may preclude detailed comparison based on whole-genome alignments. However, protein orthology assignments can help identify orthologous genomic regions, within which discrete genes are found to preserve their local gene neighbourhoods (synteny). The ancestral genomic state is eroded through evolutionary time by sequence rearrangements such as duplications, inversions, deletions, and accumulation of repetitive DNA arising from the activity of transposable elements. Nevertheless, regions exhibiting local conservation of orthologous gene arrangements can define synteny blocks despite such erosion (Zdobnov et al., 2002; Zdobnov and Bork, 2007). Elevated transposable-element activity is likely to promote genomic instability and contribute to increased genome size. Almost half of the approximately 1.4 Gb A. aegypti genome is made up of recognizable transposable-element sequences, which result in increased shuffling as indicated by the approximately 2.5-fold higher level of estimated synteny breaks in Aedes compared to Anopheles (Nene et al., 2007). Remnants of longer-range synteny at the level of their five major chromosome elements can also be identified between the mosquitoes and the fruit fly. However, few confident correspondences can be established with the 16 chromosomes of the more distant honey bee (HGSC, 2006). Although highly conserved gene arrangements such as the Hox gene cluster may reflect functional constraints, strict preservation of gene order appears to be under limited selection: only a few hundred genes maintain their local gene neighbourhoods across the insect orders (Zdobnov and Bork, 2007).
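As a rough illustration of how orthology assignments can delimit synteny blocks, the following Python sketch walks along the gene order of one genome and groups consecutive genes whose orthologues lie close together on the same chromosome of a second genome. It is a deliberately simplified scheme under stated assumptions, not the method of the studies cited above. It assumes one-to-one orthologues, ignores gene orientation, and uses hypothetical parameters (max_gap, min_genes), chromosome labels, and gene names.

```python
def synteny_blocks(order_a, position_b, orthologs, max_gap=1, min_genes=2):
    """Group genes of genome A into blocks of conserved neighbourhood.

    order_a:    genes in their order along one chromosome of genome A.
    position_b: gene of genome B -> (chromosome, index along chromosome).
    orthologs:  gene of genome A -> its one-to-one orthologue in genome B.
    max_gap:    intervening genes tolerated in genome B between neighbours.
    min_genes:  minimum number of orthologue pairs for a block to be reported.
    """
    blocks, current, previous = [], [], None
    for gene in order_a:
        partner = orthologs.get(gene)
        if partner is None or partner not in position_b:
            continue  # genes without a placed orthologue are skipped
        chrom, index = position_b[partner]
        adjacent = (
            previous is not None
            and chrom == previous[0]
            and abs(index - previous[1]) <= max_gap + 1
        )
        if adjacent:
            current.append((gene, partner))
        else:
            # A synteny break: close the running block and start a new one.
            if len(current) >= min_genes:
                blocks.append(current)
            current = [(gene, partner)]
        previous = (chrom, index)
    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


# Toy gene order and orthologue map (all identifiers are invented).
order_a = ["a1", "a2", "a3", "a4", "a5", "a6"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b7", "a5": "b9", "a6": "b8"}
position_b = {
    "b1": ("chr1", 0), "b2": ("chr1", 1), "b3": ("chr1", 3),
    "b7": ("chr2", 10), "b8": ("chr2", 4), "b9": ("chr2", 5),
}

blocks = synteny_blocks(order_a, position_b, orthologs)
for block in blocks:
    print(block)
```

With these toy inputs two blocks are reported: a1 to a3 on chr1, which tolerates one intervening gene, and a5 to a6 on chr2, which is a local inversion. The isolated gene a4 forms no block. Counting the transitions between blocks gives a crude estimate of synteny breaks of the kind compared between Aedes and Anopheles above.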
