Little is known about the genetic changes that distinguish domestic cat populations from their wild progenitors. Here we describe a high-quality domestic cat reference genome assembly and comparative inferences made with other cat breeds, wildcats, and other mammals. Based upon these comparisons, we identified positively selected genes enriched for genes involved in lipid metabolism that underpin adaptations to a hypercarnivorous diet. We also found positive selection signals within genes underlying sensory processes, especially those affecting vision and hearing in the carnivore lineage. We observed an evolutionary tradeoff between functional olfactory and vomeronasal receptor gene repertoires in the cat and dog genomes, with an expansion of the feline chemosensory system for detecting pheromones at the expense of odorant detection. Genomic regions harboring signatures of natural selection that distinguish domestic cats from their wild congeners are enriched in neural crest-related genes associated with behavior and reward in mouse models, as predicted by the domestication syndrome hypothesis. Our description of a previously unidentified allele for the gloving pigmentation pattern found in the Birman breed supports the hypothesis that cat breeds experienced strong selection on specific mutations drawn from random bred populations. Collectively, these findings provide insight into how the process of domestication altered the ancestral wildcat genome and build a resource for future disease mapping and phylogenomic studies across all members of the Felidae.
Sexual reproduction is an ancient feature of life on earth, and the familiar X and Y chromosomes in humans and other model species have led to the impression that sex determination mechanisms are old and conserved. In fact, males and females are determined by diverse mechanisms that evolve rapidly in many taxa. Yet this diversity in primary sex-determining signals is coupled with conserved molecular pathways that trigger male or female development. Conflicting selection on different parts of the genome and on the two sexes may drive many of these transitions, but few systems with rapid turnover of sex determination mechanisms have been rigorously studied. Here we survey our current understanding of how and why sex determination evolves in animals and plants and identify important gaps in our knowledge that present exciting research opportunities to characterize the evolutionary forces and molecular pathways underlying the evolution of sex determination.
The metaphor of 'genomic islands of speciation' was first used to describe heterogeneous differentiation among loci between the genomes of closely related species. The biological model proposed to explain these differences was that the regions showing high levels of differentiation were resistant to gene flow between species, while the remainder of the genome was being homogenized by gene flow and consequently showed lower levels of differentiation. However, the conditions under which such differentiation can occur at multiple unlinked loci are restrictive; additionally, essentially all previous analyses have been carried out using relative measures of divergence, which can be misleading when regions with different levels of recombination are compared. Here, we test the model of differential gene flow by asking whether absolute divergence is also higher in the previously identified 'islands'. Using five species pairs for which full sequence data are available, we find that absolute measures of divergence are not higher in genomic islands. Instead, in all cases examined, we find reduced diversity in these regions, a consequence of which is that relative measures of divergence are abnormally high. These data therefore do not support a model of differential gene flow among loci, although islands of relative divergence may represent loci involved in local adaptation. Simulations using the program IMa2 further suggest that inferences of any gene flow may be incorrect in many comparisons. We instead present an alternative explanation for heterogeneous patterns of differentiation, one in which postspeciation selection generates patterns consistent with multiple aspects of the data.
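The contrast between relative and absolute divergence can be illustrated with a small numerical sketch. It assumes one common F_ST-style definition of relative divergence, 1 minus the ratio of mean within-population diversity to between-population divergence (d_XY); the abstract does not say which estimators were used, and all numbers below are hypothetical.

```python
def relative_divergence(pi_within: float, dxy: float) -> float:
    """F_ST-like relative measure: 1 - (within-population diversity / d_XY).

    pi_within: mean nucleotide diversity within the two populations (assumed input)
    dxy:       absolute divergence between the two populations (assumed input)
    """
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    return 1.0 - pi_within / dxy


# Hypothetical values: absolute divergence is identical in both regions.
dxy_everywhere = 0.010

# Genomic background: within-population diversity is close to d_XY.
background_fst = relative_divergence(pi_within=0.008, dxy=dxy_everywhere)

# "Island": same d_XY, but within-population diversity is reduced
# (for example by selection at linked sites in a low-recombination region).
island_fst = relative_divergence(pi_within=0.002, dxy=dxy_everywhere)

print(f"background relative divergence: {background_fst:.2f}")
print(f"island relative divergence:     {island_fst:.2f}")
```

In this toy case the relative measure rises sharply in the "island" even though absolute divergence is unchanged, which is the pattern the abstract attributes to reduced diversity rather than to differential gene flow.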
Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.
Current de novo whole-genome sequencing approaches often are inadequate for organisms lacking substantial preexisting genetic data. Problems with these methods are manifest as: large numbers of scaffolds that are not ordered within chromosomes or assigned to individual chromosomes, misassembly of allelic sequences as separate loci when the individual(s) being sequenced are heterozygous, and the collapse of recently duplicated sequences into a single locus, regardless of levels of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences, which we call recombinant population genome construction, that solves many of the problems encountered in standard genome assembly and that can be applied in model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to simultaneously barcode and sequence a large number of individuals from a recombinant population. The sequences of all recombinants can be combined to create an initial de novo assembly, followed by the use of individual recombinant genotypes to correct assembly splitting/collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can rapidly accelerate the transformation of nonmodel species into genome-enabled systems by simultaneously producing a high-quality genome assembly and providing genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, this approach also enables simultaneous mapping of quantitative trait loci. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that the method can produce a high-fidelity, high-quality genome assembly for both parents of the cross.
Divergent selection based on aquatic larval ecology is a likely factor in the recent isolation of two broadly sympatric and morphologically identical African mosquito species, the malaria vectors Anopheles gambiae and An. coluzzii. Population-based genome scans have revealed numerous candidate regions of recent positive selection, but have provided few clues as to the genetic mechanisms underlying behavioural and physiological divergence between the two species, phenotypes which themselves remain obscure. To uncover possible genetic mechanisms, we compared global transcriptional profiles of natural and experimental populations using gene-based microarrays. Larvae were sampled as second and fourth instars from natural populations in and around the city of Yaoundé, capital of Cameroon, where the two species segregate along a gradient of urbanization. Functional enrichment analysis of differentially expressed genes revealed that An. coluzzii (the species that breeds in more stable, biotically complex and potentially polluted urban water bodies) overexpresses genes implicated in detoxification and immunity relative to An. gambiae, which breeds in more ephemeral and relatively depauperate pools and puddles in suburbs and rural areas. Moreover, our data suggest that such overexpression by An. coluzzii is not a transient result of induction by xenobiotics in the larval habitat, but an inherent and presumably adaptive response to repeatedly encountered environmental stressors. Finally, we find no significant overlap between the differentially expressed loci and previously identified genomic regions of recent positive selection, suggesting that transcriptome divergence is regulated by trans-acting factors rather than cis-acting elements.
Substitution rates vary between species, and many explanations regarding the causes of this variation have been proposed. Here we consider how new genomic data on the per-generation mutation rate impinge on proposed hypotheses for substitution rate variation in primates. We propose that the generation-time effect as it is usually understood cannot explain the observed rate variation, but instead that selection for decreased somatic mutation rates can. By considering the disparate causes underlying mutation rate changes in recent human history, we also show that the per-generation mutation rate is increasing even as the per-cell-division rate is decreasing.
Because spontaneous mutation is the source of all genetic diversity, measuring mutation rates can reveal how natural selection drives patterns of variation within and between species. We sequenced eight genomes produced by a mutation-accumulation experiment in Drosophila melanogaster. Our analysis reveals that point mutation and small indel rates vary significantly between the two different genetic backgrounds examined. We also find evidence that ~2% of mutational events affect multiple closely spaced nucleotides. Unlike previous similar experiments, we were able to estimate genome-wide rates of large deletions and tandem duplications. These results suggest that, at least in inbred lines like those examined here, mutational pressures may result in net growth rather than contraction of the Drosophila genome. By comparing our mutation rate estimates to polymorphism data, we are able to estimate the fraction of new mutations that are eliminated by purifying selection. These results suggest that ~99% of duplications and deletions are deleterious, making them 10 times more likely to be removed by selection than nonsynonymous mutations. Our results illuminate not only the rates of new small- and large-scale mutations, but also the selective forces that they encounter once they arise.
We report the imminent completion of a set of reference genome assemblies for 16 species of Anopheles mosquitoes. In addition to providing a generally useful resource for comparative genomic analyses, these genome sequences will greatly facilitate exploration of the capacity exhibited by some Anopheline mosquito species to serve as vectors for malaria parasites. A community analysis project will commence soon to perform a thorough comparative genomic investigation of these newly sequenced genomes. Completion of this project via the use of short next-generation sequence reads required innovation in both the bioinformatic and laboratory realms, and the resulting knowledge gained could prove useful for genome sequencing projects targeting other unconventional genomes.
Current sequencing methods produce large amounts of data, but genome assemblies constructed from these data are often fragmented and incomplete. Incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. This means that methods attempting to estimate rates of gene duplication and loss often will be misled by such errors and that rates of gene family evolution will be consistently overestimated. Here, we present a method that takes these errors into account, allowing one to accurately infer rates of gene gain and loss among genomes even with low assembly and annotation quality. The method is implemented in the newest version of the software package CAFE, along with several other novel features. We demonstrate the accuracy of the method with extensive simulations and reanalyze several previously published data sets. Our results show that errors in genome annotation do lead to higher inferred rates of gene gain and loss but that CAFE 3 sufficiently accounts for these errors to provide accurate estimates of important evolutionary parameters.
When speciation events occur in rapid succession, incomplete lineage sorting (ILS) can cause disagreement among individual gene trees. The probability that ILS affects a given locus is directly related to its effective population size (Ne), which in turn is proportional to the recombination rate if there is strong selection across the genome. Based on these expectations, we hypothesized that low-recombination regions of the genome, as well as sex chromosomes and nonrecombining chromosomes, should exhibit lower levels of ILS. We tested this hypothesis in phylogenomic datasets from primates, the Drosophila melanogaster clade, and the Drosophila simulans clade. In all three cases, regions of the genome with low or no recombination showed significantly stronger support for the putative species tree, although results from the X chromosome differed among clades. Our results suggest that recurrent selection is acting in these low-recombination regions, such that current levels of diversity also reflect past decreases in the effective population size at these same loci. The results also demonstrate how considering the genomic context of a gene tree can assist in more accurate determination of the true species phylogeny, especially in cases where a whole-genome phylogeny appears to be an unresolvable polytomy.
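The dependence of ILS on Ne can be made concrete with the standard coalescent result for three species: a gene tree disagrees with the species tree with probability (2/3)·exp(−t / 2Ne), where t is the number of generations between the two speciation events and Ne is the diploid effective population size of the ancestral lineage. This formula is textbook coalescent theory rather than something stated in the abstract, and the parameter values below are purely illustrative.

```python
import math


def discordance_probability(generations: float, ne: float) -> float:
    """Probability that a three-taxon gene tree is discordant with the species tree.

    generations: time between the two speciation events, in generations
    ne:          diploid effective population size of the internal branch
    Assumes a neutral coalescent with constant Ne on that branch.
    """
    if ne <= 0:
        raise ValueError("ne must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    branch_length = generations / (2.0 * ne)  # branch length in coalescent units
    return (2.0 / 3.0) * math.exp(-branch_length)


# Hypothetical comparison: same internode time, different local Ne.
internode = 100_000
for label, ne in [("high-recombination region", 50_000),
                  ("low-recombination region", 10_000)]:
    p = discordance_probability(internode, ne)
    print(f"{label}: Ne={ne}, P(discordant gene tree)={p:.4f}")
```

Under these assumptions a locus with a smaller local Ne has a lower chance of discordance, which matches the expectation that low-recombination regions should show stronger support for the species tree.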
Copy-number variants (CNVs) represent a functionally and evolutionarily important class of variation. Here we take advantage of the use of pooled sequencing to detect CNVs with large differences in allele frequency between population samples. We present a method for detecting CNVs in pooled population samples using a combination of paired-end sequences and read-depth. Highly differentiated CNVs show large differences in the number of paired-end reads supporting individual alleles and large differences in read-depth between population samples. We complement this approach with one that uses a hidden Markov model to find larger regions differing in read-depth between samples. Using novel pooled sequence data from two populations of Drosophila melanogaster along a latitudinal cline, we demonstrate the utility of our method for identifying CNVs involved in local adaptation.
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
The association between fitness-related phenotypic traits and an environmental gradient offers one of the best opportunities to study the interplay between natural selection and migration. In cases in which specific genetic variants also show such clinal patterns, it may be possible to uncover the mutations responsible for local adaptation. The malaria vector, Anopheles gambiae, is associated with a latitudinal cline in aridity in Cameroon; a large inversion on chromosome 2L of this mosquito shows large differences in frequency along this cline, with high frequencies of the inverted karyotype present in northern, more arid populations and an almost complete absence of the inverted arrangement in southern populations. Here we use a genome resequencing approach to investigate patterns of population divergence along the cline. By sequencing pools of individuals from both ends of the cline as well as in the center of the cline, where the inversion is present in intermediate frequency, we demonstrate almost complete panmixia across collinear parts of the genome and high levels of differentiation in inverted parts of the genome. Sequencing of separate pools of each inversion arrangement in the center of the cline reveals large amounts of gene flux (i.e., gene conversion and double crossovers) even within inverted regions, especially away from the inversion breakpoints. The interplay between natural selection, migration, and gene flux allows us to identify several candidate genes responsible for the match between inversion frequency and environmental variables. These results, coupled with similar conclusions from studies of clinal variation in Drosophila, point to a number of important biological functions associated with local environmental adaptation.
Gene duplication via retrotransposition has been shown to be an important mechanism in evolution, affecting gene dosage and allowing for the acquisition of new gene functions. Although fixed retrotransposed genes have been found in a variety of species, very little effort has been made to identify retrogene polymorphisms. Here, we examine 37 Illumina-sequenced North American Drosophila melanogaster inbred lines and present the first ever data set and analysis of polymorphic retrogenes in Drosophila. We show that this type of polymorphism is quite common, with any two gametes in the North American population differing in the presence or absence of six retrogenes, accounting for ~13% of gene copy-number heterozygosity. These retrogenes were identified by a straightforward method that can be applied using any type of DNA sequencing data. We also use a variant of this method to conduct a genome-wide scan for intron presence/absence polymorphisms, and show that any two chromosomes in the population likely differ in the presence of multiple introns. We show that these polymorphisms are all in fact deletions rather than intron gain events present in the reference genome. Finally, by leveraging the known location of the parental genes that give rise to the retrogene polymorphisms, we provide direct evidence that natural selection is responsible for the excess of fixations of retrogenes moving off of the X chromosome in Drosophila. Further efforts to identify retrogene and intron presence/absence polymorphisms will undoubtedly improve our understanding of the evolution of gene copy number and gene structure.
Gene transposition puts a new gene copy in a novel genomic environment. Moreover, genes moving between the autosomes and the X chromosome experience change in several evolutionary parameters. Previous studies of gene transposition have not utilized the phylogenetic framework that becomes possible with the availability of whole genomes from multiple species. Here we used parsimonious reconstruction on the genomic distribution of gene families to analyze interchromosomal gene transposition in Drosophila. We identified 782 genes that have moved chromosomes within the phylogeny of 10 Drosophila species, including 87 gene families with multiple independent movements on different branches of the phylogeny. Using this large catalog of transposed genes, we detected accelerated sequence evolution in duplicated genes that transposed when compared to the parental copy at the original locus. We also observed a more refined picture of the biased movement of genes from the X chromosome to the autosomes. The bias of X-to-autosome movement was significantly stronger for RNA-based movements than for DNA-based movements, and among DNA-based movements there was an excess of genes moving onto the X chromosome as well. Genes involved in female-specific functions moved onto the X chromosome while genes with male-specific functions moved off the X. There was a significant overrepresentation of proteins involved in chromosomal function among transposed genes, suggesting that genetic conflict between sexes and among chromosomes may be a driving force behind gene transposition in Drosophila.
Establishing the molecular basis of DNA mutations that cause inherited disease is of fundamental importance to understanding the origin, nature, and clinical sequelae of genetic disorders in humans. The majority of disease-associated mutations constitute single-base substitutions and short deletions and/or insertions resulting from DNA replication errors and the repair of damaged bases. However, pathological mutations can also be introduced by nonreciprocal recombination events between paralogous sequences, a phenomenon known as interlocus gene conversion (IGC). IGC events have thus far been linked to pathology in more than 20 human genes. However, the large number of duplicated gene sequences in the human genome implies that many more disease-associated mutations could originate via IGC. Here, we have used a genome-wide computational approach to identify disease-associated mutations derived from IGC events. Our approach revealed hundreds of known pathological mutations that could have been caused by IGC. Further, we identified several dozen high-confidence cases of inherited disease mutations resulting from IGC in ~1% of all genes analyzed. About half of the donor sequences associated with such mutations are functional paralogous genes, suggesting that epistatic interactions or differential expression patterns will determine the impact upon fitness of specific substitutions between duplicated genes. In addition, we identified thousands of hitherto undescribed and potentially deleterious mutations that could arise via IGC. Our findings reveal the extent of the impact of interlocus gene conversion upon the spectrum of human inherited disease.
Most of our knowledge of sex-chromosome evolution comes from male heterogametic (XX/XY) taxa. With the genome sequencing of multiple female heterogametic (ZZ/ZW) taxa, we can now ask whether there are patterns of evolution common to both sex chromosome systems. In all XX/XY systems examined to date, there is an excess of testis-biased retrogenes moving from the X chromosome to the autosomes, which is hypothesized to result from either sexually antagonistic selection or escape from meiotic sex chromosome inactivation (MSCI). We examined RNA-mediated (retrotransposed) and DNA-mediated gene movement in two independently evolved ZZ/ZW systems, birds (chicken and zebra finch) and lepidopterans (silkworm). Even with sexually antagonistic selection likely operating in both taxa and MSCI having been identified in the chicken, we find no evidence for an excess of genes moving from the Z chromosome to the autosomes in either lineage. We detected no excess for either RNA- or DNA-mediated duplicates, across a range of approaches and methods. We offer some potential explanations for this difference between XX/XY and ZZ/ZW sex chromosome systems, but further work is needed to distinguish among these hypotheses. Regardless of the root causes, we have identified an additional, potentially inherent, difference between XX/XY and ZZ/ZW systems.
RNA editing is an important cellular process by which the nucleotides in a mature RNA transcript are altered to cause them to differ from the corresponding DNA sequence. While this process yields essential transcripts in humans and other organisms, it is believed to occur at a relatively small number of loci. The rarity of RNA editing has been challenged by a recent comparison of human RNA and DNA sequence data from 27 individuals, which revealed that over 10,000 human exonic sites appear to exhibit RNA-DNA differences (RDDs). Many of these differences could not have been caused by either of the two previously known human RNA editing mechanisms (ADAR-mediated A→G substitutions or APOBEC1-mediated C→U switches), suggesting that a previously unknown mechanism of RNA editing may be active in humans. Here, we reanalyze these data and demonstrate that genomic sequences exist in these same individuals or in the human genome that match the majority of RDDs. Our results suggest that the majority of these RDD events were observed due to accurate transcription of sequences paralogous to the apparently edited gene but differing at the edited site. In light of our results it seems prudent to conclude that if indeed an unknown mechanism is causing RDD events in humans, such events occur at a much lower frequency than originally proposed.
The African malaria mosquito Anopheles gambiae is polymorphic for chromosomal inversion 2La, whose frequency strongly correlates with degree of aridity across environmental gradients. Recent physiological studies have associated 2La with resistance to desiccation in adults and thermal stress in larvae, consistent with its proposed role in aridity tolerance. However, the genetic basis of these traits remains unknown. To identify genes that could be involved in the differential response to thermal stress, we compared global gene expression profiles of heat-hardened 2La or 2L+(a) larvae at three time points, for up to eight hours following exposure to the heat stress. Treatment and control time series, replicated four times, revealed a common and massive induction of a core set of heat-shock genes regardless of 2La orientation. However, clear differences between the 2La and 2L+(a) arrangements emerged at the earliest (0.25 h) time point, in the intensity and nature of the stress response. Overall, 2La was associated with the more aggressive response: larger numbers of genes were heat responsive and up-regulated. Transcriptionally induced genes were enriched for functions related to ubiquitin-proteasomal degradation, chaperoning and energy metabolism. The more muted transcriptional response of 2L+(a) was largely repressive, including genes involved in proteolysis and energy metabolism. These results may help explain the maintenance of the 2La inversion polymorphism in An. gambiae, as the survival benefits offered by high thermal sensitivity in harsh climates could be offset by the metabolic costs of such a drastic response in more equable climates.
Many aspects of mutational processes are nonrandom, from the preponderance of transitions relative to transversions to the higher rate of mutation at CpG dinucleotides [1]. However, it is still often assumed that single-nucleotide mutations are independent of one another, each being caused by separate mutational events. The occurrence of multiple, closely spaced substitutions appears to violate assumptions of independence and is often interpreted as evidence for the action of adaptive natural selection [2, 3], balancing selection [4], or compensatory evolution [5, 6]. Here we provide evidence of a frequent, widespread multinucleotide mutational process active throughout eukaryotes. Genomic data from mutation-accumulation experiments, parent-offspring trios, and human polymorphisms all show that simultaneous nucleotide substitutions occur within short stretches of DNA. Regardless of species, such multinucleotide mutations (MNMs) consistently comprise ~3% of the total number of nucleotide substitutions. These results imply that previous adaptive interpretations of multiple, closely spaced substitutions may have been unwarranted and that MNMs must be considered when interpreting sequence data.
A common assumption in comparative genomics is that orthologous genes share greater functional similarity than do paralogous genes (the "ortholog conjecture"). Many methods used to computationally predict protein function are based on this assumption, even though it is largely untested. Here we present the first large-scale test of the ortholog conjecture using comparative functional genomic data from human and mouse. We use the experimentally derived functions of more than 8,900 genes, as well as an independent microarray dataset, to directly assess our ability to predict function using both orthologs and paralogs. Both datasets show that paralogs are often a much better predictor of function than are orthologs, even at lower sequence identities. Among paralogs, those found within the same species are consistently more functionally similar than those found in a different species. We also find that paralogous pairs residing on the same chromosome are more functionally similar than those on different chromosomes, perhaps due to higher levels of interlocus gene conversion between these pairs. In addition to offering implications for the computational prediction of protein function, our results shed light on the relationship between sequence divergence and functional divergence. We conclude that the most important factor in the evolution of function is not amino acid sequence, but rather the cellular context in which proteins act.
'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts.
The mosquito Anopheles gambiae has heteromorphic sex chromosomes, while the mosquito Aedes aegypti has homomorphic sex chromosomes. We use retrotransposed gene duplicates to show an excess of movement off the An. gambiae X chromosome only after the split with Ae. aegypti, suggesting that their ancestor had homomorphic sex chromosomes.
Differences between individuals in the copy-number of whole genes have been found in every multicellular species examined thus far. Such differences result in unique complements of protein-coding genes in all individuals, and have been shown to underlie adaptive phenotypic differences. Here, we review the evidence for copy-number variants (CNVs), focusing on the methods used to detect them and the molecular mechanisms responsible for generating this type of variation. Although there are multiple technical and computational challenges inherent to these experimental methods, next-generation sequencing technologies are making such experiments accessible in any system with a sequenced genome. We further discuss the connection between copy-number variation within species and copy-number divergence between species, showing that these values are exactly what one would expect from similar comparisons of nucleotide polymorphism and divergence. We conclude by reviewing the growing body of evidence for natural selection on copy-number variants. While it appears that most genic CNVs--especially deletions--are quickly eliminated by selection, there are now multiple studies demonstrating a strong link between copy-number differences at specific genes and phenotypic differences in adaptive traits. We argue that a complete understanding of the molecular basis for adaptive natural selection necessarily includes the study of copy-number variation.
Gene expression divergence and chromosomal rearrangements have been put forward as major contributors to phenotypic differences between closely related species. It has also been established that duplicated genes show enhanced rates of positive selection in their amino acid sequences. If functional divergence is largely due to changes in gene expression, it follows that regulatory sequences in duplicated loci should also evolve rapidly. To investigate this hypothesis, we performed likelihood ratio tests (LRTs) on all noncoding loci within 5 kb of every transcript in the human genome and identified sequences with increased substitution rates in the human lineage since divergence from Old World monkeys. The fraction of rapidly evolving loci is significantly higher near genes that duplicated in the common ancestor of humans and chimps than near nonduplicated genes. We also conducted a genome-wide scan for nucleotide substitutions predicted to affect transcription factor binding. Rates of binding site divergence are elevated in noncoding sequences of duplicated loci with accelerated substitution rates. Many of the genes associated with these fast-evolving genomic elements belong to functional categories identified in previous studies of positive selection on amino acid sequences. In addition, we find enrichment for accelerated evolution near genes involved in establishment and maintenance of pregnancy, processes that differ significantly between humans and monkeys. Our findings support the hypothesis that adaptive evolution of the regulation of duplicated genes has played a significant role in human evolution.
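The likelihood ratio tests mentioned in the abstract above can be illustrated with a minimal sketch. It assumes two nested models whose maximized log-likelihoods are already known (for example, one shared substitution rate versus a separate rate on the human lineage) and that they differ by a single free parameter; the abstract specifies neither the models nor the degrees of freedom, so the function names and the numbers in the usage note are hypothetical.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return the likelihood ratio statistic 2 * (lnL_alt - lnL_null).

    The alternative model nests the null, so its log-likelihood should not
    be lower; small negative values from numerical noise are clipped to 0.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def lrt_pvalue_df1(stat: float) -> float:
    """P-value under a chi-square distribution with 1 degree of freedom.

    Assumption: the two models differ by exactly one free parameter.
    For 1 df the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))
```

For instance, hypothetical log-likelihoods of -1000.0 (null) and -998.0 (alternative) give a statistic of 4.0 and a p-value of roughly 0.046. In a genome-wide scan such p-values would still need correction for multiple testing.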
Nonallelic gene conversion has been proposed as a major force in homogenizing the sequences of paralogous genes. In this work, we investigate the extent and characteristics of gene conversion among gene families in nine species of the genus Drosophila. We carried out a genome-wide study of 2855 gene families (including 17,742 genes) and determined that conversion events involved 2628 genes. The proportion of converted genes ranged across species from 1 to 9% when paralogs of all ages were included. Although higher levels of gene conversion were found among young gene duplicates, at most 1-2% of the coding sequences of these duplicates were affected by conversion. Using a second approach relying on gene family size changes and gene-tree/species-tree reconciliation methods, we estimate that only 1-15% of gene trees are misled by gene conversion, depending on the lineage considered. Several features of paralogous genes correlate with gene conversion, such as intra-/interchromosomal location, level of nucleotide divergence, and GC content, although we found no definitive evidence for biased substitution patterns. After considering species-specific differences in the age and distance between paralogs, we found a highly significant difference in the amount of gene conversion among species. In particular, members of the melanogaster group showed the lowest proportion of converted genes. Our data therefore suggest underlying differences in the mechanistic basis of gene conversion among species.
The two "rules of speciation"--the Large X-effect and Haldane's rule--hold throughout the animal kingdom, but the underlying genetic mechanisms that cause them are still unclear. Two predominant explanations--the "dominance theory" and faster male evolution--both have some empirical support, suggesting that the genetic basis of these rules is likely multifarious. We revisit one historical explanation for these rules, based on dysfunctional genetic interactions involving genes recently moved between chromosomes. We suggest that gene movement specifically off or onto the X chromosome is another mechanism that could contribute to the two rules, especially as X chromosome movements can be subject to unique sex-specific and sex chromosome-specific consequences in hybrids. Our hypothesis is supported by patterns emerging from comparative genomic data, including a strong bias in interchromosomal gene movements involving the X and an overrepresentation of male reproductive functions among chromosomally relocated genes. In addition, our model indicates that the contribution of gene movement to the two rules in any specific group will depend upon key developmental and reproductive parameters that are taxon specific. We provide several testable predictions that can be used to assess the importance of gene movement as a contributor to these rules in the future.
An appreciable fraction of the transcriptome differs in level of expression among individuals. Transcription factor (TF) expression and DNA binding causes cell-specific activation and repression of downstream targets, and TF expression levels vary across individuals. However, it is not clear how the strength of DNA binding for individual TFs translates into regulatory control, or whether a different set of binding motifs is used for strongly regulated modules. Here we integrate two publicly available data sets in Drosophila melanogaster, as well as conduct novel analyses, to address these questions.
The loss of previously established genes has been proposed as a major force in evolutionary change. While genome sequencing of many new species offers the opportunity to identify cases of gene loss, it is unclear which algorithms offer the greatest accuracy or sensitivity. A number of methods to identify gene losses rely on the presence of a pseudogene for each loss. If genes are deleted when lost, however, such methods will fail to identify these cases. As the fate of gene losses is still unclear, we identified gene losses through a method that does not require pseudogenes to identify human-specific gene losses. Of the several hundred probable gene losses initially identified, we were unable to find a single case of unambiguous gene loss via deletion. We were also able to identify a large number of previously unannotated genes in the human genome, some of which also had evidence for transcription. Though our results suggest that pseudogene-based methods for finding gene losses in humans will not miss many events, we discuss the dependence of these conclusions on the divergence times among the species considered. Supplementary Material is provided (see online Supplementary Material at www.liebertonline.com).
Determining the evolutionary forces responsible for the maintenance of gene duplicates is key to understanding the processes leading to evolutionary adaptation and novelty. In his highly prescient book, Susumu Ohno recognized that duplicate genes are fixed and maintained within a population with 3 distinct outcomes: neofunctionalization, subfunctionalization, and conservation of function. Subsequent researchers have proposed a multitude of population genetic models that lead to these outcomes, each differing largely in the role played by adaptive natural selection. In this paper, I present a nonmathematical review of these models, their predictions, and the evidence collected in support of each of them. Though the various outcomes of gene duplication are often strictly associated with the presence or absence of adaptive natural selection, I argue that determining the outcome of duplication is orthogonal to determining whether natural selection has acted. Despite an ever-growing field of research into the fate of gene duplicates, there is not yet clear evidence for the preponderance of one outcome over the others, much less evidence for the importance of adaptive or nonadaptive forces in maintaining these duplicates.
Theoretical studies predict X chromosomes and autosomes should be under different selection pressures, and there should therefore be differences in sex-specific and sexually antagonistic gene content between the X and the autosomes. Previous analyses have identified an excess of genes duplicated by retrotransposition from the X chromosome in Drosophila melanogaster. A number of hypotheses may explain this pattern, including mutational bias, escape from X-inactivation during spermatogenesis, and the movement of male-favored (sexually antagonistic) genes from a chromosome that is predominantly carried by females. To distinguish among these processes and to examine the generality of these patterns, we identified duplicated genes in nine sequenced Drosophila genomes. We find that, as in D. melanogaster, there is an excess of genes duplicated from the X chromosome across the genus Drosophila. This excess duplication is due almost completely to genes duplicated by retrotransposition, with little to no excess from the X among genes duplicated via DNA intermediates. The only exception to this pattern appears within the burst of duplication that followed the creation of the Drosophila pseudoobscura neo-X chromosome. Additionally, we examined genes relocated among chromosomal arms (i.e., genes duplicated to new locations coupled with the loss of the copy in the ancestral locus) and found an excess of genes relocated off the ancestral X and neo-X chromosomes. Interestingly, many of the same genes were duplicated or relocated from the independently derived neo-X chromosomes of D. pseudoobscura and Drosophila willistoni, suggesting that natural selection favors the traffic of genes from X chromosomes. Overall, we find that the forces driving gene duplication from X chromosomes are dependent on the lineage in question, the molecular mechanism of duplication considered, the preservation of the ancestral copy, and the age of the X chromosome.
Duplicate genes act as a source of genetic material from which new functions arise. They exist in large numbers in every sequenced eukaryotic genome and may be responsible for many differences in phenotypes between species. However, recent work searching for the targets of positive selection in humans has largely ignored duplicated genes due to complications in orthology assignment. Here we find that a high proportion of young gene duplicates in the human, macaque, mouse, and rat genomes have experienced adaptive natural selection. Approximately 10% of all lineage-specific duplicates show evidence for positive selection on their protein sequences, larger than any reported amount of selection among single-copy genes in these lineages using similar methods. We also find that newly duplicated genes that have been transposed to new chromosomal locations are significantly more likely to have undergone positive selection than the ancestral copy. Human-specific duplicates evolving under adaptive natural selection include a surprising number of genes involved in neuronal and cognitive functions. Our results imply that genome scans for selection that ignore duplicated loci are missing a large fraction of all adaptive substitutions. The results are also in agreement with the classical model of evolution by gene duplication, supporting a common role for neofunctionalization in the long-term maintenance of gene duplicates.
Previous studies have shown that recombination between allelic sequences can cause likelihood-based methods for detecting positive selection to produce many false-positive results. In this article, we use simulations to study the impact of nonallelic gene conversion on the specificity of PAML to detect positive selection among gene duplicates. Our results show that, as expected, gene conversion leads to higher rates of false-positive results, although only moderately. These rates increase with the genetic distance between sequences, the length of converted tracts, and when no outgroup sequences are included in the analysis. We also find that branch-site models will incorrectly identify unconverted sequences as the targets of positive selection when their close paralogs are converted. Bayesian prediction of sites undergoing adaptive evolution implemented in PAML is affected by conversion, albeit in a less straightforward way. Our work suggests that particular attention should be devoted to the evolutionary analysis of recent duplicates that may have experienced gene conversion because they may provide false signals of positive selection. Fortunately, these results also imply that those cases most susceptible to false-positive results--i.e., high divergence between paralogs, long conversion tracts--are also the cases where detecting gene conversion is the easiest.
Gene conversion between duplicated genes has been implicated in homogenization of gene families and reassortment of variation among paralogs. If conversion is common, this process could lead to errors in gene tree inference and subsequent overestimation of rates of gene duplication. After performing simulations to assess our power to detect gene conversion events, we determined rates of conversion among young, lineage-specific gene duplicates in four mammal species: human, rhesus macaque, mouse, and rat. Gene conversion rates (number of conversion events/number of gene pairs) among young duplicates range from 8.3% in macaque to 18.96% in rat, including a 5% false-positive rate. For all lineages, only 1-3% of the total amount of sequence examined was converted. There is no increase in GC content in conversion tracts compared to flanking regions of the same genes nor in conversion tracts compared to the same region in nonconverted gene-family members, suggesting that ectopic gene conversion does not significantly alter nucleotide composition in these duplicates. While the majority of gene duplicate pairs reside on different chromosomes in mammalian genomes, the majority of gene conversion events occur between duplicates on the same chromosome, even after controlling for divergence between duplicates. Among intrachromosomal duplicates, however, there is no correlation between the probability of conversion and physical distance between duplicates after controlling for divergence. Finally, we use a novel method to show that at most 5-10% of all gene trees involving young duplicates are likely to be incorrect due to gene conversion. We conclude that gene conversion has had only a small effect on mammalian genomes and gene duplicate evolution in general.
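The rate defined in the abstract above (number of conversion events divided by number of gene pairs) is simple arithmetic, sketched here with hypothetical counts. The adjustment function is an assumption added for illustration only: it subtracts the stated 5% false-positive rate directly, which is a naive correction and not a procedure described in the abstract.

```python
def conversion_rate(n_events: int, n_pairs: int) -> float:
    """Gene conversion rate as defined in the abstract: events / gene pairs."""
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    return n_events / n_pairs


def naive_fp_adjusted(rate: float, fp_rate: float = 0.05) -> float:
    """Illustrative lower estimate: subtract an assumed false-positive rate.

    Assumption: false positives add to the observed rate; the result is
    floored at zero. This is not the authors' stated method.
    """
    return max(0.0, rate - fp_rate)
```

With hypothetical counts of 83 detected events among 1,000 gene pairs, the rate is 0.083 (8.3%), and the naive adjustment leaves about 0.033.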
In this paper we use the length of the shared synteny between genes to identify "parent" orthologs among multiple lineage specific duplicated genes. Genes in the region around each duplicated paralog are compared with the genes flanking an outgroup ortholog to estimate the probability of observing homologs in syntenic vs. non-syntenic regions. The length of the shared synteny is introduced as a hidden variable and is estimated using Expectation-Maximization for each lineage specific paralog. Assuming that the original, parental gene will preserve the longest synteny with the outgroup gene, and that any daughter genes will have a shorter syntenic block, we are able to determine parent-daughter relationships. We apply this method to lineage specific duplications in the human genome, and show that we are able to determine the direction and size of the duplication events that have created hundreds of genes.
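The core assumption of the abstract above, that the parental copy preserves the longest shared synteny with the outgroup gene, can be sketched in a deliberately simplified form. The sketch below replaces the Expectation-Maximization estimate with a plain count of consecutive flanking genes that have a homolog near the outgroup ortholog, so it illustrates the decision rule only, not the probabilistic method. All names and the input representation (gene-family labels ordered outward from the focal gene) are hypothetical.

```python
from typing import Dict, List, Set


def shared_synteny_length(paralog_flank: List[str],
                          outgroup_families: Set[str]) -> int:
    """Count consecutive flanking genes, ordered outward from the paralog,
    whose gene family also occurs near the outgroup ortholog.

    Counting stops at the first flanking gene without such a homolog.
    """
    length = 0
    for family in paralog_flank:
        if family not in outgroup_families:
            break
        length += 1
    return length


def pick_parent(paralog_flanks: Dict[str, List[str]],
                outgroup_families: Set[str]) -> str:
    """Return the paralog with the longest shared synteny (assumed parent).

    Ties are resolved in favor of the first paralog in the dictionary.
    """
    return max(
        paralog_flanks,
        key=lambda name: shared_synteny_length(paralog_flanks[name],
                                               outgroup_families),
    )
```

A hard count like this is sensitive to a single missing or misannotated neighbor, which is one reason a probabilistic treatment of synteny length, as described in the abstract, would be preferable in practice.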
One of the unique insights provided by the growing number of fully sequenced genomes is the pervasiveness of gene duplication and gene loss. Indeed, several metrics now suggest that rates of gene birth and death per gene are only 10-40% lower than nucleotide substitutions per site, and that per nucleotide, the consequent lineage-specific expansion and contraction of gene families may play at least as large a role in adaptation as changes in orthologous sequences. While gene family evolution is pervasive, it may be especially important in our own evolution since it appears that the "revolving door" of gene duplication and loss has undergone multiple accelerations in the lineage leading to humans. In this paper, we review current understanding of gene family evolution including: methods for inferring copy number change, evidence for adaptive expansion and adaptive contraction of gene families, the origins of new families and deaths of previously established ones, and finally we conclude with a perspective on challenges and promising directions for future research.
Gene duplication is a major driver of organismal adaptation and evolution and plays an important role in multiple human diseases. Whole-genome analyses have shown similar and high rates of gene duplication across a variety of eukaryotic species. Most of these studies, however, did not address the possible impact of interlocus gene conversion (IGC) on the evolution of gene duplicates. Because IGC homogenizes pairs of duplicates, widespread conversion would cause gene duplication events that happened long ago to appear more recent, resulting in artificially high estimates of duplication rates. Although the majority of genome-wide studies (including in the budding yeast Saccharomyces cerevisiae [Scer]) point to levels of IGC between paralogs ranging from 2% to 18%, Gao and Innan (Gao LZ, Innan H. 2004. Very low gene duplication rate in the yeast genome. Science 306:1367-1370.) found that gene conversion in yeast affected >80% of paralog pairs. If conversion rates really are this high, it would imply that the rate of gene duplication in eukaryotes is much lower than previously reported. In this work, we apply four different methodologies--including one approach that closely mirrors Gao and Innan's method--to estimate the level of IGC in Scer. Our analyses point to a maximum conversion level of 13% between paralogs in this species, in close agreement with most estimates of IGC in eukaryotes. We also show that the exceedingly high levels of conversion found previously derive from application of an accurate method to an inappropriate data set. In conclusion, our work provides the most striking evidence to date supporting the reduced incidence of IGC among Scer paralogs and sets up a framework for future analyses in other eukaryotes.
This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5'- and 3'-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.
The evolution of a pair of chromosomes that differ in appearance between males and females (heteromorphic sex chromosomes) has occurred repeatedly across plants and animals. Recent work has shown that the male heterogametic (XY) and female heterogametic (ZW) sex chromosomes evolved independently from different pairs of homomorphic autosomes in the common ancestor of birds and mammals but also that X and Z chromosomes share many convergent molecular features. However, little is known about how often heteromorphic sex chromosomes have either evolved convergently from different autosomes or in parallel from the same pair of autosomes and how universal patterns of molecular evolution on sex chromosomes really are. Among winged insects with sequenced genomes, there are male heterogametic species in both the Diptera (e.g., Drosophila melanogaster) and the Coleoptera (Tribolium castaneum), female heterogametic species in the Lepidoptera (Bombyx mori), and haplodiploid species in the Hymenoptera (e.g., Nasonia vitripennis). By determining orthologous relationships among genes on the X and Z chromosomes of insects with sequenced genomes, we are able to show that these chromosomes are not homologous to one another but are homologous to autosomes in each of the other species. These results strongly imply that heteromorphic sex chromosomes have evolved independently from different pairs of ancestral chromosomes in each of the insect orders studied. We also find that the convergently evolved X chromosomes of Diptera and Coleoptera share genomic features with each other and with vertebrate X chromosomes, including excess gene movement from the X to the autosomes. However, other patterns of molecular evolution--such as increased codon bias, decreased gene density, and the paucity of male-biased genes on the X--differ among the insect X and Z chromosomes. Our results provide evidence for both differences and nearly universal similarities in patterns of evolution among independently derived sex chromosomes.