Short oligonucleotides can be used as markers to tag and track DNA sequences. For example, barcoding techniques (i.e. Multiplex Identifiers or Indexing) use short oligonucleotides to distinguish between reads from different DNA samples pooled for high-throughput sequencing. A similar technique called molecule tagging uses the same principles but is applied to individual DNA template molecules. Each template molecule is tagged with a unique oligonucleotide prior to polymerase chain reaction. The resulting amplicon sequences can be traced back to their original templates by their oligonucleotide tag. Consensus building from sequences sharing the same tag enables inference of original template molecules, thereby reducing effects of sequencing error and polymerase chain reaction bias. Several independent groups have developed similar protocols for molecule tagging; however, user-friendly software for building consensus sequences from molecule-tagged reads is not readily available or is highly specific for a particular protocol.
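The consensus-building step described above can be illustrated with a minimal sketch. It is not the software referred to in the abstract: the function name `build_consensus`, the `min_reads` threshold, the strict-majority rule, and the assumption that reads sharing a tag are already trimmed to equal length are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def build_consensus(tagged_reads, min_reads=3):
    """Group reads by tag and return a majority-vote consensus per tag.

    tagged_reads: iterable of (tag, sequence) pairs. Reads sharing a tag are
    assumed (for this sketch) to be aligned and of equal length.
    Tags supported by fewer than min_reads reads are skipped.
    """
    groups = defaultdict(list)
    for tag, seq in tagged_reads:
        groups[tag].append(seq)

    consensus = {}
    for tag, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        bases = []
        for column in zip(*seqs):
            top, top_n = Counter(column).most_common(1)[0]
            # Call N when no base has a strict majority in this column
            bases.append(top if top_n * 2 > len(column) else "N")
        consensus[tag] = "".join(bases)
    return consensus

reads = [
    ("AACGT", "ACGTAC"),
    ("AACGT", "ACGTAC"),
    ("AACGT", "ACCTAC"),  # one read carries an error at the third base
    ("GGTCA", "ACGAAC"),  # singleton tag, below min_reads
]
result = build_consensus(reads)
print(result)
```

In this toy input the single discordant base is outvoted by the two other reads with the same tag, and the singleton tag is dropped because a one-read group cannot separate error from true variation.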
HIV coinfection accelerates disease progression in chronic hepatitis C and reduces sustained antiviral responses (SVR) to interferon-based therapy. New direct-acting antivirals (DAAs) promise higher SVR rates, but the selection of preexisting resistance-associated variants (RAVs) may lead to virologic breakthrough or relapse. Thus, pretreatment frequencies of RAVs are likely determinants of treatment outcome but typically are below levels at which the viral sequence can be accurately resolved. Moreover, it is not known how HIV coinfection influences RAV frequency. We adopted an accurate high-throughput sequencing strategy to compare nucleotide diversity in HCV NS3 protease-coding sequences in 20 monoinfected and 20 coinfected subjects with well-controlled HIV infection. Differences in mean pairwise nucleotide diversity (π), Tajima's D statistic, and Shannon entropy index suggested that the genetic diversity of HCV is reduced in coinfection. Among coinfected subjects, diversity correlated positively with increases in CD4(+) T cells on antiretroviral therapy, suggesting T cell responses are important determinants of diversity. At a median sequencing depth of 0.084%, preexisting RAVs were readily identified. Q80K, which negatively impacts clinical responses to simeprevir, was encoded by more than 99% of viral RNAs in 17 of the 40 subjects. RAVs other than Q80K were identified in 39 of 40 subjects, mostly at frequencies near 0.1%. RAV frequency did not differ significantly between monoinfected and coinfected subjects. We conclude that HCV genetic diversity is reduced in patients with well-controlled HIV infection, likely reflecting impaired T cell immunity. However, RAV frequency is not increased and should not adversely influence the outcome of DAA therapy.
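Two of the diversity measures named above, mean pairwise nucleotide diversity (conventionally written π) and the Shannon entropy index, can be sketched in a few lines. The abstract does not say how either was computed, so this is only one plausible formulation: entropy is taken over whole-sequence (haplotype) frequencies with the natural logarithm, π is the mean proportion of differing sites over all sequence pairs, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def shannon_entropy(sequences):
    """Shannon entropy (natural log) of the haplotype frequency distribution."""
    counts = Counter(sequences)
    n = len(sequences)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def mean_pairwise_diversity(sequences):
    """Mean proportion of differing sites over all pairs of equal-length sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    length = len(sequences[0])
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)

# Toy aligned sequences (assumed; not data from the study)
seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
pi = mean_pairwise_diversity(seqs)
entropy = shannon_entropy(seqs)
print(pi, entropy)
```

For the four toy sequences, the six pairs differ at 7 sites in total out of 24 compared, so π is 7/24. Both measures are zero when all sequences are identical.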
Despite modern sequencing efforts, the difficulty in assembly of highly repetitive sequences has prevented resolution of human genome gaps, including some in the coding regions of genes with important biological functions. One such gene, MUC5AC, encodes a large, secreted mucin, which is one of the two major secreted mucins in human airways. The MUC5AC region contains a gap in the human genome reference (hg19) across the large, highly repetitive, and complex central exon. This exon is predicted to contain imperfect tandem repeat sequences and multiple conserved cysteine-rich (CysD) domains. To resolve the MUC5AC genomic gap, we used high-fidelity long PCR followed by single molecule real-time (SMRT) sequencing. This technology yielded long sequence reads and robust coverage that allowed for de novo sequence assembly spanning the entire repetitive region. Furthermore, we used SMRT sequencing of PCR amplicons covering the central exon to identify genetic variation in four individuals. The results demonstrated the presence of segmental duplications of CysD domains, insertions/deletions (indels) of tandem repeats, and single nucleotide variants. Additional studies demonstrated that one of the identified tandem repeat insertions is tagged by nonexonic single nucleotide polymorphisms. Taken together, these data illustrate the successful utility of SMRT sequencing long reads for de novo assembly of large repetitive sequences to fill the gaps in the human genome. Characterization of the MUC5AC gene and the sequence variation in the central exon will facilitate genetic and functional studies for this critical airway mucin.
Horizontal gene transfer (HGT) is a widespread process that enables the acquisition of genes and metabolic pathways in single evolutionary steps. Previous reports have described fitness costs of HGT, but have largely focused on the acquisition of relatively small plasmids. We have previously shown that a Pseudomonas syringae pv. lachrymans strain recently acquired a cryptic megaplasmid, pMPPla107. This extrachromosomal element contributes hundreds of new genes to P. syringae and increases total genomic content by approximately 18%. However, this early work did not directly explore transmissibility, stability, or fitness costs associated with acquisition of pMPPla107.
Drosophila melanogaster, an ancestrally African species, has recently spread throughout the world, associated with human activity. The species has served as the focus of many studies investigating local adaptation relating to latitudinal variation in non-African populations, especially those from the United States and Australia. These studies have documented the existence of shared, genetically determined phenotypic clines for several life history and morphological traits. However, there are no studies designed to formally address the degree of shared latitudinal differentiation at the genomic level. Here we present our comparative analysis of such differentiation. Not surprisingly, we find evidence of substantial, shared selection responses on the two continents, probably resulting from selection on standing ancestral variation. The polymorphic inversion In(3R)P has an important effect on this pattern, but considerable parallelism is also observed across the genome in regions not associated with inversion polymorphism. Interestingly, parallel latitudinal differentiation is observed even for variants that are not particularly strongly differentiated, which suggests that very large numbers of polymorphisms are targets of spatially varying selection in this species.
As a species complex, Pseudomonas syringae exists in both agriculture and natural aquatic habitats. P. viridiflava, a member of this complex, has been reported to be phenotypically largely homogeneous. We characterized strains from different habitats, selected based on their genetic similarity to previously described P. viridiflava strains. We revealed two distinct phylogroups and two different kinds of variability in phenotypic traits and genomic content. The strains exhibited phase variation in phenotypes including pathogenicity and soft rot on potato. We showed that, in contrast to previous reports, the presence of two configurations of the Type III Secretion System [single (S-PAI) and tripartite (T-PAI) pathogenicity islands] is not correlated with pathogenicity or with the capacity to induce soft rot. The presence/absence of the avrE effector gene was the only trait we found to be correlated with pathogenicity of P. viridiflava. Other Type III secretion effector genes were not correlated with pathogenicity. A genomic region resembling an exchangeable effector locus (EEL) was found in S-PAI strains, and a probable recombination between the two PAIs is described. The ensemble of the variability observed in these phylogroups of P. syringae likely contributes to their adaptability to alternating opportunities for pathogenicity or saprophytic survival.
Here, we report the draft genome sequences for 7 phylogenetically diverse isolates of Pseudomonas syringae, obtained from numerous environmental sources and geographically proximate crop species. Overall, these sequences provide a wealth of information about the differences (or lack thereof) between isolates from disease outbreaks and those from other sources.
Comparative genomic analyses have revealed that genes may arise from ancestrally nongenic sequence. However, the origin and spread of these de novo genes within populations remain obscure. We identified 142 segregating and 106 fixed testis-expressed de novo genes in a population sample of Drosophila melanogaster. These genes appear to derive primarily from ancestral intergenic, unexpressed open reading frames, with natural selection playing a significant role in their spread. These results reveal a heretofore unappreciated dynamism of gene content.
Pseudomonas syringae is a phylogenetically diverse species of Gram-negative bacterial plant pathogens responsible for crop diseases around the world. The HrpL sigma factor drives expression of the major P. syringae virulence regulon. HrpL controls expression of the genes encoding the structural and functional components of the type III secretion system (T3SS) and the type three secreted effector proteins (T3E) that are collectively essential for virulence. HrpL also regulates expression of an under-explored suite of non-type III effector genes (non-T3E), including toxin production systems and operons not previously associated with virulence. We implemented and refined genome-wide transcriptional analysis methods using cDNA-derived high-throughput sequencing (RNA-seq) data to characterize the HrpL regulon from six isolates of P. syringae spanning the diversity of the species. Our transcriptomes, mapped onto both complete and draft genomes, significantly extend earlier studies. We confirmed HrpL-regulation for a majority of previously defined T3E genes in these six strains. We identified two new T3E families from P. syringae pv. oryzae 1_6, a strain within the relatively underexplored phylogenetic Multi-Locus Sequence Typing (MLST) group IV. The HrpL regulons varied among strains in gene number and content across both their T3E and non-T3E gene suites. Strains within MLST group II consistently express the lowest number of HrpL-regulated genes. We identified events leading to recruitment into, and loss from, the HrpL regulon. These included gene gain and loss, and loss of HrpL regulation caused by group-specific cis element mutations in otherwise conserved genes. Novel non-T3E HrpL-regulated genes include an operon that we show is required for full virulence of P. syringae pv. phaseolicola 1448A on French bean. 
We highlight the power of integrating genomic, transcriptomic, and phylogenetic information to drive concise functional experimentation and to derive better insight into the evolution of virulence across an evolutionarily diverse pathogen species.
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important.
Many insects feed on only one or a few types of host. These host specialists often evolve a preference for chemical cues emanating from their host and develop mechanisms for circumventing their hosts' defenses. Adaptations like these are central to evolutionary biology, yet our understanding of their genetics remains incomplete. Drosophila sechellia, an emerging model for the genetics of host specialization, is an island endemic that has adapted to chemical toxins present in the fruit of its host plant, Morinda citrifolia. Its sibling species, D. simulans, and many other Drosophila species do not tolerate these toxins and avoid the fruit. Earlier work found a region with a strong effect on tolerance to the major toxin, octanoic acid, on chromosome arm 3R. Using a novel assay, we narrowed this region to a small span near the centromere containing 18 genes, including three odorant binding proteins. It has been hypothesized that the evolution of host specialization is facilitated by genetic linkage between alleles contributing to host preference and alleles contributing to host usage, such as tolerance to secondary compounds. We tested this hypothesis by measuring the effect of this tolerance locus on host preference behavior. Our data were inconsistent with the linkage hypothesis, as flies bearing this tolerance region showed no increase in preference for media containing M. citrifolia toxins, which D. sechellia prefers. Thus, in contrast to some models for host preference, preference and tolerance are not tightly linked at this locus, nor is increased tolerance per se sufficient to change preference. Our data are consistent with the previously proposed model that the evolution of D. sechellia as a M. citrifolia specialist occurred through a stepwise loss of aversion and gain of tolerance to M. citrifolia's toxins.
All bacteria use the conserved Sec pathway to transport proteins across the cytoplasmic membrane, with the SecA ATPase playing a central role in the process. Mycobacteria are part of a small group of bacteria that have two SecA proteins: the canonical SecA (SecA1) and a second, specialized SecA (SecA2). The SecA2-dependent pathway exports a small subset of proteins and is required for Mycobacterium tuberculosis virulence. The mechanism by which SecA2 drives export of proteins across the cytoplasmic membrane remains poorly understood. Here we performed suppressor analysis on a dominant negative secA2 mutant (secA2 K129R) of the model mycobacterium Mycobacterium smegmatis to better understand the pathway used by SecA2 to export proteins. Two extragenic suppressor mutations were identified as mapping to the promoter region of secY, which encodes the central component of the canonical Sec export channel. These suppressor mutations increased secY expression, and this effect was sufficient to alleviate the secA2 K129R phenotype. We also discovered that the level of SecY protein was greatly diminished in the secA2 K129R mutant, but at least partially restored in the suppressors. Furthermore, the level of SecY in a suppressor strongly correlated with the degree of suppression. Our findings reveal a detrimental effect of SecA2 K129R on SecY, arguing for an integrated system in which SecA2 works with SecY and the canonical Sec translocase to export proteins.
We describe improvements for sequencing 16S ribosomal RNA (rRNA) amplicons, a cornerstone technique in metagenomics. Through unique tagging of template molecules before PCR, amplicon sequences can be mapped to their original templates to correct amplification bias and sequencing error with software we provide. PCR clamps block amplification of contaminating sequences from a eukaryotic host, thereby substantially enriching microbial sequences without introducing bias.
Purifying selection often results in conservation of gene sequence and function. The most functionally conserved genes are also thought to be among the most biologically essential. These observations have led to the use of sequence conservation as a proxy for functional conservation. Here we describe two genes that are exceptions to this pattern. We show that lack of sequence conservation among orthologs of CG15460 and CG15323, herein named jean-baptiste (jb) and karr, respectively, does not necessarily predict lack of functional conservation. These two Drosophila melanogaster genes are among the most rapidly evolving protein-coding genes in this species, being nearly as diverged from their D. yakuba orthologs as random sequences are. jb and karr are both expressed at an elevated level in larval males and adult testes, but they are not accessory gland proteins and their loss does not affect male fertility. Instead, knockdown of these genes in D. melanogaster via RNA interference caused male-biased viability defects. These viability effects occur prior to the third instar for jb and during late pupation for karr. We show that putative orthologs to jb and karr are also expressed strongly in the testes of other Drosophila species and have similar gene structure across species despite low levels of sequence conservation. While standard molecular evolution tests could not reject neutrality, other data hint at a role for natural selection. Together these data provide a clear case where a lack of sequence conservation does not imply a lack of conservation of expression or function.
The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation or transcript quantification of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells) but remains a challenging problem for several reasons. First, while various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it lacks a unique transcript solution even if the read maps uniquely to the reference genome. In this article, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to simultaneously learn bias parameters during transcript quantification to improve accuracy. Third, transcript quantification is often provided with a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By resolving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts while existing methods tend to assign positive expression to every candidate isoform. Using simulated RNA-seq datasets, our method demonstrated better quantification accuracy and better inference of the dominant set of transcripts than existing methods. The application of our method to real data experimentally demonstrated that transcript quantification is effective for differential analysis of transcriptomes.
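The idea of solving a linear quantification model with LASSO so that unexpressed candidate isoforms receive exactly zero abundance can be sketched on a toy problem. This is not the method of the abstract, which also models multi-junction reads and bias parameters. Here the rows of `X` are hypothetical exon and junction bins, the columns are three hypothetical candidate isoforms, the non-negativity constraint is an added assumption (abundances cannot be negative), and the solver is plain coordinate descent on 0.5*||y - Xb||^2 + lam*sum(b) with b >= 0.

```python
def nonneg_lasso(X, y, lam, sweeps=1000, tol=1e-10):
    """Coordinate descent for min 0.5*||y - Xb||^2 + lam*sum(b), subject to b >= 0."""
    n_rows, n_cols = len(X), len(X[0])
    b = [0.0] * n_cols
    residual = list(y)  # equals y - Xb; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue
            # Inner product of column j with the residual that excludes j's own fit
            rho = sum(X[i][j] * residual[i] for i in range(n_rows)) + col_sq[j] * b[j]
            new_bj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

# Rows: bins e1, e2, e3, j12, j23, j13. Columns: isoforms A (e1-e2-e3), B (e1-e3), C (e1-e2).
X = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
]
y = [15, 15, 10, 15, 10, 0]  # counts generated from abundances A=10, B=0, C=5
abundances = nonneg_lasso(X, y, lam=0.1)
print(abundances)
```

In this noise-free toy case the unexpressed isoform B stays at exactly zero, while the L1 penalty shrinks the estimate for C slightly below its generating value. The junction bin j13, which only isoform B can explain, is what makes the three isoforms distinguishable.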
Comparative genomics of closely related pathogens that differ in host range can provide insights into mechanisms of host-pathogen interactions and host adaptation. Furthermore, sequencing of multiple strains with the same host range reveals information concerning pathogen diversity and the molecular basis of virulence. Here we present a comparative analysis of draft genome sequences for four strains of Pseudomonas cannabina pathovar alisalensis (Pcal), which is pathogenic on a range of monocotyledonous and dicotyledonous plants. These draft genome sequences provide a foundation for understanding host range evolution across the monocot-dicot divide. Like other phytopathogenic pseudomonads, Pcal strains harboured a hrp/hrc gene cluster that codes for a type III secretion system. Phylogenetic analysis based on the hrp/hrc cluster genes/proteins, suggests localized recombination and functional divergence within the hrp/hrc cluster. Despite significant conservation of overall genetic content across Pcal genomes, comparison of type III effector repertoires reinforced previous molecular data suggesting the existence of two distinct lineages within this pathovar. Furthermore, all Pcal strains analyzed harbored two distinct genomic islands predicted to code for type VI secretion systems (T6SSs). While one of these systems was orthologous to known P. syringae T6SSs, the other more closely resembled a T6SS found within P. aeruginosa. In summary, our study provides a foundation to unravel Pcal adaptation to both monocot and dicot hosts and provides genetic insights into the mechanisms underlying pathogenicity.
The Toll-like receptors represent a largely evolutionarily conserved pathogen recognition machinery responsible for recognition of bacterial, fungal, protozoan, and viral pathogen-associated microbial patterns and initiation of inflammatory response. Structurally, the Toll-like receptors are comprised of an extracellular leucine-rich repeat domain and a cytoplasmic Toll/Interleukin 1 receptor domain. Recognition takes place in the extracellular domain, whereas the cytoplasmic domain triggers a complex signal network required to sustain appropriate immune response. Signal transduction is regulated by the recruitment of different intracellular adaptors. The Toll-like receptors can be grouped depending on the usage of the adaptor, MyD88, into MyD88-dependent and MyD88-independent subsets. Herein, we present a unique phylogenetic analysis of domain regions of these receptors and their cognate signaling adaptor molecules. Although previously unclear from the phylogeny of full-length receptors, these analyses indicate a separate evolutionary origin for the MyD88-dependent and MyD88-independent signaling pathway and provide evidence of a common ancestor for the vertebrate and invertebrate orthologs of the adaptor molecule MyD88. Together these observations suggest a very ancient origin of the MyD88-dependent pathway. Additionally, we show that early duplications gave rise to several adaptor molecule families. In some cases there is also a strong pattern of parallel duplication between adaptor molecules and their corresponding TLR. Our results further support the hypothesis that phylogeny of specific domains involved in signaling pathway can shed light on key processes that link innate to adaptive immune response.
Viruses can create complex genetic populations within a host, and deep sequencing technologies allow extensive sampling of these populations. Limitations of these technologies, however, potentially bias this sampling, particularly when a PCR step precedes the sequencing protocol. Typically, an unknown number of templates are used in initiating the PCR amplification, and this can lead to unrecognized sequence resampling creating apparent homogeneity; also, PCR-mediated recombination can disrupt linkage, and differential amplification can skew allele frequency. Finally, misincorporation of nucleotides during PCR and errors during the sequencing protocol can inflate diversity. We have solved these problems by including a random sequence tag in the initial primer such that each template receives a unique Primer ID. After sequencing, repeated identification of a Primer ID reveals sequence resampling. These resampled sequences are then used to create an accurate consensus sequence for each template, correcting for recombination, allelic skewing, and misincorporation/sequencing errors. The resulting population of consensus sequences directly represents the initial sampled templates. We applied this approach to the HIV-1 protease (pro) gene to view the distribution of sequence variation of a complex viral population within a host. We identified major and minor polymorphisms at coding and noncoding positions. In addition, we observed dynamic genetic changes within the population during intermittent drug exposure, including the emergence of multiple resistant alleles. These results provide an unprecedented view of a complex viral population in the absence of PCR resampling.
Finding the genes underlying complex traits is difficult. We show that new sequencing technology combined with traditional genetic techniques can efficiently identify genetic regions underlying a complex and quantitative behavioral trait. As a proof of concept we used phenotype-based introgression to backcross loci that control innate food preference in Drosophila simulans into the genomic background of D. sechellia, which expresses the opposite preference. We successfully mapped D. simulans introgression regions in a small mapping population (30 flies) with whole-genome resequencing using light coverage (~1×). We found six loci contributing to D. simulans food preference, one of which overlaps a previously discovered allele. This approach is applicable to many systems, does not rely on laborious marker development or genotyping, does not require existing high quality reference genomes, and needs only small mapping populations. Because introgression is used, researchers can scale mapping population size, replication, and number of backcross generations to their needs. Finally, in contrast to more widely used mapping techniques like F(2) bulk-segregant analysis, our method produces near-isogenic lines that can be kept and reused indefinitely.
In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq).
Morphogenesis is an important component of animal development. Genetic redundancy has been proposed to be common among morphogenesis genes, posing a challenge to the genetic dissection of morphogenesis mechanisms. Genetic redundancy is more generally a challenge in biology, as large proportions of the genes in diverse organisms have no apparent loss of function phenotypes. Here, we present a screen designed to uncover redundant and partially redundant genes that function in an example of morphogenesis, gastrulation in Caenorhabditis elegans. We performed an RNA interference (RNAi) enhancer screen in a gastrulation-sensitized double-mutant background, targeting genes likely to be expressed in gastrulating cells or their neighbors. Secondary screening identified 16 new genes whose functions contribute to normal gastrulation in a nonsensitized background. We observed that for most new genes found, the closest known homologs were multiple other C. elegans genes, suggesting that some may have derived from rounds of recent gene duplication events. We predict that such genes are more likely than single copy genes to comprise redundant or partially redundant gene families. We explored this prediction for one gene that we identified and confirmed that this gene and five close relatives, which encode predicted substrate recognition subunits (SRSs) for a CUL-2 ubiquitin ligase, do indeed function partially redundantly with each other in gastrulation. Our results implicate new genes in C. elegans gastrulation, and they show that an RNAi-based enhancer screen in C. elegans can be used as an efficient means to identify important but redundant or partially redundant developmental genes.
Closely related pathogens may differ dramatically in host range, but the molecular, genetic, and evolutionary basis for these differences remains unclear. In many Gram-negative bacteria, including the phytopathogen Pseudomonas syringae, type III effectors (TTEs) are essential for pathogenicity, instrumental in structuring host range, and exhibit wide diversity between strains. To capture the dynamic nature of virulence gene repertoires across P. syringae, we screened 11 diverse strains for novel TTE families and coupled this nearly saturating screen with the sequencing and assembly of 14 phylogenetically diverse isolates from a broad collection of diseased host plants. TTE repertoires vary dramatically in size and content across all P. syringae clades; surprisingly few TTEs are conserved and present in all strains. Those that are likely provide basal requirements for pathogenicity. We demonstrate that functional divergence within one conserved locus, hopM1, leads to dramatic differences in pathogenicity, and we demonstrate that phylogenetics-informed mutagenesis can be used to identify functionally critical residues of TTEs. The dynamism of the TTE repertoire is mirrored by diversity in pathways affecting the synthesis of secreted phytotoxins, highlighting the likely role of both types of virulence factors in determination of host range. We used these 14 draft genome sequences, plus five additional genome sequences previously reported, to identify the core genome for P. syringae and we compared this core to that of two closely related non-pathogenic pseudomonad species. These data revealed the recent acquisition of a 1 Mb megaplasmid by a sub-clade of cucumber pathogens. This megaplasmid encodes a type IV secretion system and a diverse set of unknown proteins, which dramatically increases both the genomic content of these strains and the pan-genome of the species.
HIV-1 is present in anatomical compartments and bodily fluids. Most transmissions occur through sexual acts, making virus in semen the proximal source in male donors. We find three distinct relationships in comparing viral RNA populations between blood and semen in men with chronic HIV-1 infection, and we propose that the viral populations in semen arise by multiple mechanisms including: direct import of virus, oligoclonal amplification within the seminal tract, or compartmentalization. In addition, we find significant enrichment of six out of nineteen cytokines and chemokines in semen of both HIV-infected and uninfected men, and another seven further enriched in infected individuals. The enrichment of cytokines involved in innate immunity in the seminal tract, complemented with chemokines in infected men, creates an environment conducive to T cell activation and viral replication. These studies define different relationships between virus in blood and semen that can significantly alter the composition of the viral population at the source that is most proximal to the transmitted virus.
Heterotrimeric G proteins act as the physical nexus between numerous receptors that respond to extracellular signals and proteins that drive the cytoplasmic response. The Gα subunit of the G protein, in particular, is highly constrained due to its many interactions with proteins that control or react to its conformational state. Various organisms contain differing sets of Gα-interacting proteins, clearly indicating that shifts in sequence and associated Gα functionality were acquired over time. These numerous interactions constrained much of Gα evolution; yet Gα has diversified, through poorly understood processes, into several functionally specialized classes, each with a unique set of interacting proteins. Applying a synthetic sequence-based approach to mammalian Gα subunits, we established a set of seventy-five evolutionarily important class-distinctive residues, sites where a single Gα class is differentiated from the three other classes. We tested the hypothesis that shifts at these sites are important for class-specific functionality. Importantly, we mapped known and well-studied class-specific functionalities from all four mammalian classes to sixteen of our class-distinctive sites, validating the hypothesis. Our results show how unique functionality can evolve through the recruitment of residues that were ancestrally functional. We also studied acquisition of functionalities by following these evolutionarily important sites in non-mammalian organisms. Our results suggest that many class-distinctive sites were established early on in eukaryotic diversification and were critical for the establishment of new Gα classes, whereas others arose in punctuated bursts throughout metazoan evolution. These Gα class-distinctive residues are rational targets for future structural and functional studies.
Changes in host specialization contribute to the diversification of phytophagous insects. When shifting to a new host, insects evolve new physiological, morphological, and behavioral adaptations. Our understanding of the genetic changes responsible for these adaptations is limited. For instance, we do not know how often host shifts involve gain-of-function vs. loss-of-function alleles. Recent work suggests that some genes involved in odor recognition are lost in specialists. Here we show that genes involved in detoxification and metabolism, as well as those affecting olfaction, have reduced gene expression in Drosophila sechellia, a specialist on the fruit of Morinda citrifolia. We screened for genes that differ in expression between D. sechellia and its generalist sister species, D. simulans. We also screened for genes that are differentially expressed in D. sechellia when these flies chose their preferred host vs. when they were forced onto other food. D. sechellia increases expression of genes involved with oogenesis and fatty acid metabolism when on its host. The majority of differentially expressed genes, however, appear downregulated in D. sechellia. For several functionally related genes, this decrease in expression is associated with apparent loss-of-function alleles. For example, the D. sechellia allele of Odorant binding protein 56e (Obp56e) harbors a premature stop codon. We show that knockdown of Obp56e activity significantly reduces the avoidance response of D. melanogaster toward M. citrifolia. We argue that apparent loss-of-function alleles like Obp56e potentially contributed to the initial adaptation of D. sechellia to its host. Our results suggest that a subset of genes reduce or lose function as a consequence of host specialization, which may explain why, in general, specialist insects tend to shift to chemically similar hosts.
High-throughput techniques for detecting DNA polymorphisms generally do not identify changes in which the genomic position of a sequence, but not its copy number, varies among individuals. To explore such balanced structural polymorphisms, we used array-based Comparative Genomic Hybridization (aCGH) to conduct a genome-wide screen for single-copy genomic segments that occupy different genomic positions in the standard laboratory strain of Saccharomyces cerevisiae (S90) and a polymorphic wild isolate (Y101) through analysis of six tetrads from a cross of these two strains. Paired-end high-throughput sequencing of Y101 validated four of the predicted rearrangements. The transposed segments contained one to four annotated genes each, yet crosses between S90 and Y101 yielded mostly viable tetrads. The longest segment comprised 13.5 kb near the telomere of chromosome XV in the S288C reference strain and Southern blotting confirmed its predicted location on chromosome IX in Y101. Interestingly, inter-locus crossover events between copies of this segment occurred at a detectable rate. The presence of low-copy repetitive sequences at the junctions of this segment suggests that it may have arisen through ectopic recombination. Our methodology and findings provide a starting point for exploring the origins, phenotypic consequences, and evolutionary fate of this largely unexplored form of genomic polymorphism.
Gbeta subunits from heterotrimeric G-proteins (guanine nucleotide-binding proteins) directly bind diverse proteins, including effectors and regulators, to modulate a wide array of signaling cascades. These numerous interactions constrained the evolution of the molecular surface of Gbeta. Although mammals contain five Gbeta genes comprising two classes (Gbeta1-like and Gbeta5-like), plants and fungi have a single ortholog, and organisms such as Caenorhabditis elegans and Drosophila melanogaster contain one copy from each class. A limited number of crystal structures of complexes containing Gbeta subunits and complementary biochemical data highlight specific sites within Gbetas needed for protein interactions. It is difficult to determine from these interaction sites what, if any, additional regions of the Gbeta molecular surface comprise interaction interfaces essential to Gbeta's role as a nexus in numerous signaling cascades. We used a comparative evolutionary approach to identify five known and eight previously unknown putative interfaces on the surface of Gbeta. We show that one such novel interface occurs between Gbeta and phospholipase C beta2 (PLC-beta2), a mammalian Gbeta-interacting protein. Substitutions of residues within this Gbeta-PLC-beta2 interface reduce the activation of PLC-beta2 by Gbeta1, confirming that our de novo comparative evolutionary approach predicts previously unknown Gbeta-protein interfaces. Similarly, we hypothesize that the seven remaining untested novel regions contribute to putative interfaces for other Gbeta-interacting proteins. Finally, this comparative evolutionary approach is suitable for application to any protein involved in a significant number of protein-protein interactions.
We developed a novel approach for de novo genome assembly using only sequence data from high-throughput short read sequencing technologies. By combining data generated from 454 Life Sciences (Roche) and Illumina (formerly known as Solexa sequencing) sequencing platforms, we reliably assembled genomes into large scaffolds at a fraction of the traditional cost and without use of a reference sequence. We applied this method to two isolates of the phytopathogenic bacterium Pseudomonas syringae. Sequencing and reassembly of the well-studied tomato and Arabidopsis pathogen, Pto(DC3000), facilitated development and testing of our method. Sequencing of a distantly related rice pathogen, Por(1_6), demonstrated our method's efficacy for de novo assembly of novel genomes. Our assembly of Por(1_6) yielded an N50 scaffold size of 531,821 bp with >75% of the predicted genome covered by scaffolds over 100,000 bp. One of the critical phenotypic differences between strains of P. syringae is the range of plant hosts they infect. This is largely determined by their complement of type III effector proteins. The genome of Por(1_6) is the first sequenced for a P. syringae isolate that is a pathogen of monocots, and, as might be predicted, its complement of type III effectors differs substantially from the previously sequenced isolates of this species. The genome of Por(1_6) helps to define an expansion of the P. syringae pan-genome, a corresponding contraction of the core genome, and a further diversification of the type III effector complement for this important plant pathogen species.
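For readers unfamiliar with the N50 statistic quoted in the abstract above, the following is a minimal sketch of how it is commonly computed: sort scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function names and the toy scaffold lengths are illustrative assumptions, not taken from the paper, and the authors' own tooling may differ in detail.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


def fraction_in_large_scaffolds(lengths, genome_size, min_length=100_000):
    """Fraction of a (predicted) genome size covered by scaffolds > min_length."""
    return sum(l for l in lengths if l > min_length) / genome_size


# Hypothetical toy assembly; these lengths are made up for illustration.
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))
```

In the toy example the total is 290 bp, so the running sum passes the halfway mark (145 bp) at the second-longest scaffold.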
Generation of meiotic crossovers in many eukaryotes requires the elimination of anti-crossover activities by using the Msh4-Msh5 heterodimer to block helicases. Msh4 and Msh5 have been lost from the flies Drosophila and Glossina, but we identified a complex of minichromosome maintenance (MCM) proteins that functionally replace Msh4-Msh5. We found that REC, an ortholog of MCM8 that evolved under strong positive selection in flies, interacts with MEI-217 and MEI-218, which arose from a previously undescribed metazoan-specific MCM protein. Meiotic crossovers were reduced in Drosophila rec, mei-217, and mei-218 mutants; however, removal of the Bloom syndrome helicase (BLM) ortholog restored crossovers. Thus, MCMs were co-opted into a novel complex that replaced the meiotic pro-crossover function of Msh4-Msh5 in flies.
The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. 
Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied to experimental biological data sets. The DiffSplice software is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice.
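The abstract above mentions testing each ASM while controlling the false discovery rate, but does not say which procedure is used. As a generic illustration only, here is a minimal sketch of the standard Benjamini-Hochberg step-up procedure applied to a list of per-module p-values. This is a swapped-in textbook technique, not necessarily what DiffSplice implements, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    at false discovery rate `alpha` under the Benjamini-Hochberg procedure.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per alternative splicing module (assumption).
asm_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(asm_p_values, alpha=0.05))
```

With these made-up values only the two smallest p-values fall under their rank-scaled thresholds, so only the first two modules would be called significant.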
Two congeneric species of spadefoot toad, Spea multiplicata and Spea bombifrons, have been the focus of hybridization studies since the 1970s. Because complex hybrids are not readily distinguished phenotypically, genetic markers are needed to identify introgressed individuals. We therefore developed a set of molecular markers (amplified fragment length polymorphism, polymerase chain reaction-restriction fragment length polymorphism and single nucleotide polymorphism) for identifying pure-species, F1 hybrids and more complex introgressed types. To do so, we tested a series of markers across both species and known hybrids using populations in both allopatry and sympatry. We retained those markers that differentiated the two pure-species and also consistently identified known species hybrids. These markers are well suited for identifying hybrids between these species. Moreover, those markers that show variation within each species can be used in conjunction with existing molecular markers in studies of population structure and gene flow.
Phenotypic plasticity--the capacity of a single genotype to produce different phenotypes in response to varying environmental conditions--is widespread. Yet, whether, and how, plasticity impacts evolutionary diversification is unclear. According to a widely discussed hypothesis, plasticity promotes rapid evolution because genes expressed differentially across different environments (i.e., genes with "biased" expression) experience relaxed genetic constraint and thereby accumulate variation faster than do genes with unbiased expression. Indeed, empirical studies confirm that biased genes evolve faster than unbiased genes in the same genome. An alternative hypothesis holds, however, that the relaxed constraint and faster evolutionary rates of biased genes may be a precondition for, rather than a consequence of, plasticity's evolution. Here, we evaluated these alternative hypotheses by characterizing evolutionary rates of biased and unbiased genes in two species of frogs that exhibit a striking form of phenotypic plasticity. We also characterized orthologs of these genes in four species of frogs that had diverged from the two plastic species before the plasticity evolved. We found that the faster evolutionary rates of biased genes predated the evolution of the plasticity. Furthermore, biased genes showed greater expression variance than did unbiased genes, suggesting that they may be more dispensable. Phenotypic plasticity may therefore evolve when dispensable genes are co-opted for novel function in environmentally induced phenotypes. Thus, relaxed genetic constraint may be a cause--not a consequence--of the evolution of phenotypic plasticity, and thereby contribute to the evolution of novel traits.