The diverse Fusobacterium genus contains species implicated in multiple clinical pathologies, including periodontal disease, preterm birth, and colorectal cancer. The lack of genetic tools for manipulating these organisms leaves us with little understanding of the genes responsible for adherence to and invasion of host cells. Actively invading Fusobacterium species can enter host cells independently, whereas passively invading species need additional factors, such as compromise of mucosal integrity or coinfection with other microbes. We applied whole-genome sequencing and comparative analysis to study the evolution of active and passive invasion strategies and to infer factors associated with active forms of host cell invasion. The evolution of active invasion appears to have followed an adaptive radiation in which two of the three fusobacterial lineages acquired new genes and underwent expansions of ancestral genes that enable active forms of host cell invasion. Compared to passive invaders, active invaders have much larger genomes, encode FadA-related adhesins, and possess twice as many genes encoding membrane-related proteins, including a large expansion of surface-associated proteins containing the MORN2 domain of unknown function. We predict a role for proteins containing MORN2 domains in adhesion and active invasion. In the largest and most comprehensive comparison of sequenced Fusobacterium species to date, we have generated a testable model for the molecular pathogenesis of Fusobacterium infection and illuminate new therapeutic or diagnostic strategies.
In its largest outbreak, Ebola virus disease is spreading through Guinea, Liberia, Sierra Leone, and Nigeria. We sequenced 99 Ebola virus genomes from 78 patients in Sierra Leone to ~2000× coverage. We observed a rapid accumulation of interhost and intrahost genetic variation, allowing us to characterize patterns of viral transmission over the initial weeks of the epidemic. This West African variant likely diverged from central African lineages around 2004, crossed from Guinea to Sierra Leone in May 2014, and has exhibited sustained human-to-human transmission subsequently, with no evidence of additional zoonotic sources. Because many of the mutations alter protein sequences and other biologically meaningful targets, they should be monitored for impact on diagnostics, vaccines, and therapies critical to outbreak response.
We have developed a robust RNA sequencing method for generating complete de novo assemblies with intra-host variant calls of Lassa and Ebola virus genomes in clinical and biological samples. Our method uses targeted RNase H-based digestion to remove contaminating poly(rA) carrier and ribosomal RNA. This depletion step improves both the quality of data and quantity of informative reads in unbiased total RNA sequencing libraries. We have also developed a hybrid-selection protocol to further enrich the viral content of sequencing libraries. These protocols have enabled rapid deep sequencing of both Lassa and Ebola virus and are broadly applicable to other viral genomics studies.
Sporothrix schenckii is a pathogenic dimorphic fungus that grows as a yeast and as mycelia. This species is the causative agent of sporotrichosis, typically a skin infection. We report the genome sequence of S. schenckii, which will facilitate the study of this fungus and of the Sporothrix schenckii group.
The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.
Although biosynthetic gene clusters (BGCs) have been discovered for hundreds of bacterial metabolites, our knowledge of their diversity remains limited. Here, we used a novel algorithm to systematically identify BGCs in the extensive extant microbial sequencing data. Network analysis of the predicted BGCs revealed large gene cluster families, the vast majority uncharacterized. We experimentally characterized the most prominent family, consisting of two subfamilies of hundreds of BGCs distributed throughout the Proteobacteria; their products are aryl polyenes, lipids with an aryl head group conjugated to a polyene tail. We identified a distant relationship to a third subfamily of aryl polyene BGCs, and together the three subfamilies represent the largest known family of biosynthetic gene clusters, with more than 1,000 members. Although these clusters are widely divergent in sequence, their small molecule products are remarkably conserved, indicating for the first time the important roles these compounds play in Gram-negative cell biology.
A Winogradsky column is a clear glass or plastic column filled with enriched sediment. Over time, microbial communities in the sediment grow in a stratified ecosystem with an oxic top layer and anoxic sub-surface layers. Winogradsky columns have been used extensively to demonstrate microbial nutrient cycling and metabolic diversity in undergraduate microbiology labs. In this study, we used high-throughput 16S rRNA gene sequencing to investigate the microbial diversity of Winogradsky columns. Specifically, we tested the impact of sediment source, supplemental cellulose source, and depth within the column on microbial community structure. We found that the Winogradsky columns were highly diverse communities but were dominated by three phyla: Proteobacteria, Bacteroidetes, and Firmicutes. The community is structured by a founding population dependent on the source of sediment used to prepare the columns and is differentiated by depth within the column. Numerous biomarkers were identified distinguishing sample depth, including Cyanobacteria, Alphaproteobacteria, and Betaproteobacteria as biomarkers of the soil-water interface, and Clostridia as a biomarker of the deepest depth. Supplemental cellulose source impacted community structure but less strongly than depth and sediment source. In columns dominated by Firmicutes, the family Peptococcaceae was the most abundant sulfate reducer, while in columns abundant in Proteobacteria, several Deltaproteobacteria families, including Desulfobacteraceae, were found, showing that different taxonomic groups carry out sulfur cycling in different columns. This study brings this historical method for enrichment culture of chemolithotrophs and other soil bacteria into the modern era of microbiology and demonstrates the potential of the Winogradsky column as a model system for investigating the effect of environmental variables on soil microbial communities.
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
Background. Methicillin-resistant Staphylococcus aureus (MRSA) colonization predicts later infection, with both host and pathogen determinants of invasive disease. Methods. This nested case-control study evaluates predictors of MRSA bacteremia in an 8-intensive care unit (ICU) prospective adult cohort from 1 September 2003 through 30 April 2005 with active MRSA surveillance and collection of ICU, post-ICU, and readmission MRSA isolates. We selected MRSA carriers who did (cases) and those who did not (controls) develop MRSA bacteremia. Generating assembled genome sequences, we evaluated 30 MRSA genes potentially associated with virulence and invasion. Using multivariable Cox proportional hazards regression, we assessed the association of these genes with MRSA bacteremia, controlling for host risk factors. Results. We collected 1578 MRSA isolates from 520 patients. We analyzed host and pathogen factors for 33 cases and 121 controls. Predictors of MRSA bacteremia included a diagnosis of cancer, presence of a central venous catheter, hyperglycemia (glucose level, >200 mg/dL), and infection with a MRSA strain carrying the gene for staphylococcal enterotoxin P (sep). Receipt of an anti-MRSA medication had a significant protective effect. Conclusions. In an analysis controlling for host factors, colonization with MRSA carrying sep increased the risk of MRSA bacteremia. Identification of risk-adjusted genetic determinants of virulence may help to improve prediction of invasive disease and suggest new targets for therapeutic intervention.
Enterococcus faecium, natively a gut commensal organism, emerged as a leading cause of multidrug-resistant hospital-acquired infection in the 1980s. As the living record of its adaptation to changes in habitat, we sequenced the genomes of 51 strains, isolated from various ecological environments, to understand how E. faecium emerged as a leading hospital pathogen. Because of the scale and diversity of the sampled strains, we were able to resolve the lineage responsible for epidemic, multidrug-resistant human infection from other strains and to measure the evolutionary distances between groups. We found that the epidemic hospital-adapted lineage is rapidly evolving and emerged approximately 75 years ago, concomitant with the introduction of antibiotics, from a population that included the majority of animal strains, and not from human commensal lines. We further found that the lineage that included most strains of animal origin diverged from the main human commensal line approximately 3,000 years ago, a time that corresponds to increasing urbanization of humans, development of hygienic practices, and domestication of animals, which we speculate contributed to their ecological separation. Each bifurcation was accompanied by the acquisition of new metabolic capabilities and colonization traits on mobile elements and the loss of function and genome remodeling associated with mobile element insertion and movement. As a result, diversity within the species, in terms of sequence divergence as well as gene content, spans a range usually associated with speciation.
Mycobacterium tuberculosis (M.tb), the cause of tuberculosis (TB), is estimated to infect a new host every second. While analyses of genetic data from natural populations of M.tb have emphasized the role of genetic drift in shaping patterns of diversity, the influence of natural selection on this successful pathogen is less well understood. We investigated the effects of natural selection on patterns of diversity in 63 globally extant genomes of M.tb and related pathogenic mycobacteria. We found evidence of strong purifying selection, with an estimated genome-wide selection coefficient equal to -9.5 × 10^-4 (95% CI -1.1 × 10^-3 to -6.8 × 10^-4); this is several orders of magnitude higher than recent estimates for eukaryotic and prokaryotic organisms. We also identified different patterns of variation across categories of gene function. Genes involved in transport and metabolism of inorganic ions exhibited very low levels of non-synonymous polymorphism, equivalent to categories under strong purifying selection (essential and translation-associated genes). The highest levels of non-synonymous variation were seen in a group of transporter genes, likely due to either diversifying selection or local selective sweeps. In addition to selection, we identified other important influences on M.tb genetic diversity, such as a 25-fold expansion of global M.tb populations coincident with explosive growth in human populations (estimated timing 1684 C.E., 95% CI 1620-1713 C.E.). These results emphasize the parallel demographic histories of this obligate pathogen and its human host, and suggest that the dominant effect of selection on M.tb is removal of novel variants, with exceptions in an interesting group of genes involved in transportation and defense. We speculate that the hostile environment within a host imposes strict demands on M.tb physiology, and thus a substantial fitness cost for most new mutations.
In this respect, obligate bacterial pathogens may differ from other host-associated microbes such as symbionts.
Glycoside hydrolases (GHs), the enzymes that break down complex carbohydrates, are a highly diversified class of key enzymes associated with the gut microbiota and its metabolic functions. To learn more about the diversity of GHs and their potential role in a variety of gut microbiomes, we used a combination of 16S, metagenomic and targeted amplicon sequencing data to study one of these enzyme families in detail. Specifically, we applied a functional gene-targeted metagenomic approach to the 1,4-α-glucan-branching enzyme (gBE) gene in the gut microbiomes of four host species (human, chicken, cow and pig). The characteristics of operational taxonomic units (OTUs) and operational glucan-branching units (OGBUs) were distinctive in each of the hosts. Human and pig were most similar in OTU profiles while maintaining distinct OGBU profiles. Interestingly, the phylogenetic profiles identified from 16S and gBE gene sequences differed, suggesting the presence of different gBE genes in the same OTU across different vertebrate hosts. Our data suggest that gene-targeted metagenomic analysis is useful for an in-depth understanding of the diversity of a particular gene of interest. Specific carbohydrate metabolic genes appear to be carried by distinct OTUs in different individual hosts and among different vertebrate species microbiomes, the characteristics of which differ according to host genetic background and/or diet. The ISME Journal advance online publication, 10 October 2013; doi:10.1038/ismej.2013.167.
Loa loa, the African eyeworm, is a major filarial pathogen of humans. Unlike most filariae, L. loa does not contain the obligate intracellular Wolbachia endosymbiont. We describe the 91.4-Mb genome of L. loa and that of the related filarial parasite Wuchereria bancrofti and predict 14,907 L. loa genes on the basis of microfilarial RNA sequencing. By comparing these genomes to that of another filarial parasite, Brugia malayi, and to those of several other nematodes, we demonstrate synteny among filariae but not with nonparasitic nematodes. The L. loa genome encodes many immunologically relevant genes, as well as protein kinases targeted by drugs currently approved for use in humans. Despite lacking Wolbachia, L. loa shows no new metabolic synthesis or transport capabilities compared to other filariae. These results suggest that the role of Wolbachia in filarial biology is more subtle than previously thought and reveal marked differences between parasitic and nonparasitic nematodes.
M. tuberculosis is evolving antibiotic resistance, threatening attempts at tuberculosis epidemic control. Mechanisms of resistance, including genetic changes favored by selection in resistant isolates, are incompletely understood. Using 116 newly sequenced and 7 previously sequenced M. tuberculosis whole genomes, we identified genome-wide signatures of positive selection specific to the 47 drug-resistant strains. By searching for convergent evolution--the independent fixation of mutations in the same nucleotide position or gene--we recovered 100% of a set of known resistance markers. We also found evidence of positive selection in an additional 39 genomic regions in resistant isolates. These regions encode components in cell wall biosynthesis, transcriptional regulation and DNA repair pathways. Mutations in these regions could directly confer resistance or compensate for fitness costs associated with resistance. Functional genetic analysis of mutations in one gene, ponA1, demonstrated an in vitro growth advantage in the presence of the drug rifampicin.
Listeria monocytogenes, a foodborne bacterial pathogen, comprises four phylogenetic lineages that vary with regard to their serotypes and distribution among sources. In order to characterize lineage-specific genomic diversity within L. monocytogenes, we sequenced the genomes of eight strains from several lineages and serotypes, and characterized the accessory genome, which was hypothesized to contribute to phenotypic differences across lineages. The eight L. monocytogenes genomes sequenced range in size from 2.85-3.14 Mb, encode 2,822-3,187 genes, and include the first publicly available sequenced representatives of serotypes 1/2c, 3a and 4c. Mapping of the distribution of accessory genes revealed two distinct regions of the L. monocytogenes chromosome: an accessory-rich region in the first 65° adjacent to the origin of replication and a more stable region in the remaining 295°. This pattern of genome organization is distinct from that of related bacteria Staphylococcus aureus and Bacillus cereus. The accessory genome of all lineages is enriched for cell surface-related genes, phosphotransferase systems, and transcriptional regulators, highlighting the selective pressures faced by contemporary strains from their hosts, other microbes, and their environment. Phylogenetic analysis of O-antigen genes and gene clusters predicts that serotype 4 was ancestral in L. monocytogenes and serotype 1/2 associated gene clusters were putatively introduced through horizontal gene transfer in the ancestral population of L. monocytogenes lineage I and II.
Segniliparus rugosus represents one of two species in the genus Segniliparus, the sole genus in the family Segniliparaceae. A unique and interesting feature of this family is the presence of extremely long carbon-chain length mycolic acids bound in the cell wall. S. rugosus is also a medically important species because it is an opportunistic pathogen associated with mammalian lung disease. This report represents the second species in the genus to have its genome sequenced. The 3,567,567 bp long genome with 3,516 protein-coding and 49 RNA genes is part of the NIH Roadmap for Medical Research, Human Microbiome Project.
The rapid spread of dengue is a worldwide public health problem. In two clinical studies of dengue in Managua, Nicaragua, we observed an abrupt increase in disease severity across several epidemic seasons of dengue virus serotype 2 (DENV-2) transmission. Waning DENV-1 immunity appeared to increase the risk of severe disease in subsequent DENV-2 infections after a period of cross-protection. The increase in severity coincided with replacement of the Asian/American DENV-2 NI-1 clade with a new virus clade, NI-2B. In vitro analyses of viral isolates from the two clades and analysis of viremia in patient blood samples support the emergence of a fitter virus in later, relative to earlier, epidemic seasons. In addition, the NI-1 clade of viruses was more virulent specifically in children who were immune to DENV-1, whereas DENV-3 immunity was associated with more severe disease among NI-2B infections. Our data demonstrate that the complex interaction between viral genetics and population dynamics of serotype-specific immunity contributes to the risk of severe dengue disease. Furthermore, this work provides insights into viral evolution and the interaction between viral and immunological determinants of viral fitness and virulence.
Little is known about the rate at which genetic variation is generated within intrahost populations of dengue virus (DENV) and what implications this diversity has for dengue pathogenesis, disease severity, and host immunity. Previous studies of intrahost DENV variation have used a low frequency of sampling and/or experimental methods that do not fully account for errors generated through amplification and sequencing of viral RNAs. We investigated the extent and pattern of genetic diversity in sequence data in domain III (DIII) of the envelope (E) gene in serial plasma samples (n = 49) taken from 17 patients infected with DENV type 1 (DENV-1), totaling some 8,458 clones. Statistically rigorous approaches were employed to account for artifactual variants resulting from amplification and sequencing, which we suggest have played a major role in previous studies of intrahost genetic variation. Accordingly, nucleotide sequence diversities of viral populations were very low, with conservative estimates of the average levels of genetic diversity ranging from 0 to 0.0013. Despite such sequence conservation, we observed clear evidence for mixed infection, with multiple phylogenetically distinct lineages present within the same host, while the presence of stop codon mutations in some samples suggests the action of complementation. In contrast to some previous studies, we observed no relationship between the extent and pattern of DENV-1 genetic diversity and disease severity, immune status, or level of viremia.
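To make the "average level of genetic diversity" figure concrete, the following is a minimal sketch of one common estimator: the mean number of pairwise differences per site among aligned clones. The abstract does not state which estimator or error model the authors used, so the function name `pairwise_diversity` and the toy clone sequences are illustrative assumptions, and no correction for amplification or sequencing error is attempted here.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean pairwise differences per site among equal-length aligned sequences.

    Hypothetical helper for illustration only; not the method used in the study.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")

    total_differences = 0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        # Count mismatching positions between this pair of clones.
        total_differences += sum(x != y for x, y in zip(a, b))
        n_pairs += 1
    return total_differences / (n_pairs * length)


# Toy data (invented): four 10-nt clones, one carrying a single substitution.
clones = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
diversity = pairwise_diversity(clones)
print(diversity)
```

With these invented clones, three of the six pairs differ at one of ten sites, so the estimate is 3 / (6 × 10) = 0.05. Real intrahost data would need the artifact filtering the abstract describes before such a number is meaningful.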
The tumor microenvironment of colorectal carcinoma is a complex community of genomically altered cancer cells, nonneoplastic cells, and a diverse collection of microorganisms. Each of these components may contribute to carcinogenesis; however, the role of the microbiota is the least well understood. We have characterized the composition of the microbiota in colorectal carcinoma using whole genome sequences from nine tumor/normal pairs. Fusobacterium sequences were enriched in carcinomas, confirmed by quantitative PCR and 16S rDNA sequence analysis of 95 carcinoma/normal DNA pairs, while the Bacteroidetes and Firmicutes phyla were depleted in tumors. Fusobacteria were also visualized within colorectal tumors using FISH. These findings reveal alterations in the colorectal cancer microbiota; however, the precise role of Fusobacteria in colorectal carcinoma pathogenesis requires further investigation.
West Nile virus (WNV) has become firmly established in the northeastern US, reemerging every summer since its introduction into North America in 1999. To determine whether WNV overwinters locally or is reseeded annually, we examined the patterns of viral lineage persistence and replacement in Connecticut over 10 consecutive transmission seasons by phylogenetic analysis. In addition, we compared the full protein coding sequence among WNV isolates to search for evidence of convergent and adaptive evolution. Viruses sampled from Connecticut segregated into a number of well-supported subclades by year of isolation with few clades persisting ≥2 years. Similar viral strains were dispersed in different locations across the state and divergent strains appeared within a single location during a single transmission season, implying widespread movement and rapid colonization of virus. Numerous amino acid substitutions arose in the population but only one change, V→A at position 159 of the envelope protein, became permanently fixed. Several instances of parallel evolution were identified in independent lineages, including one amino acid change in the NS4A protein that appears to be positively selected. Our results suggest that annual reemergence of WNV is driven by both reintroduction and local-overwintering of virus. Despite ongoing evolution of WNV, most amino acid variants occurred at low frequencies and were transient in the virus population.
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The environmental packages apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.
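As a worked illustration of what a fold-enrichment figure means, the sketch below compares the pathogen fraction of sequencing reads before and after hybrid selection. This is one plausible way to define the metric, assumed here for illustration; the abstract does not specify how enrichment was measured, and the function name and read counts are invented.

```python
def fold_enrichment(pathogen_before, total_before, pathogen_after, total_after):
    """Ratio of the pathogen read fraction after selection to the fraction before.

    Hypothetical helper; the definition of enrichment is an assumption.
    """
    if total_before <= 0 or total_after <= 0:
        raise ValueError("total read counts must be positive")
    if pathogen_before <= 0:
        raise ValueError("pathogen reads before selection must be positive")
    fraction_before = pathogen_before / total_before
    fraction_after = pathogen_after / total_after
    return fraction_after / fraction_before


# Invented counts: 1% parasite reads before selection, 40% after.
enrichment = fold_enrichment(
    pathogen_before=10_000, total_before=1_000_000,
    pathogen_after=400_000, total_after=1_000_000,
)
print(round(enrichment, 2))
```

Because the ratio is taken between fractions rather than raw counts, libraries sequenced to different depths before and after selection can still be compared.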
Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
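The de Bruijn graph idea mentioned above can be sketched in a few lines. This is a generic toy illustration of building such a graph from reads and greedily walking it, not Trinity's actual algorithm or code; the function names, the value of k, and the reads are all assumptions made for the example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with k-mer support counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_greedily(graph, seed):
    """Walk from seed along the best-supported edge until a dead end or a repeat."""
    contig = seed
    node = seed
    visited = {seed}
    while graph.get(node):
        successor = max(graph[node], key=graph[node].get)
        if successor in visited:
            break  # stop rather than loop forever on a cycle
        contig += successor[-1]
        visited.add(successor)
        node = successor
    return contig


# Toy overlapping reads (invented) drawn from one short transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = extend_greedily(graph, "ATG")
print(contig)
```

A greedy walk like this returns a single path; resolving alternatively spliced isoforms or recently duplicated genes, as the abstract describes, requires enumerating and scoring multiple paths through the graph, which this sketch does not attempt.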
The fission yeast clade--comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus--occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasiliensis (Pb03 and Pb18) and one strain of Paracoccidioides lutzii (Pb01). These genomes range in size from 29.1 Mb to 32.9 Mb and encode 7,610 to 8,130 genes. To enable genetic studies, we mapped 94% of the P. brasiliensis Pb18 assembly onto five chromosomes. We characterized gene family content across Onygenales and related fungi, and within Paracoccidioides we found expansions of the fungal-specific kinase family FunK1. Additionally, the Onygenales have lost many genes involved in carbohydrate metabolism and fewer genes involved in protein metabolism, resulting in a higher ratio of proteases to carbohydrate active enzymes in the Onygenales than their relatives. To determine if gene content correlated with growth on different substrates, we screened the non-pathogenic onygenale Uncinocarpus reesii, which has orthologs for 91% of Paracoccidioides metabolic genes, for growth on 190 carbon sources. U. reesii showed growth on a limited range of carbohydrates, primarily basic plant sugars and cell wall components; this suggests that Onygenales, including dimorphic fungi, can degrade cellulosic plant material in the soil. In addition, U. reesii grew on gelatin and a wide range of dipeptides and amino acids, indicating a preference for proteinaceous growth substrates over carbohydrates, which may enable these fungi to also degrade animal biomass. 
These capabilities for degrading plant and animal substrates suggest a duality in lifestyle that could enable pathogenic species of Onygenales to transfer from soil to animal hosts.
The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (~1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite genome.
In Cambodia, dengue virus (DENV) was first isolated in 1963 and has become endemic, with epidemic peaks during the rainy season. Since 2000, the Dengue National Control Program has reported from 10,000 to 40,000 cases per year with fatality rates ranging from 0.7 to 1.7. All four dengue serotypes are found circulating in Cambodia with alternating predominance of serotypes DENV-2 and DENV-3. DENV-1 represents from 5% to 20% of all circulating viruses, depending upon the year. In this work, 79 clinical strains of DENV-1 were isolated between 2000 and 2009 and their genomes were fully sequenced. Four distinct lineages with different dynamics were identified. The main evolutionary drive was negative selective pressure, but each lineage was characterized by the presence of specific mutations acquired through evolution. Coexistence, extinction and replacement of lineages occurred over the 10-year period. Lineages 1, 2 and 3 were all detected from 2000-2002 onward and disappeared in 2003, 2004-2005 and 2007, respectively. Lineages 1 and 2 displayed different dynamics: lineage 1 was very diverse whereas lineage 2 was very homogeneous. Lineage 4, which derived from lineage 3 in 2003, remained the only one at the end of the sampling period in 2008-2009 owing to a selective sweep. The lineage dynamics of DENV-1 viruses and their consequences for molecular epidemiology are discussed.
The Actinomycetales bacteria Rhodococcus opacus PD630 and Rhodococcus jostii RHA1 bioconvert a diverse range of organic substrates through lipid biosynthesis into large quantities of energy-rich triacylglycerols (TAGs). To describe the genetic basis of the Rhodococcus oleaginous metabolism, we sequenced and performed comparative analysis of the 9.27 Mb R. opacus PD630 genome. Metabolic-reconstruction assigned 2017 enzymatic reactions to the 8632 R. opacus PD630 genes we identified. Of these, 261 genes were implicated in the R. opacus PD630 TAGs cycle by metabolic reconstruction and gene family analysis. Rhodococcus synthesizes uncommon straight-chain odd-carbon fatty acids in high abundance and stores them as TAGs. We have identified these to be pentadecanoic, heptadecanoic, and cis-heptadecenoic acids. To identify bioconversion pathways, we screened R. opacus PD630, R. jostii RHA1, Ralstonia eutropha H16, and C. glutamicum 13032 for growth on 190 compounds. The results of the catabolic screen, phylogenetic analysis of the TAGs cycle enzymes, and metabolic product characterizations were integrated into a working model of prokaryotic oleaginy.
Dengue is one of the most important infectious diseases of humans and has spread throughout much of the tropical and subtropical world. Despite this widespread dispersal, the determinants of dengue transmission in endemic populations are not well understood, although essential for virus control. To address this issue we performed a phylogeographic analysis of 751 complete genome sequences of dengue 1 virus (DENV-1) sampled from both rural (Dong Thap) and urban (Ho Chi Minh City) populations in southern Viet Nam during the period 2003-2008. We show that DENV-1 in Viet Nam exhibits strong spatial clustering, with likely importation from Cambodia on multiple occasions. Notably, multiple lineages of DENV-1 co-circulated in Ho Chi Minh City. That these lineages emerged at approximately the same time and dispersed over similar spatial regions suggests that they are of broadly equivalent fitness. We also observed an important relationship between the density of the human host population and the dispersion rate of dengue, such that DENV-1 tends to move from urban to rural populations, and that densely populated regions within Ho Chi Minh City act as major transmission foci. Despite these fluid dynamics, the dispersion rates of DENV-1 are relatively low, particularly in Ho Chi Minh City where the virus moves less than an average of 20 km/year. These low rates suggest a major role for mosquito-mediated dispersal, such that DENV-1 does not need to move great distances to infect a new host when there are abundant susceptibles, and imply that control measures should be directed toward the most densely populated urban environments.
Bacterial diversity among environmental samples is commonly assessed with PCR-amplified 16S rRNA gene (16S) sequences. Perceived diversity, however, can be influenced by sample preparation, primer selection, and formation of chimeric 16S amplification products. Chimeras are hybrid products between multiple parent sequences that can be falsely interpreted as novel organisms, thus inflating apparent diversity. We developed a new chimera detection tool called Chimera Slayer (CS). CS detects chimeras with greater sensitivity than previous methods, performs well on short sequences such as those produced by the 454 Life Sciences (Roche) Genome Sequencer, and can scale to large data sets. By benchmarking CS performance against sequences derived from a controlled DNA mixture of known organisms and a simulated chimera set, we provide insights into the factors that affect chimera formation such as sequence abundance, the extent of similarity between 16S genes, and PCR conditions. Chimeras were found to reproducibly form among independent amplifications and contributed to false perceptions of sample diversity and the false identification of novel taxa, with less-abundant species exhibiting chimera rates exceeding 70%. Shotgun metagenomic sequences of our mock community appear to be devoid of 16S chimeras, supporting a role for shotgun metagenomics in validating novel organisms discovered in targeted sequence surveys.
To study the evolution of dengue virus (DENV) serotype 2 in Puerto Rico, we examined the genetic composition and diversity of 160 DENV-2 genomes obtained through 22 consecutive years of sampling. A clade replacement took place in 1994-1997 during a period of high incidence of autochthonous DENV-2 and frequent, short-lived reintroductions of foreign DENV-2. This unique clade replacement was complete just before DENV-3 emerged. By temporally and geographically defining DENV-2 lineages, we describe a refuge of this virus through 4 years of low genome diversity. Our analyses may explain the long-term endurance of DENV-2 despite great epidemiologic changes in disease incidence and serotype distribution.
PriSM is a set of algorithms designed to select and match degenerate primer pairs for the amplification of viral genomes. The design of panels of hundreds of primer pairs takes just hours using this program, compared with days using a manual approach. PriSM allows for rapid in silico optimization of primers for downstream applications such as sequencing. As a validation, PriSM was used to create an amplification primer panel for human immunodeficiency virus (HIV) Clade B.
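To make the notion of a degenerate primer concrete, the following is a minimal Python sketch. It does not reproduce PriSM's selection and matching algorithms, which are not described here. It only shows how a primer written with standard IUPAC ambiguity codes expands into concrete sequences, and how one might check whether it covers a given target site. The function names (`degeneracy`, `expand`, `matches`) and the example sequences are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct concrete sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


def matches(primer, target):
    """True if the primer covers a same-length, same-strand target site
    with no mismatches (a deliberately simplified matching rule)."""
    if len(primer) != len(target):
        return False
    return all(t in IUPAC[p] for p, t in zip(primer.upper(), target.upper()))


if __name__ == "__main__":
    print(degeneracy("ACRYN"))      # R and Y each encode 2 bases, N encodes 4
    print(expand("AR"))
    print(matches("ACRT", "ACGT"))
```

A real primer design workflow would also weigh melting temperature, primer-dimer formation, and coverage across a genome alignment; none of that is modeled in this sketch.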
Dengue is a pantropic public health problem. In children, dengue shock syndrome (DSS) is the most common life-threatening complication. The ability to predict which patients may develop DSS may improve triage and treatment. To this end, we conducted a nested case-control comparison of the early host transcriptional features in 24 DSS patients and 56 sex-, age-, and virus serotype-matched uncomplicated (UC) dengue patients. In the first instance, we defined the "early dengue" profile. The transcriptional signature in acute rather than convalescent samples (≤72 h post-illness onset) was defined by an overabundance of interferon-inducible transcripts (31% of the 551 overabundant transcripts) and canonical gene ontology terms that included the following: response to virus, immune response, innate immune response, and inflammatory response. Pathway and network analyses identified STAT1, STAT2, STAT3, IRF7, IRF9, IRF1, CEBPB, and SP1 as key transcriptional factors mediating the early response. Strikingly, the only difference in the transcriptional signatures of early DSS and UC dengue cases was the greater abundance of several neutrophil-associated transcripts in patients who progressed to DSS, a finding supported by higher plasma concentrations of several canonical proteins associated with neutrophil degranulation (bactericidal/permeability-increasing protein [BPI], elastase 2 [ELA2], and defensin 1 alpha [DEF1A]). Elevated levels of neutrophil-associated transcripts were independent of the neutrophil count and also of the genotype of the infecting virus, as genome-length sequences of dengue virus serotype 1 (DENV-1) (n = 15) and DENV-2 (n = 3) sampled from DSS patients were phylogenetically indistinguishable from those sampled from uncomplicated dengue patients (32 DENV-1 and 9 DENV-2 sequences).
Collectively, these data suggest a hitherto unrecognized association between neutrophil activation, pathogenesis, and the development of DSS and point to future strategies for guiding prognosis.
The mosquito Culex quinquefasciatus poses a substantial threat to human and veterinary health as a primary vector of West Nile virus (WNV), the filarial worm Wuchereria bancrofti, and an avian malaria parasite. Comparative phylogenomics revealed an expanded canonical C. quinquefasciatus immune gene repertoire compared with those of Aedes aegypti and Anopheles gambiae. Transcriptomic analysis of C. quinquefasciatus genes responsive to WNV, W. bancrofti, and non-native bacteria facilitated an unprecedented meta-analysis of 25 vector-pathogen interactions involving arboviruses, filarial worms, bacteria, and malaria parasites, revealing common and distinct responses to these pathogen types in three mosquito genera. Our findings provide support for the hypothesis that mosquito-borne pathogens have evolved to evade innate immune responses in three vector mosquito species of major medical importance.
Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: Its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
The mushroom Coprinopsis cinerea is a classic experimental model for multicellular development in fungi because it grows on defined media, completes its life cycle in 2 weeks, produces some 10^8 synchronized meiocytes, and can be manipulated at all stages in development by mutation and transformation. The 37-megabase genome of C. cinerea was sequenced and assembled into 13 chromosomes. Meiotic recombination rates vary greatly along the chromosomes, and retrotransposons are absent in large regions of the genome with low levels of meiotic recombination. Single-copy genes with identifiable orthologs in other basidiomycetes are predominant in low-recombination regions of the chromosome. In contrast, paralogous multicopy genes are found in the highly recombining regions, including a large family of protein kinases (FunK1) unique to multicellular fungi. Analyses of P450 and hydrophobin gene families confirmed that local gene duplications drive the expansions of paralogous copies and the expansions occur in independent lineages of Agaricomycotina fungi. Gene-expression patterns from microarrays were used to dissect the transcriptional program of dikaryon formation (mating). Several members of the FunK1 kinase family are differentially regulated during sexual morphogenesis, and coordinate regulation of adjacent duplications is rare. The genomes of C. cinerea and Laccaria bicolor, a symbiotic basidiomycete, share extensive regions of synteny. The largest syntenic blocks occur in regions with low meiotic recombination rates, no transposable elements, and tight gene spacing, where orthologous single-copy genes are overrepresented. The chromosome assembly of C. cinerea is an essential resource in understanding the evolution of multicellularity in the fungi.
We have sequenced the genomes of 18 isolates of the closely related human pathogenic fungi Coccidioides immitis and Coccidioides posadasii to more clearly elucidate population genomic structure, bringing the total number of sequenced genomes for each species to 10. Our data confirm earlier microsatellite-based findings that these species are genetically differentiated, but our population genomics approach reveals that hybridization and genetic introgression have recently occurred between the two species. The directionality of introgression is primarily from C. posadasii to C. immitis, and we find more than 800 genes exhibiting strong evidence of introgression in one or more sequenced isolates. We performed PCR-based sequencing of one region exhibiting introgression in 40 C. immitis isolates to confirm and better define the extent of gene flow between the species. We find more coding sequence than expected by chance in the introgressed regions, suggesting that natural selection may play a role in the observed genetic exchange. We find notable heterogeneity in repetitive sequence composition among the sequenced genomes and present the first detailed genome-wide profile of a repeat-induced point mutation (RIP) process distinctly different from what has been observed in Neurospora. We identify promiscuous HLA-I and HLA-II epitopes in both proteomes and discuss the possible implications of introgression and population genomic data for public health and vaccine candidate prioritization. This study highlights the importance of population genomic data for detecting subtle but potentially important phenomena such as introgression.
The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.
A better description of the extent and structure of genetic diversity in dengue virus (DENV) in endemic settings is central to its eventual control. To this end we determined the complete coding region sequence of 187 DENV-2 genomes and 68 E genes from viruses sampled from Vietnamese patients between 1995 and 2009. Strikingly, an episode of genotype replacement was observed, with Asian 1 lineage viruses entirely displacing the previously dominant Asian/American lineage viruses. This genotype replacement event also seems to have occurred within DENV-2 in Thailand and Cambodia, suggestive of a major difference in viral fitness. To determine the cause of this major evolutionary event we compared both the infectivity of the Asian 1 and Asian/American genotypes in mosquitoes and their viraemia levels in humans. Although there was little difference in infectivity in mosquitoes, we observed significantly higher plasma viraemia levels in paediatric patients infected with Asian 1 lineage viruses relative to Asian/American viruses, a phenotype that is predicted to result in a higher probability of human-to-mosquito transmission. These results provide a mechanistic basis to a marked change in the genetic structure of DENV-2 and more broadly underscore that an understanding of DENV evolutionary dynamics can inform the development of vaccines and anti-viral drugs.
The enterococci are low-GC Gram-positive bacteria that have emerged as leading causes of hospital-acquired infection. They are also commensals of the gastrointestinal tract of healthy humans and most other animals with gastrointestinal flora and are important for food fermentations. Here we report the availability of draft genome sequences for 28 enterococcal strains of diverse origin, including the species Enterococcus faecalis, E. faecium, E. casseliflavus, and E. gallinarum.
The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The "fungal genome initiative" represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal kingdom. Through such an approach, scientists can not only better understand specific organisms but also illuminate the shared and unique aspects of fungal biology that underlie the importance of fungi in biomedical research, health, food production, and industry. To date, assembled genomes for over 100 fungi are available in public databases, and many more sequencing projects are underway. Here, we discuss both examples of findings from comparative analysis of fungal sequences, with a specific emphasis on yeast genomes, and on the analytical approaches taken to mine fungal genomes. New sequencing methods are accelerating comparative studies of fungi by reducing the cost and difficulty of sequencing. This has driven more common use of sequencing applications, such as to study genome-wide variation in populations or to deeply profile RNA transcripts. These and further technological innovations will continue to be piloted in yeasts and other fungi, and will expand the applications of sequencing to study fungal biology.
Understanding the fine-structure molecular architecture of bacterial epidemics has been a long-sought goal of infectious disease research. We used short-read-length DNA sequencing coupled with mass spectroscopy analysis of SNPs to study the molecular pathogenomics of three successive epidemics of invasive infections involving 344 serotype M3 group A Streptococcus in Ontario, Canada. Sequencing the genome of 95 strains from the three epidemics, coupled with analysis of 280 biallelic SNPs in all 344 strains, revealed an unexpectedly complex population structure composed of a dynamic mixture of distinct clonally related complexes. We discovered that each epidemic is dominated by micro- and macrobursts of multiple emergent clones, some with distinct strain genotype-patient phenotype relationships. On average, strains were differentiated from one another by only 49 SNPs and 11 insertion-deletion events (indels) in the core genome. Ten percent of SNPs are strain specific; that is, each strain has a unique genome sequence. We identified nonrandom temporal-spatial patterns of strain distribution within and between the epidemic peaks. The extensive full-genome data permitted us to identify genes with significantly increased rates of nonsynonymous (amino acid-altering) nucleotide polymorphisms, thereby providing clues about selective forces operative in the host. Comparative expression microarray analysis revealed that closely related strains differentiated by seemingly modest genetic changes can have significantly divergent transcriptomes. We conclude that enhanced understanding of bacterial epidemics requires a deep-sequencing, geographically centric, comparative pathogenomics strategy.
Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage.
While most Ascomycetes tend to associate principally with plants, the dimorphic fungi Coccidioides immitis and Coccidioides posadasii are primary pathogens of immunocompetent mammals, including humans. Infection results from environmental exposure to Coccidioides, which is believed to grow as a soil saprophyte in arid deserts. To investigate hypotheses about the life history and evolution of Coccidioides, the genomes of several Onygenales, including C. immitis and C. posadasii; a close, nonpathogenic relative, Uncinocarpus reesii; and a more diverged pathogenic fungus, Histoplasma capsulatum, were sequenced and compared with those of 13 more distantly related Ascomycetes. This analysis identified increases and decreases in gene family size associated with a host/substrate shift from plants to animals in the Onygenales. In addition, comparison among Onygenales genomes revealed evolutionary changes in Coccidioides that may underlie its infectious phenotype, the identification of which may facilitate improved treatment and prevention of coccidioidomycosis. Overall, the results suggest that Coccidioides species are not soil saprophytes, but that they have evolved to remain associated with their dead animal hosts in soil, and that Coccidioides metabolism genes, membrane-related proteins, and putatively antigenic compounds have evolved in response to interaction with an animal host.
Single-cell genome sequencing has the potential to allow the in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system for addressing important challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells.
We report the discovery and validation of a set of single nucleotide polymorphisms (SNPs) between the reference Neurospora crassa strain Oak Ridge and the Mauriceville strain (FGSC 2555), of sufficient density to allow fine mapping of most loci. Sequencing of Mauriceville cDNAs and alignment to the completed genomic sequence of the Oak Ridge strain identified 19,087 putative SNPs. Of these, a subset was validated by cleaved amplified polymorphic sequence (CAPS), a simple and robust PCR-based assay that reliably distinguishes between SNP alleles. Experimental confirmation resulted in the development of 250 CAPS markers distributed evenly over the genome. To demonstrate the applicability of this map, we used bulked segregant analysis followed by interval mapping to locate the csp-1 mutation to a narrow region on LGI. Subsequently, we refined mapping resolution to 74 kbp by developing additional markers, resequenced the candidate gene, NCU02713.3, in the mutant background, and phenocopied the mutation by gene replacement in the WT strain. Together, these techniques demonstrate a generally applicable and straightforward approach for the isolation of novel genes from existing mutants. Data on both putative and validated SNPs are deposited in a customized public database at the Broad Institute, which encourages augmentation by community users.
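The idea behind a CAPS marker can be sketched in a few lines of Python: a SNP is usable as a CAPS marker when one allele creates or abolishes a restriction enzyme recognition site, so that digesting the PCR product gives different fragment patterns for the two alleles. This is a minimal illustration only, not the pipeline used in the study. The helper names, the example amplicon sequences, and the choice of EcoRI (recognition site GAATTC) are assumptions made for the example.

```python
def cut_sites(seq, site):
    """Return the 0-based start positions of every occurrence of a
    recognition site in a sequence (overlapping matches included)."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == site]


def is_caps_candidate(ref_amplicon, alt_amplicon, site):
    """True if the two alleles differ in where the enzyme would cut,
    i.e. the SNP creates or destroys a recognition site."""
    return cut_sites(ref_amplicon, site) != cut_sites(alt_amplicon, site)


if __name__ == "__main__":
    ecori = "GAATTC"                 # EcoRI recognition sequence
    ref = "TTGAATTCAA"               # hypothetical reference allele
    alt = "TTGAGTTCAA"               # hypothetical SNP allele (A -> G)
    print(cut_sites(ref, ecori))
    print(cut_sites(alt, ecori))
    print(is_caps_candidate(ref, alt, ecori))
```

In practice one would scan each putative SNP against a panel of enzymes and keep those for which the predicted fragment sizes are easy to resolve on a gel.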
Tularemia is a geographically widespread, severely debilitating, and occasionally lethal disease in humans. It is caused by infection by a gram-negative bacterium, Francisella tularensis. In order to better understand its potency as an etiological agent as well as its potential as a biological weapon, we have completed draft assemblies and report the first complete genomic characterization of five strains belonging to the following different Francisella subspecies (subsp.): the F. tularensis subsp. tularensis FSC033, F. tularensis subsp. holarctica FSC257 and FSC022, and F. tularensis subsp. novicida GA99-3548 and GA99-3549 strains. Here, we report the sequencing of these strains and comparative genomic analysis with recently available public Francisella sequences, including the rare F. tularensis subsp. mediasiatica FSC147 strain isolate from the Central Asian Region. We report evidence for the occurrence of large-scale rearrangement events in strains of the holarctica subspecies, supporting previous proposals that further phylogenetic subdivisions of the Type B clade are likely. We also find a significant enrichment of disrupted or absent ORFs proximal to predicted breakpoints in the FSC022 strain, including a genetic component of the Type I restriction-modification defense system. Many of the pseudogenes identified are also disrupted in the closely related rarely human pathogenic F. tularensis subsp. mediasiatica FSC147 strain, including modulator of drug activity B (mdaB) (FTT0961), which encodes a known NADPH quinone reductase involved in oxidative stress resistance. We have also identified genes exhibiting sequence similarity to effectors of the Type III (T3SS) and components of the Type IV secretion systems (T4SS). One of the genes, msrA2 (FTT1797c), is disrupted in F. tularensis subsp. mediasiatica and has recently been shown to mediate bacterial pathogen survival in host organisms. 
Our findings suggest that in addition to the duplication of the Francisella Pathogenicity Island, and acquisition of individual loci, adaptation by gene loss in the more recently emerged tularensis, holarctica, and mediasiatica subspecies occurred and was distinct from evolutionary events that differentiated these subspecies, and the novicida subspecies, from a common ancestor. Our findings are applicable to future studies focused on variations in Francisella subspecies pathogenesis, and of broader interest to studies of genomic pathoadaptation in bacteria.
Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
Rhizopus oryzae is the primary cause of mucormycosis, an emerging, life-threatening infection characterized by rapid angioinvasive growth with an overall mortality rate that exceeds 50%. As a representative of the paraphyletic basal group of the fungal kingdom called "zygomycetes," R. oryzae is also used as a model to study fungal evolution. Here we report the genome sequence of R. oryzae strain 99-880, isolated from a fatal case of mucormycosis. The highly repetitive 45.3 Mb genome assembly contains abundant transposable elements (TEs), comprising approximately 20% of the genome. We predicted 13,895 protein-coding genes not overlapping TEs, many of which are paralogous gene pairs. The order and genomic arrangement of the duplicated gene pairs and their common phylogenetic origin provide evidence for an ancestral whole-genome duplication (WGD) event. The WGD resulted in the duplication of nearly all subunits of the protein complexes associated with respiratory electron transport chains, the V-ATPase, and the ubiquitin-proteasome systems. The WGD, together with recent gene duplications, resulted in the expansion of multiple gene families related to cell growth and signal transduction, as well as secreted aspartic protease and subtilase protein families, which are known fungal virulence factors. The duplication of the ergosterol biosynthetic pathway, especially the major azole target, lanosterol 14alpha-demethylase (ERG11), could contribute to the variable responses of R. oryzae to different azole drugs, including voriconazole and posaconazole. Expanded families of cell-wall synthesis enzymes, essential for fungal cell integrity but absent in mammalian hosts, reveal potential targets for novel and R. oryzae-specific diagnostic and therapeutic treatments.
Our variant ascertainment algorithm, VAAL, uses massively parallel DNA sequence data to identify differences between bacterial genomes with high sensitivity and specificity. VAAL detected approximately 98% of differences (including large insertion-deletions) between pairs of strains from three species while calling no false positives. VAAL also pinpointed a single mutation between Vibrio cholerae genomes, identifying an antibiotic's site of action by comparing the sequences of drug-sensitive strains and their drug-resistant derivatives.
High-throughput sequencing of cDNA libraries (RNA-Seq) has proven to be a highly effective approach for studying bacterial transcriptomes. A central challenge in designing RNA-Seq-based experiments is estimating a priori the number of reads per sample needed to detect and quantify thousands of individual transcripts with a large dynamic range of abundance.
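One rough way to frame the read-depth question is a back-of-the-envelope Poisson model. The sketch below assumes that reads are sampled independently and that a transcript accounting for a fraction f of the library receives, on average, N × f reads out of N total. Both the model and the function names are illustrative assumptions, not the method of the study.

```python
import math


def detection_probability(total_reads, fraction, min_reads=1):
    """Probability of seeing at least `min_reads` reads from a transcript
    that makes up `fraction` of the library, under a Poisson model."""
    mean = total_reads * fraction
    # P(X < min_reads) is the sum of the first `min_reads` Poisson terms.
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k)
                for k in range(min_reads))
    return 1.0 - below


def reads_needed(fraction, min_prob=0.95):
    """Smallest library size giving at least `min_prob` chance of observing
    one or more reads from the transcript (closed form for min_reads=1)."""
    return math.ceil(-math.log(1.0 - min_prob) / fraction)


if __name__ == "__main__":
    rare = 1e-6  # hypothetical transcript: one in a million reads
    n = reads_needed(rare)
    print(n, detection_probability(n, rare))
```

Quantifying a transcript, as opposed to merely detecting it, needs many more reads than this lower bound, and real libraries deviate from the Poisson assumption because of amplification bias and rRNA contamination.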
Infection with hepatitis C virus (HCV) is a burgeoning worldwide public health problem, with 170 million infected individuals and an estimated 20 million deaths in the coming decades. While 6 main genotypes generally distinguish the global geographic diversity of HCV, a multitude of closely related subtypes within these genotypes are poorly defined and may influence clinical outcome and treatment options. Unfortunately, the paucity of genetic data from many of these subtypes makes time-consuming primer walking the limiting step for sequencing understudied subtypes.
The major cause of athlete's foot is Trichophyton rubrum, a dermatophyte or fungal pathogen of human skin. To facilitate molecular analyses of the dermatophytes, we sequenced T. rubrum and four related species, Trichophyton tonsurans, Trichophyton equinum, Microsporum canis, and Microsporum gypseum. These species differ in host range, mating, and disease progression. The dermatophyte genomes are highly colinear yet contain gene family expansions not found in other human-associated fungi. Dermatophyte genomes are enriched for gene families containing the LysM domain, which binds chitin and potentially related carbohydrates. These LysM domains differ in sequence from those in other species in regions of the peptide that could affect substrate binding. The dermatophytes also encode novel sets of fungus-specific kinases with unknown specificity, including nonfunctional pseudokinases, which may inhibit phosphorylation by competing for kinase sites within substrates, acting as allosteric effectors, or acting as scaffolds for signaling. The dermatophytes are also enriched for a large number of enzymes that synthesize secondary metabolites, including dermatophyte-specific genes that could synthesize novel compounds. Finally, dermatophytes are enriched in several classes of proteases that are necessary for fungal growth and nutrient acquisition on keratinized tissues. Despite differences in mating ability, genes involved in mating and meiosis are conserved across species, suggesting the possibility of cryptic mating in species where it has not been previously detected. These genome analyses identify gene families that are important to our understanding of how dermatophytes cause chronic infections, how they interact with epithelial cells, and how they respond to the host immune response.
Colletotrichum species are fungal pathogens that devastate crop plants worldwide. Host infection involves the differentiation of specialized cell types that are associated with penetration, growth inside living host cells (biotrophy) and tissue destruction (necrotrophy). We report here genome and transcriptome analyses of Colletotrichum higginsianum infecting Arabidopsis thaliana and Colletotrichum graminicola infecting maize. Comparative genomics showed that both fungi have large sets of pathogenicity-related genes, but families of genes encoding secreted effectors, pectin-degrading enzymes, secondary metabolism enzymes, transporters and peptidases are expanded in C. higginsianum. Genome-wide expression profiling revealed that these genes are transcribed in successive waves that are linked to pathogenic transitions: effectors and secondary metabolism enzymes are induced before penetration and during biotrophy, whereas most hydrolases and transporters are upregulated later, at the switch to necrotrophy. Our findings show that preinvasion perception of plant-derived signals substantially reprograms fungal gene expression and indicate previously unknown functions for particular fungal cell types.
We sequenced and annotated the genomes of four P. vivax strains collected from disparate geographic locations, tripling the number of genome sequences available for this understudied parasite and providing the first genome-wide perspective of global variability in this species. We observe approximately twice as much SNP diversity among these isolates as we do among a comparable collection of isolates of P. falciparum, a malaria-causing parasite that results in higher mortality. This indicates a distinct history of global colonization and/or a more stable demographic history for P. vivax relative to P. falciparum, which is thought to have undergone a recent population bottleneck. The SNP diversity, as well as additional microsatellite and gene family variability, suggests a capacity for greater functional variation in the global population of P. vivax. These findings warrant a deeper survey of variation in P. vivax to equip disease interventions targeting the distinctive biology of this neglected but major pathogen.
The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP's 16S data sets to several reference 16S collections to create a "most wanted" list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, that few novel taxa are represented by these OTUs, and that many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the "most wanted", and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the "most wanted" organisms are poorly represented among culture collections, suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.
The dengue virus serotype 3 (DENV-3) Indian subcontinent strain emerged in Puerto Rico in 1998 after a 21-year absence. The rapid expansion of DENV-3 on the island correlated with the withdrawal of the other serotypes for 7 years. DENV-3 prevalence declined in 2008, and the serotype has remained undetected since.
The Dengue National Control Program was established in Cambodia in 2000 and has reported between 10,000 and 40,000 dengue cases per year with a case fatality rate ranging from 0.7 to 1.7%. In this study, 39 DENV-2 and 57 DENV-3 viruses isolated from patients between 2000 and 2008 were fully sequenced. Five distinct DENV-2 lineages and four distinct DENV-3 lineages with different dynamics were identified. Each lineage was characterized by the presence of specific mutations with no evidence of recombination. In both DENV-2 and DENV-3, the lineages present prior to 2003 were replaced after that date by unrelated lineages. After 2003, DENV-2 lineages D2-3 and D2-4 cocirculated until 2007, when they were almost completely replaced by a lineage, D2-5, which emerged from D2-3. Conversely, all DENV-3 lineages remained, diversified, and cocirculated with novel lineages emerging. The years 2006 and 2007 were marked by a high prevalence of DENV-3, and 2007 by a large dengue outbreak and a high proportion of patients with severe disease. Selective sweeps in DENV-1 and DENV-2 were linked to immunological escape to a predominately DENV-3-driven immunological response. The complex dynamic of dengue in Cambodia in the last ten years has been associated with a combination of stochastic climatic events, cocirculation, coevolution, adaptation to different vector populations, and with the human population immunological landscape.
Methicillin-resistant Staphylococcus aureus (MRSA) strains are leading causes of hospital-acquired infections in the United States, and clonal cluster 5 (CC5) is the predominant lineage responsible for these infections. Since 2002, there have been 12 cases of vancomycin-resistant S. aureus (VRSA) infection in the United States, all CC5 strains. To understand this genetic background and what distinguishes it from other lineages, we generated and analyzed high-quality draft genome sequences for all available VRSA strains. Sequence comparisons show unambiguously that each strain independently acquired Tn1546 and that all VRSA strains last shared a common ancestor over 50 years ago, well before the occurrence of vancomycin resistance in this species. In contrast to existing hypotheses on what predisposes this lineage to acquire Tn1546, the barrier posed by restriction systems appears to be intact in most VRSA strains. However, VRSA (and other CC5) strains were found to possess a constellation of traits that appears to be optimized for proliferation in precisely the types of polymicrobic infection where transfer could occur. They lack a bacteriocin operon that would be predicted to limit the occurrence of non-CC5 strains in mixed infection and harbor a cluster of unique superantigens and lipoproteins to confound host immunity. A frameshift in dprA, which in other microbes influences uptake of foreign DNA, may also make this lineage conducive to foreign DNA acquisition.
We have developed a process for transcriptome analysis of bacterial communities that accommodates both intact and fragmented starting RNA and combines efficient rRNA removal with strand-specific RNA-seq. We applied this approach to an RNA mixture derived from three diverse cultured bacterial species and to RNA isolated from clinical stool samples. The resulting expression profiles were highly reproducible, enriched up to 40-fold for non-rRNA transcripts, and correlated well with profiles representing undepleted total RNA.
Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (< 1%) within an infected host, and to assay variants without prior knowledge. Critical to interpreting deep sequence data sets is the ability to distinguish biological variants from process errors with high sensitivity and specificity. To address this challenge, we describe V-Phaser, an algorithm able to recognize rare biological variants in mixed populations. V-Phaser uses covariation (i.e., phasing) between observed variants to increase sensitivity and an expectation maximization algorithm that iteratively recalibrates base quality scores to increase specificity. Overall, V-Phaser achieved > 97% sensitivity and > 97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ~10 kb genome, including 603 rare variants (< 1% frequency) detected only using phase information. V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enable identifying and tracking changes in low frequency variants in mixed populations such as RNA viruses.
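The intuition behind using phase information can be illustrated with a small sketch. This is not V-Phaser's implementation: it assumes, purely for illustration, that sequencing errors are independent between sites, that a single uniform per-base error rate applies, and that a binomial null model is adequate. The function names and the numbers in the example are hypothetical. Under these assumptions, a rare variant seen alone may be indistinguishable from error, while the same number of reads carrying two variants together is far less likely to arise by chance.

```python
from math import exp, lgamma, log, log1p


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    def log_pmf(i):
        # Log-space binomial probability avoids overflow at high read depth.
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log1p(-p))
    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


def single_site_pvalue(variant_reads, depth, error_rate):
    """Chance of seeing at least `variant_reads` erroneous reads at one site."""
    return binomial_tail(variant_reads, depth, error_rate)


def phased_pair_pvalue(joint_reads, spanning_reads, error_rate):
    """Chance that at least `joint_reads` reads carry errors at BOTH sites.

    Assumption (illustrative only): errors at the two sites are independent,
    so the per-read probability of a double error is error_rate squared.
    """
    return binomial_tail(joint_reads, spanning_reads, error_rate ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: 5 variant reads out of 1000 (0.5% frequency)
    # with an assumed 0.5% per-base error rate.
    print("single site:", single_site_pvalue(5, 1000, 0.005))
    print("phased pair:", phased_pair_pvalue(5, 1000, 0.005))
```

With these made-up inputs the single-site test cannot separate the variant from error, whereas the phased test can, which is the general reason covariation between variants can raise sensitivity for rare variants.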
Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.
The enterococci are Gram-positive lactic acid bacteria that inhabit the gastrointestinal tracts of diverse hosts. However, Enterococcus faecium and E. faecalis have emerged as leading causes of multidrug-resistant hospital-acquired infections. The mechanism by which a well-adapted commensal evolved into a hospital pathogen is poorly understood. In this study, we examined high-quality draft genome data for evidence of key events in the evolution of the leading causes of enterococcal infections, including E. faecalis, E. faecium, E. casseliflavus, and E. gallinarum. We characterized two clades within what is currently classified as E. faecium and identified traits characteristic of each, including variation in operons for cell wall carbohydrate and putative capsule biosynthesis. We examined the extent of recombination between the two E. faecium clades and identified two strains with mosaic genomes. We determined the underlying genetics for the defining characteristics of the motile enterococci E. casseliflavus and E. gallinarum. Further, we identified species-specific traits that could be used to advance the detection of medically relevant enterococci and their identification to the species level.
Mycobacterium tuberculosis, the causative agent of most human tuberculosis, infects one third of the world's population and kills an estimated 1.7 million people a year. With the worldwide emergence of drug resistance, and the finding of more functional genetic diversity than previously expected, there is a renewed interest in understanding the forces driving genome evolution of this important pathogen. Genetic diversity in M. tuberculosis is dominated by single nucleotide polymorphisms and small-scale gene deletion, with little or no evidence for the large-scale genome rearrangements seen in other bacteria. Recently, a single report described a large-scale genome duplication that was suggested to be specific to the Beijing lineage. We report here multiple independent large-scale duplications of the same genomic region of M. tuberculosis detected through whole-genome sequencing. The duplications occur in strains belonging to both M. tuberculosis lineages 2 and 4, and are thus not limited to Beijing strains. The duplications occur in both drug-resistant and drug-susceptible strains. The duplicated regions also have substantially different boundaries in different strains, indicating different originating duplication events. We further identify a smaller segmental duplication of a different genomic region of a lab strain of H37Rv. The presence of multiple independent duplications of the same genomic region suggests either instability in this region, a selective advantage conferred by the duplication, or both. The identified duplications suggest that large-scale gene duplication may be more common in M. tuberculosis than previously considered.
The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May-July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.
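As a rough illustration of how SNP diversity among outbreak isolates might be tallied, the sketch below counts variable positions across pre-aligned sequences. The function name and the toy sequences are invented for this example and are not data from the study. It assumes the isolates have already been aligned to equal length, and treats `N` and `-` as missing data.

```python
def segregating_sites(aligned_sequences):
    """Return 0-based alignment positions where the isolates differ.

    Assumes equal-length, pre-aligned sequences; 'N' and '-' are treated
    as missing data and do not count as alleles.
    """
    if len({len(seq) for seq in aligned_sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    missing = {"N", "-"}
    return [
        position
        for position, column in enumerate(zip(*aligned_sequences))
        if len(set(column) - missing) > 1
    ]


if __name__ == "__main__":
    # Toy alignments, purely illustrative.
    cluster_a = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGT"]
    cluster_b = ["ACGTACGT", "TCGTACGA", "ACCTACGT"]
    print("cluster A variable sites:", len(segregating_sites(cluster_a)))
    print("cluster B variable sites:", len(segregating_sites(cluster_b)))
```

A count like this depends on which isolates are sampled and on alignment quality, so it is a descriptive summary rather than an estimate of mutation rate.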