Staphylococcus aureus subsp. aureus ATCC 25923 is commonly used as a control strain for susceptibility testing to antibiotics and as a quality control strain for commercial products. We present the completed genome sequence for the strain, consisting of the chromosome and a 27.5-kb plasmid.
The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.
The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible.
The Bacillus anthracis Carbosap genome, which includes the pXO1 and pXO2 plasmids, has been shown to encode the major B. anthracis virulence factors, yet this strains attenuation has not yet been explained. Here we report the draft genome sequence of this strain, and a comparison to fully virulent B. anthracis.
We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS.
Repetitive DNA sequences are abundant in a broad range of species, from bacteria to mammals, and they cover nearly half of the human genome. Repeats have always presented technical challenges for sequence alignment and assembly programs. Next-generation sequencing projects, with their short read lengths and high data volumes, have made these challenges more difficult. From a computational perspective, repeats create ambiguities in alignment and assembly, which, in turn, can produce biases and errors when interpreting results. Simply ignoring repeats is not an option, as this creates problems of its own and may mean that important biological phenomena are missed. We discuss the computational problems surrounding repeats and describe strategies used by current bioinformatics systems to solve them.
Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources.
A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing, however this approach can generate biased estimates due to differences in the copy number of the gene between even closely related organisms, and due to PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels.
Late Pleistocene North America hosted at least two divergent and ecologically distinct species of mammoth: the periglacial woolly mammoth (Mammuthus primigenius) and the subglacial Columbian mammoth (Mammuthus columbi). To date, mammoth genetic research has been entirely restricted to woolly mammoths, rendering their genetic evolution difficult to contextualize within broader Pleistocene paleoecology and biogeography. Here, we take an interspecific approach to clarifying mammoth phylogeny by targeting Columbian mammoth remains for mitogenomic sequencing.
A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality.
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes--xenologs--persist longer in prokaryotic lineages possibly due to a higher/longer adaptive role. On the other hand, duplicated genes--paralogs--are expressed more, and, when persistent, they evolve slower. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of lifes biochemical diversity. Therefore, the study of the evolution of biology systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides between related species. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with an efficient filtration method for identifying interspersed repeats in genome sequences. During gapped extension, we use the MUSCLE implementation of progressive global multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model (HMM) to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any time-reversible nucleotide substitution matrix. We evaluate the performance of our method and previous approaches on a hybrid data set of real genomic DNA with simulated interspersed repeats. Our method outperforms a related method in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in freely available software, Repeatoire, available from: http://wwwabi.snv.jussieu.fr/public/Repeatoire.
DNA repeats are causes and consequences of genome plasticity. Repeats are created by intrachromosomal recombination or horizontal transfer. They are targeted by recombination processes leading to amplifications, deletions and rearrangements of genetic material. The identification and analysis of repeats in nearly 700 genomes of bacteria and archaea is facilitated by the existence of sequence data and adequate bioinformatic tools. These have revealed the immense diversity of repeats in genomes, from those created by selfish elements to the ones used for protection against selfish elements, from those arising from transient gene amplifications to the ones leading to stable duplications. Experimental works have shown that some repeats do not carry any adaptive value, while others allow functional diversification and increased expression. All repeats carry some potential to disorganize and destabilize genomes. Because recombination and selection for repeats vary between genomes, the number and types of repeats are also quite diverse and in line with ecological variables, such as host-dependent associations or population sizes, and with genetic variables, such as the recombination machinery. From an evolutionary point of view, repeats represent both opportunities and problems. We describe how repeats are created and how they can be found in genomes. We then focus on the functional and genomic consequences of repeats that dictate their fate.
ABSTRACT: A growing resource of methicillin-resistant Staphylococcus aureus (MRSA) genomes uncovers intriguing phylogeographic and recombination patterns and highlights challenges in identifying the source of these phenomena.
The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (~2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ~90%) we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.