Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes is a substantial challenge due to the increased complexity of the de Bruijn graph structure predominantly used. To address the increasing demand for sequencing of nonmodel and/or wild-type samples, in most cases inbred lines or fosmid-based hierarchical sequencing methods are used to overcome such problems. However, these methods are costly and time-consuming, forfeiting the advantages of massively parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes, followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from heterozygosity are simplified not only during the contig assembly step but also during the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assembly results that have a larger scaffold NG50 length without any accompanying loss of accuracy in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
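The scaffold NG50 statistic used above to compare assemblers can be illustrated with a minimal sketch. It assumes the common definition of NG50 (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the estimated genome size); the function name `ng50` and the toy lengths are hypothetical and are not taken from Platanus or Assemblathon 2.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly, or 0 if it covers < 50% of the genome.

    scaffold_lengths: iterable of scaffold lengths in bp (illustrative input).
    genome_size: estimated genome size in bp (an assumption supplied by the user).
    """
    half_genome = genome_size / 2
    covered = 0
    # Walk scaffolds from longest to shortest, accumulating covered bases.
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= half_genome:
            return length
    # The scaffolds together span less than half of the estimated genome.
    return 0


if __name__ == "__main__":
    toy_lengths = [80, 70, 50, 40, 30, 20]  # made-up scaffold lengths
    print(ng50(toy_lengths, genome_size=300))
```

Unlike N50, which is normalized by the total assembly length, NG50 is normalized by the estimated genome size, so assemblies of different total length can be compared on the same scale.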
Coelacanths are known as "living fossils," as they show remarkable morphological resemblance to the fossil record and belong to the most primitive lineage of living Sarcopterygii (lobe-finned fishes and tetrapods). Coelacanths may be key to elucidating the tempo and mode of evolution from fish to tetrapods. Here, we report the genome sequences of five coelacanths, including four Latimeria chalumnae individuals (three specimens from Tanzania and one from Comoros) and one L. menadoensis individual from Indonesia. These sequences cover two African breeding populations and two known extant coelacanth species. The genome is ~2.74 Gbp and contains a high proportion (~60%) of repetitive elements. The genetic diversity among the individuals was extremely low, suggesting a small population size and/or a slow rate of evolution. We found a substantial number of genes that encode olfactory and pheromone receptors with features characteristic of tetrapod receptors for the detection of airborne ligands. We also found that limb enhancers of bmp7 and gli3, both of which are essential for limb formation, are conserved between coelacanth and tetrapods, but not ray-finned fishes. We expect that some tetrapod-like genes may have existed early in the evolution of primitive Sarcopterygii and were later co-opted to adapt to terrestrial environments. These coelacanth genomes will provide a cornerstone for studies to elucidate how ancestral aquatic vertebrates evolved into terrestrial animals.
Commonly used classical inbred mouse strains have mosaic genomes with sequences from different subspecific origins. Their genomes are derived predominantly from the Western European subspecies Mus musculus domesticus, with the remaining sequences derived mostly from the Japanese subspecies Mus musculus molossinus. However, it remains unknown how this intersubspecific genome introgression occurred during the establishment of classical inbred strains. In this study, we resequenced the genomes of two M. m. molossinus-derived inbred strains, MSM/Ms and JF1/Ms. MSM/Ms originated from Japanese wild mice, and the ancestry of JF1/Ms was originally found in Europe and then transferred to Japan. We compared the characteristics of these sequences to those of the C57BL/6J reference sequence and the recent data sets from the resequencing of 17 inbred strains in the Mouse Genome Project (MGP), and the results unequivocally show that genome introgression from M. m. molossinus into M. m. domesticus provided the primary framework for the mosaic genomes of classical inbred strains. Furthermore, the genomes of C57BL/6J and other classical inbred strains have long consecutive segments with extremely high similarity (>99.998%) to the JF1/Ms strain. In the early 20th century, Japanese waltzing mice with a morphological phenotype resembling that of JF1/Ms mice were often crossed with European fancy mice for early studies of "Mendelism," which suggests that the ancestor of the extant JF1/Ms strain provided the origin of the M. m. molossinus genome in classical inbred strains and largely contributed to its intersubspecific genome diversity.
Volvocalean green algae have among the most diverse mitochondrial and plastid DNAs (mtDNAs and ptDNAs) from the eukaryotic domain. However, nearly all of the organelle genome data from this group are restricted to unicellular species, like Chlamydomonas reinhardtii, and presently only one multicellular species, the ~4,000-celled Volvox carteri, has had its organelle DNAs sequenced. The V. carteri organelle genomes are repeat rich, and the ptDNA is the largest plastome ever sequenced. Here, we present the complete mtDNA and ptDNA of the colonial volvocalean Gonium pectorale, which is composed of ~16 cells and occupies a phylogenetic position closer to that of V. carteri than C. reinhardtii within the volvocine line. The mtDNA and ptDNA of G. pectorale are circular-mapping AT-rich molecules with respective lengths of 16 and 222.6 kilobases and respective coding densities of 73 and 44%. They share some features with the organelle DNAs of V. carteri, including palindromic repeats within the plastid compartment, but show more similarities with those of C. reinhardtii, such as a compact mtDNA architecture and relatively low organelle DNA intron contents. Overall, the G. pectorale organelle genomes raise several interesting questions about the origin of linear mitochondrial chromosomes within the Volvocales and the relationship between multicellularity and organelle genome expansion.
We conducted genome sequencing of the filamentous fungus Aspergillus sojae NBRC4239 isolated from the koji used to prepare Japanese soy sauce. We used 454 pyrosequencing technology and investigated the genome with respect to enzymes and secondary metabolites in comparison with other sequenced Aspergilli. Assembly of 454 reads generated a non-redundant sequence of 39.5 Mb possessing 13 033 putative genes and 65 scaffolds composed of 557 contigs. Of the 2847 open reading frames with Pfam domain scores of >150 found in A. sojae NBRC4239, 81.7% had a high degree of similarity with the genes of A. oryzae. Comparative analysis identified serine carboxypeptidase and aspartic protease genes unique to A. sojae NBRC4239. While A. oryzae possessed three copies of the α-amylase gene, A. sojae NBRC4239 possessed only a single copy. Comparison of 56 gene clusters for secondary metabolites between A. sojae NBRC4239 and A. oryzae revealed that 24 clusters were conserved, whereas 32 clusters differed between them. The differences included a deletion of 18 508 bp containing mfs1, mao1, dmaT, and pks-nrps for cyclopiazonic acid (CPA) biosynthesis, which explains the lack of CPA production in A. sojae. The A. sojae NBRC4239 genome data will be useful to characterize functional features of the koji moulds used in Japanese industries.
A nearly complete genome sequence of Candidatus Acetothermum autotrophicum, a presently uncultivated bacterium in candidate division OP1, was revealed by metagenomic analysis of a subsurface thermophilic microbial mat community. Phylogenetic analysis based on the concatenated sequences of proteins common among 367 prokaryotes suggests that Ca. A. autotrophicum is one of the earliest diverging bacterial lineages. It possesses a folate-dependent Wood-Ljungdahl (acetyl-CoA) pathway of CO2 fixation, is predicted to have an acetogenic lifestyle, and possesses the newly discovered archaeal-autotrophic type of bifunctional fructose 1,6-bisphosphate aldolase/phosphatase. A phylogenetic analysis of the core gene cluster of the acetyl-CoA pathway, shared by acetogens, methanogens, some sulfur- and iron-reducers and dechlorinators, supports the hypothesis that the core gene cluster of Ca. A. autotrophicum is a particularly ancient bacterial pathway. The habitat, physiology and phylogenetic position of Ca. A. autotrophicum support the view that the first bacterial and archaeal lineages were H2-dependent acetogens and methanogens living in hydrothermal environments.