A metabolome (the collection of comprehensive quantitative data on metabolites in an organism) has been increasingly used for applications such as data-intensive systems biology, disease diagnostics, biomarker discovery, and assessment of food quality. Many tools and databases have been developed to date for analyzing data generated by various combinations of chromatography and mass spectrometry. We report here a web portal named KOMICS (The Kazusa Metabolomics Portal), where the tools and databases that we developed are freely available to academic users. KOMICS includes tools and databases for the preprocessing, mining, visualization, and publication of metabolomics data. The primary aims behind the development of this portal are to improve the annotation of unknown metabolites and to disseminate comprehensive metabolomic data. To this end, PowerGet and FragmentAlign include a function for manually curating the results of metabolite feature alignments. Metabolonote, a wiki-based database specialized for metadata, functions as a hub for web resources related to the submitters' work. This feature is expected to increase citation of the submitters' work, thereby promoting data publication. As an example of the practical use of KOMICS, a workflow for a study on Jatropha curcas is presented. The tools and databases available at KOMICS should contribute to enhanced production, interpretation, and utilization of metabolomic Big Data.
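PowerGet and FragmentAlign align metabolite features across samples, but the abstract does not describe their alignment algorithms. As a generic illustration only, the following minimal Python sketch pairs features by m/z and retention-time tolerances, assuming each feature is an (m/z, retention time, intensity) triple. The `Feature` class, the `align_features` function, and the 5 ppm / 0.2 min tolerances are hypothetical and are not taken from either tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A detected peak (hypothetical representation)."""
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in minutes
    intensity: float  # peak height or area


def ppm_diff(a: float, b: float) -> float:
    """Absolute m/z difference in parts per million, relative to b."""
    return abs(a - b) / b * 1e6


def align_features(ref, query, mz_tol_ppm=5.0, rt_tol_min=0.2):
    """Greedily pair each reference feature with the closest unused query
    feature that lies within both tolerances.

    Returns (ref_feature, query_feature_or_None) pairs, so unmatched
    features stay visible for a later manual-curation step.
    """
    used = set()
    pairs = []
    for r in ref:
        best_idx, best_score = None, None
        for i, q in enumerate(query):
            if i in used:
                continue
            d_mz = ppm_diff(q.mz, r.mz)
            d_rt = abs(q.rt - r.rt)
            if d_mz <= mz_tol_ppm and d_rt <= rt_tol_min:
                # Combine both dimensions, each scaled by its tolerance.
                score = d_mz / mz_tol_ppm + d_rt / rt_tol_min
                if best_score is None or score < best_score:
                    best_idx, best_score = i, score
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((r, query[best_idx]))
        else:
            pairs.append((r, None))
    return pairs
```

Greedy matching keeps the sketch short. An optimal one-to-one assignment (for example, the Hungarian algorithm) avoids order-dependent pairings when several candidates compete for the same feature.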
Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation, calculated from large amounts of transcriptome and metabolome data, are useful for uncovering unknown gene functions, functional diversity among gene family members, and regulatory mechanisms of metabolic flux through pathways. Many databases and tools are available for interpreting quantitative transcriptome and metabolome data, but only a few connect correlation data to biological knowledge in a way that helps users find its biological significance. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which can overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation could help users discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG), and maps generated from KEGG gene classifications, are available at the KEGG version of KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the ATTED-II, COXPRESdb, CoP and MiBASE databases are available for human, mouse, rat, Arabidopsis, rice, tomato and other plants.
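To make the idea of overlaying co-expression relationships concrete, the following minimal Python sketch computes pairwise Pearson correlations between expression profiles and keeps the pairs whose absolute correlation passes a threshold; these pairs are the kind of edges that could be drawn as curves between genes on a pathway map. This is an assumption for illustration: the source databases (ATTED-II, COXPRESdb and others) may use their own measures, such as rank-based ones, and the 0.8 threshold and the `coexpressed_pairs` helper are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0
    return sxy / sqrt(sxx * syy)


def coexpressed_pairs(profiles, threshold=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= threshold.

    profiles maps a gene identifier to its expression values across the
    same ordered set of samples (hypothetical input format).
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges
```

The same function applies unchanged to metabolite-to-metabolite co-accumulation if the profiles hold metabolite levels instead of transcript levels.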
MassBank is the first public repository of mass spectra of small chemical compounds (<3000 Da) for the life sciences. The database contains 605 electron ionization mass spectrometry (EI-MS), 137 fast atom bombardment (FAB) MS and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic metabolite compounds; 11,545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds; and 3045 ESI-MS(2) data of 679 synthetic drugs, contributed by 16 research groups (January 2010). The ESI-MS(2) data were acquired under independent, nonstandardized experimental conditions. MassBank is a distributed database: each research group provides its data from its own MassBank server on the Internet. Users can access either all of the MassBank data or a subset selected by specifying one or more experimental conditions. In a spectral search, which retrieves mass spectra similar to a query spectrum, the similarity score is a weighted cosine correlation whose weighting exponents on peak intensity and mass-to-charge ratio (m/z) are optimized for ESI-MS(2) data. MassBank also provides a merged spectrum for each compound, prepared by merging the ESI-MS(2) data acquired for that compound under different collision-induced dissociation conditions. Data merging significantly improved the precision of compound identification, by 21-23% at a similarity score of 0.6. Thus, MassBank is useful both for identifying chemical compounds and for publishing experimental data.
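The weighted cosine score described above can be sketched as follows, assuming each peak is weighted as (m/z)^n * (intensity)^m and peaks are matched one-to-one within a fixed m/z tolerance. The exponent values (n = 2.0, m = 0.5) and the 0.3 m/z matching tolerance are placeholders, not the values MassBank optimized, and `weighted_cosine` is a hypothetical helper rather than MassBank's implementation.

```python
from math import sqrt


def weighted_cosine(query, reference, mz_tol=0.3, mz_exp=2.0, int_exp=0.5):
    """Weighted cosine similarity between two centroided spectra.

    query and reference are lists of (mz, intensity) pairs. Each peak is
    weighted as (mz ** mz_exp) * (intensity ** int_exp); the exponent
    defaults are placeholders, not MassBank's optimized values.
    """
    def weights(spectrum):
        return [(mz, (mz ** mz_exp) * (inten ** int_exp)) for mz, inten in spectrum]

    q = weights(query)
    r = weights(reference)

    # Greedy one-to-one matching: each query peak takes the nearest unused
    # reference peak within mz_tol and contributes to the dot product.
    used = set()
    dot = 0.0
    for mz_q, w_q in q:
        best = None
        for j, (mz_r, _) in enumerate(r):
            if j in used:
                continue
            diff = abs(mz_q - mz_r)
            if diff <= mz_tol and (best is None or diff < best[0]):
                best = (diff, j)
        if best is not None:
            used.add(best[1])
            dot += w_q * r[best[1]][1]

    # Norms include all peaks, so unmatched peaks lower the score.
    norm_q = sqrt(sum(w * w for _, w in q))
    norm_r = sqrt(sum(w * w for _, w in r))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return dot / (norm_q * norm_r)
```

A score of 1.0 means the weighted peak profiles are identical, and 0.0 means no peaks matched. Raising the m/z exponent gives heavier, more structure-specific fragments more influence on the score.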
Remineralization of organic matter in deep-sea sediments is important in oceanic biogeochemical cycles, and bacteria play a major role in this process. Shewanella violacea DSS12 is a psychrophilic and piezophilic gamma-proteobacterium isolated from the surface layer of deep-sea sediment at a depth of 5110 m. Here, we report the complete genome sequence of S. violacea and compare it with the genome of S. oneidensis MR-1, which was isolated from freshwater lake sediments. Unlike S. oneidensis, this deep-sea Shewanella possesses very few terminal reductases for anaerobic respiration and lacks the c-type cytochromes and outer membrane proteins involved in respiratory Fe(III) reduction, a trait characteristic of most Shewanella species. Instead, the S. violacea genome encodes more terminal oxidases for aerobic respiration and many more putative secreted proteases and polysaccharases than S. oneidensis, in particular enzymes for hydrolyzing collagen, cellulose and chitin. S. violacea also has transporters and assimilatory reductases for nitrate and nitrite, as well as nitric oxide-detoxifying mechanisms (flavohemoglobin and flavorubredoxin). Comparative analysis of the S. violacea genome revealed the bacterium's respiratory adaptation to aerobiosis, which leads to predominantly aerobic oxidation of organic matter in surface sediments, as well as its ability to efficiently use diverse organic matter and to assimilate inorganic nitrogen as a survival strategy on the nutrient-poor deep-sea floor.
The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.
Of the 1.1 million Alu retroposons in the human genome, about 10,000 are inserted in the 3' untranslated regions (UTRs) of protein-coding genes, and 1% of these (107 events) are active as polyadenylation sites (PASs). Strikingly, although Alus in 3' UTRs are inserted in the forward or reverse orientation without bias, 99% of polyadenylation-active Alu sequences are forward oriented. Consensus Alu+ sequences contain sites that can give rise to polyadenylation signals and enhancers through a few point mutations. We found that the strand bias of polyadenylation-active Alus reflects a radical difference between sense and antisense Alus in their fitness for cleavage/polyadenylation activity. Contrary to previous beliefs, Alu inserts are not necessarily weak or cryptic PASs; instead, they often constitute the major or the only PAS in a gene, adding to the growing list of Alu exaptations. Finally, some Alu-borne PASs are intronic and produce truncated transcripts that may affect gene function and/or contribute to gene remodeling.
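The claim that Alu sequences lie "a few point mutations" away from polyadenylation signals can be illustrated by scanning a sequence for hexamers within a small Hamming distance of the canonical signals AATAAA and ATTAAA, on the sense strand and on its reverse complement. This is a toy sketch only: it does not model enhancers or actual cleavage/polyadenylation activity, it is not the authors' method, and `near_pas_sites` and the one-mismatch default are hypothetical.

```python
# Canonical polyadenylation signal hexamers (AATAAA and its common ATTAAA variant).
CANONICAL_PAS = ("AATAAA", "ATTAAA")

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (antisense strand, 5'->3')."""
    return "".join(_COMPLEMENT[b] for b in reversed(seq.upper()))


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def near_pas_sites(seq: str, max_mismatches: int = 1):
    """Return (position, hexamer, mismatches) for every 6-mer on the given
    strand that is within max_mismatches of a canonical PAS hexamer."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        d = min(hamming(hexamer, pas) for pas in CANONICAL_PAS)
        if d <= max_mismatches:
            hits.append((i, hexamer, d))
    return hits
```

Comparing `near_pas_sites(seq)` with `near_pas_sites(revcomp(seq))` for the same insert contrasts what a forward-oriented and a reverse-oriented copy would present to the polyadenylation machinery on the transcribed strand.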
High-accuracy mass values detected by high-resolution mass spectrometry enable prediction of elemental compositions and are therefore used for metabolite annotation in metabolomic studies. Here, we report the use of a relational database to substantially speed up elemental composition prediction. By searching a database of pre-calculated elemental compositions with fixed kinds and numbers of atoms, the approach eliminates the redundant evaluation of the same formulas that occurs when other tools recalculate them for every query. Compared with HR2, one of the fastest tools available, our database search was at least 109 times faster. With a solid-state drive (SSD), it was 488 times faster at a mass tolerance of 5 ppm and 1833 times faster at 0.1 ppm. Even when HR2 was run with 8 threads on a high-spec Windows 7 PC, the database search was at least 26 and 115 times faster without and with the SSD, respectively. The speed-ups were even greater on a low-spec Windows XP PC. We constructed a web service, MFSearcher, to query the database in a RESTful manner. Availability and implementation: Available for free at http://webs2.kazusa.or.jp/mfsearcher. The web service is implemented in Java, MySQL, Apache and Tomcat, and supports all major browsers.
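The core idea of pre-calculating compositions once and then answering each query with a mass-range lookup can be sketched in Python. The sketch assumes a sorted in-memory list searched by binary search as a stand-in for an indexed range query in a relational database (for example, `WHERE mass BETWEEN ? AND ?` on an indexed column). MFSearcher's actual schema, element set, and atom-count limits are not given in the abstract, so `build_table`, `search`, the CHNO-only element set, and the bounds below are hypothetical. No chemical-validity rules are applied, and query masses are assumed to be neutral monoisotopic masses rather than ion m/z values.

```python
from bisect import bisect_left, bisect_right
from itertools import product

# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def format_formula(counts):
    """Hill-style string such as 'C6H12O6' from (element, count) pairs."""
    parts = []
    for element, k in counts:
        if k == 1:
            parts.append(element)
        elif k > 1:
            parts.append(f"{element}{k}")
    return "".join(parts)


def build_table(max_c=20, max_h=40, max_n=4, max_o=10):
    """Pre-compute every CcHhNnOo composition in the given (hypothetical)
    ranges once, sorted by neutral monoisotopic mass.

    Returns parallel lists (masses, formulas) for binary searching."""
    rows = []
    for c, h, n, o in product(range(1, max_c + 1), range(max_h + 1),
                              range(max_n + 1), range(max_o + 1)):
        mass = c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]
        rows.append((mass, format_formula((("C", c), ("H", h), ("N", n), ("O", o)))))
    rows.sort()
    return [m for m, _ in rows], [f for _, f in rows]


def search(masses, formulas, target, ppm=5.0):
    """Return all formulas whose mass lies within +/- ppm of target,
    using two binary searches instead of re-enumerating compositions."""
    tol = target * ppm * 1e-6
    lo = bisect_left(masses, target - tol)
    hi = bisect_right(masses, target + tol)
    return formulas[lo:hi]
```

The table is built once, and each query then costs two O(log N) lookups. This mirrors the abstract's point that repeated per-query enumeration of the same formulas is avoided.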