Other articles by David S. Wishart on PubMed

Probing the Structural Determinants of Type II' Beta-turn Formation in Peptides and Proteins

Journal of the American Chemical Society. Feb, 2002  |  Pubmed ID: 11841288

The structural determinants of type II' beta-turns were probed through a comprehensive CD, NMR, and molecular dynamics analysis of 10 specially designed beta-hairpin peptides. The peptide model used in this study is a synthetic, water-soluble, 14-residue cyclic analogue of gramicidin S which contains two well-defined type II' beta-turns connected by a highly stable, amphipathic, antiparallel beta-sheet. A variety of coded and noncoded amino acids were systematically substituted in one of the two type II' turns to analyze the effects of backbone chirality, side-chain steric restriction, and side-chain/side-chain interactions. beta-Sheet content (as measured through a variety of experimental methods), molecular dynamics, and 3D structural analysis of the turn regions were used to assess the effects of each amino acid substitution on type II' beta-turn stabilization. Our results demonstrate that backbone heterochirality, which determines equatorial and axial side-chain orientation at the i+1 and i+2 residues of type II' turns, may account for up to 60% of type II' beta-turn stabilization. Steric restriction through side-chain N-alkylation appears to enhance type II' beta-turn propensity and may account for up to 20% of type II' beta-turn stabilization. Finally, aromatic/proline side-chain interactions appear to account for approximately 10% of type II' beta-turn stabilization. We believe this information could be particularly useful for the prediction of beta-turn propensity, the development of peptide-based drugs, and the de novo design of peptides, proteins, and peptidyl mimetics.

Identification of a Novel Archaebacterial Thioredoxin: Determination of Function Through Structure

Biochemistry. Apr, 2002  |  Pubmed ID: 11939770

As part of a high-throughput, structural proteomic project we have used NMR spectroscopy to determine the solution structure and ascertain the function of a previously unknown, conserved protein (MtH895) from the thermophilic archaeon Methanobacterium thermoautotrophicum. Our findings indicate that MtH895 contains a central four-stranded beta-sheet core surrounded by two helices on one side and a third on the other. It has an overall fold superficially similar to that of a glutaredoxin. However, detailed analysis of its three-dimensional structure along with molecular docking simulations of its interaction with T7 DNA polymerase (a thioredoxin-specific substrate) and comparisons with other known members of the thioredoxin/glutaredoxin family of proteins strongly suggest that MtH895 is more akin to a thioredoxin. Furthermore, measurement of the pKa values of its active site thiols along with direct measurements of the thioredoxin/glutaredoxin activity has confirmed that MtH895 is, indeed, a thioredoxin and exhibits no glutaredoxin activity. We have also identified a group of previously unknown proteins from several other archaebacteria that have significant (34-44%) sequence identity with MtH895. These proteins have unusual active site -CXXC- motifs not found in any known thioredoxin or glutaredoxin. On the basis of the results presented here, we predict that these small proteins are all members of a new class of truncated thioredoxins.

RefDB: a Database of Uniformly Referenced Protein Chemical Shifts

Journal of Biomolecular NMR. Mar, 2003  |  Pubmed ID: 12652131

RefDB is a secondary database of reference-corrected protein chemical shifts derived from the BioMagResBank (BMRB). The database was assembled by using a recently developed program (SHIFTX) to predict protein 1H, 13C and 15N chemical shifts from X-ray or NMR coordinate data of previously assigned proteins. The predicted shifts were then compared with the corresponding observed shifts and a variety of statistical evaluations were performed. In this way, potential mis-assignments, typographical errors and chemical referencing errors could be identified and, in many cases, corrected. This approach allows for an unbiased, instrument-independent solution to the problem of retrospectively re-referencing published protein chemical shifts. Results from this study indicate that nearly 25% of BMRB entries with 13C protein assignments and 27% of BMRB entries with 15N protein assignments required significant chemical shift reference readjustments. Additionally, nearly 40% of protein entries deposited in the BioMagResBank appear to have at least one assignment error. From this study it is evident that protein NMR spectroscopists are increasingly adhering to recommended IUPAC 13C and 15N chemical shift referencing conventions; however, approximately 20% of newly deposited protein entries in the BMRB are still being incorrectly referenced. This is cause for some concern. However, the utilization of RefDB and its companion programs may help mitigate this ongoing problem. RefDB is updated weekly and the database, along with its associated software, is freely available at and the BMRB website.
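
The comparison of predicted and observed shifts described above can be illustrated with a minimal sketch. It is not the RefDB procedure itself: the use of the median as the offset estimate, the 3.0 ppm tolerance, the function names and the shift values are all assumptions made for illustration.

```python
from statistics import median


def estimate_reference_offset(observed, predicted):
    """Return the median (observed - predicted) difference in ppm.

    Assumption: a systematic referencing error shifts every observed value
    by the same amount, so a robust central estimate recovers it.
    """
    return median(o - p for o, p in zip(observed, predicted))


def flag_outliers(observed, predicted, offset, tolerance):
    """Return indices whose offset-corrected deviation exceeds tolerance (ppm)."""
    return [
        i
        for i, (o, p) in enumerate(zip(observed, predicted))
        if abs((o - offset) - p) > tolerance
    ]


# Hypothetical 13C-alpha shifts (ppm). Every observed value is 2.5 ppm too
# high, and the entry at index 3 also looks like a possible mis-assignment.
predicted = [56.1, 58.3, 52.4, 61.0, 54.7]
observed = [58.6, 60.8, 54.9, 45.2, 57.2]

offset = estimate_reference_offset(observed, predicted)
suspects = flag_outliers(observed, predicted, offset, tolerance=3.0)
corrected = [o - offset for o in observed]
```

The median is used here rather than the mean so that a single badly assigned residue does not distort the estimated offset.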

Backbone 1H, 15N and 13C Assignments for the Human Rhinovirus 3C Protease (serotype 14)

Journal of Biomolecular NMR. May, 2003  |  Pubmed ID: 12766405

Rapid and Accurate Calculation of Protein 1H, 13C and 15N Chemical Shifts

Journal of Biomolecular NMR. Jul, 2003  |  Pubmed ID: 12766419

A computer program (SHIFTX) is described which rapidly and accurately calculates the diamagnetic 1H, 13C and 15N chemical shifts of both backbone and sidechain atoms in proteins. The program uses a hybrid predictive approach that employs pre-calculated, empirically derived chemical shift hypersurfaces in combination with classical or semi-classical equations (for ring current, electric field, hydrogen bond and solvent effects) to calculate 1H, 13C and 15N chemical shifts from atomic coordinates. The chemical shift hypersurfaces capture dihedral angle, sidechain orientation, secondary structure and nearest neighbor effects that cannot easily be translated to analytical formulae or predicted via classical means. The chemical shift hypersurfaces were generated using a database of IUPAC-referenced protein chemical shifts--RefDB (Zhang et al., 2003), and a corresponding set of high resolution (<2.1 Å) X-ray structures. Data mining techniques were used to extract the largest pairwise contributors (from a list of approximately 20 derived geometric, sequential and structural parameters) to generate the necessary hypersurfaces. SHIFTX is rapid (<1 CPU second for a complete shift calculation of 100 residues) and accurate. Overall, the program was able to attain a correlation coefficient (r) between observed and calculated shifts of 0.911 (1Halpha), 0.980 (13Calpha), 0.996 (13Cbeta), 0.863 (13CO), 0.909 (15N), 0.741 (1HN), and 0.907 (sidechain 1H) with RMS errors of 0.23, 0.98, 1.10, 1.16, 2.43, 0.49, and 0.30 ppm, respectively, on test data sets. We further show that the agreement between observed and SHIFTX calculated chemical shifts can be an extremely sensitive measure of the quality of protein structures. Our results suggest that if NMR-derived structures could be refined using heteronuclear chemical shifts calculated by SHIFTX, their precision could approach that of the highest resolution X-ray structures. SHIFTX is freely available as a web server at
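
The two agreement measures quoted above, the correlation coefficient (r) and the RMS error between observed and calculated shifts, can be computed as in the following sketch. The function names and the sample shift values are hypothetical; nothing here reproduces SHIFTX itself.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rms_error(xs, ys):
    """Root-mean-square difference between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical observed and calculated 1H-alpha shifts (ppm).
observed = [4.32, 4.71, 3.98, 4.45, 5.10]
calculated = [4.25, 4.80, 4.10, 4.40, 4.95]

r = pearson_r(observed, calculated)
rmse = rms_error(observed, calculated)
```

The two numbers answer different questions: r measures how well the calculated shifts track the observed trend, while the RMS error reports the typical size of the disagreement in ppm.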

VADAR: a Web Server for Quantitative Evaluation of Protein Structure Quality

Nucleic Acids Research. Jul, 2003  |  Pubmed ID: 12824316

VADAR (Volume Area Dihedral Angle Reporter) is a comprehensive web server for quantitative protein structure evaluation. It accepts Protein Data Bank (PDB) formatted files or PDB accession numbers as input and calculates, identifies, graphs, reports and/or evaluates a large number (>30) of key structural parameters both for individual residues and for the entire protein. These include excluded volume, accessible surface area, backbone and side chain dihedral angles, secondary structure, hydrogen bonding partners, hydrogen bond energies, steric quality, solvation free energy as well as local and overall fold quality. These derived parameters can be used to rapidly identify both general and residue-specific problems within newly determined protein structures. The VADAR web server is freely accessible at

Structural and Functional Characterization of a Thioredoxin-like Protein (Mt0807) from Methanobacterium Thermoautotrophicum

Biochemistry. Jul, 2003  |  Pubmed ID: 12834352

Mt0807 is an 85-residue thiol-redox protein from the anaerobic archaebacterium Methanobacterium thermoautotrophicum. Its small size, its participation in certain redox reactions, and the presence of a "classic" glutaredoxin active-site sequence have led to the suggestion that it might be a glutaredoxin. However, studies by previous workers indicated that it exhibited neither glutaredoxin-like nor thioredoxin-like properties. To clarify the true role of this protein and its structure/functional relationship with a paralogous thioredoxin (Mt0895, 28% sequence identity) and a recently characterized orthologous protein (Mj0307, 51% sequence identity), we undertook a series of biochemical and biophysical studies. Comparative enzymatic assays and thiol titration experiments were combined with NMR structural studies and detailed 3D structure comparisons. Structurally, our results show that Mt0807 has a glutaredoxin-like fold (central four-stranded beta-sheet core surrounded by two helices on one side and a third on the other). However, more detailed comparisons with other members of the thioredoxin superfamily indicate that Mt0807 actually has several key structural and active-site characteristics more common to a thioredoxin. Furthermore, biochemical tests show that Mt0807 actually behaves as a true thioredoxin. Comparisons between Mt0807 and its paralogue, Mt0895, indicate these two archaebacterial thioredoxins share very similar folds, but exhibit very different activities and likely serve somewhat different roles. On the basis of its greater relative abundance and significantly stronger redox activity, we believe that Mt0807 is the primary thioredoxin for M. thermoautotrophicum, while Mt0895 plays a minor or supportive role. We also suggest that these two molecules (Mt0807 and Mt0895) may represent a group of ancient proteins that were ancestral to both thioredoxins and glutaredoxins.

The CyberCell Database (CCDB): a Comprehensive, Self-updating, Relational Database to Coordinate and Facilitate in Silico Modeling of Escherichia Coli

Nucleic Acids Research. Jan, 2004  |  Pubmed ID: 14681416

The CyberCell Database (CCDB: http://redpoll. is a comprehensive, web-accessible database designed to support and coordinate international efforts in modeling an Escherichia coli cell on a computer. The CCDB brings together both observed and derived quantitative data from numerous independent sources covering many aspects of the genomic, proteomic and metabolomic character of E.coli (strain K12). The database is self-updating but also supports 'community' annotation, and provides an extensive array of viewing, querying and search options including a powerful, easy-to-use relational data extraction system.

GelScape: a Web-based Server for Interactively Annotating, Manipulating, Comparing and Archiving 1D and 2D Gel Images

Bioinformatics (Oxford, England). Apr, 2004  |  Pubmed ID: 14764570

GelScape is a web-based tool that permits facile, interactive annotation, comparison, manipulation and storage of protein gel images. It uses Java applet-servlet technology to allow rapid, remote image handling and image processing in a platform-independent manner. It supports many of the features found in commercial, stand-alone gel analysis software including spot annotation, spot integration, gel warping, image resizing, HTML image mapping, image overlaying as well as the storage of gel image and gel annotation data in compliance with Federated Gel Database requirements.

Solution Structures of Reduced and Oxidized Bacteriophage T4 Glutaredoxin

Journal of Biomolecular NMR. May, 2004  |  Pubmed ID: 15017142

NMR Solution Structure of the Precursor for Carnobacteriocin B2, an Antimicrobial Peptide from Carnobacterium Piscicola

European Journal of Biochemistry. May, 2004  |  Pubmed ID: 15096213

Type IIa bacteriocins, which are isolated from lactic acid bacteria that are useful for food preservation, are potent antimicrobial peptides with considerable potential as therapeutic agents for gastrointestinal infections in mammals. They are ribosomally synthesized as precursors with an N-terminal leader, typically 18-24 amino acid residues in length, which is cleaved during export from the producing cell. We have chemically synthesized the full precursor of carnobacteriocin B2, precarnobacteriocin (preCbnB2), which has a C-terminal amide rather than a carboxyl, and also produced preCbnB2(1-64), which is missing two amino acid residues at the C-terminus (Arg65 and Pro66), via expression in Escherichia coli as a maltose-binding protein fusion that is then cut with Factor Xa. PreCbnB2(1-64) is readily labeled with 15N and 13C for NMR studies using the latter approach. Multidimensional NMR analysis of preCbnB2(1-64) shows that, like the parent bacteriocin, it exists as a random coil in water but assumes a defined conformation in water/trifluoroethanol mixtures. In 70:30 trifluoroethanol/water, the 3D structure of the preCbnB2 section corresponding to the mature bacteriocin is essentially the same as reported previously by us for carnobacteriocin B2 (CbnB2). This structure maintains the highly conserved alpha-helix corresponding to residues 20-38 of CbnB2 that is believed to be responsible for interaction with a target receptor in sensitive cells, including Listeria monocytogenes. PreCbnB2 also has a second alpha-helix from residues 3-13 (i.e. -15 to -5 relative to CbnB2) in the leader section of the peptide. This helix appears to be conserved in related type IIa bacteriocin precursors based on sequence analysis. It is likely to be a key recognition element for export and processing, and is probably responsible for the considerably reduced antimicrobial activity of preCbnB2. The latter effect may assist the producing cell in avoiding the toxic effects of the bacteriocin. This is the first 3D structure determined for a prebacteriocin from lactic acid bacteria.

Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-throughput Proteome Annotations

Nucleic Acids Research. Jul, 2004  |  Pubmed ID: 15215412

Proteome Analyst (PA) ( is a publicly available, high-throughput, web-based system for predicting various properties of each protein in an entire proteome. Using machine-learned classifiers, PA can predict, for example, the GeneQuiz general function and Gene Ontology (GO) molecular function of a protein. In addition, PA is currently the most accurate and most comprehensive system for predicting subcellular localization, the location within a cell where a protein performs its main function. Two other capabilities of PA are notable. First, PA can create a custom classifier to predict a new property, without requiring any programming, based on labeled training data (i.e. a set of examples, each with the correct classification label) provided by a user. PA has been used to create custom classifiers for potassium-ion channel proteins and other general function ontologies. Second, PA provides a sophisticated explanation feature that shows why one prediction is chosen over another. The PA system produces a Naïve Bayes classifier, which is amenable to a graphical and interactive approach to explanations for its predictions; transparent predictions increase the user's confidence in, and understanding of, PA.
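
The idea of a Naïve Bayes classifier whose predictions can be explained can be sketched as follows. This is a simplified illustration, not PA's implementation: the keyword tokens, the training examples, the Laplace smoothing and the choice to score only tokens present in the query are all assumptions.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """examples: list of (set_of_tokens, label) pairs. Returns the count tables."""
    label_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocab |= tokens
    return label_counts, token_counts, vocab


def explain(model, tokens):
    """Return {label: (log_score, {token: log_contribution})} for a query."""
    label_counts, token_counts, vocab = model
    total = sum(label_counts.values())
    result = {}
    for label, n in label_counts.items():
        contributions = {}
        for tok in tokens & vocab:
            # Laplace-smoothed estimate of P(token present | label).
            p = (token_counts[label][tok] + 1) / (n + 2)
            contributions[tok] = log(p)
        score = log(n / total) + sum(contributions.values())
        result[label] = (score, contributions)
    return result


def predict(model, tokens):
    """Return the label with the highest log score."""
    scores = explain(model, tokens)
    return max(scores, key=lambda label: scores[label][0])


# Hypothetical training data: keyword sets labeled with a localization.
examples = [
    ({"kinase", "atp"}, "cytoplasm"),
    ({"kinase", "signal"}, "cytoplasm"),
    ({"transporter", "membrane"}, "membrane"),
    ({"channel", "membrane"}, "membrane"),
]
model = train(examples)
query = {"membrane", "channel"}
label = predict(model, query)
evidence = explain(model, query)
```

Because the score is a sum of per-token log terms, `evidence` shows directly which tokens favored one label over another, which is the property that makes this family of classifiers easy to explain.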

SuperPose: a Simple Server for Sophisticated Structural Superposition

Nucleic Acids Research. Jul, 2004  |  Pubmed ID: 15215457

The SuperPose web server rapidly and robustly calculates both pairwise and multiple protein structure superpositions using a modified quaternion eigenvalue approach. SuperPose generates sequence alignments, structure alignments, PDB (Protein Data Bank) coordinates and RMSD statistics, as well as difference distance plots and images (both static and interactive) of the superimposed molecules. SuperPose employs a simple interface that requires only PDB files or accession numbers as input. All other superposition decisions are made by the program. SuperPose is uniquely able to superimpose structures that differ substantially in sequence, size or shape. It is also capable of handling a much larger range of superposition queries and situations than many standalone programs and yields results that are intuitively more in agreement with known biological or structural data. The SuperPose web server is freely accessible at

PlasMapper: a Web Server for Drawing and Auto-annotating Plasmid Maps

Nucleic Acids Research. Jul, 2004  |  Pubmed ID: 15215471

PlasMapper is a comprehensive web server that automatically generates and annotates high-quality circular plasmid maps. Taking only the plasmid/vector DNA sequence as input, PlasMapper uses sequence pattern matching and BLAST alignment to automatically identify and label common promoters, terminators, cloning sites, restriction sites, reporter genes, affinity tags, selectable marker genes, replication origins and open reading frames. PlasMapper then presents the identified features in textual form and as high-resolution, multicolored graphical output. The appearance and contents of the output can be customized in numerous ways using several supplied options. Further, PlasMapper images can be rendered in both rasterized (PNG and JPG) and vector graphics (SVG) formats to accommodate a variety of user needs or preferences. The images and textual output are of sufficient quality that they may be used directly in publications or presentations. The PlasMapper web server is freely accessible at

Dynamic Relationships Among Type IIa Bacteriocins: Temperature Effects on Antimicrobial Activity and on Structure of the C-terminal Amphipathic Alpha Helix As a Receptor-binding Region

Biochemistry. Jul, 2004  |  Pubmed ID: 15248758

Dynamic aspects of structural relationships among class IIa bacteriocins, which are antimicrobial peptides from lactic acid bacteria (LAB), have been examined by use of circular dichroism (CD), molecular dynamics (MD) simulations, and activity testing. Pediocin PA-1 is a potent class IIa bacteriocin, which contains a second C-terminal disulfide bond in addition to the highly conserved N-terminal disulfide bond. A mutant of pediocin PA-1, ped[M31Nle], wherein the replacement of methionine by norleucine (Nle) gives enhanced stability toward aerobic oxidation, was synthesized by solid-phase peptide synthesis to study the activity of the peptide in relation to its structure. The secondary structural analysis from CD spectra of ped[M31Nle], carnobacteriocin B2 (cbnB2), and leucocin A (leuA) at different temperatures suggests that the alpha-helical region of these peptides is important for target recognition and activity. Using molecular modeling and dynamic simulations, complete models of pediocin PA-1, enterocin P, sakacin P, and curvacin A in 2,2,2-trifluoroethanol (TFE) were generated to compare structural relationships among this class of bacteriocins. Their high sequence similarity allows for the use of homology modeling techniques. Starting from homology models based on solution structures of leuA (PDB code 1CW6) and cbnB2 (PDB code 1CW5), results of 2-4 ns MD simulations in TFE and water at 298 and 313 K are reported. The results indicate that these peptides have a common helical C-terminal domain in TFE but a more variable beta sheet or coiled N terminus. At elevated temperatures, pediocin PA-1 maintains its overall structure, whereas peptides without the second C-terminal disulfide bond, such as enterocin P, sakacin P, curvacin A, leuA, and cbnB2, experience partial disruption of the helical section. Pediocin PA-1 and ped[M31Nle] were found to be equally active at different temperatures, whereas the other peptides that lack the second C-terminal disulfide bond are 30-50 times less antimicrobially potent at 310 K (37 degrees C) than at 298 K (25 degrees C). These results indicate that the structural changes in the helical region observed at elevated temperatures account for the loss of activity of these peptides. The presence of C-terminal hydrophobic residues on one side of the amphipathic helix in class IIa bacteriocins is an important feature for receptor recognition and specificity toward particular organisms. This study assists in the understanding of structure-activity relationships in type IIa bacteriocins and demonstrates the importance of the conserved C-terminal amphipathic alpha helix for activity.

Synthesis and Evaluation of Keto-glutamine Analogues As Potent Inhibitors of Severe Acute Respiratory Syndrome 3CLpro

Journal of Medicinal Chemistry. Dec, 2004  |  Pubmed ID: 15566280

The 3C-like proteinase (3CLpro) of severe acute respiratory syndrome (SARS) coronavirus is a key target for structure-based drug design against this viral infection. The enzyme recognizes peptide substrates with a glutamine residue at the P1 site. A series of keto-glutamine analogues with a phthalhydrazido group at the alpha-position were synthesized and tested as reversible inhibitors against SARS 3CLpro. Attachment of tripeptide (Ac-Val-Thr-Leu) to these glutamine-based "warheads" generated significantly better inhibitors (4a-c, 8a-d) with IC50 values ranging from 0.60 to 70 µM.

Complete 1H, 13C and 15N NMR Assignments of MTH0776 from Methanobacterium Thermoautotrophicum

Journal of Biomolecular NMR. Dec, 2004  |  Pubmed ID: 15630567

Circular Genome Visualization and Exploration Using CGView

Bioinformatics (Oxford, England). Feb, 2005  |  Pubmed ID: 15479716

CGView (Circular Genome Viewer) is a Java application and library for generating high-quality, zoomable maps of circular genomes. It converts XML or tab-delimited input into a graphical map (PNG, JPG or Scalable Vector Graphics format), complete with sequence features, labels, legends and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any Web browser, allowing rapid genome browsing and facilitating data sharing.

PA-GOSUB: a Searchable Database of Model Organism Protein Sequences with Their Predicted Gene Ontology Molecular Function and Subcellular Localization

Nucleic Acids Research. Jan, 2005  |  Pubmed ID: 15608166

PA-GOSUB (Proteome Analyst: Gene Ontology Molecular Function and Subcellular Localization) is a publicly available, web-based, searchable and downloadable database that contains the sequences, predicted GO molecular functions and predicted subcellular localizations of more than 107,000 proteins from 10 model organisms (and growing), covering the major kingdoms and phyla for which annotated proteomes exist ( The PA-GOSUB database effectively expands the coverage of subcellular localization and GO function annotations by a significant factor (already over five for subcellular localization, compared with Swiss-Prot v42.7), and more model organisms are being added to PA-GOSUB as their sequenced proteomes become available. PA-GOSUB can be used in three main ways. First, a researcher can browse the pre-computed PA-GOSUB annotations on a per-organism and per-protein basis using annotation-based and text-based filters. Second, a user can perform BLAST searches against the PA-GOSUB database and use the annotations from the homologs as simple predictors for the new sequences. Third, the whole of PA-GOSUB can be downloaded in either FASTA or comma-separated values (CSV) formats.

BacMap: an Interactive Picture Atlas of Annotated Bacterial Genomes

Nucleic Acids Research. Jan, 2005  |  Pubmed ID: 15608206

BacMap is an interactive visual database containing fully labeled, zoomable and searchable chromosome maps from more than 170 bacterial (archaebacterial and eubacterial) species. It uses a recently developed visualization tool (CGView) to generate high-resolution circular genome maps from sequence feature information. Each map includes an interface that allows the image to be expanded and rotated. In the default view, identified genes are drawn to scale and colored according to coding directions. When a region of interest is expanded, gene labels are displayed. Each label is hyperlinked to a custom 'gene card' which provides several fields of information concerning the corresponding DNA and protein sequences. Each genome map is searchable via a local BLAST search and a gene name/synonym search. BacMap is freely available at

Identification of Novel and Known Oocyte-specific Genes Using Complementary DNA Subtraction and Microarray Analysis in Three Different Species

Biology of Reproduction. Jul, 2005  |  Pubmed ID: 15744023

The main objective of the present study was to identify novel oocyte-specific genes in three different species: bovine, mouse, and Xenopus laevis. To achieve this goal, two powerful technologies were combined: a polymerase chain reaction (PCR)-based cDNA subtraction, and cDNA microarrays. Three subtractive libraries consisting of 3456 clones were established and enriched for oocyte-specific transcripts. Sequencing analysis of the positive insert-containing clones resulted in the following classification: 53% of the clones corresponded to known cDNAs, 26% were classified as uncharacterized cDNAs, and a final 9% were classified as novel sequences. All these clones were used for cDNA microarray preparation. Results from these microarray analyses revealed that in addition to already known oocyte-specific genes, such as GDF9, BMP15, and ZP, known genes with unknown function in the oocyte were identified, such as a MLF1-interacting protein (MLF1IP), B-cell translocation gene 4 (BTG4), and phosphotyrosine-binding protein (xPTB). Furthermore, 15 novel oocyte-specific genes were validated by reverse transcription-PCR to confirm their preferential expression in the oocyte compared to somatic tissues. The results obtained in the present study confirmed that microarray analysis is a robust technique to identify true positives from the suppressive subtractive hybridization experiment. Furthermore, obtaining oocyte-specific genes from three species simultaneously allowed us to look at important genes that are conserved across species. Further characterization of these novel oocyte-specific genes will lead to a better understanding of the molecular mechanisms related to the unique functions found in the oocyte.

A Simple Method to Adjust Inconsistently Referenced 13C and 15N Chemical Shift Assignments of Proteins

Journal of Biomolecular NMR. Feb, 2005  |  Pubmed ID: 15772753

Inconsistent 13C and 15N chemical shift referencing is a continuing problem associated with protein chemical shift assignments deposited in BioMagResBank (BMRB). Here we describe a simple and robust approach that can quantitatively determine the 13C and 15N referencing offsets solely from chemical shift assignment data and independently of 3D coordinate data. This novel structure-independent approach permitted the assessment and determination of 13C and 15N reference offsets for all protein entries deposited in the BMRB. Tests on 452 proteins with known 3D structures show that this structure-independent approach yields 13C and 15N referencing offsets that exhibit excellent agreement with those calculated on the basis of 3D structures. Furthermore, this protocol appears to improve the accuracy of chemical shift-derived secondary structural identification, and has been formally incorporated into a computer program called PSSI (http//

Bioinformatics in Drug Development and Assessment

Drug Metabolism Reviews. 2005  |  Pubmed ID: 15931766

Bioinformatics is playing an increasingly important role in nearly all aspects of drug discovery, drug assessment, and drug development. This growing importance lies not only in the role that bioinformatics plays in handling large volumes of data, but also in the utility of bioinformatics tools to predict, analyze, or help interpret clinical and preclinical findings. This review focuses on describing and evaluating some of the newer or more important bioinformatics resources (i.e., databases and software) that are of growing importance to understanding or predicting drug metabolism, especially with respect to the absorption, distribution, metabolism, excretion (ADME), and toxicity (T) of both existing drugs and potential drug leads. Detailed descriptions and critical assessments of a number of potentially useful bioinformatics/cheminformatics databases and predictive ADMET software tools are provided. Additionally, several pharmaceutically important applications of both the databases and software are highlighted. Given the rapid growth in this area and the rapid changes that are taking place, a special emphasis is placed on freely available or Web-accessible resources.

Dynamic Cellular Automata: an Alternative Approach to Cellular Simulation

In Silico Biology. 2005  |  Pubmed ID: 15972011

A wide variety of approaches, ranging from Petri nets to systems of partial differential equations, have been used to model very specific aspects of cellular or biochemical functions. Here we describe how an agent-based or dynamic cellular automata (DCA) approach can be used as a very simple, yet very general method to model many different kinds of cellular or biochemical processes. Specifically, using simple pairwise interaction rules coupled with random object moves to simulate Brownian motion, we show how the DCA approach can be used to easily and accurately model diffusion, viscous drag, enzyme rate processes, metabolism (the Krebs cycle), and complex genetic circuits (the repressilator). We also demonstrate how DCA approaches are able to accurately capture the stochasticity of many biological processes. The success and simplicity of this technique suggest that many other physical properties and significantly more complicated aspects of cellular behavior could be modeled using DCA methods. An easy-to-use, graphically based computer program, called SimCell, was developed to perform the DCA simulations described here. It is available at
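
The combination of random object moves and simple pairwise interaction rules can be sketched as a toy simulation. It is not SimCell: the periodic square grid, the single rule (a substrate sharing a cell with an enzyme becomes product), the particle counts and all names are assumptions chosen to keep the example small.

```python
import random


def step(particles, size, rng):
    """Move every particle one random step on a periodic size x size grid."""
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for p in particles:
        dx, dy = rng.choice(moves)
        p["x"] = (p["x"] + dx) % size
        p["y"] = (p["y"] + dy) % size


def react(particles):
    """Pairwise rule: a substrate 'S' in the same cell as an enzyme 'E' becomes 'P'."""
    enzyme_cells = {(p["x"], p["y"]) for p in particles if p["kind"] == "E"}
    for p in particles:
        if p["kind"] == "S" and (p["x"], p["y"]) in enzyme_cells:
            p["kind"] = "P"


def simulate(n_enzyme, n_substrate, size, n_steps, seed=0):
    """Run the automaton and return the product count after each step."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    kinds = ["E"] * n_enzyme + ["S"] * n_substrate
    particles = [
        {"kind": k, "x": rng.randrange(size), "y": rng.randrange(size)}
        for k in kinds
    ]
    history = []
    for _ in range(n_steps):
        step(particles, size, rng)  # random moves stand in for Brownian motion
        react(particles)            # then apply the interaction rule
        history.append(sum(p["kind"] == "P" for p in particles))
    return history


history = simulate(n_enzyme=5, n_substrate=50, size=10, n_steps=200)
```

Running `simulate` with different seeds gives different product curves, which is a small-scale view of the stochasticity the abstract refers to.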

MovieMaker: a Web Server for Rapid Rendering of Protein Motions and Interactions

Nucleic Acids Research. Jul, 2005  |  Pubmed ID: 15980488

MovieMaker is a web server that allows short (approximately 10 s), downloadable movies of protein motions to be generated. It accepts PDB files or PDB accession numbers as input and automatically calculates, renders and merges the necessary image files to create colourful animations covering a wide range of protein motions and other dynamic processes. Users have the option of animating (i) simple rotation, (ii) morphing between two end-state conformers, (iii) short-scale, picosecond vibrations, (iv) ligand docking, (v) protein oligomerization, (vi) mid-scale nanosecond (ensemble) motions and (vii) protein folding/unfolding. MovieMaker does not perform molecular dynamics calculations. Instead it is an animation tool that uses a sophisticated superpositioning algorithm in conjunction with Cartesian coordinate interpolation to rapidly and automatically calculate the intermediate structures needed for many of its animations. Users have extensive control over the rendering style, structure colour, animation quality, background and other image features. MovieMaker is intended to be a general-purpose server that allows both experts and non-experts to easily generate useful, informative protein animations for educational and illustrative purposes. MovieMaker is accessible at
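
Cartesian coordinate interpolation between two end-state conformers can be sketched as below. The sketch assumes the two structures have already been superimposed and list the same atoms in the same order, and it uses plain linear interpolation; the function name and the coordinates are hypothetical and do not describe MovieMaker's own code.

```python
def interpolate_frames(start, end, n_frames):
    """Return n_frames coordinate sets morphing linearly from start to end.

    start, end: lists of (x, y, z) tuples for the same atoms in the same order.
    """
    if len(start) != len(end):
        raise ValueError("both conformers must contain the same atoms")
    if n_frames < 2:
        raise ValueError("need at least the two end-state frames")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append(
            [
                tuple(a + t * (b - a) for a, b in zip(p, q))
                for p, q in zip(start, end)
            ]
        )
    return frames


# Two hypothetical two-atom conformers (coordinates in angstroms).
conformer_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformer_b = [(0.0, 2.0, 0.0), (1.0, 2.0, 4.0)]

frames = interpolate_frames(conformer_a, conformer_b, n_frames=5)
```

Straight-line interpolation can pass through physically unrealistic intermediates (for example, distorted bond lengths during large rotations), so the frames are suited to illustration rather than to any energetic interpretation.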

BASys: a Web Server for Automated Bacterial Genome Annotation

Nucleic Acids Research. Jul, 2005  |  Pubmed ID: 15980511

BASys (Bacterial Annotation System) is a web server that supports automated, in-depth annotation of bacterial genomic (chromosomal and plasmid) sequences. It accepts raw DNA sequence data and an optional list of gene identification information and provides extensive textual annotation and hyperlinked image output. BASys uses >30 programs to determine approximately 60 annotation subfields for each gene, including gene/protein name, GO function, COG function, possible paralogues and orthologues, molecular weight, isoelectric point, operon structure, subcellular localization, signal peptides, transmembrane regions, secondary structure, 3D structure, reactions and pathways. The depth and detail of a BASys annotation matches or exceeds that found in a standard SwissProt entry. BASys also generates colorful, clickable and fully zoomable maps of each query chromosome to permit rapid navigation and detailed visual analysis of all resulting gene annotations. The textual annotations and images that are provided by BASys can be generated in approximately 24 h for an average bacterial chromosome (5 Mb). BASys annotations may be viewed and downloaded anonymously or through a password protected access system. The BASys server and databases can also be downloaded and run locally. BASys is accessible at

Solution Structure of MTH0776 from Methanobacterium Thermoautotrophicum

Journal of Biomolecular NMR. Sep, 2005  |  Pubmed ID: 16222557

A Simple Method to Predict Protein Flexibility Using Secondary Chemical Shifts

Journal of the American Chemical Society. Nov, 2005  |  Pubmed ID: 16248604

Protein motions play a critical role in many biological processes, such as enzyme catalysis, allosteric regulation, antigen-antibody interactions, and protein-DNA binding. NMR spectroscopy occupies a unique place among methods for investigating protein dynamics due to its ability to provide site-specific information about protein motions over a large range of time scales. However, most NMR methods require a detailed knowledge of the 3D structure and/or the collection of additional experimental data (NOEs, T1, T2, etc.) to accurately measure protein dynamics. Here we present a simple method based on chemical shift data that allows accurate, quantitative, site-specific mapping of protein backbone mobility without the need of a three-dimensional structure or the collection and analysis of NMR relaxation data. Further, we show that this chemical shift method is able to quantitatively predict per-residue RMSD values (from both MD simulations and NMR structural ensembles) as well as model-free backbone order parameters.

DrugBank: a Comprehensive Resource for in Silico Drug Discovery and Exploration

Nucleic Acids Research. Jan, 2006  |  Pubmed ID: 16381955

DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14,000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at

Improving the Accuracy of Protein Secondary Structure Prediction Using Structural Alignment

BMC Bioinformatics. Jun, 2006  |  Pubmed ID: 16774686

The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high.
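
For readers unfamiliar with the Q3 score, the short sketch below shows how it is commonly computed: the percentage of residues whose predicted three-state label (helix H, strand E, coil C) matches the observed label. The function name and the example strings are made up for illustration.

```python
def q3_accuracy(predicted, observed):
    """Return the percentage of residues with a matching three-state label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Hypothetical 8-residue example with one mismatch at the last position.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```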

Accurate Prediction of Protein Torsion Angles Using Chemical Shifts and Sequence Homology

Magnetic Resonance in Chemistry : MRC. Jul, 2006  |  Pubmed ID: 16823900

Torsion angle restraints are frequently used in the determination and refinement of protein structures by NMR. These restraints may be obtained by J coupling, cross-correlation measurements, nuclear Overhauser effects (NOEs) or secondary chemical shifts. Currently most backbone (phi/psi) torsion angles are determined using a combination of J(HNHalpha) couplings and chemical shift measurements while most side-chain (chi1) angles and cis/trans peptide bond angles (omega) are determined via NOEs. The dependency on multiple experimental (and computational) methods to obtain different torsion angle restraints is both time-consuming and error prone. The situation could be greatly improved if the determination of all torsion angles (phi, psi, chi and omega) could be made via a single type of measurement (i.e. chemical shifts). Here we describe a program, called SHIFTOR, that is able to accurately predict a large number of protein torsion angles (phi, psi, omega, chi1) using only 1H, 13C and 15N chemical shift assignments as input. Overall, the program is 100x faster and its predictions are approximately 20% better than existing methods. The program is also capable of predicting chi1 angles with 81% accuracy and omega angles with 100% accuracy. SHIFTOR exploits many of the recent developments and observations regarding chemical shift dependencies as well as using information in the Protein Databank to improve the quality of its shift-derived torsion angle predictions. SHIFTOR is available as a freely accessible web server at

PREDITOR: a Web Server for Predicting Protein Torsion Angle Restraints

Nucleic Acids Research. Jul, 2006  |  Pubmed ID: 16845087

Every year between 500 and 1000 peptide and protein structures are determined by NMR and deposited into the Protein Data Bank. However, the process of NMR structure determination continues to be a manually intensive and time-consuming task. One of the most tedious and error-prone aspects of this process involves the determination of torsion angle restraints including phi, psi, omega and chi angles. Most methods require many days of additional experiments, painstaking measurements or complex calculations. Here we describe a web server, called PREDITOR, which greatly accelerates and simplifies this task. PREDITOR accepts sequence and/or chemical shift data as input and generates torsion angle predictions (with predicted errors) for phi, psi, omega and chi-1 angles. PREDITOR combines sequence alignment methods with advanced chemical shift analysis techniques to generate its torsion angle predictions. The method is fast (<40 s per protein) and accurate, with 88% of phi/psi predictions being within 30 degrees of the correct values, 84% of chi-1 predictions being correct and 99.97% of omega angles being correct. PREDITOR is 35 times faster and up to 20% more accurate than any existing method. PREDITOR also provides accurate assessments of the torsion angle errors so that the torsion angle constraints can be readily fed into standard structure refinement programs, such as CNS, XPLOR, AMBER and CYANA. Other unique features of PREDITOR include dihedral angle prediction via PDB structure mapping, automated chemical shift re-referencing (to improve accuracy), prediction of proline cis/trans states and a simple user interface. The PREDITOR website is located at:
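
A criterion such as "within 30 degrees of the correct value" has to account for the periodicity of torsion angles, since 170 and -175 degrees are only 15 degrees apart. The sketch below shows one way to score predictions against reference angles; the function names and example values are hypothetical and are not taken from PREDITOR.

```python
def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_within(predicted, reference, tolerance=30.0):
    """Fraction of predicted angles within `tolerance` degrees of reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("angle lists must be non-empty and equal in length")
    hits = sum(
        angular_difference(p, r) <= tolerance
        for p, r in zip(predicted, reference)
    )
    return hits / len(reference)


if __name__ == "__main__":
    predicted_phi = [-60.0, 170.0, 120.0]
    reference_phi = [-65.0, -175.0, 60.0]
    print(fraction_within(predicted_phi, reference_phi))
```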

Automated Bacterial Genome Analysis and Annotation

Current Opinion in Microbiology. Oct, 2006  |  Pubmed ID: 16931121

More than 300 bacterial genome sequences are publicly available, and many more are scheduled to be completed and released in the near future. Converting this raw sequence information into a better understanding of the biology of bacteria involves the identification and annotation of genes, proteins and pathways. This processing is typically done using sequence annotation pipelines comprised of a variety of software modules and, in some cases, human experts. The reference databases, computational methods and knowledge that form the basis of these pipelines are constantly evolving, and thus there is a need to reprocess genome annotations on a regular basis. The combined challenge of revising existing annotations and extracting useful information from the flood of new genome sequences will necessitate more reliance on completely automated systems.

Metabolomics in Monitoring Kidney Transplants

Current Opinion in Nephrology and Hypertension. Nov, 2006  |  Pubmed ID: 17053480

The success of any given kidney transplant is closely tied to the ability to monitor patients and responsively change their medications. Transplant monitoring is still, however, dependent on relatively old technologies: serum creatinine levels, urine output, blood pressure, blood glucose and histopathology of biopsy samples. These older technologies do not offer sufficient specificity, sensitivity, or accuracy to allow appropriate and timely interventions. Using the tools of genomics, proteomics and metabolomics new biomarkers are being found that may greatly improve transplant monitoring and significantly enhance graft survival. This review describes the basic principles of metabolomics and summarizes a number of recent developments in the use of metabolite biomarkers and metabolomics to monitor kidney transplants.

NMR: Prediction of Protein Flexibility

Nature Protocols. 2006  |  Pubmed ID: 17406296

We present a protocol for predicting protein flexibility from NMR chemical shifts. The protocol consists of (i) ensuring that the chemical shift assignments are correctly referenced or, if not, performing a reference correction using information derived from the chemical shift index, (ii) calculating the random coil index (RCI), and (iii) predicting the expected root mean square fluctuations (RMSFs) and order parameters (S2) of the protein from the RCI. The key advantages of this protocol over existing methods for studying protein dynamics are that (i) it does not require prior knowledge of a protein's tertiary structure, (ii) it is not sensitive to the protein's overall tumbling and (iii) it does not require additional NMR measurements beyond the standard experiments for backbone assignments. When chemical shift assignments are available, protein flexibility parameters, such as S2 and RMSF, can be calculated within 1-2 h using a spreadsheet program.

HMDB: the Human Metabolome Database

Nucleic Acids Research. Jan, 2007  |  Pubmed ID: 17202168

The Human Metabolome Database (HMDB) is currently the most complete and comprehensive curated collection of human metabolite and human metabolism data in the world. It contains records for more than 2180 endogenous metabolites with information gathered from thousands of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectra (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided. The HMDB is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. The HMDB is available at:

Computational Systems Biology in Drug Discovery and Development: Methods and Applications

Drug Discovery Today. Apr, 2007  |  Pubmed ID: 17395089

Computational systems biology is an emerging field in biological simulation that attempts to model or simulate intra- and intercellular events using data gathered from genomic, proteomic or metabolomic experiments. The need to model complex temporal and spatiotemporal processes at many different scales has led to the emergence of numerous techniques, including systems of differential equations, Petri nets, cellular automata simulators, agent-based models and pi calculus. This review provides a brief summary and an assessment of most of these approaches. It also provides examples of how these methods are being used to facilitate drug discovery and development.

The RCI Server: Rapid and Accurate Calculation of Protein Flexibility Using Chemical Shifts

Nucleic Acids Research. Jul, 2007  |  Pubmed ID: 17485469

Protein motions play important roles in numerous biological processes such as enzyme catalysis, muscle contractions, antigen-antibody interactions, gene regulation and virus assembly. Knowledge of protein flexibility is also important in rational drug design, protein docking and protein engineering. However, the experimental measurement of protein motions is often difficult, requiring sophisticated experiments, complex data analysis and detailed information about the protein's tertiary structure. As a result, there is a considerable interest in developing simpler, more effective ways of quantifying protein flexibility. Recently, we described a method, called the random coil index (RCI), which is able to quantitatively estimate backbone root mean square fluctuations (RMSFs) of structural ensembles and order parameters using only chemical shifts. The RCI method is very fast (<5 s) and exceedingly robust. It also offers an excellent alternative to traditional methods of measuring protein flexibility. We have recently extended the RCI concept and implemented it as a web server. This server allows facile, accurate and fully automated predictions of MD RMSF values, NMR RMSF values and model-free order parameters (S2) directly from chemical shift assignments. It also performs automatic chemical shift re-referencing to ensure consistency and reproducibility. On average, the correlation between RCI predictions and experimentally obtained motional amplitudes is within the range from 0.77 to 0.82. The server is available at

Proteomics and the Human Metabolome Project

Expert Review of Proteomics. Jun, 2007  |  Pubmed ID: 17552914

Drug-target Discovery in Silico: Using the Web to Identify Novel Molecular Targets for Drug Action

SEB Experimental Biology Series. 2007  |  Pubmed ID: 17608241

Current Progress in Computational Metabolomics

Briefings in Bioinformatics. Sep, 2007  |  Pubmed ID: 17626065

Being a relatively new addition to the 'omics' field, metabolomics is still evolving its own computational infrastructure and assessing its own computational needs. Due to its strong emphasis on chemical information and because of the importance of linking that chemical data to biological consequences, metabolomics must combine elements of traditional bioinformatics with traditional cheminformatics. This is a significant challenge as these two fields have evolved quite separately and require very different computational tools and skill sets. This review is intended to familiarize readers with the field of metabolomics and to outline the needs, the challenges and the recent progress being made in four areas of computational metabolomics: (i) metabolomics databases; (ii) metabolomics LIMS; (iii) spectral analysis tools for metabolomics and (iv) metabolic modeling.

NMR Solution Structures of the Apo and Peptide-inhibited Human Rhinovirus 3C Protease (Serotype 14): Structural and Dynamic Comparison

Biochemistry. Nov, 2007  |  Pubmed ID: 17944485

The human rhinovirus (HRV) is a positive sense RNA virus responsible for about 30% of "common colds". It relies on a 182 residue cysteine protease (3C) to proteolytically process its single gene product. Inhibition of this enzyme in vitro and in vivo has consistently demonstrated cessation of viral replication. This suggests that 3C protease inhibitors could serve as good drug candidates. However, significant proteolytic substrate diversity exists within the 110+ known rhinovirus serotypes. To investigate this variability we used NMR to solve the structure of the rhinovirus serotype 14 3C protease (subgenus B) covalently bound to a peptide (acetyl-LEALFQ-ethylpropionate) inhibitor. The inhibitor-bound structure was determined to an overall rmsd of 0.82 Å (backbone atoms) and 1.49 Å (all heavy atoms). Comparison with the X-ray structure of the serotype 2 HRV 3C protease from subgenus A (51% sequence identity) bound to the inhibitor rupintrivir allowed the identification of conserved intermolecular interactions involved in proximal substrate binding as well as subgenus differences that might account for the variability observed in SAR studies. To better characterize the 3C protease and investigate the structural and dynamic differences between the apo and bound states we also solved the solution structure of the apo form. The apo structure has an overall rmsd of 1.07 +/- 0.17 Å over backbone atoms, which is greater by 0.25 Å than what is seen for the inhibited enzyme (2B0F.pdb). This increase is localized to the enzyme's C-terminal beta-barrel domain, which is responsible for recognizing and binding proteolytic substrates. Amide hydrogen exchange dynamics revealed dramatic differences between the two enzyme states. Furthermore, a number of residues exhibited exchange-broadened amide NMR signals in the apo state compared to the inhibited state. The majority of these residues are associated with proteolytic substrate interaction.
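
The rmsd values quoted above are root mean square deviations over a chosen set of atoms. As a rough illustration of the quantity, the sketch below computes the rmsd between two coordinate sets that are assumed to be already superimposed and to list the same atoms in the same order. The function name and coordinates are invented for the example, and the ensemble averaging used for NMR structure bundles is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superimposed atom sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model_1, model_2))
```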

Improving Early Drug Discovery Through ADME Modelling: an Overview

Drugs in R&D. 2007  |  Pubmed ID: 17963426

Drug development is an intrinsically risky business. Like a high-stakes poker game, the entry costs are high and the probability of winning is low. Indeed, only a tiny percentage of lead compounds ever reach US FDA approval. At any point during the drug development process a prospective drug lead may be terminated owing to lack of efficacy, adverse effects, excessive toxicity, poor absorption or poor clearance. Unfortunately, the more promising a drug lead appears to be, the more costly it is to terminate its development. Typically, the cost of killing a drug grows exponentially as a drug lead moves further down the development pipeline. As a result there is considerable interest in developing either experimental or computational methods that can identify potentially problematic drug leads at the earliest stages in their development. One promising route is through the prediction or modelling of ADME (absorption, distribution, metabolism and excretion). ADME data, whether experimentally measured or computationally predicted, provide key insights into how a drug will ultimately be treated or accepted by the body. So while a drug lead may exhibit phenomenal efficacy in vitro, poor ADME results will almost invariably terminate its development. This review focuses on the use of ADME modelling to reduce late-stage attrition in drug discovery programmes. In particular, it highlights what tools exist today for visualising and predicting ADME data including: (1) ADME parameter predictors; (2) metabolic fate predictors; (3) metabolic stability predictors; (4) cytochrome P450 substrate predictors; and (5) physiology-based pharmacokinetic (PBPK) modelling software. It also discusses what kinds of tools need to be developed, and the importance of integrating ADME data to aid in compound selection during the earliest phases of drug discovery.

BioSpider: a Web Server for Automating Metabolome Annotations

Pacific Symposium on Biocomputing. 2007  |  Pubmed ID: 17990488

One of the growing challenges in life science research lies in finding useful, descriptive or quantitative data about newly reported biomolecules (genes, proteins, metabolites and drugs). An even greater challenge is finding information that connects these genes, proteins, drugs or metabolites to each other. Much of this information is scattered through hundreds of different databases, abstracts or books and almost none of it is particularly well integrated. While some efforts are being undertaken at the NCBI and EBI to integrate many different databases together, this still falls short of the goal of having some kind of human-readable synopsis that summarizes the state of knowledge about a given biomolecule - especially small molecules. To address this shortfall, we have developed BioSpider. BioSpider is essentially an automated report generator designed specifically to tabulate and summarize data on biomolecules - both large and small. Specifically, BioSpider allows users to type in almost any kind of biological or chemical identifier (protein/gene name, sequence, accession number, chemical name, brand name, SMILES string, InChI string, CAS number, etc.) and it returns an in-depth synoptic report (approximately 3-30 pages in length) about that biomolecule and any other biomolecule it may target. This summary includes physico-chemical parameters, images, models, data files, descriptions and predictions concerning the query molecule. BioSpider uses a web-crawler to scan through dozens of public databases and employs a variety of specially developed text mining tools and locally developed prediction tools to find, extract and assemble data for its reports. Because of its breadth, depth and comprehensiveness, we believe BioSpider will prove to be a particularly valuable tool for researchers in metabolomics. BioSpider is available at:

Human Metabolome Database: Completing the 'human Parts List'

Pharmacogenomics. Jul, 2007  |  Pubmed ID: 18240899

Introduction to Cheminformatics

Current Protocols in Bioinformatics. Jun, 2007  |  Pubmed ID: 18428788

Cheminformatics is a relatively new field of information technology that focuses on the collection, storage, analysis, and manipulation of chemical data. The chemical data of interest typically includes information on small molecule formulas, structures, properties, spectra, and activities (biological or industrial). Cheminformatics originally emerged as a vehicle to help the drug discovery and development process; however, cheminformatics now plays an increasingly important role in many areas of biology, chemistry, and biochemistry. The intent of this unit is to give readers some introduction into the field of cheminformatics and to show how cheminformatics not only shares many similarities with the field of bioinformatics, but that it can also enhance much of what is currently done in bioinformatics.

In Silico Drug Exploration and Discovery Using DrugBank

Current Protocols in Bioinformatics. Jun, 2007  |  Pubmed ID: 18428789

DrugBank is a fully curated drug and drug target database that contains information on nearly 5000 drugs, including > 1200 FDA-approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, more than 14,000 protein or drug target sequences are linked to these drug entries. DrugBank is primarily focused on providing both the query/search tools and the biophysical data needed to facilitate drug discovery and drug development. This unit provides readers with a detailed description of how to effectively use the DrugBank database and how to navigate through the DrugBank Web site. It also provides specific examples of how to find chemical homologs of potential drug leads and how to identify potential drug targets from newly sequenced pathogens. The intent of this unit is to give readers some introduction into the field of cheminformatics (the study of chemical information) and to show how cheminformatics can be seamlessly integrated into the field of bioinformatics.

Applications of Machine Learning in Cancer Prediction and Prognosis

Cancer Informatics. Feb, 2007  |  Pubmed ID: 19458758

Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns from large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.

Computational Systems Biology in Cancer: Modeling Methods and Applications

Gene Regulation and Systems Biology. Sep, 2007  |  Pubmed ID: 19936081

In recent years it has become clear that carcinogenesis is a complex process, both at the molecular and cellular levels. Understanding the origins, growth and spread of cancer therefore requires an integrated or system-wide approach. Computational systems biology is an emerging sub-discipline in systems biology that utilizes the wealth of data from genomic, proteomic and metabolomic studies to build computer simulations of intra- and intercellular processes. Several useful descriptive and predictive models of the origin, growth and spread of cancers have been developed in an effort to better understand the disease and potential therapeutic approaches. In this review we describe and assess the practical and theoretical underpinnings of commonly-used modeling approaches, including ordinary and partial differential equations, Petri nets, cellular automata, agent-based models and hybrid systems. A number of computer-based formalisms have been implemented to improve the accessibility of the various approaches to researchers whose primary interest lies outside of model development. We discuss several of these and describe how they have led to novel insights into tumor genesis, growth, apoptosis, vascularization and therapy.

Discovering Drug Targets Through the Web

Comparative Biochemistry and Physiology. Part D, Genomics & Proteomics. Mar, 2007  |  Pubmed ID: 20483274

Traditionally, drug-target discovery is a "wet-bench" experimental process, depending on carefully designed genetic screens, biochemical tests and cellular assays to identify proteins and genes that are associated with a particular disease or condition. However, recent advances in DNA sequencing, transcript profiling, protein identification and protein quantification are leading to a flood of genomic and proteomic data that is, or potentially could be, linked to disease data. The quantity of data generated by these high throughput methods is forcing scientists to re-think the way they do traditional drug-target discovery. In particular it is leading them more and more towards identifying potential drug targets using computers. In fact, drug-target identification is now being done as much on the desk-top as on the bench-top. This review focuses on describing how drug-target discovery can be done in silico (i.e. via computer) using a variety of bioinformatic resources that are freely available on the web. Specifically, it highlights a number of web-accessible sequence databases, automated genome annotation tools, text mining tools, and integrated drug/sequence databases that can be used to identify drug targets for both endogenous (genetic and epigenetic) diseases as well as exogenous (infectious) diseases.

PPT-DB: the Protein Property Prediction and Testing Database

Nucleic Acids Research. Jan, 2008  |  Pubmed ID: 17916570

The protein property prediction and testing database (PPT-DB) is a database housing nearly 30 carefully curated databases, each of which contains commonly predicted protein property information. These properties include both structural (i.e. secondary structure, contact order, disulfide pairing) and dynamic (i.e. order parameters, B-factors, folding rates) features that have been measured, derived or tabulated from a variety of sources. PPT-DB is designed to serve two purposes. First it is intended to serve as a centralized, up-to-date, freely downloadable and easily queried repository of predictable or 'derived' protein property data. In this role, PPT-DB can serve as a one-stop, fully standardized repository for developers to obtain the required training, testing and validation data needed for almost any kind of protein property prediction program they may wish to create. The second role that PPT-DB can play is as a tool for homology-based protein property prediction. Users may query PPT-DB with a sequence of interest and have a specific property predicted using a sequence similarity search against PPT-DB's extensive collection of proteins with known properties. PPT-DB exploits the well-known fact that protein structure and dynamic properties are highly conserved between homologous proteins. Predictions derived from PPT-DB's similarity searches are typically 85-95% correct (for categorical predictions, such as secondary structure) or exhibit correlations of >0.80 (for numeric predictions, such as accessible surface area). This performance is 10-20% better than what is typically obtained from standard 'ab initio' predictions. PPT-DB, its prediction utilities and all of its contents are available at

Application of the Random Coil Index to Studying Protein Flexibility

Journal of Biomolecular NMR. Jan, 2008  |  Pubmed ID: 17985196

Protein flexibility lies at the heart of many protein-ligand binding events and enzymatic activities. However, the experimental measurement of protein motions is often difficult, tedious and error-prone. As a result, there is a considerable interest in developing simpler and faster ways of quantifying protein flexibility. Recently, we described a method, called Random Coil Index (RCI), which appears to be able to quantitatively estimate model-free order parameters and flexibility in protein structural ensembles using only backbone chemical shifts. Because of its potential utility, we have undertaken a more detailed investigation of the RCI method in an attempt to ascertain its underlying principles, its general utility, its sensitivity to chemical shift errors, its sensitivity to data completeness, its applicability to other proteins, and its general strengths and weaknesses. Overall, we find that the RCI method is very robust and that it represents a useful addition to traditional methods of studying protein flexibility. We have implemented many of the findings and refinements reported here into a web server that allows facile, automated predictions of model-free order parameters, MD RMSF and NMR RMSD values directly from backbone 1H, 13C and 15N chemical shift assignments. The server is available at

DrugBank: a Knowledgebase for Drugs, Drug Actions and Drug Targets

Nucleic Acids Research. Jan, 2008  |  Pubmed ID: 18048412

DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. Since its first release in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. The latest version of DrugBank (release 2.0) has been expanded significantly over the previous release. With approximately 4900 drug entries, it now contains 60% more FDA-approved small molecule and biotech drugs including 10% more 'experimental' drugs. Significantly, more protein target data has also been added to the database, with the latest version of DrugBank containing three times as many non-redundant protein or drug target sequences as before (1565 versus 524). Each DrugCard entry now contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to pharmacological, pharmacogenomic and molecular biological data. A number of new data fields, including food-drug interactions, drug-drug interactions and experimental ADME data have been added in response to numerous user requests. DrugBank has also significantly improved the power and simplicity of its structure query and text query searches. DrugBank is available at

Metabolomics: a Complementary Tool in Renal Transplantation

Contributions to Nephrology. 2008  |  Pubmed ID: 18401163

Renal transplant success is closely tied to the ability to monitor transplant recipients and responsively change their medications. However, transplant monitoring still depends on relatively dated technologies - serum creatinine levels, urine output, and histopathology of biopsy samples. These techniques do not offer sufficient specificity, sensitivity, or accuracy for appropriate and timely interventions. As a result, more specific diagnostic techniques, based on proteomics, genomics and metabolomics are being sought. Metabolomics (the high-throughput measurement and analysis of metabolites) may make it possible to monitor transplants more effectively and specifically. Changes in the concentration profiles of a number of small molecule metabolites found in either blood or urine can be used to localize kidney damage, assess organs at risk of rejection, assess kidneys suffering from ischemia-reperfusion injury or identify organs that have been damaged by immunosuppressive drugs. The application of metabolomics to kidney transplant monitoring is still in its early stages. Nevertheless, there are a number of easily measured metabolites in both urine and serum that can provide reliable indications of kidney function, kidney injury, and immunosuppressive drug toxicity. Metabolomics could serve as a good complement to existing proteomic and genomic technologies.

Identifying Putative Drug Targets and Potential Drug Leads: Starting Points for Virtual Screening and Docking

Methods in Molecular Biology (Clifton, N.J.). 2008  |  Pubmed ID: 18446295

The availability of three-dimensional (3D) models of both drug leads (small molecule ligands) and drug targets (proteins) is essential to molecular docking and computational drug discovery. This chapter describes an emerging methodology that can be used to identify both drug leads and drug targets using three newly developed web-accessible databases: 1) DrugBank; 2) The Human Metabolome Database; and 3) PubChem. Specifically, it illustrates how putative drug targets and drug leads for exogenous diseases (i.e., infectious diseases) can be readily identified and their 3D structures selected using only the genomic sequences from pathogenic bacteria or viruses as input. It also illustrates how putative drug targets and drug leads for endogenous diseases (i.e., non-infectious diseases or chronic conditions) can be identified using similar databases and similar sequence input. This chapter is intended to illustrate how bioinformatics and cheminformatics can work synergistically to help provide the necessary inputs for computer-aided drug design.

PROTEUS2: a Web Server for Comprehensive Protein Structure Prediction and Structure-based Annotation

Nucleic Acids Research. Jul, 2008  |  Pubmed ID: 18483082

PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane beta-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multi-component neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes approximately 3 min per query sequence. The PROTEUS2 server along with source code for many of its modules is accessible at
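The Q3 score quoted above is conventionally the percentage of residues whose three-state secondary structure label (helix, strand, coil) is predicted correctly. The sketch below illustrates that convention only; it is not PROTEUS2's code, and the function name `q3_score` and the two label strings are made up for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Percentage of positions where the 3-state labels (H, E, C) agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one residue is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 15-residue example: one helix residue is mispredicted as coil.
predicted = "CCHHHHHCCEEEECC"
observed = "CCHHHHHHCEEEECC"
print(round(q3_score(predicted, observed), 1))  # 14 of 15 residues agree
```

A Q2 score can be computed the same way with a two-letter alphabet (for example, signal peptide versus not).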

PolySearch: a Web-based Text Mining System for Extracting Relationships Between Human Diseases, Genes, Mutations, Drugs and Metabolites

Nucleic Acids Research. Jul, 2008  |  Pubmed ID: 18487273

A particular challenge in biomedical text mining is to find ways of handling 'comprehensive' or 'associative' queries such as 'Find all genes associated with breast cancer'. Given that many queries in genomics, proteomics or metabolomics involve these kinds of comprehensive searches, we believe that a web-based tool that could support these searches would be quite useful. In response to this need, we have developed the PolySearch web server. PolySearch supports >50 different classes of queries against nearly a dozen different types of text, scientific abstract or bioinformatic databases. The typical query supported by PolySearch is 'Given X, find all Y's' where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites. PolySearch also exploits a variety of techniques in text mining and information retrieval to identify, highlight and rank informative abstracts, paragraphs or sentences. PolySearch's performance has been assessed in tasks such as gene synonym identification, protein-protein interaction identification and disease gene identification using a variety of manually assembled 'gold standard' text corpora. Its f-measure on these tasks is 88, 81 and 79%, respectively. These values are between 5 and 50% better than other published tools. The server is freely available at
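For readers unfamiliar with the f-measure reported above, the sketch below shows the standard definition as the weighted harmonic mean of precision and recall. The abstract does not say which weighting was used, so the balanced form (beta = 1) is an assumption, and the counts in the example are invented.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical retrieval result: 8 correct hits, 2 spurious hits, 2 misses.
print(f_measure(tp=8, fp=2, fn=2))  # precision and recall are both 0.8
```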

The Human Cerebrospinal Fluid Metabolome

Journal of Chromatography. B, Analytical Technologies in the Biomedical and Life Sciences. Aug, 2008  |  Pubmed ID: 18502700

With continuing improvements in analytical technology and an increased interest in comprehensive metabolic profiling of biofluids and tissues, there is a growing need to develop comprehensive reference resources for certain clinically important biofluids, such as blood, urine and cerebrospinal fluid (CSF). As part of our effort to systematically characterize the human metabolome we have chosen to characterize CSF as the first biofluid to be intensively scrutinized. In doing so, we combined comprehensive NMR, gas chromatography-mass spectrometry (GC-MS) and liquid chromatography (LC) Fourier transform-mass spectrometry (FTMS) methods with computer-aided literature mining to identify and quantify essentially all of the metabolites that can be commonly detected (with today's technology) in the human CSF metabolome. Tables containing the compounds, concentrations, spectra, protocols and links to disease associations that we have found for the human CSF metabolome are freely available at

Protein Contact Order Prediction from Primary Sequences

BMC Bioinformatics. May, 2008  |  Pubmed ID: 18513429

Contact order is a topological descriptor that has been shown to be correlated with several interesting protein properties such as protein folding rates and protein transition state placements. Contact order has also been used to select for viable protein folds from ab initio protein structure prediction programs. For proteins of known three-dimensional structure, their contact order can be calculated directly. However, for proteins with unknown three-dimensional structure, there is no effective prediction method currently available.
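When the three-dimensional structure is known, the direct calculation mentioned above is short. The sketch below uses the commonly cited relative contact order, the mean sequence separation of contacting residue pairs divided by chain length; this definition is not spelled out in the abstract, and the contact list is a made-up example. Deriving contacts from coordinates (for instance with a distance cutoff) is left out.

```python
def relative_contact_order(contacts, chain_length: int) -> float:
    """Mean sequence separation |i - j| over all contacts, divided by length.

    contacts: iterable of (i, j) residue-index pairs that are in contact.
    """
    contacts = list(contacts)
    if chain_length <= 0 or not contacts:
        raise ValueError("need a positive chain length and at least one contact")
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (len(contacts) * chain_length)


# Hypothetical 10-residue chain with three contacts (separations 3, 7 and 2).
contacts = [(1, 4), (2, 9), (3, 5)]
print(relative_contact_order(contacts, chain_length=10))  # 12 / (3 * 10)
```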

CS23D: a Web Server for Rapid Protein Structure Generation Using NMR Chemical Shifts and Sequence Data

Nucleic Acids Research. Jul, 2008  |  Pubmed ID: 18515350

CS23D (chemical shift to 3D structure) is a web server for rapidly generating accurate 3D protein structures using only assigned nuclear magnetic resonance (NMR) chemical shifts and sequence data as input. Unlike conventional NMR methods, CS23D requires no NOE and/or J-coupling data to perform its calculations. CS23D accepts chemical shift files in either SHIFTY or BMRB formats, and produces a set of PDB coordinates for the protein in about 10-15 min. CS23D uses a pipeline of several preexisting programs or servers to calculate the actual protein structure. Depending on the sequence similarity (or lack thereof) CS23D uses either (i) maximal subfragment assembly (a form of homology modeling), (ii) chemical shift threading or (iii) shift-aided de novo structure prediction (via Rosetta) followed by chemical shift refinement to generate and/or refine protein coordinates. Tests conducted on more than 100 proteins from the BioMagResBank indicate that CS23D converges (i.e. finds a solution) for >95% of protein queries. These chemical shift generated structures were found to be within 0.2-2.8 Å RMSD of the NMR structure generated using conventional NOE-based NMR methods or conventional X-ray methods. The performance of CS23D is dependent on the completeness of the chemical shift assignments and the similarity of the query protein to known 3D folds. CS23D is accessible at
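The RMSD figures above compare two sets of atomic coordinates. As a minimal sketch, the function below computes the root-mean-square deviation for two equally sized coordinate lists that are assumed to be already superimposed; optimal superposition, which structure comparison normally performs first, is omitted, and the coordinates are invented.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 unit along x.
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
reference = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0)]
print(rmsd(model, reference))
```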

DrugBank and Its Relevance to Pharmacogenomics

Pharmacogenomics. Aug, 2008  |  Pubmed ID: 18681788

DrugBank is a freely available web-enabled database that combines detailed drug data with comprehensive drug-target and drug-action information. It was specifically designed to facilitate in silico drug-target discovery, drug design, drug-metabolism prediction, drug-interaction prediction, and general pharmaceutical education. One of the most distinctive and useful components of the DrugBank database is the information it contains on drug metabolism, drug-metabolizing enzymes and drug-target polymorphisms. As pharmacogenomics is fundamentally concerned with the role of genes and genetic variation in how an individual responds to a drug, DrugBank is able to offer a convenient venue to explore pharmacogenomic questions in silico. This paper provides a brief overview of DrugBank and how it can facilitate pharmacogenomic research.

PSA Fluoroimmunoassays Using Anti-PSA ScFv and Quantum-dot Conjugates

Nanomedicine (London, England). Aug, 2008  |  Pubmed ID: 18694310

The conjugates of monoclonal antibodies and luminescent nanoparticles (quantum dots [Qdots]) have a large number of potential applications in both fluoroimmunoassays and biological imaging; however, conjugating full-length monoclonal antibodies directly to Qdots or other inorganic nanoparticles often results in the irreversible formation of oligomeric monoclonal antibody-nanoparticle complexes, which leads to dramatically reduced binding activities. This study demonstrated that the use of single-chain antibody fragments (scFvs) appears to have a number of advantages, in terms of solubility, activity, ease of preparation and ease of structure-based genetic engineering.

Applications of Metabolomics in Drug Discovery and Development

Drugs in R&D. 2008  |  Pubmed ID: 18721000

Metabolomics is a relatively new field of 'omics' technology that is primarily concerned with the global or system-wide characterization of small molecule metabolites using technologies such as nuclear magnetic resonance, liquid chromatography and/or mass spectrometry. Its unique focus on small molecules and the physiological effects of small molecules aligns the field of metabolomics very closely with the aims and interests of many researchers in the pharmaceutical industry. Because of its conceptual and technical overlap with many aspects of pharmaceutical research, metabolomics is now finding applications that span almost the full length of the drug discovery and development pipeline, from lead compound discovery to post-approval drug surveillance. This review explores some of the most interesting or significant applications of metabolomics as they relate to pharmaceutical research and development. Specific examples are given that show how metabolomics can be used to facilitate lead compound discovery, to improve biomarker identification (for monitoring disease status and drug efficacy) and to monitor drug metabolism and toxicity. Other applications are also discussed, including the use of metabolomics to facilitate clinical trial testing and to improve post-approval drug monitoring. These examples show that metabolomics potentially offers drug researchers and drug regulators an effective, inexpensive route to addressing many of the riskier or more expensive issues associated with the discovery, development and monitoring of drug products.

MetaboMiner--semi-automated Identification of Metabolites from 2D NMR Spectra of Complex Biofluids

BMC Bioinformatics. 2008  |  Pubmed ID: 19040747

One-dimensional (1D) 1H nuclear magnetic resonance (NMR) spectroscopy is widely used in metabolomic studies involving biofluids and tissue extracts. There are several software packages that support compound identification and quantification via 1D 1H NMR by spectral fitting techniques. Because 1D 1H NMR spectra are characterized by extensive peak overlap or spectral congestion, two-dimensional (2D) NMR, with its increased spectral resolution, could potentially improve and even automate compound identification or quantification. However, the lack of dedicated software for this purpose significantly restricts the application of 2D NMR methods to most metabolomic studies.

Genomic Sequence and Activity of KS10, a Transposable Phage of the Burkholderia Cepacia Complex

BMC Genomics. Dec, 2008  |  Pubmed ID: 19094239

The Burkholderia cepacia complex (BCC) is a versatile group of Gram negative organisms that can be found throughout the environment in sources such as soil, water, and plants. While BCC bacteria can be involved in beneficial interactions with plants, they are also considered opportunistic pathogens, specifically in patients with cystic fibrosis and chronic granulomatous disease. These organisms also exhibit resistance to many antibiotics, making conventional treatment often unsuccessful. KS10 was isolated as a prophage of B. cenocepacia K56-2, a clinically relevant strain of the BCC. Our objective was to sequence the genome of this phage and also determine if this prophage encoded any virulence determinants.

HMDB: a Knowledgebase for the Human Metabolome

Nucleic Acids Research. Jan, 2009  |  Pubmed ID: 18953024

The Human Metabolome Database (HMDB) is a richly annotated resource that is designed to address the broad needs of biochemists, clinical chemists, physicians, medical geneticists, nutritionists and members of the metabolomics community. Since its first release in 2007, the HMDB has been used to facilitate the research for nearly 100 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 2.0) has been significantly expanded and enhanced over the previous release (version 1.0). In particular, the number of fully annotated metabolite entries has grown from 2180 to more than 6800 (a 300% increase), while the number of metabolites with biofluid or tissue concentration data has grown by a factor of five (from 883 to 4413). Similarly, the number of purified compounds with reference to NMR, LC-MS and GC-MS spectra has more than doubled (from 380 to more than 790 compounds). In addition to this significant expansion in database size, many new database searching tools and new data content have been added or enhanced. These include better algorithms for spectral searching and matching, more powerful chemical substructure searches, faster text searching software, as well as dedicated pathway searching tools and customized, clickable metabolic maps. Changes to the user-interface have also been implemented to accommodate future expansion and to make database navigation much easier. These improvements should make the HMDB much more useful to a much wider community of users.

In Silico Identification of Genes in Bacteriophage DNA

Methods in Molecular Biology (Clifton, N.J.). 2009  |  Pubmed ID: 19082552

One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include for genome annotation: GeneMark, Artemis, and BASys; and, for genome comparisons: Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.

Exploring Human Metabolites Using the Human Metabolome Database

Current Protocols in Bioinformatics. Mar, 2009  |  Pubmed ID: 19274632

The Human Metabolome Database (HMDB) is a Web-based bioinformatic/cheminformatic resource with detailed information about human metabolites and metabolic enzymes. It can be used for fields of study including metabolomics, biochemistry, clinical chemistry, biomarker discovery, medicine, nutrition, and general education. In addition to its comprehensive literature-derived data, the HMDB contains an extensive collection of experimental metabolite concentration data for plasma, urine, CSF, and/or other biofluids. The HMDB is fully searchable, with many tools for viewing, sorting and extracting metabolite names, chemical structures, biofluid concentrations, enzymes, genes, NMR or MS spectra, and disease information. Each metabolite entry in the HMDB contains an average of 90 separate data fields including a comprehensive compound description, names and synonyms, chemical structure information, physico-chemical data, reference NMR and MS spectra, normal and abnormal biofluid concentrations, tissue locations, disease associations, pathway information, enzyme data, gene sequence data, and SNP and mutation data, as well as extensive links to images, references and other public databases.

Development of a Novel Virtual Screening Cascade Protocol to Identify Potential Trypanothione Reductase Inhibitors

Journal of Medicinal Chemistry. Mar, 2009  |  Pubmed ID: 19296695

The implementation of a novel sequential computational approach that can be used effectively for virtual screening and identification of prospective ligands that bind to trypanothione reductase (TryR) is reported. The multistep strategy combines a ligand-based virtual screening for building an enriched library of small molecules with a docking protocol (AutoDock, X-Score) for screening against the TryR target. Compounds were ranked by an exhaustive conformational consensus scoring approach that employs a rank-by-rank strategy by combining both scoring functions. Analysis of the predicted ligand-protein interactions highlights the role of bulky quaternary amine moieties for binding affinity. The scaffold hopping (SHOP) process derived from this computational approach allowed the identification of several chemotypes, not previously reported as antiprotozoal agents, which includes dibenzothiepine, dibenzooxathiepine, dibenzodithiepine, and polycyclic cationic structures like thiaazatetracyclo-nonadeca-hexaen-3-ium. Assays measuring the inhibiting effect of these compounds on T. cruzi and T. brucei TryR confirm their potential for further rational optimization.
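The rank-by-rank consensus idea described above can be sketched as follows: each scoring function ranks the compounds on its own scale, and the compounds are then ordered by their mean rank. This is an illustration of the general technique, not the authors' protocol; the compound names, score values and the assumption that one score is "lower is better" while the other is "higher is better" are all hypothetical. Tied scores are not treated specially here.

```python
def ranks(scores: dict, higher_is_better: bool) -> dict:
    """Map each compound to its 1-based rank under one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: position + 1 for position, name in enumerate(ordered)}


def rank_by_rank(score_tables):
    """Order compounds by mean rank across several scoring functions.

    score_tables: list of (scores_dict, higher_is_better) pairs that all
    cover the same set of compounds.
    """
    per_function = [ranks(scores, hib) for scores, hib in score_tables]
    names = per_function[0].keys()
    mean_rank = {n: sum(r[n] for r in per_function) / len(per_function) for n in names}
    # Sort by mean rank, breaking ties alphabetically for a stable result.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores: an energy-like score (lower is better) and an
# affinity-like score (higher is better) for three made-up compounds.
energy = {"cmpd_a": -9.1, "cmpd_b": -7.4, "cmpd_c": -8.2}
affinity = {"cmpd_a": 6.0, "cmpd_b": 6.5, "cmpd_c": 5.1}
for name, rank in rank_by_rank([(energy, False), (affinity, True)]):
    print(name, rank)
```

Averaging ranks rather than raw scores avoids having to put the two scoring functions on a common scale.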

GeNMR: a Web Server for Rapid NMR-based Protein Structure Determination

Nucleic Acids Research. Jul, 2009  |  Pubmed ID: 19406927

GeNMR (GEnerate NMR structures) is a web server for rapidly generating accurate 3D protein structures using sequence data, NOE-based distance restraints and/or NMR chemical shifts as input. GeNMR accepts distance restraints in XPLOR or CYANA format as well as chemical shift files in either SHIFTY or BMRB formats. The web server produces an ensemble of PDB coordinates for the protein within 15-25 min, depending on model complexity and completeness of experimental restraints. GeNMR uses a pipeline of several pre-existing programs and servers to calculate the actual protein structure. In particular, GeNMR combines genetic algorithms for structure optimization along with homology modeling, chemical shift threading, torsion angle and distance predictions from chemical shifts/NOEs as well as ROSETTA-based structure generation and simulated annealing with XPLOR-NIH to generate and/or refine protein coordinates. GeNMR greatly simplifies the task of protein structure determination as users do not have to install or become familiar with complex stand-alone programs or obscure format conversion utilities. Tests conducted on a sample of 90 proteins from the BioMagResBank indicate that GeNMR produces high-quality models for all protein queries, regardless of the type of NMR input data. GeNMR was developed to facilitate rapid, user-friendly determination of protein structures via NMR spectroscopy. GeNMR is accessible at

Spatiotemporal Integration of Molecular and Anatomical Data in Virtual Reality Using Semantic Mapping

International Journal of Nanomedicine. 2009  |  Pubmed ID: 19421373

We have developed a computational framework for spatiotemporal integration of molecular and anatomical datasets in a virtual reality environment. Using two case studies involving gene expression data and pharmacokinetic data, respectively, we demonstrate how existing knowledge bases for molecular data can be semantically mapped onto a standardized anatomical context of human body. Our data mapping methodology uses ontological representations of heterogeneous biomedical datasets and an ontology reasoner to create complex semantic descriptions of biomedical processes. This framework provides a means to systematically combine an increasing amount of biomedical imaging and numerical data into spatiotemporally coherent graphical representations. Our work enables medical researchers with different expertise to simulate complex phenomena visually and to develop insights through the use of shared data, thus paving the way for pathological inference, developmental pattern discovery and biomedical hypothesis testing.

MetaboAnalyst: a Web Server for Metabolomic Data Analysis and Interpretation

Nucleic Acids Research. Jul, 2009  |  Pubmed ID: 19429898

Metabolomics is a newly emerging field of 'omics' research that is concerned with characterizing large numbers of metabolites using NMR, chromatography and mass spectrometry. It is frequently used in biomarker identification and the metabolic profiling of cells, tissues or organisms. The data processing challenges in metabolomics are quite unique and often require specialized (or expensive) data analysis software and a detailed knowledge of cheminformatics, bioinformatics and statistics. In an effort to simplify metabolomic data analysis while at the same time improving user accessibility, we have developed a freely accessible, easy-to-use web server for metabolomic data analysis called MetaboAnalyst. Fundamentally, MetaboAnalyst is a web-based metabolomic data processing tool not unlike many of today's web-based microarray analysis packages. It accepts a variety of input data (NMR peak lists, binned spectra, MS peak lists, compound/concentration data) in a wide variety of formats. It also offers a number of options for metabolomic data processing, data normalization, multivariate statistical analysis, graphing, metabolite identification and pathway mapping. In particular, MetaboAnalyst supports such techniques as: fold change analysis, t-tests, PCA, PLS-DA, hierarchical clustering and a number of more sophisticated statistical or machine learning methods. It also employs a large library of reference spectra to facilitate compound identification from most kinds of input spectra. MetaboAnalyst guides users through a step-by-step analysis pipeline using a variety of menus, information hyperlinks and check boxes. Upon completion, the server generates a detailed report describing each method used, embedded with graphical and tabular outputs. MetaboAnalyst is capable of handling most kinds of metabolomic data and was designed to perform most of the common kinds of metabolomic data analyses. MetaboAnalyst is accessible at

Solid Phase Synthesis of Acylglycine Human Metabolites

Bioorganic & Medicinal Chemistry Letters. Dec, 2009  |  Pubmed ID: 19836947

Acylglycines represent a large and important class of human metabolites. They are often used in medicine to identify fatty acid oxidation disorders. A highly efficient solid phase synthesis approach to obtain these clinically important compounds is developed via a coupling reaction between glycine-preloaded Wang resin and a set of carboxylic acids. The developed methodology facilitates the preparation of several structurally diverse acylglycines with high yields and purity.

Computational Strategies for Metabolite Identification in Metabolomics

Bioanalysis. Dec, 2009  |  Pubmed ID: 21083105

Most metabolomic data are characterized by complex spectra or chromatograms containing hundreds of peaks or features. While there are many methods for aligning or comparing these spectral features, there are few approaches for actually identifying which peaks match to which compounds. Indeed, one of the biggest unmet needs in the field of metabolomics lies in the problem of compound identification. This review describes some of the newly emerging computational strategies in metabolomics that are being used to aid in the identification of metabolites from biofluid mixtures analyzed by NMR and MS. The most successful compound-identification strategies typically involve matching spectral features of the unknown compound(s) to curated spectral databases of reference compounds. This approach is known as the identification of 'known unknowns'. However, the identification of truly novel compounds (the 'unknown unknowns') is particularly challenging and requires the use of computer-aided structure elucidation methods being applied to the purified compound. The strengths and limitations of these approaches as applied to different analytical technologies (GC-MS, LC-MS and NMR) will be discussed, as will prospects for potential improvements to existing strategies.

Evidence for Copurification of Micronuclei in Sucrose Density Gradient-enriched Plasma Membranes from Cell Lines

Analytical Biochemistry. Jan, 2010  |  Pubmed ID: 19699175

Sucrose density gradient-enriched membrane preparations and membrane fraction enrichment through affinity purification techniques are commonly used in proteomic analysis. However, published proteomic profiles characterized by the above methods show the presence of nuclear proteins in addition to membrane proteins. While shuttling of nuclear proteins across cellular compartments and their transient residency at membrane interfaces could explain some of these observations, the presence of nuclear proteins in proteomic profiles generated with crude and enriched membranes could be the result of nonspecific contamination of nuclear debris during cell fractionation procedures. We hypothesized that micronuclei arising from the genomic instability inherent to cancer cells may copurify with plasma membrane fractions on sucrose gradients. Using sucrose gradient-enriched plasma membranes from breast cancer cell lines derived from the MCF-7 cell line, we provide experimental evidence to indicate that micronuclei are present in fresh preparations of plasma membranes. The origin of these micronuclei was traced to budding of nuclei in intact cells. Furthermore, mass spectrometric analysis confirmed the presence of nuclear proteins as well as membrane and associated signaling proteins in sucrose gradient-enriched preparations.

T3DB: a Comprehensively Annotated Database of Common Toxins and Their Targets

Nucleic Acids Research. Jan, 2010  |  Pubmed ID: 19897546

In an effort to capture meaningful biological, chemical and mechanistic information about clinically relevant, commonly encountered or important toxins, we have developed the Toxin and Toxin-Target Database (T3DB). The T3DB is a unique bioinformatics resource that compiles comprehensive information about common or ubiquitous toxins and their toxin-targets into a single electronic repository. The database currently contains over 2900 small molecule and peptide toxins, 1300 toxin-targets and more than 33,000 toxin-target associations. Each T3DB record (ToxCard) contains over 80 data fields providing detailed information on chemical properties and descriptors, toxicity values, protein and gene sequences (for both targets and toxins), molecular and cellular interaction data, toxicological data, mechanistic information and references. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, government documents, textbooks and scientific journals. A key focus of the T3DB is on providing 'depth' over 'breadth' with detailed descriptions, mechanisms of action, and information on toxins and toxin-targets. T3DB is fully searchable and supports extensive text, sequence, chemical structure and relational query searches, similar to those found in the Human Metabolome Database (HMDB) and DrugBank. Potential applications of the T3DB include clinical metabolomics, toxin target prediction, toxicity prediction and toxicology education. The T3DB is available online at

SMPDB: The Small Molecule Pathway Database

Nucleic Acids Research. Jan, 2010  |  Pubmed ID: 19948758

The Small Molecule Pathway Database (SMPDB) is an interactive, visual database containing more than 350 small-molecule pathways found in humans. More than 2/3 of these pathways (>280) are not found in any other pathway database. SMPDB is designed specifically to support pathway elucidation and pathway discovery in clinical metabolomics, transcriptomics, proteomics and systems biology. SMPDB provides exquisitely detailed, hyperlinked diagrams of human metabolic pathways, metabolic disease pathways, metabolite signaling pathways and drug-action pathways. All SMPDB pathways include information on the relevant organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Each small molecule is hyperlinked to detailed descriptions contained in the Human Metabolome Database (HMDB) or DrugBank and each protein or enzyme complex is hyperlinked to UniProt. All SMPDB pathways are accompanied with detailed descriptions, providing an overview of the pathway, condition or processes depicted in each diagram. The database is easily browsed and supports full text searching. Users may query SMPDB with lists of metabolite names, drug names, genes/protein names, SwissProt IDs, GenBank IDs, Affymetrix IDs or Agilent microarray IDs. These queries will produce lists of matching pathways and highlight the matching molecules on each of the pathway diagrams. Gene, metabolite and protein concentration data can also be visualized through SMPDB's mapping interface. All of SMPDB's images, image maps, descriptions and tables are downloadable. SMPDB is available at:

Computational Approaches to Metabolomics

Methods in Molecular Biology (Clifton, N.J.). 2010  |  Pubmed ID: 19957155

This chapter is intended to familiarize readers with the field of metabolomics and some of the algorithms, data analysis strategies, and computer programs used to analyze or interpret metabolomic data. Specifically, this chapter provides a brief overview of the experimental approaches and applications of metabolomics followed by a description of the spectral and statistical analysis tools for metabolomics. The chapter concludes with a discussion of the resources that can be used to interpret and analyze metabolomic data at a biological or clinical level. Emerging needs, challenges, and recent progress being made in these areas are also discussed.

A Probabilistic Approach for Validating Protein NMR Chemical Shift Assignments

Journal of Biomolecular NMR. Jun, 2010  |  Pubmed ID: 20446018

It has been estimated that more than 20% of the proteins in the BMRB are improperly referenced and that about 1% of all chemical shift assignments are mis-assigned. These statistics also reflect the likelihood that any newly assigned protein will have shift assignment or shift referencing errors. The relatively high frequency of these errors continues to be a concern for the biomolecular NMR community. While several programs do exist to detect and/or correct chemical shift mis-referencing or chemical shift mis-assignments, most can only do one or the other. The one program (SHIFTCOR) that is capable of handling both chemical shift mis-referencing and mis-assignments requires the 3D structure coordinates of the target protein. Given that chemical shift mis-assignments and chemical shift re-referencing issues should ideally be addressed prior to 3D structure determination, there is a clear need to develop a structure-independent approach. Here, we present a new structure-independent protocol, which is based on using residue-specific and secondary structure-specific chemical shift distributions calculated over small (3-6 residue) fragments to identify mis-assigned resonances. The method is also able to identify and re-reference mis-referenced chemical shift assignments. Comparisons against existing re-referencing or mis-assignment detection programs show that the method is as good as or superior to existing approaches. The protocol described here has been implemented into a freely available Java program called "Probabilistic Approach for protein Nmr Assignment Validation (PANAV)" and as a web server which can be used to validate and/or correct as well as re-reference assigned protein chemical shifts.

MSEA: a Web-based Tool to Identify Biologically Meaningful Patterns in Quantitative Metabolomic Data

Nucleic Acids Research. Jul, 2010  |  Pubmed ID: 20457745

Gene set enrichment analysis (GSEA) is a widely used technique in transcriptomic data analysis that uses a database of predefined gene sets to rank lists of genes from microarray studies to identify significant and coordinated changes in gene expression data. While GSEA has been playing a significant role in understanding transcriptomic data, no similar tools are currently available for understanding metabolomic data. Here, we introduce a web-based server, called Metabolite Set Enrichment Analysis (MSEA), to help researchers identify and interpret patterns of human or mammalian metabolite concentration changes in a biologically meaningful context. Key to the development of MSEA has been the creation of a library of approximately 1000 predefined metabolite sets covering various metabolic pathways, disease states, biofluids, and tissue locations. MSEA also supports user-defined or custom metabolite sets for more specialized analysis. MSEA offers three different enrichment analyses for metabolomic studies including overrepresentation analysis (ORA), single sample profiling (SSP) and quantitative enrichment analysis (QEA). ORA requires only a list of compound names, while SSP and QEA require both compound names and compound concentrations. MSEA generates easily understood graphs or tables embedded with hyperlinks to relevant pathway images and disease descriptors. For non-mammalian or more specialized metabolomic studies, MSEA allows users to provide their own metabolite sets for enrichment analysis. The MSEA server also supports conversion between metabolite common names, synonyms, and major database identifiers. MSEA has the potential to help users identify obvious as well as 'subtle but coordinated' changes among a group of related metabolites that may go undetected with conventional approaches. MSEA is freely available at
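As an illustration of the overrepresentation analysis (ORA) idea mentioned above, the following minimal Python sketch scores a query list of compound names against predefined metabolite sets. It assumes ORA is computed as an upper-tail hypergeometric test, which is a common choice but is not stated in the abstract. The function names, the example sets and the background size are all hypothetical and are not taken from MSEA.

```python
from math import comb


def ora_p_value(population_size, set_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population_size: number of compounds in the background (assumed value)
    set_size:        number of compounds in the predefined metabolite set
    query_size:      number of compounds in the user's list
    overlap:         number of query compounds that fall in the set
    """
    max_overlap = min(set_size, query_size)
    total = comb(population_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(population_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def enrich(query, metabolite_sets, population_size):
    """Return {set name: (sorted hits, p-value)} for each predefined set."""
    query = set(query)
    results = {}
    for name, members in metabolite_sets.items():
        members = set(members)
        hits = query & members
        p = ora_p_value(population_size, len(members), len(query), len(hits))
        results[name] = (sorted(hits), p)
    return results


if __name__ == "__main__":
    # Hypothetical toy library; real metabolite sets are far larger.
    toy_sets = {
        "set_A": ["citrate", "succinate", "malate", "fumarate"],
        "set_B": ["glucose", "pyruvate", "lactate"],
    }
    toy_query = ["citrate", "malate", "lactate"]
    for name, (hits, p) in enrich(toy_query, toy_sets, population_size=50).items():
        print(name, hits, round(p, 4))
```

A small p-value suggests that the query list overlaps a set more than expected by chance. In practice, p-values from many sets would also need a multiple-testing correction, which this sketch omits.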

Transcriptional Response of E. Coli Upon FimH-mediated Fimbrial Adhesion

Gene Regulation and Systems Biology. Mar, 2010  |  Pubmed ID: 20458372

Functionalities which may be genetically programmed into a bacterium are limited by its range of possible activities and its sensory capabilities. Therefore, enhancing the bacterial sensory repertoire is a crucial step for expanded utility in potential biomedical, industrial or environmental applications. Using microarray and qRT-PCR analyses, we have investigated transcription in E. coli (strain CSH50) following FimH-mediated adhesion to biocompatible substrates. Specifically, wild-type FimH-mediated adhesion of E. coli to mannose agarose beads and His-tagged FimH-mediated adhesion to Ni(2+)-NTA beads both led to induction of ahpCF, dps, grxA and marRAB genes among bound cells relative to unbound cells. The strongly-induced genes are known to be regulated by OxyR or SoxS cytoplasmic redox sensors. Some differentially altered genes also overlapped with those implicated in biofilm formation. This study provides an insight into transcriptional events following FimH-mediated adhesion and may provide a platform for elucidation of the signaling circuit necessary for engineering a synthetic attachment response in E. coli.

PROSESS: a Protein Structure Evaluation Suite and Server

Nucleic Acids Research. Jul, 2010  |  Pubmed ID: 20460469

PROSESS (PROtein Structure Evaluation Suite and Server) is a web server designed to evaluate and validate protein structures generated by X-ray crystallography, NMR spectroscopy or computational modeling. While many structure evaluation packages have been developed over the past 20 years, PROSESS is unique in its comprehensiveness, its capacity to evaluate X-ray, NMR and predicted structures as well as its ability to evaluate a variety of experimental NMR data. PROSESS integrates a variety of previously developed, well-known and thoroughly tested methods to evaluate both global and residue specific: (i) covalent and geometric quality; (ii) non-bonded/packing quality; (iii) torsion angle quality; (iv) chemical shift quality and (v) NOE quality. In particular, PROSESS uses VADAR for coordinate, packing, H-bond, secondary structure and geometric analysis, GeNMR for calculating folding, threading and solvent energetics, ShiftX for calculating chemical shift correlations, RCI for correlating structure mobility to chemical shift and PREDITOR for calculating torsion angle-chemical shifts agreement. PROSESS also incorporates several other programs including MolProbity to assess atomic clashes, Xplor-NIH to identify and quantify NOE restraint violations and NAMD to assess structure energetics. PROSESS produces detailed tables, explanations, structural images and graphs that summarize the results and compare them to values observed in high-quality or high-resolution protein structures. Using a simplified red-amber-green coloring scheme PROSESS also alerts users about both general and residue-specific structural problems. PROSESS is intended to serve as a tool that can be used by structure biologists as well as database curators to assess and validate newly determined protein structures. PROSESS is freely available at

MetPA: a Web-based Metabolomics Tool for Pathway Analysis and Visualization

Bioinformatics (Oxford, England). Sep, 2010  |  Pubmed ID: 20628077

MetPA (Metabolomics Pathway Analysis) is a user-friendly, web-based tool dedicated to the analysis and visualization of metabolomic data within the biological context of metabolic pathways. MetPA combines several advanced pathway enrichment analysis procedures along with the analysis of pathway topological characteristics to help identify the most relevant metabolic pathways involved in a given metabolomic study. The results are presented in a Google-map style network visualization system that supports intuitive and interactive data exploration through point-and-click, dragging and lossless zooming. Additional features include a comprehensive compound library for metabolite name conversion, automatic generation of analysis report, as well as the implementation of various univariate statistical procedures that can be accessed when users click on any metabolite node on a pathway map. MetPA currently enables analysis and visualization of 874 metabolic pathways, covering 11 common model organisms.

DrugBank 3.0: a Comprehensive Resource for 'omics' Research on Drugs

Nucleic Acids Research. Jan, 2011  |  Pubmed ID: 21059682

DrugBank is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data 'depth'). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug-drug and food-drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of 'omics' (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications.

Detailed Biophysical Characterization of the Acid-induced PrP(c) to PrP(β) Conversion Process

Biochemistry. Feb, 2011  |  Pubmed ID: 21189021

Prions are believed to spontaneously convert from a native, monomeric highly helical form (called PrP(c)) to a largely β-sheet-rich, multimeric and insoluble aggregate (called PrP(sc)). Because of its large size and insolubility, biophysical characterization of PrP(sc) has been difficult, and there are several contradictory or incomplete models of the PrP(sc) structure. A β-sheet-rich, soluble intermediate, called PrP(β), exhibits many of the same features as PrP(sc) and can be generated using a combination of low pH and/or mild denaturing conditions. Studies of the PrP(c) to PrP(β) conversion process and of PrP(β) folding intermediates may provide insights into the structure of PrP(sc). Using a truncated, recombinant version of Syrian hamster PrP(β) (shPrP(90-232)), we used NMR spectroscopy, in combination with other biophysical techniques (circular dichroism, dynamic light scattering, electron microscopy, fluorescence spectroscopy, mass spectrometry, and proteinase K digestion), to characterize the pH-driven PrP(c) to PrP(β) conversion process in detail. Our results show that below pH 2.8 the protein oligomerizes and conversion to the β-rich structure is initiated. At pH 1.7 and above, the oligomeric protein can recover its native monomeric state through dialysis to pH 5.2. However, when conversion is completed at pH 1.0, the large oligomer "locks down" irreversibly into a stable, β-rich form. At pH values above 3.0, the protein is amenable to NMR investigation. Chemical shift perturbations, NOE, amide line width, and T(2) measurements implicate the putative "amylome motif" region, "NNQNNF" as the region most involved in the initial helix-to-β conversion phase. We also found that acid-induced PrP(β) oligomers could be converted to fibrils without the use of chaotropic denaturants. The latter finding represents one of the first examples wherein physiologically accessible conditions (i.e., only low pH) were used to achieve PrP conversion and fibril formation.

Interpreting Protein Chemical Shift Data

Progress in Nuclear Magnetic Resonance Spectroscopy. Feb, 2011  |  Pubmed ID: 21241884

The Human Serum Metabolome

PloS One. Feb, 2011  |  Pubmed ID: 21359215

Continuing improvements in analytical technology, along with an increased interest in performing comprehensive, quantitative metabolic profiling, are leading to increased pressure within the metabolomics community to develop centralized metabolite reference resources for certain clinically important biofluids, such as cerebrospinal fluid, urine and blood. As part of an ongoing effort to systematically characterize the human metabolome through the Human Metabolome Project, we have undertaken the task of characterizing the human serum metabolome. In doing so, we have combined targeted and non-targeted NMR, GC-MS and LC-MS methods with computer-aided literature mining to identify and quantify a comprehensive, if not absolutely complete, set of metabolites commonly detected and quantified (with today's technology) in the human serum metabolome. Our use of multiple metabolomics platforms and technologies allowed us to substantially enhance the level of metabolome coverage while critically assessing the relative strengths and weaknesses of these platforms or technologies. Tables containing the complete set of 4229 confirmed and highly probable human serum compounds, their concentrations, related literature references and links to their known disease associations are freely available at

Towards Automatic Metabolomic Profiling of High-resolution One-dimensional Proton NMR Spectra

Journal of Biomolecular NMR. Apr, 2011  |  Pubmed ID: 21360156

Nuclear magnetic resonance (NMR) and mass spectrometry (MS) are the two most common spectroscopic analytical techniques employed in metabolomics. The large spectral datasets generated by NMR and MS are often analyzed using data reduction techniques like Principal Component Analysis (PCA). Although rapid, these methods are susceptible to solvent and matrix effects, high rates of false positives, lack of reproducibility and limited data transferability from one platform to the next. Given these limitations, a growing trend in both NMR and MS-based metabolomics is towards targeted profiling or "quantitative" metabolomics, wherein compounds are identified and quantified via spectral fitting prior to any statistical analysis. Despite the obvious advantages of this method, targeted profiling is hindered by the time required to perform manual or computer-assisted spectral fitting. In an effort to increase data analysis throughput for NMR-based metabolomics, we have developed an automatic method for identifying and quantifying metabolites in one-dimensional (1D) proton NMR spectra. This new algorithm is capable of using carefully constructed reference spectra and optimizing thousands of variables to reconstruct experimental NMR spectra of biofluids using rules and concepts derived from physical chemistry and NMR theory. The automated profiling program has been tested against spectra of synthetic mixtures as well as biological spectra of urine, serum and cerebrospinal fluid (CSF). Our results indicate that the algorithm can correctly identify compounds with high fidelity in each biofluid sample (except for urine). Furthermore, the metabolite concentrations exhibit a very high correlation with both simulated and manually-detected values.

SHIFTX2: Significantly Improved Protein Chemical Shift Prediction

Journal of Biomolecular NMR. May, 2011  |  Pubmed ID: 21448735

A new computer program, called SHIFTX2, is described which is capable of rapidly and accurately calculating diamagnetic (1)H, (13)C and (15)N chemical shifts from protein coordinate data. Compared to its predecessor (SHIFTX) and to other existing protein chemical shift prediction programs, SHIFTX2 is substantially more accurate (up to 26% better by correlation coefficient with an RMS error that is up to 3.3× smaller) than the next best performing program. It also provides significantly more coverage (up to 10% more), is significantly faster (up to 8.5×) and capable of calculating a wider variety of backbone and side chain chemical shifts (up to 6×) than many other shift predictors. In particular, SHIFTX2 is able to attain correlation coefficients between experimentally observed and predicted backbone chemical shifts of 0.9800 ((15)N), 0.9959 ((13)Cα), 0.9992 ((13)Cβ), 0.9676 ((13)C'), 0.9714 ((1)HN), 0.9744 ((1)Hα) and RMS errors of 1.1169, 0.4412, 0.5163, 0.5330, 0.1711, and 0.1231 ppm, respectively. The correlation between SHIFTX2's predicted and observed side chain chemical shifts is 0.9787 ((13)C) and 0.9482 ((1)H) with RMS errors of 0.9754 and 0.1723 ppm, respectively. SHIFTX2 is able to achieve such a high level of accuracy by using a large, high quality database of training proteins (>190), by utilizing advanced machine learning techniques, by incorporating many more features (χ(2) and χ(3) angles, solvent accessibility, H-bond geometry, pH, temperature), and by combining sequence-based with structure-based chemical shift prediction techniques. With this substantial improvement in accuracy we believe that SHIFTX2 will open the door to many long-anticipated applications of chemical shift prediction to protein structure determination, refinement and validation. SHIFTX2 is available both as a standalone program and as a web server.
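The two accuracy measures quoted above, the correlation coefficient and the RMS error between observed and predicted shifts, can be computed as in the short Python sketch below. It assumes the correlation coefficient is the Pearson coefficient, which the abstract does not specify. The function names and the example shift values are hypothetical and are not SHIFTX2 output.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_o * var_p)


def rms_error(observed, predicted):
    """Root-mean-square difference, in the same units as the inputs (ppm)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


if __name__ == "__main__":
    # Hypothetical Cα shifts in ppm, for illustration only.
    observed = [58.1, 55.3, 62.4, 52.9, 60.0]
    predicted = [57.8, 55.9, 62.0, 53.4, 59.6]
    print("r   =", round(pearson_r(observed, predicted), 4))
    print("RMS =", round(rms_error(observed, predicted), 4), "ppm")
```

The two measures are complementary: the correlation coefficient is insensitive to a constant offset between predicted and observed shifts, whereas the RMS error captures it directly.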

Metabolomic Data Processing, Analysis, and Interpretation Using MetaboAnalyst

Current Protocols in Bioinformatics. Jun, 2011  |  Pubmed ID: 21633943

MetaboAnalyst is a comprehensive, Web-based tool designed for processing, analyzing, and interpreting metabolomic data. It handles most of the common metabolomic data types including compound concentration lists, spectral bin lists, peak lists, and raw MS spectra. In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst supports a number of data-analysis tasks using a range of univariate, multivariate, and machine-learning methods. MetaboAnalyst also offers two newly developed approaches, Metabolite Set Enrichment Analysis (MSEA) and Metabolic Pathway Analysis (MetPA), for metabolomic data interpretation. MSEA helps detect biologically meaningful metabolite sets that have been enriched in human metabolomic studies, while MetPA allows users to identify any metabolic pathways that have been perturbed. MetaboAnalyst enables facile interactive exploration and visualization of nearly all of its results. At the end of each session, it produces a detailed analysis report with graphical, tabular, and textual output that summarizes each analytical method used and each result generated.

Web-based Inference of Biological Patterns, Functions and Pathways from Metabolomic Data Using MetaboAnalyst

Nature Protocols. Jun, 2011  |  Pubmed ID: 21637195

MetaboAnalyst is an integrated web-based platform for comprehensive analysis of quantitative metabolomic data. It is designed to be used by biologists (with little or no background in statistics) to perform a variety of complex metabolomic data analysis tasks. These include data processing, data normalization, statistical analysis and high-level functional interpretation. This protocol provides a step-wise description on how to format and upload data to MetaboAnalyst, how to process and normalize data, how to identify significant features and patterns through univariate and multivariate statistical methods and, finally, how to use metabolite set enrichment analysis and metabolic pathway analysis to help elucidate possible biological mechanisms. The complete protocol can be executed in approximately 45 min.

PHAST: a Fast Phage Search Tool

Nucleic Acids Research. Jul, 2011  |  Pubmed ID: 21672955

PHAge Search Tool (PHAST) is a web server designed to rapidly and accurately identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids. It accepts either raw DNA sequence data or partially annotated GenBank formatted data and rapidly performs a number of database comparisons as well as phage 'cornerstone' feature identification steps to locate, annotate and display prophage sequences and prophage features. Relative to other prophage identification tools, PHAST is up to 40 times faster and up to 15% more sensitive. It is also able to process and annotate both raw DNA sequence data and GenBank files, provide richly annotated tables on prophage features and prophage 'quality' and distinguish between intact and incomplete prophage. PHAST also generates downloadable, high quality, interactive graphics that display all identified prophage components in both circular and linear genomic views. PHAST is available at

MetATT: a Web-based Metabolomics Tool for Analyzing Time-series and Two-factor Datasets

Bioinformatics (Oxford, England). Sep, 2011  |  Pubmed ID: 21712247

Time-series and multifactor studies have become increasingly common in metabolomic studies. Common tasks for analyzing data from these relatively complex experiments include identification of major variations associated with each experimental factor, comparison of temporal profiles across different biological conditions, as well as detection and validation of the presence of interactions. Here we introduce MetATT, a web-based tool for time-series and two-factor metabolomic data analysis. MetATT offers a number of complementary approaches including 3D interactive principal component analysis, two-way heatmap visualization, two-way ANOVA, ANOVA-simultaneous component analysis and multivariate empirical Bayes time-series analysis. These procedures are presented through an intuitive web interface. At the end of each session, a detailed analysis report is generated to facilitate understanding of the results.

Relative and Regional Stabilities of the Hamster, Mouse, Rabbit, and Bovine Prion Proteins Toward Urea Unfolding Assessed by Nuclear Magnetic Resonance and Circular Dichroism Spectroscopies

Biochemistry. Sep, 2011  |  Pubmed ID: 21800884

The residue-specific urea-induced unfolding patterns of recombinant prion proteins from different species (bovine, rabbit, mouse, and Syrian hamster) were monitored using high-resolution (1)H nuclear magnetic resonance (NMR) spectroscopy. Protein constructs of different lengths, and with and without a His tag attached at the N-terminus, were studied. The various species showed different overall sensitivities toward urea denaturation with stabilities in the following order: hamster ≤ mouse < rabbit < bovine protein. This order is in agreement with recent circular dichroism (CD) spectroscopic measurements for several species [Khan, M. Q. (2010) Proc. Natl. Acad. Sci. U.S.A. 107, 19808-19813] and for the bovine protein presented herein. The [urea](1/2) values determined by CD spectroscopy parallel those of the most stable residues observed by NMR spectroscopy. Neither the longer constructs containing an additional hydrophobic region nor the His tag influenced the stability of the structured domain of the constructs studied. The effect of the S174N mutation in rabbit PrP(C) was also investigated. The rank order of the regional stabilities within each protein remained the same for all species. In particular, the residues in the β-sheet region in all four species were more sensitive to urea-induced unfolding than residues in the α2 and α3 helical regions. These observations indicate that the regional specific unfolding pattern is the same for the four mammalian prion proteins studied but militate against the idea that PrP(Sc) formation is linked with the global stability of PrP(C).

Advances in Metabolite Identification

Bioanalysis. Aug, 2011  |  Pubmed ID: 21827274

One of the central challenges to metabolomics is metabolite identification. Regardless of whether one uses so-called 'targeted' or 'untargeted' metabolomics, eventually all paths lead to the requirement of identifying (and quantifying) certain key metabolites. Indeed, without metabolite identification, the results of any metabolomic analysis are biologically and chemically uninterpretable. Given the chemical diversity of most metabolomes and the character of most metabolomic data, metabolite identification is intrinsically difficult. Consequently a great deal of effort in metabolomics over the past decade has been focused on making metabolite identification better, faster and cheaper. This review describes some of the newly emerging techniques or technologies in metabolomics that are making metabolite identification easier and more robust. In particular, it focuses on advances in metabolite identification that have occurred over the past 2 to 3 years concerning the technologies, methodologies and software as applied to NMR, MS and separation science. The strengths and limitations of some of these approaches are discussed along with some of the important trends in metabolite identification.

The Prion Protein Binds Thiamine

The FEBS Journal. Nov, 2011  |  Pubmed ID: 21848803

Although highly conserved throughout evolution, the exact biological function of the prion protein is still unclear. In an effort to identify the potential biological functions of the prion protein we conducted a small-molecule screening assay using the Syrian hamster prion protein [shPrP(90-232)]. The screen was performed using a library of 149 water-soluble metabolites that are known to pass through the blood-brain barrier. Using a combination of 1D NMR, fluorescence quenching and surface plasmon resonance we identified thiamine (vitamin B1) as a specific prion ligand with a binding constant of ~60 μM. Subsequent studies showed that this interaction is evolutionarily conserved, with similar binding constants being seen for mouse, hamster and human prions. Various protein construct lengths, both with and without the unstructured N-terminal region in the presence and absence of copper, were examined. This indicates that the N-terminus has no influence on the protein's ability to interact with thiamine. In addition to thiamine, the more biologically abundant forms of vitamin B1 (thiamine monophosphate and thiamine diphosphate) were also found to bind the prion protein with similar affinity. Heteronuclear NMR experiments were used to determine thiamine's interaction site, which is located between helix 1 and the preceding loop. These data, in conjunction with computer-aided docking and molecular dynamics, were used to model the thiamine-binding pharmacophore and a comparison with other thiamine binding proteins was performed to reveal the common features of interaction.

Comparative Analysis of Essential Collective Dynamics and NMR-derived Flexibility Profiles in Evolutionarily Diverse Prion Proteins

Prion. Jul-Sep, 2011  |  Pubmed ID: 21869604

Collective motions on ns-μs time scales are known to have a major impact on protein folding, stability, binding and enzymatic efficiency. It is also believed that these motions may have an important role in the early stages of prion protein misfolding and prion disease. In an effort to accurately characterize these motions and their potential influence on misfolding and prion disease transmissibility, we have conducted a combined analysis of molecular dynamics simulations and NMR-derived flexibility measurements over a diverse range of prion proteins. Using a recently developed numerical formalism, we have analyzed the essential collective dynamics (ECD) for prion proteins from 8 different species including human, cow, elk, cat, hamster, chicken, turtle and frog. We also compared the numerical results with flexibility profiles generated by the random coil index (RCI) from NMR chemical shifts. Prion protein backbone flexibility profiles derived from experimental NMR data and from theoretical computations show strong agreement with each other, demonstrating that it is possible to predict the observed RCI profiles employing the numerical ECD formalism. Interestingly, flexibility differences in the loop between the second beta strand (S2) and the second alpha helix (HB) appear to distinguish prion proteins from species that are susceptible to prion disease and those that are resistant. Our results show that the different levels of flexibility in the S2-HB loop in various species are predictable via the ECD method, indicating that ECD may be used to identify disease resistant variants of prion proteins, as well as the influence of prion protein mutations on disease susceptibility or misfolding propensity.

Intermolecular Transmission of Superoxide Dismutase 1 Misfolding in Living Cells

Proceedings of the National Academy of Sciences of the United States of America. Sep, 2011  |  Pubmed ID: 21930926

Human wild-type superoxide dismutase-1 (wtSOD1) is known to coaggregate with mutant SOD1 in familial amyotrophic lateral sclerosis (FALS), in double transgenic models of FALS, and in cell culture systems, but the structural determinants of this process are unclear. Here we molecularly dissect the effects of intracellular and cell-free obligately misfolded SOD1 mutant proteins on natively structured wild-type SOD1. Expression of the enzymatically inactive, natural familial ALS SOD1 mutations G127X and G85R in human mesenchymal and neural cell lines induces misfolding of wild-type natively structured SOD1, as indicated by: acquisition of immunoreactivity with SOD1 misfolding-specific monoclonal antibodies; markedly enhanced protease sensitivity suggestive of structural loosening; and nonnative disulfide-linked oligomer and multimer formation. Expression of G127X and G85R in mouse cell lines did not induce misfolding of murine wtSOD1, and a species restriction element for human wtSOD1 conversion was mapped to a region of sequence divergence in loop II and β-strand 3 of the SOD1 β-barrel (residues 24-36), then further refined surprisingly to a single tryptophan residue at codon 32 (W32) in human SOD1. Time course experiments enabled by W32 restriction revealed that G127X and misfolded wtSOD1 can induce misfolding of cell-endogenous wtSOD1. Finally, aggregated recombinant G127X is capable of inducing misfolding and protease sensitivity of recombinant human wtSOD1 in a cell-free system containing reducing and chelating agents; cell-free wtSOD1 conversion was also restricted by W32. These observations demonstrate that misfolded SOD1 can induce misfolding of natively structured wtSOD1 in a physiological intracellular milieu, consistent with a direct protein-protein interaction.

The Bacterial Nanorecorder: Engineering E. Coli to Function As a Chemical Recording Device

PloS One. 2011  |  Pubmed ID: 22132112

Synthetic biology is an emerging branch of molecular biology that uses synthetic genetic constructs to create man-made cells or organisms that are capable of performing novel and/or useful applications. Using a synthetic chemically sensitive genetic toggle switch to activate appropriate fluorescent protein indicators (GFP, RFP) and a cell division inhibitor (minC), we have created a novel E. coli strain that can be used as a highly specific, yet simple and inexpensive chemical recording device. This biological "nanorecorder" can be used to determine both the type and the time at which a brief chemical exposure event has occurred. In particular, we show that the short-term exposure (15-30 min) of cells harboring this synthetic genetic circuit to small molecule signals (anhydrotetracycline or IPTG) triggered long-term and uniform cell elongation, with cell length being directly proportional to the time elapsed following a brief chemical exposure. This work demonstrates that facile modification of an existing genetic toggle switch can be exploited to generate a robust, biologically-based "nanorecorder" that could potentially be adapted to detect, respond and record a wide range of chemical stimuli that may vary over time and space.

YMDB: the Yeast Metabolome Database

Nucleic Acids Research. Jan, 2012  |  Pubmed ID: 22064855

The Yeast Metabolome Database (YMDB) is a richly annotated 'metabolomic' database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry.

BacMap: an Up-to-date Electronic Atlas of Annotated Bacterial Genomes

Nucleic Acids Research. Jan, 2012  |  Pubmed ID: 22135301

Originally released in 2005, BacMap is an electronic, interactive atlas of fully sequenced bacterial genomes. It contains fully labeled, zoomable and searchable chromosome maps for essentially all sequenced prokaryotic (archaebacterial and eubacterial) species. Each map can be zoomed to the level of individual genes and each gene is hyperlinked to a richly annotated gene card. The latest release of BacMap now contains data for more than 1700 bacterial species (~10× more than the 2005 release), corresponding to more than 2800 chromosome and plasmid maps. All bacterial genome maps are now supplemented with separate prophage genome maps as well as separate tRNA and rRNA maps. Each bacterial chromosome entry in BacMap also contains graphs and tables on a variety of gene and protein statistics. Likewise, every bacterial species entry contains a bacterial 'biography' card, with taxonomic details, phenotypic details, textual descriptions and images (when available). Improved data browsing and searching tools have also been added to allow more facile filtering, sorting and display of the chromosome maps and their contents.

Prediction of Skeletal Muscle and Fat Mass in Patients with Advanced Cancer Using a Metabolomic Approach

The Journal of Nutrition. Jan, 2012  |  Pubmed ID: 22157537

Urine and plasma metabolites originate from endogenous metabolic pathways in different organs and exogenous sources (diet). Urine and plasma were obtained from advanced cancer patients and investigated to determine if variations in lean and fat mass, dietary intake, and energy metabolism relate to variation in metabolite profiles. Patients (n = 55) recorded their diets for 3 d and after an overnight fast they were evaluated by DXA and indirect calorimetry. Metabolites were measured by NMR and direct injection MS. Three algorithms were used [partial least squares discriminant-analysis, support vector machines (SVM), and least absolute shrinkage and selection operator] to relate patients' plasma/urine metabolic profile with their dietary/physiological assessments. Leave-one-out cross-validation and permutation testing were conducted to determine statistical validity. None of the algorithms, using 63 urine metabolites, could learn to predict variations in individual's resting energy expenditure, respiratory quotient, or their intake of total energy, fat, sugar, or carbohydrate. Urine metabolites predicted appendicular lean tissue (skeletal muscle) with excellent cross-validation accuracy (98% using SVM). Total lean tissue correlated highly with appendicular muscle (Pearson r = 0.98; P < 0.0001) and gave similar cross-validation accuracies. Fat mass was effectively predicted using the 63 urine metabolites or the 143 plasma metabolites, exclusively. In conclusion, in this population, lean and fat mass variation could be effectively predicted using urinary metabolites, suggesting a potential role for metabolomics in body composition research. Furthermore, variation in lean and fat mass potentially confounds metabolomic studies attempting to characterize diet or disease conditions. Future studies should account or correct for such variation.
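The leave-one-out cross-validation mentioned in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: a 1-nearest-neighbour classifier stands in for the PLS-DA, SVM, and LASSO models named above, and the function names and toy data are hypothetical.

```python
def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in classifier (assumption): label of the closest training sample."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return train_y[best]


def loocv_accuracy(xs, ys, predict=nearest_neighbour_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy "metabolite profiles" (made-up numbers) for two body-composition groups.
profiles = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = ["low", "low", "low", "high", "high", "high"]
accuracy = loocv_accuracy(profiles, labels)
```

Because every sample is held out exactly once, the reported accuracy is the fraction of held-out samples that a model trained on the remaining samples classifies correctly.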

Discovery of Small Molecule Inhibitors That Interact with γ-tubulin

Chemical Biology & Drug Design. May, 2012  |  Pubmed ID: 22268380

Recent studies have shown an overexpression of γ-tubulin in human glioblastomas and glioblastoma cell lines. As the 2-year survival rate for glioblastoma is very poor, potential benefit exists for discovering novel chemotherapeutic agents that can inhibit γ-tubulin, which is known to form a ring complex that acts as a microtubule nucleation center. We present experimental evidence that colchicine and combretastatin A-4 bind to γ-tubulin, which are to our knowledge the first drug-like compounds known to interact with γ-tubulin. Molecular dynamics simulations and docking studies were used to analyze the hypothesized γ-tubulin binding domain of these compounds. The suitability of the potential binding modes was evaluated and suggests the subsequent rational design of novel targeted inhibitors of γ-tubulin.

Use of Proteinase K Nonspecific Digestion for Selective and Comprehensive Identification of Interpeptide Cross-links: Application to Prion Proteins

Molecular & Cellular Proteomics : MCP. Jul, 2012  |  Pubmed ID: 22438564

Chemical cross-linking combined with mass spectrometry is a rapidly developing technique for structural proteomics. Cross-linked proteins are usually digested with trypsin to generate cross-linked peptides, which are then analyzed by mass spectrometry. The most informative cross-links, the interpeptide cross-links, are often large in size, because they consist of two peptides that are connected by a cross-linker. In addition, trypsin targets the same residues as amino-reactive cross-linkers, and cleavage will not occur at these cross-linker-modified residues. This produces high molecular weight cross-linked peptides, which complicates their mass spectrometric analysis and identification. In this paper, we examine a nonspecific protease, proteinase K, as an alternative to trypsin for cross-linking studies. Initial tests on a model peptide that was digested by proteinase K resulted in a "family" of related cross-linked peptides, all of which contained the same cross-linking sites, thus providing additional verification of the cross-linking results, as was previously noted for other post-translational modification studies. The procedure was next applied to the native (PrP(C)) and oligomeric form of prion protein (PrPβ). Using proteinase K, the affinity-purifiable CID-cleavable and isotopically coded cross-linker cyanurbiotindipropionylsuccinimide and MALDI-MS, cross-links were found for all of the possible cross-linking sites. After digestion with proteinase K, we obtained a mass distribution of the cross-linked peptides that is very suitable for MALDI-MS analysis. Using this new method, we were able to detect over 60 interpeptide cross-links in the native PrP(C) and PrPβ prion protein. The set of cross-links for the native form was used as distance constraints in developing a model of the native prion protein structure, which includes the 90-124-amino acid N-terminal portion of the protein. Several cross-links were unique to each form of the prion protein, including a Lys(185)-Lys(220) cross-link, which is unique to the PrPβ and thus may be indicative of the conformational change involved in the formation of prion protein oligomers.

Exploring the Essential Collective Dynamics of Interacting Proteins: Application to Prion Protein Dimers

Proteins. Jul, 2012  |  Pubmed ID: 22488640

Essential collective dynamics is a promising and robust approach for analysing the slow motions of macromolecules from short molecular dynamics trajectories. In this study, an extension of the method to treat a collection of interacting protein molecules is presented. The extension is applied to investigate the effects of dimerization on the collective dynamics of ovine prion protein molecules in two different arrangements. Examination of the structural plasticity shows that aggregation has a restricting effect on the local mobility of the prion protein molecules in the interfacial regions. Domain motions of the two dimeric ovine prion protein conformations are distinctly different and can be related to interatomic correlations at the interface. Correlated motions are among the slow collective modes extensively analysed by considering both main-chain and side-chain atoms. Correlation maps reveal the existence of a vast network of dynamically correlated side groups, which extends beyond individual subunits via interfacial interconnections. The network is formed by a core of hydrophobic side chains surrounded by hydrophilic groups at the periphery. The relevance of these findings is discussed in the context of mutations associated with prion diseases. The binding free energy of the dimeric conformations is evaluated to probe their thermodynamic stability. The descriptions afforded by the analysis of the essential collective dynamics of the prion dimers are consistent with their binding free energies. The agreement validates the extension of the methodology and provides a means of interpreting the collective dynamics in terms of the thermodynamic stability of ovine prion proteins.

Development of Ecom₅₀ and Retention Index Models for Nontargeted Metabolomics: Identification of 1,3-dicyclohexylurea in Human Serum by HPLC/mass Spectrometry

Journal of Chemical Information and Modeling. May, 2012  |  Pubmed ID: 22489687

The goal of many metabolomic studies is to identify the molecular structure of endogenous molecules that are differentially expressed among sampled or treatment groups. The identified compounds can then be used to gain an understanding of disease mechanisms. Unfortunately, despite recent advances in a variety of analytical techniques, small molecule (<1000 Da) identification remains difficult. Rarely can a chemical structure be determined from experimental "features" such as retention time, exact mass, and collision induced dissociation spectra. Thus, without knowing structure, biological significance remains obscure. In this study, we explore an identification method in which the measured exact mass of an unknown is used to query available chemical databases to compile a list of candidate compounds. Predictions are made for the candidates using models of experimental features that have been measured for the unknown. The predicted values are used to filter the candidate list by eliminating compounds with predicted values substantially different from the unknown. The intent is to reduce the list of candidates to a reasonable number that can be obtained and measured for confirmation. To facilitate this exploration, we measured data and created models for two experimental features: MS Ecom₅₀ (the energy in electronvolts required to fragment 50% of a selected precursor ion) and HPLC retention index. Using a data set of 52 compounds, Ecom₅₀ models were developed based on both Molconn and CODESSA structural descriptors. These models gave r² values of 0.89 to 0.94 depending on the number of inputs, the modeling algorithm chosen, and whether neutral or protonated structures were used. The retention index model was developed with 400 compounds using a back-propagation artificial neural network and 33 Molconn structure descriptors. External validation gave a v² = 0.87 and standard error of 38 retention index units. As a test of the validity of the filtering approach, the Ecom₅₀ and retention index models, along with exact mass and collision induced dissociation spectra matching, were used to identify 1,3-dicyclohexylurea in human plasma. This compound was not previously known to exist in human biofluids and its elemental formula was identical to 315 other candidate compounds downloaded from PubChem. These results suggest that the use of Ecom₅₀ and retention index predictive models can improve nontargeted metabolite structure identification using HPLC/MS derived structural features.
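The candidate-filtering idea in this abstract can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the `Candidate` record, the tolerance values, and all numbers in the example are hypothetical, and the predicted Ecom₅₀ and retention index values are assumed to come from separately trained models.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    exact_mass: float   # monoisotopic mass in Da
    pred_ecom50: float  # model-predicted Ecom50 in eV (assumed precomputed)
    pred_ri: float      # model-predicted retention index (assumed precomputed)


def filter_candidates(candidates, mass, ecom50, ri,
                      mass_ppm=10.0, ecom50_tol=0.5, ri_tol=76.0):
    """Keep candidates whose mass and predicted features are close to the
    values measured for the unknown. All tolerances are illustrative."""
    kept = []
    for c in candidates:
        ppm_error = abs(c.exact_mass - mass) / mass * 1e6
        if ppm_error > mass_ppm:
            continue  # exact mass does not match
        if abs(c.pred_ecom50 - ecom50) > ecom50_tol:
            continue  # predicted fragmentation energy too far off
        if abs(c.pred_ri - ri) > ri_tol:
            continue  # predicted retention index too far off
        kept.append(c)
    return kept


# Made-up candidates and measurements, for illustration only.
candidates = [
    Candidate("a", 224.1889, 1.0, 500.0),
    Candidate("b", 224.1889, 3.0, 500.0),
    Candidate("c", 230.0000, 1.0, 500.0),
    Candidate("d", 224.1889, 1.0, 900.0),
]
shortlist = filter_candidates(candidates, mass=224.1889, ecom50=1.1, ri=510.0)
```

Each filter removes candidates independently, so the order of the checks affects only speed, not the final shortlist.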

Resolution-enhanced Native Acidic Gel Electrophoresis: a Method for Resolving, Sizing, and Quantifying Prion Protein Oligomers

Analytical Biochemistry. Jul, 2012  |  Pubmed ID: 22490465

The formation of β-sheet-rich prion protein (PrP(β)) oligomers from native or cellular PrP(C) is thought to be a key step in the development of prion diseases. To assist in this characterization process we have developed a rapid and remarkably high resolution gel electrophoresis technique called RENAGE (resolution-enhanced native acidic gel electrophoresis) for separating, sizing, and quantifying oligomeric PrP(β) complexes. PrP(β) oligomers formed via either urea/salt or acid conversion can be resolved by RENAGE into a clear set of oligomeric bands differing by just one subunit. Calibration of the size of the PrP(β) oligomer bands was made possible with a cross-linked mouse PrP(90-232) ladder (1- to 11-mer) generated using ruthenium bipyridyl-based photoinduced cross-linking of unmodified proteins (PICUP). This PrP PICUP ladder allowed the size and abundance of PrP(β) oligomers formed from urea/salt and acid conversion to be determined. This distribution consists of 7-, 8-, 9-, 10-, and 11-mers, with the most abundant species being the 8-mer. The high-resolution separation afforded by RENAGE has allowed us to investigate distinctive size and population changes in PrP(β) oligomers formed under various conversion conditions, with various construct lengths, from various species or in the presence of anti-prion compounds.

Metabolomics and First-trimester Prediction of Early-onset Preeclampsia

The Journal of Maternal-fetal & Neonatal Medicine : the Official Journal of the European Association of Perinatal Medicine, the Federation of Asia and Oceania Perinatal Societies, the International Society of Perinatal Obstetricians. Oct, 2012  |  Pubmed ID: 22494326

To evaluate the use of metabolomics for the first-trimester detection of maternal metabolic dysfunction and prediction of subsequent development of early-onset preeclampsia (PE).

Multi-platform Characterization of the Human Cerebrospinal Fluid Metabolome: a Comprehensive and Quantitative Update

Genome Medicine. Apr, 2012  |  Pubmed ID: 22546835

Human cerebral spinal fluid (CSF) is known to be a rich source of small molecule biomarkers for neurological and neurodegenerative diseases. In 2007, we conducted a comprehensive metabolomic study and performed a detailed literature review on metabolites that could be detected (via metabolomics or other techniques) in CSF. A total of 308 detectable metabolites were identified, of which only 23% were shown to be routinely identifiable or quantifiable with the metabolomics technologies available at that time. The continuing advancement in analytical technologies along with the growing interest in CSF metabolomics has led us to re-visit the human CSF metabolome and to re-assess both its size and the level of coverage that can be achieved with today's technologies.

MetaboAnalyst 2.0--a Comprehensive Server for Metabolomic Data Analysis

Nucleic Acids Research. Jul, 2012  |  Pubmed ID: 22553367

First released in 2009, MetaboAnalyst was a relatively simple web server designed to facilitate metabolomic data processing and statistical analysis. With continuing advances in metabolomics along with constant user feedback, it became clear that a substantial upgrade to the original server was necessary. MetaboAnalyst 2.0, which is the successor to MetaboAnalyst, represents just such an upgrade. MetaboAnalyst 2.0 now contains dozens of new features and functions including new procedures for data filtering, data editing and data normalization. It also supports multi-group data analysis, two-factor analysis as well as time-series data analysis. These new functions have also been supplemented with: (i) a quality-control module that allows users to evaluate their data quality before conducting any analysis, (ii) a functional enrichment analysis module that allows users to identify biologically meaningful patterns using metabolite set enrichment analysis and (iii) a metabolic pathway analysis module that allows users to perform pathway analysis and visualization for 15 different model organisms. In developing MetaboAnalyst 2.0 we have also substantially improved its graphical presentation tools. All images are now generated using anti-aliasing and are available over a range of resolutions, sizes and formats (PNG, TIFF, PDF, PostScript, or SVG). To improve its performance, MetaboAnalyst 2.0 is now hosted on a much more powerful server with substantially modified code to take advantage of the server's multi-core CPUs for computationally intensive tasks. MetaboAnalyst 2.0 also maintains a collection of 50 or more FAQs and more than a dozen tutorials compiled from user queries and requests. A downloadable version of MetaboAnalyst 2.0, along with detailed instructions for local installation, is now available as well.

METAGENassist: a Comprehensive Web Server for Comparative Metagenomics

Nucleic Acids Research. Jul, 2012  |  Pubmed ID: 22645318

With recent improvements in DNA sequencing and sample extraction techniques, the quantity and quality of metagenomic data are now growing exponentially. This abundance of richly annotated metagenomic data and bacterial census information has spawned a new branch of microbiology called comparative metagenomics. Comparative metagenomics involves the comparison of bacterial populations between different environmental samples, different culture conditions or different microbial hosts. However, in order to do comparative metagenomics, one typically requires a sophisticated knowledge of multivariate statistics and/or advanced software programming skills. To make comparative metagenomics more accessible to microbiologists, we have developed a freely accessible, easy-to-use web server for comparative metagenomic analysis called METAGENassist. Users can upload their bacterial census data from a wide variety of common formats, using either amplified 16S rRNA data or shotgun metagenomic data. Metadata concerning environmental, culture, or host conditions can also be uploaded. During the data upload process, METAGENassist also performs an automated taxonomic-to-phenotypic mapping. Phenotypic information covering nearly 20 functional categories such as GC content, genome size, oxygen requirements, energy sources and preferred temperature range is automatically generated from the taxonomic input data. Using this phenotypically enriched data, users can then perform a variety of multivariate and univariate data analyses including fold change analysis, t-tests, PCA, PLS-DA, clustering and classification. To facilitate data processing, users are guided through a step-by-step analysis workflow using a variety of menus, information hyperlinks and check boxes. METAGENassist also generates colorful, publication quality tables and graphs that can be downloaded and used directly in the preparation of scientific papers. METAGENassist is available at

Resolution-by-proxy: a Simple Measure for Assessing and Comparing the Overall Quality of NMR Protein Structures

Journal of Biomolecular NMR. Jul, 2012  |  Pubmed ID: 22678091

In protein X-ray crystallography, resolution is often used as a good indicator of structural quality. Diffraction resolution of protein crystals correlates well with the number of X-ray observables that are used in structure generation and, therefore, with protein coordinate errors. In protein NMR, there is no parameter identical to X-ray resolution. Instead, resolution is often used as a synonym of NMR model quality. Resolution of NMR structures is often deduced from ensemble precision, torsion angle normality and number of distance restraints per residue. The lack of common techniques to assess the resolution of X-ray and NMR structures complicates the comparison of structures solved by these two methods. This problem is sometimes approached by calculating "equivalent resolution" from structure quality metrics. However, existing protocols do not offer a comprehensive assessment of protein structure as they calculate equivalent resolution from a relatively small number (<5) of protein parameters. Here, we report the development of a protocol that calculates equivalent resolution from 25 measurable protein features. This new method offers better performance (correlation coefficient of 0.92, mean absolute error of 0.28 Å) than existing predictors of equivalent resolution. Because the method uses coordinate data as a proxy for X-ray diffraction data, we call this measure "Resolution-by-Proxy" or ResProx. We demonstrate that ResProx can be used to identify under-restrained, poorly refined or inaccurate NMR structures, and can discover structural defects that the other equivalent resolution methods cannot detect. The ResProx web server is available at

The Metabolomic Profile of Umbilical Cord Blood in Neonatal Hypoxic Ischaemic Encephalopathy

PloS One. 2012  |  Pubmed ID: 23227182

Hypoxic ischaemic encephalopathy (HIE) in newborns can cause significant long-term neurological disability. The insult is a complex injury characterised by energy failure and disruption of cellular homeostasis, leading to mitochondrial damage. The importance of individual metabolic pathways, and their interaction in the disease process is not fully understood. The aim of this study was to describe and quantify the metabolomic profile of umbilical cord blood samples in a carefully defined population of full-term infants with HIE.

Chapter 3: Small Molecules and Disease

PLoS Computational Biology. 2012  |  Pubmed ID: 23300405

"Big" molecules such as proteins and genes still continue to capture the imagination of most biologists, biochemists and bioinformaticians. "Small" molecules, on the other hand, are the molecules that most biologists, biochemists and bioinformaticians prefer to ignore. However, it is becoming increasingly apparent that small molecules such as amino acids, lipids and sugars play a far more important role in all aspects of disease etiology and disease treatment than we realized. This particular chapter focuses on an emerging field of bioinformatics called "chemical bioinformatics"--a discipline that has evolved to help address the blended chemical and molecular biological needs of toxicogenomics, pharmacogenomics, metabolomics and systems biology. In the following pages we will cover several topics related to chemical bioinformatics. First, a brief overview of some of the most important or useful chemical bioinformatic resources will be given. Second, a more detailed overview will be given on those particular resources that allow researchers to connect small molecules to diseases. This section will focus on describing a number of recently developed databases or knowledgebases that explicitly relate small molecules--either as the treatment, symptom or cause--to disease. Finally a short discussion will be provided on newly emerging software tools that exploit these databases as a means to discover new biomarkers or even new treatments for disease.

Using Multiple Structural Proteomics Approaches for the Characterization of Prion Proteins

Journal of Proteomics. Apr, 2013  |  Pubmed ID: 23085224

Structural proteomics approaches are valuable tools, particularly in cases where the exact mechanisms of protein conformational changes or the structures of proteins and protein complexes cannot be elucidated by traditional structural biology techniques like X-ray crystallography or NMR methods. Each structural proteomics method can provide a different set of data, all of which can be used as structural constraints for modeling the protein. We have applied a combination of limited proteolysis, surface modification, chemical crosslinking, and hydrogen/deuterium exchange for the characterization of structural differences in prion proteins in native monomeric and in the aggregated β-oligomeric states. Data from these multiple proteomics approaches are in remarkable agreement in pointing to the rearrangement of the beta sheet 1-helix1-beta sheet 2-helix 2 (β1-H1-β2-H2) region as a major conformational change between the native and oligomeric prion protein forms. This data is also consistent with the β1-H1-β2 loop moving away from the H2-H3 core during the prion protein conversion. This is an example of how complementary data from multiple structural proteomics approaches can provide novel insights into the three-dimensional structures of proteins and protein complexes. This article is part of a Special Issue entitled: From protein structures to clinical applications.

ECMDB: the E. Coli Metabolome Database

Nucleic Acids Research. Jan, 2013  |  Pubmed ID: 23109553

The Escherichia coli Metabolome Database (ECMDB) is a comprehensively annotated metabolomic database containing detailed information about the metabolome of E. coli (K-12). Modelled closely on the Human and Yeast Metabolome Databases, the ECMDB contains >2600 metabolites with links to ∼1500 different genes and proteins, including enzymes and transporters. The information in the ECMDB has been collected from dozens of textbooks, journal articles and electronic databases. Each metabolite entry in the ECMDB contains an average of 75 separate data fields, including comprehensive compound descriptions, names and synonyms, chemical taxonomy, compound structural and physicochemical data, bacterial growth conditions and substrates, reactions, pathway information, enzyme data, gene/protein sequence data and numerous hyperlinks to images, references and other public databases. The ECMDB also includes an extensive collection of intracellular metabolite concentration data compiled from our own work as well as other published metabolomic studies. This information is further supplemented with thousands of fully assigned reference nuclear magnetic resonance and mass spectrometry spectra obtained from pure E. coli metabolites that we (and others) have collected. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of E. coli's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers but also to molecular biologists, systems biologists and individuals in the biotechnology industry.

First-trimester Metabolomic Detection of Late-onset Preeclampsia

American Journal of Obstetrics and Gynecology. Jan, 2013  |  Pubmed ID: 23159745

We sought to identify first-trimester maternal serum biomarkers for the prediction of late-onset preeclampsia (PE) using metabolomic analysis.

HMDB 3.0--The Human Metabolome Database in 2013

Nucleic Acids Research. Jan, 2013  |  Pubmed ID: 23161693

The Human Metabolome Database (HMDB) is a resource dedicated to providing scientists with the most current and comprehensive coverage of the human metabolome. Since its first release in 2007, the HMDB has been used to facilitate research for nearly 1000 published studies in metabolomics, clinical biochemistry and systems biology. The most recent release of HMDB (version 3.0) has been significantly expanded and enhanced over the 2009 release (version 2.0). In particular, the number of annotated metabolite entries has grown from 6500 to more than 40 000 (a 600% increase). This enormous expansion is a result of the inclusion of both 'detected' metabolites (those with measured concentrations or experimental confirmation of their existence) and 'expected' metabolites (those for which biochemical pathways are known or human intake/exposure is frequent but the compound has yet to be detected in the body). The latest release also has greatly increased the number of metabolites with biofluid or tissue concentration data, the number of compounds with reference spectra and the number of data fields per entry. In addition to this expansion in data quantity, new database visualization tools and new data content have been added or enhanced. These include better spectral viewing tools, more powerful chemical substructure searches, an improved chemical taxonomy and better, more interactive pathway maps. This article describes these enhancements to the HMDB, which was previously featured in the 2009 NAR Database Issue. (Note to referees: HMDB 3.0 will go live on 18 September 2012.)

Differential Metabolite Profiles and Salinity Tolerance Between Two Genetically Related Brown-seeded and Yellow-seeded Brassica Carinata Lines

Plant Science : an International Journal of Experimental Plant Biology. Jan, 2013  |  Pubmed ID: 23199683

Brassica carinata (Ethiopian mustard) has previously been identified as a potential crop species suitable for marginal land in the North American prairies due to its relatively high salt tolerance. Two genetically related B. carinata lines with brown-seeded (BS) and yellow-seeded (YS) phenotypes were assessed for their tolerance to sodium sulfate. Specifically, each line was greenhouse-grown under 0, 50 and 100mM of salt, and analyzed after four weeks and eight weeks of treatment. Generally, the height of the BS line was greater than the YS line under both salt treatments, indicating enhanced salt tolerance of the BS line. NMR-based metabolite profiling and PCA analyses indicated a more pronounced shift in key stem metabolites after four weeks of treatment with the YS line compared to the BS line. For example, tryptophan and formate levels increased in the YS line after four weeks of 100mM salt treatment, while proline and threonine levels varied uniquely compared to other metabolites of the lines. Together, the data indicate that the brown-seeded line has greater sodium tolerance than the yellow-seed line, provide clues to the biochemical underpinnings for the phenotypic variation, and highlight the utility of B. carinata as a biorefinery crop for saline land.

Metabolomic Analysis for First-trimester Down Syndrome Prediction

American Journal of Obstetrics and Gynecology. May, 2013  |  Pubmed ID: 23313728

The objective of the study was to perform first-trimester maternal serum metabolomic analysis and compare the results in euploid vs Down syndrome (DS) pregnancies.

An Improved Method to Detect Correct Protein Folds Using Partial Clustering

BMC Bioinformatics. Jan, 2013  |  Pubmed ID: 23323835

Structure-based clustering is commonly used to identify correct protein folds among candidate folds (also called decoys) generated by protein structure prediction programs. However, traditional clustering methods exhibit a poor runtime performance on large decoy sets. We hypothesized that a more efficient "partial" clustering approach in combination with an improved scoring scheme could significantly improve both the speed and performance of existing candidate selection methods.

MyCompoundID: Using an Evidence-based Metabolome Library for Metabolite Identification

Analytical Chemistry. Mar, 2013  |  Pubmed ID: 23373753

Identification of unknown metabolites is a major challenge in metabolomics. Without the identities of the metabolites, the metabolome data generated from a biological sample cannot be readily linked with the proteomic and genomic information for studies in systems biology and medicine. We have developed a web-based metabolite identification tool that allows searching and interpreting mass spectrometry (MS) data against a newly constructed metabolome library composed of 8,021 known human endogenous metabolites and their predicted metabolic products (375,809 compounds from one metabolic reaction and 10,583,901 from two reactions). As an example, in the analysis of a simple extract of human urine or plasma and the whole human urine by liquid chromatography-mass spectrometry and MS/MS, we are able to identify at least two times more metabolites in these samples than by using a standard human metabolome library. In addition, it is shown that the evidence-based metabolome library (EML) provides far superior performance in identifying putative metabolites from a human urine sample, compared to the use of the ChemPub and KEGG libraries.

Nanopore Analysis of Wild-type and Mutant Prion Protein (PrP(C)): Single Molecule Discrimination and PrP(C) Kinetics

PloS One. 2013  |  Pubmed ID: 23393562

Prion diseases are fatal neurodegenerative diseases associated with the conversion of cellular prion protein (PrP(C)) in the central nervous system into the infectious isoform (PrP(Sc)). The mechanics of conversion are almost entirely unknown, with understanding stymied by the lack of an atomic-level structure for PrP(Sc). A number of pathogenic PrP(C) mutants exist that are characterized by an increased propensity for conversion into PrP(Sc) and that differ from wild-type by only a single amino-acid point mutation in their primary structure. These mutations are known to perturb the stability and conformational dynamics of the protein. Understanding how this occurs may provide insight into the mechanism of PrP(C) conversion. In this work we sought to explore wild-type and pathogenic mutant prion protein structure and dynamics by analysis of the current fluctuations through an organic α-hemolysin nanometer-scale pore (nanopore) in which a single prion protein has been captured electrophoretically. In doing this, we find that wild-type and D178N mutant PrP(C) (a PrP(C) mutant associated with both Fatal Familial Insomnia and Creutzfeldt-Jakob disease) exhibit easily distinguishable current signatures and kinetics inside the pore, and we further demonstrate, with the use of Hidden Markov Model signal processing, accurate discrimination between these two proteins at the single molecule level based on the kinetics of a single PrP(C) capture event. Moreover, we present a four-state model to describe wild-type PrP(C) kinetics in the pore as a first step in our investigation into the differences in kinetics and conformational dynamics between wild-type and D178N mutant PrP(C).
These results demonstrate the potential of nanopore analysis for highly sensitive, real-time protein and small molecule detection based on single molecule kinetics inside a nanopore, and show the utility of this technique as an assay to probe differences in stability between wild-type and mutant prion proteins at the single molecule level.

Metabolomic Analysis for First-trimester Trisomy 18 Detection

American Journal of Obstetrics and Gynecology. Jul, 2013  |  Pubmed ID: 23535240

The purpose of this study was to determine whether nuclear magnetic resonance-based metabolomic markers in first-trimester maternal serum can detect fetuses with trisomy 18.

Translational Biomarker Discovery in Clinical Metabolomics: an Introductory Tutorial

Metabolomics : Official Journal of the Metabolomic Society. Apr, 2013  |  Pubmed ID: 23543913

Metabolomics is increasingly being applied towards the identification of biomarkers for disease diagnosis, prognosis and risk prediction. Unfortunately, among the many published metabolomic studies focusing on biomarker discovery, there is very little consistency and relatively little rigor in how researchers select, assess or report their candidate biomarkers. In particular, few studies report any measure of sensitivity or specificity, or provide receiver operator characteristic (ROC) curves with associated confidence intervals. Even fewer studies explicitly describe or release the biomarker model used to generate their ROC curves. This is surprising given that for biomarker studies in most other biomedical fields, ROC curve analysis is generally considered the standard method for performance assessment. Because the ultimate goal of biomarker discovery is the translation of those biomarkers to clinical practice, it is clear that the metabolomics community needs to start "speaking the same language" in terms of biomarker analysis and reporting, especially if it wants to see metabolite markers being routinely used in the clinic. In this tutorial, we will first introduce the concept of ROC curves and describe their use in single biomarker analysis for clinical chemistry. This includes the construction of ROC curves, understanding the meaning of area under ROC curves (AUC) and partial AUC, as well as the calculation of confidence intervals. The second part of the tutorial focuses on biomarker analyses within the context of metabolomics. This section describes different statistical and machine learning strategies that can be used to create multi-metabolite biomarker models and explains how these models can be assessed using ROC curves. In the third part of the tutorial we discuss common issues and potential pitfalls associated with different analysis methods and provide readers with a list of nine recommendations for biomarker analysis and reporting.
To help readers test, visualize and explore the concepts presented in this tutorial, we also introduce a web-based tool called ROCCET (ROC Curve Explorer & Tester). ROCCET was originally developed as a teaching aid but it can also serve as a training and testing resource to help metabolomics researchers build biomarker models and conduct a range of common ROC curve analyses for biomarker studies.
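
The ROC concepts named in this abstract (curve construction, AUC, and confidence intervals) can be illustrated with a minimal Python sketch. It is not taken from the tutorial or from ROCCET: the function names, the toy labels and scores, and the choice of a percentile bootstrap for the confidence interval are all assumptions made for illustration.

```python
import random


def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (an assumed choice)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[j] for j in idx]
        if 0 < sum(lab) < n:  # a resample needs both classes to define a ROC curve
            aucs.append(auc(roc_points(lab, [scores[j] for j in idx])))
    aucs.sort()
    lower = aucs[int(alpha / 2 * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical single-biomarker data: 1 = case, 0 = control.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.65, 0.90]

print("AUC:", auc(roc_points(labels, scores)))
print("95% CI:", bootstrap_auc_ci(labels, scores))
```

With so few samples the bootstrap interval is very wide, which is itself a useful reminder of why reporting confidence intervals alongside the AUC matters.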

Small Molecule Inhibitors of ERCC1-XPF Protein-protein Interaction Synergize Alkylating Agents in Cancer Cells

Molecular Pharmacology. Jul, 2013  |  Pubmed ID: 23580445

The benefit of cancer chemotherapy based on alkylating agents is limited because of the action of DNA repair enzymes, which mitigate the damage induced by these agents. The interaction between the proteins ERCC1 and XPF involves two major components of the nucleotide excision repair pathway. Here, novel inhibitors of this interaction were identified by virtual screening based on available structures with use of the National Cancer Institute diversity set and a panel of DrugBank small molecules. Subsequently, experimental validation of the in silico screening was undertaken. Top hits were evaluated on A549 and HCT116 cancer cells. In particular, the compound labeled NSC 130813 [4-[(6-chloro-2-methoxy-9-acridinyl)amino]-2-[(4-methyl-1-piperazinyl)methyl]] was shown to act synergistically with cisplatin and mitomycin C; to increase UVC-mediated cytotoxicity; to modify DNA repair as indicated by the staining of phosphorylated H2AX; and to disrupt interaction between ERCC1 and XPF in cells. In addition, using the Biacore technique, we showed that this compound interacts with the domain of XPF responsible for interaction with ERCC1. This study shows that small molecules targeting the protein-protein interaction of ERCC1 and XPF can be developed to enhance the effects of alkylating agents on cancer cells.

INMEX--a Web-based Tool for Integrative Meta-analysis of Expression Data

Nucleic Acids Research. Jul, 2013  |  Pubmed ID: 23766290

The widespread applications of various 'omics' technologies in biomedical research together with the emergence of public data repositories have resulted in a plethora of data sets for almost any given physiological state or disease condition. Properly combining or integrating these data sets with similar basic hypotheses can help reduce study bias, increase statistical power and improve overall biological understanding. However, the difficulties in data management and the complexities of analytical approaches have significantly limited data integration to enable meta-analysis. Here, we introduce integrative meta-analysis of expression data (INMEX), a user-friendly web-based tool designed to support meta-analysis of multiple gene-expression data sets, as well as to enable integration of data sets from gene expression and metabolomics experiments. INMEX contains three functional modules. The data preparation module supports flexible data processing, annotation and visualization of individual data sets. The statistical analysis module allows researchers to combine multiple data sets based on P-values, effect sizes, rank orders and other features. The significant genes can be examined in the functional analysis module for enriched Gene Ontology terms or Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, or expression profile visualization. INMEX has built-in support for common gene/metabolite identifiers (IDs), as well as 45 popular microarray platforms for human, mouse and rat. Complex operations are performed through a user-friendly web interface in a step-by-step manner. INMEX is freely available online.
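
As an illustration of what combining data sets "based on P-values" can mean, the sketch below implements Fisher's method, one classical approach. The abstract does not state which combination rules INMEX implements, so the choice of method, the function name, and the example values are assumptions.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values for one gene using Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has a closed form, so no external library
    is needed.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Upper tail: exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


# Hypothetical P-values for the same gene from three independent studies.
print(fisher_combined_p([0.04, 0.10, 0.07]))
```

Fisher's method assumes the studies are independent and ignores the direction of effect, which is one reason effect-size and rank-based combinations are offered as alternatives.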

Recommendations of the WwPDB NMR Validation Task Force

Structure (London, England : 1993). Sep, 2013  |  Pubmed ID: 24010715

As methods for analysis of biomolecular structure and dynamics using nuclear magnetic resonance spectroscopy (NMR) continue to advance, the resulting 3D structures, chemical shifts, and other NMR data are broadly impacting biology, chemistry, and medicine. Structure model assessment is a critical area of NMR methods development, and is an essential component of the process of making these structures accessible and useful to the wider scientific community. For these reasons, the Worldwide Protein Data Bank (wwPDB) has convened an NMR Validation Task Force (NMR-VTF) to work with wwPDB partners in developing metrics and policies for biomolecular NMR data harvesting, structure representation, and structure quality assessment. This paper summarizes the recommendations of the NMR-VTF, and lays the groundwork for future work in developing standards and metrics for biomolecular NMR structure quality assessment.

The Human Urine Metabolome

PloS One. 2013  |  Pubmed ID: 24023812

Urine has long been a "favored" biofluid among metabolomics researchers. It is sterile, easy-to-obtain in large volumes, largely free from interfering proteins or lipids and chemically complex. However, this chemical complexity has also made urine a particularly difficult substrate to fully understand. As a biological waste material, urine typically contains metabolic breakdown products from a wide range of foods, drinks, drugs, environmental contaminants, endogenous waste metabolites and bacterial by-products. Many of these compounds are poorly characterized and poorly understood. In an effort to improve our understanding of this biofluid we have undertaken a comprehensive, quantitative, metabolome-wide characterization of human urine. This involved both computer-aided literature mining and comprehensive, quantitative experimental assessment/validation. The experimental portion employed NMR spectroscopy, gas chromatography mass spectrometry (GC-MS), direct flow injection mass spectrometry (DFI/LC-MS/MS), inductively coupled plasma mass spectrometry (ICP-MS) and high performance liquid chromatography (HPLC) experiments performed on multiple human urine samples. This multi-platform metabolomic analysis allowed us to identify 445 and quantify 378 unique urine metabolites or metabolite species. The different analytical platforms were able to identify (quantify) a total of: 209 (209) by NMR, 179 (85) by GC-MS, 127 (127) by DFI/LC-MS/MS, 40 (40) by ICP-MS and 10 (10) by HPLC. Our use of multiple metabolomics platforms and technologies allowed us to identify several previously unknown urine metabolites and to substantially enhance the level of metabolome coverage. It also allowed us to critically assess the relative strengths and weaknesses of different platforms or technologies. The literature review led to the identification and annotation of another 2206 urinary compounds and was used to help guide the subsequent experimental studies. 
An online database containing the complete set of 2651 confirmed human urine metabolite species, their structures (3079 in total), concentrations, related literature references and links to their known disease associations is freely available online.

A Simple Method to Measure Protein Side-chain Mobility Using NMR Chemical Shifts

Journal of the American Chemical Society. Oct, 2013  |  Pubmed ID: 24032347

Protein side-chain motions are involved in many important biological processes including enzymatic catalysis, allosteric regulation, and the mediation of protein-protein, protein-DNA, protein-RNA, and protein-cofactor interactions. NMR spectroscopy has long been used to provide insights into the motions of side-chain groups. Currently, the method of choice for studying side-chain dynamics by NMR is the measurement of methyl (2)H autorelaxation. Methyl (2)H autorelaxation exhibits simple relaxation mechanisms and can be straightforwardly converted to meaningful dynamic parameters. However, methyl groups can only be found in 6 of 19 side-chain bearing amino acids. Consequently, only a sparse assessment of protein side-chain dynamics is possible. Therefore, there is a significant interest in developing novel methods of studying side-chain motions that can be applied to all types of side-chains. Here, we show how side-chain chemical shifts can be used to determine the magnitude of fast side-chain motions in proteins. The chemical shift method is applicable to all side-chain bearing residues and does not require any additional measurements beyond standard NMR experiments for backbone and side-chain assignments.

Phenol-Explorer 3.0: a Major Update of the Phenol-Explorer Database to Incorporate Data on the Effects of Food Processing on Polyphenol Content

Database : the Journal of Biological Databases and Curation. 2013  |  Pubmed ID: 24103452

Polyphenols are a major class of bioactive phytochemicals whose consumption may play a role in the prevention of a number of chronic diseases such as cardiovascular diseases, type II diabetes and cancers. Phenol-Explorer, launched in 2009, is the only freely available web-based database on the content of polyphenols in food and their in vivo metabolism and pharmacokinetics. Here we report the third release of the database (Phenol-Explorer 3.0), which adds data on the effects of food processing on polyphenol contents in foods. Data on >100 foods, covering 161 polyphenols or groups of polyphenols before and after processing, were collected from 129 peer-reviewed publications and entered into new tables linked to the existing relational design. The effect of processing on polyphenol content is expressed in the form of retention factor coefficients, or the proportion of a given polyphenol retained after processing, adjusted for change in water content. The result is the first database on the effects of food processing on polyphenol content and, following the model initially defined for Phenol-Explorer, all data may be traced back to original sources. The new update will allow polyphenol scientists to more accurately estimate polyphenol exposure from dietary surveys.
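
A retention factor of the kind described here can be sketched in a few lines of Python. The formula below (content after processing, scaled by the weight yield of the food, divided by content before processing) is a common way to correct for water loss or gain and is an assumption; the abstract does not give the exact expression used in Phenol-Explorer. All names and numbers are hypothetical.

```python
def retention_factor(raw_content, processed_content, weight_before, weight_after):
    """Proportion of a polyphenol retained after processing.

    raw_content, processed_content: polyphenol content per unit food weight
        (for example mg per 100 g) before and after processing.
    weight_before, weight_after: weight of the same batch of food before and
        after processing; their ratio corrects for the change in water content.
    """
    if raw_content <= 0 or weight_before <= 0:
        raise ValueError("raw_content and weight_before must be positive")
    yield_factor = weight_after / weight_before
    return processed_content * yield_factor / raw_content


# Hypothetical example: a vegetable loses a quarter of its weight on cooking.
rf = retention_factor(raw_content=50.0, processed_content=40.0,
                      weight_before=200.0, weight_after=150.0)
print(rf)
```

A value below 1 indicates a net loss of the compound during processing; a value above 1 would suggest release or formation of the compound rather than simple concentration by water loss.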

Molecular Docking of Thiamine Reveals Similarity in Binding Properties Between the Prion Protein and Other Thiamine-binding Proteins

Journal of Molecular Modeling. Dec, 2013  |  Pubmed ID: 24126825

Prion-induced diseases are a global health concern. The lack of effective therapy and 100% mortality rates for such diseases have made the prion protein an important target for drug discovery. Previous NMR experimental work revealed that thiamine and its derivatives bind the prion protein in a pocket near the N-terminal loop of helix 1, and conserved intermolecular interactions were noted between thiamine and other thiamine-binding proteins. Furthermore, water-mediated interactions were observed in all of the X-ray crystallographic structures of thiamine-binding proteins, but were not observed in the thiamine-prion NMR study. To better understand the potential role of water in thiamine-prion binding, a docking study was employed using structural X-ray solvent. Before energy minimization, docked thiamine assumed a "V" shape similar to some of the known thiamine-dependent proteins. Following minimization with NMR-derived restraints, the "F" conformation was observed. Our findings confirmed that water is involved in ligand stabilization and phosphate group interaction. The resulting refined structure of thiamine bound to the prion protein allowed the 4-aminopyrimidine ring of thiamine to π-stack with Tyr150, and facilitated hydrogen bonding between Asp147 and the amino group of 4-aminopyrimidine. Investigation of the π-stacking interaction through mutation of the tyrosine residue further revealed its importance in ligand placement. The resulting refined structure is in good agreement with previous experimental restraints, and is consistent with the pharmacophore model of thiamine-binding proteins.

Identification and Characterization of ϕH111-1: A Novel Myovirus with Broad Activity Against Clinical Isolates of Burkholderia Cenocepacia

Bacteriophage. Oct, 2013  |  Pubmed ID: 24265978

Characterization of prophages in sequenced bacterial genomes is important for virulence assessment, evolutionary analysis, and phage application development. The objective of this study was to identify complete, inducible prophages in the cystic fibrosis (CF) clinical isolate Burkholderia cenocepacia H111. Using the prophage-finding program PHAge Search Tool (PHAST), we identified three putative intact prophages in the H111 sequence. Virions were readily isolated from H111 culture supernatants following extended incubation. Using shotgun cloning and sequencing, one of these virions (designated ϕH111-1 [vB_BceM_ϕH111-1]) was identified as the infective particle of a PHAST-detected intact prophage. ϕH111-1 has an extremely broad host range with respect to B. cenocepacia strains and is predicted to use lipopolysaccharide (LPS) as a receptor. Bioinformatics analysis indicates that the prophage is 42,972 base pairs in length, encodes 54 proteins, and shows relatedness to the virion morphogenesis modules of AcaML1 and "Vhmllikevirus" myoviruses. As ϕH111-1 is active against a broad panel of clinical strains and encodes no putative virulence factors, it may be therapeutically effective for Burkholderia infections.

Metabolomic Analysis of Cold Acclimation of Arctic Mesorhizobium Sp. Strain N33

PloS One. 2013  |  Pubmed ID: 24386418

Arctic Mesorhizobium sp. N33, isolated from nodules of Oxytropis arctobia in Canada's eastern Arctic, has a growth temperature range from 0 °C to 30 °C and is a well-known cold-adapted rhizobium. The key molecular mechanisms underlying cold adaptation in Arctic rhizobia remain totally unknown. Since the concentration and contents of metabolites are closely related to stress adaptation, we applied GC-MS and NMR to identify and quantify fatty acids and water soluble compounds possibly related to low temperature acclimation in strain N33. Bacterial cells were grown at three different growing temperatures (4 °C, 10 °C and 21 °C). Cells from 21 °C were also cold-exposed to 4 °C for different times (2, 4, 8, 60 and 240 minutes). We found that the poly-unsaturated linoleic acids 18:2 (9, 12) & 18:2 (6, 9) were more abundant in cells growing at 4 or 10 °C than in cells cultivated at 21 °C. The mono-unsaturated phospho/neutral fatty acid myristoleic acid 14:1(11) was the most significantly overexpressed (45-fold) after 1 hour of exposure to 4 °C. As reported in the literature, these fatty acids play important roles in cold adaptability by contributing to cell membrane fluidity and by providing energy to cells. Analysis of water-soluble compounds revealed that isobutyrate, sarcosine, threonine and valine accumulated to higher levels during exposure to 4 °C. These metabolites might play a role in conferring cold acclimation to strain N33 at 4 °C, probably by acting as cryoprotectants. Isobutyrate was highly upregulated (19.4-fold) during growth at 4 °C, suggesting that this compound is a precursor for the cold-regulated fatty acid modifications involved in low temperature adaptation.

SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database

Nucleic Acids Research. Jan, 2014  |  Pubmed ID: 24203708

The Small Molecule Pathway Database (SMPDB) is a comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. SMPDB contains >600 pathways with nearly 75% of its pathways not found in any other database. All SMPDB pathway diagrams are extensively hyperlinked and include detailed information on the relevant tissues, organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Since its last release in 2010, SMPDB has undergone substantial upgrades and significant expansion. In particular, the total number of pathways in SMPDB has grown by >70%. Additionally, every previously entered pathway has been completely redrawn, standardized, corrected, updated and enhanced with additional molecular or cellular information. Many SMPDB pathways now include transporter proteins as well as much more physiological, tissue, target organ and reaction compartment data. Thanks to the development of a standardized pathway drawing tool (called PathWhiz) all SMPDB pathways are now much more easily drawn and far more rapidly updated. PathWhiz has also allowed all SMPDB pathways to be saved in a BioPAX format. Significant improvements to SMPDB's visualization interface now make the browsing, selection, recoloring and zooming of pathways far easier and far more intuitive. Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases including HMDB and DrugBank.

DrugBank 4.0: Shedding New Light on Drug Metabolism

Nucleic Acids Research. Jan, 2014  |  Pubmed ID: 24203711

DrugBank is a comprehensive online database containing extensive biochemical and pharmacological information about drugs, their mechanisms and their targets. Since it was first described in 2006, DrugBank has rapidly evolved, both in response to user requests and in response to changing trends in drug research and development. Previous versions of DrugBank have been widely used to facilitate drug and in silico drug target discovery. The latest update, DrugBank 4.0, has been further expanded to contain data on drug metabolism, absorption, distribution, metabolism, excretion and toxicity (ADMET) and other kinds of quantitative structure activity relationships (QSAR) information. These enhancements are intended to facilitate research in xenobiotic metabolism (both prediction and characterization), pharmacokinetics, pharmacodynamics and drug design/discovery. For this release, >1200 drug metabolites (including their structures, names, activity, abundance and other detailed data) have been added along with >1300 drug metabolism reactions (including metabolizing enzymes and reaction types) and dozens of drug metabolism pathways. Another 30 predicted or measured ADMET parameters have been added to each DrugCard, bringing the average number of quantitative ADMET values for Food and Drug Administration-approved drugs close to 40. Referential nuclear magnetic resonance and MS spectra have been added for almost 400 drugs as well as spectral and mass matching tools to facilitate compound identification. This expanded collection of drug information is complemented by a number of new or improved search tools, including one that provides simple analyses of drug-target, -enzyme and -transporter associations to provide insight on drug-drug interactions.

Using Isotopically-coded Hydrogen Peroxide As a Surface Modification Reagent for the Structural Characterization of Prion Protein Aggregates

Journal of Proteomics. Apr, 2014  |  Pubmed ID: 24316355

The conversion of the cellular prion protein (PrP(C)) into aggregated β-oligomeric (PrP(β)) and fibril (PrP(Sc)) forms is the central element in the development of prion diseases. Here we report the first use of isotopically-coded hydrogen peroxide surface modification combined with mass spectrometry (MS) for the differential characterization of PrP(C) and PrP(β). (16)O and (18)O hydrogen peroxide were used to oxidize methionine and tryptophan residues in PrP(C) and PrP(β), allowing for the relative quantitation of the extent of modification of each form of the prion protein. After modification with either light or heavy forms of hydrogen peroxide (H2(16)O2 and H2(18)O2), the PrP(C) and PrP(β) forms of the protein were then combined, digested with trypsin, and analysed by LC-MS. The (18)O/(16)O signal intensity ratios were used to determine the relative levels of oxidation of specific amino acids in the PrP(C) and PrP(β) forms. Using this approach we have detected several residues that are differentially-oxidized between the native and β-oligomeric prion forms, allowing determination of the regions of PrP(C) involved in the formation of PrP(β) aggregates. Modification of these residues in the β-oligomeric form is compatible with a flip of the β1-H1-β2 loop away from amphipathic helices 2 and 3 during conversion.

Metabolomic Prediction of Fetal Congenital Heart Defect in the First Trimester

American Journal of Obstetrics and Gynecology. Sep, 2014  |  Pubmed ID: 24704061

The objective of the study was to identify metabolomic markers in maternal first-trimester serum for the detection of fetal congenital heart defects (CHDs).

The Food Metabolome: a Window over Dietary Exposure

The American Journal of Clinical Nutrition. Jun, 2014  |  Pubmed ID: 24760973

The food metabolome is defined as the part of the human metabolome directly derived from the digestion and biotransformation of foods and their constituents. With >25,000 compounds known in various foods, the food metabolome is extremely complex, with a composition varying widely according to the diet. By its very nature it represents a considerable and still largely unexploited source of novel dietary biomarkers that could be used to measure dietary exposures with a high level of detail and precision. Most dietary biomarkers currently have been identified on the basis of our knowledge of food compositions by using hypothesis-driven approaches. However, the rapid development of metabolomics resulting from the development of highly sensitive modern analytic instruments, the availability of metabolite databases, and progress in (bio)informatics has made agnostic approaches more attractive as shown by the recent identification of novel biomarkers of intakes for fruit, vegetables, beverages, meats, or complex diets. Moreover, examples also show how the scrutiny of the food metabolome can lead to the discovery of bioactive molecules and dietary factors associated with diseases. However, researchers still face hurdles, which slow progress and need to be resolved to bring this emerging field of research to maturity. These limits were discussed during the First International Workshop on the Food Metabolome held in Glasgow. Key recommendations made during the workshop included more coordination of efforts; development of new databases, software tools, and chemical libraries for the food metabolome; and shared repositories of metabolomic data. Once achieved, major progress can be expected toward a better understanding of the complex interactions between diet and human health.

Lipopolysaccharide Induced Conversion of Recombinant Prion Protein

Prion. Mar-Apr, 2014  |  Pubmed ID: 24819168

The conformational conversion of the cellular prion protein (PrP(C)) to the β-rich infectious isoform PrP(Sc) is considered a critical and central feature in prion pathology. Although PrP(Sc) is the critical component of the infectious agent, as proposed in the "protein-only" prion hypothesis, cellular components have been identified as important cofactors in triggering and enhancing the conversion of PrP(C) to proteinase K resistant PrP(Sc). A number of in vitro systems using various chemical and/or physical agents such as guanidine hydrochloride, urea, SDS, high temperature, and low pH, have been developed that cause PrP(C) conversion, their amplification, and amyloid fibril formation, often under non-physiological conditions. In our ongoing efforts to look for endogenous and exogenous chemical mediators that might initiate, influence, or result in the natural conversion of PrP(C) to PrP(Sc), we discovered that lipopolysaccharide (LPS), a component of gram-negative bacterial membranes, interacts with recombinant prion proteins and induces conversion to an isoform richer in β sheet under near-physiological conditions as long as the LPS concentration remains above the critical micelle concentration (CMC). More significant was the LPS-mediated conversion that was observed even at sub-molar ratios of LPS to recombinant ShPrP (90-232).

Brassica Villosa, a System for Studying Non-glandular Trichomes and Genes in the Brassicas

Plant Molecular Biology. Jul, 2014  |  Pubmed ID: 24831512

Brassica villosa is a wild Brassica C genome species with very dense trichome coverage and strong resistance to many insect pests of Brassica oilseeds and vegetables. Transcriptome analysis of hairy B. villosa leaves indicated higher expression of several important trichome initiation genes compared with glabrous B. napus leaves and consistent with the Arabidopsis model of trichome development. However, transcripts of the TRY inhibitory gene in hairy B. villosa were surprisingly high relative to B. napus and relative transcript levels of SAD2, EGL3, and several XIX genes were low, suggesting potential ancillary or less important trichome-related roles for these genes in Brassica species compared with Arabidopsis. Several antioxidant, calcium, non-calcium metal and secondary metabolite genes also showed differential expression between these two species. These coincided with accumulation of two alkaloid-like compounds, high levels of calcium, and other metals in B. villosa trichomes that are correlated with the known tolerance of B. villosa to high salt and the calcium-rich natural habitat of this wild species. This first time report on the isolation of large amounts of pure B. villosa trichomes, on trichome content, and on relative gene expression differences in an exceptionally hairy Brassica species compared with a glabrous species opens doors for the scientific community to understand trichome gene function in the Brassicas and highlights the potential of B. villosa as a trichome research platform.

Shaking Alone Induces De Novo Conversion of Recombinant Prion Proteins to β-sheet Rich Oligomers and Fibrils

PloS One. 2014  |  Pubmed ID: 24892647

The formation of β-sheet rich prion oligomers and fibrils from native prion protein (PrP) is thought to be a key step in the development of prion diseases. Many methods are available to convert recombinant prion protein into β-sheet rich fibrils using various chemical denaturants (urea, SDS, GdnHCl), high temperature, phospholipids, or mildly acidic conditions (pH 4). Many of these methods also require shaking or another form of agitation to complete the conversion process. We have identified that shaking alone causes the conversion of recombinant PrP to β-sheet rich oligomers and fibrils at near physiological pH (pH 5.5 to pH 6.2) and temperature. This conversion does not require any denaturant, detergent, or any other chemical cofactor. Interestingly, this conversion does not occur when the water-air interface is eliminated in the shaken sample. We have analyzed shaking-induced conversion using circular dichroism, resolution enhanced native acidic gel electrophoresis (RENAGE), electron microscopy, Fourier transform infrared spectroscopy, thioflavin T fluorescence and proteinase K resistance. Our results show that shaking causes the formation of β-sheet rich oligomers with a population distribution ranging from octamers to dodecamers and that further shaking causes a transition to β-sheet fibrils. In addition, we show that shaking-induced conversion occurs for a wide range of full-length and truncated constructs of mouse, hamster and cervid prion proteins. We propose that this method of conversion provides a robust, reproducible and easily accessible model for scrapie-like amyloid formation, allowing the generation of milligram quantities of physiologically stable β-sheet rich oligomers and fibrils. These results may also have interesting implications regarding our understanding of prion conversion and propagation both within the brain and via techniques such as protein misfolding cyclic amplification (PMCA) and quaking induced conversion (QuIC).

Development of Isotope Labeling Liquid Chromatography Mass Spectrometry for Mouse Urine Metabolomics: Quantitative Metabolomic Study of Transgenic Mice Related to Alzheimer's Disease

Journal of Proteome Research. Oct, 2014  |  Pubmed ID: 25164377

Because only a limited volume of urine can be collected from a mouse, it is very difficult to apply the common strategy of using multiple analytical techniques to increase metabolome coverage in mouse urine metabolomics. We report an enabling method based on differential isotope labeling liquid chromatography mass spectrometry (LC-MS) for relative quantification of over 950 putative metabolites using 20 μL of urine as the starting material. The workflow involves aliquoting 10 μL of an individual urine sample for ¹²C-dansylation labeling, which targets amines and phenols. Another 10 μL aliquot was taken from each sample to generate a pooled sample that was subjected to ¹³C-dansylation labeling. The ¹²C-labeled individual sample was mixed with an equal volume of the ¹³C-labeled pooled sample. The mixture was then analyzed by LC-MS to generate information on metabolite concentration differences among different individual samples. The interday repeatability for the LC-MS runs was assessed, and the median relative standard deviation over 4 days was 5.0%. This workflow was then applied to a metabolomic biomarker discovery study using urine samples obtained from the TgCRND8 mouse model of early onset familial Alzheimer's disease (FAD) throughout the course of their pathological deposition of beta amyloid (Aβ). It was shown that there was a distinct metabolomic separation between the AD prone mice and the wild type (control) group. As early as 15-17 weeks of age (presymptomatic), metabolomic differences were observed between the two groups, and after the age of 25 weeks the metabolomic alterations became more pronounced. The metabolomic changes at different ages corresponded well with the phenotype changes in this transgenic mouse model. Several useful candidate biomarkers including methionine, desaminotyrosine, taurine, N1-acetylspermidine, and 5-hydroxyindoleacetic acid were identified. Some of them were found in previous metabolomics studies in human cerebrospinal fluid or blood samples. This work illustrates the utility of this isotope labeling LC-MS method for biomarker discovery using mouse urine metabolomics.
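
The interday repeatability figure quoted in this abstract (a median relative standard deviation across metabolites) can be illustrated with a minimal sketch. The function names and the peak-ratio values below are hypothetical and are not taken from the paper; the sketch assumes each metabolite has one ¹²C/¹³C peak ratio per day and uses the sample standard deviation.

```python
from statistics import mean, median, stdev


def relative_standard_deviation(values):
    """Return the RSD in percent: 100 * sample standard deviation / |mean|."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; RSD is undefined")
    return 100.0 * stdev(values) / abs(m)


def median_rsd(ratios_by_metabolite):
    """Return the median of the per-metabolite RSD values (percent)."""
    return median(
        relative_standard_deviation(ratios)
        for ratios in ratios_by_metabolite.values()
    )


# Hypothetical 12C/13C peak ratios measured on 4 different days.
example_ratios = {
    "metabolite_A": [1.00, 1.04, 0.97, 1.02],
    "metabolite_B": [0.52, 0.55, 0.50, 0.53],
    "metabolite_C": [2.10, 2.00, 2.20, 2.05],
}

if __name__ == "__main__":
    print(f"median RSD: {median_rsd(example_ratios):.1f}%")
```

Taking the median rather than the mean keeps a few poorly reproducible peaks from dominating the summary figure.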

CSI 2.0: a Significantly Improved Version of the Chemical Shift Index

Journal of Biomolecular NMR. Nov, 2014  |  Pubmed ID: 25273503

Protein chemical shifts have long been used by NMR spectroscopists to assist with secondary structure assignment and to provide useful distance and torsion angle constraint data for structure determination. One of the most widely used methods for secondary structure identification is called the Chemical Shift Index (CSI). The CSI method uses a simple digital chemical shift filter to locate secondary structures along the protein chain using backbone (13)C and (1)H chemical shifts. While the CSI method is simple to use and easy to implement, it is only about 75-80% accurate. Here we describe a significantly improved version of the CSI (2.0) that uses machine-learning techniques to combine all six backbone chemical shifts ((13)Cα, (13)Cβ, (13)C, (15)N, (1)HN, (1)Hα) with sequence-derived features to perform far more accurate secondary structure identification. Our tests indicate that CSI 2.0 achieved an average identification accuracy (Q3) of 90.56% for a training set of 181 proteins in a repeated tenfold cross-validation and 89.35% for a test set of 59 proteins. This represents a significant improvement over other state-of-the-art chemical shift-based methods. In particular, the level of performance of CSI 2.0 is equal to that of standard methods, such as DSSP and STRIDE, used to identify secondary structures via 3D coordinate data. This suggests that CSI 2.0 could be used both in providing accurate NMR constraint data in the early stages of protein structure determination and in defining secondary structure locations in the final protein model(s). A CSI 2.0 web server is available for submitting input queries for secondary structure identification.
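
The Q3 score cited in this abstract is the percentage of residues whose three-state label (helix, β-strand or coil) matches a reference assignment. A minimal sketch of that calculation follows; the one-letter codes H/E/C and the example strings are illustrative assumptions, and this is not code from CSI 2.0 itself.

```python
def q3_accuracy(predicted, reference):
    """Return the percentage of residues with matching three-state labels.

    Both arguments are equal-length strings over the alphabet
    H (helix), E (beta-strand) and C (coil).
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # Hypothetical 10-residue assignments that differ at one position.
    print(q3_accuracy("CHHHHCEEEC", "CHHHHCCEEC"))  # 90.0
```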

The Use of Metabolomics in Population-based Research

Advances in Nutrition (Bethesda, Md.). Nov, 2014  |  Pubmed ID: 25398741

The NIH has made a significant commitment through the NIH Common Fund's Metabolomics Program to build infrastructure and capacity for metabolomics research, which should accelerate the field. Given this investment, it is the ideal time to start planning strategies to capitalize on the infrastructure being established. An obvious gap in the literature relates to the effective use of metabolomics in large-population studies. Although published reports from population-based studies are beginning to emerge, the number to date remains relatively small. Yet, there is great potential for using metabolomics in population-based studies to evaluate the effects of nutritional, pharmaceutical, and environmental exposures (the "exposome"); conduct risk assessments; predict disease development; and diagnose diseases. Currently, the majority of the metabolomics studies in human populations are in nutrition or nutrition-related fields. This symposium provided a timely venue to highlight the current state-of-science on the use of metabolomics in population-based research. This session provided a forum at which investigators with extensive experience in performing research within large initiatives, multi-investigator grants, and epidemiology consortia could stimulate discussion and ideas for population-based metabolomics research and, in turn, improve knowledge to help devise effective methods of health research.

In Silico Studies and Fluorescence Binding Assays of Potential Anti-prion Compounds Reveal an Important Binding Site for Prion Inhibition from PrP(C) to PrP(Sc)

European Journal of Medicinal Chemistry. Feb, 2015  |  Pubmed ID: 25042003

To understand the pharmacophore properties of 2-aminothiazoles and design novel inhibitors against the prion protein, a highly predictive 3D quantitative structure-activity relationship (QSAR) has been developed by performing comparative molecular field analysis (CoMFA) and comparative similarity analysis (CoMSIA). Both CoMFA and CoMSIA maps reveal that the presence of oxymethyl groups in the meta and para positions on the phenyl ring of compound 17 (N-[4-(3,4-dimethoxyphenyl)-1,3-thiazol-2-yl]quinolin-2-amine) is necessary for activity, while the electronegative nitrogen of quinoline is highly favorable for enhancing activity. The blind docking results for these compounds show that the compound with quinoline binds with higher affinity than those with isoquinoline and naphthalene groups. Out of 150 novel compounds retrieved by fingerprint analysis using a pharmacophoric model based on five test sets of compounds, five compounds with diverse scaffolds were selected for biological evaluation as possible PrP inhibitors. Molecular docking combined with fluorescence quenching studies show that these compounds bind to pocket-D of SHaPrP near Trp145. The new antiprion compounds 3 and 6, which bind with interaction energies of -12.1 and -13.2 kcal/mol, respectively, show fluorescence quenching with binding constant (Kd) values of 15.5 and 44.14 μM, respectively. Further fluorescence binding assays with compound 5, which is similar to 2-aminothiazole as a positive control, also show that the molecule binds to pocket-D with a binding constant (Kd) value of 84.7 μM. Finally, both molecular docking and a fluorescence binding assay of noscapine as a negative control reveal the same binding site on the surface of pocket-A near a rigid loop between β2 and α2 interacting with Arg164. This high level of correlation between molecular docking and fluorescence quenching studies confirms that these five compounds are likely to act as inhibitors of prion propagation, while noscapine might act as an accelerator of prion conversion from PrP(C) to PrP(Sc).

Salmonella Phages and Prophages: Genomics, Taxonomy, and Applied Aspects

Methods in Molecular Biology (Clifton, N.J.). 2015  |  Pubmed ID: 25253259

Since this book was originally published in 2007 there has been a significant increase in the number of Salmonella bacteriophages, particularly lytic viruses, and Salmonella strains which have been fully sequenced. In addition, new insights into phage taxonomy have resulted in new phage genera, some of which have been recognized by the International Committee on Taxonomy of Viruses (ICTV). The properties of each of these genera are discussed, along with the role of phages as agents of genetic exchange, as therapeutic agents, and their involvement in phage typing.

Identifying Putative Drug Targets and Potential Drug Leads: Starting Points for Virtual Screening and Docking

Methods in Molecular Biology (Clifton, N.J.). 2015  |  Pubmed ID: 25330974

The availability of 3D models of both drug leads (small molecule ligands) and drug targets (proteins) is essential to molecular docking and computational drug discovery. This chapter describes a simple approach that can be used to identify both drug leads and drug targets using two popular Web-accessible databases: (1) DrugBank and (2) The Human Metabolome Database. First, it is illustrated how putative drug targets and drug leads for exogenous diseases (i.e., infectious diseases) can be readily identified and their 3D structures selected using only the genomic sequences from pathogenic bacteria or viruses as input. The second part illustrates how putative drug targets and drug leads for endogenous diseases (i.e., noninfectious diseases or chronic conditions) can be identified using similar databases and similar sequence input. This chapter is intended to illustrate how bioinformatics and cheminformatics can work synergistically to help provide the necessary inputs for computer-aided drug design.

Role of Water in Ligand Binding to Maltose-binding Protein: Insight from a New Docking Protocol Based on the 3D-RISM-KH Molecular Theory of Solvation

Journal of Chemical Information and Modeling. Feb, 2015  |  Pubmed ID: 25545470

Maltose-binding protein is a periplasmic binding protein responsible for transport of maltooligosaccharides through the periplasmic space of Gram-negative bacteria, as a part of the ABC transport system. The molecular mechanisms of the initial ligand binding and induced large scale motion of the protein's domains still remain elusive. In this study, we use a new docking protocol that combines a recently proposed explicit water placement algorithm based on the 3D-RISM-KH molecular theory of solvation and conventional docking software (AutoDock Vina) to explain the mechanisms of maltotriose binding to the apo-open state of a maltose-binding protein. We confirm the predictions of previous NMR spectroscopic experiments on binding modes of the ligand. We provide the molecular details on the binding mode that was not previously observed in the X-ray experiments. We show that this mode, which is defined by the fine balance between the protein-ligand direct interactions and solvation effects, can trigger the protein's domain motion resulting in the holo-closed structure of the maltose-binding protein with the maltotriose ligand in excellent agreement with the experimental data. We also discuss the role of water in blocking unfavorable binding sites and water-mediated interactions contributing to the stability of observable binding modes of maltotriose.

Recombinant Mouse Prion Protein Alone or in Combination with Lipopolysaccharide Alters Expression of Innate Immunity Genes in the Colon of Mice

Prion. 2015  |  Pubmed ID: 25695140

The objectives of this study were to test whether recombinant mouse (mo)PrP alone or in combination with LPS or under simulated endotoxemia would affect expression of genes related to host inflammatory and antimicrobial responses. To test our hypotheses colon tissues were collected from 16 male mice (FVB/N strain) and mounted in an Ussing chamber. Application of moPrP to the mucosal side of the colon affected genes related to TLR- and NLR- signaling and antimicrobial responses. When LPS was added on the mucosal side of the colon, genes related to TLR, Nlrp3 inflammasome, and iron transport proteins were over-expressed. Addition of LPS to the serosal side of the colon up-regulated genes related to TLR- and NLR-signaling, Nlrp3 inflammasome, and a chemokine. Treatment with both moPrP and LPS to the mucosal side of the colon upregulated genes associated with TLR, downstream signal transduction (DST), inflammatory response, attraction of dendritic cells to the site of inflammation, and the JNK-apoptosis pathway. Administration of moPrP to the mucosal side and LPS to the serosal side of the colon affected genes related to TLR- and NLR-signaling, DST, apoptosis, inflammatory response, cytokines, chemokines, and antimicrobial peptides. Overall this study suggests a potential role for moPrP as an endogenous 'danger signal' associated with activation of colon genes related to innate immunity and antibacterial responses.

MetaboAnalyst 3.0--making Metabolomics More Meaningful

Nucleic Acids Research. Jul, 2015  |  Pubmed ID: 25897128

MetaboAnalyst is a web server designed to permit comprehensive metabolomic data analysis, visualization and interpretation. It supports a wide range of complex statistical calculations and high quality graphical rendering functions that require significant computational resources. First introduced in 2009, MetaboAnalyst has experienced more than a 50X growth in user traffic (>50 000 jobs processed each month). In order to keep up with the rapidly increasing computational demands and a growing number of requests to support translational and systems biology applications, we performed a substantial rewrite and major feature upgrade of the server. The result is MetaboAnalyst 3.0. By completely re-implementing the MetaboAnalyst suite using the latest web framework technologies, we have been able to substantially improve its performance, capacity and user interactivity. Three new modules have also been added including: (i) a module for biomarker analysis based on the calculation of receiver operating characteristic curves; (ii) a module for sample size estimation and power analysis for improved planning of metabolomics studies and (iii) a module to support integrative pathway analysis for both genes and metabolites. In addition, popular features found in existing modules have been significantly enhanced by upgrading the graphical output, expanding the compound libraries and by adding support for more diverse organisms.
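
Biomarker analysis based on receiver operating characteristic (ROC) curves, as mentioned in this abstract, is commonly summarized by the area under the curve (AUC). The sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. It is an illustration only, with hypothetical function and variable names, and does not reproduce MetaboAnalyst's implementation.

```python
def roc_auc(case_scores, control_scores):
    """Return the ROC AUC for a single candidate biomarker.

    The AUC equals the fraction of (case, control) pairs in which the
    case has the higher score; tied pairs contribute 0.5.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Hypothetical metabolite concentrations in cases and controls.
    print(roc_auc([0.8, 0.4], [0.6, 0.2]))  # 0.75
```

An AUC of 0.5 corresponds to a marker with no discriminating power, and 1.0 to perfect separation of the two groups.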

Pathways with PathWhiz

Nucleic Acids Research. Jul, 2015  |  Pubmed ID: 25934797

PathWhiz is a web server designed to create colourful, visually pleasing and biologically accurate pathway diagrams that are both machine-readable and interactive. As a web server, PathWhiz is accessible from almost any place and compatible with essentially any operating system. It also houses a public library of pathways and pathway components that can be easily viewed and expanded upon by its users. PathWhiz allows users to readily generate biologically complex pathways by using a specially designed drawing palette to quickly render metabolites (including automated structure generation), proteins (including quaternary structures, covalent modifications and cofactors), nucleic acids, membranes, subcellular structures, cells, tissues and organs. Both small-molecule and protein/gene pathways can be constructed by combining multiple pathway processes such as reactions, interactions, binding events and transport activities. PathWhiz's pathway replication and propagation functions allow for existing pathways to be used to create new pathways or for existing pathways to be automatically propagated across species. PathWhiz pathways can be saved in BioPAX, SBGN-ML and SBML data exchange formats, as well as PNG, PWML, HTML image map or SVG images that can be viewed offline or explored using PathWhiz's interactive viewer. PathWhiz has been used to generate over 700 pathway diagrams for a number of popular databases including HMDB, DrugBank and SMPDB.

CSI 3.0: a Web Server for Identifying Secondary and Super-secondary Structure in Proteins Using NMR Chemical Shifts

Nucleic Acids Research. Jul, 2015  |  Pubmed ID: 25979265

The Chemical Shift Index or CSI 3.0 is a web server designed to accurately identify the location of secondary and super-secondary structures in protein chains using only nuclear magnetic resonance (NMR) backbone chemical shifts and their corresponding protein sequence data. Unlike earlier versions of CSI, which only identified three types of secondary structure (helix, β-strand and coil), CSI 3.0 now identifies a total of 11 types of secondary and super-secondary structures, including helices, β-strands, coil regions, five common β-turns (type I, II, I', II' and VIII), β hairpins as well as interior and edge β-strands. CSI 3.0 accepts experimental NMR chemical shift data in multiple formats (NMR Star 2.1, NMR Star 3.1 and SHIFTY) and generates colorful CSI plots (bar graphs) and secondary/super-secondary structure assignments. The output can be readily used as constraints for structure determination and refinement or the images may be used for presentations and publications. CSI 3.0 uses a pipeline of several well-tested, previously published programs to identify the secondary and super-secondary structures in protein chains. Comparisons with secondary and super-secondary structure assignments made via standard coordinate analysis programs such as DSSP, STRIDE and VADAR on high-resolution protein structures solved by X-ray and NMR show >90% agreement with those made by CSI 3.0.

Metabolomic Fingerprint of Heart Failure with Preserved Ejection Fraction

PloS One. 2015  |  Pubmed ID: 26010610

Heart failure (HF) with preserved ejection fraction (HFpEF) is increasingly recognized as an important clinical entity. Preclinical studies have shown differences in the pathophysiology between HFpEF and HF with reduced ejection fraction (HFrEF). Therefore, we hypothesized that a systematic metabolomic analysis would reveal a novel metabolomic fingerprint of HFpEF that will help understand its pathophysiology and assist in establishing new biomarkers for its diagnosis.

Accurate, Fully-automated NMR Spectral Profiling for Metabolomics

PloS One. 2015  |  Pubmed ID: 26017271

Many diseases cause significant changes to the concentrations of small molecules (a.k.a. metabolites) that appear in a person's biofluids, which means such diseases can often be readily detected from a person's "metabolic profile", i.e., the list of concentrations of those metabolites. This information can be extracted from a biofluid's Nuclear Magnetic Resonance (NMR) spectrum. However, due to its complexity, NMR spectral profiling has remained manual, resulting in slow, expensive and error-prone procedures that have hindered clinical and industrial adoption of metabolomics via NMR. This paper presents a system, BAYESIL, which can quickly, accurately, and autonomously produce a person's metabolic profile. Given a 1D 1H NMR spectrum of a complex biofluid (specifically serum or cerebrospinal fluid), BAYESIL can automatically determine the metabolic profile. This requires first performing several spectral processing steps, then matching the resulting spectrum against a reference compound library, which contains the "signatures" of each relevant metabolite. BAYESIL views spectral matching as an inference problem within a probabilistic graphical model that rapidly approximates the most probable metabolic profile. Our extensive studies on a diverse set of complex mixtures, including real biological samples (serum and CSF), defined mixtures and realistic computer generated spectra involving > 50 compounds, show that BAYESIL can autonomously find the concentration of NMR-detectable metabolites accurately (~ 90% correct identification and ~ 10% quantification error), in less than 5 minutes on a single CPU. These results demonstrate that BAYESIL is the first fully-automatic publicly-accessible system that provides quantitative NMR spectral profiling effectively, with an accuracy on these biofluids that meets or exceeds the performance of trained experts. We anticipate this tool will usher in high-throughput metabolomics and enable a wealth of new applications of NMR in clinical settings. BAYESIL is accessible at

NMR Exchange Format: a Unified and Open Standard for Representation of NMR Restraint Data

Nature Structural & Molecular Biology. Jun, 2015  |  Pubmed ID: 26036565

Accessible Surface Area from NMR Chemical Shifts

Journal of Biomolecular NMR. Jul, 2015  |  Pubmed ID: 26078090

Accessible surface area (ASA) is the surface area of an atom, amino acid or biomolecule that is exposed to solvent. The calculation of a molecule's ASA requires three-dimensional coordinate data and the use of a "rolling ball" algorithm to both define and calculate the ASA. For polymers such as proteins, the ASA for individual amino acids is closely related to the hydrophobicity of the amino acid as well as its local secondary and tertiary structure. For proteins, ASA is a structural descriptor that can often be as informative as secondary structure. Consequently there has been considerable effort over the past two decades to try to predict ASA from protein sequence data and to use ASA information (derived from chemical modification studies) as a structure constraint. Recently it has become evident that protein chemical shifts are also sensitive to ASA. Given the potential utility of ASA estimates as structural constraints for NMR we decided to explore this relationship further. Using machine learning techniques (specifically a boosted tree regression model) we developed an algorithm called "ShiftASA" that combines chemical-shift and sequence derived features to accurately estimate per-residue fractional ASA values of water-soluble proteins. This method showed a correlation coefficient between predicted and experimental values of 0.79 when evaluated on a set of 65 independent test proteins, which was an 8.2% improvement over the next best performing (sequence-only) method. On a separate test set of 92 proteins, ShiftASA reported a mean correlation coefficient of 0.82, which was 12.3% better than the next best performing method. ShiftASA is available as a web server for submitting input queries for fractional ASA calculation.
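
The evaluation metric in this abstract is a correlation coefficient between predicted and experimental per-residue fractional ASA values. A minimal sketch of the Pearson correlation coefficient follows, assuming that is the coefficient meant (the abstract does not say which one was used); the example values are hypothetical and the code is not part of ShiftASA.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    if len(xs) < 2:
        raise ValueError("at least two points are required")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical fractional ASA values for six residues.
    predicted = [0.10, 0.45, 0.80, 0.30, 0.05, 0.60]
    observed = [0.15, 0.40, 0.70, 0.35, 0.10, 0.75]
    print(f"r = {pearson_r(predicted, observed):.2f}")
```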

Standardizing the Experimental Conditions for Using Urine in NMR-based Metabolomic Studies with a Particular Focus on Diagnostic Studies: a Review

Metabolomics : Official Journal of the Metabolomic Society. 2015  |  Pubmed ID: 26109927

The metabolic composition of human biofluids can provide important diagnostic and prognostic information. Among the biofluids most commonly analyzed in metabolomic studies, urine appears to be particularly useful. It is abundant, readily available, easily stored and can be collected by simple, noninvasive techniques. Moreover, given its chemical complexity, urine is particularly rich in potential disease biomarkers. This makes it an ideal biofluid for detecting or monitoring disease processes. Among the metabolomic tools available for urine analysis, NMR spectroscopy has proven to be particularly well-suited, because the technique is highly reproducible and requires minimal sample handling. As it permits the identification and quantification of a wide range of compounds, independent of their chemical properties, NMR spectroscopy has been frequently used to detect or discover disease fingerprints and biomarkers in urine. Although protocols for NMR data acquisition and processing have been standardized, no consensus on protocols for urine sample selection, collection, storage and preparation in NMR-based metabolomic studies has been developed. This lack of consensus may be leading to spurious biomarkers being reported and may account for a general lack of reproducibility between laboratories. Here, we review a large number of published studies on NMR-based urine metabolic profiling with the aim of identifying key variables that may affect the results of metabolomics studies. From this survey, we identify a number of issues that require either standardization or careful accounting in experimental design and provide some recommendations for urine collection, sample preparation and data acquisition.

Validation of Metabolomic Models for Prediction of Early-onset Preeclampsia

American Journal of Obstetrics and Gynecology. Oct, 2015  |  Pubmed ID: 26116099

We sought to perform validation studies of previously published and newly derived first-trimester metabolomic algorithms for prediction of early preeclampsia (PE).

Correction: Accurate, Fully-Automated NMR Spectral Profiling for Metabolomics

PloS One. 2015  |  Pubmed ID: 26222058

Is Cancer a Genetic Disease or a Metabolic Disease?

EBioMedicine. Jun, 2015  |  Pubmed ID: 26288805

A Robust Algorithm for Optimizing Protein Structures with NMR Chemical Shifts

Journal of Biomolecular NMR. Nov, 2015  |  Pubmed ID: 26345175

Over the past decade, a number of methods have been developed to determine the approximate structure of proteins using minimal NMR experimental information such as chemical shifts alone, sparse NOEs alone or a combination of comparative modeling data and chemical shifts. However, there have been relatively few methods that allow these approximate models to be substantively refined or improved using the available NMR chemical shift data. Here, we present a novel method, called Chemical Shift driven Genetic Algorithm for biased Molecular Dynamics (CS-GAMDy), for the robust optimization of protein structures using experimental NMR chemical shifts. The method incorporates knowledge-based scoring functions and structural information derived from NMR chemical shifts via a unique combination of multi-objective MD biasing, a genetic algorithm, and the widely used XPLOR molecular modelling language. Using this approach, we demonstrate that CS-GAMDy is able to refine and/or fold models that are as much as 10 Å (RMSD) away from the correct structure using only NMR chemical shift data. CS-GAMDy is also able to refine a wide range of approximate or mildly erroneous protein structures to more closely match the known/correct structure and the known/correct chemical shifts. We believe CS-GAMDy will allow protein models generated by sparse restraint or chemical-shift-only methods to achieve sufficiently high quality to be considered fully refined and "PDB worthy". The CS-GAMDy algorithm is explained in detail and its performance is compared over a range of refinement scenarios with several commonly used protein structure refinement protocols. The program has been designed to be easily installed and easily used and is available at

Metabolome Analysis of 20 Taxonomically Related Benzylisoquinoline Alkaloid-producing Plants

BMC Plant Biology. Sep, 2015  |  Pubmed ID: 26369413

Recent progress toward the elucidation of benzylisoquinoline alkaloid (BIA) metabolism has focused on a small number of model plant species. Current understanding of BIA metabolism in plants such as opium poppy, which accumulates important pharmacological agents such as codeine and morphine, has relied on a combination of genomics and metabolomics to facilitate gene discovery. Metabolomics studies provide important insight into the primary biochemical networks underpinning specialized metabolism, and serve as a key resource for metabolic engineering, gene discovery, and elucidation of governing regulatory mechanisms. Beyond model plants, few broad-scope metabolomics reports are available for the vast number of plant species known to produce an estimated 2500 structurally diverse BIAs, many of which exhibit promising medicinal properties.

Sildenafil Therapy Normalizes the Aberrant Metabolomic Profile in the Comt(-/-) Mouse Model of Preeclampsia/Fetal Growth Restriction

Scientific Reports. Dec, 2015  |  Pubmed ID: 26667607

Preeclampsia (PE) and fetal growth restriction (FGR) are serious complications of pregnancy, associated with greatly increased risk of maternal and perinatal morbidity and mortality. These complications are difficult to diagnose and no curative treatments are available. We hypothesized that the metabolomic signature of two models of disease, catechol-O-methyl transferase (COMT(-/-)) and endothelial nitric oxide synthase (Nos3(-/-)) knockout mice, would be significantly different from control C57BL/6J mice. Further, we hypothesised that any differences in COMT(-/-) mice would be resolved following treatment with Sildenafil, a treatment which rescues fetal growth. Targeted, quantitative comparisons of serum metabolic profiles of pregnant Nos3(-/-), COMT(-/-) and C57BL/6J mice were made using a kit from BIOCRATES. Significant differences in 4 metabolites were observed between Nos3(-/-) and C57BL/6J mice (p < 0.05) and in 18 metabolites between C57BL/6J and COMT(-/-) mice (p < 0.05). Following treatment with Sildenafil, only 5 of the 18 previously identified differences in metabolites (p < 0.05) remained in COMT(-/-) mice. Metabolomic profiling of mouse models is possible, producing signatures that are clearly different from control animals. A potential new treatment, Sildenafil, is able to normalize the aberrant metabolomic profile in COMT(-/-) mice; as this treatment moves into clinical trials, this information may assist in assessing possible mechanisms of action.

ECMDB 2.0: A Richer Resource for Understanding the Biochemistry of E. Coli

Nucleic Acids Research. Jan, 2016  |  Pubmed ID: 26481353

ECMDB or the Escherichia coli Metabolome Database is a comprehensive database containing detailed information about the genome and metabolome of E. coli (K-12). First released in 2012, the ECMDB has undergone substantial expansion and many modifications over the past 4 years. This manuscript describes the most recent version of ECMDB (ECMDB 2.0). In particular, it provides a comprehensive update of the database that was previously described in the 2013 NAR Database Issue and details many of the additions and improvements made to the ECMDB over that time. Some of the most important or significant enhancements include a 13-fold increase in the number of metabolic pathway diagrams (from 125 to 1650), a 3-fold increase in the number of compounds linked to pathways (from 1058 to 3280), the addition of dozens of operon/metabolite signalling pathways, a 44% increase in the number of compounds in the database (from 2610 to 3760), a 7-fold increase in the number of compounds with NMR or MS spectra (from 412 to 3261) and a massive increase in the number of external links to other E. coli or chemical resources. These additions, along with many other enhancements aimed at improving the ease or speed of querying, searching and viewing the data within ECMDB, should not only greatly facilitate the understanding of the metabolism of E. coli, but also allow the in-depth exploration of its extensive metabolic networks, its many signalling pathways and its essential biochemistry.
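
The growth figures quoted in this abstract can be checked with simple arithmetic. The sketch below recomputes them from the before/after counts given in the text; the helper names are illustrative. Note that the abstract's figures appear to be rounded down: the NMR/MS spectra count, reported as a 7-fold increase, works out to roughly 7.9-fold.

```python
def fold_change(before, after):
    """Return after / before, i.e. how many times larger the new count is."""
    if before <= 0:
        raise ValueError("before must be positive")
    return after / before


def percent_increase(before, after):
    """Return the relative increase from before to after, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (after - before) / before


if __name__ == "__main__":
    # (before, after) counts as stated in the ECMDB 2.0 abstract.
    print(f"pathway diagrams:       {fold_change(125, 1650):.1f}-fold")
    print(f"compounds in pathways:  {fold_change(1058, 3280):.1f}-fold")
    print(f"compounds in database:  {percent_increase(2610, 3760):.0f}% increase")
    print(f"compounds with spectra: {fold_change(412, 3261):.1f}-fold")
```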

Isolation of Soluble ScFv Antibody Fragments Specific for Small Biomarker Molecule, L-Carnitine, Using Phage Display

Journal of Immunological Methods. Jan, 2016  |  Pubmed ID: 26608419

Isolation of single chain antibody fragment (scFv) clones from naïve Tomlinson I+J phage display libraries that specifically bind a small biomarker molecule, L-Carnitine, was performed using iterative affinity selection procedures. L-Carnitine has been described as a conditionally essential nutrient for humans. Abnormally high concentrations of L-Carnitine in urine are related to many health disorders including diabetes mellitus type 2 and lung cancer. ELISA-based affinity characterization results indicate that selectants preferentially bind to L-Carnitine in the presence of key bioselecting component materials and closely related L-Carnitine derivatives. In addition, the affinity results were confirmed using biophysical fluorescence quenching for tyrosine residues in the V segment. Small-scale production of the soluble fragment yielded 1.3 mg/L using an immunopure-immobilized protein A affinity column. Circular dichroism data revealed that the antibody fragment (Ab) represents a folded protein that mainly consists of β-sheets. These novel antibody fragments may find utility as molecular affinity interface receptors in various electrochemical biosensor platforms to provide specific L-Carnitine binding capability with potential applications in metabolomic devices for companion diagnostics and personalized medicine applications. They may also be used in any other biomedical application where detection of the L-Carnitine level is important.

Recommendations and Standardization of Biomarker Quantification Using NMR-Based Metabolomics with Particular Focus on Urinary Analysis

Journal of Proteome Research. Feb, 2016  |  Pubmed ID: 26745651

NMR-based metabolomics has shown considerable promise in disease diagnosis and biomarker discovery because it allows one to nondestructively identify and quantify large numbers of novel metabolite biomarkers in both biofluids and tissues. Precise metabolite quantification is a prerequisite to move any chemical biomarker or biomarker panel from the lab to the clinic. Among the biofluids commonly used for disease diagnosis and prognosis, urine has several advantages. It is abundant, sterile, and easily obtained, needs little sample preparation, and does not require invasive medical procedures for collection. Furthermore, urine captures and concentrates many "unwanted" or "undesirable" compounds throughout the body, providing a rich source of potentially useful disease biomarkers; however, incredible variation in urine chemical concentrations makes analysis of urine and identification of useful urinary biomarkers by NMR challenging. We discuss a number of the most significant issues regarding NMR-based urinary metabolomics with specific emphasis on metabolite quantification for disease biomarker applications and propose data collection and instrumental recommendations regarding NMR pulse sequences, acceptable acquisition parameter ranges, relaxation effects on quantitation, proper handling of instrumental differences, sample preparation, and biomarker assessment.

Clinical Phenotype Clustering in Cardiovascular Risk Patients for the Identification of Responsive Metabotypes After Red Wine Polyphenol Intake

The Journal of Nutritional Biochemistry. Feb, 2016  |  Pubmed ID: 26878788

This study aims to evaluate the robustness of clinical and metabolic phenotyping through, for the first time, the identification of differential responsiveness to dietary strategies in the improvement of cardiometabolic risk conditions. Clinical phenotyping of 57 volunteers with cardiovascular risk factors was achieved using k-means cluster analysis based on 69 biochemical and anthropometric parameters. Cluster validation based on Dunn and Figure of Merit analysis for internal coherence and external homogeneity was employed. k-Means produced four clusters with particular clinical profiles. Differences in urine metabolomic profiles among clinical phenotypes were explored and validated by multivariate orthogonal signal correction partial least-squares discriminant analysis (OSC-PLS-DA) models. OSC-PLS-DA of (1)H-NMR data revealed that the model comparing the "obese and diabetic cluster" (OD-c) against the "healthier cluster" (H-c) showed the best predictability and robustness in terms of explaining the pairwise differences between clusters. Considering these two clusters, distinct groups of metabolites were observed following an intervention with wine polyphenol intake (WPI; 733 equivalents of gallic acid/day) for 28 days. Glucose was significantly linked to OD-c metabotype (P<.01), and lactate, betaine and dimethylamine showed a significant trend. Tartrate (P<.001) was associated with wine polyphenol intervention (OD-c_WPI and H-c_WPI), whereas mannitol, threonine, methanol, fucose and 3-hydroxyphenylacetate showed a significant trend. Interestingly, 4-hydroxyphenylacetate, a gut microbial-derived metabolite after polyphenol intake, significantly increased in H-c_WPI compared to OD-c_WPI and to basal groups (P<.05), thereby exhibiting a clear metabotypic intervention effect. Results revealed gut microbiota responsive phenotypes to wine polyphenols intervention. Overall, this study illustrates a novel metabolomic strategy for characterizing interindividual responsiveness to dietary intervention and identification of health benefits.

Cancer Metabolomics and the Human Metabolome Database

Metabolites. Mar, 2016  |  Pubmed ID: 26950159

The application of metabolomics towards cancer research has led to a renewed appreciation of metabolism in cancer development and progression. It has also led to the discovery of metabolite cancer biomarkers and the identification of a number of novel cancer-causing metabolites. The rapid growth of metabolomics in cancer research is also leading to challenges. In particular, with so many cancer-associated metabolites being identified, it is often difficult to keep track of which compounds are associated with which cancers. It is also challenging to track down information on the specific pathways that particular metabolites, drugs or drug metabolites may be affecting. Even more frustrating are the difficulties associated with identifying metabolites from NMR or MS spectra. Fortunately, a number of metabolomics databases are emerging that are designed to address these challenges. One such database is the Human Metabolome Database (HMDB). The HMDB is currently the world's largest and most comprehensive, organism-specific metabolomics database. It contains more than 40,000 metabolite entries, thousands of metabolite concentrations, >700 metabolic and disease-associated pathways, as well as information on dozens of cancer biomarkers. This review is intended to provide a brief summary of the HMDB and to offer some guidance on how it can be used in metabolomic studies of cancer.

Emerging Applications of Metabolomics in Drug Discovery and Precision Medicine

Nature Reviews. Drug Discovery. Jul, 2016  |  Pubmed ID: 26965202

Metabolomics is an emerging 'omics' science involving the comprehensive characterization of metabolites and metabolism in biological systems. Recent advances in metabolomics technologies are leading to a growing number of mainstream biomedical applications. In particular, metabolomics is increasingly being used to diagnose disease, understand disease mechanisms, identify novel drug targets, customize drug treatments and monitor therapeutic outcomes. This Review discusses some of the latest technological advances in metabolomics, focusing on the application of metabolomics towards uncovering the underlying causes of complex diseases (such as atherosclerosis, cancer and diabetes), the growing role of metabolomics in drug discovery and its potential effect on precision medicine.

Introduction to Cheminformatics

Current Protocols in Bioinformatics. Mar, 2016  |  Pubmed ID: 27010335

Cheminformatics is a field of information technology that focuses on the collection, storage, analysis, and manipulation of chemical data. The chemical data of interest typically include information on small molecule formulas, structures, properties, spectra, and activities (biological or industrial). Cheminformatics originally emerged as a vehicle to help the drug discovery and development process; however, cheminformatics now plays an increasingly important role in many areas of biology, chemistry, and biochemistry. The intent of this unit is to give readers some introduction into the field of cheminformatics and to show how cheminformatics not only shares many similarities with the field of bioinformatics, but also enhances much of what is currently done in bioinformatics, molecular biology, and biochemistry.

PHASTER: a Better, Faster Version of the PHAST Phage Search Tool

Nucleic Acids Research. Jul, 2016  |  Pubmed ID: 27141966

PHASTER (PHAge Search Tool - Enhanced Release) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. Although the steps in the phage identification pipeline in PHASTER remain largely the same as in the original PHAST, numerous software improvements and significant hardware enhancements have now made PHASTER faster, more efficient, more visually appealing and much more user friendly. In particular, PHASTER is now 4.3× faster than PHAST when analyzing a typical bacterial genome. More specifically, software optimizations have made the backend of PHASTER 2.7× faster than PHAST, while the addition of 80 CPUs to the PHASTER compute cluster is responsible for the remaining speed-up. PHASTER can now process a typical bacterial genome in 3 min from the raw sequence alone, or in 1.5 min when given a pre-annotated GenBank file. A number of other optimizations have also been implemented, including automated algorithms to reduce the size and redundancy of PHASTER's databases, improvements in handling multiple (metagenomic) queries and higher user traffic, along with the ability to perform automated look-ups against 14 000 previously PHAST/PHASTER annotated bacterial genomes (which can lead to complete phage annotations in seconds as opposed to minutes). PHASTER's web interface has also been entirely rewritten. A new graphical genome browser has been added, gene/genome visualization tools have been improved, and the graphical interface is now more modern, robust and user-friendly. PHASTER is available online at

Heatmapper: Web-enabled Heat Mapping for All

Nucleic Acids Research. Jul, 2016  |  Pubmed ID: 27190236

Heatmapper is a freely available web server that allows users to interactively visualize their data in the form of heat maps through an easy-to-use graphical interface. Unlike existing non-commercial heat map packages, which either lack graphical interfaces or are specialized for only one or two kinds of heat maps, Heatmapper is a versatile tool that allows users to easily create a wide variety of heat maps for many different data types and applications. More specifically, Heatmapper allows users to generate, cluster and visualize: (i) expression-based heat maps from transcriptomic, proteomic and metabolomic experiments; (ii) pairwise distance maps; (iii) correlation maps; (iv) image overlay heat maps; (v) latitude and longitude heat maps and (vi) geopolitical (choropleth) heat maps. Heatmapper offers a number of simple and intuitive customization options for facile adjustments to each heat map's appearance and plotting parameters. Heatmapper also allows users to interactively explore their numeric data values by hovering their cursor over each heat map cell, or by using a searchable/sortable data table view. Heat map data can be easily uploaded to Heatmapper in text, Excel or tab delimited formatted tables and the resulting heat map images can be easily downloaded in common formats including PNG, JPG and PDF. Heatmapper is designed to appeal to a wide range of users, including molecular biologists, structural biologists, microbiologists, epidemiologists, environmental scientists, agriculture/forestry scientists, fish and wildlife biologists, climatologists, geologists, educators and students. Heatmapper is available at

Using DrugBank for In Silico Drug Exploration and Discovery

Current Protocols in Bioinformatics. Jun, 2016  |  Pubmed ID: 27322405

DrugBank is a fully curated drug and drug target database that contains 8174 drug entries including 1944 FDA approved small-molecule drugs, 198 FDA-approved biotech (protein/peptide) drugs, 93 nutraceuticals, and over 6000 experimental drugs. Additionally, 4300 non-redundant protein (i.e., drug target/enzyme/transporter/carrier) sequences are linked to these drug entries. DrugBank is primarily focused on providing both the query/search tools and biophysical data needed to facilitate drug discovery and drug development. This unit provides readers with a detailed description of how to effectively use the DrugBank database and how to navigate through the DrugBank Web site. It also provides specific examples of how to find chemical homologs of potential drug leads and how to identify potential drug targets from newly sequenced tumor samples. The intent of this unit is to give readers an introduction to the field of Web-based drug discovery and to show how cheminformatics can be seamlessly integrated into the field of bioinformatics. © 2016 by John Wiley & Sons, Inc.

Detecting Renal Allograft Inflammation Using Quantitative Urine Metabolomics and CXCL10

Transplantation Direct. Jun, 2016  |  Pubmed ID: 27500268

The goal of this study was to characterize urinary metabolomics for the noninvasive detection of cellular inflammation and to determine if adding urinary chemokine ligand 10 (CXCL10) improves the overall diagnostic discrimination.

The Future of NMR-based Metabolomics

Current Opinion in Biotechnology. Aug, 2016  |  Pubmed ID: 27580257

The two leading analytical approaches to metabolomics are mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy. Although currently overshadowed by MS in terms of numbers of compounds resolved, NMR spectroscopy offers advantages both on its own and coupled with MS. NMR data are highly reproducible and quantitative over a wide dynamic range and are unmatched for determining structures of unknowns. NMR is adept at tracing metabolic pathways and fluxes using isotope labels. Moreover, NMR is non-destructive and can be utilized in vivo. NMR results have a proven track record of translating in vitro findings to in vivo clinical applications.

Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis

Current Protocols in Bioinformatics. Sep, 2016  |  Pubmed ID: 27603023

MetaboAnalyst is a comprehensive Web application for metabolomic data analysis and interpretation. MetaboAnalyst handles most of the common metabolomic data types from most kinds of metabolomics platforms (MS and NMR) for most kinds of metabolomics experiments (targeted, untargeted, quantitative). In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst also supports a number of data analysis and data visualization tasks using a range of univariate and multivariate methods such as PCA (principal component analysis), PLS-DA (partial least squares discriminant analysis), heatmap clustering and machine learning methods. MetaboAnalyst also offers a variety of tools for metabolomic data interpretation including MSEA (metabolite set enrichment analysis), MetPA (metabolite pathway analysis), and biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time series and power analysis. This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols. © 2016 by John Wiley & Sons, Inc.

Metabolomics Enables Precision Medicine: "A White Paper, Community Perspective"

Metabolomics : Official Journal of the Metabolomic Society. 2016  |  Pubmed ID: 27642271

Metabolomics is the comprehensive study of the metabolome, the repertoire of biochemicals (or small molecules) present in cells, tissues, and body fluids. The study of metabolism at the global or "-omics" level is a rapidly growing field that has the potential to have a profound impact upon medical practice. At the center of metabolomics is the concept that a person's metabolic state provides a close representation of that individual's overall health status. This metabolic state reflects what has been encoded by the genome, and modified by diet, environmental factors, and the gut microbiome. The metabolic profile provides a quantifiable readout of biochemical state from normal physiology to diverse pathophysiologies in a manner that is often not obvious from gene expression analyses. Today, clinicians capture only a very small part of the information contained in the metabolome, as they routinely measure only a narrow set of blood chemistry analytes to assess health and disease states. Examples include measuring glucose to monitor diabetes, measuring cholesterol and high density lipoprotein/low density lipoprotein ratio to assess cardiovascular health, BUN and creatinine for renal disorders, and measuring a panel of metabolites to diagnose potential inborn errors of metabolism in neonates.

Nano-Optomechanical Systems for Gas Chromatography

Nano Letters. Nov, 2016  |  Pubmed ID: 27749074

Microgas chromatography (GC) is promising for portable chemical analysis. We demonstrate a nano-optomechanical system (NOMS) as an ultrasensitive mass detector in gas chromatography. Bare, native oxide, silicon surfaces are sensitive enough to monitor volatile organic compounds at ppm levels, while simultaneously demonstrating chemical selectivity. The NOMS is able to sense GC peaks from derivatized metabolites at physiological concentrations. This is an important milestone for small-molecule quantitation assays in next generation metabolite analyses for applications such as disease diagnosis and personalized medicine. The optical microring, which plays an important role in the nanomechanical signal transduction mechanism, can also be used as an analyte concentration sensor. Different adsorption kinetics regimes are realized at different temperatures allowing temporary condensation of the analyte onto the sensor surfaces. This effect amplifies the signal, resulting in a 1 ppb level limit of detection, without partition enhancement from absorbing media. This sensitivity bodes well for NOMS as universal, ultrasensitive detectors in micro-GC, breath analysis, and other chemical-sensing applications.

SPLASH, a Hashed Identifier for Mass Spectra

Nature Biotechnology. Nov, 2016  |  Pubmed ID: 27824832

ClassyFire: Automated Chemical Classification with a Comprehensive, Computable Taxonomy

Journal of Cheminformatics. 2016  |  Pubmed ID: 27867422

Scientists have long been driven by the desire to describe, organize, classify, and compare objects using taxonomies and/or ontologies. In contrast to biology, geology, and many other scientific disciplines, the world of chemistry still lacks a standardized chemical ontology or taxonomy. Several attempts at chemical classification have been made; but they have mostly been limited to either manual, or semi-automated proof-of-principle applications. This is regrettable as comprehensive chemical classification and description tools could not only improve our understanding of chemistry but also improve the linkage between chemistry and many other fields. For instance, the chemical classification of a compound could help predict its metabolic fate in humans, its druggability or potential hazards associated with it, among others. However, the sheer number (tens of millions of compounds) and complexity of chemical structures is such that any manual classification effort would prove to be near impossible.

Role of Polysaccharide and Lipid in Lipopolysaccharide Induced Prion Protein Conversion

Prion. Nov, 2016  |  Pubmed ID: 27906600

Conversion of native cellular prion protein (PrP(c)) from an α-helical structure to a toxic and infectious β-sheet structure (PrP(Sc)) is a critical step in the development of prion disease. There are some indications that the formation of PrP(Sc) is preceded by a β-sheet rich PrP (PrP(β)) form which is non-infectious, but is an intermediate in the formation of infectious PrP(Sc). Furthermore the presence of lipid cofactors is thought to be critical in the formation of both intermediate-PrP(β) and lethal, infectious PrP(Sc). We previously discovered that the endotoxin, lipopolysaccharide (LPS), interacts with recombinant PrP(c) and induces rapid conformational change to a β-sheet rich structure. This LPS induced PrP(β) structure exhibits PrP(Sc)-like features including proteinase K (PK) resistance and the capacity to form large oligomers and rod-like fibrils. LPS is a large, complex molecule with lipid, polysaccharide, 2-keto-3-deoxyoctonate (Kdo) and glucosamine components. To learn more about which LPS chemical constituents are critical for binding PrP(c) and inducing β-sheet conversion we systematically investigated which chemical components of LPS either bind or induce PrP conversion to PrP(β). We analyzed this PrP conversion using resolution enhanced native acidic gel electrophoresis (RENAGE), tryptophan fluorescence, circular dichroism, electron microscopy and PK resistance. Our results indicate that a minimal version of LPS (called detoxified and partially de-acylated LPS or dLPS) containing a portion of the polysaccharide and a portion of the lipid component is sufficient for PrP conversion. Lipid components, alone, and saccharide components, alone, are insufficient for conversion.

Current and Future Perspectives on the Structural Identification of Small Molecules in Biological Systems

Metabolites. Dec, 2016  |  Pubmed ID: 27983674

Although significant advances have been made in recent years, the structural elucidation of small molecules remains a challenging issue for metabolite profiling. Many metabolomic studies feature unknown compounds, sometimes even in the list of features identified as "statistically significant" in the study. Such metabolic "dark matter" means that much of the potential information collected by metabolomics studies is lost. Accurate structure elucidation allows researchers to identify these compounds. This, in turn, facilitates downstream metabolite pathway analysis and a better understanding of the underlying biology of the system under investigation. This review covers a range of methods for the structural elucidation of individual compounds, including those based on gas and liquid chromatography hyphenated to mass spectrometry, single and multi-dimensional nuclear magnetic resonance spectroscopy, and high-resolution mass spectrometry and includes discussion of data standardization. Future perspectives in structure elucidation are also discussed, with a focus on the potential development of instruments and techniques, in both nuclear magnetic resonance spectroscopy and mass spectrometry, that may help solve some of the current issues that are hampering the complete identification of metabolite structure and function.

Metabolomic Determination of Pathogenesis of Late-onset Preeclampsia

The Journal of Maternal-fetal & Neonatal Medicine : the Official Journal of the European Association of Perinatal Medicine, the Federation of Asia and Oceania Perinatal Societies, the International Society of Perinatal Obstetricians. Mar, 2017  |  Pubmed ID: 27569705

Our primary objective was to apply metabolomic pathway analysis of first trimester maternal serum to provide an insight into the pathogenesis of late-onset preeclampsia (late-PE) and thereby identify plausible therapeutic targets for PE.

YMDB 2.0: a Significantly Expanded Version of the Yeast Metabolome Database

Nucleic Acids Research. Jan, 2017  |  Pubmed ID: 27899612

YMDB or the Yeast Metabolome Database is a comprehensive database containing extensive information on the genome and metabolome of Saccharomyces cerevisiae. Initially released in 2012, the YMDB has gone through a significant expansion and a number of improvements over the past 4 years. This manuscript describes the most recent version of YMDB (YMDB 2.0). More specifically, it provides an updated description of the database that was previously described in the 2012 NAR Database Issue and it details many of the additions and improvements made to the YMDB over that time. Some of the most important changes include a 7-fold increase in the number of compounds in the database (from 2007 to 16 042), a 430-fold increase in the number of metabolic and signaling pathway diagrams (from 66 to 28 734), a 16-fold increase in the number of compounds linked to pathways (from 742 to 12 733), a 17-fold increase in the numbers of compounds with nuclear magnetic resonance or MS spectra (from 783 to 13 173) and an increase in both the number of data fields and the number of links to external databases. In addition to these database expansions, a number of improvements to YMDB's web interface and its data visualization tools have been made. These additions and improvements should greatly improve the ease, the speed and the quantity of data that can be extracted, searched or viewed within YMDB. Overall, we believe these improvements should not only improve the understanding of the metabolism of S. cerevisiae, but also allow more in-depth exploration of its extensive metabolic networks, signaling pathways and biochemistry.

Improved Glucose Homeostasis in Obese Mice Treated With Resveratrol Is Associated With Alterations in the Gut Microbiome

Diabetes. Feb, 2017  |  Pubmed ID: 27903747

Oral administration of resveratrol is able to improve glucose homeostasis in obese individuals. Herein we show that resveratrol ingestion produces taxonomic and predicted functional changes in the gut microbiome of obese mice. In particular, changes in the gut microbiome were characterized by a decreased relative abundance of Turicibacteraceae, Moryella, Lachnospiraceae, and Akkermansia and an increased relative abundance of Bacteroides and Parabacteroides. Moreover, fecal transplantation from healthy resveratrol-fed donor mice is sufficient to improve glucose homeostasis in obese mice, suggesting that the resveratrol-mediated changes in the gut microbiome may play an important role in the mechanism of action of resveratrol.

Exposome-Explorer: a Manually-curated Database on Biomarkers of Exposure to Dietary and Environmental Factors

Nucleic Acids Research. Jan, 2017  |  Pubmed ID: 27924041

Exposome-Explorer is the first database dedicated to biomarkers of exposure to environmental risk factors. It contains detailed information on the nature of biomarkers, their concentrations in various human biospecimens, the study population where measured and the analytical techniques used for measurement. It also contains correlations with external exposure measurements and data on biological reproducibility over time. The data in Exposome-Explorer was manually collected from peer-reviewed publications and organized to make it easily accessible through a web interface for in-depth analyses. The database and the web interface were developed using the Ruby on Rails framework. A total of 480 publications were analyzed and 10 510 concentration values in blood, urine and other biospecimens for 692 dietary and pollutant biomarkers were collected. Over 8000 correlation values between dietary biomarker levels and food intake as well as 536 values of biological reproducibility over time were also compiled. Exposome-Explorer makes it easy to compare the performance between biomarkers and their fields of application. It should be particularly useful for epidemiologists and clinicians wishing to select panels of biomarkers that can be used in biomonitoring studies or in exposome-wide association studies, thereby allowing them to better understand the etiology of chronic diseases.

Urinary Metabolomics for Noninvasive Detection of Antibody-mediated Rejection in Children After Kidney Transplantation

Transplantation. Jan, 2017  |  Pubmed ID: 28121909

Biomarkers are needed that identify patients with antibody-mediated rejection (ABMR). The goal of this study was to evaluate the utility of urinary metabolomics for early noninvasive detection of ABMR in pediatric kidney transplant recipients.

GC-MS Metabolomics Identifies Metabolite Alterations That Precede Subclinical Mastitis in the Blood of Transition Dairy Cows

Journal of Proteome Research. Feb, 2017  |  Pubmed ID: 28152597

The objectives of this study were to determine alterations in the serum metabolites related to amino acid (AA), carbohydrate, and lipid metabolism in transition dairy cows before diagnosis of subclinical mastitis (SCM), during, and after diagnosis of disease. A subclinical mastitis case was determined as a cow having somatic cell count (SCC) > 200 000/mL of milk for two or more consecutive reports. Blood samples were collected from 100 Holstein dairy cows at five time points at -8 and -4 weeks before parturition, at the week of SCM diagnosis, and +4 and +8 weeks after parturition. Twenty healthy control cows (CON) and six cows that were diagnosed with SCM were selected for serum analysis with GC-MS. At -8 weeks a total of 13 metabolites were significantly altered in SCM cows. In addition, at the week of SCM diagnosis 17 metabolites were altered in these cows. Four weeks after parturition 10 metabolites were altered in SCM cows and at +8 weeks 11 metabolites were found to be different between the two groups. Valine (Val), serine (Ser), tyrosine (Tyr), and phenylalanine (Phe) had very good predictive abilities for SCM and could be used at -8 weeks and -4 weeks before calving. Combination of Val, isoleucine (Ile), Ser, and proline (Pro) can be used as diagnostic biomarkers of SCM during early stages of lactation at +4 to +8 weeks after parturition. In conclusion, SCM is preceded and followed by alteration in AA metabolism.
