In JoVE (1)

Other Publications (11)

Articles by Allison Pon in JoVE

Other articles by Allison Pon on PubMed

T3DB: a Comprehensively Annotated Database of Common Toxins and Their Targets

Nucleic Acids Research. Jan, 2010  |  Pubmed ID: 19897546

In an effort to capture meaningful biological, chemical and mechanistic information about clinically relevant, commonly encountered or important toxins, we have developed the Toxin and Toxin-Target Database (T3DB). The T3DB is a unique bioinformatics resource that compiles comprehensive information about common or ubiquitous toxins and their toxin-targets into a single electronic repository. The database currently contains over 2900 small molecule and peptide toxins, 1300 toxin-targets and more than 33,000 toxin-target associations. Each T3DB record (ToxCard) contains over 80 data fields providing detailed information on chemical properties and descriptors, toxicity values, protein and gene sequences (for both targets and toxins), molecular and cellular interaction data, toxicological data, mechanistic information and references. This information has been manually extracted and manually verified from numerous sources, including other electronic databases, government documents, textbooks and scientific journals. A key focus of the T3DB is on providing 'depth' over 'breadth' with detailed descriptions, mechanisms of action, and information on toxins and toxin-targets. T3DB is fully searchable and supports extensive text, sequence, chemical structure and relational query searches, similar to those found in the Human Metabolome Database (HMDB) and DrugBank. Potential applications of the T3DB include clinical metabolomics, toxin target prediction, toxicity prediction and toxicology education. The T3DB is available online at

DrugBank 3.0: a Comprehensive Resource for 'omics' Research on Drugs

Nucleic Acids Research. Jan, 2011  |  Pubmed ID: 21059682

DrugBank ( is a richly annotated database of drug and drug target information. It contains extensive data on the nomenclature, ontology, chemistry, structure, function, action, pharmacology, pharmacokinetics, metabolism and pharmaceutical properties of both small molecule and large molecule (biotech) drugs. It also contains comprehensive information on the target diseases, proteins, genes and organisms on which these drugs act. First released in 2006, DrugBank has become widely used by pharmacists, medicinal chemists, pharmaceutical researchers, clinicians, educators and the general public. Since its last update in 2008, DrugBank has been greatly expanded through the addition of new drugs, new targets and the inclusion of more than 40 new data fields per drug entry (a 40% increase in data 'depth'). These data field additions include illustrated drug-action pathways, drug transporter data, drug metabolite data, pharmacogenomic data, adverse drug response data, ADMET data, pharmacokinetic data, computed property data and chemical classification data. DrugBank 3.0 also offers expanded database links, improved search tools for drug-drug and food-drug interaction, new resources for querying and viewing drug pathways and hundreds of new drug entries with detailed patent, pricing and manufacturer data. These additions have been complemented by enhancements to the quality and quantity of existing data, particularly with regard to drug target, drug description and drug action data. DrugBank 3.0 represents the result of 2 years of manual annotation work aimed at making the database much more useful for a wide range of 'omics' (i.e. pharmacogenomic, pharmacoproteomic, pharmacometabolomic and even pharmacoeconomic) applications.

SMPDB 2.0: Big Improvements to the Small Molecule Pathway Database

Nucleic Acids Research. Jan, 2014  |  Pubmed ID: 24203708

The Small Molecule Pathway Database (SMPDB, is a comprehensive, colorful, fully searchable and highly interactive database for visualizing human metabolic, drug action, drug metabolism, physiological activity and metabolic disease pathways. SMPDB contains >600 pathways with nearly 75% of its pathways not found in any other database. All SMPDB pathway diagrams are extensively hyperlinked and include detailed information on the relevant tissues, organs, organelles, subcellular compartments, protein cofactors, protein locations, metabolite locations, chemical structures and protein quaternary structures. Since its last release in 2010, SMPDB has undergone substantial upgrades and significant expansion. In particular, the total number of pathways in SMPDB has grown by >70%. Additionally, every previously entered pathway has been completely redrawn, standardized, corrected, updated and enhanced with additional molecular or cellular information. Many SMPDB pathways now include transporter proteins as well as much more physiological, tissue, target organ and reaction compartment data. Thanks to the development of a standardized pathway drawing tool (called PathWhiz) all SMPDB pathways are now much more easily drawn and far more rapidly updated. PathWhiz has also allowed all SMPDB pathways to be saved in a BioPAX format. Significant improvements to SMPDB's visualization interface now make the browsing, selection, recoloring and zooming of pathways far easier and far more intuitive. Because of its utility and breadth of coverage, SMPDB is now integrated into several other databases including HMDB and DrugBank.

CFM-ID: a Web Server for Annotation, Spectrum Prediction and Metabolite Identification from Tandem Mass Spectra

Nucleic Acids Research. Jul, 2014  |  Pubmed ID: 24895432

CFM-ID is a web server supporting three tasks associated with the interpretation of tandem mass spectra (MS/MS) for the purpose of automated metabolite identification: annotation of the peaks in a spectrum for a known chemical structure; prediction of spectra for a given chemical structure and putative metabolite identification--a predicted ranking of possible candidate structures for a target spectrum. The algorithms used for these tasks are based on Competitive Fragmentation Modeling (CFM), a recently introduced probabilistic generative model for the MS/MS fragmentation process that uses machine learning techniques to learn its parameters from data. These algorithms have been extensively tested on multiple datasets and have been shown to out-perform existing methods such as MetFrag and FingerId. This web server provides a simple interface for using these algorithms and a graphical display of the resulting annotations, spectra and structures. CFM-ID is made freely available at

T3DB: the Toxic Exposome Database

Nucleic Acids Research. Jan, 2015  |  Pubmed ID: 25378312

The exposome is defined as the totality of all human environmental exposures from conception to death. It is often regarded as the complement to the genome, with the interaction between the exposome and the genome ultimately determining one's phenotype. The 'toxic exposome' is the complete collection of chronically or acutely toxic compounds to which humans can be exposed. Considerable interest in defining the toxic exposome has been spurred on by the realization that most human injuries, deaths and diseases are directly or indirectly caused by toxic substances found in the air, water, food, home or workplace. The Toxin-Toxin-Target Database ( is a resource that was specifically designed to capture information about the toxic exposome. Originally released in 2010, the first version of T3DB contained data on nearly 2900 common toxic substances along with detailed information on their chemical properties, descriptions, targets, toxic effects, toxicity thresholds, sequences (for both targets and toxins), mechanisms and references. To more closely align itself with the needs of epidemiologists, toxicologists and exposome scientists, the latest release of T3DB has been substantially upgraded to include many more compounds (>3600), targets (>2000) and gene expression datasets (>15,000 genes). It now includes extensive data on 'normal' toxic compound concentrations in human biofluids as well as detailed chemical taxonomies, informative chemical ontologies and a large number of referential NMR, MS/MS and GC-MS spectra. This manuscript describes the most recent update to the T3DB, which was previously featured in the 2010 NAR Database Issue.

Pathways with PathWhiz

Nucleic Acids Research. Jul, 2015  |  Pubmed ID: 25934797

PathWhiz ( is a web server designed to create colourful, visually pleasing and biologically accurate pathway diagrams that are both machine-readable and interactive. As a web server, PathWhiz is accessible from almost any place and compatible with essentially any operating system. It also houses a public library of pathways and pathway components that can be easily viewed and expanded upon by its users. PathWhiz allows users to readily generate biologically complex pathways by using a specially designed drawing palette to quickly render metabolites (including automated structure generation), proteins (including quaternary structures, covalent modifications and cofactors), nucleic acids, membranes, subcellular structures, cells, tissues and organs. Both small-molecule and protein/gene pathways can be constructed by combining multiple pathway processes such as reactions, interactions, binding events and transport activities. PathWhiz's pathway replication and propagation functions allow for existing pathways to be used to create new pathways or for existing pathways to be automatically propagated across species. PathWhiz pathways can be saved in BioPAX, SBGN-ML and SBML data exchange formats, as well as PNG, PWML, HTML image map or SVG images that can be viewed offline or explored using PathWhiz's interactive viewer. PathWhiz has been used to generate over 700 pathway diagrams for a number of popular databases including HMDB, DrugBank and SMPDB.

ECMDB 2.0: A Richer Resource for Understanding the Biochemistry of E. Coli

Nucleic Acids Research. Jan, 2016  |  Pubmed ID: 26481353

ECMDB or the Escherichia coli Metabolome Database ( is a comprehensive database containing detailed information about the genome and metabolome of E. coli (K-12). First released in 2012, the ECMDB has undergone substantial expansion and many modifications over the past 4 years. This manuscript describes the most recent version of ECMDB (ECMDB 2.0). In particular, it provides a comprehensive update of the database that was previously described in the 2013 NAR Database Issue and details many of the additions and improvements made to the ECMDB over that time. Some of the most important or significant enhancements include a 13-fold increase in the number of metabolic pathway diagrams (from 125 to 1650), a 3-fold increase in the number of compounds linked to pathways (from 1058 to 3280), the addition of dozens of operon/metabolite signalling pathways, a 44% increase in the number of compounds in the database (from 2610 to 3760), a 7-fold increase in the number of compounds with NMR or MS spectra (from 412 to 3261) and a massive increase in the number of external links to other E. coli or chemical resources. These additions, along with many other enhancements aimed at improving the ease or speed of querying, searching and viewing the data within ECMDB should greatly facilitate the understanding of not only the metabolism of E. coli, but also allow the in-depth exploration of its extensive metabolic networks, its many signalling pathways and its essential biochemistry.

PHASTER: a Better, Faster Version of the PHAST Phage Search Tool

Nucleic Acids Research. Jul, 2016  |  Pubmed ID: 27141966

PHASTER (PHAge Search Tool - Enhanced Release) is a significant upgrade to the popular PHAST web server for the rapid identification and annotation of prophage sequences within bacterial genomes and plasmids. Although the steps in the phage identification pipeline in PHASTER remain largely the same as in the original PHAST, numerous software improvements and significant hardware enhancements have now made PHASTER faster, more efficient, more visually appealing and much more user friendly. In particular, PHASTER is now 4.3× faster than PHAST when analyzing a typical bacterial genome. More specifically, software optimizations have made the backend of PHASTER 2.7X faster than PHAST, while the addition of 80 CPUs to the PHASTER compute cluster are responsible for the remaining speed-up. PHASTER can now process a typical bacterial genome in 3 min from the raw sequence alone, or in 1.5 min when given a pre-annotated GenBank file. A number of other optimizations have also been implemented, including automated algorithms to reduce the size and redundancy of PHASTER's databases, improvements in handling multiple (metagenomic) queries and higher user traffic, along with the ability to perform automated look-ups against 14 000 previously PHAST/PHASTER annotated bacterial genomes (which can lead to complete phage annotations in seconds as opposed to minutes). PHASTER's web interface has also been entirely rewritten. A new graphical genome browser has been added, gene/genome visualization tools have been improved, and the graphical interface is now more modern, robust and user-friendly. PHASTER is available online at

Computational Prediction of Electron Ionization Mass Spectra to Assist in GC/MS Compound Identification

Analytical Chemistry. Aug, 2016  |  Pubmed ID: 27381172

We describe a tool, competitive fragmentation modeling for electron ionization (CFM-EI) that, given a chemical structure (e.g., in SMILES or InChI format), computationally predicts an electron ionization mass spectrum (EI-MS) (i.e., the type of mass spectrum commonly generated by gas chromatography mass spectrometry). The predicted spectra produced by this tool can be used for putative compound identification, complementing measured spectra in reference databases by expanding the range of compounds able to be considered when availability of measured spectra is limited. The tool extends CFM-ESI, a recently developed method for computational prediction of electrospray tandem mass spectra (ESI-MS/MS), but unlike CFM-ESI, CFM-EI can handle odd-electron ions and isotopes and incorporates an artificial neural network. Tests on EI-MS data from the NIST database demonstrate that CFM-EI is able to model fragmentation likelihoods in low-resolution EI-MS data, producing predicted spectra whose dot product scores are significantly better than full enumeration "bar-code" spectra. CFM-EI also outperformed previously reported results for MetFrag, MOLGEN-MS, and Mass Frontier on one compound identification task. It also outperformed MetFrag in a range of other compound identification tasks involving a much larger data set, containing both derivatized and nonderivatized compounds. While replicate EI-MS measurements of chemical standards are still a more accurate point of comparison, CFM-EI's predictions provide a much-needed alternative when no reference standard is available for measurement. CFM-EI is available at for download and as a web service.

YMDB 2.0: a Significantly Expanded Version of the Yeast Metabolome Database

Nucleic Acids Research. Jan, 2017  |  Pubmed ID: 27899612

YMDB or the Yeast Metabolome Database ( is a comprehensive database containing extensive information on the genome and metabolome of Saccharomyces cerevisiae Initially released in 2012, the YMDB has gone through a significant expansion and a number of improvements over the past 4 years. This manuscript describes the most recent version of YMDB (YMDB 2.0). More specifically, it provides an updated description of the database that was previously described in the 2012 NAR Database Issue and it details many of the additions and improvements made to the YMDB over that time. Some of the most important changes include a 7-fold increase in the number of compounds in the database (from 2007 to 16 042), a 430-fold increase in the number of metabolic and signaling pathway diagrams (from 66 to 28 734), a 16-fold increase in the number of compounds linked to pathways (from 742 to 12 733), a 17-fold increase in the numbers of compounds with nuclear magnetic resonance or MS spectra (from 783 to 13 173) and an increase in both the number of data fields and the number of links to external databases. In addition to these database expansions, a number of improvements to YMDB's web interface and its data visualization tools have been made. These additions and improvements should greatly improve the ease, the speed and the quantity of data that can be extracted, searched or viewed within YMDB. Overall, we believe these improvements should not only improve the understanding of the metabolism of S. cerevisiae, but also allow more in-depth exploration of its extensive metabolic networks, signaling pathways and biochemistry.

Exposome-Explorer: a Manually-curated Database on Biomarkers of Exposure to Dietary and Environmental Factors

Nucleic Acids Research. Jan, 2017  |  Pubmed ID: 27924041

Exposome-Explorer ( is the first database dedicated to biomarkers of exposure to environmental risk factors. It contains detailed information on the nature of biomarkers, their concentrations in various human biospecimens, the study population where measured and the analytical techniques used for measurement. It also contains correlations with external exposure measurements and data on biological reproducibility over time. The data in Exposome-Explorer was manually collected from peer-reviewed publications and organized to make it easily accessible through a web interface for in-depth analyses. The database and the web interface were developed using the Ruby on Rails framework. A total of 480 publications were analyzed and 10 510 concentration values in blood, urine and other biospecimens for 692 dietary and pollutant biomarkers were collected. Over 8000 correlation values between dietary biomarker levels and food intake as well as 536 values of biological reproducibility over time were also compiled. Exposome-Explorer makes it easy to compare the performance between biomarkers and their fields of application. It should be particularly useful for epidemiologists and clinicians wishing to select panels of biomarkers that can be used in biomonitoring studies or in exposome-wide association studies, thereby allowing them to better understand the etiology of chronic diseases.

simple hit counter