Other articles by Frederick P. Roth on PubMed

The Genome-wide Localization of Rsc9, a Component of the RSC Chromatin-remodeling Complex, Changes in Response to Stress

Molecular Cell. Mar, 2002  |  Pubmed ID: 11931764

The cellular response to environmental changes includes widespread modifications in gene expression. Here we report the identification and characterization of Rsc9, a member of the RSC chromatin-remodeling complex in yeast. The genome-wide localization of Rsc9 indicated a relationship between genes targeted by Rsc9 and genes regulated by stress; treatment with hydrogen peroxide or rapamycin, which inhibits TOR signaling, resulted in genome-wide changes in Rsc9 occupancy. We further show that Rsc9 is involved in both repression and activation of mRNAs regulated by TOR as well as the synthesis of rRNA. Our results illustrate the response of a chromatin-remodeling factor to signaling cascades and suggest that changes in the activity of chromatin-remodeling factors are reflected in changes in their localization in the genome.

Judging the Quality of Gene Expression-based Clustering Methods Using Gene Annotation

Genome Research. Oct, 2002  |  Pubmed ID: 12368250

We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
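
The figure of merit above rests on the mutual information between cluster membership and known gene attributes. As a minimal sketch of that underlying quantity only (not the paper's actual figure of merit, whose normalization and handling of multiple attributes are not given in the abstract), mutual information between two label vectors could be computed as follows. The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(clusters, attributes):
    """Mutual information (in bits) between two equal-length label sequences.

    Illustrative only: plain plug-in estimate from observed counts.
    """
    if len(clusters) != len(attributes):
        raise ValueError("label sequences must have the same length")
    n = len(clusters)
    joint = Counter(zip(clusters, attributes))
    cluster_counts = Counter(clusters)
    attribute_counts = Counter(attributes)
    mi = 0.0
    for (c, a), n_ca in joint.items():
        # p(c, a) * log2( p(c, a) / (p(c) * p(a)) ), written with raw counts
        mi += (n_ca / n) * math.log2(n_ca * n / (cluster_counts[c] * attribute_counts[a]))
    return mi


# Hypothetical toy data: four genes, two clusters, one binary attribute.
perfect = mutual_information([0, 0, 1, 1], [True, True, False, False])
unrelated = mutual_information([0, 0, 1, 1], [True, False, True, False])
```

When the attribute tracks cluster membership exactly, the value equals the entropy of the attribute; when the two are independent, it is zero.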

Assessing Experimentally Derived Interactions in a Small World

Proceedings of the National Academy of Sciences of the United States of America. Apr, 2003  |  Pubmed ID: 12676999

Experimentally determined networks are susceptible to errors, yet important inferences can still be drawn from them. Many real networks have also been shown to have the small-world network properties of cohesive neighborhoods and short average distances between vertices. Although much analysis has been done on small-world networks, small-world properties have not previously been used to improve our understanding of individual edges in experimentally derived graphs. Here we focus on a small-world network derived from high-throughput (and error-prone) protein-protein interaction experiments. We exploit the neighborhood cohesiveness property of small-world networks to assess confidence for individual protein-protein interactions. By ascertaining how well each protein-protein interaction (edge) fits the pattern of a small-world network, we stratify even those edges with identical experimental evidence. This result promises to improve the quality of inference from protein-protein interaction networks in particular and small-world networks in general.
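
One simple way to picture "neighborhood cohesiveness" for a single edge is to ask how many interaction partners its two endpoints share. The sketch below uses a Jaccard overlap of neighbor sets as an assumed stand-in; the abstract does not say which cohesiveness measure the authors used, and the function name and toy network are hypothetical.

```python
def neighborhood_overlap(edges, u, v):
    """Jaccard overlap of the neighbor sets of u and v, excluding u and v themselves.

    Assumption: Jaccard overlap is used here purely for illustration.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nu = neighbors.get(u, set()) - {v}
    nv = neighbors.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical toy network: A and B share both of their other partners,
# while D and E share none.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("D", "E")]
cohesive_edge = neighborhood_overlap(toy_edges, "A", "B")
isolated_edge = neighborhood_overlap(toy_edges, "D", "E")
```

Under this reading, two edges with identical experimental evidence can still be ranked differently because one sits in a denser shared neighborhood than the other.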

GoFish Finds Genes with Combinations of Gene Ontology Attributes

Bioinformatics (Oxford, England). Apr, 2003  |  Pubmed ID: 12691998

SUMMARY: GoFish is a Java application that allows users to search for gene products with particular gene ontology (GO) attributes, or combinations of attributes. GoFish ranks gene products by the degree to which they satisfy a Boolean query. Four organisms are currently supported: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Mus musculus.

Predicting Gene Function from Patterns of Annotation

Genome Research. May, 2003  |  Pubmed ID: 12695322

The Gene Ontology (GO) Consortium has produced a controlled vocabulary for annotation of gene function that is used in many organism-specific gene annotation databases. This allows the prediction of gene function based on patterns of annotation. For example, if annotations for two attributes tend to occur together in a database, then a gene holding one attribute is likely to hold the other as well. We modeled the relationships among GO attributes with decision trees and Bayesian networks, using the annotations in the Saccharomyces Genome Database (SGD) and in FlyBase as training data. We tested the models using cross-validation, and we manually assessed 100 gene-attribute associations that were predicted by the models but that were not present in the SGD or FlyBase databases. Of the 100 manually assessed associations, 41 were judged to be true, and another 42 were judged to be plausible.
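
The co-occurrence intuition in this abstract (a gene holding one attribute is likely to hold another that tends to accompany it) can be illustrated with a simple conditional frequency. This is only a sketch of the intuition; the paper itself models the relationships with decision trees and Bayesian networks, which are not reproduced here. The function name, gene identifiers, and attribute labels are hypothetical.

```python
def conditional_annotation_rate(annotations, given, predicted):
    """Fraction of genes annotated with `given` that also carry `predicted`.

    `annotations` maps a gene identifier to its set of attributes.
    Returns None when no gene carries `given`.
    """
    holders = [attrs for attrs in annotations.values() if given in attrs]
    if not holders:
        return None
    return sum(predicted in attrs for attrs in holders) / len(holders)


# Hypothetical annotation table with made-up attribute labels.
toy_annotations = {
    "gene1": {"attr_x", "attr_y"},
    "gene2": {"attr_x", "attr_y"},
    "gene3": {"attr_x"},
    "gene4": {"attr_y"},
}
rate = conditional_annotation_rate(toy_annotations, "attr_x", "attr_y")
```

Here gene3 holds attr_x but not attr_y, so it would be a candidate for a predicted association, with the conditional frequency serving as a rough confidence.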

Regulating General Mutation Rates: Examination of the Hypermutable State Model for Cairnsian Adaptive Mutation

Genetics. Apr, 2003  |  Pubmed ID: 12702691

In the lac adaptive mutation system of Cairns, selected mutant colonies but not unselected mutant types appear to arise from a nongrowing population of Escherichia coli. The general mutagenesis suffered by the selected mutants has been interpreted as support for the idea that E. coli possesses an evolved (and therefore beneficial) mechanism that increases the mutation rate in response to stress (the hypermutable state model, HSM). This mechanism is proposed to allow faster genetic adaptation to stressful conditions and to explain why mutations appear directed to useful sites. Analysis of the HSM reveals that it requires implausibly intense mutagenesis (10^5 times the unselected rate) and even then cannot account for the behavior of the Cairns system. The assumptions of the HSM predict that selected revertants will carry an average of eight deleterious null mutations and thus seem unlikely to be successful in long-term evolution. The experimentally observed 35-fold increase in the level of general mutagenesis cannot account for even one Lac+ revertant from a mutagenized subpopulation of 10^5 cells (the number proposed to enter the hypermutable state). We conclude that temporary general mutagenesis during stress is unlikely to provide a long-term selective advantage in this or any similar genetic system.

Predicting Phenotype from Patterns of Annotation

Bioinformatics (Oxford, England). 2003  |  Pubmed ID: 12855456

MOTIVATION: Predicting the outcome of specific experiments (such as the growth of a particular mutant strain in a particular medium) has the potential to allow researchers to devote resources to experiments with higher expected numbers of 'hits'. RESULTS: We use decision trees to predict phenotypes associated with Saccharomyces cerevisiae genes on the basis of Gene Ontology (GO) functional annotations from the Saccharomyces Genome Database (SGD) and other phenotypic annotations from the Yeast Phenotype Catalog at the Munich Information Center for Protein Sequences (MIPS). We assess the methodology in three ways: (1) we use cross-validation on the phenotypic annotations listed in MIPS, and show ROC curves indicating the tradeoff between true-positive rate and false-positive rate; (2) we do a literature-search for 100 of the predicted gene-phenotype associations that are not listed in MIPS, and find evidence for 43 of them; (3) we use deletion strains to experimentally assess 61 predicted gene-phenotype associations not listed in MIPS; significantly more of these deletion strains show abnormal growth than would be expected by chance.

Latent Herpes Simplex Virus Infection of Sensory Neurons Alters Neuronal Gene Expression

Journal of Virology. Sep, 2003  |  Pubmed ID: 12915567

The persistence of herpes simplex virus (HSV) and the diseases that it causes in the human population can be attributed to the maintenance of a latent infection within neurons in sensory ganglia. Little is known about the effects of latent infection on the host neuron. We have addressed the question of whether latent HSV infection affects neuronal gene expression by using microarray transcript profiling of host gene expression in ganglia from latently infected versus mock-infected mouse trigeminal ganglia. 33P-labeled cDNA probes from pooled ganglia harvested at 30 days postinfection or post-mock infection were hybridized to nylon arrays printed with 2,556 mouse genes. Signal intensities were acquired by phosphorimager. Mean intensities (n = 4 replicates in each of three independent experiments) of signals from mock-infected versus latently infected ganglia were compared by using a variant of Student's t test. We identified significant changes in the expression of mouse neuronal genes, including several with roles in gene expression, such as the Clk2 gene, and neurotransmission, such as genes encoding potassium voltage-gated channels and a muscarinic acetylcholine receptor. We confirmed the neuronal localization of some of these transcripts by using in situ hybridization. To validate the microarray results, we performed real-time reverse transcriptase PCR analyses for a selection of the genes. These studies demonstrate that latent HSV infection can alter neuronal gene expression and might provide a new mechanism for how persistent viral infection can cause chronic disease.

A Non-parametric Model for Transcription Factor Binding Sites

Nucleic Acids Research. Oct, 2003  |  Pubmed ID: 14500844

We introduce a non-parametric representation of transcription factor binding sites which can model arbitrary dependencies between positions. As two parameters are varied, this representation smoothly interpolates between the empirical distribution of binding sites and the standard position-specific scoring matrix (PSSM). In a test of generalization to unseen binding sites using 10-fold cross-validation on known binding sites for 95 TRANSFAC transcription factors, this representation outperforms PSSMs on between 65 and 89 of the 95 transcription factors, depending on the choice of the two adjustable parameters. We also discuss how the non-parametric representation may be incorporated into frameworks for finding binding sites given only a collection of unaligned promoter regions.

Characterizing Gene Sets with FuncAssociate

Bioinformatics (Oxford, England). Dec, 2003  |  Pubmed ID: 14668247

SUMMARY: FuncAssociate is a web-based tool to help researchers use Gene Ontology attributes to characterize large sets of genes derived from experiment. Distinguishing features of FuncAssociate include the ability to handle ranked input lists, and a Monte Carlo simulation approach that is more appropriate to determine significance than other methods, such as Bonferroni or Šidák p-value correction. FuncAssociate currently supports 10 organisms (Vibrio cholerae, Shewanella oneidensis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus, Rattus norvegicus and Homo sapiens). AVAILABILITY: FuncAssociate is freely accessible online. Source code (in Perl and C) is freely available to academic users 'as is'.
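
To make the significance question concrete, the sketch below computes a one-sided hypergeometric enrichment p-value for a single attribute, together with the Bonferroni and Šidák corrections that the abstract names as alternatives. The abstract does not state which per-attribute test the tool applies, so the hypergeometric test is an assumption here, and the tool's Monte Carlo procedure is not reproduced. All function names are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, overlap):
    """P(at least `overlap` annotated genes in a random sample), hypergeometric.

    population: genes in the background; annotated: background genes with the
    attribute; sample: genes in the query set; overlap: query genes with it.
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(population, sample)


def bonferroni(p, tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * tests)


def sidak(p, tests):
    """Sidak-adjusted p-value, exact when the tests are independent."""
    return 1.0 - (1.0 - p) ** tests


# Hypothetical example: 5 of 10 background genes carry the attribute,
# and all 5 genes in the query set carry it.
p_single = enrichment_p_value(10, 5, 5, 5)
```

Both corrections treat the tested attributes as if they were independent or worst-case, which is one reason a resampling approach may be preferred when attributes overlap heavily, as Gene Ontology terms do.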

A Map of the Interactome Network of the Metazoan C. elegans

Science (New York, N.Y.). Jan, 2004  |  Pubmed ID: 14704431

To initiate studies on how protein-protein interaction (or "interactome") networks relate to multicellular functions, we have mapped a large fraction of the Caenorhabditis elegans interactome network. Starting with a subset of metazoan-specific proteins, more than 4000 interactions were identified from high-throughput, yeast two-hybrid (HT-Y2H) screens. Independent coaffinity purification assays experimentally validated the overall quality of this Y2H data set. Together with already described Y2H interactions and interologs predicted in silico, the current version of the Worm Interactome (WI5) map contains approximately 5500 interactions. Topological and biological features of this interactome network, as well as its integration with phenome and transcriptome data sets, lead to numerous biological hypotheses.

Intensity-based Protein Identification by Machine Learning from a Library of Tandem Mass Spectra

Nature Biotechnology. Feb, 2004  |  Pubmed ID: 14730315

Tandem mass spectrometry (MS/MS) has emerged as a cornerstone of proteomics owing in part to robust spectral interpretation algorithms. Widely used algorithms do not fully exploit the intensity patterns present in mass spectra. Here, we demonstrate that intensity pattern modeling improves peptide and protein identification from MS/MS spectra. We modeled fragment ion intensities using a machine-learning approach that estimates the likelihood of observed intensities given peptide and fragment attributes. From 1,000,000 spectra, we chose 27,000 with high-quality, nonredundant matches as training data. Using the same 27,000 spectra, intensity was similarly modeled with mismatched peptides. We used these two probabilistic models to compute the relative likelihood of an observed spectrum given that a candidate peptide is matched or mismatched. We used a 'decoy' proteome approach to estimate incorrect match frequency, and demonstrated that an intensity-based method reduces peptide identification error by 50-96% without any loss in sensitivity.
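
The 'decoy' idea can be sketched in a few lines: matches against sequences known to be wrong indicate how often incorrect matches pass a given score threshold. The version below assumes separate target and decoy searches of comparable size and a score where higher is better; these details, the function name, and the toy scores are assumptions, not the procedure reported in the paper.

```python
def estimated_error_rate(target_scores, decoy_scores, threshold):
    """Estimate the fraction of accepted target matches that are incorrect.

    Assumption: decoy matches scoring at or above `threshold` occur about as
    often as incorrect target matches scoring at or above it.
    """
    accepted_targets = sum(score >= threshold for score in target_scores)
    accepted_decoys = sum(score >= threshold for score in decoy_scores)
    if accepted_targets == 0:
        return 0.0
    return min(1.0, accepted_decoys / accepted_targets)


# Hypothetical scores: three target matches and one decoy match pass 3.5.
toy_targets = [10, 9, 8, 3, 2]
toy_decoys = [4, 3, 2, 1, 1]
rate_at_threshold = estimated_error_rate(toy_targets, toy_decoys, 3.5)
```

Comparing this estimate for two scoring schemes at matched sensitivity is one way a reduction in identification error could be quantified.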

Global Mapping of the Yeast Genetic Interaction Network

Science (New York, N.Y.). Feb, 2004  |  Pubmed ID: 14764870

A genetic interaction network containing approximately 1000 genes and approximately 4000 interactions was mapped by crossing mutations in 132 different query genes into a set of approximately 4700 viable yeast gene deletion mutants and scoring the double mutant progeny for fitness defects. Network connectivity was predictive of function because interactions often occurred among functionally related genes, and similar patterns of interactions tended to identify components of the same pathway. The genetic network exhibited dense local neighborhoods; therefore, the position of a gene on a partially mapped network is predictive of other genetic interactions. Because digenic interactions are common in yeast, similar networks may underlie the complex genetics associated with inherited phenotypes in other organisms.

Predicting Co-complexed Protein Pairs Using Genomic and Proteomic Data Integration

BMC Bioinformatics. Apr, 2004  |  Pubmed ID: 15090078

Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive value for more accurate prediction of co-complexed relationship.

Predicting Protein Complex Membership Using Probabilistic Network Reliability

Genome Research. Jun, 2004  |  Pubmed ID: 15140827

Evidence for specific protein-protein interactions is increasingly available from both small- and large-scale studies, and can be viewed as a network. It has previously been noted that errors are frequent among large-scale studies, and that error frequency depends on the large-scale method used. Despite knowledge of the error-prone nature of interaction evidence, edges (connections) in this network are typically viewed as either present or absent. However, use of a probabilistic network that considers quantity and quality of supporting evidence should improve inference derived from protein networks. Here we demonstrate inference of membership in a partially known protein complex by using a probabilistic network model and an algorithm previously used to evaluate reliability in communication networks.
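
A probabilistic network of this kind assigns each edge a probability of being real, and a natural query is how likely two proteins are to be connected at all. The sketch below estimates that two-terminal connection probability by naive Monte Carlo sampling. It is an illustration under stated assumptions (independent edges, a hypothetical function name and toy network) and is not the specific reliability algorithm the authors adopted from the communication-network literature.

```python
import random


def connection_probability(edge_probs, source, target, trials=10000, seed=0):
    """Monte Carlo estimate of P(source and target are connected).

    edge_probs maps an undirected edge (a, b) to its probability of existing;
    edges are assumed independent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sample one concrete network from the edge probabilities.
        adjacency = {}
        for (a, b), p in edge_probs.items():
            if rng.random() < p:
                adjacency.setdefault(a, []).append(b)
                adjacency.setdefault(b, []).append(a)
        # Depth-first search from the source in the sampled network.
        seen = {source}
        stack = [source]
        while stack:
            node = stack.pop()
            if node == target:
                break
            for neighbor in adjacency.get(node, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        hits += target in seen
    return hits / trials


# Hypothetical chain A-B-C with two uncertain edges; the exact answer is 0.25.
toy_chain = {("A", "B"): 0.5, ("B", "C"): 0.5}
estimate = connection_probability(toy_chain, "A", "C")
```

Ranking candidate proteins by their estimated connection probability to the known members of a complex is one way such a score could support membership inference.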

SILVER Helps Assign Peptides to Tandem Mass Spectra Using Intensity-based Scoring

Journal of the American Society for Mass Spectrometry. Jun, 2004  |  Pubmed ID: 15144981

Tandem mass spectrometry is commonly used to identify peptides (and thereby proteins) that are present in complex mixtures. Peptide identification from tandem mass spectra is partially automated, but still requires human curation to resolve "borderline" peptide-spectrum matches (PSMs). SILVER is web-based software that assists manual curation of tandem mass spectra, using a recently developed intensity-based machine-learning approach to scoring PSMs (Elias et al.). In this method, a large training set of peptide, fragment, and peak-intensity properties for both matched and mismatched PSMs was used to develop a score measuring consistency between each predicted fragment ion of a candidate peptide and its corresponding observed spectral peak intensity. The SILVER interface provides a visual representation of match quality between each candidate fragment ion and the observed spectrum, thereby expediting manual curation of tandem mass spectra. SILVER is available online.

Prediction of Similarly Acting Cis-regulatory Modules by Subsequence Profiling and Comparative Genomics in Drosophila melanogaster and D. pseudoobscura

Bioinformatics (Oxford, England). Nov, 2004  |  Pubmed ID: 15145800

To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in genome-wide discovery of similarly acting enhancers, but requires prior knowledge of the set of TFs acting at the CRM and the TFs' binding motifs.

Evidence for Dynamically Organized Modularity in the Yeast Protein-protein Interaction Network

Nature. Jul, 2004  |  Pubmed ID: 15190252

In apparently scale-free protein-protein interaction networks, or 'interactome' networks, most proteins interact with few partners, whereas a small but significant proportion of proteins, the 'hubs', interact with many partners. Both biological and non-biological scale-free networks are particularly resistant to random node removal but are extremely sensitive to the targeted removal of hubs. A link between the potential scale-free topology of interactome networks and genetic robustness seems to exist, because knockouts of yeast genes encoding hubs are approximately threefold more likely to confer lethality than those of non-hubs. Here we investigate how hubs might contribute to robustness and other cellular properties for protein-protein interactions dynamically regulated both in time and in space. We uncovered two types of hub: 'party' hubs, which interact with most of their partners simultaneously, and 'date' hubs, which bind their different partners at different times or locations. Both in silico studies of network connectivity and genetic interactions described in vivo support a model of organized modularity in which date hubs organize the proteome, connecting biological processes--or modules--to each other, whereas party hubs function inside modules.

Combining Biological Networks to Predict Genetic Interactions

Proceedings of the National Academy of Sciences of the United States of America. Nov, 2004  |  Pubmed ID: 15496468

Genetic interactions define overlapping functions and compensatory pathways. In particular, synthetic sick or lethal (SSL) genetic interactions are important for understanding how an organism tolerates random mutation, i.e., genetic robustness. Comprehensive identification of SSL relationships remains far from complete in any organism, because mapping these networks is highly labor intensive. The ability to predict SSL interactions, however, could efficiently guide further SSL discovery. Toward this end, we predicted pairs of SSL genes in Saccharomyces cerevisiae by using probabilistic decision trees to integrate multiple types of data, including localization, mRNA expression, physical interaction, protein function, and characteristics of network topology. Experimental evidence demonstrated the reliability of this strategy, which, when extended to human SSL interactions, may prove valuable in discovering drug targets for cancer therapy and in identifying genes responsible for multigenic diseases.

Motifs, Themes and Thematic Maps of an Integrated Saccharomyces cerevisiae Interaction Network

Journal of Biology. 2005  |  Pubmed ID: 15982408

Large-scale studies have revealed networks of various biological interaction types, such as protein-protein interaction, genetic interaction, transcriptional regulation, sequence homology, and expression correlation. Recurring patterns of interconnection, or 'network motifs', have revealed biological insights for networks containing either one or two types of interaction.

Discovering Functional Relationships: Biochemistry Versus Genetics

Trends in Genetics : TIG. Aug, 2005  |  Pubmed ID: 15982781

Biochemists and geneticists, represented by Doug and Bill in classic essays, have long debated the merits of their methods. We revisited this issue using genomic data from the budding yeast, Saccharomyces cerevisiae, and found that genetic interactions outperformed protein interactions in predicting functional relationships between genes. However, when combined, these interaction types yielded superior performance, convincing Doug and Bill to call a truce.

Transcriptional Compensation for Gene Loss Plays a Minor Role in Maintaining Genetic Robustness in Saccharomyces cerevisiae

Genetics. Oct, 2005  |  Pubmed ID: 15998714

If a gene is mutated and its function lost, are compensatory genes upregulated? We investigated whether genes are transcriptionally upregulated when their synthetic sick or lethal (SSL) partners are lost. We identified several new examples; however, remarkably few SSL pairs exhibited this phenomenon, suggesting that transcriptional compensation by SSL partners is a rare mechanism for maintaining genetic robustness.

Genomewide Identification of Sko1 Target Promoters Reveals a Regulatory Network That Operates in Response to Osmotic Stress in Saccharomyces cerevisiae

Eukaryotic Cell. Aug, 2005  |  Pubmed ID: 16087739

In Saccharomyces cerevisiae, the ATF/CREB transcription factor Sko1 (Acr1) regulates the expression of genes induced by osmotic stress under the control of the high osmolarity glycerol (HOG) mitogen-activated protein kinase pathway. By combining chromatin immunoprecipitation and microarrays containing essentially all intergenic regions, we estimate that yeast cells contain approximately 40 Sko1 target promoters in vivo; 20 Sko1 target promoters were validated by direct analysis of individual loci. The ATF/CREB consensus sequence is not statistically overrepresented in confirmed Sko1 target promoters, although some sites are evolutionarily conserved among related yeast species, suggesting that they are functionally important in vivo. These observations suggest that Sko1 association in vivo is affected by factors beyond the protein-DNA interaction defined in vitro. Sko1 binds a number of promoters for genes directly involved in defense functions that relieve osmotic stress. In addition, Sko1 binds to the promoters of genes encoding transcription factors, including Msn2, Mot3, Rox1, Mga1, and Gat2. Stress-induced expression of MSN2, MOT3, and MGA1 is diminished in sko1 mutant cells, while transcriptional regulation of ROX1 seems to be unaffected. Lastly, Sko1 targets PTP3, which encodes a phosphatase that negatively regulates Hog1 kinase activity, and Sko1 is required for osmotic induction of PTP3 expression. Taken together our results suggest that Sko1 operates a transcriptional network upon osmotic stress, which involves other specific transcription factors and a phosphatase that regulates the key component of the signal transduction pathway.

Predictive Models of Molecular Machines Involved in Caenorhabditis elegans Early Embryogenesis

Nature. Aug, 2005  |  Pubmed ID: 16094371

Although numerous fundamental aspects of development have been uncovered through the study of individual genes and proteins, system-level models are still missing for most developmental processes. The first two cell divisions of Caenorhabditis elegans embryogenesis constitute an ideal test bed for a system-level approach. Early embryogenesis, including processes such as cell division and establishment of cellular polarity, is readily amenable to large-scale functional analysis. A first step toward a system-level understanding is to provide 'first-draft' models both of the molecular assemblies involved and of the functional connections between them. Here we show that such models can be derived from an integrated gene/protein network generated from three different types of functional relationship: protein interaction, expression profiling similarity and phenotypic profiling similarity, as estimated from detailed early embryonic RNA interference phenotypes systematically recorded for hundreds of early embryogenesis genes. The topology of the integrated network suggests that C. elegans early embryogenesis is achieved through coordination of a limited set of molecular machines. We assessed the overall predictive value of such molecular machine models by dynamic localization of ten previously uncharacterized proteins within the living embryo.

Towards a Proteome-scale Map of the Human Protein-protein Interaction Network

Nature. Oct, 2005  |  Pubmed ID: 16189514

Systematic mapping of protein-protein interactions, or 'interactome' mapping, was initiated in model organisms, starting with defined biological processes and then expanding to the scale of the proteome. Although far from complete, such maps have revealed global topological and dynamic features of interactome networks that relate to known biological properties, suggesting that a human interactome map will provide insight into development and disease mechanisms at a systems level. Here we describe an initial version of a proteome-scale map of human binary protein-protein interactions. Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise interactions among the products of approximately 8,100 currently available Gateway-cloned open reading frames and detected approximately 2,800 interactions. This data set, called CCSB-HI1, has a verification rate of approximately 78% as revealed by an independent co-affinity purification assay, and correlates significantly with other biological attributes. The CCSB-HI1 data set increases by approximately 70% the set of available binary interactions within the tested space and reveals more than 300 new connections to over 100 disease-associated proteins. This work represents an important step towards a systematic and comprehensive human interactome project.

Chipper: Discovering Transcription-factor Targets from Chromatin Immunoprecipitation Microarrays Using Variance Stabilization

Genome Biology. 2005  |  Pubmed ID: 16277751

Chromatin immunoprecipitation combined with microarray technology (Chip2) allows genome-wide determination of protein-DNA binding sites. The current standard method for analyzing Chip2 data requires additional control experiments that are subject to systematic error. We developed methods to assess significance using variance stabilization, learning error-model parameters without external control experiments. The method was validated experimentally, shows greater sensitivity than the current standard method, and incorporates false-discovery rate analysis. The corresponding software ('Chipper') is freely available. The method described here should help reveal an organism's transcription-regulatory 'wiring diagram'.

Metabolomic Identification of Novel Biomarkers of Myocardial Ischemia

Circulation. Dec, 2005  |  Pubmed ID: 16344383

Recognition of myocardial ischemia is critical both for the diagnosis of coronary artery disease and the selection and evaluation of therapy. Recent advances in proteomic and metabolic profiling technologies may offer the possibility of identifying novel biomarkers and pathways activated in myocardial ischemia.

Query Chem: a Google-powered Web Search Combining Text and Chemical Structures

Bioinformatics (Oxford, England). Jul, 2006  |  Pubmed ID: 16672261

Query Chem is a Web program that integrates chemical structure and text-based searching using publicly available chemical databases and Google's Web Application Program Interface (API). Query Chem makes it possible to search the Web for information about chemical structures without knowing their common names or identifiers. Furthermore, a structure can be combined with textual query terms to further restrict searches. Query Chem's search results can retrieve many interesting structure-property relationships of biomolecules on the Web.

Using High-throughput Screening Data to Discriminate Compounds with Single-target Effects from Those with Side Effects

Journal of Chemical Information and Modeling. Jul-Aug, 2006  |  Pubmed ID: 16859287

The most desirable compound leads from high-throughput assays are those with novel biological activities resulting from their action on a single biological target. Valuable resources can be wasted on compound leads with significant 'side effects' on additional biological targets; therefore, technical refinements to identify compounds that primarily have effects resulting from a single target are needed. This study explores the use of multiple assays of a chemical library and a statistic based on entropy to identify lead compound classes that have patterns of assay activity resulting primarily from small molecule action on a single target. This statistic, called the coincidence score, discriminates with 88% accuracy compound classes known to act primarily on a single target from compound classes with significant side effects on nonhomologous targets. Furthermore, a significant number of the compound classes predicted to have primarily single-target effects contain known bioactive compounds. We also show that a compound's known biological target or mechanism of action can often be suggested by its pattern of activities in multiple assays.

Systematic Genetics Swims Forward Elegantly

Molecular Systems Biology. 2006  |  Pubmed ID: 16969340

Mammalian Ultraconserved Elements Are Strongly Depleted Among Segmental Duplications and Copy Number Variants

Nature Genetics. Oct, 2006  |  Pubmed ID: 16998490

An earlier search in the human, mouse and rat genomes for sequences that are 100% conserved in orthologous segments and ≥ 200 bp in length identified 481 distinct sequences. These human-mouse-rat sequences, which represent ultraconserved elements (UCEs), are believed to be important for functions involving DNA binding, RNA processing and the regulation of transcription and development. In vivo and additional computational studies of UCEs and other highly conserved sequences are consistent with these functional associations, with some observations indicating enhancer-like activity for these elements. Here, we show that UCEs are significantly depleted among segmental duplications and copy number variants. Notably, of the UCEs that are found in segmental duplications or copy number variants, the majority overlap exons, indicating, along with other findings presented, that UCEs overlapping exons represent a distinct subset.

Systematic Pathway Analysis Using High-resolution Fitness Profiling of Combinatorial Gene Deletions

Nature Genetics. Feb, 2007  |  Pubmed ID: 17206143

Systematic genetic interaction studies have illuminated many cellular processes. Here we quantitatively examine genetic interactions among 26 Saccharomyces cerevisiae genes conferring resistance to the DNA-damaging agent methyl methanesulfonate (MMS), as determined by chemogenomic fitness profiling of pooled deletion strains. We constructed 650 double-deletion strains, corresponding to all pairings of these 26 deletions. The fitness of single- and double-deletion strains was measured in the presence and absence of MMS. Genetic interactions were defined by combining principles from both statistical and classical genetics. The resulting network predicts that the Mph1 helicase has a role in resolving homologous recombination-derived DNA intermediates that is similar to (but distinct from) that of the Sgs1 helicase. Our results emphasize the utility of small molecules and multifactorial deletion mutants in uncovering functional relationships and pathway order.

Genetic Interaction Screens Advance in Reverse

Genes & Development. Jan, 2007  |  Pubmed ID: 17234880

Genome-scale Analysis of in Vivo Spatiotemporal Promoter Activity in Caenorhabditis Elegans

Nature Biotechnology. Jun, 2007  |  Pubmed ID: 17486083

Differential regulation of gene expression is essential for cell fate specification in metazoans. Characterizing the transcriptional activity of gene promoters, in time and in space, is therefore a critical step toward understanding complex biological systems. Here we present an in vivo spatiotemporal analysis for approximately 900 predicted C. elegans promoters (approximately 5% of the predicted protein-coding genes), each driving the expression of green fluorescent protein (GFP). Using a flow-cytometer adapted for nematode profiling, we generated 'chronograms', two-dimensional representations of fluorescence intensity along the body axis and throughout development from early larvae to adults. Automated comparison and clustering of the obtained in vivo expression patterns show that genes coexpressed in space and time tend to belong to common functional categories. Moreover, integration of this data set with C. elegans protein-protein interactome data sets enables prediction of anatomical and temporal interaction territories between protein partners.

Confirmation of Organized Modularity in the Yeast Interactome

PLoS Biology. Jun, 2007  |  Pubmed ID: 17564493

Functional Specificity Among Ribosomal Proteins Regulates Gene Expression

Cell. Nov, 2007  |  Pubmed ID: 17981122

Duplicated genes escape gene loss by conferring a dosage benefit or evolving diverged functions. The yeast Saccharomyces cerevisiae contains many duplicated genes encoding ribosomal proteins. Prior studies have suggested that these duplicated proteins are functionally redundant and affect cellular processes in proportion to their expression. In contrast, through studies of ASH1 mRNA in yeast, we demonstrate paralog-specific requirements for the translation of localized mRNAs. Intriguingly, these paralog-specific effects are limited to a distinct subset of duplicated ribosomal proteins. Moreover, transcriptional and phenotypic profiling of cells lacking specific ribosomal proteins reveals differences between the functional roles of ribosomal protein paralogs that extend beyond effects on mRNA localization. Finally, we show that ribosomal protein paralogs exhibit differential requirements for assembly and localization. Together, our data indicate complex specialization of ribosomal proteins for specific cellular processes and support the existence of a ribosomal code.

Defining Genetic Interaction

Proceedings of the National Academy of Sciences of the United States of America. Mar, 2008  |  Pubmed ID: 18305163

Sometimes mutations in two genes produce a phenotype that is surprising in light of each mutation's individual effects. This phenomenon, which defines genetic interaction, can reveal functional relationships between genes and pathways. For example, double mutants with surprisingly slow growth define synergistic interactions that can identify compensatory pathways or protein complexes. Recent studies have used four mathematically distinct definitions of genetic interaction (here termed Product, Additive, Log, and Min). Whether this choice holds practical consequences has not been clear, because the definitions yield identical results under some conditions. Here, we show that the choice among alternative definitions can have profound consequences. Although 52% of known synergistic genetic interactions in Saccharomyces cerevisiae were inferred according to the Min definition, we find that both Product and Log definitions (shown here to be practically equivalent) are better than Min for identifying functional relationships. Additionally, we show that the Additive and Log definitions, each commonly used in population genetics, lead to differing conclusions related to the selective advantages of sexual reproduction.
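The four definitions named in the abstract above can be illustrated with a minimal sketch. The abstract itself does not give formulas, so the expressions below are assumptions based on the forms in which these definitions are commonly stated, with fitness measured relative to wild type (wild type = 1.0). The function names `expected_fitness` and `interaction_score` are hypothetical and chosen only for this illustration.

```python
import math

# Assumed neutrality functions: expected double-mutant fitness given the two
# single-mutant fitnesses wx and wy (relative to wild type = 1.0).
# These formulas are NOT stated in the abstract; they are illustrative assumptions.
def expected_fitness(wx: float, wy: float, definition: str) -> float:
    if definition == "product":
        return wx * wy
    if definition == "additive":
        return wx + wy - 1.0
    if definition == "log":
        return math.log2((2.0 ** wx - 1.0) * (2.0 ** wy - 1.0) + 1.0)
    if definition == "min":
        return min(wx, wy)
    raise ValueError(f"unknown definition: {definition}")


def interaction_score(wx: float, wy: float, wxy: float, definition: str) -> float:
    """Observed minus expected double-mutant fitness.

    Negative values indicate a double mutant that grows more slowly than
    expected (a synergistic interaction); positive values indicate the opposite.
    """
    return wxy - expected_fitness(wx, wy, definition)


if __name__ == "__main__":
    wx, wy, wxy = 0.8, 0.5, 0.4  # made-up fitness values for illustration
    for name in ("product", "additive", "log", "min"):
        print(name, interaction_score(wx, wy, wxy, name))
```

Under these assumed formulas, a double mutant with fitness 0.4 built from single mutants with fitness 0.8 and 0.5 is neutral by the Product definition but scores as synergistic by the Min definition, which shows how the choice of definition can change the inferred interaction. When one of the single mutants has wild-type fitness, all four expressions give the same expectation, consistent with the remark that the definitions agree under some conditions.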

Challenges in Translating Plasma Proteomics from Bench to Bedside: Update from the NHLBI Clinical Proteomics Programs

American Journal of Physiology. Lung Cellular and Molecular Physiology. Jul, 2008  |  Pubmed ID: 18456800

The emerging scientific field of proteomics encompasses the identification, characterization, and quantification of the protein content or proteome of whole cells, tissues, or body fluids. The potential for proteomic technologies to identify and quantify novel proteins in the plasma that can function as biomarkers of the presence or severity of clinical disease states holds great promise for clinical use. However, there are many challenges in translating plasma proteomics from bench to bedside, and relatively few plasma biomarkers have successfully transitioned from proteomic discovery to routine clinical use. Key barriers to this translation include the need for "orthogonal" biomarkers (i.e., uncorrelated with existing markers), the complexity of the proteome in biological samples, the presence of high abundance proteins such as albumin in biological samples that hinder detection of low abundance proteins, false positive associations that occur with analysis of high dimensional datasets, and the limited understanding of the effects of growth, development, and age on the normal plasma proteome. Strategies to overcome these challenges are discussed.

Isoform Discovery by Targeted Cloning, 'deep-well' Pooling and Parallel Sequencing

Nature Methods. Jul, 2008  |  Pubmed ID: 18552854

Describing the 'ORFeome' of an organism, including all major isoforms, is essential for a system-level understanding of any species; however, conventional cloning and sequencing approaches are prohibitively costly and labor-intensive. We describe a potentially genome-wide methodology for efficiently capturing new coding isoforms using reverse transcriptase (RT)-PCR recombinational cloning, 'deep-well' pooling and a next-generation sequencing platform. This ORFeome discovery pipeline will be applicable to any eukaryotic species with a sequenced genome.

A Race Through the Maze of Genomic Evidence

Genome Biology. 2008  |  Pubmed ID: 18613945

A Critical Assessment of Mus Musculus Gene Function Prediction Using Integrated Genomic Evidence

Genome Biology. 2008  |  Pubmed ID: 18613946

Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated.

Combining Guilt-by-association and Guilt-by-profiling to Predict Saccharomyces Cerevisiae Gene Function

Genome Biology. 2008  |  Pubmed ID: 18613951

Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships.

An En Masse Phenotype and Function Prediction System for Mus Musculus

Genome Biology. 2008  |  Pubmed ID: 18613952

Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation.

A Protein Domain-based Interactome Network for C. Elegans Early Embryogenesis

Cell. Aug, 2008  |  Pubmed ID: 18692475

Many protein-protein interactions are mediated through independently folding modular domains. Proteome-wide efforts to model protein-protein interaction or "interactome" networks have largely ignored this modular organization of proteins. We developed an experimental strategy to efficiently identify interaction domains and generated a domain-based interactome network for proteins involved in C. elegans early-embryonic cell divisions. Minimal interacting regions were identified for over 200 proteins, providing important information on their domain organization. Furthermore, our approach increased the sensitivity of the two-hybrid system, resulting in a more complete interactome network. This interactome modeling strategy revealed insights into C. elegans centrosome function and is applicable to other biological processes in this and other organisms.

The Synergizer Service for Translating Gene, Protein and Other Biological Identifiers

Bioinformatics (Oxford, England). Oct, 2008  |  Pubmed ID: 18697767

The Synergizer is a database and web service that provides translations of biological database identifiers. It is accessible both programmatically and interactively. AVAILABILITY: The Synergizer is freely available to all users interactively via a web application and programmatically via a web service. Clients implementing the Synergizer application programming interface (API) are also freely available. Please visit for details.

High-quality Binary Protein Interaction Map of the Yeast Interactome Network

Science (New York, N.Y.). Oct, 2008  |  Pubmed ID: 18719252

Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.

Metabolite Profiling of Blood from Individuals Undergoing Planned Myocardial Infarction Reveals Early Markers of Myocardial Injury

The Journal of Clinical Investigation. Oct, 2008  |  Pubmed ID: 18769631

Emerging metabolomic tools have created the opportunity to establish metabolic signatures of myocardial injury. We applied a mass spectrometry-based metabolite profiling platform to 36 patients undergoing alcohol septal ablation treatment for hypertrophic obstructive cardiomyopathy, a human model of planned myocardial infarction (PMI). Serial blood samples were obtained before and at various intervals after PMI, with patients undergoing elective diagnostic coronary angiography and patients with spontaneous myocardial infarction (SMI) serving as negative and positive controls, respectively. We identified changes in circulating levels of metabolites participating in pyrimidine metabolism, the tricarboxylic acid cycle and its upstream contributors, and the pentose phosphate pathway. Alterations in levels of multiple metabolites were detected as early as 10 minutes after PMI in an initial derivation group and were validated in a second, independent group of PMI patients. A PMI-derived metabolic signature consisting of aconitic acid, hypoxanthine, trimethylamine N-oxide, and threonine differentiated patients with SMI from those undergoing diagnostic coronary angiography with high accuracy, and coronary sinus sampling distinguished cardiac-derived from peripheral metabolic changes. Our results identify a role for metabolic profiling in the early detection of myocardial injury and suggest that similar approaches may be used for detection or prediction of other disease states.

Chemical Substructures That Enrich for Biological Activity

Bioinformatics (Oxford, England). Nov, 2008  |  Pubmed ID: 18784118

MOTIVATION: Certain chemical substructures are present in many drugs. This has led to the claim of 'privileged' substructures which are predisposed to bioactivity. Because bias in screening library construction could explain this phenomenon, the existence of privilege has been controversial. RESULTS: Using diverse phenotypic assays, we defined bioactivity for multiple compound libraries. Many substructures were associated with bioactivity even after accounting for substructure prevalence in the library, thus validating the privileged substructure concept. Determinations of privilege were confirmed in independent assays and libraries. Our analysis also revealed 'underprivileged' substructures and 'conditional privilege': rules relating combinations of substructure to bioactivity. Most previously reported substructures have been flat aromatic ring systems. Although we validated such substructures, we also identified three-dimensional privileged substructures. Most privileged substructures display a wide variety of substituents, suggesting an entropic mechanism of privilege. Compounds containing privileged substructures had a doubled rate of bioactivity, suggesting practical consequences for pharmaceutical discovery.

A C. Elegans Genome-scale MicroRNA Network Contains Composite Feedback Motifs with High Flux Capacity

Genes & Development. Sep, 2008  |  Pubmed ID: 18794350

MicroRNAs (miRNAs) and transcription factors (TFs) are primary metazoan gene regulators. Whereas much attention has focused on finding the targets of both miRNAs and TFs, the transcriptional networks that regulate miRNA expression remain largely unexplored. Here, we present the first genome-scale Caenorhabditis elegans miRNA regulatory network that contains experimentally mapped transcriptional TF → miRNA interactions, as well as computationally predicted post-transcriptional miRNA → TF interactions. We find that this integrated miRNA network contains 23 miRNA ↔ TF composite feedback loops in which a TF that controls a miRNA is itself regulated by that same miRNA. By rigorous network randomizations, we show that such loops occur more frequently than expected by chance and, hence, constitute a genuine network motif. Interestingly, miRNAs and TFs in such loops are heavily regulated and regulate many targets. This "high flux capacity" suggests that loops provide a mechanism of high information flow for the coordinate and adaptable control of miRNA and TF target regulons.

An Experimentally Derived Confidence Score for Binary Protein-protein Interactions

Nature Methods. Jan, 2009  |  Pubmed ID: 19060903

Information on protein-protein interactions is of central importance for many areas of biomedical research. At present no method exists to systematically and experimentally assess the quality of individual interactions reported in interaction mapping experiments. To provide a standardized confidence-scoring method that can be applied to tens of thousands of protein interactions, we have developed an interaction tool kit consisting of four complementary, high-throughput protein interaction assays. We benchmarked these assays against positive and random reference sets consisting of well documented pairs of interacting human proteins and randomly chosen protein pairs, respectively. A logistic regression model was trained using the data from these reference sets to combine the assay outputs and calculate the probability that any newly identified interaction pair is a true biophysical interaction once it has been tested in the tool kit. This general approach will allow a systematic and empirical assignment of confidence scores to all individual protein-protein interactions in interactome networks.

An Empirical Framework for Binary Interactome Mapping

Nature Methods. Jan, 2009  |  Pubmed ID: 19060904

Several attempts have been made to systematically map protein-protein interaction, or 'interactome', networks. However, it remains difficult to assess the quality and coverage of existing data sets. Here we describe a framework that uses an empirically-based approach to rigorously dissect quality parameters of currently available human interactome maps. Our results indicate that high-throughput yeast two-hybrid (HT-Y2H) interactions for human proteins are more precise than literature-curated interactions supported by a single publication, suggesting that HT-Y2H is suitable to map a significant portion of the human interactome. We estimate that the human interactome contains approximately 130,000 binary interactions, most of which remain to be mapped. Similar to estimates of DNA sequence data quality and genome size early in the Human Genome Project, estimates of protein interaction data quality and interactome size are crucial to establish the magnitude of the task of comprehensive human interactome mapping and to elucidate a path toward this goal.

Literature-curated Protein Interaction Datasets

Nature Methods. Jan, 2009  |  Pubmed ID: 19116613

High-quality datasets are needed to understand how global and local properties of protein-protein interaction, or 'interactome', networks relate to biological mechanisms, and to guide research on individual proteins. In an evaluation of existing curation of protein interaction experiments reported in the literature, we found that curation can be error-prone and possibly of lower quality than commonly assumed.

Empirically Controlled Mapping of the Caenorhabditis Elegans Protein-protein Interactome Network

Nature Methods. Jan, 2009  |  Pubmed ID: 19123269

To provide accurate biological hypotheses and elucidate global properties of cellular networks, systematic identification of protein-protein interactions must meet high quality standards. We present an expanded C. elegans protein-protein interaction network, or 'interactome' map, derived from testing a matrix of approximately 10,000 × approximately 10,000 proteins using a highly specific, high-throughput yeast two-hybrid system. Through a new empirical quality control framework, we show that the resulting data set (Worm Interactome 2007, or WI-2007) was similar in quality to low-throughput data curated from the literature. We filtered previous interaction data sets and integrated them with WI-2007 to generate a high-confidence consolidated map (Worm Interactome version 8, or WI8). This work allowed us to estimate the size of the worm interactome at approximately 116,000 interactions. Comparison with other types of functional genomic data shows the complementarity of distinct experimental approaches in predicting different functional relationships between genes or proteins.

Q&A: Epistasis

Journal of Biology. 2009  |  Pubmed ID: 19486505

Quantitative Phenotyping Via Deep Barcode Sequencing

Genome Research. Oct, 2009  |  Pubmed ID: 19622793

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

Next Generation Software for Functional Trend Analysis

Bioinformatics (Oxford, England). Nov, 2009  |  Pubmed ID: 19717575

FuncAssociate is a web application that discovers properties enriched in lists of genes or proteins that emerge from large-scale experimentation. Here we describe an updated application with a new interface and several new features. For example, enrichment analysis can now be performed within multiple gene- and protein-naming systems. This feature avoids potentially serious translation artifacts to which other enrichment analysis strategies are subject. AVAILABILITY: The FuncAssociate web application is freely available to all users at

Large-scale RACE Approach for Proactive Experimental Definition of C. Elegans ORFeome

Genome Research. Dec, 2009  |  Pubmed ID: 19801531

Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.

Pathways of the Heart

Circulation. Cardiovascular Genetics. Aug, 2009  |  Pubmed ID: 20031600

The Genetic Landscape of a Cell

Science (New York, N.Y.). Jan, 2010  |  Pubmed ID: 20093466

A genome-scale genetic interaction map was constructed by examining 5.4 million gene-gene pairs for synthetic genetic interactions, generating quantitative genetic interaction profiles for approximately 75% of all genes in the budding yeast, Saccharomyces cerevisiae. A network based on genetic interaction profiles reveals a functional map of the cell in which genes of similar biological processes cluster together in coherent subsets, and highly correlated profiles delineate specific pathways to define gene function. The global network identifies functional cross-connections between all bioprocesses, mapping a cellular wiring diagram of pleiotropy. Genetic interaction degree correlated with a number of different gene attributes, which may be informative about genetic network hubs in other organisms. We also demonstrate that extensive and unbiased mapping of the genetic landscape provides a key for interpretation of chemical-genetic interactions and drug target identification.

Numerous Conserved and Divergent MicroRNAs Expressed by Herpes Simplex Viruses 1 and 2

Journal of Virology. May, 2010  |  Pubmed ID: 20181707

Certain viruses use microRNAs (miRNAs) to regulate the expression of their own genes, host genes, or both. Previous studies have identified a limited number of miRNAs expressed by herpes simplex viruses 1 and 2 (HSV-1 and -2), some of which are conserved between these two viruses. To more comprehensively analyze the miRNAs expressed by HSV-1 or HSV-2 during productive and latent infection, we applied a massively parallel sequencing approach. We were able to identify 16 and 17 miRNAs expressed by HSV-1 and HSV-2, respectively, including all previously known species, and a number of previously unidentified virus-encoded miRNAs. The genomic positions of most miRNAs encoded by these two viruses are within or proximal to the latency-associated transcript region. Nine miRNAs are conserved in position and/or sequence, particularly in the seed region, between these two viruses. Interestingly, we did not detect an HSV-2 miRNA homolog of HSV-1 miR-H1, which is highly expressed during productive infection, but we did detect abundant expression of miR-H6, whose seed region is conserved with HSV-1 miR-H1 and might represent a functional analog. We also identified a highly conserved miRNA family arising from the viral origins of replication. In addition, we detected several pairs of complementary miRNAs and we found miRNA-offset RNAs (moRs) arising from the precursors of HSV-1 and HSV-2 miR-H6 and HSV-2 miR-H4. Our results reveal elements of miRNA conservation and divergence that should aid in identifying miRNA functions.

Interpreting Metabolomic Profiles Using Unbiased Pathway Models

PLoS Computational Biology. Feb, 2010  |  Pubmed ID: 20195502

Human disease is heterogeneous, with similar disease phenotypes resulting from distinct combinations of genetic and environmental factors. Small-molecule profiling can address disease heterogeneity by evaluating the underlying biologic state of individuals through non-invasive interrogation of plasma metabolite levels. We analyzed metabolite profiles from an oral glucose tolerance test (OGTT) in 50 individuals, 25 with normal (NGT) and 25 with impaired glucose tolerance (IGT). Our focus was to elucidate underlying biologic processes. Although we initially found little overlap between changed metabolites and preconceived definitions of metabolic pathways, the use of unbiased network approaches identified significant concerted changes. Specifically, we derived a metabolic network with edges drawn between reactant and product nodes in individual reactions and between all substrates of individual enzymes and transporters. We searched for "active modules", regions of the metabolic network enriched for changes in metabolite levels. Active modules identified relationships among changed metabolites and highlighted the importance of specific solute carriers in metabolite profiles. Furthermore, hierarchical clustering and principal component analysis demonstrated that changed metabolites in OGTT naturally grouped according to the activities of the System A and L amino acid transporters, the osmolyte carrier SLC6A12, and the mitochondrial aspartate-glutamate transporter SLC25A13. Comparison between NGT and IGT groups supported blunted glucose- and/or insulin-stimulated activities in the IGT group. Using unbiased pathway models, we offer evidence supporting the important role of solute carriers in the physiologic response to glucose challenge and conclude that carrier activities are reflected in individual metabolite profiles of perturbation experiments. Given the involvement of transporters in human disease, metabolite profiling may contribute to improved disease classification via the interrogation of specific transporter activities.

The Tandem Inversion Duplication in Salmonella Enterica: Selection Drives Unstable Precursors to Final Mutation Types

Genetics. May, 2010  |  Pubmed ID: 20215473

During growth under selection, mutant types appear that are rare in unselected populations. Stress-induced mechanisms may cause these structures or selection may favor a series of standard events that modify common preexisting structures. One such mutation is the short junction (SJ) duplication with long repeats separated by short sequence elements: AB*(CD)*(CD)*E (* = a few bases). Another mutation type, described here, is the tandem inversion duplication (TID), where two copies of a parent sequence flank an inverse-order segment: AB(CD)(E'D'C'B')(CD)E. Both duplication types can amplify by unequal exchanges between direct repeats (CD), and both are rare in unselected cultures but common after prolonged selection for amplification. The observed TID junctions are asymmetric (aTIDs) and may arise from a symmetrical precursor (sTID), ABCDE(E'D'C'B'A')ABCDE, when sequential deletions remove each palindromic junction. Alternatively, one deletion can remove both sTID junctions to generate an SJ duplication. It is proposed that sTID structures form frequently under all growth conditions, but are usually lost due to their instability and fitness cost. Selection for increased copy number helps retain the sTID and favors deletions that remodel junctions, improve fitness, and allow higher amplification. Growth improves with each step in formation of an SJ or aTID amplification, allowing selection to favor completion of the mutation process.

Genome-wide Functional Analysis of Human 5' Untranslated Region Introns

Genome Biology. 2010  |  Pubmed ID: 20222956

Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored.

Absence of Evidence for MHC-dependent Mate Selection Within HapMap Populations

PLoS Genetics. Apr, 2010  |  Pubmed ID: 20442868

The major histocompatibility complex (MHC) of immunity genes has been reported to influence mate choice in vertebrates, and a recent study presented genetic evidence for this effect in humans. Specifically, greater dissimilarity at the MHC locus was reported for European-American mates (parents in HapMap Phase 2 trios) than for non-mates. Here we show that the results depend on a few extreme data points, are not robust to conservative changes in the analysis procedure, and cannot be reproduced in an equivalent but independent set of European-American mates. Although some evidence suggests an avoidance of extreme MHC similarity between mates, rather than a preference for dissimilarity, limited sample sizes preclude a rigorous investigation. In summary, fine-scale molecular-genetic data do not conclusively support the hypothesis that mate selection in humans is influenced by the MHC locus.

FuncBase: a Resource for Quantitative Gene Function Annotation

Bioinformatics (Oxford, England). Jul, 2010  |  Pubmed ID: 20495000

SUMMARY: Computational gene function prediction can serve to focus experimental resources on high-priority experimental tasks. FuncBase is a web resource for viewing quantitative machine learning-based gene function annotations. Quantitative annotations of genes, including fungal and mammalian genes, with Gene Ontology terms are accompanied by a community feedback system. Evidence underlying function annotations is shown. For example, a custom Cytoscape viewer shows functional linkage graphs relevant to the gene or function of interest. FuncBase provides links to external resources, and may be accessed directly or via links from species-specific databases. AVAILABILITY: FuncBase as well as all underlying data and annotations are freely available via

Metabolic Signatures of Exercise in Human Plasma

Science Translational Medicine. May, 2010  |  Pubmed ID: 20505214

Exercise provides numerous salutary effects, but our understanding of how these occur is limited. To gain a clearer picture of exercise-induced metabolic responses, we have developed comprehensive plasma metabolite signatures by using mass spectrometry to measure >200 metabolites before and after exercise. We identified plasma indicators of glycogenolysis (glucose-6-phosphate), tricarboxylic acid cycle span 2 expansion (succinate, malate, and fumarate), and lipolysis (glycerol), as well as modulators of insulin sensitivity (niacinamide) and fatty acid oxidation (pantothenic acid). Metabolites that were highly correlated with fitness parameters were found in subjects undergoing acute exercise testing and marathon running and in 302 subjects from a longitudinal cohort study. Exercise-induced increases in glycerol were strongly related to fitness levels in normal individuals and were attenuated in subjects with myocardial ischemia. A combination of metabolites that increased in plasma in response to exercise (glycerol, niacinamide, glucose-6-phosphate, pantothenate, and succinate) up-regulated the expression of nur77, a transcriptional regulator of glucose utilization and lipid metabolism genes in skeletal muscle in vitro. Plasma metabolic profiles obtained during exercise provide signatures of exercise performance and cardiovascular disease susceptibility, in addition to highlighting molecular pathways that may modulate the salutary effects of exercise.

A Genome-wide Gene Function Prediction Resource for Drosophila Melanogaster

PloS One. 2010  |  Pubmed ID: 20711346

Predicting gene functions by integrating large-scale biological data remains a challenge for systems biology. Here we present a resource for Drosophila melanogaster gene function predictions. We trained function-specific classifiers to optimize the influence of different biological datasets for each functional category. Our model predicted GO terms and KEGG pathway memberships for Drosophila melanogaster genes with high accuracy, as affirmed by cross-validation, supporting literature evidence, and large-scale RNAi screens. The resulting resource of prioritized associations between Drosophila genes and their potential functions offers a guide for experimental investigations.

Identification of Neuronal RNA Targets of TDP-43-containing Ribonucleoprotein Complexes

The Journal of Biological Chemistry. Jan, 2011  |  Pubmed ID: 21051541

TAR DNA-binding protein 43 (TDP-43) is associated with a spectrum of neurodegenerative diseases. Although TDP-43 resembles heterogeneous nuclear ribonucleoproteins, its RNA targets and physiological protein partners remain unknown. Here we identify RNA targets of TDP-43 from cortical neurons by RNA immunoprecipitation followed by deep sequencing (RIP-seq). The canonical TDP-43 binding site (TG)(n) is 55.1-fold enriched, and moreover, a variant with adenine in the middle, (TG)(n)TA(TG)(m), is highly abundant among reads in our TDP-43 RIP-seq library. TDP-43 RNA targets can be divided into three different groups: those primarily binding in introns, in exons, and across both introns and exons. TDP-43 RNA targets are particularly enriched for Gene Ontology terms related to synaptic function, RNA metabolism, and neuronal development. Furthermore, TDP-43 binds to a number of RNAs encoding for proteins implicated in neurodegeneration, including TDP-43 itself, FUS/TLS, progranulin, Tau, and ataxin 1 and -2. We also identify 25 proteins that co-purify with TDP-43 from rodent brain nuclear extracts. Prominent among them are nuclear proteins involved in pre-mRNA splicing and RNA stability and transport. Also notable are two neuron-enriched proteins, methyl CpG-binding protein 2 and polypyrimidine tract-binding protein 2 (PTBP2). A PTBP2 consensus RNA binding motif is enriched in the TDP-43 RIP-seq library, suggesting that PTBP2 may co-regulate TDP-43 RNA targets. This work thus reveals the protein and RNA components of the TDP-43-containing ribonucleoprotein complexes and provides a framework for understanding how dysregulation of TDP-43 in RNA metabolism contributes to neurodegeneration.

Knocking out Multigene Redundancies Via Cycles of Sexual Assortment and Fluorescence Selection

Nature Methods. Feb, 2011  |  Pubmed ID: 21217751

Phenotypes that might otherwise reveal a gene's function can be obscured by genes with overlapping function. This phenomenon is best known within gene families, in which an important shared function may only be revealed by mutating all family members. Here we describe the 'green monster' technology that enables precise deletion of many genes. In this method, a population of deletion strains, with each deletion marked by an inducible green fluorescent protein reporter gene, is subjected to repeated rounds of mating, meiosis and flow-cytometric enrichment. This results in the aggregation of multiple deletion loci in single cells. The green monster strategy is potentially applicable to assembling other engineered alterations in any species with sex or alternative means of allelic assortment. To test the technology, we generated a single broadly drug-sensitive strain of Saccharomyces cerevisiae bearing precise deletions of all 16 ATP-binding cassette transporters within clades associated with multidrug resistance.

Reconstitution of Human RNA Interference in Budding Yeast

Nucleic Acids Research. Apr, 2011  |  Pubmed ID: 21252293

Although RNA-mediated interference (RNAi) is a widely conserved process among eukaryotes, including many fungi, it is absent from the budding yeast Saccharomyces cerevisiae. Three human proteins, Ago2, Dicer and TRBP, are sufficient for reconstituting the RISC complex in vitro. To examine whether the introduction of human RNAi genes can reconstitute RNAi in S. cerevisiae, genes encoding these three human proteins were introduced into S. cerevisiae. We observed both siRNA and siRNA- and RISC-dependent silencing of the target gene GFP. Thus, human Ago2, Dicer and TRBP can functionally reconstitute human RNAi in S. cerevisiae, in vivo, enabling the study and use of the human RNAi pathway in a facile genetic model organism.

Next-generation Sequencing to Generate Interactome Datasets

Nature Methods. Jun, 2011  |  Pubmed ID: 21516116

Next-generation sequencing has not been applied to protein-protein interactome network mapping so far because the association between the members of each interacting pair would not be maintained in en masse sequencing. We describe a massively parallel interactome-mapping pipeline, Stitch-seq, that combines PCR stitching with next-generation sequencing and used it to generate a new human interactome dataset. Stitch-seq is applicable to various interaction assays and should help expand interactome network mapping.

Genome Analysis Reveals Interplay Between 5'UTR Introns and Nuclear mRNA Export for Secretory and Mitochondrial Genes

PLoS Genetics. Apr, 2011  |  Pubmed ID: 21533221

In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5' end of the transcript during splicing. The signal sequence coding region (SSCR) can support an alternative mRNA export (ALREX) pathway that does not require splicing. However, most SSCR-containing genes also have introns, so the interplay between these export mechanisms remains unclear. Here we support a model in which the furthest upstream element in a given transcript, be it an intron or an ALREX-promoting SSCR, dictates the mRNA export pathway used. We also experimentally demonstrate that nuclear-encoded mitochondrial genes can use the ALREX pathway. Thus, ALREX can also be supported by nucleotide signals within mitochondrial-targeting sequence coding regions (MSCRs). Finally, we identified and experimentally verified novel motifs associated with the ALREX pathway that are shared by both SSCRs and MSCRs. Our results show strong correlation between 5' untranslated region (5'UTR) intron presence/absence and sequence features at the beginning of the coding region. They also suggest that genes encoding secretory and mitochondrial proteins share a common regulatory mechanism at the level of mRNA export.

Discovering the Targets of Drugs Via Computational Systems Biology

The Journal of Biological Chemistry. Jul, 2011  |  Pubmed ID: 21566122

Computational systems biology is empowering the study of drug action. Studies on biological effects of chemical compounds have increased in scale and accessibility, allowing integration with other large-scale experimental data types. Here, we review computational approaches for elucidating the mechanisms of both intended and undesirable effects of drugs, with the collective potential to change the nature of drug discovery and pharmacological therapy.

Independently Evolved Virulence Effectors Converge Onto Hubs in a Plant Immune System Network

Science (New York, N.Y.). Jul, 2011  |  Pubmed ID: 21798943

Plants generate effective responses to infection by recognizing both conserved and variable pathogen-encoded molecules. Pathogens deploy virulence effector proteins into host cells, where they interact physically with host proteins to modulate defense. We generated an interaction network of plant-pathogen effectors from two pathogens spanning the eukaryote-eubacteria divergence, three classes of Arabidopsis immune system proteins, and ~8000 other Arabidopsis proteins. We noted convergence of effectors onto highly interconnected host proteins and indirect, rather than direct, connections between effectors and plant immune receptors. We demonstrated plant immune system functions for 15 of 17 tested host proteins that interact with effectors from both pathogens. Thus, pathogens from different kingdoms deploy independently evolved virulence proteins that interact with a limited set of highly connected cellular hubs to facilitate their diverse life-cycle strategies.

Systematic Exploration of Synergistic Drug Pairs

Molecular Systems Biology. 2011  |  Pubmed ID: 22068327

Drug synergy allows a therapeutic effect to be achieved with lower doses of component drugs. Drug synergy can result when drugs target the products of genes that act in parallel pathways ('specific synergy'). Such cases of drug synergy should tend to correspond to synergistic genetic interaction between the corresponding target genes. Alternatively, 'promiscuous synergy' can arise when one drug non-specifically increases the effects of many other drugs, for example, by increased bioavailability. To assess the relative abundance of these drug synergy types, we examined 200 pairs of antifungal drugs in S. cerevisiae. We found 38 antifungal synergies, 37 of which were novel. While 14 cases of drug synergy corresponded to genetic interaction, 92% of the synergies we discovered involved only six frequently synergistic drugs. Although promiscuity of four drugs can be explained under the bioavailability model, the promiscuity of Tacrolimus and Pentamidine was completely unexpected. While many drug synergies correspond to genetic interactions, the majority of drug synergies appear to result from non-specific promiscuous synergy.

Personalized Medicine: from Genotypes and Molecular Phenotypes Towards Computed Therapy

Pacific Symposium on Biocomputing. 2012  |  Pubmed ID: 22174287

Joint genotyping and large-scale phenotyping of molecular traits are currently available for a number of important patient study cohorts and will soon become feasible in routine medical practice. These data are one component of several that are setting the stage for the development of personalized medicine, promising to yield better disease classification, enabling more specific treatment, and also allowing for improved preventive medical screening. This conference session explores statistical challenges and new opportunities that arise from application of genome-scale experimentation for personalized genomics and medicine.

A Resource of Quantitative Functional Annotation for Homo Sapiens Genes

G3 (Bethesda, Md.). Feb, 2012  |  Pubmed ID: 22384401

The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and make quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and by manual evaluation with the biological literature. Functional-linkage networks were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.

Response to "MHC-dependent Mate Choice in Humans: Why Genomic Patterns from the HapMap European American Data Set Support the Hypothesis". HapMap Genotypes Do Not Confidently Support a Role for the MHC Locus in Human Mate Selection

BioEssays : News and Reviews in Molecular, Cellular and Developmental Biology. Jul, 2012  |  Pubmed ID: 22467222

Genome Rearrangements Caused by Depletion of Essential DNA Replication Proteins in Saccharomyces Cerevisiae

Genetics. Sep, 2012  |  Pubmed ID: 22673806

Genetic screens of the collection of ~4500 deletion mutants in Saccharomyces cerevisiae have identified the cohort of nonessential genes that promote maintenance of genome integrity. Here we probe the role of essential genes needed for genome stability. To this end, we screened 217 tetracycline-regulated promoter alleles of essential genes and identified 47 genes whose depletion results in spontaneous DNA damage. We further showed that 92 of these 217 essential genes have a role in suppressing chromosome rearrangements. We identified a core set of 15 genes involved in DNA replication that are critical in preventing both spontaneous DNA damage and genome rearrangements. Mapping, classification, and analysis of rearrangement breakpoints indicated that yeast fragile sites, Ty retrotransposons, tRNA genes, early origins of replication, and replication termination sites are common features at breakpoints when essential replication genes that suppress chromosome rearrangements are downregulated. We propose mechanisms by which depletion of essential replication proteins can lead to double-stranded DNA breaks near these features, which are subsequently repaired by homologous recombination at repeated elements.

Viral Perturbations of Host Networks Reflect Disease Etiology

PLoS Computational Biology. 2012  |  Pubmed ID: 22761553

Many human diseases, arising from mutations of disease susceptibility genes (genetic diseases), are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of host interactome may underlie such virally implicated disease relationships. Using as models two different human viruses, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia.

Interpreting Cancer Genomes Using Systematic Host Network Perturbations by Tumour Virus Proteins

Nature. Jul, 2012  |  Pubmed ID: 22810586

Genotypic differences greatly influence susceptibility and resistance to disease. Understanding genotype-phenotype relationships requires that phenotypes be viewed as manifestations of network properties, rather than simply as the result of individual genomic variations. Genome sequencing efforts have identified numerous germline mutations, and large numbers of somatic genomic alterations, associated with a predisposition to cancer. However, it remains difficult to distinguish background, or 'passenger', cancer mutations from causal, or 'driver', mutations in these data sets. Human viruses intrinsically depend on their host cell during the course of infection and can elicit pathological phenotypes similar to those arising from mutations. Here we test the hypothesis that genomic variations and tumour viruses may cause cancer through related mechanisms, by systematically examining host interactome and transcriptome network perturbations caused by DNA tumour virus proteins. The resulting integrated viral perturbation data reflects rewiring of the host cell networks, and highlights pathways, such as Notch signalling and apoptosis, that go awry in cancer. We show that systematic analyses of host targets of viral proteins can identify cancer genes with a success rate on a par with their identification through functional genomics and large-scale cataloguing of tumour mutations. Together, these complementary approaches increase the specificity of cancer gene identification. Combining systems-level studies of pathogen-encoded gene products with genomic approaches will facilitate the prioritization of cancer-causing driver genes to advance the understanding of the genetic basis of human cancer.

Introns in UTRs: Why We Should Stop Ignoring Them

BioEssays : News and Reviews in Molecular, Cellular and Developmental Biology. Dec, 2012  |  Pubmed ID: 23108796

Although introns in 5'- and 3'-untranslated regions (UTRs) are found in many protein coding genes, rarely are they considered distinctive entities with specific functions. Indeed, mammalian transcripts with 3'-UTR introns are often assumed nonfunctional because they are subject to elimination by nonsense-mediated decay (NMD). Nonetheless, recent findings indicate that 5'- and 3'-UTR intron status is of significant functional consequence for the regulation of mammalian genes. Therefore these features should be ignored no longer.

ChromoZoom: a Flexible, Fluid, Web-based Genome Browser

Bioinformatics (Oxford, England). Dec, 2012  |  Pubmed ID: 23220575

SUMMARY: Current web-based genome browsers require repetitious user input to scroll over long distances, alter the drawing density of elements, or zoom through multiple orders of magnitude. Generally, either the server or the client is responsible for the majority of data processing, resulting in either servers having to receive and handle data relevant only to one user, or clients redundantly processing widely viewed data. ChromoZoom pre-renders and caches general-use tracks into tiled images on the server and serves them in an interactive web interface with inertial scrolling and precise, fluent zooming via the mouse wheel or trackpad. Custom tracks in several formats can be rendered by client-side code alongside the pre-rendered tracks, minimizing server load due to user-specific rendering and eliminating the need to transmit private data. ChromoZoom thereby enables rapid and simultaneous exploration of curated, experimental, and personal genomic datasets. AVAILABILITY: Human and yeast genome researchers may browse recent assemblies within ChromoZoom at Source code is available at CONTACT: SUPPLEMENTARY INFORMATION: Table S1 provides a comparison of features with other current web-based genome browsers.
