The eukaryotic cell division cycle is a highly regulated process that consists of a complex series of events and involves thousands of proteins. Researchers have studied the regulation of the cell cycle in several organisms, employing a wide range of high-throughput technologies, such as microarray-based mRNA expression profiling and quantitative proteomics. Due to its complexity, the cell cycle can also fail or otherwise change in many different ways if important genes are knocked out, which has been studied in several microscopy-based knockdown screens. The data from these many large-scale efforts are not easily accessed, analyzed and combined due to their inherent heterogeneity. To address this, we have created Cyclebase (available at http://www.cyclebase.org), an online database that allows users to easily visualize and download results from genome-wide cell-cycle-related experiments. In Cyclebase version 3.0, we have updated the content of the database to reflect changes to genome annotation, added new mRNA and protein expression data, and integrated cell-cycle phenotype information from high-content screens and model-organism databases. The new version of Cyclebase also features a new web interface, designed around an overview figure that summarizes all the cell-cycle-related data for a gene.
Data collected for medical, filing and administrative purposes in electronic patient records (EPRs) represent a rich source of individualised clinical data, which has great potential for improved detection of patients experiencing adverse drug reactions (ADRs), across all approved drugs and across all indication areas.
A key prerequisite for precision medicine is the estimation of disease progression from the current patient state. Disease correlations and temporal disease progression (trajectories) have mainly been analysed with focus on a small number of diseases or using large-scale approaches without time consideration, exceeding a few years. So far, no large-scale studies have focused on defining a comprehensive set of disease trajectories. Here we present a discovery-driven analysis of temporal disease progression patterns using data from an electronic health registry covering the whole population of Denmark. We use the entire spectrum of diseases and convert 14.9 years of registry data on 6.2 million patients into 1,171 significant trajectories. We group these into patterns centred on a small number of key diagnoses such as chronic obstructive pulmonary disease (COPD) and gout, which are central to disease progression and hence important to diagnose early to mitigate the risk of adverse outcomes. We suggest such trajectory analyses may be useful for predicting and preventing future diseases of individual patients.
Binding assays are increasingly used as a screening method for protein kinase inhibitors; however, as yet only a weak correlation with enzymatic activity-based assays has been demonstrated. We show that the correlation between the two types of assays can be improved using more precise screening conditions. Furthermore, a marked improvement in the correlation was found by using kinase constructs containing the catalytic domain in the presence of additional domains or subunits.
Many protein domains bind to short peptide sequences, called linear motifs. Data on their sequence specificities is sparse, which is why biologists usually resort to basic pattern searches to identify new putative binding sites for experimental follow-up. Most motifs have poor specificity and prioritization of the matches is thus crucial when scanning a full proteome with a pattern. Here we present a generic method to prioritize motif occurrence predictions by using cellular contextual information. We take 2 parameters as input: the motif occurrences and one or more of the interacting domains. The potential hits are ranked based on how strongly the context network associates them with a protein containing one of the specified domains, which leads to an increased predictive performance. The method is available through a web interface at doremi.jensenlab.org, which allows for an easy application of the method. We show that this approach leads to improved predictions of binding partners for PDZ domains and the SUMO binding domain. This is consistent with the earlier observation that coupling sequence motifs with network information improves kinase-specific substrate predictions.
Information on protein subcellular localization is important to understand the cellular functions of proteins. Currently, such information is manually curated from the literature, obtained from high-throughput microscopy-based screens and predicted from primary sequence. To get a comprehensive view of the localization of a protein, it is thus necessary to consult multiple databases and prediction tools. To address this, we present the COMPARTMENTS resource, which integrates all sources listed above as well as the results of automatic text mining. The resource is automatically kept up to date with source databases, and all localization evidence is mapped onto common protein identifiers and Gene Ontology terms. To facilitate comparison across evidence, we assign each piece of localization evidence a confidence score based on its type and source. Finally, we visualize the unified localization evidence for a protein on a schematic cell to provide a simple overview. Database URL: http://compartments.jensenlab.org.
MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA-disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer.
The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.
Drugs have tremendous potential to cure and relieve disease, but the risk of unintended effects is always present. Healthcare providers increasingly record data in electronic patient records (EPRs), in which we aim to identify possible adverse events (AEs) and, specifically, possible adverse drug events (ADEs).
Although countless highly penetrant variants have been associated with Mendelian disorders, the genetic etiologies underlying complex diseases remain largely unresolved. By mining the medical records of over 110 million patients, we examine the extent to which Mendelian variation contributes to complex disease risk. We detect thousands of associations between Mendelian and complex diseases, revealing a nondegenerate, phenotypic code that links each complex disorder to a unique collection of Mendelian loci. Using genome-wide association results, we demonstrate that common variants associated with complex diseases are enriched in the genes indicated by this "Mendelian code." Finally, we detect hundreds of comorbidity associations among Mendelian disorders, and we use probabilistic genetic modeling to demonstrate that Mendelian variants likely contribute nonadditively to the risk for a subset of complex diseases. Overall, this study illustrates a complementary approach for mapping complex disease loci and provides unique predictions concerning the etiologies of specific diseases.
Side effect similarities of drugs have recently been employed to predict new drug targets, and networks of side effects and targets have been used to better understand the mechanism of action of drugs. Here, we report a large-scale analysis to systematically predict and characterize proteins that cause drug side effects. We integrated phenotypic data obtained during clinical trials with known drug-target relations to identify overrepresented protein-side effect combinations. Using independent data, we confirm that most of these overrepresentations point to proteins which, when perturbed, cause side effects. Of 1428 side effects studied, 732 were predicted to be predominantly caused by individual proteins, at least 137 of them backed by existing pharmacological or phenotypic data. We prove this concept in vivo by confirming our prediction that activation of the serotonin 7 receptor (HTR7) is responsible for hyperesthesia in mice, which, in turn, can be prevented by a drug that selectively inhibits HTR7. Taken together, we show that a large fraction of complex drug side effects are mediated by individual proteins and create a reference for such relations.
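As an illustration of how an overrepresented protein-side effect combination might be scored, the sketch below computes a one-sided hypergeometric tail probability. This is an assumption made for illustration: the abstract does not state which statistical test was used, and the function name and counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n drugs are drawn from N, of which K target the protein.

    Hypothetical mapping to the problem (not taken from the abstract):
      N = all drugs considered
      K = drugs known to target the protein
      n = drugs reported to cause the side effect
      k = drugs that do both
    """
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ... shared drugs by chance.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / total

# Toy numbers: 10 drugs, 3 hit the protein, 3 cause the side effect, all 3 overlap.
p_value = hypergeom_upper_tail(3, 3, 3, 10)
```

A small tail probability means the overlap is larger than expected by chance; in a real analysis the values would also need correction for multiple testing across all protein-side effect pairs.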
The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster and as accurate as existing tools. The precision and recall were assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insights into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.
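The general idea of dictionary-based named entity recognition can be sketched as a longest-match lookup of word sequences against a name dictionary. The code below is a toy illustration only, not the SPECIES algorithm or its implementation; the dictionary contents and the function name are assumptions chosen for the example.

```python
import re

# Hypothetical toy dictionary: lowercase name -> NCBI Taxonomy identifier.
TOY_DICTIONARY = {
    "homo sapiens": 9606,
    "human": 9606,
    "saccharomyces cerevisiae": 4932,
    "budding yeast": 4932,
}

# Longest dictionary entry, measured in words, bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in TOY_DICTIONARY)

def tag_names(text):
    """Return (start, end, matched_text, taxon_id) tuples for dictionary hits."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        # Try the longest candidate first so multi-word names win over substrings.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if candidate in TOY_DICTIONARY:
                hit = (start, end, text[start:end], TOY_DICTIONARY[candidate])
                i += n  # skip past the matched words
                break
        if hit:
            matches.append(hit)
        else:
            i += 1
    return matches

hits = tag_names("Human and budding yeast cells")
```

Each lookup is a hash-table access, which is one reason dictionary-based tagging can be fast; a production tagger would additionally need to handle abbreviations, spelling variants and ambiguous names.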
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300,000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.de/.
Genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with the risk of hundreds of diseases. However, there is currently no database that enables non-specialists to answer the following simple questions: which SNPs associated with diseases are in linkage disequilibrium (LD) with a gene of interest? Which chromosomal regions have been associated with a given disease, and which are the potentially causal genes in each region? To answer these questions, we use data from the HapMap Project to partition each chromosome into so-called LD blocks, so that SNPs in LD with each other are preferentially in the same block, whereas SNPs not in LD are in different blocks. By projecting SNPs and genes onto LD blocks, the DistiLD database aims to increase usage of existing GWAS results by making it easy to query and visualize disease-associated SNPs and genes in their chromosomal context. The database is available at http://distild.jensenlab.org/.
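Projecting SNPs and genes onto LD blocks can be illustrated with a simple interval lookup. The sketch below assumes the blocks on one chromosome are contiguous and given by sorted start positions; the block boundaries, gene names and function names are hypothetical and do not come from DistiLD.

```python
from bisect import bisect_right

# Hypothetical sorted block start positions (base pairs) on one chromosome.
# Block k covers [BLOCK_STARTS[k], BLOCK_STARTS[k + 1]).
BLOCK_STARTS = [0, 50_000, 120_000, 300_000]

def block_of(position):
    """Index of the LD block containing a chromosomal position."""
    return bisect_right(BLOCK_STARTS, position) - 1

def blocks_of_gene(start, end):
    """All block indices overlapped by a gene spanning [start, end]."""
    return list(range(block_of(start), block_of(end) + 1))

def genes_near_snp(snp_position, genes):
    """Names of genes that share an LD block with the SNP."""
    block = block_of(snp_position)
    return sorted(
        name for name, (start, end) in genes.items()
        if block in blocks_of_gene(start, end)
    )

# Toy gene coordinates for illustration only.
GENES = {
    "GENE_A": (10_000, 20_000),
    "GENE_B": (110_000, 130_000),
    "GENE_C": (310_000, 320_000),
}
candidates = genes_near_snp(125_000, GENES)
```

Because a gene may span a block boundary, it is mapped to every block it overlaps, so a SNP in either block retrieves it as a candidate.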
The tip of a projectile point made of mastodon bone is embedded in a rib of a single disarticulated mastodon at the Manis site in the state of Washington. Radiocarbon dating and DNA analysis show that the rib is associated with the other remains and dates to 13,800 years ago. Thus, osseous projectile points, common to the Beringian Upper Paleolithic and Clovis, were made and used during pre-Clovis times in North America. The Manis site, combined with evidence of mammoth hunting at sites in Wisconsin, provides evidence that people were hunting proboscideans at least two millennia before Clovis.
Using metagenomic parts lists to infer global patterns on microbial ecology remains a significant challenge. To deduce important ecological indicators such as environmental adaptation, molecular trait dispersal, diversity variation and primary production from the gene pool of an ecosystem, we integrated 25 ocean metagenomes with geographical, meteorological and geophysicochemical data. We find that climatic factors (temperature, sunlight) are the major determinants of the biomolecular repertoire of each sample and the main limiting factor on functional trait dispersal (absence of biogeographic provincialism). Molecular functional richness and diversity show a distinct latitudinal gradient peaking at 20° N and correlate with primary production. The latter can also be predicted from the molecular functional composition of an environmental sample. Together, our results show that the functional community composition derived from metagenomes is an important quantitative readout for molecular trait-based biogeography and ecology.
Protein-metabolite networks are central to biological systems, but are incompletely understood. Here, we report a screen to catalog protein-lipid interactions in yeast. We used arrays of 56 metabolites to measure lipid-binding fingerprints of 172 proteins, including 91 with predicted lipid-binding domains. We identified 530 protein-lipid associations, the majority of which are novel. To show the data set's biological value, we further studied several novel interactions with sphingolipids, a class of conserved bioactive lipids with an elusive mode of action. Integration of live-cell imaging suggests new cellular targets for these molecules, including several with pleckstrin homology (PH) domains. Validated interactions with Slm1, a regulator of actin polarization, show that PH domains can have unexpected lipid-binding specificities and can act as coincidence sensors for both phosphatidylinositol phosphates and phosphorylated sphingolipids.
Drug perturbations of human cells lead to complex responses upon target binding. One of the known mechanisms is a (positive or negative) feedback loop that adjusts the expression level of the respective target protein. To quantify this mechanism systems-wide in an unbiased way, drug-induced differential expression of drug target mRNA was examined in three cell lines using the Connectivity Map. To overcome various biases in this valuable resource, we have developed a computational normalization and scoring procedure that is applicable to gene expression recording upon heterogeneous drug treatments. In 1290 drug-target relations, corresponding to 466 drugs acting on 167 drug targets studied, 8% of the targets are subject to regulation at the mRNA level. We confirmed systematically that in particular G-protein coupled receptors, when serving as known targets, are regulated upon drug treatment. We further newly identified drug-induced differential regulation of Lanosterol 14-alpha demethylase, Endoplasmin, DNA topoisomerase 2-alpha and Calmodulin 1. The feedback regulation in these and other targets is likely to be relevant for the success or failure of the molecular intervention.
Protein annotation provides a condensed and systematic view on the function of individual proteins. It has traditionally dealt with sorting proteins into functional categories, which for example has proven to be successful for the comparison of different species. However, if we are to understand the differences between many individuals of the same species (humans in particular), the focus needs to be on the functional impact of individual residue variation. To fulfil the promises of personal genomics, we need to start asking not only what is in a genome but also how millions of small differences between individual genomes affect protein function and in turn human health.
A water tunnel in Candida antarctica lipase B that provides the active site with substrate water is hypothesized. A small, focused library created in order to prevent water from entering the active site through the tunnel was screened for increased transacylation over hydrolysis activity. A single mutant, S47L, in which the inner part of the tunnel was blocked, catalysed the transacylation of vinyl butyrate to 20 mM butanol 14 times faster than hydrolysis. The single mutant Q46A, which has a more open outer end of the tunnel, showed an increased hydrolysis rate and a decreased hydrolysis to transacylation ratio compared to the wild-type lipase. Mutants with a blocked tunnel could be very useful in applications in which hydrolysis is unwanted, such as the acylation of highly hydrophilic compounds in the presence of water.
Several cyclic processes take place within a single organism. For example, the cell cycle is coordinated with the 24 h diurnal rhythm in animals and plants, and with the 40 min ultradian rhythm in budding yeast. To examine the evolution of periodic gene expression during these processes, we performed the first systematic comparison in three organisms (Homo sapiens, Arabidopsis thaliana and Saccharomyces cerevisiae) by using public microarray data. We observed that although diurnal-regulated and ultradian-regulated genes are not generally cell-cycle-regulated, they tend to have cell-cycle-regulated paralogues. Thus, diverged temporal expression of paralogues seems to facilitate cellular orchestration under different periodic stimuli. Lineage-specific functional repertoires of periodic-associated paralogues imply that this mode of regulation might have evolved independently in several organisms.
The molecular understanding of phenotypes caused by drugs in humans is essential for elucidating mechanisms of action and for developing personalized medicines. Side effects of drugs (also known as adverse drug reactions) are an important source of human phenotypic information, but so far research on this topic has been hampered by insufficient accessibility of data. Consequently, we have developed a public, computer-readable side effect resource (SIDER) that connects 888 drugs to 1450 side effect terms. It contains information on frequency in patients for one-third of the drug-side effect pairs. For 199 drugs, the side effect frequency of placebo administration could also be extracted. We illustrate the potential of SIDER with a number of analyses. The resource is freely available for academic research at http://sideeffects.embl.de.
Cell division involves a complex series of events orchestrated by thousands of molecules. To study this process, researchers have employed mRNA expression profiling of synchronously growing cell cultures progressing through the cell cycle. These experiments, which have been carried out in several organisms, are not easy to access, combine and evaluate. Complicating factors include variation in interdivision time between experiments and differences in relative duration of each cell-cycle phase across organisms. To address these problems, we created Cyclebase, an online resource of cell-cycle-related experiments. This database provides an easy-to-use web interface that facilitates visualization and download of genome-wide cell-cycle data and analysis results. Data from different experiments are normalized to a common timescale and are complemented with key cell-cycle information and derived analysis results. In Cyclebase version 2.0, we have updated the entire database to reflect changes to genome annotations, included information on cyclin-dependent kinase (CDK) substrates, predicted degradation signals and loss-of-function phenotypes from genome-wide screens. The web interface has been improved and provides a single, gene-centric graph summarizing the available cell-cycle experiments. Finally, key information and links to orthologous and paralogous genes are now included to further facilitate comparison of cell-cycle regulation across species. Cyclebase version 2.0 is available at http://www.cyclebase.org.
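One simple way to put time courses with different interdivision times on a common timescale is to express each sampling time as a percentage of the cell cycle. The sketch below assumes a linear rescaling with an optional offset; the abstract does not specify the normalization actually used in Cyclebase, and the function and parameter names are hypothetical.

```python
def to_cycle_percent(times_min, interdivision_min, offset_min=0.0):
    """Rescale sampling times (minutes) to percent of one cell cycle.

    interdivision_min: estimated interdivision time of the experiment.
    offset_min: assumed time at which the cycle is taken to start.
    Values above 100 indicate samples from a second or later cycle.
    """
    return [100.0 * (t - offset_min) / interdivision_min for t in times_min]

# Two toy experiments with different interdivision times become comparable.
fast = to_cycle_percent([0, 45, 90, 135], interdivision_min=90)
slow = to_cycle_percent([0, 60, 120, 180], interdivision_min=120)
```

After rescaling, both toy experiments place their samples at the same positions in the cycle, so expression profiles can be overlaid even though the cultures divided at different rates.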
Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug-target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74,000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/.
The eukaryotic cell cycle requires precise temporal coordination of the activities of hundreds of executor proteins (EPs) involved in cell growth and division. Cyclin-dependent protein kinases (Cdks) play central roles in regulating the production, activation, inactivation and destruction of these EPs. From genome-scale data sets of budding yeast, we identify 126 EPs that are regulated by Cdk1 both through direct phosphorylation of the EP and through phosphorylation of the transcription factors that control expression of the EP, so that each of these EPs is regulated by a feed-forward loop (FFL) from Cdk1. By mathematical modelling, we show that such FFLs can activate EPs at different phases of the cell cycle depending on the effective signs (+ or -) of the regulatory steps of the FFL. We provide several case studies of EPs that are controlled by FFLs exactly as our models predict. The signal-transduction properties of FFLs allow one (or a few) Cdk signal(s) to drive a host of cell cycle responses in correct temporal sequence.
Transcription factors (TFs) have long been known to be principally activators of transcription in eukaryotes and prokaryotes. The growing awareness of the ubiquity of microRNAs (miRNAs) as suppressive regulators in eukaryotes suggests the possibility of a mutual, preferential, self-regulatory connectivity between miRNAs and TFs. Here we investigate the connectivity from TFs and miRNAs to other genes and each other using text mining, TF promoter binding site and six different miRNA binding site prediction methods.
Deubiquitylating enzymes (DUBs) are a large group of proteases that regulate ubiquitin-dependent metabolic pathways by cleaving ubiquitin-protein bonds. Here we present a global study aimed at elucidating the effects DUBs have on protein abundance changes in eukaryotic cells. To this end we compare wild-type Saccharomyces cerevisiae to 20 DUB knock-out strains using quantitative proteomics to measure proteome-wide expression of isotope labeled proteins, and analyze the data in the context of known transcription-factor regulatory networks. Overall we find that protein abundances differ widely between individual deletion strains, demonstrating that removing just a single component from the complex ubiquitin system causes major changes in cellular protein expression. The outcome of our analysis confirms many of the known biological roles for characterized DUBs such as Ubp3p and Ubp8p, and we demonstrate that Sec28p is a novel Ubp3p substrate. In addition we find strong associations for several uncharacterized DUBs providing clues for their possible cellular roles. Hierarchical clustering of all deletion strains reveals pronounced similarities between various DUBs, which corroborate current DUB knowledge and uncover novel functional aspects for uncharacterized DUBs. Observations in our analysis support that DUBs induce both direct and indirect effects on protein abundances.
The cell cycle is a temporal program that regulates DNA synthesis and cell division. When we compared the codon usage of cell cycle-regulated genes with that of other genes, we discovered that there is a significant preference for non-optimal codons. Moreover, genes encoding proteins that cycle at the protein level exhibit non-optimal codon preferences. Remarkably, cell cycle-regulated genes expressed in different phases display different codon preferences. Here, we show empirically that transfer RNA (tRNA) expression is indeed highest in the G2 phase of the cell cycle, consistent with the non-optimal codon usage of genes expressed at this time, and lowest toward the end of G1, reflecting the optimal codon usage of G1 genes. Accordingly, protein levels of human glycyl-, threonyl-, and glutamyl-prolyl tRNA synthetases were found to oscillate, peaking in G2/M phase. In light of our findings, we propose that non-optimal (wobbly) matching codons influence protein synthesis during the cell cycle. We describe a new mathematical model that shows how codon usage can give rise to cell-cycle regulation. In summary, our data indicate that cells exploit wobbling to generate cell cycle-dependent dynamics of proteins.
Genome-wide association studies (GWAS) have heralded a new era in susceptibility locus discovery in complex diseases. For type 1 diabetes, >40 susceptibility loci have been discovered. However, GWAS do not inevitably lead to identification of the gene or genes in a given locus associated with disease, and they do not typically inform the broader context in which the disease genes operate. Here, we integrated type 1 diabetes GWAS data with protein-protein interactions to construct biological networks of relevance for disease. A total of 17 networks were identified. To prioritize and substantiate these networks, we performed expressional profiling in human pancreatic islets exposed to proinflammatory cytokines. Three networks were significantly enriched for cytokine-regulated genes and, thus, likely to play an important role for type 1 diabetes in pancreatic islets. Eight of the regulated genes (CD83, IFNGR1, IL17RD, TRAF3IP2, IL27RA, PLCG2, MYO1B, and CXCR7) in these networks also harbored single nucleotide polymorphisms nominally associated with type 1 diabetes. Finally, the expression and cytokine regulation of these new candidate genes were confirmed in insulin-secreting INS-1 β-cells. Our results provide novel insight into the mechanisms behind type 1 diabetes pathogenesis and, thus, may provide the basis for the design of novel treatment strategies.
DNA replication, mitosis and mitotic exit are critical transitions of the cell cycle which normally occur only once per cycle. A universal control mechanism was proposed for the regulation of mitotic entry in which Cdk helps its own activation through two positive feedback loops. Recent discoveries in various organisms showed the importance of positive feedbacks in other transitions as well. Here we investigate if a universal control system with transcriptional regulation(s) and post-translational positive feedback(s) can be proposed for the regulation of all cell cycle transitions. Through computational modeling, we analyze the transition dynamics in all possible combinations of transcriptional and post-translational regulations. We find that some combinations lead to sloppy transitions, while others give very precise control. The periodic transcriptional regulation through the activator or the inhibitor leads to radically different dynamics. Experimental evidence shows that in cell cycle transitions of organisms investigated for cell cycle dependent periodic transcription, only the inhibitor OR the activator is under cyclic control and never both of them. Based on these observations, we propose two transcriptional control modes of cell cycle regulation that either STOP or let the cycle GO in case of a transcriptional failure. We discuss the biological relevance of such differences.