Thousands of unique mutations in transcription factors (TFs) arise in cancers, and the functional and biological roles of relatively few of these have been characterized. Here, we used structure-based methods developed specifically for DNA-binding proteins to systematically predict the consequences of mutations in several TFs that are frequently mutated in cancers. The explicit consideration of protein-DNA interactions was crucial to explain the roles and prevalence of mutations in TP53 and RUNX1 in cancers, and resulted in a higher specificity of detection for known p53-regulated genes among genetic associations between TP53 genotypes and genome-wide expression in The Cancer Genome Atlas, compared to existing methods of mutation assessment. Biophysical predictions also indicated that the relative prevalence of TP53 missense mutations in cancer is proportional to their thermodynamic impacts on protein stability and DNA binding, which is consistent with the selection for the loss of p53 transcriptional function in cancers. Structure and thermodynamics-based predictions of the impacts of missense mutations that focus on specific molecular functions may be increasingly useful for the precise and large-scale inference of aberrant molecular phenotypes in cancer and other complex diseases.
Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming.
Epigenetic alterations, particularly in DNA methylation, are ubiquitous in cancer, yet the molecular origins and the consequences of these alterations are poorly understood. CTCF, a DNA-binding protein that regulates higher-order chromatin organization, is frequently altered by hemizygous deletion or mutation in human cancer. To date, a causal role for CTCF in cancer has not been established. Here, we show that Ctcf hemizygous knockout mice are markedly susceptible to spontaneous, radiation-, and chemically induced cancer in a broad range of tissues. Ctcf(+/-) tumors are characterized by increased aggressiveness, including invasion, metastatic dissemination, and mixed epithelial/mesenchymal differentiation. Molecular analysis of Ctcf(+/-) tumors indicates that Ctcf is haploinsufficient for tumor suppression. Tissues with hemizygous loss of CTCF exhibit increased variability in CpG methylation genome wide. These findings establish CTCF as a prominent tumor-suppressor gene and point to CTCF-mediated epigenetic stability as a major barrier to neoplastic progression.
Genomic information is encoded on a wide range of distance scales, ranging from tens of bases to megabases. We developed a multiscale framework to analyze and visualize the information content of genomic signals. Different types of signals, such as G+C content or DNA methylation, are characterized by distinct patterns of signal enrichment or depletion across scales spanning several orders of magnitude. These patterns are associated with a variety of genomic annotations. By integrating the information across all scales, we demonstrated improved prediction of gene expression from polymerase II chromatin immunoprecipitation sequencing (ChIP-seq) measurements, and we observed that gene expression differences in colorectal cancer are related to methylation patterns that extend beyond the single-gene scale. Our software is available at https://github.com/tknijnen/msr/.
Ovarian carcinoma is the most lethal gynaecological malignancy. Better understanding of the molecular pathogenesis of this disease and effective targeted therapies are needed to improve patient outcomes. MicroRNAs play important roles in cancer progression and have the potential for use as either therapeutic agents or targets. Studies in other cancers have suggested that miR-506 has anti-tumour activity, but its function has yet to be elucidated. We found that deregulation of miR-506 in ovarian carcinoma promotes an aggressive phenotype. Ectopic over-expression of miR-506 in ovarian cancer cells was sufficient to inhibit proliferation and to promote senescence. We also demonstrated that CDK4 and CDK6 are direct targets of miR-506, and that miR-506 can inhibit CDK4/6-FOXM1 signalling, which is activated in the majority of serous ovarian carcinomas. This newly recognized miR-506-CDK4/6-FOXM1 axis provides further insight into the pathogenesis of ovarian carcinoma and identifies a potential novel therapeutic agent.
Epithelial-to-mesenchymal transition (EMT) and its reverse process, mesenchymal-to-epithelial transition (MET), play important roles in embryogenesis, stem cell biology, and cancer progression. EMT can be regulated by many signaling pathways and regulatory transcriptional networks. Furthermore, post-transcriptional regulatory networks regulate EMT; these networks include the long non-coding RNA (lncRNA) and microRNA (miRNA) families. Specifically, the miR-200 family, miR-101, miR-506, and several lncRNAs have been found to regulate EMT. Recent studies have illustrated that several lncRNAs are overexpressed in various cancers and that they can promote tumor metastasis by inducing EMT. MiRNAs control EMT by regulating EMT transcription factors or other EMT regulators, suggesting that lncRNAs and miRNAs are novel therapeutic targets for the treatment of cancer. Further efforts have shown that non-coding RNA-mediated EMT regulation is closely associated with epigenetic regulation through promoter methylation (e.g., miR-200 or miR-506) and protein regulation (e.g., SET8 via miR-502). The formation of gene fusions has also been found to promote EMT in prostate cancer. In this review, we discuss the post-transcriptional regulatory network that is involved in EMT and MET and how targeting EMT and MET may provide effective therapeutics for human disease.
Microorganisms often form multicellular structures such as biofilms and structured colonies that can influence the organism's virulence, drug resistance, and adherence to medical devices. Phenotypic classification of these structures has traditionally relied on qualitative scoring systems that limit detailed phenotypic comparisons between strains. Automated imaging and quantitative analysis have the potential to improve the speed and accuracy of experiments designed to study the genetic and molecular networks underlying different morphological traits. For this reason, we have developed a platform that uses automated image analysis and pattern recognition to quantify phenotypic signatures of yeast colonies. Our strategy enables quantitative analysis of individual colonies, measured at a single time point or over a series of time-lapse images, as well as the classification of distinct colony shapes based on image-derived features. Phenotypic changes in colony morphology can be expressed as changes in feature space trajectories over time, thereby enabling the visualization and quantitative analysis of morphological development. To facilitate data exploration, results are plotted dynamically through an interactive Yeast Image Analysis web application (YIMAA; http://yimaa.cs.tut.fi) that integrates the raw and processed images across all time points, allowing exploration of the image-based features and principal components associated with morphological development.
The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
Systems biology experiments studying different topics and organisms produce thousands of data values across different types of genomic data. Further, data mining analyses are yielding ranked and heterogeneous results and association networks distributed over the entire genome. The visualization of these results is often difficult and standalone web tools allowing for custom inputs and dynamic filtering are limited.
Bayesian networks are probabilistic graphical models suitable for modeling several kinds of biological systems. In many cases, the structure of a Bayesian network represents causal molecular mechanisms or statistical associations of the underlying system. Bayesian networks have been applied, for example, for inferring the structure of many biological networks from experimental data. We present some recent progress in learning the structure of static and dynamic Bayesian networks from data.
Altered expression of oncogenic and tumour-suppressing microRNAs (miRNAs) is widely associated with tumourigenesis. However, the regulatory mechanisms underlying these alterations are poorly understood. We sought to shed light on the deregulation of miRNA biogenesis promoting the aberrant miRNA expression profiles identified in these tumours. Using sequencing technology to perform both whole-transcriptome and small RNA sequencing of glioma patient samples, we examined precursor and mature miRNAs to directly evaluate the miRNA maturation process, and examined expression profiles for genes involved in the major steps of miRNA biogenesis. We found that ratios of mature to precursor forms of a large number of miRNAs increased with the progression from normal brain to low-grade and then to high-grade gliomas. The expression levels of genes involved in each of the three major steps of miRNA biogenesis (nuclear processing, nucleo-cytoplasmic transport, and cytoplasmic processing) were systematically altered in glioma tissues. Survival analysis of an independent data set demonstrated that the alteration of genes involved in miRNA maturation correlates with survival in glioma patients. Direct quantification of miRNA maturation with deep sequencing demonstrated that deregulation of the miRNA biogenesis pathway is a hallmark for glioma genesis and progression.
The distinct cell types of multicellular organisms arise owing to constraints imposed by gene regulatory networks on the collective change of gene expression across the genome, creating self-stabilizing expression states, or attractors. We curated human expression data comprising 166 cell types and 2,602 transcription-regulating genes and developed a data-driven method for identifying putative determinants of cell fate built around the concept of expression reversal of gene pairs, such as those participating in toggle-switch circuits. This approach allows us to organize the cell types into their ontogenic lineage relationships. Our method identifies genes in regulatory circuits that control neuronal fate, pluripotency and blood cell differentiation, and it may be useful for prioritizing candidate factors for direct conversion of cell fate.
Integrated genomic analyses revealed a miRNA-regulatory network that further defined a robust integrated mesenchymal subtype associated with poor overall survival in 459 cases of serous ovarian cancer (OvCa) from The Cancer Genome Atlas and 560 cases from independent cohorts. Eight key miRNAs, including miR-506, miR-141, and miR-200a, were predicted to regulate 89% of the targets in this network. Follow-up functional experiments illustrate that miR-506 augmented E-cadherin expression, inhibited cell migration and invasion, and prevented TGFβ-induced epithelial-mesenchymal transition by targeting SNAI2, a transcriptional repressor of E-cadherin. In human OvCa, miR-506 expression was correlated with decreased SNAI2 and VIM, elevated E-cadherin, and beneficial prognosis. Nanoparticle delivery of miR-506 in orthotopic OvCa mouse models led to E-cadherin induction and reduced tumor growth.
To facilitate analysis and understanding of biological systems, large-scale data are often integrated into models using a variety of mathematical and computational approaches. Such models describe the dynamics of the biological system and can be used to study the changes in the state of the system over time. For many model classes, such as discrete or continuous dynamical systems, there exist appropriate frameworks and tools for analyzing system dynamics. However, the heterogeneous information that encodes and bridges molecular and cellular dynamics, inherent to fine-grained molecular simulation models, presents significant challenges to the study of system dynamics. In this paper, we present an algorithmic information theory based approach for the analysis and interpretation of the dynamics of such executable models of biological systems. We apply a normalized compression distance (NCD) analysis to the state representations of a model that simulates the immune decision making and immune cell behavior. We show that this analysis successfully captures the essential information in the dynamics of the system, which results from a variety of events including proliferation, differentiation, or perturbations such as gene knock-outs. We demonstrate that this approach can be used for the analysis of executable models, regardless of the modeling framework, and for making experimentally quantifiable predictions.
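The normalized compression distance named in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib standing in for the compressor C. The choice of compressor and the byte-string encoding of model states are assumptions made here for illustration; the abstract specifies neither.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (approximates C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical serialized model states: two similar snapshots and one
# snapshot taken after an illustrative perturbation.
state_t0 = b"naive;naive;active;" * 100
state_t1 = b"naive;naive;active;" * 99 + b"naive;active;active;"
state_ko = b"dead;dead;anergic;exhausted;" * 70

near = ncd(state_t0, state_t1)  # similar states compress well together
far = ncd(state_t0, state_ko)   # dissimilar states share less structure
```

Because real compressors are imperfect, the value can slightly exceed 1 for unrelated inputs, so it is best read as a relative measure between pairs of states.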
Regulation of gene expression involves the orchestrated interaction of a large number of proteins with transcriptional regulatory elements in the context of chromatin. Our understanding of gene regulation is limited by the lack of a protein measurement technology that can systematically detect and quantify the ensemble of proteins associated with the transcriptional regulatory elements of specific genes. Here, we introduce a set of selected reaction monitoring (SRM) assays for the systematic measurement of 464 proteins with known or suspected roles in transcriptional regulation at RNA polymerase II transcribed promoters in Saccharomyces cerevisiae. Measurement of these proteins in nuclear extracts by SRM permitted the reproducible quantification of 42% of the proteins over a wide range of abundances. By deploying the assay to systematically identify DNA binding transcriptional regulators that interact with the environmentally regulated FLO11 promoter in cell extracts, we identified 15 regulators that bound specifically to distinct regions along ~600 bp of the regulatory sequence. Importantly, the dataset includes a number of regulators that have been shown to either control FLO11 expression or localize to these regulatory regions in vivo. We further validated the utility of the approach by demonstrating that two of the SRM-identified factors, Mot3 and Azf1, are required for proper FLO11 expression. These results demonstrate the utility of SRM-based targeted proteomics to guide the identification of gene-specific transcriptional regulators.
We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.
Cells exposed to stimuli exhibit a wide range of responses ensuring phenotypic variability across the population. Such single cell behavior is often examined by flow cytometry; however, gating procedures typically employed to select a small subpopulation of cells with similar morphological characteristics make it difficult, even impossible, to quantitatively compare cells across a large variety of experimental conditions because these conditions can lead to profound morphological variations. To overcome these limitations, we developed a regression approach to correct for variability in fluorescence intensity due to differences in cell size and granularity without discarding any of the cells, which gating ipso facto does. This approach enables quantitative studies of cellular heterogeneity and transcriptional noise in high-throughput experiments involving thousands of samples. We used this approach to analyze a library of yeast knockout strains and reveal genes required for the population to establish a bimodal response to oleic acid induction. We identify a group of epigenetic regulators and nucleoporins that, by maintaining an unresponsive population, may provide the population with the advantage of diversified bet hedging.
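The idea of correcting fluorescence for cell size and granularity without gating can be sketched as an ordinary least-squares fit followed by residual extraction. This is an assumed linear model, not necessarily the regression used in the work above: forward scatter (`fsc`) stands in for size, side scatter (`ssc`) for granularity, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def correct_fluorescence(fsc, ssc, fluor):
    """Remove the part of fluorescence explained linearly by FSC and SSC.

    Every cell is kept; each corrected value is the regression residual
    shifted back to the population mean fluorescence.
    """
    rows = [[1.0, f, s] for f, s in zip(fsc, ssc)]
    # Normal equations (X^T X) beta = X^T y for the three coefficients.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, fluor)) for i in range(3)]
    b0, b1, b2 = solve_linear_system(xtx, xty)
    mean_fluor = sum(fluor) / len(fluor)
    return [
        y - (b0 + b1 * f + b2 * s) + mean_fluor
        for f, s, y in zip(fsc, ssc, fluor)
    ]
```

In practice the measurements might be log-transformed before fitting; that choice is left open here because the abstract does not state it.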
In computational biology, permutation tests have become a widely used tool to assess the statistical significance of an event under investigation. However, the common way of computing the P-value, which expresses the statistical significance, requires a very large number of permutations when small (and thus interesting) P-values are to be accurately estimated. This is computationally expensive and often infeasible. Recently, we proposed an alternative estimator, which requires far fewer permutations compared to the standard empirical approach while still reliably estimating small P-values.
When living systems detect changes in their external environment, their response must be measured to balance the need to react appropriately with the need to remain stable, ignoring insignificant signals. Because this is a fundamental challenge of all biological systems that execute programs in response to stimuli, we developed a generalized time-frequency analysis (TFA) framework to systematically explore the dynamical properties of biomolecular networks. Using TFA, we focused on two well-characterized yeast gene regulatory networks responsive to carbon-source shifts and a mammalian innate immune regulatory network responsive to lipopolysaccharides (LPS). The networks comprise two different basic architectures: dual positive and negative feedback loops make up the yeast galactose network, whereas overlapping positive and negative feed-forward loops are common to the yeast fatty-acid response network and the LPS-induced network of macrophages. TFA revealed remarkably distinct network behaviors in terms of trade-offs in responsiveness and noise suppression that are appropriately tuned to each biological response. The wild type galactose network was found to be highly responsive, while the oleate network has greater noise suppression ability. The LPS network appeared more balanced, exhibiting less bias toward noise suppression or responsiveness. Exploration of the network parameter space exposed dramatic differences in system behaviors for each network. These studies highlight fundamental structural and dynamical principles that underlie each network, reveal constrained parameters of positive and negative feedback and feed-forward strengths that tune the networks appropriately for their respective biological roles, and demonstrate the general utility of the TFA approach for systems and synthetic biology.
Histone acetylation (HAc) is associated with open chromatin, and HAc has been shown to facilitate transcription factor (TF) binding in mammalian cells. In the innate immune system context, epigenetic studies strongly implicate HAc in the transcriptional response of activated macrophages. We hypothesized that using data from large-scale sequencing of a HAc chromatin immunoprecipitation assay (ChIP-Seq) would improve the performance of computational prediction of binding locations of TFs mediating the response to a signaling event, namely, macrophage activation.
Tissue heterogeneity, arising from multiple cell types, is a major confounding factor in experiments that focus on studying cell types, e.g. their expression profiles, in isolation. Although sample heterogeneity can be addressed by manual microdissection prior to conducting experiments, computational treatment of heterogeneous measurements has become a reliable alternative that performs this microdissection in silico. Favoring computation over manual purification has its advantages, such as reduced time consumption, measurement of the responses of multiple cell types simultaneously, samples kept free of external perturbations, and unaltered yield of molecular content.
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires.
Increased chromosomal instability that alters gene copy numbers throughout the genome is known to have a role in the molecular pathogenesis of tumors. The impact of gene dosage on gene expression levels in gastrointestinal stromal tumors (GISTs) and leiomyosarcomas (LMSs) is unknown. In this paper, we used a combination of array comparative genomic hybridization (aCGH) and gene expression data to gain insights into the interplay of structural and functional changes of the genome in GISTs and LMSs. We identified specific target genes that change their expression due to the gene dosage effect. Statistical analysis identified four chromosomal regions, 1p, 14q, 15q, and 22q, where both copy number and mRNA expression were significantly different between the tumor types. Multi-dimensional scaling (MDS) analysis showed that the gene expression profiles of these four regions accurately distinguish GIST and LMS. In addition, the gene dosage sensitive genes in these regions are differentially involved in several tumor growth promoting pathways, implying that there are different mechanisms underlying GIST and LMS carcinogenesis. Integration of aCGH and gene expression data has not only provided insights into how DNA copy number variations affect the gene expression patterns in these cancers, but has also proved to be a promising method for choosing biologically relevant biomarkers.
Peroxisomes are intracellular organelles that house a number of diverse metabolic processes, notably those required for beta-oxidation of fatty acids. Peroxisome biogenesis can be induced by the presence of peroxisome proliferators, including fatty acids, which activate complex cellular programs that underlie the induction process. Here, we used multi-parameter quantitative phenotype analyses of an arrayed mutant collection of yeast cells induced to proliferate peroxisomes to establish a comprehensive inventory of genes required for peroxisome induction and function. The assays employed include growth in the presence of fatty acids, and confocal imaging and flow cytometry through the induction process. In addition to the classical phenotypes associated with loss of peroxisomal functions, these studies identified 169 genes required for robust signaling, transcription, normal peroxisomal development and morphologies, and transmission of peroxisomes to daughter cells. These gene products are localized throughout the cell, and many have indirect connections to peroxisome function. By integration with extant data sets, we present a total of 211 genes linked to peroxisome biogenesis and highlight the complex networks through which information flows during peroxisome biogenesis and function.
Traditionally molecular biology research has tended to reduce biological pathways to composite units studied as isolated parts of the cellular system. With the advent of high throughput methodologies that can capture thousands of data points, and powerful computational approaches, the reality of studying cellular processes at a systems level is upon us. As these approaches yield massive datasets, systems level analyses have drawn upon other fields such as engineering and mathematics, adapting computational and statistical approaches to decipher relationships between molecules. Guided by high quality datasets and analyses, one can begin the process of predictive modeling. The findings from such approaches are often surprising and beyond normal intuition. We discuss four classes of dynamical systems used to model genetic regulatory networks. The discussion is divided into continuous and discrete models, as well as deterministic and stochastic model classes. For each combination of these categories, a model is presented and discussed in the context of the yeast cell cycle, illustrating how different types of questions can be addressed by different model classes.
Fluorescence microscopy is the standard tool for detection and analysis of cellular phenomena. This technique, however, has a number of drawbacks such as the limited number of available fluorescent channels in microscopes, overlapping excitation and emission spectra of the stains, and phototoxicity.
Permutation tests have become a standard tool to assess the statistical significance of an event under investigation. The statistical significance, as expressed in a P-value, is calculated as the fraction of permutation values that are at least as extreme as the original statistic, which was derived from non-permuted data. This empirical method directly couples both the minimal obtainable P-value and the resolution of the P-value to the number of permutations. Thereby, it imposes upon itself the need for a very large number of permutations when small P-values are to be accurately estimated. This is computationally expensive and often infeasible.
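The empirical estimator described above, the fraction of permutation statistics at least as extreme as the observed one, can be written down as a short sketch. The two-group setting and the absolute difference in means used as the test statistic are assumptions chosen for illustration; the abstract does not fix a particular statistic.

```python
import random


def mean_difference(a, b):
    """Absolute difference between the means of two groups."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def empirical_p_value(x, y, n_permutations=1000, seed=0):
    """Empirical permutation P-value for a two-group comparison.

    Returns the fraction of label permutations whose statistic is at
    least as extreme as the statistic of the non-permuted data.
    """
    rng = random.Random(seed)
    observed = mean_difference(x, y)
    pooled = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = mean_difference(pooled[:len(x)], pooled[len(x):])
        if permuted >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations
```

The returned value is always a multiple of 1/n_permutations, which makes the coupling concrete: resolving P-values near 1e-6 with this estimator would take on the order of a million permutations or more.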
We present a computational framework for predicting targets of transcription factor regulation. The framework is based on the integration of a number of sources of evidence, derived from DNA-sequence and gene-expression data, using a weighted sum approach. Sources of evidence are prioritized based on a training set, and their relative contributions are then optimized. The performance of the proposed framework is demonstrated in the context of BCL6 target prediction. We show that this framework is able to uncover BCL6 targets reliably when biological prior information is utilized effectively, particularly in the case of sequence analysis. The framework results in a considerable gain in performance over scores in which sequence information was not incorporated. This analysis shows that with assessment of the quality and biological relevance of the data, reliable predictions can be obtained with this computational framework.
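The weighted sum integration of evidence mentioned above can be sketched as follows. The evidence names, scores, and weights are hypothetical placeholders, and the training-based prioritization and weight optimization described in the abstract are not reproduced here; the weights are simply taken as given.

```python
def combined_score(evidence, weights):
    """Weighted sum of the evidence scores available for one gene."""
    return sum(weights[name] * value for name, value in evidence.items())


def rank_targets(evidence_by_gene, weights):
    """Order candidate target genes from highest to lowest combined score."""
    return sorted(
        evidence_by_gene,
        key=lambda gene: combined_score(evidence_by_gene[gene], weights),
        reverse=True,
    )


# Hypothetical normalized evidence scores in [0, 1] for three candidates.
example_evidence = {
    "geneA": {"motif": 0.9, "expression": 0.2},
    "geneB": {"motif": 0.1, "expression": 0.8},
    "geneC": {"motif": 0.5, "expression": 0.5},
}
# Hypothetical weights; in a real setting these would be fit on a training set.
example_weights = {"motif": 0.7, "expression": 0.3}

ranking = rank_targets(example_evidence, example_weights)
```

With weights that favor sequence evidence, a gene with a strong motif match outranks one supported mainly by expression data, which mirrors the abstract's point that sequence information contributed a considerable gain.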
Expression of neutrophil gelatinase-associated lipocalin (NGAL)/lipocalin2, a recently recognized iron regulatory protein that binds to matrix metalloproteinase-9 (MMP9), is increased in a spectrum of cancers, including those of the colorectum. Using colon carcinoma cell lines stably transfected with NGAL or antisense NGAL, we showed that NGAL overexpression altered subcellular localization of E-cadherin and catenins, decreased E-cadherin-mediated cell-cell adhesion, enhanced cell-matrix attachment, and increased cell motility and in vitro invasion. Conversely, a decrease in NGAL promoted a more aggregated growth pattern and decreased in vitro invasion. We further showed that NGAL exerted these effects through the alteration of the subcellular localization of Rac1 in an extracellular matrix-dependent, but MMP9-independent, manner. Furthermore, we observed that the NGAL-overexpressing cells tolerated increased iron levels in the culture environment, whereas the NGAL-underexpressing cells showed significant cell death after prolonged incubation under high-iron conditions. Thus, NGAL, which is overexpressed in colon carcinomas, is an important regulatory molecule that integrates extracellular environment cues, iron metabolism, and intracellular small GTPase signaling in cancer migration and invasion. NGAL may therefore be a new target for therapeutic intervention in colorectal carcinoma.
In research, each experiment is different: the focus changes, and the data are generated from a continually evolving barrage of technologies. New techniques are continually introduced, with usage ranging from in-house protocols through to high-throughput instrumentation. To support these requirements, data management systems are needed that can be rapidly built and readily adapted for new usage.
The innate immune system is like a double-edged sword: it is absolutely required for host defense against infection, but when uncontrolled, it can trigger a plethora of inflammatory diseases. Here we use systems-biology approaches to predict and confirm the existence of a gene-regulatory network involving dynamic interaction among the transcription factors NF-kappaB, C/EBPdelta and ATF3 that controls inflammatory responses. We mathematically modeled transcriptional regulation of the genes encoding interleukin 6 and C/EBPdelta and experimentally confirmed the prediction that the combination of an initiator (NF-kappaB), an amplifier (C/EBPdelta) and an attenuator (ATF3) forms a regulatory circuit that discriminates between transient and persistent Toll-like receptor 4-induced signals. Our results suggest a mechanism that enables the innate immune system to detect the duration of infection and to respond appropriately.
The process of cellular differentiation is governed by complex dynamical biomolecular networks consisting of a multitude of genes and their products acting in concert to determine a particular cell fate. Thus, a systems level view is necessary for understanding how a cell coordinates this process and for developing effective therapeutic strategies to treat diseases, such as cancer, in which differentiation plays a significant role. Theoretical considerations and recent experimental evidence support the view that cell fates are high dimensional attractor states of the underlying molecular networks. The temporal behavior of the network states progressing toward different cell fate attractors has the potential to elucidate the underlying molecular mechanisms governing differentiation.
Scientific knowledge is grounded in a particular epistemology and, owing to the requirements of that epistemology, possesses limitations. Some limitations are intrinsic, in the sense that they depend inherently on the nature of scientific knowledge; others are contingent, depending on the present state of knowledge, including technology. Understanding limitations facilitates scientific research because one can then recognize when one is confronted by a limitation, as opposed to simply being unable to solve a problem within the existing bounds of possibility. In the hope that the role of limiting factors can be brought more clearly into focus and discussed, we consider several sources of limitation as they apply to biological knowledge: mathematical complexity, experimental constraints, validation, knowledge discovery, and human intellectual capacity.
A key function of BRCA1 and BRCA2 is participation in dsDNA break repair via homologous recombination. BRCA1 and BRCA2 mutations, which occur in most hereditary ovarian cancers (OCs) and approximately 10% of all OC cases, are associated with defects in homologous recombination and genomic instability, a phenotype termed BRCAness. The clinical effects of BRCA1 and BRCA2 mutations have commonly been analyzed together; however, it is becoming increasingly apparent that these mutations do not have the same effects in OC. Recently, three major reports highlighted the unequal clinical characteristics of OCs with BRCA1 and BRCA2 mutations. All studies demonstrated that patients with BRCA2 mutations have better survival and therapeutic response than BRCA1-mutated and wild-type patients with serous OC. The differing prognostic effects of the BRCA2 and BRCA1 mutations are likely due to differing roles of BRCA1 and BRCA2 in homologous recombination repair and a stronger association between the BRCA2 mutation and a hypermutator phenotype. These new findings have potentially important implications for clinical management of patients with serous OC.
Genomic studies are now being undertaken on thousands of samples, requiring new computational tools that can rapidly analyze data to identify clinically important features. Inferring structural variations in cancer genomes from mate-paired reads is a combinatorially difficult problem. We introduce Fastbreak, a fast and scalable toolkit that enables the analysis and visualization of large amounts of data from projects such as The Cancer Genome Atlas.
Transcription factor-DNA interactions, central to cellular regulation and control, are commonly described by position weight matrices (PWMs). These matrices are frequently used to predict transcription factor binding sites in regulatory regions of DNA to complement and guide further experimental investigation. The DNA sequence preferences of transcription factors, encoded in PWMs, are dictated primarily by select residues within the DNA binding domain(s) that interact directly with DNA. Therefore, the DNA binding properties of homologous transcription factors with identical DNA binding domains may be characterized by PWMs derived from different species. Accordingly, we have implemented a fully automated domain-level homology searching method for identical DNA binding sequences. By applying the domain-level homology search to transcription factors with existing PWMs in the JASPAR and TRANSFAC databases, we were able to significantly increase coverage in terms of the total number of PWMs associated with a given species, assign PWMs to transcription factors that did not previously have any associations, and increase the number of represented species with PWMs over an order of magnitude. Additionally, using protein binding microarray (PBM) data, we have validated the domain-level method by demonstrating that transcription factor pairs with matching DNA binding domains exhibit comparable DNA binding specificity predictions to transcription factor pairs with completely identical sequences. The increased coverage achieved herein demonstrates the potential for more thorough species-associated investigation of protein-DNA interactions using existing resources. The PWM scanning results highlight the challenging nature of transcription factors that contain multiple DNA binding domains, as well as the impact of motif discovery on the ability to predict DNA binding properties.
The method is additionally suitable for identifying domain-level homology mappings to enable utilization of additional information sources in the study of transcription factors. The domain-level homology search method, resulting PWM mappings, web-based user interface, and web API are publicly available at http://dodoma.systemsbiology.net.
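To make the idea of PWM scanning concrete, the following is a minimal sketch of how a position weight matrix can be used to score candidate binding sites along a DNA sequence. It is not the implementation behind the method described above: the count matrix `COUNTS`, the function names, the pseudocount, and the uniform background frequency of 0.25 are all assumptions chosen for illustration, and the sketch assumes uppercase sequences containing only A, C, G, and T.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# These counts are invented for illustration, not taken from JASPAR or TRANSFAC.
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 0, "C": 1, "G": 8, "T": 1},
    {"A": 8, "C": 1, "G": 0, "T": 1},
    {"A": 1, "C": 8, "G": 1, "T": 0},
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds weights.

    Assumes a uniform background frequency; the pseudocount avoids log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the motif width along the forward strand."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the (start, window, score) tuple with the highest score."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])


if __name__ == "__main__":
    pwm = counts_to_pwm(COUNTS)
    start, window, score = best_hit("AATGACGG", pwm)
    print(start, window, round(score, 2))
```

In practice a scan would also cover the reverse complement strand and apply a score threshold; both are omitted here to keep the sketch short.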
Small sample sizes used in previous studies result in a lack of overlap between the reported gene signatures for prediction of chemotherapy response. Although morphologic features, especially tumor nuclear morphology, are important for cancer grading, little research has been reported on quantitatively correlating cellular morphology with chemotherapy response, especially in a large data set. In this study, we have used a large population of patients to identify molecular and morphologic signatures associated with chemotherapy response in serous ovarian carcinoma.