Parkinson's disease (PD) is a major neurodegenerative chronic disease, most likely caused by a complex interplay of genetic and environmental factors. Information on various aspects of PD pathogenesis is rapidly increasing and needs to be efficiently organized, so that the resulting data is available for exploration and analysis. Here we introduce a computationally tractable, comprehensive molecular interaction map of PD. This map integrates pathways implicated in PD pathogenesis such as synaptic and mitochondrial dysfunction, impaired protein degradation, alpha-synuclein pathobiology and neuroinflammation. We also present bioinformatics tools for the analysis, enrichment and annotation of the map, allowing the research community to open new avenues in PD research. The PD map is accessible at http://minerva.uni.lu/pd_map .
Natural microbial communities are ubiquitous, complex, heterogeneous, and dynamic. Here, we argue that the future standard for their study will require systematic omic measurements of spatially and temporally resolved unique samples in line with a discovery-driven planning approach. Resulting datasets will allow the generation of solid hypotheses about causal relationships and, thereby, will facilitate the discovery of previously unknown traits of specific microbial community members. However, to achieve this, solid wet lab, bioinformatic and statistical methodologies are required to have the promises of the emerging field of Eco-Systems Biology come to fruition.
Finding significant differences between the expression levels of genes or proteins across diverse biological conditions is one of the primary goals in the analysis of functional genomics data. However, existing methods for identifying differentially expressed genes or sets of genes by comparing measures of the average expression across predefined sample groups do not detect differential variance in the expression levels across genes in cellular pathways. Since corresponding pathway deregulations occur frequently in microarray gene or protein expression data, we present a new dedicated web application, PathVar, to analyze these data sources. The software ranks pathway-representing gene/protein sets in terms of the differences of the variance in the within-pathway expression levels across different biological conditions. Apart from identifying new pathway deregulation patterns, the tool exploits these patterns by combining different machine learning methods to find clusters of similar samples and build sample classification models.
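The idea of ranking pathways by differential within-pathway variance can be illustrated with a minimal sketch. It assumes one simple scoring scheme: for every sample, take the variance of expression across the pathway's genes, then compare the mean of these per-sample variances between two conditions. The function names, the toy data and the absolute-difference score are illustrative assumptions, not PathVar's actual statistics or interface.

```python
from statistics import mean, pvariance


def within_pathway_variances(expression, genes, samples):
    """For each sample, the variance of expression across the pathway's genes."""
    return [pvariance([expression[g][s] for g in genes]) for s in samples]


def rank_pathways(expression, pathways, group_a, group_b):
    """Rank pathways by the absolute difference in mean within-pathway variance.

    expression: dict gene -> dict sample -> expression value
    pathways:   dict pathway name -> list of gene identifiers
    group_a/b:  lists of sample identifiers for the two conditions
    """
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in expression]
        if len(present) < 2:  # a variance needs at least two genes
            continue
        var_a = within_pathway_variances(expression, present, group_a)
        var_b = within_pathway_variances(expression, present, group_b)
        scores[name] = abs(mean(var_a) - mean(var_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two control samples (c1, c2), two treated (t1, t2).
expression = {
    "g1": {"c1": 1.0, "c2": 1.0, "t1": 0.0, "t2": 0.0},
    "g2": {"c1": 1.0, "c2": 1.0, "t1": 4.0, "t2": 4.0},
    "g3": {"c1": 2.0, "c2": 2.0, "t1": 2.0, "t2": 2.0},
    "g4": {"c1": 2.0, "c2": 2.0, "t1": 2.0, "t2": 2.0},
}
pathways = {"P_variable": ["g1", "g2"], "P_stable": ["g3", "g4"]}
ranking = rank_pathways(expression, pathways, ["c1", "c2"], ["t1", "t2"])
print(ranking)
```

In the toy data the mean expression of P_variable is identical in both conditions (both genes average to 1.0 in controls and 2.0 across g1 and g2 in treated samples would differ, but a gene-wise mean comparison of g3 and g4 shows nothing), while its within-pathway spread changes markedly, which is the kind of pattern a variance-based ranking is designed to surface.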
The meta-analysis of large-scale postgenomics data sets within public databases promises to provide important novel biological knowledge. Statistical approaches including correlation analyses in coexpression studies of gene expression have emerged as tools to elucidate gene function using these data sets. Here, we present a powerful and novel alternative methodology to computationally identify functional relationships between genes from microarray data sets using rule-based machine learning. This approach, termed "coprediction," is based on the collective ability of groups of genes co-occurring within rules to accurately predict the developmental outcome of a biological system. We demonstrate the utility of coprediction as a powerful analytical tool using publicly available microarray data generated exclusively from Arabidopsis thaliana seeds to compute a functional gene interaction network, termed Seed Co-Prediction Network (SCoPNet). SCoPNet predicts functional associations between genes acting in the same developmental and signal transduction pathways irrespective of the similarity in their respective gene expression patterns. Using SCoPNet, we identified four novel regulators of seed germination (ALTERED SEED GERMINATION5, 6, 7, and 8), and predicted interactions at the level of transcript abundance between these novel and previously described factors influencing Arabidopsis seed germination. An online Web tool to query SCoPNet has been developed as a community resource to dissect seed biology and is available at http://www.vseed.nottingham.ac.uk/.
Seed germination is a complex trait of key ecological and agronomic significance. Few genetic factors regulating germination have been identified, and the means by which their concerted action controls this developmental process remains largely unknown. Using publicly available gene expression data from Arabidopsis thaliana, we generated a condition-dependent network model of global transcriptional interactions (SeedNet) that shows evidence of evolutionary conservation in flowering plants. The topology of the SeedNet graph reflects the biological process, including two state-dependent sets of interactions associated with dormancy or germination. SeedNet highlights interactions between known regulators of this process and predicts the germination-associated function of uncharacterized hub nodes connected to them with 50% accuracy. An intermediate transition region between the dormancy and germination subdomains is enriched with genes involved in cellular phase transitions. The phase transition regulators SERRATE and EARLY FLOWERING IN SHORT DAYS from this region affect seed germination, indicating that conserved mechanisms control transitions in cell identity in plants. The SeedNet dormancy region is strongly associated with vegetative abiotic stress response genes. These data suggest that seed dormancy, an adaptive trait that arose evolutionarily late, evolved by coopting existing genetic pathways regulating cellular phase transition and abiotic stress. SeedNet is available as a community resource (http://vseed.nottingham.ac.uk) to aid dissection of this complex trait and gene function in diverse processes.
Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the canonical signalling proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways.
Global gene expression profiling studies have classified breast cancer into a number of distinct biological and molecular classes with clinical relevance. The heterogeneous luminal group, which is largely characterised by oestrogen receptor (ER) expression, appears to contain distinct subgroups with differing behaviour. In this study, we analysed 47,293 gene transcripts in 128 invasive breast carcinomas (BC) using Artificial Neural Networks and a cross-validation analysis in combination with an ensemble sample classification to identify genes that can be used to subclassify ER+ luminal tumours. The results were validated using immunohistochemistry on TMAs containing 1,140 invasive breast cancers. Our results showed that the RERG gene is one of the highest ranked genes to differentiate between ER+ luminal-like and ER- non-luminal cancers based on a 10-fold external cross-validation analysis with an average classification accuracy of 89%. This was confirmed in our protein expression studies that showed RERG positive associations with markers of luminal differentiation including ER, luminal cytokeratins (CK19, CK18 and CK7/8) and FOXA1 (P = 0.004) and other markers of good prognosis in BC including small size, lower histologic grade and positive expression of androgen receptor, nuclear BRCA1, FHIT and cell cycle inhibitors p27 and p21. RERG expression was inversely associated with the proliferation marker MIB1 (P = 0.005) and p53. Strong RERG expression showed an association with longer breast cancer specific survival and distant metastasis free interval in the whole series as well as in the ER+ luminal group and these associations were independent of other prognostic variables. In conclusion, we used novel bioinformatics methods to identify candidate genes to characterise ER+ luminal-like breast cancer. RERG gene is a key marker of the luminal BC class and can be used to separate distinct prognostic subgroups.
TopoGSA (Topology-based Gene Set Analysis) is a web-application dedicated to the computation and visualization of network topological properties for gene and protein sets in molecular interaction networks. Different topological characteristics, such as the centrality of nodes in the network or their tendency to form clusters, can be computed and compared with those of known cellular pathways and processes.
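Two of the topological characteristics mentioned above, node centrality and the tendency to form clusters, can be sketched for a gene set as follows. This is a minimal illustration using node degree and the local clustering coefficient on an undirected network; the helper names and the toy edge list are hypothetical and do not reflect TopoGSA's implementation or interface.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbours)."""
    adjacency = {}
    for u, v in edges:
        if u == v:  # ignore self-interactions
            continue
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree(adjacency, node):
    """Number of direct interaction partners of a node."""
    return len(adjacency.get(node, ()))


def clustering_coefficient(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return 2.0 * links / (k * (k - 1))


def gene_set_profile(adjacency, gene_set):
    """Average degree and clustering coefficient over the mapped set members."""
    members = [g for g in gene_set if g in adjacency]
    if not members:
        return {"mean_degree": 0.0, "mean_clustering": 0.0}
    return {
        "mean_degree": sum(degree(adjacency, g) for g in members) / len(members),
        "mean_clustering": sum(
            clustering_coefficient(adjacency, g) for g in members
        ) / len(members),
    }


# Hypothetical toy network: a triangle A-B-C with a pendant node D attached to C.
network = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
profile = gene_set_profile(network, ["A", "C"])
print(profile)
```

A profile computed this way for a gene set of interest could then be compared with the profiles of known pathway gene sets in the same network.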
Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks.
Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of interest and (iv) tissue-specific gene/protein set associations cannot be recognized.
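Over-representation analysis is commonly carried out as a one-sided hypergeometric test on the overlap between the query set and an annotated set. The sketch below shows that standard calculation; the function name and the toy numbers are illustrative assumptions. It also makes drawback (i) concrete: when the two sets share no members, the test cannot score any association.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe_size: number of genes in the background
    set_size:      number of background genes annotated to the known set
    query_size:    number of genes in the experimentally derived set
    overlap:       number of query genes that are also in the known set
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 5 annotated, 5 in the query.
p_full_overlap = overrepresentation_pvalue(10, 5, 5, 5)
p_no_overlap = overrepresentation_pvalue(10, 5, 5, 0)
print(p_full_overlap, p_no_overlap)
```

With zero overlap the p-value is 1 regardless of how closely the two sets interact in a network, which is why non-overlapping but functionally related sets go undetected by this approach.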
Microarray data analysis has been shown to provide an effective tool for studying cancer and genetic diseases. Although classical machine learning techniques have successfully been applied to find informative genes and to predict class labels for new samples, common restrictions of microarray analysis such as small sample sizes, a large attribute space and high noise levels still limit its scientific and clinical applications. Increasing the interpretability of prediction models while retaining a high accuracy would help to exploit the information content in microarray data more effectively. For this purpose, we evaluate our rule-based evolutionary machine learning systems, BioHEL and GAssist, on three public microarray cancer datasets, obtaining simple rule-based models for sample classification. A comparison with other benchmark microarray sample classifiers based on three diverse feature selection algorithms suggests that these evolutionary learning techniques can compete with state-of-the-art methods like support vector machines. The obtained models reach accuracies above 90% in two-level external cross-validation, with the added value of facilitating interpretation by using only combinations of simple if-then-else rules. As a further benefit, a literature mining analysis reveals that prioritizations of informative genes extracted from BioHEL's classification rule sets can outperform gene rankings obtained from a conventional ensemble feature selection in terms of the pointwise mutual information between relevant disease terms and the standardized names of top-ranked genes.
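The pointwise mutual information used in the literature mining comparison can be sketched from document counts. The example below uses the standard definition, log2 of the joint probability divided by the product of the marginal probabilities; the function name, the count-based estimation and the numbers are hypothetical and are not taken from the study.

```python
from math import log2


def pointwise_mutual_information(n_both, n_gene, n_term, n_total):
    """PMI between a gene name and a disease term, estimated from document counts.

    n_both:  documents mentioning both the gene and the disease term
    n_gene:  documents mentioning the gene
    n_term:  documents mentioning the disease term
    n_total: documents in the corpus
    """
    if n_both == 0:
        return float("-inf")  # the two never co-occur
    # p(x, y) / (p(x) * p(y)) simplifies to n_both * n_total / (n_gene * n_term)
    return log2(n_both * n_total / (n_gene * n_term))


# Hypothetical counts in a corpus of 1,000 documents.
pmi_independent = pointwise_mutual_information(10, 100, 100, 1000)
pmi_associated = pointwise_mutual_information(20, 100, 100, 1000)
print(pmi_independent, pmi_associated)
```

A value of zero means the gene and the disease term co-occur as often as expected by chance, while positive values indicate co-occurrence above that expectation.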