The biological world is replete with phenomena that appear to be ideally modeled and analyzed by one archetypal statistical framework - the Graphical Probabilistic Model (GPM). The structure of GPMs is a uniquely good match for biological problems that range from aligning sequences to modeling the genome-to-phenome relationship. The fundamental questions that GPMs address involve making decisions based on a complex web of interacting factors. Unfortunately, while GPMs ideally fit many questions in biology, they are not an easy solution to apply. Building a GPM is not a simple task for an end user. Moreover, applying GPMs is also impeded by the insidious fact that the "complex web of interacting factors" inherent to a problem might be easy to define and also intractable to compute upon.
As datasets increase in complexity, the time required for analysis (both computational and human domain-expert) increases. One of the significant impediments introduced by such burgeoning data is the difficulty in knowing what features to include or exclude from statistical models. Simple tables of summary statistics rarely provide an adequate picture of the patterns and details of the dataset to enable researchers to make well-informed decisions about the adequacy of the models they are constructing. We have developed a tool, StickWRLD, which allows the user to visually browse through their data, displaying all possible correlations. By allowing the user to dynamically modify the retention parameters (both P and the residual, r), StickWRLD allows the user to identify significant correlations and disregard potential correlations that do not meet those same criteria - effectively filtering through all possible correlations quickly and identifying possible relationships of interest for further analysis. In this study, we applied StickWRLD to a semi-synthetic dataset constructed from two published human datasets. In addition to detecting high-probability correlations in this dataset, we were able to quickly identify gene-SNP correlations that would have gone undetected using more traditional approaches due to issues of low penetrance.
Numerous studies have examined gene × environment interactions (G × E) in cognitive and behavioral domains. However, these studies have been limited in that they have not been able to directly assess differential patterns of gene expression in the human brain. Here, we assessed G × E interactions using two publically available datasets to assess if DNA variation is associated with post-mortem brain gene expression changes based on smoking behavior, a biobehavioral construct that is part of a complex system of genetic and environmental influences.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.