Identification of three-dimensional (3D) interactions between regulatory elements across the genome is crucial to unravel the complex regulatory machinery that orchestrates proliferation and differentiation of cells. ChIA-PET is a novel method to identify such interactions, where physical contacts between regions bound by a specific protein are quantified using next-generation sequencing. However, determining the significance of the observed interaction frequencies in such datasets is challenging, and few methods have been proposed. Although regions that are close in linear genomic distance are much more likely to interact by chance, no method to date takes this dependency into account. Here, we propose a statistical model that accounts for the genomic distance relationship, as well as the general propensity of anchors to be involved in contacts overall. Using both real and simulated data, we show that the previously proposed statistical test, based on Fisher's exact test, leads to invalid results when data are dependent on genomic distance. We also evaluate our method on previously validated cell-line specific and constitutive 3D interactions, and show that it finds the relevant interactions significant while avoiding overestimating the significance of short-range interactions between nearby regions.
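For orientation, a distance-agnostic test of the kind criticized above can be sketched as a hypergeometric upper-tail probability, which is the one-sided form of Fisher's exact test on a 2x2 table. The function names and the way the counts are tabulated are illustrative assumptions; this is neither the published ChIA-PET pipeline nor the model proposed in the abstract.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    k = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible outcomes contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def pair_p_value(n_ab, n_a, n_b, n_total):
    """Hypothetical helper: p-value for seeing at least n_ab contacts between
    anchors A and B, given n_a and n_b read ends at each anchor and n_total
    read ends overall. Genomic distance does not enter the calculation."""
    return hypergeom_upper_tail(n_ab, n_total, n_a, n_b)


if __name__ == "__main__":
    # The same counts give the same p-value whether the anchors are
    # 10 kb or 10 Mb apart, which is the weakness the abstract points out.
    print(pair_p_value(3, 3, 4, 10))
```

Because the separation between the two anchors is not an input, a pair of nearby anchors and a pair of distant anchors with identical counts receive identical p-values under this sketch.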
Genome-wide association studies (GWASs) have shown that approximately 60 genetic variants influence the risk of developing multiple sclerosis (MS). Our aim was to identify the cell types in which these variants are active. We used available data on MS-associated single nucleotide polymorphisms (SNPs) and deoxyribonuclease I hypersensitive sites (DHSs) from 112 different cell types. Genomic intervals were tested for overlap using the Genomic HyperBrowser. The expression profile of the genes located near MS-associated SNPs was assessed using the software GRAIL (Gene Relationships Across Implicated Loci). Genomic regions associated with MS were significantly enriched for a number of immune DHSs, in particular those of T helper (Th) 1, Th17, CD8+ cytotoxic T cells, CD19+ B cells and CD56+ natural killer (NK) cells (enrichment = 2.34, 2.19, 2.27, 2.05 and 1.95, respectively; P < 0.0001 for all). Similar results were obtained when genomic regions with suggestive association with MS and additional immune-mediated traits were investigated. Several new candidate MS-associated genes located within regions of suggestive association were identified by GRAIL (CARD11, FCRL2, CHST12, SYK, TCF7, SOCS1, NFKBIZ and NPAS1). Genetic data indicate that Th1, Th17, cytotoxic T, B and NK cells play a prominent role in the etiology of MS. Regions with confirmed and suggestive association have a similar immunological profile, indicating that many SNPs that truly influence the risk of MS fail to reach genome-wide significance. Finally, similar cell types are involved in the etiology of other immune-mediated diseases.
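The overlap enrichment reported above can be illustrated with a small Monte Carlo sketch: count how many SNP positions fall inside DHS intervals, then compare that count with the average obtained when the same number of positions is placed uniformly at random. The single-chromosome coordinate system, the uniform null model and all function names are assumptions made for illustration; the Genomic HyperBrowser's own null models and test statistics are not reproduced here.

```python
import random
from bisect import bisect_right


def count_points_in_intervals(points, intervals):
    """Count points inside sorted, non-overlapping half-open intervals [start, end)."""
    starts = [start for start, _ in intervals]
    hits = 0
    for p in points:
        i = bisect_right(starts, p) - 1  # last interval starting at or before p
        if i >= 0 and p < intervals[i][1]:
            hits += 1
    return hits


def enrichment_and_p(points, intervals, genome_length, n_perm=1000, seed=0):
    """Return (observed / expected overlap, one-sided Monte Carlo p-value)."""
    rng = random.Random(seed)
    observed = count_points_in_intervals(points, intervals)
    null_counts = []
    for _ in range(n_perm):
        # Assumed null model: positions placed uniformly along the genome.
        shuffled = [rng.randrange(genome_length) for _ in points]
        null_counts.append(count_points_in_intervals(shuffled, intervals))
    expected = sum(null_counts) / n_perm
    enrichment = observed / expected if expected > 0 else float("inf")
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_perm + 1)
    return enrichment, p_value


if __name__ == "__main__":
    dhs = [(0, 10), (20, 30)]          # toy DHS intervals
    snps = [1, 2, 3, 4, 5, 21, 22, 23, 24, 25]  # toy SNP positions
    print(enrichment_and_p(snps, dhs, genome_length=100))
```

A uniform null is the simplest choice and tends to overstate significance when SNPs and DHSs both cluster near genes, so a real analysis would normally use a null model that preserves such structure.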
The immense increase in the availability of genomic-scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers face the challenge of how best to analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. By providing several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser enables a range of genomic investigations related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.
The study of chromatin 3D structure has recently gained much attention owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of the DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromatin conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score of 3D co-localization of genomic elements that handles intra- or interchromosomal interactions, both separately and jointly, and that adjusts for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential to obtain valid estimation of P-values. We apply the method to chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.
Transcription factors in disease-relevant pathways represent potential drug targets, as they affect a distinct set of pathways that may be modulated through gene regulation. The influence of transcription factors is typically studied on a per-disease basis, and no current resource provides a global overview of the relations between transcription factors and disease. Furthermore, existing pipelines for related large-scale analysis are tailored to particular sources of input data, and there is a need for generic methodology for integrating complementary sources of genomic information.
The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.
In active run-in trials, where patients may be excluded after a run-in period based on their response to the treatment, it is implicitly assumed that patients have individual treatment effects. If individual patient data are available, active run-in trials can be modelled using patient-specific random effects. With more than one trial on the same medication available, one can obtain a more precise overall treatment effect estimate.
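The gain in precision from combining trials can be illustrated with standard inverse-variance (fixed-effect) pooling. This is a deliberately simpler technique than the patient-specific random-effects model described above, swapped in only to show why a pooled estimate has a smaller standard error than any single trial; the function name and the example numbers are hypothetical.

```python
from math import sqrt


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of per-trial effect estimates.

    Returns (pooled estimate, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Two made-up trials with equal precision.
    effect, se = pool_fixed_effect([1.0, 3.0], [1.0, 1.0])
    print(effect, se)
```

With two equally precise trials the pooled standard error shrinks by a factor of the square root of two relative to either trial alone.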