Creating spontaneous yet genetically tractable human tumors from normal cells presents a fundamental challenge. Here we combined retroviral and transposon insertional mutagenesis to enable cancer gene discovery starting from human primary cells. We used lentiviruses to seed gain- and loss-of-function gene disruption elements, which Sleeping Beauty transposons then dispersed throughout the genome of human bone explant mesenchymal cells. De novo tumors that arose rapidly in this context were high-grade myxofibrosarcomas. Tumor insertion sites were enriched in regions of recurrent somatic copy-number aberration from multiple cancer types and could be used to pinpoint new driver genes that sustain somatic alterations in patients. We identified HDLBP, which encodes the RNA-binding protein vigilin, as a candidate tumor suppressor deleted at 2q37.3 in more than one in ten tumors across multiple tissues of origin. Hybrid viral-transposon systems may accelerate the functional annotation of cancer genomes by enabling insertional mutagenesis screens in higher eukaryotes that are not amenable to germline transgenesis.
Deciphering the most common modes by which chromatin regulates transcription, and how these relate to cellular status and processes, is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large-scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects.
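A common way to summarize a chromatin feature "around transcription start sites" is an aggregate (meta) profile: the signal is averaged in fixed-size bins across a window centered on each TSS, with minus-strand windows flipped so that upstream always comes first. The sketch below illustrates that generic technique; it is not the authors' pipeline, and the function name `tss_profile`, the per-base NumPy signal array, and the default `flank`/`bin_size` values are assumptions for illustration.

```python
import numpy as np

def tss_profile(signal, tss_positions, strands, flank=1000, bin_size=100):
    """Average a per-base signal in fixed-size bins around each TSS.

    signal: 1D array of per-base values for one chromosome (hypothetical input).
    tss_positions: 0-based TSS coordinates on that chromosome.
    strands: "+" or "-" per TSS; minus-strand windows are reversed so that
    upstream always comes first in the returned profile.
    """
    if (2 * flank) % bin_size != 0:
        raise ValueError("2 * flank must be a multiple of bin_size")
    n_bins = 2 * flank // bin_size
    profiles = []
    for pos, strand in zip(tss_positions, strands):
        start, end = pos - flank, pos + flank
        if start < 0 or end > len(signal):
            continue  # skip TSSs too close to the chromosome ends
        window = np.asarray(signal[start:end], dtype=float)
        if strand == "-":
            window = window[::-1]  # orient upstream -> downstream
        profiles.append(window.reshape(n_bins, bin_size).mean(axis=1))
    if not profiles:
        return np.zeros(n_bins)
    # Mean over all usable TSSs, one value per bin
    return np.mean(profiles, axis=0)
```

Comparing such profiles across cell lines, or between TSS subsets, is one way to contrast chromatin features at different classes of promoters.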
Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow genome-wide characterization of chromatin 3D structure. Understanding the three-dimensional organization of chromatin is a crucial next step in unraveling global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web tool offering a range of hypothesis-based and descriptive statistics that use realistic assumptions in their null models.
Genome-wide association studies (GWASs) have shown that approximately 60 genetic variants influence the risk of developing multiple sclerosis (MS). Our aim was to identify the cell types in which these variants are active. We used available data on MS-associated single nucleotide polymorphisms (SNPs) and deoxyribonuclease I hypersensitive sites (DHSs) from 112 different cell types. Genomic intervals were tested for overlap using the Genomic Hyperbrowser. The expression profile of the genes located near MS-associated SNPs was assessed using the software GRAIL (Gene Relationships Across Implicated Loci). Genomic regions associated with MS were significantly enriched for DHSs from a number of immune cell types, in particular T helper (Th) 1, Th17, CD8+ cytotoxic T cells, CD19+ B cells and CD56+ natural killer (NK) cells (enrichment = 2.34, 2.19, 2.27, 2.05 and 1.95, respectively; P < 0.0001 for each). Similar results were obtained when genomic regions with suggestive association with MS and additional immune-mediated traits were investigated. GRAIL identified several new candidate MS-associated genes located within regions of suggestive association (CARD11, FCRL2, CHST12, SYK, TCF7, SOCS1, NFKBIZ and NPAS1). Genetic data indicate that Th1, Th17, cytotoxic T, B and NK cells play a prominent role in the etiology of MS. Regions with confirmed and suggestive association have a similar immunological profile, indicating that many SNPs that truly influence MS risk fail to reach genome-wide significance. Finally, similar cell types are involved in the etiology of other immune-mediated diseases.
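Overlap enrichment between two interval sets, with a P-value such as P < 0.0001, is typically obtained by comparing the observed base-pair overlap with a null distribution from randomized positions. The following is a minimal single-chromosome sketch of that idea, assuming half-open `(start, end)` intervals that do not overlap within each set, and a null model that shuffles the order of the query's segments and gaps (which preserves segment lengths and keeps them non-overlapping). The function names and this particular null model are illustrative assumptions; the Genomic Hyperbrowser's own randomization may differ.

```python
import random

def bp_overlap(a, b):
    """Total base pairs shared by two lists of half-open, internally non-overlapping intervals."""
    a, b = sorted(a), sorted(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def shuffle_intervals(intervals, chrom_len, rng):
    """Randomly reorder segments and the gaps between them along [0, chrom_len)."""
    ivs = sorted(intervals)
    lengths = [e - s for s, e in ivs]
    gaps = ([ivs[0][0]]
            + [ivs[k + 1][0] - ivs[k][1] for k in range(len(ivs) - 1)]
            + [chrom_len - ivs[-1][1]])
    rng.shuffle(lengths)
    rng.shuffle(gaps)
    out, pos = [], 0
    for length, gap in zip(lengths, gaps):  # the leftover gap pads the chromosome end
        pos += gap
        out.append((pos, pos + length))
        pos += length
    return out

def overlap_enrichment(query, reference, chrom_len, n_perm=1000, seed=0):
    """Return (observed / mean null overlap, empirical one-sided P-value)."""
    rng = random.Random(seed)
    observed = bp_overlap(query, reference)
    null = [bp_overlap(shuffle_intervals(query, chrom_len, rng), reference)
            for _ in range(n_perm)]
    expected = sum(null) / n_perm
    # Add-one correction so the empirical P-value is never exactly zero
    p_value = (1 + sum(x >= observed for x in null)) / (n_perm + 1)
    enrichment = observed / expected if expected > 0 else float("inf")
    return enrichment, p_value
```

With the add-one correction, the smallest attainable P-value is 1/(n_perm + 1), so a bound such as P < 0.0001 requires on the order of 10,000 or more randomizations.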
A wealth of nuclear receptor binding data has been generated by the application of chromatin immunoprecipitation (ChIP) techniques. However, there have been relatively few attempts to apply these datasets to complex human diseases or traits.
The immense increase in the availability of genome-scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel, falsifiable biological questions. With this opportunity, however, researchers face the challenge of how best to analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. Through several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser enables a range of genomic investigations related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.
The study of chromatin 3D structure has recently received much attention owing to novel techniques for detecting genome-wide chromatin contacts using next-generation sequencing. A deeper understanding of the architecture of the DNA inside the nucleus is crucial for gaining insight into fundamental processes such as transcriptional regulation, genome dynamics and genome stability. Chromosome conformation capture-based methods, such as Hi-C and ChIA-PET, are now paving the way for routine genome-wide studies of chromatin 3D structure in a range of organisms and tissues. However, appropriate methods for analyzing such data are lacking. Here, we propose a hypothesis test and an enrichment score for 3D co-localization of genomic elements that handle intra- and interchromosomal interactions, both separately and jointly, and that adjust for biases caused by structural dependencies in the 3D data. We show that maintaining structural properties during resampling is essential to obtain valid P-value estimates. We apply the method to chromatin states and a set of mutated regions in leukemia cells, and find significant co-localization of these elements, with varying enrichment scores, supporting the role of chromatin 3D structure in shaping the landscape of somatic mutations in cancer.
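To illustrate why "maintaining structural properties during resampling" matters, consider a binned contact matrix in which contact frequency decays with genomic distance. Shuffling element positions independently spreads them out and breaks their linear spacing, whereas a circular shift of the element labels keeps the spacing between elements (apart from wrap-around), so the null distribution reflects the same distance-dependent contact decay. The sketch below is a simplified, single-chromosome illustration of that principle only; it is not the published test, and the statistic (mean contact among marked bins), the circular-shift null and the function names are all assumptions.

```python
import numpy as np

def mean_pairwise_contact(contacts, marked):
    """Mean contact value over distinct pairs of marked bins (diagonal excluded)."""
    idx = np.flatnonzero(marked)
    k = len(idx)
    if k < 2:
        return 0.0
    sub = contacts[np.ix_(idx, idx)]
    return (sub.sum() - np.trace(sub)) / (k * (k - 1))

def colocalization_test(contacts, marked, n_perm=999, seed=0):
    """Return (enrichment score, empirical P-value) under a circular-shift null."""
    rng = np.random.default_rng(seed)
    marked = np.asarray(marked, dtype=bool)
    observed = mean_pairwise_contact(contacts, marked)
    n = len(marked)
    null = np.empty(n_perm)
    for t in range(n_perm):
        shift = rng.integers(1, n)  # nonzero shift; spacing between marked bins is kept
        null[t] = mean_pairwise_contact(contacts, np.roll(marked, shift))
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    enrichment = observed / null.mean() if null.mean() > 0 else np.inf
    return enrichment, p_value
```

A real analysis would also need to handle interchromosomal contacts, for which a linear shift within one chromosome is not sufficient on its own.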
Vitamin D insufficiency has been implicated in autoimmunity. ChIP-seq experiments using immune cell lines have shown that vitamin D receptor (VDR) binding sites are enriched near regions of the genome associated with autoimmune diseases. We aimed to investigate VDR binding in primary CD4+ cells from healthy volunteers.
In molecular biology, as in many other scientific fields, the scale of analyses is ever increasing. Often, complex Monte Carlo simulation is required, sometimes within a large-scale multiple testing setting. The resulting computational costs may be prohibitively high.
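One well-known way to reduce the cost of Monte Carlo P-values is the sequential stopping rule of Besag and Clifford: null statistics are drawn until a fixed number `h` of them reach the observed value, or until a maximum budget `n_max` is spent. Clearly non-significant tests therefore stop after only a few draws, while significant ones use the full budget. The sketch below implements that rule; whether the work summarized here uses exactly this scheme is an assumption, and the function name and default parameters are illustrative.

```python
import random
from typing import Callable

def sequential_mc_pvalue(observed: float, draw_null: Callable[[], float],
                         h: int = 20, n_max: int = 10_000) -> tuple[float, int]:
    """Besag-Clifford sequential Monte Carlo P-value.

    Draws null statistics until h of them are >= observed (then p = h / n), or
    until n_max draws have been made (then p = (k + 1) / (n_max + 1), where k
    is the number of exceedances). Returns the P-value and the draws used.
    """
    exceed = 0
    for n in range(1, n_max + 1):
        if draw_null() >= observed:
            exceed += 1
            if exceed == h:
                return h / n, n  # early stop: clearly not significant
    return (exceed + 1) / (n_max + 1), n_max
```

In a multiple-testing setting, the savings accumulate over the many hypotheses that are far from significant, which is typically most of them.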
With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated.
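One way to make "underlying characteristics of data" concrete is to describe a track by which core properties its features carry (an end coordinate or not, a value or not, gaps between segments or not), independently of the file format used to store it. The classification below is a hypothetical illustration of that idea, not the specification of any existing format; the `Feature` class, the category names and the 0-based half-open convention are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Feature:
    start: int                  # 0-based start coordinate
    end: Optional[int] = None   # None for point features
    value: Optional[float] = None

def describe_track(features: Sequence[Feature]) -> str:
    """Hypothetical classification by the core properties a track carries."""
    if not features:
        raise ValueError("empty track")
    has_end = {f.end is not None for f in features}
    has_value = {f.value is not None for f in features}
    if len(has_end) > 1 or len(has_value) > 1:
        raise ValueError("features must all carry the same properties")
    parts = []
    if has_value.pop():
        parts.append("valued")
    if has_end.pop():
        ordered = sorted(features, key=lambda f: f.start)
        # Gapless if every segment ends exactly where the next one starts
        gapless = all(a.end == b.start for a, b in zip(ordered, ordered[1:]))
        parts.append("gapless segments" if gapless else "segments")
    else:
        parts.append("points")
    return " ".join(parts)
```

Under this view, two files in different formats that yield the same description carry the same kind of data, while a format that cannot express, say, values or gapless coverage constrains what can be represented.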
Transcription factors in disease-relevant pathways represent potential drug targets, because they affect a distinct set of pathways that may be modulated through gene regulation. The influence of transcription factors is typically studied on a per-disease basis, and no current resource provides a global overview of the relations between transcription factors and diseases. Furthermore, existing pipelines for related large-scale analyses are tailored to particular sources of input data, and there is a need for generic methodology for integrating complementary sources of genomic information.
The immense increase in the generation of genome-scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.
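As an example of a pairwise relation between two tracks treated as mathematical objects, take a point track (e.g. SNP positions) and a segment track (e.g. enhancer regions) and ask how many points fall inside segments. The sketch below answers that with binary search over sorted segment starts, assuming half-open, non-overlapping segments on a single chromosome; the function name and this representation are illustrative assumptions, not the HyperBrowser's internal API.

```python
from bisect import bisect_right

def count_points_in_segments(points, segments):
    """Count points p with start <= p < end for some (half-open, non-overlapping) segment."""
    segs = sorted(segments)
    starts = [s for s, _ in segs]
    count = 0
    for p in points:
        k = bisect_right(starts, p) - 1  # last segment starting at or before p
        if k >= 0 and p < segs[k][1]:
            count += 1
    return count
```

Combined with a randomization of one of the tracks, a count like this becomes the test statistic for questions such as whether disease-associated SNPs fall in active promoters more often than expected by chance.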
The accurate prediction and characterization of DNA melting domains by computational tools could facilitate a broad range of biological applications. However, no algorithm for melting domain prediction has been available until now. The main challenges include the difficulty of mathematically mapping a qualitative description of DNA melting domains to quantitative statistical mechanics models, as well as the absence of gold standards and a need for generality. In this paper, we introduce a new approach to identify the two-state regions and melting fork regions along a given DNA sequence. Compared with an ad hoc segmentation used in one of our previous studies, the new algorithm is based on boundary probability profiles rather than standard melting maps. We demonstrate that a more detailed characterization of the DNA melting domain map can be obtained using our new method, and that this approach is independent of the choice of DNA melting model. We expect this work to drive our understanding of DNA melting domains one step further.
Recent associations between age-related differentially methylated sites and bivalently marked chromatin domains have implicated these genomic regions in aging and age-related diseases. However, the overlap between these epigenetic modifications has so far been identified only for age-associated hyper-methylated sites in blood. In this study, we observed that age-associated differentially methylated sites characterized in the human brain were also highly enriched in bivalent domains. Analysis of hyper- vs. hypo-methylated sites partitioned by age (fetal, child, and adult) revealed that enrichment was significant for hyper-methylated sites identified in children and adults (child, fold difference = 2.28, P = 0.0016; adult, fold difference = 4.73, P = 4.00 × 10⁻⁵); this trend was markedly more pronounced in adults when only the 100 most significantly hypo- and hyper-methylated sites were considered (adult, fold difference = 10.7, P = 2.00 × 10⁻⁵). Interestingly, we found that bivalently marked genes overlapped by age-associated hyper-methylation in the adult brain were strongly involved in biological functions related to developmental processes, including neuronal differentiation. Our findings provide evidence that the accumulation of methylation in bivalent gene regions with age is likely to be a common process that occurs across tissue types. Furthermore, particularly in the aging brain, this accumulation might be targeted to loci with important roles in cell differentiation and development, closing off these developmental pathways. Further study of these genes is warranted to assess their potential impact on the development of age-related neurological disorders.
Both genetic and environmental factors contribute to the aetiology of multiple sclerosis (MS). More than 50 genomic regions have been associated with MS susceptibility, and vitamin D status also influences the risk of this complex disease. However, how these factors interact in disease causation is unclear. We aimed to investigate the relationship between vitamin D receptor (VDR) binding in lymphoblastoid cell lines (LCLs), chromatin states in LCLs and MS-associated genomic regions. Using the Genomic Hyperbrowser, we found that VDR-binding regions overlapped with active regulatory regions [active promoter (AP) and strong enhancer (SE)] in LCLs more than expected by chance [45.3-fold enrichment for SE (P < 2.0e-05) and 63.41-fold enrichment for AP (P < 2.0e-05)]. Approximately 77% of VDR regions were covered by either AP or SE elements. The overlap between VDR binding and regulatory elements was significantly greater in LCLs than in non-immune cells (P < 2.0e-05). VDR binding also occurred within MS regions more than expected by chance (3.7-fold enrichment, P < 2.0e-05). Moreover, regions where VDR binding overlapped SE or AP elements (SE-VDR and AP-VDR) were even more enriched within MS regions and near several disease-associated genes. These findings provide relevant insights into how vitamin D influences the immune system and the risk of MS through VDR interactions with the chromatin state inside MS regions. Furthermore, the data provide additional evidence for an important role played by B cells in MS. Further analyses in other immune cell types and functional studies are warranted to fully elucidate the role of vitamin D in the immune system.
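Fold enrichment of base-pair overlap is commonly defined as the observed overlap divided by the overlap expected if the two tracks were positioned independently. Assuming that definition (the exact statistic behind the fold-enrichment figures above is not specified here), with $|A|$ and $|B|$ the base-pair coverage of the two tracks, $|A \cap B|$ their shared base pairs and $G$ the length of the analysed genome, it reads as follows; the numbers in the second line are hypothetical and serve only as a worked example.

```latex
\mathrm{FE}(A,B) \;=\; \frac{|A \cap B| / G}{\left(|A| / G\right)\left(|B| / G\right)}
\;=\; \frac{|A \cap B|\, G}{|A|\, |B|}

\mathbb{E}\,|A \cap B| \;=\; \frac{|A|\,|B|}{G}
\;=\; \frac{(1\,\mathrm{Mb})(10\,\mathrm{Mb})}{1000\,\mathrm{Mb}} \;=\; 10\,\mathrm{kb},
\qquad
\mathrm{FE} \;=\; \frac{100\,\mathrm{kb}}{10\,\mathrm{kb}} \;=\; 10
```

The expectation follows because, under independent uniform placement, each base of $B$ is covered by $A$ with probability $|A|/G$.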
More than 50 genomic regions have now been shown to influence the risk of multiple sclerosis (MS). However, the mechanisms of action, and the cell types in which these associated variants act at the molecular level, remain largely unknown. This is especially true for associated regions containing no known genes. Given the evidence for a role of B cells in MS, we hypothesized that MS-associated genomic regions co-localize with regions that are functionally active in B cells. We used publicly available data on 1) MS-associated regions and single nucleotide polymorphisms (SNPs) and 2) chromatin profiling in B cells as well as three additional cell types thought to be unrelated to MS (hepatocytes, fibroblasts and keratinocytes). Genomic intervals and SNPs were tested for overlap using the Genomic Hyperbrowser. We found that MS-associated regions are significantly enriched in strong enhancer, active promoter and strongly transcribed regions (p = 0.00005) and that this overlap is significantly higher in B cells than in control cells. In addition, MS-associated SNPs also land in active promoter (p = 0.00005) and enhancer regions more than expected by chance (strong enhancer p = 0.0006; weak enhancer p = 0.00005). These results confirm the important role of the immune system, and specifically of B cells, in MS and suggest that MS risk variants exert a gene regulatory role. Previous studies assessing MS risk variants in T cells may be missing important effects in B cells. Similar analyses in other immunological cell types relevant to MS and functional studies are necessary to fully elucidate how genes contribute to MS pathogenesis.