RNA transcripts are subjected to post-transcriptional gene regulation by interacting with hundreds of RNA-binding proteins (RBPs) and microRNA-containing ribonucleoprotein complexes (miRNPs) that are often expressed in a cell-type dependently. To understand how the interplay of these RNA-binding factors affects the regulation of individual transcripts, high resolution maps of in vivo protein-RNA interactions are necessary1.
A combination of genetic, biochemical and computational approaches are typically applied to identify RNA-RBP or RNA-RNP interactions. Microarray profiling of RNAs associated with immunopurified RBPs (RIP-Chip)2 defines targets at a transcriptome level, but its application is limited to the characterization of kinetically stable interactions and only in rare cases3,4 allows to identify the RBP recognition element (RRE) within the long target RNA. More direct RBP target site information is obtained by combining in vivo UV crosslinking5,6 with immunoprecipitation7-9 followed by the isolation of crosslinked RNA segments and cDNA sequencing (CLIP)10. CLIP was used to identify targets of a number of RBPs11-17. However, CLIP is limited by the low efficiency of UV 254 nm RNA-protein crosslinking, and the location of the crosslink is not readily identifiable within the sequenced crosslinked fragments, making it difficult to separate UV-crosslinked target RNA segments from background non-crosslinked RNA fragments also present in the sample.
We developed a powerful cell-based crosslinking approach to determine at high resolution and transcriptome-wide the binding sites of cellular RBPs and miRNPs that we term PAR-CliP (Photoactivatable-Ribonucleoside-Enhanced Crosslinking and Immunoprecipitation) (see Fig. 1A for an outline of the method). The method relies on the incorporation of photoreactive ribonucleoside analogs, such as 4-thiouridine (4-SU) and 6-thioguanosine (6-SG) into nascent RNA transcripts by living cells. Irradiation of the cells by UV light of 365 nm induces efficient crosslinking of photoreactive nucleoside-labeled cellular RNAs to interacting RBPs. Immunoprecipitation of the RBP of interest is followed by isolation of the crosslinked and coimmunoprecipitated RNA. The isolated RNA is converted into a cDNA library and deep sequenced using Solexa technology. One characteristic feature of cDNA libraries prepared by PAR-CliP is that the precise position of crosslinking can be identified by mutations residing in the sequenced cDNA. When using 4-SU, crosslinked sequences thymidine to cytidine transition, whereas using 6-SG results in guanosine to adenosine mutations. The presence of the mutations in crosslinked sequences makes it possible to separate them from the background of sequences derived from abundant cellular RNAs.
Application of the method to a number of diverse RNA binding proteins was reported in Hafner et al.18
12 Related JoVE Articles!
An Experimental and Bioinformatics Protocol for RNA-seq Analyses of Photoperiodic Diapause in the Asian Tiger Mosquito, Aedes albopictus
Institutions: Georgetown University, The Ohio State University.
Photoperiodic diapause is an important adaptation that allows individuals to escape harsh seasonal environments via a series of physiological changes, most notably developmental arrest and reduced metabolism. Global gene expression profiling via RNA-Seq can provide important insights into the transcriptional mechanisms of photoperiodic diapause. The Asian tiger mosquito, Aedes albopictus
, is an outstanding organism for studying the transcriptional bases of diapause due to its ease of rearing, easily induced diapause, and the genomic resources available. This manuscript presents a general experimental workflow for identifying diapause-induced transcriptional differences in A. albopictus.
Rearing techniques, conditions necessary to induce diapause and non-diapause development, methods to estimate percent diapause in a population, and RNA extraction and integrity assessment for mosquitoes are documented. A workflow to process RNA-Seq data from Illumina sequencers culminates in a list of differentially expressed genes. The representative results demonstrate that this protocol can be used to effectively identify genes differentially regulated at the transcriptional level in A. albopictus
due to photoperiodic differences. With modest adjustments, this workflow can be readily adapted to study the transcriptional bases of diapause or other important life history traits in other mosquitoes.
Genetics, Issue 93, Aedes albopictus Asian tiger mosquito, photoperiodic diapause, RNA-Seq de novo transcriptome assembly, mosquito husbandry
RNA-Seq Analysis of Differential Gene Expression in Electroporated Chick Embryonic Spinal Cord
Institutions: Universidade de São Paulo.
electroporation of the chick neural tube is a fast and inexpensive method for identification of gene function during neural development. Genome wide analysis of differentially expressed transcripts after such an experimental manipulation has the potential to uncover an almost complete picture of the downstream effects caused by the transfected construct. This work describes a simple method for comparing transcriptomes from samples of transfected embryonic spinal cords comprising all steps between electroporation and identification of differentially expressed transcripts. The first stage consists of guidelines for electroporation and instructions for dissection of transfected spinal cord halves from HH23 embryos in ribonuclease-free environment and extraction of high-quality RNA samples suitable for transcriptome sequencing. The next stage is that of bioinformatic analysis with general guidelines for filtering and comparison of RNA-Seq datasets in the Galaxy public server, which eliminates the need of a local computational structure for small to medium scale experiments. The representative results show that the dissection methods generate high quality RNA samples and that the transcriptomes obtained from two control samples are essentially the same, an important requirement for detection of differential expression genes in experimental samples. Furthermore, one example is provided where experimental overexpression of a DNA construct can be visually verified after comparison with control samples. The application of this method may be a powerful tool to facilitate new discoveries on the function of neural factors involved in spinal cord early development.
Developmental Biology, Issue 93, chicken embryo, in ovo electroporation, spinal cord, RNA-Seq, transcriptome profiling, Galaxy workflow
Detecting Somatic Genetic Alterations in Tumor Specimens by Exon Capture and Massively Parallel Sequencing
Institutions: Memorial Sloan-Kettering Cancer Center, Memorial Sloan-Kettering Cancer Center.
Efforts to detect and investigate key oncogenic mutations have proven valuable to facilitate the appropriate treatment for cancer patients. The establishment of high-throughput, massively parallel "next-generation" sequencing has aided the discovery of many such mutations. To enhance the clinical and translational utility of this technology, platforms must be high-throughput, cost-effective, and compatible with formalin-fixed paraffin embedded (FFPE) tissue samples that may yield small amounts of degraded or damaged DNA. Here, we describe the preparation of barcoded and multiplexed DNA libraries followed by hybridization-based capture of targeted exons for the detection of cancer-associated mutations in fresh frozen and FFPE tumors by massively parallel sequencing. This method enables the identification of sequence mutations, copy number alterations, and select structural rearrangements involving all targeted genes. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus conferring high sensitivity for detecting low frequency mutations.
Molecular Biology, Issue 80, Molecular Diagnostic Techniques, High-Throughput Nucleotide Sequencing, Genetics, Neoplasms, Diagnosis, Massively parallel sequencing, targeted exon sequencing, hybridization capture, cancer, FFPE, DNA mutations
Depletion of Ribosomal RNA for Mosquito Gut Metagenomic RNA-seq
Institutions: New Mexico State University.
The mosquito gut accommodates dynamic microbial communities across different stages of the insect's life cycle. Characterization of the genetic capacity and functionality of the gut community will provide insight into the effects of gut microbiota on mosquito life traits. Metagenomic RNA-Seq has become an important tool to analyze transcriptomes from various microbes present in a microbial community. Messenger RNA usually comprises only 1-3% of total RNA, while rRNA constitutes approximately 90%. It is challenging to enrich messenger RNA from a metagenomic microbial RNA sample because most prokaryotic mRNA species lack stable poly(A) tails. This prevents oligo d(T) mediated mRNA isolation. Here, we describe a protocol that employs sample derived rRNA capture probes to remove rRNA from a metagenomic total RNA sample. To begin, both mosquito and microbial small and large subunit rRNA fragments are amplified from a metagenomic community DNA sample. Then, the community specific biotinylated antisense ribosomal RNA probes are synthesized in vitro
using T7 RNA polymerase. The biotinylated rRNA probes are hybridized to the total RNA. The hybrids are captured by streptavidin-coated beads and removed from the total RNA. This subtraction-based protocol efficiently removes both mosquito and microbial rRNA from the total RNA sample. The mRNA enriched sample is further processed for RNA amplification and RNA-Seq.
Genetics, Issue 74, Infection, Infectious Diseases, Molecular Biology, Cellular Biology, Microbiology, Genomics, biology (general), genetics (animal and plant), life sciences, Eukaryota, Bacteria, metagenomics, metatranscriptome, RNA-seq, rRNA depletion, mRNA enrichment, mosquito gut microbiome, RNA, DNA, sequencing
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g.
drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism1,2
. RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing a digital type datum (copy number of any transcript) and not relying on any known genomic sequence3
Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recent published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin,"4
in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provides potential new leads for therapeutic targets to those diseases.
The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
Single Read and Paired End mRNA-Seq Illumina Libraries from 10 Nanograms Total RNA
Institutions: Morgridge Institute for Research, University of Wisconsin, University of California.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity, a large dynamic range and allows measurement of transcript copy numbers in a sample. Illumina’s genome analyzer performs sequencing of a large number (> 107
) of relatively short sequence reads (< 150 bp).The "paired end" approach, wherein a single long read is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo
One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples have been described1, 2
but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published3, 4
but require microgram amounts of starting total RNA.
Here we describe a protocol for the Illumina Genome Analyzer II platform for mRNA-Seq sequencing for library preparation that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing5
, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, coined "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome6
and normalized so that data from multiple runs can be compared. We report the gene expression measurement in units of transcripts per million (TPM), which is a superior measure to RPKM when comparing samples7
Molecular Biology, Issue 56, Genetics, mRNA-Seq, Illumina-Seq, gene expression profiling, high throughput sequencing
Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis
Institutions: The University of Texas Graduate School of Biomedical Sciences at Houston.
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro
. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro
and in vivo
Genetics, Issue 93, EML Cells, Self-renewal, Differentiation, Hematopoietic precursor cell, RNA-Sequencing, Data analysis
A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
Institutions: Stony Brook University, Cold Spring Harbor Laboratory, University of Texas at Dallas.
ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by using next-sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape, depending on the target protein1
. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment2
. Reliably identifying these regions was the focus of our work.
Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics3-5
to more rigorous statistical models, e.g.
Hidden Markov Models (HMMs)6-8
. We sought a solution that minimized the necessity for difficult-to-define, ad hoc parameters that often compromise resolution and lessen the intuitive usability of the tool. With respect to HMM-based methods, we aimed to curtail parameter estimation procedures and simple, finite state classifications that are often utilized.
Additionally, conventional ChIPseq data analysis involves categorization of the expected read density profiles as either punctate or diffuse followed by subsequent application of the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model, which can capably address the entire spectrum of data types.
To meet these objectives, we first constructed a statistical framework that naturally modeled ChIPseq data structures using a cutting edge advance in HMMs9
, which utilizes only explicit formulas-an innovation crucial to its performance advantages. More sophisticated then heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which further define segments of enrichment. Our analysis revealed how our Bayesian Change Point (BCP) algorithm had a reduced computational complexity-evidenced by an abridged run time and memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters. This illustrated both its versatility and ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a great tool for ChIPseq data analysis that can aid in collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor10,11
and epigenetic data12
to illustrate its usefulness.
Genetics, Issue 70, Bioinformatics, Genomics, Molecular Biology, Cellular Biology, Immunology, Chromatin immunoprecipitation, ChIP-Seq, histone modifications, segmentation, Bayesian, Hidden Markov Models, epigenetics
Metabolic Labeling of Newly Transcribed RNA for High Resolution Gene Expression Profiling of RNA Synthesis, Processing and Decay in Cell Culture
Institutions: Max von Pettenkofer Institute, University of Cambridge, Ludwig-Maximilians-University Munich.
The development of whole-transcriptome microarrays and next-generation sequencing has revolutionized our understanding of the complexity of cellular gene expression. Along with a better understanding of the involved molecular mechanisms, precise measurements of the underlying kinetics have become increasingly important. Here, these powerful methodologies face major limitations due to intrinsic properties of the template samples they study, i.e.
total cellular RNA. In many cases changes in total cellular RNA occur either too slowly or too quickly to represent the underlying molecular events and their kinetics with sufficient resolution. In addition, the contribution of alterations in RNA synthesis, processing, and decay are not readily differentiated.
We recently developed high-resolution gene expression profiling to overcome these limitations. Our approach is based on metabolic labeling of newly transcribed RNA with 4-thiouridine (thus also referred to as 4sU-tagging) followed by rigorous purification of newly transcribed RNA using thiol-specific biotinylation and streptavidin-coated magnetic beads. It is applicable to a broad range of organisms including vertebrates, Drosophila
, and yeast. We successfully applied 4sU-tagging to study real-time kinetics of transcription factor activities, provide precise measurements of RNA half-lives, and obtain novel insights into the kinetics of RNA processing. Finally, computational modeling can be employed to generate an integrated, comprehensive analysis of the underlying molecular mechanisms.
Genetics, Issue 78, Cellular Biology, Molecular Biology, Microbiology, Biochemistry, Eukaryota, Investigative Techniques, Biological Phenomena, Gene expression profiling, RNA synthesis, RNA processing, RNA decay, 4-thiouridine, 4sU-tagging, microarray analysis, RNA-seq, RNA, DNA, PCR, sequencing
Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation
Institutions: Agency for Science, Technology and Research, Singapore, A*STAR-Duke-NUS Neuroscience Research Partnership, Singapore, National University of Singapore, Singapore.
Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12
. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13
. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14
Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6
. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8
We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video.
Genetics, Issue 62, ChIP, ChIA-PET, Chromatin Interactions, Genomics, Next-Generation Sequencing
Chromatin Immunoprecipitation (ChIP) using Drosophila tissue
Institutions: Johns Hopkins University.
Epigenetics remains a rapidly developing field that studies how the chromatin state contributes to differential gene expression in distinct cell types at different developmental stages. Epigenetic regulation contributes to a broad spectrum of biological processes, including cellular differentiation during embryonic development and homeostasis in adulthood. A critical strategy in epigenetic studies is to examine how various histone modifications and chromatin factors regulate gene expression. To address this, Chromatin Immunoprecipitation (ChIP) is used widely to obtain a snapshot of the association of particular factors with DNA in the cells of interest.
ChIP technique commonly uses cultured cells as starting material, which can be obtained in abundance and homogeneity to generate reproducible data. However, there are several caveats: First, the environment to grow cells in Petri dish is different from that in vivo
, thus may not reflect the endogenous chromatin state of cells in a living organism. Second, not all types of cells can be cultured ex vivo
. There are only a limited number of cell lines, from which people can obtain enough material for ChIP assay.
Here we describe a method to do ChIP experiment using Drosophila
tissues. The starting material is dissected tissue from a living animal, thus can accurately reflect the endogenous chromatin state. The adaptability of this method with many different types of tissue will allow researchers to address a lot more biologically relevant questions regarding epigenetic regulation in vivo1, 2
. Combining this method with high-throughput sequencing (ChIP-seq) will further allow researchers to obtain an epigenomic landscape.
Genetics, Issue 61, ChIP, Drosophila, testes, q-PCR, high throughput sequencing, epi-genetics
Highly Efficient Ligation of Small RNA Molecules for MicroRNA Quantitation by High-Throughput Sequencing
Institutions: University of Colorado, Boulder, University of Colorado, Denver.
MiRNA cloning and high-throughput sequencing, termed miR-Seq, stands alone as a transcriptome-wide approach to quantify miRNAs with single nucleotide resolution. This technique captures miRNAs by attaching 3’ and 5’ oligonucleotide adapters to miRNA molecules and allows de novo
miRNA discovery. Coupling with powerful next-generation sequencing platforms, miR-Seq has been instrumental in the study of miRNA biology. However, significant biases introduced by oligonucleotide ligation steps have prevented miR-Seq from being employed as an accurate quantitation tool. Previous studies demonstrate that biases in current miR-Seq methods often lead to inaccurate miRNA quantification with errors up to 1,000-fold for some miRNAs1,2
. To resolve these biases imparted by RNA ligation, we have developed a small RNA ligation method that results in ligation efficiencies of over 95% for both 3’ and 5′ ligation steps. Benchmarking this improved library construction method using equimolar or differentially mixed synthetic miRNAs, consistently yields reads numbers with less than two-fold deviation from the expected value. Furthermore, this high-efficiency miR-Seq method permits accurate genome-wide miRNA profiling from in vivo
total RNA samples2
Molecular Biology, Issue 93, RNA, ligation, miRNA, miR-Seq, linker, oligonucleotide, high-throughput sequencing