Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
18 Related JoVE Articles!
Massively Parallel Reporter Assays in Cultured Mammalian Cells
Institutions: Broad Institute.
The genetic reporter assay is a well-established and powerful tool for dissecting the relationship between DNA sequences and their gene regulatory activities. The potential throughput of this assay has, however, been limited by the need to individually clone and assay the activity of each sequence on interest using protein fluorescence or enzymatic activity as a proxy for regulatory activity. Advances in high-throughput DNA synthesis and sequencing technologies have recently made it possible to overcome these limitations by multiplexing the construction and interrogation of large libraries of reporter constructs. This protocol describes implementation of a Massively Parallel Reporter Assay (MPRA) that allows direct comparison of hundreds of thousands of putative regulatory sequences in a single cell culture dish.
Genetics, Issue 90, gene regulation, transcriptional regulation, sequence-activity mapping, reporter assay, library cloning, transfection, tag sequencing, mammalian cells
Depletion of Ribosomal RNA for Mosquito Gut Metagenomic RNA-seq
Institutions: New Mexico State University.
The mosquito gut accommodates dynamic microbial communities across different stages of the insect's life cycle. Characterization of the genetic capacity and functionality of the gut community will provide insight into the effects of gut microbiota on mosquito life traits. Metagenomic RNA-Seq has become an important tool to analyze transcriptomes from various microbes present in a microbial community. Messenger RNA usually comprises only 1-3% of total RNA, while rRNA constitutes approximately 90%. It is challenging to enrich messenger RNA from a metagenomic microbial RNA sample because most prokaryotic mRNA species lack stable poly(A) tails. This prevents oligo d(T) mediated mRNA isolation. Here, we describe a protocol that employs sample derived rRNA capture probes to remove rRNA from a metagenomic total RNA sample. To begin, both mosquito and microbial small and large subunit rRNA fragments are amplified from a metagenomic community DNA sample. Then, the community specific biotinylated antisense ribosomal RNA probes are synthesized in vitro
using T7 RNA polymerase. The biotinylated rRNA probes are hybridized to the total RNA. The hybrids are captured by streptavidin-coated beads and removed from the total RNA. This subtraction-based protocol efficiently removes both mosquito and microbial rRNA from the total RNA sample. The mRNA enriched sample is further processed for RNA amplification and RNA-Seq.
Genetics, Issue 74, Infection, Infectious Diseases, Molecular Biology, Cellular Biology, Microbiology, Genomics, biology (general), genetics (animal and plant), life sciences, Eukaryota, Bacteria, metagenomics, metatranscriptome, RNA-seq, rRNA depletion, mRNA enrichment, mosquito gut microbiome, RNA, DNA, sequencing
Purification of Transcripts and Metabolites from Drosophila Heads
Institutions: University of Florida , University of Florida , University of Florida , University of Florida .
For the last decade, we have tried to understand the molecular and cellular mechanisms of neuronal degeneration using Drosophila
as a model organism. Although fruit flies provide obvious experimental advantages, research on neurodegenerative diseases has mostly relied on traditional techniques, including genetic interaction, histology, immunofluorescence, and protein biochemistry. These techniques are effective for mechanistic, hypothesis-driven studies, which lead to a detailed understanding of the role of single genes in well-defined biological problems. However, neurodegenerative diseases are highly complex and affect multiple cellular organelles and processes over time. The advent of new technologies and the omics age provides a unique opportunity to understand the global cellular perturbations underlying complex diseases. Flexible model organisms such as Drosophila
are ideal for adapting these new technologies because of their strong annotation and high tractability. One challenge with these small animals, though, is the purification of enough informational molecules (DNA, mRNA, protein, metabolites) from highly relevant tissues such as fly brains. Other challenges consist of collecting large numbers of flies for experimental replicates (critical for statistical robustness) and developing consistent procedures for the purification of high-quality biological material. Here, we describe the procedures for collecting thousands of fly heads and the extraction of transcripts and metabolites to understand how global changes in gene expression and metabolism contribute to neurodegenerative diseases. These procedures are easily scalable and can be applied to the study of proteomic and epigenomic contributions to disease.
Genetics, Issue 73, Biochemistry, Molecular Biology, Neurobiology, Neuroscience, Bioengineering, Cellular Biology, Anatomy, Neurodegenerative Diseases, Biological Assay, Drosophila, fruit fly, head separation, purification, mRNA, RNA, cDNA, DNA, transcripts, metabolites, replicates, SCA3, neurodegeneration, NMR, gene expression, animal model
Efficient Chromatin Immunoprecipitation using Limiting Amounts of Biomass
Institutions: University of Utah School of Medicine.
Chromatin immunoprecipitation (ChIP) is a widely-used method for determining the interactions of different proteins with DNA in chromatin of living cells. Examples include sequence-specific DNA binding transcription factors, histones and their different modification states, enzymes such as RNA polymerases and ancillary factors, and DNA repair components. Despite its ubiquity, there is a lack of up-to-date, detailed methodologies for both bench preparation of material and for accurate analysis allowing quantitative metrics of interaction. Due to this lack of information, and also because, like any immunoprecipitation, conditions must be re-optimized for new sets of experimental conditions, the ChIP assay is susceptible to inaccurate or poorly quantitative results.
Our protocol is ultimately derived from seminal work on transcription factor:DNA interactions1,2
, but incorporates a number of improvements to sensitivity and reproducibility for difficult-to-obtain cell types. The protocol has been used successfully3,4
, both using qPCR to quantify DNA enrichment, or using a semi-quantitative variant of the below protocol.
This quantitative analysis of PCR-amplified material is performed computationally, and represents a limiting factor in the assay. Important controls and other considerations include the use of an isotype-matched antibody, as well as evaluation of a control region of genomic DNA, such as an intergenic region predicted not to be bound by the protein under study (or anticipated not to show changes under the experimental conditions). In addition, a standard curve of input material for every ChIP sample is used to derive absolute levels of enrichment in the experimental material. Use of standard curves helps to take into account differences between primer sets, regardless of how carefully they are designed, and also efficiency differences throughout the range of template concentrations for a single primer set. Our protocol is different from others that are available5-8
in that we extensively cover the later, analysis phase.
Molecular Biology, Issue 75, Genetics, Cellular Biology, Biomedical Engineering, Microbiology, Immunology, Biochemistry, Proteins, life sciences, animal models, chromatin immunoprecipitation, ChIP, chromatin, immunoprecipitation, gene regulation, T lymphocyte, transcription factor, chromatin modification, DNA, quantitative PCR, PCR, cells, isolation, animal model
Chromatin Immunoprecipitation (ChIP) using Drosophila tissue
Institutions: Johns Hopkins University.
Epigenetics remains a rapidly developing field that studies how the chromatin state contributes to differential gene expression in distinct cell types at different developmental stages. Epigenetic regulation contributes to a broad spectrum of biological processes, including cellular differentiation during embryonic development and homeostasis in adulthood. A critical strategy in epigenetic studies is to examine how various histone modifications and chromatin factors regulate gene expression. To address this, Chromatin Immunoprecipitation (ChIP) is used widely to obtain a snapshot of the association of particular factors with DNA in the cells of interest.
ChIP technique commonly uses cultured cells as starting material, which can be obtained in abundance and homogeneity to generate reproducible data. However, there are several caveats: First, the environment to grow cells in Petri dish is different from that in vivo
, thus may not reflect the endogenous chromatin state of cells in a living organism. Second, not all types of cells can be cultured ex vivo
. There are only a limited number of cell lines, from which people can obtain enough material for ChIP assay.
Here we describe a method to do ChIP experiment using Drosophila
tissues. The starting material is dissected tissue from a living animal, thus can accurately reflect the endogenous chromatin state. The adaptability of this method with many different types of tissue will allow researchers to address a lot more biologically relevant questions regarding epigenetic regulation in vivo1, 2
. Combining this method with high-throughput sequencing (ChIP-seq) will further allow researchers to obtain an epigenomic landscape.
Genetics, Issue 61, ChIP, Drosophila, testes, q-PCR, high throughput sequencing, epi-genetics
Chromatin Isolation by RNA Purification (ChIRP)
Institutions: Stanford University School of Medicine.
Long noncoding RNAs are key regulators of chromatin states for important biological processes such as dosage compensation, imprinting, and developmental gene expression 1,2,3,4,5,6,7
. The recent discovery of thousands of lncRNAs in association with specific chromatin modification complexes, such as Polycomb Repressive Complex 2 (PRC2) that mediates histone H3 lysine 27 trimethylation (H3K27me3), suggests broad roles for numerous lncRNAs in managing chromatin states in a gene-specific fashion 8,9
. While some lncRNAs are thought to work in cis on neighboring genes, other lncRNAs work in trans to regulate distantly located genes. For instance, Drosophila
lncRNAs roX1 and roX2 bind numerous regions on the X chromosome of male cells, and are critical for dosage compensation 10,11
. However, the exact locations of their binding sites are not known at high resolution. Similarly, human lncRNA HOTAIR can affect PRC2 occupancy on hundreds of genes genome-wide 3,12,13
, but how specificity is achieved is unclear. LncRNAs can also serve as modular scaffolds to recruit the assembly of multiple protein complexes. The classic trans-acting RNA scaffold is the TERC RNA that serves as the template and scaffold for the telomerase complex 14
; HOTAIR can also serve as a scaffold for PRC2 and a H3K4 demethylase complex 13
Prior studies mapping RNA occupancy at chromatin have revealed substantial insights 15,16
, but only at a single gene locus at a time. The occupancy sites of most lncRNAs are not known, and the roles of lncRNAs in chromatin regulation have been mostly inferred from the indirect effects of lncRNA perturbation. Just as chromatin immunoprecipitation followed by microarray or deep sequencing (ChIP-chip or ChIP-seq, respectively) has greatly improved our understanding of protein-DNA interactions on a genomic scale, here we illustrate a recently published strategy to map long RNA occupancy genome-wide at high resolution 17
. This method, Chromatin Isolation by RNA Purification (ChIRP) (Figure 1
), is based on affinity capture of target lncRNA:chromatin complex by tiling antisense-oligos, which then generates a map of genomic binding sites at a resolution of several hundred bases with high sensitivity and low background. ChIRP is applicable to many lncRNAs because the design of affinity-probes is straightforward given the RNA sequence and requires no knowledge of the RNA's structure or functional domains.
Genetics, Issue 61, long noncoding RNA (lncRNA), genomics, chromatin binding, high-throughput sequencing, ChIRP
The ChroP Approach Combines ChIP and Mass Spectrometry to Dissect Locus-specific Proteomic Landscapes of Chromatin
Institutions: European Institute of Oncology.
Chromatin is a highly dynamic nucleoprotein complex made of DNA and proteins that controls various DNA-dependent processes. Chromatin structure and function at specific regions is regulated by the local enrichment of histone post-translational modifications (hPTMs) and variants, chromatin-binding proteins, including transcription factors, and DNA methylation. The proteomic characterization of chromatin composition at distinct functional regions has been so far hampered by the lack of efficient protocols to enrich such domains at the appropriate purity and amount for the subsequent in-depth analysis by Mass Spectrometry (MS). We describe here a newly designed chromatin proteomics strategy, named ChroP (Chromatin Proteomics
), whereby a preparative chromatin immunoprecipitation is used to isolate distinct chromatin regions whose features, in terms of hPTMs, variants and co-associated non-histonic proteins, are analyzed by MS. We illustrate here the setting up of ChroP for the enrichment and analysis of transcriptionally silent heterochromatic regions, marked by the presence of tri-methylation of lysine 9 on histone H3. The results achieved demonstrate the potential of ChroP
in thoroughly characterizing the heterochromatin proteome and prove it as a powerful analytical strategy for understanding how the distinct protein determinants of chromatin interact and synergize to establish locus-specific structural and functional configurations.
Biochemistry, Issue 86, chromatin, histone post-translational modifications (hPTMs), epigenetics, mass spectrometry, proteomics, SILAC, chromatin immunoprecipitation , histone variants, chromatome, hPTMs cross-talks
RNA Secondary Structure Prediction Using High-throughput SHAPE
Institutions: Frederick National Laboratory for Cancer Research.
Understanding the function of RNA involved in biological processes requires a thorough knowledge of RNA structure. Toward this end, the methodology dubbed "high-throughput selective 2' hydroxyl acylation analyzed by primer extension", or SHAPE, allows prediction of RNA secondary structure with single nucleotide resolution. This approach utilizes chemical probing agents that preferentially acylate single stranded or flexible regions of RNA in aqueous solution. Sites of chemical modification are detected by reverse transcription of the modified RNA, and the products of this reaction are fractionated by automated capillary electrophoresis (CE). Since reverse transcriptase pauses at those RNA nucleotides modified by the SHAPE reagents, the resulting cDNA library indirectly maps those ribonucleotides that are single stranded in the context of the folded RNA. Using ShapeFinder software, the electropherograms produced by automated CE are processed and converted into nucleotide reactivity tables that are themselves converted into pseudo-energy constraints used in the RNAStructure (v5.3) prediction algorithm. The two-dimensional RNA structures obtained by combining SHAPE probing with in silico
RNA secondary structure prediction have been found to be far more accurate than structures obtained using either method alone.
Genetics, Issue 75, Molecular Biology, Biochemistry, Virology, Cancer Biology, Medicine, Genomics, Nucleic Acid Probes, RNA Probes, RNA, High-throughput SHAPE, Capillary electrophoresis, RNA structure, RNA probing, RNA folding, secondary structure, DNA, nucleic acids, electropherogram, synthesis, transcription, high throughput, sequencing
Cortical Source Analysis of High-Density EEG Recordings in Children
Institutions: UCL Institute of Child Health, University College London.
EEG is traditionally described as a neuroimaging technique with high temporal and low spatial resolution. Recent advances in biophysical modelling and signal processing make it possible to exploit information from other imaging modalities like structural MRI that provide high spatial resolution to overcome this constraint1
. This is especially useful for investigations that require high resolution in the temporal as well as spatial domain. In addition, due to the easy application and low cost of EEG recordings, EEG is often the method of choice when working with populations, such as young children, that do not tolerate functional MRI scans well. However, in order to investigate which neural substrates are involved, anatomical information from structural MRI is still needed. Most EEG analysis packages work with standard head models that are based on adult anatomy. The accuracy of these models when used for children is limited2
, because the composition and spatial configuration of head tissues changes dramatically over development3
In the present paper, we provide an overview of our recent work in utilizing head models based on individual structural MRI scans or age specific head models to reconstruct the cortical generators of high density EEG. This article describes how EEG recordings are acquired, processed, and analyzed with pediatric populations at the London Baby Lab, including laboratory setup, task design, EEG preprocessing, MRI processing, and EEG channel level and source analysis.
Behavior, Issue 88, EEG, electroencephalogram, development, source analysis, pediatric, minimum-norm estimation, cognitive neuroscience, event-related potentials
RNA-Seq Analysis of Differential Gene Expression in Electroporated Chick Embryonic Spinal Cord
Institutions: Universidade de São Paulo.
electroporation of the chick neural tube is a fast and inexpensive method for identification of gene function during neural development. Genome wide analysis of differentially expressed transcripts after such an experimental manipulation has the potential to uncover an almost complete picture of the downstream effects caused by the transfected construct. This work describes a simple method for comparing transcriptomes from samples of transfected embryonic spinal cords comprising all steps between electroporation and identification of differentially expressed transcripts. The first stage consists of guidelines for electroporation and instructions for dissection of transfected spinal cord halves from HH23 embryos in ribonuclease-free environment and extraction of high-quality RNA samples suitable for transcriptome sequencing. The next stage is that of bioinformatic analysis with general guidelines for filtering and comparison of RNA-Seq datasets in the Galaxy public server, which eliminates the need of a local computational structure for small to medium scale experiments. The representative results show that the dissection methods generate high quality RNA samples and that the transcriptomes obtained from two control samples are essentially the same, an important requirement for detection of differential expression genes in experimental samples. Furthermore, one example is provided where experimental overexpression of a DNA construct can be visually verified after comparison with control samples. The application of this method may be a powerful tool to facilitate new discoveries on the function of neural factors involved in spinal cord early development.
Developmental Biology, Issue 93, chicken embryo, in ovo electroporation, spinal cord, RNA-Seq, transcriptome profiling, Galaxy workflow
Single Read and Paired End mRNA-Seq Illumina Libraries from 10 Nanograms Total RNA
Institutions: Morgridge Institute for Research, University of Wisconsin, University of California.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity, a large dynamic range and allows measurement of transcript copy numbers in a sample. Illumina’s genome analyzer performs sequencing of a large number (> 107
) of relatively short sequence reads (< 150 bp).The "paired end" approach, wherein a single long read is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo
One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples have been described1, 2
but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published3, 4
but require microgram amounts of starting total RNA.
Here we describe a protocol for the Illumina Genome Analyzer II platform for mRNA-Seq sequencing for library preparation that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing5
, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, coined "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome6
and normalized so that data from multiple runs can be compared. We report the gene expression measurement in units of transcripts per million (TPM), which is a superior measure to RPKM when comparing samples7
Molecular Biology, Issue 56, Genetics, mRNA-Seq, Illumina-Seq, gene expression profiling, high throughput sequencing
DNA-affinity-purified Chip (DAP-chip) Method to Determine Gene Targets for Bacterial Two component Regulatory Systems
Institutions: Lawrence Berkeley National Laboratory.
methods such as ChIP-chip are well-established techniques used to determine global gene targets for transcription factors. However, they are of limited use in exploring bacterial two component regulatory systems with uncharacterized activation conditions. Such systems regulate transcription only when activated in the presence of unique signals. Since these signals are often unknown, the in vitro
microarray based method described in this video article can be used to determine gene targets and binding sites for response regulators. This DNA-affinity-purified-chip method may be used for any purified regulator in any organism with a sequenced genome. The protocol involves allowing the purified tagged protein to bind to sheared genomic DNA and then affinity purifying the protein-bound DNA, followed by fluorescent labeling of the DNA and hybridization to a custom tiling array. Preceding steps that may be used to optimize the assay for specific regulators are also described. The peaks generated by the array data analysis are used to predict binding site motifs, which are then experimentally validated. The motif predictions can be further used to determine gene targets of orthologous response regulators in closely related species. We demonstrate the applicability of this method by determining the gene targets and binding site motifs and thus predicting the function for a sigma54-dependent response regulator DVU3023 in the environmental bacterium Desulfovibrio vulgaris
Genetics, Issue 89, DNA-Affinity-Purified-chip, response regulator, transcription factor binding site, two component system, signal transduction, Desulfovibrio, lactate utilization regulator, ChIP-chip
Metabolic Labeling of Newly Transcribed RNA for High Resolution Gene Expression Profiling of RNA Synthesis, Processing and Decay in Cell Culture
Institutions: Max von Pettenkofer Institute, University of Cambridge, Ludwig-Maximilians-University Munich.
The development of whole-transcriptome microarrays and next-generation sequencing has revolutionized our understanding of the complexity of cellular gene expression. Along with a better understanding of the involved molecular mechanisms, precise measurements of the underlying kinetics have become increasingly important. Here, these powerful methodologies face major limitations due to intrinsic properties of the template samples they study, i.e.
total cellular RNA. In many cases changes in total cellular RNA occur either too slowly or too quickly to represent the underlying molecular events and their kinetics with sufficient resolution. In addition, the contribution of alterations in RNA synthesis, processing, and decay are not readily differentiated.
We recently developed high-resolution gene expression profiling to overcome these limitations. Our approach is based on metabolic labeling of newly transcribed RNA with 4-thiouridine (thus also referred to as 4sU-tagging) followed by rigorous purification of newly transcribed RNA using thiol-specific biotinylation and streptavidin-coated magnetic beads. It is applicable to a broad range of organisms including vertebrates, Drosophila
, and yeast. We successfully applied 4sU-tagging to study real-time kinetics of transcription factor activities, provide precise measurements of RNA half-lives, and obtain novel insights into the kinetics of RNA processing. Finally, computational modeling can be employed to generate an integrated, comprehensive analysis of the underlying molecular mechanisms.
Genetics, Issue 78, Cellular Biology, Molecular Biology, Microbiology, Biochemistry, Eukaryota, Investigative Techniques, Biological Phenomena, Gene expression profiling, RNA synthesis, RNA processing, RNA decay, 4-thiouridine, 4sU-tagging, microarray analysis, RNA-seq, RNA, DNA, PCR, sequencing
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g.
drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism1,2
. RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing a digital type datum (copy number of any transcript) and not relying on any known genomic sequence3
Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recent published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin,"4
in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provides potential new leads for therapeutic targets to those diseases.
The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
Generation of High Quality Chromatin Immunoprecipitation DNA Template for High-throughput Sequencing (ChIP-seq)
Institutions: Children's Hospital of Philadelphia Research Institute, University of Pennsylvania .
ChIP-sequencing (ChIP-seq) methods directly offer whole-genome coverage, where combining chromatin immunoprecipitation (ChIP) and massively parallel sequencing can be utilized to identify the repertoire of mammalian DNA sequences bound by transcription factors in vivo
. "Next-generation" genome sequencing technologies provide 1-2 orders of magnitude increase in the amount of sequence that can be cost-effectively generated over older technologies thus allowing for ChIP-seq methods to directly provide whole-genome coverage for effective profiling of mammalian protein-DNA interactions.
For successful ChIP-seq approaches, one must generate high quality ChIP DNA template to obtain the best sequencing outcomes. The description is based around experience with the protein product of the gene most strongly implicated in the pathogenesis of type 2 diabetes, namely the transcription factor transcription factor 7-like 2 (TCF7L2). This factor has also been implicated in various cancers.
Outlined is how to generate high quality ChIP DNA template derived from the colorectal carcinoma cell line, HCT116, in order to build a high-resolution map through sequencing to determine the genes bound by TCF7L2, giving further insight in to its key role in the pathogenesis of complex traits.
Molecular Biology, Issue 74, Genetics, Biochemistry, Microbiology, Medicine, Proteins, DNA-Binding Proteins, Transcription Factors, Chromatin Immunoprecipitation, Genes, chromatin, immunoprecipitation, ChIP, DNA, PCR, sequencing, antibody, cross-link, cell culture, assay
A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
Institutions: Stony Brook University, Cold Spring Harbor Laboratory, University of Texas at Dallas.
ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by using next-sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape, depending on the target protein1
. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment2
. Reliably identifying these regions was the focus of our work.
Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics3-5
to more rigorous statistical models, e.g.
Hidden Markov Models (HMMs)6-8
. We sought a solution that minimized the necessity for difficult-to-define, ad hoc parameters that often compromise resolution and lessen the intuitive usability of the tool. With respect to HMM-based methods, we aimed to curtail parameter estimation procedures and simple, finite state classifications that are often utilized.
Additionally, conventional ChIPseq data analysis involves categorization of the expected read density profiles as either punctate or diffuse followed by subsequent application of the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model, which can capably address the entire spectrum of data types.
To meet these objectives, we first constructed a statistical framework that naturally modeled ChIPseq data structures using a cutting edge advance in HMMs9
, which utilizes only explicit formulas-an innovation crucial to its performance advantages. More sophisticated then heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which further define segments of enrichment. Our analysis revealed how our Bayesian Change Point (BCP) algorithm had a reduced computational complexity-evidenced by an abridged run time and memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters. This illustrated both its versatility and ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a great tool for ChIPseq data analysis that can aid in collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor10,11
and epigenetic data12
to illustrate its usefulness.
Genetics, Issue 70, Bioinformatics, Genomics, Molecular Biology, Cellular Biology, Immunology, Chromatin immunoprecipitation, ChIP-Seq, histone modifications, segmentation, Bayesian, Hidden Markov Models, epigenetics
An Experimental and Bioinformatics Protocol for RNA-seq Analyses of Photoperiodic Diapause in the Asian Tiger Mosquito, Aedes albopictus
Institutions: Georgetown University, The Ohio State University.
Photoperiodic diapause is an important adaptation that allows individuals to escape harsh seasonal environments via a series of physiological changes, most notably developmental arrest and reduced metabolism. Global gene expression profiling via RNA-Seq can provide important insights into the transcriptional mechanisms of photoperiodic diapause. The Asian tiger mosquito, Aedes albopictus
, is an outstanding organism for studying the transcriptional bases of diapause due to its ease of rearing, easily induced diapause, and the genomic resources available. This manuscript presents a general experimental workflow for identifying diapause-induced transcriptional differences in A. albopictus.
Rearing techniques, conditions necessary to induce diapause and non-diapause development, methods to estimate percent diapause in a population, and RNA extraction and integrity assessment for mosquitoes are documented. A workflow to process RNA-Seq data from Illumina sequencers culminates in a list of differentially expressed genes. The representative results demonstrate that this protocol can be used to effectively identify genes differentially regulated at the transcriptional level in A. albopictus
due to photoperiodic differences. With modest adjustments, this workflow can be readily adapted to study the transcriptional bases of diapause or other important life history traits in other mosquitoes.
Genetics, Issue 93, Aedes albopictus Asian tiger mosquito, photoperiodic diapause, RNA-Seq de novo transcriptome assembly, mosquito husbandry
Using SCOPE to Identify Potential Regulatory Motifs in Coregulated Genes
Institutions: Dartmouth College.
SCOPE is an ensemble motif finder that uses three component algorithms in parallel to identify potential regulatory motifs by over-representation and motif position preference1
. Each component algorithm is optimized to find a different kind of motif. By taking the best of these three approaches, SCOPE performs better than any single algorithm, even in the presence of noisy data1
. In this article, we utilize a web version of SCOPE2
to examine genes that are involved in telomere maintenance. SCOPE has been incorporated into at least two other motif finding programs3,4
and has been used in other studies5-8
The three algorithms that comprise SCOPE are BEAM9
, which finds non-degenerate motifs (ACCGGT), PRISM10
, which finds degenerate motifs (ASCGWT), and SPACER11
, which finds longer bipartite motifs (ACCnnnnnnnnGGT). These three algorithms have been optimized to find their corresponding type of motif. Together, they allow SCOPE to perform extremely well.
Once a gene set has been analyzed and candidate motifs identified, SCOPE can look for other genes that contain the motif which, when added to the original set, will improve the motif score. This can occur through over-representation or motif position preference. Working with partial gene sets that have biologically verified transcription factor binding sites, SCOPE was able to identify most of the rest of the genes also regulated by the given transcription factor.
Output from SCOPE shows candidate motifs, their significance, and other information both as a table and as a graphical motif map. FAQs and video tutorials are available at the SCOPE web site which also includes a "Sample Search" button that allows the user to perform a trial run.
Scope has a very friendly user interface that enables novice users to access the algorithm's full power without having to become an expert in the bioinformatics of motif finding. As input, SCOPE can take a list of genes, or FASTA sequences. These can be entered in browser text fields, or read from a file. The output from SCOPE contains a list of all identified motifs with their scores, number of occurrences, fraction of genes containing the motif, and the algorithm used to identify the motif. For each motif, result details include a consensus representation of the motif, a sequence logo, a position weight matrix, and a list of instances for every motif occurrence (with exact positions and "strand" indicated). Results are returned in a browser window and also optionally by email. Previous papers describe the SCOPE algorithms in detail1,2,9-11
Genetics, Issue 51, gene regulation, computational biology, algorithm, promoter sequence motif