JoVE Visualize What is visualize?
Related JoVE Video
 
Pubmed Article
Fine de novo sequencing of a fungal genome using only SOLiD short read data: verification on Aspergillus oryzae RIB40.
PLoS ONE
PUBLISHED: 01-01-2013
The development of next-generation sequencing (NGS) technologies has dramatically increased the throughput, speed, and efficiency of genome sequencing. The short read data generated from NGS platforms, such as SOLiD and Illumina, are quite useful for mapping analysis. However, the SOLiD read data with lengths of <60 bp have been considered to be too short for de novo genome sequencing. Here, to investigate whether de novo sequencing of fungal genomes is possible using only SOLiD short read sequence data, we performed de novo assembly of the Aspergillus oryzae RIB40 genome using only SOLiD read data of 50 bp generated from mate-paired libraries with 2.8- or 1.9-kb insert sizes. The assembled scaffolds showed an N50 value of 1.6 Mb, a 22-fold increase than those obtained using only SOLiD short read in other published reports. In addition, almost 99% of the reference genome was accurately aligned by the assembled scaffold fragments in long lengths. The sequences of secondary metabolite biosynthetic genes and clusters, whose products are of considerable interest in fungal studies due to their potential medicinal, agricultural, and cosmetic properties, were also highly reconstructed in the assembled scaffolds. Based on these findings, we concluded that de novo genome sequencing using only SOLiD short reads is feasible and practical for molecular biological study of fungi. We also investigated the effect of filtering low quality data, library insert size, and k-mer size on the assembly performance, and recommend for the assembly use of mild filtered read data where the N50 was not so degraded and the library has an insert size of ?2.0 kb, and k-mer size 33.
ABSTRACT
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSCs self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantity of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study. RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment. In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
17 Related JoVE Articles!
Play Button
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Authors: Dilyara Cheranova, Margaret Gibson, Suman Chaudhary, Li Qin Zhang, Daniel P. Heruth, Dmitry N. Grigoryev, Shui Qing Ye.
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g. drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism1,2 . RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing a digital type datum (copy number of any transcript) and not relying on any known genomic sequence3. Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recent published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin,"4 in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provides potential new leads for therapeutic targets to those diseases. The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
4393
Play Button
Single Read and Paired End mRNA-Seq Illumina Libraries from 10 Nanograms Total RNA
Authors: Srikumar Sengupta, Jennifer M. Bolin, Victor Ruotti, Bao Kim Nguyen, James A. Thomson, Angela L. Elwell, Ron Stewart.
Institutions: Morgridge Institute for Research, University of Wisconsin, University of California.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity, a large dynamic range and allows measurement of transcript copy numbers in a sample. Illumina’s genome analyzer performs sequencing of a large number (> 107) of relatively short sequence reads (< 150 bp).The "paired end" approach, wherein a single long read is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo transcriptome assembly. One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples have been described1, 2 but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published3, 4 but require microgram amounts of starting total RNA. Here we describe a protocol for the Illumina Genome Analyzer II platform for mRNA-Seq sequencing for library preparation that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing5, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, coined "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome6 and normalized so that data from multiple runs can be compared. We report the gene expression measurement in units of transcripts per million (TPM), which is a superior measure to RPKM when comparing samples7.
Molecular Biology, Issue 56, Genetics, mRNA-Seq, Illumina-Seq, gene expression profiling, high throughput sequencing
3340
Play Button
Detection of Rare Genomic Variants from Pooled Sequencing Using SPLINTER
Authors: Francesco Vallania, Enrique Ramos, Sharon Cresci, Robi D. Mitra, Todd E. Druley.
Institutions: Washington University School of Medicine, Washington University School of Medicine, Washington University School of Medicine.
As DNA sequencing technology has markedly advanced in recent years2, it has become increasingly evident that the amount of genetic variation between any two individuals is greater than previously thought3. In contrast, array-based genotyping has failed to identify a significant contribution of common sequence variants to the phenotypic variability of common disease4,5. Taken together, these observations have led to the evolution of the Common Disease / Rare Variant hypothesis suggesting that the majority of the "missing heritability" in common and complex phenotypes is instead due to an individual's personal profile of rare or private DNA variants6-8. However, characterizing how rare variation impacts complex phenotypes requires the analysis of many affected individuals at many genomic loci, and is ideally compared to a similar survey in an unaffected cohort. Despite the sequencing power offered by today's platforms, a population-based survey of many genomic loci and the subsequent computational analysis required remains prohibitive for many investigators. To address this need, we have developed a pooled sequencing approach1,9 and a novel software package1 for highly accurate rare variant detection from the resulting data. The ability to pool genomes from entire populations of affected individuals and survey the degree of genetic variation at multiple targeted regions in a single sequencing library provides excellent cost and time savings to traditional single-sample sequencing methodology. With a mean sequencing coverage per allele of 25-fold, our custom algorithm, SPLINTER, uses an internal variant calling control strategy to call insertions, deletions and substitutions up to four base pairs in length with high sensitivity and specificity from pools of up to 1 mutant allele in 500 individuals. Here we describe the method for preparing the pooled sequencing library followed by step-by-step instructions on how to use the SPLINTER package for pooled sequencing analysis (http://www.ibridgenetwork.org/wustl/splinter). We show a comparison between pooled sequencing of 947 individuals, all of whom also underwent genome-wide array, at over 20kb of sequencing per person. Concordance between genotyping of tagged and novel variants called in the pooled sample were excellent. This method can be easily scaled up to any number of genomic loci and any number of individuals. By incorporating the internal positive and negative amplicon controls at ratios that mimic the population under study, the algorithm can be calibrated for optimal performance. This strategy can also be modified for use with hybridization capture or individual-specific barcodes and can be applied to the sequencing of naturally heterogeneous samples, such as tumor DNA.
Genetics, Issue 64, Genomics, Cancer Biology, Bioinformatics, Pooled DNA sequencing, SPLINTER, rare genetic variants, genetic screening, phenotype, high throughput, computational analysis, DNA, PCR, primers
3943
Play Button
Detecting Somatic Genetic Alterations in Tumor Specimens by Exon Capture and Massively Parallel Sequencing
Authors: Helen H Won, Sasinya N Scott, A. Rose Brannon, Ronak H Shah, Michael F Berger.
Institutions: Memorial Sloan-Kettering Cancer Center, Memorial Sloan-Kettering Cancer Center.
Efforts to detect and investigate key oncogenic mutations have proven valuable to facilitate the appropriate treatment for cancer patients. The establishment of high-throughput, massively parallel "next-generation" sequencing has aided the discovery of many such mutations. To enhance the clinical and translational utility of this technology, platforms must be high-throughput, cost-effective, and compatible with formalin-fixed paraffin embedded (FFPE) tissue samples that may yield small amounts of degraded or damaged DNA. Here, we describe the preparation of barcoded and multiplexed DNA libraries followed by hybridization-based capture of targeted exons for the detection of cancer-associated mutations in fresh frozen and FFPE tumors by massively parallel sequencing. This method enables the identification of sequence mutations, copy number alterations, and select structural rearrangements involving all targeted genes. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus conferring high sensitivity for detecting low frequency mutations.
Molecular Biology, Issue 80, Molecular Diagnostic Techniques, High-Throughput Nucleotide Sequencing, Genetics, Neoplasms, Diagnosis, Massively parallel sequencing, targeted exon sequencing, hybridization capture, cancer, FFPE, DNA mutations
50710
Play Button
Transgenic Rodent Assay for Quantifying Male Germ Cell Mutant Frequency
Authors: Jason M. O'Brien, Marc A. Beal, John D. Gingerich, Lynda Soper, George R. Douglas, Carole L. Yauk, Francesco Marchetti.
Institutions: Environmental Health Centre.
De novo mutations arise mostly in the male germline and may contribute to adverse health outcomes in subsequent generations. Traditional methods for assessing the induction of germ cell mutations require the use of large numbers of animals, making them impractical. As such, germ cell mutagenicity is rarely assessed during chemical testing and risk assessment. Herein, we describe an in vivo male germ cell mutation assay using a transgenic rodent model that is based on a recently approved Organisation for Economic Co-operation and Development (OECD) test guideline. This method uses an in vitro positive selection assay to measure in vivo mutations induced in a transgenic λgt10 vector bearing a reporter gene directly in the germ cells of exposed males. We further describe how the detection of mutations in the transgene recovered from germ cells can be used to characterize the stage-specific sensitivity of the various spermatogenic cell types to mutagen exposure by controlling three experimental parameters: the duration of exposure (administration time), the time between exposure and sample collection (sampling time), and the cell population collected for analysis. Because a large number of germ cells can be assayed from a single male, this method has superior sensitivity compared with traditional methods, requires fewer animals and therefore much less time and resources.
Genetics, Issue 90, sperm, spermatogonia, male germ cells, spermatogenesis, de novo mutation, OECD TG 488, transgenic rodent mutation assay, N-ethyl-N-nitrosourea, genetic toxicology
51576
Play Button
gDNA Enrichment by a Transposase-based Technology for NGS Analysis of the Whole Sequence of BRCA1, BRCA2, and 9 Genes Involved in DNA Damage Repair
Authors: Sandy Chevrier, Romain Boidot.
Institutions: Centre Georges-François Leclerc.
The widespread use of Next Generation Sequencing has opened up new avenues for cancer research and diagnosis. NGS will bring huge amounts of new data on cancer, and especially cancer genetics. Current knowledge and future discoveries will make it necessary to study a huge number of genes that could be involved in a genetic predisposition to cancer. In this regard, we developed a Nextera design to study 11 complete genes involved in DNA damage repair. This protocol was developed to safely study 11 genes (ATM, BARD1, BRCA1, BRCA2, BRIP1, CHEK2, PALB2, RAD50, RAD51C, RAD80, and TP53) from promoter to 3'-UTR in 24 patients simultaneously. This protocol, based on transposase technology and gDNA enrichment, gives a great advantage in terms of time for the genetic diagnosis thanks to sample multiplexing. This protocol can be safely used with blood gDNA.
Genetics, Issue 92, gDNA enrichment, Nextera, NGS, DNA damage, BRCA1, BRCA2
51902
Play Button
RNA-Seq Analysis of Differential Gene Expression in Electroporated Chick Embryonic Spinal Cord
Authors: Felipe M. Vieceli, C.Y. Irene Yan.
Institutions: Universidade de São Paulo.
In ovo electroporation of the chick neural tube is a fast and inexpensive method for identification of gene function during neural development. Genome wide analysis of differentially expressed transcripts after such an experimental manipulation has the potential to uncover an almost complete picture of the downstream effects caused by the transfected construct. This work describes a simple method for comparing transcriptomes from samples of transfected embryonic spinal cords comprising all steps between electroporation and identification of differentially expressed transcripts. The first stage consists of guidelines for electroporation and instructions for dissection of transfected spinal cord halves from HH23 embryos in ribonuclease-free environment and extraction of high-quality RNA samples suitable for transcriptome sequencing. The next stage is that of bioinformatic analysis with general guidelines for filtering and comparison of RNA-Seq datasets in the Galaxy public server, which eliminates the need of a local computational structure for small to medium scale experiments. The representative results show that the dissection methods generate high quality RNA samples and that the transcriptomes obtained from two control samples are essentially the same, an important requirement for detection of differential expression genes in experimental samples. Furthermore, one example is provided where experimental overexpression of a DNA construct can be visually verified after comparison with control samples. The application of this method may be a powerful tool to facilitate new discoveries on the function of neural factors involved in spinal cord early development.
Developmental Biology, Issue 93, chicken embryo, in ovo electroporation, spinal cord, RNA-Seq, transcriptome profiling, Galaxy workflow
51951
Play Button
An Experimental and Bioinformatics Protocol for RNA-seq Analyses of Photoperiodic Diapause in the Asian Tiger Mosquito, Aedes albopictus
Authors: Monica F. Poelchau, Xin Huang, Allison Goff, Julie Reynolds, Peter Armbruster.
Institutions: Georgetown University, The Ohio State University.
Photoperiodic diapause is an important adaptation that allows individuals to escape harsh seasonal environments via a series of physiological changes, most notably developmental arrest and reduced metabolism. Global gene expression profiling via RNA-Seq can provide important insights into the transcriptional mechanisms of photoperiodic diapause. The Asian tiger mosquito, Aedes albopictus, is an outstanding organism for studying the transcriptional bases of diapause due to its ease of rearing, easily induced diapause, and the genomic resources available. This manuscript presents a general experimental workflow for identifying diapause-induced transcriptional differences in A. albopictus. Rearing techniques, conditions necessary to induce diapause and non-diapause development, methods to estimate percent diapause in a population, and RNA extraction and integrity assessment for mosquitoes are documented. A workflow to process RNA-Seq data from Illumina sequencers culminates in a list of differentially expressed genes. The representative results demonstrate that this protocol can be used to effectively identify genes differentially regulated at the transcriptional level in A. albopictus due to photoperiodic differences. With modest adjustments, this workflow can be readily adapted to study the transcriptional bases of diapause or other important life history traits in other mosquitoes.
Genetics, Issue 93, Aedes albopictus Asian tiger mosquito, photoperiodic diapause, RNA-Seq de novo transcriptome assembly, mosquito husbandry
51961
Play Button
Genetic Manipulation in Δku80 Strains for Functional Genomic Analysis of Toxoplasma gondii
Authors: Leah M. Rommereim, Miryam A. Hortua Triana, Alejandra Falla, Kiah L. Sanders, Rebekah B. Guevara, David J. Bzik, Barbara A. Fox.
Institutions: The Geisel School of Medicine at Dartmouth.
Targeted genetic manipulation using homologous recombination is the method of choice for functional genomic analysis to obtain a detailed view of gene function and phenotype(s). The development of mutant strains with targeted gene deletions, targeted mutations, complemented gene function, and/or tagged genes provides powerful strategies to address gene function, particularly if these genetic manipulations can be efficiently targeted to the gene locus of interest using integration mediated by double cross over homologous recombination. Due to very high rates of nonhomologous recombination, functional genomic analysis of Toxoplasma gondii has been previously limited by the absence of efficient methods for targeting gene deletions and gene replacements to specific genetic loci. Recently, we abolished the major pathway of nonhomologous recombination in type I and type II strains of T. gondii by deleting the gene encoding the KU80 protein1,2. The Δku80 strains behave normally during tachyzoite (acute) and bradyzoite (chronic) stages in vitro and in vivo and exhibit essentially a 100% frequency of homologous recombination. The Δku80 strains make functional genomic studies feasible on the single gene as well as on the genome scale1-4. Here, we report methods for using type I and type II Δku80Δhxgprt strains to advance gene targeting approaches in T. gondii. We outline efficient methods for generating gene deletions, gene replacements, and tagged genes by targeted insertion or deletion of the hypoxanthine-xanthine-guanine phosphoribosyltransferase (HXGPRT) selectable marker. The described gene targeting protocol can be used in a variety of ways in Δku80 strains to advance functional analysis of the parasite genome and to develop single strains that carry multiple targeted genetic manipulations. The application of this genetic method and subsequent phenotypic assays will reveal fundamental and unique aspects of the biology of T. gondii and related significant human pathogens that cause malaria (Plasmodium sp.) and cryptosporidiosis (Cryptosporidium).
Infectious Diseases, Issue 77, Genetics, Microbiology, Infection, Medicine, Immunology, Molecular Biology, Cellular Biology, Biomedical Engineering, Bioengineering, Genomics, Parasitology, Pathology, Apicomplexa, Coccidia, Toxoplasma, Genetic Techniques, Gene Targeting, Eukaryota, Toxoplasma gondii, genetic manipulation, gene targeting, gene deletion, gene replacement, gene tagging, homologous recombination, DNA, sequencing
50598
Play Button
Protein WISDOM: A Workbench for In silico De novo Design of BioMolecules
Authors: James Smadbeck, Meghan B. Peterson, George A. Khoury, Martin S. Taylor, Christodoulos A. Floudas.
Institutions: Princeton University.
The aim of de novo protein design is to find the amino acid sequences that will fold into a desired 3-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow for the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and complexes for increased binding affinity. To disseminate these methods for broader use we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization sequence selection stage that aims at improving stability through minimization of potential energy in the sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Genetics, Issue 77, Molecular Biology, Bioengineering, Biochemistry, Biomedical Engineering, Chemical Engineering, Computational Biology, Genomics, Proteomics, Protein, Protein Binding, Computational Biology, Drug Design, optimization (mathematics), Amino Acids, Peptides, and Proteins, De novo protein and peptide design, Drug design, In silico sequence selection, Optimization, Fold specificity, Binding affinity, sequencing
50476
Play Button
Isolation of Native Soil Microorganisms with Potential for Breaking Down Biodegradable Plastic Mulch Films Used in Agriculture
Authors: Graham Bailes, Margaret Lind, Andrew Ely, Marianne Powell, Jennifer Moore-Kucera, Carol Miles, Debra Inglis, Marion Brodhagen.
Institutions: Western Washington University, Washington State University Northwestern Research and Extension Center, Texas Tech University.
Fungi native to agricultural soils that colonized commercially available biodegradable mulch (BDM) films were isolated and assessed for potential to degrade plastics. Typically, when formulations of plastics are known and a source of the feedstock is available, powdered plastic can be suspended in agar-based media and degradation determined by visualization of clearing zones. However, this approach poorly mimics in situ degradation of BDMs. First, BDMs are not dispersed as small particles throughout the soil matrix. Secondly, BDMs are not sold commercially as pure polymers, but rather as films containing additives (e.g. fillers, plasticizers and dyes) that may affect microbial growth. The procedures described herein were used for isolates acquired from soil-buried mulch films. Fungal isolates acquired from excavated BDMs were tested individually for growth on pieces of new, disinfested BDMs laid atop defined medium containing no carbon source except agar. Isolates that grew on BDMs were further tested in liquid medium where BDMs were the sole added carbon source. After approximately ten weeks, fungal colonization and BDM degradation were assessed by scanning electron microscopy. Isolates were identified via analysis of ribosomal RNA gene sequences. This report describes methods for fungal isolation, but bacteria also were isolated using these methods by substituting media appropriate for bacteria. Our methodology should prove useful for studies investigating breakdown of intact plastic films or products for which plastic feedstocks are either unknown or not available. However our approach does not provide a quantitative method for comparing rates of BDM degradation.
Microbiology, Issue 75, Plant Biology, Environmental Sciences, Agricultural Sciences, Soil Science, Molecular Biology, Cellular Biology, Genetics, Mycology, Fungi, Bacteria, Microorganisms, Biodegradable plastic, biodegradable mulch, compostable plastic, compostable mulch, plastic degradation, composting, breakdown, soil, 18S ribosomal DNA, isolation, culture
50373
Play Button
A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types
Authors: Haipeng Xing, Willey Liao, Yifan Mo, Michael Q. Zhang.
Institutions: Stony Brook University, Cold Spring Harbor Laboratory, University of Texas at Dallas.
ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by using next-sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions are revealed as peaks, which often differ dramatically in shape, depending on the target protein1. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment2. Reliably identifying these regions was the focus of our work. Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics3-5 to more rigorous statistical models, e.g. Hidden Markov Models (HMMs)6-8. We sought a solution that minimized the necessity for difficult-to-define, ad hoc parameters that often compromise resolution and lessen the intuitive usability of the tool. With respect to HMM-based methods, we aimed to curtail parameter estimation procedures and simple, finite state classifications that are often utilized. Additionally, conventional ChIPseq data analysis involves categorization of the expected read density profiles as either punctate or diffuse followed by subsequent application of the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model, which can capably address the entire spectrum of data types. To meet these objectives, we first constructed a statistical framework that naturally modeled ChIPseq data structures using a cutting edge advance in HMMs9, which utilizes only explicit formulas-an innovation crucial to its performance advantages. More sophisticated then heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which further define segments of enrichment. Our analysis revealed how our Bayesian Change Point (BCP) algorithm had a reduced computational complexity-evidenced by an abridged run time and memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters. This illustrated both its versatility and ease of use. Consequently, we believe it can be implemented readily across broad ranges of data types and end users in a manner that is easily compared and contrasted, making it a great tool for ChIPseq data analysis that can aid in collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor10,11 and epigenetic data12 to illustrate its usefulness.
Genetics, Issue 70, Bioinformatics, Genomics, Molecular Biology, Cellular Biology, Immunology, Chromatin immunoprecipitation, ChIP-Seq, histone modifications, segmentation, Bayesian, Hidden Markov Models, epigenetics
4273
Play Button
Isolation and Genome Analysis of Single Virions using 'Single Virus Genomics'
Authors: Lisa Zeigler Allen, Thomas Ishoey, Mark A. Novotny, Jeffrey S. McLean, Roger S. Lasken, Shannon J. Williamson.
Institutions: The J. Craig Venter Institute.
Whole genome amplification and sequencing of single microbial cells enables genomic characterization without the need of cultivation 1-3. Viruses, which are ubiquitous and the most numerous entities on our planet 4 and important in all environments 5, have yet to be revealed via similar approaches. Here we describe an approach for isolating and characterizing the genomes of single virions called 'Single Virus Genomics' (SVG). SVG utilizes flow cytometry to isolate individual viruses and whole genome amplification to obtain high molecular weight genomic DNA (gDNA) that can be used in subsequent sequencing reactions.
Genetics, Issue 75, Microbiology, Immunology, Virology, Molecular Biology, Environmental Sciences, Genomics, environmental genomics, Single virus, single virus genomics, SVG, whole genome amplification, flow cytometry, viral ecology, virion, genome analysis, DNA, PCR, sequencing
3899
Play Button
Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation
Authors: Yufen Goh, Melissa J. Fullwood, Huay Mei Poh, Su Qin Peh, Chin Thing Ong, Jingyao Zhang, Xiaoan Ruan, Yijun Ruan.
Institutions: Agency for Science, Technology and Research, Singapore, A*STAR-Duke-NUS Neuroscience Research Partnership, Singapore, National University of Singapore, Singapore.
Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14. Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8. We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video.
Genetics, Issue 62, ChIP, ChIA-PET, Chromatin Interactions, Genomics, Next-Generation Sequencing
3770
Play Button
Quantitative and Automated High-throughput Genome-wide RNAi Screens in C. elegans
Authors: Barbara Squiban, Jérôme Belougne, Jonathan Ewbank, Olivier Zugasti.
Institutions: Université de la Méditerranée.
RNA interference is a powerful method to understand gene function, especially when conducted at a whole-genome scale and in a quantitative context. In C. elegans, gene function can be knocked down simply and efficiently by feeding worms with bacteria expressing a dsRNA corresponding to a specific gene 1. While the creation of libraries of RNAi clones covering most of the C. elegans genome 2,3 opened the way for true functional genomic studies (see for example 4-7), most established methods are laborious. Moy and colleagues have developed semi-automated protocols that facilitate genome-wide screens 8. The approach relies on microscopic imaging and image analysis. Here we describe an alternative protocol for a high-throughput genome-wide screen, based on robotic handling of bacterial RNAi clones, quantitative analysis using the COPAS Biosort (Union Biometrica (UBI)), and an integrated software: the MBioLIMS (Laboratory Information Management System from Modul-Bio) a technology that provides increased throughput for data management and sample tracking. The method allows screens to be conducted on solid medium plates. This is particularly important for some studies, such as those addressing host-pathogen interactions in C. elegans, since certain microbes do not efficiently infect worms in liquid culture. We show how the method can be used to quantify the importance of genes in anti-fungal innate immunity in C. elegans. In this case, the approach relies on the use of a transgenic strain carrying an epidermal infection-inducible fluorescent reporter gene, with GFP under the control of the promoter of the antimicrobial peptide gene nlp 29 and a red fluorescent reporter that is expressed constitutively in the epidermis. The latter provides an internal control for the functional integrity of the epidermis and nonspecific transgene silencing9. When control worms are infected by the fungus they fluoresce green. Knocking down by RNAi a gene required for nlp 29 expression results in diminished fluorescence after infection. Currently, this protocol allows more than 3,000 RNAi clones to be tested and analyzed per week, opening the possibility of screening the entire genome in less than 2 months.
Molecular Biology, Issue 60, C. elegans, fluorescent reporter, Biosort, LIMS, innate immunity, Drechmeria coniospora
3448
Play Button
Hi-C: A Method to Study the Three-dimensional Architecture of Genomes.
Authors: Nynke L. van Berkum, Erez Lieberman-Aiden, Louise Williams, Maxim Imakaev, Andreas Gnirke, Leonid A. Mirny, Job Dekker, Eric S. Lander.
Institutions: University of Massachusetts Medical School, Broad Institute of Harvard and Massachusetts Institute of Technology, Massachusetts Institute of Technology, Harvard University , Harvard University , Massachusetts Institute of Technology, Harvard Medical School, Massachusetts Institute of Technology.
The three-dimensional folding of chromosomes compartmentalizes the genome and and can bring distant functional elements, such as promoters and enhancers, into close spatial proximity 2-6. Deciphering the relationship between chromosome organization and genome activity will aid in understanding genomic processes, like transcription and replication. However, little is known about how chromosomes fold. Microscopy is unable to distinguish large numbers of loci simultaneously or at high resolution. To date, the detection of chromosomal interactions using chromosome conformation capture (3C) and its subsequent adaptations required the choice of a set of target loci, making genome-wide studies impossible 7-10. We developed Hi-C, an extension of 3C that is capable of identifying long range interactions in an unbiased, genome-wide fashion. In Hi-C, cells are fixed with formaldehyde, causing interacting loci to be bound to one another by means of covalent DNA-protein cross-links. When the DNA is subsequently fragmented with a restriction enzyme, these loci remain linked. A biotinylated residue is incorporated as the 5' overhangs are filled in. Next, blunt-end ligation is performed under dilute conditions that favor ligation events between cross-linked DNA fragments. This results in a genome-wide library of ligation products, corresponding to pairs of fragments that were originally in close proximity to each other in the nucleus. Each ligation product is marked with biotin at the site of the junction. The library is sheared, and the junctions are pulled-down with streptavidin beads. The purified junctions can subsequently be analyzed using a high-throughput sequencer, resulting in a catalog of interacting fragments. Direct analysis of the resulting contact matrix reveals numerous features of genomic organization, such as the presence of chromosome territories and the preferential association of small gene-rich chromosomes. Correlation analysis can be applied to the contact matrix, demonstrating that the human genome is segregated into two compartments: a less densely packed compartment containing open, accessible, and active chromatin and a more dense compartment containing closed, inaccessible, and inactive chromatin regions. Finally, ensemble analysis of the contact matrix, coupled with theoretical derivations and computational simulations, revealed that at the megabase scale Hi-C reveals features consistent with a fractal globule conformation.
Cellular Biology, Issue 39, Chromosome conformation capture, chromatin structure, Illumina Paired End sequencing, polymer physics.
1869
Play Button
A Strategy to Identify de Novo Mutations in Common Disorders such as Autism and Schizophrenia
Authors: Gauthier Julie, Fadi F. Hamdan, Guy A. Rouleau.
Institutions: Universite de Montreal, Universite de Montreal, Universite de Montreal.
There are several lines of evidence supporting the role of de novo mutations as a mechanism for common disorders, such as autism and schizophrenia. First, the de novo mutation rate in humans is relatively high, so new mutations are generated at a high frequency in the population. However, de novo mutations have not been reported in most common diseases. Mutations in genes leading to severe diseases where there is a strong negative selection against the phenotype, such as lethality in embryonic stages or reduced reproductive fitness, will not be transmitted to multiple family members, and therefore will not be detected by linkage gene mapping or association studies. The observation of very high concordance in monozygotic twins and very low concordance in dizygotic twins also strongly supports the hypothesis that a significant fraction of cases may result from new mutations. Such is the case for diseases such as autism and schizophrenia. Second, despite reduced reproductive fitness1 and extremely variable environmental factors, the incidence of some diseases is maintained worldwide at a relatively high and constant rate. This is the case for autism and schizophrenia, with an incidence of approximately 1% worldwide. Mutational load can be thought of as a balance between selection for or against a deleterious mutation and its production by de novo mutation. Lower rates of reproduction constitute a negative selection factor that should reduce the number of mutant alleles in the population, ultimately leading to decreased disease prevalence. These selective pressures tend to be of different intensity in different environments. Nonetheless, these severe mental disorders have been maintained at a constant relatively high prevalence in the worldwide population across a wide range of cultures and countries despite a strong negative selection against them2. This is not what one would predict in diseases with reduced reproductive fitness, unless there was a high new mutation rate. Finally, the effects of paternal age: there is a significantly increased risk of the disease with increasing paternal age, which could result from the age related increase in paternal de novo mutations. This is the case for autism and schizophrenia3. The male-to-female ratio of mutation rate is estimated at about 4–6:1, presumably due to a higher number of germ-cell divisions with age in males. Therefore, one would predict that de novo mutations would more frequently come from males, particularly older males4. A high rate of new mutations may in part explain why genetic studies have so far failed to identify many genes predisposing to complexes diseases genes, such as autism and schizophrenia, and why diseases have been identified for a mere 3% of genes in the human genome. Identification for de novo mutations as a cause of a disease requires a targeted molecular approach, which includes studying parents and affected subjects. The process for determining if the genetic basis of a disease may result in part from de novo mutations and the molecular approach to establish this link will be illustrated, using autism and schizophrenia as examples.
Medicine, Issue 52, de novo mutation, complex diseases, schizophrenia, autism, rare variations, DNA sequencing
2534
Copyright © JoVE 2006-2015. All Rights Reserved.
Policies | License Agreement | ISSN 1940-087X
simple hit counter

What is Visualize?

JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.

How does it work?

We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.

Video X seems to be unrelated to Abstract Y...

In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.