In JoVE (1)
Other Publications (32)
Articles by Andreas Gnirke in JoVE
Hi-C: A Method to Study the Three-dimensional Architecture of Genomes.
Nynke L. van Berkum*1, Erez Lieberman-Aiden*2,3,4,5, Louise Williams*2, Maxim Imakaev6, Andreas Gnirke2, Leonid A. Mirny3,6, Job Dekker1, Eric S. Lander2,7,8
1Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 2Broad Institute of Harvard and Massachusetts Institute of Technology, 3Division of Health Sciences and Technology, Massachusetts Institute of Technology, 4Program for Evolutionary Dynamics, Department of Organismic and Evolutionary Biology, Department of Mathematics, Harvard University, 5Department of Applied Mathematics, Harvard University, 6Department of Physics, Massachusetts Institute of Technology, 7Department of Systems Biology, Harvard Medical School, 8Department of Biology, Massachusetts Institute of Technology
The Hi-C method allows unbiased, genome-wide identification of chromatin interactions (1). Hi-C couples proximity ligation and massively parallel sequencing. The resulting data can be used to study genomic architecture at multiple scales: initial results identified features such as chromosome territories, segregation of open and closed chromatin, and chromatin structure at the megabase scale.
Other articles by Andreas Gnirke on PubMed
Assessing the Impact of Comparative Genomic Sequence Data on the Functional Annotation of the Drosophila Genome
Genome Biology. 2002 | Pubmed ID: 12537575
It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined.
DNA Sequence and Analysis of Human Chromosome 18
Nature. Sep, 2005 | Pubmed ID: 16177791
Chromosome 18 appears to have the lowest gene density of any human chromosome and is one of only three chromosomes for which trisomic individuals survive to term. There are also a number of genetic disorders stemming from chromosome 18 trisomy and aneuploidy. Here we report the finished sequence and gene annotation of human chromosome 18, which will allow a better understanding of the normal and disease biology of this chromosome. Despite the low density of protein-coding genes on chromosome 18, we find that the proportion of non-protein-coding sequences evolutionarily conserved among mammals is close to the genome-wide average. Extending this analysis to the entire human genome, we find that the density of conserved non-protein-coding sequences is largely uncorrelated with gene density. This has important implications for the nature and roles of non-protein-coding sequence elements.
Reduced Representation Bisulfite Sequencing for Comparative High-resolution DNA Methylation Analysis
Nucleic Acids Research. 2005 | Pubmed ID: 16224102
We describe a large-scale random approach termed reduced representation bisulfite sequencing (RRBS) for analyzing and comparing genomic methylation patterns. BglII restriction fragments were size-selected to 500-600 bp, equipped with adapters, treated with bisulfite, PCR amplified, cloned and sequenced. We constructed RRBS libraries from murine ES cells and from ES cells lacking DNA methyltransferases Dnmt3a and 3b and with knocked-down (kd) levels of Dnmt1 (Dnmt[1(kd),3a-/-,3b-/-]). Sequencing of 960 RRBS clones from Dnmt[1(kd),3a-/-,3b-/-] cells generated 343 kb of non-redundant bisulfite sequence covering 66212 cytosines in the genome. All but 38 cytosines had been converted to uracil indicating a conversion rate of >99.9%. Of the remaining cytosines 35 were found in CpG and 3 in CpT dinucleotides. Non-CpG methylation was >250-fold reduced compared with wild-type ES cells, consistent with a role for Dnmt3a and/or Dnmt3b in CpA and CpT methylation. Closer inspection revealed neither a consensus sequence around the methylated sites nor evidence for clustering of residual methylation in the genome. Our findings indicate random loss rather than specific maintenance of methylation in Dnmt[1(kd),3a-/-,3b-/-] cells. Near-complete bisulfite conversion and largely unbiased representation of RRBS libraries suggest that random shotgun bisulfite sequencing can be scaled to a genome-wide approach.
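The conversion rate quoted in this abstract follows from the two reported counts. A minimal sketch of the arithmetic in Python, using only the numbers given above (the function name is illustrative, not from the paper):

```python
def conversion_rate(total_cytosines: int, unconverted: int) -> float:
    """Fraction of cytosines converted to uracil by bisulfite treatment."""
    if total_cytosines <= 0:
        raise ValueError("total_cytosines must be positive")
    return 1.0 - unconverted / total_cytosines


# Counts reported in the abstract: 66212 cytosines covered, 38 left unconverted.
rate = conversion_rate(66212, 38)
print(f"conversion rate: {rate:.4%}")
```

The result is just above 99.94%, consistent with the ">99.9%" stated in the abstract.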
Insights from the Genome of the Biotrophic Fungal Plant Pathogen Ustilago maydis
Nature. Nov, 2006 | Pubmed ID: 17080091
Ustilago maydis is a ubiquitous pathogen of maize and a well-established model organism for the study of plant-microbe interactions. This basidiomycete fungus does not use aggressive virulence strategies to kill its host. U. maydis belongs to the group of biotrophic parasites (the smuts) that depend on living tissue for proliferation and development. Here we report the genome sequence for a member of this economically important group of biotrophic fungi. The 20.5-million-base U. maydis genome assembly contains 6,902 predicted protein-encoding genes and lacks pathogenicity signatures found in the genomes of aggressive pathogenic fungi, for example a battery of cell-wall-degrading enzymes. However, we detected unexpected genomic features responsible for the pathogenicity of this organism. Specifically, we found 12 clusters of genes encoding small secreted proteins with unknown function. A significant fraction of these genes exists in small gene families. Expression analysis showed that most of the genes contained in these clusters are regulated together and induced in infected tissue. Deletion of individual clusters altered the virulence of U. maydis in five cases, ranging from a complete lack of symptoms to hypervirulence. Despite years of research into the mechanism of pathogenicity in U. maydis, no 'true' virulence factors had been previously identified. Thus, the discovery of the secreted protein gene clusters and the functional demonstration of their decisive role in the infection process illuminate previously unknown mechanisms of pathogenicity operating in biotrophic fungi. Genomic analysis is, similarly, likely to open up new avenues for the discovery of virulence determinants in other pathogens.
Systematic Discovery of Regulatory Motifs in Conserved Regions of the Human Genome, Including Thousands of CTCF Insulator Sites
Proceedings of the National Academy of Sciences of the United States of America. Apr, 2007 | Pubmed ID: 17442748
Conserved noncoding elements (CNEs) constitute the majority of sequences under purifying selection in the human genome, yet their function remains largely unknown. Experimental evidence suggests that many of these elements play regulatory roles, but little is known about regulatory motifs contained within them. Here we describe a systematic approach to discover and characterize regulatory motifs within mammalian CNEs by searching for long motifs (12-22 nt) with significant enrichment in CNEs and studying their biochemical and genomic properties. Our analysis identifies 233 long motifs (LMs), matching a total of approximately 60,000 conserved instances across the human genome. These motifs include 16 previously known regulatory elements, such as the histone 3'-UTR motif and the neuron-restrictive silencer element, as well as striking examples of novel functional elements. The most highly enriched motif (LM1) corresponds to the X-box motif known from yeast and nematode. We show that it is bound by the RFX1 protein and identify thousands of conserved motif instances, suggesting a broad role for the RFX family in gene regulation. A second group of motifs (LM2*) does not match any previously known motif. We demonstrate by biochemical and computational methods that it defines a binding site for the CTCF protein, which is involved in insulator function to limit the spread of gene activation. We identify nearly 15,000 conserved sites that likely serve as insulators, and we show that nearby genes separated by predicted CTCF sites show markedly reduced correlation in gene expression. These sites may thus partition the human genome into domains of expression.
The Genome of the Model Beetle and Pest Tribolium castaneum
Nature. Apr, 2008 | Pubmed ID: 18362917
Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.
Genome-scale DNA Methylation Maps of Pluripotent and Differentiated Cells
Nature. Aug, 2008 | Pubmed ID: 18600261
DNA methylation is essential for normal development and has been implicated in many pathologies including cancer. Our knowledge about the genome-wide distribution of DNA methylation, how it changes during cellular differentiation and how it relates to histone methylation and other chromatin modifications in mammals remains limited. Here we report the generation and analysis of genome-scale DNA methylation profiles at nucleotide resolution in mammalian cells. Using high-throughput reduced representation bisulphite sequencing and single-molecule-based sequencing, we generated DNA methylation maps covering most CpG islands, and a representative sampling of conserved non-coding elements, transposons and other genomic features, for mouse embryonic stem cells, embryonic-stem-cell-derived and primary neural cells, and eight other primary tissues. Several key findings emerge from the data. First, DNA methylation patterns are better correlated with histone methylation patterns than with the underlying genome sequence context. Second, methylation of CpGs is a dynamic epigenetic mark that undergoes extensive changes during cellular differentiation, particularly in regulatory regions outside of core promoters. Third, analysis of embryonic-stem-cell-derived and primary cells reveals that 'weak' CpG islands associated with a specific set of developmentally regulated genes undergo aberrant hypermethylation during extended proliferation in vitro, in a pattern reminiscent of that reported in some primary tumours. More generally, the results establish reduced representation bisulphite sequencing as a powerful technology for epigenetic profiling of cell populations relevant to developmental biology, cancer and regenerative medicine.
The Maternal-effect, Selfish Genetic Element Medea is Associated with a Composite Tc1 Transposon
Proceedings of the National Academy of Sciences of the United States of America. Jul, 2008 | Pubmed ID: 18621706
Maternal-Effect Dominant Embryonic Arrest ("Medea") factors are selfish nuclear elements that combine maternal-lethal and zygotic-rescue activities to gain a postzygotic survival advantage. We show that Medea(1) activity in Tribolium castaneum is associated with a composite Tc1 transposon inserted just downstream of the neurotransmitter reuptake symporter bloated tubules (blot), whose Drosophila ortholog has both maternal and zygotic functions. The 21.5-kb insertion contains defective copies of elongation initiation factor-3, ATP synthase subunit C, and an RNaseD-related gene, as well as a potentially intact copy of a prokaryotic DUF1703 gene. Sequence comparisons suggest that the current distribution of Medea(1) reflects global emanation after a single transpositional event in recent evolutionary time. The Medea system in Tribolium represents an unusual type of intragenomic conflict and could provide a useful vehicle for driving desirable genes into populations.
Solution Hybrid Selection with Ultra-long Oligonucleotides for Massively Parallel Targeted Sequencing
Nature Biotechnology. Feb, 2009 | Pubmed ID: 19182786
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
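The uniformity figure in this abstract (the share of target bases with at least half the mean coverage) is simple to compute from per-base depths. A minimal sketch in Python; the function name and the toy depths are illustrative assumptions, not data or code from the study:

```python
from statistics import mean


def fraction_at_least_half_mean(coverage):
    """Fraction of target bases whose depth is >= half the mean depth."""
    if not coverage:
        raise ValueError("coverage must be non-empty")
    threshold = 0.5 * mean(coverage)
    return sum(1 for depth in coverage if depth >= threshold) / len(coverage)


# Toy per-base depths, made up for illustration only.
toy_depths = [0, 2, 10, 12, 8, 30, 1, 9, 11, 7]
uniformity = fraction_at_least_half_mean(toy_depths)
print(uniformity)
```

For the toy depths the mean is 9, so the threshold is 4.5 and seven of the ten bases pass.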
Ab Initio Construction of a Eukaryotic Transcriptome by Massively Parallel mRNA Sequencing
Proceedings of the National Academy of Sciences of the United States of America. Mar, 2009 | Pubmed ID: 19208812
Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.
High-throughput Bisulfite Sequencing in Mammalian Genomes
Methods (San Diego, Calif.). Jul, 2009 | Pubmed ID: 19442738
DNA methylation is a critical epigenetic mark that is essential for mammalian development and aberrant in many diseases including cancer. Over the past decade multiple methods have been developed and applied to characterize its genome-wide distribution. Of these, reduced representation bisulfite sequencing (RRBS) generates nucleotide resolution DNA methylation bisulfite sequencing libraries that enrich for CpG-dense regions by methylation-insensitive restriction digestion. Here we provide an extensive, optimized protocol for generating RRBS libraries and discuss the power of this strategy for methylome profiling. We include information on sequence analysis and the relative coverage over genomic regions of interest for a representative mouse MspI generated RRBS library. Contemporary sequencing and array-based technologies are compared against sample throughput and coverage, highlighting the variety of options available to investigate methylation on the genome-scale.
ALLPATHS 2: Small Genomes Assembled Accurately and with High Continuity from Short Paired Reads
Genome Biology. 2009 | Pubmed ID: 19796385
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
Comprehensive Mapping of Long-range Interactions Reveals Folding Principles of the Human Genome
Science (New York, N.Y.). Oct, 2009 | Pubmed ID: 19815776
We describe Hi-C, a method that probes the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. We constructed spatial proximity maps of the human genome with Hi-C at a resolution of 1 megabase. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
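A spatial proximity map at 1-megabase resolution can be pictured as a table of counts, one per pair of 1-Mb genomic bins. The following Python sketch shows that binning step only, assuming read pairs have already been aligned; the input layout, function name and toy coordinates are assumptions for illustration, and the normalization used in the paper is not reproduced here:

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 megabase, the resolution quoted in the abstract


def contact_counts(read_pairs, bin_size=BIN_SIZE):
    """Count ligation products per unordered pair of genomic bins.

    read_pairs: iterable of ((chrom, pos), (chrom, pos)) tuples, one per
    sequenced ligation junction (hypothetical input layout).
    """
    counts = Counter()
    for (chrom_a, pos_a), (chrom_b, pos_b) in read_pairs:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        # Sort so that (a, b) and (b, a) land in the same matrix cell.
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Toy read pairs, made up for illustration only.
toy_pairs = [
    (("chr1", 500_000), ("chr1", 2_300_000)),
    (("chr1", 2_900_000), ("chr1", 100_000)),
    (("chr2", 10), ("chr1", 1_500_000)),
]
contacts = contact_counts(toy_pairs)
print(contacts)
```

A sparse counter is used instead of a dense matrix because most bin pairs in a genome-wide map receive few or no reads.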
Targeted Next-generation Sequencing of a Cancer Transcriptome Enhances Detection of Sequence Variants and Novel Fusion Transcripts
Genome Biology. 2009 | Pubmed ID: 19835606
Targeted RNA-Seq combines next-generation sequencing with capture of sequences from a relevant subset of a transcriptome. When tested by capturing sequences from a tumor cDNA library by hybridization to oligonucleotide probes specific for 467 cancer-related genes, this method showed high selectivity, improved mutation detection enabling discovery of novel chimeric transcripts, and provided RNA expression data. Thus, targeted RNA-Seq produces an enhanced view of the molecular state of a set of "high interest" genes.
ZBED6, a Novel Transcription Factor Derived from a Domesticated DNA Transposon Regulates IGF2 Expression and Muscle Growth
PLoS Biology. Dec, 2009 | Pubmed ID: 20016685
A single nucleotide substitution in intron 3 of IGF2 in pigs abrogates a binding site for a repressor and leads to a 3-fold up-regulation of IGF2 in skeletal muscle. The mutation has major effects on muscle growth, size of the heart, and fat deposition. Here, we have identified the repressor and find that the protein, named ZBED6, is previously unknown, specific for placental mammals, and derived from an exapted DNA transposon. Silencing of Zbed6 in mouse C2C12 myoblasts affected Igf2 expression, cell proliferation, wound healing, and myotube formation. Chromatin immunoprecipitation (ChIP) sequencing using C2C12 cells identified about 2,500 ZBED6 binding sites in the genome, and the deduced consensus motif gave a perfect match with the established binding site in Igf2. Genes associated with ZBED6 binding sites showed a highly significant enrichment for certain Gene Ontology classifications, including development and transcriptional regulation. The phenotypic effects in mutant pigs and ZBED6-silenced C2C12 myoblasts, the extreme sequence conservation, its nucleolar localization, the broad tissue distribution, and the many target genes with essential biological functions suggest that ZBED6 is an important transcription factor in placental mammals, affecting development, cell proliferation, and growth.
Genome-scale DNA Methylation Mapping of Clinical Samples at Single-nucleotide Resolution
Nature Methods. Feb, 2010 | Pubmed ID: 20062050
Bisulfite sequencing measures absolute levels of DNA methylation at single-nucleotide resolution, providing a robust platform for molecular diagnostics. We optimized bisulfite sequencing for genome-scale analysis of clinical samples: here we outline how restriction digestion targets bisulfite sequencing to hotspots of epigenetic regulation and describe a statistical method for assessing significance of altered DNA methylation patterns. Thirty nanograms of DNA was sufficient for genome-scale analysis and our protocol worked well on formalin-fixed, paraffin-embedded samples.
Integrative Analysis of the Melanoma Transcriptome
Genome Research. Apr, 2010 | Pubmed ID: 20179022
Global studies of transcript structure and abundance in cancer cells enable the systematic discovery of aberrations that contribute to carcinogenesis, including gene fusions, alternative splice isoforms, and somatic mutations. We developed a systematic approach to characterize the spectrum of cancer-associated mRNA alterations through integration of transcriptomic and structural genomic data, and we applied this approach to generate new insights into melanoma biology. Using paired-end massively parallel sequencing of cDNA (RNA-seq) together with analyses of high-resolution chromosomal copy number data, we identified 11 novel melanoma gene fusions produced by underlying genomic rearrangements, as well as 12 novel readthrough transcripts. We mapped these chimeric transcripts to base-pair resolution and traced them to their genomic origins using matched chromosomal copy number information. We also used these data to discover and validate base-pair mutations that accumulated in these melanomas, revealing a surprisingly high rate of somatic mutation and lending support to the notion that point mutations constitute the major driver of melanoma progression. Taken together, these results may indicate new avenues for target discovery in melanoma, while also providing a template for large-scale transcriptome studies across many tumor types.
Ab Initio Reconstruction of Cell Type-specific Transcriptomes in Mouse Reveals the Conserved Multi-exonic Structure of LincRNAs
Nature Biotechnology. May, 2010 | Pubmed ID: 20436462
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5' start sites, 3' ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.
Targeted Exon Sequencing by In-solution Hybrid Selection
Current Protocols in Human Genetics / Editorial Board, Jonathan L. Haines ... [et al.]. Jul, 2010 | Pubmed ID: 20582916
This unit describes a protocol for the targeted enrichment of exons from randomly sheared genomic DNA libraries using an in-solution hybrid selection approach for sequencing on an Illumina Genome Analyzer II. The steps for designing and ordering a hybrid selection oligo pool are reviewed, as are critical steps for performing the preparation and hybrid selection of an Illumina paired-end library. Critical parameters, performance metrics, and analysis workflow are discussed.
Comprehensive Comparative Analysis of Strand-specific RNA Sequencing Methods
Nature Methods. Sep, 2010 | Pubmed ID: 20711195
Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.
Strand-specific RNA Sequencing Reveals Extensive Regulated Long Antisense Transcripts That Are Conserved Across Yeast Species
Genome Biology. 2010 | Pubmed ID: 20796282
Recent studies in budding yeast have shown that antisense transcription occurs at many loci. However, the functional role of antisense transcripts has been demonstrated only in a few cases and it has been suggested that most antisense transcripts may result from promiscuous bi-directional transcription in a dense genome.
Quantitative Comparison of Genome-wide DNA Methylation Mapping Technologies
Nature Biotechnology. Oct, 2010 | Pubmed ID: 20852634
DNA methylation plays a key role in regulating eukaryotic gene expression. Although mitotically heritable and stable over time, patterns of DNA methylation frequently change in response to cell differentiation, disease and environmental influences. Several methods have been developed to map DNA methylation on a genomic scale. Here, we benchmark four of these approaches by analyzing two human embryonic stem cell lines derived from genetically unrelated embryos and a matched pair of colon tumor and adjacent normal colon tissue obtained from the same donor. Our analysis reveals that methylated DNA immunoprecipitation sequencing (MeDIP-seq), methylated DNA capture by affinity purification (MethylCap-seq), reduced representation bisulfite sequencing (RRBS) and the Infinium HumanMethylation27 assay all produce accurate DNA methylation data. However, these methods differ in their ability to detect differentially methylated regions between pairs of samples. We highlight strengths and weaknesses of the four methods and give practical recommendations for the design of epigenomic case-control studies.
Comparison of Sequencing-based Methods to Profile DNA Methylation and Identification of Monoallelic Epigenetic Modifications
Nature Biotechnology. Oct, 2010 | Pubmed ID: 20852635
Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
High-quality Draft Assemblies of Mammalian Genomes from Massively Parallel Sequence Data
Proceedings of the National Academy of Sciences of the United States of America. Jan, 2011 | Pubmed ID: 21187386
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
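The N50 statistic cited for scaffold sizes is the length L such that scaffolds of length L or longer contain at least half of all assembled bases. A minimal Python sketch of that standard definition; the function name and toy lengths are illustrative and unrelated to the ALLPATHS-LG code:

```python
def n50(lengths):
    """Largest length L such that sequences of length >= L hold >= half the total bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy scaffold lengths, made up for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Here the total is 300, and the two longest scaffolds (80 + 70) reach half of it, so the N50 is 70.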
Reprogramming Factor Expression Initiates Widespread Targeted Chromatin Remodeling
Cell Stem Cell. Jan, 2011 | Pubmed ID: 21211784
Despite rapid progress in characterizing transcription factor-driven reprogramming of somatic cells to an induced pluripotent stem cell (iPSC) state, many mechanistic questions still remain. To gain insight into the earliest events in the reprogramming process, we systematically analyzed the transcriptional and epigenetic changes that occur during early factor induction after discrete numbers of divisions. We observed rapid, genome-wide changes in the euchromatic histone modification, H3K4me2, at more than a thousand loci including large subsets of pluripotency-related or developmentally regulated gene promoters and enhancers. In contrast, patterns of the repressive H3K27me3 modification remained largely unchanged except for focused depletion specifically at positions where H3K4 methylation is gained. These chromatin regulatory events precede transcriptional changes within the corresponding loci. Our data provide evidence for an early, organized, and population-wide epigenetic response to ectopic reprogramming factors that clarifies the temporal order through which somatic identity is reset during reprogramming.
Reference Maps of Human ES and iPS Cell Variation Enable High-throughput Characterization of Pluripotent Cell Lines
Cell. Feb, 2011 | Pubmed ID: 21295703
The developmental potential of human pluripotent stem cells suggests that they can produce disease-relevant cell types for biomedical research. However, substantial variation has been reported among pluripotent cell lines, which could affect their utility and clinical safety. Such cell-line-specific differences must be better understood before one can confidently use embryonic stem (ES) or induced pluripotent stem (iPS) cells in translational research. Toward this goal we have established genome-wide reference maps of DNA methylation and gene expression for 20 previously derived human ES lines and 12 human iPS cell lines, and we have measured the in vitro differentiation propensity of these cell lines. This resource enabled us to assess the epigenetic and transcriptional similarity of ES and iPS cells and to predict the differentiation efficiency of individual cell lines. The combination of assays yields a scorecard for quick and comprehensive characterization of pluripotent cell lines.
Analyzing and Minimizing PCR Amplification Bias in Illumina Sequencing Libraries
Genome Biology. 2011 | Pubmed ID: 21338519
Despite the ever-increasing output of Illumina sequencing data, loci with extreme base compositions are often under-represented or absent. To evaluate sources of base-composition bias, we traced genomic sequences ranging from 6% to 90% GC through the process by quantitative PCR. We identified PCR during library preparation as a principal source of bias and optimized the conditions. Our improved protocol significantly reduces amplification bias and minimizes the previously severe effects of PCR instrument and temperature ramp rate.
Preparation of Reduced Representation Bisulfite Sequencing Libraries for Genome-scale DNA Methylation Profiling
Nature Protocols. Apr, 2011 | Pubmed ID: 21412275
Genome-wide mapping of 5-methylcytosine is of broad interest to many fields of biology and medicine. A variety of methods have been developed, and several have recently been advanced to genome-wide scale using arrays and next-generation sequencing approaches. We have previously reported reduced representation bisulfite sequencing (RRBS), a bisulfite-based protocol that enriches CG-rich parts of the genome, thereby reducing the amount of sequencing required while capturing the majority of promoters and other relevant genomic regions. The approach provides single-nucleotide resolution, is highly sensitive and provides quantitative DNA methylation measurements. This protocol should enable any standard molecular biology laboratory to generate RRBS libraries of high quality. Briefly, purified genomic DNA is digested by the methylation-insensitive restriction enzyme MspI to generate short fragments that contain CpG dinucleotides at the ends. After end-repair, A-tailing and ligation to methylated Illumina adapters, the CpG-rich DNA fragments (40-220 bp) are size selected, subjected to bisulfite conversion, PCR amplified and end sequenced on an Illumina Genome Analyzer. Note that alignment and analysis of RRBS sequencing reads are not covered in this protocol. The extremely low input requirements (10-300 ng), the applicability of the protocol to formalin-fixed and paraffin-embedded samples, and the technique's single-nucleotide resolution extend RRBS to a wide range of biological and clinical samples and research applications. The entire process of RRBS library construction takes ∼9 d.
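The digestion and size-selection steps described in the abstract above can be illustrated in silico. The sketch below assumes the standard MspI recognition site CCGG with the cut after the first C (C^CGG), and it only reports fragments bounded by two MspI sites. The function name `mspi_fragments`, its parameters, and the toy sequence are hypothetical; the protocol itself is a wet-lab procedure and does not include this code.

```python
import re


def mspi_fragments(sequence, min_len=40, max_len=220):
    """Simulate MspI digestion and size selection on one DNA strand.

    Assumes MspI cuts C^CGG, so each internal fragment starts with CGG
    and ends with C. Pieces at the two ends of the input sequence, which
    are flanked by only one MspI site, are ignored in this sketch.
    """
    seq = sequence.upper()
    # The cut falls one base into every CCGG site.
    cuts = [match.start() + 1 for match in re.finditer(r"(?=CCGG)", seq)]
    fragments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    # Keep only fragments inside the size-selection window (inclusive).
    return [frag for frag in fragments if min_len <= len(frag) <= max_len]


# Toy sequence (not real genomic data): two MspI sites 54 bp apart.
toy_sequence = "AACCGG" + "A" * 50 + "CCGG" + "TT"
selected = mspi_fragments(toy_sequence)
```

The default window of 40 to 220 bp mirrors the fragment range quoted in the abstract; in practice, size selection is performed on adapter-ligated fragments, so gel sizes differ from the insert sizes modeled here.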
Metabolic Labeling of RNA Uncovers Principles of RNA Production and Degradation Dynamics in Mammalian Cells
Nature Biotechnology. May, 2011 | Pubmed ID: 21516085
Cellular RNA levels are determined by the interplay of RNA production, processing and degradation. However, because most studies of RNA regulation do not distinguish the separate contributions of these processes, little is known about how they are temporally integrated. Here we combine metabolic labeling of RNA at high temporal resolution with advanced RNA quantification and computational modeling to estimate RNA transcription and degradation rates during the response of mouse dendritic cells to lipopolysaccharide. We find that changes in transcription rates determine the majority of temporal changes in RNA levels, but that changes in degradation rates are important for shaping sharp 'peaked' responses. We used sequencing of the newly transcribed RNA population to estimate temporally constant RNA processing and degradation rates genome wide. Degradation rates vary significantly between genes and contribute to the observed differences in the dynamic response. Certain transcripts, including those encoding cytokines and transcription factors, mature faster. Our study provides a quantitative approach to study the integrative process of RNA regulation.
Full-length Transcriptome Assembly from RNA-Seq Data Without a Reference Genome
Nature Biotechnology. Jul, 2011 | Pubmed ID: 21572440
Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
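As background for the de Bruijn graphs mentioned in the abstract above, here is a minimal sketch of how such a graph can be built from reads: every k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This is a textbook construction under simplifying assumptions (error-free reads, one strand, no coverage counts) and is not Trinity's implementation; the function name `de_bruijn_graph` and the toy reads are illustrative.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a simple de Bruijn graph from reads.

    Nodes are (k-1)-mers; for every k-mer in a read, an edge is added
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    Returns a dict mapping each node to the set of its successors.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Toy reads (hypothetical): two overlapping fragments of ACGTA.
toy_graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
```

Walking the edges of `toy_graph` from `AC` through `CG` and `GT` to `TA` spells out ACGTA, the sequence from which the two toy reads overlap. Real transcriptome assemblers must additionally handle sequencing errors, both strands, and branching caused by isoforms and paralogs.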
Hybrid Selection for Sequencing Pathogen Genomes from Clinical Samples
Genome Biology. 2011 | Pubmed ID: 21835008
We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.
Genomic Distribution and Inter-sample Variation of Non-CpG Methylation Across Human Cell Types
PLoS Genetics. Dec, 2011 | Pubmed ID: 22174693
DNA methylation plays an important role in development and disease. The primary sites of DNA methylation in vertebrates are cytosines in the CpG dinucleotide context, which account for roughly three quarters of the total DNA methylation content in human and mouse cells. While the genomic distribution, inter-individual stability, and functional role of CpG methylation are reasonably well understood, little is known about DNA methylation targeting CpA, CpT, and CpC (non-CpG) dinucleotides. Here we report a comprehensive analysis of non-CpG methylation in 76 genome-scale DNA methylation maps across pluripotent and differentiated human cell types. We confirm non-CpG methylation to be predominantly present in pluripotent cell types and observe a decrease upon differentiation and near complete absence in various somatic cell types. Although no function has been assigned to it in pluripotency, our data highlight that non-CpG methylation patterns reappear upon iPS cell reprogramming. Intriguingly, the patterns are highly variable and show little conservation between different pluripotent cell lines. We find a strong correlation of non-CpG methylation and DNMT3 expression levels while showing statistical independence of non-CpG methylation from pluripotency-associated gene expression. In line with these findings, we show that knockdown of DNMT3A and DNMT3B in hESCs results in a global reduction of non-CpG methylation. Finally, non-CpG methylation appears to be spatially correlated with CpG methylation. In summary, these results contribute further to our understanding of cytosine methylation patterns in human cells using a large representative sample set.
