The three-dimensional folding of chromosomes compartmentalizes the genome and can bring distant functional elements, such as promoters and enhancers, into close spatial proximity 2-6. Deciphering the relationship between chromosome organization and genome activity will aid in understanding genomic processes, like transcription and replication. However, little is known about how chromosomes fold. Microscopy is unable to distinguish large numbers of loci simultaneously or at high resolution. To date, the detection of chromosomal interactions using chromosome conformation capture (3C) and its subsequent adaptations has required the choice of a set of target loci, making genome-wide studies impossible 7-10.
We developed Hi-C, an extension of 3C that is capable of identifying long-range interactions in an unbiased, genome-wide fashion. In Hi-C, cells are fixed with formaldehyde, causing interacting loci to be bound to one another by means of covalent DNA-protein cross-links. When the DNA is subsequently fragmented with a restriction enzyme, these loci remain linked. A biotinylated residue is incorporated as the 5' overhangs are filled in. Next, blunt-end ligation is performed under dilute conditions that favor ligation events between cross-linked DNA fragments. This results in a genome-wide library of ligation products, corresponding to pairs of fragments that were originally in close proximity to each other in the nucleus. Each ligation product is marked with biotin at the site of the junction. The library is sheared, and the junctions are pulled down with streptavidin beads. The purified junctions can subsequently be analyzed using a high-throughput sequencer, resulting in a catalog of interacting fragments.
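The catalog of interacting fragments described above is usually summarized as a binned contact matrix. The following is a minimal sketch of that step, not the authors' pipeline: it assumes read pairs already mapped to a single chromosome, and the function name `build_contact_matrix`, the 1 Mb bin size, and the example coordinates are all hypothetical.

```python
import numpy as np

def build_contact_matrix(pairs, chrom_length, bin_size=1_000_000):
    """Count ligation junctions between genomic bins on one chromosome.

    pairs: iterable of (pos1, pos2), the 0-based mapped coordinates of the
    two ends of each ligation product. Names and defaults are illustrative.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror the count so the matrix stays symmetric
    return matrix

# Hypothetical read pairs on a 3 Mb toy chromosome.
example_pairs = [(100_000, 2_500_000), (150_000, 2_900_000), (1_200_000, 1_300_000)]
contacts = build_contact_matrix(example_pairs, chrom_length=3_000_000)
```

Each matrix entry counts the ligation products joining two bins; the bin size trades resolution against the number of reads available per entry.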
Direct analysis of the resulting contact matrix reveals numerous features of genomic organization, such as the presence of chromosome territories and the preferential association of small gene-rich chromosomes. Correlation analysis can be applied to the contact matrix, demonstrating that the human genome is segregated into two compartments: a less densely packed compartment containing open, accessible, and active chromatin and a denser compartment containing closed, inaccessible, and inactive chromatin regions. Finally, ensemble analysis of the contact matrix, coupled with theoretical derivations and computational simulations, shows that at the megabase scale the Hi-C data are consistent with a fractal globule conformation.
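One way the correlation analysis mentioned above might be sketched is shown below. This is an assumption-laden illustration rather than the published procedure: contacts are divided by the mean count at each genomic distance, rows of the result are correlated, and the leading eigenvector of the correlation matrix is taken as a candidate compartment signal. The function name `compartment_signal` and the toy matrix are hypothetical.

```python
import numpy as np

def compartment_signal(contacts):
    """Return (correlation matrix, leading eigenvector) for a contact matrix."""
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    # Expected contacts at each genomic distance: the mean of that diagonal.
    expected = np.ones_like(contacts)
    for d in range(n):
        mean_d = np.diagonal(contacts, d).mean()
        if mean_d > 0:
            idx = np.arange(n - d)
            expected[idx, idx + d] = mean_d
            expected[idx + d, idx] = mean_d
    obs_exp = contacts / expected
    corr = np.corrcoef(obs_exp)           # Pearson correlation between rows
    _, eigvecs = np.linalg.eigh(corr)     # eigenvalues come back in ascending order
    return corr, eigvecs[:, -1]           # eigenvector of the largest eigenvalue

# Toy matrix: distance decay multiplied by a same-compartment enrichment.
labels = np.array([0, 0, 1, 1, 0, 0, 1, 1])
idx = np.arange(len(labels))
distance = np.abs(idx[:, None] - idx[None, :])
same = labels[:, None] == labels[None, :]
toy = (10.0 / (1 + distance)) * np.where(same, 3.0, 1.0)
corr, pc1 = compartment_signal(toy)
```

The sign of the eigenvector is arbitrary, so assigning "open" versus "closed" to either sign would need external information such as gene density.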
24 Related JoVE Articles!
Massively Parallel Reporter Assays in Cultured Mammalian Cells
Institutions: Broad Institute.
The genetic reporter assay is a well-established and powerful tool for dissecting the relationship between DNA sequences and their gene regulatory activities. The potential throughput of this assay has, however, been limited by the need to individually clone and assay the activity of each sequence of interest using protein fluorescence or enzymatic activity as a proxy for regulatory activity. Advances in high-throughput DNA synthesis and sequencing technologies have recently made it possible to overcome these limitations by multiplexing the construction and interrogation of large libraries of reporter constructs. This protocol describes implementation of a Massively Parallel Reporter Assay (MPRA) that allows direct comparison of hundreds of thousands of putative regulatory sequences in a single cell culture dish.
Genetics, Issue 90, gene regulation, transcriptional regulation, sequence-activity mapping, reporter assay, library cloning, transfection, tag sequencing, mammalian cells
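The MPRA abstract above does not spell out how tag sequencing is turned into activity estimates. A minimal sketch under one common assumption is given here: each reporter's activity is the log2 ratio of its tag's share of RNA reads to its share of plasmid DNA reads. The function name `mpra_activity`, the pseudocount parameter, and the tag names are hypothetical.

```python
import math

def mpra_activity(rna_counts, dna_counts, pseudocount=1.0):
    """log2(RNA fraction / DNA fraction) per tag; inputs map tag -> read count."""
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for tag, dna in dna_counts.items():
        rna = rna_counts.get(tag, 0)
        rna_frac = (rna + pseudocount) / rna_total
        dna_frac = (dna + pseudocount) / dna_total
        activity[tag] = math.log2(rna_frac / dna_frac)
    return activity

# Hypothetical counts for two tags; pseudocount disabled to keep the arithmetic plain.
activity = mpra_activity({"tagA": 80, "tagB": 20}, {"tagA": 50, "tagB": 50}, pseudocount=0.0)
```

Normalizing to the DNA counts corrects for unequal representation of constructs in the transfected library.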
Highly Efficient Ligation of Small RNA Molecules for MicroRNA Quantitation by High-Throughput Sequencing
Institutions: University of Colorado, Boulder, University of Colorado, Denver.
MiRNA cloning and high-throughput sequencing, termed miR-Seq, stands alone as a transcriptome-wide approach to quantify miRNAs with single-nucleotide resolution. This technique captures miRNAs by attaching 3' and 5' oligonucleotide adapters to miRNA molecules and allows de novo miRNA discovery. Coupled with powerful next-generation sequencing platforms, miR-Seq has been instrumental in the study of miRNA biology. However, significant biases introduced by oligonucleotide ligation steps have prevented miR-Seq from being employed as an accurate quantitation tool. Previous studies demonstrate that biases in current miR-Seq methods often lead to inaccurate miRNA quantification, with errors of up to 1,000-fold for some miRNAs 1,2. To resolve these biases imparted by RNA ligation, we have developed a small RNA ligation method that results in ligation efficiencies of over 95% for both the 3' and 5' ligation steps. Benchmarking this improved library construction method using equimolar or differentially mixed synthetic miRNAs consistently yields read numbers with less than two-fold deviation from the expected value. Furthermore, this high-efficiency miR-Seq method permits accurate genome-wide miRNA profiling from in vivo total RNA samples 2.
Molecular Biology, Issue 93, RNA, ligation, miRNA, miR-Seq, linker, oligonucleotide, high-throughput sequencing
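The benchmark described in the miR-Seq abstract above compares observed read numbers with the value expected from the input mix. As a rough sketch, assuming an equimolar pool in which every synthetic miRNA should receive the same number of reads, the fold deviation can be computed as follows. The function name `fold_deviation` and the miRNA labels are hypothetical.

```python
def fold_deviation(read_counts):
    """Fold deviation (>= 1) of each miRNA from the equimolar expectation."""
    total = sum(read_counts.values())
    expected = total / len(read_counts)  # equal share for every species
    result = {}
    for name, count in read_counts.items():
        if count == 0:
            result[name] = float("inf")
            continue
        ratio = count / expected
        result[name] = max(ratio, 1 / ratio)  # direction-independent fold change
    return result

# Hypothetical read counts for three synthetic miRNAs.
deviation = fold_deviation({"synthetic_1": 1000, "synthetic_2": 1500, "synthetic_3": 500})
```

A library would meet the "less than two-fold" criterion only if every value stayed below 2.0; in this toy input, `synthetic_3` sits exactly at the limit.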
iCLIP - Transcriptome-wide Mapping of Protein-RNA Interactions with Individual Nucleotide Resolution
Institutions: Medical Research Council - MRC, EMBL Heidelberg, University of Ljubljana, Wellcome Trust Sanger Institute.
The unique composition and spatial arrangement of RNA-binding proteins (RBPs) on a transcript guide the diverse aspects of post-transcriptional regulation 1. Therefore, an essential step towards understanding transcript regulation at the molecular level is to gain positional information on the binding sites of RBPs 2.
Protein-RNA interactions can be studied using biochemical methods, but these approaches do not address RNA binding in its native cellular context. Initial attempts to study protein-RNA complexes in their cellular environment employed affinity purification or immunoprecipitation combined with differential display or microarray analysis (RIP-CHIP) 3-5. These approaches were prone to identifying indirect or non-physiological interactions 6. In order to increase the specificity and positional resolution, a strategy referred to as CLIP (UV cross-linking and immunoprecipitation) was introduced 7,8. CLIP combines UV cross-linking of proteins and RNA molecules with rigorous purification schemes including denaturing polyacrylamide gel electrophoresis. In combination with high-throughput sequencing technologies, CLIP has proven to be a powerful tool to study protein-RNA interactions on a genome-wide scale (referred to as HITS-CLIP or CLIP-seq) 9,10. Recently, PAR-CLIP was introduced, which uses photoreactive ribonucleoside analogs for cross-linking 11,12.
Despite the high specificity of the obtained data, CLIP experiments often generate cDNA libraries of limited sequence complexity. This is partly due to the restricted amount of co-purified RNA and the two inefficient RNA ligation reactions required for library preparation. In addition, primer extension assays indicated that many cDNAs truncate prematurely at the cross-linked nucleotide 13. Such truncated cDNAs are lost during the standard CLIP library preparation protocol. We recently developed iCLIP (individual-nucleotide resolution CLIP), which captures the truncated cDNAs by replacing one of the inefficient intermolecular RNA ligation steps with a more efficient intramolecular cDNA circularization (Figure 1) 14. Importantly, sequencing the truncated cDNAs provides insights into the position of the cross-link site at nucleotide resolution. We successfully applied iCLIP to study hnRNP C particle organization on a genome-wide scale and assess its role in splicing regulation 14.
Cellular Biology, Issue 50, RNA biochemistry, transcriptome, systems biology, RNA-binding protein
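Because the iCLIP abstract above notes that cDNAs truncate at the cross-linked nucleotide, the start of each mapped read can be used to infer the cross-link position. The sketch below assumes the convention that the cross-link is the nucleotide immediately upstream of the read start on the transcript strand, and that alignments use 0-based, half-open coordinates. The function name `crosslink_sites` and the example alignments are hypothetical.

```python
from collections import Counter

def crosslink_sites(alignments):
    """Count inferred cross-link positions from (chrom, start, end, strand) reads."""
    sites = Counter()
    for chrom, start, end, strand in alignments:
        if strand == "+":
            pos = start - 1  # one nucleotide upstream of the read start
        else:
            pos = end        # upstream on the minus strand is the next higher coordinate
        sites[(chrom, pos, strand)] += 1
    return sites

# Hypothetical alignments: two plus-strand reads sharing a start, one minus-strand read.
sites = crosslink_sites([
    ("chr1", 100, 140, "+"),
    ("chr1", 100, 135, "+"),
    ("chr1", 200, 240, "-"),
])
```

Counting reads per position, rather than per binding region, is what gives the method its nucleotide resolution.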
Modeling Astrocytoma Pathogenesis In Vitro and In Vivo Using Cortical Astrocytes or Neural Stem Cells from Conditional, Genetically Engineered Mice
Institutions: University of North Carolina School of Medicine, Emory University School of Medicine.
Current astrocytoma models are limited in their ability to define the roles of oncogenic mutations in specific brain cell types during disease pathogenesis and their utility for preclinical drug development. In order to design a better model system for these applications, phenotypically wild-type cortical astrocytes and neural stem cells (NSC) from conditional, genetically engineered mice (GEM) that harbor various combinations of floxed oncogenic alleles were harvested and grown in culture. Genetic recombination was induced in vitro using adenoviral Cre-mediated recombination, resulting in expression of mutated oncogenes and deletion of tumor suppressor genes. The phenotypic consequences of these mutations were defined by measuring proliferation, transformation, and drug response in vitro. Orthotopic allograft models, whereby transformed cells are stereotactically injected into the brains of immune-competent, syngeneic littermates, were developed to define the role of oncogenic mutations and cell type on tumorigenesis in vivo. Unlike most established human glioblastoma cell line xenografts, injection of transformed GEM-derived cortical astrocytes into the brains of immune-competent littermates produced astrocytomas, including the most aggressive subtype, glioblastoma, that recapitulated the histopathological hallmarks of human astrocytomas, including diffuse invasion of normal brain parenchyma. Bioluminescence imaging of orthotopic allografts from transformed astrocytes engineered to express luciferase was utilized to monitor in vivo tumor growth over time. Thus, astrocytoma models using astrocytes and NSC harvested from GEM with conditional oncogenic alleles provide an integrated system to study the genetics and cell biology of astrocytoma pathogenesis in vitro and in vivo and may be useful in preclinical drug development for these devastating diseases.
Neuroscience, Issue 90, astrocytoma, cortical astrocytes, genetically engineered mice, glioblastoma, neural stem cells, orthotopic allograft
Initiation of Metastatic Breast Carcinoma by Targeting of the Ductal Epithelium with Adenovirus-Cre: A Novel Transgenic Mouse Model of Breast Cancer
Institutions: Wistar Institute, University of Pennsylvania, Geisel School of Medicine at Dartmouth.
Breast cancer is a heterogeneous disease involving complex cellular interactions between the developing tumor and immune system, eventually resulting in exponential tumor growth and metastasis to distal tissues and the collapse of anti-tumor immunity. Many useful animal models exist to study breast cancer, but none completely recapitulates the disease progression that occurs in humans. In order to gain a better understanding of the cellular interactions that result in the formation of latent metastasis and decreased survival, we have generated an inducible transgenic mouse model of YFP-expressing ductal carcinoma that develops after sexual maturity in immune-competent mice and is driven by consistent, endocrine-independent oncogene expression. Activation of YFP, ablation of p53, and expression of an oncogenic form of K-ras were achieved by the delivery of an adenovirus expressing Cre-recombinase into the mammary duct of sexually mature, virgin female mice. Tumors begin to appear 6 weeks after the initiation of oncogenic events. After tumors become apparent, they progress slowly for approximately two weeks before they begin to grow exponentially. At 7-8 weeks post-adenovirus injection, vasculature is observed connecting the tumor mass to distal lymph nodes, with eventual lymphovascular invasion of YFP+ tumor cells to the distal axillary lymph nodes. Infiltrating leukocyte populations are similar to those found in human breast carcinomas, including the presence of αβ and γδ T cells, macrophages and MDSCs. This unique model will facilitate the study of cellular and immunological mechanisms involved in latent metastasis and dormancy in addition to being useful for designing novel immunotherapeutic interventions to treat invasive breast cancer.
Medicine, Issue 85, Transgenic mice, breast cancer, metastasis, intraductal injection, latent mutations, adenovirus-Cre
The Production of C. elegans Transgenes via Recombineering with the galK Selectable Marker
Institutions: Beth Israel Deaconess Medical Center, Harvard Medical School, University of Pittsburgh.
The creation of transgenic animals is widely utilized in C. elegans research, including the use of GFP fusion proteins to study the regulation and expression pattern of genes of interest or the generation of tandem affinity purification (TAP) tagged versions of specific genes to facilitate their purification. Typically, transgenes are generated by placing a promoter upstream of a GFP reporter gene or cDNA of interest, and this often produces a representative expression pattern. However, critical elements of gene regulation, such as control elements in the 3' untranslated region or alternative promoters, could be missed by this approach. Further, only a single splice variant can usually be studied by this means. In contrast, the use of worm genomic DNA carried by fosmid DNA clones likely includes most if not all elements involved in gene regulation in vivo, which permits a greater ability to capture the genuine expression pattern and timing. To facilitate the generation of transgenes using fosmid DNA, we describe an E. coli-based recombineering procedure to insert GFP, a TAP-tag, or other sequences of interest into any location in the gene. The procedure uses the galK gene as the selection marker for both the positive and negative selection steps in recombineering, which results in obtaining the desired modification with high efficiency. Further, plasmids containing the galK gene flanked by homology arms to commonly used GFP and TAP fusion genes are available, which reduce the cost of oligos by 50% when generating a GFP or TAP fusion protein. These plasmids use the R6K replication origin, which precludes the need for extensive PCR product purification. Finally, we also demonstrate a technique to integrate the unc-119 marker onto the fosmid backbone, which allows the fosmid to be directly injected or bombarded into worms to generate transgenic animals. This video demonstrates the procedures involved in generating a transgene via recombineering using this method.
Genetics, Issue 47, C. elegans, transgenes, fosmid clone, galK, recombineering, homologous recombination, E. coli
The MultiBac Protein Complex Production Platform at the EMBL
Institutions: EMBL Grenoble Outstation and Unit of Virus Host Cell Interactions (UVHCI) UMR5322.
Proteomics research revealed the impressive complexity of eukaryotic proteomes in unprecedented detail. It is now a commonly accepted notion that proteins in cells mostly exist not as isolated entities but exert their biological activity in association with many other proteins, in humans ten or more, forming assembly lines in the cell for most if not all vital functions.1,2
Knowledge of the function and architecture of these multiprotein assemblies requires their provision in superior quality and sufficient quantity for detailed analysis. The paucity of many protein complexes in cells, in particular in eukaryotes, prohibits their extraction from native sources, and necessitates recombinant production. The baculovirus expression vector system (BEVS) has proven to be particularly useful for producing eukaryotic proteins, the activity of which often relies on post-translational processing that other commonly used expression systems often cannot support.3
BEVS use a recombinant baculovirus into which the gene of interest was inserted to infect insect cell cultures which in turn produce the protein of choice. MultiBac is a BEVS that has been particularly tailored for the production of eukaryotic protein complexes that contain many subunits.4
A vital prerequisite for efficient production of proteins and their complexes is the availability of robust protocols for all steps involved in an expression experiment that ideally can be implemented as standard operating procedures (SOPs) and followed also by non-specialist users with comparative ease. The MultiBac platform at the European Molecular Biology Laboratory (EMBL) uses SOPs for all steps involved in a multiprotein complex expression experiment, starting from insertion of the genes into an engineered baculoviral genome optimized for heterologous protein production properties to small-scale analysis of the protein specimens produced.5-8
The platform is installed in an open-access mode at EMBL Grenoble and has supported many scientists from academia and industry to accelerate protein complex research projects.
Molecular Biology, Issue 77, Genetics, Bioengineering, Virology, Biochemistry, Microbiology, Basic Protocols, Genomics, Proteomics, Automation, Laboratory, Biotechnology, Multiprotein Complexes, Biological Science Disciplines, Robotics, Protein complexes, multigene delivery, recombinant expression, baculovirus system, MultiBac platform, standard operating procedures (SOP), cell, culture, DNA, RNA, protein, production, sequencing
Gene Trapping Using Gal4 in Zebrafish
Institutions: Temple University.
Large clutch size and external development of optically transparent embryos make zebrafish an exceptional vertebrate model system for in vivo insertional mutagenesis using fluorescent reporters to tag expression of mutated genes. Several laboratories have constructed and tested enhancer- and gene-trap vectors in zebrafish, using fluorescent proteins and Gal4- and lexA-based transcriptional activators as reporters 1-7. These vectors had two potential drawbacks: suboptimal stringency (e.g. lack of ability to differentiate between enhancer- and gene-trap events) and low mutagenicity (e.g. integrations into genes rarely produced null alleles). Gene Breaking Transposons (GBTs) were developed to address these drawbacks 8-10. We have modified one of the first GBT vectors, GBT-R15, for use with Gal4-VP16 as the primary gene trap reporter and added UAS:eGFP as the secondary reporter for direct detection of gene trap events. Application of Gal4-VP16 as the primary gene trap reporter provides two main advantages. First, it increases sensitivity for genes expressed at low levels. Second, it enables researchers to use gene trap lines as Gal4 drivers to direct expression of other transgenes in very specific tissues. This is especially pertinent for genes with non-essential or redundant functions, where gene trap integration may not result in overt phenotypes. The disadvantage of using Gal4-VP16 as the primary gene trap reporter is that genes coding for proteins with N-terminal signal sequences are not amenable to trapping, as the resulting Gal4-VP16 fusion proteins are unlikely to be able to enter the nucleus and activate transcription. Importantly, the use of Gal4-VP16 does not pre-select for nuclear proteins: we recovered gene trap mutations in genes encoding proteins which function in the nucleus, the cytoplasm and the plasma membrane.
Developmental Biology, Issue 79, Zebrafish, Mutagenesis, Genetics, genetics (animal and plant), Gal4, transposon, gene trap, insertional mutagenesis
Derivation and Characterization of a Transgene-free Human Induced Pluripotent Stem Cell Line and Conversion into Defined Clinical-grade Conditions
Institutions: University of California, Los Angeles (UCLA).
Human induced pluripotent stem cells (hiPSCs) can be generated with lentiviral-based reprogramming methodologies. However, traces of potentially oncogenic genes remaining in actively transcribed regions of the genome limit their potential for use in human therapeutic applications 1. Additionally, non-human antigens derived from stem cell reprogramming or differentiation into therapeutically relevant derivatives preclude these hiPSCs from being used in a human clinical context 2. In this video, we present a procedure for reprogramming and analyzing factor-free hiPSCs free of exogenous transgenes. These hiPSCs can then be analyzed for gene expression abnormalities in the specific intron containing the lentivirus. This analysis may be conducted using sensitive quantitative polymerase chain reaction (PCR), which has an advantage over less sensitive techniques previously used to detect gene expression differences 3. Full conversion into clinical-grade good manufacturing practice (GMP) conditions allows human clinical relevance. Our protocol offers another methodology for deriving GMP-grade hiPSCs (provided that current safe-harbor criteria expand to include factor-free, characterized hiPSC-based derivatives for human therapeutic applications), which should eliminate any immunogenicity risk due to non-human antigens. This protocol is broadly applicable to lentiviral reprogrammed cells of any type and provides a reproducible method for converting reprogrammed cells into GMP-grade conditions.
Stem Cell Biology, Issue 93, Human induced pluripotent stem cells, STEMCCA, factor-free, GMP, xeno-free, quantitative PCR
Polymerase Chain Reaction: Basic Protocol Plus Troubleshooting and Optimization Strategies
Institutions: University of California, Los Angeles.
In the biological sciences there have been technological advances that catapult the discipline into golden ages of discovery. For example, the field of microbiology was transformed with the advent of Anton van Leeuwenhoek's microscope, which allowed scientists to visualize prokaryotes for the first time. The development of the polymerase chain reaction (PCR) is one of those innovations that changed the course of molecular science with its impact spanning countless subdisciplines in biology. The theoretical process was outlined by Kleppe and coworkers in 1971; however, it was another 14 years until the complete PCR procedure was described and experimentally applied by Kary Mullis while at Cetus Corporation in 1985. Automation and refinement of this technique progressed with the introduction of a thermostable DNA polymerase from the bacterium Thermus aquaticus, hence the name Taq DNA polymerase.
PCR is a powerful amplification technique that can generate an ample supply of a specific segment of DNA (i.e., an amplicon) from only a small amount of starting material (i.e., DNA template or target sequence). While straightforward and generally trouble-free, there are pitfalls that complicate the reaction and produce spurious results. When PCR fails, it can lead to many non-specific DNA products of varying sizes that appear as a ladder or smear of bands on agarose gels. Sometimes no products form at all. Another potential problem occurs when mutations are unintentionally introduced in the amplicons, resulting in a heterogeneous population of PCR products. PCR failures can become frustrating unless patience and careful troubleshooting are employed to sort out and solve the problem(s). This protocol outlines the basic principles of PCR, provides a methodology that will result in amplification of most target sequences, and presents strategies for optimizing a reaction. By following this PCR guide, students should be able to:
● Set up reactions and thermal cycling conditions for a conventional PCR experiment
● Understand the function of various reaction components and their overall effect on a PCR experiment
● Design and optimize a PCR experiment for any DNA template
● Troubleshoot failed PCR experiments
Basic Protocols, Issue 63, PCR, optimization, primer design, melting temperature, Tm, troubleshooting, additives, enhancers, template DNA quantification, thermal cycler, molecular biology, genetics
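Two quantities that come up when setting up and troubleshooting a PCR, as covered by the guide above, can be estimated with simple arithmetic. The sketch below uses the Wallace rule of thumb for primer melting temperature (2 °C per A/T and 4 °C per G/C, a rough estimate intended for short oligonucleotides) and the ideal doubling-per-cycle yield. Neither is claimed to be the calculation used in the protocol itself, and the function names and the example primer are hypothetical.

```python
def wallace_tm(primer):
    """Approximate melting temperature in degrees C using the Wallace rule."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def theoretical_copies(template_copies, cycles):
    """Upper bound on amplicon copies assuming perfect doubling every cycle."""
    return template_copies * 2 ** cycles

tm = wallace_tm("ATGCATGCATGCATGCAT")  # hypothetical 18-nt primer
copies = theoretical_copies(10, 30)    # 10 template copies, 30 cycles
```

Real reactions fall short of the doubling bound because efficiency drops as primers and nucleotides are consumed, which is one reason cycle number is a common optimization variable.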
Detection of Rare Genomic Variants from Pooled Sequencing Using SPLINTER
Institutions: Washington University School of Medicine.
As DNA sequencing technology has markedly advanced in recent years 2, it has become increasingly evident that the amount of genetic variation between any two individuals is greater than previously thought 3. In contrast, array-based genotyping has failed to identify a significant contribution of common sequence variants to the phenotypic variability of common disease 4,5. Taken together, these observations have led to the evolution of the Common Disease / Rare Variant hypothesis, suggesting that the majority of the "missing heritability" in common and complex phenotypes is instead due to an individual's personal profile of rare or private DNA variants 6-8. However, characterizing how rare variation impacts complex phenotypes requires the analysis of many affected individuals at many genomic loci, and is ideally compared to a similar survey in an unaffected cohort. Despite the sequencing power offered by today's platforms, a population-based survey of many genomic loci and the subsequent computational analysis required remain prohibitive for many investigators.
To address this need, we have developed a pooled sequencing approach 1,9 and a novel software package 1 for highly accurate rare variant detection from the resulting data. The ability to pool genomes from entire populations of affected individuals and survey the degree of genetic variation at multiple targeted regions in a single sequencing library provides excellent cost and time savings over traditional single-sample sequencing methodology. With a mean sequencing coverage per allele of 25-fold, our custom algorithm, SPLINTER, uses an internal variant calling control strategy to call insertions, deletions and substitutions up to four base pairs in length with high sensitivity and specificity from pools of up to 1 mutant allele in 500 individuals. Here we describe the method for preparing the pooled sequencing library followed by step-by-step instructions on how to use the SPLINTER package for pooled sequencing analysis (https://www.ibridgenetwork.org/wustl/splinter). We show a comparison between pooled sequencing of 947 individuals, all of whom also underwent genome-wide array, at over 20kb of sequencing per person. Concordance between genotyping of tagged and novel variants called in the pooled sample was excellent. This method can be easily scaled up to any number of genomic loci and any number of individuals. By incorporating the internal positive and negative amplicon controls at ratios that mimic the population under study, the algorithm can be calibrated for optimal performance. This strategy can also be modified for use with hybridization capture or individual-specific barcodes and can be applied to the sequencing of naturally heterogeneous samples, such as tumor DNA.
Genetics, Issue 64, Genomics, Cancer Biology, Bioinformatics, Pooled DNA sequencing, SPLINTER, rare genetic variants, genetic screening, phenotype, high throughput, computational analysis, DNA, PCR, primers
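The figures quoted in the pooled sequencing abstract above (25-fold coverage per allele, one mutant allele in 500 individuals) imply a simple piece of design arithmetic, sketched here. The calculation assumes diploid individuals and is not part of the SPLINTER package; the function name `pooled_design` is hypothetical.

```python
def pooled_design(n_individuals, coverage_per_allele=25, ploidy=2):
    """Alleles in the pool, rarest detectable frequency, and total depth needed."""
    n_alleles = n_individuals * ploidy
    return {
        "alleles": n_alleles,
        "min_allele_frequency": 1 / n_alleles,          # a single mutant allele
        "total_coverage": n_alleles * coverage_per_allele,
    }

design = pooled_design(500)
```

Under these assumptions, a pool of 500 individuals holds 1,000 alleles, so a single mutant allele sits at a frequency of 0.1% and each position needs roughly 25,000-fold total coverage.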
Identification of Sleeping Beauty Transposon Insertions in Solid Tumors using Linker-mediated PCR
Institutions: University of Minnesota, Minneapolis.
Genomic, proteomic, transcriptomic, and epigenomic analyses of human tumors indicate that there are thousands of anomalies within each cancer genome compared to matched normal tissue. Based on these analyses it is evident that there are many undiscovered genetic drivers of cancer 1. Unfortunately these drivers are hidden within a much larger number of passenger anomalies in the genome that do not directly contribute to tumor formation. Another aspect of the cancer genome is that there is considerable genetic heterogeneity within similar tumor types. Each tumor can harbor different mutations that provide a selective advantage for tumor formation 2. Performing an unbiased forward genetic screen in mice provides the tools to generate tumors and analyze their genetic composition, while reducing the background of passenger mutations. The Sleeping Beauty (SB) transposon system is one such method 3. The SB system utilizes mobile vectors (transposons) that can be inserted throughout the genome by the transposase enzyme. Mutations are limited to a specific cell type through the use of a conditional transposase allele that is activated by Cre Recombinase. Many mouse lines exist that express Cre Recombinase in specific tissues. By crossing one of these lines to the conditional transposase allele (e.g. Lox-stop-Lox-SB11), the SB system is activated only in cells that express Cre Recombinase. The Cre Recombinase will excise a stop cassette that blocks expression of the transposase allele, thereby activating transposon mutagenesis within the designated cell type. An SB screen is initiated by breeding three strains of transgenic mice so that the experimental mice carry a conditional transposase allele, a concatamer of transposons, and a tissue-specific Cre Recombinase allele. These mice are allowed to age until tumors form and they become moribund. The mice are then necropsied and genomic DNA is isolated from the tumors. Next, the genomic DNA is subjected to linker-mediated-PCR (LM-PCR) that results in amplification of genomic loci containing an SB transposon. LM-PCR performed on a single tumor will result in hundreds of distinct amplicons representing the hundreds of genomic loci containing transposon insertions in a single tumor 4. The transposon insertions in all tumors are analyzed and common insertion sites (CISs) are identified using an appropriate statistical method 5. Genes within the CIS are highly likely to be oncogenes or tumor suppressor genes, and are considered candidate cancer genes. The advantages of using the SB system to identify candidate cancer genes are: 1) the transposon can easily be located in the genome because its sequence is known, 2) transposition can be directed to almost any cell type and 3) the transposon is capable of introducing both gain- and loss-of-function mutations 6. The following protocol describes how to devise and execute a forward genetic screen using the SB transposon system to identify candidate cancer genes (Figure 1).
Genetics, Issue 72, Medicine, Cancer Biology, Biomedical Engineering, Genomics, Mice, Genetic Techniques, life sciences, animal models, Neoplasms, Genetic Phenomena, Forward genetic screen, cancer drivers, mouse models, oncogenes, tumor suppressor genes, Sleeping Beauty transposons, insertions, DNA, PCR, animal model
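The idea behind common insertion sites in the Sleeping Beauty abstract above, namely regions hit by transposons in more tumors than expected, can be illustrated with a deliberately simplified window count. This is not the statistical method the abstract cites: it only tallies how many distinct tumors carry an insertion in each fixed-size window. The function name `candidate_cis`, the window size, and the threshold are hypothetical.

```python
from collections import defaultdict

def candidate_cis(insertions, window=10_000, min_tumors=3):
    """Return windows hit in at least min_tumors distinct tumors.

    insertions: iterable of (tumor_id, chrom, position).
    """
    tumors_by_window = defaultdict(set)
    for tumor_id, chrom, pos in insertions:
        tumors_by_window[(chrom, pos // window)].add(tumor_id)
    return {
        (chrom, w * window, (w + 1) * window): len(tumors)
        for (chrom, w), tumors in tumors_by_window.items()
        if len(tumors) >= min_tumors
    }

# Hypothetical insertions from three tumors.
cis = candidate_cis([
    ("t1", "chr1", 1_500), ("t2", "chr1", 2_500), ("t3", "chr1", 9_000),
    ("t1", "chr1", 1_600), ("t2", "chr2", 50_000),
])
```

Counting distinct tumors rather than raw insertions keeps a single tumor with many nearby insertions from creating a false candidate; a real analysis would also test each region against a background insertion model.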
Chromatin Interaction Analysis with Paired-End Tag Sequencing (ChIA-PET) for Mapping Chromatin Interactions and Understanding Transcription Regulation
Institutions: Agency for Science, Technology and Research, Singapore, A*STAR-Duke-NUS Neuroscience Research Partnership, Singapore, National University of Singapore, Singapore.
Genomes are organized into three-dimensional structures, adopting higher-order conformations inside the micron-sized nuclear spaces 7, 2, 12. Such architectures are not random and involve interactions between gene promoters and regulatory elements 13. The binding of transcription factors to specific regulatory sequences brings about a network of transcription regulation and coordination 1, 14.
Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET) was developed to identify these higher-order chromatin structures 5,6. Cells are fixed and interacting loci are captured by covalent DNA-protein cross-links. To minimize non-specific noise and reduce complexity, as well as to increase the specificity of the chromatin interaction analysis, chromatin immunoprecipitation (ChIP) is used against specific protein factors to enrich chromatin fragments of interest before proximity ligation. Ligation involving half-linkers subsequently forms covalent links between pairs of DNA fragments tethered together within individual chromatin complexes. The flanking MmeI restriction enzyme sites in the half-linkers allow extraction of paired end tag-linker-tag constructs (PETs) upon MmeI digestion. As the half-linkers are biotinylated, these PET constructs are purified using streptavidin-magnetic beads. The purified PETs are ligated with next-generation sequencing adaptors and a catalog of interacting fragments is generated via next-generation sequencers such as the Illumina Genome Analyzer. Mapping and bioinformatics analysis is then performed to identify ChIP-enriched binding sites and ChIP-enriched chromatin interactions 8.
We have produced a video to demonstrate critical aspects of the ChIA-PET protocol, especially the preparation of ChIP as the quality of ChIP plays a major role in the outcome of a ChIA-PET library. As the protocols are very long, only the critical steps are shown in the video.
Genetics, Issue 62, ChIP, ChIA-PET, Chromatin Interactions, Genomics, Next-Generation Sequencing
Single Read and Paired End mRNA-Seq Illumina Libraries from 10 Nanograms Total RNA
Institutions: Morgridge Institute for Research, University of Wisconsin, University of California.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity and a large dynamic range, and allows measurement of transcript copy numbers in a sample. Illumina’s genome analyzer performs sequencing of a large number (> 10^7) of relatively short sequence reads (< 150 bp). The "paired end" approach, wherein a single long read is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo
One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples has been described 1, 2 but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published 3, 4 but require microgram amounts of starting total RNA.
Here we describe an mRNA-Seq library preparation protocol for the Illumina Genome Analyzer II platform that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing 5, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, coined "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome 6 and normalized so that data from multiple runs can be compared. We report the gene expression measurement in units of transcripts per million (TPM), which is a superior measure to RPKM when comparing samples 7.
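The difference between the two units can be illustrated with a minimal sketch. It uses the commonly cited definitions of RPKM and TPM, not the authors' own pipeline; the function names and the toy counts and transcript lengths are assumptions made for illustration only.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    return [r * 1e6 / total_rate for r in rates]


# Toy data (hypothetical): two samples measured over the same three transcripts.
lengths = [1000, 2000, 4000]
sample_a = [100, 200, 400]
sample_b = [100, 200, 4000]

# TPM values sum to one million in every sample, so a transcript's share
# is directly comparable between samples; RPKM totals differ by sample.
print(sum(tpm(sample_a, lengths)), sum(tpm(sample_b, lengths)))
print(sum(rpkm(sample_a, lengths)), sum(rpkm(sample_b, lengths)))
```

Because every TPM vector is rescaled to the same total, a given TPM value represents the same proportion of the transcript pool in each run, which is one common argument for preferring it when comparing samples.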
Molecular Biology, Issue 56, Genetics, mRNA-Seq, Illumina-Seq, gene expression profiling, high throughput sequencing
Genetic Manipulation in Δku80 Strains for Functional Genomic Analysis of Toxoplasma gondii
Institutions: The Geisel School of Medicine at Dartmouth.
Targeted genetic manipulation using homologous recombination is the method of choice for functional genomic analysis to obtain a detailed view of gene function and phenotype(s). The development of mutant strains with targeted gene deletions, targeted mutations, complemented gene function, and/or tagged genes provides powerful strategies to address gene function, particularly if these genetic manipulations can be efficiently targeted to the gene locus of interest using integration mediated by double cross over homologous recombination.
Due to very high rates of nonhomologous recombination, functional genomic analysis of Toxoplasma gondii has been previously limited by the absence of efficient methods for targeting gene deletions and gene replacements to specific genetic loci. Recently, we abolished the major pathway of nonhomologous recombination in type I and type II strains of T. gondii by deleting the gene encoding the KU80 protein 1, 2. The Δku80 strains behave normally during tachyzoite (acute) and bradyzoite (chronic) stages in vitro and in vivo and exhibit essentially a 100% frequency of homologous recombination. The Δku80 strains make functional genomic studies feasible on the single gene as well as on the genome scale 1-4.
Here, we report methods for using type I and type II Δku80Δhxgprt strains to advance gene targeting approaches in T. gondii. We outline efficient methods for generating gene deletions, gene replacements, and tagged genes by targeted insertion or deletion of the hypoxanthine-xanthine-guanine phosphoribosyltransferase (HXGPRT) selectable marker. The described gene targeting protocol can be used in a variety of ways in Δku80 strains to advance functional analysis of the parasite genome and to develop single strains that carry multiple targeted genetic manipulations. The application of this genetic method and subsequent phenotypic assays will reveal fundamental and unique aspects of the biology of T. gondii and related significant human pathogens that cause malaria (Plasmodium sp.) and cryptosporidiosis (Cryptosporidium).
Infectious Diseases, Issue 77, Genetics, Microbiology, Infection, Medicine, Immunology, Molecular Biology, Cellular Biology, Biomedical Engineering, Bioengineering, Genomics, Parasitology, Pathology, Apicomplexa, Coccidia, Toxoplasma, Genetic Techniques, Gene Targeting, Eukaryota, Toxoplasma gondii, genetic manipulation, gene targeting, gene deletion, gene replacement, gene tagging, homologous recombination, DNA, sequencing
Mouse Genome Engineering Using Designer Nucleases
Institutions: University of Zurich, University of Minnesota.
Transgenic mice carrying site-specific genome modifications (knockout, knock-in) are of vital importance for dissecting complex biological systems as well as for modeling human diseases and testing therapeutic strategies. Recent advances in the use of designer nucleases such as zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) 9 system for site-specific genome engineering open the possibility to perform rapid targeted genome modification in virtually any laboratory species without the need to rely on embryonic stem (ES) cell technology. A genome editing experiment typically starts with identification of designer nuclease target sites within a gene of interest followed by construction of custom DNA-binding domains to direct nuclease activity to the investigator-defined genomic locus. Designer nuclease plasmids are in vitro transcribed to generate mRNA for microinjection of fertilized mouse oocytes. Here, we provide a protocol for achieving targeted genome modification by direct injection of TALEN mRNA into fertilized mouse oocytes.
Genetics, Issue 86, Oocyte microinjection, Designer nucleases, ZFN, TALEN, Genome Engineering
Identification of Key Factors Regulating Self-renewal and Differentiation in EML Hematopoietic Precursor Cells by RNA-sequencing Analysis
Institutions: The University of Texas Graduate School of Biomedical Sciences at Houston.
Hematopoietic stem cells (HSCs) are used clinically for transplantation treatment to rebuild a patient's hematopoietic system in many diseases such as leukemia and lymphoma. Elucidating the mechanisms controlling HSC self-renewal and differentiation is important for application of HSCs for research and clinical uses. However, it is not possible to obtain large quantities of HSCs due to their inability to proliferate in vitro. To overcome this hurdle, we used a mouse bone marrow derived cell line, the EML (Erythroid, Myeloid, and Lymphocytic) cell line, as a model system for this study.
RNA-sequencing (RNA-Seq) has been increasingly used to replace microarray for gene expression studies. We report here a detailed method of using RNA-Seq technology to investigate the potential key factors in regulation of EML cell self-renewal and differentiation. The protocol provided in this paper is divided into three parts. The first part explains how to culture EML cells and separate Lin-CD34+ and Lin-CD34- cells. The second part of the protocol offers detailed procedures for total RNA preparation and the subsequent library construction for high-throughput sequencing. The last part describes the method for RNA-Seq data analysis and explains how to use the data to identify differentially expressed transcription factors between Lin-CD34+ and Lin-CD34- cells. The most significantly differentially expressed transcription factors were identified to be the potential key regulators controlling EML cell self-renewal and differentiation. In the discussion section of this paper, we highlight the key steps for successful performance of this experiment.
In summary, this paper offers a method of using RNA-Seq technology to identify potential regulators of self-renewal and differentiation in EML cells. The key factors identified are subjected to downstream functional analysis in vitro and in vivo.
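As a rough illustration of the final analysis step, the sketch below ranks transcription factors by the magnitude of their log2 fold change between the two cell populations. It is not the authors' pipeline: a real analysis would use replicate-aware statistical testing, and the function names, the pseudocount, the threshold and the placeholder factor names and values are all assumptions.

```python
import math


def log2_fold_change(level_a, level_b, pseudocount=1.0):
    """log2 ratio of two expression levels; the pseudocount avoids division by zero."""
    return math.log2((level_a + pseudocount) / (level_b + pseudocount))


def rank_factors(expression, min_abs_lfc=1.0):
    """Return (name, log2FC) pairs passing the threshold, largest change first.

    expression maps a factor name to (level in Lin-CD34+, level in Lin-CD34-).
    """
    scored = [(name, log2_fold_change(a, b)) for name, (a, b) in expression.items()]
    hits = [(name, lfc) for name, lfc in scored if abs(lfc) >= min_abs_lfc]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression values for placeholder factors (not real data).
toy = {"TF_A": (7.0, 1.0), "TF_B": (3.0, 3.0), "TF_C": (0.0, 15.0)}
print(rank_factors(toy))  # [('TF_C', -4.0), ('TF_A', 2.0)]
```

A positive value indicates higher expression in Lin-CD34+ cells and a negative value higher expression in Lin-CD34- cells; factors with little change, such as TF_B here, are filtered out.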
Genetics, Issue 93, EML Cells, Self-renewal, Differentiation, Hematopoietic precursor cell, RNA-Sequencing, Data analysis
RNA-seq Analysis of Transcriptomes in Thrombin-treated and Control Human Pulmonary Microvascular Endothelial Cells
Institutions: Children's Mercy Hospital and Clinics, School of Medicine, University of Missouri-Kansas City.
The characterization of gene expression in cells via measurement of mRNA levels is a useful tool in determining how the transcriptional machinery of the cell is affected by external signals (e.g. drug treatment), or how cells differ between a healthy state and a diseased state. With the advent and continuous refinement of next-generation DNA sequencing technology, RNA-sequencing (RNA-seq) has become an increasingly popular method of transcriptome analysis to catalog all species of transcripts, to determine the transcriptional structure of all expressed genes and to quantify the changing expression levels of the total set of transcripts in a given cell, tissue or organism 1, 2. RNA-seq is gradually replacing DNA microarrays as a preferred method for transcriptome analysis because it has the advantages of profiling a complete transcriptome, providing a digital type datum (copy number of any transcript) and not relying on any known genomic sequence 3.
Here, we present a complete and detailed protocol to apply RNA-seq to profile transcriptomes in human pulmonary microvascular endothelial cells with or without thrombin treatment. This protocol is based on our recently published study entitled "RNA-seq Reveals Novel Transcriptome of Genes and Their Isoforms in Human Pulmonary Microvascular Endothelial Cells Treated with Thrombin," 4 in which we successfully performed the first complete transcriptome analysis of human pulmonary microvascular endothelial cells treated with thrombin using RNA-seq. It yielded unprecedented resources for further experimentation to gain insights into molecular mechanisms underlying thrombin-mediated endothelial dysfunction in the pathogenesis of inflammatory conditions, cancer, diabetes, and coronary heart disease, and provided potential new leads for therapeutic targets for those diseases.
The descriptive text of this protocol is divided into four parts. The first part describes the treatment of human pulmonary microvascular endothelial cells with thrombin and RNA isolation, quality analysis and quantification. The second part describes library construction and sequencing. The third part describes the data analysis. The fourth part describes an RT-PCR validation assay. Representative results of several key steps are displayed. Useful tips or precautions to boost success in key steps are provided in the Discussion section. Although this protocol uses human pulmonary microvascular endothelial cells treated with thrombin, it can be generalized to profile transcriptomes in both mammalian and non-mammalian cells and in tissues treated with different stimuli or inhibitors, or to compare transcriptomes in cells or tissues between a healthy state and a disease state.
Genetics, Issue 72, Molecular Biology, Immunology, Medicine, Genomics, Proteins, RNA-seq, Next Generation DNA Sequencing, Transcriptome, Transcription, Thrombin, Endothelial cells, high-throughput, DNA, genomic DNA, RT-PCR, PCR
Detecting Somatic Genetic Alterations in Tumor Specimens by Exon Capture and Massively Parallel Sequencing
Institutions: Memorial Sloan-Kettering Cancer Center.
Efforts to detect and investigate key oncogenic mutations have proven valuable to facilitate the appropriate treatment for cancer patients. The establishment of high-throughput, massively parallel "next-generation" sequencing has aided the discovery of many such mutations. To enhance the clinical and translational utility of this technology, platforms must be high-throughput, cost-effective, and compatible with formalin-fixed paraffin embedded (FFPE) tissue samples that may yield small amounts of degraded or damaged DNA. Here, we describe the preparation of barcoded and multiplexed DNA libraries followed by hybridization-based capture of targeted exons for the detection of cancer-associated mutations in fresh frozen and FFPE tumors by massively parallel sequencing. This method enables the identification of sequence mutations, copy number alterations, and select structural rearrangements involving all targeted genes. Targeted exon sequencing offers the benefits of high throughput, low cost, and deep sequence coverage, thus conferring high sensitivity for detecting low frequency mutations.
Molecular Biology, Issue 80, Molecular Diagnostic Techniques, High-Throughput Nucleotide Sequencing, Genetics, Neoplasms, Diagnosis, Massively parallel sequencing, targeted exon sequencing, hybridization capture, cancer, FFPE, DNA mutations
A Restriction Enzyme Based Cloning Method to Assess the In vitro Replication Capacity of HIV-1 Subtype C Gag-MJ4 Chimeric Viruses
Institutions: Emory University.
The protective effect of many HLA class I alleles on HIV-1 pathogenesis and disease progression is, in part, attributed to their ability to target conserved portions of the HIV-1 genome that escape with difficulty. Sequence changes attributed to cellular immune pressure arise across the genome during infection, and if found within conserved regions of the genome such as Gag, can affect the ability of the virus to replicate in vitro. Transmission of HLA-linked polymorphisms in Gag to HLA-mismatched recipients has been associated with reduced set point viral loads. We hypothesized this may be due to a reduced replication capacity of the virus. Here we present a novel method for assessing the in vitro replication of HIV-1 as influenced by the gag gene isolated from acute time points from subtype C infected Zambians. This method uses restriction enzyme based cloning to insert the gag gene into a common subtype C HIV-1 proviral backbone, MJ4. This makes it more appropriate to the study of subtype C sequences than previous recombination based methods that have assessed the in vitro replication of chronically derived gag-pro sequences. Nevertheless, the protocol could be readily modified for studies of viruses from other subtypes. Moreover, this protocol details a robust and reproducible method for assessing the replication capacity of the Gag-MJ4 chimeric viruses on a CEM-based T cell line. This method was utilized for the study of Gag-MJ4 chimeric viruses derived from 149 subtype C acutely infected Zambians, and has allowed for the identification of residues in Gag that affect replication. More importantly, the implementation of this technique has facilitated a deeper understanding of how viral replication defines parameters of early HIV-1 pathogenesis such as set point viral load and longitudinal CD4+ T cell decline.
Infectious Diseases, Issue 90, HIV-1, Gag, viral replication, replication capacity, viral fitness, MJ4, CEM, GXR25
Generation of Enterobacter sp. YSU Auxotrophs Using Transposon Mutagenesis
Institutions: Youngstown State University.
Prototrophic bacteria grow on M-9 minimal salts medium supplemented with glucose, which serves as the carbon and energy source (M-9 medium). Auxotrophs can be generated using a transposome. The commercially available, Tn5-derived transposome used in this protocol consists of a linear segment of DNA containing an R6Kγ replication origin, a gene for kanamycin resistance and two mosaic sequence ends, which serve as transposase binding sites. The transposome, provided as a DNA/transposase protein complex, is introduced by electroporation into the prototrophic strain, Enterobacter sp. YSU, and randomly incorporates itself into this host’s genome. Transformants are replica plated onto Luria-Bertani agar plates containing kanamycin (LB-kan) and onto M-9 medium agar plates containing kanamycin (M-9-kan). The transformants that grow on LB-kan plates but not on M-9-kan plates are considered to be auxotrophs. Purified genomic DNA from an auxotroph is partially digested, ligated and transformed into a pir+ Escherichia coli (E. coli) strain. The R6Kγ replication origin allows the plasmid to replicate in pir+ E. coli strains, and the kanamycin resistance marker allows for plasmid selection. Each transformant possesses a new plasmid containing the transposon flanked by the interrupted chromosomal region. Sanger sequencing and the Basic Local Alignment Search Tool (BLAST) suggest a putative identity of the interrupted gene. There are three advantages to using this transposome mutagenesis strategy. First, it does not rely on the expression of a transposase gene by the host. Second, the transposome is introduced into the target host by electroporation, rather than by conjugation or by transduction and therefore is more efficient. Third, the R6Kγ replication origin makes it easy to identify the mutated gene which is partially recovered in a recombinant plasmid. This technique can be used to investigate the genes involved in other characteristics of Enterobacter sp. YSU or of a wider variety of bacterial strains.
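The replica-plating criterion described above amounts to a simple filter, sketched here under stated assumptions: growth is scored as a yes/no value per plate, and the function name, data layout and colony identifiers are hypothetical.

```python
def find_auxotrophs(growth):
    """Return colony ids that grow on LB-kan but fail to grow on M-9-kan.

    growth maps a colony id to {"LB-kan": bool, "M-9-kan": bool}.
    """
    return sorted(colony for colony, plates in growth.items()
                  if plates["LB-kan"] and not plates["M-9-kan"])


# Hypothetical scoring of four replica-plated transformants.
scores = {
    "colony_1": {"LB-kan": True, "M-9-kan": True},    # prototroph
    "colony_2": {"LB-kan": True, "M-9-kan": False},   # candidate auxotroph
    "colony_3": {"LB-kan": False, "M-9-kan": False},  # no growth on either plate
    "colony_4": {"LB-kan": True, "M-9-kan": False},   # candidate auxotroph
}
print(find_auxotrophs(scores))  # ['colony_2', 'colony_4']
```

Colonies that fail on both plates are excluded because the lack of growth on the complex medium means the result cannot be attributed to a nutritional requirement.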
Microbiology, Issue 92, Auxotroph, transposome, transposon, mutagenesis, replica plating, glucose minimal medium, complex medium, Enterobacter
Principles of Site-Specific Recombinase (SSR) Technology
Institutions: Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
Site-specific recombinase (SSR) technology allows the manipulation of gene structure to explore gene function and has become an integral tool of molecular biology. Site-specific recombinases are proteins that bind to distinct DNA target sequences. The Cre/lox system was first described in bacteriophages during the 1980s. Cre recombinase is a Type I topoisomerase that catalyzes site-specific recombination of DNA between two loxP (locus of X-over P1) sites. The Cre/lox system does not require any cofactors. LoxP sequences contain distinct binding sites for Cre recombinases that surround a directional core sequence where recombination and rearrangement take place. When cells contain loxP sites and express the Cre recombinase, a recombination event occurs. Double-stranded DNA is cut at both loxP sites by the Cre recombinase, rearranged, and ligated ("scissors and glue"). Products of the recombination event depend on the relative orientation of the asymmetric sequences.
SSR technology is frequently used as a tool to explore gene function. Here the gene of interest is flanked with loxP target sites for Cre ("floxed"). Animals are then crossed with animals expressing the Cre recombinase under the control of a tissue-specific promoter. In tissues that express the Cre recombinase, it binds to target sequences and excises the floxed gene. Controlled gene deletion allows the investigation of gene function in specific tissues and at distinct time points. Analysis of gene function employing SSR technology (conditional mutagenesis) has significant advantages over traditional knock-outs where gene deletion is frequently lethal.
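How the outcome depends on loxP orientation can be sketched with a toy model of a single DNA molecule carrying two sites. The representation is an assumption made for illustration: DNA is a list of labeled segments, a site is written "loxP>" or "<loxP" according to its orientation, and recombination between sites on different molecules is not modeled.

```python
FORWARD, REVERSE = "loxP>", "<loxP"


def recombine(segments):
    """Apply Cre recombination to a molecule with exactly two loxP sites.

    Same orientation: the intervening segments are excised, leaving one site.
    Opposite orientation: the intervening segments are inverted.
    """
    sites = [i for i, seg in enumerate(segments) if seg in (FORWARD, REVERSE)]
    if len(sites) != 2:
        raise ValueError("expected exactly two loxP sites")
    first, second = sites
    if segments[first] == segments[second]:
        # Direct repeats: drop everything from after the first site
        # through the second site (the excised circle is not tracked).
        return segments[:first + 1] + segments[second + 1:]
    # Inverted repeats: reverse the order of the intervening segments
    # (each segment would also be reverse-complemented in real DNA).
    return segments[:first + 1] + segments[first + 1:second][::-1] + segments[second:]


floxed = ["promoter", "loxP>", "exon2", "loxP>", "exon3"]
print(recombine(floxed))    # ['promoter', 'loxP>', 'exon3']

inverted = ["promoter", "loxP>", "a", "b", "<loxP", "exon3"]
print(recombine(inverted))  # ['promoter', 'loxP>', 'b', 'a', '<loxP', 'exon3']
```

The first case corresponds to the floxed-gene deletion used in conditional mutagenesis; the second shows why sites placed in opposite orientation flip a segment instead of removing it.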
Cellular Biology, Issue 15, Molecular Biology, Site-Specific Recombinase, Cre recombinase, Cre/lox system, transgenic animals, transgenic technology
Molecular Evolution of the Tre Recombinase
Institutions: Max Planck Institute for Molecular Cell Biology and Genetics, Dresden.
Here we report the generation of Tre recombinase through directed, molecular evolution. Tre recombinase recognizes a pre-defined target sequence within the LTR sequences of the HIV-1 provirus, resulting in the excision and eradication of the provirus from infected human cells.
We started with Cre, a 38-kDa recombinase that recognizes a 34-bp double-stranded DNA sequence known as loxP. Because Cre can effectively eliminate genomic sequences, we set out to tailor a recombinase that could remove the sequence between the 5'-LTR and 3'-LTR of an integrated HIV-1 provirus. As a first step we identified sequences within the LTR sites that were similar to loxP and tested for recombination activity. Initially Cre and mutagenized Cre libraries failed to recombine the chosen loxLTR sites of the HIV-1 provirus. As the start of any directed molecular evolution process requires at least residual activity, the original asymmetric loxLTR sequences were split into subsets and tested again for recombination activity. Recombination activity was shown with these subsets, which served as intermediates. Next, recombinase libraries were enriched through reiterative evolution cycles. Subsequently, enriched libraries were shuffled and recombined. The combination of different mutations proved synergistic and recombinases were created that were able to recombine loxLTR1 and loxLTR2. This was evidence that an evolutionary strategy through intermediates can be successful. After a total of 126 evolution cycles individual recombinases were functionally and structurally analyzed. The most active recombinase, Tre, had 19 amino acid changes as compared to Cre. Tre recombinase was able to excise the HIV-1 provirus from the genome of HIV-1 infected HeLa cells (see "HIV-1 Proviral DNA Excision Using an Evolved Recombinase", Hauber J., Heinrich-Pette-Institute for Experimental Virology and Immunology, Hamburg, Germany). While still in its infancy, directed molecular evolution will allow the creation of custom enzymes that will serve as tools of "molecular surgery" and molecular medicine.
Cell Biology, Issue 15, HIV-1, Tre recombinase, Site-specific recombination, molecular evolution
A Strategy to Identify de Novo Mutations in Common Disorders such as Autism and Schizophrenia
Institutions: Universite de Montreal.
There are several lines of evidence supporting the role of de novo mutations as a mechanism for common disorders, such as autism and schizophrenia. First, the de novo mutation rate in humans is relatively high, so new mutations are generated at a high frequency in the population. However, de novo mutations have not been reported in most common diseases. Mutations in genes leading to severe diseases where there is a strong negative selection against the phenotype, such as lethality in embryonic stages or reduced reproductive fitness, will not be transmitted to multiple family members, and therefore will not be detected by linkage gene mapping or association studies. The observation of very high concordance in monozygotic twins and very low concordance in dizygotic twins also strongly supports the hypothesis that a significant fraction of cases may result from new mutations. Such is the case for diseases such as autism and schizophrenia. Second, despite reduced reproductive fitness 1 and extremely variable environmental factors, the incidence of some diseases is maintained worldwide at a relatively high and constant rate. This is the case for autism and schizophrenia, with an incidence of approximately 1% worldwide. Mutational load can be thought of as a balance between selection for or against a deleterious mutation and its production by de novo mutation. Lower rates of reproduction constitute a negative selection factor that should reduce the number of mutant alleles in the population, ultimately leading to decreased disease prevalence. These selective pressures tend to be of different intensity in different environments. Nonetheless, these severe mental disorders have been maintained at a constant relatively high prevalence in the worldwide population across a wide range of cultures and countries despite a strong negative selection against them 2. This is not what one would predict in diseases with reduced reproductive fitness, unless there was a high new mutation rate. Finally, there is the effect of paternal age: there is a significantly increased risk of the disease with increasing paternal age, which could result from the age related increase in paternal de novo mutations. This is the case for autism and schizophrenia 3. The male-to-female ratio of mutation rate is estimated at about 4–6:1, presumably due to a higher number of germ-cell divisions with age in males. Therefore, one would predict that de novo mutations would more frequently come from males, particularly older males 4. A high rate of new mutations may in part explain why genetic studies have so far failed to identify many genes predisposing to complex diseases, such as autism and schizophrenia, and why diseases have been identified for a mere 3% of genes in the human genome. Identification of de novo mutations as a cause of a disease requires a targeted molecular approach, which includes studying parents and affected subjects. The process for determining if the genetic basis of a disease may result in part from de novo mutations and the molecular approach to establish this link will be illustrated, using autism and schizophrenia as examples.
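The core of a parent-and-affected-subject comparison can be sketched as a set difference: a candidate de novo variant is one seen in the affected child but in neither parent. This is a simplified illustration, not the authors' protocol; the function name, the variant key layout and the toy coordinates are assumptions, and a real analysis would also have to rule out sequencing errors and insufficient coverage in the parents.

```python
def de_novo_candidates(child, mother, father):
    """Return variants present in the child and absent from both parents.

    Each argument is a set of variant keys of the form (chrom, pos, ref, alt).
    """
    return child - mother - father


# Hypothetical variant calls for one trio (coordinates are placeholders).
mother = {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")}
father = {("chr1", 1000, "A", "G"), ("chr3", 750, "G", "A")}
child = {("chr1", 1000, "A", "G"), ("chr3", 750, "G", "A"), ("chr5", 42, "T", "C")}

print(sorted(de_novo_candidates(child, mother, father)))  # [('chr5', 42, 'T', 'C')]
```

Variants inherited from either parent drop out of the result, which is why both parents need to be studied alongside the affected subject.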
Medicine, Issue 52, de novo mutation, complex diseases, schizophrenia, autism, rare variations, DNA sequencing