To determine the breadth and underpinning of changes in immunocyte gene expression due to genetic variation in mice, we performed, as part of the Immunological Genome Project, gene expression profiling for CD4(+) T cells and neutrophils purified from 39 inbred strains of the Mouse Phenome Database. Considering both cell types, a large number of transcripts showed significant variation across the inbred strains, with 22% of the transcriptome varying by 2-fold or more. These included 119 loci with apparent complete loss of function, where the corresponding transcript was not expressed in some of the strains, representing a useful resource of "natural knockouts." We identified 1222 cis-expression quantitative trait loci (cis-eQTL) that control some of this variation. Most (60%) cis-eQTLs were shared between T cells and neutrophils, but a significant portion uniquely impacted one of the cell types, suggesting cell type-specific regulatory mechanisms. Using a conditional regression algorithm, we predicted regulatory interactions between transcription factors and potential targets, and we demonstrated that these predictions overlap with regulatory interactions inferred from transcriptional changes during immunocyte differentiation. Finally, comparison of these and parallel data from CD4(+) T cells of healthy humans demonstrated intriguing similarities in variability of a gene's expression: the most variable genes tended to be the same in both species, and there was an overlap in genes subject to strong cis-acting genetic variants. We speculate that this "conservation of variation" reflects a differential constraint on intraspecies variation in expression levels of different genes, either through lower pressure for some genes, or by favoring variability for others.
T lymphocyte activation by antigen conditions adaptive immune responses and immunopathologies, but we know little about its variation in humans and its genetic or environmental roots. We analyzed gene expression in CD4(+) T cells during unbiased activation or in T helper 17 (T(H)17) conditions from 348 healthy participants representing European, Asian, and African ancestries. We observed interindividual variability, most marked for cytokine transcripts, with clear biases on the basis of ancestry, and following patterns more complex than simple T(H)1/2/17 partitions. We identified 39 genetic loci specifically associated in cis with activated gene expression. We further fine-mapped and validated a single-base variant that modulates YY1 binding and the activity of an enhancer element controlling the autoimmune-associated IL2RA gene, affecting its activity in activated but not regulatory T cells. Thus, interindividual variability affects the fundamental immunologic process of T helper activation, with important connections to autoimmune disease.
CRISPR-Cas9 is a versatile genome editing technology for studying the functions of genetic elements. To broadly enable the application of Cas9 in vivo, we established a Cre-dependent Cas9 knockin mouse. We demonstrated in vivo as well as ex vivo genome editing using adeno-associated virus (AAV)-, lentivirus-, or particle-mediated delivery of guide RNA in neurons, immune cells, and endothelial cells. Using these mice, we simultaneously modeled the dynamics of KRAS, p53, and LKB1, the top three significantly mutated genes in lung adenocarcinoma. Delivery of a single AAV vector in the lung generated loss-of-function mutations in p53 and Lkb1, as well as homology-directed repair-mediated Kras(G12D) mutations, leading to macroscopic tumors of adenocarcinoma pathology. Together, these results suggest that Cas9 mice empower a wide range of biological and disease modeling applications.
Pseudouridine is the most abundant RNA modification, yet except for a few well-studied cases, little is known about the modified positions and their function(s). Here, we develop Ψ-seq for transcriptome-wide quantitative mapping of pseudouridine. We validate Ψ-seq with spike-ins and de novo identification of previously reported positions and discover hundreds of unique sites in human and yeast mRNAs and snoRNAs. Perturbing pseudouridine synthases (PUS) uncovers which pseudouridine synthase modifies each site and their target sequence features. mRNA pseudouridylation depends on both site-specific and snoRNA-guided pseudouridine synthases. Upon heat shock in yeast, Pus7p-mediated pseudouridylation is induced at >200 sites, and PUS7 deletion decreases the levels of otherwise pseudouridylated mRNA, suggesting a role in enhancing transcript stability. rRNA pseudouridine stoichiometries are conserved but reduced in cells from dyskeratosis congenita patients, where the PUS DKC1 is mutated. Our work identifies an enhanced, transcriptome-wide scope for pseudouridine and methods to dissect its underlying mechanisms and function.
In mammals, cytosine methylation is predominantly restricted to CpG dinucleotides and stably distributed across the genome, with local, cell-type-specific regulation directed by DNA binding factors. This comparatively static landscape is in marked contrast with the events of fertilization, during which the paternal genome is globally reprogrammed. Paternal genome demethylation includes the majority of CpGs, although methylation remains detectable at several notable features. These dynamics have been extensively characterized in the mouse, with only limited observations available in other mammals, and direct measurements are required to understand the extent to which early embryonic landscapes are conserved. We present genome-scale DNA methylation maps of human preimplantation development and embryonic stem cell derivation, confirming a transient state of global hypomethylation that includes most CpGs, while sites of residual maintenance are primarily restricted to gene bodies. Although most features share similar dynamics to those in mouse, maternally contributed methylation is divergently targeted to species-specific sets of CpG island promoters that extend beyond known imprint control regions. Retrotransposon regulation is also highly diverse, and transitions from maternally to embryonically expressed elements. Together, our data confirm that paternal genome demethylation is a general attribute of early mammalian development that is characterized by distinct modes of epigenetic regulation.
Human cancers are complex ecosystems composed of cells with distinct phenotypes, genotypes, and epigenetic states, but current models do not adequately reflect tumor composition in patients. We used single-cell RNA sequencing (RNA-seq) to profile 430 cells from five primary glioblastomas, which we found to be inherently variable in their expression of diverse transcriptional programs related to oncogenic signaling, proliferation, complement/immune response, and hypoxia. We also observed a continuum of stemness-related expression states that enabled us to identify putative regulators of stemness in vivo. Finally, we show that established glioblastoma subtype classifiers are variably expressed across individual cells within a tumor and demonstrate the potential prognostic implications of such intratumoral heterogeneity. Thus, we reveal previously unappreciated heterogeneity in diverse regulatory programs central to glioblastoma biology, prognosis, and therapy.
Constraint-based models are currently the only methodology that allows the study of metabolism at the whole-genome scale. Flux balance analysis is commonly used to analyse constraint-based models. Curiously, the results of this analysis vary with the software being run, a situation that we show can be remedied by using exact rather than floating-point arithmetic. Here we introduce MONGOOSE, a toolbox for analysing the structure of constraint-based metabolic models in exact arithmetic. We apply MONGOOSE to the analysis of 98 existing metabolic network models and find that the biomass reaction is surprisingly blocked (unable to sustain non-zero flux) in nearly half of them. We propose a principled approach for unblocking these reactions and extend it to the problems of identifying essential and synthetic lethal reactions and minimal media. Our structural insights enable a systematic study of constraint-based metabolic models, yielding a deeper understanding of their possibilities and limitations.
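The sensitivity to floating-point arithmetic described above can be illustrated with a minimal sketch. It is not MONGOOSE and does not use its API. The toy metabolite, its stoichiometric coefficients, and the helper `net_production` are hypothetical, chosen only to show how a steady-state check can give different answers in floating-point and exact rational arithmetic.

```python
from fractions import Fraction


def net_production(stoich_row, fluxes):
    """Net production rate of one metabolite: sum of coefficient * flux."""
    total = 0
    for coeff, flux in zip(stoich_row, fluxes):
        total += coeff * flux
    return total


# Hypothetical toy metabolite: produced by two reactions, consumed by a third.
float_row = [0.1, 0.2, -0.3]
exact_row = [Fraction(1, 10), Fraction(2, 10), Fraction(-3, 10)]
fluxes = [1, 1, 1]  # hypothetical flux vector

float_balance = net_production(float_row, fluxes)
exact_balance = net_production(exact_row, fluxes)

# The same mass-balance test gives different answers in the two arithmetics.
float_is_steady = float_balance == 0
exact_is_steady = exact_balance == 0

print("floating-point balance:", float_balance, "steady state:", float_is_steady)
print("exact balance:", exact_balance, "steady state:", exact_is_steady)
```

In practice, floating-point solvers compare against a tolerance rather than zero, and the choice of tolerance is one plausible reason that results could differ between software packages. Exact arithmetic removes that choice.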
To extend our understanding of the genetic basis of human immune function and dysfunction, we performed an expression quantitative trait locus (eQTL) study of purified CD4(+) T cells and monocytes, representing adaptive and innate immunity, in a multi-ethnic cohort of 461 healthy individuals. Context-specific cis- and trans-eQTLs were identified, and cross-population mapping allowed, in some cases, putative functional assignment of candidate causal regulatory variants for disease-associated loci. We note an over-representation of T cell-specific eQTLs among susceptibility alleles for autoimmune diseases and of monocyte-specific eQTLs among Alzheimer's and Parkinson's disease variants. This polarization implicates specific immune cell types in these diseases and points to the need to identify the cell-autonomous effects of disease susceptibility variants.
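As a rough illustration of what a single cis-eQTL test measures, the sketch below regresses expression on allele dosage for one variant-gene pair. It is a simplified, assumed formulation (ordinary least squares with no covariates or multiple-testing correction), not the study's pipeline. The function `eqtl_slope` and the data are hypothetical.

```python
def eqtl_slope(genotypes, expression):
    """Least-squares slope of expression on allele dosage (0, 1, or 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    covariance = sum(
        (g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression)
    )
    variance = sum((g - mean_g) ** 2 for g in genotypes)
    return covariance / variance


# Hypothetical data: six individuals, one variant, one nearby gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope = eqtl_slope(genotypes, expression)
print("estimated change in expression per allele copy:", slope)
```

A cell-type-specific eQTL would, under this simplification, show a clearly non-zero slope in one cell type (for example T cells) and a slope near zero in the other (monocytes) for the same variant-gene pair.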
High-throughput single-cell transcriptomics offers an unbiased approach for understanding the extent, basis and function of gene expression variation between seemingly identical cells. Here we sequence single-cell RNA-seq libraries prepared from over 1,700 primary mouse bone-marrow-derived dendritic cells spanning several experimental conditions. We find substantial variation between identically stimulated dendritic cells, in both the fraction of cells detectably expressing a given messenger RNA and the transcript's level within expressing cells. Distinct gene modules are characterized by different temporal heterogeneity profiles. In particular, a 'core' module of antiviral genes is expressed very early by a few 'precocious' cells in response to uniform stimulation with a pathogenic component, but is later activated in all cells. By stimulating cells individually in sealed microfluidic chambers, analysing dendritic cells from knockout mice, and modulating secretion and extracellular signalling, we show that this response is coordinated by interferon-mediated paracrine signalling from these precocious cells. Notably, preventing cell-to-cell communication also substantially reduces variability between cells in the expression of an early-induced 'peaked' inflammatory module, suggesting that paracrine signalling additionally represses part of the inflammatory program. Our study highlights the importance of cell-to-cell communication in controlling cellular heterogeneity and reveals general strategies that multicellular populations can use to establish complex dynamic responses.
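The two sources of variation named above (the fraction of cells detectably expressing a transcript, and its level within expressing cells) can be separated with a short sketch. The detection threshold, the function name `expression_summary`, and the example values are assumptions for illustration and are not taken from the study.

```python
def expression_summary(levels, detection_threshold=0.0):
    """Return (fraction of cells expressing, mean level among expressing cells)."""
    expressing = [x for x in levels if x > detection_threshold]
    fraction = len(expressing) / len(levels)
    mean_level = sum(expressing) / len(expressing) if expressing else 0.0
    return fraction, mean_level


# Hypothetical expression of one gene across five identically stimulated cells.
levels = [0.0, 0.0, 0.0, 8.0, 12.0]

fraction, mean_in_expressing = expression_summary(levels)
population_mean = sum(levels) / len(levels)

print("fraction of cells expressing:", fraction)
print("mean level in expressing cells:", mean_in_expressing)
print("population average:", population_mean)
```

The population average alone cannot distinguish a gene expressed highly in a few cells, such as the early 'precocious' responders, from one expressed at a low level in every cell. Reporting the two quantities separately makes that distinction visible.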
Genome sequencing studies have shown that human malignancies often bear mutations in four or more driver genes, but it is difficult to recapitulate this degree of genetic complexity in mouse models using conventional breeding. Here we use the CRISPR-Cas9 system of genome editing to overcome this limitation. By delivering combinations of small guide RNAs (sgRNAs) and Cas9 with a lentiviral vector, we modified up to five genes in a single mouse hematopoietic stem cell (HSC), leading to clonal outgrowth and myeloid malignancy. We thereby generated models of acute myeloid leukemia (AML) with cooperating mutations in genes encoding epigenetic modifiers, transcription factors and mediators of cytokine signaling, recapitulating the combinations of mutations observed in patients. Our results suggest that lentivirus-delivered sgRNA:Cas9 genome editing should be useful to engineer a broad array of in vivo cancer models that better reflect the complexity of human disease.
One major goal of cancer genome sequencing is to identify key genes and pathways that drive tumor pathogenesis. Although many studies have identified candidate driver genes based on recurrence of mutations in individual genes, subsets of genes with nonrecurrent mutations may also be defined as putative drivers if they affect a single biological pathway. In this fashion, we previously identified Wnt signaling as significantly mutated through large-scale massively parallel DNA sequencing of chronic lymphocytic leukemia (CLL). Here, we use a novel method of biomolecule delivery, vertical silicon nanowires, to efficiently introduce small interfering RNAs into CLL cells, and interrogate the effects of 8 of 15 mutated Wnt pathway members identified across 91 CLLs. In HEK293T cells, mutations in 2 genes did not generate functional changes, 3 led to dysregulated pathway activation, and 3 led to further activation or loss of repression of pathway activation. Silencing 4 of 8 mutated genes in CLL samples harboring the mutated alleles resulted in reduced viability compared with leukemia samples with wild-type alleles. We demonstrate that somatic mutations in CLL can generate dependence on this pathway for survival. These findings support the notion that nonrecurrent mutations at different nodes of the Wnt pathway can contribute to leukemogenesis.
Comprehensive analyses of cancer genomes promise to inform prognoses and precise cancer treatments. A major barrier, however, is inaccessibility of metastatic tissue. A potential solution is to characterize circulating tumor cells (CTCs), but this requires overcoming the challenges of isolating rare cells and sequencing low-input material. Here we report an integrated process to isolate, qualify and sequence whole exomes of CTCs with high fidelity using a census-based sequencing strategy. Power calculations suggest that mapping of >99.995% of the standard exome is possible in CTCs. We validated our process in two patients with prostate cancer, including one for whom we sequenced CTCs, a lymph node metastasis and nine cores of the primary tumor. Fifty-one of 73 CTC mutations (70%) were present in matched tissue. Moreover, we identified 10 early trunk and 56 metastatic trunk mutations in the non-CTC tumor samples and found 90% and 73% of these mutations, respectively, in CTC exomes. This study establishes a foundation for CTC genomics in the clinic.
Little is known about how human genetic variation affects the responses to environmental stimuli in the context of complex diseases. Experimental and computational approaches were applied to determine the effects of genetic variation on the induction of pathogen-responsive genes in human dendritic cells. We identified 121 common genetic variants associated in cis with variation in expression responses to Escherichia coli lipopolysaccharide, influenza, or interferon-β (IFN-β). We localized and validated causal variants to binding sites of pathogen-activated STAT (signal transducer and activator of transcription) and IRF (IFN-regulatory factor) transcription factors. We also identified a common variant in IRF7 that is associated in trans with type I IFN induction in response to influenza infection. Our results reveal common alleles that explain interindividual variation in pathogen sensing and provide functional annotation for genetic variants that alter susceptibility to inflammatory diseases.
Developmental fate decisions are dictated by master transcription factors (TFs) that interact with cis-regulatory elements to direct transcriptional programs. Certain malignant tumors may also depend on cellular hierarchies reminiscent of normal development but superimposed on underlying genetic aberrations. In glioblastoma (GBM), a subset of stem-like tumor-propagating cells (TPCs) appears to drive tumor progression and underlie therapeutic resistance yet remain poorly understood. Here, we identify a core set of neurodevelopmental TFs (POU3F2, SOX2, SALL2, and OLIG2) essential for GBM propagation. These TFs coordinately bind and activate TPC-specific regulatory elements and are sufficient to fully reprogram differentiated GBM cells to "induced" TPCs, recapitulating the epigenetic landscape and phenotype of native TPCs. We reconstruct a network model that highlights critical interactions and identifies candidate therapeutic targets for eliminating TPCs. Our study establishes the epigenetic basis of a developmental hierarchy in GBM, provides detailed insight into underlying gene regulatory programs, and suggests attendant therapeutic strategies.
We identified three retinoid-related orphan receptor gamma t (RORγt)-specific inhibitors that suppress T helper 17 (Th17) cell responses, including Th17-cell-mediated autoimmune disease. We systematically characterized RORγt binding in the presence and absence of drugs with corresponding whole-genome transcriptome sequencing. RORγt acts as a direct activator of Th17 cell signature genes and a direct repressor of signature genes from other T cell lineages; its strongest transcriptional effects are on cis-regulatory sites containing the ROR? binding motif. RORγt is central in a densely interconnected regulatory network that shapes the balance of T cell differentiation. Here, the three inhibitors modulated the RORγt-dependent transcriptional network to varying extents and through distinct mechanisms. Whereas one inhibitor displaced RORγt from its target loci, the other two inhibitors affected transcription predominantly without removing DNA binding. Our work illustrates the power of a system-scale analysis of transcriptional regulation to characterize potential therapeutic compounds that inhibit pathogenic Th17 cells and suppress autoimmunity.
N6-methyladenosine (m6A) is a common modification of mRNA with potential roles in fine-tuning the RNA life cycle. Here, we identify a dense network of proteins interacting with METTL3, a component of the methyltransferase complex, and show that three of them (WTAP, METTL14, and KIAA1429) are required for methylation. Monitoring m6A levels upon WTAP depletion allowed the definition of accurate and near single-nucleotide resolution methylation maps and their classification into WTAP-dependent and -independent sites. WTAP-dependent sites are located at internal positions in transcripts, topologically static across a variety of systems we surveyed, and inversely correlated with mRNA stability, consistent with a role in establishing "basal" degradation rates. WTAP-independent sites form at the first transcribed base as part of the cap structure and are present at thousands of sites, forming a previously unappreciated layer of transcriptome complexity. Our data shed light on the proteomic and transcriptional underpinnings of this RNA modification.
The transcription factor BATF is required for the differentiation of interleukin 17 (IL-17)-producing helper T cells (TH17 cells) and follicular helper T cells (TFH cells). Here we identified a fundamental role for BATF in regulating the differentiation of effector CD8(+) T cells. BATF-deficient CD8(+) T cells showed profound defects in effector population expansion and underwent proliferative and metabolic catastrophe early after encountering antigen. BATF, together with the transcription factors IRF4 and Jun proteins, bound to and promoted early expression of genes encoding lineage-specific transcription factors (T-bet and Blimp-1) and cytokine receptors while paradoxically repressing genes encoding effector molecules (IFN-γ and granzyme B). Thus, BATF amplifies T cell antigen receptor (TCR)-dependent expression of transcription factors and augments the propagation of inflammatory signals but restrains the expression of genes encoding effector molecules. This checkpoint prevents irreversible commitment to an effector fate until a critical threshold of downstream transcriptional activity has been achieved.
N(6)-methyladenosine (m(6)A) is the most ubiquitous mRNA base modification, but little is known about its precise location, temporal dynamics, and regulation. Here, we generated genomic maps of m(6)A sites in meiotic yeast transcripts at nearly single-nucleotide resolution, identifying 1,308 putatively methylated sites within 1,183 transcripts. We validated eight out of eight methylation sites in different genes with direct genetic analysis, demonstrated that methylated sites are significantly conserved in a related species, and built a model that predicts methylated sites directly from sequence. Sites vary in their methylation profiles along a dense meiotic time course and are regulated both locally, via predictable methylatability of each site, and globally, through the core meiotic circuitry. The methyltransferase complex components localize to the yeast nucleolus, and this localization is essential for mRNA methylation. Our data illuminate a conserved, dynamically regulated methylation program in yeast meiosis and provide an important resource for studying the function of this epitranscriptomic modification.
Hematopoietic stem cells (HSCs) maintain blood homeostasis and are the functional units of bone marrow transplantation. To improve the molecular understanding of HSCs and their proximal progenitors, we performed transcriptome analysis within the context of the ImmGen Consortium data set. Gene sets that define steady-state and mobilized HSCs, as well as hematopoietic stem and progenitor cells (HSPCs), were determined. Genes involved in transcriptional regulation, including a group of putative transcriptional repressors, were identified in multipotent progenitors and HSCs. Proximal promoter analyses combined with ImmGen module analysis identified candidate regulators of HSCs. Enforced expression of one predicted regulator, Hlf, in diverse HSPC subsets led to extensive self-renewal activity ex vivo. These analyses reveal unique insights into the mechanisms that control the core properties of HSPCs.
De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on non-model organisms of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.
Large-scale genomics and computational approaches have identified thousands of putative long non-coding RNAs (lncRNAs). It has been controversial, however, as to what fraction of these RNAs is truly non-coding. Here, we combine ribosome profiling with a machine-learning approach to validate lncRNAs during zebrafish development in a high-throughput manner. We find that dozens of proposed lncRNAs are protein-coding contaminants and that many lncRNAs have ribosome profiles that resemble the 5′ leaders of coding RNAs. Analysis of ribosome profiling data from embryonic stem cells reveals similar properties for mammalian lncRNAs. These results clarify the annotation of developmental lncRNAs and suggest a potential role for translation in lncRNA regulation. In addition, our computational pipeline and ribosome profiling data provide a powerful resource for the identification of translated open reading frames during zebrafish development.
Comparative functional genomics studies the evolution of biological processes by analyzing functional data, such as gene expression profiles, across species. A major challenge is to compare profiles collected in a complex phylogeny. Here, we present Arboretum, a novel scalable computational algorithm that integrates expression data from multiple species with species and gene phylogenies to infer modules of coexpressed genes in extant species and their evolutionary histories. We also develop new, generally applicable measures of conservation and divergence in gene regulatory modules to assess the impact of changes in gene content and expression on module evolution. We used Arboretum to study the evolution of the transcriptional response to heat shock in eight species of Ascomycota fungi and to reconstruct modules of the ancestral environmental stress response (ESR). We found substantial conservation in the stress response across species and in the reconstructed components of the ancestral ESR modules. The greatest divergence was in the most induced stress, primarily through module expansion. The divergence of the heat stress response exceeds that observed in the response to glucose depletion in the same species. Arboretum and its associated analyses provide a comprehensive framework to systematically study regulatory evolution of condition-specific responses.
The recent development of a semiconductor-based, non-optical DNA sequencing technology promises scalable, low-cost and rapid sequence data production. The technology has previously been applied mainly to genomic sequencing and targeted re-sequencing. Here we demonstrate the utility of Ion Torrent semiconductor-based sequencing for sensitive, efficient and rapid chromatin immunoprecipitation followed by sequencing (ChIP-seq) through the application of sample preparation methods that are optimized for ChIP-seq on the Ion Torrent platform. We leverage this method for epigenetic profiling of tumour tissues.
Recent molecular studies have shown that, even when derived from a seemingly homogenous population, individual cells can exhibit substantial differences in gene expression, protein levels and phenotypic output, with important functional consequences. Existing studies of cellular heterogeneity, however, have typically measured only a few pre-selected RNAs or proteins simultaneously, because genomic profiling methods could not be applied to single cells until very recently. Here we use single-cell RNA sequencing to investigate heterogeneity in the response of mouse bone-marrow-derived dendritic cells (BMDCs) to lipopolysaccharide. We find extensive, and previously unobserved, bimodal variation in messenger RNA abundance and splicing patterns, which we validate by RNA-fluorescence in situ hybridization for select transcripts. In particular, hundreds of key immune genes are bimodally expressed across cells, surprisingly even for genes that are very highly expressed at the population average. Moreover, splicing patterns demonstrate previously unobserved levels of heterogeneity between cells. Some of the observed bimodality can be attributed to closely related, yet distinct, known maturity states of BMDCs; other portions reflect differences in the usage of key regulatory circuits. For example, we identify a module of 137 highly variable, yet co-regulated, antiviral response genes. Using cells from knockout mice, we show that variability in this module may be propagated through an interferon feedback circuit, involving the transcriptional regulators Stat2 and Irf7. Our study demonstrates the power and promise of single-cell genomics in uncovering functional diversity between cells and in deciphering cell states and circuits.
RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development.
TH17 cells (interleukin-17 (IL-17)-producing helper T cells) are highly proinflammatory cells that are critical for clearing extracellular pathogens and for inducing multiple autoimmune diseases. IL-23 has a critical role in stabilizing and reinforcing the TH17 phenotype by increasing expression of IL-23 receptor (IL-23R) and endowing TH17 cells with pathogenic effector functions. However, the precise molecular mechanism by which IL-23 sustains the TH17 response and induces pathogenic effector functions has not been elucidated. Here we used transcriptional profiling of developing TH17 cells to construct a model of their signalling network and nominate major nodes that regulate TH17 development. We identified serum glucocorticoid kinase 1 (SGK1), a serine/threonine kinase, as an essential node downstream of IL-23 signalling. SGK1 is critical for regulating IL-23R expression and stabilizing the TH17 cell phenotype by deactivation of mouse Foxo1, a direct repressor of IL-23R expression. SGK1 has been shown to govern Na(+) transport and salt (NaCl) homeostasis in other cells. We show here that a modest increase in salt concentration induces SGK1 expression, promotes IL-23R expression and enhances TH17 cell differentiation in vitro and in vivo, accelerating the development of autoimmunity. Loss of SGK1 abrogated Na(+)-mediated TH17 differentiation in an IL-23-dependent manner. These data demonstrate that SGK1 has a critical role in the induction of pathogenic TH17 cells and provide a molecular insight into a mechanism by which an environmental factor such as a high salt diet triggers TH17 development and promotes tissue inflammation.
Despite their importance, the molecular circuits that control the differentiation of naive T cells remain largely unknown. Recent studies that reconstructed regulatory networks in mammalian cells have focused on short-term responses and relied on perturbation-based approaches that cannot be readily applied to primary T cells. Here we combine transcriptional profiling at high temporal resolution, novel computational algorithms, and innovative nanowire-based perturbation tools to systematically derive and experimentally validate a model of the dynamic regulatory network that controls the differentiation of mouse TH17 cells, a proinflammatory T-cell subset that has been implicated in the pathogenesis of multiple autoimmune diseases. The TH17 transcriptional network consists of two self-reinforcing, but mutually antagonistic, modules, with 12 novel regulators, the coupled action of which may be essential for maintaining the balance between TH17 and other CD4(+) T-cell subsets. Our study identifies and validates 39 regulatory factors, embeds them within a comprehensive temporal network and reveals its organizational principles; it also highlights novel drug targets for controlling TH17 cell differentiation.
Individual genetic variation affects gene responsiveness to stimuli, often by influencing complex molecular circuits. Here we combine genomic and intermediate-scale transcriptional profiling with computational methods to identify variants that affect the responsiveness of genes to stimuli (responsiveness quantitative trait loci or reQTLs) and to position these variants in molecular circuit diagrams. We apply this approach to study variation in transcriptional responsiveness to pathogen components in dendritic cells from recombinant inbred mouse strains. We identify reQTLs that correlate with particular stimuli and position them in known pathways. For example, in response to a virus-like stimulus, a trans-acting variant responds as an activator of the antiviral response; using RNA interference, we identify Rgs16 as the likely causal gene. Our approach charts an experimental and analytic path to decipher the mechanisms underlying genetic variation in circuits that control responses to stimuli.
Much of the knowledge about cell differentiation and function in the immune system has come from studies in mice, but the relevance to human immunology, diseases, and therapy has been challenged, perhaps more from anecdotal than comprehensive evidence. To this end, we compare two large compendia of transcriptional profiles of human and mouse immune cell types. Global transcription profiles are conserved between corresponding cell lineages. The expression patterns of most orthologous genes are conserved, particularly for lineage-specific genes. However, several hundred genes show clearly divergent expression across the examined cell lineages, and among them, 169 genes did so even with highly stringent criteria. Finally, regulatory mechanisms--reflected by regulators' differential expression or enriched cis-elements--are conserved between the species but to a lower degree, suggesting that distinct regulation may underlie some of the conserved transcriptional responses.
The differentiation of hematopoietic stem cells into cells of the immune system has been studied extensively in mammals, but the transcriptional circuitry that controls it is still only partially understood. Here, the Immunological Genome Project gene-expression profiles across mouse immune lineages allowed us to systematically analyze these circuits. To analyze this data set we developed Ontogenet, an algorithm for reconstructing lineage-specific regulation from gene-expression profiles across lineages. Using Ontogenet, we found differentiation stage-specific regulators of mouse hematopoiesis and identified many known hematopoietic regulators and 175 previously unknown candidate regulators, as well as their target genes and the cell types in which they act. Among the previously unknown regulators, we emphasize the role of ETV5 in the differentiation of γδ T cells. As the transcriptional programs of human and mouse cells are highly conserved, it is likely that many lessons learned from the mouse model apply to humans.
Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.
Divergence in gene regulation can play a major role in evolution. Here, we used a phylogenetic framework to measure mRNA profiles in 15 yeast species from the phylum Ascomycota and reconstruct the evolution of their modular regulatory programs along a time course of growth on glucose over 300 million years. We found that modules have diverged proportionally to phylogenetic distance, with prominent changes in gene regulation accompanying changes in lifestyle and ploidy, especially in carbon metabolism. Paralogs have significantly contributed to regulatory divergence, typically within a very short window from their duplication. Paralogs from a whole genome duplication (WGD) event have a uniquely substantial contribution that extends over a longer span. Similar patterns occur when considering the evolution of the heat shock regulatory program measured in eight of the species, suggesting that these are general evolutionary principles. DOI:http://dx.doi.org/10.7554/eLife.00603.001.
Meiosis is a complex developmental process that generates haploid cells from diploid progenitors. We measured messenger RNA (mRNA) abundance and protein production through the yeast meiotic sporulation program and found strong, stage-specific expression for most genes, achieved through control of both mRNA levels and translational efficiency. Monitoring of protein production timing revealed uncharacterized recombination factors and extensive organellar remodeling. Meiotic translation is also shifted toward noncanonical sites, including short open reading frames (ORFs) on unannotated transcripts and upstream regions of known transcripts (uORFs). Ribosome occupancy at near-cognate uORFs was associated with more efficient ORF translation; by contrast, some AUG uORFs, often exposed by regulated 5′ leader extensions, acted competitively. This work reveals pervasive translational control in meiosis and helps to illuminate the molecular basis of the broad restructuring of meiotic cells.
Long noncoding RNAs (lncRNAs) comprise a diverse class of transcripts that structurally resemble mRNAs but do not encode proteins. Recent genome-wide studies in humans and the mouse have annotated lncRNAs expressed in cell lines and adult tissues, but a systematic analysis of lncRNAs expressed during vertebrate embryogenesis has been elusive. To identify lncRNAs with potential functions in vertebrate embryogenesis, we performed a time-series of RNA-seq experiments at eight stages during early zebrafish development. We reconstructed 56,535 high-confidence transcripts in 28,912 loci, recovering the vast majority of expressed RefSeq transcripts while identifying thousands of novel isoforms and expressed loci. We defined a stringent set of 1133 noncoding multi-exonic transcripts expressed during embryogenesis. These include long intergenic ncRNAs (lincRNAs), intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, and precursors for small RNAs (sRNAs). Zebrafish lncRNAs share many of the characteristics of their mammalian counterparts: relatively short length, low exon number, low expression, and conservation levels comparable to that of introns. Subsets of lncRNAs carry chromatin signatures characteristic of genes with developmental functions. The temporal expression profile of lncRNAs revealed two novel properties: lncRNAs are expressed in narrower time windows than are protein-coding genes and are specifically enriched in early-stage embryos. In addition, several lncRNAs show tissue-specific expression and distinct subcellular localization patterns. Integrative computational analyses associated individual lncRNAs with specific pathways and functions, ranging from cell cycle regulation to morphogenesis. Our study provides the first systematic identification of lncRNAs in a vertebrate embryo and forms the foundation for future genetic, genomic, and evolutionary studies.
Recent advances in technologies for genome- and proteome-scale measurements and perturbations promise to accelerate discovery in every aspect of biology and medicine. Although such rapid technological progress provides a tremendous opportunity, it also demands that we learn how to use these tools effectively. One application with great potential to enhance our understanding of biological systems is the unbiased reconstruction of genetic and molecular networks. Cells of the immune system provide a particularly useful model for developing and applying such approaches. Here, we review approaches for the reconstruction of signalling and transcriptional networks, with a focus on applications in the mammalian innate immune system.
The packaging of eukaryotic genomes into nucleosomes plays critical roles in chromatin organization and gene regulation. Studies in Saccharomyces cerevisiae indicate that nucleosome occupancy is partially encoded by intrinsic antinucleosomal DNA sequences, such as poly(A) sequences, as well as by binding sites for trans-acting factors that can evict nucleosomes, such as Reb1 and the Rsc3/30 complex. Here, we use genome-wide nucleosome occupancy maps in 13 Ascomycota fungi to discover large-scale evolutionary reprogramming of both intrinsic and trans determinants of chromatin structure. We find that poly(G)s act as intrinsic antinucleosomal sequences, comparable to the known function of poly(A)s, but that the abundance of poly(G)s has diverged greatly between species, obscuring their antinucleosomal effect in low-poly(G) species such as S. cerevisiae. We also develop a computational method that uses nucleosome occupancy maps for discovering trans-acting general regulatory factor (GRF) binding sites. Our approach reveals that the specific sequences bound by GRFs have diverged substantially across evolution, corresponding to a number of major evolutionary transitions in the repertoire of GRFs. We experimentally validate a proposed evolutionary transition from Cbf1 as a major GRF in pre-whole-genome duplication (WGD) yeasts to Reb1 in post-WGD yeasts. We further show that the mating type switch-activating protein Sap1 is a GRF in S. pombe, demonstrating the general applicability of our approach. Our results reveal that the underlying mechanisms that determine in vivo chromatin organization have diverged and that comparative genomics can help discover new determinants of chromatin organization.
Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from RNA-seq data collected from ~4 billion RNA-seq reads across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.
Deciphering the complex mechanisms by which regulatory networks control gene expression remains a major challenge. While some studies infer regulation from dependencies between the expression levels of putative regulators and their targets, others focus on measured physical interactions.
Massively parallel sequencing of cDNA has enabled deep and efficient probing of transcriptomes. Current approaches for transcript reconstruction from such data often rely on aligning reads to a reference genome, and are thus unsuitable for samples with a partial or missing reference genome. Here we present the Trinity method for de novo assembly of full-length transcripts and evaluate it on samples from fission yeast, mouse and whitefly, whose reference genome is not yet available. By efficiently constructing and analyzing sets of de Bruijn graphs, Trinity fully reconstructs a large fraction of transcripts, including alternatively spliced isoforms and transcripts from recently duplicated genes. Compared with other de novo transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels, with a sensitivity similar to methods that rely on genome alignments. Our approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome.
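As a rough illustration of the de Bruijn graph idea mentioned in the abstract above, the following minimal Python sketch builds a graph from toy reads and walks an unbranched path to recover a contig. It is not Trinity's implementation: the function names, the toy reads, and the choice of k are hypothetical, and real assemblers additionally handle sequencing errors, coverage, and branching caused by isoforms and paralogs.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].append(kmer[1:])
    return graph


def walk_unbranched(graph, start):
    """Extend a contig from `start` while each node has one distinct successor."""
    contig, node, seen = start, start, {start}
    while True:
        successors = set(graph.get(node, []))
        if len(successors) != 1:
            break  # dead end or branch point: stop extending
        node = successors.pop()
        if node in seen:
            break  # avoid looping forever on a cycle
        seen.add(node)
        contig += node[-1]
    return contig


# Toy, error-free overlapping reads (hypothetical data).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
contig = walk_unbranched(graph, "ATG")
print(contig)
```

In this toy case every node has a single successor, so the walk joins the three overlapping reads into one sequence; a branch point (for example, from alternative splicing) would stop the walk and would need to be resolved separately.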
The fission yeast clade--comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus--occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.
Cellular RNA levels are determined by the interplay of RNA production, processing and degradation. However, because most studies of RNA regulation do not distinguish the separate contributions of these processes, little is known about how they are temporally integrated. Here we combine metabolic labeling of RNA at high temporal resolution with advanced RNA quantification and computational modeling to estimate RNA transcription and degradation rates during the response of mouse dendritic cells to lipopolysaccharide. We find that changes in transcription rates determine the majority of temporal changes in RNA levels, but that changes in degradation rates are important for shaping sharp peaked responses. We used sequencing of the newly transcribed RNA population to estimate temporally constant RNA processing and degradation rates genome wide. Degradation rates vary significantly between genes and contribute to the observed differences in the dynamic response. Certain transcripts, including those encoding cytokines and transcription factors, mature faster. Our study provides a quantitative approach to study the integrative process of RNA regulation.
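The interplay of production and degradation described above is commonly summarized by a first-order model, dX/dt = alpha - beta * X, where X is the RNA level, alpha the transcription rate, and beta the degradation rate. The sketch below assumes this textbook model with constant rates; it is not taken from the study, whose rates vary over time, and the function name and parameter values are hypothetical.

```python
import math


def rna_level(t, x0, alpha, beta):
    """Closed-form solution of dX/dt = alpha - beta * X for constant rates."""
    steady = alpha / beta  # level at which production balances degradation
    return steady + (x0 - steady) * math.exp(-beta * t)


# Two hypothetical transcripts with the same steady state (alpha / beta = 5)
# but different degradation rates: the less stable one responds faster.
slow = rna_level(t=1.0, x0=0.0, alpha=5.0, beta=1.0)
fast = rna_level(t=1.0, x0=0.0, alpha=20.0, beta=4.0)
print(slow, fast)
```

Under this simplified model the degradation rate sets how quickly a transcript approaches its new level, which is one way to see why degradation could matter for sharply peaked responses.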
Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry controlling ES cell state.
Hundreds of chromatin regulators (CRs) control chromatin structure and function by catalyzing and binding histone modifications, yet the rules governing these key processes remain obscure. Here, we present a systematic approach to infer CR function. We developed ChIP-string, a meso-scale assay that combines chromatin immunoprecipitation with a signature readout of 487 representative loci. We applied ChIP-string to screen 145 antibodies, thereby identifying effective reagents, which we used to map the genome-wide binding of 29 CRs in two cell types. We found that specific combinations of CRs colocalize in characteristic patterns at distinct chromatin environments, at genes of coherent functions, and at distal regulatory elements. When comparing between cell types, CRs redistribute to different loci but maintain their modular and combinatorial associations. Our work provides a multiplex method that substantially enhances the ability to monitor CR binding, presents a large resource of CR maps, and reveals common principles for combinatorial CR function.
Deciphering the signaling networks that underlie normal and disease processes remains a major challenge. Here, we report the discovery of signaling components involved in the Toll-like receptor (TLR) response of immune dendritic cells (DCs), including a previously unknown pathway shared across mammalian antiviral responses. By combining transcriptional profiling, genetic and small-molecule perturbations, and phosphoproteomics, we uncover 35 signaling regulators, including 16 known regulators, involved in TLR signaling. In particular, we find that Polo-like kinases (Plk) 2 and 4 are essential components of antiviral pathways in vitro and in vivo and activate a signaling branch involving a dozen proteins, among which is Tnfaip2, a gene associated with autoimmune diseases but whose role was unknown. Our study illustrates the power of combining systematic measurements and perturbations to elucidate complex signaling circuits and discover potential therapeutic targets.
Regulatory circuits controlling gene expression constantly rewire to adapt to environmental stimuli, differentiation cues, and disease. We review our current understanding of the temporal dynamics of gene expression in eukaryotes and prokaryotes and the molecular mechanisms that shape them. We delineate several prototypical temporal patterns, including "impulse" (or single-pulse) patterns in response to transient environmental stimuli, sustained (or state-transitioning) patterns in response to developmental cues, and oscillating patterns. We focus on impulse responses and their higher-order temporal organization in regulons and cascades and describe how core protein circuits and cis-regulatory sequences in promoters integrate with chromatin architecture to generate these responses.
Though many individual transcription factors are known to regulate hematopoietic differentiation, major aspects of the global architecture of hematopoiesis remain unknown. Here, we profiled gene expression in 38 distinct purified populations of human hematopoietic cells and used probabilistic models of gene expression and analysis of cis-elements in gene promoters to decipher the general organization of their regulatory circuitry. We identified modules of highly coexpressed genes, some of which are restricted to a single lineage but most of which are expressed at variable levels across multiple lineages. We found densely interconnected cis-regulatory circuits and a large number of transcription factors that are differentially expressed across hematopoietic states. These findings suggest a more complex regulatory system for hematopoiesis than previously assumed.
Recently, more than 1000 large intergenic noncoding RNAs (lincRNAs) have been reported. These RNAs are evolutionarily conserved in mammalian genomes and thus presumably function in diverse biological processes. Here, we report the identification of lincRNAs that are regulated by p53. One of these lincRNAs (lincRNA-p21) serves as a repressor in p53-dependent transcriptional responses. Inhibition of lincRNA-p21 affects the expression of hundreds of gene targets enriched for genes normally repressed by p53. The observed transcriptional repression by lincRNA-p21 is mediated through the physical association with hnRNP-K. This interaction is required for proper genomic localization of hnRNP-K at repressed genes and regulation of p53-mediated apoptosis. We propose a model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes.
Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.
The study of induced pluripotency often relies on experimental approaches that average measurements across a large population of cells, the majority of which do not become pluripotent. Here we used high-resolution, time-lapse imaging to trace the reprogramming process over 2 weeks from single mouse embryonic fibroblasts (MEFs) to pluripotency factor-positive colonies. This enabled us to calculate a normalized cell-of-origin reprogramming efficiency that takes into account only the initial MEFs that respond to form reprogrammed colonies rather than the larger number of final colonies. Furthermore, this retrospective analysis revealed that successfully reprogramming cells undergo a rapid shift in their proliferative rate that coincides with a reduction in cellular area. This event occurs as early as the first cell division and with similar kinetics in all cells that form induced pluripotent stem (iPS) cell colonies. These data contribute to the theoretical modeling of reprogramming and suggest that certain parts of the reprogramming process follow defined rather than stochastic steps.
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5′ start sites, 3′ ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.
Coexpression of genes within a functional module can be conserved at great evolutionary distances, whereas the associated regulatory mechanisms can substantially diverge. For example, ribosomal protein (RP) genes are tightly coexpressed in Saccharomyces cerevisiae, but the cis and trans factors associated with them are surprisingly diverged across Ascomycota fungi. Little is known, however, about the functional impact of such changes on actual expression levels or about the selective pressures that affect them. Here, we address this question in the context of the evolution of the regulation of RP gene expression by using a comparative genomics approach together with cross-species functional assays. We show that an activator (Ifh1) and a repressor (Crf1) that control RP gene regulation in normal and stress conditions in S. cerevisiae are derived from the duplication and subsequent specialization of a single ancestral protein. We provide evidence that this regulatory innovation coincides with the duplication of RP genes in a whole-genome duplication (WGD) event and may have been important for tighter control of higher levels of RP transcripts. We find that subsequent loss of the derived repressor led to the loss of a stress-dependent repression of RPs in the fungal pathogen Candida glabrata. Our comparative computational and experimental approach shows how gene duplication can constrain and drive regulatory evolution and provides a general strategy for reconstructing the evolutionary trajectory of gene regulation across species.
Whole-genome sequencing allows researchers to study evolution through the lens of comparative genomics. Several landmark studies in yeast have showcased the utility of this approach for identifying functional elements, tracing the evolution of gene regulatory sites, and revealing an ancestral whole-genome duplication event. Such studies first require an accurate and comprehensive mapping of all orthologous loci across all species. In this chapter, we present a computational framework for systematic reconstruction of all gene orthology relations across multiple yeast species. We then discuss how to use the resulting genome- and species-wide catalogue of gene phylogenies to study the histories of gene duplications and losses from a functional genomics perspective. We show how these methods allowed us to uncover the functional constraints underlying gene duplications and losses within Ascomycota fungi, and to highlight the importance and limitations of these evolutionary processes. The analytical framework we present here is generalizable and scalable, and can be applied to an array of comparative genomics needs.
Chromatin organization plays a major role in gene regulation and can affect the function and evolution of new transcriptional programs. However, it can be difficult to decipher the basis of changes in chromatin organization and their functional effect on gene expression. Here, we present a large-scale comparative genomic analysis of the relationship between chromatin organization and gene expression, by measuring mRNA abundance and nucleosome positions genome-wide in 12 Hemiascomycota yeast species. We found substantial conservation of global and functional chromatin organization in all species, including prominent nucleosome-free regions (NFRs) at gene promoters, and distinct chromatin architecture in growth and stress genes. Chromatin organization has also substantially diverged in both global quantitative features, such as spacing between adjacent nucleosomes, and in functional groups of genes. Expression levels, intrinsic anti-nucleosomal sequences, and trans-acting chromatin modifiers all play important, complementary, and evolvable roles in determining NFRs. We identify five mechanisms that couple chromatin organization to evolution of gene regulation and have contributed to the evolution of respiro-fermentation and other key systems, including (1) compensatory evolution of alternative modifiers associated with conserved chromatin organization, (2) a gradual transition from constitutive to trans-regulated NFRs, (3) a loss of intrinsic anti-nucleosomal sequences accompanying changes in chromatin organization and gene expression, (4) re-positioning of motifs from NFRs to nucleosome-occluded regions, and (5) the expanded use of NFRs by paralogous activator-repressor pairs. Our study sheds light on the molecular basis of chromatin organization, and on the role of chromatin organization in the evolution of gene regulation.
Recent studies in budding yeast have shown that antisense transcription occurs at many loci. However, the functional role of antisense transcripts has been demonstrated only in a few cases and it has been suggested that most antisense transcripts may result from promiscuous bi-directional transcription in a dense genome.
Accurate profiling of minute quantities of RNA in a global manner can enable key advances in many scientific and clinical disciplines. Here, we present low-quantity RNA sequencing (LQ-RNAseq), a high-throughput sequencing-based technique allowing whole transcriptome surveys from subnanogram RNA quantities in an amplification/ligation-free manner. LQ-RNAseq involves first-strand cDNA synthesis from RNA templates, followed by 3′ polyA tailing of the single-stranded cDNA products and direct single molecule sequencing. We applied LQ-RNAseq to profile S. cerevisiae polyA+ transcripts, demonstrate the reproducibility of the approach across different sample preparations and independent instrument runs, and establish the absolute quantitative power of this method through comparisons with other reported transcript profiling techniques and through utilization of RNA spike-in experiments. We demonstrate the practical application of this approach to define the transcriptional landscape of mouse embryonic and induced pluripotent stem cells, observing transcriptional differences, including over 100 genes exhibiting differential expression between these otherwise very similar stem cell populations. This amplification-independent technology, which utilizes small quantities of nucleic acid and provides quantitative measurements of cellular transcripts, enables global gene expression measurements from minute amounts of materials and offers broad utility in both basic research and translational biology for characterization of rare cells.
Divergent adaptation can be associated with reproductive isolation in speciation. We recently demonstrated the link between divergent adaptation and the onset of reproductive isolation in experimental populations of the yeast Saccharomyces cerevisiae evolved from a single progenitor in either a high-salt or a low-glucose environment. Here, whole-genome resequencing and comparative genome hybridization of representatives of three populations revealed 17 mutations, six of which explained the adaptive increases in mitotic fitness. In two populations evolved in high salt, two different mutations occurred in the proton efflux pump gene PMA1 and the global transcriptional repressor gene CYC8; the ENA genes encoding sodium efflux pumps were overexpressed once through expansion of this gene cluster and once because of mutation in the regulator CYC8. In the population from low glucose, one mutation occurred in MDS3, which modulates growth at high pH, and one in MKT1, a global regulator of mRNAs encoding mitochondrial proteins, the latter recapitulating a naturally occurring variant. A Dobzhansky-Muller (DM) incompatibility between the evolved alleles of PMA1 and MKT1 strongly depressed fitness in the low-glucose environment. This DM interaction is the first reported between experimentally evolved alleles of known genes and shows how reproductive isolation can arise rapidly when divergent selection is strong.
After fertilization the embryonic genome is inactive until transcription is initiated during the maternal-zygotic transition. This transition coincides with the formation of pluripotent cells, which in mammals can be used to generate embryonic stem cells. To study the changes in chromatin structure that accompany pluripotency and genome activation, we mapped the genomic locations of histone H3 molecules bearing lysine trimethylation modifications before and after the maternal-zygotic transition in zebrafish. Histone H3 lysine 27 trimethylation (H3K27me3), which is repressive, and H3K4me3, which is activating, were not detected before the transition. After genome activation, more than 80% of genes were marked by H3K4me3, including many inactive developmental regulatory genes that were also marked by H3K27me3. Sequential chromatin immunoprecipitation demonstrated that the same promoter regions had both trimethylation marks. Such bivalent chromatin domains also exist in embryonic stem cells and are thought to poise genes for activation while keeping them repressed. Furthermore, we found many inactive genes that were uniquely marked by H3K4me3. Despite this activating modification, these monovalent genes were neither expressed nor stably bound by RNA polymerase II. Inspection of published data sets revealed similar monovalent domains in embryonic stem cells. Moreover, H3K4me3 marks could form in the absence of both sequence-specific transcriptional activators and stable association of RNA polymerase II, as indicated by the analysis of an inducible transgene. These results indicate that bivalent and monovalent domains might poise embryonic genes for activation and that the chromatin profile associated with pluripotency is established during the maternal-zygotic transition.
Fusarium species are among the most important phytopathogenic and toxigenic fungi. To understand the molecular underpinnings of pathogenicity in the genus Fusarium, we compared the genomes of three phenotypically diverse species: Fusarium graminearum, Fusarium verticillioides and Fusarium oxysporum f. sp. lycopersici. Our analysis revealed lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes and account for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity, indicative of horizontal acquisition. Experimentally, we demonstrate the transfer of two LS chromosomes between strains of F. oxysporum, converting a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in F. oxysporum. These findings put the evolution of fungal pathogenicity into a new perspective.
During the course of a viral infection, viral proteins interact with an array of host proteins and pathways. Here, we present a systematic strategy to elucidate the dynamic interactions between H1N1 influenza and its human host. A combination of yeast two-hybrid analysis and genome-wide expression profiling implicated hundreds of human factors in mediating viral-host interactions. These factors were then examined functionally through depletion analyses in primary lung cells. The resulting data point to potential roles for some unanticipated host and viral proteins in viral infection and the host response, including a network of RNA-binding proteins, components of WNT signaling, and viral polymerase subunits. This multilayered approach provides a comprehensive and unbiased physical and regulatory model of influenza-host interactions and demonstrates a general strategy for uncovering complex host-pathogen relationships.
Regulatory divergence is likely a major driving force in evolution. Comparative genomics is being increasingly used to infer the evolution of gene regulation. Ascomycota fungi are uniquely suited among eukaryotes for regulatory evolution studies, due to broad phylogenetic scope, many sequenced genomes, and tractability of genomic analysis. Here we review recent advances in the identification of the contribution of cis- and trans-factors to expression divergence. Whereas current strategies have led to the discovery of surprising signatures and mechanisms, we still understand very little about the adaptive role of regulatory evolution. Empirical studies including experimental evolution, comparative functional genomics and hybrid and engineered strains are showing early promise toward deciphering the contribution of regulatory divergence to adaptation.
Models of mammalian regulatory networks controlling gene expression have been inferred from genomic data but have largely not been validated. We present an unbiased strategy to systematically perturb candidate regulators and monitor cellular transcriptional responses. We applied this approach to derive regulatory networks that control the transcriptional response of mouse primary dendritic cells to pathogens. Our approach revealed the regulatory functions of 125 transcription factors, chromatin modifiers, and RNA binding proteins, which enabled the construction of a network model consisting of 24 core regulators and 76 fine-tuners that help to explain how pathogen-sensing pathways achieve specificity. This study establishes a broadly applicable, comprehensive, and unbiased approach to reveal the wiring and functions of a regulatory network controlling a major transcriptional response in primary mammalian cells.
Chromatin structure and transcription factor localization can be assayed genome-wide by sequencing genomic DNA fractionated by protein occupancy or other properties, but current technologies involve multiple steps that introduce bias and inefficiency. Here we apply a single-molecule approach to directly sequence chromatin immunoprecipitated DNA with minimal sample manipulation. This method is compatible with just 50 pg of DNA and should thus facilitate charting chromatin maps from limited cell populations.
Regulatory divergence is likely a major driving force in evolution. Comparative transcriptomics provides a new glimpse into the evolution of gene regulation. Ascomycota fungi are uniquely suited among eukaryotes for studies of regulatory evolution, because of broad phylogenetic scope, many sequenced genomes, and facility of genomic analysis. Here we review the substantial divergence in gene expression in Ascomycota and how this is reconciled with the modular organization of transcriptional networks. We show that flexibility and redundancy in both cis-regulation and trans-regulation can lead to changes from altered expression of single genes to wholesale rewiring of regulatory modules. Redundancy thus emerges as a major driving force facilitating expression divergence while preserving the coherent functional organization of a transcriptional response.
Transcriptional regulatory circuits govern how cis and trans factors transform signals into messenger RNA (mRNA) expression levels. With advances in quantitative and high-throughput technologies that allow measurement of gene expression state in different conditions, data that can be used to build and test models of transcriptional regulation are being generated at a rapid pace. Here, we review experimental and computational methods used to derive detailed quantitative circuit models on a small scale and cruder, genome-wide models on a large scale. We discuss the potential of combining small- and large-scale approaches to understand the working and wiring of transcriptional regulatory circuits.
We recently showed that the mammalian genome encodes >1,000 large intergenic noncoding (linc)RNAs that are clearly conserved across mammals and, thus, functional. Gene expression patterns have implicated these lincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance, and embryonic stem cell pluripotency. However, the mechanism by which these lincRNAs function is unknown. Here, we expand the catalog of human lincRNAs to approximately 3,300 by analyzing chromatin-state maps of various human cell types. Inspired by the observation that the well-characterized lincRNA HOTAIR binds the polycomb repressive complex (PRC)2, we tested whether many lincRNAs are physically associated with PRC2. Remarkably, we observe that approximately 20% of lincRNAs expressed in various cell types are bound by PRC2, and that additional lincRNAs are bound by other chromatin-modifying complexes. Also, we show that siRNA-mediated depletion of certain lincRNAs associated with PRC2 leads to changes in gene expression, and that the up-regulated genes are enriched for those normally silenced by PRC2. We propose a model in which some lincRNAs guide chromatin-modifying complexes to specific genomic loci to regulate gene expression.
Transcriptional responses often consist of regulatory modules - sets of genes with a shared expression pattern that are controlled by the same regulatory mechanisms. Previous methods allow dissecting regulatory modules from genomics data, such as expression profiles, protein-DNA binding, and promoter sequences. In cases where physical protein-DNA data are lacking, such methods are essential for the analysis of the underlying regulatory program.
Lineage-survival oncogenes are activated by somatic DNA alterations in cancers arising from the cell lineages in which these genes play a role in normal development. Here we show that a peak of genomic amplification on chromosome 3q26.33 found in squamous cell carcinomas (SCCs) of the lung and esophagus contains the transcription factor gene SOX2, which is mutated in hereditary human esophageal malformations, is necessary for normal esophageal squamous development, promotes differentiation and proliferation of basal tracheal cells and cooperates in induction of pluripotent stem cells. SOX2 expression is required for proliferation and anchorage-independent growth of lung and esophageal cell lines, as shown by RNA interference experiments. Furthermore, ectopic expression of SOX2 here cooperated with FOXE1 or FGFR2 to transform immortalized tracheobronchial epithelial cells. SOX2-driven tumors show expression of markers of both squamous differentiation and pluripotency. These characteristics identify SOX2 as a lineage-survival oncogene in lung and esophageal SCC.
Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86% and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.