RNA-Seq allows one to examine not only gene expression but also the expression of noncoding RNAs, alternative splicing, and allele-specific expression. With this increased sensitivity and dynamic range come computational and statistical considerations that depend heavily on the biological question being asked. We highlight these considerations to provide an overview of their importance and the impact they can have on downstream interpretation of the brain transcriptome.
Complex Mus musculus crosses provide increased resolution to examine the relationships between gene expression and behavior. While the advantages are clear, there are numerous analytical and technological concerns that arise from the increased genetic complexity that must be considered. Each of these issues is discussed, providing an initial framework for complex cross study design and planning.
The molecular causes of many hematologic cancers remain unclear. Among these cancers are chronic neutrophilic leukemia (CNL) and atypical (BCR-ABL1-negative) chronic myeloid leukemia (CML), both of which are diagnosed on the basis of neoplastic expansion of granulocytic cells and exclusion of genetic drivers that are known to occur in other myeloproliferative neoplasms and myeloproliferative-myelodysplastic overlap neoplasms.
Determining the functional relevance of identified sequence variants in cancer is a prerequisite to ultimately matching specific therapies with individual patients. This level of mechanistic understanding requires integration of genomic information with complementary functional analyses to identify oncogenic targets and relies on the development of computational frameworks to aid in the prioritization and visualization of these diverse data types. In response to this, we have developed HitWalker, which prioritizes patient variants relative to their weighted proximity to functional assay results in a protein-protein interaction network. It is highly extensible, allowing incorporation of diverse data types to refine prioritization. In addition to a ranked list of variants, we have also devised a simple shortest path-based approach of visualizing the results in an intuitive manner to provide biological interpretation. Availability and implementation: The program, documentation and example data are available as an R package from www.biodevlab.org/HitWalker.html.
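As an illustration of what "weighted proximity" in an interaction network can mean, the following is a minimal sketch of a random walk with restart, one common way to score network proximity. It is an assumption that this resembles the approach; the abstract does not give the algorithm, and the function names, the restart value, and the toy graph are hypothetical rather than part of the HitWalker package.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its proximity to a set of weighted seed nodes.

    edges: dict node -> dict neighbor -> weight; every node must appear as a key
           and the graph is assumed undirected (weights stored in both directions).
    seeds: dict node -> positive weight (for example, functional assay hit scores).
    """
    nodes = sorted(edges)
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed with positive weight is required")
    start = {n: seeds.get(n, 0.0) / total for n in nodes}
    strength = {n: sum(edges[n].values()) for n in nodes}

    scores = dict(start)
    for _ in range(max_iter):
        # Restart term: a fraction of the walkers jumps back to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Isolated node: walkers that do not restart stay in place.
                nxt[u] += (1 - restart) * scores[u]
                continue
            for v, w in edges[u].items():
                nxt[v] += (1 - restart) * scores[u] * w / strength[u]
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


def rank_variants(scores, variant_genes):
    """Order genes carrying variants from most to least proximal to the seeds."""
    return sorted(variant_genes, key=lambda g: scores.get(g, 0.0), reverse=True)


# Toy chain A - B - C - D with a single functional hit at A (hypothetical data).
toy_edges = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
toy_scores = random_walk_with_restart(toy_edges, {"A": 1.0})
print(rank_variants(toy_scores, ["D", "B", "C"]))
```

In this sketch, genes carrying variants rank higher the closer they sit to the functional hits, and stronger seed weights pull the ranking toward their neighborhood.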
Genetic variation contributes to host responses and outcomes following infection by influenza A virus or other viral infections. Yet narrow windows of disease symptoms and confounding environmental factors have made it difficult to identify polymorphic genes that contribute to differential disease outcomes in human populations. Therefore, to control for these confounding environmental variables in a system that models the levels of genetic diversity found in outbred populations such as humans, we used incipient lines of the highly genetically diverse Collaborative Cross (CC) recombinant inbred (RI) panel (the pre-CC population) to study how genetic variation impacts influenza associated disease across a genetically diverse population. A wide range of variation in influenza disease related phenotypes including virus replication, virus-induced inflammation, and weight loss was observed. Many of the disease associated phenotypes were correlated, with viral replication and virus-induced inflammation being predictors of virus-induced weight loss. Despite these correlations, pre-CC mice with unique and novel disease phenotype combinations were observed. We also identified sets of transcripts (modules) that were correlated with aspects of disease. In order to identify how host genetic polymorphisms contribute to the observed variation in disease, we conducted quantitative trait loci (QTL) mapping. We identified several QTL contributing to specific aspects of the host response including virus-induced weight loss, titer, pulmonary edema, neutrophil recruitment to the airways, and transcriptional expression. Existing whole-genome sequence data was applied to identify high priority candidate genes within QTL regions. A key host response QTL was located at the site of the known anti-influenza Mx1 gene. 
We sequenced the coding regions of Mx1 in the eight CC founder strains, and identified a novel Mx1 allele that showed reduced ability to inhibit viral replication, while maintaining protection from weight loss.
Patient-specific aberrant expression patterns in conjunction with functional screening assays can guide elucidation of the cancer genome architecture and identification of therapeutic targets. Since most statistical methods for expression analysis are focused on differences between experimental groups, the performance of approaches for patient-specific expression analyses is currently less well characterized. To our knowledge, a comparison of methods for identifying genes that are dysregulated in a single sample relative to the other samples in a given experimental set has not been performed.
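To make the single-sample setting concrete, here is a minimal sketch of one simple baseline: a leave-one-out robust z-score that compares one patient's expression value with the median and median absolute deviation (MAD) of all other samples. The abstract does not name the methods it compares, so this is an assumed illustration; the function names, the threshold of 3, and the data layout are hypothetical.

```python
from statistics import median


def robust_z(value, reference):
    """Robust z-score of `value` against a list of reference values.

    Returns None when the reference values have zero MAD, since the
    score is then undefined.
    """
    med = median(reference)
    mad = median(abs(x - med) for x in reference)
    if mad == 0:
        return None
    # 1.4826 scales the MAD to the standard deviation under normality.
    return (value - med) / (1.4826 * mad)


def outlier_genes(expr, sample, threshold=3.0):
    """Genes whose expression in `sample` deviates from all other samples.

    expr: dict gene -> dict sample_id -> expression value (log scale assumed).
    """
    flagged = {}
    for gene, values in expr.items():
        reference = [v for s, v in values.items() if s != sample]
        z = robust_z(values[sample], reference)
        if z is not None and abs(z) >= threshold:
            flagged[gene] = z
    return flagged
```

Excluding the sample of interest from the reference keeps a strong outlier from inflating its own baseline, which matters most when the cohort is small.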
The androgen receptor (AR) is the principal therapeutic target in prostate cancer. For the past 70 years, androgen deprivation therapy (ADT) has been the major therapeutic focus. However, some patients do not benefit, and those tumors that do initially respond to ADT eventually progress. One recently described mechanism of such an effect is growth and survival-promoting effects of the AR that are exerted independently of the AR ligands, testosterone and dihydrotestosterone. However, specific ligand-independent AR target genes that account for this effect were not well characterized. We show here that c-Myc, which is a key mediator of ligand-independent prostate cancer growth, is a key ligand-independent AR target gene. Using microarray analysis, we found that c-Myc and AR expression levels strongly correlated with each other in tumors from patients with castration-resistant prostate cancer (CRPC) progressing despite ADT. We confirmed that AR directly regulates c-Myc transcription in a ligand-independent manner, that AR and c-Myc suppression reduces ligand-independent prostate cancer cell growth, and that ectopic expression of c-Myc attenuates the anti-growth effects of AR suppression. Importantly, treatment with the bromodomain inhibitor JQ1 suppressed c-Myc function and suppressed ligand-independent prostate cancer cell survival. Our results define a new link between two critical proteins in prostate cancer - AR and c-Myc - and demonstrate the potential of AR and c-Myc-directed therapies to improve prostate cancer control.
DNA methylation of promoter regions is a common event in prostate cancer, one of the most common cancers in men worldwide. Because prior reports demonstrating that DNA methylation is important in prostate cancer studied a limited number of genes, we systematically quantified the DNA methylation status of 1505 CpG dinucleotides for 807 genes in 78 paraffin-embedded prostate cancer samples and three normal prostate samples. The ERG gene, commonly repressed in prostate cells in the absence of an oncogenic fusion to the TMPRSS2 gene, was one of the most commonly methylated genes, occurring in 74% of prostate cancer specimens. In an independent group of patient samples, we confirmed that ERG DNA methylation was common, occurring in 57% of specimens, and cancer-specific. The ERG promoter is marked by repressive chromatin marks mediated by polycomb proteins in both normal prostate cells and prostate cancer cells, which may explain ERG's predisposition to DNA methylation and the fact that tumors with ERG DNA methylation were more methylated in general. These results demonstrate that bead arrays offer a high-throughput method to discover novel genes with promoter DNA methylation such as ERG, whose measurement may improve our ability to more accurately detect prostate cancer.
C57BL/6J (B6) and DBA/2J (D2) are two of the most commonly used inbred mouse strains in neuroscience research. However, the only currently available mouse genome is based entirely on the B6 strain sequence. Consequently, oligonucleotide microarray probes are based solely on this B6 reference sequence, making their application for gene expression profiling comparisons across mouse strains dubious due to allelic sequence differences between strains, including single nucleotide polymorphisms (SNPs). The emergence of next-generation sequencing (NGS) and the RNA-Seq application provides a clear alternative to oligonucleotide arrays for detecting differential gene expression without the problems inherent to hybridization-based technologies. Using RNA-Seq, an average of 22 million short sequencing reads were generated per sample for 21 samples (10 B6 and 11 D2), and these reads were aligned to the mouse reference genome, allowing 16,183 Ensembl genes to be queried in striatum for both strains. To determine differential expression, digital mRNA counting was applied based on reads that map to exons. The current study compares RNA-Seq (Illumina GA IIx) with two microarray platforms (Illumina MouseRef-8 v2.0 and Affymetrix MOE 430 2.0) to detect differential striatal gene expression between the B6 and D2 inbred mouse strains. We show that, with stringent data processing requirements, differential expression as determined by RNA-Seq is concordant with both the Affymetrix and Illumina platforms in more instances than it is concordant with only a single platform, and that instances of discordance with respect to direction of fold change were rare. Finally, we show that additional information is gained from RNA-Seq compared to hybridization-based techniques, as RNA-Seq detects more genes than either microarray platform. 
The majority of genes differentially expressed in RNA-Seq were only detected as present in RNA-Seq, which is important for studies with smaller effect sizes where the sensitivity of hybridization-based techniques could bias interpretation.
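The "digital mRNA counting" step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: coordinates are taken to be 1-based and inclusive, reads that overlap exons of more than one gene are discarded as ambiguous, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_reads_per_gene(exons, reads):
    """Count aligned reads that overlap the exons of exactly one gene.

    exons: dict gene -> list of (chrom, start, end) exon intervals.
    reads: iterable of (chrom, start, end) read alignments.
    """
    counts = {gene: 0 for gene in exons}
    for read in reads:
        hits = [
            gene
            for gene, gene_exons in exons.items()
            if any(overlaps(read, exon) for exon in gene_exons)
        ]
        if len(hits) == 1:  # skip intergenic/intronic and ambiguous reads
            counts[hits[0]] += 1
    return counts


def counts_per_million(counts):
    """Scale raw counts by library size so samples can be compared."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: c * 1e6 / total for gene, c in counts.items()}
```

The nested loop is quadratic and only suitable for toy inputs; with tens of millions of reads per sample an interval index would be needed, but the counting rule itself stays the same.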
Deregulation of the Wnt/β-catenin signaling pathway is a hallmark of colon cancer. Mutations in the adenomatous polyposis coli (APC) gene occur in the vast majority of colorectal cancers and are an initiating event in cellular transformation. Cells harboring mutant APC contain elevated levels of the β-catenin transcription coactivator in the nucleus which leads to abnormal expression of genes controlled by β-catenin/T-cell factor 4 (TCF4) complexes. Here, we use chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-Seq) to identify β-catenin binding regions in HCT116 human colon cancer cells. We localized 2168 β-catenin enriched regions using a concordance approach for integrating the output from multiple peak alignment algorithms. Motif discovery algorithms found a core TCF4 motif (T/A-T/A-C-A-A-A-G), an extended TCF4 motif (A/T/G-C/G-T/A-T/A-C-A-A-A-G) and an AP-1 motif (T-G-A-C/T-T-C-A) to be significantly represented in β-catenin enriched regions. Furthermore, 417 regions contained both TCF4 and AP-1 motifs. Genes associated with TCF4 and AP-1 motifs bound β-catenin, TCF4 and c-Jun in vivo and were activated by Wnt signaling and serum growth factors. Our work provides evidence that Wnt/β-catenin and mitogen signaling pathways intersect directly to regulate a defined set of target genes.
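The motifs quoted above translate directly into regular expressions, which gives a quick way to check whether a region contains both a TCF4 and an AP-1 site. The sketch below is only an illustration: it scans the given strand (a full scan would also check the reverse complement), and the function names are hypothetical.

```python
import re

# Motifs as written in the abstract, with "/" alternatives as character classes.
CORE_TCF4 = r"[TA][TA]CAAAG"
EXTENDED_TCF4 = r"[ATG][CG][TA][TA]CAAAG"
AP1 = r"TGA[CT]TCA"


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead reports matches that overlap each other.
    pattern = re.compile(f"(?=({motif}))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def has_tcf4_and_ap1(sequence):
    """True if the sequence contains a core TCF4 motif and an AP-1 motif."""
    return bool(find_motif(CORE_TCF4, sequence)) and bool(find_motif(AP1, sequence))
```

Applied to the sequence of each enriched region, `has_tcf4_and_ap1` would flag regions of the kind counted in the abstract as containing both motifs.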
Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow up analyses. Polymorphisms result in a high incidence of false positive and false negative results in hybridization based analyses and hinder the identification of the true variation underlying genetically determined differences in physiology and behavior. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the wealth of gene expression microarray and phenotypic studies using genetic models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing, we are now in a position to determine to what extent polymorphisms are currently cryptic in such models and their impact on downstream analyses.
Complex Mus musculus crosses, e.g., heterogeneous stock (HS), provide increased resolution for quantitative trait loci detection. However, increased genetic complexity challenges detection methods, with discordant results due to low data quality or complex genetic architecture. We quantified the impact of these factors across three mouse crosses and two different detection methods, identifying procedures that greatly improve detection quality. Importantly, HS populations have complex genetic architectures not fully captured by the whole genome kinship matrix, calling for the incorporation of chromosome-specific relatedness information. We analyzed three increasingly complex crosses, using gene expression levels as quantitative traits. The three crosses were an F2 intercross, an HS formed by crossing four inbred strains (HS4), and an HS (HS-CC) derived from the eight lines found in the Collaborative Cross. Brain (striatum) gene expression and genotype data were obtained using the Illumina platform. We found large disparities between methods, with concordance varying as genetic complexity increased; this problem was more acute for probes with distant regulatory elements (trans). A suite of data filtering steps resulted in substantial increases in reproducibility. Genetic relatedness between samples generated an overabundance of detected eQTLs; an adjustment procedure that includes the kinship matrix attenuates this problem. However, we find that relatedness between individuals is not evenly distributed across the genome; information from distinct chromosomes results in a relatedness structure different from the whole genome kinship matrix. Shared polymorphisms from distinct chromosomes collectively affect expression levels, confounding eQTL detection. We suggest that considering chromosome-specific relatedness can result in improved eQTL detection.
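The difference between whole-genome and chromosome-specific relatedness can be illustrated with a small sketch. It assumes genotypes coded as allele counts (0, 1, or 2) and uses a simple identity-by-state similarity as the kinship estimate; the abstract does not specify the estimator used, and the function names are hypothetical.

```python
def ibs_kinship(genotypes):
    """Pairwise identity-by-state similarity between samples.

    genotypes: list of per-sample lists of allele counts (0, 1 or 2),
               all in the same marker order.
    Returns a symmetric matrix (list of lists) with values in [0, 1].
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    kinship = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each marker contributes 1 (identical), 0.5, or 0 (opposite homozygotes).
            shared = sum(1 - abs(a - b) / 2 for a, b in zip(genotypes[i], genotypes[j]))
            kinship[i][j] = kinship[j][i] = shared / n_markers
    return kinship


def per_chromosome_kinship(genotypes, marker_chroms):
    """Compute a separate kinship matrix from the markers on each chromosome.

    marker_chroms: chromosome label of each marker, in marker order.
    """
    by_chrom = {}
    for index, chrom in enumerate(marker_chroms):
        by_chrom.setdefault(chrom, []).append(index)
    return {
        chrom: ibs_kinship([[sample[i] for i in indices] for sample in genotypes])
        for chrom, indices in by_chrom.items()
    }
```

Two samples can look moderately related genome-wide while being identical on one chromosome and unrelated on another, which is the kind of structure a single whole-genome matrix averages away.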
RNA-Seq experiments have shown great potential for transcriptome profiling. While sequencing increases the level of biological detail, integrative data analysis is also important. One avenue is the construction of coexpression networks. Because the capacity of RNA-Seq data for network construction has not been previously evaluated, we constructed a coexpression network using striatal samples, derived its network properties and compared it with microarray-based networks.
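A coexpression network of the kind mentioned above can be sketched in a few lines. The example assumes a weighted network in which the connection strength between two genes is the absolute Pearson correlation raised to a power (a soft threshold); the abstract does not state the construction used, so the power of 6 and the function names are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_adjacency(expr, power=6):
    """Weighted adjacency between all gene pairs.

    expr: dict gene -> list of expression values across the same samples.
    """
    genes = sorted(expr)
    adjacency = {gene: {} for gene in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            weight = abs(pearson(expr[g], expr[h])) ** power
            adjacency[g][h] = adjacency[h][g] = weight
    return adjacency


def connectivity(adjacency):
    """Sum of connection strengths per gene, a basic network property."""
    return {gene: sum(neighbors.values()) for gene, neighbors in adjacency.items()}
```

Connectivity computed this way is one of the network properties that could be compared between an RNA-Seq network and a microarray network built from the same tissue.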
Outbreaks of influenza occur on a yearly basis, causing a wide range of symptoms across the human population. Although evidence exists that the host response to influenza infection is influenced by genetic differences in the host, this has not been studied in a system with genetic diversity mirroring that of the human population. Here we used mice from 44 influenza-infected pre-Collaborative Cross lines determined to have extreme phenotypes with regard to the host response to influenza A virus infection. Global transcriptome profiling identified 2671 transcripts that were significantly differentially expressed between mice that showed a severe ("high") and mild ("low") response to infection. Expression quantitative trait loci mapping was performed on those transcripts that were differentially expressed because of differences in host response phenotype to identify putative regulatory regions potentially controlling their expression. Twenty-one significant expression quantitative trait loci were identified, which allowed direct examination of genes associated with regulation of host response to infection. To perform initial validation of our findings, quantitative polymerase chain reaction was performed in the infected founder strains, and we were able to confirm or partially confirm more than 70% of those tested. In addition, we explored putative causal and reactive (downstream) relationships between the significantly regulated genes and others in the high or low response groups using structural equation modeling. By using systems approaches and a genetically diverse population, we were able to develop a novel framework for identifying the underlying biological subnetworks under host genetic control during influenza virus infection.