Pediatric Ewing sarcoma is characterized by the expression of chimeric fusions of EWS and ETS family transcription factors, representing a paradigm for studying cancers driven by transcription factor rearrangements. In this study, we describe the somatic landscape of pediatric Ewing sarcoma. These tumors are among the most genetically normal cancers characterized to date, with only EWS-ETS rearrangements identified in the majority of tumors. STAG2 loss, however, is present in more than 15% of Ewing sarcoma tumors; occurs by point mutation, rearrangement, and likely nongenetic mechanisms; and is associated with disease dissemination. Perhaps the most striking finding is the paucity of mutations in immediately targetable signal transduction pathways, highlighting the need for new therapeutic approaches to target EWS-ETS fusions in this disease.
Retrotransposons constitute a major source of genetic variation, and somatic retrotransposon insertions have been reported in cancer. Here, we applied TranspoSeq, a computational framework that identifies retrotransposon insertions from sequencing data, to whole genomes from 200 tumor/normal pairs across 11 tumor types as part of The Cancer Genome Atlas (TCGA) Pan-Cancer Project. In addition to novel germline polymorphisms, we find 810 somatic retrotransposon insertions primarily in lung squamous, head and neck, colorectal, and endometrial carcinomas. Many somatic retrotransposon insertions occur in known cancer genes. We find that high somatic retrotransposition rates in tumors are associated with high rates of genomic rearrangement and somatic mutation. Finally, we developed TranspoSeq-Exome to interrogate an additional 767 tumor samples with hybrid-capture exome data and discovered 35 novel somatic retrotransposon insertions into exonic regions, including an insertion into an exon of the PTEN tumor suppressor gene. The results of this large-scale, comprehensive analysis of retrotransposon movement across tumor types suggest that somatic retrotransposon insertions may represent an important class of structural variation in cancer.
Small cell lung carcinoma (SCLC) is a highly lethal, smoking-associated cancer with few known targetable genetic alterations. Using genome sequencing, we characterized the somatic evolution of a genetically engineered mouse model (GEMM) of SCLC initiated by loss of Trp53 and Rb1. We identified alterations in DNA copy number and complex genomic rearrangements and demonstrated a low somatic point mutation frequency in the absence of tobacco mutagens. Alterations targeting the tumor suppressor Pten occurred in the majority of murine SCLC studied, and engineered Pten deletion accelerated murine SCLC and abrogated loss of Chr19 in Trp53; Rb1; Pten compound mutant tumors. Finally, we found evidence for polyclonal and sequential metastatic spread of murine SCLC by comparative sequencing of families of related primary tumors and metastases. We propose a temporal model of SCLC tumorigenesis with implications for human SCLC therapeutics and the nature of cancer-genome evolution in GEMMs.
Craniopharyngiomas are epithelial tumors that typically arise in the suprasellar region of the brain. Patients experience substantial clinical sequelae from both extension of the tumors and therapeutic interventions that damage the optic chiasm, the pituitary stalk and the hypothalamic area. Using whole-exome sequencing, we identified mutations in CTNNB1 (β-catenin) in nearly all adamantinomatous craniopharyngiomas examined (11/12, 92%) and recurrent mutations in BRAF (resulting in p.Val600Glu) in all papillary craniopharyngiomas (3/3, 100%). Targeted genotyping revealed BRAF p.Val600Glu in 95% of papillary craniopharyngiomas (36 of 39 tumors) and mutation of CTNNB1 in 96% of adamantinomatous craniopharyngiomas (51 of 53 tumors). The CTNNB1 and BRAF mutations were clonal in each tumor subtype, and we detected no other recurrent mutations or genomic aberrations in either subtype. Adamantinomatous and papillary craniopharyngiomas harbor mutations that are mutually exclusive and clonal. These findings have important implications for the diagnosis and treatment of these neoplasms.
Cervical cancer is responsible for 10-15% of cancer-related deaths in women worldwide. The aetiological role of infection with high-risk human papilloma viruses (HPVs) in cervical carcinomas is well established. Previous studies have also implicated somatic mutations in PIK3CA, PTEN, TP53, STK11 and KRAS as well as several copy-number alterations in the pathogenesis of cervical carcinomas. Here we report whole-exome sequencing analysis of 115 cervical carcinoma-normal paired samples, transcriptome sequencing of 79 cases and whole-genome sequencing of 14 tumour-normal pairs. Previously unknown somatic mutations in 79 primary squamous cell carcinomas include recurrent E322K substitutions in the MAPK1 gene (8%), inactivating mutations in the HLA-B gene (9%), and mutations in EP300 (16%), FBXW7 (15%), NFE2L2 (4%), TP53 (5%) and ERBB2 (6%). We also observe somatic ELF3 (13%) and CBFB (8%) mutations in 24 adenocarcinomas. Squamous cell carcinomas have higher frequencies of somatic nucleotide substitutions occurring at cytosines preceded by thymines (Tp*C sites) than adenocarcinomas. Gene expression levels at HPV integration sites were statistically significantly higher in tumours with HPV integration compared with expression of the same genes in tumours without viral integration at the same site. These data demonstrate several recurrent genomic alterations in cervical carcinomas that suggest new strategies to combat this disease.
MOSAIK is a stable, sensitive and open-source program for mapping second and third-generation sequencing reads to a reference genome. Uniquely among current mapping tools, MOSAIK can align reads generated by all the major sequencing technologies, including Illumina, Applied Biosystems SOLiD, Roche 454, Ion Torrent and Pacific BioSciences SMRT. Indeed, MOSAIK was the only aligner to provide consistent mappings for all the generated data (sequencing technologies, low-coverage and exome) in the 1000 Genomes Project. To provide highly accurate alignments, MOSAIK employs a hash clustering strategy coupled with the Smith-Waterman algorithm. This method is well-suited to capture mismatches as well as short insertions and deletions. To support the growing interest in larger structural variant (SV) discovery, MOSAIK provides explicit support for handling known-sequence SVs, e.g. mobile element insertions (MEIs) as well as generating outputs tailored to aid in SV discovery. All variant discovery benefits from an accurate description of the read placement confidence. To this end, MOSAIK uses a neural-network based training scheme to provide well-calibrated mapping quality scores, demonstrated by a correlation coefficient between MOSAIK assigned and actual mapping qualities greater than 0.98. In order to ensure that studies of any genome are supported, a training pipeline is provided to ensure optimal mapping quality scores for the genome under investigation. MOSAIK is multi-threaded, open source, and incorporated into our command and pipeline launcher system GKNO (http://gkno.me).
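The Smith-Waterman step mentioned in the abstract above can be illustrated with a minimal local-alignment sketch. This is not MOSAIK's implementation: the hash clustering stage is omitted, the function name `smith_waterman` is hypothetical, and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen only for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Scoring values are illustrative assumptions, not MOSAIK defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")   # gap in b (deletion)
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])   # gap in a (insertion)
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, read_part, ref_part = smith_waterman("TTACGTTT", "GGACGTGG")
```

Because cell scores are clamped at zero, the alignment covers only the best-matching region of the read and reference, which is why this approach tolerates mismatches and short insertions or deletions inside that region.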
Lung squamous cell carcinoma (SCC) is the second most prevalent type of lung cancer. Currently, no targeted therapeutics are approved for treatment of this cancer, largely because of a lack of systematic understanding of the molecular pathogenesis of the disease. To identify therapeutic targets and perform comparative analyses of lung SCC, we probed somatic genome alterations of lung SCC by using samples from Korean patients.
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
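The false-positive problem described in the abstract above can be illustrated with a toy significance test. This is a sketch under stated assumptions, not the MutSigCV algorithm: it uses a plain Poisson upper-tail test, and the gene size, cohort size, mutation count and both background rates are hypothetical numbers. The function names `poisson_upper_tail` and `gene_p_value` are also hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)      # P(X = 0)
    below = 0.0                     # accumulates P(X < observed)
    for i in range(observed):
        below += term
        term *= expected / (i + 1)  # P(X = i + 1) derived from P(X = i)
    return max(0.0, 1.0 - below)


def gene_p_value(n_mutations, covered_bases, n_samples, rate_per_base):
    """P-value for seeing at least n_mutations given a background rate."""
    expected = covered_bases * n_samples * rate_per_base
    return poisson_upper_tail(n_mutations, expected)


# Hypothetical long gene: 100 kb of coding sequence, 200 tumours, 60 mutations.
p_uniform = gene_p_value(60, 100_000, 200, 1.5e-6)  # cohort-wide rate, 30 expected
p_local = gene_p_value(60, 100_000, 200, 3.0e-6)    # gene-specific rate, 60 expected
```

With the cohort-wide rate the gene looks highly significant, whereas with a background rate twice as high (as might apply to a late-replicating, weakly transcribed gene) the same 60 mutations are unremarkable. Using a gene-specific background rate is the general idea behind correcting for mutational heterogeneity.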
Elevated resting heart rate is associated with greater risk of cardiovascular disease and mortality. In a 2-stage meta-analysis of genome-wide association studies in up to 181,171 individuals, we identified 14 new loci associated with heart rate and confirmed associations with all 7 previously established loci. Experimental downregulation of gene expression in Drosophila melanogaster and Danio rerio identified 20 genes at 11 loci that are relevant for heart rate regulation and highlight a role for genes involved in signal transmission, embryonic cardiac development and the pathophysiology of dilated cardiomyopathy, congenital heart failure and/or sudden cardiac death. In addition, genetic susceptibility to increased heart rate is associated with altered cardiac conduction and reduced risk of sick sinus syndrome, and both heart rate-increasing and heart rate-decreasing variants associate with risk of atrial fibrillation. Our findings provide fresh insights into the mechanisms regulating heart rate and identify new therapeutic targets.
The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. Because the 5-year survival rate is ~15%, identifying new therapeutic targets for EAC is of great importance. We analyze the mutation spectra from whole-exome sequencing of 149 EAC tumor-normal pairs, 15 of which have also been subjected to whole-genome sequencing. We identify a mutational signature defined by a high prevalence of A>C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, five (TP53, CDKN2A, SMAD4, ARID1A and PIK3CA) have previously been implicated in EAC. The new significantly mutated genes include chromatin-modifying factors and candidate contributors SPG20, TLR4, ELMO1 and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 identify increased cellular invasion. Therefore, we suggest the potential activation of the RAC1 pathway as a contributor to EAC tumorigenesis.
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 affected individuals (cases) using a combination of whole-exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per Mb (0.48 nonsilent) and notably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, and an additional 7.1% had focal deletions), MYCN (1.7%, causing a recurrent p.Pro44Leu alteration) and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1 and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies that rely on frequently altered oncogenic drivers.
As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C > A/G > T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.
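The read-orientation signal described in the abstract above can be sketched as a simple filter. This is an illustrative assumption about how such a filter might look, not the published tool: the function names, the F1R2/F2R1 count inputs, and the thresholds `max_af` and `min_bias` are all hypothetical.

```python
def orientation_fraction(alt_f1r2, alt_f2r1):
    """Fraction of alt-supporting reads that come from the dominant pair orientation."""
    total = alt_f1r2 + alt_f2r1
    if total == 0:
        raise ValueError("no alt-supporting reads")
    return max(alt_f1r2, alt_f2r1) / total


def looks_like_oxidation_artifact(ref, alt, alt_f1r2, alt_f2r1,
                                  allele_fraction, max_af=0.1, min_bias=0.9):
    """Flag low-allele-fraction C>A / G>T calls seen in only one read orientation.

    max_af and min_bias are hypothetical thresholds chosen for illustration.
    """
    is_candidate = (ref, alt) in {("C", "A"), ("G", "T")}
    if not is_candidate or allele_fraction > max_af:
        return False
    return orientation_fraction(alt_f1r2, alt_f2r1) >= min_bias


# All 12 alt reads share one orientation at 4% allele fraction: flagged.
flagged = looks_like_oxidation_artifact("G", "T", 12, 0, 0.04)
# Alt reads split evenly between orientations: not flagged.
balanced = looks_like_oxidation_artifact("G", "T", 6, 6, 0.04)
```

A true variant is expected to be supported by reads in both orientations, so a strong one-sided bias at a low allele fraction points to damage on a single strand of the original fragment.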
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12, and del(13q)) or subclonal (e.g., SF3B1 and TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two time points. Ten of twelve CLL cases treated with chemotherapy (but only one of six without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1 and TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcomes.
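The idea of measuring the fraction of cancer cells harboring a mutation, mentioned in the abstract above, can be sketched with a standard point-estimate formula. This is a simplified illustration, not the study's method: it assumes a diploid normal genome, a known tumor purity, and a known number of mutated copies per cell, and the function name `cancer_cell_fraction` and its parameters are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumor_copy_number=2, multiplicity=1):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    vaf: variant allele fraction observed in the sequenced sample.
    purity: fraction of cells in the sample that are cancer cells.
    tumor_copy_number: total copies of the locus in each cancer cell.
    multiplicity: mutated copies per cancer cell that carries the mutation.
    Assumes normal cells carry two copies of the locus.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1 or tumor_copy_number < multiplicity:
        raise ValueError("need 1 <= multiplicity <= tumor_copy_number")
    # Average number of copies of the locus per cell in the mixed sample.
    copies_per_cell = purity * tumor_copy_number + 2 * (1 - purity)
    ccf = vaf * copies_per_cell / (purity * multiplicity)
    return min(ccf, 1.0)  # sampling noise can push the raw estimate above 1


subclonal = cancer_cell_fraction(0.2, 0.8)  # present in about half the cancer cells
clonal = cancer_cell_fraction(0.4, 0.8)     # present in essentially all cancer cells
```

Under these assumptions a heterozygous mutation at 20% allele fraction in an 80% pure diploid sample sits in about half of the cancer cells (subclonal), while one at 40% sits in all of them (clonal).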
As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However, there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function.
Whole-genome sequencing using massively parallel sequencing technologies enables accurate detection of somatic rearrangements in cancer. Pinpointing large numbers of rearrangement breakpoints to base-pair resolution allows analysis of rearrangement microhomology and genomic location for every sample. Here we analyze 95 tumor genome sequences from breast, head and neck, colorectal, and prostate carcinomas, and from melanoma, multiple myeloma, and chronic lymphocytic leukemia. We discover three genomic factors that are significantly correlated with the distribution of rearrangements: replication time, transcription rate, and GC content. The correlation is complex, and different patterns are observed between tumor types, within tumor types, and even between different types of rearrangements. Mutations in the APC gene correlate with and, hence, potentially contribute to DNA breakage in late-replicating, low %GC, untranscribed regions of the genome. We show that somatic rearrangements display less microhomology than germline rearrangements, and that breakpoint loci are correlated with local hypermutability with a particular enrichment for transversions.
Cancer is principally considered a genetic disease, and numerous mutations are thought essential to drive its growth. However, the existence of genomically stable cancers and the emergence of mutations in genes that encode chromatin remodelers raise the possibility that perturbation of chromatin structure and epigenetic regulation are capable of driving cancer formation. Here we sequenced the exomes of 35 rhabdoid tumors, highly aggressive cancers of early childhood characterized by biallelic loss of SMARCB1, a subunit of the SWI/SNF chromatin remodeling complex. We identified an extremely low rate of mutation, with loss of SMARCB1 being essentially the sole recurrent event. Indeed, in 2 of the cancers there were no other identified mutations. Our results demonstrate that high mutation rates are dispensable for the genesis of cancers driven by mutation of a chromatin remodeling complex. Consequently, cancer can be a remarkably genetically simple disease.