Peroxisome proliferator-activated receptor gamma (PPARG) is a master transcriptional regulator of adipocyte differentiation and a canonical target of antidiabetic thiazolidinedione medications. In rare families, loss-of-function (LOF) mutations in PPARG are known to cosegregate with lipodystrophy and insulin resistance; in the general population, the common P12A variant is associated with a decreased risk of type 2 diabetes (T2D). Whether and how rare variants in PPARG and defects in adipocyte differentiation influence risk of T2D in the general population remains undetermined. By sequencing PPARG in 19,752 T2D cases and controls drawn from multiple studies and ethnic groups, we identified 49 previously unidentified, nonsynonymous PPARG variants (MAF < 0.5%). Considered in aggregate (with or without computational prediction of functional consequence), these rare variants showed no association with T2D (OR = 1.35; P = 0.17). The function of the 49 variants was experimentally tested in a novel high-throughput human adipocyte differentiation assay, and nine were found to have reduced activity in the assay. Carrying any of these nine LOF variants was associated with a substantial increase in risk of T2D (OR = 7.22; P = 0.005). The combination of large-scale DNA sequencing and functional testing in the laboratory reveals that approximately 1 in 1,000 individuals carries a variant in PPARG that reduces function in a human adipocyte differentiation assay and is associated with a substantial risk of T2D.
Exome sequencing studies in complex diseases are challenged by the allelic heterogeneity, large number and modest effect sizes of associated variants on disease risk and the presence of large numbers of neutral variants, even in phenotypically relevant genes. Isolated populations with recent bottlenecks offer advantages for studying rare variants in complex diseases as they have deleterious variants that are present at higher frequencies as well as a substantial reduction in rare neutral variation. To explore the potential of the Finnish founder population for studying low-frequency (0.5-5%) variants in complex diseases, we compared exome sequence data on 3,000 Finns to the same number of non-Finnish Europeans and discovered that, despite having fewer variable sites overall, the average Finn has more low-frequency loss-of-function variants and complete gene knockouts. We then used several well-characterized Finnish population cohorts to study the phenotypic effects of 83 enriched loss-of-function variants across 60 phenotypes in 36,262 Finns. Using a deep set of quantitative traits collected on these cohorts, we show 5 associations (p<5×10??) including splice variants in LPA that lowered plasma lipoprotein(a) levels (P?=?1.5×10?¹¹?). Through accessing the national medical records of these participants, we evaluate the LPA finding via Mendelian randomization and confirm that these splice variants confer protection from cardiovascular disease (OR?=?0.84, P?=?3×10??), demonstrating for the first time the correlation between very low levels of LPA in humans with potential therapeutic implications for cardiovascular diseases. More generally, this study articulates substantial advantages for studying the role of rare variation in complex phenotypes in founder populations like the Finns and by combining a unique population genetic history with data from large population cohorts and centralized research access to National Health Registers.
Finnish samples have been extensively utilized in studying single-gene disorders, where the founder effect has clearly aided in discovery, and more recently in genome-wide association studies of complex traits, where the founder effect has had less obvious impacts. As the field starts to explore rare variants' contribution to polygenic traits, it is of great importance to characterize and confirm the Finnish founder effect in sequencing data and to assess its implications for rare-variant association studies. Here, we employ forward simulation, guided by empirical deep resequencing data, to model the genetic architecture of quantitative polygenic traits in both the general European and the Finnish populations simultaneously. We demonstrate that power of rare-variant association tests is higher in the Finnish population, especially when variants' phenotypic effects are tightly coupled with fitness effects and therefore reflect a greater contribution of rarer variants. SKAT-O, variable-threshold tests, and single-variant tests are more powerful than other rare-variant methods in the Finnish population across a range of genetic models. We also compare the relative power and efficiency of exome array genotyping to those of high-coverage exome sequencing. At a fixed cost, less expensive genotyping strategies have far greater power than sequencing; in a fixed number of samples, however, genotyping arrays miss a substantial portion of genetic signals detected in sequencing, even in the Finnish founder population. As genetic studies probe sequence variation at greater depth in more diverse populations, our simulation approach provides a framework for evaluating various study designs for gene discovery.
Loss-of-function mutations protective against human disease provide in vivo validation of therapeutic targets, but none have yet been described for type 2 diabetes (T2D). Through sequencing or genotyping of ~150,000 individuals across 5 ancestry groups, we identified 12 rare protein-truncating variants in SLC30A8, which encodes an islet zinc transporter (ZnT8) and harbors a common variant (p.Trp325Arg) associated with T2D risk and glucose and proinsulin levels. Collectively, carriers of protein-truncating variants had 65% reduced T2D risk (P = 1.7 × 10(-6)), and non-diabetic Icelandic carriers of a frameshift variant (p.Lys34Serfs*50) demonstrated reduced glucose levels (-0.17 s.d., P = 4.6 × 10(-4)). The two most common protein-truncating variants (p.Arg138* and p.Lys34Serfs*50) individually associate with T2D protection and encode unstable ZnT8 proteins. Previous functional study of SLC30A8 suggested that reduced zinc transport increases T2D risk, and phenotypic heterogeneity was observed in mouse Slc30a8 knockouts. In contrast, loss-of-function mutations in humans provide strong evidence that SLC30A8 haploinsufficiency protects against T2D, suggesting ZnT8 inhibition as a therapeutic strategy in T2D prevention.
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98(th) or <2(nd) percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
Two distinct alleles in the gene encoding apolipoprotein L1 (APOL1), a major component of HDL, confer protection against Trypanosoma brucei rhodesiense infection and also increase risk for chronic kidney disease (CKD). Approximately 14% of African-Americans carry two APOL1 risk alleles, accounting for the high CKD burden in this population.
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.
The genetic architecture of human diseases governs the success of genetic mapping and the future of personalized medicine. Although numerous studies have queried the genetic basis of common disease, contradictory hypotheses have been advocated about features of genetic architecture (for example, the contribution of rare versus common variants). We developed an integrated simulation framework, calibrated to empirical data, to enable the systematic evaluation of such hypotheses. For type 2 diabetes (T2D), two simple parameters-(i) the target size for causal mutation and (ii) the coupling between selection and phenotypic effect-define a broad space of architectures. Whereas extreme models are excluded by the combination of epidemiology, linkage and genome-wide association studies, many models remain consistent, including those where rare variants explain either little (<25%) or most (>80%) of T2D heritability. Ongoing sequencing and genotyping studies will further constrain the space of possible architectures, but very large samples (for example, >250,000 unselected individuals) will be required to localize most of the heritability underlying T2D and other traits characterized by these models.
Genome sequencing can identify individuals in the general population who harbor rare coding variants in genes for Mendelian disorders and who may consequently have increased disease risk. Previous studies of rare variants in phenotypically extreme individuals display ascertainment bias and may demonstrate inflated effect-size estimates. We sequenced seven genes for maturity-onset diabetes of the young (MODY) in well-phenotyped population samples (n = 4,003). We filtered rare variants according to two prediction criteria for disease-causing mutations: reported previously in MODY or satisfying stringent de novo thresholds (rare, conserved and protein damaging). Approximately 1.5% and 0.5% of randomly selected individuals from the Framingham and Jackson Heart Studies, respectively, carry variants from these two classes. However, the vast majority of carriers remain euglycemic through middle age. Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a substantial fraction of individuals as being at risk for MODY or other Mendelian diseases.
To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (? 5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together, these results provide compelling evidence that rare autosomal and X chromosome complete gene knockouts are important inherited risk factors for ASD.
Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M.
Mendelian disorders are individually rare but collectively common, forming a long tail of genetic disease. A single highly accurate assay for this long tail would allow the scaling up of the Jewish communitys successful campaign of population screening for Tay-Sachs disease to the general population, thereby improving millions of lives, greatly benefiting minority health and saving billions of dollars. This need has been addressed by designing a universal carrier test: a non-invasive, saliva-based assay for more than 100 Mendelian diseases across all major population groups. The test has been exhaustively validated with a median of 147 positive and 525 negative samples per variant, demonstrating a multiplex assay whose performance compares favourably with the previous standard of care, namely blood-based single-gene carrier tests. Because the test represents a dramatic reduction in the cost and complexity of large-scale population screening, an end to many preventable genetic diseases is now in sight. Moreover, given that the assay is inexpensive and requires only a saliva sample, it is now increasingly feasible to make carrier testing a routine part of preconception care.
We developed Graemlin 2.0, a new multiple network aligner with (1) a new multi-stage approach to local network alignment; (2) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (3) a parameter learning algorithm that uses a training set of known network alignments to learn parameters for our scoring function and thereby adapt it to any set of networks; and (4) an algorithm that uses our scoring function to find approximate multiple network alignments in linear time. We tested Graemlin 2.0s accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. We show that, on each of these datasets, Graemlin 2.0 has higher sensitivity and specificity than existing network aligners. Graemlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu .
The extent to which variants in the protein-coding sequence of genes contribute to risk of rheumatoid arthritis (RA) is unknown. In this study, we addressed this issue by deep exon sequencing and large-scale genotyping of 25 biological candidate genes located within RA risk loci discovered by genome-wide association studies (GWASs). First, we assessed the contribution of rare coding variants in the 25 genes to the risk of RA in a pooled sequencing study of 500 RA cases and 650 controls of European ancestry. We observed an accumulation of rare nonsynonymous variants exclusive to RA cases in IL2RA and IL2RB (burden test: p = 0.007 and p = 0.018, respectively). Next, we assessed the aggregate contribution of low-frequency and common coding variants to the risk of RA by dense genotyping of the 25 gene loci in 10,609 RA cases and 35,605 controls. We observed a strong enrichment of coding variants with a nominal signal of association with RA (p < 0.05) after adjusting for the best signal of association at the loci (p(enrichment) = 6.4 × 10(-4)). For one locus containing CD2, we found that a missense variant, rs699738 (c.798C>A [p.His266Gln]), and a noncoding variant, rs624988, reside on distinct haplotypes and independently contribute to the risk of RA (p = 4.6 × 10(-6)). Overall, our results indicate that variants (distributed across the allele-frequency spectrum) within the protein-coding portion of a subset of biological candidate genes identified by GWASs contribute to the risk of RA. Further, we have demonstrated that very large sample sizes will be required for comprehensively identifying the independent alleles contributing to the missing heritability of RA.
Rare sarcomere protein variants cause dominant hypertrophic and dilated cardiomyopathies. To evaluate whether allelic variants in eight sarcomere genes are associated with cardiac morphology and function in the community, we sequenced 3,600 individuals from the Framingham Heart Study (FHS) and Jackson Heart Study (JHS) cohorts. Out of the total, 11.2% of individuals had one or more rare nonsynonymous sarcomere variants. The prevalence of likely pathogenic sarcomere variants was 0.6%, twice the previous estimates; however, only four of the 22 individuals had clinical manifestations of hypertrophic cardiomyopathy. Rare sarcomere variants were associated with an increased risk for adverse cardiovascular events (hazard ratio: 2.3) in the FHS cohort, suggesting that cardiovascular risk assessment in the general population can benefit from rare variant analysis.
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low coverage sequence reads are added to dense genome-wide SNP arrays--the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.
Related JoVE Video
Journal of Visualized Experiments
What is Visualize?
JoVE Visualize is a tool created to match the last 5 years of PubMed publications to methods in JoVE's video library.
How does it work?
We use abstracts found on PubMed and match them to JoVE videos to create a list of 10 to 30 related methods videos.
Video X seems to be unrelated to Abstract Y...
In developing our video relationships, we compare around 5 million PubMed articles to our library of over 4,500 methods videos. In some cases the language used in the PubMed abstracts makes matching that content to a JoVE video difficult. In other cases, there happens not to be any content in our video library that is relevant to the topic of a given abstract. In these cases, our algorithms are trying their best to display videos with relevant content, which can sometimes result in matched videos with only a slight relation.