DNA repair pathways are good candidates for upper aerodigestive tract cancer susceptibility because of their critical role in maintaining genome integrity. We selected 13 pathways involved in DNA repair, representing 212 autosomal genes. To assess the role of these pathways and their associated genes, two European data sets from the International Head and Neck Cancer Epidemiology consortium were pooled, totaling 1954 cases and 3121 controls, with documented demographic, lifetime alcohol and tobacco consumption information. We applied an innovative approach that tests single nucleotide polymorphism (SNP)-sets within DNA repair pathways and then within genes belonging to the significant pathways. We showed an association between the polymerase pathway and oral cavity/pharynx cancers (P-corrected = 4.45 × 10⁻²), explained entirely by the association with one SNP, rs1494961 (P = 2.65 × 10⁻⁴), a missense mutation V306I in the second exon of the HELQ gene. We also found an association between the cell cycle regulation pathway and esophagus cancer (P-corrected = 1.48 × 10⁻²), explained by three SNPs located within or near the CSNK1E gene: rs1534891 (P = 1.27 × 10⁻⁴), rs7289981 (P = 3.37 × 10⁻³) and rs13054361 (P = 4.09 × 10⁻³). As a first attempt to investigate pathway-level associations, our results suggest a role of specific DNA repair genes/pathways in specific upper aerodigestive tract cancer sites.
FSuite is a user-friendly pipeline developed for exploiting inbreeding information derived from human genomic data. It can make use of single nucleotide polymorphism chip or exome data. Compared with other software, the advantage of FSuite is that it provides a complete suite of scripts to describe and use the inbreeding information. It includes a module to detect inbred individuals and estimate their inbreeding coefficient, and a module to describe the proportion of different mating types in the population and each individual's probability of being the offspring of each mating type, which can be useful for population genetic studies. It also allows the identification of regions of homozygosity shared between affected individuals (homozygosity mapping), which can be used to identify rare recessive mutations involved in monogenic or multifactorial diseases.
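To give a rough feel for what an inbreeding coefficient measures, here is a minimal sketch of a simple single-point estimator based on excess homozygosity. This is not FSuite's own estimator or interface, which the abstract does not describe; the function name, the 0/1/2 genotype coding and the allele frequencies are all assumptions made for illustration.

```python
def inbreeding_coefficient(genotypes, alt_freqs):
    """Excess-homozygosity estimate of F for one individual.

    genotypes: alternate-allele counts (0, 1 or 2), one per SNP.
    alt_freqs: population alternate-allele frequency of each SNP.
    Illustrative only; this is NOT the estimator implemented in FSuite.
    """
    if len(genotypes) != len(alt_freqs):
        raise ValueError("genotypes and alt_freqs must have the same length")
    # Observed number of heterozygous sites in this individual.
    observed_het = sum(1 for g in genotypes if g == 1)
    # Expected number of heterozygous sites under random mating.
    expected_het = sum(2.0 * p * (1.0 - p) for p in alt_freqs)
    if expected_het == 0:
        raise ValueError("no polymorphic SNPs: F is undefined")
    return 1.0 - observed_het / expected_het


# Hypothetical individual typed at four SNPs with allele frequency 0.5.
print(inbreeding_coefficient([1, 1, 0, 2], [0.5, 0.5, 0.5, 0.5]))
```

An individual with fewer heterozygous sites than expected under random mating gets a positive estimate, which is the signal that a module for detecting inbred individuals would look for.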
Genome-wide association studies (GWAS), although efficient at detecting genes involved in complex diseases, are not designed to measure the real effect of the genes. This is illustrated here by the example of IL2RA in multiple sclerosis (MS). Association between IL2RA and MS is clearly established, although the functional variation is still unknown: the effect of IL2RA might be better described by several SNPs than by a single one. This study investigates whether a pair of SNPs better explains the observed linkage and association data than a single SNP. In total, 522 trio families and 244 affected sib-pairs were typed for 26 IL2RA SNPs. For each SNP and each pair of SNPs, the phased genotypes of patients and controls were compared to determine the SNP set offering the best risk discrimination. Consistency between the genotype risks provided by the retained set and the identity-by-descent allele sharing in affected sib-pairs was assessed. After controlling for multiple testing, the set of SNPs rs2256774 and rs3118470 provides the best discrimination between the case and control genotype distributions (P-corrected=0.009). The relative risk between the least and most at-risk genotypes is 3.54 with a 95% confidence interval of [2.14-5.94]. Furthermore, the linkage information provided by the allele sharing between affected sibs is consistent with the retained set (P=0.80) but rejects the SNP reported in the literature (P=0.006). Establishing a valid model of a disease gene is essential to test its potential interaction with other genes and to reconstruct the pathophysiological pathways.
The use of a reference control panel in genome-wide association studies is an interesting solution to the problem of how to reduce costs. In such designs, data on relevant environmental factors are usually collected only in cases, making it more difficult to deal with potential gene-environment interactions when testing for genetic association. However, under certain circumstances, neglecting an existing interaction with the environment may be detrimental in terms of statistical power to detect the genetic factor. In this paper, the authors propose a novel method based on a multinomial logistic regression model to overcome the lack of environmental exposure information in controls, by contrasting both exposed and unexposed cases with the control sample. For each case group, a genetic effect-size parameter is estimated, and the genetic association and the gene-environment interaction are tested jointly. The authors evaluate the performance of this method through asymptotic computations and simulations of cases and population controls under different models. In the presence of a gene-environment interaction, this approach outperforms other available methods that test for genetic association and gene-environment interaction either separately or jointly. Interestingly, it even has better power than the joint test requiring full knowledge of the environmental information in both cases and controls.
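The idea of contrasting exposed and unexposed cases with a single control sample can be sketched in the simplest possible setting: a binary genotype (carrier or non-carrier) and no covariates. In that special case the multinomial logistic model is saturated, its effect-size estimates are the two empirical log odds ratios, and the 2-degree-of-freedom likelihood ratio test reduces to a G-test on the 3 × 2 table. The function name, group labels and counts below are hypothetical, and this is a simplification rather than the authors' implementation.

```python
import math


def joint_genetic_test(counts):
    """Joint 2-df test of a genetic effect in exposed and unexposed cases.

    counts maps each group ('controls', 'exposed_cases', 'unexposed_cases')
    to a (carriers, non_carriers) tuple; all cells are assumed non-zero.
    """
    ctrl_carriers, ctrl_non = counts["controls"]
    # One genetic effect size (log odds ratio versus controls) per case group.
    log_or = {}
    for group in ("exposed_cases", "unexposed_cases"):
        carriers, non = counts[group]
        log_or[group] = math.log((carriers * ctrl_non) / (non * ctrl_carriers))

    # Likelihood ratio (G) statistic for independence in the 3 x 2 table.
    total = sum(c + n for c, n in counts.values())
    col_totals = [sum(row[j] for row in counts.values()) for j in (0, 1)]
    g_stat = 0.0
    for row in counts.values():
        row_total = sum(row)
        for j, observed in enumerate(row):
            expected = row_total * col_totals[j] / total
            if observed > 0:
                g_stat += 2.0 * observed * math.log(observed / expected)

    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-g_stat / 2.0)
    return log_or, g_stat, p_value


# Hypothetical counts: the variant is enriched only among exposed cases.
example = {
    "controls": (100, 900),
    "exposed_cases": (40, 160),
    "unexposed_cases": (30, 270),
}
print(joint_genetic_test(example))
```

Because the two effect sizes are tested together, a variant whose effect is confined to one exposure group can still be detected without any exposure data on the controls.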
A recent investigation reported, for the first time, an association between variants in the IFIH1-GCA-KCNH7 locus and multiple sclerosis (MS). We sought to replicate this genetic association in MS with a new independent MS cohort composed of French Caucasian MS trio families. The two most significant IFIH1 single nucleotide polymorphisms, rs1990760 and rs2068330, reported as involved in MS susceptibility, were genotyped in 591 French Caucasian MS trio families, and analyzed using the transmission/disequilibrium test. No association with MS was found (rs1990760, P=0.45 and rs2068330, P=0.27). Similarly, no significant association was detected after stratification for HLA-DRB1*1501 carriers. Reasons that may explain this discrepancy between the original report and our study are discussed.
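For readers unfamiliar with the transmission/disequilibrium test, the standard biallelic form of the statistic can be sketched as follows. It compares how often heterozygous parents transmit versus do not transmit a given allele to their affected child. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def tdt(transmitted, untransmitted):
    """McNemar-type TDT statistic and p-value for one biallelic marker.

    transmitted / untransmitted: number of times heterozygous parents did
    or did not pass the allele of interest to an affected child.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / informative
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts from heterozygous parents.
print(tdt(60, 40))
```

Under the null hypothesis of no linkage or association, each allele of a heterozygous parent is transmitted with probability one half, so the two counts should be roughly equal.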
Although variations in allele frequencies at common SNPs have been extensively studied in different populations, little is known about the stratification of rare variants and its impact on association tests. In this paper, we used Affymetrix 500K genotype data from the WTCCC to investigate whether variants in three different frequency categories (below 1%, between 1 and 5%, above 5%) show different stratification patterns in the UK population. We found that these patterns are indeed different. The top principal component (PC) extracted from the rare variant category correlates poorly with any PC or combination of PCs from the low frequency or common variant categories. These results could suggest that a suitable way to avoid false positive associations due to population stratification would be to adjust for the PCs derived from the matching allele frequency category when testing variants in that category. However, we found this was not the case, both on type 2 diabetes data and on simulated data. Indeed, adjusting rare variant association tests on PCs derived from rare variants does no better at correcting for population stratification than adjusting on PCs derived from more common variants. Mixed models perform slightly better than PC-based adjustments for low frequency variants but less well for the rarest variants. These results call for new methodological developments specifically devoted to addressing rare variant stratification issues in association tests.
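The three frequency categories can be made concrete with a short sketch that computes a minor allele frequency from 0/1/2 genotype codes and assigns the variant to a bin. The function names and the handling of values exactly at 1% and 5% are assumptions; the abstract only gives the category boundaries.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from alternate-allele counts (0, 1 or 2)."""
    n_alleles = 2 * len(genotypes)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = sum(genotypes) / n_alleles
    return min(alt_freq, 1.0 - alt_freq)


def frequency_category(maf):
    """Assign a variant to one of the three bins used in the text.

    Boundary handling (1% and 5% go to the low frequency bin) is an
    assumption made for this sketch.
    """
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low frequency"
    return "common"


# Hypothetical variant: one heterozygote among 100 individuals.
maf = minor_allele_frequency([0] * 99 + [1])
print(maf, frequency_category(maf))
```

Principal components would then be computed separately from the variants in each bin before comparing their stratification patterns.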
Not accounting for interaction in association analyses may reduce the power to detect the variants involved. Using simulation, we investigate the power of different designs to detect, under two-locus models, the effect of disease-causing variants among several hundred markers with family-based association tests. This setting reflects realistic situations of exploration of linkage regions or of biological pathways. We define four strategies: (S1) single-marker analysis of all Single Nucleotide Polymorphisms (SNPs), (S2) two-marker analysis of all possible SNP pairs, (S3) lax preliminary selection of SNPs followed by a two-marker analysis of all selected SNP pairs, (S4) stringent preliminary selection of SNPs, each being later paired with all the SNPs for two-marker analysis. Strategy S2 is never the best design, except when there is an inversion of the gene effect (flip-flop model). Testing individual SNPs (S1) is the most efficient when the two genes act multiplicatively. Designs S3 and S4 are the most powerful for nonmultiplicative models. Their respective powers depend on the level of symmetry of the model. Because the true genetic model is unknown, we cannot conclude that one design outperforms another. The optimal approach would be the two-step strategy (S3 or S4) as it is often the most powerful, or the second best. Genet.
Lung cancer (LC) is the leading cause of cancer-related death worldwide and tobacco smoking is the major associated risk factor. DNA repair is an important process that maintains genome integrity, and polymorphisms in DNA repair genes may contribute to susceptibility to LC. To explore the role of DNA repair genes in LC, we conducted a multilevel association study with 1655 single nucleotide polymorphisms (SNPs) in 211 DNA repair genes using 6911 individuals pooled from four genome-wide case-control studies. Single SNP association corroborates previous reports of association with rs3131379, located in the MSH5 gene (P = 3.57 × 10⁻⁵), and returns a similar risk estimate. The effect of this SNP is modulated by histological subtype. On the log-additive scale, the odds ratio per allele is 1.04 (0.84-1.30) for adenocarcinomas, 1.52 (1.28-1.80) for squamous cell carcinomas and 1.31 (1.09-1.57) for other histologies (heterogeneity test: P = 9.1 × 10⁻³). Gene-based association analysis identifies three repair genes associated with LC (P < 0.01): UBE2N, structural maintenance of chromosomes 1L2 and POLB. Two additional genes (RAD52 and POLN) are borderline significant. Pathway-based association analysis identifies five repair pathways associated with LC (P < 0.01): chromatin structure, DNA polymerases, homologous recombination, genes involved in human diseases with sensitivity to DNA-damaging agents and Rad6 pathway and ubiquitination. This first international pooled analysis of a large dataset unravels the role of specific DNA repair pathways in LC and highlights the importance of accounting for gene and pathway effects when studying LC.
To detect fully penetrant rare recessive variants that could constitute Mendelian subentities of complex diseases, we propose a novel strategy, the HBD-GWAS strategy, which can be applied to genome-wide association study (GWAS) data. This strategy first involves the identification of inbred individuals among cases using the genome-wide SNP data and then focuses on these inbred affected individuals and searches for genomic regions of shared homozygosity by descent that could harbor rare recessive disease-causing variants. In this second step, analogous to homozygosity mapping, a heterogeneity lod-score, HFLOD, is computed to quantify the evidence of linkage provided by the data. In this paper, we evaluate this strategy theoretically under different scenarios and compare its performance with that of linkage analysis using affected sib-pair (ASP) data. If cases affected by these Mendelian subentities are not enriched in the sample of cases, the HBD-GWAS strategy has almost no power to detect them, unless they explain an important part of the disease prevalence. The HBD-GWAS strategy outperforms the ASP linkage strategy only in a very limited number of situations where there is strong allelic heterogeneity. When several rare recessive variants within the same gene are involved, the ASP design indeed often fails to detect the gene, whereas, by focusing on inbred individuals using the HBD-GWAS strategy, the gene might be detected provided very large samples of cases are available.