Detecting positive selection in species with heterogeneous habitats and complex demography is notoriously difficult and prone to statistical biases. The model plant Arabidopsis thaliana exemplifies this problem: in spite of the large amounts of data available, little evidence for classic selective sweeps has been found. Moreover, many aspects of its demography are unclear, which makes it hard to judge whether the few signals are indeed signs of selection or false positives caused by demographic events. Here, we focus on Swedish A. thaliana and find that its demography can be approximated by a two-population model. Careful analysis of the data shows that this two-island model is characterized by a very old split, significantly predating the last glacial maximum, followed by secondary contact with strong migration. We evaluate selection based on this demography and find that the secondary contact model strongly affects the power to detect sweeps. Moreover, it affects power differently in northern Sweden (more false positives) than in southern Sweden (more false negatives). However, even when the demographic history is accounted for, sweep signals in northern Sweden are stronger than in southern Sweden, with little or no positional overlap. Further simulations including the complex demography and selection confirm that this pattern is not compatible with global selection acting on both populations, and can thus be taken as evidence for local selection within subpopulations of Swedish A. thaliana. This study demonstrates the necessity of combining demographic analyses and sweep scans for the detection of selection, particularly when selection acts predominantly locally.
Relating molecular variation to phenotypic diversity is a central goal in evolutionary biology. In Arabidopsis thaliana, FLOWERING LOCUS C (FLC) is a major determinant of variation in vernalization, the acceleration of flowering by prolonged cold. Here, through analysis of 1307 A. thaliana accessions, we identify five predominant FLC haplotypes defined by noncoding sequence variation. Genetic and transgenic experiments show that they are functionally distinct, varying in FLC expression level and rate of epigenetic silencing. Allelic heterogeneity at this single locus accounts for a large proportion of natural variation in vernalization that contributes to adaptation of A. thaliana.
Identifying the factors that influence the outcome of host-microbial interactions is critical to protecting biodiversity, minimizing agricultural losses and improving human health. A few genes that determine symbiosis or resistance to infectious disease have been identified in model species, but a comprehensive examination of how a host genotype influences the structure of its microbial community is lacking. Here we report the results of a field experiment with the model plant Arabidopsis thaliana to identify the fungi and bacteria that colonize its leaves and the host loci that influence these microbial populations. The composition of this community differs among accessions of A. thaliana. Genome-wide association studies (GWAS) suggest that plant loci responsible for defense and cell wall integrity affect variation in this community. Furthermore, species richness in the bacterial community is shaped by host genetic variation, notably at loci that also influence the reproduction of viruses, trichome branching and morphogenesis.
The exposure of plants to high concentrations of trace metallic elements such as copper involves a remodeling of the root system, characterized by primary root growth inhibition and an increase in lateral root density. These characteristics constitute convenient markers for screening mutants with an altered response to copper excess. A forward genetic approach was undertaken to discover novel genetic factors involved in this response. A Cu²⁺-sensitive mutant named copper modified resistance1 (cmr1) was isolated, and a causative mutation in the CMR1 gene was identified using positional cloning and next-generation sequencing. CMR1 encodes a plant-specific protein of unknown function. The analysis of the cmr1 mutant indicates that the CMR1 protein is required for optimal growth under normal conditions and has an essential role in the stress response. Impairment of CMR1 activity alters root growth through aberrant activity of the root meristem, and modifies potassium concentration and hormonal balance (ethylene production and auxin accumulation). Our data support a putative role for CMR1 in cell division regulation and meristem maintenance. Research on the role of CMR1 will contribute to the understanding of the plasticity of plants in response to changing environments.
Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition.
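Long-range LD of the kind mentioned above is usually quantified with the squared correlation r² between alleles at two loci. A minimal Python sketch of that textbook statistic, assuming each inbred line can be treated as a single haplotype and alleles are coded 0/1; the function name `ld_r2` is hypothetical and this is not claimed to be the exact pipeline used in the study:

```python
def ld_r2(locus_a, locus_b):
    """Squared allelic correlation r^2 between two biallelic loci.

    Each argument lists the 0/1 allele carried by each inbred line
    (treated as one haplotype per line)."""
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must have the same non-zero length")
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n  # joint frequency
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d * d / denom
```

Because r² ignores physical distance, the same function applies to pairs of SNPs on different chromosomes, which is where long-range LD stands out.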
The shift from outcrossing to selfing is common in flowering plants, but the genomic consequences and the speed at which they emerge remain poorly understood. An excellent model for understanding the evolution of self-fertilization is provided by Capsella rubella, which became self-compatible <200,000 years ago. We report a C. rubella reference genome sequence and compare RNA expression and polymorphism patterns between C. rubella and its outcrossing progenitor Capsella grandiflora. We found a clear shift in the expression of genes associated with flowering phenotypes, similar to that seen in Arabidopsis, in which self-fertilization evolved about 1 million years ago. Comparisons of the two Capsella species showed evidence of rapid genome-wide relaxation of purifying selection in C. rubella without a concomitant change in transposable element abundance. Overall, we document that the transition to selfing may be typified by parallel shifts in gene expression, along with a measurable reduction of purifying selection.
We present JAWAMix5, an out-of-core open-source toolkit for association mapping using high-throughput sequence data. Because of its HDF5-based implementation, JAWAMix5 stores genotype data on disk and accesses them as though they were stored in main memory. It therefore offers scalable, fast analysis without concerns about memory usage, whatever the size of the dataset. We have implemented eight functions for association studies, including standard methods (linear models, linear mixed models, rare variants test, analysis in nested association mapping design and local variance component analysis), as well as a novel Bayesian local variance component analysis. Application to real data demonstrates that JAWAMix5 is reasonably fast compared with traditional solutions that load the complete dataset into memory, and that memory usage is efficient regardless of dataset size.
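The out-of-core idea can be illustrated without HDF5: if genotypes are stored one SNP per line, a scan only ever holds one SNP in memory. The Python sketch below is a simplified stand-in, not JAWAMix5's implementation (which is HDF5-based); it assumes a hypothetical plain-text layout of whitespace-separated 0/1 genotypes and fits a simple linear model per SNP, without the mixed-model corrections the toolkit provides:

```python
import tempfile


def scan_snps(genotype_lines, phenotype):
    """Yield (snp_index, slope, r2) per SNP, consuming one line (one SNP) at a time.

    genotype_lines can be an open file object, so only the current SNP is
    held in memory; phenotype is a list with one value per individual."""
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    for idx, line in enumerate(genotype_lines):
        g = [float(x) for x in line.split()]
        if len(g) != n:
            raise ValueError(f"SNP {idx}: expected {n} genotypes, got {len(g)}")
        mean_g = sum(g) / n
        sgg = sum((x - mean_g) ** 2 for x in g)
        sgy = sum((x - mean_g) * (y - mean_y) for x, y in zip(g, phenotype))
        if sgg == 0 or syy == 0:
            yield idx, 0.0, 0.0  # monomorphic SNP or constant trait
            continue
        # Least-squares slope and coefficient of determination for this SNP.
        yield idx, sgy / sgg, sgy * sgy / (sgg * syy)


if __name__ == "__main__":
    phenotype = [1.0, 2.0, 3.0, 4.0]
    with tempfile.NamedTemporaryFile("w+", suffix=".txt") as fh:
        fh.write("0 0 1 1\n1 0 1 0\n")  # two SNPs, four individuals
        fh.flush()
        fh.seek(0)
        for idx, slope, r2 in scan_snps(fh, phenotype):
            print(idx, round(slope, 3), round(r2, 3))
```

Memory use here depends on the number of individuals, not the number of SNPs; an HDF5 store adds random access and chunked reads on top of the same principle.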
Variation in human skin and eye color is substantial and especially apparent in admixed populations, yet the underlying genetic architecture is poorly understood because most genome-wide studies are based on individuals of European ancestry. We study pigmentary variation in 699 individuals from Cape Verde, where extensive West African/European admixture has given rise to a broad range in trait values and genomic ancestry proportions. We develop and apply a new approach for measuring eye color, and identify two major loci (HERC2[OCA2] P = 2.3 × 10⁻⁶², SLC24A5 P = 9.6 × 10⁻⁹) that account for both blue versus brown eye color and varying intensities of brown eye color. We identify four major loci (SLC24A5 P = 5.4 × 10⁻²⁷, TYR P = 1.1 × 10⁻⁹, APBA2[OCA2] P = 1.5 × 10⁻⁸, SLC45A2 P = 6 × 10⁻⁹) for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%) is average genomic ancestry. Our results suggest that adjacent cis-acting regulatory loci for OCA2 explain the relationship between skin and eye color, and point to an underlying genetic architecture in which several genes of moderate effect act together with many genes of small effect to explain ~70% of the estimated heritability.
Life-history traits controlling the duration and timing of developmental phases in the life cycle jointly determine fitness. Therefore, life-history traits studied in isolation provide an incomplete view of the relevance of life-cycle variation for adaptation. In this study, we examine genetic variation in traits covering the major life-history events of the annual species Arabidopsis thaliana: seed dormancy, vegetative growth rate and flowering time. In a sample of 112 genotypes collected throughout the European range of the species, both seed dormancy and flowering time follow a latitudinal gradient independent of the major population structure gradient. This finding confirms previous studies reporting the adaptive evolution of these two traits. Here, however, we further analyze patterns of co-variation among traits. We observe that co-variation between primary dormancy, vegetative growth rate and flowering time also follows a latitudinal cline. At higher latitudes, vegetative growth rate is positively correlated with primary dormancy and negatively correlated with flowering time. In the south, this trend disappears. Patterns of trait co-variation change, presumably because major environmental gradients shift with latitude. This pattern appears unrelated to population structure, suggesting that changes in the coordinated evolution of major life-history traits are adaptive. Our data suggest that A. thaliana provides a good model for studying the evolution of trade-offs and their genetic basis.
A major strength of Arabidopsis thaliana as a model lies in the availability of a large number of naturally occurring inbred lines. Recent studies of A. thaliana population structure, using thousands of accessions from stock centers and natural collections, have revealed a robust pattern of isolation by distance at several spatial scales, such that genetically identical individuals are generally found close to each other. However, some individual accessions deviate from this pattern. While some of these may be the products of rare long-distance dispersal events, many deviations may be the result of mis-identification, in the sense that the recorded location of origin is incorrect. Here, we aim to identify such discrepancies. Of the 5965 accessions examined, we conclude that 286 deserve special attention as being potentially mis-identified. We describe these suspicious accessions and their possible origins, and advise caution with regard to their use in experiments in which accurate information on geographic origin is important. Finally, we discuss possibilities for maintaining the integrity of stock lines.
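One simple way to operationalize this kind of check is to flag pairs of accessions that are genetically (nearly) identical yet recorded as originating far apart. The Python sketch below assumes 0/1-coded SNP genotypes and decimal-degree coordinates; the function names and the thresholds `min_identity` and `max_km` are hypothetical placeholders, not the criteria used to select the 286 accessions:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def identity(g1, g2):
    """Fraction of SNPs at which two accessions carry the same allele."""
    return sum(a == b for a, b in zip(g1, g2)) / len(g1)


def flag_suspicious(accessions, min_identity=0.99, max_km=500.0):
    """Return sorted ids of accessions that are (near-)identical to another
    accession whose recorded origin lies farther than max_km away.

    accessions maps id -> (latitude, longitude, genotype list)."""
    flagged = set()
    ids = list(accessions)
    for i, a in enumerate(ids):
        lat_a, lon_a, g_a = accessions[a]
        for b in ids[i + 1:]:
            lat_b, lon_b, g_b = accessions[b]
            if (identity(g_a, g_b) >= min_identity
                    and haversine_km(lat_a, lon_a, lat_b, lon_b) > max_km):
                flagged.update((a, b))
    return sorted(flagged)
```

A flagged pair only shows that at least one of the two records is suspect; deciding which one requires additional information such as collection history.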
Genomic imprinting is an epigenetic phenomenon leading to parent-of-origin-specific differential expression of maternally and paternally inherited alleles. In plants, genomic imprinting has mainly been observed in the endosperm, an ephemeral triploid tissue derived from fertilization of the diploid central cell by a haploid sperm cell. In an effort to identify novel imprinted genes in Arabidopsis thaliana, we generated deep-sequencing RNA profiles of F1 hybrid seeds derived from reciprocal crosses of the Arabidopsis Col-0 and Bur-0 accessions. Using polymorphic sites to quantify allele-specific expression levels, we identified more than 60 genes with potential parent-of-origin-specific expression. By analyzing the distribution of DNA methylation and epigenetic marks established by Polycomb group (PcG) proteins using publicly available datasets, we suggest that for maternally expressed genes (MEGs) repression of the paternally inherited alleles largely depends on DNA methylation or PcG-mediated repression, whereas repression of the maternal alleles of paternally expressed genes (PEGs) predominantly depends on PcG proteins. While maternal alleles of MEGs are also targeted by PcG proteins, such targeting does not cause complete repression. Candidate MEGs and PEGs are enriched for cis-proximal transposons, suggesting that transposons might be a driving force for the evolution of imprinted genes in Arabidopsis. In addition, we find that MEGs and PEGs evolve significantly faster than other genes in the genome. In contrast to the predominantly clustered arrangement of mammalian imprinted genes, cluster formation was detected for only a few MEGs and PEGs, suggesting that clustering is not a major requirement for imprinted gene regulation in Arabidopsis.
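Because endosperm carries two maternal genomes and one paternal genome, a non-imprinted gene is expected to show roughly 2:1 maternal:paternal read counts. A hedged Python sketch of a per-gene test, assuming allele-specific read counts have already been assigned to parents using the Col-0/Bur-0 polymorphisms; the one-sided exact binomial test, the function names and the threshold `alpha` are illustrative choices, not necessarily the statistics used in the study:

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify_imprinting(crosses, alpha=0.01, expected_maternal=2 / 3):
    """Classify a gene as 'MEG', 'PEG' or None.

    crosses: list of (maternal_reads, paternal_reads), one entry per
    reciprocal cross. A gene is called only if the excess is significant
    in every cross, i.e. it follows the parent, not the accession."""
    meg = all(binom_sf(m, m + p, expected_maternal) < alpha for m, p in crosses)
    peg = all(binom_sf(p, m + p, 1 - expected_maternal) < alpha for m, p in crosses)
    if meg:
        return "MEG"
    if peg:
        return "PEG"
    return None
```

Requiring the deviation in both reciprocal crosses is what separates parent-of-origin effects from accession-specific allelic bias.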
We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.
We have explored the genetic basis of variation in vernalization requirement and response in Arabidopsis accessions, selected on the basis of their phenotypic distinctiveness. Phenotyping of F2 populations in different environments, plus fine mapping, indicated possible causative genes. Our data support the identification of FRI and FLC as candidates for the major-effect QTL underlying variation in vernalization response, and identify a weak FLC allele, caused by a Mutator-like transposon, contributing to flowering time variation in two North American accessions. They also reveal a number of additional QTL that contribute to flowering time variation after saturating vernalization. One of these was the result of expression variation at the FT locus. Overall, our data suggest that distinct phenotypic variation in the vernalization and flowering response of Arabidopsis accessions is accounted for by variation that has arisen independently at relatively few major-effect loci.
With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, so their frequencies must be inferred. In addition, investigators may occasionally want to pool samples from a large number of individuals artificially for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when the haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap accurately estimates the proportions of haplotypes, with less than 2% error, for 34-strain mixtures of Arabidopsis thaliana whole-genome polymorphism data at 2× total coverage. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and a user's manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.
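The core inference can be framed as a mixture problem: each read covering a SNP comes from one of the known haplotypes in proportion to that haplotype's frequency in the pool. The Python sketch below estimates those proportions by expectation-maximization (EM) over per-SNP allele counts. It is an illustrative stand-in, not PoolHap's actual algorithm, and the function name and per-read error rate `err` are assumptions:

```python
def estimate_proportions(haplotypes, alt_counts, ref_counts, n_iter=500, err=0.01):
    """EM estimate of the mixing proportions of known haplotypes in a pool.

    haplotypes: K lists of 0/1 alleles at M SNPs (1 = alternative allele).
    alt_counts / ref_counts: reads supporting alt / ref allele at each SNP.
    err: assumed probability that a read reports the wrong allele."""
    k = len(haplotypes)
    props = [1.0 / k] * k  # start from equal proportions
    total = sum(alt_counts) + sum(ref_counts)
    for _ in range(n_iter):
        expected = [0.0] * k  # expected number of reads from each haplotype
        for j, (a, r) in enumerate(zip(alt_counts, ref_counts)):
            # Probability that a read from each haplotype shows the alt allele.
            p_alt = [1 - err if hap[j] == 1 else err for hap in haplotypes]
            w_alt = [props[h] * p_alt[h] for h in range(k)]
            w_ref = [props[h] * (1 - p_alt[h]) for h in range(k)]
            s_alt, s_ref = sum(w_alt), sum(w_ref)
            for h in range(k):
                # E-step: share this SNP's reads among haplotypes by posterior weight.
                if a:
                    expected[h] += a * w_alt[h] / s_alt
                if r:
                    expected[h] += r * w_ref[h] / s_ref
        # M-step: new proportions are the expected shares of all reads.
        props = [e / total for e in expected]
    return props
```

Each SNP on its own only separates the haplotypes into two groups, but many SNPs with different groupings jointly pin down every proportion, which mirrors the point that genome-wide SNP numbers compensate for uneven per-site coverage.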
Flowering time (FT) is the developmental transition coupling an internal genetic program with external local and seasonal climate cues. The genetic loci sensitive to predictable environmental signals underlie local adaptation. We dissected natural variation in FT across a new global diversity set of 473 unique accessions, with >12,000 plants across two seasonal plantings in each of two simulated local climates, Spain and Sweden. Genome-wide association mapping was carried out with 213,497 SNPs. A total of 12 FT candidate quantitative trait loci (QTL) were fine-mapped in two independent studies, including 4 located within ±10 kb of previously cloned FT alleles and 8 novel loci. All QTL show sensitivity to planting season and/or simulated location in a multi-QTL mixed model. Alleles at four QTL were significantly correlated with latitude of origin, implying past selection for faster flowering in southern locations. Finally, maximum seed yield was observed at an optimal FT unique to each season and location, with four FT QTL directly controlling yield. Our results suggest that these major, environmentally sensitive FT QTL play an important role in spatial and temporal adaptation.
Genome-wide association mapping is a popular method for using natural variation within a species to generate a genotype-phenotype map. Statistical association between an allele at a locus and the trait in question is used as evidence that variation at the locus is responsible for variation of the trait. Indirect association, however, can give rise to statistically significant results at loci unrelated to the trait. We use a haploid, three-locus, binary genetic model to describe the conditions under which these indirect associations become stronger than any of the causative associations in the organism, even to the point of being the only associations present in the data. These indirect associations are the result of disequilibrium between multiple factors affecting a single trait. Epistasis and population structure can exacerbate the problem but are not required to create it. From a statistical point of view, indirect associations are true associations rather than the result of stochastic noise: they will not be ameliorated by increasing sample size or marker density and can be reproduced in independent studies.
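A toy haploid instance makes the point concrete. In the Python sketch below (an illustrative example constructed for this summary, not taken from the paper), two causal loci A and B act additively on the trait, and a non-causal locus C is in disequilibrium with both. Because the haplotype frequencies are fixed, the correlations are population quantities that do not change with sample size:

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Haplotypes (A, B, C), each at frequency 1/4. A and B are causal; C is not.
haplotypes = [(1, 0, 1), (0, 1, 1), (0, 0, 0), (0, 0, 0)]
# Additive trait: no epistasis and no population structure are involved.
trait = [a + b for a, b, _ in haplotypes]

for name, idx in (("A", 0), ("B", 1), ("C", 2)):
    locus = [hap[idx] for hap in haplotypes]
    print(name, round(pearson(locus, trait), 3))
```

This prints `A 0.577`, `B 0.577` and `C 1.0`: the locus with no effect on the trait is perfectly correlated with it, while each causal locus reaches only r ≈ 0.58.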
The genetic model plant Arabidopsis thaliana, like many plant species, experiences a range of edaphic conditions across its natural habitat. Such heterogeneity may drive local adaptation, though the molecular genetic basis remains elusive. Here, we describe a study in which we used genome-wide association mapping, genetic complementation, and gene expression studies to identify cis-regulatory expression level polymorphisms at the AtHKT1;1 locus, encoding a known sodium (Na⁺) transporter, as being a major factor controlling natural variation in leaf Na⁺ accumulation capacity across the global A. thaliana population. A weak allele of AtHKT1;1 that drives elevated leaf Na⁺ in this population has been previously linked to elevated salinity tolerance. Inspection of the geographical distribution of this allele revealed its significant enrichment in populations associated with the coast and saline soils in Europe. The fixation of this weak AtHKT1;1 allele in these populations is genetic evidence supporting local adaptation to these potentially saline-impacted environments.
The model plant Arabidopsis thaliana exhibits extensive natural variation in resistance to parasites. Immunity is often conferred by resistance (R) genes that permit recognition of specific races of a pathogen. The number of such R genes and their distribution are poorly understood. In this study, we investigated the basis for resistance to the downy mildew agent Hyaloperonospora arabidopsidis ex parasitica (Hpa) in a global sample of A. thaliana. We implemented a combined genome-wide mapping of resistance using populations of recombinant inbred lines and a collection of wild A. thaliana accessions. We tested the interaction between 96 host genotypes collected worldwide and five strains of Hpa. Then, a fraction of the species-wide resistance was genetically dissected using six recently constructed populations of recombinant inbred lines. We found that resistance is usually governed by single dominant R genes concentrated in only four genomic regions. We show that association genetics of resistance to diseases such as downy mildew enables increased mapping resolution from quantitative trait loci interval to candidate gene level. Association patterns in quantitative trait loci intervals indicate that the pool of A. thaliana resistance sources against the tested Hpa isolates may be predominantly confined to six RPP (Resistance to Hpa) loci isolated in previous studies. Our results suggest that combining association and linkage mapping could accelerate resistance gene discovery in plants.
Plants can defend themselves against a wide array of enemies, from microbes to large animals, yet there is great variability in the effectiveness of such defences, both within and between species. Some of this variation can be explained by conflicting pressures from pathogens with different modes of attack. A second explanation comes from an evolutionary tug of war, in which pathogens adapt to evade detection until the plant evolves new capabilities for recognizing pathogen invasion. If selection is sufficiently strong, however, susceptible hosts should remain rare. That this is not the case is best explained by costs incurred from constitutive defences in a pest-free environment. Using a combination of forward genetics and genome-wide association analyses, we demonstrate that allelic diversity at a single locus, ACCELERATED CELL DEATH 6 (ACD6), underpins marked pleiotropic differences in both vegetative growth and resistance to microbial infection and herbivory among natural Arabidopsis thaliana strains. A hyperactive ACD6 allele, compared to the reference allele, strongly enhances resistance to a broad range of pathogens from different phyla, but at the same time slows the production of new leaves and greatly reduces the biomass of mature leaves. This allele segregates at intermediate frequency both throughout the worldwide range of A. thaliana and within local populations, consistent with this allele providing substantial fitness benefits despite its marked impact on growth.
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases, genome-wide association (GWA) studies have, owing to advances in genotyping and sequencing technology, become an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available, because once these lines have been genotyped they can be phenotyped multiple times, making it possible (as well as extremely cost effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly self-fertilizing model plant known to harbour considerable genetic variation for many adaptively important traits. Our results are dramatically different from those of human GWA studies, in that we identify many common alleles of major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure makes it difficult to distinguish true associations from false ones. However, a priori candidates are significantly over-represented among these associations, making many of them excellent candidates for follow-up experiments. Our study demonstrates the feasibility of GWA studies in A. thaliana and suggests that the approach will be appropriate for many other organisms.
Aquilegia formosa and A. pubescens are two closely related species belonging to the columbine genus. Despite their morphological and ecological differences, previous studies have revealed a large degree of intercompatibility, as well as little sequence divergence between these two taxa. We compared the inter- and intraspecific patterns of variation for 9 nuclear loci, and found that the two species were practically indistinguishable at the level of DNA sequence polymorphism, indicating either very recent speciation or continued gene flow. As a comparison, we also analyzed variation at two loci across 30 other Aquilegia taxa; this revealed slightly more differentiation among taxa, which seemed best explained by geographic distance. By contrast, we found no evidence for isolation by distance on a more local geographic scale. We conclude that the extremely low levels of genetic differentiation between A. formosa and A. pubescens at neutral loci will facilitate future genome-wide scans for speciation genes.
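Isolation by distance is often assessed with a Mantel test, which correlates pairwise genetic and geographic distances and judges significance by permuting the labels of one matrix. A minimal Python sketch of that standard test (not necessarily the analysis used here), assuming symmetric distance matrices given as nested lists; `mantel`, `n_perm` and `seed` are illustrative names:

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, n_perm=999, seed=1):
    """Return (r, p): Pearson r between the upper triangles of d1 and d2, and a
    one-sided permutation p-value obtained by permuting rows and columns of d2 together."""
    n = len(d1)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d1[i][j] for i, j in pairs]
    r_obs = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(order)
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

Permuting whole rows and columns, rather than individual distances, keeps the dependence among pairwise distances that share a sample, which is why the Mantel test is preferred over a naive correlation test here.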
The population structure of an organism reflects its evolutionary history and influences its evolutionary trajectory. It constrains how genetic diversity can be combined and reveals patterns of past gene flow. Understanding it is a prerequisite for detecting genomic regions under selection, predicting the effect of population disturbances, or modeling gene flow. This paper examines the detailed global population structure of Arabidopsis thaliana. Using a set of 5,707 plants collected from around the globe and genotyped at 149 SNPs, we show that while A. thaliana as a species self-fertilizes 97% of the time, there is considerable variation among local groups. This level of outcrossing greatly limits observed heterozygosity but is sufficient to generate considerable local haplotypic diversity. We also find that in its native Eurasian range A. thaliana exhibits continuous isolation by distance at every geographic scale without natural breaks corresponding to classical notions of populations. By contrast, in North America, where it exists as an exotic species, A. thaliana exhibits little or no population structure at a continental scale but local isolation by distance that extends over hundreds of kilometers. This suggests a pattern for the development of isolation by distance that can establish itself shortly after an organism fills a new habitat range. It also raises questions about the general applicability of many standard population genetics models. Any model based on discrete clusters of interchangeable individuals will be an uneasy fit to organisms like A. thaliana which exhibit continuous isolation by distance on many scales.
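Why 3% outcrossing still leaves very little heterozygosity can be seen from the classic equilibrium for a constant selfing rate s, where the inbreeding coefficient settles at F = s / (2 − s) and heterozygosity is reduced by a factor (1 − F). A short Python sketch of that textbook relationship; it assumes equilibrium and is not necessarily how the 97% figure was estimated, and the function names are hypothetical:

```python
def equilibrium_f(selfing_rate):
    """Equilibrium inbreeding coefficient under a constant selfing rate s: F = s / (2 - s)."""
    if not 0.0 <= selfing_rate <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return selfing_rate / (2.0 - selfing_rate)


def selfing_from_f(f):
    """Invert F = s / (2 - s) to recover the selfing rate: s = 2F / (1 + F)."""
    return 2.0 * f / (1.0 + f)


s = 0.97
f = equilibrium_f(s)
# Heterozygosity is reduced by a factor (1 - F) relative to random mating.
print(f"F = {f:.3f}; heterozygosity is {1 - f:.1%} of the random-mating expectation")
```

At s = 0.97 this gives F ≈ 0.942, so heterozygosity is only about 5.8% of its random-mating value, even though each generation still produces new recombinant haplotypes through the remaining outcrossing events.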
Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, small size, short generation time, small genome size, and wide geographic distribution make it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire A. thaliana community. To facilitate this, we present GWAPP, an interactive Web-based application for conducting GWAS in A. thaliana. Using an efficient implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with a mixed model and other methods in just a couple of minutes. GWAPP features an extensive, user-friendly interface that includes interactive Manhattan plots and linkage disequilibrium plots. It also facilitates exploratory data analysis by implementing features such as the inclusion of candidate polymorphisms in the model as cofactors.
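For readers less familiar with the model, the single-locus linear mixed model commonly used for GWAS in structured samples can be written as below. This is the standard textbook form; the specific estimation shortcuts GWAPP uses are not described here. y is the phenotype vector, X holds fixed covariates (including any cofactor polymorphisms), x_j is the tested SNP with effect γ_j, K is the kinship matrix, and σ_g² and σ_e² are the genetic and residual variance components; each SNP is tested via the null hypothesis γ_j = 0:

```latex
y = X\beta + x_j \gamma_j + g + \varepsilon, \qquad
g \sim \mathcal{N}\!\left(0,\, \sigma_g^2 K\right), \qquad
\varepsilon \sim \mathcal{N}\!\left(0,\, \sigma_e^2 I\right)
```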
The authors argue that population structure per se is not a problem in genome-wide association studies: the true sources of confounding are the environment and the genetic background, and the latter is greatly underappreciated. They conclude that mixed models effectively address this issue.
Understanding the mechanism of cadmium (Cd) accumulation in plants is important to help reduce its potential toxicity to both plants and humans through dietary and environmental exposure. Here, we report on a study to uncover the genetic basis underlying natural variation in Cd accumulation in a world-wide collection of 349 wild collected Arabidopsis thaliana accessions. We identified a 4-fold variation (0.5-2 µg Cd g⁻¹ dry weight) in leaf Cd accumulation when these accessions were grown in a controlled common garden. By combining genome-wide association mapping, linkage mapping in an experimental F2 population, and transgenic complementation, we reveal that HMA3 is the sole major locus responsible for the variation in leaf Cd accumulation we observe in this diverse population of A. thaliana accessions. Analysis of the predicted amino acid sequence of HMA3 from 149 A. thaliana accessions reveals the existence of 10 major natural protein haplotypes. Association of these haplotypes with leaf Cd accumulation and genetic complementation experiments indicate that 5 of these haplotypes are active and 5 are inactive, and that elevated leaf Cd accumulation is associated with the reduced function of HMA3 caused by a nonsense mutation and polymorphisms that change two specific amino acids.
Genome-wide association studies (GWAS) are a standard approach for studying the genetics of natural variation. A major concern in GWAS is the need to account for the complicated dependence structure of the data, both between loci and between individuals. Mixed models have emerged as a general and flexible approach for correcting for population structure in GWAS. Here, we extend this linear mixed-model approach to carry out GWAS of correlated phenotypes, deriving a fully parameterized multi-trait mixed model (MTMM) that considers both the within-trait and between-trait variance components simultaneously for multiple traits. We apply this to data from a human cohort for correlated blood lipid traits from the Northern Finland Birth Cohort 1966 and show greatly increased power to detect pleiotropic loci that affect more than one blood lipid trait. We also apply this approach to an Arabidopsis thaliana data set for flowering measurements in two different locations, identifying loci whose effect depends on the environment.
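To make "within-trait and between-trait variance components" concrete, one common way to write a two-trait mixed model (assumed here, not quoted from the paper) stacks the n values of each trait into one vector whose covariance is Vg ⊗ K + Ve ⊗ I, where K is the kinship matrix, Vg is the 2 × 2 genetic covariance between traits, and Ve is the 2 × 2 residual covariance. The sketch below only assembles that covariance; estimating Vg and Ve and testing SNPs are omitted. Function names and numbers are hypothetical.

```python
def kron(a, b):
    """Kronecker product a ⊗ b of two dense matrices stored as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def identity(n):
    """n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def mtmm_covariance(kinship, v_g, v_e):
    """Covariance of vec(Y) for an n x t phenotype matrix Y stacked trait by trait:
    Var(vec(Y)) = V_g ⊗ K + V_e ⊗ I_n."""
    n = len(kinship)
    return mat_add(kron(v_g, kinship), kron(v_e, identity(n)))


# Hypothetical example: two accessions with kinship 0.5, two correlated traits.
kinship = [[1.0, 0.5], [0.5, 1.0]]
v_g = [[1.0, 0.3], [0.3, 2.0]]  # genetic (co)variances between traits
v_e = [[0.5, 0.1], [0.1, 0.5]]  # residual (co)variances between traits
cov = mtmm_covariance(kinship, v_g, v_e)
for row in cov:
    print([round(x, 2) for x in row])
```

Entry [0][2] pairs trait 1 and trait 2 measured on the same individual, so it receives both the genetic (0.3) and the residual (0.1) cross-trait covariance. Entry [0][3] pairs different individuals and receives only the genetic part, scaled by their kinship (0.3 × 0.5).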
Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle well the confounding effects of a diffuse background of many loci of small effect, but they do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying new associations and evidence for allelic heterogeneity. We also show how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large data sets (n > 10,000) practicable.
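The abstract does not give the multi-locus procedure in detail. The sketch below illustrates the forward part of such an approach under stated assumptions: at each step every remaining SNP is tested by generalized least squares with the SNPs already selected included as fixed-effect cofactors, and the most significant one is added while its p-value stays below a threshold. For brevity the variance components σg² and σe² are held fixed, there is no backward-elimination phase, and alpha is a hypothetical per-step threshold where a real scan would use a genome-wide corrected one. All names and data are hypothetical; this is not the authors' implementation.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(low, b):
    """Forward substitution: return z solving L z = b, i.e. z = L^-1 b."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]  # a singular design raises ZeroDivisionError here
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def wald_p_last(y_w, cols_w):
    """GLS Wald p-value for the last design column (all inputs already whitened)."""
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols_w] for ci in cols_w]
    xty = [sum(a * b for a, b in zip(ci, y_w)) for ci in cols_w]
    inv = invert(xtx)
    beta = sum(inv[-1][j] * xty[j] for j in range(len(cols_w)))
    chi2 = beta * beta / inv[-1][-1]
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


def forward_mlmm(y, snps, kinship, sigma_g2, sigma_e2, alpha=0.05, max_steps=3):
    """Forward stepwise selection of SNP cofactors under a fixed covariance V."""
    n = len(y)
    v = [[sigma_g2 * kinship[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(v)  # factorize V once; whitening turns GLS into OLS
    y_w = whiten(low, y)
    cols_w = [whiten(low, [1.0] * n)]  # intercept
    snps_w = [whiten(low, [float(g) for g in s]) for s in snps]
    chosen = []
    for _ in range(max_steps):
        best_idx, best_p = None, 1.0
        for idx, s_w in enumerate(snps_w):
            if idx in chosen:
                continue
            try:
                p = wald_p_last(y_w, cols_w + [s_w])
            except ZeroDivisionError:  # SNP collinear with the current model
                continue
            if best_idx is None or p < best_p:
                best_idx, best_p = idx, p
        if best_idx is None or best_p >= alpha:
            break  # no remaining SNP passes the threshold
        chosen.append(best_idx)
        cols_w.append(snps_w[best_idx])
    return chosen


# Hypothetical data: two groups of four related accessions, four SNPs.
kinship = [[1.0 if i == j else (0.5 if i // 4 == j // 4 else 0.0) for j in range(8)]
           for i in range(8)]
snps = [
    [0, 1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],  # drives the phenotype in this toy example
    [0, 0, 0, 0, 1, 1, 1, 1],
]
y = [4.1, 0.9, 1.2, 3.8, 1.0, 4.0, 3.9, 1.1]
print(forward_mlmm(y, snps, kinship, sigma_g2=0.5, sigma_e2=0.5))
```

In the toy data the phenotype follows SNP index 2; once it enters the model as a cofactor, no remaining SNP passes alpha, so the procedure stops with a single selected locus.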
Arabidopsis thaliana is native to Eurasia and is naturalized across the world. Its ability to be easily propagated and its high phenotypic variability make it an ideal model system for functional, ecological and evolutionary genetics. To date, analyses of the natural genetic variation of A. thaliana have involved small numbers of individual plants or genetic markers. Here we genotype 1,307 worldwide accessions, including several regional samples, using a 250K SNP chip. This allowed us to produce a high-resolution description of the global pattern of genetic variation. We applied three complementary selection tests and identified new targets of selection. Further, we characterized the pattern of historical recombination in A. thaliana and observed an enrichment of hotspots in its intergenic regions and repetitive DNA, which is consistent with the pattern that is observed for humans but which is strikingly different from that observed in other plant species. We have made the seeds we used to produce this Regional Mapping (RegMap) panel publicly available. This panel comprises one of the largest genomic mapping resources currently available for global natural isolates of a non-human species.