Genomic selection (GS) is a method to predict the genetic value of selection candidates based on the genomic estimated breeding value (GEBV) predicted from high-density markers positioned throughout the genome. Unlike marker-assisted selection, which relies on a few markers significantly associated with a trait, GS computes the GEBV from all markers, including both minor and major marker effects. Thus, the GEBV may capture more of the genetic variation for the particular trait under selection.
Population structure must be evaluated before the training set population is optimized, and maximizing the phenotypic variance captured by the training set is important for optimal performance. Optimization of the training set (TRS) in genomic selection has received much interest in both animal and plant breeding because it is critical to the accuracy of the prediction models. In this study, five TRS sampling algorithms (stratified sampling, mean of the coefficient of determination (CDmean), mean of the prediction error variance (PEVmean), stratified CDmean (StratCDmean), and random sampling) were evaluated for prediction accuracy in the presence of different levels of population structure. When population structure is present, a sampling method that captures the most phenotypic variation in the TRS is desirable. The wheat dataset showed mild population structure, and the CDmean and stratified CDmean methods showed the highest accuracies for all traits except test weight and heading date. The rice dataset had strong population structure, and stratified sampling showed the highest accuracies for all traits. In general, CDmean minimized the relationship between genotypes within the TRS while maximizing the relationship between the TRS and the test set, which makes it suitable as an optimization criterion for long-term selection. Our results indicate that the best criterion for optimizing the TRS appears to depend on the interaction between trait architecture and population structure.
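Stratified sampling, one of the five criteria compared above, can be sketched in a few lines. This is a minimal illustration, assuming subpopulation labels are already available from a prior structure analysis and that slots are allocated in proportion to subpopulation size (the abstract does not state the allocation rule). `stratified_trs` and `random_trs` are hypothetical helper names; CDmean and PEVmean need the full mixed-model machinery and are not shown.

```python
import numpy as np

def stratified_trs(groups, n_trs, rng=None):
    """Sample a training set (TRS) whose subpopulation proportions mirror the panel.

    groups: 1-D array of subpopulation labels (assumed to come from a prior
            structure analysis, e.g. clustering on markers).
    n_trs:  desired TRS size (must not exceed len(groups)).
    Returns the sorted indices of the sampled individuals.
    """
    rng = np.random.default_rng(rng)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    # Proportional allocation; leftover slots go to the largest fractional parts.
    exact = counts * n_trs / counts.sum()
    alloc = np.floor(exact).astype(int)
    leftover = n_trs - alloc.sum()
    alloc[np.argsort(-(exact - alloc))[:leftover]] += 1
    chosen = []
    for label, k in zip(labels, alloc):
        members = np.flatnonzero(groups == label)
        chosen.extend(rng.choice(members, size=k, replace=False).tolist())
    return np.sort(np.array(chosen, dtype=int))

def random_trs(n_total, n_trs, rng=None):
    """Simple random sampling baseline: ignores population structure entirely."""
    return np.sort(np.random.default_rng(rng).choice(n_total, size=n_trs, replace=False))
```

With 60, 30 and 10 individuals in three subpopulations and a TRS of 20, the stratified draw always takes 12, 6 and 2 lines, whereas a random draw can under-sample the small group.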
Cassava mosaic disease (CMD), caused by different species of cassava mosaic geminiviruses (CMGs), is the most important disease of cassava in Africa and the Indian sub-continent. The cultivated cassava species is protected from CMD by polygenic resistance introgressed from the wild species Manihot glaziovii and by a dominant monogenic type of resistance, named CMD2, discovered in African landraces. Because the monogenic resistance confers high levels of resistance in different genetic backgrounds, it has recently been used extensively in breeding across Africa as well as in pre-emptive breeding in Latin America. However, most of the landraces carrying the monogenic resistance are morphologically very similar and come from a geographically restricted area of West Africa, raising the possibility that the diversity of the single-gene resistance is very limited, or even confined to a single locus. Several mapping studies employing bulk segregant analysis in different genetic backgrounds have reported additional molecular markers linked to supposedly new resistance genes. However, without an adequate genetic map framework or allelism tests, it is not possible to tell whether these are indeed new genes. To address this important question, a high-density single nucleotide polymorphism (SNP) map of cassava was developed by genotyping-by-sequencing a bi-parental mapping population (N = 180) that segregates for the dominant monogenic resistance to CMD. Virus screening using PCR showed that CMD symptoms and the presence of virus were strongly correlated (r = 0.98). A genome-wide scan and high-resolution composite interval mapping using 6,756 SNPs uncovered a single locus with a large effect (R² = 0.74). Projection of the previously published resistance-linked microsatellite markers showed that they co-occur in the same chromosomal region surrounding the presently mapped resistance locus.
Moreover, their relative distance to the mapped resistance locus correlated with the reported degree of linkage with the resistance phenotype. Cluster analysis of the landraces first shown to carry this type of resistance revealed that they are very closely related, if not identical. These findings suggest that there is a single source of monogenic resistance in the crop's genepool, tracing back to a common ancestral clone. Without further diversification of resistance, the long-term effectiveness of the single-gene resistance is known to be precarious, given that fast-evolving CMGs could overcome it. However, combining the quantitative and qualitative types of resistance may ensure that this resistance gene continues to protect cassava, a crop on which millions of people in Africa depend, against the devastating onslaught of CMGs.
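The "genome-wide scan" step can be illustrated with its simplest form: regress the phenotype on each SNP separately and report the proportion of variance explained (R²). This is a minimal sketch, not the composite interval mapping used in the study; genotype coding and the function name `marker_r2_scan` are assumptions for illustration.

```python
import numpy as np

def marker_r2_scan(genotypes, phenotype):
    """Proportion of phenotypic variance explained by each SNP on its own.

    genotypes: (n_individuals, n_snps) numeric codes (e.g. 0/1/2).
    phenotype: (n_individuals,) values, e.g. a CMD severity score.
    Returns one R^2 per SNP (squared Pearson correlation of a simple
    linear regression); monomorphic SNPs give NaN.
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    sxy = Xc.T @ yc                  # covariance numerators, one per SNP
    sxx = (Xc ** 2).sum(axis=0)      # marker sums of squares
    syy = (yc ** 2).sum()            # phenotype sum of squares
    with np.errstate(divide="ignore", invalid="ignore"):
        return sxy ** 2 / (sxx * syy)
```

A single large peak in such a scan (with neighbouring SNPs tailing off) is the pattern one would expect for one major resistance locus; interval mapping then refines its position and effect.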
Genotyping-by-sequencing (GBS) technologies have a proven capacity to deliver large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, GBS data pose important challenges, and the accuracy of genomic prediction using GBS is currently being investigated in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information, and to compare them with pedigree models, for predicting the genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Because GBS data contain a large percentage of uncalled genotypes, we evaluated methods using nonimputed data, imputed data, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either genomic best linear unbiased prediction (GBLUP) or reproducing kernel Hilbert space (RKHS) regression, and prediction accuracy was quantified by cross-validation.
The results were as follows: (1) relative to pedigree-only or marker-only models, combining pedigree and GBS data consistently increased prediction accuracy; (2) in experiment 1, imputed or nonimputed GBS data gave greater predictive ability than inferred haplotypes, whereas in experiment 2, nonimputed GBS data and information-based imputed short and long haplotypes outperformed the other methods; (3) the prediction accuracy achieved with GBS data in experiment 2 is comparable to that reported by previous authors who analyzed this data set using SNP arrays; and (4) GBLUP and RKHS models combining pedigree with nonimputed or imputed GBS data gave the best prediction correlations for the three traits in experiment 1, whereas in experiment 2, RKHS predicted slightly better than GBLUP in drought-stressed environments and both models predicted similarly in well-watered environments.
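GBLUP and RKHS regression can both be viewed as kernel ridge regression: GBLUP uses a linear, genomic-relationship-style kernel, and RKHS typically uses a nonlinear kernel such as a Gaussian. The sketch below compares them by k-fold cross-validation. It is a simplified illustration: the variance ratio `lam` and the Gaussian bandwidth (scaled by the median squared distance) are fixed assumptions, whereas the models in the study estimate variance parameters from the data, and the pedigree component is omitted.

```python
import numpy as np

def linear_kernel(X):
    """GBLUP-style kernel: centered marker matrix, scaled by the number of markers."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T / X.shape[1]

def gaussian_kernel(X, bandwidth=1.0):
    """RKHS-style kernel exp(-bandwidth * d2 / median(d2)) on squared Euclidean distances."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    return np.exp(-bandwidth * d2 / np.median(d2[d2 > 0]))

def kernel_cv_accuracy(K, y, lam=1.0, n_folds=5, seed=0):
    """Correlation between observed values and k-fold cross-validated predictions
    from kernel ridge regression (BLUP with covariance K and variance ratio lam)."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds   # random, balanced fold labels
    yhat = np.empty(len(y))
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        mu = y[tr].mean()
        alpha = np.linalg.solve(K[np.ix_(tr, tr)] + lam * np.eye(tr.sum()), y[tr] - mu)
        yhat[te] = mu + K[np.ix_(te, tr)] @ alpha
    return np.corrcoef(y, yhat)[0, 1]
```

Swapping kernels while keeping the cross-validation loop fixed isolates the effect of the kernel choice, which mirrors the GBLUP-versus-RKHS comparison described above.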
Intense structuring of plant breeding populations challenges the design of the training set (TS) in genomic selection (GS). An important open question is how the TS should be constructed from multiple related or unrelated small biparental families to predict progeny from individual crosses. Here, we used a set of five interconnected maize (Zea mays L.) populations of doubled-haploid (DH) lines derived from four parents to systematically investigate how the composition of the TS affects the prediction accuracy for lines from individual crosses. A total of 635 DH lines genotyped with 16,741 polymorphic SNPs were evaluated for five traits, including Gibberella ear rot severity and three kernel yield component traits. The populations showed a genomic similarity pattern that reflects the crossing scheme, with a clear separation of full sibs, half sibs, and unrelated groups. Prediction accuracies within full-sib families of DH lines closely followed theoretical expectations that account for the influence of sample size and trait heritability. Prediction accuracies declined by 42% when full-sib DH lines were replaced by half-sib DH lines, but statistically significantly better results were achieved when half-sib DH lines were available from both parents of the validation population instead of only one. Once both parents of the validation population were represented in the TS, including more crosses at a constant TS size did not increase accuracies. Unrelated crosses showing linkage phases opposite to those of the validation population resulted in negative prediction accuracies when used alone and in reduced accuracies when combined with related families. We suggest identifying such crosses and excluding them from the TS. Moreover, the observed variability among populations and traits suggests that these uncertainties must be taken into account in models optimizing the allocation of resources in GS.
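The abstract does not say which theoretical expectation was used. One widely used deterministic approximation relates accuracy to training size N, heritability h² and an effective number of independent chromosome segments Me: r = sqrt(N·h² / (N·h² + Me)). The sketch below assumes that formula purely for illustration; Me is not reported here and would be small within a full-sib DH family, where lines share long haplotypes.

```python
import math

def expected_accuracy(n_train, h2, m_e):
    """Deterministic approximation of genomic prediction accuracy,
    r = sqrt(N * h2 / (N * h2 + Me)), assuming Me independent segments."""
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))
```

For example, `expected_accuracy(100, 0.5, 50)` gives sqrt(0.5), about 0.71; doubling the training size or raising heritability increases the value, which is the qualitative dependence on sample size and heritability described above.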
Development of models to predict genotype by environment interactions, in unobserved environments, using environmental covariates, a crop model and genomic selection. Application to a large winter wheat dataset. Genotype by environment interaction (G×E) is one of the key issues when analyzing phenotypes. The use of environment data to model G×E has long been a subject of interest but is limited by the same problems as those addressed by genomic selection methods: a large number of correlated predictors, each explaining a small amount of the total variance. In addition, non-linear responses of genotypes to stresses are expected to further complicate the analysis. Using a crop model to derive stress covariates from daily weather data for predicted crop development stages, we propose an extension of the factorial regression model to genomic selection. This model is further extended to the marker level, enabling the modeling of quantitative trait loci (QTL) by environment interaction (Q×E) on a genome-wide scale. A newly developed ensemble method, soft rule fit, was used to improve this model and capture non-linear responses of QTL to stresses. The method was tested using a large winter wheat dataset, representative of the type of data available in a large-scale commercial breeding program. Accuracy in predicting genotype performance in unobserved environments for which weather data were available increased by 11.1% on average, and the variability in prediction accuracy decreased by 10.8%. By leveraging agronomic knowledge and the large historical datasets generated by breeding programs, this new model provides insight into the genetic architecture of genotype by environment interactions and could predict genotype performance based on past and future weather scenarios.
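The core idea of factorial regression, genotype-specific sensitivities to an environmental covariate that allow prediction in an environment never phenotyped, can be shown with a single stress covariate. This is a deliberately simplified sketch: one covariate, complete data, and no separate environment main effect, marker-level (Q×E) terms, or soft rule fit. `fit_reaction_norms` and `predict_new_env` are hypothetical names.

```python
import numpy as np

def fit_reaction_norms(Y, z):
    """Least-squares intercept and slope per genotype.

    Y: (n_genotypes, n_envs) phenotypic means (no missing cells assumed).
    z: (n_envs,) stress covariate, e.g. derived from a crop model.
    Returns (intercepts, slopes), each of length n_genotypes.
    """
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, np.asarray(Y, dtype=float).T, rcond=None)
    return coef[0], coef[1]

def predict_new_env(intercepts, slopes, z_new):
    """Predicted performance of each genotype in an unobserved environment
    whose covariate value z_new is known (e.g. from weather data)."""
    return intercepts + slopes * z_new
```

Genotypes with different slopes change rank as the covariate changes, which is exactly the G×E that a main-effects model cannot predict.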
Genomic selection (GS), a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited to unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods (k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation) in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on GS accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on GS accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced the highest accuracy. Compared with mean imputation, all four imputation methods led to greater GS accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater GS accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.
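Mean imputation and k-nearest-neighbor imputation, two of the approaches compared above, can be contrasted directly. The kNN sketch below never uses marker order: similarity between individuals is the mean absolute genotype difference over markers both have called, and each missing call is filled with the average of the k most similar individuals that have that marker. The distance measure and `k` are assumptions for illustration, not the study's settings.

```python
import numpy as np

def mean_impute(G):
    """Replace each missing call (NaN) with that marker's mean over called lines."""
    G = np.array(G, dtype=float)
    col_means = np.nanmean(G, axis=0)
    rows, cols = np.where(np.isnan(G))
    G[rows, cols] = col_means[cols]
    return G

def knn_impute(G, k=5):
    """Order-free kNN imputation for an (n_lines, n_markers) matrix with NaN gaps."""
    G = np.array(G, dtype=float)
    obs = ~np.isnan(G)
    G0 = np.nan_to_num(G)
    filled = G.copy()
    for i in np.flatnonzero((~obs).any(axis=1)):
        shared = obs & obs[i]                     # markers called in both i and each j
        diff = np.where(shared, np.abs(G0 - G0[i]), 0.0)
        n_shared = shared.sum(axis=1)
        dist = np.where(n_shared > 0, diff.sum(axis=1) / np.maximum(n_shared, 1), np.inf)
        dist[i] = np.inf                          # never use the line itself as a donor
        for m in np.flatnonzero(~obs[i]):
            donors = np.flatnonzero(obs[:, m])
            if donors.size == 0:
                continue                          # marker missing everywhere: leave NaN
            nearest = donors[np.argsort(dist[donors])[:k]]
            filled[i, m] = G[nearest, m].mean()
    return filled
```

When relatives are present, kNN borrows the call from near-identical lines, while mean imputation shrinks every gap toward the population average; this is one reason relatedness improves imputation of unordered markers.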
A physically anchored consensus map is foundational to modern genomics research; however, construction of such a map in oat (Avena sativa L., 2n = 6x = 42) has been hindered by the size and complexity of the genome, the scarcity of robust molecular markers, and the lack of aneuploid stocks. Resources developed in this study include a modified SNP discovery method for complex genomes, a diverse set of oat SNP markers, and a novel chromosome-deficient SNP anchoring strategy. These resources were applied to build the first complete, physically-anchored consensus map of hexaploid oat. Approximately 11,000 high-confidence in silico SNPs were discovered based on nine million inter-varietal sequence reads of genomic and cDNA origin. GoldenGate genotyping of 3,072 SNP assays yielded 1,311 robust markers, of which 985 were mapped in 390 recombinant-inbred lines from six bi-parental mapping populations ranging in size from 49 to 97 progeny. The consensus map included 985 SNPs and 68 previously-published markers, resolving 21 linkage groups with a total map distance of 1,838.8 cM. Consensus linkage groups were assigned to 21 chromosomes using SNP deletion analysis of chromosome-deficient monosomic hybrid stocks. Alignments with sequenced genomes of rice and Brachypodium provide evidence for extensive conservation of genomic regions, and renewed encouragement for orthology-based genomic discovery in this important hexaploid species. These results also provide a framework for high-resolution genetic analysis in oat, and a model for marker development and map construction in other species with complex genomes and limited resources.
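Map distances such as the 1,838.8 cM total above are obtained by converting pairwise recombination fractions into additive distances with a map function. The abstract does not state which function was used; the sketch below shows the two most common choices, Haldane (no interference) and Kosambi (allowing for interference), as an illustration only.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM for recombination fraction 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM for recombination fraction 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For small r both functions give roughly 100·r cM; for r = 0.2 Haldane gives about 25.5 cM and Kosambi about 21.2 cM, so the choice affects total map length.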
Genome-wide molecular markers are often used to evaluate genetic diversity in germplasm collections and to make genomic selections in breeding programs. To accurately predict phenotypes and assay genetic diversity, molecular markers should assay a representative sample of the polymorphisms in the population under study. Ascertainment bias arises when marker data are not obtained from a random sample of the polymorphisms in the population of interest. Genotyping-by-sequencing (GBS) is rapidly emerging as a low-cost genotyping platform, even for the large, complex, and polyploid wheat (Triticum aestivum L.) genome. With GBS, marker discovery and genotyping occur simultaneously, resulting in minimal ascertainment bias. The previous platform of choice for whole-genome genotyping in many species, including wheat, was DArT (Diversity Array Technology), which has formed the basis of most of our knowledge about cereal genetic diversity. This study compared the GBS and DArT marker platforms for measuring genetic diversity and genomic selection (GS) accuracy in elite U.S. soft winter wheat. From a set of 365 breeding lines, 38,412 single nucleotide polymorphism (SNP) GBS markers were discovered and genotyped. The GBS SNPs gave a higher GS accuracy than 1,544 DArT markers on the same lines, despite 43.9% missing data. Using a bootstrap approach, we observed significantly more clustering of markers and ascertainment bias with DArT relative to GBS. The minor allele frequency distribution of GBS markers had a deficit of rare variants compared to DArT markers. Despite the ascertainment bias of the DArT markers, GS accuracy for three of the four traits was not significantly different when an equal number of markers was used for each platform. This suggests that the gain in accuracy observed using GBS compared to DArT markers was mainly due to the large increase in the number of markers available for the analysis.
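The two per-marker summaries behind this comparison, minor allele frequency (MAF) and the fraction of missing calls, are straightforward to compute. A minimal sketch, assuming genotypes coded 0/1/2 with NaN for missing calls; `marker_summary` is a hypothetical name.

```python
import numpy as np

def marker_summary(G):
    """Per-marker minor allele frequency and missing-call fraction.

    G: (n_lines, n_markers) array coded 0/1/2 (copies of one allele), NaN = missing.
    Frequencies use only the called genotypes of each marker.
    """
    G = np.asarray(G, dtype=float)
    missing = np.isnan(G).mean(axis=0)
    p = np.nanmean(G, axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    return maf, missing
```

Binning the result, e.g. `np.histogram(maf, bins=np.linspace(0, 0.5, 11))`, gives the MAF distribution whose shape (excess or deficit of rare variants) is compared between platforms.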
Many genome-wide association studies (GWAS) in humans are concluding that, even with very large sample sizes and high marker densities, most of the genetic basis of complex traits may remain unexplained. At the same time, recent research in plant GWAS is showing much greater success with fewer resources. Both GWAS and genomic selection (GS), a method for predicting phenotypes by the use of genome-wide marker data, are receiving considerable attention among plant breeders. In this review we explore how differences in population genetic histories, as well as past selection for traits of interest, have produced trait architectures and patterns of linkage disequilibrium (LD) that frequently differ dramatically between domesticated plants and humans, making detection of quantitative trait loci (QTL) effects in crops more rewarding and less costly than in humans.
Genome-wide association studies (GWAS) may benefit from utilizing haplotype information for making marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on such factors as genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested by the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose subsequent to marker variants, and 2) in analysis of empirical heading date data. These results demonstrate that the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
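A simple sliding-window haplotype, the approach that performed well above, groups each run of adjacent SNPs into one multi-allelic marker: every distinct combination of alleles in the window becomes a haplotype allele. This is a minimal sketch, assuming inbred lines coded 0/1 and SNPs already in map order; the window size and `sliding_window_haplotypes` are illustrative assumptions.

```python
import numpy as np

def sliding_window_haplotypes(G, window=3):
    """Multi-SNP haplotype alleles for every window of adjacent SNPs.

    G: (n_lines, n_snps) inbred genotypes coded 0/1, SNPs in map order.
    Returns a list of (start_index, labels), where labels is an int array giving
    each line's haplotype allele within the window (equal labels = same haplotype).
    """
    G = np.asarray(G)
    out = []
    for start in range(G.shape[1] - window + 1):
        block = G[:, start:start + window]
        _, labels = np.unique(block, axis=0, return_inverse=True)
        out.append((start, labels.ravel()))   # ravel guards against NumPy-version shape differences
    return out
```

Each window's labels can then be tested as a categorical factor in the association model, alongside the single-SNP tests, in line with the recommendation to use both marker types.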
Simulation and empirical studies of genomic selection (GS) show accuracies sufficient to generate rapid gains in early selection cycles. Beyond those cycles, allele frequency changes, recombination, and inbreeding make analytical prediction of gain impossible. The impacts of GS on long-term gain should be studied prior to its implementation.
We intuitively believe that the dramatic drop in the cost of DNA marker information we have experienced should have immediate benefits in accelerating the delivery of crop varieties with improved yield, quality and biotic and abiotic stress tolerance. But these traits are complex and affected by many genes, each with small effect. Traditional marker-assisted selection has been ineffective for such traits. The introduction of genomic selection (GS), however, has shifted that paradigm. Rather than seeking to identify individual loci significantly associated with a trait, GS uses all marker data as predictors of performance and consequently delivers more accurate predictions. Selection can be based on GS predictions, potentially leading to more rapid and lower cost gains from breeding. The objectives of this article are to review essential aspects of GS and summarize the important take-home messages from recent theoretical, simulation and empirical studies. We then look forward and consider research needs surrounding methodological questions and the implications of GS for long-term selection.
Improvements in the usefulness of QTL analysis arise from better statistical methods applied to the problem, the ability to analyze more complex mating designs, and the fitting of less simplified genetic models. Here we review the advantages of different plant mating designs in QTL analysis and conclude that diallel designs have several favorable properties. We then turn to the detection of systematic genome-wide synergistic epistasis. This form of epistasis has important implications from evolutionary (maintenance of sexual reproduction and concealment of cryptic genetic variation) and practical perspectives (response to pyramided favorable alleles). We develop two methods for detecting systematic synergistic epistasis, one based on analyzing interactions between locus effects and predicted individual genotypic values and one based on analyzing pairwise locus interactions. Using the first method we detect synergistic epistasis in a barley and a wheat dataset but not in a maize dataset. We fail to detect synergistic epistasis with the second method. We discuss our results in the light of theoretical questions concerning the mechanisms of synergistic epistasis.
We compared the accuracies of four genomic-selection prediction methods as affected by marker density, level of linkage disequilibrium (LD), quantitative trait locus (QTL) number, sample size, and level of replication in populations generated from multiple inbred lines. Marker data on 42 two-row spring barley inbred lines were used to simulate high and low LD populations from multiple inbred line crosses: the first included many small full-sib families and the second was derived from five generations of random mating. True breeding values (TBV) were simulated on the basis of 20 or 80 additive QTL. Methods used to derive genomic estimated breeding values (GEBV) were random regression best linear unbiased prediction (RR-BLUP), Bayes-B, a Bayesian shrinkage regression method, and BLUP from a mixed model analysis using a relationship matrix calculated from marker data. Using the best methods, accuracies of GEBV were comparable to accuracies from phenotype for predicting TBV without requiring the time and expense of field evaluation. We identified a trade-off between a method's ability to capture marker-QTL LD and its ability to capture marker-based relatedness of individuals. The Bayesian shrinkage regression method primarily captured LD, the BLUP methods captured relationships, while Bayes-B captured both. Under most of the study scenarios, mixed-model analysis using a marker-derived relationship matrix (BLUP) was more accurate than methods that directly estimated marker effects, suggesting that relationship information was more valuable than LD information. When markers were in strong LD with large-effect QTL, or when predictions were made on individuals several generations removed from the training data set, however, the ranking of method performance was reversed and BLUP had the lowest accuracy.
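Why RR-BLUP and relationship-matrix BLUP behave alike can be checked numerically: with centered markers Z, no fixed effects, and the same variance ratio λ, marker-effect ridge regression and BLUP with the marker-derived relationship ZZ' give identical GEBV, because Z(Z'Z + λI)⁻¹Z' = ZZ'(ZZ' + λI)⁻¹. A minimal sketch under those simplifying assumptions (λ fixed, intercept removed by centering).

```python
import numpy as np

def rr_blup(Z, y, lam):
    """Marker-effect model: b = (Z'Z + lam I)^-1 Z'y, GEBV = Z b."""
    p = Z.shape[1]
    b = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)
    return Z @ b

def g_blup(Z, y, lam):
    """Individual model with marker-based relationship K = ZZ':
    GEBV = K (K + lam I)^-1 y."""
    K = Z @ Z.T
    n = K.shape[0]
    return K @ np.linalg.solve(K + lam * np.eye(n), y)
```

The equivalence explains why both BLUP variants "captured relationships": shrinking all marker effects equally is the same as weighting phenotypes of relatives by realized relatedness. Methods with marker-specific shrinkage, such as Bayes-B, break this equivalence.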
Barley geneticists are currently using association genetics to identify and fine map traits directly in elite plant breeding material. This has been made possible by the development of a highly parallel SNP assay platform that provides sufficient marker density for genome-wide scans and linkage disequilibrium-led gene identification. By leveraging the combined resources of the barley research and breeding sectors, marker-trait associations are being identified and a renewed interest has emerged in novel strategies for barley improvement. New database and visualization tools have been developed and statistical methods adapted from human genetics to account for complexities in the datasets. Exciting early results suggest that association genetics will assume a central role in establishing genotype-to-phenotype relationships.
Genomic discovery in oat and its application to oat improvement have been hindered by a lack of genetic markers common to different genetic maps, and by the difficulty of conducting whole-genome analysis using high-throughput markers. This study was intended to develop, characterize, and apply a large set of oat genetic markers based on Diversity Array Technology (DArT).
Association mapping can be a powerful tool for detecting quantitative trait loci (QTLs) without requiring line-crossing experiments. We previously proposed a Bayesian approach for simultaneously mapping multiple QTLs by a regression method that directly incorporates estimates of the population structure. In the present study, we extended our method to analyze ordinal and censored traits, since both types of traits are common in the evaluation of germplasm collections. Ordinal-probit and tobit models were employed to analyze ordinal and censored traits, respectively. In both models, we postulated the existence of a latent continuous variable associated with the observable data, and we used a Markov-chain Monte Carlo algorithm to sample the latent variable and determine the model parameters. We evaluated the efficiency of our approach by using simulated- and real-trait analyses of a rice germplasm collection. Simulation analyses based on real marker data showed that our models could reduce both false-positive and false-negative rates in detecting QTLs to reasonable levels. Simulation analyses based on highly polymorphic marker data, which were generated by coalescent simulations, showed that our models could be applied to genotype data based on highly polymorphic marker systems, like simple sequence repeats. For the real traits, we analyzed heading date as a censored trait and amylose content and the shape of milled rice grains as ordinal traits. We found significant markers that may be linked to previously reported QTLs. Our approach will be useful for whole-genome association mapping of ordinal and censored traits in rice germplasm collections.
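The latent-variable step shared by the ordinal-probit and tobit models is a draw from a truncated normal distribution: given the current linear predictor, each latent value is sampled within the interval that agrees with the observed category or censoring. The sketch below shows only that single Gibbs step, by inverting the normal CDF with Python's standard library; the rest of the sampler (marker effects, structure covariates, thresholds) is not shown, and the clamping constant is an assumption.

```python
import math
import random
from statistics import NormalDist

def draw_truncated_normal(mean, sd, lower=-math.inf, upper=math.inf, rng=random):
    """One data-augmentation draw from N(mean, sd^2) restricted to [lower, upper].

    Tobit, right-censored record at c:   lower=c, upper=inf.
    Ordinal probit, observed category k: lower=threshold[k-1], upper=threshold[k].
    """
    dist = NormalDist(mean, sd)
    lo, hi = dist.cdf(lower), dist.cdf(upper)
    u = lo + (hi - lo) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)
```

For example, a heading date recorded only as "later than c" contributes a latent value drawn above c, so censored lines still inform the marker effects rather than being discarded.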
Genomic prediction is expected to considerably increase genetic gains by increasing selection intensity and accelerating the breeding cycle. In this study, marker effects estimated in 255 diverse maize (Zea mays L.) hybrids were used to predict grain yield, anthesis date, and anthesis-silking interval within the diversity panel and in testcross progenies of 30 F2-derived lines from each of five populations. Although up to 25% of the genetic variance could be explained by cross-validation within the diversity panel, the prediction of testcross performance of F2-derived lines using marker effects estimated in the diversity panel was on average zero. Hybrids in the diversity panel could be grouped into eight breeding populations differing in mean performance. When performance was predicted separately for each breeding population on the basis of marker effects estimated in the other populations, predictive ability was low (e.g., 0.12 for grain yield). These results suggest that prediction resulted mostly from differences in mean performance among the breeding populations, and less from the relationship between the training and validation sets or from linkage disequilibrium with causal variants underlying the predicted traits. Potential uses for genomic prediction in maize hybrid breeding are discussed, emphasizing the need for (1) a clear definition of the breeding scenario in which genomic prediction should be applied (i.e., prediction among or within populations), (2) a detailed analysis of the population structure before performing cross-validation, and (3) larger training sets with strong genetic relationship to the validation set.
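The conclusion that prediction "resulted mostly from differences in mean performance of the breeding populations" can be made concrete by computing accuracy two ways: pooled over all lines, and within each population. In the toy example below (made-up numbers, for illustration only), predictions rank lines backwards inside every population yet still correlate strongly with observations when pooled, because they recover the population means. `pooled_and_within_accuracy` is a hypothetical helper; each group needs at least two lines with non-constant values.

```python
import numpy as np

def pooled_and_within_accuracy(y, yhat, groups):
    """Return (pooled Pearson correlation, mean within-group correlation)."""
    y, yhat, groups = map(np.asarray, (y, yhat, groups))
    pooled = np.corrcoef(y, yhat)[0, 1]
    within = [np.corrcoef(y[groups == g], yhat[groups == g])[0, 1]
              for g in np.unique(groups)]
    return pooled, float(np.mean(within))

y = [0, 1, 2, 10, 11, 12]
yhat = [2, 1, 0, 12, 11, 10]          # right population means, wrong within-population ranking
groups = ["A", "A", "A", "B", "B", "B"]
pooled, within = pooled_and_within_accuracy(y, yhat, groups)
# pooled is about 0.95, within is -1.0
```

Reporting the within-population value alongside the pooled one is a simple guard against the inflated cross-validation accuracies that population structure can produce.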
The additive relationship matrix plays an important role in mixed model prediction of breeding values. For genotype matrix X (loci in columns), the product XX' is widely used as a realized relationship matrix, but the scaling of this matrix is ambiguous. Our first objective was to derive a proper scaling such that the mean diagonal element equals 1+f, where f is the inbreeding coefficient of the current population. The result is a formula involving the covariance matrix for sampling genomic loci, which must be estimated with markers. Our second objective was to investigate whether shrinkage estimation of this covariance matrix can improve the accuracy of genomic estimated breeding value (GEBV) predictions with low-density markers. Using an analytical formula for shrinkage intensity that is optimal with respect to mean-squared error, simulations revealed that shrinkage can significantly increase GEBV accuracy in unstructured populations, but only for phenotyped lines; there was no benefit for unphenotyped lines. The accuracy gain from shrinkage increased with heritability, but at high heritability (> 0.6) this benefit was irrelevant because phenotypic accuracy was comparable. These trends were confirmed in a commercial pig population with progeny-test-estimated breeding values. For an anonymous trait where phenotypic accuracy was 0.58, shrinkage increased the average GEBV accuracy from 0.56 to 0.62 (SE < 0.00) when using random sets of 384 markers from a 60K array. We conclude that when moderate-accuracy phenotypes and low-density markers are available for the candidates of genomic selection, shrinkage estimation of the relationship matrix can improve genetic gain.
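The scaling issue can be illustrated with one widely used realized relationship matrix: center the 0/1/2 genotypes by twice the allele frequency, W = X − 2p, and divide WW' by 2Σp(1−p). For fully inbred lines (f = 1) with frequencies estimated from the same lines, the mean diagonal of this matrix is exactly 2 = 1 + f, the property the derivation above targets. This sketch shows only that unshrunk estimator; the shrinkage step described in the abstract is not implemented here.

```python
import numpy as np

def realized_relationship(X):
    """Realized relationship matrix W W' / (2 * sum p(1-p)).

    X: (n_lines, n_loci) genotypes coded 0/1/2 (copies of the reference allele).
    p: allele frequencies estimated from the same lines; W = X - 2p.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0) / 2.0
    W = X - 2.0 * p
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))
```

The exact value 2 follows because, for homozygous lines, each centered locus has mean square 4p(1−p), so the trace divided by n equals 4Σp(1−p) / 2Σp(1−p).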
Genetic correlations between quantitative traits measured in many breeding programs are pervasive. These correlations indicate that measurements of one trait carry information on other traits. Current single-trait (univariate) genomic selection does not take advantage of this information. Multivariate genomic selection on multiple traits could accomplish this but has been little explored and tested in practical breeding programs. In this study, three multivariate linear models (i.e., GBLUP, BayesA, and BayesCπ) were presented and compared to univariate models using simulated and real quantitative traits controlled by different genetic architectures. We also extended BayesA with fixed hyperparameters to a full hierarchical model that estimated hyperparameters, and extended BayesCπ to impute missing phenotypes. We found that optimal marker-effect variance priors depended on the genetic architecture of the trait, so that estimating them was beneficial. We showed that the prediction accuracy for a low-heritability trait could be significantly increased by multivariate genomic selection when a correlated high-heritability trait was available. Further, multiple-trait genomic selection had higher prediction accuracy than single-trait genomic selection when phenotypes were not available on all individuals and traits. Additional factors affecting the performance of multiple-trait genomic selection were explored.
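How a correlated, well-measured trait helps predict a missing, low-heritability trait can be seen in a minimal multi-trait GBLUP. The sketch assumes the genetic (G0) and residual (R0) covariance matrices are known, uses trait means as the only fixed effects, and stacks phenotypes trait by trait so that Cov(u) = G0 ⊗ K; it computes BLUP from the observed records only. The variance values in the example are made up for illustration.

```python
import numpy as np

def multi_trait_blup(Y, K, G0, R0):
    """Multi-trait GBLUP with missing phenotypes.

    Y:  (n, t) phenotypes, NaN where a trait was not measured.
    K:  (n, n) genomic relationship matrix.
    G0: (t, t) genetic covariance; R0: (t, t) residual covariance (assumed known).
    Returns (n, t) predicted genetic values for every line and trait.
    """
    Y = np.asarray(Y, dtype=float)
    n, t = Y.shape
    y = Y.T.ravel()                               # trait-major: trait 1 for all lines, then trait 2, ...
    obs = ~np.isnan(y)
    mu = np.repeat(np.nanmean(Y, axis=0), n)      # trait means as crude fixed effects
    C = np.kron(G0, K)                            # Cov(u) in the same stacking order
    V = C + np.kron(R0, np.eye(n))                # Cov(y) = Cov(u) + Cov(e)
    rhs = np.linalg.solve(V[np.ix_(obs, obs)], y[obs] - mu[obs])
    return (C[:, obs] @ rhs).reshape(t, n).T

# Trait 1: high heritability; trait 2: low heritability and unobserved for line 0.
Y = np.array([[3.0, np.nan], [-1.0, 0.5], [0.0, -0.2], [-2.0, 1.0]])
U = multi_trait_blup(Y, np.eye(4), np.array([[1.0, 0.8], [0.8, 1.0]]), np.diag([0.2, 4.0]))
```

Line 0 receives a nonzero prediction (2.0 here) for the trait it was never measured on, borrowed through the genetic correlation of 0.8; with zero genetic correlation and unrelated lines, that prediction would be zero.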
Detection of quantitative trait loci (QTL) controlling complex traits followed by selection has become a common approach for selection in crop plants. The QTL are most often identified by linkage mapping using experimental F2, backcross, advanced inbred, or doubled haploid families. An alternative approach to QTL detection is the genome-wide association study (GWAS), which uses pre-existing lines such as those found in breeding programs. We explored the implementation of GWAS in oat (Avena sativa L.) to identify QTL affecting β-glucan concentration, a soluble dietary fiber with several human health benefits when consumed as a whole grain. A total of 431 lines of worldwide origin were tested over 2 years and genotyped using Diversity Array Technology (DArT) markers. A mixed model approach was used in which both population structure fixed effects and pair-wise kinship random effects were included. Various mixed models that differed with respect to population structure and kinship were tested for their ability to control for false positives. As expected, given the level of population structure previously described in oat, population structure did not play a large role in controlling for false positives. Three independent markers were significantly associated with β-glucan concentration. Significant marker sequences were compared with rice, and one of the three showed sequence homology to genes localized on rice chromosome seven adjacent to the CslF gene family, known to have β-glucan synthase function. Results indicate that GWAS in oat can be a successful option for QTL detection, more so with future development of higher-density markers.
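The fixed-effect half of that model, a single-marker test with population-structure covariates, can be sketched with ordinary least squares. This is a deliberate simplification: the kinship random effect used in the study is omitted, the covariates Q (e.g. leading principal components) are assumed given, and monomorphic markers must be filtered beforehand because they make the design matrix singular.

```python
import numpy as np

def q_model_scan(y, G, Q):
    """Single-marker association with structure covariates as fixed effects (no kinship).

    y: (n,) phenotypes; G: (n, m) marker codes; Q: (n, k) structure covariates.
    Returns (effect, t statistic) arrays, one entry per marker.
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    n = len(y)
    base = np.column_stack([np.ones(n), np.asarray(Q, dtype=float)])
    effects, tstats = [], []
    for j in range(G.shape[1]):
        X = np.column_stack([base, G[:, j]])          # intercept + Q + marker j
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])     # residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
        effects.append(beta[-1])
        tstats.append(beta[-1] / se)
    return np.array(effects), np.array(tstats)
```

Comparing this scan with and without Q is a quick way to see how much structure correction matters; consistent with the oat result above, weak structure should change the test statistics only modestly.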
Advancements in next-generation sequencing technology have enabled whole genome re-sequencing in many species, providing unprecedented discovery and characterization of molecular polymorphisms. There are limitations, however, to next-generation sequencing approaches for species with large complex genomes such as barley and wheat. Genotyping-by-sequencing (GBS) has been developed as a tool for association studies and genomics-assisted breeding in a range of species, including those with complex genomes. GBS uses restriction enzymes for targeted complexity reduction followed by multiplex sequencing to produce high-quality polymorphism data at a relatively low per-sample cost. Here we present a GBS approach for species that currently lack a reference genome sequence. We developed a novel two-enzyme GBS protocol and genotyped bi-parental barley and wheat populations to develop a genetically anchored reference map of identified SNPs and tags. We were able to map over 34,000 SNPs and 240,000 tags onto the Oregon Wolfe Barley reference map, and 20,000 SNPs and 367,000 tags onto the Synthetic W9784 × Opata85 (SynOpDH) wheat reference map. To further evaluate GBS in wheat, we also constructed a de novo genetic map using only SNP markers from the GBS data. The GBS approach presented here provides a powerful method of developing high-density markers in species without a sequenced genome while providing valuable tools for anchoring and ordering physical maps and whole-genome shotgun sequence. Development of the sequenced reference genome(s) will in turn increase the utility of GBS data, enabling physical mapping of genes and haplotype imputation of missing data. Finally, as a result of low per-sample costs, GBS will have broad application in genomics-assisted plant breeding programs.
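The complexity-reduction step of a two-enzyme protocol can be previewed with an in silico double digest. The abstract does not name the enzymes; the sketch assumes a rare cutter (PstI, CTGCA^G) paired with a frequent cutter (MspI, C^CGG) purely for illustration, and keeps fragments whose two ends come from different enzymes within an assumed size window.

```python
import re

# Illustrative enzyme pair (an assumption): recognition site and cut offset within the site.
ENZYMES = {"PstI": ("CTGCAG", 5), "MspI": ("CCGG", 1)}

def double_digest(seq, enzymes=ENZYMES, size_range=(64, 1000)):
    """Cut seq with every enzyme and return (start, end, left_enzyme, right_enzyme,
    fragment) for fragments with different enzyme ends and length in size_range."""
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in enzymes.items():
        # Zero-width lookahead so overlapping recognition sites are all found.
        for m in re.finditer(f"(?={site})", seq):
            cuts.append((m.start() + offset, name))
    cuts.sort()
    frags = []
    for (a, left), (b, right) in zip(cuts, cuts[1:]):
        if left != right and size_range[0] <= b - a <= size_range[1]:
            frags.append((a, b, left, right, seq[a:b]))
    return frags
```

Running this over a genome or a contig set gives a rough count of the fragments, and hence tags, that the protocol would sample; the rare-cutter choice controls how strongly the genome is reduced.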