We developed a cost-effective method to follow non-single nucleotide polymorphism allele dynamics that can easily be adapted to experimental evolution frozen archives. A triplet PCR technique was coupled with automated parallel capillary electrophoresis to quantify the relative frequency of an insertion allele over the course of experimental evolution.
Structural variants (SVs) (i.e., deletions, insertions, duplications, and inversions) are now known to play an important role in phenotypic variation, and consequently in processes such as disease determination or adaptation to a new environment. However, single-nucleotide variants receive much more attention than SVs, probably because they are easier to detect, and their phenotypic effects are easier to predict. The development of short- and long-read deep sequencing technologies has strongly improved the detection of SVs, but the quantification of their frequency from pooled sequencing (pool-seq) data is still technically complex and expensive.
Here, we present a rather simple and inexpensive method, which allows researchers to follow the dynamics of SV allele frequency. As an example of application, we follow the frequency of an insertion sequence (IS) insertion in experimental evolution populations of bacteria. This method is based on the design of triplets of primers around the structural variant borders, such that the amplicons produced by amplification of the wild-type (WT) and derived alleles differ in size by at least 5%, and that their amplification efficiency is similar. The quantity of each amplicon is then determined by parallel capillary electrophoresis and normalized to a calibration curve. This method can be easily extended to the quantification of the frequency of other structural variants (deletions, duplications, and inversions) and to pool-seq approaches of natural populations, including within-patient pathogen populations.
Structural variants (SVs) are alterations of the genomic sequence, generally affecting 50 bp or more. The four categories of described SVs are large insertions, large deletions, inversions, and duplications. Until recently, more attention has been devoted to single-nucleotide variants (SNVs) than to structural variants, in terms of their phenotypic effects and their role as genetic determinants of disease, or their contribution to adaptation. This is probably because it is easier to both detect SNVs and predict their phenotypic effects. However, short- and long-read deep sequencing technologies have strongly improved the detection of SVs, at least in single individual or clonal genomes1. In parallel, their phenotypic effects have been better characterized, and many examples of their implication as genetic determinants of human disease2,3 or adaptation to a new environment4 have been documented.
Deletions and insertions, often due to mobile genetic element (MGE) insertions, are much more disruptive than single nucleotide polymorphisms (SNPs) and lead to frameshift mutations and protein structure modifications. Deletions and MGE insertions within genes almost always result in gene inactivation, and insertions into non-coding regions can lead to repression or constitutive expression of adjacent genes when insertion sequences (ISs) contain promoter or termination sequences5. While the knockout of essential genes leads to clear detrimental effects on bacterial fitness, the loss of non-essential genes is beneficial in some cases. Despite their inherent costs, duplications can also be advantageous, and participate in adaptation as they lead to a change in gene dosage; an increase in the activity of a specific protein can be advantageous depending on the conditions6.
Microbial experimental evolution populations are usually started with clones. This initial absence of genetic diversity, combined with the "closed environment" characteristic of test tubes, leads to a very limited potential of evolution by gene gain through horizontal gene transfer and recombination. In these specific conditions, the contribution to adaptation of deletions, duplications, and intragenomic MGE insertion is particularly important; bacteria often adapt through loss-of-function mutations (mainly due to deletions or MGE insertions), affecting genes that are not useful in stable, often nutrient-rich, monoculture artificial environments7. In the longest running E. coli evolution experiment, IS150 insertions are particularly frequent amongst populations evolved after 50,000 generations, with IS elements representing 35% of mutations that reach high frequency in populations that retain their ancestral point mutation rate8.
Evolve and resequence studies couple experimental evolution and next-generation sequencing (NGS) technologies to investigate how bacteria adapt, at the phenotypic and genomic levels, to different environmental conditions and stresses, such as different carbon and energy sources, antibiotics, and osmotic stress9,10,11. These studies typically obtain genomic information on the evolved populations or clones solely at the experimental end point, and in some cases, at a number of intermediate time points12,13,14. These data provide insight into genes and pathways involved in the adaptation to a given environment, but rarely allow researchers to follow the dynamics of de novo emerging and sweeping alleles over time.
One approach to follow these dynamics is to choose a limited number of segregating alleles of interest (because of the function of the genes they affect, because they sweep in parallel in independent populations, etc.) and use amplicon sequencing to quantify the allele proportion, pooling many time points in the same sequencing run15. This method has been successfully used to follow the dynamics of small variants (SNPs or 1 bp indels) in experimental16 and natural17 populations of microbes. However, in the case of larger indels or MGE insertions, the size difference between the amplicons induces PCR efficiency differences, which distort the relationship between read and allele proportions. In certain cases, the size difference between the two alleles even exceeds the typical length of an amplicon. Here, we coupled a triplet PCR technique with automated parallel capillary electrophoresis to quantify the relative frequency of an insertion allele based on size discrimination. This approach allows the exploitation of underused experimental time points to determine the dynamics of an emerging mutant allele and to follow its frequency to fixation or loss, in a cost-effective manner. We applied this method to track emerging mutS– alleles, mutated through an IS10 insertion, which confers a hypermutator phenotype on the mutated genotype.
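The distortion caused by unequal PCR efficiencies can be illustrated with a minimal sketch. It assumes a simple exponential amplification model in which each template is multiplied by (1 + efficiency) at every cycle; the function name, the efficiency values, and the cycle number are hypothetical and are not taken from this study.

```python
def amplified_fraction(true_fraction: float, eff_derived: float,
                       eff_ancestral: float, cycles: int) -> float:
    """Fraction of derived-allele amplicon after PCR under a purely exponential model.

    Assumption: each allele is amplified by (1 + efficiency) per cycle, with
    efficiencies between 0 (no amplification) and 1 (perfect doubling).
    """
    if not 0.0 <= true_fraction <= 1.0:
        raise ValueError("true_fraction must be between 0 and 1")
    derived = true_fraction * (1.0 + eff_derived) ** cycles
    ancestral = (1.0 - true_fraction) * (1.0 + eff_ancestral) ** cycles
    return derived / (derived + ancestral)


# Hypothetical example: a 50:50 template mix, with the derived amplicon
# amplifying slightly less efficiently than the ancestral one.
print(amplified_fraction(0.5, eff_derived=0.80, eff_ancestral=0.90, cycles=30))
```

Under this model, even a small efficiency difference compounds over the cycles, which is why amplicons of similar amplification efficiency and a calibration curve are both needed.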
This method requires two target alleles with a ≥5% difference in size. First, primer triplets are designed to produce similarly sized fragments, which share a common primer. Second, PCR conditions are optimized, and a calibration curve is produced using mixes of wild-type (WT) and mutant gDNA. Lastly, samples are amplified by PCR, and the relative frequency of each allele is quantified by parallel quantitative capillary electrophoresis.
Setting up this protocol requires precise knowledge of the insertion, deletion, inversion, or duplication point within the ancestral sequence. This information is usually obtained by whole-genome sequencing (WGS) of the end or intermediate point samples. In the following protocol, the general principle for the case of an insertion mutation is given for each step, alongside a representative case where the frequency of an IS10 insertion in the mutS gene in an experimental evolution population of E. coli is followed. In this population, WGS of the endpoint population identified the insertion of a 1,329 bp IS10 between positions 2,463 and 2,471, resulting in the duplication of this insertion site. This method is applicable to the three other SV types, and the specificities of each case are given in the discussion.
1. Design of triplet primers
Figure 1: Schema of triplet primer design on mutS WT gene and mutant mutS IS10 insertion. The black triangle represents the IS10 insertion site in the mutS gene. The WT gene is in blue, and the IS10 is in orange. Primers FW1 and RV1 flank the IS10 insertion site and produce a 155 bp WT amplicon. The RV1 primer and the intra-IS10 primer FW2 produce a second 226 bp amplicon.
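The size criterion can be checked for a candidate primer triplet with a short sketch. The text requires a size difference of at least 5% but does not state the reference length; here the difference is expressed relative to the longer amplicon, which is an assumption, and the function names are hypothetical.

```python
def relative_size_difference(len_a: int, len_b: int) -> float:
    """Size difference between two amplicons, as a fraction of the longer one (assumed convention)."""
    if len_a <= 0 or len_b <= 0:
        raise ValueError("amplicon lengths must be positive")
    return abs(len_a - len_b) / max(len_a, len_b)


def meets_size_criterion(len_a: int, len_b: int, min_difference: float = 0.05) -> bool:
    """True if the two amplicons differ in size by at least min_difference."""
    return relative_size_difference(len_a, len_b) >= min_difference


# Amplicon lengths from Figure 1: 155 bp (WT) and 226 bp (mutS::IS10).
wt_len, mut_len = 155, 226
print(f"{relative_size_difference(wt_len, mut_len):.1%}", meets_size_criterion(wt_len, mut_len))
```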
2. Optimization of PCR conditions
3. Calibration curve
4. Sample preparation
5. Allele quantification
Using DNA extracted from an ancestral clone and a hypermutator clone isolated from the S2.11 population at generation 1,000, we established the calibration curve shown in Figure 2. The actual mutant proportions in the laboratory-prepared DNA mixes and the proportions measured by the parallel capillary electrophoresis instrument were linked by a linear relationship of slope 1.0706, with an R² of 0.9705. Additionally, there was good agreement between biological replicates; the standard deviation was between 0.61 and 17.74 across the nine points of the standard curve.
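A calibration curve of this kind can be fitted by ordinary least squares. The sketch below is one possible way to do it, not necessarily the procedure used here; it assumes the expected proportion is the predictor and the observed proportion is the response. The nine values are hypothetical placeholders, not the data of Figure 2.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical placeholder values (NOT the study data): expected mutant
# proportions in the DNA mixes and the proportions read from the instrument.
expected = [0.0, 0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 0.9, 1.0]
observed = [0.02, 0.09, 0.28, 0.41, 0.55, 0.66, 0.79, 0.93, 1.0]
slope, intercept, r_squared = fit_line(expected, observed)
print(f"slope={slope:.4f} intercept={intercept:.4f} R2={r_squared:.4f}")
```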
Figure 2: Calibration standard curve. Observed versus expected proportion of WT/mutant DNA mix. Error bars represent the standard error between biological replicates.
DNA was extracted from 19 time points ranging from generation 356 to generation 990, amplified and quantified by parallel capillary electrophoresis. Prosize software was used to identify and quantify each amplicon in each sample. These results were converted to allele proportion using the calibration curve.
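The conversion from amplicon quantities to allele proportion can be sketched as follows. This is a minimal illustration under stated assumptions: the two amplicon quantities are molar (not mass) values exported from the analysis software, and the calibration was fitted as observed = slope × expected + intercept. The function names and the intercept parameter are hypothetical; only the slope of 1.0706 is reported in the text.

```python
def observed_mutant_fraction(wt_molar: float, mut_molar: float) -> float:
    """Raw mutant proportion from the molar quantities of the two amplicons."""
    total = wt_molar + mut_molar
    if total <= 0:
        raise ValueError("at least one amplicon must have a positive quantity")
    return mut_molar / total


def calibrated_mutant_fraction(observed: float, slope: float, intercept: float = 0.0) -> float:
    """Invert the calibration line (observed = slope * expected + intercept), clipped to [0, 1]."""
    if slope == 0:
        raise ValueError("calibration slope must be non-zero")
    corrected = (observed - intercept) / slope
    return min(1.0, max(0.0, corrected))


# Hypothetical amplicon quantities; slope taken from the calibration curve,
# intercept assumed to be zero because it is not reported in the text.
raw = observed_mutant_fraction(wt_molar=3.0, mut_molar=1.0)
print(calibrated_mutant_fraction(raw, slope=1.0706))
```

Clipping to the [0, 1] interval is a design choice of this sketch: measurement noise near fixation or loss can otherwise produce proportions slightly outside the valid range.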
The results shown in Figure 3 reveal a nonmonotonic trajectory of the hypermutator-inducing mutS allele. Generation 680 (day 102) was the first time point at which the mutant allele was in sufficient quantity to be detected by parallel capillary electrophoresis. The mutant allele then increased rapidly in frequency and reached 66.7% at generation 713 (33 generations later). After this rapid rise, its increase slowed markedly, and the frequency had reached only 76% by generation 766. Unexpectedly, the mutant allele frequency then decreased from 76% to 49% over the course of 13 generations (2 days), after which it increased to fixation.
Figure 3: Dynamics of ancestral (WT) and derived (mutS::IS10) alleles over the course of a 1,000-generation experimental evolution. The ancestral allele frequency is in gray, and the derived allele frequency is in blue.
Figure 4: Primer design and amplicon position for quantification of the four types of structural variants. Amplified fragments are depicted in black and white. Arrows represent primers at their hybridization site; dotted arrows represent primers hybridizing but not participating in any fragment amplification, either because there is no paired primer (duplication and inversion) or the elongation time has been reduced to avoid the amplification of large amplicons (insertion and deletion).
Here, we have proposed a cost-effective method that allows the dynamics of emerging adaptive SV alleles in experimental evolution populations to be followed. This method couples classic PCR techniques and automated parallel capillary electrophoresis, allowing for the relative quantities of two alleles to be determined. Once set up, it permits the quantification of allele proportions in many samples in parallel, and is much less expensive than WGS. This method can be seen as an equivalent to amplicon sequencing for non-SNP mutations and as a solution to the technical limitations of amplicon sequencing of large SVs. We have shown that the method accurately quantifies the proportion of non-SNP mutants, using a calibration curve that overcomes PCR bias (Figure 2).
Characterizing SV allele dynamics can be useful to identify linked mutations through their parallel temporal dynamics, or to calculate selection coefficients and identify their variations across time. In the example of application we have presented here, we identified, by end-point WGS, that hypermutators were segregating or fixed in various experimental evolution E. coli populations18. These populations were considered hypermutators as they had disruptive mutations (IS insertions) in one of the methyl-directed mismatch repair (MMR) genes and had accumulated many more mutations over the 1,000 generations of experimental evolution than the populations not carrying mutations in the MMR genes (126.7 ± 19.57 vs. 24.38 ± 1.75). Mutations in MMR genes are known to lead to an increased mutation rate, allowing bacteria to sample many more mutations than their baseline mutation rates permit12,13. However, a large increase in mutation rate also translates to a faster accumulation of deleterious mutations and a greater mutational load14,15. Mutators have been predicted to be transiently advantageous in populations adapting to new environments, as the proportion of beneficial mutations is larger in this situation. The beneficial genetic backgrounds generated under high mutation rates allow mutators to propagate through a population by hitchhiking on the beneficial mutations they generate19. Temporal dynamics of hypermutator propagation have been investigated by simulations, and with the method we propose here, it is possible to follow the dynamics of IS-mediated mutator alleles in experimental populations19.
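As an illustration of how a selection coefficient could be derived from two allele frequency measurements, the sketch below uses the standard logistic model for two competing haploid genotypes, in which the log-odds of the allele frequency changes linearly with time. This model choice, the assumption of constant selection with negligible drift, and the function names are ours and are not part of the protocol.

```python
import math


def logit(p: float) -> float:
    """Log-odds of an allele frequency strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("frequency must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def selection_coefficient(p_start: float, p_end: float, generations: float) -> float:
    """Per-generation selection coefficient from the change in log-odds (logistic model)."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return (logit(p_end) - logit(p_start)) / generations


# Frequencies reported in the representative results.
s_rise = selection_coefficient(0.667, 0.76, 766 - 713)  # generation 713 to 766
s_drop = selection_coefficient(0.76, 0.49, 13)          # decline over 13 generations
print(f"s (rise) = {s_rise:.4f} per generation, s (drop) = {s_drop:.4f} per generation")
```

Comparing such estimates between successive intervals is one way to quantify how the apparent selection on the allele varies over time.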
In our representative results, the SV is an IS insertion, but the method described here can be easily extended to the other types of SV (see Figure 4). The triplet primer design follows exactly the same logic for deletion quantification as for insertion quantification: two primers sit around the deletion point and the third primer sits within the deleted sequence. For inversions and duplications, two of the primers sit around one of the breakpoints and the third one sits inside the inverted/duplicated fragment, close to the other breakpoint and directed toward the outside of the fragment. For insertions, deletions, and inversions, each amplicon is specific to one of the alleles, ancestral or derived. For duplications, one of the amplicons is common to the two alleles, and the other one is specific to the derived allele. This has to be considered when calculating the proportion of alleles from the calibration curve.
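The difference between the two cases can be made explicit with a small sketch. It assumes molar amplicon quantities, equal amplification efficiency of the two amplicons, and, for duplications, that the common amplicon is produced once per genome copy of either allele; in practice, the calibration curve would correct for departures from these assumptions. The function name and argument names are hypothetical.

```python
def derived_allele_fraction(derived_specific: float, other: float, sv_type: str) -> float:
    """Proportion of the derived allele from two amplicon quantities (molar).

    `other` is the ancestral-specific amplicon for insertions, deletions, and
    inversions, and the amplicon common to both alleles for duplications.
    """
    if derived_specific < 0 or other < 0:
        raise ValueError("amplicon quantities must be non-negative")
    if sv_type in ("insertion", "deletion", "inversion"):
        # Each amplicon is specific to one allele: derived / (derived + ancestral).
        total = derived_specific + other
        if total == 0:
            raise ValueError("no amplicon detected")
        return derived_specific / total
    if sv_type == "duplication":
        # The common amplicon already counts both alleles: derived / common.
        if other == 0:
            raise ValueError("common amplicon not detected")
        return min(1.0, derived_specific / other)
    raise ValueError(f"unknown SV type: {sv_type}")


print(derived_allele_fraction(1.0, 3.0, "insertion"))    # 1 part derived, 3 parts ancestral
print(derived_allele_fraction(1.0, 4.0, "duplication"))  # 1 part derived among 4 genome copies
```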
Finally, we have developed this method to follow an SV in experimental evolution populations of bacteria, for which "metagenomic" sequencing at the population level is common practice. However, this method can also be applied to experimental evolution populations of multicellular organisms in a pool-seq approach, or to intra-host viral or microbial pathogen sampling17.
This method has some limitations. First, the alleles being compared must differ in size by at least 5%. This prevents us from following the dynamics of alleles produced by very small insertions or deletions. Fortunately, these allele dynamics may be followed by amplicon sequencing. Second, the sequence of the derived allele must be known to allow for the design of the primers around the insertion point, the deletion point, or the breakpoints. Third, primer design and optimization are necessary for each site. Thus, this method is appropriate for following a limited number of target allele pairs of interest. If many alleles are to be followed in the same population, we suggest NGS of multiple time points.
The authors have nothing to disclose.
This work was supported by the ERC HGTCODONUSE (ERC-2015-CoG-682819) to S.B. Data used in this work were (partly) produced through the GenSeq technical facilities of the Institut des Sciences de l'Evolution de Montpellier with the support of LabEx CeMEB, an ANR "Investissements d'avenir" program (ANR-10-LABX-04-01).
Name | Company | Catalog Number | Comments |
96 Well Skirted PCR Plate | 4titude | 4Ti-0740 | PCR |
Agarose molecular biology grade | Eurogentec | EP-0010-05 | Agarose gel electrophoresis |
Agilent DNF-474 HS NGS Fragment Kit Quick Guide for the Fragment Analyzer Systems | Agilent | | PDF instruction guide |
Buffer TBE | Panreac AppliChem | A4228,5000Pc | Agarose gel electrophoresis |
Calibrated Disposable Inoculating Loops and Needles | LABELIANS | 8175CSR40H | Bacterial culture |
DNeasy Blood and Tissue Kit | Qiagen | 69506 | DNA extraction |
Electrophoresis power supply | Amilabo | ST606T | Agarose gel electrophoresis |
Fragment Analyzer Automated CE System | Agilent | | Parallel capillary electrophoresis |
Fragment DNA Ladder | Agilent | DNF-396, range 1-6000 bp | Parallel capillary electrophoresis |
Gentamicin sulfate salt, BioReagent | Sigma-Aldrich | G1264-1G | Bacterial culture |
High Sensitivity diluent marker | Agilent | DNF-373 | Parallel capillary electrophoresis |
High Sensitivity NGS quantitative analysis kit | Agilent | DNF-474 | Parallel capillary electrophoresis |
Ladder quick load 1 kb plus DNA ladder | NEB | N0469S | Agarose gel electrophoresis |
LB Broth, Vegitone NutriSelect Plus | Millipore | 28713 | Bacterial culture |
Master Mix PCR High Fidelity Phusion Flash | Thermo Fisher Scientific | F548L | PCR |
Primers | Eurogentec | | PCR |
Prosize data analysis software v.4 | Agilent | V.4 | Parallel capillary electrophoresis |
Qubit assays | Invitrogen | MAN0010876 | DNA quantification |
Qubit dsDNA HS Assay Kit | Life Technologies SAS | Q32854 | DNA quantification |
Thermocycler | Eppendorf | Ep gradients | PCR |
UVbox, eBOX VX5 | Vilber Lourmat | | Agarose gel electrophoresis visualisation |
Water for injectable preparation | Aguettant | PROAMP | PCR |