AAV peptide display library generation and subsequent validation through the barcoding of candidates with novel properties for the creation of next-generation AAVs.
Gene delivery vectors derived from Adeno-associated virus (AAV) are one of the most promising tools for the treatment of genetic diseases, evidenced by encouraging clinical data and the approval of several AAV gene therapies. Two major reasons for the success of AAV vectors are (i) the prior isolation of various naturally occurring viral serotypes with distinct properties, and (ii) the subsequent establishment of powerful technologies for their molecular engineering and repurposing in high throughput. Further boosting the potential of these techniques are recently implemented strategies for barcoding selected AAV capsids on the DNA and RNA level, permitting their comprehensive and parallel in vivo stratification in all major organs and cell types in a single animal. Here, we present a basic pipeline encompassing this set of complementary avenues, using AAV peptide display to represent the diverse arsenal of available capsid engineering technologies. Accordingly, we first describe the pivotal steps for the generation of an AAV peptide display library for the in vivo selection of candidates with desired properties, followed by a demonstration of how to barcode the most interesting capsid variants for secondary in vivo screening. Next, we exemplify the methodology for the creation of libraries for next-generation sequencing (NGS), including barcode amplification and adaptor ligation, before concluding with an overview of the most critical steps during NGS data analysis. As the protocols reported here are versatile and adaptable, researchers can easily harness them to enrich the optimal AAV capsid variants in their favorite disease model and for gene therapy applications.
Gene transfer therapy is the introduction of genetic material in cells to repair, replace, or alter the cellular genetic material to prevent, treat, cure, or ameliorate disease. Gene transfer, both in vivo and ex vivo, relies on different delivery systems, non-viral and viral. Viruses have evolved naturally to efficiently transduce their target cells and can be used as delivery vectors. Amongst the different types of viral vectors employed in gene therapy, adeno-associated viruses have been increasingly used, owing to their lack of pathogenicity, safety, low immunogenicity, and most importantly their ability to sustain long-term, non-integrating expression1,2,3. AAV gene therapy has yielded considerable achievements over the past decade; three therapies have been approved by the European Medicines Agency and the US Food and Drug Administration for use in humans3,4. Several clinical trials are also underway to treat a variety of diseases, such as hemophilia, muscular, cardiac, and neurological diseases, as reviewed elsewhere3. Despite decades of advancement, the field of gene therapy has experienced a series of setbacks in recent years4, most importantly deaths in clinical trials5 that have been put on hold due to dose-limiting toxicities, particularly for tissues that are massive, such as muscle, or difficult to reach, such as brain6.
The AAV vectors currently being used in clinical trials belong, with a few exceptions, to the natural serotypes1. AAV engineering offers the opportunity to develop vectors with superior organ- or cell-specificity and efficiency. In the past two decades, several approaches have been successfully applied, such as peptide display, loop-swap, capsid DNA shuffling, error-prone PCR, and targeted design, to generate individual AAV variants or libraries thereof with diverse properties7. These libraries are then subjected to multiple rounds of directed evolution to select the variants with the desired properties, as reviewed elsewhere1,3. Of all the capsid evolution strategies, peptide display AAV libraries have been the most widely used, owing to some unique properties: they are relatively easy to generate, they can achieve high diversity, and they are amenable to high-throughput sequencing, which allows their evolution to be tracked.
The first successful peptide insertion AAV libraries were described almost 20 years ago. In one of the first, Perabo et al.8 constructed a library of modified AAV2 capsids, in which a pool of randomly generated oligonucleotides was inserted in a plasmid at a position that corresponds to amino acid 587 of the VP1 capsid protein, at the three-fold axis protruding from the capsid. Using adenovirus co-infection, the AAV library was evolved through multiple rounds of selection, and the final re-targeted variants were shown to be capable of transducing cell lines refractory to the parental AAV2 serotype8. Shortly thereafter, Müller et al.9 introduced the two-step system for library production, a significant improvement to the protocol. Initially, the plasmid library, together with an adenoviral helper plasmid, is used to produce an AAV library that contains chimeric capsids. This AAV shuttle library is then used to infect cells at a low multiplicity of infection (MOI), with the aim of introducing one viral genome per cell. Co-infection with adenovirus ensures the production of AAVs with a matching genome and capsid9. About a decade later, Dalkara et al.10 used in vivo directed evolution to create the 7m8 variant. This variant has a 10 amino-acid insertion (LALGETTRPA), three residues of which act as linkers, and efficiently targets the outer retina after intravitreal injection10. This engineered capsid is an exceptional success story, as it is one of the few engineered capsids to make it to the clinic thus far11.
The field experienced a second boost with the introduction of next-generation sequencing (NGS) techniques. Two publications from Adachi et al.12 in 2014 and from Marsic et al.13 in 2015, showcased the power of NGS to track the distribution of barcoded AAV capsid libraries with high accuracy. A few years later, the NGS of barcoded regions was adapted to the peptide insertion region to follow the capsid evolution. Körbelin et al.14 performed an NGS-guided screening to identify a pulmonary-targeted AAV2-based capsid. The NGS analysis helped calculate three rating scores: the enrichment score between selection rounds, the general specificity score to determine tissue specificity, and finally the combined score14. The Gradinaru lab15 published the Cre-recombination-based AAV targeted evolution (CREATE) system in the same year, which facilitates a cell-type-specific selection. In this system, the capsid library carries a Cre-invertible switch, as the polyA signal is flanked by two loxP sites. The AAV library is then injected in Cre mice, where the polyA signal is inverted only in Cre+ cells, providing the template for binding of a reverse PCR primer with the forward primer within the capsid gene. This highly specific PCR rescue enabled the identification of the AAV-PHP.B variant that can cross the blood-brain barrier15. This system was further evolved into M-CREATE (Multiplexed-CREATE), in which NGS and synthetic library generation were integrated in the pipeline16.
An improved RNA-based version of this system from the Maguire lab17, iTransduce, allows selection on the DNA level of capsids that functionally transduce cells and express their genomes. The viral genome of the peptide display library comprises a Cre gene under the control of a ubiquitous promoter and the capsid gene under the control of the p41 promoter. The library is injected in mice that have a loxP-STOP-loxP cassette upstream of tdTomato. Cells transduced with AAV variants that express the viral genome and therefore Cre express tdTomato and, in combination with cell markers, can be sorted and selected17. Similarly, Nonnenmacher et al.18 and Tabebordbar et al.19 placed the capsid gene library under the control of tissue-specific promoters. After injection in different animal models, viral RNA was used to isolate the capsid variants.
An alternative approach is to use barcoding to tag capsid libraries. The Björklund lab20 used this approach to barcode peptide insertion capsid libraries and developed the barcoded rational AAV vector evolution (BRAVE). In one plasmid, the Rep2Cap cassette is cloned next to an inverted terminal repeats (ITR)-flanked, yellow fluorescent protein (YFP)-expressing, barcode-tagged transgene. Using loxP sites between the end of cap and the beginning of the barcode, an in vitro Cre recombination generates a fragment small enough for NGS, thereby allowing the association of peptide insertion with the unique barcode (look-up table, LUT). AAV production is performed using the plasmid library and the barcodes expressed in the mRNA are screened after in vivo application, again with NGS20. When the capsid libraries comprise variants of the whole capsid gene (i.e., shuffled libraries), long-read sequencing needs to be used. Several groups have used barcodes to tag these diverse libraries, which enables NGS with higher read depth. The Kay lab21 tagged highly diverse capsid shuffled libraries with barcodes downstream of the cap polyA signal. In a first step, a barcoded plasmid library was generated, and the shuffled capsid gene library was cloned into it. Then a combination of MiSeq (short read, higher read depth) and PacBio (long read, lower read depth) NGS as well as Sanger sequencing was used to generate their LUT21. In 2019, Ogden and colleagues from the Church lab22 delineated the AAV2 capsid fitness for multiple functions using libraries that had single point mutations, insertions, and deletions in every position, which ultimately enabled machine-guided design. For the generation of the library, smaller fragments of the capsid gene were synthesized, tagged with a barcode, next-generation sequenced, and then cloned into the full capsid gene. The NGS data were used to generate a LUT. 
The library was then screened using just the barcodes and short read sequencing, which in turn allows higher read depth22.
Barcoded libraries have been predominantly used to screen a pool of known, natural, and engineered variants, either following several rounds of selection of capsid libraries or independently of a capsid evolution study. The advantage of such libraries is the opportunity to screen multiple capsids while reducing animal numbers and minimizing variation between animals. The first studies that introduced this technology to the AAV field were published almost a decade ago. The Nakai lab12 tagged 191 double alanine mutants covering amino acids 356 to 736 of VP1 from AAV9 with a pair of 12-nucleotide barcodes. Using NGS, the library was screened in vivo for galactose binding and other properties12. Marsic and colleagues delineated the biodistribution of AAV variants, also using a double-barcoded analysis, 1 year later13. A more recent study in non-human primates compared the biodistribution in the central nervous system of 29 capsids using different routes of delivery23. Our lab has recently published barcoded AAV library screens of 183 variants that included natural and engineered AAVs. These screens on the DNA and RNA level led to the identification of a highly myotropic AAV variant24 in mice as well as others displaying a high cell-type specificity in the mouse brain25.
Here, we describe the methodology used in this work and expand on it to include screening of AAV peptide display libraries. This comprises the generation of AAV2 peptide display libraries, a digital droplet PCR (dd-PCR) method for quantification, and finally an NGS pipeline to analyze the AAV variants, based in part on the work by Weinmann and colleagues24. Finally, a description of the generation of barcoded AAV libraries and the NGS pipeline used in the same publication is provided.
1. AAV2 random 7-mer peptide display library preparation
NOTE: For the preparation of an AAV2 random peptide display library, synthesize the degenerate oligonucleotides as single-stranded DNA, convert it to double-stranded DNA, digest, ligate to the acceptor plasmid, and electroporate.
2. AAV2 random 7-mer peptide display library selection
3. Barcoded AAV capsid library preparation and analysis
NOTE: Following the identification of a set of potentially specific and efficient AAV capsids in the peptide display screen, verify the functionality of the identified peptide sequences and compare them with a set of commonly used or well-described reference AAV capsid variants. To do this, the capsid sequence is inserted into a Rep/Cap helper construct without ITRs.
Generation of an AAV2 peptide display library. As a first step toward the selection of engineered AAVs, the generation of a plasmid library is described. The peptide insert is produced by using degenerate primers. Reducing the combination of codons in those from 64 to 20 has the advantages of eliminating stop codons and facilitating NGS analysis, by reducing library diversity on the DNA but not the protein level. The oligonucleotide insert is purchased as single-stranded DNA (Figure 1), which is converted to double-stranded DNA using a PCR reaction. The quality of this reaction is controlled in a Bioanalyzer. As shown in Figure 2, three cycles produced a stronger band, compared to 10 or 30 cycles. The insert is then digested with BglI to generate three-nucleotide overhangs. The nucleotide next to the overhang sequence of the double-stranded insert can be an A or a T (W is the ambiguity code for A or T), which is in the third position of a codon that encodes arginine (R) or serine (S). The vector (pRep2Cap2_PIS) has a frameshift mutation in the peptide insertion site that prevents production of the capsid in the absence of an insert, due to the creation of a stop codon shortly after the insertion site. SfiI digestion of this plasmid generates three-nucleotide overhangs that match the overhangs generated in the peptide-encoding oligonucleotide insert. Ligation needs to be performed under optimal conditions to maximize the complexity of the plasmid library. To this end, for the transformation, electroporation was performed using commercially available bacteria with high efficiency.
The diversity of the plasmid library is calculated based on the colony count, which is typically around 1 x 10^8 for this type of library. The total number of colonies corresponds to the maximum potential diversity of the library in the NGS analysis, as is discussed later. The plasmid library is then used to generate the AAV library, which is not described in detail here but elsewhere27.
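To put the colony count in relation to the theoretical diversity, the following minimal sketch estimates how many distinct peptide variants a given number of colonies can be expected to contain. It rests on assumptions that are not part of the protocol: every colony carries exactly one insert, inserts are drawn uniformly at random, and only the seven randomized positions (20 codons each) contribute to diversity. The function name and the numbers are illustrative only.

```python
import math


def expected_distinct_variants(theoretical_diversity, n_colonies):
    """Expected number of distinct variants among n_colonies clones.

    Assumes uniform random sampling with replacement from
    theoretical_diversity equally likely variants (an idealization).
    """
    # N * (1 - exp(-c / N)); expm1 keeps precision when c / N is small.
    return theoretical_diversity * -math.expm1(-n_colonies / theoretical_diversity)


# Seven randomized positions, 20 codons (one per amino acid) at each position.
PEPTIDE_DIVERSITY = 20 ** 7
# Typical colony count quoted in the text.
N_COLONIES = 1e8

distinct = expected_distinct_variants(PEPTIDE_DIVERSITY, N_COLONIES)
coverage = distinct / PEPTIDE_DIVERSITY

print(f"theoretical diversity: {PEPTIDE_DIVERSITY:.3g}")
print(f"expected distinct variants: {distinct:.3g}")
print(f"fraction of theoretical diversity covered: {coverage:.1%}")
```

Under these idealized assumptions, nearly every colony contributes a different variant, which is consistent with using the colony count as the upper bound of the library diversity.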
The quantification of this library was performed using dd-PCR. Typically, two regions are quantified: the AAV2 rep gene within the viral genome and the ITR (see Figure 3 and Table 1). As shown in Table 1, of the droplets positive for the viral genome, 99.2% are also positive for ITR (Ch1+ Ch2+), which is a quality control for the AAV library and suggests that the AAV capsids contain full viral genomes. To obtain a concentration in vg/mL, the proportion of double-positive droplets among all positive droplets is calculated and used to correct the number of copies, which is then multiplied by the dilution factor.
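The titer calculation can be sketched as follows, using the AAV2lib and BCAAVlib rows of Table 1 as input. The conversion from copies per 20 µL well to vg/mL is not spelled out in this section; the factor of 200 used here (1000 µL divided by an assumed 5 µL of diluted sample per well) is inferred from the values in Table 1 and should be treated as an assumption, as should the function and parameter names.

```python
def ddpcr_titer(copies_per_well, positives, double_positives,
                dilution_factor, sample_volume_ul=5.0):
    """Return (correction factor, corrected copies, vg/mL) for one sample.

    sample_volume_ul is an assumed input volume per well, chosen because it
    reproduces the VG/mL column of Table 1; adjust it to the actual setup.
    """
    # Share of target-positive droplets that are also ITR-positive.
    cf = double_positives / positives if positives else 0.0
    corrected = copies_per_well * cf
    # Scale from the assayed volume to 1 mL and undo the sample dilution.
    vg_per_ml = corrected * dilution_factor * (1000.0 / sample_volume_ul)
    return cf, corrected, vg_per_ml


# Values taken from Table 1.
aav2_cf, aav2_copies, aav2_titer = ddpcr_titer(90600, 16396, 16266, 1e6)
bc_cf, bc_copies, bc_titer = ddpcr_titer(34680, 13229, 12452, 1e6)

print(f"AAV2lib:  CF={aav2_cf:.2f} corrected={aav2_copies:.0f} titer={aav2_titer:.2e} vg/mL")
print(f"BCAAVlib: CF={bc_cf:.2f} corrected={bc_copies:.0f} titer={bc_titer:.2e} vg/mL")
```

With these inputs the sketch reproduces the corrected copies and titers listed for both libraries. The water-control rows of Table 1 appear to be reported from uncorrected copies, so they are not expected to match this calculation.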
The quality of the AAV library is then assessed by NGS analysis, starting with a PCR using appropriate primers. Next, the PCR product is processed using a commercially available kit, which adds index-containing adaptors to the PCR product. The NGS products are sequenced, and the files are analyzed using Python. Three sample data sets from an AAV2 peptide display library are provided. Each sequence in the Script#1 input file (a list of the sequences of all PCR copies of the amplified DNA segment in the sample) is bioinformatically searched for the BCVleft and BCVright sequences, or the BCVleft_comp and BCVright_comp sequences. If either combination is identified, the sequence between them is extracted and added to the output file (see Figure 4). The output of both scripts provides statistical data regarding the NGS library preparation. In all three sets, the extracted reads, based on signature sequences specific to the library, represented about 94% of the total reads, which suggests good quality. The output of Script#2 provides further statistical data, and the translation of the extracted DNA sequence yields additional quality control data. The "# of Invalid PV reads" (i.e., sequences that lack the six nucleotides used to initiate the in-silico translation and encode residues RG or SG) is less than 1% of the recovered reads, which confirms good sequencing quality. The outputs of the second script (i.e., the translation and ranking of the extracted DNA sequences) provide additional information, such as the number of reads per peptide variant or the number of DNA sequences giving rise to each peptide variant. Of the files, those that end in "analyzed_PVs" contain only valid DNA reads, and the analysis is performed on the peptide sequence level. Of the valid reads, more than 99% are unique, which suggests that the library is balanced and that its internal diversity is high.
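The logic of the two analysis steps (extraction of the insert between two signature sequences, then translation and counting) can be illustrated with the minimal sketch below. It is not the published Script#1 or Script#2. The flanking sequences, the toy reads, and all function names are hypothetical placeholders, and the validity check (the peptide must start with RG or SG, be in frame, and contain no stop codon) is a simplified assumption.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def extract_insert(read, left, right):
    """Return the sequence between left and right, on either strand, or None."""
    # Searching the reverse-complemented read is equivalent to searching the
    # read for the reverse-complemented signature sequences.
    for candidate in (read, reverse_complement(read)):
        start = candidate.find(left)
        if start == -1:
            continue
        start += len(left)
        end = candidate.find(right, start)
        if end != -1:
            return candidate[start:end]
    return None


def translate(dna):
    """Translate an in-frame DNA string; return None if not a multiple of 3."""
    if len(dna) % 3 != 0:
        return None
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3))


def is_valid(peptide):
    # Simplified validity rule (assumption): RG/SG start, no stop, no unknown codon.
    return (peptide is not None and peptide[:2] in ("RG", "SG")
            and "*" not in peptide and "X" not in peptide)


def summarize(reads, left, right):
    stats = {"total": len(reads), "extracted": 0, "invalid": 0}
    peptide_counts = Counter()          # reads per peptide variant
    dna_variants = defaultdict(set)     # distinct DNA sequences per peptide variant
    for read in reads:
        insert = extract_insert(read.upper(), left, right)
        if insert is None:
            continue
        stats["extracted"] += 1
        peptide = translate(insert)
        if not is_valid(peptide):
            stats["invalid"] += 1
            continue
        peptide_counts[peptide] += 1
        dna_variants[peptide].add(insert)
    return stats, peptide_counts, dna_variants


# Hypothetical signature sequences and toy reads, for illustration only.
LEFT, RIGHT = "GATTACAGGT", "CCTTGAGACA"
INSERT_A = "AGAGGCGCTAAAGATCTGCGTACCGAA"   # RG + 7-mer
INSERT_B = "AGAGGCGCCAAAGATCTGCGTACCGAA"   # same peptide, different codon
INSERT_C = "CCAGGCGCTAAAGATCTGCGTACCGAA"   # lacks the RG/SG start
reads = [
    "TT" + LEFT + INSERT_A + RIGHT + "AA",
    reverse_complement("GG" + LEFT + INSERT_B + RIGHT + "CC"),
    LEFT + INSERT_C + RIGHT,
    "ACGTACGTACGTACGT",                    # no signature sequences
]

stats, peptide_counts, dna_variants = summarize(reads, LEFT, RIGHT)
print(stats)
for peptide, n_reads in peptide_counts.most_common():
    print(peptide, n_reads, len(dna_variants[peptide]))
```

The ratio of extracted to total reads and of invalid to extracted reads in `stats` corresponds to the two quality indicators discussed above, and the share of peptides with a single read gives an impression of how balanced the library is.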
Selection of an AAV2 peptide display library. This library can then be used for in vivo or in vitro selection. This is not included in this protocol, but an outline is provided in Figure 5. Briefly, for in vivo selection, the tissues are collected 1 week following systemic injection of 1 x 10^12 vg/mouse and DNA is isolated. A rescue PCR on a larger fragment of the cap gene is performed, and the product is cloned into the plasmid vector using different restriction sites but a similar protocol to what is described here, and round-1 AAV libraries are prepared. For the NGS analysis, the PCR is performed on the isolated DNA or the AAV libraries. After selection, the percentage of unique PVs in the library typically decreases according to the pressure of selection. Depending on the project, the selection can be concluded once enough dominant PVs are identified by NGS.
Barcoded library selection and analysis. The data from a barcoded AAV library comprising 82 capsids have been described previously24. The dd-PCR quantification of the AAV particles (see Figure 7 and Table 1) shows that 94% of the capsids that are positive for the transgene also contain an ITR, indicative of a complete genome. This is lower than for the wild-type library described above, but still suggests good quality for recombinant AAVs, considering that they usually package less efficiently. After injection of the pooled library in animals, the tissues are collected, and DNA and RNA are isolated. PCR for NGS as well as NGS library preparation and sequencing are then performed as described above and previously24. To calculate vg/dg, qPCR is performed; in the sample data provided, the values range from 0.1 to 4. As is typical for systemic AAV delivery, the liver has over 10 vg/dg.
As part of the analysis pipeline, normalization steps are performed on different levels. In the input pooled library, the capsid variants are typically not equally represented. Therefore, the NGS analysis of the input library is used to generate a normalization file, which corrects the abundance of each capsid variant in the final tissue/organ based on the abundance of this capsid variant in the input library. The biodistribution analysis of the pooled library members by NGS is performed on the DNA and RNA level. The normalization to the input library yields the quotient (P*αβ) of the proportion within the tissue/organ (Pαβ) to the proportion in the input library (Lα). These calculations can be found in the "variant_comparison.txt" file. This quotient is then multiplied by the vg/dg values from the "Normalization_organ.txt" file to yield the value Bαβ, and the proportions of the Bαβ values are calculated for a single tissue or for all tissues. The proportion of the Bαβ value for each variant within one tissue (Vαβ) reflects the dissemination of this variant within this tissue ("organ_comparison.txt"). In contrast, the proportion of the Bαβ value for each variant in all tissues (Tαβ) indicates the dissemination of this variant within the whole body ("relativeconcetrations.xls"). These two proportions reflect the intra- and inter-tissue biodistribution of each variant. All these files can be used for different visualizations of capsid efficiency and specificity24. As an example, using the final table (found in "relativeconcetrations.xls"), the principal component analysis and hierarchical clustering are presented in Figure 9.
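The normalization steps described above can be written out as a small worked example. This is a sketch under stated assumptions rather than the published pipeline: α is taken to index capsid variants and β tissues, Vαβ is computed as the share of Bαβ among all variants in one tissue, Tαβ as the share of Bαβ for one variant across all tissues, and the quotient P*αβ is not renormalized before multiplication by vg/dg. The read counts, variant names, and function names are invented for illustration.

```python
def normalize_biodistribution(tissue_reads, input_reads, vg_per_dg):
    """Compute B, V and T values keyed by (variant, tissue).

    tissue_reads: {tissue: {variant: read count}}
    input_reads:  {variant: read count in the input library}
    vg_per_dg:    {tissue: vector genomes per diploid genome}
    """
    input_total = sum(input_reads.values())
    # L: proportion of each variant in the input library.
    L = {variant: n / input_total for variant, n in input_reads.items()}

    B = {}
    for tissue, counts in tissue_reads.items():
        tissue_total = sum(counts.values())
        for variant, n in counts.items():
            p = n / tissue_total            # P: proportion within the tissue
            p_star = p / L[variant]         # P*: corrected for input abundance
            B[(variant, tissue)] = p_star * vg_per_dg[tissue]

    # Sums of B per tissue (for V) and per variant (for T).
    per_tissue, per_variant = {}, {}
    for (variant, tissue), b in B.items():
        per_tissue[tissue] = per_tissue.get(tissue, 0.0) + b
        per_variant[variant] = per_variant.get(variant, 0.0) + b

    V = {key: b / per_tissue[key[1]] for key, b in B.items()}
    T = {key: b / per_variant[key[0]] for key, b in B.items()}
    return B, V, T


# Invented toy data: two variants, two tissues.
input_reads = {"VarA": 500, "VarB": 500}
tissue_reads = {
    "liver": {"VarA": 900, "VarB": 100},
    "heart": {"VarA": 100, "VarB": 300},
}
vg_per_dg = {"liver": 10.0, "heart": 1.0}

B, V, T = normalize_biodistribution(tissue_reads, input_reads, vg_per_dg)
for key in sorted(B):
    print(key, f"B={B[key]:.2f} V={V[key]:.2f} T={T[key]:.2f}")
```

In this toy example, VarB dominates the heart sample by read share, yet most VarA genomes and a majority of VarB genomes still reside in the liver once the higher vg/dg of that tissue is taken into account, which is the distinction between the intra-tissue (V) and inter-tissue (T) views.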
The normalization of the pooled library from the NGS sequencing shows that each capsid has an average proportion of 0.012, which matches the theoretical proportion of 0.012 (1/82) for each of the 82 capsids and suggests a well-balanced pooled library. The file "relativeconcentration.xls" generated by the bioinformatic pipeline reflects the inter-tissue capsid biodistribution, as illustrated in Figure 9. The heatmap shows relative concentration values on a log2 scale for each capsid of the pooled library, hierarchically clustered according to the tissue biodistribution profile. The principal component analysis makes it possible to distinguish clusters of AAV capsid variants with similar biodistribution properties and also highlights the outlying capsids with unique patterns of inter-tissue biodistribution. The two major branches of the heatmap hierarchy reflect the difference in the transduction efficiency of capsid variants. The left branch, which contains the majority of the capsid variants, includes all the capsids that show a high relative concentration across the majority of tissues. Apart from capsids with strikingly high liver specificity, three other capsids (Var60, Var13, and Var63) exhibited specificity in the diaphragm (Di), skeletal muscle (SM), biceps (BIC), and in the brain (B). The right branch of the hierarchical clustering includes the capsid variants with overall lower transduction efficiency, which is most pronounced in the duodenum (Du) and pancreas (P). The PCA for the original subset forms the cluster of capsid variants with high liver specificity (Var 64, 78, 65, 55, 56) and singles out the Var60 capsid with its outstanding muscle tropism.
Figure 1: Overview of the cloning strategy for the AAV2 random 7-mer peptide display library.
The oligonucleotide with the random 7-mer peptide insertion sequence is flanked by sequences that contain the BglI digestion site and a binding site for the amplification reaction. The vector pRep2Cap2_PIS contains SfiI sites. The overhangs generated by the BglI and SfiI digestion are complementary.
Figure 2: Bioanalyzer quality control of oligonucleotide second-strand synthesis.
Second-strand synthesis of degenerate oligonucleotides is confirmed by analysis on a Bioanalyzer. PCR reactions with three, 10, and 30 cycles of amplification are compared, showing that three cycles of amplification are the most efficient. (A) Bioanalyzer data represented as gel image. (B-D) Plotted fragment lengths (in bp, x axis) versus fluorescence units (FU, y axis), compared to standard peaks, visible at 15 and 1500 bp. Red arrows indicate double-stranded oligonucleotides. Note that the highest FU value, representing the highest DNA concentration, of double-stranded oligonucleotides is observed after three cycles of amplification (red arrows).
Figure 3: Titration of an AAV2 peptide display library using dd-PCR.
(A) Detection of rep2-positive droplets in channel 1 (FAM, Channel 1) for a non-template water control and a 1:10^6 diluted virus sample. (B) Detection of ITR-positive droplets in channel 2 (HEX, Channel 2). (C) Detection of droplets that are positive for both rep2 and ITR (highlighted in orange). Purple lines indicate thresholds for the detection of positive versus negative droplets.
Figure 4: Overview of the DNA fragment used for NGS and settings for the Python analysis.
The NGS PCR amplifies a 96-nucleotide region. The PCR fragment is used to generate the NGS library. For the bioinformatic analysis, recognition sequences right and left of the insertion site need to be given for both strands, as well as the distance from the beginning of the DNA fragment.
Figure 5: Iterative selection of AAV libraries in vivo.
Mice are injected with the AAV library. Target ON- and OFF-tissues are collected 1 week later and subjected to NGS and analysis. The ON-target tissue is used to rescue the capsid gene, which is cloned into the parent vector. The selected AAV library is produced and used to repeat the aforementioned selection cycle. This figure was created with BioRender.com.
Figure 6: Overview of the barcoded AAV library generation.
(A) Graphic representation of a self-complementary AAV genome bearing a CMV promoter-driven eyfp transgene flanked by ITRs. The 3' UTR contains a 15-nucleotide-long barcode (BC) located between eyfp and the bovine growth hormone (BGH) polyadenylation signal. The BC enables capsid tracing at the DNA and mRNA level. (B) During AAV production, a unique barcoded genome is packaged by a sole variant of the cap gene, facilitating capsid identification.
Figure 7: Titration of a barcoded AAV library using dd-PCR.
(A) Detection of YFP-positive droplets in channel 1 (FAM, Channel 1) for a non-template water control and a 1:10^6 diluted vector sample. (B) Detection of ITR-positive droplets in channel 2 (HEX, Channel 2). (C) Detection of droplets that are positive for both YFP and ITR (highlighted in orange). Purple lines indicate thresholds for the detection of positive versus negative droplets.
Figure 8: Overview of the DNA fragment used for NGS and settings for the Python analysis.
The NGS PCR amplifies a 113-nucleotide region. For the bioinformatic analysis, recognition sequences right and left of the barcode need to be given for both strands, as well as the distance from the beginning of the DNA fragment.
Figure 9: Principal component analysis (PCA) and hierarchical cluster analysis.
(A) PCA of relative concentration values for 82 capsids across all tissues makes it possible to define clusters of capsid variants with similar properties and variants with unique transduction patterns. (B) To better separate the highly populated cluster, the records of outlying unique variants were excluded from the matrix and the PCA was repeated. (C) Hierarchical cluster analysis allows visual evaluation of variant transduction profiles across tissues as a heatmap plot (Li = liver, Lu = lung, FatB = brown fat, H = heart, Di = diaphragm, SM = smooth muscle, Du = duodenum, P = pancreas, C = colon, BIC = biceps, O = ovaries, St = stomach, I = inner ear, K = kidney, Aa = abdominal aorta, At = thoracic aorta, B = brain, FatW = white fat, and S = spleen).
Sample | Target | Copies/20 µL well | Positives | Ch1+ Ch2+ | CF | copies corrected | DF | VG/mL |
H2O | rep2 | 28 | 2 | 0 | 0 | 0 | 1.00E+06 | 5.60E+09 |
AAV2lib | rep2 | 90600 | 16396 | 16266 | 0.99 | 89882 | 1.00E+06 | 1.80E+13 |
H2O | YFP | 4 | 3 | 2 | 0.67 | 3 | 1.00E+06 | 8.00E+08 |
BCAAVlib | YFP | 34680 | 13229 | 12452 | 0.94 | 32643 | 1.00E+06 | 6.53E+12 |
Table 1: Titration results of an AAV2 peptide display library ("AAV2lib") and the barcoded EYFP vector library ("BCAAVlib"). CF, correction factor (Ch1+ Ch2+ droplets divided by positives); DF, dilution factor.
In this protocol, the steps needed for peptide display AAV capsid engineering and for barcoded AAV library screening, as well as for bioinformatic analysis of library composition and capsid performance, are outlined. This protocol focuses on the steps that facilitate the bioinformatic analysis of these types of libraries, because in most virology laboratories programming skills lag behind proficiency in molecular biology techniques. Both types of libraries have been extensively described in the literature, as outlined in the introduction, and can be reproduced with relative ease.
As a first step, the design of a peptide display library in position 588 in variable region VIII of AAV2 is outlined. This design (AAV2_Peptide(ii) in a previous publication26) and the described cloning method can be easily adapted to other serotypes based on the information provided in the same publication26. A critical step in the cloning pipeline is the ligation/transformation efficiency (using ~1 x 10^8 colonies as the cut-off). It is recommended to include one ligation reaction with vector only. This control helps to determine the percentage of bacterial colonies with insert, which should be higher than 80%. A lower-than-expected efficiency, that is, a number of insert-containing bacterial colonies lower than the theoretical diversity of the oligonucleotide inserts, would negatively impact the representation of low-abundance variants. Possible improvements include longer digestion times and clean-up steps for the vector or ligation reaction using commercial kits.
The next step is the quality control of the variant library using NGS of a PCR fragment that encompasses the oligonucleotide insertion site. The NGS is performed using an Illumina sequencing system. There are several alternatives in which PCR products can be submitted directly, without prior NGS library preparation. These are more suitable for small-scale experiments or when a high read depth is not required. The protocol reported here comprises NGS library preparation, including the addition of adaptors with Illumina indexes to the PCR products, using a commercially available kit. A common limitation with respect to NGS is that these PCR fragments have a low diversity, because the sequences flanking the high-diversity insertion site are identical in all variants, which in turn reduces sequencing efficiency. To address this, the kit adds between two and eight random nucleotides between the PCR fragment and the adaptor. Alternatively, the sample needs to be spiked with PhiX. A detailed Python pipeline for the analysis of AAV2 peptide display libraries is described. As a template, sample files extracted from the original NGS file of the AAV2 peptide display library are provided. The pipeline can be adapted to other serotypes with the instructions given. The output files of this analysis can be used in downstream analyses, such as the comparison of plasmid and AAV library30, the amino-acid composition at each position30, the calculation of the enrichment score between libraries after selection rounds14, or the generation of sequence logos or graphs19. As far as the input library is concerned, a high percentage of unique variants is desired. However, some variants do not produce as well as others, which could lead to skewed distributions after production. A low variant diversity, or in other words the presence of dominant variants, could be attributed to low oligonucleotide insert quality or a high number of second-strand synthesis cycles (step 1.2). Furthermore, the amino-acid composition could be affected by production. Each amino acid should have a frequency of 5.00%. If the distribution varies greatly from this value, it is suggested to perform the same analysis on the plasmid library to identify potential biases30.
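The amino-acid composition check can be sketched as follows: the frequency of each of the 20 amino acids is computed at every position of the insert and compared with the expected 5% (1/20). The peptide list, the function names, and the two-fold deviation threshold are hypothetical choices for illustration; the list would normally come from the valid peptide variants of the NGS analysis.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(peptides):
    """Return one {amino acid: frequency} dict per peptide position."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        # Amino acids that never occur at this position get a frequency of 0.
        table.append({aa: counts[aa] / len(peptides) for aa in AMINO_ACIDS})
    return table


def flag_deviations(table, expected=0.05, fold=2.0):
    """List (position, amino acid, frequency) entries far from the expectation.

    The fold threshold is an arbitrary illustrative cut-off, not a published one.
    """
    flagged = []
    for pos, freqs in enumerate(table, start=1):
        for aa, freq in freqs.items():
            if freq > expected * fold or freq < expected / fold:
                flagged.append((pos, aa, freq))
    return flagged


# Toy input: far too few peptides for a real assessment, so nearly all
# entries are flagged; a real library contributes millions of variants.
peptides = ["ACD", "ACE", "AGD", "CCD"]
table = position_frequencies(peptides)
flagged = flag_deviations(table)
print(table[0]["A"], table[0]["C"], len(flagged))
```

Running the same function on the plasmid library and on the AAV library makes it possible to see whether a deviation was already present after cloning or arose during vector production.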
As the protocols for AAV library generation and the subsequent selection rounds in different animal and in vitro models have been extensively described in multiple publications and protocols27,30,31,32, only the analysis of barcoded libraries of selected engineered variants and benchmarks is described here. Of note, the library cloning after each round of selection can be performed by PCR isolation of the cap gene, as mentioned in the protocol and results sections. Extensive rounds of selection and PCR amplification can lead to the accumulation of mutations or stop codons, which can be observed by NGS. Alternatively, enriched variants can be selected from the NGS data, and the oligonucleotides ordered and cloned in the way described for the generation of a peptide display library16,18. Finally, the protocol contains a brief description of the selection of libraries on the DNA level, 1 week after systemic injection. RNA-based selections are more stringent, as they also select for variants that traffic through infectious entry pathways, albeit they are technically more challenging. It should be noted that RNA- or transgene-based selections (i.e., Cre) require a longer in vivo duration of about 3-4 weeks15,16,17,18,19,33. For DNA-based selections especially, it is critical to validate the selected variants against known natural and engineered serotypes on both the DNA and RNA level using a barcoded AAV library.
The second part of the protocol describes the generation and screening of barcoded AAV libraries using the previously developed pipeline24. Each AAV capsid in the pool contains the same transgene (eyfp under the control of a CMV promoter) with a distinct barcode between eyfp and the polyA signal. The barcodes used in this library can be found in the prior publication24. The design is based on the principle of Hamming distance (i.e., the barcode sequences need to differ from one another at a sufficient number of positions), so that sequencing errors do not lead to erroneous barcode assignments. As described in Lyons et al.26, the chance of two errors occurring in reads of 25-100 nucleotides is very low. With a Hamming distance of 4 between barcodes, at least four substitution errors would be needed to convert one barcode into another, so a read carrying one or two errors cannot be mistaken for a different barcode. Reads with one error are not corrected; they are ignored during analysis and categorized as "reads with unknown variants". In a relevant publication, guidelines and a Python script are provided to generate barcodes29 that can be used in the pipeline. Alternatively, useful barcodes can be identified with error-correcting codes based on another popular metric, the Levenshtein distance, as outlined in the publication by Buschmann and Bystrykh34. This group also provides a software package for the R programming language35.
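The Hamming-distance principle and the exact-match assignment of reads can be sketched in a few lines of Python. This is an illustration only, not the script from the cited publications; the 8-nucleotide barcodes, the variant names, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign_reads(reads, barcode_to_variant):
    """Count exact barcode matches; all other reads are counted as unknown."""
    counts = {variant: 0 for variant in barcode_to_variant.values()}
    unknown = 0
    for read in reads:
        variant = barcode_to_variant.get(read)
        if variant is None:
            unknown += 1  # e.g., a read carrying a sequencing error
        else:
            counts[variant] += 1
    return counts, unknown


if __name__ == "__main__":
    # Hypothetical barcodes and variant names, for illustration only.
    barcode_to_variant = {
        "ACGTACGT": "variant_1",
        "ACGTTGCA": "variant_2",
        "TGCAACGT": "variant_3",
    }
    print("minimum distance:", min_pairwise_distance(list(barcode_to_variant)))
    reads = ["ACGTACGT", "ACGTACGA", "TGCAACGT", "ACGTACGT"]
    print(assign_reads(reads, barcode_to_variant))
```

In this toy set, the second read differs from the first barcode at a single position and is therefore counted as unknown rather than being assigned to a variant.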
After AAV production, the pooled barcoded AAV library can be used for biodistribution studies in different models. This study outlines the pipeline using the 82-variant library from the previous publication24 and provides sample data for practice. The protocol can also be adapted to each user's needs. The biodistribution analysis is based on the collection of different on- and off-target tissues or cells, the extraction of DNA and RNA from them, PCR amplification of the barcode region for NGS, and the measurement of vector genomes per diploid genome (vg/dg) on the DNA level. For the RNA, it would be best to calculate the ratio to the mRNA copy number of a reference gene. The reference gene of choice should be expressed similarly across different tissues36, such as RPP30 (Ribonuclease P/MRP Subunit P30)37 or Hprt (hypoxanthine phosphoribosyltransferase 1)36. However, this requirement is difficult to meet, so one may need to use several reference genes at the same time to normalize the RNA data. For this reason, the RNA data can instead be normalized to dg on the DNA level, which correlates roughly with the number of cells. This protocol describes the use of qPCR for this calculation, as previously described24, although dd-PCR is more precise and is thus preferred for future use, especially considering the progress in this field37. Last but not least, basic molecular biology optimizations are extremely critical for the methodology described here. The PCR reactions need to be optimized to avoid introducing biases to the distribution of the libraries. The dd-PCR probes need to be designed to include intronic regions for DNA and inter-exonic regions for RNA. Good laboratory practice, such as physical separation of the steps (especially of DNA and AAV production from library preparation) and proper disinfection, is of paramount importance to avoid erroneous amplification and library contamination.
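The arithmetic behind the DNA-level readout can be sketched as follows. The sketch assumes a single-copy reference gene (two copies per diploid genome), normalizes the barcode proportions in a tissue to those in the input library, and distributes the tissue-wide vg/dg across the variants accordingly. The function names and this particular normalization scheme are assumptions for illustration and may differ from the published pipeline.

```python
def vg_per_dg(vector_copies, reference_copies, reference_copies_per_dg=2):
    """Vector genomes per diploid genome from two copy-number measurements.

    Assumes a reference gene present at two copies per diploid genome.
    """
    return vector_copies / reference_copies * reference_copies_per_dg


def proportions(read_counts):
    """Convert barcode read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    return {variant: count / total for variant, count in read_counts.items()}


def normalize_to_input(tissue_props, input_props):
    """Correct tissue proportions for unequal representation in the input."""
    ratios = {v: tissue_props.get(v, 0.0) / input_props[v] for v in input_props}
    total = sum(ratios.values())
    return {v: r / total for v, r in ratios.items()}


def variant_vg_per_dg(normalized_props, tissue_vg_dg):
    """Distribute the tissue-wide vg/dg across variants."""
    return {v: p * tissue_vg_dg for v, p in normalized_props.items()}


if __name__ == "__main__":
    # Toy numbers; copy numbers would come from qPCR or dd-PCR measurements.
    tissue_vg_dg = vg_per_dg(vector_copies=5000, reference_copies=10000)
    input_props = proportions({"AAV2": 500, "AAV9": 500})
    tissue_props = proportions({"AAV2": 100, "AAV9": 300})
    normalized = normalize_to_input(tissue_props, input_props)
    print(variant_vg_per_dg(normalized, tissue_vg_dg))
```

The same structure could be reused for the RNA level by replacing the tissue-wide vg/dg with the transcript measurement chosen for normalization.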
Notably, the use of peptide display libraries and vector DNA/RNA barcoding to select for engineered capsids with novel tropisms or other clinically relevant properties represents only two examples of directed AAV capsid evolution technologies. They all have in common that they come with limitations and require further optimization to realize their full potential. For instance, the mere insertion of a peptide leaves a large proportion (~99%) of the underlying capsid sequence unchanged, whose properties, including the interaction with neutralizing anti-AAV antibodies, might need to be modified further prior to human application38. Furthermore, the actual diversity of peptide display or other libraries is typically lower than the theoretical one, due to, for example, technical limitations during cloning or bacterial transformation31. Generally, there is also an active debate about the translational relevance of directed evolution in animal or in vitro models, fostered by increasing evidence for a possible species- or even strain-specific performance of synthetic AAV capsids38. Nonetheless, there is substantial hope that many or all of these limitations will be overcome and that the current arsenal of technologies will be expanded to an even larger variety of disease models and become even more accessible to most research groups. In this respect, a particularly encouraging recent development is the use of barcoded AAV libraries such as those reported here to validate selected capsids no longer only on the organ level, but now also on the cellular level, which can be achieved with novel technologies such as single-cell (sc) RNA sequencing39. The protocols presented here will facilitate the wider establishment of AAV evolution techniques and thus accelerate the development of novel capsids tailored to the needs of a plethora of research groups and human patients.
The authors have nothing to disclose.
D.G. greatly appreciates support by the German Research Foundation (DFG) through the DFG Collaborative Research Centers SFB1129 (Projektnummer 240245660) and TRR179 (Projektnummer 272983813), as well as by the German Center for Infection Research (DZIF, BMBF; TTU-HIV 04.819).
Name | Company | Catalog Number | Comments |
Amplification primer | ELLA Biotech (Munich, Germany) | – | Second-strand synthesis of oligonucleotide insert |
Agilent DNA 1000 Reagents | Agilent Technologies (Santa Clara, CA, USA) | 5067-1504 | DNA fragment validation |
Agilent 2100 Bioanalyzer System | Agilent Technologies (Santa Clara, CA, USA) | G2938C | DNA fragment validation |
AllPrep DNA/RNA Mini Kit | Qiagen (Venlo, Netherlands) | 80204 | DNA/RNA extraction |
Agilent DNA 1000 Reagents | Agilent Technologies (Santa Clara, CA, USA) | 5067-1504 | NGS Library preparation |
Agilent 2100 Bioanalyzer System | Agilent Technologies (Santa Clara, CA, USA) | G2938C | NGS Library preparation |
BC-seq fw | IDT (San Jose, CA, USA) | ATCACTCTCGGCATGGACGAGC | NGS Library preparation |
BC-seq rv | IDT (San Jose, CA, USA) | GGCTGGCAACTAGAAGGCACA | NGS Library preparation |
β-Mercaptoethanol | Millipore Sigma (Burlington, MA, USA) | 44-420-3250ML | DNA/RNA extraction |
BglI | New England Biolabs (Ipswich, MA, USA) | R0143 | Digestion of double-stranded insert |
C1000 Touch Thermal Cycler | Bio-Rad (Hercules, CA, USA) | 1851196 | dd-PCR cycler |
dNTPs | New England Biolabs (Ipswich, MA, USA) | N0447S | NGS Library preparation |
ddPCR Supermix for probes (no dUTP) | Bio-Rad (Hercules, CA, USA) | 1863024 | dd-PCR supermix |
Droplet Generation Oil for Probes | Bio-Rad (Hercules, CA, USA) | 1863005 | dd-PCR droplet generation oil |
DG8 Cartridges for QX100 / QX200 Droplet Generator | Bio-Rad (Hercules, CA, USA) | 1864008 | dd-PCR droplet generation cartridge |
DG8 Cartridge Holder | Bio-Rad (Hercules, CA, USA) | 1863051 | dd-PCR cartridge holder |
Droplet Generator DG8 Gasket | Bio-Rad (Hercules, CA, USA) | 1863009 | dd-PCR cover for cartridge |
ddPCR Plates 96-Well, Semi-Skirted | Bio-Rad (Hercules, CA, USA) | 12001925 | dd-PCR 96-well plate |
E.cloni 10G SUPREME Electrocompetent Cells | Lucigen (Middleton, WI, USA) | 60081-1 | Electrocompetent cells |
Electroporation cuvettes, 1mm | Biozym Scientific (Oldendorf, Germany) | 748050 | Electroporation |
GAPDH primer/probe mix | Thermo Fisher Scientific (Waltham, MA, USA) | Mm00186825_cn | Taqman qPCR primer |
Genepulser Xcell | Bio-Rad (Hercules, CA, USA) | 1652660 | Electroporation |
High-Capacity cDNA Reverse Transcription Kit | Applied Biosystems (Waltham, MA, USA) | 4368814 | cDNA reverse transcription |
ITR_fw | IDT (San Jose, CA, USA) | GGAACCCCTAGTGATGGAGTT (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) | dd-PCR primer |
ITR_rv | IDT (San Jose, CA, USA) | CGGCCTCAGTGAGCGA (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) | dd-PCR primer |
ITR_probe | IDT (San Jose, CA, USA) | HEX-CACTCCCTCTCTGCGCGCTCG-BHQ1 (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) | dd-PCR probe |
Illumina NextSeq 500 system | Illumina Inc (San Diego, CA, USA) | SY-415-1001 | NGS Library sequencing |
KAPA HiFi HotStart ReadyMix (2X)* | Roche AG (Basel, Switzerland) | KK2600 07958919001 | NGS sample preparation |
MagnaBot 96 Magnetic Separation Device | Promega GmbH (Madison, WI, USA) | V8151 | Sample preparation for NGS library |
NanoDrop 2000 spectrophotometer | Thermo Fisher Scientific (Waltham, MA, USA) | ND-2000 | Digestion of double-stranded insert |
NGS_frw | Sigma-Aldrich (Burlington, MA, USA) | GTT CTG TAT CTA CCA ACC TC | NGS primer |
NGS_rev | Sigma-Aldrich (Burlington, MA, USA) | CGC CTT GTG TGT TGA CAT C | NGS primer |
NextSeq 500/550 High Output Kit (75 cycles) | Illumina Inc (San Diego, CA, USA) | FC-404-2005 | NGS Library sequencing |
Ovation Library System for Low Complexity Samples Kit | NuGEN Technologies, Inc. (San Carlos, CA, USA) | 9092-256 | NGS Library preparation |
PX1 Plate Sealer | Bio-Rad (Hercules, CA, USA) | 1814000 | dd-PCR plate sealer |
Pierceable Foil Heat Seal | Bio-Rad (Hercules, CA, USA) | 1814040 | dd-PCR sealing foil |
Phusion High-Fidelity DNA-Polymerase | Thermo Fisher Scientific (Waltham, MA, USA) | F530S | Second-strand synthesis of oligonucleotide insert |
PEI MAX – Transfection Grade Linear Polyethylenimine Hydrochloride (MW 40,000) | Polysciences, Inc. (Warrington, PA, USA) | 24765-1G | AAV library preparation |
ProNex Size-Selective Purification System | Promega GmbH (Madison, WI, USA) | NG2002 | Sample preparation for NGS library |
Phusion Hot Start II Polymerase | Thermo Fisher Scientific (Waltham, MA, USA) | F549L | NGS Library preparation |
Proteinase K | Roche AG (Basel, Switzerland) | 5963117103 | DNA/RNA extraction |
pRep2Cap2_PIS | Manufactured/prepared in the lab | – | ITR-Rep2Cap2-ITR vector. Peptide insertion site within the Cap2 ORF |
QX200 Droplet Generator | Bio-Rad (Hercules, CA, USA) | 1864002 | dd-PCR droplet generator |
QX200 Droplet Reader | Bio-Rad (Hercules, CA, USA) | 1864003 | dd-PCR droplet analysis |
QIAquick Nucleotide Removal Kit | Qiagen (Venlo, Netherlands) | 28306 | Second-strand synthesis of oligonucleotide insert purification |
QIAquick Gel Extraction Kit | Qiagen (Venlo, Netherlands) | 28704 | Plasmid vector purification |
QIAGEN Plasmid Maxi Kit | Qiagen (Venlo, Netherlands) | 12162 | Plasmid library DNA preparation |
QIAquick PCR Purification Kit | Qiagen (Venlo, Netherlands) | 28104 | Sample preparation for NGS library |
Qubit fluorometer | Invitrogen (Waltham, MA, USA) | Q32857 | NGS Library preparation |
Qubit dsDNA HS | Thermo Fisher Scientific (Waltham, MA, USA) | Q32851 | NGS Library preparation |
QuantiFast PCR Master Mix | Qiagen (Venlo, Netherlands) | 1044234 | Taqman qPCR |
rep_fw | IDT (San Jose, CA, USA) | AAGTCCTCGGCCCAGATAGAC | dd-PCR primer |
rep_rv | IDT (San Jose, CA, USA) | CAATCACGGCGCACATGT | dd-PCR primer |
rep_probe | IDT (San Jose, CA, USA) | FAM-TGATCGTCACCTCCAACA-BHQ1 | dd-PCR probe |
RNase-free DNase | Qiagen (Venlo, Netherlands) | 79254 | DNA/RNA extraction |
SfiI | New England Biolabs (Ipswich, MA, USA) | R0123 | Digestion of vector |
5 mm, steel Beads | Qiagen (Venlo, Netherlands) | 69989 | DNA/RNA extraction |
TRIMER-oligonucleotides | ELLA Biotech (Munich, Germany) | – | Degenerate oligonucleotide |
T4 Ligase | New England Biolabs (Ipswich, MA, USA) | M0202L | Plasmid library ligation |
TissueLyserLT | Qiagen (Venlo, Netherlands) | 85600 | DNA/RNA extraction |
YFP_fw | IDT (San Jose, CA, USA) | GAGCGCACCATCTTCTTCAAG | dd-PCR primer |
YFP_rv | IDT (San Jose, CA, USA) | TGTCGCCCTCGAACTTCAC | dd-PCR primer |
YFP_probe | IDT (San Jose, CA, USA) | FAM-ACGACGGCAACTACA-BHQ1 | dd-PCR probe |
Zymo DNA Clean & Concentrator-5 (Capped) | Zymo Research (Irvine, CA, USA) | D4013 | Vector and Ligation purification |