Isolation of Next-Generation Gene Therapy Vectors through Engineering, Barcoding, and Screening of Adeno-Associated Virus (AAV) Capsid Variants

Published: October 18, 2022 doi: 10.3791/64389


AAV peptide display library generation and subsequent validation through the barcoding of candidates with novel properties for the creation of next-generation AAVs.


Gene delivery vectors derived from Adeno-associated virus (AAV) are one of the most promising tools for the treatment of genetic diseases, evidenced by encouraging clinical data and the approval of several AAV gene therapies. Two major reasons for the success of AAV vectors are (i) the prior isolation of various naturally occurring viral serotypes with distinct properties, and (ii) the subsequent establishment of powerful technologies for their molecular engineering and repurposing in high throughput. Further boosting the potential of these techniques are recently implemented strategies for barcoding selected AAV capsids on the DNA and RNA level, permitting their comprehensive and parallel in vivo stratification in all major organs and cell types in a single animal. Here, we present a basic pipeline encompassing this set of complementary avenues, using AAV peptide display to represent the diverse arsenal of available capsid engineering technologies. Accordingly, we first describe the pivotal steps for the generation of an AAV peptide display library for the in vivo selection of candidates with desired properties, followed by a demonstration of how to barcode the most interesting capsid variants for secondary in vivo screening. Next, we exemplify the methodology for the creation of libraries for next-generation sequencing (NGS), including barcode amplification and adaptor ligation, before concluding with an overview of the most critical steps during NGS data analysis. As the protocols reported here are versatile and adaptable, researchers can easily harness them to enrich the optimal AAV capsid variants in their favorite disease model and for gene therapy applications.


Gene transfer therapy is the introduction of genetic material in cells to repair, replace, or alter the cellular genetic material to prevent, treat, cure, or ameliorate disease. Gene transfer, both in vivo and ex vivo, relies on different delivery systems, non-viral and viral. Viruses have evolved naturally to efficiently transduce their target cells and can be used as delivery vectors. Amongst the different types of viral vectors employed in gene therapy, adeno-associated viruses have been increasingly used, owing to their lack of pathogenicity, safety, low immunogenicity, and most importantly their ability to sustain long-term, non-integrating expression1,2,3. AAV gene therapy has yielded considerable achievements over the past decade; three therapies have been approved by the European Medicines Agency and the US Food and Drug Administration for use in humans3,4. Several clinical trials are also underway to treat a variety of diseases, such as hemophilia, muscular, cardiac, and neurological diseases, as reviewed elsewhere3. Despite decades of advancement, the field of gene therapy has experienced a series of setbacks in recent years4, most importantly deaths in clinical trials5 that have been put on hold due to dose-limiting toxicities, particularly for tissues that are massive, such as muscle, or difficult to reach, such as brain6.

The AAV vectors currently being used in clinical trials belong to the natural serotypes with a few exceptions1. AAV engineering offers the opportunity to develop vectors with superior organ- or cell-specificity and efficiency. In the past two decades, several approaches have been successfully applied, such as peptide display, loop-swap, capsid DNA shuffling, error-prone PCR, and targeted design, to generate individual AAV variants or libraries thereof with diverse properties7. These are then subjected to multiple rounds of directed evolution to select the variants within them with the desired properties, as reviewed elsewhere1,3. Of all the capsid evolution strategies, peptide display AAV libraries have been the most widely used, due to some unique properties: they are relatively easy to generate, they can achieve high diversity, and they are amenable to high-throughput sequencing, which allows their evolution to be tracked.

The first successful peptide insertion AAV libraries were described almost 20 years ago. In one of the first, Perabo et al.8 constructed a library of modified AAV2 capsids, in which a pool of randomly generated oligonucleotides was inserted in a plasmid at a position that corresponds to amino acid 587 of the VP1 capsid protein, in the three-fold axis protruding from the capsid. Using adenovirus co-infection, the AAV library was evolved through multiple rounds of selection, and the final re-targeted variants were shown to be capable of transducing cell lines refractory to transduction by the parental AAV2 serotype8. Shortly thereafter, Müller et al.9 introduced the two-step system for library production, a significant improvement to the protocol. Initially, the plasmid library, together with an adenoviral helper plasmid, is used to produce an AAV library that contains chimeric capsids. This AAV shuttle library is used to infect cells at low multiplicity of infection (MOI), with the aim of introducing one viral genome per cell. Co-infection with adenovirus ensures the production of AAVs with a matching genome and capsid9. About a decade later, Dalkara et al.10 used in vivo directed evolution to create the 7m8 variant. This variant has a 10 amino-acid insertion (LALGETTRPA), three of which act as linkers, and efficiently targets the outer retina after intravitreal injection10. This engineered capsid is an exceptional success story, as it is one of the few engineered capsids to make it to the clinic thus far11.

The field experienced a second boost with the introduction of next-generation sequencing (NGS) techniques. Two publications, from Adachi et al.12 in 2014 and from Marsic et al.13 in 2015, showcased the power of NGS to track the distribution of barcoded AAV capsid libraries with high accuracy. A few years later, the NGS of barcoded regions was adapted to the peptide insertion region to follow the capsid evolution. Körbelin et al.14 performed an NGS-guided screening to identify a pulmonary-targeted AAV2-based capsid. The NGS analysis helped calculate three rating scores: the enrichment score between selection rounds, the general specificity score to determine tissue specificity, and finally the combined score14. The Gradinaru lab15 published the Cre-recombination-based AAV targeted evolution (CREATE) system in the same year, which facilitates a cell-type-specific selection. In this system, the capsid library carries a Cre-invertible switch, as the polyA signal is flanked by two loxP sites. The AAV library is then injected into Cre mice, where the polyA signal is inverted only in Cre+ cells, providing the template for binding of a reverse PCR primer with the forward primer within the capsid gene. This highly specific PCR rescue enabled the identification of the AAV-PHP.B variant that can cross the blood-brain barrier15. This system was further evolved into M-CREATE (Multiplexed-CREATE), in which NGS and synthetic library generation were integrated in the pipeline16.

An improved RNA-based version of this system from the Maguire lab17, iTransduce, allows selection on the DNA level of capsids that functionally transduce cells and express their genomes. The viral genome of the peptide display library comprises a Cre gene under the control of a ubiquitous promoter and the capsid gene under the control of the p41 promoter. The library is injected in mice that have a loxP-STOP-loxP cassette upstream of tdTomato. Cells transduced with AAV variants that express the viral genome and therefore Cre express tdTomato and, in combination with cell markers, can be sorted and selected17. Similarly, Nonnenmacher et al.18 and Tabebordbar et al.19 placed the capsid gene library under the control of tissue-specific promoters. After injection in different animal models, viral RNA was used to isolate the capsid variants.

An alternative approach is to use barcoding to tag capsid libraries. The Björklund lab20 used this approach to barcode peptide insertion capsid libraries and developed the barcoded rational AAV vector evolution (BRAVE). In one plasmid, the Rep2Cap cassette is cloned next to an inverted terminal repeat (ITR)-flanked, yellow fluorescent protein (YFP)-expressing, barcode-tagged transgene. Using loxP sites between the end of cap and the beginning of the barcode, an in vitro Cre recombination generates a fragment small enough for NGS, thereby allowing the association of peptide insertion with the unique barcode (look-up table, LUT). AAV production is performed using the plasmid library and the barcodes expressed in the mRNA are screened after in vivo application, again with NGS20. When the capsid libraries comprise variants of the whole capsid gene (i.e., shuffled libraries), long-read sequencing needs to be used. Several groups have used barcodes to tag these diverse libraries, which enables NGS with higher read depth. The Kay lab21 tagged highly diverse capsid shuffled libraries with barcodes downstream of the cap polyA signal. In a first step, a barcoded plasmid library was generated, and the shuffled capsid gene library was cloned into it. Then a combination of MiSeq (short read, higher read depth) and PacBio (long read, lower read depth) NGS as well as Sanger sequencing was used to generate their LUT21. In 2019, Ogden and colleagues from the Church lab22 delineated the AAV2 capsid fitness for multiple functions using libraries that had single point mutations, insertions, and deletions in every position, which ultimately enabled machine-guided design. For the generation of the library, smaller fragments of the capsid gene were synthesized, tagged with a barcode, next-generation sequenced, and then cloned into the full capsid gene. The NGS data were used to generate a LUT. The library was then screened using just the barcodes and short read sequencing, which in turn allows higher read depth22.

Barcoded libraries have been predominantly used to screen a pool of known, natural, and engineered variants following several rounds of selection of capsid libraries or independent of a capsid evolution study. The advantage of such libraries is the opportunity to screen multiple capsids, whilst reducing animal numbers and minimizing variation between animals. The first studies that introduced this technology to the AAV field were published almost a decade ago. The Nakai lab12 tagged 191 double alanine mutants covering amino acids 356 to 736 on the VP1 from AAV9 with a pair of 12-nucleotide barcodes. Using NGS, the library was screened in vivo for galactose binding and other properties12. One year later, Marsic and colleagues delineated the biodistribution of AAV variants, also using a double-barcoded analysis13. A more recent study in non-human primates compared the biodistribution in the central nervous system of 29 capsids using different routes of delivery23. Our lab has recently published barcoded AAV library screens of 183 variants that included natural and engineered AAVs. These screens on the DNA and RNA level led to the identification of a highly myotropic AAV variant24 in mice as well as others displaying a high cell-type specificity in the mouse brain25.

Here, we describe the methodology used in this work and expand on it to include screening of AAV peptide display libraries. This comprises the generation of AAV2 peptide display libraries, a digital droplet PCR (dd-PCR) method for quantification, and finally an NGS pipeline to analyze the AAV variants, based in part on the work by Weinmann and colleagues24. Finally, a description of the generation of barcoded AAV libraries and the NGS pipeline used in the same publication is provided.


1. AAV2 random 7-mer peptide display library preparation

NOTE: For the preparation of an AAV2 random peptide display library, synthesize the degenerate oligonucleotides as single-stranded DNA, convert it to double-stranded DNA, digest, ligate to the acceptor plasmid, and electroporate.

  1. Design of degenerate oligonucleotides
    1. Order the degenerate oligonucleotides and avoid codon bias. In the oligonucleotide 5' CAGTCGGCCAG AG W GGC (X01)7 GCCCAGGCGGCTGACGAG 3', X01 corresponds to 20 codons, each encoding one of the 20 amino acids. The W can be A or T, producing the codons AGA or AGT, which encode the amino acids arginine (R) or serine (S).
    2. Order the amplification primer: 5' CTCGTCAGCCGCCTGG 3' (see Figure 1 for details). This produces the following protein insert: R/S G X7. The theoretical diversity is calculated as follows: 1 x 2 x 20^7 = 2.56 x 10^9 unique variants.
      NOTE: In practice, this diversity might be restricted by the transformation efficiency.
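
The diversity arithmetic above can be checked with a short calculation. This is only an illustrative sketch; the function and variable names are hypothetical and are not part of the published analysis scripts.

```python
# Illustrative helper (not from the protocol's scripts): multiply the number of
# possible residues at each position of the R/S-G-X7 insert.
def theoretical_diversity(options_per_position):
    total = 1
    for n_options in options_per_position:
        total *= n_options
    return total

# Position 1: R or S (2 options); position 2: fixed G (1 option);
# positions 3-9: any of the 20 amino acids encoded by the X01 codons.
insert_positions = [2, 1] + [20] * 7
diversity = theoretical_diversity(insert_positions)
print(f"Theoretical diversity: {diversity:.2e} unique variants")
```

The value obtained here is an upper bound; the number of independent transformants determined in step 1.5 limits how much of it is actually sampled.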
  2. Second-strand synthesis
    1. Resuspend both the oligonucleotides (degenerate oligonucleotides and amplification primer) to a 100 µM final concentration with TE buffer.
      1. For the PCR reaction, set up a 50 µL reaction with 1 µL of each primer, 10 µL of the buffer, 1.5 µL of DMSO, 0.5 µL of dNTPs (10 mM), 0.5 µL of Hi-fidelity Hot Start Polymerase II, and 35.5 µL of nuclease-free water.
      2. Transfer the reaction to a thermocycler and run a pre-incubation step for 10 s at 98 °C, followed by three cycles of 10 s at 98 °C, 30 s at 59 °C, and 10 s at 72 °C, then 5 min at 72 °C and a final cooling step.
    2. Purify the reaction using a nucleotide removal kit and elute in 100 µL of nuclease-free water.
    3. Confirm the efficiency of the second-strand synthesis by analysis on a Bioanalyzer (see Figure 2). Analyze the size and purity of the double-stranded insert by loading 1 µL of the reaction to a microfluidic chip from a DNA 1000 Reagents kit according to the manufacturer's instructions. This kit is optimized to measure the size and concentration of double-stranded DNA fragments from 25-1,000 bp.
  3. Digestion of insert and plasmid vector
    1. Digest 85 µL of the purified insert with 10 µL of 10x buffer and 5 µL of BglI enzyme in a final 100 µL reaction volume (see Figure 1 for details). Incubate at 37 °C overnight. Purify using a nucleotide removal kit, elute in 50 µL of nuclease-free water, and quantify using the type "Oligo DNA" in a spectrophotometer.
    2. Digest 10 µg of a replication-competent AAV plasmid (pRep2Cap2_PIS)26 (ITR-flanked viral genome) with 20 µL of 10x buffer and 10 µL of SfiI enzyme in a final 200 µL reaction volume (see Figure 1 for details). Incubate at 50 °C overnight. Purify the vector on a 1% agarose gel using the gel extraction kit followed by an additional purification step using a DNA purifying kit. Quantify the concentration in a spectrophotometer.
  4. Ligation of insert to vector
    1. Ligate 955 ng of plasmid vector with 45 ng of insert with 2 µL of buffer and 2 µL of ligase in a 20 µL ligation reaction. Incubate at 16 °C overnight, followed by 10 min at 70 °C to heat-inactivate the ligase.
  5. Transformation, complexity calculation, and plasmid library preparation
    1. Purify the reaction with a DNA purifying kit following the manufacturer's instructions. Elute the reaction in about 80% of the starting volume of nuclease-free water and store on ice for subsequent transformation.
    2. Transform electrocompetent cells: thaw one vial of electrocompetent cells on ice for 10 min. Then add 1-2 µL of the purified ligation reaction to 30 µL (one vial) of electrocompetent cells and mix by gently tapping. Next, carefully pipette the cell/DNA mixture to a pre-chilled 1 mm gap electroporation cuvette without introducing air bubbles.
    3. Electroporate using the following settings: 1800 V, 600 Ω, and 10 µF. Within 10 s of the electroporation pulse, add 970 µL of pre-warmed recovery media (provided with the electrocompetent cells) to the cuvette and mix by pipetting. Lastly, transfer the cells to a microcentrifuge tube and incubate for 1 h at 37 °C at 250 rpm. To achieve the desired diversity, perform 10-100 reactions, and after incubation, pool all reactions in one tube.
    4. Calculate the diversity by diluting 10 µL of the pooled transformations 10-, 100-, or 1,000-fold in PBS and spreading 100 µL on nutrient agar plates containing the appropriate antibiotic (75 mg/mL of ampicillin). Incubate the agar plates overnight at 37 °C and then count the colonies on the agar plates.
    5. Calculate the theoretical diversity as follows:
      Theoretical maximal diversity = 10 x dilution factor x number of colonies x number of electroporation reactions.
      NOTE: To confirm the library quality, sequence at least 20 colonies by Sanger sequencing. Most clones should contain an insert, and all should be unique.
    6. Inoculate 400-1,000 mL of LB medium containing the appropriate antibiotic with the rest of the pooled transformations and incubate overnight at 37 °C, 180 rpm.
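
As a worked example of the colony-count formula in step 1.5.5, the sketch below plugs in made-up numbers. The colony count, dilution, and number of electroporations are hypothetical, and the function name is an assumption introduced for illustration only.

```python
# Formula from step 1.5.5:
# diversity = 10 x dilution factor x number of colonies x number of electroporations
def estimated_library_diversity(colonies, dilution_factor, n_electroporations,
                                plating_factor=10):
    return plating_factor * dilution_factor * colonies * n_electroporations

# Hypothetical example: 150 colonies counted on the 1,000-fold dilution plate,
# 20 electroporation reactions pooled.
estimate = estimated_library_diversity(colonies=150, dilution_factor=1000,
                                       n_electroporations=20)

# Compare with the theoretical diversity of the R/S-G-X7 insert (2 x 20^7).
theoretical = 2 * 20 ** 7
print(f"Estimated transformants: {estimate:.2e}")
print(f"Fraction of theoretical diversity sampled: {estimate / theoretical:.4f}")
```

A comparison of this kind makes explicit how far the transformation efficiency restricts the diversity that the plasmid library can contain.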
  6. Preparation of plasmid library
    1. From the overnight culture, prepare a glycerol stock (mix equal volumes of bacterial culture and 50% glycerol solution in nuclease-free water and freeze at -80 °C) and purify the plasmid library using a plasmid maxi kit.
  7. Production of AAV viral library
    1. Prepare the viral library as previously described27. Transfect the plasmid library (pRep2Cap2_PI, peptide insert) together with an adeno-helper plasmid to HEK293T cells using a transfection reagent such as polyethylenimine (PEI).
    2. Collect the cells after 3 days and subject them to three cycles of freeze-thaw. Purify the viral lysate using cesium chloride gradient ultracentrifugation, followed by buffer exchange to PBS, and finally concentrate the viral particles.
  8. AAV vector titration using dd-PCR
    1. Serially dilute 2 µL of the AAV vector stock in 198 µL of nuclease-free water to yield a 1:10^6 final dilution. Mix thoroughly each time using a 200 µL pipette. Add one no-template control (NTC) as a negative control.
      NOTE: Additional lower or higher dilutions may be assayed (1:10^5 to 1:10^7).
    2. Prepare a 20x primer-probe mix. Add 3.6 µL of each of the 100 µM primers (forward and reverse, Rep2, and ITR), 1 µL each of the 100 µM dd-PCR probes (Rep2 and ITR), and 3.6 µL of nuclease-free water to a 1.5 mL centrifuge tube.
      NOTE: The AAV library is measured using a transgene-targeted primer-probe set (Rep2) detected with a FAM-labeled probe, and an ITR-targeted primer-probe set detected with a HEX-labeled probe.
    3. Prepare a 22 µL PCR reaction by adding 5.5 µL of sample, 1.1 µL of 20x primer-probe mix, 11 µL of dd-PCR supermix for probes (no dUTP), and 4.4 µL of nuclease-free water. This yields concentrations of 900 nM and 250 nM for the primers and the probe, respectively.
    4. Generate the droplets using a droplet generator, transfer the reaction to a 96-well plate, place the plate into a thermocycler, and run a denaturation step for 10 min at 94 °C, followed by 40 cycles of 30 s at 94 °C and 1 min at 58 °C. Next, heat-inactivate the polymerase for 10 min at 98 °C and add a final cooling step. Read the reactions in a droplet reader and proceed to the analysis28.
    5. Open the saved dd-PCR plate file using the analysis software. Use the threshold tool in the 1D Amplitude tab (fluorescence amplitude vs. event number) to separate the negative and positive droplets for each channel, using the NTC as a guide, and export the data to a csv file.
    6. To calculate the vector concentration, first calculate the correction factor CF using the formula:
      Equation 1
      CF determines the proportion of droplets positive for the transgene [Positives] that are positive for both, transgene and ITR [Ch1+ Ch2+], to ensure the detection of functional vector particles. The final vector concentration c can now be calculated using the following equation:
      Equation 2
      DF is the dilution factor (1:10^5 to 1:10^7 as determined earlier). The copies per 20 µL/well reaction correspond to 5 µL of the diluted sample. The factor 1,000 corrects the scale to VG/mL (viral genome/mL). An exemplary titration result is demonstrated in Table 1 and Figure 3.
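
The titer calculation described in step 1.8.6 can be sketched as follows. Equations 1 and 2 are not reproduced in this text, so the formulas in the code are an assumption reconstructed from the verbal definitions above (CF as the fraction of transgene-positive droplets that are also ITR-positive; copies per well referring to 5 µL of diluted sample; a factor of 1,000 to convert from per µL to per mL). The droplet counts and function names are hypothetical.

```python
# Assumed reconstruction of Equation 1: CF = [Ch1+ Ch2+] / [Positives]
def correction_factor(double_positive_droplets, transgene_positive_droplets):
    if transgene_positive_droplets <= 0:
        raise ValueError("No transgene-positive droplets; cannot compute CF.")
    return double_positive_droplets / transgene_positive_droplets

# Assumed reconstruction of Equation 2:
# c [VG/mL] = copies per 20 uL well x CF x DF / 5 uL x 1,000
def vector_concentration_vg_per_ml(copies_per_well, cf, dilution_factor,
                                   sample_volume_ul=5.0):
    return copies_per_well * cf * dilution_factor / sample_volume_ul * 1000

# Hypothetical read-out for a 1:10^6 dilution.
cf = correction_factor(double_positive_droplets=9000,
                       transgene_positive_droplets=10000)
titer = vector_concentration_vg_per_ml(copies_per_well=20000, cf=cf,
                                       dilution_factor=1e6)
print(f"CF = {cf:.2f}, titer = {titer:.2e} VG/mL")
```

The values exported from the analysis software to the csv file in step 1.8.5 would take the place of the hypothetical numbers used here.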
  9. Analysis of the AAV viral library by NGS
    1. Amplify the 96-nucleotide peptide insertion fragment by setting up a 20 µL PCR reaction using a proof-reading polymerase kit (2x; see Figure 4). Add 1 µL of AAV stock containing 1 x 10^8 vg, 0.5 µL of each 100 µM primer (NGS_forward and NGS_reverse), and 10 µL of the enzyme mix to the reaction. Adjust the final volume to 20 µL with nuclease-free water.
    2. Transfer the reaction to a thermocycler and run a denaturation step for 3 min at 98 °C, followed by 30-35 cycles of 10 s at 98 °C, 10 s at 59 °C, and 20 s at 72 °C, followed by 5 min at 72 °C and a final cooling step.
    3. Purify the samples using a PCR purification kit. Quantify the concentration in a spectrophotometer and run a 3% agarose gel to verify purity and fragment size.
    4. Process the PCR fragments using the library system for low complexity samples kit according to the manufacturer's instructions for the preparation of an NGS library. Perform the end repair reaction with 30 ng of PCR fragment, followed by adaptor ligation and PCR amplification for 10 cycles. Use the PCR purification kit for the purification of the reactions.
    5. Process the final products on a Bioanalyzer to verify the size and purity, using a DNA reagents kit according to the manufacturer's instructions.
    6. Quantify the amplicons using a fluorometer and pool them. Quantify the final pooled NGS library again on a fluorometer (according to the manufacturer's instructions) and verify the quality on a Bioanalyzer.
    7. Sequence the NGS libraries in a single-end (SE) mode, using a 75-cycle high output kit, with a read length of 84 and an index 1 of 8.
      NOTE: Sequencing of the examples in this article was performed at the GeneCore facility of EMBL Heidelberg (http://www.genecore.embl.de/).
    8. Analyze the NGS sequencing data with Python 3 and biopython. The files can be found at https://github.com/grimmlabs/AAV_GrimmLab_JoVE2022 (alternatively at https://doi.org/10.5281/zenodo.7032215). The NGS analysis is composed of two steps. 
      1. In the first step, search the sequence files for sequences that satisfy certain criteria (presence of recognition sequences flanking the insertion site; see Figure 4). This is done using a script (Script#1) and a configuration file that provides the information needed. Once the correct sequence is identified, the program extracts and stores the sequence in the output file, which is a txt file with the same name as the sequencing file.
      2. The second step is the analysis of the output files. The sequences in the library start with one of two six-nucleotide sequences (AGWGGC, W = A/T), which encode the first two residues of the nine amino-acid insert. Based on this start sequence, the peptide is translated. This generates the output files that contain the peptide variants (PVs).
      3. Prepare two folders: Script and Data. To the Data folder, copy the gzip-compressed files resulting from the sequencing. To the Script folder, copy the following files, Python file: Script#1_DetectionExtraction_JoVE_Py3.py; Python file: Script#2_PV_extraction_and_ranking_Py3.py; Configuration file: Barcode_Script_JoVE.conf; and Look-up table (LUT) file: Zuordnung.txt.
      4. Before running the scripts, edit the following files in the Script folder. Open the "Zuordnung.txt" file and add in two tab-separated columns, the names of the gzip files (column 1), and the desired final name (column 2; tab-separated values).
        NOTE: Sample txt files are found in the GitHub folder "PV_analysis_script". The files provided in the GitHub folder are prepared for the analysis of three sample data from the above library: xaa.txt.gz, xab.txt.gz, and xac.txt.gz. The output files are also provided.
      5. Change the following variables in the configuration file "Barcode_Script_JoVE.conf":
        my_dir = "~/Data/"
        filename_sample_file = "~/Script/Zuordnung.txt"
        The sequence-specific variables: BCV_size = 27, BCVleft = TCCAGGGCCAG, BCVright = GCCCAGG, BCVloc = 30, BCVmargin = 8, BCVleft_revcomp = GCCGCCTGGGC, BCVright_revcomp = CTGGCCC, and BCVloc_revcomp = 41 (see Figure 4 for details).
      6. Use the following command to call the variant sequence detection and extraction:
        >python3 ~/Script#1_DetectionExtraction_JoVE_Py3.py ~/Barcode_Script_JoVE.conf
        NOTE: The outputs are txt files with the extracted DNA sequences and their numbers of reads. The header of each file contains statistical data (i.e., the total number of reads and the extracted reads). These data are transferred to the next files. These txt files are the input for Script#2, in which the DNA sequences are translated, ranked, and analyzed.
      7. Perform PV extraction and analysis using the following command:
        >python3 ~/Script#2_PV_extraction_and_ranking_Py3.py ~/Barcode_Script_JoVE.conf
      8. Analyze the text output files of Script#2. The output files of Script#2 are named using the second column of the LUT in "Zuordnung.txt" with extensions based on the type of analysis.
        NOTE: Ensure that the three output files contain statistical data in the first rows ("# of Valid PV reads", "# of Invalid PV reads", and "# of unique PV reads"), a first column with the index of each DNA sequence from the input txt files (output of Script#1), and the following columns: (1) "…analyzed_all.csv": "Sample:" (DNA sequence), "#" (number of reads), "Frw or Rev" (forward or reverse read), and "PVs" (translated peptide sequence). The invalid sequences have "NA" and "not valid" in the last two columns. (2) "…analyzed_validSeq.csv": same as the previous file, filtered for valid sequences. (3) "…analyzed_PV.csv": "PVs" (translated peptide sequence), "#" (number of reads), and "count" (the frw and rev counts in the previous files are merged and the count is given 1 or 2).
      9. Visualize the output files using available software based on the user's needs.
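
The two-step logic of section 1.9.8 (detect the flanking sequences, extract the 27-nucleotide insert, then translate and rank the peptide variants) can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the authors' Script#1 or Script#2: it reuses the flank sequences and insert size from the configuration file, ignores the position and margin settings (BCVloc, BCVmargin), handles reverse reads by reverse-complementing them, and uses a hand-built standard codon table instead of biopython. All function names and the example reads are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

LEFT_FLANK = "TCCAGGGCCAG"   # BCVleft in the configuration file
RIGHT_FLANK = "GCCCAGG"      # BCVright in the configuration file
INSERT_SIZE = 27             # BCV_size: nine codons

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def extract_insert(read):
    """Return the 27-nt insert if both flanks are found, else None."""
    for candidate in (read, reverse_complement(read)):
        start = candidate.find(LEFT_FLANK)
        if start == -1:
            continue
        begin = start + len(LEFT_FLANK)
        end = begin + INSERT_SIZE
        if candidate[end:end + len(RIGHT_FLANK)] == RIGHT_FLANK:
            return candidate[begin:end]
    return None

def translate_insert(dna):
    """Translate inserts that start with AGWGGC (W = A/T); else return None."""
    if dna[:6] not in ("AGAGGC", "AGTGGC"):
        return None
    # Codons containing ambiguous bases (e.g., N) are translated as 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna), 3))

def rank_peptides(reads):
    """Count peptide variants over all reads, most frequent first."""
    counts = Counter()
    for read in reads:
        insert = extract_insert(read)
        peptide = translate_insert(insert) if insert else None
        if peptide:
            counts[peptide] += 1
    return counts.most_common()

# Hypothetical reads: one variant seen as forward and reverse read, one other
# variant, and one read without the flanking sequences.
forward = "ACGT" + LEFT_FLANK + "AGAGGCAAACTGGATACCTGGCATGTG" + RIGHT_FLANK + "CGGCT"
other = "ACGT" + LEFT_FLANK + "AGTGGC" + "GCT" * 7 + RIGHT_FLANK + "CGGCT"
reads = [forward, reverse_complement(forward), other, "ACGTACGTACGTACGT"]
ranking = rank_peptides(reads)
print(ranking)
```

For real data, the published scripts and configuration file should be used, since they also apply the positional checks and produce the statistics described in the NOTE above.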

2. AAV2 random 7-mer peptide display library selection

  1. Use the AAV library after quantification and quality control (section 1) for directed evolution in a model of choice to iteratively select for candidates with desired properties (see Figure 5)16,18,21.
    NOTE: These candidates are then used for the generation of a barcoded library as described below in section 3.

3. Barcoded AAV capsid library preparation and analysis

NOTE: Following the identification of a set of potentially specific and efficient AAV capsids in the peptide display screen, verify the functionality of the identified peptide sequences and compare them with a set of commonly used or well-described reference AAV capsid variants. To do this, the capsid sequence is inserted into a Rep/Cap helper construct without ITRs.

  1. Production of barcoded AAV library
    1. Perform recombinant AAV production for each capsid variant using the three-plasmid system, as previously described24.
      NOTE: To distinguish the different capsid variants, the ITR-flanked reporter transgene plasmid harbors a unique barcode of 15 nucleotides in length. The barcode is located at the 3' UTR (untranslated region) between the enhanced yellow fluorescent protein (EYFP) and the polyA signal (see Figure 6A). EYFP expression is driven by a strong ubiquitous cytomegalovirus (CMV) promoter that provides sufficient levels of RNA transcripts.
    2. Design barcodes of 15 nucleotides in length with homopolymers of less than three nucleotides and a GC content of <65%, as previously described29, and with a Hamming distance greater than four nucleotides24.
    3. Produce each capsid separately in combination with a transgene plasmid carrying a unique barcode. This way, each capsid variant is tagged with a distinct barcode that enables its specific tracking (see Figure 6B).
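
The barcode design rules in step 3.1.2 lend themselves to an automated check. The sketch below is one possible way to validate a candidate set; the function names and the example barcodes are hypothetical, and "homopolymers of less than three nucleotides" is interpreted here as no run of three or more identical bases.

```python
from itertools import combinations, groupby

def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    return max(len(list(run)) for _, run in groupby(seq))

def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

def hamming_distance(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def barcode_ok(seq, length=15, max_run=2, max_gc=0.65):
    return (len(seq) == length
            and set(seq) <= set("ACGT")
            and longest_homopolymer(seq) <= max_run
            and gc_content(seq) < max_gc)

def barcode_set_ok(barcodes, min_distance=5):
    """Every barcode passes the single-barcode rules and every pair differs
    at more than four positions."""
    return (all(barcode_ok(b) for b in barcodes)
            and all(hamming_distance(a, b) >= min_distance
                    for a, b in combinations(barcodes, 2)))

# Hypothetical candidates.
good_set = ["ACGTACGTACGTACG", "TGCATGCATGCATGC"]
print(barcode_set_ok(good_set))
print(barcode_ok("AAACGTACGTACGTA"))   # contains a run of three A
print(barcode_ok("GCGCGCGCGCGCATA"))   # GC content of 80%
```

A check of this kind is typically run over the full candidate list before the barcoded transgene plasmids are cloned.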
  2. AAV vector titration using dd-PCR
    1. Perform the AAV titration as previously described in section 1.8, by replacing the Rep2 primer pair with the YFP primer pair.
    2. Quantify the individual AAV productions and pool equal amounts of each production to generate the final barcoded library.
    3. Quantify the final library again to check the final concentration and quality (see Figure 7).
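
Pooling in step 3.2.2 amounts to a simple volume calculation once the individual titers are known. The sketch below assumes that "equal amounts" means equal numbers of vector genomes per capsid variant; the capsid names, titers, and target amount are hypothetical.

```python
def pooling_volumes_ul(titers_vg_per_ml, vg_per_variant):
    """Volume (in µL) of each production that contains vg_per_variant genomes."""
    return {name: vg_per_variant / titer * 1000
            for name, titer in titers_vg_per_ml.items()}

# Hypothetical dd-PCR titers (VG/mL) of three individually produced capsids.
titers = {"AAV2": 2e12, "AAV9": 5e12, "variant_X": 1e12}
volumes = pooling_volumes_ul(titers, vg_per_variant=1e10)

for name, volume in volumes.items():
    print(f"{name}: {volume:.1f} µL")
print(f"Total pool volume: {sum(volumes.values()):.1f} µL")
```

Re-titrating the pooled library, as described in step 3.2.3, then confirms the final concentration.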
  3. Barcoded AAV library in vivo application
    1. Apply the barcoded AAV library to the model system of choice (e.g., systemically in mice24).
    2. Collect ON- and OFF-target tissues (i.e., liver, lung, heart, diaphragm, smooth muscle, duodenum, pancreas, colon, biceps, ovaries, stomach, inner ear, kidney, abdominal aorta, thoracic aorta, brain, brown and white fat, and spleen) or cell types based on the experiment. Freeze them at -80 °C, extract the DNA/RNA, and apply NGS quantitation analysis, as described in the next section.
  4. DNA/RNA extraction
    1. Extract the DNA and RNA from the tissues of interest using the DNA/RNA Mini Kit.
    2. Place a small piece of the tissue of interest (1 mm^3, about 5 mg) in a 2 mL reaction tube.
    3. Add 350 µL of lysis buffer mixed with β-mercaptoethanol (1%) and 5 mm steel beads to the tissue (handle samples with β-mercaptoethanol under a fume hood).
    4. Homogenize the tissue in a tissueLyser for 45 s at 40 Hz.
    5. Add 10 µL of proteinase K (10 mg/mL) and incubate for 15 min at 55 °C while shaking at 400 rpm.
    6. Centrifuge at 20,000 x g for 3 min at room temperature, collect the supernatant, and proceed with the manufacturer's protocol of the DNA/RNA Kit.
    7. Split the washing step into two steps with 350 µL of wash buffer in each step. In between these washing steps, digest the remnant DNA on the column with RNase-free DNase I. Add 80 µL of the DNase I solution, prepared according to the manufacturer's instructions, onto the column and incubate at room temperature for 15 min.
    8. Elute RNA/DNA from the column with nuclease-free water. Store the isolated RNA at -80 °C and the gDNA at -20 °C.
  5. cDNA synthesis
    1. Subject the RNA samples to another round of DNase I treatment of 15-30 min (for complete removal of contaminating DNA from the RNA samples) before the reverse-transcription reaction. Add 1 µL of the DNase I solution, 4 µL of buffer (provided with the kit), and nuclease-free water to a final volume of 40 µL to 212 ng of RNA. Incubate for 30 min at room temperature and heat inactivate at 70 °C for 10 min.
    2. Synthesize cDNA from 150 ng of RNA using a kit according to the manufacturer's instructions. Include controls without the reverse transcriptase to ensure the absence of contaminating viral DNA from the sample. Store the cDNA at -20 °C.
      NOTE: The amount of input RNA for optimal reverse transcription can vary depending on the tissue type and the expected transduction efficiency in the respective tissue.
  6. Analysis of AAV viral library (in vivo) by NGS
    1. To achieve high sequencing depth at low cost, perform NGS via Illumina sequencing as previously described (section 1.9). Amplify the barcode sequence, and then ligate the sequencing adaptors to the amplicon.
    2. Due to the short-read length and the ligation of the sequencing adapters on both sides of the amplicon, when designing, check that the amplicon is sufficiently small to ensure presence of the barcode sequence within the NGS read. For the sequencing of the barcodes within the viral genomes and the viral transcripts, the PCR amplicon is designed to be 113 bp long (see Figure 8).
    3. Amplify the barcoded region with the primers BC-seq forward and BC-seq reverse. Prepare the following PCR reaction: 0.5 µL of Hi-fidelity DNA polymerase, 10 µL of 5x buffer, 0.25 µL of each 100 µM primer (BC-seq fw/BC-seq rv), and 1 µL of 10 mM dNTPs. Use 25 ng of the cDNA or DNA/reaction as a template and adjust the final volume to 50 µL with nuclease-free water.
    4. Prepare the PCR master-mix under a clean PCR hood to avoid contamination. Use the following cycling conditions: 30 s at 98 °C, followed by 40 cycles at 98 °C for 10 s and 72 °C for 20 s, and a final 5 min step at 72 °C.
    5. Include PCR controls to confirm the absence of contaminating DNA in the PCR master-mix. For the cDNA samples, include the controls without reverse transcriptase. Finally, include a sample with the AAV input library; its read counts are used to generate the Normalization_Variant.txt file used in the analysis.
    6. Verify the size of the PCR fragment of each sample by gel electrophoresis before PCR purification. The latter is achieved by using either commercially available magnetic beads or column-based DNA purification systems (see Table of Materials).
    7. Prepare the NGS library using the library system for low complexity samples according to the manufacturer's instructions, as previously described in section 1.9.
    8. Determine the DNA concentration via the dsDNA HS Kit and analyze the quality of the library as previously described (section 1.9.6), followed by pooling. Quantify the pooled library on a fluorometer and assess the quality on a Bioanalyzer.
    9. Perform NGS sequencing as discussed in section 1.9.7.
    10. Quantify the copy number of the transgene (viral genomes) and the housekeeping gene by qPCR to assess the distribution of the pooled library between tissues or organs on the DNA level.
    11. Set up a 30 µL qPCR reaction as follows, to determine the copy number of EYFP (transgene) and GAPDH (glyceraldehyde 3-phosphate dehydrogenase, housekeeping gene):
      1. Prepare a 60x primer/probe mix for EYFP (1.5 µM YFP_fw, 1.5 µM YFP_rv, and 0.6 µM YFP_probe; see Table of Materials). Use the GAPDH primer/probe mix (see Table of Materials) to determine the copy number of the housekeeping gene. Set up the reaction on ice.
      2. Prepare a PCR master mix (15 µL, see Table of Materials) and add 60x primer/probe mix (0.5 µL) for all samples and standards (to calculate copy numbers for the standards, use the following link: http://cels.uri.edu/gsc/cndna.html). Set up the reaction on ice.
      3. Transfer 15.5 µL of the master mix into a 96-well plate and add 14.5 µL of sample (75 ng of total DNA) or standard to the respective well. Seal the 96-well plate with foil, vortex, and spin briefly.
      4. Transfer 10 µL of each sample into a 384-well plate in duplicates. Seal the plate with foil and spin at 800 x g for 5 min at 4 °C.
      5. Incubate the reaction mix in a thermocycler using an initial temperature of 50 °C for 2 min, followed by an initial activation step of 10 min at 95 °C. Perform 40 cycles of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 1 min, as previously described24.
      6. To obtain the number of diploid genomes (dg), use the GAPDH copy number and divide by two. Then, take the value of the EYFP copy number and divide by the number of dg, resulting in vector genomes per diploid genome (vg/dg). Use this value to generate the Normalization_Organ.txt file for the bioinformatic analysis.
    12. Perform the analysis of the NGS sequencing data like Weinmann et al.24, using custom code in Python3 (https://github.com/grimmlabs/AAV_GrimmLab_JoVE2022). The workflow comprises the detection of barcode sequences guided by flanking sequences, their length and location (Script#1_BarcodeDetection.py), as well as analysis of barcode enrichment and distribution over the set of tissues (Script#2_BarcodeAnalysis.py).
      1. Detect barcodes and assign them to AAV variants. Place the sequencing data as archived fastq files in one directory (e.g., "Data_to_analyze"). The sequencing data file for the input library is included in this directory and used only to calculate the capsid proportions in the input library.
      2. Before executing the script, create two tab-delimited text files: the capsid variants file (see example file "Variants.txt") with the barcode sequences assigned to AAV capsid variant names, and the contamination file (see "Contaminations.txt") with barcode sequences which come from possible contamination (other barcodes available in the lab, contributing to contamination).
      3. Finally, edit the configuration file "Barcode_Script.conf" to include the following information: path to the folder with the sequencing data (e.g., "Data_to_analyze"), sequences of the flanking regions of the barcodes, their position, and the window size for barcode detection (see Figure 8).
      4. Use the following command to call for barcode detection with provided paths to Script#1_BarcodeDetection.py and configuration files:
        >python3 ~/Script#1_BarcodeDetection.py ~/Barcode_Script.conf
        NOTE: The output of Script#1_BarcodeDetection.py execution is text files with read counts per capsid variant as well as the total number of reads recovered from the raw data.
      5. Evaluate the distribution of barcoded AAV capsids among tissues or organs, by executing Script#2_BarcodeAnalysis.py together with the following txt files:
        1. In the "Zuordnung.txt" file, assign each txt file obtained from the barcode detection run to a tissue/organ name: list the names of the txt files in the first column and the corresponding tissue/organ names in the second column, separated by tabs.
          NOTE: For an example, check in the "Example" folder (https://github.com/grimmlabs/AAV_GrimmLab_JoVE2022). Of note, the tissue/organ name can include characters defining cDNA or gDNA measurement and biological replicate number (M1, M2, etc.).
        2. Create an "organs.txt" text file with the list of names for ON- and OFF-target organs, which correspond to the names given in the assignment "Zuordnung.txt" file (see "Example" folder: https://github.com/grimmlabs/AAV_GrimmLab_JoVE2022).
        3. Create "Normalization_Organ.txt" and "Normalization_Variant.txt" tab-delimited text files with normalized values for all capsid variants and all organs/tissues. In the first column of the "Normalization_Organ.txt" file, write the names given for each organ (as in the assignment file "Zuordnung.txt") and in the second column the normalization values for the corresponding tissues, generated in section 3.6.11.
        4. Fill the first column of the "Normalization_Variant.txt" file with the list of capsid names and the second column with the normalized values of the read counts for each capsid in the pooled library (normalization can be calculated based on the txt output file for the input library resulting from the first script).
        5. Edit the configuration file by specifying the full paths to all additional files mentioned above. Execute Script#2_BarcodeAnalysis.py as:
          >python3 ~/Script#2_BarcodeAnalysis.py ~/Barcode_Script.conf
          NOTE: The barcode analysis script outputs several files: text files with relative concentration (RC) values of capsid distribution within different tissues based on multiple normalization steps described earlier, and the spreadsheet file which combines text file data into merged matrix data. The latter can be used for cluster analysis and visualization.
        6. Visualize the data and perform cluster analysis of the matrix data in order to distinguish capsid properties and evaluate their similarities based on RC profiles across tissues. Use the additional script PCA_heatmap_plot.R placed in the repository:
          >Rscript --vanilla ~/PCA.R ~/relativeconcentration.xls
          NOTE: The script takes the relativeconcentration.xls file as input and generates two plots: a hierarchical cluster heatmap and a principal component analysis (PCA).
        7. To modify plots (axes of heatmap, principal components of PCA) or png parameters (color, size, labeling), open the R script and follow the instructions provided in the commented sections.
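
The values for the "Normalization_Variant.txt" file can be derived from the input-library read counts along the following lines. This is a minimal sketch, not the repository code: it assumes that the "normalized values" are each capsid's share of the total input-library reads, and the capsid names and counts are hypothetical. The exact layout expected by Script#2_BarcodeAnalysis.py should be checked against the example files in the repository.

```python
def variant_proportions(read_counts):
    """Return each capsid's share of the total input-library reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("input library has no assigned reads")
    return {name: count / total for name, count in read_counts.items()}


def to_tab_delimited(values):
    """Format name/value pairs as tab-delimited lines (name, then value)."""
    return "\n".join(f"{name}\t{value:.6f}" for name, value in values.items())


# Hypothetical read counts for three capsids in the input library.
input_counts = {"Var1": 1200, "Var2": 900, "Var3": 900}
proportions = variant_proportions(input_counts)
print(to_tab_delimited(proportions))
```

The printed lines follow the two-column layout described above (capsid name, then value) and can be pasted into a text file.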


Representative Results

Generation of an AAV2 peptide display library. As a first step toward the selection of engineered AAVs, the generation of a plasmid library is described. The peptide insert is produced by using degenerate primers. Reducing the combination of codons in those from 64 to 20 has the advantages of eliminating stop codons and facilitating NGS analysis, by reducing library diversity on the DNA but not the protein level. The oligonucleotide insert is purchased as single-stranded DNA (Figure 1), which is converted to double-stranded DNA using a PCR reaction. The quality of this reaction is controlled in a Bioanalyzer. As shown in Figure 2, three cycles produced a stronger band, compared to 10 or 30 cycles. The insert is then digested with BglI to generate three-nucleotide overhangs. The nucleotide next to the overhang sequence of the double-stranded insert can be an A or a T (W is the ambiguity code for A or T), which is in the third position of a codon that encodes arginine (R) or serine (S). The vector (pRep2Cap2_PIS) has a frameshift mutation in the peptide insertion site that prevents production of the capsid in the absence of an insert, due to the creation of a stop codon shortly after the insertion site. SfiI digestion of this plasmid generates three-nucleotide overhangs that match the overhangs generated in the peptide-encoding oligonucleotide insert. Ligation needs to be performed under optimal conditions to maximize the complexity of the plasmid library. To this end, for the transformation, electroporation was performed using commercially available bacteria with high efficiency.

The diversity of the plasmid library is calculated based on the colony count, which is typically around 1 x 10^8 for this type of library. The total number of colonies corresponds to the maximum potential diversity of the library in the NGS analysis, as discussed later. The plasmid library is then used to generate the AAV library, which is not described in detail here but elsewhere27.

The quantification of this library was performed using dd-PCR. Typically, two regions are quantified: the AAV2 rep gene within the viral genome and the ITR (see Figure 3 and Table 1). As shown in Table 1, 99.2% of the droplets positive for the viral genome are also positive for the ITR (Ch1+ Ch2+), which serves as a quality control for the AAV library and suggests that the AAV capsids contain full viral genomes. To obtain a concentration in vg/mL, the proportion of double-positive droplets among the positive droplets is calculated (CF in Table 1) and used to obtain the corrected number of copies, which is then multiplied by the dilution factor.
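
The titer calculation can be sketched as follows. The function name and the sample volume of 5 µL per 20 µL well are assumptions: the volume is not stated in this section and was chosen because it reproduces the VG/mL column of Table 1.

```python
def ddpcr_titer(copies_per_well, positives, double_positives,
                dilution_factor, sample_volume_ul=5.0):
    """Return (correction factor, corrected copies, vg/mL) for one dd-PCR well.

    sample_volume_ul is an assumption (5 µL of diluted sample per 20 µL well).
    """
    # Share of positive droplets that also carry the ITR signal.
    correction = double_positives / positives if positives else 0.0
    corrected_copies = copies_per_well * correction
    # Copies per µL of diluted sample -> per mL -> undiluted stock.
    vg_per_ml = corrected_copies / sample_volume_ul * 1000 * dilution_factor
    return correction, corrected_copies, vg_per_ml


# AAV2lib row of Table 1.
cf, corrected, titer = ddpcr_titer(90600, 16396, 16266, 1e6)
print(f"CF = {cf:.2f}, corrected copies = {corrected:.0f}, titer = {titer:.2e} vg/mL")
```

With the AAV2lib values, this gives a CF of 0.99, 89882 corrected copies, and 1.80e+13 vg/mL, in line with Table 1.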

The quality of the AAV library is then assessed by NGS analysis, starting with a PCR using appropriate primers. Next, the PCR product is processed using a commercially available kit, which adds index-containing adaptors to the PCR product. The NGS products are sequenced, and the files are analyzed using Python. Three sample data sets from an AAV2 peptide display library are provided. Each sequence in the Script#1 input file (list of sequences of all the PCR copies of the amplified DNA segment in the sample) is bioinformatically searched for the BCVleft and BCVright sequences, or the BCVleft_comp and BCVright_comp sequences. If either combination is identified, the contained sequence is extracted and added to the output file (see Figure 4). The output of both scripts provides statistical data regarding the NGS library preparation. In all three sets, the extracted reads, based on signature sequences specific to the library, represented about 94% of the total reads, which suggests a good quality. The output of Script#2 provides further statistical data, and the translation of the extracted DNA sequence yields additional quality control data. The "# of Invalid PV reads" (i.e., sequences that lack the six nucleotides used to initiate the in silico translation and encode residues RG or SG) is less than 1% of the recovered reads, which confirms a good sequencing quality. The outputs of the second script (i.e., the translation and ranking of the extracted DNA sequences) provide additional information, such as the number of reads per peptide variant or the number of DNA sequences giving rise to each peptide variant. Of the files, those that end in "analyzed_PVs" contain only valid DNA reads, and the analysis is performed on the peptide sequence level. Of the valid reads, more than 99% are unique, which suggests that the library is balanced and the internal diversity is high.
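
The flank-guided extraction described above can be illustrated with a short sketch. It is not the code of Script#1: the flanking sequences below are placeholders rather than the real BCVleft/BCVright sequences, the complementary strand is handled by reverse-complementing the read (an assumption), and the position and window settings of the real configuration file are ignored.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_between(read, left, right):
    """Return the sequence between the left and right flanks, or None."""
    start = read.find(left)
    if start == -1:
        return None
    start += len(left)
    end = read.find(right, start)
    if end == -1:
        return None
    return read[start:end]


def extract_insert(read, left, right):
    """Search the read as given, then its reverse complement."""
    insert = extract_between(read, left, right)
    if insert is None:
        insert = extract_between(reverse_complement(read), left, right)
    return insert


# Placeholder flanks and insert, NOT the sequences used in the protocol.
LEFT, RIGHT = "CAGAGT", "GCCCAA"
forward_read = "TT" + LEFT + "AGGGGTACTGCT" + RIGHT + "AA"
print(extract_insert(forward_read, LEFT, RIGHT))
print(extract_insert(reverse_complement(forward_read), LEFT, RIGHT))
```

Both calls recover the same insert, regardless of which strand was sequenced. Reads in which neither orientation contains both flanks return None and would count as unrecovered.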

Selection of an AAV2 peptide display library. This library can then be used for in vivo or in vitro selection. This is not included in this protocol, but an outline is provided in Figure 5. Briefly, for in vivo selection, the tissues are collected 1 week following systemic injection of 1 x 10^12 vg/mouse and DNA is isolated. A rescue PCR on a larger fragment of the cap gene is performed, and the product is cloned into the plasmid vector using different restriction sites but a similar protocol to what is described here, and round-1 AAV libraries are prepared. For the NGS analysis, the PCR is performed on the isolated DNA or the AAV libraries. After selection, the percentage of unique PVs in the library typically decreases according to the pressure of selection. Depending on the project, the selection can be concluded once enough dominant PVs are identified by NGS.

Barcoded library selection and analysis. The data from a barcoded AAV library comprising 82 capsids have been described previously24. The dd-PCR quantification of the AAV particles (see Figure 7 and Table 1) shows that 94% of the capsids that are positive for the transgene also contain an ITR, indicative of a complete genome. This is lower than for the wild-type library described above, but still suggests a good quality for recombinant AAVs, considering that they usually package less efficiently. After injection of the pooled library in animals, the tissues are collected, and DNA and RNA are isolated. PCR for NGS as well as NGS library preparation and sequencing are then performed as described above and previously24. To calculate vg/dg, the qPCR is performed; in the sample data provided, the values for most tissues range from 0.1-4, whereas the liver, as is typical for systemic AAV delivery, has over 10 vg/dg.
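
As a minimal sketch of the vg/dg calculation described in the protocol (GAPDH copies divided by two give the diploid genomes, and EYFP copies are divided by that number), the function name and the copy numbers below are hypothetical:

```python
def vector_genomes_per_diploid_genome(eyfp_copies, gapdh_copies):
    """vg/dg = EYFP copies / (GAPDH copies / 2)."""
    diploid_genomes = gapdh_copies / 2
    if diploid_genomes <= 0:
        raise ValueError("GAPDH copy number must be positive")
    return eyfp_copies / diploid_genomes


# Hypothetical qPCR copy numbers for one tissue sample.
print(vector_genomes_per_diploid_genome(eyfp_copies=30000, gapdh_copies=20000))
```

One such value per tissue goes into the second column of the "Normalization_Organ.txt" file.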

As part of the analysis pipeline, normalization steps are performed on different levels. The biodistribution of the pooled library members is analyzed by NGS on the DNA and RNA level. In the input pooled library, the capsid variants are typically not equally represented. Therefore, the NGS analysis of the input library is used to generate a normalization file, which corrects the abundance of each capsid variant in the final tissue/organ based on the abundance of this capsid variant in the input library. This normalization to the input library yields the quotient (P*αβ) of the proportion of variant α within tissue/organ β (Pαβ) to the proportion of this variant in the input library (Lα). These calculations can be found in the "variant_comparison.txt" file. This quotient is then multiplied by the vg/dg values from the "Normalization_Organ.txt" file to yield the value Bαβ, and the proportions of the Bαβ values are calculated for a single tissue or for all tissues. The proportion of the Bαβ value for each variant within one tissue (Vαβ) reflects the dissemination of this variant within this tissue ("organ_comparison.txt"). In contrast, the proportion of the Bαβ value for each variant in all tissues (Tαβ) indicates the dissemination of this variant within the whole body ("relativeconcentration.xls"). These two proportions reflect the intra- and inter-tissue biodistribution of each variant. All these files can be used for different visualizations of capsid efficiency and specificity24. As an example, using the final table (found in "relativeconcentration.xls"), the principal component analysis and hierarchical clustering are presented in Figure 9.
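
The normalization chain can be sketched in a few lines. This is one reading of the description above, not the code of Script#2_BarcodeAnalysis.py: V is taken as the share of each variant's B value within one tissue, and T as the share of each tissue's B value for one variant. All names and numbers are hypothetical, and every tissue is assumed to contain reads for every variant.

```python
def relative_concentrations(read_counts, library_proportion, vg_per_dg):
    """Compute B, V and T from per-tissue read counts.

    read_counts[tissue][variant] -> NGS read count
    library_proportion[variant]  -> proportion in the input library (L)
    vg_per_dg[tissue]            -> vector genomes per diploid genome
    """
    b = {}
    for tissue, counts in read_counts.items():
        total = sum(counts.values())
        b[tissue] = {}
        for variant, count in counts.items():
            p = count / total                         # P: proportion within the tissue
            p_star = p / library_proportion[variant]  # P*: corrected for input abundance
            b[tissue][variant] = p_star * vg_per_dg[tissue]  # B: scaled by vg/dg

    v, t = {}, {}
    for tissue, values in b.items():
        tissue_sum = sum(values.values())
        # V: share of each variant within this tissue.
        v[tissue] = {variant: x / tissue_sum for variant, x in values.items()}
        # T: share of this tissue in the variant's total across all tissues.
        t[tissue] = {}
        for variant, x in values.items():
            variant_sum = sum(b[other][variant] for other in b)
            t[tissue][variant] = x / variant_sum
    return b, v, t


# Hypothetical data: two tissues, two equally represented variants.
counts = {"Li": {"VarA": 900, "VarB": 100}, "H": {"VarA": 200, "VarB": 800}}
library = {"VarA": 0.5, "VarB": 0.5}
vgdg = {"Li": 10.0, "H": 1.0}
b, v, t = relative_concentrations(counts, library, vgdg)
print(v["Li"], t["Li"])
```

In this toy example, VarB is the minority variant within the liver (V) but still has more than half of its total signal there (T), because the liver carries ten times more vector genomes per cell than the heart.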

The normalization of the pooled library from the NGS sequencing shows that each capsid has an average proportion of 0.012, which matches the theoretical proportion of each of the 82 capsids (1/82) and suggests a well-balanced pooled library. The file "relativeconcentration.xls" generated by the bioinformatic pipeline reflects the inter-tissue capsid biodistribution, as illustrated in Figure 9. The heatmap shows relative concentration values on a log2 scale for each capsid of the pooled library, hierarchically clustered according to the tissue biodistribution profile. The principal component analysis makes it possible to distinguish clusters of AAV capsid variants with similar biodistribution properties and also highlights the outlying capsids with unique patterns of inter-tissue biodistribution. The two major branches of the heatmap hierarchy reflect the difference in the transduction efficiency of capsid variants. The left branch, which contains the majority of the capsid variants, includes all the capsids that show a high value of relative concentration across the majority of tissues. Apart from strikingly high liver specificity, three other capsids (Var60, Var13, and Var63) exhibited specificity in the diaphragm (Di), skeletal muscle (SM), biceps (BIC), and in the brain (B). The right branch of the hierarchical clustering includes the capsid variants with overall lower transduction efficiency, which is pronounced in the duodenum (Du) and pancreas (P). The PCA for the original subset forms the cluster of capsid variants with high liver specificity (Var 64, 78, 65, 55, 56) and singles out the Var60 capsid with outstanding muscle tropism.

Figure 1
Figure 1: Overview of the cloning strategy for the AAV2 random 7-mer peptide display library.
The oligonucleotide with the random 7-mer peptide insertion sequence is flanked by sequences that contain the BglI digestion site and a binding site for the amplification reaction. The vector pRep2Cap2_PIS contains SfiI sites. The overhangs generated by the BglI and SfiI digestion are complementary. Please click here to view a larger version of this figure.

Figure 2
Figure 2: Bioanalyzer quality control of oligonucleotide second-strand synthesis.
Second-strand synthesis of degenerate oligonucleotides is confirmed by analysis on a Bioanalyzer. PCR reactions with three, 10, and 30 cycles of amplification are compared, showing that three cycles of amplification are the most efficient. (A) Bioanalyzer data represented as gel image. (B-D) Plotted fragment lengths (in bp, x axis) versus fluorescence units (FU, y axis), compared to standard peaks, visible at 15 and 1500 bp. Red arrows indicate double-stranded oligonucleotides. Note that the highest FU value, representing the highest DNA concentration, of double-stranded oligonucleotides is observed after three cycles of amplification (red arrows). Please click here to view a larger version of this figure.

Figure 3
Figure 3: Titration of an AAV2 peptide display library using dd-PCR.
(A) Detection of rep2-positive droplets in channel 1 (FAM, Channel 1) for a non-template water control and a 1:10^6 diluted virus sample. (B) Detection of ITR-positive droplets in channel 2 (HEX, Channel 2). (C) Detection of droplets that are positive for both rep2 and ITR (highlighted in orange). Purple lines indicate thresholds for the detection of positive versus negative droplets. Please click here to view a larger version of this figure.

Figure 4
Figure 4: Overview of the DNA fragment used for NGS and settings for the python analysis.
The NGS PCR amplifies a 96-nucleotide region. The PCR fragment is used to generate the NGS library. For the bioinformatic analysis, recognition sequences right and left of the insertion site need to be given for both strands, as well as the distance from the beginning of the DNA fragment. Please click here to view a larger version of this figure.

Figure 5
Figure 5: Iterative selection of AAV libraries in vivo.
Mice are injected with the AAV library. Target ON- and OFF-tissues are collected 1 week later and subjected to NGS and analysis. The ON-target tissue is used to rescue the capsid gene, which is cloned into the parent vector. The selected AAV library is produced and used to repeat the aforementioned selection cycle. This figure was created with BioRender.com. Please click here to view a larger version of this figure.

Figure 6
Figure 6: Overview of the barcoded AAV library generation.
(A) Graphic representation of a self-complementary AAV genome bearing a CMV promoter-driven eyfp transgene flanked by ITRs. The 3' UTR contains a 15 nucleotide-long barcode (BC) located between the eyfp and the bovine growth hormone (BGH) polyadenylation signal. The BC enables capsid tracing at the DNA and mRNA level. (B) During AAV production, a unique barcoded genome is packaged by a sole variant of the cap gene, facilitating capsid identification. Please click here to view a larger version of this figure.

Figure 7
Figure 7: Titration of a barcoded AAV library using dd-PCR.
(A) Detection of YFP-positive droplets in channel 1 (FAM, Channel 1) for a non-template water control and a 1:10^6 diluted vector sample. (B) Detection of ITR-positive droplets in channel 2 (HEX, Channel 2). (C) Detection of droplets that are positive for both YFP and ITR (highlighted in orange). Purple lines indicate thresholds for the detection of positive versus negative droplets. Please click here to view a larger version of this figure.

Figure 8
Figure 8: Overview of the DNA fragment used for NGS and settings for the Python analysis.
The NGS PCR amplifies a 113-nucleotide region. For the bioinformatic analysis, recognition sequences right and left of the barcode need to be given for both strands, as well as the distance from the beginning of the DNA fragment. Please click here to view a larger version of this figure.

Figure 9
Figure 9: Principal component analysis (PCA) and hierarchical cluster analysis.
(A) PCA of relative concentration values for 82 capsids across all tissues allows the definition of clusters of capsid variants with similar properties and of variants with unique transduction patterns. (B) To better separate the highly populated cluster, the records of outlying unique variants were excluded from the matrix and the PCA was repeated. (C) Hierarchical cluster analysis allows visual evaluation of variant transduction profiles across tissues as a heatmap plot (Li = liver, Lu = lung, FatB = brown fat, H = heart, Di = diaphragm, SM = smooth muscle, Du = duodenum, P = pancreas, C = colon, BIC = biceps, O = ovaries, St = stomach, I = inner ear, K = kidney, Aa = abdominal aorta, At = thoracic aorta, B = brain, FatW = white fat, and S = spleen). Please click here to view a larger version of this figure.

Sample | Target | Copies/20 µL well | Positives | Ch1+ Ch2+ | CF | Copies corrected | DF | VG/mL
H2O | rep2 | 28 | 2 | 0 | 0 | 0 | 1.00E+06 | 5.60E+09
AAV2lib | rep2 | 90600 | 16396 | 16266 | 0.99 | 89882 | 1.00E+06 | 1.80E+13
H2O | YFP | 4 | 3 | 2 | 0.67 | 3 | 1.00E+06 | 8.00E+08
BCAAVlib | YFP | 34680 | 13229 | 12452 | 0.94 | 32643 | 1.00E+06 | 6.53E+12

Table 1: Titration results of an AAV2 peptide display library ("AAV2lib") and the barcoded EYFP vector library ("BCAAVlib").



In this protocol, the steps needed for peptide display AAV capsid engineering and for barcoded AAV library screening, as well as for bioinformatic analysis of library composition and capsid performance, are outlined. This protocol focuses on the steps that facilitate the bioinformatic analysis of these types of libraries, because programming skills in most virology laboratories lag behind their proficiency in molecular biology techniques. Both types of libraries have been extensively described in the literature, as outlined in the introduction, and can be reproduced with relative ease.

As a first step, the design of a peptide display library in position 588 in variable region VIII of AAV2 is outlined. This design (AAV2_Peptide(ii) in a previous publication26) and the described cloning method can be easily adapted to other serotypes based on the information provided in that publication26. A critical step in the cloning pipeline is the ligation/transformation efficiency (using the ~1 x 10^8 colonies cut-off). It is recommended to add one ligation reaction with just vector. This helps in identifying the percentage of bacterial colonies with insert, which should be higher than 80%. A lower-than-expected efficiency, that is, a number of bacterial colonies (calculating the percentage with insert) lower than the theoretical diversity of the oligonucleotide inserts, would negatively impact low-abundance variants. Possible improvements include longer digestion times and clean-up steps for the vector or ligation reaction, using commercial kits.

The next step is the quality control of the variant library using NGS of a PCR fragment that encompasses the oligonucleotide insertion site. The NGS is performed using an Illumina sequencing system. There are several alternatives, in which PCR products can be directly submitted without prior NGS library preparation. This is more suitable for small-scale experiments or when a high read depth is not required. The protocol reported here comprises NGS library preparation, including the addition of adaptors with Illumina indexes to the PCR products, using a commercially available kit. A common limitation with respect to NGS is that these PCR fragments have a low diversity, because the sequences flanking the high-diversity insertion site are identical in all variants, which in turn reduces sequencing efficiency. To address this, this kit adds any number of two to eight random nucleotides between the PCR fragment and the adaptor. Alternatively, the sample needs to be spiked with PhiX. The detailed Python pipeline is described to analyze AAV2 peptide display libraries. As a template, sample files extracted from the original NGS file of the AAV2 peptide display library are provided. This can be adapted to other serotypes with the instructions given. The output files of this analysis can be used in downstream analysis, such as the comparison of plasmid and AAV library30, amino-acid composition in each position30, calculation of the enrichment score between libraries after selection rounds14, or generation of sequence logos or graphs19. As far as the input library is concerned, the presence of a high percentage of unique variants is desired. However, some variants do not produce as well as others, which could lead to skewed distributions after production. A low variant diversity, or in other words the presence of dominant variants, could be attributed to low oligonucleotide insert quality or a high number of second-strand synthesis cycles (step 1.2). 
Furthermore, the amino-acid composition could be affected by production. Each of the 20 amino acids should have a frequency of 5.00%. If the distribution varies greatly from this value, it is suggested to perform the same analysis on the plasmid library to identify potential biases30.
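
A per-position frequency check against the expected 5% could look like the sketch below. The function name and the toy peptides are hypothetical. Whether each peptide is counted once or weighted by its read count is a design choice not specified here; this version counts each listed peptide once.

```python
from collections import Counter


def position_frequencies(peptides):
    """Return, for each position, a dict of amino acid -> frequency in percent."""
    length = len(peptides[0])
    if any(len(peptide) != length for peptide in peptides):
        raise ValueError("all peptides must have the same length")
    frequencies = []
    for position in range(length):
        counts = Counter(peptide[position] for peptide in peptides)
        frequencies.append(
            {aa: 100 * n / len(peptides) for aa, n in counts.items()}
        )
    return frequencies


# Hypothetical 7-mer peptide variants; a real library would contain far more.
peptides = ["ACDEFGH", "ACDEFGK", "TCDEFGH", "ACDEYGH"]
for position, freqs in enumerate(position_frequencies(peptides), start=1):
    print(position, freqs)
```

With a real library, positions at which an amino acid deviates strongly from 5% would be the ones to compare against the plasmid library.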

As the protocols for AAV library generation and the subsequent selection rounds in different animal and in vitro models have been extensively described in multiple publications and protocols27,30,31,32, only the analysis of barcoded libraries of selected engineered variants and benchmarks is described here. Of note, the library cloning after each round of selection can be performed by PCR isolation of the cap gene, as mentioned in the protocol and results sections. Extensive rounds of selection and PCR amplification can lead to the accumulation of mutations or stop codons, which can be observed by NGS. Alternatively, enriched variants can be selected from the NGS data, and the oligonucleotides ordered and cloned as described for the generation of a peptide display library16,18. Finally, the protocol contains a brief description of the selection of libraries on the DNA level, 1 week after systemic injection. RNA-based selections are more stringent, as they also select for variants that traffic through infectious entry pathways, albeit they are technically more challenging. It should be noted that RNA- or transgene-based selections (i.e., Cre) require a longer in vivo duration of about 3-4 weeks15,16,17,18,19,33. For DNA-based selections especially, it is critical to validate the selected variants against known natural and engineered serotypes on both the DNA and RNA level using a barcoded AAV library.

The second part of the protocol describes the generation and screening of barcoded AAV libraries using the pipeline previously developed24. Each AAV capsid in the pool contains the same transgene (eyfp under the control of a CMV promoter) with a distinct barcode between eyfp and the polyA signal. The barcodes used in this library can be found in the prior publication24. The design was based on the basic principle of Hamming distance (i.e., the barcode sequences need to be adequately different), so that sequencing errors do not lead to erroneous barcode assignments. As described in Lyons et al26, the chance of two errors occurring in reads between 25-100 nucleotides is very low. A Hamming distance of 4 means that two sequencing errors would be needed to assign a read as the wrong barcode. Reads with one error will be ignored during analysis, and in this case be categorized as "reads with unknown variants". In a relevant publication, guidelines and a Python script are provided to generate barcodes29 that can be used in the pipeline. For the identification of useful barcodes, another popular error-correcting code can be used as outlined in the publication by Buschmann and Bystrykh34, namely, the Levenshtein distance. This group also provides a software package for the R programming language35.
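
To illustrate the Hamming-distance check, the sketch below computes the minimum pairwise distance of a barcode set and assigns reads by exact match only, so that any read with a mismatch ends up as unknown. The 15-nt barcodes and variant names are hypothetical and are not those of the published library.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def assign(read_barcode, barcode_to_variant):
    """Exact match only; anything else is reported as unknown."""
    return barcode_to_variant.get(read_barcode, "unknown")


# Hypothetical 15-nt barcodes.
barcodes = {
    "AAAAACCCCCGGGGG": "VarA",
    "AAAAAGGGGGCCCCC": "VarB",
    "TTTTTCCCCCGGGGG": "VarC",
}
print(min_pairwise_distance(list(barcodes)))
print(assign("AAAAACCCCCGGGGG", barcodes))
print(assign("AAAAACCCCCGGGGT", barcodes))
```

A candidate barcode set would be accepted only if the minimum pairwise distance meets the chosen threshold (4 in the design described above).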

After AAV production, the pooled barcoded AAV library can be used for biodistribution studies in different models. This study outlines the pipeline using the 82-variant library from the previous publication24 and provides sample data for practice. This protocol can also be adapted based on each user's needs. The biodistribution analysis is based on the collection of different ON- and OFF-target tissues or cells, the extraction of DNA and RNA from them, PCR amplification of the barcode region for NGS sequencing, and the measurement of vg/dg on the DNA level. For the RNA, it would be best to calculate the ratio to the mRNA copy number of a reference gene. The reference gene of choice should be expressed similarly across different tissues36, such as RPP30 (Ribonuclease P/MRP Subunit P30)37 or Hprt (hypoxanthine phosphoribosyltransferase 1)36. However, finding such a gene is arduous, and one may need to use several reference genes at the same time to normalize the RNA data. For this reason, the normalization to dg on the DNA level, which correlates roughly with the number of cells, can be used instead. qPCR was used for this calculation, as previously described24, although dd-PCR is more precise and is thus preferred for future use, especially considering the progress in this field37. Last but not least, basic molecular biology optimizations are extremely critical for the methodology described here. The PCR reactions need to be optimized to avoid introducing biases to the distribution of the libraries. The dd-PCR probes need to be designed to include intronic regions for DNA and inter-exonic regions for RNA. Good laboratory practice, such as physical compartmentalization of the steps (especially separating DNA and AAV production from library preparation) and proper disinfection, is of paramount importance to avoid erroneous amplification and library contaminations.

Notably, the use of peptide display libraries and vector DNA/RNA barcoding to select for engineered capsids with novel tropisms or other clinically relevant properties only represents two examples of directed AAV capsid evolution technologies. They all have in common that they come with limitations and require further optimization to realize their full potential. For instance, the mere insertion of a peptide leaves a large proportion (~99%) of the underlying capsid sequence unchanged, whose properties including interaction with neutralizing anti-AAV antibodies might need to be modified further prior to human application38. Furthermore, the actual diversity of peptide display or other libraries is typically lower than the theoretical one, due to, for example, technical limitations during cloning or bacterial transformation31. Generally, there is also an active debate about the translational relevance of directed evolution in animal or in vitro models, fostered by increasing evidence for a possible species- or even strain-specific performance of synthetic AAV capsids38. Nonetheless, there is substantial hope that many or all these limitations will be overcome and that the current arsenal of technologies will be expanded to an even larger variety of disease models and become even more accessible to most research groups. In this respect, a particularly encouraging recent development is the use of barcoded AAV libraries such as those reported here to validate selected capsids no longer only on the organ, but now also on the cellular level, which can be achieved with novel technologies such as single-cell (sc) RNA sequencing39. The protocols presented here will facilitate the wider establishment of AAV evolution techniques and thus accelerate the development of novel capsids tailored to the needs of a plethora of research groups and human patients.



D.G. is a co-founder of AaviGen GmbH. D.G. and K.R. are inventors on a pending patent application related to the generation of immune-evading AAV capsid variants. The rest of the authors have nothing to disclose.


D.G. greatly appreciates support by the German Research Foundation (DFG) through the DFG Collaborative Research Centers SFB1129 (Projektnummer 240245660) and TRR179 (Projektnummer 272983813), as well as by the German Center for Infection Research (DZIF, BMBF; TTU-HIV 04.819).


Name Company Catalog Number Comments
Amplification primer ELLA Biotech (Munich, Germany) - Second-strand synthesis of oligonucleotide insert
Agilent DNA 1000 Reagents Agilent Technologies (Santa Clara, CA, USA) 5067-1504 DNA fragment validation
Agilent 2100 Bioanalyzer System Agilent Technologies (Santa Clara, CA, USA) G2938C DNA fragment validation
AllPrep DNA/RNA Mini Kit  Qiagen (Venlo, Netherlands) 80204 DNA/RNA extraction
Agilent DNA 1000 Reagents Agilent Technologies (Santa Clara, CA, USA) 5067-1504 NGS Library preparation
Agilent 2100 Bioanalyzer System Agilent Technologies (Santa Clara, CA, USA) G2938C NGS Library preparation
BC-seq fw IDT (San Jose, CA, USA) ATCACTCTCGGCATGGACGAGC NGS Library preparation
BC-seq rv IDT (San Jose, CA, USA) GGCTGGCAACTAGAAGGCACA NGS Library preparation
β-Mercaptoethanol Millipore Sigma (Burlington, MA, USA) 44-420-3250ML DNA/RNA extraction
BglI New England Biolabs (Ipswich, MA, USA) R0143 Digestion of double-stranded insert
C1000 Touch Thermal Cycler Bio-Rad (Hercules, CA, USA) 1851196 dd-PCR cycler
dNTPs New England Biolabs (Ipswich, MA, USA) N0447S NGS Library preparation
ddPCR Supermix for probes (no dUTP) Bio-Rad (Hercules, CA, USA) 1863024 dd-PCR supermix
Droplet Generation Oil for Probes Bio-Rad (Hercules, CA, USA) 1863005 dd-PCR droplet generation oil
DG8 Cartridges for QX100 / QX200 Droplet Generator Bio-Rad (Hercules, CA, USA) 1864008 dd-PCR droplet generation cartridge
DG8 Cartridge Holder Bio-Rad (Hercules, CA, USA) 1863051 dd-PCR cartridge holder
Droplet Generator DG8 Gasket Bio-Rad (Hercules, CA, USA) 1863009 dd-PCR cover for cartridge
ddPCR Plates 96-Well, Semi-Skirted Bio-Rad (Hercules, CA, USA) 12001925 dd-PCR 96-well plate
E.cloni 10G SUPREME Electrocompetent Cells Lucigen (Middleton, WI, USA) 60081-1 Electrocompetent cells
Electroporation cuvettes, 1mm Biozym Scientific (Oldendorf, Germany) 748050 Electroporation
GAPDH primer/probe mix Thermo Fisher Scientific (Waltham, MA, USA) Mm00186825_cn TaqMan qPCR primer
Genepulser Xcell Bio-Rad (Hercules, CA, USA) 1652660 Electroporation
High-Capacity cDNA Reverse Transcription Kit Applied Biosystems (Waltham, MA, USA) 4368814 cDNA reverse transcription
ITR_fw IDT (San Jose, CA, USA) GGAACCCCTAGTGATGGAGTT (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) dd-PCR primer
ITR_rv IDT (San Jose, CA, USA) CGGCCTCAGTGAGCGA (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) dd-PCR primer
ITR_probe IDT (San Jose, CA, USA) HEX-CACTCCCTCTCTGCGCGCTCG-BHQ1 (https://signagen.com/blog/2019/10/25/qpcr-primer-and-probe-sequences-for-raav-titration/) dd-PCR probe
Illumina NextSeq 500 system Illumina Inc (San Diego, CA, USA) SY-415-1001 NGS Library sequencing
KAPA HiFi HotStart ReadyMix (2X)* Roche AG (Basel, Switzerland) KK2600 07958919001 NGS sample preparation
MagnaBot 96 Magnetic Separation Device Promega GmbH (Madison, WI, USA) V8151 Sample preparation for NGS library
NanoDrop 2000 spectrophotometer Thermo Fisher Scientific (Waltham, MA, USA) ND-2000 Digestion of double-stranded insert
NGS_frw Sigma-Aldrich (Burlington, MA, USA) GTT CTG TAT CTA CCA ACC TC NGS primer
NGS_rev Sigma-Aldrich (Burlington, MA, USA) CGC CTT GTG TGT TGA CAT C NGS primer
NextSeq 500/550 High Output Kit (75 cycles) Illumina Inc (San Diego, CA, USA) FC-404-2005 NGS Library sequencing
Ovation Library System for Low Complexity Samples Kit  NuGEN Technologies, Inc. (San Carlos, CA, USA) 9092-256 NGS Library preparation
PX1 Plate Sealer  Bio-Rad (Hercules, CA, USA) 1814000 dd-PCR plate sealer
Pierceable Foil Heat Seal Bio-Rad (Hercules, CA, USA) 1814040 dd-PCR sealing foil
Phusion High-Fidelity DNA-Polymerase Thermo Fisher Scientific (Waltham, MA, USA) F530S Second-strand synthesis of oligonucleotide insert
PEI MAX - Transfection Grade Linear Polyethylenimine Hydrochloride (MW 40,000) Polysciences, Inc. (Warrington, PA, USA) 24765-1G AAV library preparation
ProNex Size-Selective Purification System Promega GmbH (Madison, WI, USA) NG2002 Sample preparation for NGS library
Phusion Hot Start II Polymerase Thermo Fisher Scientific (Waltham, MA, USA) F549L NGS Library preparation
Proteinase K Roche AG (Basel, Switzerland) 5963117103 DNA/RNA extraction
pRep2Cap2_PIS ITR-Rep2Cap2-ITR vector. Peptide insertion site within the Cap2 ORF, manufactured/prepared in the lab 
QX200 Droplet Generator Bio-Rad (Hercules, CA, USA) 1864002 dd-PCR droplet generator
QX200 Droplet Reader Bio-Rad (Hercules, CA, USA) 1864003 dd-PCR droplet analysis
QIAquick Nucleotide Removal Kit Qiagen (Venlo, Netherlands) 28306 Second-strand synthesis of oligonucleotide insert purification
QIAquick Gel Extraction Kit Qiagen (Venlo, Netherlands) 28704 Plasmid vector purification
QIAGEN Plasmid Maxi Kit Qiagen (Venlo, Netherlands) 12162 Plasmid library DNA preparation
QIAquick PCR Purification Kit Qiagen (Venlo, Netherlands) 28104 Sample preparation for NGS library
Qubit fluorometer Invitrogen (Waltham, MA, USA) Q32857 NGS Library preparation
Qubit dsDNA HS Thermo Fisher Scientific (Waltham, MA, USA) Q32851 NGS Library preparation
QuantiFast PCR Master Mix Qiagen (Venlo, Netherlands) 1044234 TaqMan qPCR
rep_rv IDT (San Jose, CA, USA) CAATCACGGCGCACATGT dd-PCR primer
rep_probe IDT (San Jose, CA, USA) FAM-TGATCGTCACCTCCAACA-BHQ1 dd-PCR probe
RNase-free DNase Qiagen (Venlo, Netherlands) 79254 DNA/RNA extraction
SfiI New England Biolabs (Ipswich, MA, USA) R0123 Digestion of vector
5 mm, steel Beads Qiagen (Venlo, Netherlands) 69989 DNA/RNA extraction
TRIMER-oligonucleotides ELLA Biotech (Munich, Germany) - Degenerate oligonucleotide
T4 Ligase New England Biolabs (Ipswich, MA, USA) M0202L Plasmid library ligation
TissueLyserLT Qiagen (Venlo, Netherlands) 85600 DNA/RNA extraction
Zymo DNA Clean & Concentrator-5 (Capped) Zymo Research (Irvine, CA, USA) D4013 Vector and Ligation purification



  1. Wang, D., Tai, P. W. L., Gao, G. Adeno-associated virus vector as a platform for gene therapy delivery. Nature Reviews Drug Discovery. 18 (5), 358-378 (2019).
  2. Muhuri, M., Levy, D. I., Schulz, M., McCarty, D., Gao, G. Durability of transgene expression after rAAV gene therapy. Molecular Therapy. 30 (4), 1364-1380 (2022).
  3. Li, C., Samulski, R. J. Engineering adeno-associated virus vectors for gene therapy. Nature Reviews Genetics. 21 (4), 255-272 (2020).
  4. Kuzmin, D. A., et al. The clinical landscape for AAV gene therapies. Nature Reviews Drug Discovery. 20 (3), 173-174 (2021).
  5. Mullard, A. Gene therapy community grapples with toxicity issues, as pipeline matures. Nature Reviews Drug Discovery. 20 (11), 804-805 (2021).
  6. Nature Biotechnology. Gene therapy at the crossroads. Nature Biotechnology. 40 (5), 621 (2022).
  7. Becker, P., et al. Fantastic AAV Gene Therapy Vectors and How to Find Them-Random Diversification, Rational Design and Machine Learning. Pathogens. 11 (7), 756 (2022).
  8. Perabo, L., et al. In vitro selection of viral vectors with modified tropism: the adeno-associated virus display. Molecular Therapy. 8 (1), 151-157 (2003).
  9. Muller, O. J., et al. Random peptide libraries displayed on adeno-associated virus to select for targeted gene therapy vectors. Nature Biotechnology. 21 (9), 1040-1046 (2003).
  10. Dalkara, D., et al. In vivo-directed evolution of a new adeno-associated virus for therapeutic outer retinal gene delivery from the vitreous. Science Translational Medicine. 5 (189), (2013).
  11. Sahel, J. A., et al. Partial recovery of visual function in a blind patient after optogenetic therapy. Nature Medicine. 27 (7), 1223-1229 (2021).
  12. Adachi, K., Enoki, T., Kawano, Y., Veraz, M., Nakai, H. Drawing a high-resolution functional map of adeno-associated virus capsid by massively parallel sequencing. Nature Communications. 5, 3075 (2014).
  13. Marsic, D., Mendez-Gomez, H. R., Zolotukhin, S. High-accuracy biodistribution analysis of adeno-associated virus variants by double barcode sequencing. Molecular Therapy-Methods & Clinical Development. 2, 15041 (2015).
  14. Korbelin, J., et al. Pulmonary targeting of adeno-associated viral vectors by next-generation sequencing-guided screening of random capsid displayed peptide libraries. Molecular Therapy. 24 (6), 1050-1061 (2016).
  15. Deverman, B. E., et al. Cre-dependent selection yields AAV variants for widespread gene transfer to the adult brain. Nature Biotechnology. 34 (2), 204-209 (2016).
  16. Ravindra Kumar, S., et al. Multiplexed Cre-dependent selection yields systemic AAVs for targeting distinct brain cell types. Nature Methods. 17 (5), 541-550 (2020).
  17. Hanlon, K. S., et al. Selection of an efficient AAV vector for robust CNS transgene expression. Molecular Therapy-Methods & Clinical Development. 15, 320-332 (2019).
  18. Nonnenmacher, M., et al. Rapid evolution of blood-brain-barrier-penetrating AAV capsids by RNA-driven biopanning. Molecular Therapy-Methods & Clinical Development. 20, 366-378 (2021).
  19. Tabebordbar, M., et al. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species. Cell. 184 (19), 4919-4938 (2021).
  20. Davidsson, M., et al. A systematic capsid evolution approach performed in vivo for the design of AAV vectors with tailored properties and tropism. Proceedings of the National Academy of Sciences. 116 (52), 27053-27062 (2019).
  21. Pekrun, K., et al. Using a barcoded AAV capsid library to select for clinically relevant gene therapy vectors. Journal of Clinical Investigation Insight. 4 (22), (2019).
  22. Ogden, P. J., Kelsic, E. D., Sinai, S., Church, G. M. Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science. 366 (6469), 1139-1143 (2019).
  23. Kondratov, O., et al. A comprehensive study of a 29-capsid AAV library in a non-human primate central nervous system. Molecular Therapy. 29 (9), 2806-2820 (2021).
  24. Weinmann, J., et al. Identification of a myotropic AAV by massively parallel in vivo evaluation of barcoded capsid variants. Nature Communications. 11 (1), 5432 (2020).
  25. Kremer, L. P. M., et al. High throughput screening of novel AAV capsids identifies variants for transduction of adult NSCs within the subventricular zone. Molecular Therapy-Methods & Clinical Development. 23, 33-50 (2021).
  26. Borner, K., et al. Pre-arrayed pan-AAV peptide display libraries for rapid single-round screening. Molecular Therapy. 28 (4), 1016-1032 (2020).
  27. Kienle, E., et al. Engineering and evolution of synthetic adeno-associated virus (AAV) gene therapy vectors via DNA family shuffling. Journal of Visualized Experiments. (62), e3819 (2012).
  28. Furuta-Hanawa, B., Yamaguchi, T., Uchida, E. Two-dimensional droplet digital PCR as a tool for titration and integrity evaluation of recombinant adeno-associated viral vectors. Human Gene Therapy Methods. 30 (4), 127-136 (2019).
  29. Lyons, E., Sheridan, P., Tremmel, G., Miyano, S., Sugano, S. Large-scale DNA barcode library generation for biomolecule identification in high-throughput screens. Scientific Reports. 7 (1), 13899 (2017).
  30. Korbelin, J., et al. Optimization of design and production strategies for novel adeno-associated viral display peptide libraries. Gene Therapy. 24 (8), 470-481 (2017).
  31. Korbelin, J., Trepel, M. How to successfully screen random adeno-associated virus display peptide libraries in vivo. Human Gene Therapy Methods. 28 (3), 109-123 (2017).
  32. Herrmann, A. K., et al. A robust and all-inclusive pipeline for shuffling of adeno-associated viruses. American Chemical Society Synthetic Biology. 8 (1), 194-206 (2019).
  33. Choudhury, S. R., et al. In vivo selection yields AAV-B1 Capsid for central nervous system and muscle gene therapy. Molecular Therapy. 24 (7), 1247-1257 (2016).
  34. Buschmann, T., Bystrykh, L. V. Levenshtein error-correcting barcodes for multiplexed DNA sequencing. BMC Bioinformatics. 14, 272 (2013).
  35. Buschmann, T. DNABarcodes: an R package for the systematic construction of DNA sample tags. Bioinformatics. 33 (6), 920-922 (2017).
  36. Li, B., et al. A comprehensive mouse transcriptomic BodyMap across 17 tissues by RNA-seq. Scientific Reports. 7 (1), 4200 (2017).
  37. Clarner, P., et al. Development of a one-step RT-ddPCR method to determine the expression and potency of AAV vectors. Molecular Therapy-Methods & Clinical Development. 23, 68-77 (2021).
  38. Zolotukhin, S., Vandenberghe, L. H. AAV capsid design: A Goldilocks challenge. Trends in Molecular Medicine. 28 (3), 183-193 (2022).
  39. Brown, D., et al. Deep parallel characterization of AAV tropism and AAV-mediated transcriptional changes via single-cell RNA sequencing. Frontiers in Immunology. 12, 730825 (2021).

Cite this Article

Rapti, K., Maiakovska, O., Becker, J., Szumska, J., Zayas, M., Bubeck, F., Liu, J., Gerstmann, E., Krämer, C., Wiedtke, E., Grimm, D. Isolation of Next-Generation Gene Therapy Vectors through Engineering, Barcoding, and Screening of Adeno-Associated Virus (AAV) Capsid Variants. J. Vis. Exp. (188), e64389, doi:10.3791/64389 (2022).
