Averaging of Viral Envelope Glycoprotein Spikes from Electron Cryotomography Reconstructions using Jsubtomo

Enveloped viruses utilize membrane glycoproteins on their surface to mediate entry into host cells. Three-dimensional structural analysis of these glycoprotein ‘spikes’ is often technically challenging but important for understanding viral pathogenesis and for drug design. Here, a protocol is presented for viral spike structure determination through computational averaging of electron cryo-tomography data. Electron cryo-tomography is a technique in electron microscopy used to derive three-dimensional tomographic volume reconstructions, or tomograms, of pleomorphic biological specimens such as membrane viruses in a near-native, frozen-hydrated state. These tomograms reveal structures of interest in three dimensions, albeit at low resolution. Computational averaging of sub-volumes, or sub-tomograms, is necessary to obtain higher resolution detail of repeating structural motifs, such as viral glycoprotein spikes. A detailed computational approach for aligning and averaging sub-tomograms using the Jsubtomo software package is outlined. This approach enables visualization of the structure of viral glycoprotein spikes to a resolution in the range of 20-40 Å and the study of higher order spike-to-spike interactions on the virion membrane. Typical results are presented for Bunyamwera virus, an enveloped virus from the family Bunyaviridae. This family is a structurally diverse group of pathogens posing a threat to human and animal health.


Introduction
Electron cryo-tomography is an electron cryo-microscopy imaging technique allowing the calculation of a three-dimensional (3D) reconstruction of complex biological specimens. Suitable specimens range from purified macromolecular complexes 1 , filaments 2 , coated vesicles 3 , and pleomorphic membrane viruses 4 to whole prokaryotic cells 5 and even thin areas of whole eukaryotic cells 6 . Following the collection of a tilt series, 3D tomographic volumes, or tomograms, may be calculated using several established software packages, including Bsoft 7 and IMOD 8 .
Two aspects inherent to the study of biological specimens by electron cryo-tomography limit the biological interpretation of the corresponding tomographic volumes. First, due to the limited electron dose that can be applied to biological materials before introducing significant radiation damage, signal-to-noise ratios in tomographic data are typically very low. Second, as a result of limited sample tilt geometry during data collection, some views of the object remain absent, leading to a so-called 'missing wedge' artifact in the tomographic volume. However, both of these limitations can be overcome if the tomographic volume contains repeating identical structures, such as macromolecular complexes, that can be successfully averaged [9][10][11][12] .
Prior to averaging structures from tomogram reconstructions, objects of interest must be found and aligned to the same orientation. Locating such structures may be achieved by cross-correlation of a template structure in the tomographic volume using an approach often referred to as template matching 13 . The template used in this matching process can be derived from electron cryo-microscopy or electron cryo-tomography combined with 3D reconstruction, or it can be a density map simulated from an atomic structure. Several computational packages have been developed to carry out these tasks 11 .
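The idea behind template matching can be illustrated with a minimal NumPy sketch. It assumes a single fixed orientation and ignores the missing wedge, both of which a real search has to handle. The function names are hypothetical and this is not the Jsubtomo implementation.

```python
import numpy as np

def cross_correlation_map(volume, template):
    """Circular cross-correlation of a small template with a volume via FFT.

    The value at index s is the correlation between the template and the
    region of the volume whose corner sits at voxel s.
    """
    # Zero-pad the mean-subtracted template to the size of the volume.
    padded = np.zeros(volume.shape, dtype=float)
    tz, ty, tx = template.shape
    padded[:tz, :ty, :tx] = template - template.mean()
    vol = volume - volume.mean()
    # Correlation theorem: corr = IFFT( FFT(volume) * conj(FFT(template)) ).
    product = np.fft.fftn(vol) * np.conj(np.fft.fftn(padded))
    return np.fft.ifftn(product).real

def locate_peak(cc_map):
    """Return the (z, y, x) index of the highest correlation value."""
    peak = np.unravel_index(np.argmax(cc_map), cc_map.shape)
    return tuple(int(i) for i in peak)
```

In practice the search is repeated over many template orientations, and the best-scoring orientation at each position is kept.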
Averaging of glycoprotein spikes of membrane viruses, such as HIV-1, has been a particularly successful approach for studying their structure [14][15][16] . An understanding of the structure is integral for revealing the molecular basis of virus-host interactions and for guiding antiviral and vaccine design. While macromolecular crystallography is the technique of choice for high-resolution (usually better than 4 Å) structural analysis of individual viral glycoproteins and their complexes, the X-ray structures resulting from this method are of proteins isolated from the natural membranous environment on the virion. Thus, important details, such as the higher order architecture of viral glycoproteins in the context of the virion, remain lacking. On the other hand, electron cryo-microscopy and single particle reconstruction of entire enveloped viruses is restricted to virions with icosahedral symmetry 17,18 . We have developed software named Jsubtomo (www.opic.ox.ac.uk/jsubtomo) for the detection, alignment, and averaging of tomographic sub-volumes. Jsubtomo has been utilized in the structure determination of a number of cellular and viral structures [19][20][21][22][23][24][25][26] . Here, we outline a detailed protocol that enables the determination of viral-surface spike structures. To circumvent over-refinement of averaged structures by correlating noise, the 'gold-standard' refinement scheme is adopted 10,27 . Finally, strategies for visualization and interpretation of typical results are discussed.

Protocol
A detailed protocol for the computational alignment and subsequent averaging of viral glycoprotein spikes is outlined. The protocol follows the workflow illustrated in Figure 1 and combines an automated search for the spikes using initial template structures and a gold-standard structure refinement.
Input data for this protocol is a set of tomographic reconstructions of the virions. One tomogram contains one or more virions. Initially, a small subset of spikes is manually picked and used to average and refine two independent models. These models are used to automatically locate spikes on all of the virions. Finally, two independent refinements are run and the resulting averages are compared and combined to produce the final structure.
The refinement approach is demonstrated by using programs from the Jsubtomo package. Programs from the Bsoft package 28 are used for general image processing tasks, and the molecular graphics package UCSF Chimera 29 is used to visualize results. The names of individual programs are given in italics and file formats are denoted with uppercase filename extensions.

Generation and Alignment of Seeds to the Virus Surface for Template Matching
1. Generate seeds located evenly on the virion surface for template matching and assign an initial view vector to the seeds by running jviews.py. Use the STAR files generated in step 1.2.2 as input files. Generate approximately 1.5 times more seeds than the expected number of spikes.
   NOTE: This view vector approximates the direction of the spike closest to each seed point.
   1. To generate evenly distributed seeds on a roughly spherical virion (parameter "--Even"), give the radius (parameter "--radius"), the angular separation of the seeds (e.g., 20 degrees; parameter "--angle 20"), and the central coordinate of the virion. E.g., if the virion is centered (after step 1.2) in a box with a size of 250 x 250 x 250 pixels, the central coordinate is 125,125,125 pixels (option "--origin 125,125,125").
   2. To generate evenly distributed seeds on a filamentous particle, use the parameter "--Filament" and specify both the radius and the helical symmetry parameters (rise and twist).
      NOTE: The helical symmetry parameters are used here only as a convenient way of defining evenly distributed seed positions and they do not need to reflect the actual ordering of the spikes on the filament.
2. Generate two independent averages of the virus surface as explained in section 2.
   1. Generate a SEL file defining virions belonging to sets "1" and "2" using jsubtomo_evenodd.py and the STAR files generated in step 4.1.
      NOTE: To ensure the independence of datasets, any previous assignment of the virions into groups "1" and "2" must stay the same.
3. Refine the position of the seeds.
   1. Follow the instructions in step 3.1 unless stated otherwise. Use the SEL file generated in step 4.2.1 as the input file.
   2. Allow the seeds to shift only along the normal to the membrane (parameter "--zshiftlimit"). Adjust the allowed amount of shift depending on how much the virions deviate from ideal spherical geometry (e.g., parameter "--zshiftlimit 25").
   3. Use the two MAP files generated in step 4.2 (denoted by tags "even" and "odd" in the filename) as input template files (parameters "--Template1" and "--Template2").
   4. Use a unique suffix to avoid overwriting the original input STAR files (e.g., parameter "--suffix _seeds").
   5. Use binning of 4 to speed up the calculation (parameter "--bin 4") and a low-pass filter (e.g., parameter "--resolution 50").
4. Generate Chimera marker files (CMM) of the refined seed STAR files using jviews.py (parameter "--cmm"). Examine the seeds by opening the CMM files (and the associated virion MAP files) in Chimera. Make sure that the refined seeds are aligned correctly relative to the virus membrane.
   NOTE: Different colors can be used to differentiate separate sets of markers (parameter "--color"). Alternatively, the refined markers can be colored based on their cross-correlation coefficient (e.g., parameter "--fomcolor 0.1,0.3").
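The geometry of seed generation on a roughly spherical virion can be sketched as follows. This is a minimal illustration, assuming seeds are placed on latitude rings with an approximately constant angular spacing and that each view vector points radially outwards. The function name and the ring-based scheme are assumptions for illustration and do not reproduce the algorithm used by jviews.py.

```python
import numpy as np

def even_seeds(radius, angle_deg, origin):
    """Place seeds on a sphere with roughly `angle_deg` angular separation.

    Returns (positions, views): seed coordinates in voxels and the unit
    view vector (outward surface normal) assigned to each seed.
    """
    step = np.radians(angle_deg)
    n_rings = max(1, int(round(np.pi / step)))
    views = []
    for i in range(n_rings + 1):
        theta = i * np.pi / n_rings  # polar angle from 0 to pi
        # Fewer seeds near the poles keeps the spacing along a ring ~ step.
        n_points = max(1, int(round(2 * np.pi * np.sin(theta) / step)))
        for j in range(n_points):
            phi = 2 * np.pi * j / n_points
            views.append((np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)))
    views = np.array(views)
    positions = np.asarray(origin, dtype=float) + radius * views
    return positions, views

# Example: a virion centered in a 250 x 250 x 250 box, seeds every 20 degrees.
# The radius of 100 voxels is an arbitrary illustrative value.
positions, views = even_seeds(radius=100.0, angle_deg=20.0,
                              origin=(125, 125, 125))
```

Comparing the resulting number of seeds with the expected number of spikes gives a quick check on whether the chosen angular separation yields the recommended excess of seeds.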

Gold-standard Iterative Alignment and Averaging of the Spike Structure
Automatically locate all the spikes in the virion sub-volumes using local template matching around the refined seeds and align and average the located spikes. Use the averages generated from a subset of manually picked spikes as initial templates.
1. Perform local template matching around the seeds to average all the spikes using program jsubtomo_iterate_gold.py. Follow the instructions in step 3.5 unless stated otherwise.
   1. The input file is the SEL file for the refined seeds generated in step 4.3.
   2. Use a sufficiently large limit for the view vector angle. E.g., if seeds were generated every 20 degrees (step 4.1), use an angle slightly larger than half of this value (parameter "--thetaphilimit 12"). Allow appropriate changes in the angle around the spike view vector.

Representative Results
We demonstrate the application of the sub-tomogram averaging workflow outlined above for the envelope glycoprotein complex of Bunyamwera virus (Orthobunyavirus, Bunyaviridae) using a previously published data set 24 . Data collection and refinement parameters are listed in Table 1. One representative tomogram is shown in Figure 2.

Table 1: Bunyamwera data collection and refinement statistics.
Data collection
Voltage (kV): 300
a CTF, contrast transfer function. b Calculated using Fourier shell correlation between two independently refined structures at a threshold of 0.143.
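The resolution estimate referred to in the Table 1 footnote can be illustrated with a minimal NumPy sketch of a Fourier shell correlation (FSC) between two independently refined maps. It assumes cubic maps of equal size, applies no masking, and reads the resolution from the first shell that falls below the threshold without interpolation. The function names are hypothetical, and in the protocol itself this calculation is performed by the packaged programs.

```python
import numpy as np

def fourier_shell_correlation(map1, map2):
    """FSC between two cubic maps, one value per Fourier shell (0..n/2 - 1)."""
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Radius of every Fourier voxel, in units of Fourier pixels.
    freq = np.fft.fftfreq(n) * n
    kz, ky, kx = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    # Sum the cross term and the two power terms within each shell.
    cross = np.bincount(shell, weights=(f1 * np.conj(f2)).real.ravel())
    power1 = np.bincount(shell, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(shell, weights=(np.abs(f2) ** 2).ravel())
    denom = np.sqrt(power1 * power2)
    fsc = np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)
    return fsc[: n // 2]

def resolution_at_threshold(fsc, box_size, voxel_size, threshold=0.143):
    """Resolution (in the units of voxel_size) where FSC first drops below threshold."""
    below = np.where(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None  # the curve never crosses the threshold, or is unusable
    return box_size * voxel_size / below[0]
```

The 0.143 threshold is only meaningful when the two maps come from independently refined half data sets, which is why the gold-standard scheme keeps the two sets apart throughout refinement.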
The most critical step within this protocol is constructing two reliable starting models that are statistically independent from one another. Successful execution of this step assumes that glycoprotein spikes are sufficiently large and not packed against each other too tightly, so that individual spikes can be visually recognized and manually picked in the tomograms, and two independent models averaged. If this is not feasible, two modifications to the protocol can be attempted. First, two independent random models can be constructed by first defining two random subsets of subtomograms and then averaging the subtomograms within these subsets 30 . Second, if a structure of the isolated spike has been derived by other means, for example by X-ray crystallography, it can be used as a starting model. However, care must be taken to low-pass filter this model using a low-resolution cut-off (50-70 Å), as the two resulting models in the next round of refinement will be statistically independent only beyond this resolution. Due to this caveat, the former approach is recommended.
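The first modification, building two random starting models, can be sketched as below. This is a minimal illustration, assuming the extracted subtomograms are held in a single NumPy array of shape (N, z, y, x) and averaged without alignment. The function names are hypothetical and are not part of Jsubtomo.

```python
import numpy as np

def split_half_sets(n_subtomograms, seed=0):
    """Randomly assign subtomogram indices to two disjoint half sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_subtomograms)
    # Alternate through the shuffled order so the sets differ in size by at most one.
    return np.sort(order[0::2]), np.sort(order[1::2])

def average_subset(subtomograms, indices):
    """Average the subtomograms selected by `indices` (no alignment applied)."""
    return subtomograms[indices].mean(axis=0)

# Example with 200 synthetic 16^3 subtomograms of pure noise.
stack = np.random.default_rng(42).standard_normal((200, 16, 16, 16))
set1, set2 = split_half_sets(len(stack), seed=1)
model1 = average_subset(stack, set1)
model2 = average_subset(stack, set2)
```

Whatever assignment is chosen should be stored and reused in all later refinement rounds, so that the two half sets never exchange subtomograms.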
The obtainable resolution from this protocol depends on four major factors: i. data collection strategy and the quality of the input data, ii. number of the subtomograms, iii. alignment accuracy of the subtomograms, and iv. heterogeneity of the structures. While the first two limitations can be largely overcome by using high signal-to-noise direct electron detectors combined with CTF-corrected tomography and automated data collection, the alignment accuracy is further affected by the size and shape of the structure of interest itself. When applying this protocol to small spikes lacking prominent features, it may be advantageous to bind Fab fragments to the spike to improve the alignment accuracy and thus the resolution 31 . Finally, if the structures to be averaged exhibit multiple conformations, sub-tomogram classification methods may be used to average different conformations separately. To that end, Jsubtomo integrates with the Dynamo package, offering powerful subtomogram classification 9 .
The above protocol is complementary to X-ray crystallography of isolated viral glycoproteins. Crystallographic structures can be fitted into subtomogram averages to obtain the precise orientation of the glycoprotein with respect to the virion membrane. Application of this methodology will undoubtedly continue to shed light on enveloped virus structure and pathobiology.

Disclosures
The authors have nothing to disclose.