The critical steps of SAXS data analysis outlined in the protocol section of this paper include buffer subtraction, Guinier analysis, Kratky analysis, data merging and P(r) distribution. The ab initio bead modeling is too extensive to be covered here in detail and is therefore only covered briefly.
At synchrotrons (e.g., DESY in Germany, DIAMOND in the UK, and ESRF in France), it is possible to collect SAXS data from a very small volume (a few µL) of each fraction as it elutes from the size exclusion chromatography (SEC) column that is connected in-line (see Figure 1). The elastically scattered SAXS data are radially averaged using the packages provided by the instrument manufacturer or by the synchrotron before buffer subtraction can take place. The resulting 1D data, outlined in Figure 1, represent the scattered X-ray intensity (ln I(q)) on the Y-axis as a function of the scattering angle on the X-axis, expressed as q = 4πsinθ/λ, where 2θ is the scattering angle and λ is the wavelength of the incident X-rays. The program PRIMUS/qt12 is used to directly subtract the background due to buffer, as described in section 1.1. Other programs, such as ScÅtter43 (download available at www.bioisis.net, with a tutorial available at https://www.youtube.com/channel/UCvFatdC5HcZOLv6OSjblfeA) and bioXtas RAW44 (available at https://bioxtas-raw.readthedocs.io/en/Latest/index.html), can be used as alternatives to the ATSAS package.
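As a rough illustration of the two operations described above (computing q from the scattering geometry and subtracting the buffer curve), the following Python sketch may help. It is not the PRIMUS/qt implementation. The function names, the `(q, I, sigma)` tuple layout, and the optional `scale` factor are assumptions made for this example, and it presumes that the sample and buffer curves share the same q grid.

```python
import math


def momentum_transfer(two_theta_deg, wavelength):
    """Return q = 4*pi*sin(theta)/lambda.

    two_theta_deg is the full scattering angle (2*theta) in degrees;
    q is returned in reciprocal units of `wavelength`.
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


def subtract_buffer(sample, buffer, scale=1.0):
    """Subtract a buffer curve from a sample curve, point by point.

    Both curves are lists of (q, intensity, sigma) tuples on the same
    q grid (an assumption of this sketch). Uncertainties are combined
    in quadrature.
    """
    if len(sample) != len(buffer):
        raise ValueError("sample and buffer must have the same length")
    result = []
    for (q_s, i_s, sig_s), (q_b, i_b, sig_b) in zip(sample, buffer):
        if not math.isclose(q_s, q_b, rel_tol=1e-9, abs_tol=1e-12):
            raise ValueError("sample and buffer q grids differ")
        intensity = i_s - scale * i_b
        sigma = math.sqrt(sig_s ** 2 + (scale * sig_b) ** 2)
        result.append((q_s, intensity, sigma))
    return result
```

In practice the curves would be read from the radially averaged data files, and any scaling of the buffer curve should be justified experimentally rather than tuned to improve the result.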
The Guinier analysis provides information on sample aggregation and homogeneity, and yields the radius of gyration (Rg) of the macromolecule of interest from the SAXS data in the low q region14. A Guinier plot is constructed with PRIMUS/qt for the SAXS data obtained at each concentration, followed by curve fitting over a range in which q × Rg does not exceed 1.30. A monodispersed sample preparation should provide a linear Guinier plot in this region (Figure 2D), whereas aggregation results in a nonlinear Guinier plot15,16. If the Guinier plot is linear, the degree of “unfoldedness” of the macromolecule of interest can be assessed with the Kratky plot, which is useful when deciding whether to perform rigid body modeling or to construct ensembles of low-resolution models. A globular protein gives a bell-shaped curve in a Kratky plot, whereas extended molecules or unfolded peptides lack the bell shape and instead plateau or even increase in the larger q range (Figure 2C).
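The Guinier and Kratky steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by PRIMUS/qt. It relies on the standard Guinier approximation I(q) ≈ I(0)·exp(−q²Rg²/3), so that a straight-line fit of ln I(q) against q² has slope −Rg²/3. The function names, the starting window `n_start`, and the simple iterative choice of the fitting range are assumptions of this example, and q is presumed to be sorted in ascending order.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys): returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def guinier_fit(q, intensity, qrg_max=1.3, n_start=5, max_iter=50):
    """Fit ln I = ln I0 - (Rg^2 / 3) q^2 using points with q*Rg <= qrg_max.

    Returns (Rg, I0, number_of_points_used). Intensities must be positive,
    otherwise math.log raises ValueError.
    """
    if len(q) != len(intensity) or len(q) < 3:
        raise ValueError("need at least three matching (q, I) points")
    n_used = max(3, min(n_start, len(q)))
    for _ in range(max_iter):
        xs = [v * v for v in q[:n_used]]
        ys = [math.log(v) for v in intensity[:n_used]]
        slope, intercept = linear_fit(xs, ys)
        if slope >= 0:
            raise ValueError("non-negative slope: no usable Guinier region")
        rg = math.sqrt(-3.0 * slope)
        # Re-select the window so that q*Rg stays within the limit.
        n_new = max(3, sum(1 for v in q if v * rg <= qrg_max))
        if n_new == n_used:
            break
        n_used = n_new
    return rg, math.exp(intercept), n_used


def kratky(q, intensity):
    """Return the Kratky representation as a list of (q, q^2 * I(q)) pairs."""
    return [(v, v * v * i) for v, i in zip(q, intensity)]
```

A visibly curved ln I(q) versus q² plot at low q, or an Rg that changes strongly with the fitting window, would suggest aggregation or interparticle effects and should be inspected manually rather than accepted from an automated fit.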
Obtaining the Rg from Guinier analysis considers only data points from the low q region of the 1D scatter plot (Figure 2D). However, it is possible to use almost the entire dataset to perform an indirect Fourier transformation that converts the reciprocal-space information of ln(I(q)) vs. q into a real-space distance distribution function (P(r)), which provides information on Dmax and Rg (Figure 2B). The shape of the P(r) plot represents the gross solution conformation of the macromolecule of interest18,19. The conversion of reciprocal-space data to real-space data is a critical step, but a detailed description is not within the scope of this paper. Therefore, refer to the article by Svergun20 to understand each parameter.
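Once a P(r) function is available, Rg follows from its second moment, Rg² = ∫r²P(r)dr / (2∫P(r)dr), and Dmax is the distance at which P(r) returns to zero. The Python sketch below illustrates only this last step; it does not perform the indirect Fourier transformation itself. The function names are assumptions of this example, and the analytical P(r) of a homogeneous sphere is used purely as a test case with a known answer (Rg = √(3/5)·R, Dmax = 2R).

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal integration of ys over xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def rg_from_pr(r, p):
    """Real-space Rg from P(r): Rg^2 = int r^2 P dr / (2 int P dr)."""
    numerator = trapezoid(r, [ri * ri * pi for ri, pi in zip(r, p)])
    denominator = 2.0 * trapezoid(r, p)
    return math.sqrt(numerator / denominator)


def sphere_pr(r, radius):
    """Unnormalised P(r) of a homogeneous sphere; zero beyond Dmax = 2R."""
    if r < 0.0 or r > 2.0 * radius:
        return 0.0
    x = r / (2.0 * radius)
    return r * r * (1.0 - 1.5 * x + 0.5 * x ** 3)


# Example: a sphere of radius 50 (same length unit as r).
radius = 50.0
r_grid = [0.1 * i for i in range(1001)]  # 0 ... 100 = Dmax
p_grid = [sphere_pr(r, radius) for r in r_grid]
rg_sphere = rg_from_pr(r_grid, p_grid)
```

Comparing this real-space Rg with the Guinier-derived value is a common consistency check, since the two estimates come from different portions of the data.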
Once the buffer-subtracted data at individual concentrations have been processed through Guinier analysis and give a consistent value for Rg, and their folding pattern has been examined using Kratky analysis, these data can be merged. The merged data for nidogen-1, laminin γ-1, and their complex were processed as described above, and the resulting P(r) plots are presented in Figure 2B. Ideally, one should also calculate the pair-distance distribution function P(r) for each concentration to determine whether the SAXS data collected at each concentration provide similar Rg and Dmax values. If the Rg and Dmax remain similar over a wide range of concentrations, the user can proceed with merging. It should be noted that, depending on the signal, data can be truncated prior to merging. This is often necessary if the concentration and/or molecular weight of the macromolecules under investigation are low.
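The consistency check described above (similar Rg and Dmax across concentrations before merging) could be expressed as a simple helper, sketched below in Python. The 5% tolerance and the function names are assumptions chosen only for illustration; the text does not prescribe a numerical threshold, and the example values are hypothetical rather than taken from the nidogen-1 or laminin γ-1 data.

```python
def relative_spread(values):
    """(max - min) / mean of a list of positive values."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def concentrations_consistent(rg_values, dmax_values, tolerance=0.05):
    """True if Rg and Dmax each vary by no more than `tolerance` (relative).

    The default tolerance is an assumption of this sketch, not a
    recommendation from the protocol.
    """
    return (
        relative_spread(rg_values) <= tolerance
        and relative_spread(dmax_values) <= tolerance
    )


# Hypothetical values from three concentrations of one sample.
ok_to_merge = concentrations_consistent([30.1, 30.4, 29.9], [100.0, 102.0, 101.0])
```

A systematic increase or decrease of Rg with concentration, even within such a tolerance, may still indicate interparticle effects and is worth inspecting before merging.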
Low-resolution shape analysis using DAMMIN can be performed in various modes (e.g., Fast, Slow, and Expert modes). The Fast mode is an ideal first step to evaluate whether the P(r) plot provides good quality models. Typically, at least 10 models should be obtained for each P(r) plot to check whether the low-resolution structures are reproducible and have a low value of the goodness-of-fit parameter χ, which describes the agreement between the experimentally collected SAXS data and the model-derived data (a value of 0.5-1.0 is considered good based on our extensive work). For publication purposes, we typically use the Slow or Expert mode and calculate at least 15 models. In addition to DAMMIN, its faster version DAMMIF37 and GASBOR38 are available as alternatives. Furthermore, to study protein-protein or protein-nucleic acid complexes, it is possible to use the MONSA program35, which facilitates simultaneous fitting of the individual SAXS data for both macromolecules as well as their complex. For more details on high-resolution model calculations as well as for RNA-protein interaction studies, refer to a recent article by Patel et al.3
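To make the goodness-of-fit parameter more concrete, the following Python sketch computes a reduced χ² between experimental and model intensities, χ² = 1/(N−1) · Σ[(I_exp − c·I_model)/σ]², with the scale factor c obtained by weighted least squares. This is a commonly used definition and is given here as an assumption; the exact normalization applied internally by DAMMIN, DAMMIF, or GASBOR may differ, and the function names are hypothetical.

```python
import math


def scale_factor(i_exp, i_model, sigma):
    """Weighted least-squares scale c minimising sum(((I_exp - c*I_model)/sigma)^2)."""
    num = sum(e * m / (s * s) for e, m, s in zip(i_exp, i_model, sigma))
    den = sum(m * m / (s * s) for m, s in zip(i_model, sigma))
    return num / den


def reduced_chi_square(i_exp, i_model, sigma):
    """Return (chi2, chi) for model intensities scaled to the experimental data."""
    if not (len(i_exp) == len(i_model) == len(sigma)) or len(i_exp) < 2:
        raise ValueError("need at least two matching data points")
    c = scale_factor(i_exp, i_model, sigma)
    total = sum(
        ((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_model, sigma)
    )
    chi2 = total / (len(i_exp) - 1)
    return chi2, math.sqrt(chi2)
```

Values well above 1 suggest that the model does not reproduce the data, whereas values far below 1 often point to overestimated experimental errors rather than to an exceptionally good model.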
SAXS is theoretically simple yet highly complementary to other structural biology tools. It yields low-resolution structural data that can be used on their own or in conjunction with high-resolution techniques to elucidate information about macromolecular structure and dynamics. As long as a monodispersed preparation of macromolecules and their complexes can be obtained, SAXS can be utilized to study the in-solution structure and interactions of any type of biological macromolecule. In the case of the complex discussed here, it is remarkable that less than 10% of the overall accessible surface area of nidogen-1 and laminin γ-1 is buried in this complex, whereas the rest of the domains of both proteins are freely accessible to interact with other proteins at the extracellular matrix to maintain its structural rigidity (Figure 3). Obtaining such information for a complex of ~240 kDa would be very challenging using other structural biology techniques such as X-ray crystallography, NMR, and cryo-electron microscopy.
Uncovering protein structure via X-ray crystallography or NMR is an inherently time-consuming process. This bottleneck in structure determination is one area where SAXS shows its strength as a structural technique: data acquisition for a single SAXS experiment can take less than an hour, and with the help of streamlined analysis software, analysis can be done quickly and efficiently. SAXS has the potential to greatly increase the throughput of structural studies as a stand-alone technique because it offers a low-resolution model of the macromolecular structure before high-resolution data are available. A barrier to other structural techniques is the requirement for a highly pure, concentrated sample for data acquisition, which necessitates a high level of protein expression and stability over a long period of time. While SAXS samples also need to be pure and concentrated, the sample volumes are roughly 100 µL, making SAXS a relatively inexpensive method of analysis compared to other structural techniques. Moreover, SAXS coupled with size exclusion chromatography is becoming increasingly common, which provides an additional quality control step. Recently, there have been strong advances in the combination of NMR and SAXS data using the Ensemble Optimization Method (EOM)45,46 to elucidate flexible systems. In a recent paper, Mertens and Svergun47 describe multiple examples of EOM SAXS in combination with NMR, along with many other examples of SAXS data being used in conjunction with NMR. Advances are continually being made in the field of SAXS, and new techniques are being developed for SAXS to be used in conjunction with, not just complementary to, other structural techniques. Consequently, we believe that the demand for SAXS will only increase over time, especially in conjunction with NMR to characterize dynamic systems where functions are defined by flexibility.