Here, we present how Small Angle X-Ray Scattering (SAXS) can be used to obtain low-resolution envelopes representing macromolecular structures. When used in conjunction with high-resolution structural techniques such as X-Ray Crystallography and Nuclear Magnetic Resonance, SAXS can provide detailed insights into multidomain proteins and macromolecular complexes in solution.
Protein-protein interactions involving proteins with multiple globular domains make it technically challenging to determine how such complexes form and how the domains are oriented and positioned. Here, we describe a protocol that uses ab initio modeling to identify which specific domains mediate interactions in multicomponent systems. The method calculates solution structures of macromolecules and their assemblies by integrating data from small angle X-ray scattering (SAXS), chromatography, and atomic-resolution structures in a hybrid approach. As a specific example, we use the complex formed by full-length nidogen-1, which assembles extracellular matrix proteins and forms an extended, curved nanostructure, and laminin γ-1, which structures the basement membrane; one of the globular domains of nidogen-1 attaches to laminin γ-1. This approach provides a basis for determining accurate structures of flexible multidomain protein complexes and is enabled by synchrotron sources coupled with automated robotics and size exclusion chromatography systems. This combination allows rapid analysis in which multiple oligomeric states are separated just prior to SAXS data collection. The analysis yields information on the radius of gyration, maximum particle dimension, molecular shape, and interdomain pairing. A protocol for generating 3D models of complexes by fitting high-resolution structures of the component proteins is also given.
Cells contain intricate networks of proteins that act as molecular machines to carry out cellular functions such as signaling cascades and the maintenance of structural integrity. The ways in which these components move and interact in three-dimensional space give rise to the specific functions of the macromolecules. The importance of protein structure, dynamics, and interactions in determining function has driven the development of continually evolving, sophisticated techniques to measure these properties. Of these, Nuclear Magnetic Resonance (NMR), X-Ray Crystallography (XRC), and, more recently, Cryo-Electron Microscopy (CEM) provide high-resolution structural information. However, XRC and CEM capture one of many possible biomolecular states and lack information about the dynamics of the protein structure, while 3D structure determination by NMR is typically limited to smaller globular proteins. One way to overcome these limitations is to use Small Angle X-Ray Scattering (SAXS) to generate molecular envelopes of large, multidomain, or complexed systems and to combine these envelopes with high-resolution structures of the rigid components to elucidate the global architecture and dynamic features.
SAXS produces low-resolution envelopes of macromolecular complexes with a resolution of approximately 10-20 Å 1, giving insight not only into the structure but also into the dynamic characteristics of the complex. Although SAXS uses X-rays to uncover molecular structure, it differs from XRC in that the random, isotropic orientation of the particles in solution does not produce diffraction but rather scattering, which cannot yield atomic resolution. Instead, an electron "envelope" of the macromolecule is generated that represents an average of the conformations the macromolecule adopts. This information can be used to fit previously solved atomic-resolution structures directly into the envelope, in order to infer regions of flexibility in a single protein, or subunit organization and dynamics in a larger, multi-protein complex. SAXS data are collected either at synchrotrons, using high-energy monochromatic X-rays, or with in-house sources, which provide a weaker X-ray beam and require hours rather than seconds of sample exposure (Figure 1). SAXS data are often collected from several samples with a single experimental setup and buffer, so a round of useful data on a system takes an extended time to collect. Samples should, therefore, be stable and non-aggregating for at least a few hours, as verified by quality control methods such as dynamic light scattering (DLS) and/or analytical ultracentrifugation (AUC), to obtain high-quality SAXS data2,3. Here we provide a practical description of SAXS, the principles behind its use, its benefits and limitations, and sample preparation. We focus heavily on data collection and analysis and touch briefly on ab initio modeling, using the extracellular matrix proteins nidogen-1 and laminin γ-1 as an experimental example.
Principles, Benefits, and Limitations of SAXS:
The guiding principle behind SAXS is relatively simple: a monodisperse solution of the macromolecule(s) of interest is placed in a capillary and exposed to a high-energy monochromatic X-ray beam. The photons cause the electrons of the atomic shells to oscillate, and each oscillating electron emits a spherical wave of the same energy and wavelength. Because the electrons of the solvent also scatter, the buffer contributes an approximately constant background, and the signal from the macromolecule arises from the contrast between its electron density and that of the surrounding solvent. The resulting scattering intensity is collected as a function of the scattering angle, 2θ (Figure 1).
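Scattering curves are usually plotted against the scattering vector q = 4π sinθ/λ (as in Figure 2) rather than against the measured angle 2θ. The minimal sketch below converts between the two; the function names are hypothetical, and the wavelengths are illustrative assumptions (about 0.154 nm corresponds to Cu Kα, a common in-house anode, and 0.1 nm is an arbitrary synchrotron-like value).

```python
import math


def scattering_vector(two_theta_deg: float, wavelength_nm: float) -> float:
    """Return q = 4*pi*sin(theta)/lambda in nm^-1 for a full scattering angle 2θ in degrees."""
    theta = math.radians(two_theta_deg) / 2.0  # θ is half of the measured 2θ
    return 4.0 * math.pi * math.sin(theta) / wavelength_nm


def two_theta_from_q(q_nm: float, wavelength_nm: float) -> float:
    """Inverse relation: the 2θ angle (degrees) at which a given q is recorded."""
    return 2.0 * math.degrees(math.asin(q_nm * wavelength_nm / (4.0 * math.pi)))


# Illustrative wavelengths only: ~0.154 nm (Cu Kα, typical of in-house sources)
# and a hypothetical 0.1 nm synchrotron setting.
for wavelength in (0.154, 0.1):
    q = scattering_vector(1.0, wavelength)
    print(f"lambda = {wavelength} nm: 2theta = 1 deg -> q = {q:.3f} nm^-1")
```

Because q folds in the wavelength, curves recorded at different sources can be compared directly on the same q axis, which is one reason q rather than 2θ is used throughout the analysis.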
While other techniques such as XRC, NMR, and CEM provide structural information at the atomic level, SAXS offers several benefits that these techniques cannot provide. SAXS can be performed in almost any buffer and does not require any special sample preparation. This is particularly important for studying the behavior and structure of macromolecules under varying conditions, such as the presence or absence of mono- or divalent cations or changes in pH4,5. SAXS can also provide information about flexible regions of a macromolecule6, which the other techniques listed can struggle to capture. Therefore, SAXS can serve as a strong complementary technique: the stable portions of a macromolecule are studied with XRC, NMR, or CEM, the entire macromolecule or complex is analyzed at low resolution with SAXS, and the results are combined using analysis tools such as FoXSDock7 or CRYSOL8. Since SAXS is a solution technique, it is often used to confirm whether static structures, such as those obtained by XRC, are consistent with the conformation in solution6. SAXS also has the advantage of requiring a relatively small amount of sample (typically 50-100 µL) and relatively little experiment time (30 min-1 h).
The largest limitation of SAXS is its vulnerability to sample aggregation and/or degradation, which can lead to incorrect structural predictions. Even a small fraction of aggregates (as low as 5%) scatters very strongly, leading to an overestimation of the maximum particle dimension (Dmax) and radius of gyration (Rg). Conversely, sample degradation can lead to an underestimation of these molecular properties. This vulnerability arises because SAXS is an averaging technique, so sample homogeneity is critical to achieving reliable and reproducible results. Any sample to be analyzed by SAXS should, therefore, undergo multiple purification steps and homogeneity checks, such as denaturing and native gel electrophoresis, size exclusion chromatography, dynamic light scattering, and analytical ultracentrifugation. SAXS beamlines often run samples through high-performance liquid chromatography as a final quality control step immediately before SAXS (S-SAXS)3,9. SAXS data should be collected at multiple concentrations and the Rg values of the data sets compared; they should agree closely, because interparticle interactions and aggregation result in an overestimation of particle dimensions and thus in inaccurate data analysis and modeling. Since scattering depends on both concentration and particle size, smaller macromolecules may require more careful optimization of the concentration range. This follows from the Reciprocity Theorem: large particles scatter towards small angles and small particles towards large angles. It also manifests in data collection, because the forward scattering intensity, I(0), of a particle is proportional to R⁶, where R is the particle radius, so small particles scatter much more weakly. A final limitation of SAXS is the potential for radiation damage to the sample during exposure, which can distort the data. It is good practice to compare sample quality before and after SAXS exposure to ensure that this is not occurring.
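The reciprocity between size and scattering angle, and the R⁶ dependence of I(0), can be illustrated with the textbook scattering curve of a single homogeneous sphere. This is a minimal sketch under an idealized assumption (a uniform sphere of fixed contrast, not a real protein); the function names and radii are hypothetical.

```python
import numpy as np


def sphere_intensity(q, radius, contrast=1.0):
    """Scattering of a single homogeneous sphere (arbitrary units).

    I(q) = (contrast * V)^2 * Phi(qR)^2, with Phi(x) = 3 (sin x - x cos x) / x^3 and Phi(0) = 1.
    """
    volume = 4.0 / 3.0 * np.pi * radius**3
    x = np.asarray(q, dtype=float) * radius
    phi = np.ones_like(x)  # amplitude; its limit at x = 0 is 1
    nz = x > 1e-6
    phi[nz] = 3.0 * (np.sin(x[nz]) - x[nz] * np.cos(x[nz])) / x[nz] ** 3
    return (contrast * volume) ** 2 * phi**2


def first_minimum(q, intensity):
    """q of the first grid point after which the curve turns upward (first form-factor minimum)."""
    return q[np.argmax(np.diff(intensity) > 0)]


q = np.linspace(0.0, 3.0, 3001)  # nm^-1, starts at q = 0 so index 0 is I(0)
small = sphere_intensity(q, 2.0)  # R = 2 nm
large = sphere_intensity(q, 4.0)  # R = 4 nm

# Doubling R multiplies I(0) by (2^3)^2 = 2^6 = 64 ...
print("I(0) ratio for doubled R:", large[0] / small[0])
# ... and moves the first minimum (at qR ≈ 4.493) to half the q value.
print("first minima (nm^-1):", first_minimum(q, small), first_minimum(q, large))
```

The same two effects explain why small macromolecules need more careful concentration optimization: their signal is weaker overall and is spread towards higher q, where the buffer background dominates.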
1. SAXS Sample Preparation and Data Acquisition
2. Data Analysis
NOTE: Several software packages are currently useful for SAXS data analysis: ScÅtter43 (download available at www.bioisis.net), bioXtas RAW44, and the ATSAS suite13. This section provides an overview of the general steps for analyzing raw SAXS data using the ATSAS program suite; the specific steps refer to ATSAS version 2.8.1. Other programs can also be used and are briefly discussed later.
3. Ab Initio Bead Modeling and Averaging
The data analysis approach described above was used to calculate the Rg and Dmax of nidogen-1, laminin γ-1, and their complex from the P(r) function. We obtained Rg values of 7.20 (±0.10) nm, 8.10 (±0.20) nm, and 10.9 (±0.4) nm for nidogen-1, laminin γ-1, and their complex, respectively (Figure 2A-B). In addition, we obtained Dmax values of 24 nm, 26 nm, and 35 nm for nidogen-1, laminin γ-1, and their complex, respectively (Figure 2)10. The DAMMIF program was used to obtain low-resolution structures of nidogen-1 and laminin γ-1, which suggested that both proteins adopt an extended shape in solution. The χ and NSD values for nidogen-1 (~1 and 0.8, respectively) and laminin γ-1 (~0.9 and 0.8, respectively) were also within the acceptable range. Aligning the high-resolution structures (two domains of nidogen-1 and two of laminin γ-1) with their low-resolution SAXS structures allowed identification of their N- and C-terminal regions10.
Nidogen-1 was identified as an interacting partner of laminin γ-139,40, and the interaction site was mapped by X-ray crystallography to the C-terminal domains41. However, the high-resolution structures included only the interacting domains, not full-length nidogen-1 or the entire laminin γ-1 arm. Therefore, we purified a complex containing full-length nidogen-1 and the laminin γ-1 arm to identify the interacting regions and to study the relative orientation of the N-terminal domains of both proteins. The SAXS data for the complex yielded an Rg of 10.9 (±0.4) nm and a Dmax of 35 nm. We used MONSA to obtain the low-resolution structure of the entire complex, which suggested that only the C-terminal regions of both proteins mediate the interaction, whereas the remaining domains are far apart from each other (Figure 3, Video 1).
Figure 1. Schematic of the SAXS set-up. A monodisperse preparation of biomolecules or their complexes is prepared and then exposed to high-energy X-rays. Depending on the source (e.g., in-house vs. synchrotron), the X-ray energy and the sample-to-source distance can vary. The X-ray scattering pattern (which depends on the size and shape of the biomolecules) is recorded and radially averaged to obtain a one-dimensional (1D) plot of the intensity of scattered light versus scattering angle. Because buffer molecules also scatter, their contribution is subtracted to obtain the scattering pattern of the biomolecules of interest. At the synchrotron, an additional purification step using in-line size exclusion/high-performance liquid chromatography is also typically performed just prior to SAXS data collection (top view). This step is critical to remove any aggregated and/or degraded product as well as any unbound biomolecules from the complex. The 1D scattering plot is converted to the electron pair-distance distribution plot (P(r) plot), which provides the radius of gyration and maximum particle dimension of the biomolecules. This plot is used as the input for ab initio modeling packages (e.g., DAMMIN/DAMMIF) to obtain low-resolution structures of biomolecules, or for other packages (e.g., SASREF/CORAL) if high-resolution structures of parts of the biomolecules or of individual biomolecules in the complex are known.
Figure 2. (A) Plot of scattered intensity vs. scattering vector (q = 4π sinθ/λ, nm⁻¹); the low-q region reports on sample quality and the high-q region on the shape of the biomolecules. (B) The electron pair-distance distribution P(r) determined from the scattering data suggests an elongated shape for the biomolecules under investigation (laminin γ-1, nidogen-1, and their complex). (C) Kratky plot indicating that nidogen-1 and laminin γ-1 are not unfolded. (D) Guinier plots for nidogen-1, laminin γ-1, and their complex, showing the linear region at low scattering angles used to determine the radius of gyration.
Figure 3. Low-resolution structure of the complex of nidogen-1 and laminin γ-1, obtained by analysis of merged data sets using the program MONSA. The color scheme is the same as in Figure 2.
Video 1. Low-resolution structure of the nidogen-1 and laminin γ-1 complex. This movie was prepared using PyMOL to visualize various structural features of the complex. The crystal structure of the laminin-nidogen complex (PDB ID: 1NPE) is shown as ribbon cartoons, highlighting the interaction site of this complex. The color scheme is the same as in Figure 2.
The critical steps of SAXS data analysis outlined in the protocol section of this paper include buffer subtraction, Guinier analysis, Kratky analysis, data merging, and calculation of the P(r) distribution. Ab initio bead modeling is too extensive to be covered here in detail and is therefore described only briefly.
At synchrotrons (e.g., DESY in Germany, DIAMOND in the UK, and ESRF in France), SAXS data can be collected on a very small volume (a few µL) of each sample as the fractions elute from the size exclusion column connected in-line (see Figure 1). The elastically scattered SAXS data are radially averaged using software provided by the instrument manufacturer or by the synchrotron before buffer subtraction can take place. The resulting 1D data show the amount of scattered light (ln I(q)) on the Y-axis versus the scattering vector (q = 4π sinθ/λ, where λ is the wavelength of the incident X-rays) on the X-axis, as outlined in Figure 1. The program PRIMUS/qt12 is used to directly subtract any background due to the buffer, as described in section 1.1. Other programs, such as ScÅtter43 (download available at www.bioisis.net; a tutorial is available at https://www.youtube.com/channel/UCvFatdC5HcZOLv6OSjblfeA) and bioXtas RAW44 (available at https://bioxtas-raw.readthedocs.io/en/Latest/index.html), can be used as alternatives to the ATSAS package.
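The arithmetic behind buffer subtraction can be sketched as follows. This is not how PRIMUS/qt is implemented; it is a minimal illustration assuming that sample and buffer frames share the same q grid and have independent errors. The function names and the numbers in the example are hypothetical, and the optional `scale` factor defaults to 1 (any other value is an assumption the user must justify).

```python
import numpy as np


def average_frames(intensities, errors):
    """Average repeated exposures (frames) recorded on the same q grid.

    intensities, errors: arrays of shape (n_frames, n_q). The error of the mean is
    propagated in quadrature, assuming independent frames.
    """
    intensities = np.asarray(intensities, dtype=float)
    errors = np.asarray(errors, dtype=float)
    n = intensities.shape[0]
    return intensities.mean(axis=0), np.sqrt((errors**2).sum(axis=0)) / n


def subtract_buffer(q_sample, i_sample, err_sample, q_buffer, i_buffer, err_buffer, scale=1.0):
    """Return (I_sample - scale * I_buffer, propagated error) on a shared q grid."""
    if not np.allclose(q_sample, q_buffer):
        raise ValueError("sample and buffer must be on the same q grid")
    i_sub = np.asarray(i_sample, float) - scale * np.asarray(i_buffer, float)
    err_sub = np.sqrt(np.asarray(err_sample, float) ** 2 + (scale * np.asarray(err_buffer, float)) ** 2)
    return i_sub, err_sub


# Hypothetical example: two buffer frames and one sample frame on a three-point q grid
q = np.array([0.1, 0.2, 0.3])
buf_i, buf_err = average_frames([[1.0, 0.8, 0.6], [1.2, 0.8, 0.6]], [[0.1, 0.1, 0.1], [0.1, 0.1, 0.1]])
i_sub, err_sub = subtract_buffer(q, [3.1, 1.8, 1.0], [0.1, 0.1, 0.1], q, buf_i, buf_err)
print(i_sub, err_sub)
```

Averaging several buffer frames before subtraction reduces the noise that the buffer curve adds to every point of the subtracted profile.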
Guinier analysis provides information on sample aggregation and homogeneity, as well as the radius of gyration (Rg) of the macromolecule of interest, based on SAXS data from the low-q region14. A Guinier plot (ln I(q) vs. q²) is constructed with PRIMUS/qt for the SAXS data obtained at each concentration, followed by a linear fit over a range extending up to q·Rg = 1.3. A monodisperse sample should give a linear Guinier plot in this region (Figure 2D), whereas aggregation results in a nonlinear Guinier plot15,16. If the Guinier plot is linear, the degree of "unfoldedness" of the macromolecule of interest can be assessed with a Kratky plot (q²I(q) vs. q), which is useful when deciding whether to perform rigid body modeling or to construct ensembles of low-resolution models. A globular protein gives a bell-shaped curve in a Kratky plot, whereas extended molecules or unfolded peptides plateau or even increase at larger q and lack the bell shape (Figure 2C).
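The Guinier fit and the Kratky transformation described above can be sketched in a few lines. This is a minimal illustration, not the PRIMUS/qt algorithm: it assumes a buffer-subtracted curve with positive intensities at low q, starts from the first ten points (an arbitrary choice), and iteratively restricts the window to q·Rg ≤ 1.3. It uses the dimensionless Kratky form (q·Rg)²·I(q)/I(0), for which an ideal compact particle following the Guinier law peaks at 3/e ≈ 1.104 at q·Rg = √3; the synthetic test curve is hypothetical.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, n_start=10, max_iter=20):
    """Iterative Guinier fit: ln I(q) = ln I(0) - (Rg^2 / 3) q^2, restricted to q*Rg <= qrg_max.

    Starts from the first n_start points with positive intensity, then refits on every
    point satisfying q*Rg <= qrg_max until the fitting window stops changing.
    Returns (Rg, I0, indices of the points used).
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    usable = intensity > 0  # the logarithm needs positive intensities
    idx = np.flatnonzero(usable)[:n_start]
    for _ in range(max_iter):
        if idx.size < 3:
            raise ValueError("too few points for a Guinier fit")
        slope, intercept = np.polyfit(q[idx] ** 2, np.log(intensity[idx]), 1)
        if slope >= 0:
            raise ValueError("non-negative Guinier slope: check for aggregation or low-q artifacts")
        rg = np.sqrt(-3.0 * slope)
        i0 = np.exp(intercept)
        new_idx = np.flatnonzero(usable & (q * rg <= qrg_max))
        if np.array_equal(new_idx, idx):
            break
        idx = new_idx
    return rg, i0, idx


def dimensionless_kratky(q, intensity, rg, i0):
    """Return (q*Rg, (q*Rg)^2 * I(q) / I(0)) for a dimensionless Kratky plot."""
    x = np.asarray(q, dtype=float) * rg
    return x, x**2 * np.asarray(intensity, dtype=float) / i0


# Synthetic, noise-free Guinier curve for a hypothetical Rg = 3 nm particle
q = np.linspace(0.01, 1.0, 100)  # nm^-1
intensity = 100.0 * np.exp(-(q * 3.0) ** 2 / 3.0)
rg, i0, used = guinier_fit(q, intensity)
x, y = dimensionless_kratky(q, intensity, rg, i0)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.1f}, points used = {used.size}")
print(f"Kratky peak {y.max():.3f} at qRg = {x[np.argmax(y)]:.2f}")
```

With real data, the residuals of the linear fit should be inspected: a systematic upturn at the lowest q values is the nonlinearity that signals aggregation.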
Obtaining the Rg from Guinier analysis uses only the data points from the low-q region of the 1D scattering plot (Figure 2D). However, almost the entire data set can be used in an indirect Fourier transformation that converts the reciprocal-space information of ln(I(q)) vs. q into a real-space pair-distance distribution function, P(r), which provides information on Dmax and Rg (Figure 2B). The shape of the P(r) plot represents the gross solution conformation of the macromolecule of interest18,19. The conversion of reciprocal-space data to real-space data is a critical step, but a detailed description is beyond the scope of this paper. Therefore, refer to an article by Svergun20 for an explanation of each parameter.
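The indirect Fourier transformation itself (as performed by GNOM) involves regularization and is not reproduced here. The sketch below shows only the forward relations that link a P(r) function to the quantities discussed above: Rg² = ∫r²P(r)dr / (2∫P(r)dr), Dmax as the distance where P(r) returns to zero, and I(q) = 4π∫P(r) sin(qr)/(qr) dr. It is checked against the analytical P(r) of a homogeneous sphere (an idealized assumption), for which Rg = √(3/5)·R and Dmax = 2R; the function names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal integral of y(x), written out to avoid NumPy version differences."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def rg_from_pr(r, p):
    """Rg^2 = integral(r^2 P(r) dr) / (2 * integral(P(r) dr))."""
    return np.sqrt(trapezoid(r**2 * p, r) / (2.0 * trapezoid(p, r)))


def intensity_from_pr(q, r, p):
    """Forward transform I(q) = 4*pi * integral(P(r) sin(qr)/(qr) dr)."""
    q = np.atleast_1d(np.asarray(q, dtype=float))
    kernel = np.sinc(np.outer(q, r) / np.pi)  # np.sinc(x) = sin(pi x)/(pi x), so this is sin(qr)/(qr)
    return 4.0 * np.pi * np.array([trapezoid(p * k, r) for k in kernel])


def sphere_pr(r, radius):
    """Analytical P(r) (up to a constant) of a homogeneous sphere; zero beyond Dmax = 2R."""
    x = np.asarray(r, dtype=float) / radius
    p = x**2 * (1.0 - 0.75 * x + x**3 / 16.0)
    return np.where(x <= 2.0, p, 0.0)


R = 5.0  # nm, hypothetical test sphere
r = np.linspace(0.0, 2.0 * R, 2001)  # Dmax = 2R for a sphere
p = sphere_pr(r, R)
print("Rg from P(r):", rg_from_pr(r, p), "expected sqrt(3/5)*R:", np.sqrt(0.6) * R)
i_q = intensity_from_pr([0.0, 4.4934 / R], r, p)
print("I(q) at the first sphere minimum relative to I(0):", i_q[1] / i_q[0])
```

Because Rg from P(r) uses the whole curve, it is less sensitive to noise at the lowest q values than the Guinier estimate, which is why agreement between the two is a useful consistency check.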
Once the buffer-subtracted data at individual concentrations have been processed through Guinier analysis and give consistent Rg values, and their folding state has been examined by Kratky analysis, the data can be merged. The merged data for nidogen-1, laminin γ-1, and their complex were processed as described above, and the resulting P(r) plots are presented in Figure 2B. Ideally, the pair-distance distribution function P(r) should also be calculated for each concentration to determine whether the SAXS data at each concentration give similar Rg and Dmax values. If Rg and Dmax remain similar over a wide range of concentrations, the user should proceed. Note that, depending on the signal, data can be truncated prior to merging. This is often the case if the concentration and/or molecular weight of the macromolecules under investigation is low.
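The concentration check and merging step can be sketched as below. Both functions are hypothetical illustrations, not the ATSAS merging tools: the consistency test simply asks whether every Rg lies within two standard errors of the error-weighted mean (the threshold is an assumption), and the merge follows the common practice of taking low-q data from a low-concentration curve (less affected by interparticle effects) and high-q data from a higher-concentration curve (better signal), after scaling the latter by least squares over an overlap window chosen by the user.

```python
import numpy as np


def rg_consistent(rg_values, rg_errors, n_sigma=2.0):
    """Hypothetical acceptance test: every Rg lies within n_sigma of the error-weighted mean."""
    rg = np.asarray(rg_values, dtype=float)
    err = np.asarray(rg_errors, dtype=float)
    weights = 1.0 / err**2
    mean = float(np.sum(weights * rg) / np.sum(weights))
    return bool(np.all(np.abs(rg - mean) <= n_sigma * err)), mean


def merge_curves(q, i_low, i_high, overlap, q_cut):
    """Splice two curves measured on the same q grid.

    i_low:  low-concentration curve, used below q_cut.
    i_high: high-concentration curve, used from q_cut upward, after least-squares
            scaling onto i_low within the overlap window (q_min, q_max).
    """
    q = np.asarray(q, dtype=float)
    i_low = np.asarray(i_low, dtype=float)
    i_high = np.asarray(i_high, dtype=float)
    window = (q >= overlap[0]) & (q <= overlap[1])
    scale = np.sum(i_low[window] * i_high[window]) / np.sum(i_high[window] ** 2)
    merged = np.where(q < q_cut, i_low, scale * i_high)
    return merged, scale


# Hypothetical Rg values (nm) from three concentrations, each with a 0.1 nm uncertainty
ok, mean_rg = rg_consistent([7.20, 7.25, 7.10], [0.1, 0.1, 0.1])
print("consistent:", ok, "weighted mean Rg:", round(mean_rg, 3))

q = np.linspace(0.05, 2.0, 40)
i_low = np.exp(-q)
i_high = 5.0 * i_low  # same shape, five times the concentration
merged, scale = merge_curves(q, i_low, i_high, overlap=(0.5, 1.0), q_cut=0.8)
print("scale applied to high-concentration curve:", scale)
```

If the curves genuinely differ in shape within the overlap window, the scale factor absorbs only part of the difference, and the discontinuity at q_cut becomes a visible warning sign.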
Low-resolution shape analysis using DAMMIN can be performed in various modes (e.g., Fast, Slow, and Expert). The Fast mode is an ideal first step to evaluate whether the P(r) plot yields good-quality models. Typically, at least 10 models should be calculated for each P(r) plot to check whether the low-resolution structures are reproducible and have a low value of the goodness-of-fit parameter χ, which describes the agreement between the experimentally collected SAXS data and the data calculated from the model (based on our extensive work, a value of 0.5-1.0 is considered good). For publication purposes, we typically use the Slow or Expert mode and calculate at least 15 models. In addition to DAMMIN, its faster counterpart DAMMIF37 and GASBOR38 are alternatives. Furthermore, to study protein-protein or protein-nucleic acid complexes, the MONSA program35 can be used, which fits the individual SAXS data for both macromolecules and their complex simultaneously. For more details on high-resolution model calculations, as well as on RNA-protein interaction studies, refer to a recent article by Patel et al.3
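To make the χ criterion concrete, the sketch below computes a discrepancy of the form commonly used for SAXS model fitting, χ² = 1/(N−1) Σ[(I_exp(q) − c·I_model(q))/σ(q)]², with the scale factor c chosen by weighted least squares. It is a generic illustration under that stated assumption; DAMMIN, DAMMIF, and CRYSOL compute their own fit statistics internally, and the function name and example values here are hypothetical.

```python
import numpy as np


def chi_with_scale(i_exp, sigma, i_model):
    """Discrepancy chi between experimental and model curves on the same q grid.

    chi^2 = 1/(N-1) * sum(((I_exp - c * I_model) / sigma)^2), where the scale factor c
    is chosen by weighted least squares to minimise chi^2.
    """
    i_exp = np.asarray(i_exp, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    i_model = np.asarray(i_model, dtype=float)
    w = 1.0 / sigma**2
    c = np.sum(w * i_exp * i_model) / np.sum(w * i_model**2)
    chi2 = np.sum(((i_exp - c * i_model) / sigma) ** 2) / (i_exp.size - 1)
    return float(np.sqrt(chi2)), float(c)


# Hypothetical four-point example with unit errors
chi, c = chi_with_scale([10, 10, 10, 10], [1, 1, 1, 1], [9, 11, 9, 11])
print(f"chi = {chi:.3f}, scale = {c:.4f}")
```

Because χ is normalized by the experimental errors, values near 1 mean that the model reproduces the data to within the noise; much larger values indicate systematic misfit, while values well below 1 can indicate overestimated errors.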
SAXS is theoretically simple but is a highly complementary method to other structural biology tools, yielding low-resolution structural data that can be used on its own or in conjunction with high-resolution techniques to elucidate macromolecular structure and dynamics. As long as a monodisperse preparation of macromolecules and their complexes can be obtained, SAXS can be used to study the in-solution structure and interactions of any type of biological macromolecule. In the case of the complex discussed here, it is remarkable that less than 10% of the overall accessible surface area of nidogen-1 and laminin γ-1 is buried in the complex, whereas the remaining domains of both proteins are freely accessible to interact with other proteins in the extracellular matrix and help maintain its structural rigidity (Figure 3). Obtaining such information for a ~240 kDa complex would be very challenging using other structural biology techniques such as X-Ray Crystallography, NMR, and Cryo-Electron Microscopy.
Uncovering protein structure via X-Ray Crystallography or NMR is an inherently time-consuming process. This bottleneck in structure determination is one area where SAXS shows its strength as a structural technique: data acquisition for a single SAXS experiment can take less than an hour, and with streamlined analysis software, analysis can be done quickly and efficiently. SAXS has the potential to greatly increase the throughput of structural studies as a stand-alone technique because it offers a low-resolution model of the macromolecular structure before high-resolution data are available. A barrier to other structural techniques is the requirement for a highly pure, concentrated sample for data acquisition, which necessitates a high level of protein expression and stability over a long period of time. While SAXS samples also need to be pure and concentrated, sample volumes of roughly 100 µL make SAXS a relatively inexpensive method of analysis compared with other structural techniques. Moreover, SAXS coupled with size exclusion chromatography, which provides an additional quality control step, is becoming increasingly common. Recently, there have been strong advances in combining NMR and SAXS data using the Ensemble Optimization Method (EOM)45,46 to elucidate flexible systems. In a recent paper, Mertens and Svergun47 describe multiple recent examples of EOM combining SAXS with NMR, along with many other examples of SAXS data being used in conjunction with NMR. Advances are continually being made in the field of SAXS, and new techniques are being developed so that SAXS can be used in conjunction with, not merely as a complement to, other structural techniques. Consequently, we believe that the demand for SAXS will only increase over time, especially in conjunction with NMR to characterize dynamic systems whose functions are defined by flexibility.
The authors have nothing to disclose.
This project has been supported by NSERC RGPIN-2018-04994, Campus Alberta Innovation Program (RCP-12-002C) and Alberta Prion Research Institute / Alberta Innovates Bio Solutions (201600018) grants awarded to M.O. TRP is a Canada Research Chair in RNA & Protein Biophysics (201704) and acknowledges NSERC Discovery grant (RGPIN-2017-04003). TM is funded by the NSERC Discovery grant to TRP.
| Name of Material/Equipment | Company/Source | Catalog Number | Comments |
|---|---|---|---|
| HEK 293 EBNA Cell Line | In-Lab availability | – | Cell line used to overexpress protein(s) |
| Ni Sepharose High Performance histidine-tagged protein purification resin | GE Healthcare | 17524801 | Affinity protein purification resin |
| Superdex 200 Increase 10/300 | GE Healthcare | 28990944 | SEC Column |
| ÄKTA Pure FPLC | GE Healthcare | – | FPLC System |
| Nanodrop | Nanodrop | – | Spectrophotometer |
| Basic Reagents (NaCl, Tris-HCl, etc.) | | | |
| S-MAX3000 | Rigaku | – | SAXS Pinhole Camera System |
| Zetasizer Nano-S | Malvern Instruments Ltd | – | Dynamic Light Scattering instrument |
| 0.1 µm Filter | Millipore | JVWP04700 | Used to filter sample prior to DLS |
| Thrombin cleavage kit | abcam | ab207000 | Thrombin cleavage to remove His tag |
| Strep-Tactin Sepharose Column | IBA | 2-1201-010 | Strep-Tag Affinity Purification |
| D-desthiobiotin | Sigma-Aldrich | 533-48-2 | Elution of Strep Tag Protein |
| Software | | | |
| SAXGUI | Rigaku | – | Data Collection for SAXS and data reduction |
| ATSAS Suite | Franke et al., 2017 | – | SAXS Data Analysis Software program suite |
| PRIMUS | Konarev et al., 2003 | – | Buffer Subtraction |
| GNOM | Svergun, 1992 | – | Rg, Dmax and P(r) Calculation |
| DAMMIF | Franke and Svergun, 2009 | – | Ab initio model calculation |
| DAMAVER | Volkov and Svergun, 2003 | – | Averaged Solution Conformation calculations |
| MONSA | Svergun, 1999 | – | Simultaneous model fitting for the complex |
| GASBOR | Svergun et al., 2001 | – | Alternative Ab initio model calculations |
| DTS Software V6.20 | Malvern Instruments Ltd | – | DLS supplied instrument software |
| PyMOL | Schrödinger, LLC | – | The PyMOL Molecular Graphics System V2.0 |
| The Protein Data Bank | Berman et al., 2000 | – | PDB ID: 1NPE |