We describe a framework incorporating straightforward biochemical and computational analysis to guide the characterization and crystallization of large coiled-coil domains. This framework can be adapted for globular proteins or extended to incorporate a variety of high-throughput techniques.
Obtaining crystals for structure determination can be a difficult and time-consuming proposition for any protein. Coiled-coil proteins and domains are found throughout nature; however, because of their physical properties and tendency to aggregate, they are traditionally viewed as being especially difficult to crystallize. Here, we use a variety of quick and simple techniques to identify a series of possible domain boundaries for a given coiled-coil protein and then rapidly characterize the behavior of these proteins in solution. With the addition of a strongly fluorescent tag (mRuby2), protein characterization is simple and straightforward. The target protein can be readily visualized under normal lighting and can be quantified with the use of an appropriate imager. The goal is to quickly identify candidates that are unlikely to succeed and remove them from the crystallization pipeline, leaving more time for the best candidates and spending fewer funds on proteins that do not produce crystals. This process can be iterated to incorporate information gained from initial screening efforts, can be adapted for high-throughput expression and purification procedures, and is augmented by robotic screening for crystallization.
Structure determination via X-ray crystallography has made fundamental contributions to every field of modern biology. It provides an atomic view of the macromolecules that support life and of how they interact with one another in a variety of contexts, allowing us to understand the mechanisms that cause disease and providing opportunities to rationally design drugs to treat disease. Crystallography has long been the dominant experimental technique for determining macromolecular structure, and currently accounts for 89.3% of the structural database (www.rcsb.org). This technique has many advantages, including the potential for very high resolution, the ability to visualize macromolecules with a broad range of sizes, relatively easy data collection, and the opportunity to visualize how the macromolecule interacts with solvent as well as ligands.
Despite numerous technological improvements in recombinant protein expression1,2, purification3, and the molecular biology used to generate these systems4, the single biggest obstacle in the crystallographic process remains the ability to grow diffraction-quality crystals. This has been especially true for proteins that contain large coiled-coil domains. It has been estimated that as much as 5% of all amino acids are found within coiled-coils5,6, making this a very common structural feature7, yet these proteins are often more difficult to purify and crystallize than globular proteins8-10. This is further compounded by the fact that coiled-coil domains are often found within the context of a larger protein; therefore, correctly predicting the boundaries of these domains is critical to avoid the inclusion of unstructured or flexible sequence that is often detrimental for crystallization.
Here we present a conceptual framework combining web-based computational analyses with experimental data from the bench to help guide users through the initial stages of the crystallographic process, including how to select protein fragment(s) for structural studies and how to prepare and characterize protein samples prior to crystallization attempts. We focus our analysis on two proteins containing large coiled-coil domains, Shroom (Shrm) and Rho-kinase (Rock). These proteins were chosen because they both contain coiled-coil domains and are known to form a biologically relevant complex11-16. Shroom and Rock are predicted to contain ~200 and 680 residues of coiled-coil, respectively, many portions of which have been characterized structurally17-20. The method described here provides a streamlined workflow to quickly identify fragments of coiled-coil-containing proteins that will be amenable to crystallization; however, the techniques described can easily be adapted for most proteins or protein complexes, or modified to incorporate high-throughput approaches as available. Lastly, these methods are generally inexpensive and can be performed by users at nearly all experience levels.
NOTE: A diagram of the conceptual framework or workflow is shown in Figure 1 for reference. The protocol can be broken down into four stages: computational or sequence-based predictions, protein expression and purification, biochemical characterization, and crystallization. The examples shown analyze Shroom SD2 domains and/or Shroom-Rock complexes, but the protocol can be used with any protein.
1. Use Established Web-based Tools to Generate Computational Predictions of Coiled-coil Domain Boundaries
2. Express and Purify Proteins with the Domain Boundaries Identified in Section 1
NOTE: The goal of this section is to use a series of quick and easily quantifiable assays to screen hypothetical domain boundaries generated in Section 1.
3. Characterize Protein Sample to Identify Those with Advantageous Properties
4. Producing High Quality Crystals of the Coiled-coil SD2 Domain from Shroom
NOTE: All steps within section 4.1 are performed at room temperature unless the protein would benefit from purification at a different temperature, usually 4 °C.
A diagram depicting the workflow utilized in this system is shown in Figure 1 and includes three main stages. Computational analysis of the sequence is used to develop hypotheses about the domain boundaries of the coiled-coil protein of interest. An example of an annotated analysis of the Shrm2 SD2 domain is shown in Figure 2. In this analysis, the goal was to identify possible domain boundaries for SD2, a conserved domain at the C-terminus of the cytoskeletal regulator Shroom. From this analysis, three distinct sets of hypothetical domain boundaries were generated, containing coiled-coil fragments that span the entire conserved SD2 or minimal fragments near the C-terminus, which ended up as the best candidates for crystallization. Candidates identified from the sequence analysis are then quickly tested at small scale using a protein fusion with mRuby2 (Figure 3) for efficient analysis. Representative small-scale (50 ml of culture) purifications of two His10-mRuby2-tagged proteins are shown in Figure 4. In this figure, the behavior of an insoluble protein is readily apparent when compared to its soluble counterpart. Poorly expressing or insoluble protein fusions are easily and quickly identified in this manner. Biochemical analyses of protein fragments by limited proteolysis are shown in Figure 5 for a variety of fragments, using either the Ruby system described or isolated and purified Shroom SD2 domains. In Figure 5A, two variants of mouse Shrm SD2 corresponding to hypothetical domain boundaries #2 and #1 in Figure 2 were digested using a gradient of protease concentration from 0 to 1.0% for 30 min at room temperature. Both of these were effectively degraded into a host of smaller products at moderate enzyme concentrations. Figure 5B shows the same experiment performed using hypothetical boundary #3 from Drosophila Shrm SD2. This protein did crystallize and its structure has been described19.
Proteolytic analysis can also be useful when examining Ruby fusions generated with the protocol above. As shown in Figure 5C, a Ruby fusion protein with human Shrm2 SD2 (1427-1610) was digested with a high concentration (0.025%) of the non-specific protease Subtilisin at room temperature for the indicated time points. Here, the linker between Ruby and Shrm was immediately cleaved, as would be expected. Additionally, minor products were formed, indicating that this protein has two other protease-sensitive regions; however, the degradation products formed within 2 min remained largely intact throughout the rest of the experiment, indicating that the protein is actually quite stable.
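The time-course reasoning above (products that appear early and then persist indicate a stable core) can be expressed numerically if band intensities are available from an imager. The following is a minimal sketch only: the function names, the densitometry values, the 2 min reference point, and the 0.8 cutoff are all illustrative assumptions and are not part of the protocol.

```python
from typing import Dict


def fraction_remaining(band_intensity: Dict[float, float],
                       reference_time: float) -> Dict[float, float]:
    """Normalize band intensities to the value at reference_time (minutes).

    Only time points at or after the reference are returned, because the
    question is whether a product persists once it has formed.
    """
    if reference_time not in band_intensity:
        raise KeyError(f"no measurement at {reference_time} min")
    reference = band_intensity[reference_time]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return {
        t: value / reference
        for t, value in sorted(band_intensity.items())
        if t >= reference_time
    }


def looks_stable(fractions: Dict[float, float], min_fraction: float = 0.8) -> bool:
    """Return True if every later time point keeps at least min_fraction.

    The 0.8 cutoff is an arbitrary, hypothetical threshold.
    """
    return all(f >= min_fraction for f in fractions.values())


if __name__ == "__main__":
    # Hypothetical densitometry values (arbitrary units) keyed by minutes.
    core_fragment = {0.0: 0.0, 2.0: 100.0, 5.0: 95.0, 15.0: 90.0, 30.0: 85.0}
    fractions = fraction_remaining(core_fragment, reference_time=2.0)
    print(fractions)
    print("stable core:", looks_stable(fractions))
```

Normalizing to the first time point at which the product is visible, rather than to the undigested sample, keeps the rapid and expected cleavage of the Ruby-Shrm linker from being counted as instability.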
A complementary approach is shown using native PAGE analysis in Figure 6. In Figure 6A, Ruby-tagged human Shrm SD2 (1427-1610) and proteins containing the indicated point mutations within Shrm are analyzed by native PAGE. In this experiment, the wild-type protein (which forms crystals) runs as two distinct bands, while the three mutants, which do not crystallize, behave dramatically differently. To demonstrate that the system can also be useful for protein complexes, a variety of Shroom-Rock complexes were analyzed by native PAGE in Figure 6B. In this case, the same fragment of human Shrm2 SD2 (1427-1610) was used to help clarify the analysis, while different fragments of human Rock were utilized. This approach suggested that complexes formed using Rock 700-906 and 746-906 contained multiple species, ran as smears, and were less homogeneous. Complexes utilizing Rock 788-906 were improved, albeit not dramatically, and this species was able to crystallize, although crystals took many weeks to form and contained degraded Rock protein. Complexes generated using Rock 834-913 formed a single, more uniform species on the gel and readily crystallized overnight. Figure 7 shows a set of common crystallization conditions that are used to assess the behavior of the protein sample in crystallization trials and that could be used generally with any protein. Ideally, a mix of clear and precipitating conditions will be obtained. Proteins that do not form precipitates require higher concentrations or more stringent buffering conditions, while those that precipitate in many conditions should be used at a lower protein concentration or with buffering conditions that promote protein solubility.
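The decision rule for the quick pre-screen described above can be summarized as a small helper. This is a sketch under stated assumptions: the labels "clear" and "precipitate" are simplified stand-ins for the scoring matrix in Figure 7B, and the 30% and 70% cutoffs are hypothetical values chosen for illustration, not values taken from the protocol.

```python
from collections import Counter
from typing import Iterable

# Hypothetical, simplified drop labels (not the actual Figure 7B matrix).
VALID_SCORES = {"clear", "precipitate"}


def prescreen_advice(scores: Iterable[str],
                     low: float = 0.3,
                     high: float = 0.7) -> str:
    """Suggest how to adjust the sample based on the fraction of drops
    that precipitated in the quick pre-screen.

    low and high are illustrative cutoffs, not validated thresholds.
    """
    counts = Counter(scores)
    unknown = set(counts) - VALID_SCORES
    if unknown:
        raise ValueError(f"unrecognized scores: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no drops were scored")

    precipitated = counts["precipitate"] / total
    if precipitated < low:
        # Mostly clear drops: the sample is too dilute or too soluble.
        return "increase protein concentration"
    if precipitated > high:
        # Mostly precipitate: lower concentration or improve solubility.
        return "decrease protein concentration"
    return "proceed to broad screening"


if __name__ == "__main__":
    drops = ["clear"] * 10 + ["precipitate"] * 2
    print(prescreen_advice(drops))
```

In practice the cutoffs would be tuned to the particular pre-screen used; the point of the sketch is only that a balanced mix of clear and precipitated drops is the target before committing sample to larger screens.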
Figure 1: Workflow Diagram. A generalized diagram depicting the integration of computational sequence analysis and biochemical and other wet lab techniques into a comprehensive strategy for delineating domain boundaries and identifying protein fragments for crystallization.
Figure 2: Annotated sequence analysis for the coiled-coil SD2 domain from Shroom. An overlay of computational analyses for the Shroom SD2 domain, including a multiple sequence alignment generated by CLUSTAL omega and colored by sequence identity within Jalview (Step 1.1). Also included are predicted secondary structure, disordered sequence predictions, and coiled-coil predictions of the Shroom SD2 domain (Step 1.2). Hypothetical domain boundaries (Step 1.3) are indicated as are the observed boundaries and secondary structure elements as revealed by subsequent crystallographic analysis19.
Figure 3: Diagram of the His10-mRuby2 expression system. (A) Schematic of the His10-mRuby2-XH2 expression vector used in this study. (B) Diagram of the multiple cloning site from this vector. Protein coding sequences are typically inserted into this vector via the BamHI and EcoRI cloning sites. The extreme C-terminus of the mRuby2 protein is shown in red, and the TEV protease cleavage site is indicated with brackets, with the site of cleavage shown as a red triangle. A linker sequence between mRuby2 and the TEV site is shown in cyan.
Figure 4: Representative small-scale purifications of two mRuby2-tagged fusion proteins. (A) Samples of bacterial cultures expressing Ruby-Shrm SD2 or an unrelated Ruby fusion protein known to be insoluble. A separate culture expressing a protein without a Ruby tag is shown for comparison. (B) The soluble fractions from the cultures above were imaged following lysis and centrifugation at 30,000 x g for 30 min, demonstrating the visualization of soluble Ruby-Shrm SD2. (C) Image of the Ni-NTA beads after binding and subsequent washing steps, indicating that Ruby-Shrm SD2 is immobilized on the beads. (D) Samples of the washing and elution steps described in steps 2.3.6 and 2.3.7 demonstrate that the Ruby-Shrm SD2 fusion remains bound to the resin in the presence of up to 80 mM imidazole and is effectively eluted off the column with 1 M imidazole elution buffer.
Figure 5: Limited proteolysis of candidate SD2 domains from different Shroom proteins. A comparison of four SD2 domains from various Shroom proteins. (A and B) The indicated Shrm SD2 fragments were incubated with a concentration gradient of the protease trypsin from 0 to 1.0% and the results analyzed by SDS-PAGE. The relationship of each protein fragment to the hypothetical boundaries is indicated. (C) SDS-PAGE analysis of limited proteolytic digestion of the His10-mRuby2-Shroom SD2 fusion protein using the time course method described in the protocol. The reaction occurred with 0.025% trypsin at room temperature. Digestion of the linker between Ruby and Shroom SD2 is rapid and serves as an internal control. A presumed degradation product that co-purifies with the His10-mRuby2-Shroom SD2 fusion protein is indicated (*).
Figure 6: Observing changes in protein behavior using native gels. (A) 10% native PAGE demonstrating the effect of mutations that change the properties of mRuby2-Shroom SD2. Under these conditions Shrm SD2 runs as two discrete bands, while the indicated point mutants within the SD2 display a range of aberrant migration patterns. (B) A comparison of Shroom SD2-Rock complexes formed using different versions of Rock. Complexes between Shroom SD2 and the indicated fragments of Rock kinase (both coiled-coil proteins) were resolved by native PAGE. Crystals were obtained for the complex containing Rock 788-906 after many weeks, whereas the complex containing Rock 834-913 crystallized rapidly.
Figure 7: Quick primary screen for crystallization. (A) A quick screen of common crystallization conditions is used to assess the behavior of the protein sample in crystallization trials. (B) Examples of the simple scoring matrix to assess crystallization drops.
The protocol described here is designed to help the user identify domain boundaries within large coiled-coil proteins to facilitate their crystallization. The protocol relies on a holistic incorporation of data from computational predictions and other sources to generate a series of potential domain boundaries. These predictions are followed by a set of quick and inexpensive biochemical experiments that are used to further refine the initial hypotheses. Using this approach, the user can quickly eliminate undesirable protein fragments and focus more attention on better candidates, thereby improving the prospects of obtaining crystals.
There are many important steps within this technique; however, none is as critical for the production of crystals as the development of the initial set of hypothetical domain boundaries. This step incorporates a variety of computational approaches, as well as information obtained from the literature or functional data when available. Care should be taken to avoid using the strategy to narrow immediately to a single "best" solution, as there is currently no method in place to attach a quantifiable confidence value to any of the predictions. Instead, it is best used as a method to quickly identify a small set of possible domain boundaries that need to be experimentally verified.
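As an illustration of how per-residue predictions might be turned into a small set of candidate boundaries rather than a single answer, consider the following sketch. It assumes that coiled-coil and disorder probabilities have already been obtained from the web servers and transcribed into plain lists; the function names, the 0.5 cutoffs, and the example tracks are hypothetical, and the approach is a simplification of the manual, annotated analysis shown in Figure 2.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive residue numbers


def ordered_coil_segments(coil_prob: Sequence[float],
                          disorder_prob: Sequence[float],
                          coil_cutoff: float = 0.5,
                          disorder_cutoff: float = 0.5,
                          min_len: int = 20) -> List[Segment]:
    """Find stretches predicted to be coiled-coil and not disordered.

    Cutoffs are illustrative. min_len defaults to 20 because shorter
    coiled-coil predictions are considered less reliable.
    """
    if len(coil_prob) != len(disorder_prob):
        raise ValueError("tracks must cover the same residues")

    segments: List[Segment] = []
    start = None  # 0-based index where the current stretch began
    for i, (coil, disorder) in enumerate(zip(coil_prob, disorder_prob)):
        keep = coil >= coil_cutoff and disorder < disorder_cutoff
        if keep and start is None:
            start = i
        elif not keep and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    # Close a stretch that runs to the end of the sequence.
    if start is not None and len(coil_prob) - start >= min_len:
        segments.append((start + 1, len(coil_prob)))
    return segments


def candidate_boundaries(segments: Sequence[Segment]) -> List[Segment]:
    """Return each segment plus the full span covering all of them,
    so that minimal and extended fragments are both tested."""
    candidates = list(segments)
    if len(segments) > 1:
        candidates.append((segments[0][0], segments[-1][1]))
    return candidates


if __name__ == "__main__":
    # Hypothetical prediction tracks for an 80-residue region.
    coil = [0.1] * 10 + [0.9] * 30 + [0.2] * 5 + [0.8] * 25 + [0.1] * 10
    disorder = [0.9] * 10 + [0.1] * 60 + [0.9] * 10
    print(candidate_boundaries(ordered_coil_segments(coil, disorder)))
```

Returning several candidates, including both minimal and extended spans, reflects the recommendation above: the predictions carry no quantifiable confidence, so each candidate still needs to be verified experimentally.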
The properties of coiled-coil proteins that can be analyzed with this protocol are quite broad. From a computational perspective, the strength of coiled-coil predictions is limited below 20 amino acids, and as mentioned in step 1.2.1, coiled-coil regions larger than 1,000 amino acids would need to be split into multiple sections for analysis by DISOPRED. We view this latter limitation as temporary, as it is imposed by the webserver and may change as the method is upgraded in the future. The subsequent biochemical analysis, however, can suffer in various ways. First, internal loops or regions hypersensitive to proteases may make the sample appear to be less stable than it actually is. Stable proteins that have cleaved loops or are otherwise nicked by the protease should still give a stable appearance on native PAGE, which is why it is recommended that the user explore both strategies. Native PAGE may be difficult for some proteins, however, either because they are very long and migrate slowly into the gel or because their particular charge makes them run in the opposite direction, out of the gel entirely. In these cases, it may be helpful to explore different buffering conditions for the native PAGE gel system.
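For sequences that exceed the 1,000-residue limit mentioned above, the splitting step can be sketched as follows. This is an illustrative helper, not part of the DISOPRED server or any published tool; the function name and the 100-residue overlap are assumptions, the latter chosen so that predictions near a cut point can be compared between adjacent sections.

```python
from typing import List, Tuple


def split_for_server(sequence: str,
                     max_len: int = 1000,
                     overlap: int = 100) -> List[Tuple[int, str]]:
    """Split a sequence into overlapping sections no longer than max_len.

    Returns (start, section) pairs, where start is the 1-based residue
    number of the first residue in each section, so that per-residue
    results can be mapped back onto the full-length protein.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    if len(sequence) <= max_len:
        return [(1, sequence)]

    step = max_len - overlap
    sections: List[Tuple[int, str]] = []
    start = 0
    while True:
        end = min(start + max_len, len(sequence))
        sections.append((start + 1, sequence[start:end]))
        if end == len(sequence):
            break
        start += step
    return sections


if __name__ == "__main__":
    # Hypothetical 2,300-residue sequence used only to show the offsets.
    fake_sequence = "A" * 2300
    for first_residue, section in split_for_server(fake_sequence):
        print(first_residue, len(section))
```

Keeping the 1-based start of each section makes it straightforward to renumber the server output and to check that predictions in the overlapping residues agree.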
While the focus of the work presented here has been on large coiled-coil-containing proteins, this protocol can be used for almost any protein with very few adaptations. The primary change for globular proteins would be the addition of domain prediction software such as pDomTHREADER28 or DomPred29 to look for domain boundaries, and tertiary structure predictions such as Phyre230 to enhance the predictive power of the sequence analysis. Additionally, there are many algorithms available for performing secondary structure and disorder predictions, and the inclusion of additional algorithms could be helpful. Globular proteins often possess enzymatic activity or other functional readouts that would provide important additional information during target selection. Further, moderate- or high-throughput techniques can be incorporated as appropriate. For example, inexpensive systems for 1-2 ml cultures and metal affinity purifications in 96-well formats are readily available31. Lastly, the use of robotics for setting up large numbers of crystallization experiments using minimal sample is becoming routine, but efficient operation of robotic systems is predicated on well-characterized protein samples. Additional modifications to this protocol could include sampling a variety of affinity tags. Many different fluorescent tags are available, and His, GST, thioredoxin, Strep, and MBP tags are among dozens of available options if a fluorescent tag is not needed or helpful.
When performing this procedure, the user should be mindful of the effect that the mRuby2 tag might have on the protein of interest. Anecdotal evidence from using this tag on a variety of different fusions suggests that mRuby2 will sometimes bind quite tightly to the protein of interest, remaining bound through multiple chromatographic steps. This behavior is undesirable, but unintended Ruby complexes can usually be separated and removed chromatographically. It is unclear whether this is a chaperone-like behavior, as has been observed for MBP32.
After obtaining crystals, there are still many challenges commonly faced with coiled-coil proteins that are not addressed in this protocol. The most common is poor diffraction quality, due either to imperfectly formed lattices or to anisotropic diffraction. There are tools in place to assist with anisotropic diffraction33, and these have been critical for solving some coiled-coil structures18,20. Many crystal pathologies are unfortunately difficult to overcome quickly; therefore, it is often prudent to search for additional crystal forms with enhanced diffraction properties. This is facilitated by robotic screening of thousands of conditions. Alternatively, there are many post-crystallization techniques to improve diffraction quality, such as dehydration or seeding34.
The authors have nothing to disclose.
This work was supported by grant NIH R01 GM097204 (APV and JDH). Funding for JHM was supplied by an HHMI Undergraduate Research Summer Fellowship.
Name | Company | Catalog Number
BL21(DE3) Rosetta | EMD Millipore | 70954-3
BL21(DE3) Star | ThermoFisher Scientific | C601003
BL21(DE3) Codon Plus | Agilent Technologies | 230245
Lysozyme | Spectrum Chemical Mfg Corp | L3008-5GM
Ni-NTA resin | Life Technologies | 25216
SubtilisinA | Spectrum Chemical Mfg Corp | S1211-10ML
24 well Cryschem Plate | Hampton Research | HR3-160
INTELLI-PLATE 96: | Art Robbins Instruments | 102-0001-03
PEG 3350 | Hampton Research | HR2-591
PEG 8000 | Hampton Research | HR2-515
PEG 400 | Hampton Research | HR2-603
PEG 4000 | Hampton Research | HR2-605
pcDNA3.1-Clover-mRuby2 | Addgene | 49089
Overnight Express Autoinduction System 1 | EMD Millipore | 71300
Lysogeny Broth powder | ThermoFisher Scientific | 12795027