Structure-based drug design plays an important role in drug development. Pursuing multiple targets in parallel greatly increases the chance of success for lead discovery. The following article highlights how the Seattle Structural Genomics Center for Infectious Disease utilizes a multi-target approach for gene-to-structure determination of the PB2 influenza A subunit.
Pandemic outbreaks of highly virulent influenza strains can cause widespread morbidity and mortality in human populations worldwide. In the United States alone, an average of 41,400 deaths and 1.86 million hospitalizations are caused by influenza virus infection each year 1. Point mutations in the polymerase basic protein 2 subunit (PB2) have been linked to the adaptation of the viral infection in humans 2. Findings from such studies have revealed the biological significance of PB2 as a virulence factor, thus highlighting its potential as an antiviral drug target.
The structural genomics program put forth by the National Institute of Allergy and Infectious Disease (NIAID) provides funding to Emerald Bio and three other Pacific Northwest institutions that together make up the Seattle Structural Genomics Center for Infectious Disease (SSGCID). The SSGCID is dedicated to providing the scientific community with three-dimensional protein structures of NIAID category A-C pathogens. Making such structural information available to the scientific community serves to accelerate structure-based drug design.
Structure-based drug design plays an important role in drug development. Pursuing multiple targets in parallel greatly increases the chance of success for new lead discovery by targeting a pathway or an entire protein family. Emerald Bio has developed a high-throughput, multi-target parallel processing pipeline (MTPP) for gene-to-structure determination to support the consortium. Here we describe the protocols used to determine the structure of the PB2 subunit from four different influenza A strains.
An overview of the protocol is presented in Figure 1.
MOLECULAR BIOLOGY
1. Construct Design
Use Gene Composer software to design protein constructs and codon-engineered synthetic gene sequences. The use of Gene Composer software has been described in detail elsewhere 3.
2. Polymerase Incomplete Primer Extension (PIPE) Cloning
3. Prepare Vector PCR (vPCR)
4. Merge iPCR and vPCR Products
5. Preparing Glycerol Stocks of Successfully Cloned Constructs
6. Expression Testing
Lysis Buffer | Wash Buffer | Elution Buffer |
50 mM NaH2PO4, pH 8.0 | 50 mM NaH2PO4, pH 8.0 | 25 mM Tris, pH 8.0 |
300 mM NaCl | 300 mM NaCl | 300 mM NaCl |
10 mM Imidazole | 20 mM Imidazole | 250 mM Imidazole |
1% Tween 20 | 0.05% Tween 20 | 0.05% Tween 20 |
2 mM MgCl2 | | |
0.1 μl/ml Benzonase | | |
1 mg/ml Lysozyme | | |
* Add Benzonase, lysozyme, and protease inhibitor immediately before lysis.
7. Large Scale Fermentation
PROTEIN PURIFICATION
Buffers:
Lysis Buffer | Buffer A (Equilibration) | Buffer B (Elution) | Sizing Column Buffer |
25 mM Tris, pH 8.0 | 25 mM Tris, pH 8.0 | 25 mM Tris, pH 8.0 | 25 mM Tris, pH 8.0 |
200 mM NaCl | 200 mM NaCl | 200 mM NaCl | 200 mM NaCl |
0.5% Glycerol | 0.25% Glycerol | | 1% Glycerol |
0.02% CHAPS | | | |
10 mM Imidazole | 10 mM Imidazole | 200 mM Imidazole | |
1 mM TCEP | 1 mM TCEP | 1 mM TCEP | 1 mM TCEP |
50 mM Arginine | 50 mM Arginine | | |
5 μl Benzonase | | | |
100 mg Lysozyme | | | |
3 Protease Inhibitor Tablets (EDTA-free) | | | |
* Add Benzonase, lysozyme, and protease inhibitor tablets to each 150 ml sample immediately before lysis.
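As a worked example of preparing one of these buffers, the sketch below applies C1V1 = C2V2 to Buffer A. The 1 M Tris, 5 M NaCl, and 50% glycerol stocks appear in the materials table. The remaining stock concentrations (2 M imidazole, 0.5 M TCEP, 1 M arginine) and the 1 L batch size are assumptions made only for illustration and are not taken from the protocol.

```python
# Stock volumes for one batch of Buffer A using C1 * V1 = C2 * V2.
# Entries marked "assumed" are illustrative values, not from the protocol.

FINAL_VOLUME_ML = 1000.0  # assumed batch size

# component: (final concentration, stock concentration), same unit within a pair
BUFFER_A = {
    "Tris pH 8.0 (M)": (0.025, 1.0),   # 1 M stock listed in materials
    "NaCl (M)": (0.200, 5.0),          # 5 M stock listed in materials
    "Imidazole (M)": (0.010, 2.0),     # assumed 2 M stock
    "TCEP (M)": (0.001, 0.5),          # assumed 0.5 M stock
    "Arginine (M)": (0.050, 1.0),      # assumed 1 M stock
    "Glycerol (% v/v)": (0.25, 50.0),  # 50% stock listed in materials
}


def stock_volumes(recipe, final_volume_ml):
    """Return the volume (ml) of each stock needed for the final volume."""
    return {
        name: final * final_volume_ml / stock
        for name, (final, stock) in recipe.items()
    }


volumes = stock_volumes(BUFFER_A, FINAL_VOLUME_ML)
water_ml = FINAL_VOLUME_ML - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name:18s} {vol:7.1f} ml")
print(f"{'Water':18s} {water_ml:7.1f} ml")
```

The same function can be reused for the other buffers by swapping in their final concentrations. In practice the pH would still need to be checked after mixing.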
8. Cell Lysis
9. Pre-run Protein Maker Setup
10. Nickel 1 (Ni1) Column
11. ULP1 Cleavage
12. Nickel 2 (Ni2) Column
13. Concentrating
14. Size-exclusion Chromatography (SEC)
CRYSTALLIZATION
15. Protein Crystallization
16. Crystal Harvesting
17. Crystal Screening/Data Collection
18. Data Processing/Structure Determination
The following results illustrate the expected outcomes of the described protocol, and in the case of PB2, the observed outcomes.
Using Gene Composer, five full-length target amino acid sequences of the influenza virus polymerase subunit PB2 were designed (Figure 2). The PB2 sequences were back translated and subjected to many engineering steps 3, resulting in codon harmonized sequences optimized for expression in E. coli. From the iPCR products (Figure 3b), a total of thirty-four constructs were successfully cloned into a modified pET28 vector system 10 with an N-terminal 6x His-Smt fusion tag using PIPE cloning 3 as shown in Figure 3a. A summary of the cloning workflow is presented in Figure 4.
After successful cloning, micro-scale protein expression of each construct was tested in BL21(DE3) E. coli cells. Cells were grown in TB medium supplemented with Novagen Overnight Express 1 medium (according to the manufacturer’s protocol) for 48 hr at 20 °C in a shaking incubator set at 220 rpm. After growth, cells were harvested and tested for soluble protein expression by capillary electrophoresis with a Caliper LabChip 90. Fourteen of the thirty-four PB2 constructs yielded soluble target protein and entered large-scale fermentation. Large-scale cultures of each construct were grown in TB medium supplemented with Novagen Overnight Express 1 medium according to the manufacturer’s protocol. After growth, cells were harvested by centrifugation and stored at -80 °C. Large-scale protein expression of each culture was confirmed by SDS-PAGE analysis (Figure 5) before proceeding with large-scale purification.
The Protein Maker was used to purify the fourteen PB2 constructs in parallel. The clarified lysate of each construct was run through a nickel-chelate column. After the fractions containing target protein were identified by SDS-PAGE, they were pooled for each sample and the protein concentration was determined from an A280 reading. The 6x His-Smt tag was removed by adding ULP1, followed by overnight dialysis and a second nickel column. Tag removal was confirmed by SDS-PAGE (Figure 6), and each sample was concentrated with a 10 kDa Amicon Ultra centrifugal concentrator. Each concentrated sample was then run over a sizing column to achieve crystallographic purity, and a second concentration step raised the protein concentration to the level needed for crystallization. All fourteen constructs were successfully purified and entered crystallization trials.
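The A280-based concentration estimate mentioned above follows the Beer-Lambert law. The sketch below shows the arithmetic. The molecular weight is the 25.76 kDa expected size quoted for the construct in Figure 5, while the absorbance reading and the extinction coefficient are hypothetical placeholders. A real calculation would use the coefficient computed from each construct's sequence.

```python
# Protein concentration from an A280 reading via the Beer-Lambert law:
#   A = epsilon * c * l   ->   c = A / (epsilon * l)


def concentration_mg_per_ml(a280, epsilon_m_cm, mw_da, path_cm=1.0):
    """Convert absorbance at 280 nm to mg/ml.

    epsilon_m_cm: molar extinction coefficient in M^-1 cm^-1
    mw_da:        molecular weight in daltons (g/mol)
    """
    molar = a280 / (epsilon_m_cm * path_cm)  # mol/L
    return molar * mw_da                     # g/L, which equals mg/ml


A280 = 0.60        # hypothetical reading
EPSILON = 30000.0  # hypothetical extinction coefficient, M^-1 cm^-1
MW = 25760.0       # Da, expected size quoted in Figure 5

conc = concentration_mg_per_ml(A280, EPSILON, MW)
print(f"{conc:.2f} mg/ml")
```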
Crystallization was initiated by thawing the previously frozen protein. Crystallization was performed in a climate-controlled room at 16 °C with specially designed plates (Emerald Bio) for sitting drop vapor diffusion (Figure 7). Initial screening was conducted with four sparse-matrix screens, JCSG+, PACT, Wizard Full, and CryoFull (Emerald Bio), following an extended Newman strategy. For each condition, 0.4 μl of protein solution was mixed with 0.4 μl of crystallant (reservoir solution) from the corresponding reservoir in 96-well Compact Jr crystallization plates (Emerald Bio). Nine of the fourteen purified samples yielded crystals suitable for diffraction studies (Figure 8). In-house diffraction data sets were collected at the Cu Kα wavelength for five of the nine crystallized constructs using a Rigaku SuperBright FR-E+ rotating-anode X-ray generator equipped with Osmic VariMax HF optics and a Saturn 944+ CCD detector (Figure 9). Each data set was processed and scaled with XDS/XSCALE 4. Attempts to solve the structures by molecular replacement were carried out with Phaser 5 from the CCP4 suite 7. The final models were obtained after refinement in REFMAC 7 and manual rebuilding with Coot 11. The structures were assessed and corrected for geometry and fitness with MolProbity 9. A total of four structures of the PB2 subunit were determined (Figure 10) and deposited into the PDB. Figure 11 illustrates the overall outcome at each stage in the MTPP pipeline.
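To give a sense of how much protein the initial screening consumes, the following sketch multiplies out the drop volumes described above. It assumes each of the four screens fills one 96-well plate, and the 10% allowance for dispensing losses is a hypothetical figure. Neither value is stated in the protocol.

```python
# Protein volume consumed by initial sparse-matrix screening of one construct.

SCREENS = ["JCSG+", "PACT", "Wizard Full", "CryoFull"]
CONDITIONS_PER_SCREEN = 96    # assumption: one 96-well plate per screen
PROTEIN_PER_DROP_UL = 0.4     # from the protocol
DEAD_VOLUME_FRACTION = 0.10   # hypothetical allowance for dispensing losses


def protein_needed_ul(n_screens, conditions, drop_ul, dead_fraction=0.0):
    """Total protein volume (ul) for all drops, plus an optional loss margin."""
    return n_screens * conditions * drop_ul * (1.0 + dead_fraction)


net = protein_needed_ul(len(SCREENS), CONDITIONS_PER_SCREEN, PROTEIN_PER_DROP_UL)
gross = protein_needed_ul(
    len(SCREENS), CONDITIONS_PER_SCREEN, PROTEIN_PER_DROP_UL, DEAD_VOLUME_FRACTION
)
print(f"Drops only: {net:.1f} ul; with loss margin: {gross:.1f} ul")
```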
Figure 1. Overview of the SSGCID gene-to-structure pathway for multi-target parallel processing at Emerald Bio.
Figure 2. Alignment Viewer and Protein Construct Design Module in Gene Composer software. Amino-acid base construct of target is shown in green (middle window) and the structure guided truncations of alternative constructs are shown in gold (bottom window). An alignment of multiple Flu viral PB2 sequences is shown compared to the sequence and secondary structure elements of the C-terminal domain from PDBID 3CW4. Knowledge of the domain structure and secondary structure elements allows N-terminal truncations to be chosen within the Gene Composer Design Module by right-clicking on the desired amino acid residue.
Figure 3a. PIPE cloning is illustrated wherein the synthetic gene insert (orange) is amplified by designed forward (red-orange lines) and reverse (orange-blue lines) primers to generate insert PCR (iPCR) material. The expression vector is amplified with reverse (red-black lines) and forward (blue-black lines) primers to generate vector PCR (vPCR) material. The terminal sequences of the iPCR products are complementary to the terminal sequences of the vPCR products (red of iPCR complements red of vPCR and blue of iPCR complements blue of vPCR). This allows the iPCR and vPCR products to anneal to form plasmids that are replicated upon transformation into host BL21(DE3) chemically competent E. coli cells.
Figure 3b. Agarose gel analysis of iPCR products from the PB2 subunit. iPCR failures may be seen as faint or smeary bands, while successful iPCR products are represented by robust bands. iPCR product quality can generally be correlated with cloning success. Molecular weight markers are in kiloDaltons. Figure is reproduced from Raymond et al., 2011 12.
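The primer arrangement described for Figure 3a can be illustrated with a short sketch. Each insert primer carries a 5' extension matching a vector end, so the iPCR product ends overlap the vPCR product ends. All sequences, the 15-nt overlap, and the 18-nt annealing length below are made-up placeholders. They are not the SSGCID primers or the pET28 sequence, and the function is a hypothetical helper, not part of Gene Composer.

```python
# Sketch of PIPE-style insert primer design with vector-matching 5' extensions.
# Sequences and lengths are arbitrary placeholders for illustration only.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pipe_insert_primers(insert, vector_upstream, vector_downstream,
                        anneal_len=18, overlap_len=15):
    """Build forward and reverse insert primers (both written 5' to 3').

    vector_upstream:   vector sequence immediately 5' of the insertion site
    vector_downstream: vector sequence immediately 3' of the insertion site
    """
    fwd = vector_upstream[-overlap_len:] + insert[:anneal_len]
    rev = reverse_complement(insert[-anneal_len:] + vector_downstream[:overlap_len])
    return fwd, rev


# Placeholder sequences (hypothetical)
insert = "ATGGCTAAACGTCTGGAAGACCTGATCCGTAAAGGTTCTTAA"
vector_upstream = "CCGTTAGCATCGGATACCTGAGTC"
vector_downstream = "GGATCCTTGACGCATAGCTAAGCC"

fwd_primer, rev_primer = pipe_insert_primers(insert, vector_upstream, vector_downstream)
print("Forward:", fwd_primer)
print("Reverse:", rev_primer)
```

The vector primers would be designed the same way in the opposite orientation, so that the two PCR products share the overlapping ends that let them anneal.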
Figure 4. Gene engineering steps of target PB2 proteins were performed using Gene Composer software. After the engineered nucleic acid sequence was established for each target, 6-7 alternative protein constructs were designed for each. Multi-target parallel processing in the initial steps of gene design and cloning resulted in 34 constructs, 14 of which were viable targets that produced soluble proteins in E. coli.
Figure 5. Representative SDS-PAGE analysis of large scale fermentation showing robust protein expression (expected size of 25.76 kDa), roughly 50% soluble (lane 4) and about 50% cleavage of 6x His-Smt tag from eluted protein (lane 7).
Figure 6. SDS-PAGE results for three constructs of the polymerase PB2 subunit. Lane 1, molecular-weight markers (labeled on the left in kDa); lanes 2, 6, and 10, pooled protein from Nickel 1 column; lanes 3, 7, and 11, flow-through of cleaved protein in buffer A from Nickel 2; lanes 4, 8, and 12, removal of 6x His-Smt tag in buffer B from Nickel 2.
Figure 7. A schematic of vapor diffusion by the sitting drop method. The sitting drop method for protein crystallization falls under the category of vapor diffusion. In this method, a drop containing purified protein and precipitant equilibrates with a larger reservoir containing similar conditions at a higher concentration. As water vaporizes from the protein drop and transfers to the reservoir, the precipitant concentration in the drop increases to a level optimal for protein crystallization.
Figure 8. Protein crystal of polymerase PB2 subunit from a strain of the influenza virus.
Figure 9. X-ray diffraction image of the polymerase PB2 subunit from a strain of the influenza virus.
Figure 10. Ribbon diagrams of the molecules in the crystallographic asymmetric unit of 4 PB2 structures. Secondary structures colored in rainbow pattern with corresponding PDB codes. (a) 3K2V (A/Yokohama/2017/2003/H3N2) (b) 3KHW (A/Mexico/InDRE4487/2009/H1N1) (c) 3KC6 (A/Vietnam/1203/2004/H5N1) (d) 3L56 (A/Vietnam/1203/2004/H5N1).
Figure 11. Outcome analysis for influenza PB2 targets by the methods described. The structure determination pipeline is illustrated in five steps: Cloning, solubility, purification, crystallization and structure determination.
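The stage-by-stage outcome summarized in Figure 11 can be expressed as simple ratios. The sketch below uses the counts reported in the results (34 constructs cloned, 14 soluble, 14 purified, 9 crystallized, 4 structures determined). The function and variable names are illustrative only.

```python
# Step-wise and cumulative success rates for the PB2 pipeline counts.

STAGES = [
    ("Cloned", 34),
    ("Soluble", 14),
    ("Purified", 14),
    ("Crystallized", 9),
    ("Structures determined", 4),
]


def funnel(stages):
    """Return (name, count, yield vs. previous stage, yield vs. first stage)."""
    rows = []
    start = stages[0][1]
    prev = start
    for name, count in stages:
        rows.append((name, count, count / prev, count / start))
        prev = count
    return rows


rows = funnel(STAGES)
for name, count, step, cumulative in rows:
    print(f"{name:22s} {count:3d}  step {step:6.1%}  cumulative {cumulative:6.1%}")
```

Read this way, the largest losses occur at the solubility and structure determination steps, which is the attrition the multi-construct strategy is meant to absorb.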
Multi-Target Parallel Processing
Structure-based drug design plays an important role in drug discovery. The SSGCID is dedicated to providing the scientific community with three-dimensional protein structures from NIAID category A-C pathogens. Making such structural information widely available will ultimately serve to accelerate structure-based drug design.
The first critical step of the MTPP approach is construct design. Designing multiple constructs of each target protein increases the probability of successful structure determination and shortens turnaround time, because some protein constructs will inevitably fail at some stage of the pipeline. The PIPE cloning method supports the MTPP approach by allowing many constructs to be generated in 96-well format without labor-intensive purification steps. Pairing PIPE cloning with the ability to analyze protein expression in the same 96-well format (Caliper LabChip 90) further expedites the workflow. Together, these methods allow quick identification of constructs that produce soluble protein, which improves the likelihood of successful large-scale protein production and purification.
An essential component of the high-throughput MTPP approach is the Protein Maker instrument (US Patent No. 6818060, Emerald Bio). The Protein Maker is a 24-channel parallel liquid-chromatography system developed specifically to boost the efficiency of high-throughput protein production and related structural genomics pipeline applications. With the Protein Maker protocol described above, the advantages over a single-line FPLC system are apparent: a single person can purify up to 48 targets in parallel within an eight-hour period, whereas a single person using a single-line FPLC system can purify at most four targets in the same time frame. The high purity achieved for each target with the Protein Maker is a critical factor in the later success of growing protein crystals for structure analysis.
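The throughput comparison can be made concrete with a small calculation. The sketch below treats the eight-hour period as one working day (an assumption) and asks how many days one person would need to purify the fourteen PB2 constructs on each system, using the per-period capacities quoted above.

```python
import math

# Capacities per eight-hour period, as quoted in the text
PROTEIN_MAKER_PER_DAY = 48
FPLC_PER_DAY = 4
N_CONSTRUCTS = 14  # soluble PB2 constructs carried into purification


def working_days(n_targets, targets_per_day):
    """Whole working days needed for one person to purify n_targets."""
    return math.ceil(n_targets / targets_per_day)


pm_days = working_days(N_CONSTRUCTS, PROTEIN_MAKER_PER_DAY)
fplc_days = working_days(N_CONSTRUCTS, FPLC_PER_DAY)
fold = PROTEIN_MAKER_PER_DAY / FPLC_PER_DAY

print(f"Protein Maker: {pm_days} day(s); single-line FPLC: {fplc_days} day(s)")
print(f"Capacity ratio: {fold:.0f}x")
```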
Limitations and Troubleshooting
Solving three-dimensional structures by X-ray crystallography is a multi-stage effort with many challenges, one of which is the inability to obtain large amounts of soluble target protein. One strategy for overcoming the solubility problem is to use an alternative expression host, because E. coli cells are unable to perform several important eukaryotic post-translational modifications. Expression in yeast, insect, or mammalian cell lines that can perform these modifications is often a suitable alternative. Target proteins are sometimes expressed but completely insoluble under the standard lysis conditions. The Protein Maker can be a valuable resource for rapidly testing alternative cell lysis conditions, as described in Smith et al. 2011 13. This strategy is often necessary to keep targets moving through the pipeline. In any structural genomics pipeline, standardized protocols may not suit every target, and some targets may need individual optimization. For example, we have chosen to use 20% ethylene glycol as the standard cryoprotectant. Where this condition is not suitable, alternative cryoprotectants or concentrations may need to be tested.
Due to the unique nature of each individual protein target, the rate-limiting and unpredictable step in determining a structure is crystallization. The MTPP pipeline offsets the commonly low success rate of protein crystallization with optimization from the initial sparse matrix screens. Each initial crystal hit from commercially available sparse matrix screens is further optimized with an E-Screen Builder (Emerald Bio). The optimization screen is designed around the condition of the initial crystal hit, altering the concentrations of the buffers, salts, and additives. Successful optimization screens yield crystals suitable for diffraction studies and structure determination.
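One common way to build an optimization screen around an initial hit is a two-dimensional gradient across a 96-well plate. The sketch below lays out such a grid by varying precipitant concentration across the columns and salt concentration down the rows. It illustrates the general idea only and does not describe how the E-Screen Builder works. The hit condition (20% precipitant, 0.2 M salt) and the ±50% spread are hypothetical.

```python
import string


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + step * i for i in range(n)]


def optimization_grid(hit_precipitant_pct, hit_salt_m, spread=0.5):
    """Map 96 well names (A1..H12) to (precipitant %, salt M) around a hit.

    Columns 1-12 vary the precipitant; rows A-H vary the salt.
    """
    precip = linspace(hit_precipitant_pct * (1 - spread),
                      hit_precipitant_pct * (1 + spread), 12)
    salt = linspace(hit_salt_m * (1 - spread), hit_salt_m * (1 + spread), 8)
    grid = {}
    for r, row in enumerate(string.ascii_uppercase[:8]):
        for c in range(12):
            grid[f"{row}{c + 1}"] = (round(precip[c], 2), round(salt[r], 3))
    return grid


# Hypothetical initial hit: 20% precipitant, 0.2 M salt
grid = optimization_grid(20.0, 0.2)
for well in ("A1", "D6", "H12"):
    print(well, grid[well])
```

Buffer pH or additive concentration could be substituted for either axis in the same way.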
The structural genomics program put forth by the National Institute of Allergy and Infectious Disease (NIAID) provides funding to Emerald Bio and three other Pacific Northwest institutions that together form the SSGCID (Emerald Bio, SeattleBiomed, the University of Washington and Pacific Northwest National Laboratory). Each member of the consortium was chosen for its expertise in applying the state-of-the-art technologies required to accomplish the goals of the NIAID structural genomics program. To date, SSGCID has deposited 461 structures into the PDB, ranking it as the seventh largest contributor in the world and, in 2011, the most productive. The protocols and methodologies of the SSGCID are provided with the intention of benefiting the scientific community and furthering research on infectious diseases.
The authors have nothing to disclose.
The authors would like to thank all members of the SSGCID consortium. Achievement of the SSGCID’s goals is made possible by the tremendous efforts of all team members at Emerald Bio. This research was funded under Federal Contract No. HHSN272200700057C from the National Institute of Allergy and Infectious Diseases, the National Institutes of Health and the Department of Health and Human Services.
Item | Vendor | Catalog number |
Primers | IDT | |
Genes | DNA 2.0 | |
TE buffer | Qiagen | provided in kit |
96-well half skirt PCR plates | VWR | 10011-248 |
PFU Master Mix | ||
6X Orange Loading Dye | Fermentas | R0631 |
10X TAE | Teknova | T0280 |
Agarose | Sigma-Aldrich | A9414-10G |
pET28 vector | ||
2-YT Broth | VWR | 101446-848 |
Kanamycin | Teknova | K2151 |
Restriction Enzymes | Fermentas | |
QIAquick Gel Extraction Kit | Qiagen | 28704 |
Top 10 chemically comp cells | Invitrogen | C4040-06 |
Disposable Troughs (Sterile, 25 ml) | VWR | 89094-662 |
Airpore covers (Rayon films for bio cultures) | VWR | 60941-086 |
24-well blocks | VWR | 13503-188 |
QIAvac 96 | Qiagen | 19504 |
BL21(DE3) cells chemcomp (phageR) | NEB | C2527H |
50% Glycerol | VWR | 100217-622 |
TB Media | Teknova | T7060 |
IPTG | Sigma-Aldrich | |
1 M Tris pH 8.0 | Mediatech | 46-031-CM |
5 M NaCl | Teknova | S0251 |
Glycerol | Aldrich | G7893-4L |
CHAPS | JT Baker | 4145-01 |
Imidazole | Sigma | 56749-1KG |
TCEP | Amresco | K831-10G |
L-arginine | Amresco | 0877-500G |
Benzonase | EMD | 70746-3 |
Lysozyme | USB | 1864525GM |
10 kDa MWCO dialysis tubing | Thermo | 68100 |
Amicon Ultra 10 kDa MWCO concentrators | Millipore | UFC901024 |
HisTrap FF columns | GE | 17-5255-01 |
HiTrap Chelating columns | GE | 17-0408-01 |
Compact Jr crystallization plates | Emerald Bio | EBS-XJR |
Crystallization screens | Emerald Bio | |
Ethylene Glycol 100% | Emerald Bio | EBS-250-EGLY |
Crystal Wand Magnetic Straight | Hampton Research | HR4-729 |
Mounted CryoLoop 0.1-0.2 mm | Hampton Research | HR4-955 |
ALS style puck | ||
Puck Wand | ||
Bent Tongs | ||
Puck Pusher |