To overcome the limitations of classical site-directed mutagenesis, proline analogs with specific modifications were incorporated into several fluorescent proteins. We show how the replacement of hydrogen by fluorine or of single bonds by double bonds in proline residues (“molecular surgery”) affects fundamental protein properties, including their folding and interaction with light.
Replacement of proline (Pro) residues in proteins by the traditional site-directed mutagenesis by any of the remaining 19 canonical amino acids is often detrimental to protein folding and, in particular, chromophore maturation in green fluorescent proteins and related variants. A reasonable alternative is to manipulate the translation of the protein so that all Pro residues are replaced residue-specifically by analogs, a method known as selective pressure incorporation (SPI). The built-in chemical modifications can be used as a kind of "molecular surgery" to finely dissect measurable changes or even rationally manipulate different protein properties. Here, the study demonstrates the usefulness of the SPI method to study the role of prolines in the organization of the typical β-barrel structure of spectral variants of the green fluorescent protein (GFP) with 10-15 prolines in their sequence: enhanced green fluorescent protein (EGFP), NowGFP, and KillerOrange. Pro residues are present in connecting sections between individual β-strands and constitute the closing lids of the barrel scaffold, thus being responsible for insulation of the chromophore from water, i.e., fluorescence properties. Selective pressure incorporation experiments with (4R)-fluoroproline (R-Flp), (4S)-fluoroproline (S-Flp), 4,4-difluoroproline (Dfp), and 3,4-dehydroproline (Dhp) were performed using a proline-auxotrophic E. coli strain as expression host. We found that fluorescent proteins with S-Flp and Dhp are active (i.e., fluorescent), while the other two analogs (Dfp and R-Flp) produced dysfunctional, misfolded proteins. Inspection of UV-Vis absorption and fluorescence emission profiles showed few characteristic alterations in the proteins containing Pro analogs. Examination of the folding kinetic profiles in EGFP variants showed an accelerated refolding process in the presence of S-Flp, while the process was similar to wild-type in the protein containing Dhp. 
This study showcases the capacity of the SPI method to produce subtle modifications of protein residues at an atomic level ("molecular surgery"), which can be adopted for the study of other proteins of interest. It illustrates the outcomes of proline replacements with close chemical analogs on the folding and spectroscopic properties in the class of β-barrel fluorescent proteins.
Classical site-directed mutagenesis allows permutation of any existing gene-encoded protein sequence by codon manipulation at the DNA level. To study protein folding and stability, it is often desirable to replace amino acids with structurally similar counterparts. However, traditional protein mutagenesis is limited to structurally similar replacements among canonical amino acids such as Ser/Ala/Cys, Thr/Val, Glu/Gln, Asp/Asn, and Tyr/Phe, which are present in the standard genetic code repertoire. On the other hand, there are no such possibilities for other canonical amino acids such as Trp, Met, His, or Pro, which often play essential structural and functional roles in proteins1. An ideal approach to study the roles of such residues in the context of the highly specific internal architecture of proteins and their folding process is to generate non-disruptive isosteric modifications. Indeed, when isosteric analogs of these canonical amino acids, also known as non-canonical amino acids (ncAAs), are inserted into proteins, they allow for subtle changes even at the level of single atoms or atom groups such as H/F or CH2/S/Se/Te, known as "atomic mutations"2. Such "molecular surgery" produces altered proteins whose properties result solely from the exchange of single atoms or groups of atoms, which in favorable cases can be analyzed, and the detected changes can be rationalized. In this way, the scope of protein synthesis to study protein folding and structure is extended far beyond classical DNA mutagenesis. Note that proteins generated by site-directed mutagenesis are usually referred to as "mutants," whereas proteins with substituted canonical amino acids are referred to as "variants"3, "alloproteins"4, or "protein congeners"5.
The green fluorescent protein (GFP), first identified in the marine organism Aequorea victoria, exhibits bright green fluorescence when exposed to ultraviolet-to-blue light6,7. Today, GFP is commonly used as a highly sensitive labeling tool for routine visualization of gene expression and protein localization in cells via fluorescent microscopy. GFP has also proven useful in various biophysical8,9,10 and biomedical11,12 studies, as well as in protein engineering13,14,15. Rigorous analysis of the GFP structure enabled the creation of numerous variants characterized by varied stability and fluorescence maxima16,17. Most of the GFP variants used in cell and molecular biology are monomeric proteins both in solution and in the crystal18. Their principal structural organization is typical for all members of the GFP family, independent of their phylogenetic origin, and consists of 11 β-strands forming a so-called β-barrel, while a kinked α-helix is running through the center of the barrel and bears the chromophore (Figure 1A). The autocatalytic maturation of the chromophore (Figure 1B) requires the precise positioning of the side chains surrounding it in the central place of the protein; many of these side chains are highly conserved in other GFP variants19. In most fluorescent proteins from jellyfish such as Aequorea victoria, the green-emitting chromophore consists of two aromatic rings, including a phenol ring of Tyr66 and the five-membered heterocyclic structure of imidazolinone (Figure 1B). The chromophore, when properly embedded in the protein matrix, is responsible for the characteristic fluorescence of the whole protein. It is located in the center of the structure, while the barrel structure insulates it from the aqueous medium20. Exposure of the chromophore to the bulk water would result in fluorescence quenching, i.e., loss of fluorescence21.
The proper folding of the barrel-like structure is essential for protecting the chromophore against fluorescence quenching22. Proline (Pro) residues play a special role in the structural organization of GFP23. Being unable to support a β-strand, they constitute connecting loops responsible for maintaining the protein structure as a whole. Not surprisingly, 10-15 proline residues are found in both Aequorea– and Anthoathecata-derived GFPs; some of them are highly conserved in other types of fluorescent β-barrel proteins. Prolines are expected to critically influence folding properties due to their peculiar geometric features. For example, in Aequorea-derived GFPs, of the ten proline residues (Figure 2A), nine form trans– and only one forms a cis-peptide bond (Pro89). Pro58 is essential, i.e., not interchangeable with the rest of the 19 canonical amino acids. This residue may be responsible for the correct positioning of the Trp57 residue, which has been reported to be crucial for chromophore maturation and the overall GFP folding24. The fragment PVPWP with three proline residues (Pro54, Pro56, Pro58) and Trp57 is the essential part of the "lower lid" in the GFP structure from Figure 1A. The PVPWP structural motif is found in several proteins such as cytochromes and eukaryotic voltage-activated potassium channels25. Proline-to-alanine substitutions at positions 75 and 89 are also detrimental to protein expression and folding and abolish chromophore maturation. Pro75 and Pro89 are part of the "upper lid" burying the chromophore (Figure 1A) and are conserved across 11-stranded β-barrel fluorescent proteins23. These two "lids" keep the chromophore excluded from the aqueous solvent, even when the stable tertiary structure has been partially broken26. Such a specific molecular architecture protects the fluorophore from collisional (dynamic) fluorescence quenching, e.g., by water, oxygen, or other diffusible ligands.
In order to perform molecular engineering of the GFP structure, one should introduce amino acid substitutions in the primary structure of the protein. Numerous mutations have been performed on GFP, providing variants with elevated stability, fast and reliable folding, and variable fluorescence properties17. Nonetheless, in most cases, mutation of proline residues is considered a risky approach because none of the remaining 19 canonical amino acids can properly restore the conformational profile of the proline residue27. Thus, an alternative approach has been developed, in which proline residues are replaced with other proline-based structures, termed proline analogs28. Owing to its unique cyclic chemical structure, proline exhibits two characteristic conformational transitions (Figure 1C): 1) the proline ring puckering, a fast process entailing organization of the backbone, which primarily affects the φ torsion angle, and 2) the peptide bond cis/trans isomerization, a slow process impacting backbone folding via the ω torsion angles. Due to its slow nature, the latter transition is commonly responsible for the rate-limiting steps in the folding process of the whole protein. It has been shown previously that peptide bond cis/trans isomerization around some proline residues gives rise to slow steps in the folding of GFP variants. For example, the formation of the cis-peptide bond at Pro89 constitutes the slow step in the folding process because it relies on the bond transition from trans to cis29. A faster refolding can be achieved after replacing Pro89 with an all-trans peptide loop, i.e., by abolishing a cis-to-trans isomerization event30. In addition to the cis/trans isomerization, the pucker transitions may also generate profound changes in protein folding due to the backbone organization and packing within the protein interior27,31.
Chemical modifications alter the intrinsic conformational transitions of the proline residues, thereby affecting the ability of the protein to fold. Certain proline analogs are particularly attractive candidates for proline substitution in proteins as they allow the manipulation and study of the folding properties. For example, (4R)-fluoroproline (R-Flp), (4S)-fluoroproline (S-Flp), 4,4-difluoroproline (Dfp), and 3,4-dehydroproline (Dhp) are four analogs (Figure 1D) that differ minimally from proline in terms of both molecular volume and polarity32. At the same time, each analog exhibits a distinct ring puckering: S-Flp stabilizes the C4-endo pucker, R-Flp stabilizes the C4-exo pucker, Dfp exhibits no apparent pucker preference, while Dhp abolishes the puckering (Figure 1D)33. By using these analogs in the protein structure, one can manipulate the conformational transitions of the proline residues and thereby affect the properties of the resulting GFP variants.
In this work, we set out to incorporate the designated set of proline analogs (Figure 1D) into the structure of GFP variants using the selective pressure incorporation method (SPI, Figure 3)34. Replacement of amino acid residues with their closest isostructural analogs is an applied biotechnological concept in protein design35,36. Thus, the effects of proline analogs in a model protein illustrate their potential to serve as tools in protein engineering37. The production of proteins containing desired analogs was performed in modified E. coli strains that are not able to produce proline (proline-auxotrophy). Thus, they could be forced to accept replacement of substrates in the process of protein biosynthesis38. This global substitution of proline is enabled by the natural substrate flexibility of endogenous aminoacyl-tRNA synthetases39, the key enzymes catalyzing the esterification of tRNAs with appropriate amino acids40. In general, as outlined in Figure 3, cellular growth is performed in a defined medium until the mid-logarithmic growth phase is reached. In the next step, the amino acid to be replaced is intracellularly depleted from the expression system during fermentation and subsequently exchanged by the desired analog or ncAA. Target protein expression is then induced for residue-specific non-canonical amino acid incorporation. The substitution of the cognate amino acid with its analog occurs in a proteome-wide manner. Although this side effect may have a negative impact on the growth of the host strain, the quality of target protein production is mostly not affected, since, in recombinant expression, the cellular resources are mainly directed to the production of the target protein41,42. Therefore, a tightly regulated, inducible expression system and strong promoters are crucial for high incorporation efficiency43. 
Our approach is based on multiple residue-specific incorporation of ncAAs in response to sense codons (sense codon reassignment), whereby within the target gene, the number of positions for Pro analog insertion can be manipulated via site-directed mutagenesis44. A similar approach was applied in our previous report on the preparation of recombinant peptides with antimicrobial properties45. In this work, we have applied the SPI method, which allows all proline residues to be replaced by related analogs, to generate proteins expected to possess distinct physicochemical properties not present in proteins synthesized with the canonical amino acid repertoire. By characterizing the folding and fluorescence profile of resulting variants, we aim to showcase the effects of atomic substitutions in variants of GFP.
1. Introduction of expression plasmids into competent Pro-auxotrophic E. coli cells
2. Production of recombinant wild-type fluorescent proteins (harboring canonical proline) and procedure for selective pressure incorporation (SPI) to produce fluorescent proteins with proline analogs (S-Flp, R-Flp, Dfp, Dhp)
3. Purification procedure of protein samples by immobilized metal ion affinity chromatography (IMAC)
4. SDS-PAGE sample preparation
5. Fluorescence emission of protein variants
6. Denaturation and refolding of EGFP variants
At the beginning of the study, we selected three different fluorescent protein variants sharing the parent GFP architecture. The first protein selected was EGFP, which is an engineered variant derived from the original GFP from the jellyfish Aequorea victoria containing Phe64Leu/Ser65Thr mutations. The second selected protein was NowGFP51,60. It is also a variant of A. victoria GFP derived by mutagenesis in several steps via preceding fluorescent proteins. NowGFP contains 18 mutations compared to its immediate predecessor fluorescent protein "Cerulean"61. In turn, the "Cerulean" protein is a derivative of the enhanced cyan fluorescent protein (ECFP)62,63, a protein previously selected by directed laboratory evolution and containing a tryptophan-based chromophore. Both, EGFP and NowGFP are widely used in cell biology and biophysical studies, and they contain ten conserved proline residues in their structures. In addition, NowGFP has an eleventh proline residue at position 230, which appeared due to the extensive mutation history of this protein variant. The third protein selected was the KillerOrange fluorescent protein64,65. It is a derivative of the chromoprotein anm2CP from the hydrozoan genus Anthoathecata. The protein sequence contains 15 proline residues, and the chromophore is based on a tryptophan rather than a tyrosine residue. High-resolution X-ray structures have been reported for all three selected proteins (Figure 2)51,65,66.
In the first step, proline analogs (Figure 1D) were incorporated into all proline positions of three model proteins (EGFP, NowGFP, and KillerOrange) by selective pressure incorporation (SPI, a scheme of the procedure is given in Figure 3). The proline-auxotrophic E. coli K12 strain JM8367 was used for expression of the proteins in the presence of proline and analogs (Figure 1D), yielding wild-type and modified proteins, respectively. Pellets from cells expressing the native protein and variants bearing S-Flp and Dhp had the typical bright color due to the intact chromophore, whereas variants containing R-Flp and Dfp remained colorless, indicating misfolding and deposition of unfolded protein in inclusion bodies (Figure 4A). SDS-PAGE analysis of the expressed samples verified the presence of insoluble R-Flp-containing proteins (Figure 4B–D), which precluded further investigations. Although this is beyond the scope of the present study, it should be noted that protein solubility and misfolding issues can be alleviated to some extent by in vitro refolding procedures68. In contrast, native proteins as well as S-Flp- and Dhp-bearing variants were found mainly in the soluble fractions (Figure 4B–D). The wild-type, as well as S-Flp- and Dhp-containing variants, could be further isolated and characterized in fluorescence studies. Soluble proteins were purified by immobilized metal ion affinity chromatography (IMAC), yielding 20-30 mg/L of culture volume for EGFP and 60-80 mg/L for NowGFP and KillerOrange; yields of wild-type and modified proteins were very similar. Liquid chromatography-mass spectrometry (LC-MS)-coupled analysis confirmed the expected identity and purity of the isolates obtained in this fashion (Figure 5). In the mass spectra, replacement with S-Flp produced a shift of +18 Da per proline residue in the sequence, while for the proline-to-Dhp replacement, the shift was −2 Da per residue.
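The expected masses of the fully substituted proteins follow directly from the per-residue shifts quoted above. The short Python sketch below illustrates this arithmetic; it uses the calculated wild-type masses listed in the Figure 5 caption and the proline counts given in the text (10 for EGFP, 11 for NowGFP, 15 for KillerOrange). The function and variable names are illustrative only, and the nominal shifts of +18 Da and −2 Da are taken as stated rather than recomputed from exact atomic masses.

```python
# Nominal mass shift per substituted proline, as stated in the text:
# Pro -> S-Flp replaces one H by F (+18 Da); Pro -> Dhp removes two H (-2 Da).
SHIFT_PER_RESIDUE_DA = {"S-Flp": 18.0, "Dhp": -2.0}


def expected_mass(wild_type_mass_da, n_prolines, analog):
    """Expected mass (Da) after replacing every proline with the given analog."""
    return wild_type_mass_da + n_prolines * SHIFT_PER_RESIDUE_DA[analog]


# Calculated wild-type masses (Da) and proline counts taken from this article.
PROTEINS = {
    "EGFP": (27745.33, 10),
    "NowGFP": (27931.50, 11),
    "KillerOrange": (27606.09, 15),
}

for name, (mass, n_pro) in PROTEINS.items():
    for analog in SHIFT_PER_RESIDUE_DA:
        print(f"{analog}-{name}: {expected_mass(mass, n_pro, analog):.2f} Da")
```

For EGFP, for example, ten substitutions give 27,745.33 + 10 × 18 = 27,925.33 Da for the S-Flp variant and 27,745.33 − 10 × 2 = 27,725.33 Da for the Dhp variant, matching the calculated values in the Figure 5 caption.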
In the next step, light absorption and emission spectra were recorded to analyze the potential effects of the incorporation of non-canonical proline analogs on the spectroscopic properties of the parent fluorescent proteins (Figure 6). UV-Vis absorption spectra showed a typical band around 280 nm characteristic for the aromatic residues tyrosine and tryptophan, while the chromophore absorbance was found at 488 nm for EGFP and 493 nm for NowGFP (Figure 6A,B). In KillerOrange, the chromophore absorbance region comprised two bands (Figure 6C), which correspond to two possible configurational and charge states of the complex chromophore. The band around 510 nm is known as the state from which fluorescence occurs with high quantum yield49,65. In the proline replacement variants, the following was observed: Incorporation of Dhp did not change the absorbance spectra of EGFP and NowGFP, while S-Flp produced an enhanced UV absorption. The latter can be explained by induced differences in the tryptophan residue microenvironments, particularly Trp57 sandwiched between three S-Flp in the PVPWP motif (Figure 6A,B)69. A more trivial explanation for a higher UV absorption, however, may stem from an increased fraction of improperly folded protein. Since the concentration of the protein was assessed by quantification of absorbance features, the presence of a protein with an improperly matured chromophore can increase the absorbance, while this fraction is not counted in the overall concentration (Figure 6A,B). Supporting this hypothesis, we observed that the S-Flp-containing EGFP exhibited a markedly reduced ratio of chromophore versus combined tryptophan and tyrosine absorbance (ε(CRO)/ε(Tyr+Trp) = 0.96) as compared to a higher value (1.57) in the parent protein (Table 2)70. The presence of a non-fluorescent fraction in the S-Flp-containing EGFP will be an important contributing factor in further analysis of the protein properties.
In the KillerOrange variant containing S-Flp, an enhanced absorbance alongside a red-shift in the chromophore band was observed. This fact indicated that the chromophore formation favored a configuration with a large fluorescence quantum yield (Figure 6C).
Subsequently, we analyzed the fluorescence spectra of the proteins recorded upon excitation at the corresponding maximum absorbance wavelengths. The results show that the spectra remained essentially identical for the examined fluorescent protein variants bearing proline and replacements, S-Flp and Dhp. This outcome implies that the analogs did not alter the chemical environment of the chromophore in any case (Figure 6G–I). Despite this fact, marked differences were seen in the fluorescence spectra of KillerOrange recorded upon excitation at 295 nm, hence upon tryptophan excitation. This experiment tracks fluorescence resonance energy transfer (FRET) or direct excitonic coupling that occurs between the tryptophan side chains and the mature chromophore as both are located at a short distance of not more than 25 Å. For EGFP and NowGFP variants, when the emission spectra were measured using 295 nm excitation, a strong chromophore emission was observed alongside hardly any tryptophan emission (Figure 6D,E). However, the variants containing S-Flp exhibited a slightly larger tryptophan-specific emission. This observation can be linked to an uncounted contribution of the unfolded apoprotein that contains tryptophans but not the mature chromophore. Substantially increased tryptophan-specific emission was seen in KillerOrange, indicating a lack of fluorescence quenching via the expected mechanism of excitation energy transfer or excitonic coupling. The protein variants containing proline and S-Flp exhibited comparable tryptophan emission alongside the favored red-shifted fluorescence feature of a high quantum yield. In contrast, the variant that contained Dhp showed a drastic decrease in chromophore fluorescence intensity, presumably due to minor structural effects (Figure 6F).
Next, we compared the folding properties of the proteins by performing an unfolding/renaturation experiment. Fluorescence emission spectra were recorded in the folded state (protocol section 5), after chemical denaturation and, subsequently, in the process of refolding monitored over a period of 24 h (protocol section 6). The spectra were recorded upon excitation at both relevant wavelengths, 295 nm and the maximum of the respective chromophore absorbance spectrum, while the resulting fluorescence is presented as normalized to the maximum value for each protein (Figure 7). At the end of the protocol, we observed that EGFP variants could refold, while the NowGFP and KillerOrange variants, once denatured, remained unfolded (data not shown). Thus, refolding capacities of the original fluorescent proteins varied substantially. Of note, KillerOrange has been developed as a photosensitizer starting from the hydrozoan chromoprotein variant KillerRed65,71, and its refolding typically lags behind in spite of the robust β-barrel structure. In our experiments, we found that the wild-type EGFP chromophore fluorescence recovered only partially, although the tryptophan-specific fluorescence was larger after renaturation (Figure 7A,D). Essentially similar behavior was observed in the variant containing Dhp (Figure 7C,F). In S-Flp-containing EGFP, a similar result was observed when the excitation was performed at the tryptophan-specific wavelength of 295 nm (Figure 7B). Strikingly, the fluorescence recovered to a much higher extent when the chromophore was excited at 488 nm (Figure 7E). It seems that S-Flp induces a much better yield of refolding compared to the other two variants. However, this beneficial effect was not seen when using 295 nm excitation due to unknown molecular interactions.
Subsequently, the refolding rate was monitored by recording the fluorescence of tryptophan and of the chromophore separately, while the endpoint of the process was determined at 24 h after the start of renaturation. Only EGFP variants showed relatively fast refolding kinetics that could be evaluated reliably, while none of the denatured NowGFP and KillerOrange variants could recover to a value that enabled further quantitative measurements. In EGFP, tryptophan emission recovery was twice as fast (completed in 750 s) compared to the recovery of chromophore emission (completed in 1,500 s), indicating the complexity of the underlying processes (Figure 8). At both excitation wavelengths, the refolding rate was elevated by the presence of S-Flp, in agreement with literature data25. At the same time, the Dhp-containing variant showed a refolding profile similar to wild-type.
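Kinetic traces of this kind are commonly compared after scaling each trace to its endpoint amplitude, as described for Figure 8. The following Python sketch shows one possible way to perform that normalization and to read off a half-time by linear interpolation. It is an illustration only: the trace is synthetic (a single exponential with an arbitrary time constant, not measured data), the single-exponential shape is an assumption that the complex refolding process described above may not satisfy, and all names are hypothetical.

```python
import math


def normalize_to_endpoint(trace):
    """Scale a fluorescence trace so that its final value equals 1.0 (100%)."""
    endpoint = trace[-1]
    return [value / endpoint for value in trace]


def half_time(times, normalized):
    """Linearly interpolate the first time at which the trace reaches 0.5."""
    for i in range(1, len(times)):
        lo, hi = normalized[i - 1], normalized[i]
        if lo < 0.5 <= hi:
            frac = (0.5 - lo) / (hi - lo)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None  # the trace never crosses 50% of its endpoint


# Synthetic single-exponential recovery; TAU and the amplitude are made-up values.
TAU = 200.0                                  # seconds (assumption)
times = [10.0 * i for i in range(301)]       # 0 to 3000 s in 10 s steps
trace = [5.0 * (1.0 - math.exp(-t / TAU)) for t in times]

normalized = normalize_to_endpoint(trace)
t_half = half_time(times, normalized)
print(f"half-time of the synthetic trace: {t_half:.1f} s")
```

For a single exponential, the half-time should be close to TAU × ln 2; with real data, a deviation from this relation would hint at multiple kinetic phases.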
Figure 1: Green fluorescent protein (GFP) structural scaffold, chromophore building, proline conformational transitions and synthetic analogs used in this study. (A) The structure of GFP consists of the β-strands forming a nearly perfect barrel (i.e., a "can" with dimensions 4.2 nm x 2.4 nm) that is capped at both ends by α-helical lids. The 27 kDa GFP protein shows a tertiary structure consisting of eleven β-strands, two short α-helices, and the chromophore in the middle. The conformational states of adjacent prolines are linked to chromophore formation. (B) Autocatalytic maturation (condensation) of the chromophore occurs at residues Ser65, Tyr66, and Gly67, and proceeds in several steps: First, torsional adjustments in the polypeptide backbone to bring the carboxyl carbon of Thr65 into proximity to the amide nitrogen of Gly67. Then, the formation of a heterocyclic imidazoline-5-one ring system occurs upon nucleophilic attack on this carbon atom by the amide nitrogen of glycine and subsequent dehydration. Finally, the system gains visible fluorescence when oxidation of the tyrosine alpha-beta carbon bond by molecular oxygen leads to the extension of the conjugated system of the imidazoline ring system, at the end including the tyrosine phenyl ring and its para-oxygen substituent. The resulting para-hydroxybenzylidene imidazolinone chromophore in the center of the β-barrel is completely separated from the bulk solvent. (C) The skeletal structure formulas and geometries of 1) the proline ring (puckers) and 2) the preceding amide bond represent the main conformational transitions of the proline residue. (D) The proline analogs used in this work with the designated proline ring puckers. The figure was generated using ChemDraw and Discovery Studio Visualizer. The GFP structure is from PDB structure entry 2Q6P.
Figure 2: Fluorescent proteins used in this study. The panels show the ribbon representation of the typical β-barrel structures of three different variants of fluorescent proteins: EGFP, NowGFP, and KillerOrange, with ribbon color representing the color of fluorescence emission of each variant. Proline residues (one-letter code) are highlighted as sticks, and the appropriate positions are annotated. Chromophores are shown with initial amino acid composition in bold. All structure representations were produced with PyMol based on the following PDB structure entries: 2Q6P for EGFP, 4RYS for NowGFP, 4ZFS for KillerOrange.
Figure 3: Flow chart presentation of the SPI method for residue-specific incorporation of non-canonical proline analogs. A proline-auxotrophic Escherichia coli (E. coli) host strain carrying the gene of interest on an expression plasmid is grown in a defined minimal medium with all 20 canonical amino acids until an OD600 of ~0.7 is reached, at which point the cell culture is in the mid-logarithmic growth phase. Cells are harvested and transferred into fresh minimal medium containing 19 canonical amino acids and a proline analog. After the addition of an inducer, protein expression is performed overnight. Finally, the target protein is isolated by cell lysis and purified prior to further analysis. In a variation of the protocol, the cells are grown in a defined minimal medium with 19 canonical amino acids, and proline is added in a limited amount (e.g., one-fifth of the concentration of the other amino acids). By this measure, the cells exhaust proline in the medium before they can exit the logarithmic growth phase; subsequently, the analog is added, and production of the protein of interest is induced.
Figure 4: Expression analysis of EGFP, NowGFP, and KillerOrange variants. (A) Cell pellets from 1 mL of expression culture, normalized to OD600 = 2. SDS-PAGE analysis of (B) EGFP, (C) NowGFP, and (D) KillerOrange variants. Soluble (S) and insoluble (I) fractions of each fluorescent protein derivative, as well as eluted fractions (E) from IMAC of soluble proteins, were loaded on a 15% acrylamide gel. PageRuler Unstained Protein Ladder was used as a marker (M). The expected regions of the particular protein are framed. Incorporated amino acids at proline positions are Pro, R-Flp, S-Flp, and Dhp (in (A) cell pellets from fluorescent protein variants incorporating Dfp instead of Dhp are shown). Gels were stained by 1% (w/v) Coomassie Brilliant Blue.
Figure 5: Mass spectrometric analysis of fluorescent protein variants. (A) Representative deconvoluted ESI-MS spectra of H6-tagged EGFP (black), S-Flp-EGFP (orange), and Dhp-EGFP (cyan) with the location of the main mass peaks provided as numbers (in Da). The calculated molecular masses [M+H]+ of the H6-tagged proteins are: For EGFP 27,745.33 Da (observed 27,746.15 Da); for S-Flp-EGFP 27,925.33 Da (observed 27,925.73 Da); for Dhp-EGFP 27,725.33 Da (observed 27,726.01 Da). (B) Representative deconvoluted ESI-MS spectra of H6-tagged NowGFP (black), S-Flp-NowGFP (orange), and Dhp-NowGFP (cyan) with the location of the main mass peaks provided as numbers (in Da). The calculated masses of the H6-tagged proteins are: For NowGFP 27,931.50 Da (observed 27,946.46 Da; the difference of ~16 Da is probably due to oxidation of a methionine in the protein); for S-Flp-NowGFP 28,129.50 Da (observed 28,130.08 Da); for Dhp-NowGFP 27,909.50 Da (observed 27,910.22 Da). (C) Representative deconvoluted ESI-MS spectra of H6-tagged KillerOrange (black), S-Flp-KillerOrange (orange), and Dhp-KillerOrange (cyan) with the location of the main mass peaks provided as numbers (in Da). The calculated masses of the H6-tagged proteins are: For KillerOrange 27,606.09 Da (observed 27,605.91 Da); for S-Flp-KillerOrange 27,876.09 Da (observed 27,876.08 Da); for Dhp-KillerOrange 27,576.09 Da (observed 27,575.93 Da). Deviations between the observed and calculated molecular masses of about 1 Da are within the error range of the ESI-MS equipment.
Figure 6: Light absorption and fluorescence emission spectra of fluorescent protein variants. Normalized UV-Vis absorption spectra are shown for the variants (A) of EGFP, (B) of NowGFP, and (C) of KillerOrange. Spectra were normalized to the maximum of chromophore absorbance (around 500 nm). Normalized fluorescence emission spectra are shown of the variants (D,G) of EGFP, (E,H) of NowGFP, and (F,I) of KillerOrange. Spectra in (D,E,F) were measured upon excitation with ultraviolet light (295 nm), for the spectra in (G,H,I) 488 nm, 493 nm, and 510 nm light were used for excitation, respectively, and the spectra were normalized to the respective maxima of chromophore emission (around 500 nm). In each panel, black curves correspond to the spectra of the fluorescent protein variant with native proline, orange curves indicate the spectra of S-Flp-substituted proteins, and blue curves correspond to Dhp-substituted proteins.
Figure 7: Fluorescence emission spectra of EGFP variants in refolding experiments. Normalized fluorescence emission spectra of 0.3 µM solutions of fluorescent protein variants in the native state and after denaturation and refolding: Spectra in (A,B,C) were measured upon excitation with ultraviolet light (295 nm) (A) for EGFP, (B) for S-Flp-EGFP, and (C) for Dhp-EGFP. Spectra in (D,E,F) were measured upon excitation with green light (488 nm) (D) for EGFP, (E) for S-Flp-EGFP, and (F) for Dhp-EGFP. The emission spectra of the native (black curves) and refolded samples (green corresponds to EGFP, orange to S-Flp-EGFP and blue to Dhp-EGFP, respectively) of each protein variant are normalized to the maximum fluorescence of the appropriate native state.
Figure 8: Monitoring protein folding and chromophore maturation of EGFP variants with fluorescence. (A) Fluorescence emission in the region of Trp fluorescence (emission was set to 330 nm) recorded upon excitation with ultraviolet light (295 nm). (B) Development of the fluorescence amplitude in the region of chromophore emission upon excitation with green light (488 nm). The time-dependent fluorescence traces were normalized to unity (100%) according to the fluorescence amplitude reached at the end of the monitoring interval. In each panel, black curves correspond to the traces of the fluorescent protein variant with native proline, orange curves indicate the traces of S-Flp-substituted proteins, and blue curves correspond to Dhp-substituted proteins.
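The normalization described for Figure 8 (scaling each trace to the amplitude reached at the end of the monitoring interval) can be sketched as follows. The example trace is invented for illustration and is not data from the study. The half-time estimate is an additional, hypothetical way of comparing traces that the article itself does not report, and both function names are assumptions.

```python
def normalize_to_endpoint(trace):
    """Scale a fluorescence trace so that its final value equals 1.0 (100%)."""
    final = trace[-1]
    return [value / final for value in trace]


def half_time(times, trace):
    """Time at which the normalized trace first reaches 0.5.

    Uses linear interpolation between neighbouring points; returns None
    if the trace never crosses 0.5 from below.
    """
    norm = normalize_to_endpoint(trace)
    for i in range(1, len(norm)):
        if norm[i - 1] < 0.5 <= norm[i]:
            frac = (0.5 - norm[i - 1]) / (norm[i] - norm[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


# Hypothetical example trace (arbitrary fluorescence units vs. minutes).
times = [0, 10, 20, 30]
trace = [0, 20, 60, 80]
print(normalize_to_endpoint(trace))
print(half_time(times, trace))
```

Because every trace is scaled to its own endpoint, this normalization compares the time course of refolding but not the absolute recovery yield, which is addressed separately by the spectra in Figure 7.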
Construct | Amino acid sequence (6xHis tag underlined) |
EGFP-H6 | MVSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVP WPTLVTTLTYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTR AEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVN FKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDH MVLLEFVTAAGITLGMDELYKHHHHHH |
H6-NowGFP | MRGSHHQHHHGSVSKGEKLFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGK MSLKFICTTGKLPVPWPTLKTTLTWGMQCFARYPDHMKQHDFFKSAMPEGY VQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGVDFKEDGNILGHKLEYN AISGNANITADKQKNGIKAYFTIRHDVEDGSVLLADHYQQNTPIGDGPVLLPD NHYLSTQSKQSKDPNEKRDHMVLLEFVTAAGIPLGADELYK |
H6-KillerOrange | MRGSHHHHHHGSECGPALFQSDMTFKIFIDGEVNGQKFTIVADGSSKFPH GDFNVHAVCETGKLPMSWKPICHLIQWGEPFFARYPDGISHFAQECFPEG LSIDRTVRFENDGTMTSHHTYELSDTCVVSRITVNCDGFQPDGPIMRDQ LVDILPSETHMFPHGPNAVRQLAFIGFTTADGGLMMGHLDSKMTFNGSR AIEIPGPHFVTIITKQMRDTSDKRDHVCQREVAHAHSVPRITSAIGSDQD |
Table 1: Primary structures of the target proteins. His-tags are underlined in each sequence.
λ [nm] | ε [M⁻¹·cm⁻¹] (EGFP) | ε [M⁻¹·cm⁻¹] (S-Flp-EGFP) | ε [M⁻¹·cm⁻¹] (Dhp-EGFP) |
488 (≡ CRO) | 31,657 (± 1,341) | 22,950 (± 290) | 27,800 (± 542) |
280 (≡ Tyr+Trp) | 20,116 (± 172) | 23,800 (± 715) | 17,300 (± 554) |
Table 2: Extinction coefficients (ε) of EGFP variants at selected wavelengths. Values for the extinction coefficient ε (in M⁻¹·cm⁻¹) are calculated from recorded UV-Vis absorption spectra of the respective EGFP variants using known protein concentrations. The selected wavelength of 280 nm corresponds to the maximum absorbance of the aromatic residues tyrosine and tryptophan, whereas 488 nm represents the wavelength of maximum chromophore (CRO) absorbance.
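The extinction coefficients in Table 2 relate absorbance to protein concentration through the Beer-Lambert law, A = ε·c·l. The sketch below assumes a 1 cm path length and uses an invented absorbance reading, so only the ε values are taken from Table 2. The function and variable names are hypothetical.

```python
# Extinction coefficients at 488 nm (M^-1 cm^-1), taken from Table 2.
EPSILON_488 = {
    "EGFP": 31657,
    "S-Flp-EGFP": 22950,
    "Dhp-EGFP": 27800,
}


def concentration_uM(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l), returned in micromolar."""
    return absorbance / (epsilon * path_cm) * 1e6


def relative_epsilon(variant, reference="EGFP"):
    """Chromophore extinction coefficient relative to the parent protein."""
    return EPSILON_488[variant] / EPSILON_488[reference]


# Hypothetical absorbance reading at 488 nm in a 1 cm cuvette.
a488 = 0.25
for name, eps in EPSILON_488.items():
    print(name, round(concentration_uM(a488, eps), 2), "µM")

print(round(relative_epsilon("S-Flp-EGFP"), 3))
print(round(relative_epsilon("Dhp-EGFP"), 3))
```

The ratios show that the chromophore extinction coefficient of both substituted variants is lower than that of the parent EGFP, which should be taken into account when concentrations are estimated from the absorbance at 488 nm.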
Supplementary Material: Preparation of stock solutions and buffers.
In nature, changes in protein structure and function typically arise from mutations, which exchange the amino acid identity at certain positions in the protein sequence. This natural mechanism is widely applied in biotechnology as mutagenesis for protein engineering, and it relies on the repertoire of the 20 canonical amino acids. The exchange of proline residues is problematic, however. Due to its special backbone architecture, proline is hardly interchangeable with the remaining 19 residues72. For example, proline is typically known as a secondary structure breaker in polypeptide sequences because of its poor compatibility with the most common secondary structures, i.e., α-helix and β-strand. This feature is easily lost when proline is mutated to another amino acid from the canonical repertoire. The replacement of proline with its chemical analogs offers an alternative approach that preserves the basic backbone features of the parent proline residue while biasing its specific conformational transitions or modulating its molecular volume and polarity. For example, bacterial cultures can be supplied with analogs such as hydroxy-, fluoro-, alkyl-, and dehydroprolines, structures with variable ring sizes, and more, thus facilitating the production of a protein containing specific proline alterations.
The selective pressure incorporation (SPI) method described in this study allows for a global, i.e., residue-specific, replacement of all prolines in the target protein with related chemical analogs. The importance of the method lies in the fact that SPI creates sequence changes inaccessible to common mutagenesis techniques. For example, it allows the production of a target protein containing rather small structural changes that typically do not exceed one or two atom replacements, deletions, or additions, as demonstrated in this study. Such protein modifications are dubbed "atomic mutations"73,74. In a fluorescent protein such as GFP, the result of this molecular intrusion can be seen in the rate of folding, local polarities, protein packing, and the stability of the structural features involved. The changes in the absorbance and fluorescence properties are produced indirectly through the impact on protein folding and residue microenvironments. The molecular changes made by SPI are typically much more precise than mutations of prolines to other canonical residues, which are typically detrimental to protein folding, production, and isolation.
As a production method, the SPI approach exploits the substrate tolerance of the aminoacyl-tRNA synthetase pocket towards chemical analogs of the native amino acid. The synthetase is responsible for the correct identification of the amino acid structure, while the incorporation into proteins occurs downstream in the translation process. Instrumentally, protein production, isolation, and purification in SPI are performed as in any other recombinant protein expression technique, with the following additions to the protocol. Proline, which is destined for replacement, is provided at the beginning of the fermentation process so that the cells can grow and develop their intact cellular machinery. However, the cell culture is not allowed to reach the maximal optical density, which keeps the cells in the logarithmic phase that is optimal for protein expression. There are two major variations of the SPI method at this point. In the first one, the concentration of proline in the initial growth medium (a chemically defined medium) is adjusted such that proline is depleted without any external intervention. The cells exhaust the proline in the medium before they can exit the logarithmic growth phase; the analog is then added, and production of the protein of interest is induced. In the second version of the method, the cells are grown in a medium containing proline until the middle of their logarithmic phase. At this point, the cells are harvested and physically transferred into another medium that no longer contains proline, only the analog, and production of the protein of interest is subsequently induced. In both versions, the analog and the protein induction reagent are provided to the pre-grown cells. The isolation and purification of the wild-type protein are performed in the same way as for the variants. In principle, every available Pro-auxotrophic strain can be used as an expression host. Nevertheless, expression tests to identify the most suitable host are advisable. Also, tests of different chemically defined media can be used to optimize protein yield.
There are certain requirements regarding the chemical analogs that need to be considered for SPI, such as solubility and concentration. The metabolic availability and uptake of amino acids are dependent on the number of dissolved molecules in the medium. To increase the solubility of a particular compound, slightly acidic or alkaline conditions can be chosen. Since the artificial molecules can cause growth inhibitory effects due to their cell toxicity, the concentration should be lowered to a minimum in order to avoid cell stress75.
A minor weakness of SPI is the decrease of incorporation efficiency with larger numbers of positions that need to be exchanged. In principle, a reduction of the amino acid frequency within the target biomolecule by site-directed mutagenesis can solve this problem. However, the structural and functional properties of a desired protein might be affected by changing the primary structure.
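The dependence on the number of positions described above can be illustrated with a toy model. It assumes that every proline position is substituted independently with the same per-site probability, which is a simplification and not a result of this study. The per-site value of 0.95 is an arbitrary illustrative number, and `fully_substituted_fraction` is a hypothetical helper.

```python
def fully_substituted_fraction(per_site_probability, n_positions):
    """Fraction of protein molecules with all positions substituted.

    Toy model: each position is substituted independently with the same
    probability, so the fraction is p ** n.
    """
    if not 0.0 <= per_site_probability <= 1.0:
        raise ValueError("per_site_probability must be between 0 and 1")
    if n_positions < 0:
        raise ValueError("n_positions must not be negative")
    return per_site_probability ** n_positions


# Arbitrary illustrative per-site probability (assumption, not measured).
p = 0.95
for n in (1, 5, 10, 15):
    print(n, round(fully_substituted_fraction(p, n), 3))
```

In this model the fully substituted fraction falls geometrically with the number of positions, which is one way to rationalize why reducing the number of target residues can help, at the cost of changing the primary structure.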
As mentioned before, SPI allows residue-specific replacement of the canonical amino acid. This implies that non-canonical amino acids are inserted at every position of the canonical amino acid within the target protein, including conserved residues that are indispensable for protein function or folding. Alternative methods for site-specific incorporation are the only possibility to overcome this issue3. In the past few decades, an orthogonal pair method has been developed that can produce proteins containing modified residues at predefined sites. The most common implementation of this method is known as stop codon suppression. It is based on an engineered orthogonal translation system dedicated to the site-specific incorporation of synthetic amino acids76. More than 200 amino acids with different side-chain modifications have been incorporated into proteins to date using this approach77. However, these translation systems are still not suitable for inserting proline analogs into target proteins. Furthermore, the method's performance is considered low in the case of minor amino acid modifications because some background promiscuity of the aminoacyl-tRNA synthetase typically remains in the engineered translation systems.
Using SPI, we produced a number of β-barrel fluorescent protein variants and studied the outcomes of exchanging proline with its unnatural analogs. In the case of proline replacement with R-Flp and Dfp, a dysfunctional protein was produced by the expression host. The effect is likely caused by protein misfolding. With R-Flp, the misfolding may originate from the C4–exo conformation promoted by this analog, which is disfavored by the parent protein structures27. With Dfp, the misfolding is likely caused by the diminished rate of the trans-to-cis peptide bond isomerization at the proline residue27. This isomerization is known to be among the limiting steps in the kinetic profile of protein folding and affects β-barrel formation and subsequent chromophore maturation. Indeed, for both amino acids, R-Flp and Dfp, protein production resulted in an aggregated and insoluble protein. Consequently, chromophore formation could not occur, and the fluorescence was lost entirely. With S-Flp and Dhp, however, proper protein maturation was observed, resulting in fluorescent protein samples for each analog/protein combination. Despite some modulations, the absorbance and fluorescence features of the proteins largely remained similar to those of the wild-type proteins. The effect of the amino acid substitution was revealed in the refolding kinetics studies, which showed faster refolding in the case of replacement with S-Flp. Model studies have shown that this residue may somewhat increase the rate of the trans-to-cis amide rotation and lead to the formation of the C4–endo conformation. Both these factors are likely to contribute to the beneficial kinetic effects of this residue in EGFP. In contrast, Dhp produced kinetic folding profiles that were most similar to those of the parent protein.
The diversity of the outcomes produced by mere atomic mutations in the examined fluorescent proteins illustrates the potential of the SPI production method in altering target protein properties. The protein alterations induced by proline replacement with the analogs have further implications in the engineering of enzymes78,79,80 and ion channels81,82, as well as in general engineering of protein stability.
The basic limitation of the SPI method is its "all-or-none" mode of exchanging proline residues with related analogs. It would be of great advantage to be able to select precisely which proline residues should be replaced with the analogs and which ones should remain unmodified. However, at present, there is no technique that could perform such a sophisticated production using a microbial production host. Chemical synthesis of proteins83,84 and cell-free production85,86 are the two alternative methods that can produce position-specific proline modifications. Nonetheless, their operational complexity and low production yields make them inferior to production in living cells. As of now, SPI remains the most operationally simple and robust approach for the production of complex proteins bearing atomic mutations. By introducing unnatural amino acid substitutes, the method allows modifying protein features in a targeted manner, as exemplified here by alterations in folding and light absorption/emission of fluorescent proteins generated by proline replacements.
The authors have nothing to disclose.
This work was supported by the German Research Foundation (Cluster of Excellence "Unifying Systems in Catalysis") to T.F. and N.B. and by the Federal Ministry of Education and Science (BMBF Program "HSP 2020", TU-WIMIplus Project SynTUBio) to F.-J.S. and T.M.T.T.
Name | Company | Catalog Number | Comments |
Acetonitrile | VWR | HiPerSolv CHROMANORM ULTRA for LC-MS, 83642 | LC-MS grade required |
Acrylamide and bisacrylamide aqueous stock solution at a ratio of 37.5:1 (ROTIPHORESE Gel 30) | Carl Roth | 3029.1 | |
Agar-agar | Carl Roth | 5210 | |
Ammonium molybdate ((NH4)2MoO4) | Sigma-Aldrich | 277908 | |
Ammonium peroxydisulphate (APS) | Carl Roth | 9592.2 | ≥98 %, p.a., ACS grade required |
Ammonium sulfate ((NH4)2SO4) | Sigma-Aldrich | A4418 | |
Ampicillin sodium salt | Carl Roth | K029 | |
Biotin | Sigma-Aldrich | B4501 | |
Bromophenol blue | Sigma-Aldrich | B0126 | |
Calcium chloride (CaCl2) | Sigma-Aldrich | C5670 | |
Coomassie Brilliant Blue R 250 | Carl Roth | 3862 | |
Copper sulfate (CuSO4) | Carl Roth | CP86.1 | |
D-glucose | Carl Roth | 6780 | |
1,2-Bis-(dimethylamino)-ethane, N,N,N',N'-Tetramethylethylenediamine (TEMED) | Carl Roth | 2367.3 | ≥99 %, p.a., for electrophoresis |
1,4-dithiothreitol (DTT) | Carl Roth | 6908 | |
Dichloromethane (DCM) | Sigma-Aldrich | 270997 | |
di-potassium hydrogen phosphate (K2HPO4) | Carl Roth | P749.1 | |
di-sodium hydrogen phosphate (Na2HPO4) | Carl Roth | X987 | |
DNase I | Sigma-Aldrich | D5025 | |
Dowex 50WX8-100 (hydrogen form) | Acros Organics / Thermo Fisher Scientific (Waltham, U.S.A.) | 10731181 | cation exchange resin |
Ethanol | Carl Roth | 9065.1 | |
Formic acid | VWR | HiPerSolv CHROMANORM for LC-MS, 84865 | LC-MS grade required |
Glacial acetic acid | Carl Roth | 3738.5 | 100 %, p. a. |
Glycerol | Carl Roth | 3783 | |
Imidazole | Carl Roth | X998 | |
Hydrogen chloride (HCl) | Merck | 295426 | |
Iron(II) chloride (FeCl2) | Sigma-Aldrich | 380024 | |
Isopropanol | Carl Roth | AE73.1 | |
Isopropyl β-D-1-thiogalactopyranoside (IPTG) | Sigma-Aldrich | I6758 | |
Lysozyme | Sigma-Aldrich | L6876 | |
Magnesium chloride (MgCl2) | Carl Roth | KK36.1 | |
Magnesium sulfate (MgSO4) | Carl Roth | 8283.2 | |
Manganese chloride (MnCl2) | Sigma-Aldrich | 63535 | |
β-mercaptoethanol | Carl Roth | 4227.3 | |
PageRuler Unstained Protein Ladder | Thermo Fisher Scientific | 26614 | |
Potassium chloride (KCl) | Carl Roth | 6781.3 | |
Potassium dihydrogen phosphate (KH2PO4) | Sigma-Aldrich | P5655 | |
RNase A | Carl Roth | 7156 | |
Sodium chloride (NaCl) | Carl Roth | P029 | |
Sodium dihydrogen phosphate (NaH2PO4) | Carl Roth | T879 | |
Sodium dodecyl sulphate (NaC12H25SO4) | Carl Roth | 0183 | |
Thiamine | Sigma-Aldrich | T4625 | |
Trifluoroacetic acid (TFA) | Sigma-Aldrich | T6508 | |
Tris hydrochloride (Tris-HCl) | Sigma-Aldrich | 857645 | |
Tris(hydroxymethyl)-aminomethane (Tris) | Carl Roth | 5429 | |
Tryptone | Carl Roth | 8952 | |
Yeast extract | Carl Roth | 2363 | |
Zinc chloride (ZnCl2) | Sigma-Aldrich | 229997 | |
L-alanine | Sigma-Aldrich | A7627 | |
L-arginine | Sigma-Aldrich | A5006 | |
L-asparagine | Sigma-Aldrich | A8381 | |
L-aspartic acid | Sigma-Aldrich | A0884 | |
L-cysteine | Sigma-Aldrich | C7352 | |
L-glutamic acid | Sigma-Aldrich | G2128 | |
L-glutamine | Sigma-Aldrich | G3126 | |
Glycine | Sigma-Aldrich | G7126 | |
L-histidine | Sigma-Aldrich | H8000 | |
L-isoleucine | Sigma-Aldrich | I2752 | |
L-leucine | Sigma-Aldrich | L8000 | |
L-lysine | Sigma-Aldrich | L5501 | |
L-methionine | Sigma-Aldrich | M9625 | |
L-phenylalanine | Sigma-Aldrich | P2126 | |
L-proline | Sigma-Aldrich | P0380 | |
L-serine | Sigma-Aldrich | S4500 | |
L-threonine | Sigma-Aldrich | T8625 | |
L-tryptophan | Sigma-Aldrich | T0254 | |
L-tyrosine | Sigma-Aldrich | T3754 | |
L-valine | Sigma-Aldrich | V0500 | |
(4S)-fluoroproline | Bachem | 4033274 | Make sure that all proline analogs are proline-free (check content). Otherwise, include a step to consume proline contamination during expression. |
(4R)-fluoroproline | Bachem | 4033275 | Make sure that all proline analogs are proline-free (check content). Otherwise, include a step to consume proline contamination during expression. |
3,4-dehydroproline | Bachem | 4003545 | Make sure that all proline analogs are proline-free (check content). Otherwise, include a step to consume proline contamination during expression. |
4,4-difluoroproline | Enamine | EN400-17448 | Make sure that all proline analogs are proline-free (check content). Otherwise, include a step to consume proline contamination during expression. |
Conical polystyrene (Falcon) tubes, 15 mL | Fisher Scientific | 14-959-49B | |
Conical polystyrene (Falcon) tubes, 50 mL | Fisher Scientific | 14-432-22 | |
Dialysis membrane, Molecular Weight Cut-Off (MWCO) 5,000 | Spectrum Medical Industries | Spectra/Por MWCO 5000 dialysis membrane, 133198 | |
Immobilized Metal ion Affinity Chromatography (IMAC) column 1 mL, Ni-NTA | GE Healthcare | HisTrap HP, 1 mL, 17-5247-01 | |
Luer-Lock syringe, 5 mL | Carl Roth | EP96.1 | |
Luer-Lock syringe, 20 mL | Carl Roth | T550.1 | |
Luer-Lock syringe, 50 mL | Carl Roth | T552.1 | |
Microcentrifuge tubes, 1.5 mL | Eppendorf | 30120086 | |
Petri dishes (polystyrene, sterile) | Carl Roth | TA19 | |
pQE-80L plasmid vector | Qiagen | no longer available | replaced by N-terminus pQE Vector set Cat No./ID: 32915 |
Pro-auxotrophic E. coli strain JM83 | Addgene | 50348 | https://www.addgene.org/50348/ |
Pro-auxotrophic E. coli strain JM83 | ATCC | 35607 | |
Round-bottom polystyrene tubes, 14 mL | Fisher Scientific | Corning Falcon, 14-959-1B | |
Syringe filter 0.45 µm with polyvinylidene difluoride (PVDF) membrane | Carl Roth | CCY1.1 | |
High-Performance Liquid Chromatography (HPLC) column for LC-ESI-TOF-MS | Sigma-Aldrich | Supelco Discovery BIO Wide Pore C5 HPLC column, 3 µm particle size, 10 cm x 2.1 mm | |
HPLC autosampler vials 1.5 mL | Sigma-Aldrich | Supelco 854165 | with conical 0.1 mL glass inserts, screw caps and septa |
Mass spectrometer for LC-ESI-TOF-MS | Agilent | Agilent 6530 Accurate-Mass QTOF | |
Mass spectrometry data analysis software | Agilent | MassHunter Qualitative Analysis software v. B.06.00 | |
Benchtop centrifuge for 1.5 mL Eppendorf tubes | Eppendorf | 5427 R | |
Cooling centrifuge for 50 mL Falcon tubes | Eppendorf | 5810 R | |
Fast Protein Liquid Chromatography (FPLC) system | GE Healthcare | ÄKTA pure 25 L | |
Fluorescence spectrometer | Perkin Elmer | LS 55 | |
High pressure microfluidizer for bacterial cell disruption | Microfluidics | LM series with “Z” type chamber | |
Orbital shaker for bacterial cultivation | Infors HT | Minitron | |
Peristaltic pump for liquid chromatography (LC) | GE Healthcare | P-1 | |
Ultrasonic homogenizer for bacterial cell disruption | Omnilab | Bandelin SONOPULS HD 3200, 5650182 | with MS72 sonifier tip |
UV-Vis spectrophotometer | Biochrom | ULTROSPEC 2100 | |
UV-Vis/NIR spectrophotometer | Perkin Elmer | LAMBDA 950 UV/Vis/NIR |