Restriction endonucleases with new sequence specificity can be developed from enzymes that recognize a partially degenerate sequence. Here we provide a detailed protocol that we successfully used to alter the sequence specificity of the NlaIV enzyme. The key ingredients of the protocol are in vitro compartmentalization of the transcription/translation reaction and selection of variants with new sequence specificities.
Restriction endonuclease (REase) specificity engineering is extremely difficult. Here we describe a multistep protocol that helps to produce REase variants with more stringent specificity than the parental enzyme. The protocol requires the creation of a library of expression selection cassettes (ESCs) for variants of the REase, ideally with variability at positions likely to affect DNA binding. The ESC is flanked on one side by a restriction site for the desired activity and a biotin tag, and on the other side by a restriction site for the undesired activity and a primer annealing site. The ESCs are transcribed and translated in a water-in-oil emulsion under conditions that make the presence of more than one DNA molecule per droplet unlikely. Therefore, each cassette molecule is exposed only to the activity of the enzyme it encodes. REase variants of the desired specificity remove the biotin tag but not the primer annealing site. After the emulsion is broken, the DNA molecules are subjected to a biotin pulldown, and only those in the supernatant are retained. This step ensures that only ESCs for variants that have not lost the desired activity are retained. These DNA molecules are then subjected to a first PCR. Cleavage in the undesired sequence cuts off the binding site for one of the primers. Therefore, PCR amplifies only ESCs from droplets without the undesired activity. A second PCR is then carried out to reintroduce the restriction site for the desired specificity and the biotin tag, so that the selection step can be repeated. Selected open reading frames can be overexpressed in bacterial cells that also express the cognate methyltransferase of the parental REase, because the newly evolved REase targets only a subset of the methyltransferase target sites.
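The requirement that a droplet rarely contains more than one DNA molecule can be reasoned about with Poisson statistics, the usual model for random partitioning of molecules into droplets. The sketch below is a minimal illustration under that assumption only; the mean number of ESC molecules per droplet (`lam`) is a hypothetical input, not a value taken from this protocol.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k ESC molecules (Poisson model)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_occupancy(lam: float) -> dict:
    """Fractions of empty, singly and multiply loaded droplets for a mean load `lam`."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = 1.0 - empty - single
    return {
        "empty": empty,
        "single": single,
        "multiple": multiple,
        # Among droplets that contain DNA at all, the share with >1 template;
        # in these droplets an ESC may be cut by an enzyme it does not encode.
        "multiple_among_occupied": multiple / (1.0 - empty),
    }


for lam in (0.1, 0.3, 1.0):
    occ = droplet_occupancy(lam)
    print(f"lam={lam}: {occ['multiple_among_occupied']:.3f} of occupied droplets hold >1 ESC")
```

Under this model, a mean load of 0.1 leaves about 5% of DNA-containing droplets with more than one template; lowering the load further reduces cross-talk at the cost of more empty droplets.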
Sequence specificity engineering is extremely challenging for type II REases. In this class of endonucleases, sequence recognition and catalysis are closely intertwined, most probably as an evolutionary safeguard against the creation of an endonuclease with broader specificity than its cognate methyltransferase, which would damage host DNA. Directed evolution of new specificities in cells is further complicated by the need to protect host DNA against the newly engineered endonuclease activity. Consequently, only a few successful attempts at REase engineering have been reported, and all of them exploit the unique features of a particular enzyme1,2,3,4,5,6,7.
Here we provide a detailed protocol for specificity engineering, based on our successful engineering of the NlaIV endonuclease8, that can be used to generate endonuclease variants with narrower specificity than the parental enzyme. For a parental enzyme with any recognition sequence, extra specificity can be introduced for bases in the flanks. For parental enzymes that recognize partially degenerate sequences (such as NlaIV with its GGNNCC target), additional specificity can also be introduced within the recognition sequence. As extra specificity will likely require protein-DNA contacts, the newly recognized bases should lie within the footprint of the parental endonuclease on DNA. In principle, selection schemes can be set up for any desired specialization of the recognition sequence. However, most REases that recognize palindromic and nearly palindromic target sequences are functional dimers in which each subunit recognizes only a half-site of the palindrome. Hence, selection of new specificities that violate the symmetry of protein-nucleic acid interactions is unlikely to work. For the dimeric NlaIV, for example, the GGNNCC sequence can theoretically be narrowed down to GGATCC, but narrowing the specificity down to the asymmetric GGAACC is expected to be more difficult. Our scheme involves both positive and negative selection.
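The symmetry argument can be made concrete by expanding the degenerate GGNNCC site and comparing each sequence with its reverse complement. This is a minimal sketch; treating each NlaIV subunit as reading one 3-bp half-site (the hypothetical helper `half_sites`) is a simplification for illustration, not a structural model of the enzyme.

```python
from __future__ import annotations

from itertools import product

# IUPAC codes used in this article (N, S, W, B, V) plus the four bases
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "S": "CG", "W": "AT", "B": "CGT", "V": "ACG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand(pattern: str) -> list[str]:
    """List every concrete sequence matched by a degenerate IUPAC pattern."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in pattern))]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def half_sites(seq: str) -> tuple[str, str]:
    """Half-sites read 5'->3' on the top and bottom strands of a 6-bp site."""
    return seq[:3], revcomp(seq)[:3]


sites = expand("GGNNCC")
palindromes = [s for s in sites if s == revcomp(s)]
print(len(sites), palindromes)
print(half_sites("GGATCC"), half_sites("GGAACC"))
```

Only 4 of the 16 GGNNCC sites (GGATCC, GGCGCC, GGGCCC and GGTACC) are palindromic. GGATCC presents the same GGA half-site on both strands, whereas GGAACC presents GGA on one strand and GGT on the other, so a symmetric homodimer would have to treat its two identical subunits differently.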
The process is more efficient when negative selection is also used to remove variants that can still cleave sequences other than the preferred, narrower target. For example, selection for GGATCC could be combined with antiselection against GGBVCC (where B is any base other than A, and V is any base other than T). When some of the possible sites are covered by neither selection nor antiselection, the outcome of the selection experiment depends on the relative effectiveness of positive and negative selection. In our NlaIV work, we selected for GGATCC and against GGSSCC (where S is G or C), and obtained a specificity that, ignoring symmetry-breaking targets, could be described as GGWWCC (where W is A or T), suggesting that in this particular case negative selection was more important than positive selection.
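Which GGNNCC sites are tested by neither selection nor antiselection can be enumerated directly. The sketch below is a minimal illustration with a hypothetical helper `unchallenged`; the final half-site comparison is an observation about the sets themselves, not a statement of how the antiselection target was chosen.

```python
from itertools import product

IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "S": "CG", "W": "AT", "B": "CGT", "V": "ACG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def expand(pattern):
    """Set of concrete sequences matched by a degenerate IUPAC pattern."""
    return {"".join(bases) for bases in product(*(IUPAC[b] for b in pattern))}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def unchallenged(selected, counter_selected, space="GGNNCC"):
    """Sites in `space` tested by neither positive nor negative selection."""
    return sorted(expand(space) - expand(selected) - expand(counter_selected))


gap_bv = unchallenged("GGATCC", "GGBVCC")
gap_ss = unchallenged("GGATCC", "GGSSCC")
print(len(gap_bv), gap_bv)  # sites left untested by the GGBVCC scheme
print(len(gap_ss))          # sites left untested by the GGSSCC scheme used for NlaIV
# The extra sites cut by the evolved GGWWCC specificity all fall in that untested gap
print((expand("GGWWCC") - expand("GGATCC")) <= set(gap_ss))

# GGBVCC is exactly the set of GGNNCC sites without a GGA half-site on either strand
no_gga = {s for s in expand("GGNNCC") if "GGA" not in (s[:3], revcomp(s)[:3])}
print(no_gga == expand("GGBVCC"))
```

The GGSSCC scheme leaves 11 of the 16 sites untested, which is consistent with the evolved enzyme retaining activity on the GGWWCC sites outside GGATCC.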
Our approach starts with the creation of an expression selection cassette (ESC). The ESC is organized in sections. The core section contains variants of the open reading frame (ORF) of the REase under T7 promoter control. This core section must not contain any cognate site for the engineered REase. The core is flanked by two cognate sites for the wild type REase: a cleavage site for the undesired activity (counter selected sequence, GGSSCC in this example) and a cleavage site for the desired activity (selected sequence, GGATCC in this example). The final PCR step of ESC preparation adds a 5' biotin at the end next to the selected site and introduces the variety of counter selected sequences (GGSSCC in this example). The selection strategy relies on carefully designed primers for ESC reamplification after the in vitro transcription/translation/selection step (Figure 1A). The ESC library is expressed by in vitro compartmentalized transcription-translation in a water-in-oil emulsion9,10,11. Within each droplet, the specificity of the expressed enzyme determines the state of the ESC (Figure 1B, step I). In the described arrangement, the desired cleavage activity of the translated protein removes the biotin tag from the DNA but does not affect the other ESC end, which carries the counter selected sequence. After the emulsion is broken, biotinylated fragments are removed by streptavidin affinity pulldown, so that only fragments from droplets with the desired activity remain (Figure 1B, step II). This step removes inactive REase variants. The supernatant fraction of the pulldown step is then amplified by PCR. The first PCR uses primers F2 and R1 (Figure 1A,B, step III). Primer F2 binds to the ESC section between the counter selected sequence and the end of the molecule. Therefore, ESCs expressing variants that can cleave the counter selected sequence (and thereby separate the binding sites for primers F2 and R1 onto two different DNA molecules) are not amplified and are thus removed from the library. Primer R1 binds between the selected site and the core of the ESC, so that it is not affected by the cleavage status of the selected site, and it restores the cleavage site for the desired activity (GGATCC). The cycle is closed by a second PCR (with primers F1 and R2) that adds biotin at the 5' end close to the selected site and restores the designed variation at the counter selected site close to the opposite end of the ESC (Figure 1B, step IV). The resulting DNA mixture is ready for another round of selection.
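The logic of one selection cycle can be summarized as a filter over ESCs. This deterministic sketch assumes all-or-none cleavage, one ESC per droplet, and no loss of ESCs during pulldown or PCR; the `Variant` class and the three example specificities are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import product

ALL_SITES = frozenset("GG" + a + b + "CC" for a, b in product("ACGT", repeat=2))
SELECTED = "GGATCC"
# Counter selected sites GGSSCC (S = C or G), reintroduced by PCR in every round
COUNTER_SELECTED = ["GG" + a + b + "CC" for a, b in product("CG", repeat=2)]


@dataclass(frozen=True)
class Variant:
    name: str
    cleaves: frozenset  # hypothetical set of sites this variant cuts to completion


def survives(variant: Variant, anti_target: str) -> bool:
    """Fate of one ESC in its own droplet (steps I-III of Figure 1B)."""
    # Steps I-II: cutting the selected site removes biotin; uncut ESCs are pulled down.
    if SELECTED not in variant.cleaves:
        return False
    # Step III: cutting the anti-target separates the F2 and R1 sites, so no PCR product.
    return anti_target not in variant.cleaves


def selection_round(pool, rng):
    """One round; each ESC carries one anti-target drawn at random (set by step IV)."""
    return [v for v in pool if survives(v, rng.choice(COUNTER_SELECTED))]


pool = [
    Variant("wild type", ALL_SITES),
    Variant("inactive", frozenset()),
    Variant("narrowed", frozenset({"GGAACC", "GGATCC", "GGTACC", "GGTTCC"})),
]
survivors = selection_round(pool, random.Random(0))
print([v.name for v in survivors])
```

Only the variant that cuts GGATCC and none of the GGSSCC sites survives; in practice, partial activities and imperfect pulldown make enrichment gradual rather than absolute.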
The success of the selection protocol depends strongly on the proper choice of the new, more stringent target sequence and on careful design and effective implementation of the mutagenesis strategy. Because it is much easier to build on slight preexisting preferences of the REase than to overcome them, we recommend starting with a kinetic study of any preexisting preferences. Careful mutagenesis design is necessary because of the limited size of the mutant library that can be processed by the presented protocol (10^9 clones in a single experiment). Therefore, all 20 possible amino acid substitutions can be effectively tested at only a few positions (see Discussion). Random mutagenesis, such as error-prone PCR (EP-PCR), which is presented as an alternative method, leads to severe undersampling of the possible sequence space. If any information is available about amino acid positions potentially involved in contacts with DNA (or even located in close proximity to the degenerate nucleotides of the cognate sequence), it should be used to select a few amino acids for oligonucleotide-guided saturation mutagenesis (protocol steps 1.6-3.10).
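How quickly the 10^9-clone limit is exceeded can be estimated with a standard coverage calculation. This is a minimal sketch assuming that every NNS codon combination is equally represented and sampled independently (real libraries carry synthesis and codon biases); codon-level coverage is a lower bound on amino acid-level coverage, because several NNS codons encode the same residue.

```python
import math


def nns_diversity(positions: int) -> int:
    """Codon-level diversity for NNS randomization (4 x 4 x 2 = 32 codons per position)."""
    return 32 ** positions


def expected_coverage(library_size: float, diversity: int) -> float:
    """Expected fraction of variants present at least once (Poisson approximation)."""
    return 1.0 - math.exp(-library_size / diversity)


LIBRARY_SIZE = 1e9  # clones processed in a single experiment, as stated in the text

for k in range(4, 8):
    d = nns_diversity(k)
    print(f"{k} positions: {d:.2e} codon combinations, coverage {expected_coverage(LIBRARY_SIZE, d):.3f}")
```

Under these assumptions, coverage is essentially complete for five randomized positions, drops to roughly 61% for six, and to about 3% for seven.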
1. Preparation of ESCs
2. Split-and-mix Synthesis of Mutagenic Primers
NOTE: This step is used only for projects that require subsaturation mutagenesis at more than one site. A synthesizer with multiple synthesis columns is required. Assign columns for the synthesis of randomized NNS codon triplets and wild type codon triplets according to the desired mutagenesis frequencies. For example, if seven equal-volume synthesis columns are available and a mutagenesis rate of 0.3 is desired at a given site, synthesize randomized NNS codons in ~0.3 x 7 ≈ 2 columns and wild type codons in ~0.7 x 7 ≈ 5 columns (Figure 3).
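The column allocation in the NOTE, and the resulting spread of substitutions per primer, can be sketched as follows. This assumes that mixing between randomized sites makes each site independent, and that every NNS codon counts as a substitution (in reality some NNS codons reproduce the wild type residue, so the effective substitution rate is somewhat lower); the helper names are hypothetical.

```python
from math import comb


def allocate_columns(rate: float, n_columns: int) -> tuple:
    """Split columns into (NNS, wild type) for one randomized codon, rounding to the nearest column."""
    nns = round(rate * n_columns)
    return nns, n_columns - nns


def substitution_distribution(rate: float, positions: int) -> list:
    """P(k randomized codons in one primer), assuming independent splits at each site."""
    return [comb(positions, k) * rate ** k * (1 - rate) ** (positions - k)
            for k in range(positions + 1)]


nns_cols, wt_cols = allocate_columns(0.3, 7)
realized = nns_cols / (nns_cols + wt_cols)  # 2 of 7 columns, slightly below the target 0.3
print(nns_cols, wt_cols, round(realized, 3))
for k, p in enumerate(substitution_distribution(realized, 4)):
    print(f"{k} NNS codons: {p:.3f}")
```

Because columns come in whole units, the realized rate (here 2/7) only approximates the requested one; more columns allow finer control.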
3. Generating Variant Libraries
NOTE: Use the recombinant plasmid from step 1.6.
4. Performing Compartmentalized In Vitro Transcription-translation Reaction
5. Continued Processing of Libraries and Selection
6. Screen Variants for Altered Sequence Specificity
This protocol is only a tool to increase the frequency of desired variants of an engineered REase by depleting (but not eliminating) two unwanted classes: inactive enzymes and endonucleases with unchanged wild type sequence specificity. Nevertheless, because changing REase specificity is extremely difficult, finding even one variant that produces a cleavage pattern different from that of the wild type enzyme in a single screen of 24 clones should be considered a success. In our hands, the best screens identified promising variants in up to 20% of the screened clones (Figure 8A).
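The chance that a screen of a given size contains at least one promising variant follows from a simple binomial argument. This sketch assumes that clones are picked independently and that the hit frequency is constant; apart from the 20% best case quoted above, the hit frequencies used are hypothetical.

```python
import math


def p_at_least_one_hit(hit_frequency: float, clones: int) -> float:
    """Probability that a screen of `clones` picks contains at least one promising variant."""
    return 1.0 - (1.0 - hit_frequency) ** clones


def clones_needed(hit_frequency: float, confidence: float = 0.95) -> int:
    """Smallest number of clones giving at least one hit with the requested probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_frequency))


for f in (0.2, 0.05, 0.02):
    print(f, round(p_at_least_one_hit(f, 24), 3), clones_needed(f))
```

With a 20% hit frequency, 24 clones almost certainly contain a promising variant; at 2%, the chance falls to roughly 38%.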
A positive outcome strongly depends on library quality (i.e., a limited frequency of substitutions and their random distribution) and on efficient capture of the biotinylated population of library members (steps 3.6–3.7). Both problems can be detected. Library quality should be checked before selection by sequencing as many clones as possible (>15) or by high-throughput sequencing of the library (step 3.10, Table 3). If a majority of the selected clones are inactive, this clearly indicates failure of the streptavidin capture step. A similar effect is observed for libraries that have undergone many selection cycles, most probably because such libraries become dominated by inactive variants that escaped the streptavidin capture step (Figure 8B). Therefore, it is advisable to screen after every selection cycle and to further develop manually selected promising variants rather than to rely on repeated selection.
Figure 1: In vitro selection of a new sequence specificity based on NlaIV engineering. (A) Organization of the expression/selection cassette (ESC), which includes two REase recognition sites, 1) the selected sequence (GGATCC) close to the right end and 2) the counter selected sequence (GGSSCC) close to the left end, as well as the T7 promoter (T7p) and T7 terminator (T7t). The primer binding sites are shown below. Cleavage sites of wild type and selected NlaIV variants are shown as red and green triangles, respectively. (B) Selection cycle steps: I) Emulsification of transcription-translation-cleavage reaction mixes with the ESC library; II) All biotinylated DNA is captured on streptavidin-coated magnetic particles and removed, thus eliminating ESCs encoding inactive variants; III) ESCs encoding REases with wild type activity (i.e., those able to cleave the GGSSCC sequence) are eliminated because cleavage of this sequence separates the binding sites for the forward and reverse primers, so that no amplification of these ESCs occurs; IV) Input for the next selection round is created by adding biotin at the right end and reintroducing variation of the counter selected sequence at the left end. Reprinted from Czapinska et al.8 with permission from Elsevier.
Figure 2: Preparation of the ESC. A fragment derived from the original construct in an expression vector, containing the NlaIV ORF under control of the T7 promoter, was modified to make it suitable for expression/selection. The NlaIV site downstream of the NlaIV ORF was removed, and unique sites (SalI, EcoRI and Eco52I) that were used to mutagenize selected positions were introduced into the NlaIV ORF as silent mutations. The final construct was amplified with primers that introduced two flanking NlaIV sites: the counter selected sequence (GGSSCC) on the left and the selected sequence (GGATCC) on the right. The reverse primer also introduced biotin. Primers used to create mutated ESCs are shown as blue arrows and labeled below (see Table 1B,C).
Figure 3: Scheme of split-and-mix synthesis. The example refers to MutB primer synthesis, in which an NNS sequence was introduced at 0.8 frequency at four positions (see also Table 3). Note that chemical synthesis is carried out from 3' to 5', whereas all sequences are shown in the canonical 5'-3' orientation (i.e., synthesis proceeds from right to left in this scheme). Wild type sequences at mutagenized positions are shown in green and NNS mutagenic sequences in red. The SalI recognition site that is later used to introduce mutations into ESCs is underlined. Points of mixing and splitting (steps 2 and 4) are indicated.
Figure 4: Use of unique restriction enzyme sites in oligonucleotide-targeted mutagenesis. The strategy for introducing mutations is illustrated by the construction of libraries A-C (see steps 3.1–3.7). Reprinted from Czapinska et al.8 with permission from Elsevier.
Figure 5: Endonucleolytic cleavage in in vitro transcription-translation. (A) Cleavage of a test substrate in optimal REase buffer: 1) substrate, a 612 bp PCR product with a single NlaIV recognition site; 2) cleavage products, 355 bp and 257 bp. (B) Cleavage in an in vitro transcription-translation reaction (containing 0.5 µg of ESC): 1–2) 15 µL aliquots of in vitro transcription-translation without substrate (lane 2: reaction supplemented with 1.5 mM MgCl2); 3–4) 15 µL aliquots of in vitro transcription-translation with 1 µg of test substrate (lane 4: reaction supplemented with 1.5 mM MgCl2). S = DNA size marker (pBR322 digested with MspI). Samples were resolved by 6% native PAGE. DNA was stained with ethidium bromide.
Figure 6: Products of the first PCR in the selection cycle. See Figure 1B, step III, and protocol step 5.10. Lane sets 1 and 2 are aliquots of two different libraries loaded in triplicate. S = DNA size standard (lambda DNA digested with HindIII and EcoRI). The arrow indicates the position of the full-length ESC (1,050 bp).
Figure 7: NlaIV variants purified for further screening on a mini scale. See step 6.3.11. Each lane contains a 10 µL aliquot of a different variant. S = protein molecular weight standard. The molecular mass of the NlaIV REase subunit is 29.9 kDa.
Figure 8: Examples of screening NlaIV variants for altered sequence specificity. See step 6.4.2. (A) Successful screen with a high frequency of promising variants. S = DNA size marker, lambda DNA cleaved with HindIII and EcoRI; wt = lambda DNA cleaved with wild type NlaIV; λ = uncleaved lambda DNA substrate; other lanes = variants with very low activity. Variant labels: ! = promising variants that produce a cleavage pattern distinct from that of the wild type enzyme; ? = variants that might also have altered sequence preference. (B) Unsuccessful screen, with a majority of variants inactive and one variant with an apparently unaltered cleavage pattern.
Figure 9: Alternative selection by ligation. This alternative can be used for all REases that generate sticky ends. Here we present an example protocol for a selection scheme for the MwoI enzyme (unpublished). I) Selected sequence (located at the right end of the ESC), with defined residues shown in red and the selected variation of the cognate sequence shown in blue. The counter selected sequence, to be placed at the left end of the ESC, is shown below in parentheses; II) Product of MwoI cleavage; III) After the in vitro transcription/translation is terminated, products are purified and ligated with excess adapter. Only products cleaved in the selected sequence can participate in ligation. Therefore, inactive variants are eliminated, and the pulldown step is unnecessary. The product of cleavage in the counter selected sequence (at the left end of the ESC, not shown) cannot participate in this ligation because the protruding end of the adapter is not complementary to it; IV) Selective PCR uses the same strategy as the main protocol to eliminate variants with the wild type degenerate sequence specificity (F1 primer binding distal to the counter selected site), whereas inactive variants are eliminated by the selective reverse primer, which cannot bind to the uncleaved right end (not modified by adapter ligation). In the next cycle, the process can be iterated using an adapter identical to the cleavage product of the preceding step (i.e., the "cleaved cassette" in panel III) and an appropriate selective reverse primer.
Table 1: Primers used in NlaIV engineering. Sequences of the restriction sites mentioned in the comments are underlined. Lowercase letters indicate sequences that have no complement in the DNA templates.
Table 2: Conditions of the PCR reactions used in the protocol. Tm = primer melting temperature (if the primers differ in Tm, use the lower value).
Table 3: Results of the quality check of two mutagenic primers synthesized with the split-and-mix strategy. Mutagenized codons are indicated with [XXX]. Subscript numbers indicate the positions of the encoded amino acids. Adapted from Czapinska et al.8 with permission from Elsevier.
Table 4: Results of EP-PCR. Main parameters derived from sequence analysis of 22 ESC clones.
The selection protocol described here was tested for NlaIV8, a dimeric REase of the PD-(D/E)XK fold that recognizes a palindromic target site with central NN bases and makes a blunt cut between the NN bases. NlaIV was chosen because cleavage between the NN bases suggests that these bases are close to the protein in the complex. In principle, the protocol could be used for any sequence-specific restriction endonuclease, monomeric or dimeric, of any fold group, catalyzing double-strand breaks of any stagger, irrespective of whether the catalytic and specificity domains coincide (as in the NlaIV example) or are separate (e.g., FokI). Moreover, the protocol could in principle be used not only to generate new, narrower enzyme specificities, but also to eliminate star activities or to create high-fidelity endonucleases. However, none of these applications has been tested yet. In particular, targeted elimination of star activity may be complicated, because the same amino acid residues could be involved in binding to the desired and undesired bases. The in vitro steps described in this protocol are not limited to the selection of narrowed specificities but could also be used to select otherwise altered specificities. However, such variant endonucleases pose a problem: if the spectrum of substrates includes novel targets not cleaved by the parental endonuclease, there is in general no good way to protect cells from the harmful effects of this activity. In contrast, if endonuclease specificity is only narrowed down, the targets are a subset of the wild type targets, and hence the already available cognate methyltransferase should be fully protective.
Our protocol differs in several respects from many directed evolution protocols. Open reading frame diversity is generated once at the beginning of the experiment, not in every iteration. Moreover, it is created by split-and-mix synthesis rather than by EP-PCR. For NNS substitutions of codons, as used in this work, there are (4 x 4 x 2)^6 ≈ 1.07 x 10^9 combinations for six positions. Therefore, any given variant is present on average once in about 1.8 fmol of ESC. This capacity can be increased to seven positions by synthesis with a mixture of 20 trinucleotide precursors, which is offered by Glen Research, or by decreasing the mutation frequency at less promising positions with split-and-mix oligonucleotide synthesis. If possible, we recommend limiting variation to six positions. Obviously, such targeted mutagenesis requires some preexisting knowledge of at least the regions of the REase involved in substrate binding. The split-and-mix protocol has clear advantages over EP-PCR for generating diversity. In a single EP-PCR of NlaIV ESCs, we obtained both unchanged sequences and sequences carrying eight substitutions (Table 4). A library from EP-PCR contains a substantial fraction of clones that should be avoided (wild type sequences, multiple substitutions, frameshift and nonsense mutations, and mutations at positions unlikely to affect sequence specificity).
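The library arithmetic in this paragraph can be checked in a few lines. The 650 g/mol per base pair is a common approximation for double-stranded DNA, and using the 1,050 bp full-length ESC from Figure 6 to convert moles to mass is our assumption for illustration.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
BP_MASS = 650.0           # approximate g/mol per base pair of dsDNA (assumption)
ESC_LENGTH = 1050         # bp, full-length ESC size from Figure 6

nns_variants = (4 * 4 * 2) ** 6   # NNS at six codons
trinucleotide_variants = 20 ** 7  # 20 codon trinucleotides at seven codons

fmol_per_copy = nns_variants / AVOGADRO * 1e15
ng_per_copy = fmol_per_copy * 1e-15 * ESC_LENGTH * BP_MASS * 1e9
print(f"{nns_variants:.3e} NNS variants; {trinucleotide_variants:.3e} trinucleotide variants")
print(f"one copy of each NNS variant: {fmol_per_copy:.2f} fmol, about {ng_per_copy:.1f} ng of ESC")
```

The calculation gives about 1.78 fmol (roughly 1.2 ng of a 1,050 bp ESC) per copy of the full NNS set, and shows that seven trinucleotide-randomized positions (1.28 x 10^9 variants) remain of the same order as six NNS positions.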
Our protocol also differs from many other directed evolution protocols in having two sequential selection steps. Positive selection ensures that the desired activity is retained; otherwise, the biotin tag is not removed, and the ESC is removed by the pulldown. It is technically possible that the fortuitous emergence of a novel, non-overlapping specificity (e.g., GCATGC) could also sever the biotin tag if a suitable cleavage site is present near the desired cleavage site but not elsewhere in the ESC. However, this is highly unlikely. Negative selection removes open reading frames that encode enzymes that still have the undesired activity. This step is not strictly mandatory, because even without it the protocol enriches the output library for variants that cleave the selected sequence but nowhere else in the ESC, since cleavage elsewhere renders the ESC unsuitable for PCR amplification. However, selection is then expected to be less effective, because enzymes with the original sequence specificity will not be removed from the output and will outcompete promising variants that have altered specificity but also decreased enzymatic activity. Note that at the population level, both desired and undesired target sequences can be, but need not be, degenerate. In the NlaIV example, the anti-target was degenerate and the target non-degenerate. Even when there is degeneracy at the population level, a single droplet contains only one non-degenerate version of the target and of the anti-target. In our protocol, target and anti-target sequences are reintroduced at every repetition of the selection steps. Therefore, to survive multiple selection rounds, an open reading frame must encode an enzyme that cleaves all possible targets and none of the anti-targets. Note that the need to reintroduce the antiselection target at each iteration of the protocol necessitates two sequential PCRs. The first PCR uses a primer that anneals outside the anti-target, so that cleavage of the anti-target prevents amplification. The second PCR requires a primer that extends beyond the anti-target and reintroduces it, to make sure that during multiple rounds of selection each open reading frame is tested against all variants of the anti-target.
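Why the anti-target must be reintroduced in every round can be illustrated with a simple probability model. This sketch assumes that each round's anti-target is drawn from an equimolar mix of the degenerate variants (four for GGSSCC) and that cleavage is all-or-none; the "no reintroduction" case is a hypothetical comparison in which the anti-target drawn in the first round is simply carried through.

```python
from fractions import Fraction


def escape_probability(cut: int, total: int, rounds: int, reintroduce: bool = True) -> Fraction:
    """Chance that a variant cutting `cut` of `total` anti-target variants survives all rounds.

    With reintroduction, every round presents a freshly drawn anti-target.
    Without it, only the first draw matters, because the same anti-target is kept.
    """
    p_round = Fraction(total - cut, total)
    return p_round ** rounds if reintroduce else p_round


for rounds in (1, 3, 5):
    print(rounds, escape_probability(1, 4, rounds), escape_probability(1, 4, rounds, reintroduce=False))
```

A variant that still cuts one of the four GGSSCC sites escapes three rounds with probability 27/64 when the anti-target is reintroduced, but keeps a 3/4 chance indefinitely if it is not.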
For enzymes that generate sticky ends, a related alternative protocol based on a previously described method for the isolation of REase ORFs10 can be used. In the alternative protocol, the depletion of inactive variants by biotin capture used in our experiments is replaced by ligation of a compatible adapter whose sequence serves as a primer binding site in a selective PCR (Figure 9). Only ESCs that produce enzymes with the selected specificity generate ligation-competent ends and are therefore selected. The sticky end produced by cleavage of the counter selected sequence must be designed so that it cannot be ligated to the adapters. Iteration of the selection process can easily be achieved by alternating between two different adapters and, consequently, two different reverse primers in the selective PCR.
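The adapter design rule (the adapter must pair with ends cut in the selected sequence but not with ends cut in the counter selected sequence) reduces to a complementarity check between overhangs. This is a minimal sketch: the two-nucleotide overhangs are hypothetical and are not the MwoI ends of Figure 9, and real ligases can join mismatched ends at low efficiency, which this check ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_ligate(end_overhang: str, adapter_overhang: str) -> bool:
    """True if two single-stranded overhangs can anneal for ligation.

    Both overhangs are written 5'->3' and must be of the same polarity
    (both 5' or both 3' protrusions); only perfect complementarity is considered.
    """
    return (len(end_overhang) == len(adapter_overhang)
            and end_overhang == revcomp(adapter_overhang))


# Hypothetical overhangs: cut in the selected site vs. cut in the counter selected site
selected_end, counter_end, adapter = "AC", "CC", "GT"
print(ends_ligate(selected_end, adapter), ends_ligate(counter_end, adapter))
```

When designing real adapters, the same check should be run against every overhang that the degenerate counter selected sequence can produce, not just one.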
Even with new protocols, the task of engineering novel specificities in vitro is still very challenging. For typical type II REases, sequence specificity and endonucleolytic activity depend on the same protein regions. It is therefore difficult to alter one without affecting the other. Success is made more likely by a strategy that takes into account the footprint of the enzyme, respects the symmetry of protein-DNA interactions, and builds on preexisting enzymatic preferences, which should be determined upfront in biochemical experiments, as was done for the NlaIV example8.
The authors have nothing to disclose.
This work was supported by grants from the Ministry of Science and Higher Education (0295/B/PO1/2008/34 to MB and N301 100 31/3043 to KS), from the Polish National Science Centre (NCN) (UMO-2011/02/A/NZ1/00052, UMO-2014/13/B/NZ1/03991 and UMO-2014/14/M/NZ5/00558 to MB), and by a short-term EMBO fellowship to KS (ATSF 277.00-05).
Name | Company | Catalog Number | Comments |
1000 Å CPG Support (dA, dT, dC, dG) | Biosset | 45-1000-050 | Other vendors can be used as well |
ASM-800 DNA/RNA | Biosset | 800-001-000 | |
GeneJET Gel Extraction Kit | Thermo Scientific | K0691 | Any other kit can be used |
Glen-Pak DNA purification cartridge | Glen Research | 60-5200 | |
HIS-Select Nickel Affinity Gel | Sigma | P6611 | |
pET 28a vector | | | Any other vector with a T7 promoter upstream of the polycloning site can be used instead |
Phusion High-Fidelity DNA Polymerase | Thermo Scientific | F530S | Any other high-fidelity, highly processive thermophilic polymerase can be used instead |
Porous steel foil | Biosset | 40-063 | |
Rapid Translation System RTS 100, E. coli HY Kit | Roche | 3 186 148 | |
Restriction endonucleases | Thermo Scientific | | Other vendors and enzymes can be used |
Streptavidin Magnetic Beads | New England Biolabs | S1420S | Other vendors can be used as well; we have successfully tested beads from Sigma |
Synthesis chemicals including phosphoramidites | Carl Roth | | Other vendors can be used as well |
Synthesis columns (different sizes) | Biosset | | |
T4 DNA ligase | Thermo Scientific | EL0011 | Any other ligase can be used |