The genetic reporter assay is a well-established and powerful tool for dissecting the relationship between DNA sequences and their gene regulatory activities. Coupling candidate regulatory elements to reporter genes that carry identifying sequence tags enables massive parallelization of these assays.
The genetic reporter assay is a well-established and powerful tool for dissecting the relationship between DNA sequences and their gene regulatory activities. The potential throughput of this assay has, however, been limited by the need to individually clone and assay the activity of each sequence of interest using protein fluorescence or enzymatic activity as a proxy for regulatory activity. Advances in high-throughput DNA synthesis and sequencing technologies have recently made it possible to overcome these limitations by multiplexing the construction and interrogation of large libraries of reporter constructs. This protocol describes implementation of a Massively Parallel Reporter Assay (MPRA) that allows direct comparison of hundreds of thousands of putative regulatory sequences in a single cell culture dish.
Massively Parallel Reporter Assays (MPRA) allow multiplexed measurement of the transcriptional regulatory activities of thousands to hundreds of thousands of DNA sequences [1-7]. In their most common implementation, multiplexing is achieved by coupling each sequence of interest to a synthetic reporter gene that contains an identifying sequence tag downstream of an open reading frame (ORF; Figure 1). Following transfection, RNA isolation and deep sequencing of the 3’ ends of the reporter gene transcripts, the relative activities of the coupled sequences can be inferred from the relative abundance of their identifying tags.
Figure 1. Overview of MPRA. A library of MPRA reporter constructs is constructed by coupling putative regulatory sequences to synthetic reporter genes that consist of an “inert” ORF (such as GFP or luciferase) followed by an identifying sequence tag. The library is transfected en masse into a population of cultured cells and transcribed reporter mRNA is subsequently recovered. Deep sequencing is used to count the number of occurrences of each tag among the reporter mRNAs and the transfected plasmids. The ratio of mRNA counts over plasmid counts can be used to infer the activity of the corresponding regulatory sequence. Adapted with permission from Melnikov et al. [2].
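The ratio described in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis pipeline: the function name `tag_activities`, the scaling of each library to its total read count, and the pseudocount are all assumptions introduced here.

```python
import math

def tag_activities(mrna_counts, plasmid_counts, pseudocount=1.0):
    """Return log2(mRNA fraction / plasmid fraction) for each tag.

    Both inputs map tag sequence -> read count. Scaling by library
    totals and adding a pseudocount are assumptions of this sketch.
    """
    mrna_total = sum(mrna_counts.values())
    plasmid_total = sum(plasmid_counts.values())
    activities = {}
    for tag, dna in plasmid_counts.items():
        rna = mrna_counts.get(tag, 0)  # tags absent from the mRNA library count as 0
        rna_frac = (rna + pseudocount) / mrna_total
        dna_frac = (dna + pseudocount) / plasmid_total
        activities[tag] = math.log2(rna_frac / dna_frac)
    return activities

# Toy counts for two hypothetical tags.
example = tag_activities({"ACGTACGTAC": 300, "TTGCAAGGCT": 100},
                         {"ACGTACGTAC": 100, "TTGCAAGGCT": 100})
```

A tag whose share of the mRNA reads exceeds its share of the plasmid reads gets a positive value. With `pseudocount=0`, tags that have zero plasmid or mRNA reads must be filtered out first to avoid a division by zero or a logarithm of zero.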
MPRA can be adapted to a wide variety of experimental designs, including 1) comprehensive mutagenesis of individual gene regulatory elements, 2) scanning for novel regulatory elements across a locus of interest, 3) testing the effect of natural genetic variation in a set of putative promoters, enhancers or silencers, and 4) semi-rational engineering of synthetic regulatory elements. Libraries of sequence variants can be generated using a variety of methods, including oligonucleotide library synthesis (OLS) on programmable microarrays [2,3,6,7], assembly of degenerate oligonucleotides [1,4], combinatorial ligation [8] and fragmentation of genomic DNA [5].
This protocol describes construction of a library of promoter variants using OLS and the pMPRA1 and pMPRAdonor1 vectors (Addgene IDs 49349 and 49352, respectively; http://www.addgene.org), transient transfection of this library into cultured mammalian cells and subsequent quantitation of the promoter activities by deep sequencing of their associated tags (Tag-Seq). Earlier versions of this protocol were used in the research reported in Melnikov et al. Nature Biotechnology 30, 271-277 (2012) and in Kheradpour et al. Genome Research 23, 800-811 (2013).
1. Sequence Design and Synthesis
Primer | Sequence |
MPRA_SfiI_F | GCTAAGGGCCTAACTGGCCGCTTCACTG |
MPRA_SfiI_R | GTTTAAGGCCTCCGTGGCCGACGCTCTTC |
TagSeq_P1 | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT |
TagSeq_P2 | CAAGCAGAAGACGGCATACGAGAT[index]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTCGAGGTGCCTAAAGG |
Table 1. Primer sequences. [index] denotes a 6 to 8 nt index sequence used for multiplexed sequencing. Obtain at least 8 TagSeq_P2 primers with different indices. All of the primers should be purified by HPLC or PAGE.
Reagent | 1x Volume (µl) |
Herculase II Fusion DNA Polymerase | 0.5 |
5x Herculase II Reaction Buffer | 10 |
dNTP (10 mM each) | 1.25 |
BSA (20 mg/ml) | 1.25 |
Primer MPRA_SfiI_F (25 µM) | 0.25 |
Primer MPRA_SfiI_R (25 µM) | 0.25 |
OLS template (1-10 attomol) | varies |
Nuclease-free water | to 50 |
Table 2. Emulsion PCR reaction mix (water phase).
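When several reactions are set up in parallel, the per-reaction volumes in Table 2 can be scaled into a master mix. The following is a minimal sketch; the function name `master_mix`, the 10% pipetting overage, and the choice to add template separately to each reaction are assumptions made for illustration, not part of the protocol.

```python
# Per-reaction volumes (µl) copied from Table 2; template and water are handled below.
EMULSION_PCR_MIX_UL = {
    "Herculase II Fusion DNA Polymerase": 0.5,
    "5x Herculase II Reaction Buffer": 10.0,
    "dNTP (10 mM each)": 1.25,
    "BSA (20 mg/ml)": 1.25,
    "Primer MPRA_SfiI_F (25 µM)": 0.25,
    "Primer MPRA_SfiI_R (25 µM)": 0.25,
}
FINAL_VOLUME_UL = 50.0

def master_mix(n_reactions, template_ul, overage=0.1):
    """Scale Table 2 to n_reactions, leaving room for template_ul per reaction.

    overage is the assumed fractional excess prepared to cover pipetting loss.
    The template itself is excluded and added to each reaction individually.
    """
    water_ul = FINAL_VOLUME_UL - sum(EMULSION_PCR_MIX_UL.values()) - template_ul
    if water_ul < 0:
        raise ValueError("template volume exceeds the space left in a 50 µl reaction")
    scale = n_reactions * (1.0 + overage)
    mix = {name: round(vol * scale, 2) for name, vol in EMULSION_PCR_MIX_UL.items()}
    mix["Nuclease-free water"] = round(water_ul * scale, 2)
    return mix

# Four reactions, each receiving 1 µl of OLS template.
for reagent, volume in master_mix(4, template_ul=1.0).items():
    print(f"{reagent}: {volume} µl")
```

Each reaction then receives 50 µl minus the template volume of master mix, followed by the template.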
Figure 2. Preparation of oligonucleotide synthesis libraries. A) Three different raw oligonucleotide synthesis libraries (OLS) run on denaturing 10% TBE-Urea polyacrylamide gels. Bands corresponding to full length oligonucleotides (*) can be visualized and excised from libraries 1 and 2. Library 3 contains contaminants that interfere with PAGE purification. If this is the case, proceed directly to PCR amplification. B) Products of open and emulsion PCR amplification of the same oligonucleotide library run on an agarose gel. PCR amplification of complex oligonucleotide libraries frequently creates chimeric products and other artifacts that may appear as higher and lower bands. Emulsion PCR can minimize these artifacts.
2. Library Construction
3. Transfection, Perturbation, and RNA Isolation
4. Tag-Seq
Reagent | 1x Volume (µl) |
mRNA sample (400-700 ng total) | 8 |
Oligo-dT (50 µM) | 1 |
dNTP (10 mM each) | 1 |
Table 3. RNA/Reverse transcription primer mix for cDNA synthesis.
Reagent | 1x Volume (µl) |
10x SuperScript III RT Buffer | 2 |
MgCl2 (25 mM) | 4 |
DTT (0.1 M) | 2 |
RNaseOut (40 U/µl) | 1 |
SuperScript III (200 U/µl) | 1 |
Table 4. cDNA synthesis reaction mix.
Reagent | 1x Volume (µl) |
2x PfuUltra II HotStart PCR Master Mix | 25 |
Primer TagSeq_P1 (25 µM) | 0.5 |
Primer TagSeq_P2 (25 µM) | 0.5 |
Template (mRNA, cDNA mix or plasmid DNA) | varies |
Nuclease-free water | to 50 |
Table 5. Tag-Seq PCR reaction mix.
MPRA facilitates high-resolution, quantitative dissection of the sequence-activity relationships of transcriptional regulatory elements. A successful MPRA experiment will typically yield highly reproducible measurements for the majority of sequences in the transfected library (Figure 3A). Poor reproducibility (Figure 3B) indicates that the concentration of reporter mRNAs in the recovered RNA samples was too low, due to either 1) low absolute activity among the assayed sequences, or 2) low transfection efficiency.
Figure 4 shows a representative “information footprint” [1,2] generated by assaying ~37,000 random variants of a 145 bp sequence upstream of the human IFNB gene in HEK293 cells with or without exposure to Sendai virus. The promoter TATA-box and known proximal enhancer [10] can be clearly identified as information-rich regions in a virus-dependent manner.
Figure 3. Tag-Seq reproducibility. Scatter plots showing examples of Tag-Seq data from two independent replicate transfections with high (A) and low (B) reproducibility. The latter plot shows many outlier tags with high mRNA counts in only one of the two replicates. Such artifacts typically indicate that the concentrations of reporter mRNAs were too low for quantitative PCR amplification, either due to low absolute activities among the reporter constructs, or low transfection efficiencies.
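A simple numerical check of the reproducibility illustrated in Figure 3 is to correlate per-tag measurements between two replicates and list tags that disagree strongly. The sketch below assumes each replicate is a dictionary mapping tag to a log2 mRNA/plasmid ratio; the function names and the `max_log2_diff` threshold are hypothetical choices, not values from the protocol.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_replicates(rep1, rep2, max_log2_diff=2.0):
    """Return (correlation, outlier_tags) over the tags present in both replicates.

    max_log2_diff is an arbitrary illustrative cutoff for calling a tag an outlier.
    """
    shared = sorted(set(rep1) & set(rep2))
    x = [rep1[t] for t in shared]
    y = [rep2[t] for t in shared]
    outliers = [t for t in shared if abs(rep1[t] - rep2[t]) > max_log2_diff]
    return pearson(x, y), outliers

# Toy replicates: tag "t4" is high in only one replicate.
r, outliers = compare_replicates(
    {"t1": 0.0, "t2": 1.0, "t3": 2.0, "t4": 5.0},
    {"t1": 0.1, "t2": 1.1, "t3": 2.1, "t4": 0.5},
)
```

A low correlation combined with many one-sided outliers would match the pattern described for panel B.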
Figure 4. Information footprinting of the human IFNB transcription start site and proximal enhancer. Approximately 37,000 random variants of a 145 nucleotide (nt) region upstream of the human IFNB gene were assayed using MPRA in HEK293 cells with (A) and without (B) exposure to Sendai virus. The blue bars show the mutual information between the reporter output and the nucleotide at each position. The proximal enhancer and TATA-box stand out as regions of high information content upon viral infection.
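The per-position mutual information plotted in Figure 4 can be illustrated with a plug-in estimate. This sketch assumes that reporter output has already been discretized into labels (for example "high" and "low"), which is a simplification introduced here; it applies no finite-sample bias correction and is not necessarily the estimator used to produce the figure.

```python
import math
from collections import Counter

def mutual_information(seqs, labels):
    """Mutual information (bits) between the nucleotide at each position
    and a discrete activity label, using observed frequencies directly.

    seqs: equal-length variant sequences; labels: one label per sequence.
    """
    n = len(seqs)
    label_freq = Counter(labels)
    footprint = []
    for pos in range(len(seqs[0])):
        base_freq = Counter(s[pos] for s in seqs)
        joint = Counter((s[pos], lab) for s, lab in zip(seqs, labels))
        mi = 0.0
        for (base, lab), count in joint.items():
            p_joint = count / n
            p_indep = (base_freq[base] / n) * (label_freq[lab] / n)
            mi += p_joint * math.log2(p_joint / p_indep)
        footprint.append(mi)
    return footprint

# Toy data: position 0 fully determines the label, position 1 is uninformative.
footprint = mutual_information(["AA", "AC", "CA", "CC"],
                               ["high", "high", "low", "low"])
```

Positions where the nucleotide identity predicts the output label receive high values, which is what makes functional elements stand out in a footprint.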
MPRA is a flexible and powerful tool for dissection of sequence-activity relationships in gene regulatory elements. The success of MPRA experiments depends on at least three factors: 1) careful design of the sequence library, 2) minimization of artifacts during amplification and cloning, and 3) high transfection efficiency.
The possible lengths of the variable regions in the reporter constructs are largely determined by the synthesis or cloning technology used. Standard OLS is generally limited to about 200 nt, but this protocol is compatible with inserts up to at least 1,000 nt. Note that variable regions that are highly repetitive or contain strong secondary structures may end up underrepresented due to PCR and cloning biases. The length of the tags that identify each of the variable regions should be 10-20 nt and the collection of tags should ideally be designed such that there are at least two nucleotide differences between any pair. Tags that contain the seed sequences of known microRNAs or other factors that might influence mRNA stability should also be avoided when possible.
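The tag constraints above (fixed length, at least two differences between any pair, and exclusion of unwanted motifs) can be met with a simple greedy search. The sketch below is one possible approach under stated assumptions: the function names are hypothetical, the forbidden motif in the usage example is an arbitrary placeholder rather than a real microRNA seed, and the pairwise comparison is quadratic, so very large tag sets would need a more efficient method.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def design_tags(n_tags, length=10, min_distance=2, forbidden=(), seed=0,
                max_tries=100000):
    """Greedily collect random tags that are pairwise >= min_distance apart
    and contain none of the forbidden motifs."""
    rng = random.Random(seed)
    tags = []
    tries = 0
    while len(tags) < n_tags:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not find enough tags; relax the constraints")
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if any(motif in candidate for motif in forbidden):
            continue
        if all(hamming(candidate, t) >= min_distance for t in tags):
            tags.append(candidate)
    return tags

# "ACGTAC" is a placeholder motif, not an actual microRNA seed sequence.
tags = design_tags(50, length=10, forbidden=("ACGTAC",))
```

Requiring at least two differences means that a single sequencing error in a tag cannot turn it into another valid tag.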
A key parameter in the design of MPRA experiments is the total number of distinct reporter constructs to be included in the library (the design complexity, denoted CD). In practice, CD is limited by the number of cultured cells that can be transfected. As a rule of thumb, the total number of transfected cells should be at least 50-100 times greater than CD. For example, if 20 million cells are exposed to the library with a transfection efficiency of 50%, about 10 million cells are transfected, so CD should be no more than ~200,000. Note that CD is equal to the number of distinct regulatory sequence variants multiplied by the number of distinct tags per sequence. The more distinct tags are linked to each regulatory sequence, the more accurately the activity of that sequence can be estimated (because measurements from distinct tags can be averaged), but the fewer distinct variants can be assayed in one experiment. The optimal choice depends on the experimental design. In a simple “promoter bashing” experiment, where a mathematical model will be fitted to the aggregated measurements, a single tag per variant is usually sufficient. In a screen for single-nucleotide polymorphisms that cause changes in regulatory activities, it may be necessary to use 20 or more tags per allele to obtain statistically robust results, because comparing each pair of alleles requires a separate hypothesis test.
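The rule of thumb for design complexity reduces to a short calculation. The sketch below reproduces the worked example from the text; the function names are hypothetical, and `cells_per_construct` stands for the 50-100 fold excess of transfected cells over constructs.

```python
def max_design_complexity(cells, transfection_efficiency, cells_per_construct=50):
    """Largest CD for which transfected cells >= cells_per_construct * CD."""
    transfected = cells * transfection_efficiency
    return int(transfected // cells_per_construct)

def max_variants(design_complexity, tags_per_variant):
    """Number of distinct regulatory variants that fit within a given CD."""
    return design_complexity // tags_per_variant

cd_upper = max_design_complexity(20_000_000, 0.5, cells_per_construct=50)
cd_conservative = max_design_complexity(20_000_000, 0.5, cells_per_construct=100)
print(cd_upper, cd_conservative)   # 200000 100000
print(max_variants(cd_upper, 20))  # 10000 variants at 20 tags each
```

The trade-off described above is visible directly: at a fixed CD, moving from 1 to 20 tags per variant cuts the number of variants that can be assayed twentyfold.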
If the sequences to be assayed are not expected to contain transcription start sites, a constant promoter can also be added in the same fragment. For example, pMPRAdonor2 (Addgene ID 49353) includes a minimal TATA-box promoter that is useful when the upstream variable region is expected to have significant enhancer activity, while pMPRAdonor3 (Addgene ID 49354) includes a modified, strong SV40 viral promoter that is useful when the variable region is expected to contain silencer activity or other negative regulatory elements.
Raw OLS products often contain a significant fraction of truncated oligonucleotides. These may interfere with accurate PCR amplification of the designed sequences, particularly when there is significant homology between them. Using PAGE purification to remove truncated synthesis products and emulsion PCR to minimize amplification artifacts are effective techniques for ensuring high library quality. If either step is impractical, it is imperative to minimize the number of PCR cycles used at each amplification step. Selection and expansion of the cloned library in liquid culture is generally sufficient to maintain the design complexity, but if recombination-prone vectors are to be used or significant representation bias is observed, the recovered cells can instead be plated directly onto large LB agar plates, expanded as individual colonies and then scraped off for DNA isolation. It is also important to consider the potential impact of synthesis errors, which are typically found at a rate of 1:100-500 in OLS. Full-length sequencing of the reporter constructs prior to transfection is recommended to identify and correct for such errors.
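To get a feel for what this error rate implies, the expected fraction of error-free oligonucleotides can be estimated. The sketch assumes, as an interpretation not stated explicitly in the text, that the rate is per synthesized nucleotide and that errors occur independently at each position; the function name is hypothetical.

```python
def error_free_fraction(length_nt, error_rate_per_nt):
    """Probability that an oligonucleotide of length_nt has no synthesis error,
    assuming independent errors at a constant per-nucleotide rate."""
    return (1.0 - error_rate_per_nt) ** length_nt

# A 200 nt oligonucleotide at the two ends of the quoted error range.
for rate in (1 / 100, 1 / 500):
    fraction = error_free_fraction(200, rate)
    print(f"error rate 1:{round(1 / rate)} -> {fraction:.2f} error-free")
```

Under these assumptions, roughly 13% of 200 nt oligonucleotides would be error-free at a rate of 1:100 and roughly 67% at 1:500, which illustrates why full-length sequencing of the constructs is worthwhile.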
It is not necessary to introduce reporter constructs into every cell in the transfected culture, but transfection efficiencies below ~50% may lead to poor signal-to-noise ratios. It is advisable to optimize transfection conditions prior to performing MPRA experiments in a new cell type. When working with hard-to-transfect cell types, MPRA signals can be boosted by pre-selecting transfected cells. The pMPRA vector series includes variants that constitutively express a truncated cell surface marker that can be used to physically enrich for transfected cells prior to RNA isolation (for example, Addgene IDs 49350 and 49351).
The authors have nothing to disclose.
This work was supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number R01HG006785.
Name | Company | Catalog Number | Comments
Oligonucleotide library synthesis | Agilent, CustomArray or other OLS vendors | custom | If using OLS construction method |
pMPRA1 | Addgene | 49349 | MPRA plasmid backbone |
pMPRAdonor1 | Addgene | 49352 | luc2 ORF donor plasmid |
TE 0.1 Buffer (10 mM Tris-HCl, 0.1 mM EDTA, pH 8.0) | Generic | n/a | OLS buffer |
Novex TBE-Urea Gels, 10% | Life Technologies | EC6875BOX | PAGE purification of OLS products |
Novex TBE-Urea Sample Buffer | Life Technologies | LC6876 | PAGE purification of OLS products |
SYBR Gold Nucleic Acid Gel Stain | Life Technologies | S-11494 | PAGE purification of OLS products |
Micellula DNA Emulsion & Purification Kit | EURx/CHIMERx | 3600-01 | Library amplification by emulsion PCR |
Herculase II Fusion DNA Polymerase | Agilent | 600675 | Polymerase for emulsion PCR |
SfiI | New England Biolabs | R0123S | Library cloning with pMPRA vectors |
KpnI-HF | New England Biolabs | R3142S | Library cloning with pMPRA vectors |
XbaI | New England Biolabs | R0145S | Library cloning with pMPRA vectors |
T4 DNA Ligase (2,000,000 units/ml) | New England Biolabs | M0202T | Library cloning with pMPRA vectors |
One Shot TOP10 Electrocomp E. coli | Life Technologies | C4040-50 | Library cloning with pMPRA vectors |
LB agar and liquid media with carbenicillin | Generic | n/a | Growth media for cloning |
E-Gel EX Gels, 1% | Life Technologies | G4010-01 | Library verification and purification |
E-Gel EX Gels, 2% | Life Technologies | G4010-02 | Library verification and purification |
MinElute Gel Extraction Kit | Qiagen | 28604 | Library and backbone purification |
EndoFree Plasmid Maxi Kit | Qiagen | 12362 | Library DNA isolation |
Cell culture media | n/a | n/a | Experiment-specific |
Transfection reagents | n/a | n/a | Experiment-specific |
MicroPoly(A)Purist Kit | Life Technologies | AM1919 | mRNA isolation |
TURBO DNA-free Kit | Life Technologies | AM1907 | Plasmid DNA removal |
SuperScript III First-Strand Synthesis System | Life Technologies | 18080-051 | cDNA synthesis |
PfuUltra II Hotstart PCR Master Mix | Agilent | 600850 | Polymerase for Tag-Seq PCR |
Primers (see text) | IDT | custom | PAGE purify Tag-Seq primers |