Here we describe a method for preparation of both single read and paired end Illumina mRNA-Seq sequencing libraries for gene expression analysis based on T7 linear RNA amplification. This protocol requires only 10 nanograms of starting total RNA and generates highly consistent libraries representing whole transcripts.
Whole transcriptome sequencing by mRNA-Seq is now used extensively to perform global gene expression, mutation, allele-specific expression and other genome-wide analyses. mRNA-Seq even opens the gate for gene expression analysis of non-sequenced genomes. mRNA-Seq offers high sensitivity and a large dynamic range, and allows measurement of transcript copy numbers in a sample. Illumina’s Genome Analyzer sequences a large number (> 10⁷) of relatively short reads (< 150 bp). The "paired end" approach, wherein a single long fragment is sequenced at both its ends, allows for tracking alternate splice junctions, insertions and deletions, and is useful for de novo transcriptome assembly.
One of the major challenges faced by researchers is a limited amount of starting material. For example, in experiments where cells are harvested by laser micro-dissection, available starting total RNA may measure in nanograms. Preparation of mRNA-Seq libraries from such samples has been described1, 2 but involves significant PCR amplification that may introduce bias. Other RNA-Seq library construction procedures with minimal PCR amplification have been published3, 4 but require microgram amounts of starting total RNA.
Here we describe a protocol for mRNA-Seq library preparation for the Illumina Genome Analyzer II platform that avoids significant PCR amplification and requires only 10 nanograms of total RNA. While this protocol has been described previously and validated for single-end sequencing5, where it was shown to produce directional libraries without introducing significant amplification bias, here we validate it further for use as a paired end protocol. We selectively amplify polyadenylated messenger RNAs from starting total RNA using the T7 based Eberwine linear amplification method, which we term "T7LA" (T7 linear amplification). The amplified poly-A mRNAs are fragmented, reverse transcribed and adapter ligated to produce the final sequencing library. For both single read and paired end runs, sequences are mapped to the human transcriptome6 and normalized so that data from multiple runs can be compared. We report gene expression in units of transcripts per million (TPM), which is a better measure than RPKM when comparing samples7.
1. Isolate total RNA
2. Prepare double stranded cDNA
Incubate for 2 hr at 16 °C, then add 1 μl T4 DNA Polymerase and incubate for an additional 10 minutes at 16 °C.
3. Amplify poly-A mRNA by in vitro transcription
4. Fragmentation of amplified poly-A mRNA
5. RNA clean up
6. cDNA synthesis
First strand synthesis for single read library:
Incubate at 70 °C for 5 minutes in thermocycler, and quick chill on ice. The NotI Nonamer Primer is 5′-TGAATTCGCGGCCGCTCAAGCAGAAGACGGCATACGAGCTCTTCCGATCT NNNNNNNNN-3′. The 5′ proximal sequence contains the NotI restriction site (GCGGCCGC), while the sequence between that site and the random region is the reverse complement of Illumina’s adaptor B sequence from the Chip-Seq kit.
Put on thermocycler for 2 minutes at 42 °C, add 1 μl of SuperScript III reverse transcriptase, and incubate at 42 °C for 1 hr.
*Instead of dCTP, 5-methyl dCTP was used in the dNTP mixture.
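The layout of the NotI Nonamer Primer quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the part names (`leader`, `noti_site`, `adaptor_rc`, `random`) and the helper functions are our own labels, not terms from the protocol, and the only sequence facts used are the primer as printed and the NotI recognition sequence GCGGCCGC.

```python
# Sketch: split the NotI Nonamer Primer into its parts.
# Part names and helpers are illustrative, not from the protocol.
PRIMER = "TGAATTCGCGGCCGCTCAAGCAGAAGACGGCATACGAGCTCTTCCGATCTNNNNNNNNN"
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def dissect_primer(primer: str, site: str = NOTI_SITE) -> dict:
    """Split a primer into leader, restriction site, adaptor part and random tail."""
    site_start = primer.find(site)
    if site_start < 0:
        raise ValueError("restriction site not found in primer")
    site_end = site_start + len(site)
    random_start = primer.find("N")
    if random_start < 0:
        random_start = len(primer)
    return {
        "leader": primer[:site_start],              # bases 5' of the NotI site
        "noti_site": primer[site_start:site_end],   # removed later by NotI digestion
        "adaptor_rc": primer[site_end:random_start],  # reverse complement of adaptor B
        "random": primer[random_start:],            # random nonamer that primes the RNA
    }


parts = dissect_primer(PRIMER)
# Adaptor sequence as it would read on the opposite strand.
adaptor_forward = reverse_complement(parts["adaptor_rc"])

for name, seq in parts.items():
    print(f"{name:>10}: {seq} ({len(seq)} nt)")
print(f"adaptor (opposite strand): {adaptor_forward}")
```

Splitting the primer this way makes it easy to confirm that the random region is nine bases long and that the NotI site sits upstream of the adaptor sequence, which is why the NotI digestion in step 13 applies only to the single read library.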
First strand synthesis for paired end library:
Incubate at 65 °C for 5 minutes in thermocycler, and quick chill on ice.
Put on thermocycler for 1 min at 45 °C, add 1 μl of SuperScript III reverse transcriptase, and incubate at 45 °C for 1 hour.
Second strand synthesis (for both libraries):
Mix tube by inversion, give a short spin and incubate at 16 °C for 2 hr.
7. Purify cDNA
8. End repair
For single read library:
(Use Illumina Chip-Seq sample prep kit)
Incubate in the thermal cycler for 30 minutes at 20 °C.
For paired end library:
(Use Illumina paired end sample prep kit)
Incubate in the thermal cycler for 30 minutes at 20 °C.
9. cDNA clean up
10. Add ‘A’ bases to the 3′ end of the DNA fragments
For single read library:
Incubate for 30 minutes at 37 °C.
For paired end library:
Incubate for 30 minutes at 37 °C.
11. cDNA clean up
12. Adapter ligation
For single read library:
(Use Illumina Chip-Seq sample prep kit)
Incubate for 15 minutes at room temperature. Adapters should be thawed on ice and diluted 1:20.
For paired end library:
(Use Illumina paired end sample prep kit)
Incubate for 15 minutes at 20 °C. Adapters should be thawed on ice and diluted 1:20.
13. Ligation reaction clean up
Perform a NotI digestion, ONLY for single read library
Incubate for 2 hr to overnight at 37 °C, and purify the reaction using a Zymo column. Elute with 6 μl of buffer EB followed by a second elution with 5 μl of EB.
14. Size selection/Gel purification
Perform the following using a 2% SYBR Safe E-Gel from Invitrogen.
15. Elute DNA from gel slice
16. PCR
For single read library:
(Use Illumina Chip-Seq sample prep kit)
Use the following PCR protocol:
*The first time that the kit is used, dilute PCR primers 1:2 with EB buffer.
For paired end library:
(Use Illumina paired end sample prep kit)
Amplify using the following PCR protocol:
*The first time that the kit is used, dilute PCR primers 1:2 with EB buffer.
17. Library clean up
18. Quantitate library
19. Data analysis
Bowtie6 was used to map reads to the RefSeq gene set (NCBI Build 36.1). The single end reads (30 nucleotides) and the paired end reads (42 nucleotides) were mapped allowing up to 10 matches to the gene set, and allowing up to two mismatches per read. Transcripts Per Million (TPM) values were obtained to measure gene expression using RSEM7 (RNA-Seq by Expectation-Maximization).
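To make the TPM unit concrete, the sketch below computes TPM and RPKM from read counts and transcript lengths for two toy samples. It is a simplified illustration, not the RSEM procedure: RSEM estimates abundances by expectation-maximization and handles reads that map to several transcripts, whereas this example assumes each read has already been assigned to one gene. The gene names, counts and lengths are invented for the example.

```python
# Simplified TPM and RPKM from per-gene read counts (not the RSEM algorithm).
# All gene names, counts and lengths below are hypothetical.

def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any gene")
    return {g: 1e6 * r / total for g, r in rates.items()}


def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads assigned to any gene")
    return {g: 1e9 * counts[g] / (lengths[g] * total_reads) for g in counts}


LENGTHS = {"GENE_A": 1000, "GENE_B": 2000, "GENE_C": 4000}   # nucleotides
SAMPLE_1 = {"GENE_A": 100, "GENE_B": 400, "GENE_C": 800}     # read counts
SAMPLE_2 = {"GENE_A": 100, "GENE_B": 400, "GENE_C": 4000}

for name, counts in (("sample 1", SAMPLE_1), ("sample 2", SAMPLE_2)):
    t = tpm(counts, LENGTHS)
    r = rpkm(counts, LENGTHS)
    print(name, "TPM sum:", round(sum(t.values())), "RPKM sum:", round(sum(r.values())))
```

In this toy case the TPM values sum to one million in both samples, while the RPKM totals differ between samples. That fixed total is the usual reason TPM values are easier to compare across libraries.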
20. Representative Results: We made T7LA libraries for both single read and paired end runs from 1 μg, 100 ng, 10 ng, 1 ng and 100 pg of starting total RNA (Figure 1). To evaluate our protocol, we also made single read and paired end libraries without T7 RNA amplification, starting from 10 μg total RNA. These control libraries, termed “MinAmp”, have minimal amplification. The only amplification they undergo is the 10 cycles of PCR near the end of the protocol to ligate the Illumina sequencing adapters, a step common to all libraries. All RNA used was isolated from H14 human embryonic stem cells8.
We first evaluated the number of genes identified by the various libraries (Table 1 and Supplementary Table 1). For both single read and paired end libraries, the 10 ng T7LA libraries identified almost the same number of genes with a TPM of 10 or more as the 10 μg MinAmp libraries. In the case of the single read libraries, the 10 ng T7LA library identified 100% of the 8500 genes identified by the 10 μg unamplified library. For paired end libraries, the 10 ng T7LA library identified 86% of the genes identified by the 10 μg unamplified library (7961 of 9267 genes). Libraries made from less than 10 ng were not able to identify as many genes. For example, in the single read protocol, the 1 ng library identified only ~50% of the genes identified by the 10 μg MinAmp library, prompting us to set 10 ng as the lowest amount of total RNA for use with the T7LA protocol. Moreover, mapping of a housekeeping gene, GAPDH (Figure 2), shows that all T7LA libraries made with at least 10 ng of starting RNA identified all exons, including the extreme 5′ exon. Comparison of the 10 ng T7LA single read and paired end libraries with the corresponding MinAmp libraries shows a high degree of similarity (Spearman correlation, R = 0.90 and 0.95 respectively, Figures 3a and b). We also compared the single read and paired end libraries made from 10 ng total RNA with each other; they had a very high correlation coefficient (R = 0.92), demonstrating that both types of libraries made using the T7LA protocol produce a very similar gene expression signature (Figure 3c). Hence, the T7LA method is able to produce sequencing libraries that are as reliable and comprehensive as the MinAmp libraries, but from 1000-fold less starting material.
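The two comparisons described above, the fraction of control genes recovered at TPM ≥ 10 and the Spearman correlation between libraries, can be sketched as follows. This is a minimal illustration under our own assumptions: the TPM values and gene names are invented, the correlation is computed over all genes shared by the two libraries (the text does not state which gene set was used), and the rank correlation is written out in plain Python rather than taken from a statistics package.

```python
# Sketch: gene recovery at a TPM threshold and Spearman correlation.
# TPM values and gene names are hypothetical.
from math import sqrt


def detected(tpm_by_gene, threshold=10.0):
    """Genes whose TPM meets the detection threshold."""
    return {g for g, v in tpm_by_gene.items() if v >= threshold}


def fraction_recovered(reference, test, threshold=10.0):
    """Share of genes detected in the reference that are also detected in the test."""
    ref = detected(reference, threshold)
    if not ref:
        raise ValueError("no genes detected in the reference library")
    return len(ref & detected(test, threshold)) / len(ref)


def average_ranks(values):
    """1-based ranks, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


minamp = {"GENE_A": 500.0, "GENE_B": 120.0, "GENE_C": 15.0, "GENE_D": 3.0}
t7la = {"GENE_A": 450.0, "GENE_B": 90.0, "GENE_C": 8.0, "GENE_D": 12.0}

shared = sorted(set(minamp) & set(t7la))
rho = spearman([minamp[g] for g in shared], [t7la[g] for g in shared])
print("fraction recovered:", fraction_recovered(minamp, t7la))
print("Spearman correlation:", rho)
```

Because Spearman correlation depends only on ranks, it is insensitive to the overall scale of the TPM values, which suits comparisons between libraries of very different sequencing depth.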
Figure 1 Schema of paired end and single read library preparation protocol.
Figure 2 A genome browser picture of a housekeeping gene, GAPDH, for all single read and paired end libraries. The scale bar on the left for the single read libraries indicates 350 total reads. The scale bar in the center for the paired end libraries indicates 5000 total reads. The horizontal axis represents the genome sequence of GAPDH.
Figure 3 Correlation of gene expressions between the various libraries (Spearman’s): A. Between single read 10 ng T7LA and 10 μg MinAmp library shows that both these libraries have a very similar gene expression pattern (R = 0.90). B. Between paired end 10 ng T7LA and 10 μg MinAmp library demonstrates their similarity of gene expression profiles (R = 0.95). C. Correlation of gene expressions between 10 ng paired end and 10 ng single read libraries shows a high degree of similarity between these libraries prepared by the T7LA method (R = 0.92).
Figure 4 Identification of human embryonic stem cell specific genes5 from all the single read and paired end libraries.
Library | Type° | Raw clusters | % Clusters passing filter | % Aligning to genome | % Error rate | Genes identified* |
MinAmp 10 μg | Single read | 225602 +/- 4952 | 65.48 +/- 2.58 | 47.61 +/- 0.53 | 0.62 +/- 0.06 | 8500 |
1 μg T7LA | Single read | 144818 +/- 6513 | 82.21 +/- 6.45 | 48.09 +/- 0.27 | 0.42 +/- 0.03 | 8757 |
100 ng T7LA | Single read | 27385 +/- 1818 | 81.33 +/- 11.75 | 44.46 +/- 4.53 | 0.49 +/- 0.10 | 8709 |
10 ng T7LA | Single read | 11184 +/- 985 | 60.70 +/- 3.70 | 14.96 +/- 1.15 | 0.99 +/- 0.30 | 8589 |
1 ng T7LA | Single read | 12695 +/- 1365 | 53.27 +/- 16.76 | 4.08 +/- 0.79 | 2.25 +/- 1.56 | 4720 |
100 pg T7LA | Single read | 10390 +/- 1398 | 72.99 +/- 2.90 | 1.48 +/- 0.20 | 1.51 +/- 0.39 | 1121 |
MinAmp 10 μg | Paired End R1 | 95786 +/- 12937 | 90.77 +/- 2.79 | 58.50 +/- 0.95 | 0.94 +/- 0.38 | 9267 |
MinAmp 10 μg | Paired End R2 | 95786 +/- 12937 | 90.77 +/- 2.79 | 58.13 +/- 1.13 | 0.99 +/- 0.37 | |
1 μg T7LA | Paired End R1 | 297669 +/- 10196 | 91.35 +/- 0.36 | 46.89 +/- 0.14 | 0.47 +/- 0.01 | 7334 |
1 μg T7LA | Paired End R2 | 297669 +/- 10196 | 91.35 +/- 0.36 | 45.52 +/- 0.12 | 0.51 +/- 0.01 | |
100 ng T7LA | Paired End R1 | 205602 +/- 9932 | 90.53 +/- 0.76 | 63.44 +/- 1.00 | 0.48 +/- 0.02 | 8011 |
100 ng T7LA | Paired End R2 | 205602 +/- 9932 | 90.53 +/- 0.76 | 61.80 +/- 8.09 | 0.60 +/- 0.36 | |
10 ng T7LA | Paired End R1 | 214622 +/- 11155 | 89.98 +/- 1.13 | 56.32 +/- 1.94 | 0.80 +/- 0.26 | 7961 |
10 ng T7LA | Paired End R2 | 214622 +/- 11155 | 89.98 +/- 1.13 | 46.41 +/- 18.39 | 2.48 +/- 2.68 | |
1 ng T7LA | Paired End R1 | 144951 +/- 19841 | 90.54 +/- 1.19 | 3.91 +/- 0.16 | 8.71 +/- 0.86 | 8124 |
1 ng T7LA | Paired End R2 | 144951 +/- 19841 | 90.54 +/- 1.19 | 3.27 +/- 1.21 | 9.11 +/- 3.52 | |
100 pg T7LA | Paired End R1 | 187600 +/- 11759 | 89.52 +/- 1.11 | 1.78 +/- 0.05 | 13.42 +/- 0.50 | 6623 |
100 pg T7LA | Paired End R2 | 187600 +/- 11759 | 89.52 +/- 1.11 | 1.99 +/- 0.23 | 15.29 +/- 0.96 | |
° R1 and R2 are forward and reverse sequences of a tag
* ≥10 TPM
Table 1. Cluster numbers, percent alignment, error rate and genes identified for the single read and paired end libraries.
Supplementary Table 1. List of all genes and their TPM values for all samples, single read and paired end.
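As a quick arithmetic check on Table 1, the gene counts of each T7LA library can be expressed as a percentage of the matching MinAmp control. The sketch below uses only the "Genes identified" values from Table 1; the dictionary layout and function name are our own. Note that a ratio of counts is not the same as the overlap of gene lists: it only gives an upper bound on the share of control genes that could have been recovered, and the actual overlap requires the per-gene values in Supplementary Table 1.

```python
# Gene counts (TPM >= 10) taken from Table 1; layout and names are illustrative.
GENES_IDENTIFIED = {
    "single read": {
        "MinAmp 10 ug": 8500,
        "1 ug T7LA": 8757,
        "100 ng T7LA": 8709,
        "10 ng T7LA": 8589,
        "1 ng T7LA": 4720,
        "100 pg T7LA": 1121,
    },
    "paired end": {
        "MinAmp 10 ug": 9267,
        "1 ug T7LA": 7334,
        "100 ng T7LA": 8011,
        "10 ng T7LA": 7961,
        "1 ng T7LA": 8124,
        "100 pg T7LA": 6623,
    },
}


def percent_of_control(counts, control="MinAmp 10 ug"):
    """Gene count of each library as a percentage of the control library's count."""
    reference = counts[control]
    return {lib: 100.0 * n / reference for lib, n in counts.items() if lib != control}


for library_type, counts in GENES_IDENTIFIED.items():
    print(library_type)
    for lib, pct in percent_of_control(counts).items():
        print(f"  {lib}: {pct:.1f}% of MinAmp gene count")
```

For the paired end 10 ng library this reproduces the 86% figure quoted in the text (7961 of 9267 genes).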
Current protocols for making paired end libraries require between 1 μg (ref. 9) and 2.5 μg (ref. 10) of starting total RNA. Here we present our linear T7 amplification based (T7LA) method to prepare both single read and paired end Illumina sequencing libraries and show that this method allows generation of libraries from as little as 10 ng of total RNA, producing data that are comparable to those of minimally amplified (MinAmp) libraries made from 1000-fold more starting material (10 μg total RNA). The 10 ng libraries not only identify similar total numbers of genes, but also produce gene expression signatures that are similar (Figures 3a and b). Moreover, the single read and paired end libraries produced by the T7LA method are very similar to each other (Figure 3c), which allows researchers to compare data generated by libraries made from either protocol. Since these libraries were prepared from human embryonic stem cell RNA, we searched for 30 stem cell specific genes among the libraries and found that almost all of these genes (93-100%) are identified by the libraries made from at least 10 ng of starting total RNA (Figure 4), thus validating our protocol. We believe our protocol would be very useful for researchers, especially in circumstances such as flow sorted cells or laser-micro-dissected tissue where starting material is limiting. In such circumstances, our protocol would allow generation of gene expression data comparable to libraries made from much larger starting quantities, since it produces expression profiles that are comparable across at least 3 orders of magnitude of starting RNA.
The authors have nothing to disclose.
This work was supported by funding from the Morgridge Institute for Research and the University of Wisconsin Foundation. We thank Krista Eastman for editorial assistance.
Company | Kit/Reagent | Catalog # | Special Comments |
Ambion | Fragmentation reagent | AM8740 | |
Ambion | Linear Acrylamide | AM9520 | |
Ambion | MEGAscript T7 Kit | AM1334 | |
Ambion | Non DEPC treated nuclease free water | AM9932 | |
Fermentas | dNTP set | R0181 | |
Fermentas | methylated dNTPs | R0431 | For Single Read sample prep only |
Illumina | TruSeq SR Cluster Generation kit v5 | GD-203-5001 | For Single Read sample prep only |
Illumina | TruSeq Seq Kit v5 36 cycles | FC-104-5001 | |
Illumina | Chip Seq Sample Prep Kit | IP-102-1001 | For Single Read sample prep only. Can be replaced with NEB DNA sample prep Master Mix Set 1 Cat# E6040S |
Illumina | TruSeq PE Cluster Generation kit v5 | PE-203-5001 | For paired end sample prep only |
Illumina | Paired End Sample Prep Kit | PE-102-1001 | For paired end sample prep only |
Invitrogen | 1kb plus ladder | 10787-018 | |
Invitrogen | E. coli Rnase H | 18021-014 | |
Invitrogen | Rnase Out | 10777-019 | |
Invitrogen | 2nd strand buffer 5x | 10812-014 | |
Invitrogen | E. coli DNA polymerase | 18010-017 | |
Invitrogen | E-gel SYBR safe 2% | G521802 | |
Invitrogen | Superscript III (with 5X FS buffer and 0.1M DTT) | 18080-085 | |
Invitrogen | Trizol | 12183555 | |
Invitrogen | RNA Qubit assay kit | Q32852 | |
Invitrogen | dsDNA HS qubit assay kit | Q32851 | |
Invitrogen | E. coli DNA ligase | 18052-019 | |
Invitrogen | T4 DNA Polymerase | 18005-025 | |
Invitrogen | Ultra Pure Dnase Rnase water | 10977-015 | |
Invitrogen | Superscript II double stranded cDNA synthesis kit | 11917-020 | |
Invitrogen | Random Primer (invitrogen) | 48190-011 | For paired end sample prep only |
IDT | Not1Nonamer B primer | N/A | For Single Read sample prep only |
IDT | Oligo dT T7 | N/A | |
NEB | Not1 digestion kit | R0189S | For Single Read sample prep only |
Qiagen | Rneasy Minelute kit | 74204 | |
Qiagen | RNEasy Mini Kit (50) | 74104 | |
Qiagen | Gel purification kit | 28604 | |
Qiagen | Dnase set | 79254 | |
Zymo Research | DNA clean and concentrator (250X) | D4014 |