Here we present a deep sequencing approach that provides an unbiased determination of nascent 3'-termini as well as mutational profiles of single-stranded DNA molecules. The main application is the characterization of nascent retroviral complementary DNAs (cDNAs), the intermediates generated during the process of retroviral reverse transcription.
Monitoring of nucleic acid intermediates during virus replication provides insights into the effects and mechanisms of action of antiviral compounds and host cell proteins on viral DNA synthesis. Here we address the lack of a cell-based, high-coverage, and high-resolution assay that is capable of defining retroviral reverse transcription intermediates within the physiological context of virus infection. The described method captures the 3′-termini of nascent complementary DNA (cDNA) molecules within HIV-1 infected cells at single nucleotide resolution. The protocol involves harvesting of whole cell DNA, targeted enrichment of viral DNA via hybrid capture, adaptor ligation, size fractionation by gel purification, PCR amplification, deep sequencing, and data analysis. A key step is the efficient and unbiased ligation of adaptor molecules to open 3′-DNA termini. Application of the described method determines the abundance of reverse transcripts of each particular length in a given sample. It also provides information about the (internal) sequence variation in reverse transcripts and thereby any potential mutations. In general, the assay is suitable for any questions relating to DNA 3′-extension, provided that the template sequence is known.
In order to dissect and understand viral replication fully, increasingly refined techniques that capture replication intermediates are required. In particular, the precise definition of viral nucleic acid species within the context of infected cells can provide new insights, since many viral replication mechanisms have to date been examined in isolated in vitro reactions. A prime example is the process of reverse transcription in retroviruses, such as human immunodeficiency virus 1 (HIV-1). The various steps of HIV-1 reverse transcription, during which the viral enzyme reverse transcriptase (RT) copies the single-stranded RNA genome into double-stranded DNA, have been studied primarily in primer extension assays with purified proteins and nucleic acids1,2,3,4,5. While fundamental principles were established, such assays do not incorporate all viral and cellular components and may not reflect biologically relevant stoichiometries of involved factors. Therefore, we designed a powerful technique to determine the spectra of reverse transcription intermediates with their precise cDNA 3'-termini (i.e., determining their exact lengths) and nucleotide sequences in the context of infections of living cells6. Collection of data from time course experiments can be utilized to compare the profile of transcripts under various conditions, such as the presence of antiviral molecules or proteins, that may influence the efficiency and processivity of DNA synthesis and accumulation. This allows a more detailed understanding of the natural pathogen life cycle, which is often the basis for targeted drug design and successful therapeutic intervention.
HIV-1 reverse transcription comprises a series of successive events initiated by annealing of a tRNA primer to the genomic RNA template, which is then extended by RT to produce a short single-stranded cDNA transcript called a minus-strand strong-stop (-sss) (see Figure 1). Subsequently, the -sss cDNA is transferred from the 5'-long terminal repeat (LTR) to the 3'-LTR of the genomic RNA, where it anneals and serves as the primer for continued RT mediated elongation of the minus strand DNA (see reviews on reverse transcription1,2,3,4). This first strand transfer is one of the rate-limiting steps of reverse transcription; hence, the -sss cDNA is known to accumulate. The basic workflow and library design to capture the reverse transcription products in infected cells is outlined in Figure 2a. The specific primers and analysis settings that are used in the protocol and listed in Table 1 target all early reverse transcription cDNA intermediates within the length range of 23 to ~650 nt, which includes the 180-182 nt -sss DNA. However, appropriate minor adaptations to the strategy will allow application to not only late reverse transcription products but also other viruses and systems, where the objective is to detect 3'-OH containing DNA ends. Important limitations to consider include the length range of the final PCR product in the library; in particular, templates in which the distance between the adaptor on the open 3'-terminus and the upstream primer site exceed ~1000 nt will likely be less efficiently sequenced, potentially introducing misleading technical biases during library preparation (see discussion for more details and adaptation suggestions).
Previously reported techniques for the systematic determination of 3'-termini of nucleic acid strands have focused on RNA, not DNA, molecules. One example is 3' RACE (rapid amplification of cDNA ends)7, which relies on the polyadenylation of mRNA. In addition, adaptor ligation-based strategies employing RNA ligases were developed, which have included RLM-RACE (RNA ligase-mediated RACE)8 or LACE (ligation-based amplification of cDNA ends)9. It is important to emphasize that ligation-based amplifications are sensitive to any bias introduced by the ligation reaction itself. For example, ligation may be more or less efficient depending on a particular nucleotide in the 3' position, the sequence, total molecule length, or local structure. Such ligase preferences lead to incomplete capture of molecules and misrepresentation in the readout, which we and others have observed9,10. To minimize ligation bias during the adaptor addition steps in the protocol described herein, we tested a number of ligation strategies and found the use of T4 DNA ligase with a hairpin single-stranded DNA adaptor (as described by Kwok et al.11) to be the only procedure with near quantitative ligation that did not result in significant differences in ligation efficiency when assessed with a specifically selected set of control oligonucleotides6. The choice of this ligation strategy is, therefore, a key feature in the success of this protocol.
To date, monitoring of HIV-1 RT progression in infected cells has primarily been accomplished by measuring reverse transcription products of various length with quantitative PCR (qPCR) using primer-probe sets that uniquely measure shorter or longer (early and late, respectively) cDNA products12,13,14. While this qPCR approach is appropriate to determine intrinsic efficiencies of the reverse transcription process in cellular systems, the output is of relatively low resolution, with no sequence information being derived. Our new approach, based on optimized adaptor ligation, PCR-mediated library generation, and deep sequencing addresses the technology gap and offers an opportunity to monitor reverse transcription during HIV-1 infection quantitatively and at single nucleotide resolution.
We have illustrated the utility of this method in a study that distinguished between two proposed models for the capacity of the HIV-1 restriction factor APOBEC3G (apolipoprotein B mRNA editing enzyme catalytic polypeptide-like 3G) to interfere with the production of viral reverse transcripts6.
NOTE: Please refer to the Table of Materials for specific reagents and equipment used in this protocol.
1. Virus Production and Cell Infection
CAUTION: Infectious HIV-1 should only be handled in approved biosafety containment laboratories.
NOTE: Production of HIV-1 particles by transient transfection of human embryonic kidney (HEK) 293T cells, as outlined in step 1.1, is a standard procedure and has been described previously15,16. General cell culture procedures are described previously17.
2. DNA extraction, HIV-1 DNA Quantification, and Enrichment by Hybrid Capture
3. Adaptor Ligation
4. Adaptor Removal and Size Separation
5. PCR Amplification and Library Preparation
6. Evaluating the Library
7. High-Throughput Sequencing Run
8. Data Analysis
The technique described in this article was applied to a wider study to address the mechanisms underlying inhibition of HIV-1 reverse transcription by the antiretroviral human protein APOBEC3G (A3G)6. Figure 3 shows representative results obtained after employing the protocol in samples from CEM-SS T-cells infected with vif-deficient HIV-1 in the absence or presence of A3G. The total number of unique reads obtained from each sample after filtering out any PCR duplicates that have the same 6 nt barcode and the same length (performed by the analysis software provided) are plotted in Figure 3a. Increasing levels of A3G reduce the total read number reflecting the inhibitory effect of A3G on RT mediated cDNA synthesis previously described and measured by qPCR6,13,21,22. In Figure 3b, the fraction of molecules at each possible length within the first 182 nt are shown. For HIV-1 infection in the absence of A3G, the most abundant species is the main 180 nt strong-stop molecule itself, with some accumulation of reads in the shorter range (23 to 40 nt) (top graph, blue histograms). The addition of A3G changes this profile as a sharp increase of shorter, truncated cDNA molecules at a few very specific, reproducible positions is detected (middle and lower graphs). Since A3G is a cytidine deaminase, cytosine-to-uridine (identified as C-to-T) mutations in the cDNA occur when A3G is present in the infecting virions21,23,24. Using the obtained sequencing information, the percentage of C-to-T mutations was plotted on the same graph (red dotted line). It should be noted that the mutational profile is derived from all unique reads combined and coverage of each nucleotide will vary. However, if required sequence information can be related back to each molecule and correlated with a specific 3'-terminus. The data provided were taken from Pollpeter et al.6 and the correlation between mutational and cDNA length profiles was demonstrated to be due to detection and cleavage of deaminated cDNA by the cellular DNA repair machinery.
A positive control for the 3'-mapping approach can easily be produced by processing a pool of synthetic oligonucleotides of known sequence, length, and concentration. This control is added at the adaptor ligation in step 3.3.2 and advised to be included in all multiplexed libraries. Data obtained from a control sample should have all the oligonucleotides at the expected input ratios, with only very minor background reads. Figure 4 shows results of a positive control set of 17 chemically synthesized oligonucleotides (for sequences, see Table 2), which were mixed at equimolar ratios. As expected, all molecules appear in close to equal abundance with only small variations (top graph). While most positions within the -sss DNA sequence that were not represented by an oligonucleotide return zero read counts, we observed minor species that are 1 or 2 nt shorter than the actual control oligonucleotides. We have not further investigated these minor species but assume that they represent degraded or incomplete products potentially present in the provided oligonucleotide stocks at purchase (oligonucleotides were ordered as HPLC purified, for which the manufacturer indicates > 80% purity). The bottom graph shows the control sample from a different library run, where variation is slightly higher between the 17 oligonucleotides and correlates with overall length in that longer control molecules are detected more efficiently then shorter ones. This may be due to a minor bias in PCR reactions or in clustering during MiSeq sequencing, which has an insert size optimum and may occur with libraries carrying particularly broad insert ranges. A basic way to address this bias is the application of a normalization factor based on the slope that indicates the bias correlating to molecule length (pink line). The required calculations are included in the analysis program (see step 8.3 in the protocol).
Figure 1: Diagram showing the first steps of HIV-1 reverse transcription. The process starts with annealing of tRNA(Lys,3) (orange) to the primer binding site (PBS) in the genomic viral RNA (step 1), which allows the initiation and elongation of viral cDNA (blue, step 2). Concomitantly, the template genomic RNA is degraded by RNaseH activity of RT (step 3). The first full intermediate in the process of reverse transcription is the minus-strand strong-stop (-)sss cDNA, which is complete when the RT catalyzed polymerization reaches the 5'-terminus of the gRNA repeat (R) region (step 3). The (-)sss intermediate is transferred to the 3'-terminus of the genomic RNA template by annealing to the complementary 3'-long terminal repeat (LTR) R region. From here, polymerization continues (step 4). In the described method the reverse transcription progression is determined by mapping the exact length of the nascent viral cDNA (blue). PPT, polypurine tract; U5, unique 5'-sequence; U3, unique 3'-sequence. This figure is republished from a previous publication6. Please click here to view a larger version of this figure.
Figure 2: Workflow outline and schematics of the adapter ligation and PCR amplification strategy. (a) Workflow outlining main steps of the described technique to determine 3'-termini of HIV-1 reverse transcripts in infected cells. The figure is adapted from a previous publication6. (b) Schematic of the adaptor ligation and PCR amplification strategy. Nascent cDNA molecules of varying length that have been purified in previous steps are ligated to a single-stranded DNA adaptor using T4 DNA ligase. The hairpin adaptor (named "full Kwok + MiSeq", see Table 1) design was inspired by Kwok et al.11. The adaptor carries a random 6 nt barcode sequence, which allows for base-pairing to facilitate ligation and simultaneously serves as an identifier for unique reads. The 3'-termini of the adaptor carries a spacer (SpC3) to prevent self-ligation. Ligated products are separated from excess adaptor by denaturing polyacrylamide gel electrophoresis (PAGE). Nucleic acids in the gel are stained and cut into three separate, equal-size gel pieces in the area from above the adaptor to the well as done in25. After elution, precipitation, and resuspension, the products are PCR amplified with primers annealing to the known sequence of the adaptor (primer 1, multiplex oligonucleotide kit, see Table of Materials) and a primer carrying the first 22 nt of the HIV-1 5'-LTR sequence immediately following the tRNA (primer 2, MP1.0 + 22HIV). The 5'-termini of the chosen primers carry adaptors for the chosen sequencing platform (P5 and P7) as well as an index sequence to distinguish individual samples run in the same library. Starting points of the sequencing read primers are indicated. The blue box indicates the region of interest to determine the original 3'-termini of the captured molecule. This figure is adapted from a previous publication6. Please click here to view a larger version of this figure.
Figure 3: Representative results. (a) The total read count of representative samples processed with the described protocol. This includes all sequences that were identified as unique reads of HIV-1 molecules with their 3'-termini within the first 635 nt of the minus strand cDNA (up to the PPT, see Figure 1). Infection with HIV-1 not carrying A3G yields the highest number of reads, whereas A3G inhibits cDNA synthesis and thereby reduces the total read count. Uninfected cells served as a negative control, while a set of synthetic oligonucleotides provides a positive control. b) The relative abundance of cDNAs for each length between nt positions 23 and 182 (full-length -sss cDNA is 180 to 182 nt) of the HIV-1NL4.3 sequence (x-axis) is shown in blue histograms (scale on the left y-axis). The relative abundance of cDNA was calculated from the absolute number of sequences terminating at a given nucleotide within the -sss cDNA sequence divided by the sum of all reads measuring 182nt or less. Shown in dashed red lines are the percentages of reads carrying C-to-T/U mutations at the respective position (scale on the right y–axes). Figure 3b is republished from a previous publication6. Please click here to view a larger version of this figure.
Figure 4: Representative results of control samples. Shown are two profiles for pools containing equimolar amounts of 17 different length synthetic oligonucleotides. These oligonucleotides have sequences from HIV-1NL4.3 and were selected to cover various lengths and present all 4 bases as a 3'-nucleotide (see Table 2). The top graph shows the positive control sample from Figure 3a. No significant bias towards molecule length or the open 3'-termini is detected. The bottom graph shows a different library run, which produced a minor length bias in sequencing. In this case, it is advisable to apply a normalization factor, which is derived from the slope (shown in pink) that represents the size bias. This figure is republished from a previous publication6. Please click here to view a larger version of this figure.
Oligo name | Length in nt | Sequence | Purpose | Manufacturer (Purification) | ||||||||||||||
full Kwok + MiSeq | 61 | 5’-PHO-tgaagagcctagtcgctgttcannnnnnctgcccatagagagatcggaagagcacacgtct-SpC3-3’ | Adaptor | IDT DNA technologies (HPLC) | ||||||||||||||
2xBiotin SS bait | 40 | 5’-biotin-cagtgtggaaaatctctagcagtggcgcccgaacagggac-biotin-3’ | Hybrid Capture | MWG Eurofins (HPLC) | ||||||||||||||
Biotin 1-16 ss | 22 | 5’-cagtgtggaaaatctctagcag-BiTEG-3’ | Hybrid Capture | MWG Eurofins (HPLC | ||||||||||||||
Biotin tRNA + CTG | 16 | 5’-cagtggcgcccgaaca-BITEG-3’ | Hybrid Capture | MWG Eurofins (HPLC) | ||||||||||||||
MP1.0 + 22HIV | 82 | 5’-aatgatacggcgaccaccgagatctacactctttccctacacgacgctcttccgatctcactgctagagattttccacactg-3’ | PCR amplification | MWG Eurofins (HPLC |
Table 1: Table of oligonucleotides including length, sequences, and modifications that are utilized in the described protocol. The table is adapted from a previous publication6. Please click here to download this table as an excel file.
Oligo name | Length in nt | Sequence | Manufacturer (Purification) | |||||||||||||
HTP con long C | 120 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagctttattgaggcttaagc-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con long G | 119 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagctttattgaggcttaag-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con long T | 116 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagctttattgaggctt-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con long A | 118 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagctttattgaggcttaa-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con mid C | 76 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcac-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con mid G (a) | 71 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacg-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con mid G (b) | 72 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgg-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con mid A | 69 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacaga-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con mid T | 85 | 5’-ctgctagagattttccacactgactaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactt-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con short A | 40 | 5’-ctgctagagattttccacactgactaaaagggtctgaggga-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con short T | 33 | 5’-ctgctagagattttccacactgactaaaagggt-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con short G | 41 | 5’-ctgctagagattttccacactgactaaaagggtctgaggg-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP con short C | 34 | 5’-ctgctagagattttccacactgactaaaagggtc-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP Con 46 (T) | 46 | 5’-ctgctagagattttccacactg actaaaagggtctgagggatctct-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP Con 83 (C) | 83 | 5’-ctgctagagattttccacactg actaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactac-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP Con 103 (C) | 103 | 5’-ctgctagagattttccacactg actaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagc-3’ | MWG Eurofins (HPLC) | |||||||||||||
HTP Con 107 (A) | 107 | 5’-ctgctagagattttccacactg actaaaagggtctgagggatctctagttaccagagtcacacaacagacgggcacacactactttgagcactcaaggcaagcttta-3’ | MWG Eurofins (HPLC) |
Table 2: Table of 17 synthetic control oligonucleotides used as a positive control sample. The top 13 oligonucleotides were chosen based on size [long (116 to 120 nt), mid (69 to 85 nt), short (33 to 41 nt)] as well as their 3'-termini. The table is adapted from a previous publication6. Please click here to download this table as an excel file.
The availability of fast, reliable, and cost-effective deep sequencing has revolutionized many aspects in the field of life sciences, allowing great depth in sequencing-based analyses. A remaining challenge lies in the innovative design and creation of representative sequencing libraries. Here we describe a protocol to capture nascent viral cDNA molecules, specifically the intermediates of the HIV-1 reverse transcription process.
The most critical step in this strategy is the ligation of an adaptor to the open 3'-termini in a quantitative and unbiased manner. Efficiencies of ligations between two ssDNA termini, both inter- and intramolecular, have been investigated and optimized for various applications11,26,27,28,29. The choice of using a hairpin adaptor with T4 DNA ligase under the conditions described in step 3.3 is the result of empirical optimization in which we evaluated different ligases, adaptors and reagents for the ligation of synthetic oligonucleotides representing HIV-1 sequences (Table 2) (data not shown). In these in vitro test reactions, we confirmed that the T4 DNA ligase mediated ligation of the hairpin adaptor, as described by Kwok et al.11, has a very low bias, and achieves near complete ligation of acceptor molecules when the adaptor is used in excess. The ligation efficiency was unaffected by the addition of nucleotide sequence to render the adaptor compatible for the multiplex primer system (see Figure 4). In comparison, we found that a thermostable 5'DNA/RNA ligase ("Ligase A", see Table of Materials for exact ligases compared here), which is an engineered RNA ligase that was developed in part to improve on ligation efficiency with ssDNA as the acceptor27, was indeed more effective at ligating two ssDNA molecules than RNA ligase ("Ligase B") but had a significant bias, with strong differences in ligation efficiency even between oligonucleotides with single base length differences [Table 2; HTP con mid G (a) and (b)]. Furthermore, we found only a minimal bias in reactions with "Ligase C" combined with an adaptor carrying a randomized 5'-termini (a strategy used to offset known nucleotide bias of "Ligase C"; see for example Ding et al.30). However, the "Ligase C"-mediated intermolecular ligations were incomplete, rendering the T4 DNA ligase system the superior choice.
Several quality control steps over the course of the protocol and the inclusion of positive and negative controls allow for the detection of potential problems before assay continuation and provide guidance for troubleshooting efforts. The qPCR quantifications in steps 2.2.2 and 2.3.12 ensure that the quantity of the input material is sufficient. Typical cDNA copy numbers in the 200 µL elution (from step 2.1) range from around 10,000 to 300,000 per µL. The hybrid capture step can result in some loss of overall HIV-1 cDNA quantity but should result in a strong enrichment of specific HIV-1 cDNA over cellular DNA, which can be determined by using appropriate primers to quantify genomic DNA before and after enrichment by qPCR or by measuring total DNA concentration. Recovered HIV-1 cDNA after the hybrid capture steps should be at least 10% of the input. Low starting material may otherwise explain a successful oligonucleotide positive control (see step 3.3.2) but only limited reads achieved in the samples. Low read numbers overall could also be explained by overestimation of the library concentration due to the presence of irrelevant DNA species without MiSeq adaptors. This would result in low cluster density and can be improved by determining the concentration of HIV-1 sequences in the library by qPCR in addition to the total DNA amount by fluorometric assays. Due to the highly sensitive nature of the method, special care should be taken to avoid even low-level contamination, both from other samples (in particular, from the high concentration control oligonucleotide stocks) as well as from laboratory equipment. Working in a UV sterilizing PCR workstation is beneficial in this regard. The automated gel electrophoresis of the final library (step 6.1.2) is a further quality control measure. The nucleic acid size range typically observed is between 150 to 500 nt. Primers that can be detected in the optional control after the PCR and before purification (see note in step 5.2) should now be absent. In a representative result, the sample intensity curve has a peak around 160 to 170 nt and a second sharper peak around 320 to 350 nt. This likely reflects the often-seen higher abundance in both relatively short (1 to 20 nt insert length) reverse transcripts and full-length strong-stop (180 to 182 nt insert length) (Figure 3b).
While the presented protocol and selected primers are specific for early HIV-1 reverse transcription constructs, the method is generally applicable to any study aiming to determine open 3'-termini of DNA. The main modifications required in other contexts will be the method for hybrid capture and the primer design strategy. For example, if the target is to be adapted to late HIV-1 transcripts, a larger number of different capturing biotinylated oligonucleotides annealing across the length of the cDNA would be advisable and will likely decrease the loss in the hybrid capture step. As mentioned in the introduction, it is important to consider limitations when designing the range over which 3'-termini are to be detected to avoid different sources of bias. First, there may be a bias in the PCR reactions if the templates with the adaptor are of vastly varying length. Second, the sequencing platform used here (e.g., MiSeq) has a preferred insert length range for optimal clustering, and significantly shorter and longer products may not be sequenced with the same efficiency. In part, this can be addressed computationally, as was done by calculating a correction factor for linear length bias (see Figure 4, bottom graph). However, if the region of where 3'-termini mapping is desired is long (> 1000 nt), it is more advisable to split the reactions with the ligated transcripts and use multiple upstream primers to assess 3'-termini in sections.
The analysis program was written in-house for the specific purpose of analyzing both the last nucleotide of the HIV-1 sequence adjacent to the fixed adaptor sequence as well as the base variation of all bases to identify any mutations. The individual steps comprise the following: first, the adaptor sequences are trimmed using the fastx-0.0.13 toolkit; then, any sequences that are duplicated (meaning identical sequences including the barcode) are removed. All remaining unique reads are then aligned to the HIV-1 sequence using Bowtie (http://bowtie-bio.sourceforge.net/index.shtml) with the maximum mismatch set at three bases. The template sequence is comprised of the first 635 nt of HIV-1 cDNA (NL4.3 strain), which includes the -sss sequence and the first strand transfer product up to the polypurine track (U5-R-U3-PPT; see Figure 1). Thereby, the provided software and templates are only directly suitable if the method is used for the same application (detection of early reverse transcripts of the HIV-1NL4.3). Adjustments will have to be made for other target sequences. The positions of the 3'-termini for each read were determined by the position in the alignment. Base calls for each position are recorded and mutation rates are calculated from the total coverage of each base, which varies, as reads are of different lengths and long inserts may not be entirely covered by the 125-base sequencing in Read2.
To conclude, we believe the described method to be a valuable tool for many types of studies. Obvious applications include investigations of the mechanisms underlying reverse transcription inhibition through antiretroviral drugs or cellular restriction factors. However, only relatively minor adjustments should be necessary to adapt the system to 3'-termini mapping within other single-stranded DNA viral intermediates, which are present, for example, in parvovirus replication. Furthermore, the principle of the method, particularly its optimized ligation step, can provide a core part of library preparation design for the characterization of any 3'-DNA extensions, including elongations catalyzed by cellular double-stranded DNA polymerases.
The authors have nothing to disclose.
The authors acknowledge the support from members of the Malim laboratory, Luis Apolonia, Jernej Ule, and Rebecca Oakey. The authors thank Matt Arno at the King’s College London Genomic Centre and Debbie Hughes at the University College London (UCL), Institute for Neurology Next Generation Sequencing Facility, for help with MiSeq sequencing runs. The work was supported by the UK Medical Research Council (G1000196 and MR/M001199/1 to M.M.), the Wellcome Trust (106223/Z/14/Z to M.M.), the European Commission’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. PIIF-GA-2012-329679 (to D.P.), and the Department of Health via a National Institutes for Health Research Comprehensive Biomedical Research Center award to Guy’s and St. Thomas’ NHS Foundation Trust in partnership with King’s College London and King’s College Hospital NHS Foundation Trust.
293T cells | ATCC | CRL-3216 | |
Dulbecco's Modified Eagle's Medium | Gibco | 31966-021 | |
Penicillin/Streptomycin | Gibco | 15150-122 | |
Fetal Bovine Serum | Gibco | 10270-106 | |
HeraCell Vios 250i CO2 Incubator | Thermo Scientific | 51030966 | |
Laminar flow hood – CAS BioMAT2 | Wolflabs | CAS001-C2R-1800 | |
10mm TC-treated culture dish | Corning | 430167 | |
TrypLE™ Express (1x), Stable Trypsin Replacement Enzyme | Gibco | 12605-010 | |
OptiMEM® (Minimal Essential Medium) | Gibco | 31985-047 | |
HIV-1 NL4-3 Infectious Molecular Clone (pNL4-3) | NIH Aids reagent program | 114 | |
Polyethylenimine (PEI) – MW:25000 | PolySciences Inc | 23966-2 | dissolved at 1mg/ml and adjusted to pH7 |
RQ1- Rnase free Dnase | Promega | M6101 | |
Filter 0.22 μm | Triple Red Limited | FPE404025 | |
15 mL polypropylene tubes | Corning | CLS430791 | |
Sucrose | Calbiochem | 573113 | |
Phosphate Buffered Saline (1x) | Gibco | 14190-094 | |
Ultracentrifuge tubes | Beckman Coulter | 344060 | |
Ultracentrifuge | Sorval | WX Ultra Series | Th-641 Rotor |
Alliance HIV-1 p24 antigen ELISA kit | Perkin Elmer | NEK050001KT | |
CEM-SS cells | NIH Aids reagent program | 776 | |
Roswell Park Memorial Institute Medium | Gibco | 31870-025 | |
CoStar® TC treated multiple well plates | Corning | CLS3513-50EA | |
Benchtop centrifuge: Heraus™ Multifuge™ X3 FR | Thermo Scientific | 75004536 | |
TX-1000 Swinging Bucket Rotor | Thermo Scientific | 75003017 | |
Microcentrifuge: 5424R | Eppendorf | 5404000060 | |
Total DNA extraction kit (DNeasy Blood and Tissue kit) | Qiagen | 69504 | |
Nuclease free H2O | Ambion | AM9937 | |
Cutsmart buffer | New England Biolabs (part of DpnI enzyme) | R0176S | |
DpnI restriction enzyme | New England Biolabs | R0176S | |
Oligonucleotides for qPCR | MWG Eurofins | N/A | HPSF purification |
TaqMan PCR Universal Mastermix | Thermo | 4304437 | |
LoBind Eppendorf® tubes | Eppendorf | 30108078 | |
Axygen™ aerosol filter pipette tips, 1000 μL | Fisher Scientific | TF-000-R-S | |
Axygen™ aerosol filter pipette tips, 200 μL | Fisher Scientific | TF-200-R-S | |
Axygen™ aerosol filter pipette tips, 20 μL | Fisher Scientific | TF-20-R-S | |
Axygen™ aerosol filter pipette tips, 10 μL | Fisher Scientific | TF-10-R-S | |
PCR clean hood | LabCaire | Model PCR-62 | |
DynaMag™2-magnet | Thermo | 12321D | |
Streptavidin MagneSphere® paramagnetic particles | Promega | Z5481 | |
Casein | Thermo Scientific | 37582 | |
End over end rotator, Revolver™ 360° | Labnet | H5600 | |
Tris-Base | Fisher Scientific | BP152-5 | |
Hydrochloric Acid | Sigma | H1758-100ML | |
EDTA disodium salt dihydrate | Electran (VWR) | 443885J | |
Sodium Chloride | Sigma | S3014 | |
Dri-Block® Analog Block Heater | Techne | UY-36620-13 | |
PCR tubes and domed caps | Thermo Scientific | AB0266 | |
PCR machine | Eppendorf | Mastercycler® series | |
T4 DNA ligase | New England Biolabs | M0202M | |
40% Polyethylene glycol solution (PEG) in H2O, MW: 8000 | Sigma | P1458-25ML | |
Betaine solution, 5M | Sigma | B0300-1VL | |
Gel loading buffer II (formamide buffer) | Thermo Scientific | AM8546G | |
Precast 6% TBE urea gels | Invitrogen | EC6865BOX | |
Mini cell electrophoresis system | Invitrogen, Novex | XCell SureLock™ | |
Tris/Borate/EDTA solution (10x) | Fisher Scientific | 10031223 | |
Needle 21 G x1 1/2 | VWR | 613-2022 | |
SYBR Gold nucleic acid stain (10000x) | Life Technologies | S11494 | |
Dark Reader DR46B transilluminator | Fisher Scientific | NC9800797 | |
Ammonium acetate | Merck | 101116 | |
SDS solution 20% (w/v) | Biorad | 161-0418 | |
Centrifuge tube filter | Appleton Woods | BC591 | |
Filter Glass Fibre Gf/D 10mm | Whatman (VWR) | 512-0427 | |
polyadenylic acid (polyA) RNA | Sigma | 10108626001 | |
Glycogen, molecular biology grade | Thermo Scientific | R0561 | |
Isopropanol (2-propanol) | Fisher Scientific | 15809665 | |
Ethanol, molecular biology grade | Fisher Scientific | 10041814 | |
Accuprime™ Supermix I (DNA polymerase premix) | Life Technologies | 12342-010 | |
NEBNext® Multiplex Oligo for Illumina (Index Primer Set 1 and 2) | New England Biolabs | E7335S; E7500S | |
Tapestation D1000 Screentape High sensitivity | Agilent Technologies | 5067- 5584 | |
Tapestation D1000 Reagents | Agilent Technologies | 5067- 5585 | |
2200 Tapestation – automated gel electrophoresis system | Agilent Technologies | G2965AA | |
Agencourt® AMPure® beads XP | Beckman Coulter | A63880 | |
Qubit™ dsDNA HS Assay Kit | Invitrogen | Q32851 | |
Qubit™ 2.0 Fluorometer | Invitrogen | Q32866 | |
Topo™ TA cloning Kit | Invitrogen | 450071 | |
Sequencing platform: MiSeq System | Illumina | ||
Experiment Manager (Sample sheet software) | Illumina | Note: Use TruSeq LT as a template | |
Miseq™ Reagent kit V3 (150 cycle) | Illumina | MS-102-3001 | |
Sequencing hub: Basespace | Illumina | https://basespace.illumina.com | |
Ligase A: Thermostable 5’ App DNA/RNA ligase | NEB | M0319S | Not used in this protocol, but tested in optimization process with results described in the discussion. |
Ligase B: T4 RNA ligase 1 | NEB | M0204 | Not used in this protocol, but tested in optimization process with results described in the discussion. |
Ligase C: CircLigase | Epicentre | CL4111K | Not used in this protocol, but tested in optimization process with results described in the discussion. |