The comparison and optimization of two plant organellar DNA enrichment methods are presented: traditional differential centrifugation and fractionation of the total gDNA based on methylation status. We assess the resulting DNA quantity and quality, demonstrate performance in short-read next-generation sequencing, and discuss the potential for use in long-read single-molecule sequencing.
Plant organellar genomes contain large, repetitive elements that may undergo pairing or recombination to form complex structures and/or sub-genomic fragments. Organellar genomes also exist in admixtures within a given cell or tissue type (heteroplasmy), and an abundance of subtypes may change throughout development or when under stress (sub-stoichiometric shifting). Next-generation sequencing (NGS) technologies are required to obtain deeper understanding of organellar genome structure and function. Traditional sequencing studies use several methods to obtain organellar DNA: (1) If a large amount of starting tissue is used, it is homogenized and subjected to differential centrifugation and/or gradient purification. (2) If a smaller amount of tissue is used (i.e., if seeds, material, or space is limited), the same process is performed as in (1), followed by whole-genome amplification to obtain sufficient DNA. (3) Bioinformatics analysis can be used to sequence the total genomic DNA and to parse out organellar reads. All these methods have inherent challenges and tradeoffs. In (1), it may be difficult to obtain such a large amount of starting tissue; in (2), whole-genome amplification could introduce a sequencing bias; and in (3), homology between nuclear and organellar genomes could interfere with assembly and analysis. In plants with large nuclear genomes, it is advantageous to enrich for organellar DNA to reduce sequencing costs and sequence complexity for bioinformatics analyses. Here, we compare a traditional differential centrifugation method with a fourth method, an adapted CpG-methyl pulldown approach, to separate the total genomic DNA into nuclear and organellar fractions. Both methods yield sufficient DNA for NGS, DNA that is highly enriched for organellar sequences, albeit at different ratios in mitochondria and chloroplasts. 
We present the optimization of these methods for wheat leaf tissue and discuss major advantages and disadvantages of each approach in the context of sample input, protocol ease, and downstream application.
Genome sequencing is a powerful tool to dissect the underlying genetic basis of important plant traits. Most genome-sequencing studies focus on the nuclear genome content, as the majority of genes are located in the nucleus. However, organellar genomes, including the mitochondria (across eukaryotes) and plastids (in plants; the specialized form, the chloroplast, works in photosynthesis) contribute significant genetic information essential to organismal development, stress response, and overall fitness1. Organellar genomes are typically included in total DNA extractions intended for nuclear genome sequencing, although methods to reduce organelle numbers prior to DNA extraction are also employed2. Many studies have used sequencing results from total gDNA extractions to assemble organellar genomes3,4,5,6,7. However, when the target of the study is to focus on organellar genomes, using the total gDNA increases the sequencing costs because many reads are "lost" to the nuclear DNA sequences, particularly in plants with large nuclear genomes. Moreover, due to the duplication and transfer of organellar sequences into the nuclear genome and between organelles, resolving the correct mapping position of sequencing reads to the proper genome is bioinformatically challenging2,8. The purification of organellar genomes from the nuclear genome is one strategy to reduce these problems. Further bioinformatics strategies may be used to separate reads that map to regions of homology between the mitochondria and chloroplasts.
While the organellar genomes from many plant species have been sequenced, little is known about the breadth of organellar genome diversity available in wild populations or in cultivated breeding pools. Organellar genomes are also known to be dynamic molecules that undergo significant structural rearrangement due to recombination between repeat sequences9. Moreover, multiple copies of the organellar genome are contained within each organelle, and multiple organelles are contained within each cell. Not all copies of these genomes are identical, which is known as heteroplasmy. In contrast to the canonical picture of "master circles," there is now growing evidence for a more complex picture of organellar genome structures, including sub-genomic circles, linear chromosomes, linear concatamers, and branched structures10. The assembly of plant organellar genomes is further complicated by their relatively large sizes and substantial inverted and direct repeats.
Traditional protocols for organellar isolation, DNA purification, and subsequent genome sequencing are often cumbersome and require large volumes of tissue input, with several grams to upwards of hundreds of grams of young leaf tissue necessary as a starting point11,12,13,14,15,16,17. This makes organellar genome sequencing inaccessible when tissue is limited. In some situations, seed amounts are limited, such as when it is necessary to sequence on a generational basis or in male sterile lines that have to be maintained via crossing. In these situations, organellar DNA can be purified and then subjected to whole-genome amplification. However, whole-genome amplification can introduce significant sequencing bias, which is a particular problem when assessing structural variation, sub-genomic structures, and heteroplasmy levels18. Recent advances in library preparation for short-read sequencing technologies have overcome low-input barriers to avoid whole-genome amplification. For example, the Illumina Nextera XT library preparation kit allows for as little as 1 ng of DNA to be used as input19. However, standard library preparations for long-read sequencing applications, such as PacBio or Oxford Nanopore sequencing technologies, still require a relatively high amount of input DNA, which can pose a challenge for organellar genome sequencing. Recently, new user-made, long-read sequencing protocols have been developed to reduce the input amounts and to help facilitate genome sequencing in samples where obtaining microgram-quantities of DNA is difficult20,21. However, obtaining high-molecular weight, pure organellar fractions to feed into these library preparations remains a challenge.
We sought to compare and optimize organellar DNA enrichment and isolation methods suitable for NGS without the need of whole-genome amplification. Specifically, our goal was to determine best practices to enrich for high-molecular weight organellar DNA from limited starting materials, such as a subsample of a leaf. This work presents a comparative analysis of methods to enrich for organellar DNA: (1) a modified, traditional differential centrifugation protocol versus (2) a DNA fractionation protocol based on the use of a commercially available DNA CpG-methyl-binding domain protein pulldown approach22 applied to plant tissue23. We recommend best practices for the isolation of organellar DNA from wheat leaf tissue, which may be readily extended to other plants and tissue types.
1. Generation of Plant Materials for Organellar Isolation and DNA Extraction
2. Method #1: DNA Extraction Using Differential Centrifugation (DC)
NOTE: The differential centrifugation protocol was modified from two publications that optimized conditions to isolate both organelles but enrich for mitochondria17,24. The resulting protocol is less time-intensive and uses fewer toxic chemicals than the previous methods. Specifically, we made modifications to the buffers and wash steps, including the addition of polyvinylpyrrolidone (PVP) to the STE extraction buffer and the elimination of the final wash step in NETF buffer, which contains sodium fluoride (NaF).
Caution: The preparation and use of STE buffer should be performed under a chemical fume hood with proper personal protection equipment, as this buffer contains 2-mercaptoethanol (BME).
3. Method #2: Methyl-fractionation (MF) Approach to Enrich for Organellar DNA from Total Genomic DNA
NOTE: This protocol was modified from a user-developed Genomic Tip Kit DNA extraction protocol for plants and fungi27 and the commercial Microbiome DNA Enrichment Kit protocol28. In theory, any DNA isolation protocol that yields high-molecular weight DNA may be used for the pulldown. For short-read sequencing, any extraction yielding predominately >15 kb fragments is adequate for use in the pulldown. For long-read sequencing, larger fragments may be desirable. Therefore, we optimized this protocol to yield high molecular weight DNA.
4. Sample Quantification and Quality Control
The protocols presented in this manuscript describe two distinct methods to enrich for organellar DNA from plant tissue. The conditions presented here reflect optimization for wheat tissue. A comparison of key steps in the protocols, required tissue input, and DNA output are described in Figure 1. The steps of the DC protocol we tested follow similar conditions to those described previously (Figure 1A). Harvested tissue must be processed freshly and subjected to differential centrifugation and/or gradients to isolate intact organelles. The nuclear DNA is eliminated before the organelles are lysed, and finally, the DNA is extracted and used for downstream applications. In contrast, in the MF protocol, plant tissue may be harvested and stored before use, and intact organelles are not required. Instead, the nuclear and organellar DNA is fractionated from total gDNA based on the methylation status of the DNA. Both protocols yield roughly equal amounts of organellar DNA (Figure 1C). In terms of total organellar DNA output relative to tissue input, the MF protocol is advantageous when tissue is limited, as a small sample from a single plant may be used, and the plant may be allowed to grow for further analysis. Typically, in DC protocols, all aerial tissues of many seedlings are required, and these plants are discarded. However, the DC method can be optimized to specifically enrich for one organelle type over the other, which is not possible with the MF approach. It is worth mentioning that the total time for each protocol is roughly equivalent, although there is less hands-on time in the MF approach.
Both Methods Enrich for Organellar DNA, Albeit with Differing Proportions of Mitochondria and Plastid Sequences:
Very low amounts of purified organellar DNA are obtained from either method (on the order of ~50 – 100 ng; Figure 1C). To assess the levels of organellar genome enrichment and nuclear genome contamination in DNA isolated from both the DC and MF methods, a qPCR assay was employed. In this assay, the relative abundances of three amplicons (i.e., nuclear-specific, ACTIN; mitochondrial-specific, NAD3; and chloroplast-specific, PSBB) were assessed in total genomic DNA and in the organellar DNA fractions obtained from both methods (Figure 2). Quantification cycle (Cq) values were examined for each sample (Figure 2A). Because the Cq is defined as the PCR cycle at which the fluorescence from the target amplification increases above the background fluorescence level, Cq and target abundance have an inverse relationship. In the DC sample, the Cq values of NAD3 and PSBB are, respectively, ~17 and ~15 cycles earlier than that of ACTIN (which has a Cq of ~36) (see Figure 2B for Cq values and enrichment levels). This equates to theoretical 167,181- and 47,790-fold enrichments for NAD3 and PSBB, respectively, relative to ACTIN in the DC sample (Figure 2B, see the legend of Figure 2 for the calculation). In the total genomic DNA sample, the fold enrichments for NAD3 and PSBB relative to ACTIN are only 158 and 10,701, respectively. It is not surprising to find a higher abundance of the organellar amplicons relative to the nuclear amplicon in total genomic DNA, given that the organellar genomes exist in greater copy numbers per cell than the nuclear genome37 and that the number of organelles per cell may differ depending on the tissue type or the developmental stage38,39. Overall, the data indicate that the DC method preferentially enriches for mitochondria, which is to be expected, as centrifugation speeds are optimized for selectively isolating mitochondria and reducing nuclear and chloroplast "contamination."
The unmethylated fraction of the MF total gDNA also shows substantial enrichment of both organellar amplicons and is expected to retain the native relative amounts of these targets. The fold enrichments for NAD3 and PSBB relative to ACTIN in the unmethylated fraction are 20,551 and 1,703,253, respectively (Figure 2A and 2B). In the methylated fraction, the fold enrichments for NAD3 and PSBB relative to ACTIN are 31 and 823, respectively, indicating that the MBD2-Fc protein is highly efficient at the pulldown of methylated nuclear DNA. The chloroplast amplicon has a higher abundance than the mitochondrial amplicon in the total genomic DNA (~6 Cq earlier), methylated fraction (~5 Cq earlier), and unmethylated fraction (~6 Cq earlier) samples, which suggests that the native abundance of these amplicons is not substantially changed by the MBD2 pulldown. We focus here on the unmethylated (organellar) fraction due to the interest in sequencing these genomes specifically. However, if the nuclear genome is the primary interest, MF and subsequent sequencing of the methylated fraction would yield a much higher nuclear genome coverage than total genomic DNA sequencing, due to the reduction in organellar DNA "contamination."
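The fold-enrichment values quoted above follow from the Cq differences via the formula given in the legend of Figure 2. A minimal Python sketch of that arithmetic is shown here. The function names and the example Cq values are illustrative assumptions (the Cq values are the approximate figures mentioned in the text, not the measured data), and a perfect amplification efficiency of 2 is assumed by default.

```python
import math


def fold_enrichment(cq_reference: float, cq_target: float,
                    efficiency: float = 2.0) -> float:
    """Fold enrichment of a target amplicon relative to a reference amplicon.

    Implements efficiency ** (Cq_reference - Cq_target); an earlier (lower)
    target Cq therefore gives a larger fold enrichment.
    """
    return efficiency ** (cq_reference - cq_target)


def cq_difference(fold: float, efficiency: float = 2.0) -> float:
    """Inverse calculation: the Cq difference implied by a fold enrichment."""
    return math.log(fold) / math.log(efficiency)


# Approximate, illustrative values only (assumed from the rounded text values).
cq_actin = 36.0            # nuclear reference amplicon
cq_nad3 = cq_actin - 17.0  # mitochondrial amplicon, ~17 cycles earlier
cq_psbb = cq_actin - 15.0  # chloroplast amplicon, ~15 cycles earlier

print("NAD3 vs ACTIN:", fold_enrichment(cq_actin, cq_nad3))
print("PSBB vs ACTIN:", fold_enrichment(cq_actin, cq_psbb))
print("Cq difference implied by 167,181-fold:", cq_difference(167181))
```

Because the relationship is exponential, each one-cycle difference in Cq corresponds to a twofold difference in abundance under the assumed efficiency, which is why rounded Cq differences give only approximate fold values.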
It is worth noting that if qPCR is not available, end-point PCR (using the same primers as for qPCR) provides a qualitative assessment of organellar purity. In this case, pure organellar DNA samples will show amplification for the mitochondrial and plastid amplicons but no detectable amplification of the nuclear amplicon on the agarose gel, whereas total genomic DNA shows amplification for all three primer sets, as demonstrated in previous studies11,12.
Organellar DNA Isolated From Both Methods Is Suitable for NGS:
Trimmed and cleaned PE sequencing reads (see step 4.3) were mapped to previously published wheat organellar reference genomes, and the number of reads used for mapping each sample ranged from ~800,000 to 1,100,000 (Figure 3I). Results from mapping de novo Illumina sequencing reads to the available wheat chloroplast and mitochondria genomes are consistent with the qPCR results, with the DC method yielding DNA that is more enriched in mitochondrial DNA (Figures 3A and 3B, ~80% and ~10% of reads map to the mitochondrial (mt) and chloroplast (cp) genomes, respectively) and the MF method yielding DNA that likely reflects the native abundance of the two organellar genomes (Figures 3A and 3B, ~20% and ~80% of reads map to the mt and cp genomes, respectively). In both methods, the theoretical coverage (see the legend of Figure 3 for the calculation) of both wheat organellar genomes exceeds 100X (and ranges up to ~2,000X for the chloroplast genome in the unmethylated fraction from the MF method), even when 12 libraries are multiplexed (Figure 3C and 3D; the 6 libraries included in this analysis were pooled with an additional 6 libraries for a separate analysis, for a total of 12 libraries pooled in a single sequencing lane). A more detailed view of coverage was attained by examining the fraction of the genome covered at specific depths, as well as at per-base coverage levels (Figure 3E-3I). For the MF method, the average per-base coverage was ~300 – 450X for the mt genome and 4,000 – 5,000X for the cp genome. For the DC method, the average per-base coverage was ~900 – 1,300X and ~500 – 700X for the mt and cp genomes, respectively. However, there was a small fraction of both the mt and cp genomes that had extremely low or high coverage, and this was seen in organellar DNA derived from either method (Figure 3I).
Regions of higher-than-average coverage likely correspond to regions of homology between the organellar genomes, and regions with low coverage may indicate SNPs or other small variants between the cultivars we sequenced and the published references. In support of this notion, these spikes of high coverage were most pronounced for the mt DNA derived from the MF method (Figures 3E and 3I), likely due to the high coverage of the cp genome in this method. Unexpectedly, the coverage of the cp genome is more uneven in the MF method than in the DC method (Figure 3G and 3H), which could be due to slight biases in the MBD2-Fc pulldown along the cp DNA. Further experiments will be required to determine why this is the case. Regardless, the mt and cp genomes had relatively even coverage with both methods and no large areas of missing coverage, as shown by the fraction of each genome sequenced at a given depth (Figure 3E-3H). Additionally, the levels of coverage for both genomes are considered sufficient for downstream analysis, such as variant analysis. If deemed necessary for the analysis of rare variants, reducing the number of pooled samples would achieve greater coverage. Alternatively, a far greater number of samples may be pooled on a HiSeq lane while attaining even greater sequencing depth, albeit at the expense of read length, as HiSeq libraries are currently limited to the PE150 length, in contrast to PE300 MiSeq libraries.
To examine the levels of nuclear genome contamination using a mapping approach, PE read mapping categories were examined. PE reads can map to a reference genome in a variety of configurations. When reads 1 and 2 align to the reference in a head-to-head fashion, with a certain "expected" distance between the two mates (based on the average insert size of the library and typically specified as an input parameter in the mapping software), these PE reads are said to map "concordantly." In contrast, "discordant" mapping is the situation where mates map with a lesser- or greater-than-expected distance to the reference genome or map in alternate configurations (head-to-tail or tail-to-tail). If only one mate aligns to the reference genome, then that PE read is said to map neither concordantly nor discordantly to the reference genome. In all three read-mapping categories, PE reads can align to the reference genome one or multiple times.
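The three mapping categories described above can be summarized as a small decision rule. The following Python sketch is a simplified illustration only: the `PairAlignment` record, its field names, and the distance thresholds are hypothetical, and actual mapping software applies its own, more detailed criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairAlignment:
    """Hypothetical summary of how one read pair aligned to a reference."""
    mate1_aligned: bool
    mate2_aligned: bool
    orientation: Optional[str] = None    # "head-to-head", "head-to-tail", "tail-to-tail"
    mate_distance: Optional[int] = None  # distance between mates on the reference (bp)


def classify_pair(pair: PairAlignment, min_distance: int, max_distance: int) -> str:
    """Assign a read pair to one of the mapping categories described in the text."""
    if not (pair.mate1_aligned or pair.mate2_aligned):
        return "unmapped"
    if not (pair.mate1_aligned and pair.mate2_aligned):
        # Only one mate aligned to the reference.
        return "neither concordant nor discordant"
    in_expected_range = (
        pair.mate_distance is not None
        and min_distance <= pair.mate_distance <= max_distance
    )
    if pair.orientation == "head-to-head" and in_expected_range:
        return "concordant"
    # Both mates aligned, but with unexpected spacing or orientation.
    return "discordant"


# Example with an assumed expected mate distance of 200-800 bp.
example = PairAlignment(True, True, "head-to-head", 450)
print(classify_pair(example, min_distance=200, max_distance=800))
```

The expected distance range is supplied by the caller here, mirroring how the expected insert size is typically passed to the mapping software as an input parameter.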
For both DC- and MF-isolated organellar DNA, read mapping to the mitochondrial genome was predominantly in the aligned concordantly one time category (Figure 4A), whereas reads mapped to the chloroplast genome in relatively equal proportions of concordantly one time and concordantly more than one time (Figure 4B), likely due to the large inverted repeats present in the chloroplast genome and also to the extremely high coverage levels. However, fewer PE reads mapped to the nuclear genome and largely mapped more than one time in a neither concordant nor discordant fashion (i.e., only one mate is able to map). These are most likely mapping "off-target" to sequences in the nuclear genome, which are homologous to the organellar genomes or misassembled regions. Only a minor amount of reads (<5%) mapped to the nuclear genome concordantly, indicating low levels of nuclear genome contamination in organellar DNA isolated from the DC or MF method (Figure 4C), as is also reflected by the qPCR results (Figure 2A). The nuclear fraction after MBD2-Fc pulldown from Chinese Spring non-etiolated tissues was also sequenced to determine how efficient the pulldown is at the removal of unmethylated DNA. Less than 1% of reads in the nuclear fraction-derived library mapped to organellar reference genomes, whereas ~45% of all reads mapped to the nuclear genome (Figure 4). However, most reads mapped in a discordant fashion, which likely reflects the high levels of misassembly and fragmentation in the wheat nuclear reference genome. Regardless, the results suggest that the MBD2-Fc pulldown is highly efficient at the removal of unmethylated organellar DNA from methylated nuclear DNA. 
It is worth noting that, because the organellar-enriched DNA resulting from these methods contains a mixture of mitochondria and chloroplast sequences, and because sequence similarities resulting from ancient gene transfer between these organelles remain in their genomes, the proper assignment of reads to the specific genomes must be solved bioinformatically.
The Etiolation of Leaf Tissue Does Not Appreciably Alter Organelle Abundances:
Traditionally, etiolated tissues are preferred for plant mitochondrial DNA isolation in order to decrease the levels of phenolics and starches, which may interfere with extraction or downstream applications13. To determine if organellar genome enrichment levels could be altered or improved by growth conditions, both etiolated and non-etiolated tissues were subjected to the MF protocol and sequencing. Interestingly, etiolation did not appreciably change the percentage of reads that mapped to the organellar reference genomes (Figures 3A and 3B) or the per-base coverage (Figure 3I) compared to non-etiolated conditions. We also isolated organellar DNA using differential centrifugation, with both etiolated and non-etiolated tissues, and little difference in enrichment was found between the different tissues using qPCR (data not shown). This suggests that more physiologically relevant non-etiolated tissues can be used for organellar sequencing studies, with no appreciable change of enrichment.
Quality Control Suggests That MF DNA Is Most Suitable for Long-read Sequencing:
As long-read sequencing becomes more accessible to researchers, the isolation of high-molecular weight DNA is becoming increasingly important. To assess organellar DNA isolated with either method for intactness and quality, PFGE was employed. Total genomic DNA typically migrates as a diffuse smear in PFGE, and the molecular weight is determined by the protocol and how the DNA was stored and handled post-extraction. The total genomic DNA isolated with genomic tips should exceed 50 kb, which was verified using PFGE (Figure 5, lane 2). The total genomic DNA from the genomic tips is used as the input into the Microbiome Enrichment Kit to fractionate the nuclear from organellar DNA. The nuclear fraction obtained after fractionation does decrease in size, but remains centered around 50 kb (Figure 5, lane 4). This is not surprising given the relatively rougher handling of the nuclear fraction, as elution from MBD2-Fc-bound beads requires heat and proteinase K digestion. Due to the limited mass, the organellar fraction was not run on PFGE, but subsequent analysis with the TapeStation indicated DNA >50 kb (data not shown). The organellar DNA obtained with differential centrifugation has an average size of ~20 kb, likely caused by the extended organellar isolation protocol and the subsequent column-based DNA extraction and concentration. Gradient-based organellar isolation and alternate DNA extraction methods may maintain larger DNA fragment sizes. Regardless, DNA of the size obtained in this protocol may be used to generate 10- or 15-kb sequencing reads if care is taken during the library preparation.
Figure 1: A Comparative View of Two Methods to Enrich for Plant Organellar DNA. A traditional DC protocol (A) is contrasted with the MF protocol (B). It is recommended to avoid freezing and thawing the samples; however, steps at which the samples may be stored long-term are indicated with dashed arrows (A and B). Key differences between the protocols are highlighted in red (B). (C) The table compares the methods in terms of tissue input, number of plants required, DNA output, and resulting DNA size.
Figure 2: Assessment of Nuclear DNA Contamination in Organellar DNA Isolated Using two Methods. (A) The Cq (Y-axis) is the PCR cycle at which the fluorescence from the target amplification increases above the background fluorescence level. ACTIN is a nuclear-specific target gene, and NAD3 and PSBB are mitochondria- and chloroplast-specific, respectively. The error bars indicate the standard deviation among three technical replicates of each sample. DC – differential centrifugation method, Unmethylated – fraction of DNA not bound by MBD2-Fc, Total gDNA – genomic DNA not treated with MBD2-Fc, Methylated – fraction of DNA bound by MBD2-Fc.
(B) The table shows the Cq values, which are shown on the graph in (A), and the fold enrichment of the organellar amplicons relative to ACTIN. *Fold enrichment = 2^(Cq ACTIN – Cq Target). The formula assumes a perfect efficiency of 2 for each primer set, since the minor deviation of each primer set from 2 is negligible and would have little effect on the calculation and overall trend (ACTIN = 1.961, NAD3 = 1.95, and PSBB = 1.989). Primer efficiencies were evaluated by making a standard curve with a series of five 1:10 dilutions of total genomic DNA.
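Primer efficiencies such as those listed above are commonly derived from the slope of a standard curve of Cq versus log10 of template input, using efficiency = 10^(–1/slope). The Python sketch below illustrates that general calculation under stated assumptions: the exact fitting procedure used for these primer sets is not described in the legend, and the Cq values in the example are synthetic, not measured data.

```python
import math


def amplification_efficiency(log10_inputs, cq_values):
    """Estimate per-cycle amplification efficiency from a standard curve.

    Fits Cq = slope * log10(input) + intercept by ordinary least squares
    and returns 10 ** (-1 / slope); a value of 2.0 means perfect doubling.
    """
    n = len(log10_inputs)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)


# Synthetic example: five 1:10 dilutions with perfect doubling per cycle,
# so each tenfold dilution delays the Cq by log2(10) cycles.
log10_inputs = [0, -1, -2, -3, -4]
cq_values = [20.0 + math.log2(10) * i for i in range(5)]
print(amplification_efficiency(log10_inputs, cq_values))
```

An efficiency slightly below 2, as reported for the three primer sets here, corresponds to a standard-curve slope slightly steeper than the ideal value of about –3.32.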
Figure 3: Read Mapping and Theoretical Coverage of Chloroplast and Mitochondrial Genomes. Percentage of reads mapped to the mitochondrial (A) or chloroplast (B) Chinese Spring reference genomes. Corresponding theoretical coverage of the Chinese Spring mitochondrial (C) or chloroplast (D) reference genomes, assuming genome sizes of 450 and 135 kb, respectively, calculated using the total read numbers and the percentage of reads mapping to the different genomes. Genome-wide distribution of coverage for organellar DNA from the MF method (E and G) or the DC method (F and H). The data in panels E–H are from the Chinese Spring etiolated sample, but all other samples showed a similar trend. (I) Average, lowest, and highest per-base coverage for all samples in panels A–D. Sample labels including "E" designate etiolated samples, and "NE" designates non-etiolated samples. DC indicates DNA isolated with the differential centrifugation method and Unmethylated indicates DNA that is in the unmethylated fraction after pulldown with MBD2-Fc (MF protocol). Samples labeled “Chris” designate wheat Triticum aestivum ‘Chris.’ CS designates samples of wheat Triticum aestivum ‘Chinese Spring.’ Note: Due to sequence homology between the chloroplast, mitochondria, and nuclear genomes resulting from ancient gene transfer between the organellar genomes as well as between the organellar and nuclear genomes, a small percentage of raw reads may map to multiple genomes. In addition, reads that do not map to either organellar reference genome are not represented in this figure. Hence, the percentages displayed here (A and B) do not total 100%.
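The theoretical coverage described in this legend can be sketched as a simple calculation. In the Python example below, the genome sizes come from the legend, whereas the read length, the read count, and the mapped fractions are assumed round numbers for illustration; in particular, the read length that entered the original calculation is not stated and trimmed reads will be shorter than the nominal length.

```python
def theoretical_coverage(total_reads: int, fraction_mapped: float,
                         read_length_bp: int, genome_size_bp: int) -> float:
    """Average fold coverage = mapped bases / genome size."""
    mapped_bases = total_reads * fraction_mapped * read_length_bp
    return mapped_bases / genome_size_bp


MT_GENOME_BP = 450_000  # mitochondrial genome size assumed in the legend
CP_GENOME_BP = 135_000  # chloroplast genome size assumed in the legend

# Assumed illustrative inputs: 1,000,000 reads of 300 bp, with 20% mapping
# to the mt genome and 80% to the cp genome.
print(theoretical_coverage(1_000_000, 0.20, 300, MT_GENOME_BP))
print(theoretical_coverage(1_000_000, 0.80, 300, CP_GENOME_BP))
```

The same function can be used to estimate how many libraries may be pooled in one lane before the expected coverage of either genome drops below a chosen threshold.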
Figure 4: PE Read Mapping to the Wheat Nuclear Genome. Percentage of categories of PE read mapping types to the mitochondrial (A), chloroplast (B), or nuclear (C) Chinese Spring reference genomes. – E designates etiolated samples and – NE designates non-etiolated samples. DC indicates DNA isolated with the differential centrifugation method, Unmethylated indicates DNA that is in the unmethylated fraction after pulldown with MBD2-Fc in the MF protocol, and Methylated designates the nuclear fraction after MBD2-Fc pulldown. Samples labeled “Chris” designate wheat Triticum aestivum ‘Chris.’ CS designates samples of wheat Triticum aestivum ‘Chinese Spring.’ Unmapped reads are not shown.
Figure 5: Examination of DNA Quality Using PFGE. Wheat total genomic DNA (lane 2), wheat organellar DNA obtained from differential centrifugation (lane 3), and the nuclear fraction after MF with the MBD2-Fc pulldown approach (lane 4) were subjected to PFGE on a 1% agarose gel with a 1 kb extended ladder used as a marker (lanes 1 and 5).
Buffer Name | Recipe | Notes | Method |
STE Buffer | 400 mM sucrose, 50 mM Tris pH 7.8, 20 mM EDTA pH 8.0, 0.6% (w/v) polyvinylpyrrolidone (PVP), 0.2% (w/v) bovine serum albumin (BSA), 0.1% (v/v) β-mercaptoethanol (BME) | Buffer mix containing only sucrose, Tris, and EDTA can be made up to a month in advance and kept at 4°C. PVP, BSA, and BME should be added fresh to an aliquot of the required amount of buffer just before use. | Method #1 |
ST Buffer | 400 mM sucrose, 50 mM Tris pH 7.8, 0.6% (w/v) polyvinylpyrrolidone (PVP), 0.1% (w/v) bovine serum albumin (BSA) | Buffer mix containing only sucrose and Tris can be made up to a month in advance and kept at 4°C. Note that the ST buffer does not contain EDTA or BME, and contains a lower concentration of BSA. | Method #1 |
DNase stock | DNase dissolved in 0.15 M NaCl to a stock concentration of 2 mg/ml | Store 200 μl aliquots at -20°C. To prepare the DNase working solution (200 μl of DNase solution per sample), see the DNase working solution entry below. See the full protocol for details of the DNase digestion. The DNase working solution should be prepared fresh. To stop the DNase reaction, a 400 mM EDTA pH 8.0 solution is required (final concentration needed to stop the reaction is 0.2 M EDTA, see full protocol for details). | Method #1 |
DNase working solution | 0.25 mg/ml DNase and 20 mM MgCl2 in ST Buffer | Prepare fresh, 200 μl per sample. Concentrations shown are for the final reaction volume, so mix: 62.5 μl of 2 mg/ml DNase (based on the final 500 μl reaction volume), 4 μl of 1 M MgCl2 (based on the 200 μl DNase solution volume), and 133.5 μl of ST buffer for a final volume of 200 μl. | Method #1 |
Lysis Buffer | 20 mM EDTA pH 8.0; 10 mM Tris pH 7.9; 500 mM Guanidine-HCl; 200 mM NaCl; 1% Triton X-100; 0.5 mg/ml lysing enzymes from Trichoderma harzianum | Mix all ingredients except for lysing enzymes and store at room temperature. Lysing enzymes should be added fresh to a small aliquot for immediate use. | Method #2 |
Table 1: Recipes of homemade buffers and working stocks.
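The volumes listed for the DNase working solution in Table 1 follow from the standard dilution relationship C1 × V1 = C2 × V2. The short Python sketch below reproduces that arithmetic; the function name is illustrative, and the concentrations and reference volumes are taken from the table entry.

```python
def stock_volume_ul(final_conc: float, final_volume_ul: float,
                    stock_conc: float) -> float:
    """Volume of stock needed so that final_conc is reached in final_volume_ul.

    final_conc and stock_conc must be in the same units (C1 * V1 = C2 * V2).
    """
    return final_conc * final_volume_ul / stock_conc


# DNase: 0.25 mg/ml final in the 500 ul reaction, from a 2 mg/ml stock.
dnase_ul = stock_volume_ul(0.25, 500, 2.0)
# MgCl2: 20 mM in the 200 ul working solution, from a 1 M (1000 mM) stock.
mgcl2_ul = stock_volume_ul(20, 200, 1000)
# ST buffer makes up the remainder of the 200 ul working solution.
st_buffer_ul = 200 - dnase_ul - mgcl2_ul

print(dnase_ul, mgcl2_ul, st_buffer_ul)
```

Scaling to several samples only requires multiplying each volume by the number of samples, plus any excess allowed for pipetting loss.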
Concentration Worksheet | |||||||
SAMPLE NAME | Empty Device Weight (g) | Weight of Filled Device (g) | Filled Volume (μl; filled weight minus empty weight) | Weight After 1st Spin (20 min*, g) | Volume After 1st Spin (μl; weight after 1st spin minus empty weight) | Weight After 2nd Spin (X min*, g) | Volume After 2nd Spin (μl; weight after 2nd spin minus empty weight) |
Note that the actual recovered volume will be a few μl less than the calculated volume. |
Table 2: Concentration Worksheet.
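The worksheet in Table 2 estimates the sample volume from the weight of the concentration device. A minimal Python sketch of that conversion is given below; it assumes that the sample has a density of approximately 1 g/ml (so that 1 mg corresponds to about 1 μl), and the function name and example weights are hypothetical.

```python
def volume_ul(empty_weight_g: float, current_weight_g: float,
              density_g_per_ml: float = 1.0) -> float:
    """Estimate the liquid volume (ul) in a device from its weight.

    Assumes an aqueous sample with a density close to 1 g/ml.
    """
    return (current_weight_g - empty_weight_g) / density_g_per_ml * 1000.0


# Hypothetical weights for one sample.
empty = 1.000
print(volume_ul(empty, 1.450))  # volume when filled
print(volume_ul(empty, 1.120))  # volume after the first spin
print(volume_ul(empty, 1.035))  # volume after the second spin
```

As noted in the worksheet, the volume actually recovered is expected to be a few μl lower than the calculated value.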
Name | Genome Specificity | Gene Sequence Source | Sequence (5’ – 3’) |
Ta_ACTIN – F | Nuclear | Gramene Scaffold IWGSC_CSS_1AS_scaff_3272162: 10,663-12,557 | CAGGTATCGCTGACCGTATGA |
Ta_ACTIN – R | Nuclear | Same as above | GAAGGTAGGGCTGAACAAGAAAC |
Ta_NAD3 – F | Mitochondrial | NCBI accession EU534409.1 | GGTGATGCCAGAAGTCGTTT |
Ta_NAD3 – R | Mitochondrial | Same as above | CAGATCAATCTTGTTAGGAGGTACTG |
Ta_PSBB – F | Chloroplast | NCBI accession KJ592713.1 | GCTACCTTTGCTTTGCTCTTCT |
Ta_PSBB – R | Chloroplast | Same as above | GCTGCCTGTTTCCTTGTAGTT |
Table 3: List of qPCR Primers.
To date, most organellar sequencing studies center on traditional DC methods to enrich for specific DNA. Methods to isolate organelles from diverse plants have been described, including moss40; monocots such as wheat15 and oats11; and dicots such as Arabidopsis11, sunflower17, and rapeseed14. Most protocols focus on leaf tissue13,14,15,16,17, with some having been adapted for a variety of tissue types, including seeds11. The isolation of organelles from protoplasts has also been demonstrated41. However, this is not amenable to all systems, nor is it feasible when the tissue of interest is limited. Many of these organellar isolation methods were designed to recover intact organelles for specific experiments, such as physiological studies. These protocols are cumbersome and typically require the use of density gradients, such as sucrose or Percoll gradients, which are very efficient at isolating specific organellar fractions but require a large tissue input (i.e., in excess of 5 g and upwards of kilograms, depending on the tissue type). However, the DC method may be optimized to enrich for specific cellular fractions, such as the mitochondria or chloroplast, by changing spin speeds and density gradients. In contrast, the MF approach requires far less starting material (20 mg), but mitochondrial and plastid DNAs will be present per their relative abundances in the tissue used for DNA extraction. Nonetheless, the MF protocol offers an alternative approach for isolating mixed organellar DNA and is particularly beneficial when starting with small amounts of tissue.
To assess sample purity following organelle isolation, most studies to date use only end-point PCR and gel electrophoresis11,12. This gives a fair qualitative measure of sample purity; however, low levels of amplification may not be visible on an agarose gel. Few reports include more quantitative measures of quality control, such as qPCR14. For a quantitative assessment of the purity of the DNA isolated by each method, we used qPCR and sequencing to determine how much nuclear DNA remains in the sample, as well as the relative proportions of mitochondrial versus chloroplast DNA. Both methods evaluated here are efficient at removing nuclear DNA, and both yield a mix of mitochondrial and chloroplast DNA, albeit in different proportions.
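One common way to turn qPCR Ct values into a purity estimate is the delta-Ct calculation sketched below. This is an illustration under stated assumptions, not necessarily the calculation used in this study: it assumes that all primer pairs amplify with the same per-cycle efficiency (perfect doubling by default), and the function names and the Ct values in the usage example are hypothetical.

```python
def relative_abundance(ct_target: float, ct_reference: float,
                       efficiency: float = 2.0) -> float:
    """Abundance of a target amplicon relative to a reference amplicon.

    Assumes both primer pairs amplify with the same per-cycle
    efficiency (2.0 means perfect doubling each cycle).
    """
    return efficiency ** (ct_reference - ct_target)


def fold_enrichment(ct_org_enriched: float, ct_nuc_enriched: float,
                    ct_org_total: float, ct_nuc_total: float,
                    efficiency: float = 2.0) -> float:
    """Change in the organellar:nuclear ratio after enrichment.

    Compares the ratio in the enriched fraction with the ratio in
    unfractionated total gDNA (a delta-delta-Ct style calculation).
    """
    ratio_enriched = relative_abundance(ct_org_enriched, ct_nuc_enriched, efficiency)
    ratio_total = relative_abundance(ct_org_total, ct_nuc_total, efficiency)
    return ratio_enriched / ratio_total


if __name__ == "__main__":
    # Hypothetical Ct values, invented purely for illustration.
    print(fold_enrichment(ct_org_enriched=18.0, ct_nuc_enriched=28.0,
                          ct_org_total=20.0, ct_nuc_total=22.0))
```

Ratios obtained this way describe the relative copy number of the amplified targets. They do not account for differences in genome size or in primer efficiency, so they complement the sequencing-based estimates rather than replace them.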
Growing plants in the dark (etiolation) is reported to facilitate organellar isolation due to a reduction of phenolics13. However, in this comparison, we did not find an appreciable advantage to working with etiolated tissue over light-grown samples. Although the proportion of specialized chloroplasts will likely be higher in light-grown plants, the total plastid number, as reflected in the proportion of reads mapping to the chloroplast genome, was unchanged under the different light conditions. Therefore, for downstream functional analyses, such as the assessment of heteroplasmy in different tissues or under different stressors or for expression analyses, we recommend performing genomic sequencing on plants grown under physiologically relevant conditions.
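The proportion of reads mapping to each genome, used above as a proxy for plastid abundance, is a simple normalization of per-genome mapped read counts. The sketch below assumes that such counts have already been obtained from an alignment; the function name and the counts in the usage example are hypothetical and are not results from this study.

```python
def read_fractions(mapped_reads: dict) -> dict:
    """Convert per-genome mapped read counts into fractions of all mapped reads."""
    total = sum(mapped_reads.values())
    if total <= 0:
        raise ValueError("no mapped reads")
    return {genome: count / total for genome, count in mapped_reads.items()}


if __name__ == "__main__":
    # Hypothetical read counts for two growth conditions (illustration only).
    samples = {
        "etiolated": {"nuclear": 400, "mitochondrial": 250, "chloroplast": 350},
        "light-grown": {"nuclear": 420, "mitochondrial": 230, "chloroplast": 350},
    }
    for condition, counts in samples.items():
        fractions = read_fractions(counts)
        print(condition, {g: round(f, 3) for g, f in fractions.items()})
```

Comparing the chloroplast fraction between conditions in this way only reflects relative read abundance, so it should be interpreted alongside the total number of mapped reads per library.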
For application with short-read sequencing technologies, both techniques compared here yield adequate DNA quantity and quality. However, to achieve long reads of >20 kb for single-molecule sequencing applications, a greater amount of higher-quality DNA is necessary. For instance, >1 µg of pure organellar wheat DNA with a size of >20 kb is ideally required for in-house, low-input protocols for 20-kb insert library preparations42. New user-developed, low-input protocols may reduce DNA requirements (e.g., to 50 ng or even less20), but the challenge remains to have high-quality, high-molecular weight DNA going into the library preparations. It is essential that a majority of the DNA is >20 kb, as smaller fragments will be preferentially inserted into the SMRTbell and skew the size distribution of the library43. We tested several homemade and commercial DNA extraction protocols (data not shown). For wheat leaf tissue, the best balance between DNA quantity and quality, particularly length, was obtained using a commercial kit27,29. Depending on the plant species and tissue of interest, alternative extraction protocols may be equally suited or more fruitful. Nonetheless, we conclude that the total extraction of high-molecular weight genomic DNA >50 kb in size, followed by fractionation with the MBD2-Fc pulldown approach28, is amenable to long-read sequencing from limited starting material. Future work should test the limits of the starting material required following fractionation for long-insert library preparation and subsequent long-read sequencing. Critically, this approach could provide a robust method to isolate DNA from a subsample of a single leaf that is suitable for long-read sequencing, without whole-genome amplification. We anticipate that this approach will be easily adaptable to additional tissue types and broadly applicable to other plant species. It will be particularly useful in situations where the tissue amounts are limiting, such as sequencing at individual generations in a crossing scheme or in rarer tissue types.
The authors have nothing to disclose.
We would like to acknowledge funding from the United States Department of Agriculture-Agricultural Research Service and from the National Science Foundation (IOS 1025881 and IOS 1361554). We thank R. Caspers for greenhouse maintenance and plant care. We also thank the University of Minnesota Genomics Center, where the Illumina library preparations and sequencing were performed. We are also grateful for the comments from the journal editors and four anonymous reviewers that further strengthened our manuscript. We also thank OECD for a fellowship to SK to integrate these protocols for collaborative projects with colleagues in Japan.
Name | Company | Catalog Number | Comments |
2-mercaptoethanol (beta-mercaptoethanol; BME) | Sigma Aldrich | M3148-100ml | |
2-propanol (Isopropyl alcohol/isopropanol), bioreagent | Sigma Aldrich | I9516 | |
agarose, Bio-Rad Certified Megabase agarose | Bio-Rad | 1613108 | |
analytical balance | Mettler Toledo | AB54-S | |
balance | Mettler Toledo | PB1502-S | |
bovine serum albumin (BSA) | Sigma Aldrich | B4287-25G | |
Ceramic grinding cylinders, 3/8in x 7/8in | SPEX SamplePrep | 2183 | |
Cryogenic Blocks compatible with tissue homogenizer for holding 50 ml tubes | SPEX SamplePrep | 2664 | |
DNaseI | Sigma | DN25 | |
ethanol, absolute | Decon Laboratories | 2716 | |
Ethylenediamine Tetraacetic Acid (EDTA), 0.5M Solution, pH8.0 | Fisher | BP2482-500 | |
gel imaging system | |||
gel stain | Such as GelRed or Ethidium Bromide | ||
grinding pestle, wide tip for 2 ml conical tubes | |||
Guanidine-HCl, 8M solution | ThermoFisher | 24115 | |
LightCycler 480 SYBR Green I Master | Roche | 4707516001 | |
liquid nitrogen | |||
Lysing enzymes from Trichoderma harzianum | Sigma | L1412 | |
Magnesium Chloride | G Bioscience | 24115 | |
magnetic rack | ThermoFisher | A13346 | |
microcentrifuge tubes, LoBind 1.5 ml | Eppendorf | 22431021 | |
microcentrifuge tubes, standard nuclease-free 1.5 ml | Eppendorf | ||
microcentrifuge, refrigerated | Sorvall | Legend X1R | Or equivalent product, must be capable of reaching at least 18,000 x g with rotors for 50 ml tubes, Oak Ridge tubes, and 1.5 ml tubes |
microcentrifuge, room temperature | Eppendorf | 5424 | Or equivalent product, must be capable of reaching at least 18,000 x g with rotor for 1.5 ml and 2 ml microcentrifuge tubes |
Microcon DNA Fast Flow Centrifugal Filter Units | EMD Millipore | MRCFOR100 | |
Miracloth, 1 square per sample cut to fit funnel | EMD Millipore | 475855 | |
NEBNext Microbiome DNA Enrichment Kit | New England Biolabs | E2612L | |
parafilm | Parafilm M | PM992 | |
plastic pots and trays | |||
polyvinylpyrrolidone (PVP) | Fisher | BP431-100 | |
Proteinase K | Qiagen | 19131 | |
Pulsed-Field Gel Electrophoresis rig (e.g. CHEF DR III) | Bio-Rad | 1703697 | |
purification beads, Agencourt AMpureXP beads | Beckman Coulter | A63881 | |
QIAamp DNA Mini Kit | Qiagen | 51304 | |
Qiagen Genomic-tip 20/G DNA Extraction Kit | Qiagen | 10223 | |
Qiagen Buffer EB (elution buffer) | Qiagen | 19086 | |
Qiagen DNA Extraction Buffer Set | Qiagen | 19060 | |
QiaRack | Qiagen | 19015 | |
qPCR machine (e.g. Roche Light Cycler 480) | Roche | ||
qPCR plate sealing film | Roche | 4729757001 | |
qPCR plate, 96 well plate | Roche | 4729692001 | |
Qubit assay tubes | Life Technologies | Q32856 | |
Qubit Broad Range assay kit | Life Technologies | Q32850 | |
Qubit High Sensitivity assay kit | Life Technologies | Q32851 | |
RNaseA | Qiagen | 19101 | |
Serological pipettes (20 ml) and pipet-aid | Fisher | 13-678-11 | |
Small funnels, 1 per sample | |||
Sodium Chloride | Ambion | AM9759 | |
Soft paintbrush, 2 per sample | |||
SPEX SamplePrep 2010 Geno/Grinder or another type of tissue homogenizer | SPEX SamplePrep | | Or another comparable tissue homogenizer. If a tissue homogenizer is not available, grinding in a pre-chilled mortar and pestle will suffice (see protocol for details). However, a homogenizer gives more consistent results and reduces the total homogenization time. |
Sucrose | Omnipure | 8550 | |
TBE | |||
thermomixer | |||
Tris | Sigma | T2819-100ml | |
Triton X-100 | Promega | H5142 | |
tube rotator |||
tubes, 50 mL conical polypropylene | Corning | 352070 | |
tubes, 50 ml high-speed polypropylene | ThermoScientific/Nalgene | 3119-0050 | e.g. Nalgene Oakridge tubes or equivalent |
vermiculite | |||
water bath | |||
water, sterile and certified Nuclease-free | Fisher | 1481 | |
water, sterile milliQ |||