Environmental DNA assays require rigorous design, testing, optimization, and validation before the collection of field data can begin. Here, we present a protocol to take users through each step of designing a species-specific, probe-based qPCR assay for the detection and quantification of a target species’ DNA from environmental samples.
New, non-invasive methods for detecting and monitoring species presence are being developed to aid in fisheries and wildlife conservation management. The use of environmental DNA (eDNA) samples for detecting macrobiota is one such group of methods that is rapidly becoming popular and being implemented in national management programs. Here we focus on the development of species-specific targeted assays for probe-based quantitative PCR (qPCR) applications. Using probe-based qPCR offers greater specificity than is possible with primers alone. Furthermore, the ability to quantify the amount of DNA in a sample can be useful in our understanding of the ecology of eDNA and the interpretation of eDNA detection patterns in the field. Careful consideration is needed in the development and testing of these assays to ensure the sensitivity and specificity of detecting the target species from an environmental sample. In this protocol we delineate the steps needed to design and test probe-based assays for the detection of a target species, including creation of sequence databases, assay design, assay selection and optimization, testing of assay performance, and field validation. Following these steps will help achieve an efficient, sensitive, and specific assay that can be used with confidence. We demonstrate this process with our assay designed for populations of the mucket (Actinonaias ligamentina), a freshwater mussel species found in the Clinch River, USA.
Researchers and managers are increasingly interested in the use of environmental DNA assays for species detection. For three decades, quantitative or real-time PCR (qPCR/rtPCR) has been used in numerous fields for the sequence-specific detection and quantification of nucleic acids1,2. Within the relatively new field of eDNA research, use of these assays with a standard curve for quantification of copies of target DNA per volume or weight of eDNA sample has now become routine practice. Mitochondrial DNA sequences are generally targeted in eDNA assays because the mitochondrial genome is present in thousands of copies per cell, but assays for nuclear DNA or RNA sequences are also possible. It is vital to understand that published assays for eDNA samples are not always equal in performance. An assay’s reliability in detecting only the DNA of a target species (i.e., specificity) and in detecting low quantities of target DNA (i.e., sensitivity) may vary considerably due to differences in how the assay was designed, selected, optimized, and tested. Reporting of quantitative measures of assay performance has previously been largely overlooked, but standards to improve transparency in assay development are now emerging3,4,5,6,7,8.
Optimization and reporting of assay performance aid in study design and interpretation of eDNA survey results. Assays that cross-react with non-target species DNA could lead to false positive detections, while assays with poor sensitivity may fail to detect the target species DNA even when it is present in the sample (false negatives). An understanding of assay sensitivity and specificity will help inform the sampling effort needed to detect rare species. Because there are many natural sources of variation in eDNA, studies must limit controllable sources of variation as much as possible, including by fully optimizing and characterizing the eDNA assay3.
Conditions that directly affect an assay’s specificity or sensitivity will change the assay’s performance. This can occur under different laboratory conditions (e.g., different reagents, users, or machines). Therefore, this protocol should be revisited when applying an assay under new conditions. Even assays well-characterized in the literature should be tested and optimized when adopted by a new laboratory or when using different reagents (e.g., master-mix solution)5,9. Assay specificity may change when the assay is applied to a different geographic region, because the samples come from a new biotic community that may include non-target species the assay has not been tested against, and because genetic variation may occur in the target species. Again, the assay should be re-assessed when used in a new location. Field conditions differ from laboratory conditions because PCR inhibitors are more likely to be present in field samples. PCR inhibitors directly affect the amplification reaction and thus affect assay performance. For this reason, an internal positive control is required when developing an eDNA assay.
Finally, environmental conditions in the field can affect the target species’ DNA molecules and their capture through DNA degradation, transport, and retention. Furthermore, different protocols for DNA collection and extraction vary in their efficiency and ability to retain DNA. However, it is important to note that these processes affect the detectability of eDNA but not a molecular assay’s performance. Thus, detectability of DNA from the target species in field samples is a function of both the technical performance of the qPCR assay and the field conditions and collection, storage, and extraction protocols. When using a well-characterized and highly performing assay, users can feel confident in the assay’s capabilities, allowing researchers to focus on understanding the factors external to the assay (i.e., environmental variables, differences in capture or extraction protocols) that affect eDNA detection.
Here we focus specifically on assay technical performance through rigorous design and optimization. We demonstrate the protocol using a probe-based assay developed for the detection of a freshwater mussel, the mucket (Actinonaias ligamentina), from water sampled in the Clinch River, USA. Recently, Thalinger et al. (2020) presented guidelines for validation of targeted eDNA assays. Assay design following our protocol will bring an assay to level 4 of the Thalinger et al. validation scale6, plus an additional step towards level 5. At this point an assay’s technical performance will be optimized and it will be ready for regular use in laboratory and field applications. Further use of the assay in laboratory, mesocosm, and field experiments can then address questions regarding eDNA detection and factors influencing detectability, the final steps for level 5 validation6.
1. Generation of a sequence database of mitochondrial DNA sequences from target and non-target species of interest
2. Assay design
3. Assay screening and optimization
In designing a species-specific qPCR assay for the mucket (A. ligamentina), available sequences of all Unionidae species in the Clinch River were downloaded. Closely related species such as Lampsilis siliquoidea were also included in the reference database even though they are not found in the same river. Not all species in the river system of interest were found in GenBank, so additional species were sequenced in house. Sequences were aligned using Geneious software, and PrimerQuest (IDT) software was used to design multiple assays. Five primer and probe sets were added to the alignment for visual assessment (Figure 2). They were then tested in silico using Primer-BLAST, after which they were ordered for further testing in vitro. In the laboratory, all assays were tested using DNA extractions of 27 available species to verify specificity. One assay (A.lig.1) successfully amplified only the target species (Table 1; Table 2). This assay, which has an amplicon length of 121 base pairs, moved forward for further testing of assay efficiency, LOD, and LOQ. Table 3 shows the sequence used for the A. ligamentina synthetic DNA standard. Figure 3A and Figure 3B show the results of a successful assay with good efficiency and r2 values. Figure 3C and Figure 3D show an assay whose standard curve has a poor efficiency; this assay was discarded. The LOD and LOQ for the selected assay (A.lig.1) were both found to be 5.00 copies/reaction using the discrete method described in Klymus et al.5. The IPC that was multiplexed with the assay (Tables 3-6) did not affect the A. ligamentina assay’s standard curve. The IPC we use is a fragment of the mouse HemT transcript. This assay was predesigned by IDT for another application, but we adapted it for use as an IPC in our lab’s eDNA applications.
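The placement of the A.lig.1 oligos on the synthetic standard can be checked with a few lines of Python. This is only an illustrative sketch, not the Geneious or Primer-BLAST workflow described above. It uses the primer and probe sequences from Table 1 and the first 140 bases of the standard from Table 3; the function and variable names are hypothetical.

```python
# Illustrative sketch: locate the A.lig.1 primers and probe on the synthetic
# standard and report the amplicon length. Sequences are copied from Table 1
# and Table 3; all function and variable names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

FORWARD = "CCCTCATCACGTACCTCTTAATC"
REVERSE = "GGAATGCCCATAATTCCAACTTTA"
PROBE = "TTCTTGAACGTAAAGCCCTCGGGT"

# First 140 bases of the A. ligamentina standard (Table 3).
STANDARD_START = (
    "CCCTCATCACGTACCTCTTAATCCTATTAGGTGTCGCATTTTTCACTCTTCTTGAACGTA"
    "AAGCCCTCGGGTACTTTCAAATCCGAAAAGGCCCAAATAAAGTTGGAATTATGGGCATTC"
    "CCCAACCATTAGCAGATGCT"
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def locate_assay(template, forward, reverse, probe):
    """Find exact-match positions (0-based) of each oligo on the template."""
    forward_start = template.find(forward)
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement on the strand given in the table.
    reverse_start = template.find(reverse_complement(reverse))
    probe_start = template.find(probe)
    if min(forward_start, reverse_start, probe_start) < 0:
        raise ValueError("at least one oligo has no exact match on the template")
    amplicon_end = reverse_start + len(reverse)
    return {
        "forward_start": forward_start,
        "probe_start": probe_start,
        "reverse_start": reverse_start,
        "amplicon_length": amplicon_end - forward_start,
    }


if __name__ == "__main__":
    print(locate_assay(STANDARD_START, FORWARD, REVERSE, PROBE))
```

For these sequences the sketch reports an amplicon length of 121 bp, in agreement with the value given above. Exact string matching is a deliberate simplification; it does not model mismatches, melting temperature, or secondary structure.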
A successful qPCR run should meet certain criteria for each measure of performance (i.e., standard curve amplification, genomic DNA positive control, no template control and internal positive control). The target assay standards should have exponential amplification curves. These curves should reach an end point plateau if allowed to run enough cycles. This is indicative of the fluorescent probe being completely consumed during the reaction, and fluorescence levels reaching a maximum limit. Later amplifying standards may not reach a plateau in 40 cycles. The positive controls (genomic DNA and IPC) should have the same pattern. Unknowns may or may not amplify, but amplification in unknowns should also have an exponential pattern and an endpoint plateau (Figure 5).
In a quality qPCR, successive standard dilutions amplify at evenly spaced Cq values, approximately 3.3 cycles apart for each 10-fold difference in concentration. Replicates of each standard dilution amplify in a tight group with nearly the same Cq (reflected in the r2 value). All standard dilutions should exhibit amplification (Figure 3A). In a poor qPCR, standards may show non-exponential curve shapes or uneven spacing of Cq values between dilutions, may not reach an endpoint plateau, or may include dilutions that do not amplify at all (Figure 3D).
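The spacing of roughly 3.3 cycles follows from the arithmetic of amplification: if each cycle multiplies the product by (1 + E), a dilution by a factor D is made up after log(D) / log(1 + E) cycles. The short sketch below illustrates this relationship; the function name is hypothetical.

```python
import math


def cq_spacing(dilution_factor, efficiency):
    """Expected Cq difference between successive dilutions.

    efficiency is expressed as a fraction (1.0 means 100%, i.e. perfect
    doubling of product in every cycle).
    """
    if dilution_factor <= 1 or efficiency <= 0:
        raise ValueError("dilution_factor must exceed 1 and efficiency must be positive")
    return math.log(dilution_factor) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    for eff in (0.90, 1.00, 1.10):
        print(f"efficiency {eff:.0%}: {cq_spacing(10, eff):.2f} cycles per 10-fold dilution")
```

At 100% efficiency the spacing for a 10-fold series is log2(10), about 3.32 cycles. Lower efficiencies widen the spacing and higher apparent efficiencies narrow it.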
The important parameters for a standard curve are efficiency, r2, slope, and y-intercept. Following published guidance15,22, efficiency should fall between 90% and 110%, with ideal values near 100%, and r2 values should be above 0.98, with ideal results approaching 1.0. Slope values should be between -3.2 and -3.5, with ideal results near -3.3 (reference 22). The y-intercept should fall between a Cq of 34 and 41, with ideal results having a Cq of 37.0. The y-intercept is the predicted Cq of a reaction with 1 copy of the target sequence, the smallest unit that can be measured in a single qPCR. Unknowns with Cq values greater than the y-intercept are likely to be inhibited. Running more than 40 cycles of PCR may be necessary to detect the target in case of inhibition or an inefficient primer set; however, quantification is not possible under these circumstances, and additional negative controls that lack the target sequence but contain total DNA similar to the unknowns should be run to rule out amplification from non-specific sources.
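These parameters are linked by the standard relationship E = 10^(-1/slope) - 1, and all of them come from a linear regression of Cq on log10(copies). The sketch below is a minimal illustration of how they can be computed and checked against the ranges stated above; it is not the instrument software's implementation, the function names are hypothetical, and the example Cq values are invented for demonstration.

```python
import math


def efficiency_from_slope(slope):
    """Amplification efficiency (1.0 = 100%) implied by a standard-curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def fit_standard_curve(copies, cqs):
    """Ordinary least-squares fit of Cq against log10(copies per reaction)."""
    if len(copies) != len(cqs) or len(copies) < 3:
        raise ValueError("need at least three paired (copies, Cq) values")
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cqs) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cqs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cqs))
    if sxx == 0 or syy == 0:
        raise ValueError("standards must span more than one concentration and Cq")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # predicted Cq at 1 copy (log10 = 0)
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": (sxy * sxy) / (sxx * syy),
        "efficiency": efficiency_from_slope(slope),
    }


def within_guidelines(curve):
    """Check a fitted curve against the acceptance ranges given in the text."""
    return (
        0.90 <= curve["efficiency"] <= 1.10
        and curve["r2"] > 0.98
        and -3.5 <= curve["slope"] <= -3.2
        and 34.0 <= curve["intercept"] <= 41.0
    )


def copies_from_cq(cq, slope, intercept):
    """Interpolate copies per reaction for an unknown from its Cq."""
    return 10 ** ((cq - intercept) / slope)


if __name__ == "__main__":
    # Invented example data: (copies per reaction, Cq).
    standards = [(100000, 20.4), (10000, 23.8), (1000, 27.1), (100, 30.5), (10, 33.8)]
    curve = fit_standard_curve([c for c, _ in standards], [q for _, q in standards])
    print(curve, within_guidelines(curve))
    print(copies_from_cq(29.0, curve["slope"], curve["intercept"]))
```

As a check against the values reported in Figure 3B, a slope of -3.349 corresponds to an efficiency of about 98.9% under this formula.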
The internal positive control (IPC) amplification in unknown samples should be compared to the IPC amplification in the no template control (NTC), in which there is no competition for reagents and no inhibitors are present. Unknowns whose IPC has a Cq 2 or more cycles greater than the average IPC Cq of the NTCs, or whose IPC does not amplify, should be considered inhibited. If no inhibitors are present in the samples, all IPC amplification curves should group tightly in the plot, with Cq values close to those of the NTC (Figure 6).
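The inhibition rule described above is simple enough to express directly. The following sketch assumes IPC Cq values have been exported per well, with `None` standing for no amplification; the function name, the data layout, and the example values are all hypothetical.

```python
from statistics import mean


def flag_inhibited(sample_ipc_cq, ntc_ipc_cq, max_shift=2.0):
    """Flag samples whose IPC is delayed or absent relative to the NTCs.

    sample_ipc_cq: dict mapping sample name -> IPC Cq, or None if the IPC
                   did not amplify.
    ntc_ipc_cq:    list of IPC Cq values from the no template controls.
    max_shift:     Cq delay (in cycles) at or above which a sample is
                   considered inhibited (2 cycles in the text).
    """
    if not ntc_ipc_cq:
        raise ValueError("at least one NTC IPC Cq is required")
    baseline = mean(ntc_ipc_cq)
    return {
        name: cq is None or (cq - baseline) >= max_shift
        for name, cq in sample_ipc_cq.items()
    }


if __name__ == "__main__":
    # Invented example values.
    ntc = [30.0, 30.2, 29.8]
    samples = {"site1_rep1": 30.1, "site1_rep2": 32.5, "site2_rep1": None}
    for name, inhibited in flag_inhibited(samples, ntc).items():
        print(name, "inhibited" if inhibited else "ok")
```

Samples flagged in this way would then be diluted and re-run, as was done for the inhibited field samples in this study.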
Finally, in situ testing of the assay was performed. Twenty water samples from the Clinch River and three field blank samples were filtered on September 25-26, 2019, within 500 meters of a mussel bed known to contain A. ligamentina. Approximately four 1 L samples of water were filtered per sampling location. Sampling locations were the bottom of the mussel bed in stream, the bottom of the mussel bed near shore, 100 m downstream of the bed in stream, 500 m downstream of the bed in stream, and 500 m downstream of the bed near shore (Figure 7). Back in the laboratory, each filter was cut in half and DNA was extracted from only one half. The remaining filter half for each sample was stored in a -80 °C freezer. Samples were then run using the A.lig.1 assay multiplexed with the IPC. Of the 23 samples, five were found to be inhibited. These samples were diluted 1:10 and the dilutions were re-run. Nineteen of the 20 field samples amplified using the designed assay. Of these 19 samples, five were above the assay’s LOD and LOQ of 5 copies/reaction. The remaining 14 samples therefore had an eDNA detection, but at a level where false negative results are likely to occur and where the assay could not confidently quantify the copy number. Nevertheless, 75 to 100% of the four biological site replicates amplified at each sampling location. Two of the three field blanks were negative, while one field blank did show amplification, emphasizing the importance of clean technique in the field.
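The distinction drawn above between amplification, detection above the LOD, and quantifiable detection can be made explicit when summarizing results. The sketch below is one possible way to do this; the category labels, function names, and example values are hypothetical, and treating a value exactly equal to a threshold as meeting it is an assumption.

```python
from collections import Counter


def classify_detection(copies_per_reaction, lod=5.0, loq=5.0):
    """Classify one qPCR result relative to the assay's LOD and LOQ.

    copies_per_reaction is None when the well did not amplify. Defaults are
    the LOD and LOQ reported for the A.lig.1 assay (5 copies/reaction).
    """
    if copies_per_reaction is None:
        return "no amplification"
    if copies_per_reaction < lod:
        return "amplified below LOD"
    if copies_per_reaction < loq:
        return "detected, not quantifiable"
    return "quantifiable"


def summarize(results, lod=5.0, loq=5.0):
    """Count samples in each category; results maps sample name -> copies or None."""
    return Counter(classify_detection(v, lod, loq) for v in results.values())


if __name__ == "__main__":
    # Invented example values, not the study's data.
    example = {"A1": 12.4, "A2": 3.1, "A3": None, "A4": 0.8}
    print(summarize(example))
```

Reporting counts by category rather than a single detection rate keeps low-level detections visible without treating them as quantitative.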
Figure 1: Workflow for mitochondrial DNA sequence database construction.
Figure 2: Sequence alignments for Clinch River mussel species with prospective primers and probes for the Actinonaias ligamentina ND1 assay. Forward primers are shown in dark green, probes in red, and reverse primers in light green.
Figure 3: Standard curve and linear regression examples. A. Example of an acceptable standard curve derived from the amplification of three replicates each of six standard dilutions. The curve shows a 10-fold standard dilution series, with the highest concentration of the standard on the left and decreasing concentrations moving to the right. The horizontal line crossing all the traces is the threshold used to determine the quantification cycle (Cq); the Cq of each trace is the cycle at which it crosses this threshold. B. Linear regression made from the standard replicates of Figure 3A. Replicates of the standard dilutions are plotted as circles and the unknowns (samples) are plotted as x’s. The efficiency is 98.9%, r2 approaches 1.0, and the slope is -3.349. C. Example of a poor standard curve derived from the amplification of three replicates each of six standard dilutions. D. Linear regression forming the standard curve for the standard replicates amplified in Figure 3C. Note the poor efficiency and r2 values, and that only 4 of the 6 standards amplified. If the standard curve does not improve after repeat runs, the problem may be a poor primer/probe set that does not amplify target DNA as expected, in which case the assay should not be considered further.
Figure 4: Examples of plate setups for LOD and LOQ standard qPCR runs. Standards used in the curve are in blue, standard concentration decreases from dark to light blue. DNA positive control in green and no template control (NTC) in yellow. Experimental standard concentrations in grey showing 24 replicates for each standard dilution. The dilution series was plated across two plates (A, B), each with a standard curve, positive control, and NTC.
Figure 5: Plate setup and amplification traces from a qPCR run. A. Plate setup. Standards are shown in blue, with darker color indicating higher standard concentration; the DNA positive control is in green, no template controls (NTC) in yellow, and sample targets in grey. B. Amplification traces from a qPCR run. Standards are shown in blue, the DNA positive control in green, no template controls in yellow, and unknowns in red.
Figure 6: Amplification traces for the Internal Positive Control (IPC). IPC traces for all unknown samples in magenta and the IPC from the no template controls (NTCs) shown in orange with triangles.
Figure 7: Map showing the eDNA collection sites at a mussel bed in the Clinch River along the Virginia/Tennessee border. Samples were collected at Wallens Bend at the bottom of the bed, 100 m downstream of the bed, and 500 m downstream of the bed. Samples were collected either in the middle of the stream (in stream) or roughly 1 – 2 meters from the shoreline (shore).
Component | Name | Sequence 5’ – 3’ | Fluorescent label |
Forward Primer | A.lig.1-f | CCCTCATCACGTACCTCTTAATC | |
Reverse Primer | A.lig.1-r | GGAATGCCCATAATTCCAACTTTA | |
Probe | A.lig.1 probe | TTCTTGAACGTAAAGCCCTCGGGT | FAM |
Table 1: The designed Actinonaias ligamentina qPCR assay (A.lig.1) including sequences for the forward and reverse primers and the probe.
Species | Amplified | In the Clinch River |
1. Actinonaias ligamentina | Yes | Yes |
2. Actinonaias pectorosa | No | Yes |
3. Amblema plicata | No | Yes |
4. Corbicula spp. | No | Yes |
5. Cumberlandia monodonta | No | Yes |
6. Cyclonaias tuberculata | No | Yes |
7. Cyprogenia stegaria | No | Yes |
8. Elliptio dilatata | No | Yes |
9. Epioblasma brevidens | No | Yes |
10. Epioblasma capsaeformis | No | Yes |
11. Epioblasma florentina aureola | No | Yes |
12. Epioblasma triquetra | No | Yes |
13. Fusconaia cor | No | Yes |
14. Fusconaia subrotunda | No | Yes |
15. Lampsilis ovata | No | Yes |
16. Lampsilis siliquoidea | No | No |
17. Lasmigona costata | No | Yes |
18. Lemiox rimosus | No | Yes |
19. Lexingtonia dolabelloides | No | Yes |
20. Medionidus conradicus | No | Yes |
21. Plethobasus cyphyus | No | Yes |
22. Pleurobema plenum | No | Yes |
23. Ptychobranchus fasciolaris | No | Yes |
24. Ptychobranchus subtentus | No | Yes |
25. Quadrula pustulosa | No | Yes |
26. Strophitus undulatus | No | Yes |
27. Villosa iris | No | Yes |
Table 2: A list of species used for the in vitro specificity testing of the A.lig.1 assay. The assay amplified genomic DNA of the target (Actinonaias ligamentina) and did not amplify any of the non-target species.
Component | Sequence 5’-3’ |
Actinonaias ligamentina standard | CCCTCATCACGTACCTCTTAATCCTATTAGGTGTCGCATTTTTCACTCTTCTTGAACGTAAAGCCCTCGGGTACTTTCAAATCCGAAAAGGCCCAAATAAAGTTGGAATTATGGGCATTCCCCAACCATTAGCAGATGCTCTAAAGCTCTTCGTAAAAGAATGAGTAACACCAACCTCCTCAAACTACCTACCCTTCATCTTAACCCCAACCACTATGTTAATTTTAGCACTTAGACTTTGACAATTATTTCCATCCTTTATANTATCATCCCAAATANTTTTTGGTATGCTCCTATTCTTGTGTATCTCCTCCCTAGCTGTTTATACAACACTTATAACAGGCTGAGCCTCAAACTCCAAATATGCCCTTTTAGGAGCTATTCGAGCCATAGCCCAAACCATTTCTTATGAGGTTACAATAAC |
IPC template (Hem-T) | CTACATAAGTAACACCTTCTCATGTCCAAAGCTCTCTGAGTGTCCCTCGAATCTCAGACGCTGTATGACAGTCTCCTTTCGTGTGAACATTCGGCTGCTCTATGTTCTCAAGGACTGCAC |
Table 3: Sequence (5’-3’) of the Actinonaias ligamentina standard and the IPC template (Hem-T) used for this assay. The sequence for the forward and reverse primers are in bold and italics, and that of the probe is underlined.
Component | Name | Sequence 5’ – 3’ | Fluorescent label |
Forward Primer | HemT-F | TCTGAGTGTCCCTCGAATCT | |
Reverse Primer | HemT-R | GCAGTCCTTGAGAACATAGAGC | |
Probe | HemT-P | TGACAGTCTCCTTTCGTGTGAACATTCG | Cy5 |
Table 4: The Internal Positive Control (IPC) assay including sequences for the forward and reverse primers and the probe.
Volume per sample (µL) | Component |
10 | Environmental Master Mix |
1 | 20 µM A.lig.1 F/R mix |
1 | 2.5 µM A.lig.1 probe |
1 | 5 µM IPC primer mix (HemT-F/R) |
0.75 | 2.5 µM IPC probe (HemT-P) |
1.5 | 1 × 10^3 concentration of the IPC template |
2.75 | H2O |
2 | Sample |
20 | Total Volume |
Table 5: The PCR mix used for the A.lig.1 assay multiplexed with the IPC assay.
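When preparing a plate, the per-reaction volumes in Table 5 are scaled to a master mix that excludes the sample. The sketch below shows that arithmetic. The volumes are taken from Table 5, while the function name and the use of a fixed number of extra reactions to cover pipetting loss are assumptions rather than part of the protocol.

```python
# Per-reaction volumes in microliters, from Table 5 (sample excluded).
PER_REACTION_UL = {
    "Environmental Master Mix": 10.0,
    "20 uM A.lig.1 F/R primer mix": 1.0,
    "2.5 uM A.lig.1 probe": 1.0,
    "5 uM IPC primer mix (HemT-F/R)": 1.0,
    "2.5 uM IPC probe (HemT-P)": 0.75,
    "IPC template": 1.5,
    "Water": 2.75,
}
SAMPLE_UL = 2.0  # added to each well separately


def master_mix_volumes(n_reactions, extra_reactions=0):
    """Scale each component to n_reactions plus a chosen number of extras.

    extra_reactions is a hypothetical allowance for pipetting loss.
    """
    if n_reactions < 1 or extra_reactions < 0:
        raise ValueError("n_reactions must be >= 1 and extra_reactions >= 0")
    total = n_reactions + extra_reactions
    return {name: volume * total for name, volume in PER_REACTION_UL.items()}


if __name__ == "__main__":
    mix_per_well = sum(PER_REACTION_UL.values())
    print(f"master mix per well: {mix_per_well} uL, plus {SAMPLE_UL} uL sample")
    for name, volume in master_mix_volumes(96, extra_reactions=4).items():
        print(f"{name}: {volume:.2f} uL")
```

Each well then receives 18 µL of master mix and 2 µL of sample, for the 20 µL total listed in Table 5.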
Step | Description | Temperature (°C) | Time |
1 | Initial Denature | 95 | 10 min |
2 | Denature | 95 | 15 sec |
3 | Annealing | 60 | 1 min |
4 | Go to Step 2, repeat 39X | | |
Table 6: Reaction conditions for the A.lig.1 assay.
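Because step 4 repeats steps 2 and 3 a further 39 times, the program in Table 6 runs 40 amplification cycles in total. The small sketch below makes that count and the resulting programmed hold time explicit; the names are hypothetical, and instrument ramping and plate-read time are not included.

```python
# Cycling program from Table 6, with times in seconds.
INITIAL_DENATURE_S = 10 * 60
DENATURE_S = 15
ANNEAL_S = 60
REPEATS = 39  # "Go to Step 2, repeat 39X"


def total_cycles(repeats):
    """Steps 2-3 run once before the first repeat, so cycles = repeats + 1."""
    return repeats + 1


def programmed_minutes(initial_s, denature_s, anneal_s, repeats):
    """Sum of programmed hold times only (no ramping or plate reads)."""
    return (initial_s + total_cycles(repeats) * (denature_s + anneal_s)) / 60.0


if __name__ == "__main__":
    print(total_cycles(REPEATS), "cycles")
    print(programmed_minutes(INITIAL_DENATURE_S, DENATURE_S, ANNEAL_S, REPEATS), "minutes")
```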
As with any study, defining the question to be addressed is the first step, and the design of the eDNA assay depends upon the scope of the study26. For instance, if the goal of the research or survey is to detect one or a few species, a targeted probe-based assay is best. If, however, the goal is to assess a larger suite or assemblage of species, high throughput sequencing metabarcoding assays are better suited. Once it is determined which approach to take, a pilot study including assay design, testing, and optimization is recommended24. Assay design starts with a list of species as described in Figure 1. This list will be the basis for understanding how well an assay performs in terms of specificity and the geographic range it might be applied to6,10. We encourage designing the assay for a specific geographic area, which enables the designer to better test the assay for cross-reactivity against other species in that area, while remaining aware of the limitations this places on extending the assay to other areas where the target species may occur24. Once the list is complete, sequences can be downloaded from public genetic databases. Since these databases are incomplete27, one should sequence as many species on the list as possible in house to complete the local reference database of sequences that will be used in assay design. Prioritize co-occurring closely related species, as these are the non-targets most likely to amplify. Focusing on all species within the same genus or family as the target species is a good place to start. Comparisons with closely related species will help identify sequence regions unique to the target species. This can help inform how the assay may perform in other systems or locations.
Mitochondrial regions are the usual choice for assay development, because more sequence information from a wider variety of species is available for mitochondrial genes that have been used in barcode of life projects, and because mitochondrial DNA is present at much greater concentration in copies/cell than nuclear DNA24,28,29. Multiple gene regions should be assessed for further assay development, as sequence coverage varies among taxa in the genetic repository databases. After this local database of reference sequences is created, a combination of manual visualization of aligned sequence data and computer software programs is used to design the primer/probe assays. One should not rely strictly on software to determine which assays to test. It is important to verify visually on alignments where the primers and probes sit on the targets and non-targets to get a better understanding of how they might act in a PCR. Finally, assay screening and optimization includes three levels (in silico, in vitro, and in situ)6,7,24,25. In silico design and testing are important for producing a short list of assays with a good chance of success, but empirical (in vitro) testing is crucial for selecting the assay with the best actual performance. In vitro optimization and testing of assays include measuring the reaction efficiency and defining the assay’s sensitivity and specificity. Limits of detection and quantification are two parameters that are often overlooked in assay development but are important for data interpretation. By running multiple replicates of the standard curves for an assay, LOD and LOQ can easily be measured1,5,30. Few studies discuss results with respect to the assay’s LOD or LOQ, but Sengupta et al. (2019) incorporate their assay’s LOD and LOQ into their data interpretation and graphics for a clearer understanding of their results31. Internal positive controls should be multiplexed into the designed assay as well.
Without testing for PCR inhibition in the samples, false negatives may occur24,32. We propose the use of a multiplexed IPC assay with the target assay as the easiest method for PCR inhibition testing23. Finally, in situ testing of the assay from field and laboratory collected samples is necessary to ensure target amplification occurs in environmental samples24.
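To make the replicate-based estimate of the limit of detection mentioned above concrete, the sketch below implements one simple discrete rule: the LOD is taken as the lowest standard concentration at which at least 95% of replicates amplify, provided every higher concentration also meets that rate. This rule and the 95% threshold are stated here as assumptions; the cited references should be consulted for the exact definitions used in this protocol. The function name and example counts are hypothetical.

```python
def discrete_lod(replicate_hits, min_rate=0.95):
    """Estimate a discrete LOD from replicated standard dilutions.

    replicate_hits maps copies per reaction -> (n_amplified, n_replicates).
    Returns the lowest concentration meeting min_rate such that all higher
    concentrations also meet it, or None if the highest concentration fails.
    """
    lod = None
    for concentration in sorted(replicate_hits, reverse=True):
        amplified, replicates = replicate_hits[concentration]
        if replicates <= 0 or not 0 <= amplified <= replicates:
            raise ValueError(f"invalid counts for {concentration} copies")
        if amplified / replicates >= min_rate:
            lod = concentration
        else:
            break  # stop at the first concentration that misses the rate
    return lod


if __name__ == "__main__":
    # Invented example: 24 replicates per standard, as in the Figure 4 layout.
    hits = {1000: (24, 24), 100: (24, 24), 10: (23, 24), 1: (9, 24)}
    print(discrete_lod(hits))
```

With 24 replicates per dilution, a single failed replicate (23 of 24, about 95.8%) still meets the 95% rate, while two failures do not.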
Limitations exist for the use of species-specific, probe-based qPCR assays with eDNA samples. For instance, the design of multiple assays for testing may be limited by sequence availability, and compromise may be necessary on aspects of assay performance. These choices must be guided by the goals of the study and must be reported with the results26. For example, if the goal is detection of a rare species and few positives are expected, an assay with imperfect specificity (i.e., amplification of non-target species) could be used if all detections will be verified by sequencing. If the goal is monitoring the geographic range of a species and eDNA concentration data are not needed, an assay with imperfect efficiency could be used and data reported only as percent detection. Furthermore, unless all potentially co-occurring non-target species are tested in the laboratory, which is rarely possible, one cannot know with absolute certainty the true specificity of an assay. For instance, our assay was designed and tested against several freshwater mussel species in the Clinch River. To use this assay in a different river system, we would need to test it against a suite of species in the new location. Genetic variation within the species or population that is not tested during assay development might also affect specificity. Finally, even if an assay has been verified to have high technical performance, conditions change when working in the field. Conditions unrelated to the assay, such as water flow, pH, and animal behavior, can change eDNA detectability, as can the use of different eDNA collection and extraction protocols. Using assays that are optimized and well described will help facilitate understanding of the influence such parameters have on eDNA detection.
The field of eDNA is maturing beyond the stage of exploratory analysis towards increasing standardization of methods and techniques. These developments will improve our understanding of eDNA techniques, abilities, and limitations. The optimization process we outline above improves an assay’s sensitivity, specificity, and reproducibility. The ultimate goal of this refinement and standardization of eDNA methods is to improve researchers’ abilities to make inferences based on eDNA data as well as to increase end-user and stakeholder confidence in results.
The authors have nothing to disclose.
We thank Alvi Wadud and Trudi Frost who helped in primer development and testing. Funding for the assay design reported in this study was provided by the Department of Defense Strategic Environmental Research and Development Program (RC19-1156). Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Data generated during this study are available as a USGS data release https://doi.org/10.5066/P9BIGOS5.
Name | Company | Catalog Number | Comments |
96 Place Reversible Racks with Covers | Globe Scientific | 456355AST | |
Clean gloves (e.g., latex, nitrile) | Kimberly-Clark | 43431, 55090 | |
CFX96 Touch Real-Time PCR Detection System | Bio-Rad | 1855196 | |
Fisherbrand Premium Microcentrifuge Tubes: 1.5mL | Fisher Scientific | 5408129 | |
Fisherbrand Premium Microcentrifuge Tubes: 2.0mL | Fisher Scientific | 2681332 | |
Hard-Shell 96-Well PCR Plates, low profile, thin wall, skirted, white/clear | Bio-Rad | #HSP9601 | |
IPC forward and reverse primers | Integrated DNA Technologies, Inc. | none | custom product |
IPC PrimeTime qPCR Probes | Integrated DNA Technologies, Inc. | none | custom product |
IPC Ultramer DNA Oligo synthetic template | Integrated DNA Technologies, Inc. | none | custom product |
Labnet MPS 1000 Compact Mini Plate Spinner Centrifuge for PCR Plates | Labnet | C1000 | |
Microcentrifuge machine | Various | – | Any microcentrifuge that holds 1.5 mL and 2.0 mL tubes is typically suitable. |
Microseal 'B' PCR Plate Sealing Film, adhesive, optical | Bio-Rad | MSB1001 | |
Nuclease-Free Water (not DEPC-Treated) | Invitrogen | AM9932 | |
Pipette Tips GP LTS 1000 µL F 768A/8 | Rainin | 30389272 | |
Pipette Tips GP LTS 20 µL F 960A/10 | Rainin | 30389274 | |
Pipette Tips GP LTS 200 µL F 960A/10 | Rainin | 30389276 | |
Pipettes | Rainin | Various | Depending on lab preference, manual or electronic pipettes can be used at various maximum volumes. |
TaqMan Environmental Master Mix 2.0 | Thermo Fisher Scientific | 4396838 | |
Target forward and reverse primers | Integrated DNA Technologies, Inc. | none | custom product |
Target PrimeTime qPCR Probes | Integrated DNA Technologies, Inc. | none | custom product |
Target synthetic gBlock gene fragment | Integrated DNA Technologies, Inc. | none | custom product. used for qPCR standard dilution series |
TE Buffer | Invitrogen | AM9849 | |
VORTEX-GENIE 2 VORTEX MIXER | Fisher Scientific | 50728002 |