Here, we present a protocol to investigate the impacts of hydraulic fracturing on nearby streams by analyzing their water and sediment microbial communities.
Hydraulic fracturing (HF), commonly called "fracking", uses a mixture of high-pressure water, sand, and chemicals to fracture rocks, releasing oil and gas. This process revolutionized the U.S. energy industry, as it gives access to resources that were previously unobtainable and now produces two-thirds of the total natural gas in the United States. Although fracking has had a positive impact on the U.S. economy, several studies have highlighted its detrimental environmental effects. Of particular concern is the effect of fracking on headwater streams, which are especially important due to their disproportionately large impact on the health of the entire watershed. The bacteria within those streams can be used as indicators of stream health, as the bacteria present and their abundance in a disturbed stream would be expected to differ from those in an otherwise comparable but undisturbed stream. Therefore, this protocol aims to use the bacterial community to determine if streams have been impacted by fracking. To this end, sediment and water samples must be collected from streams near fracking (potentially impacted) and from sites upstream of fracking activity or in a different watershed (unimpacted). Those samples are then subjected to nucleic acid extraction, library preparation, and sequencing to investigate microbial community composition. Correlational analysis and machine learning models can subsequently be employed to identify which features explain variation in the community and to identify predictive biomarkers of fracking's impact. These methods can reveal a variety of differences in the microbial communities among headwater streams, based on the proximity to fracking, and serve as a foundation for future investigations on the environmental impact of fracking activities.
Hydraulic fracturing (HF), or "fracking", is a method of natural gas extraction that has become increasingly prevalent as the demand for fossil fuels continues to rise. This technique consists of using high-powered drilling equipment to inject a blend of water, sand, and chemicals into methane-rich shale deposits, usually to release trapped gases1.
Because these unconventional harvesting techniques are relatively new, it is important to investigate the effects of such practices on nearby waterways. Fracking activities mandate the clearing of large swaths of land for equipment transportation and well pad construction. Approximately 1.2-1.7 hectares of land must be cleared for each well pad2, potentially impacting runoff and water quality of the system3. There is a lack of transparency surrounding the exact chemical composition of fracking fluid, including what biocides are used. Additionally, fracking wastewater tends to be highly saline2. Furthermore, the wastewater may contain metals and naturally occurring radioactive substances2. Therefore, the possibility of leaks and spills of fracking fluid due to human error or equipment malfunction is concerning.
Stream ecosystems are known to be very sensitive to changes in surrounding landscapes4 and are important for maintaining biodiversity5 and proper nutrient cycling6 within the entire watershed. Microbes are the most abundant organisms in freshwater streams and are thus essential to nutrient cycling, biodegradation, and primary production. Because microbial communities are sensitive to perturbation, their composition and function are useful indicators of ecosystem condition, and recent research has shown distinct shifts in observed bacterial assemblages based on proximity to fracking activity7,8. For example, Beijerinckia, Burkholderia, and Methanobacterium were identified as enriched in streams near fracking, while Pseudonocardia, Nitrospira, and Rhodobacter were enriched in the streams not near fracking7.
Next-generation sequencing of the 16S ribosomal RNA (rRNA) gene is an affordable method of determining bacterial community composition that is faster and cheaper than whole genome sequencing approaches9. A common practice within the field of molecular ecology is to sequence the highly variable V4 region of the 16S rRNA gene, which often provides resolution down to the genus level across a wide range of taxa9 and is therefore well suited to unpredictable environmental samples. This technique has been implemented widely in published studies and has been successfully utilized to identify the impact of fracking operations on aquatic environments7,8. However, it is worth noting that bacteria have varying copy numbers of the 16S rRNA gene, which affects their detected abundances10. There are a few tools to account for this, but their efficacy is questionable10. Another practice that is quickly growing in prevalence and lacks this weakness is metatranscriptomic sequencing, in which all RNA is sequenced, allowing researchers to identify both active bacteria and their gene expression.
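The effect of 16S rRNA gene copy number on detected abundances can be illustrated with a small worked example. The sketch below is only illustrative: the taxon names, read counts, and copy numbers are hypothetical, and dividing reads by an assumed copy number is the simplest possible correction rather than the method of any particular tool.

```python
def relative_abundance(counts):
    """Convert raw counts per taxon into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def copy_number_corrected(read_counts, copy_numbers):
    """Divide each taxon's reads by its (assumed) 16S copy number, then renormalize."""
    per_genome = {t: read_counts[t] / copy_numbers[t] for t in read_counts}
    return relative_abundance(per_genome)


# Hypothetical values: two taxa with equal cell numbers but different copy numbers.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}

naive = relative_abundance(reads)
corrected = copy_number_corrected(reads, copies)
print(naive)      # taxon_A appears to dominate the community
print(corrected)  # both taxa are equally abundant after correction
```

In practice, the copy number of most environmental taxa is not known and must be predicted, which is one reason such corrections should be treated with caution.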
Therefore, in contrast to methods in previously published studies7,8,11,12, this protocol also covers sample collection, preservation, processing, and analysis for investigating microbial community function (metatranscriptomics). The steps detailed herein allow researchers to see what impact, if any, fracking has had on the genes and pathways expressed by microbes in their streams, including antimicrobial resistance genes. Moreover, the level of detail presented for sample collection is improved. Although several of the steps and notes may seem obvious to experienced researchers, they could be invaluable to those just starting research.
Herein, we describe methods for sample collection and processing to generate bacterial genetic data as a means to investigate the impact of fracking on nearby streams based on our labs' several years of experience. These data can be used in downstream applications to identify differences corresponding to fracking status.
1. Collection of sediment samples for nucleic acid extraction
2. Filter collection for nucleic acid extraction
3. Nucleic acid extraction and quantification
4. 16S rRNA library creation
5. DNA 16S rRNA library purification
6. RNA library creation and purification
7. Microbial community analysis
The success of DNA and RNA extractions can be evaluated using a variety of equipment and protocols. Generally, any detectable concentration of either is considered sufficient to conclude that the extraction was successful. Examining Table 1 then, all extractions, except for one, would be dubbed successful. Failure at this step is often due to low initial biomass, poor sample preservation, or human error during extraction. In the case of filters, extraction may have been successful even if the concentration is below detection. If those extracts do not yield bands for PCR (if doing 16S) or a detectable concentration after library preparation (metatranscriptomics), then they likely did truly fail.
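The success criterion described above (any detectable concentration) can be applied to the readings in Table 1 with a few lines of code. This is a minimal sketch: the dictionary name and helper function are hypothetical, and the values are copied from Table 1.

```python
# Concentrations (ng/µL) from Table 1, keyed by SampleID.
# Readings are kept as text because the instrument may report a non-numeric flag.
table_1 = {
    1: "1.5", 2: "1.55", 3: "0.745", 4: "0.805", 5: "7.82",
    6: "0.053", 7: "0.248", 8: "0.945", 9: "1.82", 10: "0.804",
    11: "0.551", 12: "1.69", 13: "4.08", 14: "Below_Detection",
    15: "7.87", 16: "0.346", 17: "2.64", 18: "1.15", 19: "0.951",
}


def extraction_succeeded(reading):
    """Return True when the reading is a numeric, non-zero concentration."""
    try:
        return float(reading) > 0
    except ValueError:
        # Non-numeric flags such as "Below_Detection" count as not detected.
        return False


failed = [sample for sample, reading in table_1.items()
          if not extraction_succeeded(reading)]
print("Samples without detectable nucleic acid:", failed)
```

For filter samples, a flagged extract is not necessarily a true failure, so flagged samples should still be carried through PCR or library preparation before being discarded.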
If the 16S protocol is followed, bright bands following PCR amplification, as seen in wells 4 and 6 in Figure 1, indicate success, while a lack of bands, as seen in the other wells in the top row, indicates failure. Moreover, a bright band in the gel lane that contains a negative PCR control would also indicate a failure since it would be risky to assume that the contamination impacting the negative control(s) did not affect the samples.
For both 16S and metatranscriptomics, the success of sequencing can be evaluated by looking at the number of sequences obtained (Figure 2). 16S samples should have a minimum of 1,000 sequences, with at least 5,000 being ideal (Figure 2A). Likewise, metatranscriptomics samples should have a minimum of 500,000 sequences, with at least 2,000,000 being ideal (Figure 2B). Samples with fewer sequences than those minimums should not be used for analyses, as they may not accurately represent their bacterial community. However, samples that fall between the minimum and ideal can still be used, though results should be interpreted more cautiously if many samples fall in that range.
The success of subsequent downstream analysis can be determined simply on the basis of whether the expected output files were obtained or not. At any rate, programs such as QIIME2 and R (Figure 3) should allow for the evaluation of potential significant differences among the bacterial communities based on fracking. The data for Figure 3 were obtained by collecting sediment samples from twenty-one different sites at thirteen different streams for 16S and metatranscriptomics analysis. Of those twenty-one sites, twelve were downstream of fracking activity and classified as HF+, and nine were either upstream of fracking activity or in a watershed where fracking was not occurring; these sites were classified as HF-. Besides the presence of fracking activity, the streams were otherwise comparable.
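The sequencing depth thresholds stated above can be encoded as a simple filter. The following is a minimal sketch; the function name, category labels, and sample counts are hypothetical, while the thresholds are those given in the text.

```python
# (minimum, ideal) number of sequences per sample for each assay, as stated in the text.
THRESHOLDS = {
    "16S": (1_000, 5_000),
    "metatranscriptomics": (500_000, 2_000_000),
}


def classify_depth(n_sequences, assay):
    """Label a sample as 'exclude', 'usable_with_caution', or 'ideal'."""
    minimum, ideal = THRESHOLDS[assay]
    if n_sequences < minimum:
        return "exclude"
    if n_sequences < ideal:
        return "usable_with_caution"
    return "ideal"


# Hypothetical sequence counts for three 16S samples.
example_counts = {"site_01": 812, "site_02": 3_410, "site_03": 27_954}
for sample, n in example_counts.items():
    print(sample, classify_depth(n, "16S"))
```

Counting how many samples fall in the intermediate category gives a quick sense of how cautiously the downstream results should be interpreted.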
Those differences could take the form of consistent compositional shifts based on fracking status. If that were the case, HF+ and HF- samples would be expected to cluster apart from each other in a PCoA plot, as is the case in Figure 3A and Figure 3B. To confirm that those apparent shifts are not just an artifact of the ordination method, further statistical analysis is needed. For example, a PERMANOVA22 test on the distance matrix that Figure 3A and Figure 3B are based on revealed significant clustering based on fracking status, meaning that the separation observed in the plot reflects differences among the samples' bacterial communities rather than an artifact of ordination. A significant PERMANOVA or ANOSIM result is a strong indication of consistent differences between HF+ and HF- samples, which would support the conclusion that the HF+ samples were impacted by fracking. A high p-value, in contrast, would mean only that no consistent difference was detected; it would not by itself demonstrate that the samples were unimpacted. Metatranscriptomic data can likewise be visualized and evaluated using the same methods.
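To make the logic of the PERMANOVA test concrete, the sketch below computes the pseudo-F statistic from a distance matrix and estimates a p-value by shuffling the HF+/HF- labels. It is a teaching example under stated assumptions, not the analysis behind Figure 3: the abundance values are hypothetical, the function names are invented for illustration, and Bray-Curtis dissimilarity is substituted for weighted UniFrac because the latter requires a phylogenetic tree. In practice, the test would be run through QIIME2 or the vegan package in R.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / total if total else 0.0


def pseudo_f(dist, labels):
    """PERMANOVA pseudo-F: between-group vs. within-group sums of squared distances."""
    n = len(labels)
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, []).append(index)
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
        for members in groups.values()
    )
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for the given grouping."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # The observed labelling is counted as one of the permutations.
    return f_obs, (hits + 1) / (n_perm + 1)


# Hypothetical counts for three taxa in four HF+ and four HF- samples.
abundances = [
    [90, 10, 0], [85, 15, 0], [80, 15, 5], [88, 10, 2],   # HF+
    [10, 85, 5], [5, 90, 5], [15, 80, 5], [10, 88, 2],    # HF-
]
status = ["HF+"] * 4 + ["HF-"] * 4
dist = [[bray_curtis(u, v) for v in abundances] for u in abundances]

f_stat, p_value = permanova(dist, status)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

With only eight samples, the number of distinct label arrangements is small, which limits how low the p-value can go; larger sample sizes, as in the study described here, allow much finer resolution.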
Examining differential features (microbes or functions) can reveal evidence that samples have been impacted too. One method of determining differential features is to create a random forest model. The random forest model can be used to see how well the samples' fracking status can be correctly classified. If the model performs better than expected by chance, that would be additional evidence of differences dependent on fracking status. Moreover, the most important predictors would reveal which features were most important for correctly differentiating samples (Figure 3C). Those features also then would have had consistently different values based on fracking status. Once those differential features are determined, the literature can be reviewed to see if they have been previously associated with fracking. However, it may be challenging to find studies that determined differential functions, as most have only used 16S rRNA compositional data. Therefore, for evaluating the implications of differential functions, one possible method would be to see if they have been previously associated with potential resistance to biocides commonly used in fracking fluid or if they could aid in tolerating highly saline conditions. Furthermore, examining the functional profile of a taxon of interest could reveal evidence of fracking's impact (Figure 3D). For example, if a taxon is identified as differential by the random forest model, its antimicrobial resistance profile in HF+ samples could be compared to its profile in HF- samples and if they differ greatly, that could suggest that fracking fluid containing biocides entered the stream.
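The importance measure reported for the random forest model (Figure 3C) is built from decreases in Gini impurity. The sketch below shows that calculation for a single split on a single feature; it is not a random forest, and the abundance values, threshold, and function names are hypothetical. In a full model, such decreases are accumulated over every split that uses a feature and averaged over all trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure group, 0.5 for a balanced two-class group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity lost when samples are split at `threshold` on one feature."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


status = ["HF+", "HF+", "HF+", "HF-", "HF-", "HF-"]

# Hypothetical relative abundances of two taxa across the six samples.
informative_taxon = [0.30, 0.25, 0.28, 0.02, 0.05, 0.01]
uninformative_taxon = [0.10, 0.20, 0.30, 0.10, 0.20, 0.30]

print(gini_decrease(informative_taxon, status, threshold=0.10))    # large decrease
print(gini_decrease(uninformative_taxon, status, threshold=0.15))  # approximately zero
```

A taxon whose abundance separates HF+ from HF- samples cleanly produces a large decrease, which is why such taxa rise to the top of the predictor ranking.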
SampleID | Concentration (ng/µL) |
1 | 1.5 |
2 | 1.55 |
3 | 0.745 |
4 | 0.805 |
5 | 7.82 |
6 | 0.053 |
7 | 0.248 |
8 | 0.945 |
9 | 1.82 |
10 | 0.804 |
11 | 0.551 |
12 | 1.69 |
13 | 4.08 |
14 | Below_Detection |
15 | 7.87 |
16 | 0.346 |
17 | 2.64 |
18 | 1.15 |
19 | 0.951 |
Table 1: Example DNA concentrations based on the fluorometer 1x dsDNA high-sensitivity assay. Extractions for all these samples, except for 14, would be considered successful due to having detectable amounts of DNA.
Figure 1: Example e-gel with PCR products. The gel was pre-stained and visualized under a UV light, causing any DNA present on it to glow. PCR worked for the samples in wells 4 and 6 in the first row, as they both had one single bright band of the expected size (based on the ladder). PCR for the samples in the other six wells failed, as they did not produce any bands. The positive control (first well, second row) had a bright band, indicating that PCR was performed properly, and the negative controls (wells 6 and 7, second row) did not have any bands, indicating that samples were not contaminated. If a negative had a band as bright as the samples, PCR would have been considered a failure since it would be risky to assume that the samples had amplicons that were not just the result of contamination.
Figure 2: Example sequence counts. (A) 16S example sequence counts. Nearly all these 16S samples had over 1,000 sequences. The very few that had fewer than 1,000 sequences should be excluded from downstream analyses, as they had insufficient sequences to accurately represent their bacterial communities. Several samples had between 1,000 and 5,000 sequences; while not ideal, they would still be usable since they exceed the bare minimum, and the majority of samples exceed the ideal minimum of 5,000 as well. (B) Metatranscriptomics example counts. All samples exceeded both the minimum (500,000) and ideal minimum (2,000,000) number of sequences. Therefore, sequencing was successful for all of them, and they could all be used in downstream analysis.
Figure 3: Example analysis. (A) PCoA plot based on coordinates calculated with a weighted UniFrac distance matrix created and visualized through QIIME2. (B) PCoA plot based on coordinates calculated with the weighted UniFrac distance matrix exported from QIIME2. The coordinates were visualized using the phyloseq and ggplot2 packages in R. Metadata vectors were fitted to the plot using the vegan package. Each point represents a sample's bacterial community, with closer points indicating more similar community compositions. Clustering based on fracking status for these 16S sediment samples was observed (PERMANOVA, p=0.001). Furthermore, the vectors reveal that the HF+ samples tended to have higher levels of barium, bromide, nickel, and zinc, which corresponded to different bacterial community composition compared to the HF- samples. (C) Plot of best predictors for a random forest model that tested whether bacterial abundances could be used to predict fracking status among the samples. The random forest model was created through R using the randomForest package. The top 20 predictors are shown, along with the resulting decreases in impurity (a measure of the number of HF+ and HF- samples grouped together), in the form of Mean Decrease in Gini Index, when they are utilized to separate samples. (D) Pie chart showing the antimicrobial resistance profile of Burkholderiales based on metatranscriptomic data. Sequences were first annotated with Kraken2 to determine which taxa they belonged to. BLAST was then used with those annotated sequences and the MEGARes 2.0 database to determine which antimicrobial resistance genes (in the form of "MEG_#") were being actively expressed. Antimicrobial resistance genes expressed by members of Burkholderiales were then extracted to see which ones were most prevalent within that taxon. While more costly and time-consuming, metatranscriptomics does allow for functional analyses such as this one, which cannot be done with 16S data. Notably, Kraken2 was used for this example analysis instead of HUMAnN2. Kraken2 is faster than HUMAnN2; however, it only outputs compositional information, whereas HUMAnN2 outputs composition, contribution, functions (genes), and pathways.
Supplementary File: An example metatranscriptomics pipeline.
The methods described in this paper have been developed and refined over the course of several studies published by our group7,8,10 between 2014 and 2018 and have been employed successfully in a three-year collaborative project investigating the impacts of fracking on aquatic communities, for which a paper will soon be submitted for publication. These methods will continue to be utilized for the remainder of the project. Additionally, other current literature investigating the impact of fracking on streams and ecosystems describes similar methods for sample collection, processing, and analysis7,8,10,11. However, none of those papers utilized metatranscriptomic analysis, making this paper the first to describe how those analyses can be used to elucidate fracking's impact on nearby streams. Furthermore, the methods presented here for sample collection are more detailed, as are the steps taken to avoid contamination.
One of the most important steps of our protocol is initial sample collection and preservation. Field sampling and collection come with certain challenges, as maintaining an aseptic or sterile environment during collection can be difficult. During this step, it is vital to avoid contaminating samples. To do this, gloves should be worn, and only sterile containers and tools should be allowed to come into contact with samples. Samples should also be immediately placed on ice after collection to mitigate nucleic acid degradation. Adding a commercial nucleic acid preservative upon collection can also increase nucleic acid yield and allow samples to be stored for longer periods of time after collection. Whenever nucleic acid extraction is performed, it is important to use the appropriate amount of sample: too much can clog the spin filters used for extraction (for those protocols that make use of them), but too little can result in low yields. Be sure to follow the instructions for whichever kit is used.
Similar to field collection, avoiding or minimizing contamination is also important during nucleic acid extraction and sample preparation, especially when working with low nucleic acid yield samples, such as suboptimal sediment samples (samples containing a large amount of gravel or rocks) or water samples. Therefore, as with sample collection, gloves should be worn during all these steps to reduce contamination. Additionally, all work surfaces used during lab procedures should be sterilized beforehand by wiping with a 10% bleach solution, followed by a 70% ethanol solution. For pipetting steps (3-6), filter tips should be used to avoid contamination due to the pipette itself, with tips being changed every time they touch a non-sterile surface. All tools used for lab work, including pipettes, should be wiped down before and after with the bleach and ethanol solutions. To evaluate contamination, extraction blanks and negatives (sterile liquid) should be included during every set of nucleic acid extractions and PCR reactions. If quantification after extractions reveals a detectable amount of DNA/RNA in the negatives, extractions can be repeated if there is sufficient sample left. If negative samples for PCR show amplification, troubleshooting should be performed to determine the source and then the samples should be rerun. To account for low levels of contamination, it is recommended that extraction blanks and PCR negatives be sequenced so that the contaminants can be identified and removed, if necessary, during computational analysis. Conversely, PCR amplification could also fail due to a variety of causes. For environmental samples, inhibition of the PCR reaction is often the culprit, which can be due to a variety of substances interfering with Taq polymerase23. If inhibition is suspected, PCR grade water (see Table of Materials) can be used to dilute the DNA extracts.
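One way that sequenced blanks and negatives could be used during computational analysis is sketched below. This is only a simple heuristic chosen for illustration, not the procedure used in our analyses: it flags any feature whose mean relative abundance is higher in the blanks than in the samples. The feature names, counts, and function names are hypothetical, and dedicated statistical tools for contaminant identification are preferable for real data.

```python
def mean_relative_abundance(libraries, feature):
    """Average fraction of reads assigned to `feature` across sequencing libraries."""
    fractions = []
    for library in libraries:
        total = sum(library.values())
        if total > 0:
            fractions.append(library.get(feature, 0) / total)
    return sum(fractions) / len(fractions) if fractions else 0.0


def flag_contaminants(samples, blanks):
    """Return features that are relatively more abundant in blanks than in samples."""
    features = set().union(*samples, *blanks)
    return sorted(
        feature for feature in features
        if mean_relative_abundance(blanks, feature)
        > mean_relative_abundance(samples, feature)
    )


# Hypothetical read counts per feature for two stream samples and one extraction blank.
samples = [
    {"ASV_1": 900, "ASV_2": 95, "ASV_3": 5},
    {"ASV_1": 800, "ASV_2": 190, "ASV_3": 10},
]
blanks = [{"ASV_3": 40, "ASV_1": 10}]

print(flag_contaminants(samples, blanks))
```

Here the feature that dominates the blank but is rare in the stream samples is flagged, whereas an abundant stream taxon that appears at low levels in the blank (for example, through cross-contamination from the samples) is retained.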
This protocol has a few notable limitations and potential difficulties. Sample collection can be challenging for both water and sediment samples. In order to get enough biomass, ideally 1 L of stream water needs to be pushed through a filter. The pores of the filter need to be small to capture microbes but can also trap sediment. If a lot of sediment is in the water due to recent rainfall, the filter can clog making it difficult to push the entire volume through the filter. For sediment collection, it can be challenging to estimate the depth of sediment during collection. Furthermore, it is important to ensure that the sediment collected is predominantly soil, as pebbles and rocks will lead to lower nucleic acid yield and may not be an accurate representation of the microbial community. Lastly, it is vital as well that samples are kept on ice after collection, especially if a preservative is not used.
Though this protocol covers both metatranscriptomics and 16S lab protocols, it should be emphasized that these two methods are very different in both process and in the type of data they provide. The 16S rRNA gene is a commonly targeted region, highly conserved in bacteria and archaea, and useful for characterizing the bacterial community in a sample. Although 16S sequencing is a targeted and specific approach, species-level resolution is often unattainable, and characterizing newly diverged species or strains is difficult. In contrast, metatranscriptomics is a broader approach that captures all the active genes and microbes present within a sample. Whereas 16S provides only data for identification, metatranscriptomics can provide functional data such as expressed genes and metabolic pathways. Both are valuable and when combined, they can reveal which bacteria are present and which genes they are expressing.
This paper describes methods for field collection and sample processing for both 16S rRNA and metatranscriptomic analyses in the context of studying fracking. Additionally, it details collection methods for high quality DNA/RNA from low biomass samples and for long-term storage. The methods described here are the culmination of our experiences with sample collection and processing in our efforts to learn how fracking impacts nearby streams through examining the structure and function of their microbial communities. Microbes respond quickly to disturbances, and consequently, which microbes are present and the genes they express can provide information about the effects of fracking on ecosystems. Overall, these methods could be invaluable in our understanding of how fracking impacts these important ecosystems.
The authors have nothing to disclose.
The authors would like to acknowledge the funding sources for the projects that led to the development of these methods: the Howard Hughes Medical Institute (http://www.hhmi.org) through the Precollege and Undergraduate Science Education Program and the National Science Foundation (http://www.nsf.gov) through NSF awards DBI-1248096 and CBET-1805549.
Name | Company | Catalog Number | Comments |
200 Proof Ethanol | Thermo Fisher Scientific | A4094 | 400 mL need to be added to Buffer PE (see Qiagen QIAquick Gel Extraction kit protocol) and 96 mL need to be added to the DNA/RNA Wash Buffer (see ZymoBIOMICS DNA/RNA Miniprep kit protocol). Additional ethanol is needed for the ZymoBIOMICS DNA/RNA Miniprep and NEBNext® Ultra™ II RNA Library Prep with Sample Purification Beads kits. |
Agarose | Thermo Fisher Scientific | BP1356-100 | 100 g per bottle. 0.6 g of agarose would be needed to make one 2% 30 mL gel. |
Disinfecting Bleach | Walmart (Clorox) | No catalog number | Use a 10% bleach solution for cleaning the work area before and after lab procedures |
DNA gel loading dye | Thermo Fisher Scientific | R0611 | Each user-made (i.e., non-e-gel) gel should include loading dye with all of the samples in the ratio of 1 µL dye to 5 µL sample |
DNA ladder | MilliporeSigma | D3937-1VL | A ladder should be run on every gel/e-gel |
DNA/RNA Shield (2x) | Zymo Research | R1200-125 | 3 mL per sediment sample (50 mL conical) and 2 mL per water sample (filter) |
Ethidium bromide | Thermo Fisher Scientific | BP1302-10 | Used for staining user-made gels |
Forward Primer | Integrated DNA Technologies (IDT) | 51-01-19-06 | 0.5 µL per PCR reaction |
Isopropanol | MilliporeSigma | 563935-1L | Generally less than 2 mL per library. Volume needed varies by mass of excised gel fragment (see Qiagen QIAquick Gel Extraction kit protocol). |
PCR-grade water | MilliporeSigma | 3315932001 | 13 µL per PCR reaction (assuming 1 µL of sample DNA template is used) |
Platinum Hot Start PCR Master Mix (2x) | Thermo Fisher Scientific | 13000012 | 10 µL per PCR reaction |
Reverse Primer | Integrated DNA Technologies (IDT) | 51-01-19-07 | 0.5 µL per PCR reaction |
TBE Buffer (Tris-borate-EDTA) | Thermo Fisher Scientific | B52 | 1 L of 10x TBE buffer (30 mL of 1x TBE buffer would be needed to make one 30 mL gel) |
1 L bottle | Thermo Fisher Scientific | 02-893-4E | One needed per stream (the same bottle can be used for multiple streams if it is sterilized between uses) |
1.5 mL Microcentrifuge tubes | MilliporeSigma | BR780400-450EA | 5 microcentrifuge tubes are needed per DNA extraction and an additional 3 are needed to purify RNA (see ZymoBIOMICS DNA/RNA Miniprep kit protocol) |
2% Agarose e-gel | Thermo Fisher Scientific | G401002 | Each gel can run 10 samples (so 9 with a PCR negative and 8 if the extraction negative is run on the same gel) |
50 mL Conicals | CellTreat | 229421 | One 50 mL conical needed per sediment sample |
500 mL Beaker | MilliporeSigma | Z740580 | Only 1 needed (for flame sterilization) |
Aluminum foil | Walmart (Reynolds KITCHEN) | No number | Aluminum foil can be folded and autoclaved. The part not exposed to the environment can then be used as a sterile, DNA and RNA free surface for processing filters (one folded piece per filter to avoid cross-contamination) |
Autoclave | Getinge | LSS 130 | Only one needed |
Centrifuge | MilliporeSigma | EP5404000138-1EA | Only 1 needed |
Cooler | ULINE | S-22567 | Just about any cooler can be used. This one is listed due to being made of foam, making it lighter and thus easier to take along for field sampling. |
Disruptor Genie | Bio-Rad | 3591456 | Only one needed |
Electrophoresis chamber | Bio-Rad | 1664000EDU | Only 1 needed |
Electrophoresis power supply | Bio-Rad | 1645050 | Only 1 needed |
Freezer (-20 °C) | K2 SCIENTIFIC | K204SDF | One needed to store DNA extracts |
Freezer (-80 °C) | K2 SCIENTIFIC | K205ULT | One needed to store RNA extracts |
Gloves | Thermo Fisher Scientific | 19-020-352 | The catalog number is for Medium gloves. |
Heat block | MilliporeSigma | Z741333-1EA | Only one needed |
Lab burner | Sterlitech | 177200-00 | Only one needed |
Laminar Flow Hood | AirClean Systems | AC624LFUV | Only 1 needed |
Library purification kit | Qiagen | 28704 | One kit has enough for 50 reactions |
Magnet Plate | Alpaqua | A001219 | Only one needed |
Microcentrifuge | Thermo Fisher Scientific | 75004061 | Only one needed |
Micropipette (1000 µL volume) | Pipette.com | L-1000 | Only 1 needed |
Micropipette (2 µL volume) | Pipette.com | L-2 | Only 1 needed |
Micropipette (20 µL volume) | Pipette.com | L-20 | Only 1 needed |
Micropipette (200 µL volume) | Pipette.com | L-200R | Only 1 needed |
NEBNext Ultra II RNA Library Prep with Sample Purification Beads | New England BioLabs Inc. | E7775S | One kit has enough reagents for 24 samples. |
Parafilm | MilliporeSigma | P7793-1EA | 2 1" x 1" squares are needed per filter |
PCR Tubes | Thermo Fisher Scientific | AM12230 | One tube needed per reaction |
Pipette tips (for 1000 µL volume) | Pipette.com | LF-1000 | Pack of 576 tips |
Pipette tips (for 20 µL volume) | Pipette.com | LF-20 | Pack of 960 tips |
Pipette tips (for 200 µL volume) | Pipette.com | LF-250 | Pack of 960 tips |
PowerWulf ZXR1+ computer cluster | PSSC Labs | No number | This is just an example of a supercomputer powerful enough to perform metatranscriptomics analysis in a timely manner. Only one needed. |
Qubit fluorometer starter kit | Thermo Fisher Scientific | Q33239 | Comes with a Qubit 4 fluorometer, enough reagent for 100 DNA assays, and 500 Qubit tubes |
Scoopula | Thermo Fisher Scientific | 14-357Q | Only one needed |
Sterile blades | AD Surgical | A600-P10-0 | One needed per filter |
Sterivex-GP Pressure Filter Unit | MilliporeSigma | SVGP01050 | 1 filter needed per water sample |
Thermocycler | Bio-Rad | 1861096 | Only one needed |
Vise-grip | Irwin | 2078500 | Only one needed (for cracking open the filters) |
Vortex-Genie 2 | MilliporeSigma | Z258415-1EA | Only 1 needed |
WHIRL-PAK bags | ULINE | S-22729 | 1 needed per filter |
ZymoBIOMICS DNA/RNA Miniprep kit | Zymo Research | R2002 | One kit has enough reagents for 50 samples. |