Metagenomics was used to investigate the microbiome of silage cattle feed. Analysis was performed by shotgun sequencing. This approach was used to characterize the composition of the microbial community within the cattle feed.
Metagenomics is defined as the direct analysis of deoxyribonucleic acid (DNA) purified from environmental samples and enables taxonomic identification of the microbial communities present within them. Two main metagenomic approaches exist: sequencing the 16S rRNA gene coding region, which exhibits sufficient variation between taxa for identification, and shotgun sequencing, in which the genomes of the organisms present in the sample are analyzed and ascribed to “operational taxonomic units” (species, genera or families, depending on the extent of sequencing coverage).
In this study, shotgun sequencing was used to analyze the microbial community present in cattle silage. It was coupled with a range of bioinformatics tools to quality check and filter the DNA sequence reads, perform taxonomic classification of the microbial populations present within the sampled silage, and achieve functional annotation of the sequences. These methods were employed to identify potentially harmful bacteria within the silage, whose presence is an indication of silage spoilage. If spoiled silage is not remediated, its ingestion could be fatal to livestock.
Metagenomics is the direct analysis of DNA purified from biological communities found within environmental samples 1 and was originally used to detect unculturable bacteria found in sediments 2. Metagenomics has been widely used for a number of applications, such as identifying the human microbiome 3, classifying microbial populations within the ocean 4 and even for the analysis of the bacterial communities that develop on coffee machines 5. The introduction of next generation sequencing technologies resulted in greater sequencing throughput and output. Consequently, DNA sequencing has become more economical 6 and the depth of sequencing that can be performed has greatly increased, enabling metagenomics to become a powerful, analytical tool.
"Front-end" enhancements in the practical, molecular aspect of metagenomic sequencing have driven the growth of the in silico bioinformatics tools available for the taxonomic classification 7-9, functional annotation 10,11 and visual representation 12,13 of DNA sequence data. The increasing number of available, sequenced prokaryotic and eukaryotic 14 genomes allows further accuracy in the classification of microbial communities, which is invariably performed against a "back-end" reference database of sequenced genomes 15. Two main approaches can be adopted for metagenomic analysis.
The more conventional method is analysis of the 16S rRNA gene coding region of the bacterial genome. The 16S rRNA gene is highly conserved between prokaryote species but exhibits nine hyper-variable regions (V1 – V9) which can be exploited for species identification 16. The introduction of longer sequencing reads (≤ 300 bp paired end) allowed for the analysis of DNA sequences spanning two hyper-variable regions, in particular the V3 – V4 region 17. Advances in other sequencing technologies, such as Oxford Nanopore 18 and PacBio 19, allow the entire 16S rRNA gene to be sequenced contiguously.
While 16S rDNA based libraries provide a targeted approach to species identification and enable the detection of low copy number DNA that naturally occurs within purified samples, shotgun sequencing libraries allow for the detection of species whose DNA either cannot be amplified by the 16S rRNA marker primer sequences used or differs too greatly from the amplifying primer sequence 20,21. Furthermore, although DNA polymerases have a high fidelity of DNA replication, base errors can nonetheless occur during PCR amplification, and these incorporated errors can result in incorrect classification of the originating species 22. Biases in the PCR amplification of template sequences can also occur: sequences of DNA with a high GC content can be underrepresented in the final amplicon pool 23 and, similarly, unnatural base modifications, such as thymine glycol, can halt DNA polymerases, causing failures in the amplification of DNA sequences 24. In contrast, a shotgun sequencing DNA library is prepared using all of the purified DNA extracted from a sample, which is fragmented into shorter lengths prior to preparation for sequencing. Taxonomic classification of DNA sequences generated by shotgun sequencing is more accurate than that of 16S rRNA amplicon sequencing 25, although the financial cost required to reach a reliable sequencing depth is greater than that of amplicon sequencing 26. The major benefit of shotgun sequencing metagenomics is that sequenced regions of the various genomes in the sample are available for gene prospecting once they have been taxonomically classified 27.
Metagenomic sequence data is analyzed by an ever-increasing range of bioinformatic tools. These tools are able to perform a wide variety of applications, for example, quality control analysis of the raw sequence data 28, overlapping of paired end reads 29, de novo assembly of sequence reads to contigs and scaffolds 30,31, taxonomic classification and visualization of sequence reads and assembled sequences 7,12,32,33 and the functional annotation of assembled sequences 34,35.
Silage, produced by farmers throughout the world from fermented cereals such as maize (Zea mays), is predominantly used as cattle feed. Silage is treated with the bacterium Lactobacillus sp. to aid fermentation 36 but, to date, there is limited knowledge of the other microbial populations found in silage. The fermentation process can lead to undesirable and potentially harmful micro-organisms becoming prevalent within the silage 37. In addition to yeasts and molds, bacteria are particularly adaptable to the anaerobic environment in fermenting silage and are more frequently associated with diseases in livestock than with the degradation of the silage 38. Butyric acid bacteria can be inadvertently added from soil remains when filling the silage silos and are able to convert the lactic acid, a product of anaerobic digestion, to butyric acid, thus increasing the pH of the silage 39. This increase in pH can lead to an upsurge in spoilage bacteria that would normally be unable to sustain growth under optimum silage fermentation conditions 38. Clostridium spp., Listeria spp. and Bacillus spp. are of particular concern, especially in silage for dairy cattle feed, as bacterial spores that have survived the gastrointestinal tract 40 can enter the food chain, lead to food spoilage and, in rare cases, to animal and human fatalities 37,39,41-44. Moreover, while it is difficult to estimate the exact economic impact of veterinary treatment and livestock loss caused by silage spoilage, it is likely to be detrimental to a farm if an outbreak were to occur.
It is hypothesized that by using a metagenomic approach we can classify the microbial populations that are present in silage samples and furthermore identify microbial communities associated with silage spoilage that would, in turn, potentially have a detrimental effect on the livestock, enabling remedial action to be taken before the silage is to be used as a food source.
1. Site Location
2. DNA Extraction
NOTE: DNA extraction was performed using a commercial kit following the manufacturer's instructions. A negative control, which contained no sample, was used throughout the library preparation method.
3. DNA Purification Using DNA Purification Beads
NOTE: Prior to metagenomic library preparation the extracted DNA was purified using purification beads to ensure a pure DNA sample was obtained.
4. Quantification of Purified DNA
NOTE: Purified DNA was quantified using a fluorometer and double-stranded (dsDNA) High Sensitivity (HS) assay kit following the manufacturer's instructions.
5. Shotgun Sequencing Library Preparation
NOTE: The shotgun sequencing library was prepared using a commercial library preparation kit following the manufacturer's instructions.
6. Library Quantity and Quality Check
NOTE: The quantity and quality of the prepared libraries were assessed using a commercial kit and instrumentation.
7. DNA Sequencing
8. Analysis of Raw Sequence Data
NOTE: The commands for each program using a Linux operating system are shown below the protocol step. The pipeline used for sequence data analysis is shown in Figure 1. The programs are to be installed by the user prior to analysis. This process should be performed individually for each sample.
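As a rough guide to what such commands can look like, the following Python sketch builds (but does not run) command lines for the trimming, read-overlap and classification steps. It is an illustration under stated assumptions, not the commands used in this study: the file names, the adapter file (`NexteraPE-PE.fa`), the database path and the trimming thresholds are all hypothetical placeholders.

```python
import shlex


def trimmomatic_pe(jar, r1, r2, prefix, adapters):
    """Build a Trimmomatic paired-end command as an argument list."""
    return [
        "java", "-jar", jar, "PE", "-phred33",
        r1, r2,
        f"{prefix}_R1_paired.fastq.gz", f"{prefix}_R1_unpaired.fastq.gz",
        f"{prefix}_R2_paired.fastq.gz", f"{prefix}_R2_unpaired.fastq.gz",
        f"ILLUMINACLIP:{adapters}:2:30:10",  # adapter removal (assumed settings)
        "SLIDINGWINDOW:4:20",                # assumed quality threshold
        "MINLEN:36",                         # assumed minimum read length
    ]


def flash_merge(r1_paired, r2_paired, prefix):
    """Build a FLASH command; merged reads go to <prefix>.extendedFrags.fastq."""
    return ["flash", r1_paired, r2_paired, "-o", prefix]


def kraken_classify(db, merged_fastq, out):
    """Build a Kraken (version 1) classification command."""
    return ["kraken", "--db", db, "--fastq-input", "--output", out, merged_fastq]


def build_pipeline(sample, jar="trimmomatic.jar",
                   adapters="NexteraPE-PE.fa", kraken_db="kraken_db"):
    """Return the three commands, in order, for one sample (hypothetical names)."""
    trim = trimmomatic_pe(jar, f"{sample}_R1.fastq.gz",
                          f"{sample}_R2.fastq.gz", sample, adapters)
    merge = flash_merge(f"{sample}_R1_paired.fastq.gz",
                        f"{sample}_R2_paired.fastq.gz", sample)
    classify = kraken_classify(kraken_db, f"{sample}.extendedFrags.fastq",
                               f"{sample}.kraken")
    return [trim, merge, classify]


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in build_pipeline("silage"):
        print(shlex.join(cmd))
```

Keeping each command as an argument list makes it easy to review the exact invocation for every sample before passing it to a job scheduler or to `subprocess.run`.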
9. Quality Control Trimming and Filtering Sequence Data
10. Metagenome Assembly
11. Paired-end Read Overlap
12. Taxonomic Classification
13. Functional Annotation
14. Visualizing CAZy Annotation
Prior to bioinformatic processing, raw sequence reads were trimmed and adapters were removed using Trimmomatic software 28. After the trimming and filtering step, the number of reads was reduced to approximately 50% of the raw sequence reads (Table 1). The average per-base phred score was >30 after quality control (Figure 2).
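The retained fraction and the meaning of a phred score can be checked with a few lines of Python. This is a minimal sketch using the counts in Table 1; it assumes that each pair contributes two reads and that the unpaired count refers to individual reads. The function names are illustrative only.

```python
def phred_to_error_probability(q):
    """Convert a phred quality score Q to a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def fraction_retained(raw_pairs, filtered_pairs, filtered_unpaired):
    """Fraction of individual reads that survive trimming and filtering."""
    raw_reads = 2 * raw_pairs                          # two reads per raw pair
    kept_reads = 2 * filtered_pairs + filtered_unpaired
    return kept_reads / raw_reads


retained = fraction_retained(2_374_949, 231_679, 1_892_534)
print(f"Reads retained: {retained:.1%}")
print(f"Error probability at Q30: {phred_to_error_probability(30)}")
```

A phred score of 30 corresponds to an expected error rate of one base call in 1,000, which is why it is a common quality threshold.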
Pairs of DNA sequences with overlapping regions were merged using FLASH software 29 to generate single, longer reads; non-overlapping reads were kept in a separate file. Of the filtered read pairs, 45.47% (105,343) combined successfully. Following the overlapping of reads using FLASH, the resulting extended fragments underwent bacterial taxonomic classification using Kraken software 7 and were subsequently visualized with Krona software (Figure 3).
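The idea behind merging overlapping pairs can be illustrated with a deliberately simplified sketch. It is not FLASH's algorithm (FLASH scores candidate overlaps by mismatch density and also resolves base qualities); it simply reverse-complements the second read and joins the pair at the longest overlap within the allowed mismatch fraction. The sequences and parameter names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.0):
    """Merge a read pair at the longest acceptable suffix/prefix overlap.

    Returns the merged fragment, or None if no overlap of at least
    min_overlap bases is found.
    """
    mate = reverse_complement(read2)  # bring read 2 onto the same strand as read 1
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-olen:], mate[:olen]))
        if mismatches <= max_mismatch_frac * olen:
            return read1 + mate[olen:]
    return None


def percent_combined(combined, pairs):
    """Percentage of read pairs that merged successfully."""
    return 100 * combined / pairs


# Toy fragment (hypothetical): the two reads share an 8-base overlap.
fragment = "ACGTTGCAAGCTTAGGCATCCGATTACGGA"
read1 = fragment[:18]
read2 = reverse_complement(fragment[10:])
print(merge_pair(read1, read2, min_overlap=5) == fragment)
print(round(percent_combined(105_343, 231_679), 2))
```

Pairs for which no overlap is found return `None`, mirroring the separate file of non-combined reads described above.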
The majority of the bacterial species present in the silage metagenome are found within 4 prokaryotic phyla: Firmicutes (34%), Actinobacteria (28%), Proteobacteria (27%) and Bacteroidetes (7%). The distribution of classes present within these phyla can be seen in Figure 4. The most abundant genera in the metagenome were Lactobacillus spp. (24%; Firmicutes), Corynebacterium spp. (8%; Actinobacteria), Propionibacterium spp. (3%; Actinobacteria) and Prevotella spp. (3%; Bacteroidetes). Genera important to animal health and implicated in disease were also observed: Clostridium spp. (1%), Bacillus spp. (0.6%) and Listeria spp. (0.2%) were predicted to be present in the silage sample.
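Rank-level percentages such as these can be read from a Kraken report, a tab-delimited table whose columns are: percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID and indented name. The sketch below parses that layout. The example lines are toy data, not the study's output: the percentages echo the phylum figures quoted above, while the read counts are invented for illustration.

```python
def parse_kraken_report(lines, rank_code="P"):
    """Return {taxon name: percentage of reads} for one rank code (P = phylum)."""
    results = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        percent, _clade_reads, _taxon_reads, rank, _taxid, name = fields[:6]
        if rank == rank_code:
            results[name.strip()] = float(percent)
    return results


# Toy report lines (hypothetical read counts).
REPORT = [
    "100.00\t1000\t0\tD\t2\t  Bacteria",
    " 34.00\t340\t10\tP\t1239\t    Firmicutes",
    " 28.00\t280\t5\tP\t201174\t    Actinobacteria",
    " 27.00\t270\t8\tP\t1224\t    Proteobacteria",
    "  7.00\t70\t2\tP\t976\t    Bacteroidetes",
]

phyla = parse_kraken_report(REPORT)
for name, pct in sorted(phyla.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.0f}%")
```

Passing `rank_code="G"` to the same function would return genus-level percentages instead.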
Functional annotation was performed on assembled reads. The metagenome was assembled with the SPAdes assembler 30 from the trimmed and filtered paired-end and unpaired reads, generating 92,284 scaffolds. In order to identify cellulases, proteins were predicted using MG-RAST and annotated using the Carbohydrate-Active enZYmes Database (CAZy). Of the 97,562 predicted proteins, 6,357 were annotated as a putative carbohydrate-active enzyme in one of the five enzyme classes that make up the CAZy database (Figure 5). Results were visualized as a Venn diagram using InteractiVenn software 50, showing the distribution of protein annotations, including those containing more than one CAZy enzyme class annotation. Of these, 3,861 were predicted to have glycoside hydrolase activity and will be further characterized in the laboratory to confirm function.
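The counts behind such a Venn diagram can be tallied as sketched below. The sketch assumes that the five enzyme classes are those used by CAZy (GH, GT, PL, CE and AA) and that each protein carries a list of family labels such as `GH5`. The protein identifiers and annotations are hypothetical toy data.

```python
import re
from collections import Counter

# Class prefixes assumed: glycoside hydrolases, glycosyltransferases,
# polysaccharide lyases, carbohydrate esterases, auxiliary activities.
CLASS_PATTERN = re.compile(r"(GH|GT|PL|CE|AA)\d+")


def classes_for(families):
    """Return the set of CAZy enzyme classes named in a list of family labels."""
    found = set()
    for family in families:
        match = CLASS_PATTERN.match(family)
        if match:
            found.add(match.group(1))
    return frozenset(found)


def venn_regions(annotations):
    """Count proteins per combination of enzyme classes (one Venn region each)."""
    regions = Counter()
    for families in annotations.values():
        classes = classes_for(families)
        if classes:  # ignore proteins with no enzyme-class annotation
            regions[classes] += 1
    return regions


def class_totals(annotations):
    """Count proteins per enzyme class; multi-class proteins count in each class."""
    totals = Counter()
    for families in annotations.values():
        totals.update(classes_for(families))
    return totals


# Toy annotations (hypothetical).
toy = {
    "prot1": ["GH5"],
    "prot2": ["GH13", "CE4"],
    "prot3": ["GT2"],
    "prot4": ["CBM50"],  # binding module only, not one of the enzyme classes
    "prot5": ["GH3"],
}
print(venn_regions(toy))
print(class_totals(toy))
print(f"GH share of annotated proteins in the text: {3861 / 6357:.1%}")
```

The per-region counts are the numbers a Venn tool expects, while the per-class totals correspond to statements such as the number of proteins with predicted glycoside hydrolase activity.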
Figure 1: Bioinformatic Metagenomics Pipeline for the Analysis of Silage. Two main approaches were used to investigate the microbiome of silage, taxonomic classification and functional annotation.
Figure 2: Sequence Quality Per-base Before and After Trimming and Adapter Removal. The per-base sequence quality plot from FASTQC shows the average phred score across the length of the sequence reads pre- and post- quality control.
Figure 3: Taxonomic Classification of the Bacterial Microbiome of Solid Silage. Classification of trimmed and overlapping sequence reads from FLASH was performed using Kraken 7 and subsequently visualized with Krona.
Figure 4: Taxonomic Class Distribution of the 4 Most Abundant Phyla in the Bacterial Microbiome of Solid Silage. The percentage of each class of bacteria within the four most abundant phyla. Firmicutes: Clostridia (red) and Bacilli (dark blue); Proteobacteria: delta/epsilon (pink), alpha (pale blue), gamma (orange) and beta (turquoise); Bacteroidetes: Flavobacteriia (dark blue) and Bacteroidia (pale green); Actinobacteria: Coriobacteriia (dark purple) and other Actinobacteria (dark green).
Figure 5: CAZy Annotation of the Predicted Proteome in the Solid Silage Microbiome. Venn diagram showing the distribution of the five enzyme classes of CAZy annotations in the predicted proteome of solid silage microbiome.
# Raw reads (paired) | # Filtered reads (paired) | # Filtered reads (unpaired) | # FLASHed reads |
2,374,949 x2 | 231,679 x2 | 1,892,534 | 105,343 |
Table 1: Summary Table of Sequencing Reads.
While an in silico analysis can give an excellent insight into the microbial communities that are present within environmental samples, it is critical that the taxonomic classifications demonstrated be performed in association with relevant controls and that a suitable depth of sequencing has been achieved to capture the entire population present 51.
With any computational analysis, there are many routes to achieve a similar goal. The methods that we have used in this study are examples of suitable and straightforward methods that have been brought together to achieve a range of analyses on the silage microbiome. An ever-increasing number of bioinformatics tools and techniques are available to analyze metagenomic data, for instance Phylosift 8 and MetaPhlAn2 52, and these should be evaluated prior to the investigation for their relevance to the sample and the analysis required 53. Metagenomic analysis methods are limited by the databases available for classification, the sequencing depth and the quality of sequencing.
The bioinformatic processing demonstrated here was performed on a local, high-powered machine; however, cloud-based systems are also available. These cloud-based services allow for the rental of the necessary computational power without the high-cost investment of a suitably powerful local workstation. A potential application of this method would be to assess silage before its use in agriculture to ensure that no potentially harmful bacteria are present, thereby preventing them from entering the food chain.
The authors have nothing to disclose.
The authors would like to thank Andrew Bird for the silage samples and Audrey Farbos of the Exeter Sequencing Service for her assistance in preparing DNA sequencing libraries. The authors also acknowledge the Exeter Sequencing Service and Computational core facilities at the University of Exeter, and the following awards: Medical Research Council Clinical Infrastructure award (MR/M008924/1), Wellcome Trust Institutional Strategic Support Fund (WT097835MF), Wellcome Trust Multi User Equipment Award (WT101650MA) and BBSRC LOLA award (BB/K003240/1).
FastDNA SPIN Kit for Soil | MP Bio | 116560200 | DNA Extraction |
DNA FastPrep | MP Bio | 116004500 | DNA Extraction |
Agencourt AMPure XP beads | Beckman Coulter | A63880 | DNA Purification |
Elution Buffer | Qiagen | 19806 | DNA Purification |
Qubit Fluorometer | Thermo Fisher | Q33216 | DNA Quantification |
Qubit dsDNA HS Assay Kit | Thermo Fisher | Q32854 | DNA Quantification |
Nextera XT DNA Library Prep Kit | Illumina | FC-131-1024 | Library Preparation |
Nextera XT Index Kit | Illumina | FC-131-1001 | Library Preparation |
TapeStation 2200 | Agilent | G2964AA | DNA Quantification |
HS D100 ScreenTape | Agilent | 5067-5584 | DNA Quantification |
HS D100 ScreenTape Reagents | Agilent | 5067-5585 | DNA Quantification |
TapeStation Tips | Agilent | 5067-5153 | DNA Quantification |
TapeStation Tubes | Agilent | 401428 and 401425 | DNA Quantification |
HiSeq 2500 | Illumina | | DNA Sequencing – provided by a sequencing service |
High Power Analysis Workstation | Various | | Local or cloud based, user preferred system |