The feasibility of whole-genome sequencing (WGS) strategies using benchtop instruments has simplified the genome interrogation of every microbe of public health relevance in a lab setting. A methodological adaptation of the workflow for bacterial WGS is described and a bioinformatics pipeline for analysis is also presented.
Aquaculture is one of the fastest-growing food-producing sectors worldwide and tilapia (Oreochromis spp.) farming constitutes the major freshwater fish variety cultured. Because aquaculture practices are susceptible to microbial contamination derived from anthropogenic sources, extensive antibiotic usage is needed, leading to aquaculture systems becoming an important source of antibiotic-resistant and pathogenic bacteria of clinical relevance such as Escherichia coli (E. coli). Here, the antimicrobial resistance, virulence, and mobilome features of a pathogenic E. coli strain, recovered from inland farmed Oreochromis spp., were elucidated through whole-genome sequencing (WGS) and in silico analysis. Antimicrobial susceptibility testing (AST) and WGS were performed. Furthermore, phylogenetic group, serotype, multilocus sequence typing (MLST), acquired antimicrobial resistance, virulence, plasmid, and prophage content were determined using diverse available web tools. The E. coli isolate only exhibited intermediate susceptibility to ampicillin and was characterized as ONT:H21-B1-ST40 strain by WGS-based typing. Although only a single antimicrobial resistance-related gene was detected [mdf(A)], several virulence-associated genes (VAGs) from the atypical enteropathogenic E. coli (aEPEC) pathotype were identified. Additionally, the cargo of plasmid replicons from large plasmid groups and 18 prophage-associated regions were detected. In conclusion, the WGS characterization of an aEPEC isolate, recovered from a fish farm in Sinaloa, Mexico, allows insights into its pathogenic potential and the possible human health risk of consuming raw aquacultural products. It is necessary to exploit next-generation sequencing (NGS) techniques for studying environmental microorganisms and to adopt a one health framework to learn how health issues originate.
Aquaculture is one of the fastest-growing food-producing sectors worldwide, and its production practices are intended to satisfy the rising food demand for human consumption. Between 1997 and 2017, global aquaculture production tripled from 34 million tonnes (Mt) to 112 Mt1. The main species groups, contributing to nearly 75% of the production, were seaweed, carps, bivalves, catfish, and tilapia (Oreochromis spp.)1. However, the appearance of diseases caused by microbial entities is unavoidable because of intensive fish farming, leading to potential economic losses2.
Antibiotic usage in fish farming practices is well known for preventing and treating bacterial infections, the main limiting factor in productivity3,4. Nonetheless, residual antibiotics accumulate in aquaculture sediments and water, exerting selective pressure and modifying the fish-associated and the residing bacterial communities5,6,7,8. Consequently, the aquaculture environment serves as a reservoir for antimicrobial resistance genes (ARGs), and the further emergence and spread of antibiotic-resistant bacteria (ARB) in the surrounding milieu9. In addition to the bacterial pathogens commonly observed affecting fish farming practices, members of the Enterobacteriaceae family are often encountered, including human pathogen strains of Enterobacter spp., Escherichia coli, Klebsiella spp., and Salmonella spp.10. E. coli is the most common microorganism isolated from fish meal and water in fish farming11,12,13,14,15.
E. coli is a versatile gram-negative bacterium that inhabits the gastrointestinal tract of mammals and birds as a commensal member of their intestinal microbiota. However, E. coli possesses a highly adaptive capacity to colonize and persist in different environmental niches, including soil, sediments, food, and water16. Through gene gain and loss via horizontal gene transfer (HGT), E. coli has rapidly evolved into a well-adapted antibiotic-resistant pathogen, able to cause a broad spectrum of diseases in humans and animals17,18. Based on the isolation origin, pathogenic variants are defined as intestinal pathogenic E. coli (InPEC) or extra-intestinal pathogenic E. coli (ExPEC). Furthermore, InPEC and ExPEC are subclassified into well-defined pathotypes according to disease manifestation, genetic background, phenotypic traits, and virulence factors (VFs)16,17,19.
Traditional culture and molecular techniques for pathogenic E. coli strains have allowed the rapid detection and identification of different pathotypes. However, they may be time-consuming, laborious, and frequently require high technical training19. Furthermore, no single method can be used to reliably study all pathogenic variants of E. coli because of the complexity of their genetic background. Currently, these drawbacks have been overcome with the advent of high-throughput sequencing (HTS) technologies. Whole-genome sequencing (WGS) approaches and bioinformatic tools have improved the exploration of microbial DNA affordably and at a large scale, facilitating the in-depth characterization of microbes in a single run, including closely related pathogenic variants20,21,22. Depending on the biological questions, several bioinformatics tools, algorithms, and databases can be used to perform data analysis. For instance, if the main goal is to assess the presence of ARGs, VFs, and plasmids, tools such as ResFinder, VirulenceFinder, and PlasmidFinder, along with their associated databases, might be a good starting point. Carriço et al.22 provided a detailed overview of the different bioinformatics software and related databases applied for microbial WGS analysis, from raw data preprocessing to phylogenetic inference.
Several studies have demonstrated the broad utility of WGS for genome interrogation regarding antimicrobial resistance attributes, pathogenic potential, and tracking of the emergence and evolutionary relationships of clinically relevant variants of E. coli sourced from diverse origins23,24,25,26. WGS has enabled the identification of molecular mechanisms underlying phenotypic resistance to antimicrobials, including rare or complex resistance mechanisms, by detecting acquired ARG variants and novel mutations in drug-target genes or promoter regions27,28. Moreover, WGS offers the potential to infer antimicrobial resistance profiles without requiring prior knowledge about the resistance phenotype of a bacterial strain29. In addition, WGS has allowed the characterization of the mobile genetic elements (MGEs) carrying both antimicrobial resistance and virulence features, which have driven the genome evolution of existing bacterial pathogens. For instance, the application of WGS during the investigation of the German E. coli outbreak in 2011 uncovered the unique genomic features of an apparently novel E. coli pathotype; interestingly, those outbreak strains originated from the enteroaggregative E. coli (EAEC) group, which acquired the prophage encoding the Shiga toxin from the enterohemorrhagic E. coli (EHEC) pathotype30.
This work presents a methodological adaptation of the workflow for bacterial WGS using a benchtop sequencer. Moreover, a bioinformatics pipeline is provided using web-based tools to analyze the resulting sequences and further support researchers with limited or no bioinformatics expertise. The described methods allowed elucidation of the antimicrobial resistance, virulence, and mobilome features of a pathogenic E. coli strain ACM5, isolated in 2011 from inland farmed Oreochromis spp. in Sinaloa, Mexico12.
NOTE: The E. coli strain ACM5 was recovered by processing and culturing the fish sample for fecal coliform (FC) determination12. At the time of sampling, the fish showed no clinical signs of disease or of bacterial or fungal infection, and a mean temperature of 22.3 °C prevailed. After isolation, the E. coli isolate was subjected to biochemical testing and cryopreserved in brain heart infusion (BHI) broth with DMSO (8% v/v) as a cryoprotective agent.
1. Reactivation of frozen E. coli ACM5 stock culture
2. Determination of antibacterial susceptibility
NOTE: The antimicrobial susceptibility testing described here corresponds to the disk diffusion method based on the Clinical and Laboratory Standards Institute (CLSI) guidelines (M02 Ed13:2018)31. E. coli ATCC 25922 strain is required for quality control purposes.
3. Genomic DNA (gDNA) extraction and quantification
4. DNA library preparation
NOTE: DNA library preparation and sequencing were performed following the manufacturer's guidelines and protocols (see Table of Materials). The starting gDNA concentration is 4.0 ng.
5. Library pooling, denaturation, and sequencing run initiation
6. Sequence data analysis
NOTE: Check Supplementary File 1 for further description of the general WGS data preprocessing, software, parameter settings, and sequence analysis of the E. coli genome.
Antimicrobial susceptibility was determined by the disk diffusion method and interpreted by CLSI breakpoint criteria for 12 antibiotics spanning six distinct antimicrobial classes: aminoglycosides, β-lactams, fluoroquinolones, nitrofurans, phenicols, and folate pathway antagonists. E. coli ACM5 was susceptible to all antibiotics except one β-lactam drug. Four β-lactam drugs were tested: ampicillin, carbenicillin, cephalothin, and cefotaxime. Among these, ampicillin produced a 14 mm inhibition zone. Therefore, according to the CLSI interpretative categories for ampicillin (susceptible: ≥17 mm; intermediate: 14-16 mm; resistant: ≤13 mm) 32, E. coli ACM5 shows intermediate susceptibility to ampicillin.
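The interpretation step above can be illustrated with a short sketch. It encodes only the ampicillin zone-diameter categories quoted in the text; the class and function names are hypothetical, and breakpoints for any other drug would have to be taken from the current CLSI tables rather than from this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoints:
    """Zone-diameter breakpoints (mm) for one antibiotic disk."""
    susceptible_min: float  # zones >= this value are read as susceptible
    resistant_max: float    # zones <= this value are read as resistant


# Ampicillin categories as quoted in the text:
# susceptible >= 17 mm, intermediate 14-16 mm, resistant <= 13 mm.
AMPICILLIN = Breakpoints(susceptible_min=17, resistant_max=13)


def interpret_zone(diameter_mm: float, bp: Breakpoints) -> str:
    """Return 'S', 'I', or 'R' for a measured inhibition-zone diameter."""
    if diameter_mm >= bp.susceptible_min:
        return "S"
    if diameter_mm <= bp.resistant_max:
        return "R"
    return "I"


# The 14 mm zone measured for E. coli ACM5 falls in the intermediate category.
print(interpret_zone(14, AMPICILLIN))  # I
```

Keeping the breakpoints in a small data structure, rather than hard-coding them in the function, makes it easier to extend the same check to the other disks once their breakpoints are entered.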
E. coli ACM5 was subjected to WGS using the benchtop sequencer; the DNA sample was prepared, multiplexed, and sequenced following the protocol described. Overall, the sequencing run, performed with the mid-output configuration, yielded 3.37 Gb of data with an error rate of 0.73%, and 89.71% of the bases had a quality score above Q30 33. In particular, the sequencing of E. coli ACM5 generated a total of 1,490,594 paired-end (PE) reads. The raw sequence data from this study were deposited in the Sequence Read Archive (SRA) database at the National Center for Biotechnology Information (NCBI) under BioProject number PRJNA715781.
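For readers unfamiliar with the Q30 metric reported above, the following minimal sketch shows how a Phred score maps to an error probability and how the fraction of bases at or above Q30 can be computed from a FASTQ quality string. It assumes the Phred+33 ASCII encoding used by current Illumina FASTQ files; the function names and the toy quality string are illustrative and are not taken from the study data.

```python
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quality_string: str, threshold: int = 30) -> float:
    """Fraction of bases in one FASTQ quality line with score >= threshold."""
    if not quality_string:
        return 0.0
    scores = [ord(ch) - PHRED_OFFSET for ch in quality_string]
    return sum(score >= threshold for score in scores) / len(scores)


# Q30 corresponds to an expected error rate of 1 in 1,000 base calls.
print(phred_to_error_prob(30))

# Toy quality line: four bases at Q40 ('I'), four at Q20 ('5'), two at Q2 ('#').
print(fraction_at_or_above("IIII5555##"))
```

Run-level summaries such as the 89.71% figure are normally produced by the instrument software; a calculation like this is mainly useful for spot-checking individual FASTQ files.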
After the initial quality check and filtering, 96.8% of the raw reads were retained and assembled into scaffolds with a depth of coverage of 38x. The scaffold-level genome assembly comprised 83 scaffolds (>300 bp) totaling 5,272,433 bp, with a GC content of 50.61%. Genome annotation revealed 5,633 encoded features, of which 5,524 are protein-coding genes (CDSs) and 109 are RNA-related sequences (Figure 1). Additionally, genome annotation grouped CDSs into collections of functionally related proteins called subsystems, especially those playing functional roles in carbohydrates, proteins, amino acids and derivative metabolisms, and membrane transport (Figure 2).
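The assembly summary figures above (number of scaffolds, total length, GC content) can be recomputed from a scaffold FASTA file with a few lines of code. The sketch below is an illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the N50 statistic is added for convenience although it is not reported in the text, and GC content is computed over A, C, G, and T only so that gap characters (N) in scaffolds do not bias it.

```python
def parse_fasta(lines):
    """Collect sequences from FASTA-formatted lines (headers start with '>')."""
    sequences, current = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences


def assembly_stats(sequences, min_length=0):
    """Summarize scaffolds longer than min_length bp."""
    kept = [s.upper() for s in sequences if len(s) > min_length]
    lengths = sorted((len(s) for s in kept), reverse=True)
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in kept)
    acgt = gc + sum(s.count("A") + s.count("T") for s in kept)
    # N50: length of the scaffold at which the running sum of lengths,
    # taken from longest to shortest, first reaches half the assembly size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "n_scaffolds": len(lengths),
        "total_bp": total,
        "gc_percent": 100 * gc / acgt if acgt else 0.0,
        "n50": n50,
    }


# Toy input; for a real assembly, pass an open file handle to parse_fasta
# and set min_length=300 to mirror the >300 bp cutoff mentioned in the text.
toy = ">s1\nGGCC\nAT\n>s2\natat\n>s3\nGC\n".splitlines()
print(assembly_stats(parse_fasta(toy)))
```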
In silico typing predicted E. coli ACM5 as an O-nontypeable (ONT) strain assigned to the ONT:H21 serotype. It was further assigned to phylogroup B1 and sequence type (ST) 40. A single acquired antimicrobial resistance determinant, the mdf(A) gene, which codes for a broad-spectrum multidrug efflux pump, was identified (Figure 1).
Regarding virulence traits, several VAGs were found in the E. coli genome under study, including genes encoding a heat-stable enterotoxin (astA), a serum resistance protein (iss), and the type III secretion system (T3SS) along with its secreted effector proteins (Figure 1). The bundle-forming pili (BFP) operon and the perABC gene cluster were not detected. Consequently, according to the predicted virulence profile, E. coli ACM5 was assigned to the aEPEC pathotype.
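The pathotype assignment described above follows a simple marker-based rule that can be written down explicitly. The sketch below assumes the conventional definitions (eae-positive strains with the BFP operon are typical EPEC, eae-positive strains lacking both BFP and Shiga toxin genes are atypical EPEC, and eae-positive strains carrying stx genes are grouped with EHEC). The function name, the returned labels, and the example gene list are hypothetical; a real assignment should be based on the full output of the virulence gene screening.

```python
def classify_eae_profile(genes):
    """Assign a pathotype label from a collection of detected gene names.

    Assumed rule set (conventional marker definitions):
      - no eae                 -> outside the scope of this rule
      - eae + any stx gene     -> EHEC-like (LEE-positive, Shiga toxin-producing)
      - eae + any bfp gene     -> typical EPEC (tEPEC)
      - eae, no bfp, no stx    -> atypical EPEC (aEPEC)
    """
    names = {g.lower() for g in genes}
    has_eae = "eae" in names
    has_bfp = any(name.startswith("bfp") for name in names)
    has_stx = any(name.startswith("stx") for name in names)

    if not has_eae:
        return "not classified by this rule"
    if has_stx:
        return "EHEC-like"
    if has_bfp:
        return "tEPEC"
    return "aEPEC"


# Illustrative profile resembling the one described in the text:
# intimin gene present, BFP operon absent, no Shiga toxin genes listed.
detected = ["eae", "tir", "espA", "espB", "astA", "iss"]
print(classify_eae_profile(detected))  # aEPEC
```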
WGS analysis also revealed two putative large plasmids belonging to different incompatibility (Inc) groups in the E. coli genome (i.e., the IncF and IncI plasmid groups). However, both lacked ARGs and VAGs. Moreover, 18 prophage-associated regions were identified; however, in contrast to plasmids, those harbor VAGs, including the iss gene, the sitABCD operon encoding for an iron/manganese transporter, and a few T3SS effector proteins (Figure 1).
Figure 1: Circular map of the draft genome of E. coli ACM5. Data from the outermost to the innermost circles are declared and colored as follows. Circle 1: Genome size scale in base pairs. Circles 2 and 3: Annotated CDSs transcribed on the forward (blue) and reverse (red) DNA strand, respectively. Circles 4 and 5: Ribosomal RNAs (black) and transfer RNAs (green), respectively. Circle 6: 83 scaffolds of the draft genome (gray). Circle 7: Annotated features: antimicrobial resistance determinants (red), virulence-associated genes (purple), and prophage regions (aqua). Circle 8: %GC content (black). Circle 9: GC skew.
Figure 2: Subsystem distribution of aEPEC strain ACM5. The analysis is based on RAST SEED annotation. The pie chart organizes the subsystems by cellular process, and the number of protein-coding genes involved in each cellular process is indicated in parentheses.
Supplementary File 1. Description of WGS data preprocessing, software, parameter settings, and sequence analysis of the E. coli ACM5 genome.
This study presents an adaptation of the bacterial WGS workflow using a benchtop sequencer and a pipeline for genomic characterization of a pathogenic E. coli variant. Depending on the sequencing platform used, the turnaround times (TATs) for wet laboratory procedures (bacterial culturing, gDNA extraction, library preparation, and sequencing) and sequence analysis could vary, particularly if slow-growing bacteria are studied. Following the protocol for WGS described above, the TAT was within 4 days, which is comparable to what the literature currently states (<5 days) 34.
The theoretical depth of coverage for genome sequencing of E. coli ACM5 was calculated at 30x. Nonetheless, the sequencing generated empirical data for the genome assembly at 38x depth. As previously recommended35, this was sufficient to obtain a complete genome representation and to proceed with downstream analysis of the antimicrobial resistance, virulence, and mobilome features of E. coli ACM5. With respect to antimicrobial resistance, only acquired determinants were investigated. E. coli ACM5 showed intermediate susceptibility to ampicillin, but only the gene encoding the MdfA multidrug efflux pump was detected. However, MdfA does not confer resistance to β-lactams36, and no other acquired determinant involved in β-lactam resistance was identified. Hence, it is likely that a combination of other molecular mechanisms, such as decreased cell membrane permeability and increased expression of distinct efflux pump systems, underlies the reduced ampicillin susceptibility observed37.
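The relationship between read count, read length, genome size, and depth of coverage can be sketched as follows. This is a back-of-the-envelope illustration using the standard relation depth = (number of reads × read length) / genome size, not the calculation performed by the authors. It assumes that the 1,490,594 figure counts individual 150 bp reads (if it counts read pairs, the nominal depth doubles) and uses the assembled length of 5,272,433 bp as the genome size. The function names are hypothetical.

```python
import math


def expected_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Nominal depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Minimum number of reads needed to reach a target nominal depth."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


GENOME_SIZE_BP = 5_272_433  # assembled length reported for E. coli ACM5
READ_LENGTH_BP = 150        # 2 x 150 bp configuration

# Reads required for a 30x target under these assumptions.
print(reads_for_depth(30, READ_LENGTH_BP, GENOME_SIZE_BP))

# Nominal depth from the raw read count, before any trimming or filtering.
print(round(expected_depth(1_490_594, READ_LENGTH_BP, GENOME_SIZE_BP), 1))
```

The nominal value is an upper bound: quality filtering removes reads and trimming shortens them, so the depth realized in the assembly is expected to be lower than the figure computed from raw reads.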
Regarding virulence traits, the sole presence of the intimin-encoding gene (eae) is the hallmark genetic marker of aEPEC strains. The aEPEC pathotype is a subset of intestinal pathogenic E. coli strains carrying the locus of enterocyte effacement (LEE) pathogenicity island, where a T3SS and associated effector/translocator proteins are encoded, including Cif, EspA/B/D, EspF, intimin, and the translocated intimin receptor (Tir). The interplay between T3SS and its effectors is directly involved in the ability to promote the attaching and effacing (A/E) lesions on intestinal epithelial cells by typical enteropathogenic E. coli (EPEC) and EHEC, both pathotypes known as human foodborne illness causative agents17,19. The astA gene encodes a heat-stable enterotoxin named EAST1, and although it was first recognized and associated with EAEC strains, the EAST1 toxin is widely distributed among E. coli pathotypes38. Here, WGS demonstrated that E. coli ACM5 possesses the astA gene; although EAST1-producing E. coli isolates have been associated with diarrheal illness in humans and animals, the pathogenic role of EAST1 in eliciting diarrhea remains controversial39.
A key limitation of this methodological approach should be acknowledged: the initial contig-level assembly resulted in a fragmented draft genome and was therefore subjected to scaffolding against a closely related reference genome. The sequencing read length configuration used (2 x 150 bp) and the presence of numerous repetitive elements throughout the E. coli genome are the most likely explanations for the fragmented contig-level assembly. Short-read datasets cannot correctly resolve, and therefore cannot reconstruct, large repetitive elements, causing potential misassemblies and several breakpoints during the assembly process. Hence, the implementation of long-read sequencing technologies would help overcome these limitations and obtain closed genomes40.
Several critical steps must be considered throughout the WGS protocol. High-quality gDNA is the first checkpoint required to ensure high-quality sequencing data. Secondly, in the library preparation, extra caution must be taken during the tagmentation process, especially in the handling of multiple samples and the addition of index primers, to avoid cross-contamination, and in controlling the incubation time, to properly fragment the DNA samples. Moreover, quantification and normalization steps are essential to eliminate any bias that DNA libraries could introduce into the final sequencing data, since a pooled library that deviates from an equimolar ratio can produce a substantially unequal number of reads per sample. The method presented here is straightforward and represents a cost-effective alternative, mainly because it makes optimal use of the reagents contained in the DNA library preparation kits. The protocol for WGS described above has been applied not only to E. coli genome sequencing but also to other unrelated bacterial species such as Vibrio parahaemolyticus41. According to the Food and Agriculture Organization of the United Nations (FAO) and the World Health Organization (WHO), NGS methodologies can play a determining role in food safety by providing rapid identification and characterization of microorganisms and antimicrobial resistance (AMR) with accuracy that was previously not possible42. Therefore, these methodologies constitute a tool for surveillance and source tracking of pathogens, not only in the clinical context but also in the risk assessment of foodborne pathogens, revealing insights into the ecological and physiological properties of these microorganisms43.
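To illustrate the quantification and normalization step mentioned above, the sketch below converts a fluorometric library concentration (ng/µL) to molarity (nM) using the common approximation of 660 g/mol per base pair of double-stranded DNA, and then computes the volumes needed to dilute each library to a shared target molarity before pooling equal volumes. The function names, sample names, concentrations, fragment sizes, and the 1 nM target are all hypothetical; the manufacturer's guide should be followed for the actual loading concentration.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration in ng/uL to nM.

    Uses the approximation of 660 g/mol per base pair of dsDNA.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_nM: float, target_nM: float, final_volume_ul: float):
    """Return (library_ul, diluent_ul) from C1 * V1 = C2 * V2."""
    if stock_nM <= 0 or target_nM > stock_nM:
        raise ValueError("library is too dilute to reach the target molarity")
    library_ul = target_nM * final_volume_ul / stock_nM
    return library_ul, final_volume_ul - library_ul


# Hypothetical libraries: (concentration in ng/uL, mean fragment size in bp).
libraries = {"sample_A": (2.0, 600), "sample_B": (3.3, 550)}
TARGET_NM = 1.0        # assumed target molarity for every library
FINAL_VOLUME_UL = 20.0

for name, (conc, size) in libraries.items():
    molarity = library_molarity_nM(conc, size)
    lib_ul, diluent_ul = dilution_volumes(molarity, TARGET_NM, FINAL_VOLUME_UL)
    print(f"{name}: {molarity:.2f} nM -> {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```

Once every library is at the same molarity, pooling equal volumes yields an equimolar pool, which is what keeps the number of reads per sample balanced.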
The authors have nothing to disclose.
The authors thank the National Council of Science and Technology of Mexico (CONACyT, by its acronym in Spanish) for the doctoral scholarship awarded to José Antonio Magaña-Lizárraga [No. 481143].
Name | Company | Catalog Number | Comments |
Accublock Mini digital dry bath | Labnet | D0100 | Dry bath for incubation of tubes |
Agencourt AMPure XP | Beckman Coulter | A63881 | Magnetic beads in solution for DNA library purification |
DeNovix DS-11 | DeNovix Inc. | | UV-Vis spectrophotometer to check the quality of the extracted gDNA |
DNA LoBind Tubes | Eppendorf | 0030108418 | 1.5 mL PCR tubes for DNA library pooling |
DynaMag-2 Magnet | Invitrogen, Thermo Fisher Scientific | 12321D | Magnetic microtube rack used during magnetic beads-based DNA purification |
Gram-negative Multibac I.D. | Diagnostic Research (Mexico) | PT-35 | Commercial standard antibiotic disks for antimicrobial susceptibility testing |
MiniSeq Mid Output Kit (300-cycles) | Illumina | FC-420-1004 | Reagent cartridge for paired-end sequencing (2×150) |
MiniSeq System Instrument | Illumina | SY-420-1001 | Benchtop sequencer used for Next-generation sequencing |
MiniSpin centrifuge | Eppendorf | 5452000816 | Standard centrifuge for tubes |
Nextera XT DNA Library Preparation Kit | Illumina | FC-131-1024 | Reagents to perform DNA libraries for sequencing. Includes Box 1 and Box 2 reagents for 24 samples |
Nextera XT Index Kit v2 | Illumina | FC-131-2001, FC-131-2002, FC-131-2003, FC-131-2004 | Index set A, B, C, D |
PhiX Control v3 | Illumina | FC-110-3001 | DNA library control for sequencing |
Precision waterbath | LabCare America | 51221081 | Water bath shaker used for bacterial culture |
Qubit 1X dsDNA HS Assay Kit | Invitrogen, Thermo Fisher Scientific | Q33231 | Reagents for fluorescence-based DNA quantification assay |
Qubit 2.0 Fluorometer | Invitrogen, Thermo Fisher Scientific | Q32866 | Fluorometer used for fluorescence assay |
Qubit Assay tubes | Invitrogen, Thermo Fisher Scientific | Q32856 | 0.5 mL PCR tubes for fluorescence-based DNA quantification assay |
SimpliAmp Thermal Cycler | Applied Biosystems, Thermo Fisher Scientific | A24811 | Thermocycler used for DNA library amplification |
Spectronic GENESYS 10 Vis | Thermo | 335900 | Spectrophotometer used for bacterial suspension in antimicrobial susceptibility testing |
ZymoBIOMICS DNA Miniprep Kit | Zymo Research Inc. | D4300 | Kit for genomic DNA extraction (50 preps) |