This protocol details a comprehensive approach for the culturing, sequencing, and de novo hybrid genome assembly of urinary bacteria. It provides a reproducible procedure for the generation of complete, circular genome sequences useful in studying both chromosomal and extrachromosomal genetic elements contributing to urinary colonization, pathogenesis, and antimicrobial resistance dissemination.
Complete genome sequences provide valuable data for understanding the genetic diversity and unique colonization factors of urinary microbes. These data may include mobile genetic elements, such as plasmids and extrachromosomal phage, that contribute to the dissemination of antimicrobial resistance and further complicate treatment of urinary tract infection (UTI). In addition to providing fine resolution of genome structure, complete, closed genomes allow for detailed comparative genomics and evolutionary analyses. The generation of complete genomes de novo has long been a challenging task due to limitations of available sequencing technology. Paired-end Next Generation Sequencing (NGS) produces high-quality short reads, often resulting in accurate but fragmented genome assemblies. In contrast, Nanopore sequencing provides long reads of lower quality, normally leading to complete but error-prone assemblies. Such errors may hamper genome-wide association studies or provide misleading variant analysis results. Therefore, hybrid approaches combining both short and long reads have emerged as reliable methods to achieve highly accurate closed bacterial genomes. Reported herein is a comprehensive method for the culture of diverse urinary bacteria, species identification by 16S rRNA gene sequencing, extraction of genomic DNA (gDNA), and generation of short and long reads by NGS and Nanopore platforms, respectively. Additionally, this method describes a bioinformatic pipeline of quality control, assembly, and gene prediction algorithms for the generation of annotated complete genome sequences. The combination of bioinformatic tools enables the selection of high-quality read data for hybrid genome assembly and downstream analysis. The streamlined approach for hybrid de novo genome assembly described in this protocol may be adapted for use with any culturable bacterium.
The urinary microbiome is an emerging area of research that has shattered a decades-long misconception that the urinary tract is sterile in healthy individuals. Members of the urinary microbiota may serve to balance the urinary environment and prevent urinary tract infection (UTI)1,2. Uropathogenic bacteria invade the urinary tract and employ diverse virulence mechanisms to displace the resident microbiota, colonize the urothelium, evade immune responses and counteract environmental pressures3,4. Urine is a relatively nutrient-limited medium characterized by high osmolarity, limited nitrogen and carbohydrate availability, low oxygenation, and low pH5,6,7. Urine is also considered to be antimicrobial, composed of high concentrations of inhibitory urea and antimicrobial peptides such as the human cathelicidin LL-37 (ref. 8). Investigating mechanisms employed by both resident bacteria and uropathogens to colonize the urinary tract is critical to further understanding urinary tract health and developing new strategies for UTI treatment. Furthermore, as the failure of front-line antimicrobial therapies becomes more common, it is increasingly important to monitor the dissemination of mobile genetic elements carrying antimicrobial resistance determinants within populations of urinary bacteria9,10.
To investigate genotypes and phenotypes of urinary bacteria, their successful culture and subsequent whole genome sequencing (WGS) is imperative. Culture-dependent methods are necessary to detect and identify viable microbes in urine samples11. Standard clinical urine culture involves plating urine onto 5% sheep blood agar (BAP) and MacConkey agar and incubating aerobically at 35 °C for 24 h12. However, with a detection threshold of ≥10⁵ CFU/mL13, many members of the urinary microbiota are not reported by this method. Improved culturing techniques such as Enhanced Quantitative Urine Culture (EQUC)11 employ various combinations of different urine volumes, incubation times, culture media, and atmospheric conditions to identify microbes commonly missed by standard urine culture. Described in this protocol is a modified version of EQUC, termed here the Modified Enhanced Urine Culture protocol, that enables culturing of diverse urinary bacteria and uropathogens using selective media and optimal atmospheric conditions but is not inherently quantitative. The successful isolation of urinary bacteria enables the extraction of genomic DNA (gDNA) for downstream WGS and genome assembly.
Genome assemblies, complete assemblies in particular, enable the discovery of genetic factors that may contribute to colonization, niche maintenance, and virulence among both resident microbiota and uropathogenic bacteria. Draft genome assemblies contain a variable number of contiguous sequences (contigs) that may contain sequencing errors and lack orientation information. In a complete genome assembly, both the orientation and the accuracy of every base pair have been verified14. Furthermore, obtaining complete genome sequences provides insight into genome structure, genetic diversity, and mobile genetic elements15. Short reads alone may identify the presence or absence of important genes but may not pinpoint their genomic context16. With the advent of long-read sequencing technologies such as Oxford Nanopore and PacBio, generating closed de novo assemblies of bacterial genomes no longer requires laborious methods such as manual closing of de novo assemblies by multiplex PCR17,18. The combination of Next Generation short-read sequencing and Nanopore long-read sequencing technologies allows the facile generation of accurate, complete, and closed bacterial genome assemblies at relatively low cost19. Short-read sequencing produces accurate yet fragmented genome assemblies generally consisting of 40-100 contigs, while Nanopore sequencing generates long reads of about 5-100 kb in length that are less accurate but can serve as scaffolds to join contigs and resolve genomic synteny. Hybrid approaches utilizing both short-read and long-read technologies can produce accurate and complete bacterial genomes19.
Described here is a comprehensive protocol for the isolation and identification of bacteria from human urine, genomic DNA extraction, sequencing, and complete genome assembly using a hybrid assembly approach. This protocol places special emphasis on the steps necessary to properly process reads generated by short-read and long-read sequencing for the accurate assembly of a closed bacterial chromosome and extrachromosomal elements such as plasmids.
Bacteria were cultured from urine collected from consenting women as part of institutional review board-approved studies 19MR0011 (UTD) and STU 032016-006 (UTSW).
1. Modified enhanced urine culture
NOTE: All culture steps must be carried out under sterile conditions. Sterilize all instruments, solutions, and media. Clean the work area with 70% ethanol, then set up a Bunsen burner and work carefully close to the flame to reduce the chances of contamination. Alternatively, a class II biosafety cabinet may be used to maintain a sterile environment. Wear appropriate personal protective equipment (PPE) to avoid exposure to potentially pathogenic microbes.
2. Identification of bacterial species by 16S rRNA gene Sanger sequencing
NOTE: Microbial identity can be alternatively confirmed using Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectrometry (MALDI-TOF)20.
3. Extraction of genomic DNA (gDNA)
NOTE: This section utilizes reagents and spin-columns provided in the gDNA extraction kit referenced in the Table of Materials for the high yield extraction of quality genomic DNA from diverse bacterial species. Provided below are recommended modifications and instructions.
4. Assessing the quality of extracted gDNA
5. Paired-end next generation short-read sequencing and library preparation
NOTE: Short-read sequencing may be performed on various instruments at distinct read lengths and orientations. 150 bp (300 cycle) paired-end sequencing is recommended for bacterial WGS. Both library preparation and sequencing may be outsourced to core facilities or commercial laboratories.
6. Nanopore MinION sequencing library preparation
7. Assessing and preparing reads
NOTE: A recommended directory structure is depicted in Figure 4. Create the directories found in the Desktop, namely, Long_Reads, Short_Reads and Trimmed_Reads, prior to proceeding with the computation steps below.
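The directory layout named in the note above can also be created programmatically. The following is a minimal Python sketch, assuming only the three directory names given in the note; the function name `make_read_dirs` and the choice of base directory are illustrative and not part of the protocol.

```python
from pathlib import Path
from typing import List

# Directory names taken from the note above; the base location is up to the user.
READ_DIRS = ("Long_Reads", "Short_Reads", "Trimmed_Reads")


def make_read_dirs(base: Path) -> List[Path]:
    """Create the three read directories under `base` and return their paths."""
    created = []
    for name in READ_DIRS:
        path = base / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example call (not executed here): make_read_dirs(Path.home() / "Desktop")
```

Calling the function a second time leaves existing directories and their contents untouched.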
8. Generating hybrid genome assembly
NOTE: The following assembly pipeline utilizes Unicycler19,28,29,30 to combine short and long reads prepared in sections 7.1 and 7.2 (Figure 3). Install Unicycler and its dependencies and execute the commands below. Short-read files exported in step 7.1.5 are assumed to be named trimmed_short_file.R1.fastq and trimmed_short_file.R2.fastq for simplicity.
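As an illustration of how the hybrid assembly call might be put together, the sketch below builds a Unicycler argument list in Python. The short-read file names are those assumed in the note above; the long-read file name, output directory name, and thread count are hypothetical placeholders, and the exact options used in the protocol steps may differ.

```python
import shlex
from typing import List


def unicycler_command(short_r1: str, short_r2: str, long_reads: str,
                      out_dir: str, threads: int = 4) -> List[str]:
    """Return the argument list for a Unicycler hybrid assembly run."""
    return [
        "unicycler",
        "-1", short_r1,      # trimmed forward short reads
        "-2", short_r2,      # trimmed reverse short reads
        "-l", long_reads,    # filtered long reads
        "-o", out_dir,       # output directory for the assembly files
        "-t", str(threads),  # number of CPU threads
    ]


cmd = unicycler_command(
    "trimmed_short_file.R1.fastq",
    "trimmed_short_file.R2.fastq",
    "trimmed_long_file.fastq",   # hypothetical long-read file name
    "Unicycler_Assembly",        # hypothetical output directory name
)
print(shlex.join(cmd))
# To launch the assembly, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.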
9. Assessing assembly quality
NOTE: The following protocol utilizes Bandage31 and QUAST32, two programs that must be set up prior to use (Figure 2 and Figure 4). Bandage does not require installation once downloaded and QUAST requires familiarity with basic command-line usage. It is also recommended to assess genome completeness using Benchmarking Universal Single-Copy Orthologs (BUSCO)33.
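Alongside visual inspection in Bandage, the assembly graph can be screened in a script. The sketch below is an assumption-laden illustration rather than part of the protocol: it reads a GFA (version 1) assembly graph such as the `assembly.gfa` file written by Unicycler and reports, for each segment, its length, whether a link joins its end to its own start, and how many links connect it to other segments. A segment with a self-link and no other links is consistent with a closed circular replicon; the function name `summarize_gfa` is hypothetical.

```python
from typing import Dict, Iterable


def summarize_gfa(lines: Iterable[str]) -> Dict[str, dict]:
    """Return {segment: {"length", "circular", "other_links"}} from GFA1 lines."""
    segments: Dict[str, dict] = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            length = len(seq)
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i: tag.
                length = 0
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        length = int(tag[5:])
            segments[name] = {"length": length, "circular": False, "other_links": 0}
        elif fields[0] == "L" and len(fields) >= 5:
            links.append((fields[1], fields[2], fields[3], fields[4]))
    # Links are resolved after all segments are read, since GFA does not fix line order.
    for src, src_orient, dst, dst_orient in links:
        if src == dst and src_orient == dst_orient:
            # The end of the segment joins its own start.
            if src in segments:
                segments[src]["circular"] = True
        else:
            for name in {src, dst}:
                if name in segments:
                    segments[name]["other_links"] += 1
    return segments


example = [
    "S\t1\tACGTACGTAC\n",
    "S\t2\tGGCC\n",
    "S\t3\tTTAA\n",
    "L\t1\t+\t1\t+\t0M\n",
    "L\t2\t+\t3\t+\t0M\n",
]
summary = summarize_gfa(example)
```

In this toy graph, segment 1 is flagged as circular with no other links, whereas segments 2 and 3 are joined to each other, which corresponds to the interconnected contigs of an incomplete assembly. For a real run, the function can be given an open file handle, for example `with open("assembly.gfa") as fh: summarize_gfa(fh)`.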
10. Genome annotation
NOTE: The annotation pipeline below utilizes Prokka34, a command-line tool that must be installed prior to usage. Alternatively, use Prokka through the automated GUI KBase (Table of Materials) or annotate genomes via the web server RAST35. If depositing genomes into NCBI, they will be automatically annotated using the Prokaryotic Genome Annotation Pipeline (PGAP)36.
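A Prokka run on a finished assembly might be composed as in the following Python sketch. The input file name, output directory, and prefix are hypothetical placeholders; only widely used Prokka options (`--outdir`, `--prefix`, `--genus`, `--species`, `--cpus`) are included, and the options chosen in the protocol steps may differ.

```python
import shlex
from typing import List, Optional


def prokka_command(assembly_fasta: str, out_dir: str, prefix: str,
                   genus: Optional[str] = None, species: Optional[str] = None,
                   cpus: int = 4) -> List[str]:
    """Return the argument list for annotating one assembly with Prokka."""
    cmd = ["prokka", "--outdir", out_dir, "--prefix", prefix, "--cpus", str(cpus)]
    if genus:
        cmd += ["--genus", genus]      # genus name recorded in the annotation
    if species:
        cmd += ["--species", species]  # species name recorded in the annotation
    cmd.append(assembly_fasta)         # the contigs file is the positional argument
    return cmd


prokka_cmd = prokka_command(
    "assembly.fasta",      # hypothetical path to the hybrid assembly
    "Prokka_Annotation",   # hypothetical output directory name
    "KpPF25",              # strain name from this article, used here as a file prefix
    genus="Klebsiella",
    species="pneumoniae",
)
print(shlex.join(prokka_cmd))
# To run the annotation, pass the list to subprocess.run(prokka_cmd, check=True).
```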
11. Suggested practices for data democratization
This protocol has been optimized for the culture and sequencing of urinary bacteria belonging to the genera listed in Figure 1. Not all urinary bacteria are culturable by this method. Culture media and conditions are specified by genus in Figure 1. Exemplary gel electrophoresis assessments of gDNA integrity are depicted in Figure 2. An overview of the bioinformatics pipeline for sequencing read processing, genome assembly, and annotation is described in Figure 3. A guide for computational directory structure is provided in Figure 4 to both simplify protocol understanding and provide a framework for successful organization. Furthermore, included are representative complete genomes of two Klebsiella spp., K. pneumoniae and K. oxytoca, that were generated by this protocol. A representation of these assemblies is provided in Figure 5, which also includes an additional example of an incomplete K. pneumoniae genome. A detailed overview of each fully annotated complete genome is shown in Figure 6. Finally, a summary of sequencing read statistics is provided in Table 1 to offer a broad understanding of raw and trimmed data sufficient for the generation of high-quality closed genome assemblies. Additionally, key parameters of the two representative complete Klebsiella spp. genomes are listed. Genomes and raw data were deposited in GenBank under the BioProject PRJNA683049.
Figure 1: Modified enhanced urine culture of diverse urinary genera. Chart of the agar and liquid broth media that may be used to culture diverse urinary genera. All culturing is suggested to be performed at 35 °C as described in subsection 1.1. Circles represent media appropriate for culturing a particular genus; colors were arbitrarily selected to distinguish one media type from another. CDC-AN BAP (red), CDC Anaerobe Sheep Blood Agar; 5% Sheep-BAP (orange), Sheep Blood Agar; BHI (green), Brain Heart Infusion; TSB (yellow), Tryptic Soy Broth; CHROMagar Orientation (blue). (a) Gardnerella vaginalis should be cultured on HBT Bilayer G. vaginalis Selective agar in microaerophilic atmosphere and under special broth culture requirements44. (b) Lactobacillus iners should be cultured on 5% Rabbit-BAP plates and NYCIII broth in microaerophilic atmosphere. (c) Lactobacillus spp. may be cultured on MRS in microaerophilic conditions.
Figure 2: Genomic DNA extraction agarose gel images. Representative gel images depicting gDNA extraction outcomes. (A) Lane 1: 1 kb ladder, Lane 2: intact gDNA representing successful extraction, Lane 3: smearing indicating fragmented gDNA. (B) Lane 1: 1 kb ladder, Lanes 2 & 3: rRNA contamination denoted by two bands between 1.5 kb and 3 kb.
Figure 3: Hybrid genome assembly workflow. Schematic of steps from read quality control and pre-processing to assembly annotation. Read trimming removes ambiguous and low-quality reads. Q-score and length parameters are indicated and represent the reads that are retained. Assembly utilizes both short and long reads to generate a hybrid de novo genome assembly. Assembly quality is evaluated based on completeness and correctness using specified tools and parameters. The final genome assembly is annotated for all genes and specific loci of interest.
Figure 4: Bioinformatics directory structure guide. A schematic of recommended directory and file organization for the processing of short and long reads, hybrid assembly, and genome annotation and QC. Key command-line data processing steps are highlighted next to corresponding files and directories. Eliciting commands and flags (bold), input files (blue), output files or directories (red), user input such as file naming convention (magenta).
Figure 5: Genome assembly graphs by Bandage. Representative complete genome assembly graphs of (A) Klebsiella oxytoca KoPF10 and (B) Klebsiella pneumoniae KpPF25 and incomplete genome assembly of (C) Klebsiella pneumoniae KpPF46. The complete genome of KoPF10 demonstrates a single closed chromosome and the complete genome of KpPF25 consists of a closed chromosome and five closed plasmids. The incomplete chromosome of KpPF46 consists of two interconnected contigs. Unicycler hybrid de novo assembly generates an assembly graph that is visualized by Bandage. The assembly graph provides a simplistic schematic of the genome, indicating closed chromosome or plasmids by a linker connecting two ends of a single contig. The presence of more than one interconnected contig indicates incomplete assembly. Contig size and depth can be noted in Bandage as well.
Figure 6: Complete genome maps of annotated hybrid assemblies. Assembly maps generated by Geneious Prime for the complete genome of (A) K. oxytoca KoPF10 and (B) K. pneumoniae KpPF25 showing annotated genes denoted by colored arrows along plasmid backbones. Chromosomes only show rRNA and tRNA genes for simplicity. Genome annotations were performed using Prokka as indicated in section 10 of this protocol.
Table 1: Representative Klebsiella spp. complete assembly characteristics. Assembly parameters of K. oxytoca strain KoPF10 and K. pneumoniae strain KpPF25. Accession numbers for the deposited data on NCBI are provided. The numbers of reads both prior to and after trimming are specified for both sequencing technologies. N50 is provided for long reads only since short reads are of a controlled length. (a) MLST, Multilocus Sequence Type. (b) CDS, Coding Sequences. (c) Plasmid replicon predicted using PlasmidFinder v2.1 Enterobacteriaceae database with parameters set to 80% identity and 60% length. (d) Oxford Nanopore Technologies (ONT) deposited read data. (e) Illumina deposited read data.
The comprehensive hybrid genome assembly protocol described here offers a streamlined approach for the successful culturing of diverse urinary microbiota and uropathogens, and the complete assembly of their genomes. Successful WGS of bacterial genomes begins with the isolation of diverse and sometimes fastidious microbes in order to extract their genomic DNA. To date, existing urine culture protocols either lack the necessary sensitivity to detect many urinary species or involve lengthy and extensive approaches that require extended time and resources11. The Modified Enhanced Urine Culture approach described offers a simplified yet comprehensive protocol for the successful isolation of bacteria belonging to 17 common urinary genera, including potentially pathogenic or beneficial commensal species, and both facultative and obligate aerobic or anaerobic bacteria. This in turn provides the necessary starting material for accurate sequencing and assembly of bacterial genomes and for critical phenotypic experiments, which contribute to the understanding of urinary health and disease. Furthermore, this modified culture approach provides for a more defined clinical diagnosis of viable microorganisms found in urine specimens and allows for their biobanking for future genomic studies. However, this protocol is not without limitations. It may require long incubation times depending on the organism as well as use of resources such as a hypoxia chamber or controlled incubators that may not be readily available. The use of anaerobic GasPaks offers an alternative solution, but these are costly and do not always produce a sustained and controlled environment. Finally, culture bias as well as sample diversity may allow particular organisms and uropathogens to outcompete fastidious bacteria. Despite these limitations, the culture of diverse urinary bacteria is made possible by this approach.
Genomic sequencing has gained popularity with the advancement of Next Generation Sequencing technologies which tremendously increased both the yield and accuracy of sequencing data14,15. Coupled with the development of algorithms for data processing and de novo assembly, complete genome sequences are at the fingertips of novice and expert scientists alike15,45. Knowledge of overall genome organization provided by complete genomes offers important evolutionary and biological insights, including gene duplication, gene loss, and horizontal gene transfer14. Additionally, genes important to antimicrobial resistance and virulence are often localized on mobile elements, which are typically not resolved in draft genome assemblies15,16.
The protocol herein follows a hybrid approach for the combination of sequencing data from short-read and long-read platforms to generate complete genome assemblies. While focused on urinary bacterial genomes, this procedure may be adapted to diverse bacteria from various isolation sources. Critical steps in this approach include following adequate sterile technique and utilizing appropriate media and culture conditions for the isolation of pure urinary bacteria. Furthermore, the extraction of intact, high-yield gDNA is essential for generating sequencing data free of contaminating reads that may hamper assembly success. Subsequent library preparation protocols are critical for the generation of quality reads of sufficient length and depth. Therefore, it is of utmost importance to handle gDNA with care during library preparation for long-read sequencing in particular, as this technology's greatest benefit is the generation of long reads with no theoretical upper length limit. Also outlined are sections for the appropriate quality control (QC) of sequencing reads, which eliminates noisy data and improves assembly outcome.
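To make the read QC idea concrete, the following Python sketch filters FASTQ records by minimum length and mean quality. It is an illustration only, not the code of any tool named in this protocol; the thresholds (`min_q=10.0`, `min_len=1000`) are hypothetical and should be replaced with the values chosen for a given dataset. Mean quality is computed by averaging per-base error probabilities, which penalizes low-quality stretches more than a simple average of Phred scores would.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean quality computed from error probabilities (Phred+33 encoding)."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_q: float = 10.0,
                 min_len: int = 1000) -> Iterator[Record]:
    """Keep reads at least `min_len` long with mean quality of at least `min_q`."""
    for header, seq, qual in records:
        if len(seq) >= min_len and qual and mean_phred(qual) >= min_q:
            yield header, seq, qual


fastq = [
    "@good\n", "ACGTACGTACGT\n", "+\n", "IIIIIIIIIIII\n",
    "@lowq\n", "ACGTACGTACGT\n", "+\n", "############\n",
    "@short\n", "ACGT\n", "+\n", "IIII\n",
]
kept = list(filter_reads(parse_fastq(fastq), min_q=10.0, min_len=10))
```

With the toy input above and a 10-base length cutoff, only the read named `@good` is retained: `@lowq` fails the quality threshold and `@short` fails the length threshold.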
Despite successful DNA isolation, library preparation, and sequencing, the genomic architecture of some species may still pose an obstacle to the generation of a closed genome assembly45,46. Repetitive sequences often complicate assembly computation and, despite long-read data, these regions may be resolved with low confidence or not at all. Long reads thus have to be on average longer than the largest repeat region in the genome, or coverage must be high (>100x)19. Some genomes may remain incomplete and require manual approaches for completion. Nevertheless, hybrid-assembled incomplete genomes are typically composed of fewer contigs than short-read draft genomes. Adjusting default parameters of the assembly algorithm or following more stringent cutoffs for read QC may help. Alternatively, one suggested approach is to map long reads to the incomplete regions in search of evidence for the most likely assembly path, and then confirm the path utilizing PCR and Sanger sequencing of the amplified region. Mapping reads using Minimap2 is suggested, and Bandage offers a useful tool for the visualization of mapped reads along assembled contigs, providing evidence for contig linkage47.
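The read-length and coverage guidance above can be checked with simple arithmetic before attempting assembly. The sketch below is a hedged illustration: it computes the mean length, N50, and estimated depth of a long-read set from a list of read lengths and an expected genome size. The function names, the example numbers, and the `largest_repeat` input (which the user would have to estimate for the organism in question) are hypothetical; only the >100x threshold comes from the text.

```python
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def long_read_summary(lengths: Sequence[int], genome_size: int) -> Dict[str, float]:
    """Summarize a long-read set against an expected genome size (in bases)."""
    total = sum(lengths)
    return {
        "reads": len(lengths),
        "mean_length": total / len(lengths) if lengths else 0.0,
        "n50": n50(lengths),
        "depth": total / genome_size,  # estimated fold coverage
    }


def meets_guidance(summary: Dict[str, float], largest_repeat: int,
                   min_depth: float = 100.0) -> bool:
    """True if mean read length exceeds the largest repeat or depth exceeds min_depth."""
    return summary["mean_length"] > largest_repeat or summary["depth"] > min_depth


# Toy numbers for illustration only.
stats = long_read_summary([8000, 6000, 4000, 2000], genome_size=1000)
```

For the toy input, the 20,000 total bases give a mean length of 5,000, an N50 of 6,000, and an estimated depth of 20x over the 1,000-base placeholder genome.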
An additional challenge to generating complete genomes lies in familiarity and comfort with command-line tools. Many bioinformatic tools are developed to offer computational opportunities to any user; however, their utilization relies on an understanding of the basics of UNIX and programming. This protocol aims to provide sufficiently detailed instructions to enable individuals without prior command-line experience to generate closed genome assemblies and annotate them.
The authors have nothing to disclose.
We thank Dr. Moutusee Jubaida Islam and Dr. Luke Joyce for their contributions to this protocol. We would also like to recognize the University of Texas at Dallas Genome Center for their feedback and support. This work was funded by the Welch Foundation, award number AT-2030-20200401 to N.J.D., by the National Institutes of Health, award number R01AI116610 to K.P., and by the Felecia and John Cain Chair in Women's Health, held by P.E.Z.
Name | Company | Catalog Number | Comments |
Equipment: | | | |
Bioanalyzer 2100 | Agilent | G29398A | Optional but recommended |
Centrifuge | Eppendorf | — | Any centrifuge for spinning conicals and microcentrifuge tubes (e.g. Models 5810R/5424R) |
Electrophoresis | BioRad Laboratories | 1645070 | |
Gel Imaging System | BioRad Laboratories | ChemiDoc models | |
Incubator | ThermoFisher Scientific | — | Any CO2 Incubator (e.g. Thermo Forma model 3110) |
Magnetic Rack | New England BioLabs | S15095 | 12-tube rack |
MinION | Oxford Nanopore Technologies | — | |
Nanodrop | ThermoFisher Scientific | ND-ONE-W | |
NextSeq 500 | Illumina | SY-415-1002 | Other Illumina models are acceptable |
Plate Reader | BioTek | — | Synergy H1 |
Qubit fluorometer | ThermoFisher Scientific | Q33238 | |
Rotator | Benchmark Scientific | H2024 | |
Thermocycler | ThermoFisher Scientific | — | Any thermocycler for PCR reactions (e.g. ProFlex PCR system) |
Materials: | | | |
10X Phosphate Buffered Saline (PBS) | Fisher Scientific | BP3991 | |
10X TBE buffer | — | — | 1M Tris,1M Boric Acid,0.2M EDTA (pH 8.0) |
1492R primer | Sigma Aldrich (Custom oligos) | — | GGTTACCTTGTTACGACTT |
1kb Ladder | VWR | 101228-494 | |
1M Tris-Cl (pH 7.5) | ThermoFisher Scientific | 15567027 | |
6x Loading dye | Fisher Scientific | NC0783588 | |
8F primer | Sigma Aldrich (Custom oligos) | — | AGAGTTTGATCCTGGCTCAG |
Agar | Fisher Scientific | BP1423-2 | |
Agarose | BioRad Laboratories | 63001 | |
AMPure XP Beads | Beckman Coulter | A63880 | |
Anaerobe Pouch System – GasPak EZ | BD Diagnostic Systems | B260683 | |
Boric Acid | Fisher Scientific | A73-500 | |
Brain Heart Infusion Broth | BD Diagnostic Systems | 212304 | |
CDC Anaerobe 5% Sheep Blood Agar | BD Diagnostic Systems | L007357 | |
CHROMagar Orientation | BD Diagnostic Systems | PA-257481.04 | |
DNeasy Blood & Tissue | QIAGEN | 69504 | |
DreamTaq Master Mix | ThermoFisher Scientific | K1081 | |
Dry Anaerobic Indicator Strips | BD Diagnostic Systems | 271051 | |
EDTA | Fisher Scientific | S311-500 | |
Ethanol 200 Proof | Sigma Aldrich | E7023 | For molecular biology |
Ethidium Bromide | ThermoFisher Scientific | BP130210 | |
Flow cell priming kit | Oxford Nanopore Technologies | EXP-FLP002 | |
Flow cell wash kit | Oxford Nanopore Technologies | EXP-WSH003 | |
Gel Extraction Miniprep Kit | BioBasic | BS654 | |
Ligation sequencing kit | Oxford Nanopore Technologies | SQK-LSK109 | |
Lysozyme | Research Products International Corp | L381005.05 | |
Mutanolysin | Sigma Aldrich | M9901-5KU | |
Native barcoding expansion 1-12 | Oxford Nanopore Technologies | EXP-NBD104 | |
NEB Blunt/TA Ligase Master Mix | New England BioLabs | M0367L | |
NEBNext FFPE DNA Repair Mix | New England BioLabs | M6630L | |
NEBNext quick ligation buffer | New England BioLabs | B6058S | |
NEBNext Ultra II End repair / dA-tailing module | New England BioLabs | E7546L | |
Nextera DNA CD Indexes | Illumina | 20018708 | |
Nextera DNA Flex Library Prep – (M) Tagmentation | Illumina | 20018705 | |
Nuclease-free water | Sigma Aldrich | W4502 | |
Qubit 1X dsDNA HS Assay Kit | ThermoFisher Scientific | Q33230 | |
Qubit Assay Tubes | ThermoFisher Scientific | Q32856 | |
Quick T4 DNA Ligase | New England BioLabs | E6056L | |
R9 Flow cell | Oxford Nanopore Technologies | FLO-MIN106D | |
RNase A | ThermoFisher Scientific | EN0531 | |
Sheep Blood | Hemostat Laboratories | DS13250 | |
TE buffer | — | — | 10mM Tris, 1mM EDTA (pH 8.0) |
Triton X-100 | Sigma Aldrich | T8787 | |
Tryptic Soy Broth | BD Diagnostic Systems | 211825 | |
Software & Bioinformatic Tools: | | | |
Bandage | — | — | https://rrwick.github.io/Bandage/ |
Center for Genomic Epidemiology | — | — | http://www.genomicepidemiology.org/ |
CLC Genomics Workbench 12 | QIAGEN | — | |
CRISPRcasFinder | — | — | https://crisprcas.i2bc.paris-saclay.fr/ |
FastQC | — | — | https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
Geneious Prime | Geneious | — | |
gVolante (BUSCO) | — | — | https://gvolante.riken.jp/ |
Kbase Prokka Wrapper | — | — | https://kbase.us/applist/apps/ProkkaAnnotation/annotate_contigs/release |
Minimap2 | — | — | https://github.com/lh3/minimap2 |
MinKNOW | Oxford Nanopore Technologies | — | |
NanoFilt | — | — | https://github.com/wdecoster/nanofilt |
NanoStat | — | — | https://github.com/wdecoster/nanostat |
PHASTER | — | — | https://phaster.ca/ |
Prokka | — | — | https://github.com/tseemann/prokka |
QUAST | — | — | http://quast.sourceforge.net/quast |
Trim Galore | — | — | https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/ |
Trimmomatic | — | — | http://www.usadellab.org/cms/?page=trimmomatic |
Unicycler | — | — | https://github.com/rrwick/Unicycler#necessary-read-length |