Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations

Published: December 07, 2021
doi: 10.3791/63115

Summary

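This analytical computational platform provides practical guidance for microbiologists, ecologists, and epidemiologists interested in bacterial population genomics. Specifically, the work presented here demonstrates how to perform: i) phylogeny-guided mapping of hierarchical genotypes; ii) frequency-based analysis of genotypes; iii) kinship and clonality analyses; and iv) identification of lineage-differentiating accessory loci.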

Abstract

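Routine and systematic use of bacterial whole-genome sequencing (WGS) is improving the accuracy and resolution of epidemiological investigations carried out by public health laboratories and regulatory agencies. Large volumes of publicly available WGS data can be used to study pathogenic populations at a large scale. Recently, a freely available computational platform called ProkEvo was published to enable reproducible, automated, and scalable hierarchical-based population genomic analyses using bacterial WGS data. That implementation of ProkEvo demonstrated the importance of combining standard genotypic mapping of populations with mining of accessory genomic content for ecological inference. In particular, the work highlighted here uses ProkEvo-derived outputs for population-scale hierarchical analyses in the R programming language. The main objective is to provide a practical guide by showing how to: i) use phylogeny-guided mapping of hierarchical genotypes; ii) assess the frequency distributions of genotypes as a proxy for ecological fitness; iii) determine kinship relationships and genetic diversity using specific genotypic classifications; and iv) map lineage-differentiating accessory loci. To enhance reproducibility and portability, R markdown files are used to demonstrate the entire analytical approach. The example dataset contains genomic data from 2,365 isolates of the zoonotic foodborne pathogen Salmonella Newport. Phylogeny-anchored mapping of hierarchical genotypes (Serovar -> BAPS1 -> ST -> cgMLST) revealed the population genetic structure, highlighting sequence types (STs) as the keystone differentiating genotype. Among the three most dominant lineages, ST5 and ST118 shared a common ancestor more recently than either did with the highly clonal ST45 phylotype. ST-based differences were further highlighted by the distribution of accessory antimicrobial resistance (AMR) loci. Lastly, a phylogeny-anchored visualization was used to combine hierarchical genotypes and AMR content to reveal the kinship structure and lineage-specific genomic signatures. Taken together, this analytical approach provides some guidelines for conducting heuristic bacterial population genomic analyses using pan-genomic information.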

Introduction

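Public health laboratories and regulatory agencies increasingly use bacterial whole-genome sequencing (WGS) as the basis for routine surveillance and epidemiological investigations, which has substantially strengthened pathogen outbreak investigations1,2,3,4. As a consequence, large volumes of de-identified WGS data are now publicly available and can be used to study aspects of the population biology of pathogenic species at an unprecedented scale, including studies based on population structures, genotype frequencies, and gene/allele frequencies across multiple reservoirs, geographical regions, and types of environments5. The most commonly used WGS-guided epidemiological inquiries rely on analyses of the shared core-genome content alone, where the shared (conserved) content is used for genotypic classification (e.g., variant calling), and these variants become the basis for epidemiological analysis and tracing1,2,6,7. Typically, bacterial core-genome-based genotyping is carried out with multi-locus sequence typing (MLST) approaches using seven to a few thousand loci8,9,10. These MLST-based strategies map pre-assembled or assembled genomic sequences onto highly curated databases, thereby combining allelic information into reproducible genotypic units for epidemiological and ecological analysis11,12. For instance, this MLST-based classification can generate genotypic information at two levels of resolution: lower-level sequence types (STs) or ST lineages (7 loci), and higher-level core-genome MLST (cgMLST) variants (~300-3,000 loci)10.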
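To make the idea of combining allelic information into a genotypic unit concrete, the following Python sketch looks up a sequence type from a seven-locus allele profile. It is an illustration only and not part of the protocol, which relies on ProkEvo and R. The locus names are those of the seven-gene Salmonella MLST scheme; the allele numbers and the labels `ST-A` and `ST-B` are invented placeholders, not real database entries.

```python
from typing import Dict, Optional, Tuple

# The seven housekeeping loci of the Salmonella MLST scheme.
LOCI: Tuple[str, ...] = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Hypothetical profile table: allele numbers and ST labels are made up
# for illustration and do not correspond to curated database entries.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 2, 3, 4, 5, 6, 7): "ST-A",
    (1, 2, 3, 4, 5, 6, 8): "ST-B",
}


def call_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST label for an allele profile, or None if it is unknown."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


def locus_differences(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Count the loci at which two allele profiles differ."""
    return sum(a[locus] != b[locus] for locus in LOCI)


isolate_1 = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 7)))
isolate_2 = dict(zip(LOCI, (1, 2, 3, 4, 5, 6, 8)))
novel = dict(zip(LOCI, (9, 9, 9, 9, 9, 9, 9)))

print(call_st(isolate_1), call_st(isolate_2), call_st(novel))
print(locus_differences(isolate_1, isolate_2))
```

A cgMLST call follows the same logic with hundreds to thousands of loci, which is why it resolves variants within a single ST.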

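MLST-based genotypic classification is computationally portable and highly reproducible between laboratories, making it widely accepted as an accurate sub-typing approach beneath the bacterial species level13,14. However, bacterial populations are structured with species-specific, varying degrees of clonality (i.e., genotypic homogeneity), complex patterns of hierarchical kinship between genotypes15,16,17, and a wide range of variation in the distribution of accessory genomic content18,19. Thus, a more holistic approach goes beyond discrete classification into MLST genotypes and incorporates the hierarchical relationships of genotypes at different scales of resolution, along with mapping of accessory genomic content onto genotypic classifications, which facilitates population-based inference18,20,21. Moreover, analyses can also focus on shared patterns of inheritance of accessory genome loci among even distantly related genotypes21,22. Overall, the combined approach enables agnostic interrogation of relationships between population structure and the distribution of specific genomic compositions (e.g., loci) across geospatial or environmental gradients. Such an approach can yield both fundamental and practical information about the ecological characteristics of specific populations that may, in turn, explain their tropism and dispersion patterns across reservoirs, such as food animals or humans.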

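This systems-based, hierarchical, population-oriented approach demands large volumes of WGS data to achieve sufficient statistical power to predict distinguishable genomic signatures. Consequently, the approach requires a computational platform capable of processing thousands of bacterial genomes at once. Recently, ProkEvo was developed as a freely available, automated, portable, and scalable bioinformatics platform that allows for integrative hierarchical-based bacterial population analyses, including pan-genomic mapping20. ProkEvo allows for the study of moderate-to-large scale bacterial datasets while providing a framework to generate testable and inferable epidemiological and ecological hypotheses and phenotypic predictions that can be customized by the user. This work complements that pipeline by providing a guide on how to use ProkEvo-derived output files as input for the analysis and interpretation of hierarchical population classifications and accessory genomic mining. The case study presented here uses the population of Salmonella enterica lineage I zoonotic serovar S. Newport as an example and is specifically aimed at providing practical guidelines for microbiologists, ecologists, and epidemiologists on how to: i) use an automated phylogeny-dependent approach to map hierarchical genotypes; ii) assess the frequency distribution of genotypes as a proxy for evaluating ecological fitness; iii) determine lineage-specific degrees of clonality using independent statistical approaches; and iv) map lineage-differentiating AMR loci as an example of how to mine accessory genomic content in the context of the population structure. More broadly, this analytical approach provides a generalizable framework for performing population-based genomic analysis at scale that can be used to infer evolutionary and ecological patterns regardless of the targeted species.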
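Aims ii) and iii) can be illustrated with a short Python sketch that computes genotype relative frequencies and a diversity score for the variants within a lineage. The protocol itself is implemented in R; this is only a sketch under stated assumptions. Using Simpson's index (1 minus the sum of squared proportions) as the clonality measure is an assumption made here for illustration, and the variant labels and counts are invented.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_frequencies(labels: Iterable[str]) -> Dict[str, float]:
    """Return the proportion of observations carrying each genotype label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def simpson_diversity(labels: Iterable[str]) -> float:
    """Simpson's index of diversity: 1 - sum(p_i^2).

    Values near 0 indicate a highly clonal group (one dominant variant);
    values approaching 1 indicate many evenly represented variants.
    """
    freqs = relative_frequencies(labels)
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical cgMLST variant labels within two lineages (invented data).
clonal_lineage = ["v1"] * 9 + ["v2"]
diverse_lineage = ["v1", "v2", "v3", "v4"]

print(relative_frequencies(clonal_lineage))
print(round(simpson_diversity(clonal_lineage), 2))
print(round(simpson_diversity(diverse_lineage), 2))
```

Comparing the score across lineages gives a simple, tree-independent way to rank them by degree of clonality.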

Protocol

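1. Prepare input files
NOTE: The protocol is available here – https://github.com/jcgneto/jove_bacterial_population_genomics/tree/main/code. The protocol assumes that the researcher has specifically used ProkEvo (or a comparable pipeline) to obtain the necessary outputs available in this Figshare repository (https://figshare.com/account/projects/116625/articles/15097503 – login credentials are required – the user must create a free account to access the files). Of note, ProkEvo automatically downloads from the NCBI-SRA repository the genomic…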
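Preparing the inputs amounts to combining several per-tool tables on a shared genome identifier. The Python sketch below shows that join with the standard library only. It is not the protocol's code, which is written in R, and every column name (`genome_id`, `ST`, `serovar`, `cgmlst_ST`, `baps_1`) and accession-like identifier is a hypothetical stand-in for whatever the actual output files contain.

```python
import csv
import io
from typing import Dict

Table = Dict[str, Dict[str, str]]


def read_table(text: str, key: str = "genome_id") -> Table:
    """Parse CSV text into a dict of rows indexed by the genome identifier."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def merge_by_genome(*tables: Table) -> Table:
    """Inner-join tables: keep only genomes present in every table."""
    shared = set(tables[0]).intersection(*tables[1:])
    merged: Table = {}
    for genome_id in sorted(shared):
        record: Dict[str, str] = {}
        for table in tables:
            record.update(table[genome_id])
        merged[genome_id] = record
    return merged


# Invented example content standing in for the MLST, SISTR, and BAPS outputs.
mlst_csv = "genome_id,ST\nSRR0000001,5\nSRR0000002,118\nSRR0000003,45\n"
sistr_csv = ("genome_id,serovar,cgmlst_ST\n"
             "SRR0000001,Newport,1001\nSRR0000002,Newport,1002\n")
baps_csv = "genome_id,baps_1\nSRR0000001,1\nSRR0000002,1\nSRR0000003,2\n"

metadata = merge_by_genome(
    read_table(mlst_csv), read_table(sistr_csv), read_table(baps_csv)
)
for genome_id, record in metadata.items():
    print(genome_id, record)
```

An inner join silently drops genomes missing from any table (here `SRR0000003`), so it is worth comparing row counts before and after merging.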

Representative Results

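When the computational platform ProkEvo is used for population genomics analyses, the first step in bacterial WGS data mining is to examine the hierarchical population structure in the context of a core-genome phylogeny (Figure 1). In the case of S. enterica lineage I, as exemplified by the S. Newport dataset, the population is hierarchically structured as follows: serovar (lowest level of resolution), BAPS1 sub-groups or haplotypes, ST lineages, and cgMLST variants (highest level of resolution)20. This phylogeny-guided analysis of the hierarchical population structure…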
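The four-level hierarchy described above can be represented as a nested tally, which makes it easy to see how many genomes and variants sit under each lineage. The following Python sketch is illustrative only; the field names and the assignment of STs to BAPS1 sub-groups and cgMLST variants are invented and do not reproduce the results of the S. Newport dataset.

```python
from typing import Dict, Iterable, Sequence, Union

LEVELS = ("serovar", "baps_1", "ST", "cgmlst")  # low to high resolution
Node = Union[int, Dict[str, "Node"]]


def build_hierarchy(records: Iterable[Dict[str, str]],
                    levels: Sequence[str] = LEVELS) -> Dict[str, Node]:
    """Nest genomes by each level in turn; leaves hold genome counts."""
    root: Dict[str, Node] = {}
    for record in records:
        node = root
        for level in levels[:-1]:
            node = node.setdefault(record[level], {})
        leaf = record[levels[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return root


def count_genomes(node: Node) -> int:
    """Total number of genomes below a node of the hierarchy."""
    if isinstance(node, int):
        return node
    return sum(count_genomes(child) for child in node.values())


# Hypothetical records (group memberships are invented for illustration).
records = [
    {"serovar": "Newport", "baps_1": "1", "ST": "5", "cgmlst": "1001"},
    {"serovar": "Newport", "baps_1": "1", "ST": "5", "cgmlst": "1001"},
    {"serovar": "Newport", "baps_1": "1", "ST": "118", "cgmlst": "1002"},
    {"serovar": "Newport", "baps_1": "2", "ST": "45", "cgmlst": "1003"},
]

hierarchy = build_hierarchy(records)
print(hierarchy)
print(count_genomes(hierarchy["Newport"]["1"]))
```

If a higher-resolution genotype appears under more than one parent, the nesting is not strict, which is itself worth inspecting against the phylogeny.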

Discussion

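A systems-based heuristic and hierarchical analysis of population structure provides a framework for identifying novel genomic signatures in bacterial datasets that have the potential to explain unique ecological and epidemiological patterns20. In addition, mapping accessory genome data onto the population structure can be used to infer ancestrally acquired and/or recently derived traits that contribute to ST lineages or cgMLST variants in reservoirs6,20,21,45…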
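Mapping accessory content onto the population structure can be sketched as a per-lineage prevalence table followed by a simple filter for loci that are near-fixed in one lineage and near-absent in another. The Python example below is an assumption-laden illustration, not the authors' method: the locus names, lineage labels, and the 0.9 and 0.1 thresholds are all arbitrary placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Set

Prevalence = Dict[str, Dict[str, float]]


def prevalence_by_lineage(st_of: Dict[str, str],
                          loci_of: Dict[str, Set[str]]) -> Prevalence:
    """Proportion of genomes in each lineage that carry each locus."""
    totals = Counter(st_of.values())
    carriers: Dict[str, Counter] = defaultdict(Counter)
    for genome_id, st in st_of.items():
        for locus in loci_of.get(genome_id, set()):
            carriers[st][locus] += 1
    return {
        st: {locus: n / totals[st] for locus, n in carriers[st].items()}
        for st in totals
    }


def differentiating_loci(prev: Prevalence, high: float = 0.9,
                         low: float = 0.1) -> Dict[str, Dict[str, float]]:
    """Loci with prevalence >= high in one lineage and <= low in another.

    The thresholds are arbitrary illustrative choices.
    """
    all_loci = {locus for by_locus in prev.values() for locus in by_locus}
    selected = {}
    for locus in sorted(all_loci):
        values = {st: prev[st].get(locus, 0.0) for st in prev}
        if max(values.values()) >= high and min(values.values()) <= low:
            selected[locus] = values
    return selected


# Invented genomes, lineages, and accessory loci.
st_of = {"g1": "ST-A", "g2": "ST-A", "g3": "ST-B", "g4": "ST-B"}
loci_of = {"g1": {"locus_1", "locus_2"}, "g2": {"locus_1"},
           "g3": {"locus_2"}, "g4": set()}

prevalence = prevalence_by_lineage(st_of, loci_of)
print(prevalence)
print(differentiating_loci(prevalence))
```

Prevalence alone does not separate ancestral acquisition from recent gain; that distinction needs the locus distribution to be read against the phylogeny.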

Disclosures

The authors have nothing to disclose.

Acknowledgements

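This work was supported by funding provided by the UNL-IANR Agricultural Research Division and the National Institute for Antimicrobial Resistance Research and Education, and by the Nebraska Food for Health Center at the Food Science and Technology Department. This research could only be completed by using the Holland Computing Center (HCC) at UNL, which receives support from the Nebraska Research Initiative. We are also thankful for having access, through the HCC, to resources provided by the Open Science Grid (OSG), which is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science. This work used the Pegasus Workflow Management Software, which is funded by the National Science Foundation (grant #1664162).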

Materials

amr_data_filtered https://figshare.com/account/projects/116625/articles/14829225?file=28758762
amr_data_raw https://figshare.com/account/projects/116625/articles/14829225?file=28547994
baps_output https://figshare.com/account/projects/116625/articles/14829225?file=28548003
Core-genome phylogeny https://figshare.com/account/projects/116625/articles/14829225?file=28548006
genome_sra https://figshare.com/account/projects/116625/articles/14829225?file=28639209
Linux, Mac, or PC: any high-performance platform
mlst_output https://figshare.com/account/projects/116625/articles/14829225?file=28547997
sistr_output https://figshare.com/account/projects/116625/articles/14829225?file=28548000
Figshare: login credentials are required to access the files

References

  1. Grad, Y. H., et al. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe, 2011. Proceedings of the National Academy of Sciences of the United States of America. 109 (8), 3065-3070 (2012).
  2. Worby, C. J., Chang, H.-H., Hanage, W. P., Lipsitch, M. The distribution of pairwise genetic distances: a tool for investigating disease transmission. Genetics. 198 (4), 1395-1404 (2014).
  3. Leekitcharoenphon, P., et al. Global genomic epidemiology of Salmonella enterica serovar Typhimurium DT104. Applied and Environmental Microbiology. 82 (8), 2516-2526 (2016).
  4. Alba, P., et al. Molecular epidemiology of Salmonella Infantis in Europe: insights into the success of the bacterial host and its parasitic pESI-like megaplasmid. Microbial Genomics. 6 (5), (2020).
  5. Zhou, Z., Alikhan, N.-F., Mohamed, K., Fan, Y., the Agama Study Group, Achtman, M. The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Research. 30 (1), 138-152 (2020).
  6. Azarian, T., et al. Global emergence and population dynamics of divergent serotype 3 CC180 pneumococci. PLOS Pathogens. 14 (11), 1007438 (2018).
  7. Saltykova, A., et al. Comparison of SNP-based subtyping workflows for bacterial isolates using WGS data, applied to Salmonella enterica serotype Typhimurium and serotype 1,4,[5],12:i:-. PLOS ONE. 13 (2), 0192504 (2018).
  8. Achtman, M., et al. Multi-locus sequence typing as a replacement for serotyping in Salmonella enterica. PLoS Pathogens. 8 (6), 1002776 (2012).
  9. Maiden, M. C. J., et al. Multi-locus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proceedings of the National Academy of Sciences of the United States of America. 95 (6), 3140-3145 (1998).
  10. Alikhan, N.-F., Zhou, Z., Sergeant, M. J., Achtman, M. A genomic overview of the population structure of Salmonella. PLOS Genetics. 14 (4), 1007261 (2018).
  11. Gupta, A., Jordan, I. K., Rishishwar, L. stringMLST: a fast k-mer based tool for multi-locus sequence typing. Bioinformatics. 33 (1), 119-121 (2017).
  12. Jolley, K. A., Maiden, M. C. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics. 11 (1), 595 (2010).
  13. Maiden, M. C. J., et al. MLST revisited: the gene-by-gene approach to bacterial genomics. Nature Reviews Microbiology. 11 (10), 728-736 (2013).
  14. Maiden, M. C. J. Multilocus sequence typing of bacteria. Annual Review of Microbiology. 60 (1), 561-588 (2006).
  15. Shapiro, B. J., Polz, M. F. Ordering microbial diversity into ecologically and genetically cohesive units. Trends in Microbiology. 22 (5), 235-247 (2014).
  16. Cordero, O. X., Polz, M. F. Explaining microbial genomic diversity in light of evolutionary ecology. Nature Reviews Microbiology. 12 (4), 263-273 (2014).
  17. Achtman, M., Wagner, M. Microbial diversity and the genetic nature of microbial species. Nature Reviews Microbiology. 6 (6), 431-440 (2008).
  18. Abudahab, K., et al. PANINI: Pangenome neighbour identification for bacterial populations. Microbial Genomics. 5 (4), (2019).
  19. Laing, C. R., Whiteside, M. D., Gannon, V. P. J. Pan-genome analyses of the species Salmonella enterica, and identification of genomic markers predictive for species, subspecies, and serovar. Frontiers in Microbiology. 8, 1345 (2017).
  20. Pavlovikj, N., Gomes-Neto, J. C., Deogun, J. S., Benson, A. K. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ. 9, 11376 (2021).
  21. McNally, A., et al. Combined analysis of variation in core, accessory and regulatory genome regions provides a super-resolution view into the evolution of bacterial populations. PLOS Genetics. 12 (9), 1006280 (2016).
  22. Langridge, G. C., et al. Patterns of genome evolution that have accompanied host adaptation in Salmonella. Proceedings of the National Academy of Sciences of the United States of America. 112 (3), 863-868 (2015).
  23. Price, M. N., Dehal, P. S., Arkin, A. P. FastTree 2 – Approximately maximum-likelihood trees for large alignments. PLoS ONE. 5 (3), 9490 (2010).
  24. Page, A. J., et al. Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 31 (22), 3691-3693 (2015).
  25. Yoshida, C. E., et al. The Salmonella In silico typing resource (SISTR): An open web-accessible tool for rapidly typing and subtyping draft Salmonella genome assemblies. PLOS ONE. 11 (1), 0147101 (2016).
  26. Cheng, L., Connor, T. R., Siren, J., Aanensen, D. M., Corander, J. Hierarchical and spatially explicit clustering of DNA sequences with BAPS software. Molecular Biology and Evolution. 30 (5), 1224-1228 (2013).
  27. Tonkin-Hill, G., Lees, J. A., Bentley, S. D., Frost, S. D. W., Corander, J. Fast hierarchical Bayesian analysis of population structure. Nucleic Acids Research. 47 (11), 5539-5549 (2019).
  28. MLST. GitHub. Available from: https://github.com/tseemann/mlst (2020)
  29. ABRicate. GitHub Available from: https://github.com/tseemann/abricate (2020)
  30. R: A language and environment for statistical computing. R Foundation for Statistical Computing Available from: https://cran.r-project.org (2021)
  31. Wickham, H., et al. Welcome to the Tidyverse. Journal of Open Source Software. 4 (43), 1686 (2019).
  32. rOpenSci: The skimr package. GitHub Available from: https://github.com/ropensci/skimr/ (2021)
  33. vegan: Community ecology package. R package version 2.5-5 Available from: https://CRAN.R-project.org/package=vegan (2019)
  34. Yu, G. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics. 69 (1), (2020).
  35. ggpubr: “ggplot2” Based Publication Ready Plots. R package version 0.4.0 Available from: https://CRAN.R-project.org/package=ggpubr (2020)
  36. ggrepel: Automatically Position Non-Overlapping Text Labels with “ggplot2”. R package version 0.9.1 Available from: https://CRAN.R-project.org/package=ggrepel (2021)
  37. Wickham, H. Reshaping Data with the reshape Package. Journal of Statistical Software. 21 (12), (2007).
  38. RColorBrewer: ColorBrewer Palettes. R package version 1.1-2 Available from: https://CRAN.R-project.org/package=RColorBrewer (2014)
  39. Hadfield, J., Croucher, N. J., Goater, R. J., Abudahab, K., Aanensen, D. M., Harris, S. R. Phandango: an interactive viewer for bacterial population genomics. Bioinformatics. 34 (2), 292-293 (2018).
  40. Perron, G. G., et al. Functional characterization of bacteria isolated from ancient arctic soil exposes diverse resistance mechanisms to modern antibiotics. PLOS ONE. 10 (3), 0069533 (2015).
  41. Mitchell, P. K., et al. Population genomics of pneumococcal carriage in Massachusetts children following introduction of PCV-13. Microbial Genomics. 5 (2), (2019).
  42. Klemm, E. J., et al. Emergence of host-adapted Salmonella Enteritidis through rapid evolution in an immunocompromised host. Nature Microbiology. 1 (3), 15023 (2016).
  43. Břinda, K., et al. Rapid inference of antibiotic resistance and susceptibility by genomic neighbour typing. Nature Microbiology. 5 (3), 455-464 (2020).
  44. MacFadden, D. R., et al. Using genetic distance from archived samples for the prediction of antibiotic resistance in Escherichia coli. Antimicrobial Agents and Chemotherapy. 64 (5), (2020).
  45. Mageiros, L., et al. Genome evolution and the emergence of pathogenicity in avian Escherichia coli. Nature Communications. 12 (1), 765 (2021).
  46. Yahara, K., et al. Genome-wide association of functional traits linked with Campylobacter jejuni survival from farm to fork. Environmental Microbiology. 19 (1), 361-380 (2017).
  47. Walter, J., Maldonado-Gómez, M. X., Martínez, I. To engraft or not to engraft: an ecological framework for gut microbiome modulation with live microbes. Current Opinion in Biotechnology. 49, 129-139 (2018).
  48. Maldonado-Gómez, M. X., et al. Stable engraftment of Bifidobacterium longum AH1206 in the human gut depends on individualized features of the resident microbiome. Cell Host & Microbe. 20 (4), 515-526 (2016).
  49. Zhao, S., et al. Adaptive evolution within gut microbiomes of healthy people. Cell Host & Microbe. 25 (5), 656-667 (2019).
  50. Treangen, T. J., Ondov, B. D., Koren, S., Phillippy, A. M. The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biology. 15 (11), 524 (2014).
  51. Letunic, I., Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Research. 49, 293-296 (2021).
  52. Croucher, N. J., et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Research. 43 (3), 15 (2015).
  53. Fenske, G. J., Thachil, A., McDonough, P. L., Glaser, A., Scaria, J. Geography shapes the population genomics of Salmonella enterica Dublin. Genome Biology and Evolution. 11 (8), 2220-2231 (2019).
  54. Lees, J. A., et al. Fast and flexible bacterial genomic epidemiology with PopPUNK. Genome Research. 29 (2), 304-316 (2019).
  55. Cohan, F. M. Towards a conceptual and operational union of bacterial systematics, ecology, and evolution. Philosophical Transactions of the Royal Society B: Biological Sciences. 361 (1475), 1985-1996 (2006).
  56. Cohan, F. M., Koeppel, A. F. The origins of ecological diversity in prokaryotes. Current Biology. 18 (21), 1024-1034 (2008).
  57. Cohan, F. M. Transmission in the origins of bacterial diversity, from ecotypes to phyla. Microbial Transmission. 5 (5), 311-343 (2019).
  58. Davis, J. J., et al. The PATRIC bioinformatics resource center: expanding data and analysis capabilities. Nucleic Acids Research. 48, 606-612 (2019).
  59. Feng, Y., Zou, S., Chen, H., Yu, Y., Ruan, Z. BacWGSTdb 2.0: a one-stop repository for bacterial whole-genome sequence typing and source tracking. Nucleic Acids Research. 49, 644-650 (2021).

Cite this Article
Pavlovikj, N., Gomes-Neto, J. C., Benson, A. K. Heuristic Mining of Hierarchical Genotypes and Accessory Genome Loci in Bacterial Populations. J. Vis. Exp. (178), e63115, doi:10.3791/63115 (2021).
