De novo Identification of Actively Translated Open Reading Frames with Ribosome Profiling Data

Published: February 18, 2022
doi: 10.3791/63366

Summary

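Translating ribosomes decode three nucleotides per codon into peptides. Their movement along mRNA, captured by ribosome profiling, produces footprints that exhibit a characteristic triplet periodicity. This protocol describes how to use RiboCode to decipher this prominent feature from ribosome profiling data in order to identify actively translated open reading frames at the whole-transcriptome level.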

Abstract

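Identifying open reading frames (ORFs), especially those that encode small peptides and are actively translated under specific physiological contexts, is critical for the comprehensive annotation of context-dependent translatomes. Ribosome profiling, a technique for detecting the binding locations and densities of translating ribosomes on RNA, offers a way to rapidly discover where translation occurs at the genome-wide scale. However, efficiently and comprehensively identifying translated ORFs from ribosome profiling data is not a trivial bioinformatics task. Described here is an easy-to-use package, named RiboCode, designed to search for actively translated ORFs of any size from the distorted and ambiguous signals in ribosome profiling data. Taking our previously published dataset as an example, this article provides step-by-step instructions for the entire RiboCode pipeline, from preprocessing of the raw data to interpretation of the final output files. In addition, to evaluate the translation rates of the annotated ORFs, procedures for visualizing and quantifying the ribosome density on each ORF are described in detail. In summary, this article is a useful and timely guide for research fields related to translation, small ORFs, and peptides.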

Introduction

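Recently, a growing body of studies has revealed the widespread production of peptides translated from ORFs of coding genes and of genes previously annotated as noncoding, such as long noncoding RNAs (lncRNAs) [1-8]. These translated ORFs are regulated or induced by cells in response to environmental changes, stress, and cell differentiation [1,8-13]. The translation products of some ORFs have been shown to play important regulatory roles in diverse biological processes in development and physiology. For example, Chng et al. [14] discovered a peptide hormone named Elabela (Ela, also known as Apela/Ende/Toddler), which is critical for cardiovascular development. Pauli et al. suggested that Ela also acts as a mitogen that promotes cell migration in the early fish embryo [15]. Magny et al. reported two micropeptides of fewer than 30 amino acids that regulate calcium transport and affect regular muscle contraction in the Drosophila heart [10].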

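It remains unclear how many such peptides are encoded by the genome and whether they are biologically relevant. Systematic identification of these potentially coding ORFs is therefore highly desirable. However, directly determining the products of these ORFs (i.e., proteins or peptides) with traditional approaches such as evolutionary conservation [16,17] and mass spectrometry [18,19] is challenging, because the detection efficiency of both approaches depends on the length, abundance, and amino acid composition of the proteins or peptides produced. The advent of ribosome profiling, a technique for identifying ribosome occupancy on mRNAs at nucleotide resolution, has provided a precise way to evaluate the coding potential of different transcripts [3,20,21], irrespective of their length and composition. An important and frequently used feature for identifying actively translated ORFs with ribosome profiling is the three-nucleotide (3-nt) periodicity of ribosome footprints on the mRNA from the start codon to the stop codon. However, ribosome profiling data often suffer from several problems, including low and sparse sequencing read coverage along ORFs, high sequencing noise, and ribosomal RNA (rRNA) contamination. The distorted and ambiguous signals in such data weaken the 3-nt periodicity pattern of ribosome footprints on mRNA, which ultimately makes it difficult to identify translated ORFs with high confidence.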
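The 3-nt periodicity described above can be illustrated with a small Python sketch. It is not RiboCode code. It assumes that each footprint has already been reduced to a single P-site coordinate on the transcript (a hypothetical input format) and simply counts how many of those positions fall in each reading frame of a candidate ORF.

```python
from collections import Counter


def frame_counts(psite_positions, start, stop):
    """Count footprints per reading frame inside a candidate ORF.

    psite_positions: 0-based transcript coordinates of inferred P-sites
                     (hypothetical input, one entry per footprint).
    start:           0-based coordinate of the first base of the start codon.
    stop:            0-based coordinate just past the stop codon (exclusive).
    """
    counts = Counter()
    for pos in psite_positions:
        if start <= pos < stop:
            counts[(pos - start) % 3] += 1
    return [counts[0], counts[1], counts[2]]


def in_frame_fraction(counts):
    """Fraction of footprints that sit on frame 0 (0.0 if there are none)."""
    total = sum(counts)
    return counts[0] / total if total else 0.0


# Toy data: an ORF spanning positions 10..39 with most P-sites on frame 0.
psites = [10, 13, 13, 16, 19, 20, 22, 25, 27, 28, 31, 34, 37, 5, 45]
counts = frame_counts(psites, start=10, stop=40)
print(counts, round(in_frame_fraction(counts), 3))
```

A strongly translated ORF would be expected to show a clear excess of frame-0 counts, whereas sparse or noisy data flatten the three counts toward each other, which is the difficulty the paragraph above describes.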

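A package named "RiboCode" adopts a modified Wilcoxon signed-rank test and a P-value integration strategy to examine whether an ORF has significantly more in-frame ribosome-protected fragments (RPFs) than off-frame RPFs [22]. It has been shown to be efficient, sensitive, and accurate for de novo annotation of the translatome in both simulated and real ribosome profiling data. Here, we describe how to use this tool to detect potentially translated ORFs from the raw ribosome profiling sequencing datasets generated in a previous study [23]. These datasets were used to explore the function of EIF3 subunit "E" (EIF3E) in translation by comparing the ribosome occupancy profiles of MCF-10A cells transfected with control (si-Ctrl) and EIF3E (si-eIF3e) small interfering RNAs (siRNAs). By applying RiboCode to these example datasets, we detected 5,633 novel ORFs that potentially encode small peptides or proteins. These ORFs were grouped into several types according to their location relative to the coding regions: upstream ORFs (uORFs), downstream ORFs (dORFs), overlapped ORFs, ORFs from novel protein-coding genes (novel PCGs), and ORFs from novel nonprotein-coding genes (novel NonPCGs). The RPF read densities on uORFs were significantly increased in EIF3E-deficient cells compared with control cells, which might be caused, at least in part, by the enrichment of actively translating ribosomes. The localized ribosome accumulation in the region from codon 25 to codon 75 in EIF3E-deficient cells indicated a blockage of translation elongation at an early stage. This protocol also shows how to visualize the RPF density of a region of interest in order to examine the 3-nt periodicity pattern of ribosome footprints on the identified ORFs. These analyses demonstrate the power of RiboCode for identifying translated ORFs and studying translational regulation.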
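To make the idea of testing in-frame against off-frame RPFs more concrete, the following Python sketch runs a plain one-sided Wilcoxon signed-rank test (normal approximation) on per-codon counts and then combines the two resulting P-values. It is an illustration only: RiboCode uses a modified test whose details are not given here, and Fisher's method is used below as a stand-in for the P-value integration step, which is an assumption of this example. Fisher's method also assumes independent tests, which does not strictly hold when both tests share the frame-0 counts. The per-codon count vectors are invented toy data.

```python
import math


def wilcoxon_greater(x, y):
    """One-sided Wilcoxon signed-rank test, H1: x tends to exceed y.

    Standard textbook version (not RiboCode's modified test): zero
    differences are dropped, tied absolute differences get average ranks,
    and the P-value comes from the normal approximation without tie or
    continuity correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability


def fisher_combine(p1, p2):
    """Combine two P-values with Fisher's method (assumed stand-in).

    The statistic -2*(ln p1 + ln p2) follows a chi-square distribution
    with 4 degrees of freedom, whose survival function has the closed
    form exp(-s/2) * (1 + s/2).
    """
    p1, p2 = max(p1, 1e-300), max(p2, 1e-300)
    half = -(math.log(p1) + math.log(p2))
    return math.exp(-half) * (1 + half)


# Toy per-codon RPF counts for frames 0, 1 and 2 of one candidate ORF.
f0 = [5, 3, 4, 6, 2, 7, 3, 5, 4, 6, 2, 5]
f1 = [1, 0, 1, 2, 0, 1, 0, 1, 1, 0, 1, 1]
f2 = [0, 1, 0, 1, 1, 0, 1, 0, 2, 1, 0, 0]

p_f0_vs_f1 = wilcoxon_greater(f0, f1)
p_f0_vs_f2 = wilcoxon_greater(f0, f2)
p_combined = fisher_combine(p_f0_vs_f1, p_f0_vs_f2)
print(p_f0_vs_f1, p_f0_vs_f2, p_combined)
```

In this toy case every codon has more frame-0 reads than off-frame reads, so both tests and the combined value come out small. Real data with sparse coverage would give far weaker evidence per ORF.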

Protocol

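1. Environment setup and RiboCode installation
Open a Linux terminal window and create a conda environment:
    conda create -n RiboCode python=3.8
Switch to the created environment and install RiboCode and its dependencies:
    conda activate RiboCode
    conda install -c bioconda ribocode ribominer sra-tools fastx_toolkit cutadapt bowtie star samtools

2. Data preparation
Get the genome reference files. For reference sequences, go to…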
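After the installation step, it can be useful to confirm that the tools are actually reachable from the activated environment. The short Python sketch below is not part of the protocol. The executable names in EXPECTED_TOOLS are assumptions about what the listed conda packages install and may differ between package versions.

```python
import shutil

# Executable names assumed for the packages in the install command above;
# adjust them if your package versions ship different names.
EXPECTED_TOOLS = ["RiboCode", "STAR", "bowtie", "samtools", "cutadapt", "fastq-dump"]


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


missing = missing_tools(EXPECTED_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All expected tools were found.")
```

Run it inside the activated conda environment. A tool reported as missing usually means the environment is not active or the package installed under a different executable name.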

Representative Results

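The example ribosome profiling datasets were deposited in the GEO database under accession number GSE131074. All files and code used in this protocol are available in Supplemental Files 1-4. By applying RiboCode to a set of published ribosome profiling datasets [23], we identified novel ORFs that are actively translated in MCF-10A cells treated with control and EIF3E siRNAs. To select the RPF reads most likely to be bound by translating ribosomes, the lengths of the sequencing reads were examined and, using those mapped on known…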
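The read-length check mentioned above can be sketched in a few lines of Python. This is an illustration, not the protocol's own script. It assumes an unwrapped FASTQ stream with exactly four lines per record, and the 28-30 nt window in the usage example is only an illustrative choice, not a value taken from this protocol.

```python
import io
from collections import Counter


def read_length_distribution(fastq_handle):
    """Count read lengths from a FASTQ stream (assumes 4 lines per record)."""
    lengths = Counter()
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


def fraction_in_range(lengths, low, high):
    """Fraction of reads whose length lies in [low, high] (0.0 if no reads)."""
    total = sum(lengths.values())
    kept = sum(n for length, n in lengths.items() if low <= length <= high)
    return kept / total if total else 0.0


# Toy FASTQ with four reads of length 28, 29, 29 and 35.
toy_reads = ["A" * 28, "C" * 29, "G" * 29, "T" * 35]
toy_fastq = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(toy_reads)
)

dist = read_length_distribution(io.StringIO(toy_fastq))
print(sorted(dist.items()), fraction_in_range(dist, 28, 30))
```

For real data, the same function can be given an open file handle instead of the in-memory toy string.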

Discussion

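Ribosome profiling offers an unprecedented opportunity to study the action of ribosomes in cells at the genome scale. Precisely deciphering the information carried by ribosome profiling data can provide insight into which regions of genes or transcripts are being actively translated. This step-by-step protocol provides guidance on how to use RiboCode to analyze ribosome profiling data in detail, including package installation, data preparation, command execution, result interpretation, and data visualization. The RiboCode analysis results indicate that translation is pervasive and occurs on unannotated ORFs of coding genes and many…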

Disclosures

The authors have nothing to disclose.

Acknowledgements

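The authors would like to acknowledge the support of the computational resources provided by the HPCC platform of Xi'an Jiaotong University. Z.X. gratefully thanks the Young Topnotch Talent Support Plan of Xi'an Jiaotong University.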

Materials

A computer/server running Linux | Any
Anaconda or Miniconda | Anaconda | Anaconda: https://www.anaconda.com; Miniconda: https://docs.conda.io/en/latest/miniconda.html
R | R Foundation | https://www.r-project.org/
RStudio | RStudio | https://www.rstudio.com/

References

  1. Eisenberg, A. R., et al. Translation Initiation Site Profiling Reveals Widespread Synthesis of Non-AUG-Initiated Protein Isoforms in Yeast. Cell Systems. 11 (2), 145-160 (2020).
  2. Spealman, P., et al. Conserved non-AUG uORFs revealed by a novel regression analysis of ribosome profiling data. Genome Research. 28 (2), 214-222 (2018).
  3. Ingolia, N. T., Lareau, L. F., Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell. 147 (4), 789-802 (2011).
  4. Bazzini, A. A., et al. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation. The EMBO Journal. 33 (9), 981-993 (2014).
  5. Ingolia, N. T., et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Reports. 8 (5), 1365-1379 (2014).
  6. Chew, G. L., Pauli, A., Schier, A. F. Conservation of uORF repressiveness and sequence features in mouse, human and zebrafish. Nature Communications. 7, 11663 (2016).
  7. Zhang, H., et al. Determinants of genome-wide distribution and evolution of uORFs in eukaryotes. Nature Communications. 12 (1), 1076 (2021).
  8. Guenther, U. P., et al. The helicase Ded1p controls use of near-cognate translation initiation codons in 5′ UTRs. Nature. 559 (7712), 130-134 (2018).
  9. Goldsmith, J., et al. Ribosome profiling reveals a functional role for autophagy in mRNA translational control. Communications Biology. 3 (1), 388 (2020).
  10. Magny, E. G., et al. Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading frames. Science. 341 (6150), 1116-1120 (2013).
  11. Stumpf, C. R., Moreno, M. V., Olshen, A. B., Taylor, B. S., Ruggero, D. The translational landscape of the mammalian cell cycle. Molecular Cell. 52 (4), 574-582 (2013).
  12. Gerashchenko, M. V., Lobanov, A. V., Gladyshev, V. N. Genome-wide ribosome profiling reveals complex translational regulation in response to oxidative stress. Proceedings of the National Academy of Sciences of the United States of America. 109 (43), 17394-17399 (2012).
  13. Andreev, D. E., et al. Oxygen and glucose deprivation induces widespread alterations in mRNA translation within 20 minutes. Genome Biology. 16, 90 (2015).
  14. Chng, S. C., Ho, L., Tian, J., Reversade, B. ELABELA: a hormone essential for heart development signals via the apelin receptor. Developmental Cell. 27 (6), 672-680 (2013).
  15. Pauli, A., et al. Toddler: an embryonic signal that promotes cell movement via Apelin receptors. Science. 343 (6172), 1248636 (2014).
  16. Stark, A., et al. Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature. 450 (7167), 219-232 (2007).
  17. Lin, M. F., Jungreis, I., Kellis, M. PhyloCSF: a comparative genomics method to distinguish protein coding and non-coding regions. Bioinformatics. 27 (13), 275-282 (2011).
  18. Slavoff, S. A., et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nature Chemical Biology. 9 (1), 59-64 (2013).
  19. Schwaid, A. G., et al. Chemoproteomic discovery of cysteine-containing human short open reading frames. Journal of the American Chemical Society. 135 (45), 16750-16753 (2013).
  20. Ingolia, N. T., Brar, G. A., Rouskin, S., McGeachy, A. M., Weissman, J. S. Genome-wide annotation and quantitation of translation by ribosome profiling. Current Protocols in Molecular Biology. , 1-19 (2013).
  21. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R., Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science. 324 (5924), 218-223 (2009).
  22. Xiao, Z., et al. De novo annotation and characterization of the translatome with ribosome profiling data. Nucleic Acids Research. 46 (10), 61 (2018).
  23. Lin, Y., et al. eIF3 Associates with 80S Ribosomes to Promote Translation Elongation, Mitochondrial Homeostasis, and Muscle Health. Molecular Cell. 79 (4), 575-587 (2020).
  24. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Available from: https://agat.readthedocs.io/en/latest/gff_to_gtf.html (2020).
  25. Gene Expression Omnibus. Available from: https://www.ncbi.nlm.nih.gov/geo (2002).
  26. Ingolia, N. T., Brar, G. A., Rouskin, S., McGeachy, A. M., Weissman, J. S. The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments. Nature Protocols. 7 (8), 1534-1550 (2012).
  27. STAR manual. Available from: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf (2022).
  28. The genetic codes. Available from: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi (2019).
  29. RiboMiner. Available from: https://github.com/xryanglab/RiboMiner (2020).
  30. Ingolia, N. T., Hussmann, J. A., Weissman, J. S. Ribosome profiling: global views of translation. Cold Spring Harbor Perspectives in Biology. 11 (5), 032698 (2018).
  31. Lee, S., et al. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proceedings of the National Academy of Sciences of the United States of America. 109 (37), 2424-2432 (2012).
  32. Gao, X., et al. Quantitative profiling of initiating ribosomes in vivo. Nature Methods. 12 (2), 147-153 (2015).
  33. Spealman, P., Naik, A., McManus, J. uORF-seqr: A Machine Learning-Based approach to the identification of upstream open reading frames in yeast. Methods in Molecular Biology. 2252, 313-329 (2021).
  34. RiboCode. Available from: https://github.com/xryanglab/RiboCode (2018).
  35. Sharma, P., Wu, J., Nilges, B. S., Leidel, S. A. Humans and other commonly used model organisms are resistant to cycloheximide-mediated biases in ribosome profiling experiments. Nature Communications. 12 (1), 5094 (2021).

Cite This Article
Zhu, Y., Li, F., Yang, X., Xiao, Z. De novo Identification of Actively Translated Open Reading Frames with Ribosome Profiling Data. J. Vis. Exp. (180), e63366, doi:10.3791/63366 (2022).
