A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types

Haipeng Xing; Willey Liao; Yifan Mo; Michael Q. Zhang

doi:10.3791/4273

JoVE Journal > Biology


Published: December 10, 2012


Haipeng Xing¹, Willey Liao², Yifan Mo², Michael Q. Zhang³

¹Department of Applied Mathematics & Statistics, Stony Brook University, ²Computational Biology and Bioinformatics, Cold Spring Harbor Laboratory, ³Department of Molecular and Cell Biology, University of Texas at Dallas

Summary

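Our Bayesian Change Point (BCP) algorithm builds on state-of-the-art advances in modeling change points via Hidden Markov Models and applies them to the analysis of chromatin immunoprecipitation sequencing (ChIPseq) data. BCP performs well on both broad and punctate data types, but excels at accurately identifying robust, reproducible islands of diffuse histone enrichment.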

Abstract

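ChIPseq is a widely used technique for investigating protein-DNA interactions. Read density profiles are generated by next-generation sequencing of protein-bound DNA and aligning the short reads to a reference genome. Enriched regions appear as peaks, which often differ dramatically in shape depending on the target protein^1. For example, transcription factors often bind in a site- and sequence-specific manner and tend to produce punctate peaks, while histone modifications are more pervasive and are characterized by broad, diffuse islands of enrichment^2. Reliably identifying these regions was the focus of our work.

Algorithms for analyzing ChIPseq data have employed various methodologies, from heuristics^3-5 to more rigorous statistical models such as Hidden Markov Models (HMMs)^6-8. We sought a solution that minimizes the need for difficult-to-define, ad hoc parameters, which often compromise resolution and make a tool less intuitive to use. With respect to HMM-based methods, we aimed to curtail the parameter estimation procedures and the simple, finite-state classifications that are often used.

Additionally, conventional ChIPseq data analysis involves categorizing the expected read density profile as either punctate or diffuse and then applying the appropriate tool. We further aimed to replace the need for these two distinct models with a single, more versatile model that can capably address the entire spectrum of data types.

To meet these objectives, we first constructed a statistical framework that naturally models ChIPseq data structures using a cutting-edge advance in HMMs^9, which uses only explicit formulas, an innovation crucial to its performance advantages. More sophisticated than heuristic models, our HMM accommodates infinite hidden states through a Bayesian model. We applied it to identifying reasonable change points in read density, which in turn define segments of enrichment. Our analysis showed that our Bayesian Change Point (BCP) algorithm has reduced computational complexity, as evidenced by a shorter run time and a smaller memory footprint. The BCP algorithm was successfully applied to both punctate peak and diffuse island identification with robust accuracy and limited user-defined parameters, which illustrates both its versatility and its ease of use. Consequently, we believe it can be implemented readily across a broad range of data types and end users in a manner that is easily compared and contrasted, making it a valuable tool for ChIPseq data analysis that can aid collaboration and corroboration between research groups. Here, we demonstrate the application of BCP to existing transcription factor^10,11 and epigenetic data to illustrate its usefulness.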
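To make the idea of a change point in read density concrete, the following is a minimal sketch and not the BCP algorithm itself. It assumes a single change point, Poisson-distributed bin counts, and a conjugate Gamma prior on each segment's rate, whereas BCP as described above uses an HMM with infinite hidden states. The function names and the hyperparameters `a` and `b` are hypothetical choices made for illustration only.

```python
import math


def log_marginal(total, n, a=1.0, b=1.0):
    """Log marginal likelihood of n Poisson counts summing to `total`
    under a Gamma(shape=a, rate=b) prior on the rate.
    The product of 1/x_i! is omitted because it is identical for every split."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + total) - (a + total) * math.log(b + n))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over the position k (1..n-1) of a single change point,
    where counts[:k] and counts[k:] have different Poisson rates.
    A uniform prior over k is assumed."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two bins")
    # Prefix sums give each segment total in constant time.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    log_w = []
    for k in range(1, n):
        left = log_marginal(prefix[k], k, a, b)
        right = log_marginal(prefix[n] - prefix[k], n - k, a, b)
        log_w.append(left + right)
    # Normalize in a numerically stable way.
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    z = sum(w)
    return [v / z for v in w]


# Toy read counts per bin: low background followed by an enriched segment.
counts = [1, 0, 2, 1, 1, 9, 11, 8, 10, 12]
posterior = change_point_posterior(counts)
# Convert the list index back to a split position k.
best_k = max(range(len(posterior)), key=posterior.__getitem__) + 1
print("most probable split position:", best_k)
```

The closed-form segment likelihood is what keeps this sketch free of iterative parameter fitting; it is only loosely analogous to the explicit formulas mentioned above, and the actual BCP recursions are given in the cited work^9.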

Protocol

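1. Preparing the Input Files for BCP Analysis

Align the short reads produced by the sequencing runs (ChIP and input libraries) to the appropriate reference genome using your preferred short-read alignment software. Mapped locations should be converted to the 6-column browser extensible data format (BED)^13 (UCSC Genome Browser, http://genome.ucsc.edu/), with one tab-delimited line per mapped read giving the mapped chromosome, start position (0-based), end position (half-open), read name, score (optional), and strand. <…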
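The 6-column BED layout described in this step can be checked before running any analysis. The sketch below is a hypothetical helper, not part of the BCP distribution: it parses tab-delimited BED6 lines and validates the 0-based, half-open coordinates and the strand field. The names `BedRead` and `parse_bed6`, and the decision to skip `#`, `track`, and `browser` lines, are assumptions made for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class BedRead(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # half-open, exclusive
    name: str
    score: str   # kept as text because the score is optional
    strand: str


def parse_bed6(lines: Iterable[str]) -> Iterator[BedRead]:
    """Yield one BedRead per mapped read from tab-delimited BED6 lines."""
    for line_no, raw in enumerate(lines, 1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and comment or header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(fields)}")
        chrom, start, end, name, score, strand = fields[:6]
        start_i, end_i = int(start), int(end)
        if start_i < 0 or end_i <= start_i:
            raise ValueError(f"line {line_no}: invalid interval {start_i}-{end_i}")
        if strand not in ("+", "-", "."):
            raise ValueError(f"line {line_no}: invalid strand {strand!r}")
        yield BedRead(chrom, start_i, end_i, name, score, strand)


# Two made-up reads; with half-open coordinates the length is end - start.
example = [
    "chr1\t100\t136\tread1\t0\t+\n",
    "chr1\t250\t286\tread2\t0\t-\n",
]
reads = list(parse_bed6(example))
lengths = [r.end - r.start for r in reads]
print(lengths)
```

Because the end coordinate is exclusive, no `+ 1` is needed when computing read length, which is a common source of off-by-one errors when converting from 1-based formats.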

Representative Results

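BCP excels at identifying regions of broad enrichment in histone modification data. As a point of reference, we previously compared our results with those of SICER^3, an existing tool that has demonstrated strong performance. To best illustrate BCP's advantages, we examined a histone modification that has been well studied, which provides a foundation for evaluating success rates. With this in mind, we analyzed H3K36me3, since it has been shown to associate strongly with actively transcribed gene bodies (Figure 1). Conversely, H3K36me3 has also been shown to be mutually exclusive with H3K27me3 …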

Discussion

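Our goal was to build a model for analyzing ChIPseq data that identifies punctate and diffuse data structures equally well. Until now, regions of enrichment, especially diffuse regions, which reflect the presupposition of a large expected island size, have been difficult to identify. To address these problems, we used the latest advances in HMM technology, which have many advantages over existing heuristic models and less innovative HMMs.

Our model uses explicit formulas within a Bayesian framework. Relative to other HMMs, this is a key …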

Disclosures

The authors have nothing to disclose.

Acknowledgements

Starr Foundation Award (MQZ), NIH grant ES017166 (MQZ), NSF grant DMS0906593 (HX).

Materials

Name of the reagent	Company	Catalogue number	Comments (optional)
Linux-based workstation

References

Park, P. J. ChIP-seq: advantages and challenges of a maturing technology. Nat. Rev. Genet. 10, 669-680 (2009).
Barski, A., et al. High-resolution profiling of histone methylations in the human genome. Cell. 129, 823-837 (2007).
Zhang, Y., et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Zang, C., et al. A clustering approach for identification of enriched domains from histone modification ChIP-Seq data. Bioinformatics. 25, 1952-1958 (2009).
Jothi, R., Cuddapah, S., Barski, A., Cui, K., Zhao, K. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Res. 36, 5221-5231 (2008).
Qin, Z. S., et al. HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data. BMC Bioinformatics. 11, 369 (2010).
Song, Q., Smith, A. D. Identifying dispersed epigenomic domains from ChIP-Seq data. Bioinformatics. 27, 870-871 (2011).
Spyrou, C., Stark, R., Lynch, A. G., Tavaré, S. BayesPeak: Bayesian analysis of ChIP-seq data. BMC Bioinformatics. 10, 299 (2009).
Lai, T., Xing, H. A simple Bayesian approach to multiple change-points. Statistica Sinica. (2011).
Robertson, G., et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat. Methods. 4, 651-657 (2007).
Stitzel, M. L., et al. Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci. Cell Metab. 12, 443-455 (2010).
Bernstein, B. E., et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 28, 1045-1048 (2010).
Karolchik, D., et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, 493-496 (2004).
Matys, V., et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 31, 374-378 (2003).
Portales-Casamar, E., et al. JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles. Nucleic Acids Res. 38, D105-D110 (2010).


Cite This Article

Xing, H., Liao, W., Mo, Y., Zhang, M. Q. A Novel Bayesian Change-point Algorithm for Genome-wide Analysis of Diverse ChIPseq Data Types. J. Vis. Exp. (70), e4273, doi:10.3791/4273 (2012).
