Performing Data Mining And Integrative Analysis Of Biomarkers In Breast Cancer Using Multiple Publicly Accessible Databases

This article has been accepted and is currently in production


In recent years, emerging databases have been designed to lower the barriers to working with intricate cancer genomic datasets, thereby helping investigators analyze and interpret genes, samples, and clinical data across different types of cancer. Herein, we describe a practical operating procedure, taking ID1 (Inhibitor of DNA binding proteins 1) as an example, to characterize the expression patterns of biomarkers and survival predictors in breast cancer based on pooled clinical datasets derived from publicly accessible online databases, including ONCOMINE, bcGenExMiner v4.0 (Breast Cancer Gene-Expression Miner v4.0), GOBO (Gene expression-based Outcome for Breast cancer Online), HPA (the Human Protein Atlas), and Kaplan-Meier plotter. The analysis began with querying the expression pattern of the gene of interest (e.g., ID1) in cancerous samples vs. normal samples. Then, the correlation between ID1 expression and clinicopathological characteristics in breast cancer was analyzed. Next, the expression profiles of ID1 were stratified according to different subgroups. Finally, the association between ID1 expression and survival outcome was analyzed. The procedure simplifies the integration of multidimensional data types at the gene level from different databases and the testing of hypotheses regarding recurrence and the genomic context of gene alteration events in breast cancer. Because the findings are drawn from several independent databases, this method can improve the credibility and representativeness of the conclusions, thereby presenting an informative perspective on a gene of interest.
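The final step of the workflow, relating expression of a gene to survival outcome, can be illustrated offline with a minimal sketch. This is not the implementation used by Kaplan-Meier plotter or any of the databases named above; it assumes a simple median split of expression values followed by a textbook Kaplan-Meier estimate. The function names (`kaplan_meier`, `split_by_median`) and the eight-patient cohort are hypothetical and serve only as an illustration.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if the event (e.g. relapse) was observed, 0 if censored
    """
    curve = []
    survival = 1.0
    # Distinct times at which at least one event was observed, in order.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Events observed exactly at time t.
        observed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - observed / at_risk
        curve.append((t, survival))
    return curve


def split_by_median(expression):
    """Flag each sample as high (True) or low (False) relative to the median."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


# Hypothetical toy cohort: expression level, follow-up in months, relapse flag.
expression = [2.1, 5.4, 3.3, 6.8, 1.9, 7.2, 4.0, 5.9]
months = [60, 24, 48, 12, 72, 18, 54, 30]
relapse = [0, 1, 0, 1, 0, 1, 1, 1]

is_high = split_by_median(expression)

# Build one survival curve per expression group.
curves = {}
for label, flag in (("high", True), ("low", False)):
    group_times = [t for t, h in zip(months, is_high) if h == flag]
    group_events = [e for e, h in zip(relapse, is_high) if h == flag]
    curves[label] = kaplan_meier(group_times, group_events)

for label, curve in curves.items():
    print(label, [(t, round(s, 3)) for t, s in curve])
```

A median cutoff is only one possible choice; the online tools offer other split options, and a real analysis would also report a significance test (such as the log-rank test) and hazard ratio, which this sketch omits.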