Discovering hidden connections among diseases, genes and drugs based on microarray expression profiles with negative-term filtering.
Microarray-based gene expression profiles (GEPs) can be tailored to a variety of topics, providing a precise and efficient means of discovering hidden information. This study proposes a novel way of using existing GEPs to reveal hidden relationships among diseases, genes, and drugs within PubMed, a rich biomedical database. Unlike the co-occurrence method, which considers only whether keywords appear together, the proposed method also takes into account negative relationships and non-relationships among keywords, the importance of which has been demonstrated in previous studies. Three scenarios were evaluated to verify the efficacy of the proposed method. In Scenario 1, disease and drug GEPs (diseases: lymphoma cancer and lymph node cancer; drug: cyclophosphamide) were used to obtain lists of disease- and drug-related genes. Fifteen hidden connections were identified between the diseases and the drug. In Scenario 2, we used different disease and drug GEPs (disease: AML-ALL dataset; drug: gefitinib) to obtain lists of important disease- and drug-related genes. In this case, ten hidden connections were identified. In Scenario 3, we obtained a list of disease-related genes from a disease GEP (liver cancer) and a list of drug-related genes for capecitabine from the PharmGKB website, resulting in twenty-two hidden connections. These experimental results demonstrate the efficacy of the proposed method in uncovering hidden connections among diseases, genes, and drugs. After the weight function of the proposed method was applied, a large number of the documents obtained in each scenario were judged to be related: 834 of 4028, 789 of 1216, and 1928 of 3791 documents in Scenarios 1, 2, and 3, respectively. The negative-term filtering scheme also identified negative relationships and non-relationships among these related documents: 97 of 834, 38 of 789, and 202 of 1928 in Scenarios 1, 2, and 3, respectively.
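The contrast between plain co-occurrence and negative-term filtering can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the negative-term list or the weight function, so the `NEGATIVE_TERMS` set, the function names, and the placeholder keywords `geneA` and `drugB` below are all assumptions made for illustration. The sketch only shows the general idea that a sentence in which two keywords co-occur is set aside when it also contains a negation cue.

```python
import re
from collections import Counter

# Hypothetical negation cues; the actual term list used by the method
# is not stated in the abstract.
NEGATIVE_TERMS = {"no", "not", "never", "without", "neither", "nor", "unrelated"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9\-]+", text.lower())


def classify_sentence(sentence, term_a, term_b):
    """Label one sentence with respect to a keyword pair.

    A plain co-occurrence method would stop after the first check;
    the second check is the (simplified) negative-term filter.
    """
    tokens = set(tokenize(sentence))
    if term_a.lower() not in tokens or term_b.lower() not in tokens:
        return "no_cooccurrence"
    if tokens & NEGATIVE_TERMS:
        return "negative_or_non_relationship"
    return "candidate_relationship"


def summarize(sentences, term_a, term_b):
    """Count how many sentences fall into each label."""
    return Counter(classify_sentence(s, term_a, term_b) for s in sentences)


if __name__ == "__main__":
    # Toy sentences with placeholder keywords, not real findings.
    toy = [
        "geneA expression increased after drugB treatment.",
        "drugB did not alter geneA expression.",
        "geneA is highly expressed in this tissue.",
    ]
    print(summarize(toy, "geneA", "drugB"))
```

A token-level check like this is deliberately crude: it ignores the scope of the negation and would flag any sentence containing a cue word, so a real system would need a more careful treatment of sentence structure.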