Bioinformatics is a useful way to process large-scale datasets. By applying bioinformatics approaches, researchers can quickly, reliably, and efficiently extract insights that lead to practical applications and scientific discoveries. This article demonstrates the use of bioinformatics in ovarian cancer research and validates the bioinformatics findings through experimentation.
Notch signaling is a highly conserved regulatory pathway involved in many cellular processes. Dysregulation of this signaling pathway often leads to interference with proper development and may even result in initiation or progression of cancers in certain cases. Because this pathway serves complex and versatile functions, it can be studied extensively through many different approaches. Of these, bioinformatics provides an undeniably cost-efficient, approachable, and user-friendly method of study. Bioinformatics is a useful way to extract smaller pieces of information from large-scale datasets. Through the implementation of various bioinformatics approaches, researchers can quickly, reliably, and efficiently interpret these large datasets, yielding insightful applications and scientific discoveries. Here, a protocol is presented for integration of bioinformatics approaches to investigate the role of Notch signaling in ovarian cancer. Furthermore, bioinformatics findings are validated through experimentation.
The Notch signaling pathway is a highly conserved pathway that is important for many developmental processes within biological organisms. Notch signaling has been shown to play a significant role in cell proliferation and self-renewal, and defects in the Notch signaling pathway can lead to many types of cancers1,2,3,4,5,6. In some circumstances, the Notch signaling pathway has been linked to both tissue growth and cancer as well as cell death and tumor suppression7. Multiple Notch receptors (NOTCH 1−4) and co‒activator Mastermind (MAML 1−3), all with diverse functions, add an additional level of complexity. While the Notch signaling pathway is sophisticated in terms of functions, its core pathway is simple on a molecular basis8. Notch receptors act as transmembrane proteins composed of extracellular and intracellular regions9. A ligand binding to the extracellular region of Notch receptors facilitates proteolytic cleavage, which allows the Notch intracellular domain (NICD) to be released into the nucleus. NICD then binds to co‒activator Mastermind to activate downstream gene expression10.
In recent years, Notch signaling has been shown to play a variety of roles in the initiation and progression of several types of cancers across different species6,11. For instance, Notch signaling has been linked to tumorigenesis involving the human NOTCH1 gene12. Recently, the NOTCH2, NOTCH3, Delta-like 3 (DLL3), Mastermind‒like protein 1 (MAML1), and a disintegrin and metalloproteinase domain‒containing protein 17 (ADAM17) genes were shown to be strongly associated with ovarian cancer, especially with the poor overall survival of patients13.
As the amount of experimental and patient-associated data continuously increases, the demand for analysis of the available data increases as well. The available data are scattered across publications, and they may deliver inconsistent or even contradictory findings. With the development of new technology in recent decades, such as next-generation sequencing, the amount of available data has grown exponentially. Although this represents rapid advancements in science and opportunities for continued biological research, assessing the meaning of publicly available data to solve research questions is a great challenge14. We believe bioinformatics is a useful way to extract smaller pieces of information from large-scale datasets. Through the implementation of various bioinformatics approaches, researchers can quickly, reliably, and efficiently interpret these large datasets, yielding insightful discoveries. These discoveries may range from the identification of potential new drug therapy targets or disease biomarkers, to personalized patient treatments15,16.
Bioinformatics itself is rapidly evolving, and approaches are constantly changing as technological advances sweep medical and biological science. Currently, common bioinformatics approaches include the utilization of publicly accessible databases and software programs to analyze DNA or protein sequences, identify genes of particular relevance or importance, and determine the relevance of genes and gene products through functional genomics16. Although the field of bioinformatics is certainly not limited to these approaches, these are significant in helping clinicians and researchers manage biological data for the benefit of patients as a whole.
This study aims to highlight several important databases and their use for research about the Notch signaling pathway. NOTCH2, NOTCH3, and their co‒activator MAML1 were used as examples for the database study. These genes were used because the importance of the Notch signaling pathway in ovarian cancer has been validated. Systematic analyses of retrieved data confirmed the importance of Notch signaling in ovarian cancer. In addition, because Notch signaling is well conserved across species, it was confirmed that overexpression of Drosophila melanogaster NICD and Mastermind together can induce tumors in Drosophila ovaries, supporting the database findings and the significant and conserved role of Notch signaling in ovarian cancer.
1. Prediction of Clinical Outcomes from Genomic Profiles (PRECOG)
NOTE: The PRECOG portal (precog.stanford.edu) accesses publicly available data from 165 cancer expression datasets, including gene expression levels and patient clinical outcomes17. It specifically provides the Meta‒Z analysis, which incorporates large datasets to provide Z‒scores of different genes in 39 cancer types to indicate patient overall survival. Poor and good survival rates are indicated by positive and negative Z‒score values, respectively.
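As a rough illustration of how a meta-Z score can summarize many datasets, the sketch below combines per-dataset Z-scores with the weighted Stouffer method. The square-root-of-sample-size weighting, the function name `stouffer_meta_z`, and the example numbers are assumptions made for illustration; they are not taken from the PRECOG portal.

```python
import math


def stouffer_meta_z(z_scores, sample_sizes):
    """Combine per-dataset Z-scores into one meta-Z (weighted Stouffer method)."""
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per Z-score")
    # Assumption: each dataset is weighted by the square root of its sample size.
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Hypothetical survival Z-scores for one gene in four datasets of equal size.
example_z = [1.0, 1.0, 1.0, 1.0]
example_n = [100, 100, 100, 100]
meta_z = stouffer_meta_z(example_z, example_n)
print(f"meta-Z = {meta_z:.2f}")
```

With equal weights, four concordant Z-scores of 1.0 combine into a larger meta-Z, which shows why pooling datasets can reveal an association that no single dataset supports strongly.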
2. CSIOVDB
NOTE: CSIOVDB (csibio.nus.edu.sg/CSIOVDB/CSIOVDB.html) is a microarray database developed by the Cancer Science Institute of Singapore to study ovarian cancer18. This database contains data of carcinomas from different tumor sites as well as normal ovary tissue data. In addition, CSIOVDB provides Kaplan‒Meier survival plots to assess patient survival with differential gene expression levels. CSIOVDB can be applied to investigate the association between gene expression levels and ovarian cancer stages/grades.
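For readers unfamiliar with Kaplan-Meier plots, the following minimal sketch computes the Kaplan-Meier survival estimate from follow-up times and event indicators. The function name `kaplan_meier` and the toy follow-up data are hypothetical; CSIOVDB computes and draws these curves itself, and its exact implementation is not described here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event flags for five patients.
toy_times = [5, 8, 12, 12, 20]
toy_events = [1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t:>2}  S(t) = {s:.3f}")
```

Censored patients lower the number at risk without lowering the survival estimate, which is the feature that distinguishes this estimator from a simple fraction of surviving patients.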
3. Gene Expression across Normal and Tumor tissue (GENT)
NOTE: The GENT portal (medical-genome.kribb.re.kr/GENT) is developed and maintained by the Korea Research Institute of Bioscience and Biotechnology (KRIBB)19. It collects 16,400 (U133A; 241 datasets) and 24,300 (U133plus2; 306 datasets) publicly available samples. After standardization, GENT offers gene expression data across diverse tissues, which are further divided into tumor and normal tissues.
4. Broad Institute Cancer Cell Line Encyclopedia (CCLE)
NOTE: CCLE (portals.broadinstitute.org/ccle) was created by the Broad Institute and provides genomic profiles and mutations of 947 human cancer cell lines20.
5. cBioPortal
NOTE: cBioPortal (www.cbioportal.org) was developed at the Memorial Sloan Kettering Cancer Center (MSK), and accesses, analyzes, and visualizes large-scale cancer genomic data21,22. Specifically, this portal allows researchers to search for genetic alterations and signaling networks.
6. Dissection of Drosophila with desired genotypes and DAPI staining
NOTE: Collect female Drosophila with the desired genotypes, then dissect the ovaries and stain them with DAPI for imaging.
Following step 1 with the PRECOG portal, the Z-scores of NOTCH2, NOTCH3, and MAML1 in ovarian cancer were obtained (1.3, 2.32, and 1.62, respectively). The positive Z-score values indicate poor overall survival of patients with high expression levels of the three genes. Using the Conditional Formatting feature of the spreadsheet software, the Z-score values are shown as a colored bar graph in Figure 1.
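To make the reading of these values concrete, the sketch below labels each reported Z-score by its sign and converts it to a two-sided p-value. It assumes the Z-scores behave as standard normal deviates, which is an assumption for illustration rather than a statement about how the portal reports significance; the helper names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def survival_direction(z):
    """Sign convention used in this protocol: positive Z means poor survival."""
    if z > 0:
        return "poor survival"
    if z < 0:
        return "good survival"
    return "no association"


# Z-scores reported in the text for ovarian cancer.
reported = {"NOTCH2": 1.3, "NOTCH3": 2.32, "MAML1": 1.62}
for gene, z in reported.items():
    print(f"{gene}: Z = {z:+.2f}, {survival_direction(z)}, p = {two_sided_p(z):.3f}")
```

Under this assumption, a larger absolute Z-score corresponds to a smaller p-value, so the three genes differ in the strength of the association even though all point in the same direction.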
The CSIOVDB database was used to confirm the findings. Following the instructions in step 2, NOTCH2, NOTCH3, and MAML1 were entered sequentially in the CSIOVDB search area, and the patient survival data under the Survival tab were retrieved. In addition to Overall Survival data, CSIOVDB provides Disease-Free Survival data. CSIOVDB further separates patients to present the survival data based on Q1 vs. Q4 (lower quartile vs. upper quartile) of gene expression levels. Consistent with previous findings, high expression of NOTCH2, NOTCH3, and MAML1 correlates with poor overall survival and disease-free survival (Figure 2A,B). The Clinico-pathological Parameters tab of CSIOVDB also compares gene expression levels among different ovarian cancer stages, grades, and clinical responses using Mann-Whitney tests. The results show that higher expression levels of NOTCH2, NOTCH3, and MAML1 are associated with advanced ovarian cancer stages (Figure 2C).
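The two operations described above, splitting patients into lower and upper expression quartiles and comparing groups with a Mann-Whitney test, can be sketched as follows. The quartile cut-offs come from Python's `statistics.quantiles` default method, which may differ from the definition CSIOVDB uses; the patient identifiers and expression values are hypothetical, and only the U statistic (not its p-value) is computed.

```python
import statistics


def split_q1_q4(expression_by_patient):
    """Return (low, high): patients in the lower and upper expression quartiles."""
    values = list(expression_by_patient.values())
    q1_cut, _median, q3_cut = statistics.quantiles(values, n=4)
    low = [p for p, v in expression_by_patient.items() if v <= q1_cut]
    high = [p for p, v in expression_by_patient.items() if v >= q3_cut]
    return low, high


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a exceeds b (ties count as 0.5)."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Hypothetical expression values for eight patients.
toy_expression = {f"P{i}": float(i) for i in range(1, 9)}
low_group, high_group = split_q1_q4(toy_expression)
print("Q1 patients:", low_group)
print("Q4 patients:", high_group)

# Hypothetical expression values in late-stage vs early-stage tumors.
u_stat = mann_whitney_u([3.0, 4.0], [1.0, 2.0])
print("U =", u_stat)
```

Comparing only Q1 and Q4 discards the middle half of the cohort, which sharpens the contrast between groups at the cost of a smaller sample.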
Because NOTCH2, NOTCH3, and MAML1 are critical for overall patient survival, the gene expression levels in ovarian tumors and cancer cell lines were investigated further. The expression data of NOTCH2, NOTCH3, and MAML1 in normal and tumor ovarian tissues were downloaded from the U133A platform using the step 3 instructions for GENT. Scientists can process the downloaded data according to their own specific research purpose. Here, we utilized the data to produce the box and whisker plots using GraphPad Prism (version 8). Further permutation tests suggested that NOTCH2, NOTCH3, and MAML1 are highly expressed in tumor tissues (Figure 3A). Next, the expression data of NOTCH2, NOTCH3, and MAML1 in ovarian cancer cell lines were downloaded according to protocol step 4, using CCLE. Gene expression levels in cancer cell lines are shown by the box and whisker plots (Figure 3B). Even though expression levels of NOTCH2, NOTCH3, and MAML1 are high in cancer cell lines, conclusions cannot be drawn due to the lack of normal cell line controls in the CCLE database. However, scientists can identify the origin of cancer cell lines, and compare the expression levels based on different grades, stages, and other clinicopathological parameters.
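A permutation test of the kind mentioned above can be sketched in a few lines. This version compares mean expression between two groups by repeatedly shuffling the group labels; the test statistic, the number of permutations, and the expression values are assumptions chosen for illustration and are not the settings or data used to produce Figure 3.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical expression values for normal and tumor ovarian tissue.
normal_expr = [5.1, 5.3, 4.9, 5.0, 5.2]
tumor_expr = [6.8, 7.1, 6.9, 7.3, 7.0]
p_value = permutation_test(normal_expr, tumor_expr)
print(f"permutation p-value = {p_value:.4f}")
```

Because the test relies only on label shuffling, it makes no assumption that expression values are normally distributed, which suits microarray data pooled from many studies.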
Once the significance of NOTCH2, NOTCH3, and MAML1 in ovarian cancer was confirmed, cBioPortal was used to study their associated signaling network. Following protocol step 5, Ovary/Fallopian Tube was selected under Select Studies, and the Ovarian Serous Cystadenocarcinoma (TCGA, Nature 2011) dataset was chosen for analysis. Under Select Genomic Profiles, mRNA Expression was selected with the profile mRNA expression Z-scores (all genes). Under Select Patient/Case Set, the Samples with mRNA data (Agilent microarray) (489) option was chosen from the dropdown menu. Finally, the genes NOTCH2, NOTCH3, and MAML1 were entered and the query was submitted. Based on these three seed genes, a signaling network was generated showing the 50 most frequently altered neighboring genes, that is, the genes in the same pathway with the highest mutation rates (Figure 4).
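The idea of keeping only the most frequently altered neighbors of a set of seed genes can be sketched as a simple graph filter. This is an illustration of the concept only and is not cBioPortal's implementation; the function name, the placeholder neighbor genes (GENE_A to GENE_D), the edges, and the alteration frequencies are all hypothetical.

```python
def top_altered_neighbors(edges, alteration_freq, seeds, limit=50):
    """Return up to `limit` direct neighbors of the seed genes, most altered first."""
    seed_set = set(seeds)
    neighbors = set()
    for a, b in edges:
        if a in seed_set and b not in seed_set:
            neighbors.add(b)
        elif b in seed_set and a not in seed_set:
            neighbors.add(a)
    # Sort by descending alteration frequency, then by name for a stable order.
    ranked = sorted(neighbors, key=lambda g: (-alteration_freq.get(g, 0.0), g))
    return ranked[:limit]


# Hypothetical interaction edges and alteration frequencies.
toy_edges = [
    ("NOTCH2", "GENE_A"),
    ("NOTCH3", "GENE_B"),
    ("MAML1", "GENE_C"),
    ("GENE_C", "GENE_D"),  # GENE_D is not a direct neighbor of any seed
    ("NOTCH2", "NOTCH3"),  # seed-to-seed edge, ignored
]
toy_freq = {"GENE_A": 0.10, "GENE_B": 0.25, "GENE_C": 0.05, "GENE_D": 0.90}
top_two = top_altered_neighbors(
    toy_edges, toy_freq, ["NOTCH2", "NOTCH3", "MAML1"], limit=2
)
print(top_two)
```

Limiting the network to direct neighbors keeps the figure readable; a highly altered gene that is two steps away from every seed, such as GENE_D here, is deliberately excluded.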
Because Notch signaling is well conserved across species, its role in ovarian tumor formation was investigated in Drosophila. Notch signaling has been previously reported to regulate follicle cell proliferation25, differentiation26,27, and the cell cycle28,29. Overexpression of NICD alone did not induce tumors in Drosophila (Figure 5A), as the epithelium of the Drosophila egg chambers remained intact as a single layer. However, overexpression of NICD and Mam together induced tumors in Drosophila (Figure 5B), as demonstrated by multiple epithelial layers and accumulated cells.
Figure 1: Expression of NOTCH2, NOTCH3, and MAML1 in ovarian cancer is associated with poor overall survival. The survival Z-scores of NOTCH2, NOTCH3, and MAML1 in ovarian cancer patients are presented. Poor survival is indicated by positive Z-score values.
Figure 2: High levels of NOTCH2, NOTCH3, and MAML1 in ovarian cancer are associated with poor overall survival, poor disease-free survival, and advanced cancer stages. The microarray database CSIOVDB provides Kaplan-Meier overall survival and disease-free survival plots of NOTCH2, NOTCH3, and MAML1 in ovarian cancer patients, and gene expression levels in different cancer stages.
Figure 3: NOTCH2, NOTCH3, and MAML1 are highly expressed in ovarian tumors and cancer cell lines. P values are indicated to compare gene expression in normal ovaries and corresponding ovarian tumors. (Abbreviations: Ovary-N = normal ovary tissues; Ovary-C = ovarian cancer tissues).
Figure 4: NOTCH2/NOTCH3/MAML1 genes and their associated signaling network with the 50 most frequently altered neighboring genes. The signaling network is color-coded. The inputted genes are indicated by seed nodes with a thick border. Each gene is represented by a red circle, and the color intensity of the red circle reflects its mutation frequency. Genes are connected by differently colored lines. Brown lines mean "In Same Component", indicating involvement in the same biological component. Blue lines mean "Reacts With", indicating gene reactions. Green lines mean "State Change", suggesting that one gene might cause a state change of another gene.
Figure 5: NICD and mam in Drosophila also induce ovarian tumors. A. Overexpression of NICD alone does not induce tumor formation in Drosophila. B. Overexpression of NICD and mam together induces tumors in Drosophila. Scale bar = 50 µm.
As there are countless approaches and methods for the utilization of bioinformatics, there are numerous databases available online to the general public. An abundance of information can be extracted from each of these databases, but some are best suited for particular purposes, such as assessing patient survival based on certain inputs. Systematic analyses of retrieved data from different individual databases can convincingly yield important scientific findings.
The current analysis focuses on the role of Notch signaling in ovarian cancer through the use of bioinformatics approaches. For instance, the Meta-Z analysis on the PRECOG portal was used to obtain Z-scores that indicate patient survival outcomes in clinical cancer studies. CSIOVDB is another meta-analysis database that was used to study survival outcomes of ovarian cancer patients. The CSIOVDB data successfully validated the findings from the PRECOG portal that NOTCH2, NOTCH3, and MAML1 are critical for overall patient survival. Later, the GENT and CCLE databases further demonstrated that NOTCH2, NOTCH3, and MAML1 are highly expressed in ovarian tumors and cancer cell lines. The combination of these databases systematically revealed the significant roles of NOTCH2, NOTCH3, and MAML1 in ovarian cancer. This use of bioinformatics methods provided an efficient and cost-effective way to conduct cancer research and shows how such methods can yield important findings for future experimental and clinical applications.
Bioinformatics provides the public the capability to access results from thousands of experiments all at once. The information derived from public databases provides a cost-effective and efficient way to establish an experimental design prior to performing experiments. In addition, it is important to note that publicly available data can be scattered across publications and may deliver inconsistent or even contradictory findings, which requires meta-analyses to be performed through bioinformatics approaches. Scientists can design and perform experiments based on the data found through large bioinformatics databases to validate specific scientific hypotheses. Results from the Drosophila experiment confirmed the findings from the bioinformatics databases and further supported the idea that Notch pathway components should continue to be investigated as potential therapeutic drug targets. The successful validation of bioinformatics findings through experimentation also suggests the importance of bioinformatics approaches for scientific discoveries.
Bioinformatics approaches have some limitations. First, some websites/tools might not be updated because of the time, effort, or cost associated with maintenance. Second, some websites/tools are updated constantly, but updates that add new input data might alter previously obtained results. Third, developers of some websites/tools reserve copyrights and restrict the use of their contents. Fourth, the analyses or algorithms of certain websites/tools might not always be accurate.
To overcome these limitations, the following modifications and troubleshooting steps are suggested. First, some websites/tools allow researchers to manually load new data for analysis. If not, researchers can download and analyze the most recent data on their own. Second, researchers need to run their analyses repeatedly and keep a record of the dates. If results change significantly, researchers might need to examine the newly added input data to identify the reasons. Third, researchers can find an alternative website/tool to run their analyses to avoid potential copyright issues. Fourth, researchers can use additional websites/tools to validate their important findings. If there are any problems with analyses or algorithms, researchers can download and re-analyze the data to correct the mistakes or use other websites/tools with the appropriate settings.
The authors have nothing to disclose.
This work was supported by Start-Up Funding, College of Science and Mathematics Research Grant, Summer Research Session Award, and Research Seed Funding Award from Georgia Southern University.
Name of Material/Equipment | Company | Catalog Number/URL | Comments/Description |
DAPI (4',6-Diamidino-2-Phenylindole, Dihydrochloride) | Invitrogen | D1306 | 1:1000 Dilution |
PBS, Phosphate Buffered Saline, 10X Powder, pH 7.4 | ThermoFisher | FLBP6651 | Dissolved with ddH2O to make 1X PBS |
Goat serum | Gibco | 16210064 | Serum |
Embryo dish | Electron Microscopy Sciences | 70543-45 | Dissection Dish |
Nutating mixers | Fisherbrand | 88861041 | Nutator |
tj-Gal4, Gal80ts/ CyO; UAS-NICD-GFP/ TM6B | Dr. Wu-Min Deng at Florida State University | N/A | Fly stock |
w*; UAS-mam.A | Bloomington Drosophila Stock Center | #27743 | Fly stock |
w[1118] | Bloomington Drosophila Stock Center | #5905 | Fly stock |
The PRECOG portal | Stanford University | precog.stanford.edu | Publicly accessible database of cancer expression datasets |
CSIOVDB | Cancer Science Institute of Singapore | csibio.nus.edu.sg/CSIOVDB/CSIOVDB.html | Microarray database used to study ovarian cancer |
The Gene Expression across Normal and Tumor tissue (GENT) Portal | Korea Research Institute of Bioscience and Biotechnology (KRIBB) | medical-genome.kribb.re.kr/GENT | Publicly accessible database of gene expression data across diverse tissues, divided into tumor and normal tissues |
Broad Institute Cancer Cell Line Encyclopedia (CCLE) | Broad Institute and The Novartis Institutes for BioMedical Research | portals.broadinstitute.org/ccle | Provides genomic profiles and mutations of human cancer cell lines |
cBioPortal | Memorial Sloan Kettering Cancer Center (MSK) | cbioportal.org | Portal that allows researchers to search for genetic alterations and signaling networks |
Zeiss 710 Inverted confocal microscope | Carl Zeiss | ID #M 210491 | Examination and image collection of fluorescently labeled specimens |