September 5th, 2025
This article introduces a protocol for using DeepSpaceDB, a dynamic, interactive database for spatial transcriptomics, offering analysis workflows and examples to explore tissue organization and disease-related gene expression.
We are making a spatial transcriptomics database called DeepSpaceDB. The goal is to make spatial transcriptomics data more easily accessible for biologists and bioinformaticians. Several spatial transcriptomics platforms have been developed.
They allow researchers to study gene expression patterns within tissue slices. But this technology is expensive, and the data analysis requires high-level bioinformatics skills. We have been using the Visium and Xenium spatial platforms in our cancer cachexia research.
These platforms allow us to dissect the tumor, the surrounding tissue, and even the distant host tissue within the same organ context. They can help us resolve the changes in gene expression and cellular composition in each compartment separately. One of the major challenges for biologists is actually conducting data analysis.
Many researchers lack the programming and computational skills needed to fully interpret the increasingly large number of spatial transcriptomics data sets that are now becoming available. By making spatial transcriptomics data sets more easily accessible, the database enables users to generate new hypotheses and explore the underlying mechanisms behind different diseases. For example, we can evaluate the spatial gene expression patterns associated with the tumor microenvironment.
To begin, click on the database tab and select the organism as mouse, the organ as brain, and the source as Zenodo. Scroll through the resulting samples and select the sample labeled DSID001557. Then click on the selected sample and confirm that the description reads 2 million cells in 100 microliters saline NK cell.
Click on the quality tab to evaluate the sample quality. From the quality measures dropdown menu, select options such as detected genes, read count, and mito to view the respective parameter distributions across the sample slice. Now, navigate to the image annotation tab to identify different regions in the sample slice.
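The three quality measures named above can be illustrated with a minimal sketch. It assumes the common definitions: detected genes is the number of genes with a nonzero count in a spot, read count is the total count in a spot, and mito is the fraction of counts from mitochondrial genes. DeepSpaceDB's exact definitions are not stated in this protocol, so these definitions, the "mt-" gene prefix, and the toy counts are all assumptions made for illustration.

```python
# Toy spot-by-gene count matrix. Gene names and counts are invented.
counts = {
    "spot_1": {"GeneA": 12, "GeneB": 5, "mt-Co1": 3, "mt-Nd1": 0},
    "spot_2": {"GeneA": 0, "GeneB": 0, "mt-Co1": 8, "mt-Nd1": 2},
}


def spot_quality(gene_counts, mito_prefix="mt-"):
    """Return detected genes, read count, and mitochondrial fraction for one spot."""
    total = sum(gene_counts.values())
    # A gene is counted as detected when it has at least one read in the spot.
    detected = sum(1 for count in gene_counts.values() if count > 0)
    # Assumption: mitochondrial genes are recognized by a name prefix.
    mito = sum(c for gene, c in gene_counts.items() if gene.startswith(mito_prefix))
    mito_fraction = mito / total if total else 0.0
    return {
        "detected_genes": detected,
        "read_count": total,
        "mito_fraction": mito_fraction,
    }


quality = {spot: spot_quality(gene_counts) for spot, gene_counts in counts.items()}
for spot, measures in quality.items():
    print(spot, measures)
```

In this toy example, spot_2 has few detected genes and a high mitochondrial fraction, which is the kind of pattern the quality tab is meant to reveal.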
Move the mouse cursor over the sample slice to display annotations. View the grid-based annotations generated by a large language model that show anatomical features and associated conditions. Then navigate to the clusters tab to examine the cell type clusters in the sample slice.
View the two dimensional embedding of the clusters and the corresponding color coded representation across spots on the sample slice. Next, navigate to the genes tab to examine the spatially variable genes in the sample. Click on some of the top genes in the list to generate spatial plots of their expression across the tissue slice.
Observe the color coded expression patterns, which clearly show distinct spatial distributions for the highest scoring genes. Then navigate to the pathways tab to examine the activity of gene sets associated with common biological pathways. View the list of spatially variable pathways with pathway activity estimated based on the expression levels of related genes.
Click on some of the top pathways in the list to generate spatial plots of their activity across the tissue slice. Observe the color coded patterns of pathway activity across different tissue regions. Now, go to the tissue explorer tab, which allows users to freely select regions of interest and compare gene expression patterns between them.
Ensure manual selection is activated. Using the mouse cursor, select the spots in the hippocampal region on the left side of the mouse brain slice. Click on set one, and then add to set to highlight the selected spots on the right panel.
Then click on set two and use the mouse cursor to select the spots in the hypothalamic region of the brain slice. Click on add to set to highlight these selected spots on the right side. After completing the spot selection, click on the compare gene expression button.
This generates a table, displaying the average gene expression values for each selected region along with a scatter plot representation. Move the cursor over individual points on the scatter plot to confirm the gene names and the average expression values in both regions. Based on the comparison results, identify differentially expressed genes.
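The region comparison described above comes down to averaging each gene's expression over the spots in each set. The following sketch shows that calculation on invented data. The spot IDs, gene names, and values are hypothetical, and ranking genes by the difference between the two averages is an illustrative choice, not necessarily what the tissue explorer does.

```python
from statistics import mean

# Hypothetical normalized expression per spot. All names and values are invented.
expression = {
    "s1": {"GeneA": 4.0, "GeneB": 1.0},
    "s2": {"GeneA": 6.0, "GeneB": 1.0},
    "s3": {"GeneA": 1.0, "GeneB": 3.0},
    "s4": {"GeneA": 1.0, "GeneB": 5.0},
}
set_one = ["s1", "s2"]  # stands in for the first selected region
set_two = ["s3", "s4"]  # stands in for the second selected region


def region_means(expression, spots):
    """Average each gene's expression over the given spots."""
    genes = sorted({gene for spot in spots for gene in expression[spot]})
    return {g: mean(expression[s].get(g, 0.0) for s in spots) for g in genes}


def compare_regions(expression, spots_a, spots_b):
    """Return (gene, mean in A, mean in B) rows, largest difference first."""
    mean_a = region_means(expression, spots_a)
    mean_b = region_means(expression, spots_b)
    genes = sorted(set(mean_a) | set(mean_b))
    rows = [(g, mean_a.get(g, 0.0), mean_b.get(g, 0.0)) for g in genes]
    return sorted(rows, key=lambda row: abs(row[1] - row[2]), reverse=True)


table = compare_regions(expression, set_one, set_two)
for gene, avg_one, avg_two in table:
    print(gene, avg_one, avg_two)
```

Each row corresponds to one point on the scatter plot, with the two averages as its coordinates.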
Navigate back to the genes tab and visualize the expression of these genes across the tissue slice. Click on the database tab and use the filter to select the organism as mouse, the organ as liver, and the condition as cancer. From the resulting sample list, select sample DSID001005.
Click on the selected sample and confirm that the description indicates that the sample is from a mouse liver, containing metastasis of colorectal cancer origin. Then navigate to the tissue explorer tab and activate manual selection mode. Using the mouse cursor, select the spots corresponding to the tumor region identified by positive expression of the EpCAM marker in sample DSID001005.
Click on set one. Then select add to set to highlight the selected tumor spots on the right side. Now, click on set two and use the cursor to select the spots in the distant non-tumor region of the liver sample.
Click on add to set to highlight the selected non-tumor spots on the right side of the display. To perform further analysis of gene expression data, click on the download CSV option, generating a comma separated values file of the gene expression data for the two regions of the sample. After repeating the database navigation steps for DSID001007, confirm that the description states it is another slice from a mouse liver containing metastases of colorectal cancer origin.
Next, confirm that two CSV files have been generated, one each from samples DSID001005 and DSID001007, containing two columns representing average gene expression in tumor and non-tumor regions. Load both CSV files into the R programming environment. Merge the data sets to perform downstream analysis using two replicates per condition.
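The protocol performs the merge in R. As a language-neutral illustration of that step, the sketch below reads two such files and keeps the genes present in both, giving two replicate values per region. The column names (gene, set1, set2) and the values are hypothetical, because the layout of the downloaded files is not specified here beyond the two region columns.

```python
import csv
import io

# Stand-ins for the two downloaded files. Column names and values are hypothetical.
csv_first = "gene,set1,set2\nGeneA,5.0,1.0\nGeneB,2.0,2.5\n"
csv_second = "gene,set1,set2\nGeneA,4.5,1.2\nGeneC,0.3,0.1\n"


def read_region_means(text):
    """Parse one file into {gene: (tumor mean, non-tumor mean)}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["gene"]: (float(row["set1"]), float(row["set2"])) for row in reader}


def merge_replicates(first, second):
    """Keep genes found in both samples, with two replicate values per region."""
    shared = sorted(set(first) & set(second))
    return {
        gene: {
            "tumor": [first[gene][0], second[gene][0]],
            "non_tumor": [first[gene][1], second[gene][1]],
        }
        for gene in shared
    }


merged = merge_replicates(read_region_means(csv_first), read_region_means(csv_second))
print(merged)
```

Restricting the merge to shared genes is one possible choice. Keeping all genes and treating missing values explicitly would be another.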
In R, use the limma package to perform differential gene expression analysis on the merged dataset. Assign the colorectal metastases regions from both samples to the cancer group and the distant healthy regions to the control group. Filter the results to identify upregulated genes with a log fold change greater than 0.5 and an adjusted P value less than 0.05.
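The upregulated-gene filter can be sketched as follows. This is not limma itself. It only illustrates the thresholding step applied to a results table, using the Benjamini-Hochberg adjustment, which is the adjustment limma's topTable applies by default. The gene names, log fold changes, and raw P values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never increase as the raw P values decrease.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-gene results. All values are invented.
results = [
    {"gene": "GeneA", "logFC": 1.8, "p": 0.001},
    {"gene": "GeneB", "logFC": 0.3, "p": 0.004},
    {"gene": "GeneC", "logFC": 0.9, "p": 0.030},
    {"gene": "GeneD", "logFC": -1.1, "p": 0.002},
]

adjusted = benjamini_hochberg([row["p"] for row in results])
for row, adj_p in zip(results, adjusted):
    row["adj_p"] = adj_p

# Thresholds taken from the protocol: log fold change > 0.5, adjusted P < 0.05.
upregulated = [
    row["gene"] for row in results if row["logFC"] > 0.5 and row["adj_p"] < 0.05
]
print(upregulated)
```

GeneB passes the P value threshold but not the fold change threshold, and GeneD changes in the opposite direction, so neither is reported as upregulated.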
Similarly, extract downregulated genes with a log fold change less than minus 0.5 and an adjusted P value less than 0.05. A distinct low quality region was observed on the left side of the mouse brain sample, characterized by a reduced number of detected genes and a lower read count. The sample showed an average of approximately 4,000 genes detected per spot, aligning well with the distribution of other samples in the database.
Fifteen spatial clusters were identified across the mouse brain sample, with distinct boundaries representing anatomical differences. The genes NRGN, SLC17A7 and DDN showed strong expression in the hippocampal region. In contrast, LY6H expression was localized in the cortical regions, particularly the lower left and right outer edges of the slice.
Neuropeptide signaling activity was notably increased in the lower cortical regions of the sample slice. Regulation of synaptic plasticity was activated across the hippocampal region, particularly in the upper middle zones. Neurotransmitter transport activity was elevated across the mid and upper right sections of the hippocampus.
The genes CLDN7, CLDN4 and ACTG1 exhibited clear upregulation at the tumor region with colorectal metastasis in liver sample DSID001005. In contrast, the expression of CLDN7, CLDN4 and ACTG1 was notably lower in the distant healthy liver tissue of sample DSID001007.
This article describes DeepSpaceDB, an interactive database designed to enhance accessibility to spatial transcriptomics data. It provides analysis workflows for researchers to investigate tissue organization and gene expression related to various diseases.