Research Article

Reanalysis of Public Transcriptomes Reveals Shared Immune Signatures Between Major Depressive Disorder and Dermatomyositis With Single-Cell Context

DOI: 10.3791/71024

June 26th, 2026

Summary

This study aimed to use an integrative bioinformatic reanalysis of public GEO datasets, combined with single-cell contextualization, to identify candidate shared genes between major depressive disorder and dermatomyositis and to characterize their distribution across immune cell subsets in a dermatomyositis-related single-cell dataset.

Abstract

This study aimed to identify candidate shared transcriptomic signals between major depressive disorder and dermatomyositis through an integrative bioinformatic reanalysis of public GEO datasets with single-cell contextualization. The analytical workflow included Weighted Gene Co-expression Network Analysis (WGCNA) for key module identification, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses for functional characterization, GeneMANIA-based functional association network analysis with topological analysis in a network visualization platform for candidate-gene prioritization, and evaluation of 113 machine-learning models combined with SHapley Additive exPlanations (SHAP) for diagnostic feature selection. Gene Set Enrichment Analysis (GSEA), immune infiltration analysis, and single-cell RNA-seq-based contextualization were subsequently performed to further characterize the immune-related cellular context of the identified signals. Integration of dermatomyositis-related GEO datasets identified 570 differentially expressed genes, and WGCNA module overlap yielded 33 candidate shared genes. Functional enrichment and network analyses highlighted immune defense, cytotoxicity, and pathways including PPAR, IL-17, and antigen processing, with ELANE, PPBP, and CTSG emerging as highly connected nodes. Machine-learning-based feature prioritization retained 8 candidate model-selected genes, namely KIF4A, OLR1, KIR2DL4, KRT23, KIR3DS1, AZU1, SCG5, and LRRC37E. Immune infiltration analysis associated these shared genes with regulatory T cells (Tregs), resting mast cells, resting dendritic cells, and both classically activated (M1) and alternatively activated (M2) macrophages. Single-cell RNA-seq contextualization further suggested that CD8⁺ T-cell subsets with different candidate-gene score states showed distinct intercellular communication patterns. Among these, the MIF–(CD74+CD44) axis and signals from naive/central memory T cells were notable features requiring further validation. Overall, this study identified candidate shared transcriptomic signals between major depressive disorder and dermatomyositis and highlighted immune-related cellular contexts that warrant further validation in true comorbid cohorts.

Introduction

Dermatomyositis is a chronic systemic autoimmune disease characterized by inflammatory involvement of the skin and skeletal muscle, clinically manifested by symmetrical proximal muscle weakness and distinctive cutaneous lesions, and, in severe cases, multiorgan dysfunction1. Accumulating clinical evidence highlights that patients with dermatomyositis frequently experience psychiatric comorbidities, most notably major depressive disorder2,3,4. The pathogenesis of dermatomyositis-associated major depressive disorder is multifactorial, arising from a complex interplay between psychosocial distress, neuroendocrine dysregulation, and systemic immunoinflammation. Persistent pain, fatigue, and progressive muscular weakness can substantially impair physical function and quality of life. These burdens may lead to social role disruption, chronic psychological stress, and reduced autonomy5. In addition, long-term glucocorticoid exposure may perturb the hypothalamic-pituitary-adrenal (HPA) axis and impair hippocampal plasticity, thereby increasing vulnerability to depression-related manifestations6. Moreover, sustained immune activation and systemic inflammation in dermatomyositis are increasingly recognized as potential contributors to depression-related symptoms7. These peripheral mediators may influence the central nervous system by modulating neurotransmitter metabolism and neuroplasticity, thereby linking systemic autoimmunity with neuropsychiatric manifestations7,8,9,10.

Importantly, this rationale does not imply that either disease is biologically homogeneous. Dermatomyositis comprises clinically and serologically distinct subsets, including anti-MDA5- and anti-TIF1-γ-associated phenotypes with different inflammatory and clinical profiles11,12,13. Major depressive disorder is likewise increasingly recognized as a heterogeneous condition, and current evidence supports the existence of an immune-inflammatory subtype rather than a single universal inflammatory signature14,15. Accordingly, the present study was not designed to assume a one-size-fits-all shared molecular program, but rather to screen for candidate overlapping immune-associated transcriptomic signals detectable at the cohort level across independent public datasets.

Beyond psychosocial stress and treatment exposure, a more biologically testable link between major depressive disorder and dermatomyositis is shared immune-inflammatory dysregulation. Major depressive disorder is a heterogeneous condition and should not be assumed to have a single universal transcriptomic profile. However, converging evidence supports an inflammation-related subtype of major depressive disorder, and peripheral transcriptomic studies have identified dysregulation of innate immune, neutrophil-related, interferon, and complement pathways in subsets of affected individuals14,16,17. In parallel, MAPK-related stress signaling has also been implicated in depressive phenotypes18. Dermatomyositis, by contrast, is a well-recognized interferon-driven autoimmune disease, and transcriptomic studies in blood and affected tissues have consistently demonstrated activation of type I interferon and broader immune-inflammatory programs; recent multi-omic analyses have further highlighted ERK- and p38 MAPK-related pathway activity in dermatomyositis19,20,21. Collectively, these findings provide a biologically plausible rationale to examine whether a subset of immune-associated transcriptomic signals may overlap between major depressive disorder and dermatomyositis across independent public datasets.

Despite these observations, the molecular basis underlying the overlap between major depressive disorder and dermatomyositis remains insufficiently understood. Importantly, currently available public datasets do not provide a true cohort of patients simultaneously diagnosed with major depressive disorder and dermatomyositis. Therefore, rather than directly analyzing depression in patients with dermatomyositis, the present study was designed to identify candidate shared transcriptomic signals across separate public datasets of major depressive disorder and dermatomyositis through an integrative bioinformatic reanalysis framework22. Specifically, publicly available transcriptomic datasets were analyzed using differential expression analysis, weighted gene co-expression network analysis (WGCNA), functional enrichment analysis, network-based analysis, and machine-learning-based feature prioritization to identify cross-disease candidate genes and pathways23. In addition, a dermatomyositis-related single-cell dataset was analyzed to contextualize these candidate genes at the immune-cell level. The overall analytical workflow is summarized as a stepwise flowchart in Figure 1. Rather than establishing a definitive comorbidity mechanism, this study aimed to generate a hypothesis-guided framework for identifying candidate shared molecular signals between major depressive disorder and dermatomyositis.

Accordingly, this study adopted a stepwise prioritization framework. Disease-associated co-expression modules were first identified separately in major depressive disorder and dermatomyositis datasets, and their overlap was used to define candidate cross-disease shared signals. These candidates were then functionally contextualized by enrichment and GeneMANIA-based network analysis, prioritized within a dermatomyositis-centered classification framework using machine-learning methods, and finally examined in a dermatomyositis-related single-cell dataset to provide cellular contextualization.

Gene analysis workflow diagram, including WGCNA, machine learning, PPI, SHAP, and immune infiltration.
Figure 1: Flowchart of the data collection and analysis process.

Several alternative approaches have been used to investigate cross-disease molecular overlap. Simple intersection of differentially expressed gene (DEG) lists is computationally straightforward but lacks the module-level contextual information provided by co-expression analysis and is sensitive to arbitrary fold-change and P-value thresholds. Traditional meta-analysis pools effect sizes across studies of the same disease but is not designed to identify shared signals across two distinct conditions. The present workflow integrates multiple complementary analytical layers—co-expression module overlap, functional enrichment, network analysis, machine-learning-based feature prioritization, immune deconvolution, and single-cell contextualization—each serving a distinct purpose within a sequential prioritization framework. This multi-layered design helps reduce the number of candidate genes stepwise and provides cross-validated biological contextualization at multiple levels. The protocol is applicable to any pair of diseases for which publicly available bulk transcriptomic and, optionally, single-cell datasets exist, particularly when true comorbid cohorts are unavailable. However, the workflow is observational and does not incorporate formal causal-inference frameworks; all findings should be interpreted as hypothesis-generating and require independent experimental validation.

Overall, the analytical workflow was designed as a sequential prioritization strategy rather than a direct causal-inference framework. Each step served a distinct purpose: WGCNA-based module overlap for candidate shared-signal identification, enrichment/network analysis for biological contextualization, machine learning for feature prioritization in dermatomyositis-related classification, and single-cell analysis for cell-type-level contextualization.

Protocol

This study used only publicly available, deidentified datasets from the Gene Expression Omnibus (GEO) database. Because the work involved secondary analysis of existing public data and did not include direct participant contact, intervention, or access to identifiable personal information, additional ethics committee approval and informed consent were not required.

Data sources and preprocessing

All gene expression and single-cell datasets were obtained from the GEO database24. For major depressive disorder, dataset GSE98793 was used, which comprises peripheral blood samples from 128 patients and 64 healthy controls. For dermatomyositis, datasets were selected based on predefined criteria, including Homo sapiens expression profiling, clearly identifiable disease and control groups, available platform annotation for probe-to-gene mapping, and suitability for discovery or validation analysis. When a GEO series contained multiple inflammatory myopathy subtypes, only dermatomyositis and normal control samples were extracted for the present study. GSE1551, GSE46239, and GSE128470 were used as the discovery/training datasets, whereas GSE5370, GSE39454, and GSE11971 were used as independent validation datasets. The dermatomyositis datasets analyzed in this study were derived mainly from affected muscle or skin tissues rather than peripheral blood. Single-cell data for dermatomyositis were sourced from dataset GSE190510.

Raw expression matrices were downloaded from the GEO database together with the corresponding platform annotation files. Probe IDs were mapped to official gene symbols according to the manufacturer-provided GPL annotation. Probes that could not be mapped unambiguously to a single official gene symbol were removed. When multiple probes mapped to the same gene, they were collapsed at the gene level using the average expression value implemented by the `avereps` function in the limma package, thereby generating a gene-by-sample expression matrix.
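The probe-to-gene collapsing step can be illustrated with a minimal Python sketch. It is not the authors' R code: in the actual workflow the averaging is done by `avereps` in limma, and the function name `collapse_probes`, the probe IDs, and the expression values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Average all probe rows that map to the same gene symbol.

    expr: dict mapping probe ID -> list of per-sample expression values.
    probe_to_gene: dict mapping probe ID -> single gene symbol; probes that
        could not be mapped unambiguously are simply absent from this dict.
    """
    grouped = defaultdict(list)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # drop probes without an unambiguous gene symbol
        grouped[gene].append(values)
    # Column-wise mean across the probes of each gene
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


# Hypothetical probes and values, for illustration only
toy_expr = {"p1": [2.0, 4.0], "p2": [4.0, 6.0], "p3": [1.0, 1.0], "p4": [9.0, 9.0]}
toy_map = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
collapsed = collapse_probes(toy_expr, toy_map)
```

Here `p4` has no mapping and is dropped, while `p1` and `p2` are averaged into a single `GENE_A` row.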

To reduce intensity-dependent bias and stabilize variance, log2 transformation was applied when appropriate according to the distribution of expression values. Between-array normalization was then performed using the `normalizeBetweenArrays` function in the limma package. Missing values, when present, were imputed using K-nearest neighbor imputation. For the integrated dermatomyositis training datasets, batch correction was performed using the `ComBat` function in the sva package, with dataset/platform origin treated as the batch variable and sample group (dermatomyositis versus healthy control) included in the design matrix to preserve the biological variation of interest during batch adjustment.

All analyses were performed in R within an integrated development environment on a desktop operating system. The limma package was used for probe summarization and normalization, and the sva package was used for ComBat batch correction. K-nearest neighbor imputation of missing values used k = 10.
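A simplified Python sketch of K-nearest neighbor imputation is given here to illustrate the idea. The protocol does not name the R function used for imputation, so the distance definition (root-mean-square difference over jointly observed samples), the helper name `knn_impute`, and the toy matrix are assumptions. Only k = 10 comes from the text; the example call uses a smaller k because the toy matrix has few rows.

```python
import math


def knn_impute(rows, k=10):
    """Fill missing entries (None) of each gene row from its k nearest gene rows.

    rows: list of gene rows, each a list of per-sample values or None.
    Returns a new matrix; the input is not modified.
    """
    def distance(a, b):
        pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not pairs:
            return math.inf  # no shared observed samples: not comparable
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        missing = [j for j, value in enumerate(row) if value is None]
        if not missing:
            continue
        # Other genes ordered from nearest to farthest
        neighbours = sorted(
            (distance(row, other), idx) for idx, other in enumerate(rows) if idx != i
        )
        for j in missing:
            donors = [
                rows[idx][j]
                for dist, idx in neighbours
                if dist < math.inf and rows[idx][j] is not None
            ][:k]
            if donors:
                imputed[i][j] = sum(donors) / len(donors)
    return imputed


# Toy matrix (genes x samples); the first gene has one missing value
toy = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [10.0, 10.0, 50.0],
]
filled = knn_impute(toy, k=2)
```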

Weighted gene co-expression network analysis

Weighted Gene Co-expression Network Analysis (WGCNA) was performed separately for the major depressive disorder and dermatomyositis datasets using the WGCNA R package25,26. Samples were hierarchically clustered using flashClust to identify outliers; samples exceeding a dendrogram height of 100 and genes in the bottom 25% of variance were excluded. For each network, a soft-thresholding power (β) was selected using pickSoftThreshold to achieve approximate scale-free topology (R² > 0.8). The adjacency matrix was transformed into a Topological Overlap Matrix (TOM), and modules were identified via dynamic tree cutting27 with a minimum module size of 60 and a merge cut height of 0.25. The random seed was set to 12345 for reproducibility. Module eigengenes were correlated with disease status using Pearson correlation, with P-values adjusted by the Benjamini–Hochberg method. For each disease, the module(s) showing the strongest and most significant association with disease status were retained as the key disease-associated module(s). The overlap between the key module genes from the major depressive disorder dataset and those from the dermatomyositis dataset was defined as the candidate shared gene set for downstream analyses. Differential expression analysis of the integrated dermatomyositis cohort was performed separately to characterize dermatomyositis-related transcriptional changes.
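The soft-thresholding and module-overlap ideas can be sketched in a few lines of Python. This is an illustration, not the WGCNA implementation: it assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the power β, and the gene names and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(g_i, g_j)| ** beta for every gene pair."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression profiles (gene -> per-sample values)
toy_expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
adjacency = soft_threshold_adjacency(toy_expr, beta=4)

# Candidate shared genes are the plain intersection of the two key modules
mdd_key_module = {"GENE_A", "GENE_B", "GENE_C"}  # hypothetical members
dm_key_modules = {"GENE_B", "GENE_C", "GENE_D"}  # hypothetical members
shared_genes = mdd_key_module & dm_key_modules
```

Raising correlations to a power shrinks weak correlations much more than strong ones, which is what pushes the network toward scale-free topology as β increases.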

Functional enrichment analysis

Gene Ontology (GO) enrichment analysis was performed using R. Gene symbols were converted to Entrez IDs using org.Hs.eg.db, and significantly enriched GO terms (p < 0.05) were identified using enrichGO in clusterProfiler. For multi‑dimensional visualization of the results, bar plots and bubble plots were generated using the enrichplot package, while a circular plot was constructed with the circlize package to display GO categories, gene counts, and enrichment factors. Legends were added with the ComplexHeatmap package. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of differentially expressed genes was also conducted in R. Gene symbols were converted to Entrez IDs based on the org.Hs.eg.db database, and significantly enriched pathways (FDR < 0.05) were identified using the enrichKEGG function from the clusterProfiler package28,29,30,31. Enrichment results were visualized using bar and bubble plots.
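The statistics behind over-representation analysis can be sketched as follows. The `enrichGO` and `enrichKEGG` functions are based on a hypergeometric test with multiple-testing adjustment; the Python below is an illustrative re-implementation of those two steps, not the clusterProfiler code, and the function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_term, n_query, n_overlap):
    """Upper-tail hypergeometric P-value: P(X >= n_overlap).

    n_universe: number of background genes
    n_term:     background genes annotated to the term or pathway
    n_query:    genes in the query list (e.g., the shared genes)
    n_overlap:  query genes annotated to the term
    """
    upper = min(n_term, n_query)
    total = comb(n_universe, n_query)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy numbers: all 3 query genes fall into a 3-gene term in a 10-gene universe
p_term = hypergeom_enrichment_p(n_universe=10, n_term=3, n_query=3, n_overlap=3)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
```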

GeneMANIA-based functional association network analysis

Based on the previously identified shared genes, a GeneMANIA-based functional association network was constructed to explore the interaction context among these genes and their related partners. The gene list was submitted to GeneMANIA using Homo sapiens as the reference species. GeneMANIA integrates multiple evidence types, including co-expression, physical interactions, pathways, co-localization, genetic interactions, and shared protein domains. The resulting network was exported and imported into a network visualization platform for visualization. Topological analysis of the network was then performed in the same platform to identify highly connected candidate nodes32,33,34.
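Ranking nodes by connectivity can be sketched as below. The protocol does not state which topological measure was used, so node degree is an assumption here, and the node names and edges are hypothetical rather than taken from the exported network.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank nodes by degree in an undirected network.

    edges: iterable of (node_a, node_b) pairs; duplicate edges and
    self-loops are ignored. Ties are broken alphabetically.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical edge list with one duplicated edge and one self-loop
toy_edges = [
    ("N1", "N2"), ("N1", "N3"), ("N1", "N4"),
    ("N2", "N3"), ("N3", "N2"), ("N4", "N4"),
]
ranking = degree_ranking(toy_edges)
```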

Machine learning-based diagnostic model construction

Multiple machine-learning algorithms were used for diagnostic classification, including Random Forest (RF), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Naive Bayes, Gradient Boosting Machine (GBM), XGBoost, glmBoost, Elastic Net (Enet), Ridge, Least Absolute Shrinkage and Selection Operator (LASSO), Stepwise Generalized Linear Model (Stepglm), and Partial Least Squares Regression Generalized Linear Model (plsRglm)35. A two-stage modeling framework was applied to generate 113 candidate model combinations. In the first stage, the initial algorithm was used for variable screening in the training cohort; in the second stage, the retained variables were used to fit a diagnostic classification model. Models with ≤5 selected variables were excluded from further comparison. The combined dermatomyositis datasets served as the training cohort, with labels defined as dermatomyositis versus healthy controls, whereas the independent validation cohort(s) were used for external performance evaluation. Internal resampling and tuning were algorithm-specific: glmnet-based models (LASSO, Ridge, and Elastic Net) used 10-fold cross-validation to select lambda.min; GBM used 10-fold internal cross-validation to determine the optimal number of trees; XGBoost used 5-fold resampling to select the final boosting round according to the minimum test log-loss; glmBoost used cvrisk-based internal cross-validation to determine the stopping iteration; and LDA was fitted under the caret cross-validation framework. For algorithms without explicit tuning steps in the present implementation, fixed or package-default settings were used. To reduce information leakage, feature selection, model fitting, and internal tuning were performed using the training cohort only, while the validation cohort(s) were used solely for independent prediction and AUC-based performance assessment. 
The caret package was used for machine-learning workflow management, with glmnet, randomForest, e1071, gbm, xgboost, mboost, plsRglm, and MASS for individual algorithms. SHAP analysis was performed using the shapviz package. The random seed was set to 12345 before each model fitting. Models with fewer than 5 selected features were excluded. Model interpretability and gene-level contribution were further assessed using SHapley Additive exPlanations (SHAP), and the most informative genes were prioritized as candidate model-selected features for downstream biological interpretation.
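The quantity that SHAP attributes to each feature is a Shapley value. The Python sketch below computes exact Shapley values by brute force for a toy model, replacing absent features with a baseline value. It illustrates the concept only: it is not how shapviz or its back ends compute SHAP values in practice, and the model, the feature values, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    predict:  function taking a list of feature values and returning a number
    x:        feature values of the sample to explain
    baseline: reference values substituted for features that are "absent"
    """
    n = len(x)

    def value(present):
        mixed = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(mixed)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {i}) - value(present))
        contributions.append(total)
    return contributions


# Hypothetical linear "model" over three features
def toy_model(z):
    return 2 * z[0] + 3 * z[1] - z[2]


sample = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, sample, reference)
```

The contributions always sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP bar and beeswarm plots rely on.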

Evaluation of the diagnostic performance

Receiver operating characteristic (ROC) curves were generated using the “pROC” R package to assess the diagnostic performance of candidate biomarkers. Expression levels and predictive accuracy of the candidate markers were validated in independent datasets (GSE5370, GSE11971, and GSE39454). Model performance was further assessed using confusion matrices. Differential expression of key module genes was visualized using volcano and box plots, and ROC curves were constructed to evaluate the diagnostic value of individual genes.
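For readers unfamiliar with the metrics, the AUC and confusion matrix can be computed from first principles as in the Python sketch below. It is not the pROC implementation; the 0.5 classification threshold, the function names, and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case scores above a random control.

    labels: 1 for disease, 0 for control. Ties count as one half.
    """
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((c > h) + 0.5 * (c == h) for c in cases for h in controls)
    return wins / (len(cases) * len(controls))


def confusion_matrix(labels, scores, threshold=0.5):
    """Counts of TP, FP, TN, FN when predicting disease for score >= threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            counts["TP"] += 1
        elif predicted == 1 and label == 0:
            counts["FP"] += 1
        elif predicted == 0 and label == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical predicted probabilities for two cases and two controls
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(toy_labels, toy_scores)
matrix = confusion_matrix(toy_labels, toy_scores)
```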

Gene set enrichment analysis

To explore coordinated functional changes associated with the candidate shared transcriptomic signals, Gene Set Enrichment Analysis (GSEA) was performed using clusterProfiler36,37. Gene expression data from dermatomyositis and control samples were ranked according to differential expression. Predefined gene sets corresponding to KEGG pathways (c2.cp.kegg.Hs.symbols.gmt) were used to evaluate whether genes within each pathway exhibited a coordinated trend of up- or down-regulation. Statistical significance was defined as P < 0.05.
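The running-sum statistic at the core of GSEA can be sketched as follows. This Python example computes only the raw enrichment score for one gene set on a pre-ranked list; the normalized enrichment score (NES) and P-values reported by clusterProfiler additionally require a null distribution, which is omitted here. The weighting exponent of 1, the gene names, and the ranking statistics are assumptions for illustration.

```python
def enrichment_score(ranked_genes, ranked_stats, gene_set, exponent=1.0):
    """Weighted Kolmogorov-Smirnov-like running sum over a ranked gene list.

    ranked_genes: genes sorted from most up- to most down-regulated
    ranked_stats: ranking statistic for each gene, in the same order
    gene_set:     set of genes belonging to the pathway
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    hit_total = sum(abs(s) ** exponent for s, hit in zip(ranked_stats, in_set) if hit)

    running, best = 0.0, 0.0
    for stat, hit in zip(ranked_stats, in_set):
        if hit:
            running += abs(stat) ** exponent / hit_total  # step up at a set member
        else:
            running -= 1.0 / n_miss                       # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: two genes up, two genes down
toy_genes = ["a", "b", "c", "d"]
toy_stats = [3.0, 2.0, -1.0, -2.0]
es_top = enrichment_score(toy_genes, toy_stats, {"a", "b"})
es_bottom = enrichment_score(toy_genes, toy_stats, {"c", "d"})
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score, which corresponds to coordinated up- or down-regulation.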

Immune cell infiltration analysis

The normalized, log2-transformed, and batch-corrected dermatomyositis matrix was used for immune deconvolution. The CIBERSORT algorithm was applied to estimate the relative abundance of immune cell subtypes using the LM22 reference matrix38. Samples with deconvolution P < 0.05 were retained for downstream analysis. Differences in inferred immune-cell proportions between groups were visualized using box plots, and Spearman correlation analysis was conducted to assess associations between immune cell subsets and candidate shared genes.
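The Spearman correlation used to relate inferred immune-cell fractions to gene expression is a Pearson correlation computed on ranks. A minimal Python sketch follows; the helper names and toy values are hypothetical, and constant inputs (zero variance) are not handled.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene expression versus an inferred immune-cell fraction
toy_expression = [1.0, 2.0, 3.0, 4.0]
toy_fraction = [0.01, 0.10, 0.11, 0.50]
rho = spearman(toy_expression, toy_fraction)
```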

Single-cell RNA sequencing analysis for cellular contextualization

Single-cell RNA-seq analyses were performed in R using Seurat. Harmony was used for batch correction, DoubletFinder for doublet detection, celda/decontX for ambient RNA estimation, Monocle for pseudotime trajectory analysis, CellChat for cell-cell communication analysis, AUCell for gene-set activity scoring, and GSVA for ssGSEA scoring. Raw count matrices were imported into Seurat objects with the parameters min.cells = 5 and min.features = 300. Quality-control metrics, including mitochondrial, ribosomal, and hemoglobin gene proportions, were calculated for each cell. Cells were retained only if they satisfied all of the following criteria: nFeature_RNA > 500, nCount_RNA < 5,000, percent_mito < 25, percent_ribo > 3, and percent_hb < 1. Genes detected in fewer than 3 cells were excluded. In addition, MALAT1 and mitochondrial genes were removed prior to downstream analysis. After initial filtering, doublets were identified in each sample using DoubletFinder, with PCs = 1:30 and pN = 0.25; expected doublet rates were set according to sample-specific cell numbers (<4,000 cells: 2.5%; 4,000–8,000 cells: 5%; >8,000 cells: 6.5%). Only singlets were retained. Ambient RNA contamination was further estimated using decontX, and cells with contamination scores < 0.2 were retained.
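The cell-level filters and the sample-size-dependent doublet rates stated above can be written out explicitly. The Python sketch below encodes only the thresholds given in the text; it does not reproduce Seurat, DoubletFinder, or decontX, and the dictionary-based cell representation and the example cells are hypothetical.

```python
def passes_qc(cell):
    """Apply the five per-cell quality-control criteria; all must hold."""
    return (
        cell["nFeature_RNA"] > 500
        and cell["nCount_RNA"] < 5000
        and cell["percent_mito"] < 25
        and cell["percent_ribo"] > 3
        and cell["percent_hb"] < 1
    )


def expected_doublet_rate(n_cells):
    """Expected doublet fraction for a sample with n_cells cells."""
    if n_cells < 4000:
        return 0.025
    if n_cells <= 8000:
        return 0.05
    return 0.065


def passes_decontx(contamination_score):
    """Keep cells whose estimated ambient RNA contamination is below 0.2."""
    return contamination_score < 0.2


# Hypothetical cells described by their QC metrics
good_cell = {"nFeature_RNA": 1200, "nCount_RNA": 3500,
             "percent_mito": 5.0, "percent_ribo": 10.0, "percent_hb": 0.1}
high_count_cell = dict(good_cell, nCount_RNA=6000)
```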

The filtered data were normalized using the LogNormalize method with a scale factor of 10,000, followed by identification of variable genes, data scaling, and principal component analysis. Batch effects across samples were corrected using Harmony with orig.ident as the batch variable. The first 15 Harmony dimensions were used for UMAP visualization and neighbor-graph construction. Clustering was performed using FindNeighbors and FindClusters, and the final clustering result was defined at a resolution of 0.05. Cell types were annotated manually according to canonical marker genes together with FindAllMarkers results39.

For downstream functional contextualization, candidate-gene activity was evaluated at the single-cell level, and the relevant immune-cell subset was subjected to trajectory and intercellular communication analyses. Pseudotime analysis was performed using Monocle with DDRTree-based dimensionality reduction followed by cell ordering. Cell-cell communication analysis was conducted using CellChat with the human ligand–receptor database, restricted to the Secreted Signaling category, and communications involving fewer than 10 cells were filtered out.

For each cell, candidate-gene activity was quantified using three complementary approaches: AUCell, ssGSEA, and AddModuleScore. AUCell scores were calculated based on gene-ranking matrices, and ssGSEA scores were generated using the GSVA framework. AddModuleScore was computed using the Seurat built-in function. The resulting AUCell, ssGSEA, and AddModuleScore values were then combined into a single score matrix. Each score type was first standardized by Z-score transformation and subsequently rescaled to a 0–1 range using min–max normalization. The final composite score (“Scoring”) for each cell was defined as the sum of the three normalized scores:

Scoring = normalized AUCell + normalized ssGSEA + normalized AddModuleScore.

For downstream subgroup analyses, the CD8⁺ T-cell subset was extracted, and cells were dichotomized according to the median Scoring value within this subset. Cells with Scoring values greater than the median were assigned to the High_Hub_genes group, whereas the remaining cells were assigned to the Low_Hub_genes group.
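The composite score and the median split can be sketched in Python as follows. The per-cell AUCell, ssGSEA, and AddModuleScore values are hypothetical placeholders, and the use of the sample standard deviation in the Z-score is an assumption; the grouping labels follow the text.

```python
from statistics import mean, median, stdev


def zscore(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def minmax(values):
    """Rescale values linearly to the 0-1 range."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def composite_scoring(aucell, ssgsea, module_score):
    """Scoring = normalized AUCell + normalized ssGSEA + normalized AddModuleScore."""
    parts = [minmax(zscore(scores)) for scores in (aucell, ssgsea, module_score)]
    return [sum(cell_scores) for cell_scores in zip(*parts)]


def split_by_median(scoring):
    """Label cells above the median as High_Hub_genes, the rest as Low_Hub_genes."""
    cutoff = median(scoring)
    return ["High_Hub_genes" if s > cutoff else "Low_Hub_genes" for s in scoring]


# Hypothetical scores for four CD8+ T cells
toy_aucell = [0.1, 0.2, 0.3, 0.4]
toy_ssgsea = [1.0, 2.0, 3.0, 4.0]
toy_module = [-1.0, 0.0, 1.0, 2.0]
scoring = composite_scoring(toy_aucell, toy_ssgsea, toy_module)
groups = split_by_median(scoring)
```

Because each normalized component lies between 0 and 1, the composite Scoring value ranges from 0 to 3.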

Results

Identification of candidate shared genes between major depressive disorder and dermatomyositis

Following data merging, normalization, and batch correction of dermatomyositis-related GEO datasets (Figure 2A,B), a total of 570 differentially expressed genes were identified (Figure 2C,D), comprising 517 up-regulated and 53 down-regulated genes. This dermatomyositis differential expression analysis was used to characterize disease-related transcriptional changes.

In parallel, WGCNA was applied separately to the major depressive disorder and dermatomyositis datasets to identify disease-associated co-expression modules. The optimal soft thresholding power was 4 for the major depressive disorder dataset and 9 for the dermatomyositis dataset (Figure 3A,B). Five modules were detected in each dataset. Module–trait correlation analysis showed that the grey module was most strongly associated with major depressive disorder (Cor = −0.28, p = 1 × 10⁻⁴), while the brown (Cor = −0.73, p = 5 × 10⁻¹⁸) and red (Cor = −0.89, p = 1 × 10⁻³⁵) modules were most strongly associated with dermatomyositis (Figure 3C,D). The overlap between the selected key module genes of the two datasets was defined as the candidate shared gene set, yielding 33 shared genes (Figure 3E). Notably, the module–trait association in major depressive disorder was markedly weaker than that observed in dermatomyositis. Specifically, the major depressive disorder-associated grey module showed only a modest correlation with disease status, whereas the dermatomyositis-associated brown and red modules exhibited substantially stronger correlations. Therefore, the overlapping genes identified from these modules should be interpreted as candidate shared transcriptomic signals.

Gene expression analysis; data visualizations include heatmaps, bar charts, and volcano plot.
Figure 2: Data processing and cleaning for dermatomyositis. (A) Box plot before batch effect removal and data merging. (B) Box plot after batch effect removal and data merging. (C) Clustered heatmap after data cleaning. (D) Volcano plot after data cleaning.

Gene co-expression network analysis; depression and dermatomyositis; scale-free topology, module-trait heatmap, Venn diagram.
Figure 3: Results of Weighted Gene Co-expression Network Analysis (WGCNA). (A) Determination of Soft Threshold power for major depressive disorder. (B) Determination of Soft Threshold power for dermatomyositis. (C) Module correlation heatmap of major depressive disorder. (D) Module correlation heatmap of dermatomyositis. (E) Venn diagram showing the overlap between genes from the key WGCNA module(s) of dermatomyositis and the key WGCNA module of major depressive disorder, yielding 33 shared genes.

Functional enrichment and GeneMANIA-based network analysis of shared genes

GO and KEGG enrichment analyses were performed on the 33 shared genes to characterize their biological functions and pathway associations. A total of 297 significantly enriched GO terms and 7 KEGG pathways were identified. GO analysis indicated that these genes were mainly enriched in immune defense- and cytotoxicity-related biological processes. They were localized predominantly to granules and vesicles that store immune effector molecules, and were enriched for molecular functions such as serine hydrolase activity and calcium ion transport (Figure 4A,B). KEGG pathway analysis highlighted significant enrichment in the PPAR signaling pathway, antigen processing and presentation, and the IL-17 signaling pathway (Figure 4C). To further explore the functional interaction context of these shared genes, the 33 observed shared genes were used as seed/query genes in GeneMANIA. GeneMANIA then automatically added functionally related partner genes to generate an expanded network containing 49 total nodes. Therefore, the 49-node network shown in Figure 4D does not represent 49 directly observed cross-disease shared genes, but rather an expanded functional association network composed of the 33 shared genes plus GeneMANIA-added related partners. Within this expanded network, ELANE, PPBP, and CTSG emerged as highly connected candidate nodes (Figure 4D).

Biological data analysis with bar chart, circular plot, bubble chart, and network diagram for gene function.
Figure 4: Functional enrichment and GeneMANIA-based functional association network analysis of the shared genes. (A) Bar plot of Gene Ontology (GO) enrichment results. (B) Bubble plot of GO enrichment results. (C) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment results. (D) Functional association network generated using GeneMANIA from the 33 shared genes and exported to a network visualization platform for visualization and topological analysis.

Machine-learning-based feature prioritization and model interpretation

Machine-learning analysis was performed on the 33 shared genes. Among the 113 candidate model combinations evaluated in the present benchmarking framework, the RF+Enet model showed the highest relative performance and was therefore selected for subsequent feature prioritization and model interpretation (Figure 5A). This model retained 8 machine-learning-selected genes (including KIF4A and OLR1, among others). ROC curves and confusion matrices were used to summarize classification performance in the training and validation cohorts (Figure 5B,C). Detailed information on all evaluated model genes is provided in Supplementary File S1.

Machine learning model comparison; confusion matrices; ROC curves; algorithm performance; AUC values.
Figure 5: Machine-learning-based classification performance of candidate model combinations and the selected model. (A) Heatmap summarizing the area under the curve (AUC) values of 113 candidate machine-learning model combinations across the training cohort and independent validation cohorts, together with the mean AUC used for model comparison. (B) Confusion matrices of the selected Random Forest plus Elastic Net [RF+Enet(alpha = 0.7)] model in the training cohort and independent validation cohorts. (C) Receiver operating characteristic (ROC) curves of the selected RF+Enet(alpha = 0.7) model in the training cohort and independent validation cohorts.

These 8 genes were subsequently carried forward as candidate model-selected genes for downstream interpretation (Figure 6A). To improve transparency of the selected classifier, SHAP (SHapley Additive exPlanations) analysis was used as a post hoc interpretability approach to quantify the relative contribution of retained features to model predictions. Three supervised machine-learning algorithms, including Random Forest (RF), XGBoost (XGB), and Gradient Boosting Machine (GBM), were compared for interpretability analysis (Figure 6B). Among them, RF showed relatively higher AUC, consistent with the benchmarking results, and was therefore used for SHAP visualization. The SHAP bar plot (Figure 6C) indicated that KIR2DL4 had the largest mean absolute SHAP value within the RF model, whereas AZU1 and SCG5 also showed relatively higher contributions, and KIF4A showed a comparatively smaller contribution. The beeswarm and scatter plots (Figure 6D,E) further illustrated how feature values were associated with the direction and magnitude of the model output at the sample level. These analyses were used to explain model behavior and support feature prioritization within the classifier.

Figure 6: Receiver operating characteristic (ROC) and SHapley Additive exPlanations (SHAP)-based model interpretability analysis. (A) ROC curves of the candidate model-selected genes. (B) Comparison of three supervised machine-learning algorithms, including Random Forest (RF), XGBoost (XGB), and gradient boosting machine (GBM), used for model interpretability analysis. (C) SHAP bar plot and summary plot showing the relative contribution of each feature to the RF model. (D) SHAP beeswarm plot showing the distribution of feature contributions across samples. (E) Scatter plots showing the relationship between gene expression levels and SHAP values.

Taken together, these analyses prioritized several candidate model-selected genes within the present dermatomyositis-related classification framework. Because the machine-learning model was trained and evaluated only in dermatomyositis-versus-control cohorts, these findings should not be interpreted as direct validation of biomarkers for comorbidity between major depressive disorder and dermatomyositis, but rather as supportive evidence for feature prioritization in a dermatomyositis-centered classification setting.

GSEA enrichment and immune infiltration

Among the positively enriched pathways (enriched in the high-expression group), immune- and inflammation-related pathways were consistently prominent across multiple genes. The cytokine–cytokine receptor interaction pathway was significantly enriched for KIR2DL4 (NES = 1.41, adjusted P = 1.93 × 10⁻⁹), KIF4A (NES = 1.29, adjusted P = 3.12 × 10⁻⁵), OLR1 (NES = 1.26, adjusted P = 1.34 × 10⁻⁵), and SCG5 (NES = 1.16, adjusted P = 5.09 × 10⁻³). The natural killer cell-mediated cytotoxicity pathway was enriched for KIR2DL4 (NES = 1.42, adjusted P = 1.04 × 10⁻⁵), LRRC37BP1 (NES = 1.22, adjusted P = 7.37 × 10⁻⁴), and KIF4A (NES = 1.24, adjusted P = 1.55 × 10⁻²). The antigen processing and presentation pathway was enriched for KIR2DL4 (NES = 1.48, adjusted P = 5.42 × 10⁻⁵) and KIF4A (NES = 1.30, adjusted P = 9.99 × 10⁻³). The JAK-STAT signaling pathway was enriched for KIR2DL4 (NES = 1.34, adjusted P = 2.39 × 10⁻⁴), OLR1 (NES = 1.22, adjusted P = 3.83 × 10⁻³), and KIF4A (NES = 1.25, adjusted P = 8.59 × 10⁻³). The Toll-like receptor signaling pathway was enriched for KIR2DL4 (NES = 1.35, adjusted P = 1.45 × 10⁻³), OLR1 (NES = 1.22, adjusted P = 1.50 × 10⁻²), and KIF4A (NES = 1.24, adjusted P = 3.00 × 10⁻²). The chemokine signaling pathway was enriched for KIR2DL4 (NES = 1.35, adjusted P = 3.17 × 10⁻⁵) and KIF4A (NES = 1.19, adjusted P = 1.50 × 10⁻²). The complement and coagulation cascades pathway was enriched for KIR2DL4 (NES = 1.40, adjusted P = 1.39 × 10⁻³) and OLR1 (NES = 1.24, adjusted P = 3.10 × 10⁻²). The NOD-like receptor signaling pathway was enriched for KIR2DL4 (NES = 1.40, adjusted P = 4.23 × 10⁻³) and KIF4A (NES = 1.35, adjusted P = 6.27 × 10⁻³).

In contrast, KRT23 showed predominantly negative enrichment for immune-related pathways, including the Toll-like receptor signaling pathway (NES = -2.12, adjusted P = 7.31 × 10⁻⁸), RIG-I-like receptor signaling pathway (NES = -2.03, adjusted P = 2.81 × 10⁻⁶), and antigen processing and presentation (NES = -2.02, adjusted P = 2.13 × 10⁻⁶), indicating an inverse association with immune activation.

Among the negatively enriched metabolic pathways, valine, leucine, and isoleucine degradation was enriched for KIR2DL4 (NES = -2.79, adjusted P = 6.33 × 10⁻⁹) and SCG5 (NES = -2.35, adjusted P = 3.37 × 10⁻⁶), and fatty acid metabolism was enriched for KIR2DL4 (NES = -2.42, adjusted P = 6.89 × 10⁻⁶) and SCG5 (NES = -2.12, adjusted P = 4.17 × 10⁻⁵).

Overall, these GSEA results indicated that the majority of the candidate model-selected genes were associated with coordinated upregulation of immune and inflammatory signaling pathways and concurrent downregulation of metabolic pathways in dermatomyositis, consistent with the functional enrichment and immune infiltration findings described above (Figure 7).
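
For readers less familiar with how an enrichment score arises, the following Python sketch implements the classic weighted running-sum statistic that underlies GSEA. It is a simplified illustration, not the clusterProfiler implementation used in the study. The gene names, ranking scores, and function name are hypothetical. The NES reported above additionally divides the observed score by the mean score from permuted gene sets, a step omitted here.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-like running-sum enrichment score.

    ranked_genes and scores must already be sorted by decreasing score.
    Hits step up in proportion to |score|**p; misses step down uniformly.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must overlap the ranking without covering all of it")
    running, extreme = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += abs(s) ** p / hit_total if h else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


# Hypothetical ranking: six genes ordered from most up- to most down-regulated.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
stats = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(genes, stats, {"g1", "g2"})
es_bottom = enrichment_score(genes, stats, {"g5", "g6"})
```

A gene set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score, which matches the sign convention of the NES values reported above.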

Figure 7: Results of GSEA for key genes.

To further examine the immune context of the identified candidate shared genes, a CIBERSORT-based immune deconvolution analysis was performed. The inferred immune-cell composition of each sample is shown in Figure 8A. Compared with the control group, significant differences were observed in the relative proportions of regulatory T cells (Tregs), resting mast cells, resting dendritic cells, M1 macrophages, and M2 macrophages (Figure 8B). Correlation analysis among immune-cell subsets showed a negative correlation between memory B cells and naive B cells (r = -0.64), whereas memory B cells and plasma cells showed a positive correlation (r = 0.37) (Figure 8C). In addition, LRRC37E, SCG5, AZU1, and KRT23 were positively correlated with neutrophils, eosinophils, and resting mast cells, whereas KIF4A and OLR1 were negatively correlated with naive B cells, memory B cells, and plasma cells (Figure 8D). These findings indicate associations between selected genes and inferred immune-cell distribution patterns in the current deconvolution analysis.
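
The gene-to-cell-type associations described above reduce to pairwise correlations between per-sample expression values and per-sample estimated cell fractions. The sketch below assumes a Pearson coefficient, which the text does not specify, and uses hypothetical gene names, cell-type names, and numbers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def gene_cell_correlations(expression, fractions):
    """Correlate every gene's expression with every cell type's fraction."""
    return {
        (gene, cell): pearson_r(values, frac)
        for gene, values in expression.items()
        for cell, frac in fractions.items()
    }


# Hypothetical per-sample values for four samples (not study data).
expression = {"gene_a": [1.0, 2.0, 3.0, 4.0]}
fractions = {
    "cell_type_x": [0.10, 0.20, 0.30, 0.40],
    "cell_type_y": [0.40, 0.30, 0.20, 0.10],
}
table = gene_cell_correlations(expression, fractions)
```

Because deconvolution outputs are proportions that sum to one within each sample, some negative correlations between cell types arise by construction. This is one reason such coefficients are best read as descriptive associations.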

Figure 8: Analysis of immune cell infiltration for key genes. (A) Overlapping histogram showing the proportion of immune cells in each sample. (B) Bar chart comparing 22 immune cells between the experimental group and control group. (C) Heatmap showing the correlation between candidate shared genes of the model and infiltration of 22 immune cells. (D) Positive and negative regulatory relationships between genes and immune cells. *p < 0.05; **p < 0.01; ****p < 0.0001; ns, not significant.

Results of Single-Cell Analysis

To further localize the candidate shared genes at the cellular level, a dermatomyositis-related single-cell RNA-seq dataset (GSE190510) was analyzed as a cellular contextualization step rather than a direct validation dataset for comorbidity. UMAP clustering identified seven distinct cell clusters (Figure 9A), which were subsequently annotated into major immune-cell subsets (Figure 9B). A heatmap of cluster-associated marker genes further illustrated cell-type-related expression patterns and hierarchical relationships across clusters (Figure 9C). The expression distribution of those candidate genes detectable in the single-cell dataset (KIF4A, KIR2DL4, KRT23, AZU1, and SCG5) across annotated cell populations is shown in Figure 9D; the remaining candidates (OLR1, KIR3DS1, and LRRC37E) did not show detectable expression in this dataset. The AUCell, ssGSEA, AddModuleScore, and integrated composite scoring results across major cell subsets are presented in Figure 9E. Because the bulk-level analyses highlighted immune defense- and cytotoxicity-related programs, and because KIR2DL4 and AZU1 showed detectable expression within the CD8⁺ T-cell compartment, this subset was selected for downstream exploratory analyses (Figure 9F). Pseudotime analysis showed the distribution of high-score and low-score CD8⁺ T-cell states along the inferred differentiation trajectory, together with the overall pseudotime progression pattern (Figure 9G). Cell-cell communication analysis revealed that both groups maintained extensive interactions with monocytes/macrophages, platelets, and other immune-cell subsets (Figure 9H). Outgoing and incoming signaling-pattern analysis showed distinct communication preferences between the two composite-score-defined groups (Figure 9I). Pathway-level communication analysis further suggested differential signaling activities involving pathways such as ANNEXIN, IL16, MIF, and ncWNT among the indicated sender-receiver cell groups (Figure 9J). At the ligand-receptor level, interactions such as LGALS9-CD44, MIF-(CD74+CXCR4), and MIF-(CD74+CD44) were identified among CD8⁺ T-cell score groups and other immune-cell subsets (Figure 9K).
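
One plausible way to combine several per-cell gene-set scores into a single composite and then define high-score and low-score groups is sketched below. The integration rule (averaging z-scores across methods) and the median cutoff are assumptions made for illustration, since the text does not specify how the composite score or the group threshold was defined. All values and function names are hypothetical.

```python
from statistics import mean, median, pstdev


def zscores(values):
    """Standardize one method's scores so that methods share a common scale."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def composite_scores(method_scores):
    """Average the standardized scores of all methods for each cell."""
    standardized = [zscores(values) for values in method_scores.values()]
    return [mean(per_cell) for per_cell in zip(*standardized)]


def split_by_median(scores):
    """Label each cell 'high' or 'low' relative to the median composite score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical scores for four cells from three scoring methods.
toy_scores = {
    "AUCell": [0.1, 0.2, 0.3, 0.4],
    "ssGSEA": [1.0, 2.0, 3.0, 4.0],
    "AddModuleScore": [-2.0, -1.0, 0.0, 1.0],
}
composite = composite_scores(toy_scores)
groups = split_by_median(composite)
```

Standardizing before averaging prevents a method with a wider numeric range from dominating the composite. Cells that tie with the median fall into the low group under this rule.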

These findings should be interpreted as hypothesis-generating cellular context within dermatomyositis. They do not constitute direct evidence for mechanisms of comorbidity between major depressive disorder and dermatomyositis, but rather indicate potential immune-cell states and signaling interactions in which candidate shared genes may participate in a dermatomyositis-related immune background.

Figure 9: Single-cell RNA-seq analysis and downstream functional contextualization of candidate shared genes. (A) Uniform manifold approximation and projection (UMAP) plot showing unsupervised clustering of cells. (B) UMAP plot showing annotated immune-cell subsets. (C) Heatmap of cluster marker genes across annotated cell subsets. (D) UMAP feature plots showing the expression distribution of representative composite-score across clusters. (E) Dot plot showing AUCell, ssGSEA, AddModuleScore, and integrated composite scoring across annotated cell subsets. (F) Heatmap showing the expression patterns of KIR2DL4 and AZU1 in the CD8⁺ T-cell subset. (G) Pseudotime trajectory plots showing candidate-gene score grouping and pseudotime progression in the CD8⁺ T-cell subset. (H) Circle plot showing inferred intercellular communication among candidate-gene score groups and immune-cell subsets. (I) Heatmaps showing outgoing and incoming signaling patterns. (J) Bubble plot showing selected signaling pathways among the indicated sender–receiver cell groups. (K) Bubble plot showing selected ligand–receptor interactions among the indicated cell groups.

Data Availability

All datasets analyzed in this study were obtained from the Gene Expression Omnibus (GEO) database under accession numbers GSE98793, GSE5370, GSE1551, GSE46239, GSE11971, GSE39454, GSE128470, and GSE190510. The bulk transcriptomic datasets used for major depressive disorder and dermatomyositis analyses, as well as the dermatomyositis-related single-cell dataset, are publicly available through GEO. The analysis code, including the machine-learning scripts used in this study, has been made available through a public code repository [https://github.com/tengfeitcm/machine-learning.git]. Selected processed results generated during the present study are provided as Supplementary File 1.

Supplementary File 1: Details of all 113 evaluated machine-learning model combinations, including first- and second-stage algorithms, selected genes, AUCs in the training and independent validation cohorts, and mean AUCs for model comparison and ranking.

Discussion

Dermatomyositis is a chronic systemic autoimmune disease with prominent skin and muscle involvement, and accumulating clinical observations suggest that patients with dermatomyositis may also experience substantial psychiatric burden, including symptoms consistent with major depressive disorder. In this context, the present study applied an integrative bioinformatic reanalysis framework to identify candidate shared transcriptomic signals between major depressive disorder and dermatomyositis, with additional immune-infiltration analysis and single-cell contextualization. WGCNA, functional enrichment analysis, GeneMANIA-based network analysis, and machine-learning-based feature prioritization were integrated to screen candidate cross-disease genes and pathways. Within this framework, eight model-selected genes, namely KIF4A, OLR1, KIR2DL4, KRT23, KIR3DS1, AZU1, SCG5, and LRRC37E, were highlighted for downstream interpretation. Immune-infiltration analysis was used to examine the immune-related context in which these genes were observed, and single-cell analysis in a dermatomyositis-related dataset provided additional cellular contextual information, particularly for KIR2DL4 and AZU1 in CD8⁺ T cells.

It should be noted that the cross-disease signal identified in this study was not symmetric between the two conditions. The module associated with major depressive disorder showed only a modest correlation with disease status, whereas the key dermatomyositis-related modules exhibited substantially stronger correlations. Therefore, the overlapping genes identified here should be regarded as candidate shared transcriptomic signals or candidate cross-disease associations, rather than definitive evidence of a common pathogenic mechanism. These findings are more appropriately interpreted as hypothesis-generating and require further validation in independent and ideally true comorbid cohorts.

Relative to controls, inferred differences in immune-cell infiltration were observed, particularly involving regulatory T cells (Tregs), resting mast cells, resting dendritic cells, and both M1 and M2 macrophage populations. These findings suggest that immune perturbation may represent one of the shared association patterns captured in the present analysis. Previous studies have reported that altered Treg homeostasis, together with increased helper and cytotoxic immune activation, is associated with enhanced inflammatory responses and neuroimmune disturbance40. Beyond these immune subsets, the inverse association between naive and memory B cells, together with the positive correlation between memory B cells and plasma cells, may reflect changes in humoral immune status. B-cell dysregulation is also a recognized feature of dermatomyositis, as reflected by increased B-cell numbers and elevated BAFF levels41. In parallel, psychiatric studies have suggested that altered B-cell-related immune responses may also be relevant to major depressive disorder42,43. Taken together, these observations support an immune-related associative framework, although they do not establish direct causal links between the identified genes and immune-cell changes.

Among the eight prioritized genes, the degree of biological interpretability is not uniform. OLR1 and AZU1 have relatively clearer links to innate inflammatory activity, whereas several other genes should be interpreted more cautiously in the present cross-disease setting44,45,46. KIF4A is classically known as a mitotic kinesin involved in chromosome segregation; however, previous work in idiopathic inflammatory myopathies has also associated KIF4 with activated peripheral blood lymphocytes and immune-cell activation47. In the present study, the KIF4A signal may therefore reflect activation or proliferative states of circulating immune cells rather than a direct pathogenic effector in muscle or mood pathology. KIR2DL4 and KIR3DS1 encode killer cell immunoglobulin-like receptors involved in natural killer cell-related cytokine and activation signaling48,49,50. Given that peripheral immune-cell alterations have been reported in major depressive disorder, the prioritization of these receptors in our analysis may reflect immune-state variation in cytotoxic lymphocyte- or natural killer-cell compartments under an inflammatory background, rather than a proven neurobiological mechanism linking major depressive disorder and dermatomyositis51. By contrast, LRRC37E remains poorly characterized, and similar uncertainty applies to KRT23 and SCG5, whose roles in this cross-disease immune-transcriptomic context remain to be clarified.

From a methodological perspective, the present workflow may be particularly useful as a stepwise prioritization strategy for cross-disease transcriptomic studies based on public datasets. Among the analytical stages, careful preprocessing of multi-cohort data, stable candidate-gene prioritization, and independent validation design are especially important for ensuring robust output quality. In particular, probe annotation, normalization, and batch adjustment provide the basis for reliable downstream co-expression and machine-learning analyses, while WGCNA-based module selection helps reduce the search space at the network level. In addition, restricting feature selection and model tuning to the training cohort, followed by evaluation in independent datasets, improves the interpretability and generalizability of the resulting candidate genes. Compared with simpler approaches such as direct DEG overlap or single-model screening, this framework offers a more integrated strategy by combining co-expression structure, functional context, model-based feature reduction, and single-cell contextualization. Therefore, its main methodological value lies in providing a transferable and structured approach for candidate prioritization when direct comorbid cohorts are not available. Within this framework, the single-cell analysis should be understood as a contextual extension of the bulk transcriptomic workflow, providing additional cell-type-level support for the prioritized candidate signals.
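
The principle of restricting feature selection and tuning to the training cohort can be sketched as follows. Every parameter (scaling constants, selected features, class centroids) is learned from the training data and then applied unchanged to an independent cohort. A nearest-centroid classifier and a mean-difference filter are used here purely for brevity. They are stand-ins chosen for this sketch, not the RF+Enet procedure of the study, and all data and names are hypothetical.

```python
from statistics import mean, pstdev


def fit_pipeline(X_train, y_train, k):
    """Learn scaling, the top-k features, and class centroids from training data only."""
    n_features = len(X_train[0])
    columns = [[row[j] for row in X_train] for j in range(n_features)]
    mu = [mean(col) for col in columns]
    sd = [pstdev(col) or 1.0 for col in columns]  # guard against constant features
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(n_features)] for row in X_train]

    def class_mean(label, j):
        return mean(z[j] for z, y in zip(Z, y_train) if y == label)

    # Rank features by the absolute gap between standardized class means.
    gap = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(n_features)]
    selected = sorted(range(n_features), key=lambda j: gap[j], reverse=True)[:k]
    centroids = {label: [class_mean(label, j) for j in selected] for label in (0, 1)}
    return {"mu": mu, "sd": sd, "selected": selected, "centroids": centroids}


def predict(model, X):
    """Classify new samples using only parameters stored in the fitted model."""
    labels = []
    for row in X:
        z = [(row[j] - model["mu"][j]) / model["sd"][j] for j in model["selected"]]
        dist = {
            label: sum((a - b) ** 2 for a, b in zip(z, centroid))
            for label, centroid in model["centroids"].items()
        }
        labels.append(min(dist, key=dist.get))
    return labels


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training and independent validation cohorts (three features each).
X_train = [[1.0, 5.0, 0.2], [1.2, 5.1, 0.9], [3.0, 5.0, 0.1], [3.3, 4.9, 0.8]]
y_train = [0, 0, 1, 1]
X_val, y_val = [[0.9, 5.0, 0.5], [3.1, 5.2, 0.4]], [0, 1]

model = fit_pipeline(X_train, y_train, k=1)
val_accuracy = accuracy(y_val, predict(model, X_val))
```

Because `predict` never recomputes means, standard deviations, or feature rankings from the validation samples, the validation accuracy is not inflated by information leaking from the cohort being evaluated.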

The single-cell analysis adds cellular resolution to the candidate shared genes, but its results should be interpreted cautiously. Because the single-cell dataset used in this study was derived from a dermatomyositis-related context only, the observation that KIR2DL4 and AZU1 were detectable in CD8⁺ T cells, together with the pseudotime and cell-cell communication patterns, should be interpreted as indicating potential immune-cell contexts in which candidate shared genes may operate within dermatomyositis. CD8⁺ T cells were examined in greater detail because KIR2DL4 and AZU1 showed detectable expression within this compartment, and because the bulk-level analyses were enriched in immune defense- and cytotoxicity-related programs; this focus should therefore be interpreted as a targeted cellular contextualization strategy rather than evidence that the bulk cross-disease signal is uniquely or exhaustively explained by CD8⁺ T cells. Future studies should evaluate these candidate signals in clinically stratified and, ideally, true comorbid cohorts, with deeper cell-type-specific differential-expression and pathway-activity analyses to determine whether the observed CD8⁺ T-cell context is reproducible and biologically specific.

Several limitations should be acknowledged. First, currently available public datasets do not provide a true cohort of individuals simultaneously diagnosed with major depressive disorder and dermatomyositis; the present study therefore identifies candidate shared signals across separate disease datasets rather than directly analyzing a clinically confirmed comorbid cohort, and the single-cell component is accordingly restricted to a dermatomyositis-related context. Second, all analyses were based exclusively on public secondary datasets, and cross-platform heterogeneity across GEO series may introduce residual bias despite normalization and batch-correction procedures. Third, both major depressive disorder and dermatomyositis are clinically and molecularly heterogeneous conditions, so the identified candidates may preferentially reflect signals from specific inflammatory subgroups rather than a universal cross-disease signature; future work should therefore evaluate these candidates in clinically and molecularly stratified cohorts52.

From a methodological perspective, the WGCNA-based gene selection relied on the overlap of disease-associated modules, and the major depressive disorder-related module showed only a modest correlation with disease status compared with the much stronger dermatomyositis-related modules. Module preservation analysis, sensitivity analysis, and alternative module-prioritization strategies were not performed; the final 33-gene set should therefore be interpreted as a candidate signal set under the current analytic framework rather than a definitively robust shared signature. Genes with limited functional annotation, particularly LRRC37E, warrant additional caution, because model-based prioritization does not by itself establish biological relevance. Moreover, the study was observational and did not incorporate a formal causal-inference framework, and the immune-infiltration analysis was based on CIBERSORT-derived computational estimates rather than direct cellular measurements; accordingly, the identified genes, immune-cell associations, and signaling patterns should be interpreted as correlative rather than causal, and future intervention-oriented studies may benefit from more rigorous designs such as target trial emulation when appropriately structured longitudinal clinical data become available53.

Overall, the present study provides a hypothesis-generating overview of candidate shared transcriptomic signals and immune-related contextual patterns between major depressive disorder and dermatomyositis. These findings may help prioritize genes, pathways, and cell subsets for follow-up investigation, but independent datasets, orthogonal experimental validation (e.g., quantitative PCR or protein-level confirmation), and ideally true comorbid cohorts will be needed to determine their biological and clinical relevance.

Disclosures

The authors report no conflicts of interest in this work.

Acknowledgements

The authors gratefully acknowledge financial support from the Beijing Municipal Health Commission's Excellence Clinical Research Program (Grant number: BRWEP2024072120118), the “Cultivation Program” of the Beijing Municipal Hospital Management Center (Grant number: PZ2025030), and the Youth Project of China-Japan Friendship Hospital (Grant number: 2020-1-QN-8).

Materials

List of materials used in this article
| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| AddModuleScore | Seurat function | version 4.4.0 | Module-score calculation within Seurat; RRID: NA |
| AUCell | Bioconductor | version 1.32.0 | Single-cell gene-set activity scoring; RRID: SCR_021327 |
| caret | CRAN | version 7.0.1 | Machine-learning workflow support; RRID: SCR_022524 |
| celda / decontX | Bioconductor | version 1.24.0 | Ambient RNA contamination estimation; RRID: NA |
| CellChat | GitHub / CellChat | version 2.2.0 | Cell-cell communication analysis; RRID: SCR_021946 |
| CIBERSORT / LM22 signature matrix | CIBERSORT | LM22 | Immune-cell infiltration estimation; RRID: NA |
| clusterProfiler | Bioconductor | version 4.12.6 | Functional enrichment analysis; RRID: SCR_016884 |
| Cytoscape | Cytoscape Consortium | version 3.10 | A network visualization platform for visualization and analysis; RRID: SCR_003032 |
| DoubletFinder | GitHub / McGinnis Lab | version 2.0.4 | Doublet detection in single-cell datasets; RRID: NA |
| e1071 | CRAN | version 1.7.16 | Support Vector Machine and Naive Bayes modeling; RRID: NA |
| gbm | CRAN | version 2.2.2 | Gradient Boosting Machine modeling; RRID: NA |
| Gene Expression Omnibus (GEO) database | National Center for Biotechnology Information (NCBI) | GSE98793 | Major depressive disorder bulk transcriptome dataset; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE1551 | Dermatomyositis training dataset; skeletal muscle biopsy samples; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE46239 | Dermatomyositis training dataset; skin biopsy samples; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE128470 | Dermatomyositis training dataset; dermatomyositis samples extracted from inflammatory myopathy cohort; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE5370 | Independent dermatomyositis validation dataset; untreated adult muscle samples; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE11971 | Independent dermatomyositis validation dataset; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE39454 | Independent dermatomyositis validation dataset; dermatomyositis samples extracted from inflammatory myopathy cohort; RRID: NA |
| Gene Expression Omnibus (GEO) database | NCBI | GSE190510 | Dermatomyositis-related single-cell RNA-seq dataset; RRID: NA |
| GeneMANIA | University of Toronto / GeneMANIA | web server version accessed in this study | Functional association network construction; RRID: SCR_005709 |
| glmnet | CRAN | version 4.1.8 | LASSO, Ridge, and Elastic Net modeling; RRID: NA |
| GSVA | Bioconductor | version 2.0.7 | ssGSEA scoring; RRID: NA |
| Harmony | CRAN | version 1.2.4 | Batch correction for single-cell data integration; RRID: NA |
| limma | Bioconductor | version 3.60.6 | Differential expression analysis, probe summarization, and normalization utilities; RRID: SCR_010943 |
| MASS | CRAN | version 7.3.61 | Linear discriminant analysis; RRID: NA |
| mboost | CRAN | version 2.9.11 | glmBoost modeling; RRID: NA |
| Monocle | Bioconductor | version 2.38.0 | Pseudotime trajectory analysis; RRID: SCR_016339 |
| org.Hs.eg.db | Bioconductor | version 3.19.1 | Human gene annotation database; RRID: NA |
| plsRglm | CRAN | version 1.5.1 | Partial least squares generalized linear modeling; RRID: NA |
| pROC | CRAN | version 1.18.5 | ROC curve analysis; RRID: SCR_024286 |
| R statistical software | R Foundation for Statistical Computing | version 4.4.2 | Main statistical computing environment; RRID: SCR_001905 |
| randomForest | CRAN | version 4.7.1.2 | Random Forest modeling; RRID: SCR_015718 |
| RStudio | Posit Software, PBC | version 2024.4.1.748 | Integrated development environment for R; RRID: SCR_000432 |
| Seurat | CRAN / Satija Lab | version 4.4.0 | Single-cell RNA-seq preprocessing, clustering, and visualization; RRID: SCR_016341 |
| shapviz | CRAN | version 0.10.2 | SHAP-based model interpretability analysis; RRID: NA |
| sva | Bioconductor | version 3.52.0 | Batch-effect correction using ComBat; RRID: NA |
| WGCNA | CRAN | version 1.73 | Weighted gene co-expression network analysis; RRID: SCR_003302 |
| xgboost | CRAN | version 1.7.8.1 | Extreme gradient boosting; RRID: NA |

Tags

Medicine, Depression, Dermatomyositis, Machine Learning, Single Cell Analysis, Bioinformatics Analysis