Research Article

Assessment of Malnutrition in Crohn's Disease Patients: A Novel Risk Prediction Model with Dynamic Optimization Potential and Effectiveness Validation

DOI: 10.3791/70247

June 22nd, 2026

Summary


This study constructed and validated a novel malnutrition risk prediction model for patients with Crohn’s disease using meta-analysis, multivariable logistic regression, and machine learning. The model demonstrated good predictive performance and clinical utility for guiding personalized nutritional interventions.

Abstract


This study aimed to construct and validate a malnutrition risk prediction model combining multivariable logistic regression and machine learning for patients with Crohn’s disease (CD), with the goal of improving the precision of malnutrition risk identification through integration of inflammatory markers and disease characteristics. PubMed, Web of Science, Cochrane Library, Embase, and China National Knowledge Infrastructure (CNKI) were systematically searched to identify risk factors associated with malnutrition in patients with CD. High-quality studies using the Global Leadership Initiative on Malnutrition (GLIM) 2019 criteria, European Society for Clinical Nutrition and Metabolism (ESPEN) 2015 criteria, or Malnutrition Universal Screening Tool (MUST) criteria were included in the meta-analysis, while the study cohort applied the ESPEN 2015 criteria exclusively to ensure consistent outcome definition. The prediction model was developed and validated using data from 800 patients with CD in the Inflammatory Bowel Disease Cohort Database (IBDCD): 520 patients formed the training cohort, and validation used bootstrap resampling together with an independent, non-overlapping hold-out subset of 280 patients from the same database. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), Hosmer–Lemeshow test, and Brier score. No significant baseline differences were observed between the training and validation cohorts. The model achieved an AUC of 0.987 in the training cohort and 0.967 in the validation cohort, with good discrimination and calibration. Decision curve analysis further demonstrated clinically meaningful net benefits across relevant threshold probabilities. This model effectively identifies malnutrition risk in patients with CD and may support personalized nutritional intervention, optimize clinical decision-making, and improve patient outcomes and quality of life. Future multicenter studies are required to further validate the model's generalizability and to evaluate the integration of socioeconomic factors for further optimization.

Introduction


Crohn’s disease (CD) is a chronic, progressive inflammatory bowel disease (IBD) with a multifactorial etiology, and its global burden has increased substantially in recent years1. Epidemiological studies indicate that although CD remains more prevalent in Western countries, the incidence and prevalence of CD in Asian populations, particularly in China, have increased markedly over the past three decades1.

Malnutrition is one of the most common and clinically significant complications in patients with CD, and approximately one-third to one-half of patients experience moderate to severe malnutrition2. Malnutrition is associated with prolonged hospitalization, increased surgical requirements, higher rates of postoperative complications, accelerated intestinal fibrosis, increased glucocorticoid dependency, reduced therapeutic response, and decreased health-related quality of life (HRQoL)3,4. In addition, malnutrition contributes to immune dysfunction, thereby exacerbating intestinal inflammation and creating a vicious cycle that imposes substantial socioeconomic and healthcare burdens3,4.

Several nutritional assessment tools are currently used in clinical practice; however, their applicability to patients with CD remains limited5. The Patient-Generated Subjective Global Assessment (PG-SGA), although recommended for nutritional screening in patients with chronic disease, relies heavily on subjective clinical judgment and demonstrates limited consistency among clinicians when applied to CD populations6. In addition, PG-SGA lacks specificity for intestinal malabsorption. The Nutritional Risk Screening 2002 (NRS 2002), which incorporates body mass index (BMI)-based thresholds, has shown relatively low sensitivity in Asian populations and may contribute to underdiagnosis7. Similarly, the Mini Nutritional Assessment-Short Form (MNA-SF), originally developed for elderly populations, demonstrates relatively high false-positive rates in younger adults with CD.

Importantly, most currently available nutritional assessment tools are based on static scoring systems and do not adequately capture the dynamic fluctuations in nutritional status associated with disease activity in CD8. As CD is characterized by alternating periods of remission and relapse, nutritional status may deteriorate or improve over time in parallel with inflammatory activity9. Conventional assessment approaches lack the ability to perform dynamic risk stratification and may therefore fail to identify early subclinical deterioration in nutritional status. Furthermore, many existing tools do not incorporate disease-specific biomarkers or inflammatory indicators, such as C-reactive protein (CRP), serum albumin, and fecal calprotectin, thereby limiting their predictive accuracy in complex clinical settings10.

Recent advances in artificial intelligence and machine learning (ML) have introduced new opportunities for improving nutritional risk assessment in chronic inflammatory diseases11. Compared with traditional regression-based approaches, ML algorithms can identify nonlinear associations and complex interactions among variables, making them particularly suitable for heterogeneous clinical datasets with multidimensional predictors. Emerging evidence suggests that ML-based nutritional risk prediction models may outperform conventional statistical approaches in predicting nutritional deterioration in patients with IBD12. In addition to improving predictive performance, ML approaches may reduce multicollinearity through regularization techniques and optimize predictor selection to improve model interpretability and generalizability13,14.

The increasing availability of large-scale public databases has further strengthened nutrition-related research in CD. International databases, including the IBD Biobank, National Health and Nutrition Examination Survey (NHANES), and UK Biobank, contain extensive longitudinal clinical data encompassing nutritional indices, inflammatory biomarkers, genomic information, and lifestyle characteristics15,16,17. Integration of these multidimensional datasets enables cross-population validation and improves the external applicability of predictive models. Meta-analyses have demonstrated that prediction models developed using multicenter integrated datasets exhibit greater stability and reproducibility than models derived from single-center cohorts.

This study hypothesized that integrating multidimensional predictive variables, including inflammatory markers, disease characteristics, and demographic and socioeconomic factors, combined with multivariable logistic regression and machine learning algorithms would improve the accuracy and efficiency of malnutrition risk prediction in patients with CD. The resulting prediction model may facilitate personalized nutritional intervention, optimize clinical decision-making, and improve patient outcomes and quality of life.

Protocol


All patients whose data were entered into the Inflammatory Bowel Disease Cohort Database (IBDCD) provided written informed consent for the use of de-identified clinical data for research purposes at the time of database enrollment. The same ethical approval and consent framework applied to all data extracted from the database, including both the development and validation cohorts. No identifiable patient information was used in this study, and all data were de-identified and encrypted for storage. As this study involved retrospective analysis of de-identified data from an established database, the ethics committee waived the requirement for additional patient consent for this specific analysis. The research tools used in the protocol are listed in the Table of Materials.

1. Screening of influencing factors in the meta-analysis

  1. Literature search strategy
    A systematic search strategy was conducted across PubMed, Web of Science, Cochrane Library, Embase, and China National Knowledge Infrastructure (CNKI). The English search strategy was defined as follows:

    ("Crohn Disease"[Mesh] OR "CD"[tiab] OR "Crohn*"[tiab]) AND ("Malnutrition"[Mesh] OR "Nutritional Status"[Mesh] OR "Nutritional Risk"[tiab]) AND ("Risk Factors"[Mesh] OR "Predict*"[tiab])

    The search period covered database inception through March 31, 2025. In addition, manual screening of reference lists from included studies was performed to identify potentially missed articles. This strategy enabled a comprehensive literature review of malnutrition-related risk and predictive factors in patients with Crohn’s disease (CD), with clearly defined databases, search terms, time frame, and supplementary manual retrieval procedures to ensure completeness of the literature collection.
  2. Inclusion and exclusion criteria
    Inclusion criteria required that study populations consist of adult patients (aged ≥18 years) diagnosed with Crohn’s disease according to the 2018 Consensus on the Diagnosis and Treatment of Inflammatory Bowel Disease. Given the clinical significance of malnutrition in CD, emphasis was placed on studies defining malnutrition using the European Society for Clinical Nutrition and Metabolism (ESPEN) 2015 criteria, Global Leadership Initiative on Malnutrition (GLIM) 2019 criteria, or Malnutrition Universal Screening Tool (MUST) score ≥2. A clear distinction was maintained between literature-defined diagnostic criteria and the exclusive application of the ESPEN 2015 criteria for outcome definition within the study cohort.

    Eligible study designs included cohort, case-control, cross-sectional, and mixed-methods studies to ensure comprehensive evaluation of malnutrition risk factors from multiple perspectives. Exclusion criteria included animal studies, review articles, and conference abstracts because of insufficient data support or lack of peer review. Studies lacking essential statistical information, including odds ratios (ORs) and corresponding 95% confidence intervals (CIs), were also excluded.

2. Literature screening and data extraction

Literature screening and data extraction were independently conducted by two researchers to ensure completeness and accuracy. The initial stage involved screening titles and abstracts to exclude studies that clearly failed to meet the inclusion criteria, thereby narrowing the dataset to potentially relevant, high-quality studies.

Subsequently, full-text screening was performed for studies considered potentially eligible. This stage involved detailed evaluation of study design, methodology, results, and adherence to predefined inclusion criteria to ensure that only studies meeting all requirements were included in the final analysis.

Data extraction was performed using standardized forms to systematically record key information, including author names, publication year, country, sample size, Montreal classification subtype distribution, average disease duration, definitions of malnutrition, assessment tools, and extracted ORs with corresponding 95% CIs for influencing factors associated with malnutrition risk.

Any discrepancies or uncertainties arising during screening and data extraction were resolved through internal discussion. When consensus could not be reached, a third-party expert was consulted to arbitrate, minimizing subjective bias and ensuring objective decision-making. This rigorous process enhanced methodological transparency and provided a robust foundation for the identification of key factors associated with malnutrition risk in CD.

3. Quality assessment

Rigorous quality assessment criteria were applied to ensure the scientific validity and reliability of the included studies. Cohort and case-control studies were evaluated using the Newcastle–Ottawa Scale (NOS), which assesses study quality across three domains: selection, comparability, and exposure. Only studies with NOS scores ≥7 were classified as high quality and included in the analysis.

Cross-sectional studies were evaluated using the Agency for Healthcare Research and Quality (AHRQ) assessment tool, which examines participant selection, sample size adequacy, validity of data collection methods, and appropriateness of statistical analyses. Studies with AHRQ scores ≥8 were considered eligible for inclusion.

Application of these stringent quality assessment criteria minimized potential bias related to study quality variability and ensured that the predictive model was supported by a reliable evidence base.

4. Prediction model construction

  1. Data source
    The Inflammatory Bowel Disease Cohort Database (IBDCD) is a multicenter prospective cohort database containing clinical data from patients with Crohn’s disease collected between January 2018 and March 2025, including demographic information, clinical characteristics, laboratory indicators, treatment regimens, and nutritional status assessments. The model development dataset included 800 patients with Crohn’s disease derived from the IBDCD database, consisting of 520 patients in the training cohort and 280 patients in the validation cohort.

    Inclusion criteria required a confirmed diagnosis of Crohn’s disease for at least 6 months and availability of complete clinical data, including identified influencing factors and nutritional assessment data derived from the meta-analysis. Patients with comorbid conditions potentially affecting nutritional status, including malignancies or chronic kidney disease, were excluded to ensure data accuracy and model validity.
  2. Variable definition and assignment
    Malnutrition was diagnosed strictly according to the ESPEN 2015 criteria, which include three core diagnostic components: unintentional weight loss (≥5% within 3 months or ≥10% within 6 months), reduced food intake or absorption (≥25% reduction for ≥14 days), and decreased muscle mass assessed through physical examination and anthropometric measurements, including mid-arm muscle circumference. A low body mass index (BMI < 18.5 kg/m2) was considered a supportive rather than a definitive diagnostic indicator. Malnutrition was confirmed only when at least one core criterion was present, while low BMI served as a supplementary indicator. Patients presenting with low BMI without evidence of the core criteria were not classified as malnourished. This definition avoided circularity between low BMI, a predictive variable, and malnutrition, an outcome.

    The prevalence of malnutrition was 42.5% in the training cohort (n = 520) and 40.4% in the validation cohort (n = 280). Based on the meta-analysis results, five key influencing factors were selected as predictive variables: disease activity, defined as elevated C-reactive protein (CRP >10 mg/L; Yes = 1, No = 0); small bowel involvement, defined by Montreal classification (L1/L3 versus L2 colonic involvement; Yes = 1, No = 0); biologic use (Yes = 1, No = 0); history of intestinal resection (Yes = 1, No = 0); and low BMI (<18.5 kg/m²; Yes = 1, No = 0). Age and sex were collected as baseline demographic characteristics but were not included as predictive variables in the final model because they were not statistically significant during preliminary analysis.

    A total of 17 high-quality studies were included in the meta-analysis. The pooled ORs and corresponding 95% CIs for the five predictive factors were as follows: elevated CRP (OR = 4.72, 95% CI: 3.21–6.95), small bowel involvement (OR = 2.89, 95% CI: 1.93–4.33), biologic use (OR = 0.39, 95% CI: 0.15–1.01), history of intestinal resection (OR = 6.17, 95% CI: 2.35–16.18), and low BMI (OR = 3.56, 95% CI: 2.41–5.27).
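The pooled estimates above can be moved to the log scale with a few lines of arithmetic. The sketch below is illustrative only: it assumes each 95% CI is a symmetric Wald interval on the log-OR scale (so the interval spans 2 × 1.96 standard errors) and uses a normal approximation for the P-value. The dictionary keys and the function name `log_or_summary` are hypothetical labels, not identifiers from the study.

```python
import math

# Pooled ORs and 95% CIs as reported in the meta-analysis above.
POOLED = {
    "elevated_crp": (4.72, 3.21, 6.95),
    "small_bowel_involvement": (2.89, 1.93, 4.33),
    "biologic_use": (0.39, 0.15, 1.01),
    "intestinal_resection": (6.17, 2.35, 16.18),
    "low_bmi": (3.56, 2.41, 5.27),
}

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_or_summary(odds_ratio, lower, upper):
    """Return (log-OR, standard error, z statistic, two-sided P-value).

    Assumption: the CI is a Wald interval on the log scale.
    """
    log_or = math.log(odds_ratio)
    # A 95% CI on the log scale spans 2 * 1.96 standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = log_or / se
    # Two-sided normal P-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p


summary = {name: log_or_summary(*values) for name, values in POOLED.items()}

for name, (log_or, se, z, p) in summary.items():
    print(f"{name:26s} log-OR={log_or:+.3f} SE={se:.3f} z={z:+.2f} P={p:.4f}")
```

Under this approximation, the interval for biologic use (0.15–1.01) includes 1, so its P-value comes out slightly above 0.05, while the other four factors fall well below that threshold.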
  3. Model construction method
    The pooled ORs and corresponding 95% CIs derived from the meta-analysis were transformed into log-OR values and used to derive the regression coefficients (β) of the multivariable logistic regression model. Consistency between the meta-analysis results and model coefficients was verified using Pearson correlation analysis (r = 0.98, P < 0.001). Based on the identified influencing factors, a multivariable logistic regression model was constructed according to the following equation:

    $$\operatorname{logit}(P) = \ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5$$

    where P represents the probability of malnutrition occurrence and β values represent the natural logarithm of the pooled OR values derived from the meta-analysis.
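As a worked illustration of the equation, the sketch below takes β as the natural logarithm of each pooled OR and converts the linear predictor into a probability with the inverse logit. The intercept `ALPHA` is a hypothetical placeholder because the source does not report α, so the resulting probabilities are illustrative and are not the study's predictions. The predictor names are labels chosen for this example.

```python
import math

# Pooled ORs from the meta-analysis; beta_i = ln(OR_i).
POOLED_OR = {
    "elevated_crp": 4.72,
    "small_bowel_involvement": 2.89,
    "biologic_use": 0.39,
    "intestinal_resection": 6.17,
    "low_bmi": 3.56,
}
BETA = {name: math.log(value) for name, value in POOLED_OR.items()}

ALPHA = -2.0  # hypothetical intercept; the source does not report alpha


def predicted_probability(x, alpha=ALPHA, beta=BETA):
    """Inverse logit of alpha + sum(beta_i * x_i) for binary predictors x_i."""
    logit = alpha + sum(beta[name] * x.get(name, 0) for name in beta)
    return 1.0 / (1.0 + math.exp(-logit))


# Example patient: elevated CRP and prior intestinal resection, no other factors.
patient = {"elevated_crp": 1, "intestinal_resection": 1}
p_patient = predicted_probability(patient)
p_baseline = predicted_probability({})
print(f"baseline P = {p_baseline:.3f}, patient P = {p_patient:.3f}")
```

Because the coefficient for biologic use is negative (OR < 1), setting that predictor to 1 lowers the predicted probability, whereas each of the other four predictors raises it.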

    Using R software (version 4.2.1) and the “rms” package, the model was translated into a clinically applicable nomogram. Original risk scores (theoretical range: 0–325.1) were linearly scaled to a 0–100 range to improve clinical readability. Based on total nomogram scores, patients were stratified into low-risk (≤20 points), moderate-risk (21–40 points), and high-risk (>40 points) categories using percentile-based calibration relative to the cohort score distribution.
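The scaling and stratification step can be sketched as follows. This is a minimal example under two assumptions that the source does not state explicitly: the risk thresholds apply to the scaled 0–100 score, and scores falling between 20 and 21 are assigned to the moderate-risk group. The function names are hypothetical.

```python
RAW_MAX = 325.1  # theoretical maximum raw score reported in the text


def scale_score(raw_score, raw_max=RAW_MAX):
    """Linearly rescale a raw nomogram total from [0, raw_max] to [0, 100]."""
    if not 0 <= raw_score <= raw_max:
        raise ValueError("raw score outside the theoretical range")
    return 100.0 * raw_score / raw_max


def risk_category(scaled_score):
    """Map a scaled score to the three risk strata described above."""
    if scaled_score <= 20:
        return "low"
    if scaled_score <= 40:  # assumption: (20, 40] is treated as moderate
        return "moderate"
    return "high"


for raw in (50.0, 100.0, 200.0):
    scaled = scale_score(raw)
    print(f"raw={raw:6.1f} scaled={scaled:5.1f} category={risk_category(scaled)}")
```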

    Machine learning algorithms, including random forest (RF) and gradient boosting decision tree (GBDT), were used for feature optimization. A stacking strategy combined RF and GBDT as base learners with logistic regression as the meta-classifier. Hyperparameters were optimized using 5-fold cross-validation, with RF configured as n_estimators = 200 and max_depth = 10, and GBDT configured as n_estimators = 150 and learning_rate = 0.1. The Synthetic Minority Oversampling Technique (SMOTE) was applied to address minor class imbalance within the training dataset.

5. Model validation

  1. Validation dataset
    A dual validation strategy was employed. Internal validation was performed using bootstrap resampling with 1,000 repetitions. Hold-out validation was conducted using an independent non-overlapping subset of 280 patients extracted from the IBDCD database, excluding patients included in the training cohort. All patients in the validation cohort were diagnosed with Crohn’s disease according to the 2018 Inflammatory Bowel Disease Consensus, with prospectively collected data spanning January 2018 to March 2025.

    The validation cohort followed the same ascertainment procedures as the training cohort. Malnutrition was assessed using the ESPEN 2015 criteria, and all predictive variables, including CRP, lesion location, biologic use, history of intestinal resection, and BMI, were measured within 72 h of hospital admission. No significant differences were observed between the training and validation cohorts for baseline characteristics, including age, sex, and disease duration (P > 0.05), supporting the reliability and clinical applicability of the validation results.

    Additional performance metrics were calculated to further evaluate model performance. In the training cohort, sensitivity was 92.3%, specificity was 94.1%, positive predictive value (PPV) was 90.5%, and negative predictive value (NPV) was 95.2%. In the validation cohort, sensitivity was 90.2%, specificity was 92.8%, PPV was 88.7%, and NPV was 93.9%. The Brier score was 0.087 in the training cohort and 0.102 in the validation cohort, indicating acceptable prediction error.
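For reference, the four classification metrics reported above follow directly from a 2 × 2 confusion matrix. The counts in this sketch are hypothetical round numbers chosen for illustration; the source does not report the underlying confusion matrices.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, PPV, and NPV from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all with the outcome
        "specificity": tn / (tn + fp),  # true negatives among all without the outcome
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical counts (not taken from the study).
metrics = classification_metrics(tp=90, fp=20, tn=180, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

PPV and NPV depend on outcome prevalence, so values estimated in one cohort may not transfer to a population with a different malnutrition rate.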
  2. Validation metrics
    Model discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Calibration was assessed using the Hosmer–Lemeshow test, where P > 0.05 indicated no significant difference between predicted probabilities and observed outcomes, and the Brier score, where values <0.20 indicated acceptable prediction error. Decision curve analysis (DCA) was additionally performed to evaluate clinical net benefit across a range of threshold probabilities.

    This validation framework adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement guidelines.
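The discrimination, calibration-error, and net-benefit metrics named above can be sketched in plain Python. The study itself used R packages for these calculations, so this is an illustrative re-expression and not the authors' code. The AUC is computed as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, and the net benefit uses the standard decision-curve formula TP/n − (FP/n) × pt/(1 − pt). The toy labels and probabilities are hypothetical.

```python
def auc(y_true, y_prob):
    """AUC as the fraction of case/non-case pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit of treating patients with risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical toy data (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print("AUC:", auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Net benefit at 0.5:", net_benefit(y_true, y_prob, 0.5))
```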

6. Statistical analysis

Meta-analysis was conducted using RevMan version 5.4, and pooled ORs with corresponding 95% CIs were calculated. Heterogeneity was assessed using the I² statistic, with values >50% indicating substantial heterogeneity. Random-effects models were applied when substantial heterogeneity was present; otherwise, fixed-effects models were used. Sensitivity analysis was performed by sequentially excluding individual studies to evaluate the stability of the results. Publication bias was assessed using funnel plots and Egger’s test.
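The model-selection rule described above (fixed effects unless I² exceeds 50%) can be illustrated with inverse-variance pooling of study-level log-ORs. This sketch assumes the DerSimonian–Laird estimator for the between-study variance; the source does not state which random-effects estimator was applied. The function name and the input values are hypothetical.

```python
import math


def pool_effects(log_ors, ses, i2_cutoff=50.0):
    """Inverse-variance pooling with an I^2-based choice of model."""
    k = len(log_ors)
    weights = [1.0 / se ** 2 for se in ses]
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

    # Cochran's Q and I^2 = max(0, (Q - df) / Q) * 100.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    model = "fixed"
    if i2 > i2_cutoff:
        # DerSimonian-Laird between-study variance (assumed estimator).
        c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
        tau2 = max(0.0, (q - df) / c)
        weights = [1.0 / (se ** 2 + tau2) for se in ses]
        model = "random"

    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return {"model": model, "i2": i2, "log_or": pooled, "se": se_pooled,
            "or": math.exp(pooled)}


# Hypothetical study-level inputs (not from Supplementary Table 3).
homogeneous = pool_effects([1.0, 1.0, 1.0], [0.2, 0.3, 0.4])
heterogeneous = pool_effects([0.2, 1.5, 2.8], [0.2, 0.2, 0.2])
print(homogeneous["model"], round(homogeneous["i2"], 1))
print(heterogeneous["model"], round(heterogeneous["i2"], 1))
```

The random-effects weights add the between-study variance to each study's own variance, which widens the pooled confidence interval when studies disagree.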

Basic statistical analyses were conducted using SPSS version 26.0. The “pROC”, “rms”, and “glmnet” packages in R software (version 4.2.1) were used for model construction and validation.

Results


Basic characteristics of the study population
This meta-analysis included 17 high-quality studies evaluating malnutrition in patients with Crohn’s disease, as summarized in Supplementary Table 1. The studies were published between 2009 and 2025, with a median sample size of 502 patients (range: 175–773). Study designs included prospective cohort, retrospective cohort, case-control, cross-sectional, and mixed-methods studies, with mixed-methods studies accounting for 17.6% (3/17) of the included studies.

Definitions of malnutrition varied among studies. Body mass index (BMI) <18.5 kg/m2 was used in 35.3% (6/17) of studies, Malnutrition Universal Screening Tool (MUST) score ≥2 in 23.5% (4/17), European Society for Clinical Nutrition and Metabolism (ESPEN) 2015 criteria in 17.6% (3/17), Global Leadership Initiative on Malnutrition (GLIM) 2019 criteria in 11.8% (2/17), and Subjective Global Assessment (SGA) Grade B/C in 5.9% (1/17). In contrast, the study cohort used the ESPEN 2015 criteria exclusively to ensure consistency of outcome definition.

The reported prevalence of malnutrition ranged from 21.5% to 43.8% across the included studies. Demographic characteristics also demonstrated heterogeneity, with mean age ranging from approximately 36 to 55 years, male proportion ranging from 46% to 60%, and mean disease duration ranging from approximately 5 to 12 years. Predictor analysis identified elevated C-reactive protein (CRP >10 mg/L) and history of intestinal resection as common risk factors for malnutrition, with pooled ORs of 4.72 (95% CI: 3.21–6.95) and 6.17 (95% CI: 2.35–16.18), respectively. Most studies (76.5%, 13/17) adjusted for potential confounding variables, including disease activity, biologic use, and smoking history.

Overall, substantial heterogeneity was observed across studies in geographic region, study period, methodology, and patient characteristics, underscoring the need for standardized diagnostic criteria and multicenter collaborative studies. Importantly, the pooled prevalence derived from the meta-analysis (32.3%) was based on heterogeneous malnutrition definitions, whereas the prevalence observed in the study cohort (42.5% in the training cohort and 40.4% in the validation cohort) was determined exclusively using the ESPEN 2015 criteria. Therefore, direct comparison of prevalence estimates should be interpreted cautiously, as differences may reflect variation in diagnostic frameworks rather than true differences in malnutrition risk.

Meta-analysis results
The forest plot summarizes the prevalence of malnutrition and corresponding 95% confidence intervals (CIs) among patients with Crohn’s disease across the included studies (Figure 1). The vertical dashed line represents the pooled prevalence estimate (0.323). Several studies reported prevalence rates exceeding 0.40, indicating a substantial malnutrition burden, whereas other studies reported prevalence rates between 0.30 and 0.40 with relatively wide confidence intervals, reflecting increased uncertainty. A smaller number of studies reported prevalence rates below 0.30.

Figure 1: Forest plot of pooled malnutrition prevalence and 95% confidence intervals in patients with Crohn’s disease. This forest plot summarizes the results of a meta-analysis of 17 high-quality studies evaluating the prevalence of malnutrition in adult patients with Crohn’s disease. Horizontal lines represent the 95% confidence intervals (CIs) for malnutrition prevalence in each study, and solid dots indicate the corresponding point estimates. The vertical dashed line represents the pooled malnutrition prevalence (0.323) across all included studies. Studies are stratified into high (≥0.4), moderate (0.3–0.4), and low (<0.3) malnutrition prevalence subgroups, illustrating variability in reported prevalence while confirming malnutrition as a common complication in Crohn’s disease. Error bars denote 95% CIs.

Despite variability among studies, the pooled prevalence estimate of 32.3% indicates that malnutrition is a common clinical concern in patients with Crohn’s disease. The forest plot additionally demonstrates heterogeneity in effect sizes among studies. Five predictive factors were selected for model development based on statistical significance (P <0.05) and clinical applicability in CD management.

Characteristics of the 17 included studies are summarized in Supplementary Table 1, with corresponding citations18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34.

Contribution of individual studies to pooled estimates and variable selection for the prediction model
Each of the 17 included studies contributed individual odds ratios (ORs) and corresponding 95% confidence intervals (CIs) for candidate malnutrition risk factors in patients with Crohn’s disease. Pooled effect sizes were calculated using fixed-effects or random-effects models according to heterogeneity assessment using the I2 statistic. Random-effects models were applied when substantial heterogeneity was present (I2 >50%), whereas fixed-effects models were used for homogeneous datasets (I2 ≤50%) (Supplementary Table 2).

Only variables with statistically significant pooled effect sizes (P <0.05) were considered candidate predictive variables. Additional selection criteria included clinical availability in routine CD management and completeness of data within the IBDCD database. This screening process resulted in the selection of five predictive variables for model development: elevated CRP, small bowel involvement, biologic use, history of intestinal resection, and low BMI. Variables with non-significant pooled effects or incomplete cohort data were excluded.

Detailed study-level ORs and corresponding 95% CIs for each predictive factor are provided in Supplementary Table 3.

Baseline characteristics of the Crohn’s disease malnutrition cohort
Baseline characteristics of the training cohort (n = 520) and validation cohort (n = 280) are summarized in Table 1. No statistically significant differences were observed between cohorts for any baseline variable (P >0.05).

| Variable | Training Cohort | Validation Cohort | P-value |
| --- | --- | --- | --- |
| Age (years) | 32.6 ± 9.3 | 32.8 ± 9.6 | 0.706 |
| Gender (Female) | 226 (43.5%) | 124 (44.3%) | 0.881 |
| Gender (Male) | 294 (56.5%) | 156 (55.7%) | 0.881 |
| Lesion Location (L1) | 181 (34.8%) | 96 (34.3%) | 0.944 |
| Lesion Location (L2) | 193 (37.1%) | 100 (35.7%) | 0.753 |
| Lesion Location (L3) | 146 (28.1%) | 84 (30.0%) | 0.623 |
| Active Disease (0) | 168 (32.3%) | 98 (35.0%) | 0.489 |
| Active Disease (1) | 352 (67.7%) | 182 (65.0%) | 0.489 |
| Biologic Use (0) | 317 (61.0%) | 160 (57.1%) | 0.330 |
| Biologic Use (1) | 203 (39.0%) | 120 (42.9%) | 0.330 |
| History of intestinal resection (0) | 362 (69.6%) | 198 (70.7%) | 0.808 |
| History of intestinal resection (1) | 158 (30.4%) | 82 (29.3%) | 0.808 |
| BMI (kg/m²) | 20.5 ± 2.5 | 20.6 ± 2.5 | 0.459 |
| Malnutrition (ESPEN) (0) | 299 (57.5%) | 167 (59.6%) | 0.609 |
| Malnutrition (ESPEN) (1) | 221 (42.5%) | 113 (40.4%) | 0.609 |
| Risk Score | 54.0 ± 31.9 | 52.0 ± 30.7 | 0.412 |

Table 1: Baseline characteristics of the Crohn’s disease training and validation cohorts. Values are presented as mean ± standard deviation for continuous variables and n (%) for categorical variables. P-values were calculated using independent-samples t-tests for continuous variables and χ2 tests for categorical variables. L1 = ileal disease; L2 = colonic disease; L3 = ileocolonic disease according to the Montreal classification for Crohn’s disease. Malnutrition was diagnosed strictly according to the ESPEN 2015 criteria, with low BMI (<18.5 kg/m2) considered a supportive rather than definitive indicator. No statistically significant differences were observed between the training and validation cohorts for any baseline variable (P > 0.05), indicating good baseline comparability and minimizing confounding due to cohort heterogeneity. Risk score represents the cumulative raw nomogram score derived from the sum of individual variable scores (Active Disease: 100 points; History of Intestinal Resection: 90.8 points; L3: 51.1 points; Biologic Use: 41.2 points; BMI: 42.0 points), with a theoretical maximum score of 325.1. The observed risk score range in the training cohort was 4.2–148.6. The scores presented in this table correspond to the raw cumulative score before scaling.

The mean age was 32.6 ± 9.3 years in the training cohort and 32.8 ± 9.6 years in the validation cohort (P = 0.706). Mean BMI was 20.5 ± 2.5 kg/m2 in the training cohort and 20.6 ± 2.5 kg/m2 in the validation cohort (P = 0.459). Mean risk scores were 54.0 ± 31.9 and 52.0 ± 30.7 in the training and validation cohorts, respectively (P = 0.412).

No statistically significant differences were identified for categorical variables. Female patients accounted for 43.5% (226/520) of the training cohort and 44.3% (124/280) of the validation cohort (P = 0.881). The distributions of lesion locations (L1, L2, and L3) were comparable between cohorts, with P values of 0.944, 0.753, and 0.623, respectively. Similarly, no significant differences were observed in disease activity status, biologic use, history of intestinal resection, or ESPEN-defined malnutrition status (all P >0.05).
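The categorical comparisons can be checked with a 2 × 2 χ² test. The sketch below applies Yates' continuity correction, which is an assumption: the source states only that χ² tests were used, although the corrected statistic gives values close to the reported P-values for sex (0.881) and biologic use (0.33). The function name is hypothetical, and the P-value uses the identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(√(χ²/2)).

```python
import math


def chi2_2x2(a, b, c, d, yates=True):
    """Chi-square test for the 2x2 table [[a, b], [c, d]]; returns (chi2, P)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    chi2 = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(0.0, diff - 0.5)  # continuity correction (assumed)
        chi2 += diff ** 2 / e
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Sex: training (226 female, 294 male) vs validation (124 female, 156 male).
chi2_sex, p_sex = chi2_2x2(226, 294, 124, 156)
# Biologic use: training (203 yes, 317 no) vs validation (120 yes, 160 no).
chi2_bio, p_bio = chi2_2x2(203, 317, 120, 160)
print(f"sex: chi2={chi2_sex:.3f}, P={p_sex:.3f}")
print(f"biologic use: chi2={chi2_bio:.3f}, P={p_bio:.3f}")
```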

Multivariable logistic regression analysis for prediction of high-risk malnutrition in Crohn’s disease
Multivariable logistic regression was performed to evaluate predictors of high-risk malnutrition outcomes (Table 2). Lesion location classified as L2 demonstrated a significant negative association with high-risk malnutrition (β = −8.184, OR = 0.000, 95% CI: 0.000–0.002, P < 0.001). The corresponding nomogram score was 0.0 points.

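| Variable | β | OR | 95% CI | P-value | Nomogram Score |
| --- | --- | --- | --- | --- | --- |
| Lesion Location (L2) | −8.184 | 0.000 | 0.000–0.002 | <0.001 | 0.0 |
| Lesion Location (L3) | 0.789 | 2.200 | 0.675–7.170 | 0.191 | 51.1 |
| Active Disease | 2.501 | 12.182 | 4.850–30.580 | <0.001 | 100.0 |
| Biologic Use | −0.942 | 0.390 | 0.150–1.012 | 0.053 | 41.2 |
| History of intestinal resection | 1.820 | 6.170 | 2.350–16.180 | 0.001 | 90.8 |
| BMI | −0.801 | 0.449 | 0.340–0.593 | <0.001 | 42.0 |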

Table 2: Multivariable logistic regression analysis for prediction of high-risk malnutrition in Crohn’s disease. This table summarizes the results of the multivariable logistic regression model for malnutrition risk prediction, including regression coefficient (β), odds ratio (OR), 95% confidence interval (CI), P-value, and corresponding nomogram score for each predictive variable. Variables included lesion location according to the Montreal classification (L2 and L3), active disease status, biologic use, history of intestinal resection, and BMI. P-values <0.05 were considered statistically significant; the P-value for biologic use (P = 0.053) fell just above this threshold, suggesting a potential protective trend. The extremely negative β coefficient and near-zero OR observed for L2 reflect quasi-complete separation within the dataset because only 8.3% (18/221) of malnourished patients demonstrated isolated colonic involvement. This estimate should therefore be interpreted cautiously. Sensitivity analysis excluding L2 did not significantly alter model discrimination (AUC change <0.02), supporting overall model stability.

Lesion location classified as L3 demonstrated β = 0.789, OR = 2.200, 95% CI: 0.675–7.170, and P = 0.191, indicating no statistically significant association with high-risk malnutrition. The corresponding nomogram score was 51.1 points. Active disease status demonstrated a strong positive association with high-risk malnutrition (β = 2.501, OR = 12.182, 95% CI: 4.850–30.580, P < 0.001). The corresponding nomogram score was 100.0 points. Biologic use demonstrated β = −0.942, OR = 0.390, 95% CI: 0.150–1.012, and P = 0.053, suggesting a potential protective trend that did not reach statistical significance. The corresponding nomogram score was 41.2 points.

History of intestinal resection was identified as a significant predictor of high-risk malnutrition, with β = 1.820, OR = 6.170, 95% CI: 2.350–16.180, and P = 0.001. The corresponding nomogram score was 90.8 points. BMI demonstrated a significant negative association with high-risk malnutrition (β = −0.801, OR = 0.449, 95% CI: 0.340–0.593, P < 0.001). The corresponding nomogram score was 42.0 points.

The extremely large negative coefficient and near-zero OR observed for lesion location L2 suggest quasi-complete separation within the dataset, potentially attributable to the low proportion of L2 lesions among malnourished patients (8.3%, 18/221). This finding should therefore be interpreted cautiously, as it may overestimate the apparent protective effect of L2 lesions. Sensitivity analysis excluding L2 lesion data showed minimal change in model discrimination (AUC change <0.02), supporting the model's overall stability despite this sparse data pattern.
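The relationship between the reported coefficients and odds ratios can be checked directly, since in logistic regression OR = exp(β). The following minimal sketch uses only the β values listed in Table 2; the dictionary and function names are illustrative and are not taken from the study code. Small discrepancies from the tabulated ORs are expected because the published coefficients are rounded.

```python
import math

# Regression coefficients (beta) as reported in Table 2.
betas = {
    "Lesion Location (L2)": -8.184,
    "Lesion Location (L3)": 0.789,
    "Active Disease": 2.501,
    "Biologic Use": -0.942,
    "History of intestinal resection": 1.820,
    "BMI": -0.801,
}


def odds_ratio(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


# Hypothetical helper structure for illustration only.
odds_ratios = {name: odds_ratio(beta) for name, beta in betas.items()}

for name, value in odds_ratios.items():
    print(f"{name}: OR = {value:.3f}")
```

The L2 coefficient yields an OR that rounds to 0.000 at three decimals, which illustrates why such an estimate is treated as a symptom of sparse data rather than as a literal effect size.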

Nomogram analysis for prediction of high-risk malnutrition in Crohn’s disease
A nomogram was constructed to predict high-risk malnutrition in patients with Crohn’s disease (Figure 2). The nomogram presents scaled scores ranging from 0 to 100 for each predictive variable, whereas the theoretical raw total risk score ranged from 0 to 325.1 before scaling.

Nomogram bar chart for BMI, intestinal resection, biologic use, disease activity, lesion location.
Figure 2: Nomogram for predicting high-risk malnutrition in patients with Crohn’s disease. This clinically applicable nomogram was constructed using a stacked model combining multivariable logistic regression and machine learning algorithms, including random forests and gradient-boosted decision trees. Variable scores are displayed for each predictive factor, including Active Disease (100 points), History of Intestinal Resection (90.8 points), Lesion Location L3 (51.1 points), Biologic Use (41.2 points), and BMI (42.0 points). For visual clarity, the total nomogram score is scaled to 0–100; however, the actual cumulative raw risk score ranges from 0 to 325.1. Based on the scaled score, patients are stratified into low-risk (≤20 points), moderate-risk (21–40 points), and high-risk (>40 points) categories for individualized nutritional intervention.

Among predictive variables, active disease demonstrated the highest nomogram score (100.0 points), indicating the greatest contribution to high-risk malnutrition prediction. The history of intestinal resection also demonstrated a high contribution, with a score of 90.8 points. Lesion location L3 scored 51.1 points, biologic use scored 41.2 points, and BMI scored 42.0 points, indicating moderate predictive contributions. The L2 lesion location demonstrated the lowest score (0 points).

Risk stratification thresholds were calibrated using percentile distributions of training cohort risk scores. Decision curve analysis demonstrated clinically meaningful net benefit for moderate-risk (21–40 points) and high-risk (>40 points) stratification categories.
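The scoring and stratification described above can be sketched as follows. The per-variable maxima and the category cut-offs are taken from the Figure 2 caption. The linear rescaling of the raw total to 0–100 and the handling of scores falling between 20 and 21 are assumptions made for illustration, as the article does not specify them. All names are hypothetical.

```python
# Maximum points per predictor, as listed in the Figure 2 caption.
MAX_POINTS = {
    "active_disease": 100.0,
    "intestinal_resection": 90.8,
    "lesion_l3": 51.1,
    "biologic_use": 41.2,
    "bmi": 42.0,
}

# Theoretical raw maximum (325.1 in the article).
RAW_MAX = sum(MAX_POINTS.values())


def scale_score(raw_total: float) -> float:
    """Rescale a raw total to 0-100 (assumption: simple linear scaling)."""
    if not 0.0 <= raw_total <= RAW_MAX + 1e-9:
        raise ValueError("raw_total is outside the theoretical range")
    return 100.0 * raw_total / RAW_MAX


def risk_category(scaled_score: float) -> str:
    """Map a scaled score to the article's three risk categories.

    Assumption: scores above 20 and up to 40 count as moderate risk.
    """
    if scaled_score <= 20.0:
        return "low"
    if scaled_score <= 40.0:
        return "moderate"
    return "high"


# Example with a hypothetical raw total of 190.8 points.
example_scaled = scale_score(190.8)
print(f"scaled = {example_scaled:.1f}, category = {risk_category(example_scaled)}")
```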

Receiver operating characteristic curve analysis of the malnutrition prediction model in Crohn’s disease
Receiver operating characteristic (ROC) curve analysis was performed to evaluate model discrimination in the training and validation cohorts (Figure 3). The area under the curve (AUC) was 0.987 (95% CI: 0.978–0.996) in the training cohort and 0.967 (95% CI: 0.945–0.989) in the validation cohort.

ROC curves for malnutrition prediction; diagram with AUC values for training (0.987) and validation (0.967).
Figure 3: Receiver operating characteristic (ROC) curves of the malnutrition prediction model in the training and validation cohorts. The ROC curves of the malnutrition prediction model are shown for the training cohort (blue) and validation cohort (red; non-overlapping subset). The x-axis represents 1-specificity (false-positive rate), and the y-axis represents sensitivity (true-positive rate). The area under the curve (AUC) was calculated using the trapezoidal method, with 95% confidence intervals estimated using 1,000 bootstrap resamples. The AUC was 0.987 (95% CI: 0.978–0.996) for the training cohort and 0.967 (95% CI: 0.945–0.989) for the validation cohort. The rapid rise of both curves toward the upper-left corner indicates high discriminative ability of the model.

Both ROC curves rose sharply toward the upper-left corner, indicating strong discriminatory ability for distinguishing malnourished from non-malnourished patients. Although the AUC was slightly lower in the validation cohort, discrimination remained high (>0.95), indicating stable model performance in the hold-out validation dataset.

Despite strong performance metrics, these findings should be interpreted cautiously, as exceptionally high AUC values may indicate overfitting. Further validation in larger, multicenter, and multi-ethnic cohorts is required to confirm the model's robustness and generalizability.
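The AUC estimation approach named in the Figure 3 caption (trapezoidal integration with bootstrap confidence intervals) can be illustrated with a short sketch. This is not the study code: the data below are synthetic, the percentile method for the interval is an assumption, and the function names are hypothetical.

```python
import random


def auc_trapezoid(y_true, y_score):
    """AUC by trapezoidal integration of the empirical ROC curve."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Sort by descending score; tied scores are handled as a single ROC step.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    auc, tp, fp = 0.0, 0, 0
    prev_tpr, prev_fpr = 0.0, 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        tpr, fpr = tp / pos, fp / neg
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
        i = j
    return auc


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed CI method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[k] for k in idx]
        if 0 < sum(yt) < n:  # skip resamples that contain only one class
            stats.append(auc_trapezoid(yt, [y_score[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic demonstration data (not patient data).
rng = random.Random(42)
y_demo = [1 if rng.random() < 0.4 else 0 for _ in range(200)]
s_demo = [rng.gauss(1.0 if y else 0.0, 1.0) for y in y_demo]

auc_demo = auc_trapezoid(y_demo, s_demo)
ci_low, ci_high = bootstrap_auc_ci(y_demo, s_demo)
print(f"AUC = {auc_demo:.3f} (95% CI: {ci_low:.3f}-{ci_high:.3f})")
```

Applying the same resampling to a hold-out cohort gives an interval for validation performance, but it does not substitute for external validation in independent centers.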

Calibration curve analysis of the high-risk malnutrition prediction model in Crohn’s disease
Calibration curves were used to evaluate agreement between predicted and observed probabilities of high-risk malnutrition (Figure 4). Calibration curves for both the training cohort and validation cohort were closely aligned with the ideal calibration line.

Calibration curve chart; predicted vs. observed probability; training, validation cohorts compare.
Figure 4: Calibration curves for the malnutrition prediction model. The calibration curves evaluate agreement between predicted and observed probabilities of malnutrition. The x-axis represents predicted probability derived from the model, and the y-axis represents the observed proportion of malnourished patients. The black dashed line indicates ideal calibration. Calibration slopes were 0.98 for the training cohort and 0.94 for the validation cohort. Corresponding Brier scores were 0.087 and 0.102, respectively, where slopes approaching 1 and Brier scores <0.25 indicate good calibration performance. The curves closely align with the ideal calibration line at predicted probabilities >0.4, indicating good calibration performance in high-risk patients.

The calibration slope was 0.98 (95% CI: 0.95–1.01) in the training cohort and 0.94 (95% CI: 0.89–0.99) in the validation cohort. Minor deviations from the ideal calibration line were observed at low predicted probabilities, whereas close agreement was observed at moderate and high predicted probabilities (>0.4).

The Brier score was 0.087 (95% CI: 0.072–0.102) in the training cohort and 0.102 (95% CI: 0.085–0.119) in the validation cohort, indicating acceptable prediction error. No statistically significant difference in calibration performance was observed between cohorts (P = 0.32).
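For reference, the Brier score is the mean squared difference between predicted probabilities and observed binary outcomes. The sketch below, with a hypothetical function name and toy inputs, also shows where the 0.25 benchmark comes from: it is the score obtained by an uninformative model that predicts 0.5 for every patient.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy outcomes (1 = malnourished, 0 = not malnourished); not patient data.
outcomes = [1, 0, 1, 0]

perfect = brier_score(outcomes, [1.0, 0.0, 1.0, 0.0])
uninformative = brier_score(outcomes, [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```

Lower values indicate smaller prediction error, so the reported scores of 0.087 and 0.102 lie well below the uninformative reference value.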

Decision curve analysis of the malnutrition prediction model in Crohn’s disease
Decision curve analysis (DCA) was performed to evaluate the clinical utility of the prediction model across a range of threshold probabilities (Figure 5). A positive net benefit was observed in both cohorts within the clinically relevant probability range of 0.10–0.60.

Decision curve analysis chart showing net benefit vs. threshold probability for different cohorts.
Figure 5: Decision curve analysis (DCA) of the malnutrition prediction model. Decision curve analysis was performed to evaluate the clinical utility of the malnutrition prediction model. The x-axis represents the threshold probability for identifying a patient as being at risk of malnutrition (range: 0.0–1.0), and the y-axis represents net benefit. Net benefit was calculated as follows: (true positives / N) – (false positives / N) × (threshold / [1 – threshold]). The model net benefit is shown for the training (blue) and validation (red) cohorts and compared with two reference strategies: “treat all” (gray) and “treat none” (black). Positive net benefit was observed across threshold probabilities of 0.10–0.60, supporting the clinical utility of the model for guiding nutritional intervention decisions.

At a threshold probability of 0.20, the net benefit was 0.42 in the training cohort and 0.39 in the validation cohort. At a threshold probability of 0.40, the net benefit was 0.38 and 0.35 in the training and validation cohorts, respectively. Net benefit decreased substantially at threshold probabilities >0.60, and greater fluctuation was observed within the validation cohort at threshold probabilities between 0.80 and 1.00.

Overall, the model demonstrated clinically meaningful net benefit within threshold probabilities of 0.10–0.60, supporting its potential utility for guiding nutritional risk intervention in patients with Crohn’s disease.
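The net benefit formula given in the Figure 5 caption can be written out as a short sketch. The function names and toy data are illustrative, and classifying a patient as at risk when the predicted probability is greater than or equal to the threshold is an assumed convention. The "treat all" reference follows from the same formula when every patient is classified as positive, and "treat none" has a net benefit of zero by definition.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/N - FP/N * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Assumed convention: predicted probability >= threshold means "at risk".
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy in which every patient is treated as at risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (not patient data).
y_toy = [1, 1, 0, 0]
p_toy = [0.9, 0.8, 0.3, 0.1]

for pt in (0.2, 0.5):
    print(pt, net_benefit(y_toy, p_toy, pt), net_benefit_treat_all(y_toy, pt))
```

Because true positives cannot exceed the number of malnourished patients, net benefit is bounded above by the cohort prevalence, which provides a quick plausibility check on reported values.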

DATA AVAILABILITY:
Key characteristics of all 17 included studies, including author, publication year, country, sample size, study design, and malnutrition diagnostic criteria, are summarized in Supplementary Table 1. Supplementary Table 2 contains the complete “Basic Study Information” worksheet used for data extraction, including study characteristics, study design, quality assessment scores, extracted effect sizes, and raw meta-analysis data for all included studies. The datasets generated and analyzed during the current study are provided in Supplementary File 3.

Supplementary Table 1: Characteristics of the 17 included studies in the meta-analysis. Supplementary Table 1 summarizes the characteristics of the 17 studies included in the meta-analysis. For each study, the table reports the first author, publication year, country, study design, total sample size (N_Total), number of malnourished patients (N_Malnourished), malnutrition diagnostic criteria applied (e.g., MUST score ≥2, BMI <18.5 kg/m², GLIM criteria, ESPEN criteria, or SGA Grade B/C), and reported malnutrition prevalence presented as effect size with corresponding 95% confidence interval. The studies represent diverse geographic regions and clinical settings and, collectively, provide the evidence base for predictive variable selection in the malnutrition risk model for Crohn’s disease.

Supplementary Table 2: Basic Study Information worksheet for included studies in the meta-analysis. Supplementary Table 2 contains the complete “Basic Study Information” worksheet used for data extraction. The worksheet includes study characteristics, study design, quality assessment scores, extracted effect sizes, and raw meta-analysis data for all included studies.

Supplementary File 3: Datasets used in the present study.

Discussion

Crohn’s disease (CD) is a chronic, recurrent inflammatory bowel disease that can affect any part of the gastrointestinal tract1,35,36,37. The etiology of CD remains unclear and is thought to involve multiple factors, including genetic susceptibility, environmental exposures, and immune dysregulation. Common clinical manifestations include abdominal pain, diarrhea, weight loss, and fistula formation, among which malnutrition is one of the most frequent complications in patients with CD38,39,40. Persistent intestinal inflammation often impairs intestinal absorption41, while reduced appetite and increased nutrient loss further contribute to the high prevalence of malnutrition in this population. Studies using different malnutrition diagnostic criteria, including GLIM, ESPEN, and MUST, have reported malnutrition prevalence rates ranging from 30% to 80% in patients with CD. In the present study cohort, where malnutrition was assessed exclusively using the ESPEN 2015 criteria, the prevalence of malnutrition was 42.5% in the training cohort and 40.4% in the validation cohort. Malnutrition may further aggravate disease severity, increase the risk of infection and surgical complications, and negatively affect quality of life and prognosis42,43. Although the pooled prevalence from the meta-analysis was 32.3%, this lower estimate likely reflects heterogeneity in diagnostic definitions rather than a lower disease burden; therefore, direct comparisons between the pooled and cohort prevalence estimates should be interpreted cautiously. Early identification of malnutrition risk and timely implementation of nutritional intervention are therefore critical for improving clinical outcomes in patients with CD44.

Previous studies have investigated multiple approaches for predicting malnutrition risk in patients with CD45,46. Some studies have focused on single predictive indicators, such as serum albumin levels, which have been associated with malnutrition risk in patients with CD47,48. Other studies have examined disease activity as a predictor of nutritional deterioration, reporting that patients with active disease are more likely to develop malnutrition49,50. In addition, several small-sample studies have attempted to construct prediction models incorporating multiple variables. However, these studies were often limited by insufficient predictor selection, incomplete integration of multidimensional factors such as inflammatory markers and demographic characteristics, and inadequate validation strategies51,52,53,54. Many previously reported models relied exclusively on internal validation and lacked independent validation cohorts, thereby limiting confidence in model stability and clinical applicability55,56. Furthermore, some models were difficult to apply clinically because of computational complexity and limited interpretability57.

Based on previous findings, the present study identified five key predictors of malnutrition in patients with CD through meta-analysis, including CRP, lesion location, history of intestinal resection, disease activity, and weight loss, and subsequently constructed a prediction model using multivariable logistic regression58,59,60,61. The model demonstrated strong predictive performance, with area under the curve (AUC) values of 0.987 and 0.967 in the training and validation cohorts, respectively. Calibration analysis demonstrated close agreement between predicted and observed risk probabilities, while decision curve analysis (DCA) further confirmed meaningful clinical net benefit across a broad range of threshold probabilities.

Compared with previously published models, the prediction model developed in this study demonstrates several advantages62,63,64. First, the predictors selected through systematic meta-analysis encompass multiple clinically relevant dimensions, including inflammatory status reflected by CRP, disease characteristics reflected by lesion location and history of intestinal resection, and nutritional status reflected by weight loss. Although weight loss and serum albumin are closely associated with nutritional status, these variables were not retained in the final model because of collinearity or incomplete availability within the meta-analysis dataset. Integration of multidimensional predictors enables more comprehensive assessment of malnutrition risk than single-indicator or small-sample prediction models, thereby improving overall predictive performance. Second, the model demonstrated stable performance across both training and validation cohorts, supporting its robustness and reducing the likelihood of overfitting associated with single-cohort model development. Third, the nomogram-based visualization of the model enhances clinical practicality by enabling rapid and intuitive estimation of malnutrition risk without requiring complex calculations. In addition, DCA confirmed that application of the model may provide meaningful clinical benefit for nutritional intervention decision-making.

From a clinical perspective, this model may facilitate early identification of patients at high risk of malnutrition and support implementation of individualized nutritional interventions. For example, patients with total risk scores >120 points may benefit from prioritized nutritional support, closer monitoring of nutritional indicators, and individualized dietary intervention strategies to reduce malnutrition-related complications and improve prognosis. Application of this model may therefore improve the timeliness and precision of nutritional management in patients with CD.

Despite the excellent discriminatory performance observed in both cohorts (AUC >0.96), several limitations should be acknowledged. First, the validation cohort consisted of an independent non-overlapping subset derived from the same IBDCD database as the training cohort, which may limit assessment of model generalizability across different institutions and clinical practice settings. External validation using fully independent multicenter datasets would provide stronger evidence of model generalizability. Second, predictive variables were selected using the same meta-analysis that informed the model coefficients, potentially introducing circularity and optimism bias. Although bootstrap resampling and hold-out validation were performed to reduce this risk, the possibility of inflated performance estimates cannot be fully excluded. Third, the exceptionally high AUC observed in the training cohort (0.987) may indicate partial overfitting to characteristics of the development dataset. Although the validation cohort demonstrated similarly strong performance, the relatively small difference between training and validation AUC values does not entirely eliminate concern regarding optimism bias. To mitigate overfitting, regularization strategies and ensemble machine learning methods with cross-validated hyperparameter tuning were applied. Nevertheless, further validation using multicenter and multi-ethnic datasets remains necessary to confirm robustness and recalibrate the model if required.

Additional limitations should also be considered. The study population was derived primarily from selected databases and hospitals, which may introduce regional bias and limit applicability to other populations. In addition, certain potentially relevant confounding factors, including dietary habits and socioeconomic status, were not incorporated into the model because of incomplete data availability. Finally, the current model represents a static prediction framework and does not account for dynamic temporal changes in predictors during disease progression, which may affect predictive accuracy over time. Development of dynamic nutritional risk prediction models with longitudinal optimization capability will therefore be an important direction for future research.

Advanced multimodal analytical approaches have been shown to improve predictive performance in chronic inflammatory diseases by capturing nonlinear variable relationships and immune-modulating effects that may not be identified by traditional regression models65,66. The integrated modeling strategy combining logistic regression and machine learning used in the present study is consistent with contemporary recommendations for complex clinical prediction modeling, balancing predictive performance with model interpretability67. Future studies should focus on several key directions. First, multicenter validation studies should be conducted to further evaluate model applicability across broader patient populations. Second, additional biomarkers, including intestinal microbiota profiles and inflammatory mediators, should be investigated to further improve predictive accuracy. Third, development of dynamic nutritional risk prediction models incorporating longitudinal follow-up data may enable real-time updating of malnutrition risk during disease progression. Fourth, prospective intervention studies are needed to determine whether model-guided nutritional intervention can reduce malnutrition incidence and improve clinical outcomes. Future work should also explore integration of genetic biomarkers and longitudinal immune factor data into time-dependent prediction frameworks for patients with CD68.

Disclosures

The authors declare no conflicts of interest.

Materials

List of materials used in this article
| Name | Company | Catalog Number | Comments |
| --- | --- | --- | --- |
| RevMan software | Cochrane Collaboration | 5.4 | Meta-analysis software developed by the Cochrane Collaboration, used mainly for preparing systematic reviews and meta-analyses, data entry, and visualization of results. |
| R software | The R Project for Statistical Computing | 4.2.1 | R is an implementation of the S language, an interpreted language developed at AT&T Bell Labs around 1980 and widely used in statistics for data exploration, statistical analysis, and graphing. |

Tags

Malnutrition Risk, Crohn’s Disease, Risk Prediction Model, Machine Learning, Logistic Regression, Inflammatory Markers, ESPEN Criteria, GLIM Criteria, Nutritional Intervention, Model Validation
