April 18th, 2025
This study evaluates prognostic systems for colorectal signet-ring cell carcinoma patients using machine learning models and competing risk analyses. It identifies log odds of positive lymph nodes as a superior predictor compared to pN staging, demonstrating strong predictive performance and aiding clinical decision-making through robust survival prediction tools.
- Our research evaluates three lymph node staging systems in colorectal signet ring cell carcinoma using machine learning and competing risk models to optimize prognostic accuracy and survival prediction.
Bioinformatics methods, including machine learning, competing risk models, and Kaplan-Meier survival estimation, are used to enhance survival prediction and lymph node classification accuracy.
Next steps include extending follow-up periods, validating in diverse populations, refining prognostic nomograms, and exploring molecular traits of colorectal signet ring cell carcinoma to enhance clinical decision-making tools.
[Narrator] To begin, download and install the SEER*Stat 8.4.3 software from the SEER database website. Log into the software and click on Case Listing Session, followed by Data, and select the Incidence - SEER Research Plus Data, 17 Registries, Nov 2022 Sub (2000-2020) database. Now, click on Selection, followed by Edit, and choose race, sex, and year of diagnosis equal to 2004 through 2015. Then select Site recode ICD-O-3/WHO 2008. Click on Table and, in the available variables interface, select all the diagnosis details required. Then click on Output. Name the data and click on Execute to output and save the data.
Next, open the X-tile software, click on File, and choose Open. Select the data file to import it into the software. Once the data is loaded, map the variables: Censor for survival status, Survival Time for survival time, and Marker1 for the variable to be analyzed, ensuring the data matches correctly. Now click on Do, followed by Kaplan-Meier and Marker1 to perform the Kaplan-Meier survival analysis and generate the survival curve.
Then randomly assign the 2,409 eligible patients with SRCC to a training cohort (n = 1,686) and a validation cohort (n = 723) in a 7:3 ratio. Use the provided code for random splitting. Download and install the required versions of RStudio and R software. Click on New File and select R Script to create a new R programming interface. Then enter the relevant code in the code editor and click on Run to execute the code.
Use the provided code to screen the variables included in the machine learning models by Cox regression analysis. Additionally, explore the impact of LODDS, LNR, and pN staging on cancer-specific survival in SRCC patients. Use the code to compare the prognostic prediction abilities of the three lymph node systems (LODDS, LNR, and pN staging) across the training, validation, and external validation cohorts.
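The random split and the two ratio-based lymph node measures can be illustrated with a short sketch. This is not the code provided with the protocol, which is written in R; it is a minimal Python illustration. It assumes the commonly used definitions LNR = positive nodes / examined nodes and LODDS = ln((positive + 0.5) / (negative + 0.5)). The function names, the seed, and the use of the natural logarithm are assumptions made for this example.

```python
import math
import random


def lnr(positive_nodes, examined_nodes):
    """Lymph node ratio: positive nodes divided by examined nodes."""
    if examined_nodes <= 0:
        raise ValueError("examined_nodes must be positive")
    return positive_nodes / examined_nodes


def lodds(positive_nodes, examined_nodes):
    """Log odds of positive lymph nodes.

    Assumes the common 0.5 continuity correction and the natural log,
    so the value stays finite when no node (or every node) is positive.
    """
    negative_nodes = examined_nodes - positive_nodes
    if positive_nodes < 0 or negative_nodes < 0:
        raise ValueError("node counts are inconsistent")
    return math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))


def split_cohort(patient_ids, train_fraction=0.7, seed=2025):
    """Shuffle patient IDs reproducibly and split them into two cohorts."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # the seed value is an arbitrary choice
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 2,409 eligible patients split in a 7:3 ratio
train_ids, validation_ids = split_cohort(range(2409))
print(len(train_ids), len(validation_ids))  # 1686 723

# Hypothetical patient with 3 positive nodes out of 12 examined
print(lnr(3, 12))  # 0.25
print(round(lodds(3, 12), 3))
```

With 2,409 patients, rounding 70% gives the 1,686 and 723 cohort sizes stated in the protocol. Which patients land in each cohort depends on the seed.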
Then use the code to build an XGBoost model and generate bar graphs representing the relative importance of variables. Generate receiver operating characteristic curves and calibration curves to assess the performance of the three lymph node systems. Next, employ the code to build a random forest model and generate bar graphs of the relative importance of variables. Similarly, generate receiver operating characteristic curves and calibration curves to evaluate and compare the three lymph node systems. With the appropriate code, build a neural network model and produce bar graphs of the relative importance of variables. Generate receiver operating characteristic and calibration curves to compare the predictive performance of the three lymph node systems.
Then, perform univariate analysis and plot the cumulative incidence function curve using the data.csv file. Replace site with other factors to perform univariate analysis for each factor. For multivariate analysis, apply the code and visualize with data1.csv. Finally, plot the nomogram, receiver operating characteristic curve, and calibration curve. Train the model using data from the training cohort and use validation and external validation cohort data to validate the model.
Based on multivariate Cox regression analysis, LNR, LODDS, and pN staging were all significantly associated with cancer-specific survival in SRCC patients. LNR showed the highest importance in the RF and XGBoost models, while LODDS had the greatest predictive ability in the NN model, suggesting LODDS as the most reliable LN system overall. The XGBoost, RF, and NN models achieved high predictive accuracy, with AUC values ranging from 0.777 to 0.851 and calibration curves that aligned closely with the 45-degree line, confirming model reliability. Competing risk model analysis identified T staging, N staging, M staging, LODDS classification, and primary tumor location as independent prognostic factors.
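The cumulative incidence function plotted in the univariate competing risk step can be illustrated with a nonparametric estimate. The sketch below is Python, not the R code provided with the protocol. It assumes an event coding of 0 for censored, 1 for cancer-specific death, and 2 for death from other causes. The function name and the toy data are hypothetical.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause under competing risks.

    times  : follow-up time for each patient
    events : 0 = censored, any other integer = cause of the event
    Returns (time, CIF) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    overall_survival = 1.0  # event-free survival (any cause) just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_at_t = 0
        # Gather everyone whose follow-up ends exactly at time t
        while i < len(data) and data[i][0] == t:
            code = data[i][1]
            n_at_t += 1
            if code != 0:
                d_any += 1
            if code == cause:
                d_cause += 1
            i += 1
        if d_any > 0:
            # Increment: P(event-free before t) * cause-specific hazard at t
            cif += overall_survival * d_cause / n_at_risk
            overall_survival *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_at_t
    return curve


# Toy data (hypothetical): months of follow-up and event codes
toy_times = [2, 3, 3, 5, 8, 9]
toy_events = [1, 2, 0, 1, 0, 1]
cif_cancer = cumulative_incidence(toy_times, toy_events, cause=1)
cif_other = cumulative_incidence(toy_times, toy_events, cause=2)
print(round(cif_cancer[-1][1], 3))  # 0.833
print(round(cif_other[-1][1], 3))   # 0.167
```

Unlike one minus a Kaplan-Meier estimate computed with competing deaths treated as censored, this estimate keeps the incidences of all causes summing to at most one. The multivariate step in the protocol uses a regression model, which this sketch does not cover.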
The competing risk nomogram demonstrated accurate 1-, 3-, and 5-year cancer-specific survival predictions, supported by well-aligned calibration curves and ROC curves with AUCs above 0.75.
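The AUC values reported above summarize how well a model ranks patients. A minimal Python sketch of the plain binary AUC follows. It assumes the outcome at a fixed time point is known for every patient, so it ignores censoring and is simpler than the time-dependent ROC analysis a survival model would normally need. The function name and the toy scores are hypothetical, and this is not the code provided with the protocol.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted risks (hypothetical) and observed outcomes (1 = event)
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(round(auc(toy_scores, toy_labels), 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values above 0.75 indicate that the model usually ranks patients who experienced the event above those who did not.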