Establishing a Competing Risk Regression Nomogram Model for Survival Data

Lunpo Wu; Chenyang Ge; Hongjuan Zheng; Haiping Lin; Wei Fu; Jianfei Fu

doi:10.3791/60684

Cancer Research

Published: October 23, 2020 doi: 10.3791/60684

Lunpo Wu¹,², Chenyang Ge³, Hongjuan Zheng⁴, Haiping Lin⁵, Wei Fu⁶, Jianfei Fu⁴

¹Department of Gastroenterology, Second Affiliated Hospital of Zhejiang University School of Medicine, ²Institute of Gastroenterology, Zhejiang University, ³Department of Colorectal Surgery, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, ⁴Department of Medical Oncology, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, ⁵Department of Hepatobiliary Pancreatic Surgery, Affiliated Jinhua Hospital, Zhejiang University School of Medicine, ⁶Division of Oncology, Johns Hopkins University School of Medicine

Summary

Presented here is a protocol to build nomograms based on the Cox proportional hazards regression model and the competing risk regression model. The competing risk method is the more rational choice when competing events are present in the survival analysis.

Abstract

The Kaplan–Meier method and the Cox proportional hazards regression model are the most common analyses in the survival framework. They are relatively easy to apply and interpret and can be depicted visually. However, when competing events (e.g., cardiovascular and cerebrovascular accidents, treatment-related deaths, traffic accidents) are present, the standard survival methods should be applied with caution; otherwise, real-world data may be interpreted incorrectly. It may be desirable to distinguish the different kinds of events that may lead to failure and to treat them differently in the analysis. Here, the methods focus on using the competing risk regression model to identify significant prognostic factors or risk factors when competing events are present. Additionally, nomograms based on a proportional hazards regression model and a competing risk regression model are established to help clinicians make individual assessments and risk stratifications in order to explain the impact of controversial factors on prognosis.

Introduction

Time-to-event survival analysis is common in clinical studies. Survival data measure the time span from a start time until the occurrence of the event of interest, but that event is often precluded by another event. If more than one type of endpoint is present, they are called competing risks endpoints. In this case, the standard hazard analysis (i.e., the Cox proportional cause-specific hazards model) often does not work well, because individuals who experience another type of event are censored. In the alternative approach, individuals who experience a competing event remain in the risk set, as competing risks are usually not independent. Therefore, Fine and Gray¹ studied regression model estimation for the subdistribution of a competing risk. In a competing risk setting, three different types of events can be discriminated.

One endpoint is overall survival (OS), which demonstrates a direct clinical benefit from new treatment methods for a disease. OS measures the survival time from the time of origin (i.e., time of diagnosis or treatment) to the time of death due to any cause. It generally evaluates the absolute risk of death and therefore fails to differentiate the causes of death (e.g., cancer-specific death (CSD) or non-cancer-specific death (non-CSD))². OS is generally considered the most important endpoint. The events of interest are often cancer related, while non-cancer-specific events, which include heart disease, traffic accidents, or other unrelated causes, are considered competing events. Patients with malignancies and a favorable prognosis, who are expected to survive longer, are often at greater risk of non-CSD. That is, OS will be diluted by other causes of death and fail to reflect the real effectiveness of clinical treatment. Therefore, OS may not be the optimal measure for assessing the outcomes of disease³. Such biases can be corrected by the competing risk regression model.

There are two main methods for competing risk data: cause-specific hazard models (Cox models) and subdistribution hazard models (competing risk models). In the following protocol, we present two methods to generate nomograms, one based on the cause-specific hazard model and one based on the subdistribution hazard model. The cause-specific hazard model can be fitted as a Cox proportional hazards model, which treats subjects who experience the competing event as censored at the time that the competing event occurred. In the subdistribution hazard model, introduced by Fine and Gray¹ in 1999, three different types of events can be discriminated, and individuals who experience a competing event remain in the risk set indefinitely.
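For reference, the two hazards can be written as follows. The notation is introduced here for illustration and is not taken from the protocol: T is the time to the first event, D is the event type, k is the cause of interest, and F_k is the cumulative incidence function (CIF) for cause k.

```latex
\begin{aligned}
\lambda_k^{\mathrm{cs}}(t) &= \lim_{\Delta t \to 0}
  \frac{P\left(t \le T < t + \Delta t,\ D = k \mid T \ge t\right)}{\Delta t} \\
\lambda_k^{\mathrm{sd}}(t) &= \lim_{\Delta t \to 0}
  \frac{P\left(t \le T < t + \Delta t,\ D = k \mid T \ge t \ \text{or}\ (T < t,\ D \ne k)\right)}{\Delta t} \\
F_k(t) &= P(T \le t,\ D = k) = 1 - \exp\left(-\int_0^t \lambda_k^{\mathrm{sd}}(s)\, ds\right)
\end{aligned}
```

The only difference between the two definitions is the conditioning set: the subdistribution hazard keeps subjects who have already failed from another cause in the risk set, which is what links it directly to the CIF in the third line.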

A nomogram is a mathematical representation of the relationship between three or more variables⁴. Medical nomograms take biological and clinical factors as variables (e.g., tumor grade and patient age) and generate the probability of a clinical event (e.g., cancer recurrence or death), graphically depicted as a statistical prognostic model for a given individual. Generally, a nomogram is formulated based on the results of the Cox proportional hazards model⁵,⁶,⁷,⁸,⁹,¹⁰.

However, when competing risks are present, a nomogram based on the Cox model might fail to perform well. Though several previous studies¹¹,¹²,¹³,¹⁴ have applied the competing risk nomogram to estimate the probability of CSD, few studies have described how to establish the nomogram based on a competing risk regression model, and there is no existing package available to accomplish this. Therefore, the method presented below will provide a step-by-step protocol to establish a specific competing-risk nomogram based on a competing risk regression model as well as a risk score estimation to aid clinicians in treatment decision-making.

Protocol

The research protocol was approved by the Ethics Committee of Jinhua Hospital, Zhejiang University School of Medicine. For this experiment, the cases were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. SEER is an open-access database that includes demographic, incidence and survival data from 18 population-based cancer registries. We registered on the SEER website and signed a letter of assurance to acquire the research data (12296-Nov2018).

1. Data source

Obtain cases from the databases as well as permission (if any) to use the cases from the registries.
NOTE: The cohort data are uploaded in Supplementary File 1. Readers who already have survival data with competing risks can skip this section.

2. Installing and loading packages and importing data

NOTE: Perform the following procedures based on R software (version 3.5.3) using the packages rms¹⁵ and cmprsk¹⁶ (http://www.r-project.org/).

Install rms and cmprsk R packages.
> install.packages("rms")
> install.packages("cmprsk")
Load the R packages.
> library("rms")
> library("cmprsk")
Import the cohort data.
> Dataset <- read.csv("…/Cohort Data.csv") # cohort data is the example

3. Nomogram based on the Cox Proportional Hazards Regression model

Establish the Cox Proportional Hazards Regression model.
NOTE: The independent variables (X) include categorical variables (dummy variables, such as race) and continuous variables (such as age). The factors significant in the univariable analysis will be selected for the use in multivariable analysis.
1. Fit the Cox proportional hazards model to the data. Establish the Cox proportional hazards regression model using the function cph. The simplified format in R is shown below:
  > dd <- datadist(Dataset); options(datadist="dd") # rms needs variable ranges before nomogram()
  > f0 <- cph(Surv(Survivalmonths, status) ~ factor1 + factor2 + …,
  x=TRUE, y=TRUE, surv=TRUE, data=Dataset)
  NOTE: Death was set as the status in the example code.
Develop a Cox Regression Nomogram using the commands detailed below.
> surv <- Survival(f0) # survival-probability function used inside fun
> nom <- nomogram(f0, fun=list(function(x) surv(24, x)…), funlabel=c("2-year predicted survival rate"…), maxscale=100, fun.at=…)
> plot(nom)
NOTE: Take the 2-year predicted survival rate as an example.

4. Nomogram based on the Competing Risk Regression Model

Establish the Competing Risk Regression Model.
1. Fit the competing risk regression model. Readers may include the factors that they consider important, in which case the univariable screening can be skipped. In the example, the factors significant in the univariable analysis are included.
  NOTE: The status variable is coded as 1 for the event of interest and as 2 for the competing risk event; crr treats 0 as censored by default. To facilitate the analysis, Scrucca et al.¹⁷ provide an R function factor2ind(), which creates a matrix of indicator variables from a factor.
2. For categorical variables, carefully code them numerically when including them in the competing model. That is, for a categorical variable made of J levels, create J-1 dummy variables or indicator variables.
3. To establish a competing risk regression model, first place the prognostic variables into a matrix. Use the function cbind() to concatenate the variables by columns and fit them into the competing risk regression model.
  > x <- cbind(factor2ind(factor1, "1"), factor2ind(factor2, "1")…)
  > mod <- crr(Survivalmonths, fstatus, failcode=1, cov1=x) # use failcode=2 to model the competing event instead
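If factor2ind() from Scrucca et al. is not at hand, the indicator coding described in steps 4.1.2 and 4.1.3 can be approximated in base R. The sketch below is only a stand-in: the function name make_indicators, its arguments, and the toy vectors are hypothetical, and the "name:level" column labels are chosen to match the names used later in the protocol (e.g., "race:2").

```r
# Hypothetical stand-in for factor2ind(): builds J-1 indicator columns
# for a J-level categorical variable, dropping the chosen baseline level.
make_indicators <- function(x, baseline, name) {
  f <- relevel(factor(x), ref = as.character(baseline))
  m <- model.matrix(~ f)[, -1, drop = FALSE]   # drop the intercept column
  colnames(m) <- paste0(name, ":", levels(f)[-1])
  rownames(m) <- NULL
  m
}

# Toy data (assumption): two categorical covariates with three levels each
race  <- c(1, 2, 3, 2, 1, 3)
marry <- c(1, 1, 2, 3, 2, 3)

# Concatenate the indicator blocks by column, as in step 4.1.3
x <- cbind(make_indicators(race, "1", "race"),
           make_indicators(marry, "1", "marry"))
print(x)
```

The resulting matrix has one row per patient and J-1 columns per variable, which is the shape expected by the cov1 argument of crr.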
Plot the competing nomogram
NOTE: The beta value (β value) is the regression coefficient of a covariate (X) in the formula of the Cox proportional hazards regression. The X.score (comprehensive effect of the dependent variable) and X.real (the prediction at a specific time point, for example, the cumulative incidence function at 60 months) are first calculated from the Cox regression model, and a nomogram is then established.
1. Use the function nomogram to construct the Cox nomogram object nom (as listed in step 3.2).
2. Replace X.beta and X.point as well as total.points, X.real, and X.score of the competing risk regression model.
  1. Get the baseline cif, that is cif(min). See Supplementary file 2 for details.
    > x0=x
    > x0 <- as.matrix(x0)
    > lhat <- matrix(0, nrow = length(mod$uftime), ncol = nrow(x0))
    > for (j in 1:nrow(x0)) lhat[, j] <- cumsum(exp(sum(x0[j, ] * mod$coef)) * mod$bfitj)
    > lhat <- cbind(mod$uftime, 1 - exp(-lhat))
    > suv<-as.data.frame(lhat)
    > colnames(suv)[1] <- "time" # first column holds the unique failure times
    > line24 <- which(suv$time == 24)
    > cif.min24 <- min(suv[line24, -1]) # smallest predicted CIF at 24 months
  2. Replace the X.beta and X.point.
    > lmaxbeta<-which.max(abs(mod$coef))
    > maxbeta<-abs(mod$coef[lmaxbeta])
    > race0<-0
    > names(race0)<-"race:1"
    > race.beta<-c(race0,mod$coef[c("race:2","race:3")])
    > race.beta.min<-race.beta[which.min(race.beta)]
    > race.beta1<-race.beta-race.beta.min
    > race.scale<-(race.beta1/maxbeta*100) # how the scale is calculated
    > nom$Race$Xbeta<-race.beta1
    > nom$Race$points<-race.scale
    NOTE: Take race as an example.
  3. Replace the total X.point and X.real.
    > nom$total.points$x<-c(0,50,100, …)
    > real.2y<-c(0.01,0.1,0.2,…)
    NOTE: Replacements are according to the minimax value.
  4. Calculate the X.score and plot the nomogram.
    > score.2y<-log(log((1-real.2y),(1-cif.min24)))/(maxbeta/100)
    > nom$`2-year survival`$x<-score.2y
    > nom$`2-year survival`$x.real<-real.2y
    > nom$`2-year survival`$fat<-as.character(real.2y)
    > plot(nom)
    NOTE: X.score = log(log((1-X.real), (1-cif0)))/(maxbeta/100), where log(a, b) is the logarithm of a to base b. This relationship between X.score and X.real follows from the intrinsic properties of the competing risk model (crr). cif0 is the baseline CIF, which can be calculated by the predict.crr function.
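The point scaling in step 4.2.2 and the X.score formula in step 4.2.4 can be checked numerically. The following base-R sketch assumes the Fine and Gray relationship 1 - CIF(t | x) = (1 - CIF0(t))^exp(xβ). All numbers (beta, maxbeta, cif0) are made-up illustrations, not estimates from the cohort, and the helper names points_from_cif and cif_from_points are hypothetical.

```r
# Hypothetical coefficients for one variable; the reference level has beta = 0
beta <- c("race:1" = 0, "race:2" = 0.62, "race:3" = -0.08)
maxbeta <- 2.0    # assumed largest absolute coefficient in the whole model
cif0 <- 0.004     # assumed 2-year CIF for a patient with zero total points

# Shift so the lowest level scores 0, then scale so that maxbeta maps to 100
points <- (beta - min(beta)) / maxbeta * 100
print(points)

# Total points -> CIF, using 1 - CIF = (1 - cif0)^exp(linear predictor)
cif_from_points <- function(total, cif0, maxbeta) {
  1 - (1 - cif0)^exp(total * maxbeta / 100)
}

# CIF -> total points; this is the X.score formula from the protocol
points_from_cif <- function(cif, cif0, maxbeta) {
  log(log(1 - cif, base = 1 - cif0)) / (maxbeta / 100)
}

real.2y  <- c(0.01, 0.1, 0.2)
score.2y <- points_from_cif(real.2y, cif0, maxbeta)
back     <- cif_from_points(score.2y, cif0, maxbeta)
print(cbind(real.2y, score.2y, back))
```

Mapping the scores back through cif_from_points should reproduce real.2y, which is a quick way to confirm that the axis positions written into nom are consistent with the fitted model.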

5. Subgroup analysis based on the Group Risk Score (GRS)

Calculate the risk score (RS)
NOTE: Calculate the risk score for each patient by totalling the points of every variable. Cut-off values are used to classify the cohort. Taking 3 subgroups as an example, use the package meta to draw a forest plot.
1. Install and load the R packages
  > install.packages("meta")
  > library("meta")
2. Obtain the GRS and divide the cohort into 3 subgroups.
  > d1<-Dataset
  > d1$X<-nom$X$points
  > #For example, d1$race[d1$race==1]<-nom$race$point[1]
  > d1$RS<-d1$race + d1$marry + d1$histology + d1$grademodify + d1$Tclassification + d1$Nclassification
  > d1$GRS<- cut(d1$RS, quantile(d1$RS, seq(0, 1,1/3)), include.lowest = TRUE, labels = 1:3)
3. Draw the forest plot. Get the HR, LCI and UCI via the function crr.
  > subgroup<-crr(ftime, fstatus, cov1, failcode=1)
  > HR<- summary(subgroup)$conf.int[1]
  > LCI<- summary(subgroup)$conf.int[3]
  > UCI<- summary(subgroup)$conf.int[4]
  > LABxx<-c("Low Risk", "Medium Risk", "High Risk")
  > xx<-metagen(log(HR), lower = log(LCI), upper = log(UCI), studlab = LABxx, sm = "HR")
  > forest(xx, col.square = "black", hetstat =TRUE, leftcols = "studlab")
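The grouping in step 5.2 and the interval calculation behind step 5.3 can be tried on toy numbers before running them on real data. The sketch below uses base R only; the simulated risk scores and the coefficient and standard error are hypothetical values chosen for illustration.

```r
# Toy risk scores (assumption): in practice RS is the sum of nomogram points per patient
set.seed(1)
RS <- round(runif(300, min = 0, max = 299))

# Tertile cut-offs, mirroring the cut() call in step 5.2
breaks <- quantile(RS, seq(0, 1, 1/3))
GRS <- cut(RS, breaks, include.lowest = TRUE,
           labels = c("Low Risk", "Medium Risk", "High Risk"))
print(table(GRS))
print(tapply(RS, GRS, range))

# Hazard ratio and 95% CI from a log-hazard-ratio and its standard error
# (hypothetical values standing in for one subgroup's crr fit)
coef_hat <- 0.35
se_hat   <- 0.12
HR  <- exp(coef_hat)
LCI <- exp(coef_hat - qnorm(0.975) * se_hat)
UCI <- exp(coef_hat + qnorm(0.975) * se_hat)
print(c(HR = HR, LCI = LCI, UCI = UCI))
```

Repeating the last part once per subgroup yields the three HR, LCI, and UCI values that the forest plot call expects as vectors.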

Representative Results

Survival characteristics of the example cohort
In the example cohort, a total of 8,550 eligible patients were included in the analysis, and the median follow-up time was 88 months (range, 1 to 95 months). A total of 679 (7.94%) patients were younger than 40 years old and 7,871 (92.06%) patients were older than 40. At the end of follow-up, 7,483 (87.52%) patients were still alive, 662 (7.74%) had died of breast cancer, and 405 (4.74%) had died of other causes (competing risks).

Comparison of two survival models
The cumulative incidences of tumor death/no tumor death and competing events were calculated by the Kaplan-Meier method and the competing risk regression function, respectively (Figure 1). As shown in Figure 1, the sum of the cumulative incidences of tumor death and no tumor death calculated by the Kaplan-Meier method was higher than the estimate for all causes of death, which was equal to the sum of the cumulative incidences of CSD and non-CSD when the competing risk method was used. Clearly, the Kaplan-Meier method overestimated the cumulative incidences of tumor death and no tumor death. The competing risk method corrects this overestimation of the probability of death.
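The overestimation described above can be reproduced on simulated data with base R alone. The sketch below is an illustration under stated assumptions, not the analysis behind Figure 1: event times come from two independent exponential distributions with arbitrary rates, censoring is uniform, and the cumulative incidence is computed by hand with the standard nonparametric estimator.

```r
set.seed(42)
n <- 500
t1   <- rexp(n, rate = 0.02)          # latent time to the event of interest (assumed)
t2   <- rexp(n, rate = 0.01)          # latent time to the competing event (assumed)
cens <- runif(n, min = 0, max = 95)   # administrative censoring (assumed)
time   <- pmin(t1, t2, cens)
status <- ifelse(time == cens, 0, ifelse(time == t1, 1, 2))

# Risk-set counts at each distinct failure time
ut     <- sort(unique(time[status != 0]))
n_risk <- sapply(ut, function(t) sum(time >= t))
d1     <- sapply(ut, function(t) sum(time == t & status == 1))
d2     <- sapply(ut, function(t) sum(time == t & status == 2))

# Competing risk estimate: weight each cause-specific increment by the
# all-cause survival just before the failure time
S_all  <- cumprod(1 - (d1 + d2) / n_risk)
S_prev <- c(1, head(S_all, -1))
cif1   <- cumsum(S_prev * d1 / n_risk)
cif2   <- cumsum(S_prev * d2 / n_risk)

# Naive approach: 1 - Kaplan-Meier, treating the other cause as censored
km1 <- 1 - cumprod(1 - d1 / n_risk)
km2 <- 1 - cumprod(1 - d2 / n_risk)

last <- length(ut)
print(c(KM_sum = km1[last] + km2[last],
        CR_sum = cif1[last] + cif2[last],
        all_cause = 1 - S_all[last]))
```

In this setup the two competing risk estimates add up to the all-cause probability of death, whereas the two Kaplan-Meier complements add up to more than that.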

Nomogram based on the Cox proportional hazards regression model
A nomogram was constructed based on significant factors as shown in Figure 2A and Table 1. These included marital status, race, histological type, differentiation grade, T classification, and N classification.

Nomogram based on the competing risk regression model
A competing risk nomogram based on multiple factors, including race, marital status, histological type, differentiation grade, T classification, and N classification, was constructed (Figure 2B). The beta coefficients from the model were used to allocate the scale (Table 1).

Stratification analysis by the risk score
Based on the risk score, the cohort was classified into three subgroups: low risk score: 0-44; medium risk score: 45-85; and high-risk score: 86-299. The forest plot could clearly present the interaction between the GRS and the specific factor (age) (Figure 3). Based on the GRS classification, the worse prognosis of young women only appeared in the low-risk subgroup and young age may act as a protective factor of prognosis in medium- and high-risk subgroups.

Figure 1: Stacked cumulative incidence plot. K-M: Cumulative incidences based on Kaplan-Meier estimates; CR: Cumulative incidences based on cumulative incidence competing risk estimates; Tumor death + no tumor death (K-M): sum of estimates of the cumulative incidence of cancer specific death and non-cancer specific death; CSD + non-CSD (CR): sum of estimates of cancer-specific death and non-cancer-specific death when the CR method was used.

Figure 2: Nomograms of the Cox proportional hazards regression model and competing risk regression model. (A) Nomogram based on the Cox proportional hazards regression model. (B) Nomogram based on the competing risk regression model. To apply the nomograms, locate each individual risk factor on its variable axis and draw a line upwards to determine the points for that variable. Then, sum the points to obtain the probability of 2-, 3- and 5-year cancer-specific survival or the cumulative incidence function (CIF). Race: 1=white, 2=black, 3=other; Marital status: 1=married, 2=single (never married or domestic partner), 3=divorced (separated, divorced, widowed); Histological type: 1=infiltrative duct cancer, 2=infiltrative lobular cancer, 3=infiltrating duct and lobular carcinoma; Tumor grade: 1=well differentiated, 2=moderately differentiated, 3=poorly differentiated. T and N classification was according to the 7th AJCC TNM staging system.

Figure 3: Forest plot of stratification analysis by the risk score for the probability of breast cancer-specific death in younger and older women with breast cancer. HR: hazard ratio.

Variables	Score (Cox Model)	Estimated Probability		Score (Competing Model)	Estimated Probability
Race
1:White	10			4
2:Black	32			31
3:Other	0			0
Marital status
1:Married	0			0
2:Unmarried	9			5
3:Divorced	37			15
Histology
1:Adenocarcinoma	10			12
2:Mucinous adenocarcinoma	8			5
3:Signet ring cell carcinoma	0			0
Differentiation grade
1:Grade I	0			0
2:Grade II	6			36
3:Grade III	37			77
T classification^a
1:T1	0			0
2:T2	41			50
3:T3	59			68
4:T4	100			98
N classification^a
0:0	0			0
1:0-3	17			42
2:3-6	43			65
3:6-12	74			100

Total score (2-year Survival)	278	0.6	Total score (2-year CIF)	95	0.01
	254	0.7		233	0.1
	223	0.8		277	0.2
	173	0.9		305	0.3
	125	0.95		326	0.4
				344	0.5

Total score (3-year Survival)	281	0.4	Total score (3-year CIF)	62	0.01
	242	0.6		245	0.2
	218	0.7		293	0.4
	187	0.8		311	0.5
	137	0.9		328	0.6
	89	0.95		344	0.7

Total score (5-year Survival)	303	0.1	Total score (5-year CIF)	29	0.01
	279	0.2		212	0.2
	241	0.4		260	0.4
	203	0.6		295	0.6
	179	0.7		328	0.8
	148	0.8		349	0.9
	98	0.9
	50	0.95
^a T and N classification according to the 7th AJCC staging system. CIF: cumulative incidence function.

Table 1: Point assignment and prognostic score in the nomograms based on the Cox proportional hazards regression model and the competing risk regression model.

Supplemental File 1.

Supplemental File 2.

Supplemental File 3.

Discussion

The overall goal of the current study was to establish a specific competing-risk nomogram that could describe real-world diseases and to develop a convenient individual assessment model for clinicians to approach treatment decisions. Here, we provide a step-by-step tutorial for establishing nomograms based on the Cox regression model and the competing risk regression model and for further performing subgroup analysis. Zhang et al.¹⁸ introduced an approach to create a competing-risk nomogram, but the core concept of the methodology described here is entirely different. Zhang et al. first transformed the original data into weighted data with the crprep() function in the mstate package¹⁹ and then drew the nomogram with the rms package. In contrast, we replace the parameters generated by cph with the output of the function crr and then draw a competing-risk nomogram within the frame of the Cox nomogram. In this method, the Cox nomogram serves only as a frame.

Patients with malignancies and a favorable prognosis, who are expected to survive longer with cancer, are at greater risk of non-cancer-specific death. Their OS will be largely diluted by the incidence of non-CSD, as shown in Figure 1. Taking patients with stage II colon cancer¹³ as an example, if we take no account of the causes of death when generating all-cause mortality curves by the Kaplan-Meier method, such curves would be largely driven by the cumulative incidence of non-CSD rather than the cumulative incidence of CSD.

The standard Cox model for the assessment of covariates can lead to incorrect and biased results. For example, in stage II colon cancer¹³, chemotherapy appeared to be a protective factor for OS. This bias can be corrected by the competing risk regression method, especially for the oldest subgroup, in which chemotherapy is identified as a harmful factor for CSD. Non-CSD is a nonnegligible competing risk in patients with cancer, especially in those with a favorable prognosis.

After we established a nomogram, the probability of death associated with each variable was presented as a point on the nomogram. The risk score for each patient was calculated by totalling the points of all the variables. Based on the total score, we can further divide the cohort into three subgroups (low, medium, high) to stratify the impact of controversial factors on prognosis, which might help clinicians to solve clinical issues. Take the effect of age on breast cancer as an example²⁰. The impact of age on the outcomes of patients with early breast cancer has not been clinically established and is controversial. Based on the GRS classification, the worse prognosis of young women only appeared in the low-risk subgroup, and young age may act as a protective factor of prognosis in the medium- and high-risk subgroups.

In terms of limitations, the competing risk estimate might lead to over competition in some situations²¹. For example, diseases with poor prognosis (such as advanced malignant tumors or poorly differentiated pancreatic cancer) and great toxicities will inevitably have predominant effects on non-CSD. Whether the Cox model or the subdistribution proportional regression model (competing risk) should be applied in survival analysis should be carefully considered. Both non-CSD and over competition should be addressed carefully when survival is being estimated. Based on the results, we propose that for diseases with good prognosis and for older patients, the impact of non-CSD on OS should be carefully considered in future clinical trials. CSD, which is based on a competing risk model, may be an alternative endpoint instead of always using traditional OS.

In conclusion, we propose that not only malignant tumors with different prognoses but also the same disease at different stages might require an individual choice of an appropriate endpoint. Additionally, this methodology could be used to establish a nomogram based on the proper model (Cox or competing risk regression model) for quantifying risk, which can be further used for individualized guidance and to better explain clinical phenomena in clinical practice.

Disclosures

None

Acknowledgments

The study was supported by grants from the general program of Zhejiang Province Natural Science Foundation (grant number LY19H160020) and the key program of the Jinhua Municipal Science & Technology Bureau (grant numbers 2016-3-005, 2018-3-001d and 2019-3-013).

Materials

Name	Company	Catalog Number	Comments
no	no	no

References

1. Fine, J. P., Gray, R. J. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 94 (446), 496-509 (1999).
2. Fu, J., et al. Real-world impact of non-breast cancer-specific death on overall survival in resectable breast cancer. Cancer. 123 (13), 2432-2443 (2017).
3. Kim, H. T. Cumulative incidence in competing risks data and competing risks regression analysis. Clinical Cancer Research. 13 (2 Pt 1), 559-565 (2007).
4. Balachandran, V. P., Gonen, M., Smith, J. J., DeMatteo, R. P. Nomograms in oncology: more than meets the eye. Lancet Oncology. 16 (4), 173-180 (2015).
5. Han, D. S., et al. Nomogram predicting long-term survival after d2 gastrectomy for gastric cancer. Journal of Clinical Oncology. 30 (31), 3834-3840 (2012).
6. Karakiewicz, P. I., et al. Multi-institutional validation of a new renal cancer-specific survival nomogram. Journal of Clinical Oncology. 25 (11), 1316-1322 (2007).
7. Liang, W., et al. Development and validation of a nomogram for predicting survival in patients with resected non-small-cell lung cancer. Journal of Clinical Oncology. 33 (8), 861-869 (2015).
8. Valentini, V., et al. Nomograms for predicting local recurrence, distant metastases, and overall survival for patients with locally advanced rectal cancer on the basis of European randomized clinical trials. Journal of Clinical Oncology. 29 (23), 3163-3172 (2011).
9. Iasonos, A., Schrag, D., Raj, G. V., Panageas, K. S. How to build and interpret a nomogram for cancer prognosis. Journal of Clinical Oncology. 26 (8), 1364-1370 (2008).
10. Chisholm, J. C., et al. Prognostic factors after relapse in nonmetastatic rhabdomyosarcoma: a nomogram to better define patients who can be salvaged with further therapy. Journal of Clinical Oncology. 29 (10), 1319-1325 (2011).
11. Brockman, J. A., et al. Nomogram Predicting Prostate Cancer-specific Mortality for Men with Biochemical Recurrence After Radical Prostatectomy. European Urology. 67 (6), 1160-1167 (2015).
12. Zhou, H., et al. Nomogram to Predict Cause-Specific Mortality in Patients With Surgically Resected Stage I Non-Small-Cell Lung Cancer: A Competing Risk Analysis. Clinical Lung Cancer. 19 (2), 195-203 (2018).
13. Fu, J., et al. De-escalating chemotherapy for stage II colon cancer. Therapeutic Advances in Gastroenterology. 12, 1756284819867553 (2019).
14. Chen, D., Li, J., Chong, J. K. Hazards regression for freemium products and services: a competing risks approach. Journal of Statistical Computation and Simulation. 87 (9), 1863-1876 (2017).
15. Harrell, F. E., Jr. rms: Regression Modeling Strategies. R package version 5.1-2. Available from: https://CRAN.R-project.org/package=rms (2018).
16. Gray, B. cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-7. Available from: https://CRAN.R-project.org/package=cmprsk (2014).
17. Scrucca, L., Santucci, A., Aversa, F. Regression modeling of competing risk using R: an in depth guide for clinicians. Bone Marrow Transplantation. 45 (9), 1388-1395 (2010).
18. Zhang, Z., Geskus, R. B., Kattan, M. W., Zhang, H., Liu, T. Nomogram for survival analysis in the presence of competing risks. Annals of Translational Medicine. 5 (20), 403 (2017).
19. Geskus, R. B. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics. 67 (1), 39-49 (2011).
20. Fu, J., et al. Young-onset breast cancer: a poor prognosis only exists in low-risk patients. Journal of Cancer. 10 (14), 3124-3132 (2019).
21. de Glas, N. A., et al. Performing Survival Analyses in the Presence of Competing Risks: A Clinical Example in Older Breast Cancer Patients. Journal of the National Cancer Institute. 108 (5), (2016).
