Presented here is a protocol to build nomograms based on the Cox proportional hazards regression model and the competing risk regression model. The competing risk method is the more appropriate choice when competing events are present in the survival analysis.
The Kaplan-Meier method and the Cox proportional hazards regression model are the most common analyses in the survival framework. They are relatively easy to apply and interpret and can be depicted visually. However, when competing events (e.g., cardiovascular and cerebrovascular accidents, treatment-related deaths, traffic accidents) are present, the standard survival methods should be applied with caution, because real-world data may otherwise be interpreted incorrectly. It may be desirable to distinguish the different kinds of events that can lead to failure and to treat them differently in the analysis. Here, the methods focus on using the competing risk regression model to identify significant prognostic factors or risk factors when competing events are present. Additionally, nomograms based on a proportional hazards regression model and a competing risk regression model are established to help clinicians make individual assessments and risk stratifications in order to explain the impact of controversial factors on prognosis.
Time-to-event survival analysis is quite common in clinical studies. Survival data measure the time span from the start time until the occurrence of the event of interest, but the occurrence of the event of interest is often precluded by another event. If more than one type of end point is present, they are called competing risks end points. In this case, the standard hazard analysis (i.e., the Cox proportional cause-specific hazards model) often does not work well, because individuals experiencing another type of event are censored, and censoring assumes independence, whereas competing risks are usually not independent. Fine and Gray1 therefore studied regression model estimation for the subdistribution of a competing risk, in which individuals who experience a competing event remain in the risk set. In a competing risk setting, three different types of events can be discriminated.
Overall survival (OS) is commonly used to demonstrate a direct clinical benefit of new treatment methods for a disease. OS measures the survival time from the time of origin (i.e., time of diagnosis or treatment) to the time of death due to any cause and generally evaluates the absolute risk of death, thereby failing to differentiate the causes of death (e.g., cancer-specific death (CSD) or non-cancer-specific death (non-CSD))2. OS is generally considered the most important endpoint. The events of interest are often cancer related, while the non-cancer-specific events, which include heart disease, traffic accidents, or other unrelated causes, are considered competing events. Patients with malignancies and a favorable prognosis, who are expected to survive longer, are often at a greater risk of non-CSD. That is, OS will be diluted by other causes of death and will fail to reflect the real effectiveness of clinical treatment. Therefore, OS may not be the optimal measure for assessing the outcomes of disease3. Such biases could be corrected by the competing risk regression model.
There are two main methods for competing risk data: cause-specific hazard models (Cox models) and subdistribution hazard models (competing risk models). In the following protocol, we present two methods to generate nomograms, one based on the cause-specific hazard model and one based on the subdistribution hazard model. The cause-specific hazard model can be fitted with the Cox proportional hazards model, which treats subjects who experience the competing event as censored at the time that the competing event occurred. In the subdistribution hazard model, which was introduced by Fine and Gray1 in 1999, three different types of events can be discriminated, and individuals who experience a competing event remain in the risk set.
A nomogram is a mathematical representation of the relationship between three or more variables4. Medical nomograms take biological and clinical characteristics as variables (e.g., tumor grade and patient age) and generate the probability of a clinical event (e.g., cancer recurrence or death), which is graphically depicted as a statistical prognostic model for a given individual. Generally, a nomogram is formulated based on the results of the Cox proportional hazards model5,6,7,8,9,10.
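The difference between the two models can be stated compactly. The following is a sketch in standard textbook notation, not taken from the protocol itself: T denotes the event time, D the event type, k the event of interest, and F_k the cumulative incidence function (CIF); all of these symbols are introduced here for illustration.

```latex
% Cause-specific hazard: the risk set contains only subjects who are still event-free.
\lambda_k^{\mathrm{cs}}(t) = \lim_{\Delta t \to 0}
  \frac{P(t \le T < t + \Delta t,\ D = k \mid T \ge t)}{\Delta t}

% Subdistribution hazard (Fine and Gray): subjects who already had a competing
% event (T < t, D \ne k) stay in the risk set.
\lambda_k^{\mathrm{sd}}(t) = \lim_{\Delta t \to 0}
  \frac{P\bigl(t \le T < t + \Delta t,\ D = k \mid T \ge t \ \text{or}\ (T < t,\ D \ne k)\bigr)}{\Delta t}

% Link between the subdistribution hazard and the cumulative incidence function.
F_k(t) = P(T \le t,\ D = k), \qquad
\lambda_k^{\mathrm{sd}}(t) = -\frac{d}{dt} \log\bigl(1 - F_k(t)\bigr)
```

Because the subdistribution hazard is tied directly to F_k, covariate effects estimated in this model translate into effects on the cumulative incidence of the event of interest, which is the quantity a competing-risk nomogram reports.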
However, when competing risks are present, a nomogram based on the Cox model might fail to perform well. Though several previous studies11,12,13,14 have applied the competing risk nomogram to estimate the probability of CSD, few studies have described how to establish the nomogram based on a competing risk regression model, and there is no existing package available to accomplish this. Therefore, the method presented below will provide a step-by-step protocol to establish a specific competing-risk nomogram based on a competing risk regression model as well as a risk score estimation to aid clinicians in treatment decision-making.
The research protocol was approved by the Ethics Committee of Jinhua Hospital, Zhejiang University School of Medicine. For this experiment, the cases were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. SEER is an open-access database that includes demographic, incidence and survival data from 18 population-based cancer registries. We registered on the SEER website and signed a letter of assurance to acquire the research data (12296-Nov2018).
1. Data source
2. Installing and loading packages and importing data
NOTE: Perform the following procedures based on R software (version 3.5.3) using the packages rms15 and cmprsk16 (http://www.r-project.org/).
3. Nomogram based on the Cox Proportional Hazards Regression model
4. Nomogram based on the Competing Risk Regression Model
5. Subgroup analysis based on the Group Risk Score (GRS)
Survival characteristics of the example cohort
In the example cohort, a total of 8,550 eligible patients were included in the analysis, and the median follow-up time was 88 months (range, 1 to 95 months). A total of 679 (7.94%) patients were younger than 40 years old and 7,871 (92.06%) patients were older than 40. At the end of follow-up, 7,483 (87.52%) patients were still alive, 662 (7.74%) had died of breast cancer, and 405 (4.74%) had died of other causes (competing risks).
Comparison of two survival models
The cumulative incidences of tumor death and non-tumor death were calculated both by the Kaplan-Meier method and by the competing risk function (Figure 1). As shown in Figure 1, the sum of the cumulative incidences of tumor death and non-tumor death calculated by the Kaplan-Meier method was higher than the estimate for death from all causes, whereas the sum of the cumulative incidences of CSD and non-CSD calculated by the competing risk method was equal to it. The Kaplan-Meier method therefore overestimated the cumulative incidences of tumor death and non-tumor death, and the competing risk method corrected this overestimation of the probability of death.
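The overestimation can be reproduced on a toy data set. The sketch below is illustrative only: the ten follow-up times and status codes are made up (0 = censored, 1 = CSD, 2 = non-CSD), and the two helper functions are hypothetical names written for this example, not functions from rms or cmprsk.

```python
def one_minus_km(times, status, causes):
    """1 - Kaplan-Meier, counting only `causes` as events; all else is censored."""
    surv, at_risk = 1.0, len(times)
    for t in sorted(set(times)):
        events = sum(1 for ti, si in zip(times, status) if ti == t and si in causes)
        surv *= 1.0 - events / at_risk
        at_risk -= times.count(t)  # everyone observed at t leaves the risk set
    return 1.0 - surv


def cumulative_incidence(times, status, cause):
    """Cumulative incidence of `cause` in the presence of competing events."""
    overall_surv, at_risk, cif = 1.0, len(times), 0.0
    for t in sorted(set(times)):
        at_t = [si for ti, si in zip(times, status) if ti == t]
        # Probability of being event-free just before t, then failing from `cause`.
        cif += overall_surv * at_t.count(cause) / at_risk
        overall_surv *= 1.0 - sum(1 for si in at_t if si != 0) / at_risk
        at_risk -= len(at_t)
    return cif


# Hypothetical toy cohort (months, status code).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
status = [1, 2, 1, 0, 2, 1, 0, 1, 2, 0]

km_sum = one_minus_km(times, status, {1}) + one_minus_km(times, status, {2})
all_cause = one_minus_km(times, status, {1, 2})
cr_sum = cumulative_incidence(times, status, 1) + cumulative_incidence(times, status, 2)

print(f"K-M sum: {km_sum:.3f}  all-cause: {all_cause:.3f}  CR sum: {cr_sum:.3f}")
```

In this toy cohort the two Kaplan-Meier estimates add up to more than the all-cause estimate, while the two competing-risk estimates add up to exactly the all-cause estimate, which is the pattern described for Figure 1.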
Nomogram based on the Cox proportional hazards regression model
A nomogram was constructed based on the significant factors, as shown in Figure 2A and Table 1. These included marital status, race, histological type, differentiation grade, T classification, and N classification.
Nomogram based on the competing risk regression model
A competing-risk nomogram based on multiple factors, including race, marital status, histological type, differentiation grade, T classification, and N classification, was constructed (Figure 2B). The beta-coefficients from the model were used to allocate the point scale (Table 1).
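How beta-coefficients become nomogram points can be sketched as follows. This is a minimal illustration under two assumptions: it uses the common convention in which the variable with the widest coefficient range spans 0 to 100 points, and the coefficients shown are made-up values, not the fitted coefficients behind Table 1. The function name `nomogram_points` is hypothetical.

```python
def nomogram_points(betas, max_points=100):
    """Convert per-level regression coefficients into nomogram points.

    betas: {variable: {level: coefficient}}, with the reference level at 0.0.
    The variable with the widest coefficient range is scaled to `max_points`.
    """
    widest = max(max(lv.values()) - min(lv.values()) for lv in betas.values())
    points = {}
    for variable, levels in betas.items():
        lowest = min(levels.values())
        points[variable] = {
            level: round(max_points * (beta - lowest) / widest)
            for level, beta in levels.items()
        }
    return points


# Hypothetical coefficients, for illustration only.
example_betas = {
    "T classification": {"T1": 0.0, "T2": 0.8, "T3": 1.1, "T4": 1.6},
    "Grade": {"I": 0.0, "II": 0.4, "III": 1.2},
}
example_points = nomogram_points(example_betas)
print(example_points)
```

Within each variable the lowest-risk level receives 0 points, so a patient's total score is a rescaled version of the model's linear predictor.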
Stratification analysis by the risk score
Based on the risk score, the cohort was classified into three subgroups: low-risk score, 0-44; medium-risk score, 45-85; and high-risk score, 86-299. The forest plot clearly presents the interaction between the GRS and the specific factor (age) (Figure 3). Based on the GRS classification, the worse prognosis of young women appeared only in the low-risk subgroup, and young age may act as a protective factor for prognosis in the medium- and high-risk subgroups.
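The grouping step is a simple cut-off rule. The sketch below applies the three score ranges reported above; the function name `risk_group` is hypothetical, and rejecting scores outside 0-299 is an assumption made here for safety.

```python
def risk_group(total_score):
    """Assign a GRS subgroup from a total nomogram score (cut-offs from the text)."""
    if not 0 <= total_score <= 299:
        raise ValueError("score outside the reported range 0-299")
    if total_score <= 44:
        return "low"
    if total_score <= 85:
        return "medium"
    return "high"


# Hypothetical scores, for illustration only.
groups = [risk_group(score) for score in (12, 44, 45, 85, 86, 299)]
print(groups)
```

Once every patient carries a subgroup label, the factor under study (here, age) can be evaluated separately within each subgroup, which is what the forest plot summarizes.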
Figure 1: Stacked cumulative incidence plot. K-M: Cumulative incidences based on Kaplan-Meier estimates; CR: Cumulative incidences based on cumulative incidence competing risk estimates; Tumor death + no tumor death (K-M): sum of estimates of the cumulative incidence of cancer-specific death and non-cancer-specific death; CSD + non-CSD (CR): sum of estimates of cancer-specific death and non-cancer-specific death when the CR method was used.
Figure 2: Nomograms of the Cox proportional hazards regression model and competing risk regression model. (A) Nomogram based on the Cox proportional hazards regression model. (B) Nomogram based on the competing risk regression model. To apply the nomograms, each variable axis shows an individual risk factor, and a line drawn upwards determines the points for each variable. The total points are then calculated to obtain the probability of 2-, 3-, and 5-year cancer-specific survival or the cumulative incidence function (CIF). Race: 1=white, 2=black, 3=other; Marital status: 1=married, 2=single (never married or domestic partner), 3=divorced (separated, divorced, widowed); Histological type: 1=infiltrating duct carcinoma, 2=infiltrating lobular carcinoma, 3=infiltrating duct and lobular carcinoma; Tumor grade: 1=well differentiated, 2=moderately differentiated, 3=poorly differentiated. T and N classification was according to the 7th AJCC TNM staging system.
Figure 3: Forest plot of stratification analysis by the risk score for the probability of breast cancer-specific death in younger and older women with breast cancer. HR: hazard ratio.
Variables | Score (Cox model) | Estimated probability | Score (competing model) | Estimated probability
Race | | | |
1: White | 10 | | 4 |
2: Black | 32 | | 31 |
3: Other | 0 | | 0 |
Marital status | | | |
1: Married | 0 | | 0 |
2: Unmarried | 9 | | 5 |
3: Divorced | 37 | | 15 |
Histology | | | |
1: Adenocarcinoma | 10 | | 12 |
2: Mucinous adenocarcinoma | 8 | | 5 |
3: Signet ring cell carcinoma | 0 | | 0 |
Differentiation grade | | | |
1: Grade I | 0 | | 0 |
2: Grade II | 6 | | 36 |
3: Grade III | 37 | | 77 |
T classification (a) | | | |
1: T1 | 0 | | 0 |
2: T2 | 41 | | 50 |
3: T3 | 59 | | 68 |
4: T4 | 100 | | 98 |
N classification (a) | | | |
0: 0 | 0 | | 0 |
1: 0-3 | 17 | | 42 |
2: 3-6 | 43 | | 65 |
3: 6-12 | 74 | | 100 |
Total score (2-year survival / 2-year CIF) | 278 | 0.6 | 95 | 0.01
 | 254 | 0.7 | 233 | 0.1
 | 223 | 0.8 | 277 | 0.2
 | 173 | 0.9 | 305 | 0.3
 | 125 | 0.95 | 326 | 0.4
 | | | 344 | 0.5
Total score (3-year survival / 3-year CIF) | 281 | 0.4 | 62 | 0.01
 | 242 | 0.6 | 245 | 0.2
 | 218 | 0.7 | 293 | 0.4
 | 187 | 0.8 | 311 | 0.5
 | 137 | 0.9 | 328 | 0.6
 | 89 | 0.95 | 344 | 0.7
Total score (5-year survival / 5-year CIF) | 303 | 0.1 | 29 | 0.01
 | 279 | 0.2 | 212 | 0.2
 | 241 | 0.4 | 260 | 0.4
 | 203 | 0.6 | 295 | 0.6
 | 179 | 0.7 | 328 | 0.8
 | 148 | 0.8 | 349 | 0.9
 | 98 | 0.9 | |
 | 50 | 0.95 | |
(a) T and N classification according to the 7th AJCC staging system. CIF: cumulative incidence function.
Table 1: Point assignment and prognostic score in the nomograms based on the Cox proportional hazards regression model and the competing risk regression model.
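Reading Table 1 for one patient can be sketched as a lookup and a sum. The example below assumes that the first score in each variable row belongs to the Cox model and the second to the competing model, which is a reading of the table layout; the patient profile, the dictionary keys, and the function names are hypothetical. The code only brackets the total score between the two nearest tabulated rows and does not interpolate.

```python
# (Cox points, competing-model points) per coded level, copied from Table 1.
POINTS = {
    "race": {1: (10, 4), 2: (32, 31), 3: (0, 0)},
    "marital": {1: (0, 0), 2: (9, 5), 3: (37, 15)},
    "histology": {1: (10, 12), 2: (8, 5), 3: (0, 0)},
    "grade": {1: (0, 0), 2: (6, 36), 3: (37, 77)},
    "t_class": {1: (0, 0), 2: (41, 50), 3: (59, 68), 4: (100, 98)},
    "n_class": {0: (0, 0), 1: (17, 42), 2: (43, 65), 3: (74, 100)},
}

# (total score, probability) rows for the 5-year horizon, sorted by score.
COX_5Y_SURVIVAL = [(50, 0.95), (98, 0.9), (148, 0.8), (179, 0.7),
                   (203, 0.6), (241, 0.4), (279, 0.2), (303, 0.1)]
CR_5Y_CIF = [(29, 0.01), (212, 0.2), (260, 0.4), (295, 0.6), (328, 0.8), (349, 0.9)]


def total_scores(patient):
    """Sum the Cox and competing-model points over all variables."""
    cox = sum(POINTS[var][level][0] for var, level in patient.items())
    competing = sum(POINTS[var][level][1] for var, level in patient.items())
    return cox, competing


def bracket(total, rows):
    """Return the tabulated rows just below and just above `total` (None if absent)."""
    lower = [row for row in rows if row[0] <= total]
    upper = [row for row in rows if row[0] >= total]
    return (lower[-1] if lower else None, upper[0] if upper else None)


# Hypothetical patient: white, married, histology 1, grade II, T3, N level 1.
patient = {"race": 1, "marital": 1, "histology": 1, "grade": 2, "t_class": 3, "n_class": 1}
cox_total, cr_total = total_scores(patient)
print(cox_total, bracket(cox_total, COX_5Y_SURVIVAL))
print(cr_total, bracket(cr_total, CR_5Y_CIF))
```

For this profile the Cox total falls between the rows for 5-year survival of 0.9 and 0.8, and the competing-model total falls between the rows for a 5-year CIF of 0.01 and 0.2; a finer reading requires the graphical nomogram itself.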
Supplemental File 1.
Supplemental File 2.
Supplemental File 3.
The overall goal of the current study was to establish a specific competing-risk nomogram that could describe real-world diseases and to develop a convenient individual assessment model to support clinicians in treatment decisions. Here, we provide a step-by-step tutorial for establishing nomograms based on the Cox regression model and the competing risk regression model and for performing further subgroup analysis. Zhang et al.18 introduced an approach to create a competing-risk nomogram, but the core concept of their methodology differs from the one described here. Their method first transforms the original data to weighted data with the crprep() function in the mstate package19 and then draws the nomogram with the rms package. Simply put, we instead replace the parameters generated by cph with the output of the function crr and then draw a competing-risk nomogram within the frame of the Cox nomogram. In this method, the Cox nomogram serves as a frame.
Patients with malignancies and a favorable prognosis, who are expected to survive longer with cancer, are at a greater risk of non-cancer-specific death. Their OS will be largely diluted by the incidence of non-CSD, as shown in Figure 1. Taking patients with stage II colon cancer13 as an example, if the causes of death are not taken into account when generating curves of death from all causes with the Kaplan-Meier method, such curves will be largely affected by the cumulative incidence of non-CSD rather than the cumulative incidence of CSD.
When competing events are present, the standard Cox model for the assessment of covariates can lead to incorrect and biased results (for example, in stage II colon cancer13, chemotherapy appeared to be a protective factor for OS). The bias could be corrected by the competing risk regression method, especially for the oldest subgroup (in which chemotherapy would instead be identified as a harmful factor for CSD). The non-CSD event is a non-negligible competing risk in patients with cancer, especially for those with a favorable prognosis.
After the nomogram was established, the probability of death associated with each variable was presented as points on the nomogram. The risk score for each patient was calculated by totalling the points of all the variables. Based on the total score, the cohort can be further divided into three subgroups (low, medium, high) to stratify the impact of controversial factors on prognosis, which might help clinicians resolve clinical issues. Take the effect of age on breast cancer as an example20. The impact of age on the outcomes of patients with early breast cancer has not been clinically established and is controversial. Based on the GRS classification, the worse prognosis of young women only appeared in the low- and medium-risk subgroups, and young age may act as a protective factor of prognosis.
In terms of limitations, the competing risk estimate might lead to over-competition in some situations21. For example, diseases with poor prognosis (such as advanced malignant tumors or poorly differentiated pancreatic cancer) and great toxicities will inevitably have predominant effects on non-CSD. Whether the Cox model or the subdistribution proportional hazards regression model (competing risk) should be applied in survival analysis should be carefully considered. Both non-CSD and over-competition should be addressed carefully when survival is being estimated. Based on the results, we propose that for diseases with good prognosis and for older patients, the impact of non-CSD on OS should be carefully considered in future clinical trials. CSD, which is based on a competing risk model, may be an alternative endpoint instead of always using traditional OS.
In conclusion, we propose that not only malignant tumors with different prognoses but also the same disease at different stages might require an individual choice of the appropriate endpoint. Additionally, this methodology could be used to establish a nomogram based on the proper model (Cox or competing risk regression model) for quantifying risk, which can be further used for individualized guidance as well as to better explain clinical phenomena in clinical practice.
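Why the Cox nomogram can serve as a frame follows from the form of the two prediction formulas. The sketch below uses standard notation that does not appear in the protocol: x is the covariate vector, S_0 the baseline survival function of the Cox model, and F_{1,0} the baseline cumulative incidence function of the event of interest in the Fine and Gray model.

```latex
% Cox model: predicted survival for covariates x.
S(t \mid x) = S_0(t)^{\exp(x^{\top}\beta_{\mathrm{Cox}})}

% Fine and Gray model: predicted cumulative incidence of the event of interest.
F_1(t \mid x) = 1 - \bigl[\,1 - F_{1,0}(t)\,\bigr]^{\exp(x^{\top}\beta_{\mathrm{FG}})}
```

In both models the prediction at a fixed time t is a monotone function of a single linear predictor, so the same axis layout applies: the coefficients determine the points per variable, and the baseline function determines how total points map to a probability. Swapping the Cox coefficients and baseline for their Fine and Gray counterparts therefore changes the numbers on the axes but not the structure of the nomogram.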
The authors have nothing to disclose.
The study was supported by grants from the general program of Zhejiang Province Natural Science Foundation (grant number LY19H160020) and the key program of the Jinhua Municipal Science & Technology Bureau (grant numbers 2016-3-005, 2018-3-001d, and 2019-3-013).