Crude Cumulative Incidence: Flexible Estimation Methods

Introduction

In clinical research, overall survival (OS) has long been the standard endpoint for assessing therapies. Yet OS pools death from every cause into a single event, which hides the distinct competing events that shape patient risk over extended follow-up. The crude cumulative incidence function (CIF) gives a direct, clinically interpretable estimate of the probability that a specific event has occurred by a given time while accounting for competing risks. This article reviews modeling strategies for flexible CIF estimation in studies with long follow-up and outlines approaches to evaluating predictive ability across competing-risk frameworks.

Key concepts for long follow-up data

Crude cumulative incidence refers to the probability that a particular event (e.g., disease progression, cause-specific death) occurs by a given time, in the presence of competing events that may preclude the event of interest. In long follow-up, covariate effects may change over time, censoring patterns may become complex, and simple OS-focused analyses can mislead inference about real-world risk. Distinguishing CIF from the cause-specific hazard and understanding subdistribution hazards are essential for appropriate modeling and interpretation in this context.
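
The distinction between these quantities can be written out explicitly. The notation below is an assumption introduced here for illustration: T is the time to the first event, ε is its type, k indexes the event of interest, and time is treated as continuous.

```latex
\begin{align*}
F_k(t) &= P(T \le t,\ \varepsilon = k)
  && \text{(crude cumulative incidence)} \\
\lambda_k(t) &= \lim_{\Delta \to 0}
  \frac{P(t \le T < t + \Delta,\ \varepsilon = k \mid T \ge t)}{\Delta}
  && \text{(cause-specific hazard)} \\
S(t) &= \exp\Big(-\sum_{j} \int_0^t \lambda_j(u)\,du\Big)
  && \text{(event-free survival)} \\
F_k(t) &= \int_0^t S(u)\,\lambda_k(u)\,du \\
h_k(t) &= \lim_{\Delta \to 0}
  \frac{P\big(t \le T < t + \Delta,\ \varepsilon = k \mid
        T \ge t \ \text{or}\ (T < t,\ \varepsilon \ne k)\big)}{\Delta}
  = -\frac{d}{dt}\log\{1 - F_k(t)\}
  && \text{(subdistribution hazard)}
\end{align*}
```

The fourth line shows why the CIF for cause k depends on the hazards of every cause through S, while the last line shows that the subdistribution hazard is tied to F_k alone.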

Modeling strategies for flexible CIF estimation

Several modeling avenues balance interpretability, flexibility, and predictive performance when estimating CIF under competing risks and long follow-up.

Nonparametric CIF estimation

The Aalen-Johansen estimator provides a nonparametric CIF estimate that requires no covariate model and is simple to compute. It is well suited to crude CIFs and visualization, but it does not readily accommodate covariate-based prediction or time-varying effects, which limits its use for adjustment and forecasting in heterogeneous populations.
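
As a rough illustration, the estimator can be sketched in a few lines of Python with NumPy. The function name `aalen_johansen`, the coding of `event` (0 for censored, positive integers for event types), and the toy data are assumptions made for this sketch, not part of any particular package.

```python
import numpy as np

def aalen_johansen(time, event, cause=1):
    """Nonparametric CIF for `cause`; event == 0 marks a censored record."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event != 0])  # sorted distinct event times
    surv = 1.0   # probability of being event-free just before the current time
    cif = 0.0
    cif_values = []
    for t in event_times:
        at_risk = np.sum(time >= t)
        d_cause = np.sum((time == t) & (event == cause))
        d_any = np.sum((time == t) & (event != 0))
        cif += surv * d_cause / at_risk    # mass assigned to the cause of interest
        surv *= 1.0 - d_any / at_risk      # Kaplan-Meier step for any event
        cif_values.append(cif)
    return event_times, np.array(cif_values)

# Hypothetical toy data: five subjects, one censored at time 3.
time = [1, 2, 3, 4, 5]
event = [1, 2, 0, 1, 2]
times, cif1 = aalen_johansen(time, event, cause=1)
_, cif2 = aalen_johansen(time, event, cause=2)
print(times, cif1, cif2)
```

The running `surv` term is what separates this estimator from one minus a cause-specific Kaplan-Meier curve: subjects removed by a competing event stop contributing mass to the event of interest.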

Regression approaches under competing risks

The Fine-Gray subdistribution hazards model directly links covariates to the CIF, making it attractive for covariate-adjusted CIF estimation and prediction. Interpretation centers on the subdistribution hazard: the sign of a coefficient indicates whether a covariate raises or lowers the probability of the event of interest in the presence of competing risks, although the subdistribution hazard ratio is not itself a ratio of risks. Alternatively, cause-specific hazards models estimate covariate effects on each cause’s hazard. CIFs can then be derived, but the link to absolute risk is less direct, because the CIF for one cause depends on the cause-specific hazards of all causes and the fitted models must be combined to obtain it. For long follow-up, time-varying effects can be important, motivating more flexible specifications within these frameworks.
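
To make the derivation of a CIF from cause-specific hazards concrete, the sketch below integrates S(u) times the cause-specific hazard numerically. The helper names and the constant hazards (0.10 and 0.05 per unit time) are assumptions chosen so that the result can be checked against a closed form.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Cumulative trapezoidal integral along the last axis, starting at 0."""
    steps = 0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x)
    zero = np.zeros(y.shape[:-1] + (1,))
    return np.concatenate([zero, np.cumsum(steps, axis=-1)], axis=-1)

def cif_from_cause_specific(hazards, grid):
    """hazards: array (n_causes, len(grid)) of cause-specific hazards on grid.
    Returns (event-free survival, CIFs of shape (n_causes, len(grid)))."""
    total_hazard = hazards.sum(axis=0)
    surv = np.exp(-cumulative_trapezoid(total_hazard, grid))
    cif = cumulative_trapezoid(hazards * surv, grid)
    return surv, cif

# Hypothetical constant hazards for the event of interest and one competitor.
grid = np.linspace(0.0, 10.0, 2001)
hazards = np.vstack([np.full_like(grid, 0.10), np.full_like(grid, 0.05)])
surv, cif = cif_from_cause_specific(hazards, grid)

# Closed form for constant hazards, and the naive "one minus Kaplan-Meier"
# quantity that ignores the competing event.
closed_form = (0.10 / 0.15) * (1.0 - np.exp(-0.15 * grid))
naive = 1.0 - np.exp(-0.10 * grid)
print(cif[0, -1], closed_form[-1], naive[-1])
```

Under these assumed hazards the naive quantity exceeds the CIF at every positive time, which illustrates why the hazards of all causes have to enter the calculation.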

Flexible parametric and regression-based CIF modeling

Flexible parametric survival models (e.g., Royston-Parmar formulations) can be adapted to the CIF, either by using splines for the baseline subdistribution hazard or by regressing on pseudo-values of the CIF. These approaches capture time-varying effects and complex hazard shapes without rigid parametric assumptions. Implementations are available in R and other environments alongside standard competing-risks tools, which makes these models practical for real-world data analyses.

Pseudo-values and regression on CIF

The pseudo-value approach enables regression on CIF directly at specific time horizons, providing intuitive interpretation while retaining the ability to adjust for covariates. It is particularly useful when the underlying hazard structure is complex or when researchers wish to avoid fully specifying the hazard function.
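
A minimal sketch of the jackknife construction follows. For subject i and horizon t, the pseudo-value is n·F̂(t) − (n − 1)·F̂₍₋ᵢ₎(t), where F̂ is the Aalen-Johansen estimate from all n subjects and F̂₍₋ᵢ₎ is the estimate with subject i left out. The function names `cif_at` and `pseudo_values`, the event coding (0 for censored), and the toy data are assumptions for illustration.

```python
import numpy as np

def cif_at(time, event, horizon, cause=1):
    """Aalen-Johansen CIF for `cause` evaluated at a single horizon."""
    surv, cif = 1.0, 0.0
    for t in np.unique(time[(event != 0) & (time <= horizon)]):
        at_risk = np.sum(time >= t)
        at_t = time == t
        cif += surv * np.sum(at_t & (event == cause)) / at_risk
        surv *= 1.0 - np.sum(at_t & (event != 0)) / at_risk
    return cif

def pseudo_values(time, event, horizon, cause=1):
    """Leave-one-out (jackknife) pseudo-values of the CIF at `horizon`."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    full = cif_at(time, event, horizon, cause)
    keep = ~np.eye(n, dtype=bool)  # row i drops subject i
    leave_one_out = np.array(
        [cif_at(time[keep[i]], event[keep[i]], horizon, cause) for i in range(n)]
    )
    return n * full - (n - 1) * leave_one_out

# Hypothetical uncensored toy data: pseudo-values reduce to event indicators.
time = [2, 3, 5, 7, 8, 11]
event = [1, 2, 1, 1, 2, 1]
pv = pseudo_values(time, event, horizon=6.0, cause=1)
print(pv)
```

With censoring, the pseudo-values are no longer restricted to 0 and 1, but they can still serve as the response in a regression on covariates, typically a generalized estimating equation with a link such as logit or complementary log-log.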

Evaluating predictive ability of CIF models

Predictive performance in competing risks settings is assessed through discrimination, calibration, and clinical usefulness, with adaptations to account for censoring and competing events.

Discrimination and time-dependent metrics

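Discrimination is usually summarized with a time-dependent AUC (e.g., Uno’s estimator) or a concordance index adapted for competing risks. The time-dependent AUC measures how well predicted risks separate individuals who experience the event of interest by a given horizon from those who do not, while the concordance index measures whether individuals who experience the event earlier receive higher predicted risks. Inverse probability of censoring weighting (IPCW) corrects these estimates for the bias that censoring would otherwise introduce.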

Calibration and accuracy

Calibration assesses how closely predicted CIFs match observed CIFs at specific time points. The IPCW Brier score quantifies overall prediction error and reflects both calibration and discrimination, while calibration plots visualize agreement across risk strata. Repeating these assessments at successive landmark times, as in dynamic prediction, shows how model accuracy changes over follow-up.
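
The IPCW Brier score at a fixed horizon can be sketched as follows. Subjects with an observed event by the horizon are weighted by 1/G(Tᵢ−), subjects still at risk after the horizon by 1/G(horizon), and subjects censored before the horizon get weight zero, where G is the Kaplan-Meier estimate of the censoring distribution. The function names and toy data are assumptions, and the sketch assumes censoring is independent of covariates and that event and censoring times are not tied.

```python
import numpy as np

def censoring_km(time, event, query, left_limit=False):
    """Kaplan-Meier estimate G of the censoring survival function at `query`.
    With left_limit=True, returns G(query-). Assumes no event/censoring ties."""
    g = np.ones(len(query))
    for c in np.unique(time[event == 0]):
        at_risk = np.sum(time >= c)
        factor = 1.0 - np.sum((time == c) & (event == 0)) / at_risk
        applies = (c < query) if left_limit else (c <= query)
        g = np.where(applies, g * factor, g)
    return g

def ipcw_brier(time, event, pred, horizon, cause=1):
    """IPCW Brier score for predicted CIFs `pred` at `horizon`."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    pred = np.asarray(pred, dtype=float)
    outcome = ((time <= horizon) & (event == cause)).astype(float)
    g_before_ti = censoring_km(time, event, time, left_limit=True)
    g_at_horizon = censoring_km(time, event, np.array([horizon]))[0]
    had_event = (time <= horizon) & (event != 0)   # any event type by horizon
    still_at_risk = time > horizon
    weights = np.zeros(len(time))                  # censored early: weight 0
    weights[had_event] = 1.0 / g_before_ti[had_event]
    weights[still_at_risk] = 1.0 / g_at_horizon
    return float(np.mean(weights * (outcome - pred) ** 2))

# Hypothetical toy data with two censored subjects and a constant prediction.
time = [1, 2, 3, 4, 5]
event = [1, 0, 2, 1, 0]
score = ipcw_brier(time, event, pred=[0.4] * 5, horizon=3.5, cause=1)
print(score)
```

Note that a subject with a competing event before the horizon counts as a known non-event for the cause of interest, so it keeps a positive weight rather than being treated as censored.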

Internal and external validation

Bootstrap resampling or cross-validation can estimate optimism in predictive metrics. External validation in independent cohorts tests transportability and generalizability, informing confidence in applying CIF models to new populations.
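
One common way to estimate optimism is to refit the model on each bootstrap sample, score it on that sample and on the original data, and average the difference. The sketch below is generic: `fit` and `score` are hypothetical callables, and the example deliberately uses a fully observed binary outcome (event of interest by the horizon, yes or no) with a two-group risk model to stay short. With censored data, `score` would be replaced by an IPCW metric.

```python
import numpy as np

def optimism_corrected(fit, score, data, n_boot=200, seed=0):
    """Bootstrap optimism correction for a performance metric.
    fit(data) -> model; score(model, data) -> float."""
    rng = np.random.default_rng(seed)
    n = len(data)
    apparent = score(fit(data), data)
    gaps = []
    for _ in range(n_boot):
        boot = data[rng.integers(0, n, size=n)]   # resample rows with replacement
        model = fit(boot)
        gaps.append(score(model, boot) - score(model, data))
    optimism = float(np.mean(gaps))
    return apparent, optimism, apparent - optimism

def fit_group_risk(d):
    """Toy model: observed event proportion within each of two groups."""
    overall = d[:, 1].mean()
    return {
        g: d[d[:, 0] == g, 1].mean() if np.any(d[:, 0] == g) else overall
        for g in (0.0, 1.0)
    }

def brier(model, d):
    """Mean squared error between predicted risk and the binary outcome."""
    pred = np.where(d[:, 0] == 1.0, model[1.0], model[0.0])
    return float(np.mean((d[:, 1] - pred) ** 2))

# Hypothetical simulated data: column 0 is the group, column 1 the outcome.
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=80).astype(float)
outcome = (rng.random(80) < np.where(group == 1.0, 0.4, 0.2)).astype(float)
data = np.column_stack([group, outcome])

apparent, optimism, corrected = optimism_corrected(fit_group_risk, brier, data)
print(apparent, optimism, corrected)
```

For an error measure such as the Brier score, the estimated optimism is usually negative, so the corrected score is larger (worse) than the apparent one. For a measure where higher is better, such as the AUC, the sign is reversed but the same subtraction applies.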

Practical guidance for researchers

To implement robust CIF modeling in studies with long follow-up, consider the following workflow:

  • Define the event of interest and competing risks clearly; decide whether CIF or cause-specific hazards best serves the clinical question.
  • Explore nonparametric CIF estimates for intuitive understanding and to guide model choice.
  • Fit covariate-adjusted CIF models (e.g., Fine-Gray, cause-specific hazards with CIF derivation) and assess time-varying effects when plausible.
  • Consider flexible parametric or pseudo-values approaches to capture complex hazard shapes and evolving covariate effects over time.
  • Evaluate predictive ability using time-dependent AUC, IPCW Brier score, and calibration plots; use bootstrapping or cross-validation to correct these metrics for optimism.
  • Report model assumptions, validation results, and sensitivity analyses (e.g., alternative time horizons, missing data handling).

Conclusion

Flexible estimation of crude cumulative incidence in studies with long follow-up improves risk interpretation and clinical decision-making by directly modeling the probability of events in the presence of competing risks. Thoughtful model choice, attention to time-varying effects, and rigorous evaluation of predictive ability are essential to translate CIF analyses into actionable insights for patient care and policy.