Prediction in Cox regression

Question

I am fitting a multivariable Cox regression; I have my significant independent variables and their beta values. The model fits my data very well.

Now I would like to use my model to predict the survival of a new observation, but I am unclear how to do this with a Cox model. In a linear or logistic regression it would be easy: I would just plug the values of the new observation into the regression, multiply them by the betas, and obtain the prediction of my outcome.

How can I determine my baseline hazard? I need it, in addition to the betas, to compute the prediction.

How is this done in a Cox model?

ocram · Accepted Answer · 2015-06-23 07:41:44Z


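Following the Cox model, the estimated hazard for individual $i$ with covariate vector $x_{i}$ has the form

$$\hat{h}_{i}(t) = \hat{h}_{0}(t) \exp(x_{i}'\hat{\beta}),$$

where $\hat{\beta}$ is found by maximising the partial likelihood, while $\hat{h}_{0}$ follows from the Breslow estimator (the extension of the Nelson-Aalen estimator to the Cox model),

$$\hat{h}_{0}(t_{i}) = \frac{d_{i}}{\sum_{j \in R(t_{i})} \exp(x_{j}'\hat{\beta})},$$

with $t_{1}, t_{2}, \dots$ the distinct event times, $d_{i}$ the number of deaths at $t_{i}$, and $R(t_{i})$ the risk set at $t_{i}$, i.e. all individuals $j$ whose observed (event or censoring) time is at least $t_{i}$ (see, e.g., Section 3.6). Strictly speaking, $\hat{h}_{0}(t_{i})$ is the jump of the estimated baseline cumulative hazard at $t_{i}$.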
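The estimator above can be sketched in R with the survival package. This is a minimal illustration under assumptions, not the OP's analysis: it uses the package's example data set ovarian with a single covariate age, fits with ties = "breslow" so that the package uses the same estimator (the default ties = "efron" adjusts for tied event times and can differ when ties exist), and the names lp, h0 and H0 are purely illustrative. Each jump is $d_i$ divided by the sum of $\exp(x_j'\hat\beta)$ over the individuals still at risk at $t_i$; the cumulative sum is then compared with basehaz(fit, centered = FALSE).

```r
library(survival)

# Hypothetical example: ovarian data, one covariate (age).
# ties = "breslow" makes survfit()/basehaz() use the Breslow estimator too.
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian, ties = "breslow")

time   <- ovarian$futime
status <- ovarian$fustat
lp     <- ovarian$age * coef(fit)[["age"]]   # uncentred x_j' beta-hat

event_times <- sort(unique(time[status == 1]))  # distinct event times t_i
h0 <- vapply(event_times, function(t) {
  d_i <- sum(time == t & status == 1)           # deaths at t_i
  d_i / sum(exp(lp[time >= t]))                 # individuals still at risk at t_i
}, numeric(1))
H0 <- cumsum(h0)                                # baseline cumulative hazard

# survival package version, evaluated at all covariates = 0
bh     <- basehaz(fit, centered = FALSE)
H0_pkg <- bh$hazard[match(event_times, bh$time)]
all.equal(H0, H0_pkg)
```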

Similarly,

$$\hat{S}_{i}(t) = \hat{S}_{0}(t)^{\exp(x_{i}'\hat{\beta})}$$

with

$$\hat{S}_{0}(t) = \exp(-\hat{\Lambda}_{0}(t))$$

and

$$\hat{\Lambda}_{0}(t) = \sum_{j : t_{j} \leq t} \hat{h}_{0}(t_{j}).$$
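As a sketch of the survival formula, again assuming the ovarian example data and ties = "breslow" (not the OP's setup), the code below raises the baseline survival at age 0 to the power $\exp(x_i'\hat\beta)$ for a hypothetical new 60-year-old and checks the result against survfit(..., newdata = ...), which should give the same curve.

```r
library(survival)

# Hypothetical example: ovarian data, one covariate (age), Breslow ties
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian, ties = "breslow")

# baseline (age = 0) cumulative hazard and survival at the observed times
bh <- basehaz(fit, centered = FALSE)
S0 <- exp(-bh$hazard)

# S_i(t) = S_0(t)^exp(x_i' beta-hat) for a hypothetical new 60-year-old
lp_new <- 60 * coef(fit)[["age"]]
S_new  <- S0^exp(lp_new)

# the same curve from the survival package
sf <- survfit(fit, newdata = data.frame(age = 60))
all.equal(S_new, as.vector(sf$surv))
```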

EDIT: This might also be of interest :-)

edited Jun 23, 2015 at 7:41

answered Sep 11, 2012 at 5:11

ocram


That is exactly my question... I need an estimate of the baseline hazard function to be able to make the prediction, correct? Do you know any method for estimating it?
– Marja
Commented Sep 11, 2012 at 11:32
@Marjan the jackknife may not properly reflect uncertainty caused by variable selection. The bootstrap properly shows more variability in which variables are labeled "significant". If you want to do a "relative validation" you can show that predictive discrimination is good after correcting for overfitting. This does not require dealing with the baseline hazard, but is validating relative log hazard estimates. The validate function in the R rms package in conjunction with the cph function will do that. The only stepwise algorithm implemented in validate is backwards stepdown.
– Frank Harrell
Commented Sep 12, 2012 at 12:44
Getting predicted relative hazards (i.e., the linear predictor) is quite simple. But I quit using SAS in 1991.
– Frank Harrell
Commented Sep 13, 2012 at 17:57
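For the relative hazards, a minimal R sketch (assuming the survival package and its ovarian example data; the ages 60 and 70 are arbitrary) uses predict() on the coxph fit. By default predict.coxph centres the linear predictor at the sample means of the covariates, so without a baseline hazard only differences between patients, i.e. hazard ratios, carry meaning.

```r
library(survival)

# Hypothetical example: ovarian data, one covariate (age)
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
new <- data.frame(age = c(60, 70))

# linear predictor x' beta-hat (centred at the sample means by default)
lp   <- predict(fit, newdata = new, type = "lp")
# relative hazard exp(lp), on the same centred scale
risk <- predict(fit, newdata = new, type = "risk")

# hazard ratio of the 70-year-old versus the 60-year-old; the centring cancels
hr_70_vs_60 <- unname(exp(lp[2] - lp[1]))
hr_70_vs_60
exp(10 * coef(fit)[["age"]])   # same value, from the coefficient directly
```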
The link has gone dead :-(.
– gung - Reinstate Monica
Commented Oct 13, 2014 at 3:02
Is there a way to predict the survival time T for a specific individual? That is, given a set of covariate values, how can I find the time after which the individual is most likely to die?
– statBeginner
Commented Feb 12, 2015 at 21:31
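One common summary for this kind of question is the predicted median survival time, read off the predicted survival curve for the given covariates. A sketch, assuming the survival package's ovarian data and a hypothetical 70-year-old: the median is where the predicted survival first reaches 0.5, which is not the same as the single most likely time of death, and it is undefined (NA) if the curve never drops that low during follow-up.

```r
library(survival)

# Hypothetical example: ovarian data, predicted curve for a 70-year-old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
sf  <- survfit(fit, newdata = data.frame(age = 70))

# first time at which the predicted survival is <= 0.5 (NA if never)
surv_vec <- as.vector(sf$surv)
median_by_hand <- sf$time[which(surv_vec <= 0.5)[1]]

# the survival package's quantile() method for survfit objects
med <- quantile(sf, probs = 0.5, conf.int = FALSE)

median_by_hand
med
```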


miura · Accepted Answer · 2015-07-08 12:40:33Z

If you use R, the function predictSurvProb in the pec package can give you absolute risk estimates for new data based on an existing Cox model.

The mathematical details I cannot explain.

EDIT: The function returns survival probabilities, which I take to be 1 - (event probability).

EDIT 2:

One can do without the pec package. Using only the survival package, the following function returns the absolute risk of the event by a given time based on a Cox model:

risk <- function(model, newdata, time) {
  # absolute risk of the event by `time` = 1 - predicted survival probability
  sf <- survfit(model, newdata = newdata, se.fit = FALSE)
  as.numeric(1 - summary(sf, times = time)$surv)
}

1 - survival probability is the cumulative hazard. I think the OP is asking for the instantaneous hazard function (of the baseline) or some kind of smoothed estimate of it (muhaz package in R). – ECII, Commented Mar 9, 2013 at 21:32
1-Survival probability is not the cumulative hazard. In the absence of competing risks the two are connected as detailed on en.wikipedia.org/wiki/…. — miura, Commented Jul 8, 2015 at 12:34
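The relationship follows from $S(t) = \exp(-\Lambda(t))$, as in the formulas of ocram's answer (no competing risks). Expanding the exponential shows that $1 - S(t)$ and the cumulative hazard agree only when $\Lambda(t)$ is small:

$$1 - S(t) = 1 - e^{-\Lambda(t)} = \Lambda(t) - \frac{\Lambda(t)^{2}}{2} + \frac{\Lambda(t)^{3}}{6} - \dots$$

For example, $\Lambda(t) = 0.1$ gives $1 - S(t) \approx 0.095$, but $\Lambda(t) = 1$ gives $1 - S(t) \approx 0.632$.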

gung - Reinstate Monica · Accepted Answer · 2014-10-13 02:56:22Z

Maybe you would also like to try something like this? Fit a Cox proportional hazards model and use it to get the predicted Survival curve for a new instance.

Taken from the R help page for survfit.coxph (I only added the lines() call):

library(survival)
# fit a Cox proportional hazards model and plot the
# predicted survival for a 60-year-old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
plot(survfit(fit, newdata = data.frame(age = 60)),
     xscale = 365.25, xlab = "Years", ylab = "Survival", conf.int = FALSE)
# also plot the predicted survival for a 70-year-old (dashed line);
# axis labels belong to plot(), so they are not repeated here
lines(survfit(fit, newdata = data.frame(age = 70)),
      xscale = 365.25, lty = 2)

Keep in mind, though, that such a prediction is only trustworthy if the patient you predict for comes from a population qualitatively similar to the one used to fit the Cox model; otherwise there is no guarantee that the proportional hazards assumption, or the estimated baseline hazard, carries over to the new patient.

ECII · Accepted Answer · 2013-03-09 21:41:27Z

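The basehaz function of the survival package returns the baseline cumulative hazard (not the hazard itself) at the observed time points, by default evaluated at the mean covariate values (centered = FALSE evaluates it at all covariates equal to zero). From that you can work your way through the math that ocram provides and include the hazard ratios (the exponentiated coefficients) from your coxph fit.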
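A short sketch of this with the survival package, assuming its ovarian example data (offset and jumps are illustrative names). It shows that the centred and uncentred outputs of basehaz() differ only by the factor $\exp(\bar{x}'\hat\beta)$, with $\bar{x}$ the covariate means stored in fit$means, and how to recover the per-time hazard increments from the cumulative hazard with diff().

```r
library(survival)

# Hypothetical example: ovarian data, one covariate (age)
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# basehaz() returns the *cumulative* baseline hazard
bh_mean <- basehaz(fit)                     # at the mean covariate values (default)
bh_zero <- basehaz(fit, centered = FALSE)   # at all covariates = 0

# the two differ only by the factor exp(xbar' beta-hat)
offset <- sum(fit$means * coef(fit))
all.equal(bh_zero$hazard, bh_mean$hazard * exp(-offset))

# hazard increments (jumps) at each time point
jumps <- diff(c(0, bh_zero$hazard))

# hazard ratio per year of age
exp(coef(fit))
```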

Michael R. Chernick · Accepted Answer · 2012-09-10 15:08:04Z


The whole point of the Cox model is the proportional hazards assumption and the use of the partial likelihood. The baseline hazard function is eliminated from the partial likelihood, so you do not need to specify one. That is the beauty of it!
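To see why, write the partial likelihood with the hazard $h_0(t)\exp(x'\beta)$ plugged in. Assuming no tied event times, with $x_{(i)}$ the covariates of the individual who dies at the $i$-th event time $t_i$ and $R(t_i)$ the set of individuals still at risk at $t_i$, the common factor $h_0(t_i)$ cancels from each term:

$$L(\beta) = \prod_{i} \frac{h_0(t_i)\exp(x_{(i)}'\beta)}{\sum_{j \in R(t_i)} h_0(t_i)\exp(x_j'\beta)} = \prod_{i} \frac{\exp(x_{(i)}'\beta)}{\sum_{j \in R(t_i)} \exp(x_j'\beta)}.$$

So $\hat\beta$ can be estimated without $h_0$, but $h_0$ still has to be estimated afterwards if you want absolute predictions of hazard or survival.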

answered Sep 10, 2012 at 15:08

Michael R. Chernick


If, however, you want an estimate of the hazard or the survival for a particular value of the covariate vector, then you do need an estimate of the baseline hazard or survival. The Nelson-Aalen estimate usually does the job...
– ocram
Commented Sep 10, 2012 at 15:13
Often with the Cox model you are comparing two survival functions and the key is the hazard ratio rather than the hazard function. The baseline hazard is like a nuisance parameter that Cox so cleverly eliminated from the problem using the proportional hazards assumption. Whatever method you would like to use for estimating the hazard function and/or the baseline hazard in the context of the model would require using the Cox form of the model which forces proportionality.
– Michael R. Chernick
Commented Sep 10, 2012 at 15:30
Thank you so much. It would be great if you could look at my comment on ocram's answer. Maybe you could help me too?
– Marja
Commented Sep 11, 2012 at 7:17
You can also stratify on factors that do not satisfy proportional hazards. But at any rate, the Cox model and its after-the-fit estimator of the baseline hazard can be used to get predicted quantiles of survival time, various survival probabilities, and predicted mean survival time if you have long-term follow-up. All these quantities are easy to get in the R package rms.
– Frank Harrell
Commented Sep 11, 2012 at 11:31
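Stratification can be sketched with the survival package's strata() term rather than rms. This assumes the ovarian example data, with the treatment arm rx used as a stratification variable purely for illustration; each stratum then gets its own baseline hazard, so newdata has to state which stratum the hypothetical new patient belongs to.

```r
library(survival)

# Hypothetical example: rx (treatment arm) as a stratification variable,
# giving one baseline hazard per arm; age stays a regular covariate
fit_s <- coxph(Surv(futime, fustat) ~ age + strata(rx), data = ovarian)

# newdata must include the stratum of the new patient (hypothetical values)
sf <- survfit(fit_s, newdata = data.frame(age = 60, rx = 2))

# predicted 1-year (365-day) survival for that patient
summary(sf, times = 365)$surv
```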
You don't need to specify it, but it is estimated.
– DWin
Commented Oct 31, 2018 at 21:55

Tags: regression, survival, prediction, cox-model
