56

I am fitting a multivariable Cox regression and have my significant independent variables and their beta values. The model fits my data very well.

Now I would like to use my model to predict the survival of a new observation, but I am unclear how to do this with a Cox model. In a linear or logistic regression it would be easy: plug the values of the new observation into the regression equation, multiply them by the betas, and I have the prediction of my outcome.

How can I determine my baseline hazard? I need it, in addition to the betas, to compute the prediction.

How is this done in a Cox model?

CC BY-SA 3.0

    5 Answers 5

    46

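    Following Cox model, the estimated hazard for individual $i$ with covariate vector $x_i$ has the form

    $$\hat{h}_i(t) = \hat{h}_0(t) \exp(x_i' \hat{\beta}),$$
    where $\hat{\beta}$ is found by maximising the partial likelihood, while $\hat{h}_0$ follows from the Nelson-Aalen estimator,
    $$\hat{h}_0(t_i) = \frac{d_i}{\sum_{j:\, t_j \geq t_i} \exp(x_j' \hat{\beta})}$$
    with $t_1, t_2, \dots$ the distinct event times and $d_i$ the number of deaths at $t_i$ (see, e.g., Section 3.6).

    Similarly,

    $$\hat{S}_i(t) = \hat{S}_0(t)^{\exp(x_i' \hat{\beta})}$$
    with $\hat{S}_0(t) = \exp(-\hat{\Lambda}_0(t))$ and
    $$\hat{\Lambda}_0(t) = \sum_{j:\, t_j \leq t} \hat{h}_0(t_j).$$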
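The formulas above can be followed step by step in R. This is only a minimal sketch, assuming the `ovarian` data shipped with the survival package and `age` as the single covariate (both chosen purely for illustration; the variable names are made up). The model is fitted with `ties = "breslow"` so that the software's own baseline estimate corresponds to the estimator written above.

```r
library(survival)

# Illustrative model (assumption: ovarian data, age as the only covariate)
fit  <- coxph(Surv(futime, fustat) ~ age, data = ovarian, ties = "breslow")
beta <- coef(fit)[["age"]]

# exp(x_j * beta) for every subject in the training data
rel_hazard <- exp(ovarian$age * beta)

# Distinct event times t_1 < t_2 < ...
event_times <- sort(unique(ovarian$futime[ovarian$fustat == 1]))

# Baseline hazard increments: d_i divided by the sum of exp(x_j * beta)
# over everyone still at risk at t_i
h0 <- sapply(event_times, function(tt) {
  d_i <- sum(ovarian$futime == tt & ovarian$fustat == 1)
  d_i / sum(rel_hazard[ovarian$futime >= tt])
})

Lambda0 <- cumsum(h0)        # cumulative baseline hazard
S0      <- exp(-Lambda0)     # baseline survival (all covariates equal to zero)

# Predicted survival curve for a hypothetical new 60-year-old
S_new <- S0^exp(60 * beta)
data.frame(time = event_times, surv = S_new)

# Cross-check against the package's cumulative baseline hazard at x = 0
bh <- basehaz(fit, centered = FALSE)
all.equal(Lambda0, bh$hazard[match(event_times, bh$time)])
```

The hand computation and `basehaz(fit, centered = FALSE)` should agree up to numerical tolerance; in practice `survfit(fit, newdata = ...)` does all of this in one call.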

    EDIT: This might also be of interest :-)

    CC BY-SA 3.0
    14
    • 2
      That is exactly my question... I need an estimate of the baseline hazard function to be able to make the prediction, correct? Do you know any method for estimating it?
      – Marja
      Commented Sep 11, 2012 at 11:32
    • 2
      @Marjan the jackknife may not properly reflect uncertainty caused by variable selection. The bootstrap properly shows more variability in which variables are labeled "significant". If you want to do a "relative validation" you can show that predictive discrimination is good after correcting for overfitting. This does not require dealing with the baseline hazard, but is validating relative log hazard estimates. The validate function in the R rms package in conjunction with the cph function will do that. The only stepwise algorithm implemented in validate is backwards stepdown. Commented Sep 12, 2012 at 12:44
    • 2
      Getting predicted relative hazards (i.e., the linear predictor) is quite simple. But I quit using SAS in 1991. Commented Sep 13, 2012 at 17:57
    • 4
      Is there a way to predict the survival Time T for a specific individual? I mean that given a list of values for the covariates, what is the way to find out the time after which the individual is most likely to die? Commented Feb 12, 2015 at 21:31
    16

    If you use R, the function predictSurvProb in the pec package can give you absolute risk estimates for new data based on an existing Cox model.

    The mathematical details I cannot explain.

    EDIT: The function provides survival probabilities, which I have so far taken as 1-(Event probability).

    EDIT 2:

    One can do without the pec package. Using only the survival package, the following function returns absolute risk based on a Cox model

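    library(survival)

    # Absolute risk (1 - survival probability) at `time` for each row of `newdata`
    risk <- function(model, newdata, time) {
      fit <- survfit(model, newdata = newdata, se.fit = FALSE, conf.int = FALSE)
      as.numeric(1 - summary(fit, times = time)$surv)
    }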
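As a usage illustration, the function above could be called as follows. This is a sketch only: the `ovarian` data, the `age` covariate, the two hypothetical patients and the one-year horizon (365 days) are assumptions made for the example, and the function is repeated so the block runs on its own.

```r
library(survival)

# Same function as above, repeated so this block is self-contained
risk <- function(model, newdata, time) {
  fit <- survfit(model, newdata = newdata, se.fit = FALSE, conf.int = FALSE)
  as.numeric(1 - summary(fit, times = time)$surv)
}

# Illustrative model and two hypothetical new patients
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
new_patients <- data.frame(age = c(60, 70))

# Predicted probability of the event within 365 days, one value per patient
one_year_risk <- risk(fit, newdata = new_patients, time = 365)
one_year_risk
```

The result has one entry per row of `newdata`, in the same order.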
    
    CC BY-SA 3.0
    2
    15

    Maybe you would also like to try something like this? Fit a Cox proportional hazards model and use it to get the predicted survival curve for a new instance.

    Taken from the help file for survfit.coxph in R (I just added the lines() part):

    library(survival)

    # Fit a Cox proportional hazards model and plot the
    # predicted survival for a 60-year-old
    fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
    plot(survfit(fit, newdata = data.frame(age = 60)),
         xscale = 365.25, xlab = "Years", ylab = "Survival", conf.int = FALSE)
    # Also add the predicted survival for a 70-year-old; lines() draws on the
    # existing axes, so the labels and xscale are not passed again
    lines(survfit(fit, newdata = data.frame(age = 70)))
    

    Keep in mind, though, that for the proportional hazards assumption to still hold for your prediction, the patient for whom you predict should come from a group that is qualitatively the same as the one used to fit the Cox proportional hazards model.

    CC BY-SA 3.0
      7

      The basehaz function of the survival package provides the cumulative baseline hazard at the event time points. From that you can work your way through the math that ocram provides and include the hazard ratios from your coxph estimates.
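A minimal sketch of that route, assuming the `ovarian` example data and a hypothetical new patient aged 60 (illustrative choices, not part of the original answer). Two details matter here: `basehaz()` returns the cumulative baseline hazard, and `centered = FALSE` makes it refer to covariate values of zero rather than to the covariate means.

```r
library(survival)

# Illustrative model (assumption: ovarian data, age as the only covariate)
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)

# Cumulative baseline hazard Lambda0(t) at covariate values of zero
bh <- basehaz(fit, centered = FALSE)

# Hazard ratio of the new patient relative to that all-zero baseline
hr_new <- exp(coef(fit)[["age"]] * 60)

# S(t | x) = exp(-Lambda0(t) * exp(x * beta))
surv_new <- data.frame(time = bh$time, surv = exp(-bh$hazard * hr_new))
head(surv_new)

# For comparison: the same curve straight from survfit()
sf <- survfit(fit, newdata = data.frame(age = 60))
all.equal(surv_new$surv, as.numeric(sf$surv[match(bh$time, sf$time)]))
```

Going through `survfit()` directly is usually more convenient; the manual version mainly shows where the baseline hazard enters the prediction.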

      CC BY-SA 3.0
        2

        The whole point of the Cox model is the proportional hazards assumption and the use of the partial likelihood. The partial likelihood has the baseline hazard function eliminated, so you do not need to specify one. That is the beauty of it!

        CC BY-SA 3.0
        5
        • 3
          If, however, you want to get an estimate of the hazard or the survival for a particular value of the covariate vector, then you do need an estimate of the baseline hazard or survival. The Nelson-Aalen estimate usually does the job...
          – ocram
          Commented Sep 10, 2012 at 15:13
        • 2
          Often with the Cox model you are comparing two survival functions and the key is the hazard ratio rather than the hazard function. The baseline hazard is like a nuisance parameter that Cox so cleverly eliminated from the problem using the proportional hazards assumption. Whatever method you would like to use for estimating the hazard function and/or the baseline hazard in the context of the model would require using the Cox form of the model which forces proportionality. Commented Sep 10, 2012 at 15:30
        • Thank you so much. It would be great if you could look at my comment on ocram's answer. Maybe you could help me too?
          – Marja
          Commented Sep 11, 2012 at 7:17
        • 3
          You can also stratify on factors that are not in proportional hazards. But at any rate the Cox model and its after-the-fit estimator of the baseline hazard can be used to get predicted quantiles of survival time, various survival probabilities, and predicted mean survival time if you have long-term follow-up. All these quantities are easy to get in the R package rms. Commented Sep 11, 2012 at 11:31
        • 1
          You don't need to specify it, but it is estimated.
          – DWin
          Commented Oct 31, 2018 at 21:55
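One of the comments above mentions predicted quantiles of survival time, survival probabilities and mean survival time, and points to the rms package for them. As a rough sketch using only the survival package instead (the `ovarian` data, the hypothetical 60-year-old patient and the two-year horizon `tau` are assumptions made for illustration):

```r
library(survival)

# Illustrative model and predicted curve for a hypothetical 60-year-old
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian)
sf  <- survfit(fit, newdata = data.frame(age = 60))

# Predicted median survival time (NA if the curve never drops to 0.5)
median_time <- as.numeric(quantile(sf, probs = 0.5, conf.int = FALSE))

# Predicted survival probability at one year (365 days)
surv_1y <- summary(sf, times = 365)$surv

# Restricted mean survival time up to tau: area under the step function,
# which equals 1 before the first time point
tau  <- 730
keep <- sf$time < tau
step_times <- c(0, sf$time[keep], tau)
step_surv  <- c(1, sf$surv[keep])
rmst <- sum(diff(step_times) * step_surv)

c(median = median_time, surv_1y = surv_1y, rmst = rmst)
```

The mean is restricted to `tau` because, with limited follow-up, the tail of the curve is not estimated; an unrestricted mean needs the long-term follow-up the comment refers to.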
