15

I wish to create a toy survival (time to event) data which is right censored and follows some distribution with proportional hazards and constant baseline hazard.

I created the data as follows, but I am unable to obtain estimated hazard ratios that are close to the true values after fitting a Cox proportional hazards model to the simulated data.

What did I do wrong?

R codes:

library(survival)

#set parameters
set.seed(1234)

n = 40000 #sample size


#functional relationship

lambda=0.000020 #constant baseline hazard 2 per 100000 per 1 unit time

b_haz <-function(t) #baseline hazard
  {
    lambda #constant hazard wrt time 
  }

x = cbind(hba1c=rnorm(n,2,.5)-2,age=rnorm(n,40,5)-40,duration=rnorm(n,10,2)-10)

B = c(1.1,1.2,1.3) # hazard ratios (model coefficients)

hist(x %*% B) #distribution of scores

haz <-function(t) #hazard function
{
  b_haz(t) * exp(x %*% B)
}

c_hf <-function(t) #cumulative hazards function
{
  exp(x %*% B) * lambda * t 
}

S <- function(t) #survival function
{
  exp(-c_hf(t))
}

S(.005)
S(1)
S(5)

#simulate censoring

time = rnorm(n,10,2)

S_prob = S(time)

#simulate events

event = ifelse(runif(1)>S_prob,1,0)

#model fit

km = survfit(Surv(time,event)~1,data=data.frame(x))

plot(km) #kaplan-meier plot

#Cox PH model

fit = coxph(Surv(time,event)~ hba1c+age+duration, data=data.frame(x))

summary(fit)            

cox.zph(fit)

Results:

Call:
coxph(formula = Surv(time, event) ~ hba1c + age + duration, data = data.frame(x))

  n= 40000, number of events= 3043 

             coef exp(coef) se(coef)     z Pr(>|z|)    
hba1c    0.236479  1.266780 0.035612  6.64 3.13e-11 ***
age      0.351304  1.420919 0.003792 92.63  < 2e-16 ***
duration 0.356629  1.428506 0.008952 39.84  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         exp(coef) exp(-coef) lower .95 upper .95
hba1c        1.267     0.7894     1.181     1.358
age          1.421     0.7038     1.410     1.432
duration     1.429     0.7000     1.404     1.454

Concordance= 0.964  (se = 0.006 )
Rsquare= 0.239   (max possible= 0.767 )
Likelihood ratio test= 10926  on 3 df,   p=0
Wald test            = 10568  on 3 df,   p=0
Score (logrank) test = 11041  on 3 df,   p=0

but true values are set as

B = c(1.1,1.2,1.3) # hazard ratios (model coefficients)
CC BY-SA 3.0
1

3 Answers 3

Reset to default
27

It is not clear to me how you generate your event times (which, in your case, might be <0) and event indicators:

time = rnorm(n,10,2) 
S_prob = S(time)
event = ifelse(runif(1)>S_prob,1,0)

So here is a generic method, followed by some R code.


Generating survival times to simulate Cox proportional hazards models

To generate event times from the proportional hazards model, we can use the inverse probability method (Bender et al., 2005): if Vis uniform on (0,1)and if S(|x)is the conditional survival function derived from the proportional hazards model, i.e.

S(t|x)=exp(H0(t)exp(xβ)()
then it is a fact that the random variable
T=S1(V|x)=H01(log(V)exp(xβ))
has survival function S(|x). This result is known as ``the inverse probability integral transformation''. Therefore, to generate a survival time TS(|x)given the covariate vector, it suffices to draw vfrom VU(0,1)and to make the inverse transformation t=S1(v|x).


Example [Weibull baseline hazard]

Let h0(t)=λρtρ1with shape ρ>0and scale λ>0. Then H0(t)=λtρand H01(t)=(tλ)1ρ. Following the inverse probability method, a realisation of TS(|x)is obtained by computing

t=(log(v)λexp(xβ))1ρ
with va uniform variate on (0,1). Using results on transformations of random variables, one may notice that Thas a conditional Weibull distribution (given x) with shape ρand scale λexp(xβ).


R code

The following R function generates a data set with a single binary covariate x(e.g. a treatment indicator). The baseline hazard has a Weibull form. Censoring times are randomly drawn from an exponential distribution.

# baseline hazard: Weibull

# N = sample size    
# lambda = scale parameter in h0()
# rho = shape parameter in h0()
# beta = fixed effect parameter
# rateC = rate parameter of the exponential distribution of C

simulWeib <- function(N, lambda, rho, beta, rateC)
{
  # covariate --> N Bernoulli trials
  x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))

  # Weibull latent event times
  v <- runif(n=N)
  Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)

  # censoring times
  C <- rexp(n=N, rate=rateC)

  # follow-up times and event indicators
  time <- pmin(Tlat, C)
  status <- as.numeric(Tlat <= C)

  # data set
  data.frame(id=1:N,
             time=time,
             status=status,
             x=x)
}

Test

Here is some quick simulation with β=0.6:

set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
  dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
  fit <- coxph(Surv(time, status) ~ x, data=dat)
  betaHat[k] <- fit$coef
}

> mean(betaHat)
[1] -0.6085473
CC BY-SA 3.0
11
6

Based on @ocram answer, toy data can also be created by using a built-in R function, rweibull(), with a scaled scale λ(β)where

λ=λexp(xβρ).
Because
S(t|x)=exp(H0(t)exp(xβ)()
and S, or 1 - CDF, of Weibull distribution is
S(t)=exp((t/λ)ρ)
so
S(t|x,λ)=exp((t/λ)ρexp(xβ))=exp((t/λ)ρexp(xβ/ρ)ρ)=exp((tλexp(xβ/ρ))ρ)=S(t|x,λ)

R code with alternative draw for event times

simulWeib <- function(N, lambda, rho, beta, rateC)
{
  # covariate --> N Bernoulli trials
  x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))
  
  # Weibull latent event times
  # v <- runif(n=N)
  # Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)
  
  # An alternative draw for event times
  lambda_wiki = lambda^(-1 / rho) #change definition of lambda to Wikipedia's
  lambda_prime = lambda_wiki / exp(x * beta / rho) #re-scale according to beta
  Tlat = rweibull(length(x), shape=rho, scale=lambda_prime)

  # censoring times
  C <- rexp(n=N, rate=rateC)
  
  # follow-up times and event indicators
  time <- pmin(Tlat, C)
  status <- as.numeric(Tlat <= C)
  
  # data set
  data.frame(id=1:N,
             time=time,
             status=status,
             x=x)
}
set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
  dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
  fit <- coxph(Surv(time, status) ~ x, data=dat)
  betaHat[k] <- fit$coef
}

> mean(betaHat)
[1] -0.6085473
CC BY-SA 4.0
    3

    For Weibull distribution,
    S(t) = e(λe(xβ)t)ρ

    "(1/rho)" will be only for log(v)

    so, I modified like this

    Tlat <- (- log(v))^(1 / rho) / (lambda * exp(x * beta))
    

    if rho = 1, result will be same.

    CC BY-SA 3.0

      Your Answer

      By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

      Not the answer you're looking for? Browse other questions taggedor ask your own question.