How to create toy survival (time-to-event) data with right censoring

Question

I wish to create a toy survival (time-to-event) dataset that is right-censored and follows some distribution with proportional hazards and a constant baseline hazard.

I created the data as follows, but I am unable to obtain estimated hazard ratios that are close to the true values after fitting a Cox proportional hazards model to the simulated data.

What did I do wrong?

R code:

library(survival)

#set parameters
set.seed(1234)

n = 40000 #sample size


#functional relationship

lambda=0.000020 #constant baseline hazard 2 per 100000 per 1 unit time

b_haz <-function(t) #baseline hazard
  {
    lambda #constant hazard wrt time 
  }

x = cbind(hba1c=rnorm(n,2,.5)-2,age=rnorm(n,40,5)-40,duration=rnorm(n,10,2)-10)

B = c(1.1,1.2,1.3) # hazard ratios (model coefficients)

hist(x %*% B) #distribution of scores

haz <-function(t) #hazard function
{
  b_haz(t) * exp(x %*% B)
}

c_hf <-function(t) #cumulative hazards function
{
  exp(x %*% B) * lambda * t 
}

S <- function(t) #survival function
{
  exp(-c_hf(t))
}

S(.005)
S(1)
S(5)

#simulate censoring

time = rnorm(n,10,2)

S_prob = S(time)

#simulate events

event = ifelse(runif(1)>S_prob,1,0)

#model fit

km = survfit(Surv(time,event)~1,data=data.frame(x))

plot(km) #kaplan-meier plot

#Cox PH model

fit = coxph(Surv(time,event)~ hba1c+age+duration, data=data.frame(x))

summary(fit)            

cox.zph(fit)

Results:

Call:
coxph(formula = Surv(time, event) ~ hba1c + age + duration, data = data.frame(x))

  n= 40000, number of events= 3043 

             coef exp(coef) se(coef)     z Pr(>|z|)    
hba1c    0.236479  1.266780 0.035612  6.64 3.13e-11 ***
age      0.351304  1.420919 0.003792 92.63  < 2e-16 ***
duration 0.356629  1.428506 0.008952 39.84  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         exp(coef) exp(-coef) lower .95 upper .95
hba1c        1.267     0.7894     1.181     1.358
age          1.421     0.7038     1.410     1.432
duration     1.429     0.7000     1.404     1.454

Concordance= 0.964  (se = 0.006 )
Rsquare= 0.239   (max possible= 0.767 )
Likelihood ratio test= 10926  on 3 df,   p=0
Wald test            = 10568  on 3 df,   p=0
Score (logrank) test = 11041  on 3 df,   p=0

but the true values are set as

B = c(1.1,1.2,1.3) # hazard ratios (model coefficients)

for your task, a quick start is to use an existing simulation package: cran.r-project.org/web/packages/survsim/index.html — zhanxw, Commented Jan 18, 2017 at 7:48

ocram · Accepted Answer · 2017-02-08 06:51:05Z

It is not clear to me how you generate your event times (which, in your case, might be $< 0$) and event indicators:

time = rnorm(n,10,2) 
S_prob = S(time)
event = ifelse(runif(1)>S_prob,1,0)
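To make the event-indicator issue concrete: `runif(1)` returns a single number, which is recycled in the comparison against every element of `S_prob`, so all subjects share one threshold. A minimal sketch, using a short made-up vector `S_prob` (hypothetical values, not taken from the question's simulation):

```r
set.seed(1)
S_prob <- c(0.20, 0.50, 0.90, 0.95, 0.99)  # hypothetical survival probabilities

# As in the question: ONE uniform draw, recycled in the comparison for all subjects
u_single <- runif(1)
event_single <- ifelse(u_single > S_prob, 1, 0)

# One independent uniform per subject instead
u_each <- runif(length(S_prob))
event_each <- ifelse(u_each > S_prob, 1, 0)

# With a single threshold, the "events" are exactly the subjects with S_prob < u_single,
# i.e. the indicator is a deterministic function of S_prob rather than a random draw
rbind(S_prob, event_single, event_each)
```

Drawing one uniform per subject removes the recycling, but the recorded `time` is then still the pre-drawn follow-up time rather than a simulated event time, which is why a method that generates the event times themselves is needed.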

So here is a generic method, followed by some R code.

Generating survival times to simulate Cox proportional hazards models

To generate event times from the proportional hazards model, we can use the inverse probability method (Bender et al., 2005): if $V$ is uniform on $(0, 1)$ and if $S (\cdot | x)$ is the conditional survival function derived from the proportional hazards model, i.e.

$$S(t \mid x) = \exp\left(-H_0(t)\exp(x'\beta)\right),$$

then it is a fact (solve $V = S(T \mid x)$ for $T$) that the random variable

$$T = S^{-1}(V \mid x) = H_0^{-1}\left(-\frac{\log(V)}{\exp(x'\beta)}\right)$$

has survival function $S(\cdot \mid x)$. This result is known as "the inverse probability integral transformation". Therefore, to generate a survival time $T \sim S(\cdot \mid x)$ given the covariate vector, it suffices to draw $v$ from $V \sim U(0, 1)$ and to make the inverse transformation $t = S^{-1}(v \mid x)$.
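Applied to the question's setting, a minimal sketch might look like this. With a constant baseline hazard, $H_0(t) = \lambda t$ and $H_0^{-1}(u) = u/\lambda$. Assumptions: the helper `simulPH()` is a hypothetical name, the covariates are drawn already centred (equivalent to `rnorm(n, 2, .5) - 2` etc. in the question), and the censoring times are drawn from `runif(n, 5, 15)` instead of `rnorm(n, 10, 2)` so that they cannot be negative. Because `B` enters through `exp(x %*% B)`, it holds the log hazard ratios, so it should be compared with `coef(fit)` rather than with `exp(coef)`.

```r
library(survival)

# Hypothetical helper: inverse-transform draw T = H0^{-1}(-log(V) / exp(x'beta))
# X: numeric covariate matrix, beta: log hazard ratios, H0inv: inverse cumulative baseline hazard
simulPH <- function(X, beta, H0inv) {
  v <- runif(nrow(X))
  H0inv(-log(v) / exp(drop(X %*% beta)))
}

set.seed(1234)
n <- 40000
lambda <- 0.000020                       # constant baseline hazard, so H0(t) = lambda * t
X <- cbind(hba1c = rnorm(n, 0, 0.5),     # centred covariates, as in the question
           age = rnorm(n, 0, 5),
           duration = rnorm(n, 0, 2))
B <- c(1.1, 1.2, 1.3)                    # log hazard ratios

Tlat <- simulPH(X, B, H0inv = function(u) u / lambda)   # latent event times
C <- runif(n, 5, 15)                     # assumed censoring (follow-up) times
dat <- data.frame(X,
                  time = pmin(Tlat, C),
                  status = as.numeric(Tlat <= C))

fit <- coxph(Surv(time, status) ~ hba1c + age + duration, data = dat)
cbind(true = B, estimate = coef(fit))
```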

Example [Weibull baseline hazard]

Let $h_0(t) = \lambda\rho t^{\rho - 1}$ with shape $\rho > 0$ and scale $\lambda > 0$. Then $H_0(t) = \lambda t^{\rho}$ and $H_0^{-1}(t) = (t/\lambda)^{1/\rho}$. Following the inverse probability method, a realisation of $T \sim S(\cdot \mid x)$ is obtained by computing

$$t = \left(-\frac{\log(v)}{\lambda\exp(x'\beta)}\right)^{1/\rho}$$

with $v$ a uniform variate on $(0, 1)$. Using results on transformations of random variables, one may notice that $T$ has a conditional Weibull distribution (given $x$) with shape $\rho$ and scale $\lambda\exp(x'\beta)$.
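A quick numerical check of this claim, as a sketch: with arbitrarily chosen $\lambda$, $\rho$, $\beta$ and a single covariate value (all assumptions for illustration), the empirical survival of the inverse-transform draws at a time point should match $\exp(-\lambda e^{x\beta} t^{\rho})$. In R's own `pweibull()` parameterisation, $S(t) = \exp(-(t/\text{scale})^{\text{shape}})$, the same distribution has shape $\rho$ and scale $(\lambda e^{x\beta})^{-1/\rho}$.

```r
set.seed(42)
N <- 1e5
lambda <- 0.01; rho <- 1.5; beta <- -0.6; x <- 1   # arbitrary illustrative values

# Inverse-transform draws, following the formula above
v <- runif(N)
Tlat <- (-log(v) / (lambda * exp(x * beta)))^(1 / rho)

t0 <- 10
emp  <- mean(Tlat > t0)                              # empirical S(t0 | x)
theo <- exp(-lambda * exp(x * beta) * t0^rho)        # exp(-H0(t0) exp(x'beta))
# Same value via R's Weibull parameterisation: shape rho, scale (lambda e^{x beta})^(-1/rho)
theo_r <- pweibull(t0, shape = rho, scale = (lambda * exp(x * beta))^(-1 / rho),
                   lower.tail = FALSE)
c(empirical = emp, formula = theo, pweibull = theo_r)
```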

R code

The following R function generates a data set with a single binary covariate $x$ (e.g. a treatment indicator). The baseline hazard has a Weibull form. Censoring times are randomly drawn from an exponential distribution.

# baseline hazard: Weibull

# N = sample size    
# lambda = scale parameter in h0()
# rho = shape parameter in h0()
# beta = fixed effect parameter
# rateC = rate parameter of the exponential distribution of C

simulWeib <- function(N, lambda, rho, beta, rateC)
{
  # covariate --> N Bernoulli trials
  x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))

  # Weibull latent event times
  v <- runif(n=N)
  Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)

  # censoring times
  C <- rexp(n=N, rate=rateC)

  # follow-up times and event indicators
  time <- pmin(Tlat, C)
  status <- as.numeric(Tlat <= C)

  # data set
  data.frame(id=1:N,
             time=time,
             status=status,
             x=x)
}
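When `rho = 1`, both the latent event times and the censoring times are exponential, so the expected censoring fraction has a closed form, $P(C < T \mid x) = \text{rateC} / (\text{rateC} + \lambda e^{x\beta})$, averaged over the two equally likely values of $x$. This offers one simple way to tune `rateC` to a target amount of censoring. A sketch (the helper name `censFrac()` is hypothetical), with a Monte Carlo check that draws the times the same way `simulWeib()` does:

```r
# Hypothetical helper: expected censoring fraction for rho = 1 (exponential event times),
# exponential censoring with rate rateC, and x ~ Bernoulli(0.5) as in simulWeib()
censFrac <- function(lambda, beta, rateC) {
  p0 <- rateC / (rateC + lambda)               # P(C < T | x = 0)
  p1 <- rateC / (rateC + lambda * exp(beta))   # P(C < T | x = 1)
  0.5 * p0 + 0.5 * p1
}
censFrac(lambda = 0.01, beta = -0.6, rateC = 0.001)   # about 0.1225

# Monte Carlo check, drawing the times the same way simulWeib() does (rho = 1)
set.seed(1)
N <- 1e5
x <- sample(c(0, 1), size = N, replace = TRUE)
Tlat <- -log(runif(N)) / (0.01 * exp(x * -0.6))
C <- rexp(N, rate = 0.001)
mean(C < Tlat)
```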

Test

Here is a quick simulation with $\beta = -0.6$:

library(survival)

set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
  dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
  fit <- coxph(Surv(time, status) ~ x, data=dat)
  betaHat[k] <- fit$coef
}

> mean(betaHat)
[1] -0.6085473

Thank you for your excellent answer. I realized I had messed up the event times by getting the event status after I randomized the event times, which didn't make sense. Silly me! — stats_newb, Commented Jan 27, 2015 at 9:12
May I ask whether there is any specific reason why you drew the censoring times from an exponential distribution? — pthao, Commented May 16, 2016 at 5:06
@pthao: there is no particular reason (this was just an illustration where I used the exponential distribution) — ocram, Commented May 16, 2016 at 5:43
Is there any guideline for choosing the distribution for censoring times? — pthao, Commented May 16, 2016 at 6:51
@ocram Interestingly, when I run flexsurvreg(Surv(time, status) ~ x, data=dat, dist = "weibull") on the same simulated data, the coefficient appears as 0.6212. Why is this? — neither-nor, Commented Jun 4, 2019 at 23:50
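On the flexsurvreg question: as far as I understand, flexsurvreg's "weibull" distribution, like `survival::survreg()`, puts covariates on the scale parameter, i.e. it fits an accelerated failure time (AFT) model, whose coefficient relates to the PH coefficient via $\gamma = -\beta/\rho$. With `rho = 1` (presumably the setting used) that gives $+0.6$, consistent with the reported 0.6212. A sketch of the conversion using `survreg()` rather than flexsurv, with arbitrary parameter values:

```r
library(survival)

set.seed(1234)
N <- 2000
lambda <- 0.01; rho <- 1.5; beta <- -0.6     # arbitrary illustrative values
x <- sample(c(0, 1), size = N, replace = TRUE)
Tlat <- (-log(runif(N)) / (lambda * exp(x * beta)))^(1 / rho)
C <- rexp(N, rate = 0.001)
dat <- data.frame(time = pmin(Tlat, C), status = as.numeric(Tlat <= C), x = x)

# survreg() fits the Weibull model in AFT form:
# log(T) = mu + g_aft * x + sigma * W, with sigma = 1 / rho
aft <- survreg(Surv(time, status) ~ x, data = dat, dist = "weibull")
g_aft <- unname(coef(aft)["x"])
sigma <- aft$scale
beta_from_aft <- -g_aft / sigma              # log hazard ratio implied by the AFT fit

cox <- coxph(Surv(time, status) ~ x, data = dat)
c(aft_coef = g_aft, ph_from_aft = beta_from_aft, cox = unname(coef(cox)))
```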

Ryan SY Kwan · Accepted Answer · 2020-06-15 11:57:21Z

Building on @ocram's answer, toy data can also be created with the built-in R function rweibull(), using a rescaled scale parameter $\lambda'(\beta)$, where

$$\lambda' = \frac{\lambda}{\exp(x'\beta/\rho)}.$$

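Because

$$S(t \mid x) = \exp\left(-H_0(t)\exp(x'\beta)\right)$$

and the survival function (1 - CDF) of a Weibull distribution with shape $\rho$ and scale $\lambda$ is

$$S(t) = \exp\left(-(t/\lambda)^{\rho}\right),$$

so that here $H_0(t) = (t/\lambda)^{\rho}$, we have

$$S(t \mid x, \lambda) = \exp\left(-(t/\lambda)^{\rho}\exp(x'\beta)\right) = \exp\left(-(t/\lambda)^{\rho}\left[\exp(x'\beta/\rho)\right]^{\rho}\right) = \exp\left(-\left(\frac{t}{\lambda/\exp(x'\beta/\rho)}\right)^{\rho}\right) = S(t \mid x, \lambda').$$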
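A deterministic check of this identity, as a sketch: starting from @ocram's parameterisation $H_0(t) = \lambda t^{\rho}$ (arbitrary illustrative values), converting $\lambda$ to the Wikipedia/R scale and rescaling it by $\exp(x\beta/\rho)$ should reproduce $S(t \mid x)$ exactly through `pweibull()`.

```r
# ocram's parameterisation: H0(t) = lambda * t^rho; arbitrary illustrative values
lambda <- 0.01; rho <- 1.5; beta <- -0.6
t <- c(1, 5, 10, 20, 50)

maxdiff <- sapply(c(0, 1), function(x) {
  S_ph <- exp(-lambda * t^rho * exp(x * beta))        # S(t | x) = exp(-H0(t) exp(x beta))
  lambda_wiki  <- lambda^(-1 / rho)                   # same lambda_wiki conversion as this answer's code
  lambda_prime <- lambda_wiki / exp(x * beta / rho)   # rescaled scale lambda'
  S_rw <- pweibull(t, shape = rho, scale = lambda_prime, lower.tail = FALSE)
  max(abs(S_ph - S_rw))
})
maxdiff   # both entries should be zero up to floating-point rounding
```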

R code with alternative draw for event times

simulWeib <- function(N, lambda, rho, beta, rateC)
{
  # covariate --> N Bernoulli trials
  x <- sample(x=c(0, 1), size=N, replace=TRUE, prob=c(0.5, 0.5))
  
  # Weibull latent event times
  # v <- runif(n=N)
  # Tlat <- (- log(v) / (lambda * exp(x * beta)))^(1 / rho)
  
  # An alternative draw for event times
  lambda_wiki = lambda^(-1 / rho) #change definition of lambda to Wikipedia's
  lambda_prime = lambda_wiki / exp(x * beta / rho) #re-scale according to beta
  Tlat = rweibull(length(x), shape=rho, scale=lambda_prime)

  # censoring times
  C <- rexp(n=N, rate=rateC)
  
  # follow-up times and event indicators
  time <- pmin(Tlat, C)
  status <- as.numeric(Tlat <= C)
  
  # data set
  data.frame(id=1:N,
             time=time,
             status=status,
             x=x)
}
set.seed(1234)
betaHat <- rep(NA, 1e3)
for(k in 1:1e3)
{
  dat <- simulWeib(N=100, lambda=0.01, rho=1, beta=-0.6, rateC=0.001)
  fit <- coxph(Surv(time, status) ~ x, data=dat)
  betaHat[k] <- fit$coef
}

> mean(betaHat)
[1] -0.6085473

unk · Accepted Answer · 2017-02-09 10:25:45Z

For the Weibull distribution written as

$$S(t) = e^{-(\lambda e^{x\beta} t)^{\rho}},$$

solving $v = S(t)$ for $t$ shows that the exponent $1/\rho$ applies only to $-\log(v)$, so I modified the code like this:

Tlat <- (- log(v))^(1 / rho) / (lambda * exp(x * beta))

If rho = 1, the result will be the same.
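Note that under this parameterisation the cumulative hazard is $H(t \mid x) = \lambda^{\rho} e^{\rho x\beta} t^{\rho}$, so the model is still proportional hazards, but with log hazard ratio $\rho\beta$ rather than $\beta$; the two versions coincide only when $\rho = 1$. A sketch checking this with `coxph()` (parameter values chosen arbitrarily):

```r
library(survival)

set.seed(2024)
N <- 2000
lambda <- 0.01; rho <- 2; beta <- -0.6   # arbitrary illustrative values
x <- sample(c(0, 1), size = N, replace = TRUE)
v <- runif(N)
Tlat <- (-log(v))^(1 / rho) / (lambda * exp(x * beta))   # the modified draw above
C <- rexp(N, rate = 0.001)
dat <- data.frame(time = pmin(Tlat, C), status = as.numeric(Tlat <= C), x = x)

fit <- coxph(Surv(time, status) ~ x, data = dat)
c(estimate = unname(coef(fit)), rho_times_beta = rho * beta, beta = beta)
```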
