Skip to contents
# Load data layout corresponding to the third layout
data("data_layout_1", package = "revents")
data_layout_1 <- revents::data_layout_1

Cox proportional hazard model (CoxPH)

In time to first event analyses (i.e. classical cox regression), the risk set is only restricted to first event of relapses. Thus, when working with recurrent event data, this risk set is specified by using the subset of SEVENT == 1.

cox.relapses <- coxph(Surv(TSTART, TSTOP, STATUS) ~ DISEASE_COURSE + AGE + SEX 
                      + RACE + TIME_SINCE_DIAGNOSIS, ties = "breslow", 
                      data = subset(data_layout_1, SEVENT == 1))

cox.relapses
#> Call:
#> coxph(formula = Surv(TSTART, TSTOP, STATUS) ~ DISEASE_COURSE + 
#>     AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS, data = subset(data_layout_1, 
#>     SEVENT == 1), ties = "breslow")
#> 
#>                            coef  exp(coef)   se(coef)      z        p
#> DISEASE_COURSESPMS   -0.5065415  0.6025760  0.1169950 -4.330 1.49e-05
#> AGE                   0.0001745  1.0001746  0.0045354  0.038   0.9693
#> SEXMale               0.1992437  1.2204793  0.0883283  2.256   0.0241
#> RACEWhite            -0.1131750  0.8929943  0.1414812 -0.800   0.4238
#> TIME_SINCE_DIAGNOSIS  0.0032136  1.0032188  0.0019227  1.671   0.0946
#> 
#> Likelihood ratio test=33.5  on 5 df, p=2.997e-06
#> n= 1313, number of events= 565

Here it can be seen that the CoxPH model gives an decreasing risk of relapses of 40% in the SPMS group compared with the RRMS group. It is important to consider that this model is ignoring all the other events that precede the 1st events.

In order to account for them, we can use recurrent events models:

Andersen-Gill (AG) model

This is a classical cox extension model for recurrent event data which assumes that events are independent and that the hazard ratio is constant over time.

ag.relapses <- coxph(Surv(TSTART, TSTOP, STATUS) ~ DISEASE_COURSE + AGE +
                       SEX + RACE + TIME_SINCE_DIAGNOSIS, data = data_layout_1)

ag.relapses %>% tidy(exp = TRUE, conf.int = TRUE)
#> # A tibble: 5 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 DISEASE_COURSESPMS      0.564  0.0404     -14.2   1.44e-45    0.522     0.611
#> 2 AGE                     1.00   0.00152      1.88  6.02e- 2    1.00      1.01 
#> 3 SEXMale                 0.980  0.0305      -0.667 5.05e- 1    0.923     1.04 
#> 4 RACEWhite               1.12   0.0525       2.11  3.47e- 2    1.01      1.24 
#> 5 TIME_SINCE_DIAGNOSIS    1.00   0.000759     1.50  1.35e- 1    1.00      1.00

Adjusting for covariates (in this case there are no time-dependent covariates) and considering all events, the AG model shows that the risk of relapse in the SPMS group is 44% lower than in the RRMS group.

It is important to mention that the AG model can also be fitted with a robust = TRUE to take into account correlations among the events and provides robust standard errors (not applied in this example).

Prentice Williams Peterson (PWP TT/CT) model (overall effect)

The PWP model turns into a useful model when dependency between events become relevant to account. This model add an strata term to the model, which allows to account for the dependency between events while obtaining the overall effect.

pwp.tt.relapses <- coxph(Surv(TSTART, TSTOP, STATUS) ~ DISEASE_COURSE + 
                           AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS + 
                           strata(STATUS), data = data_layout_1)

pwp.tt.relapses
#> Call:
#> coxph(formula = Surv(TSTART, TSTOP, STATUS) ~ DISEASE_COURSE + 
#>     AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS + strata(STATUS), 
#>     data = data_layout_1)
#> 
#>                            coef  exp(coef)   se(coef)       z       p
#> DISEASE_COURSESPMS   -0.5459158  0.5793110  0.0401215 -13.607 < 2e-16
#> AGE                   0.0028041  1.0028080  0.0015296   1.833 0.06678
#> SEXMale              -0.0427495  0.9581514  0.0305230  -1.401 0.16134
#> RACEWhite             0.1621498  1.1760364  0.0525550   3.085 0.00203
#> TIME_SINCE_DIAGNOSIS  0.0008012  1.0008015  0.0007341   1.091 0.27512
#> 
#> Likelihood ratio test=272.3  on 5 df, p=< 2.2e-16
#> n= 6347, number of events= 5092

When analyses are performed based on restricted risk sets (the risk set only involves those with the same number of previous events) with a calendar time scale, the PWP gives a HR:0.57 for the SPMS group compared with the RRMS group.

Prentice Williams Peterson (PWP-GT) model (overall effect)

The PWP model can also be used to account for inter-event dependence when the time scale is the gap time scale. This model is useful when the time between events is of interest and there is a renewal after each event (i.e. the participant returns to the previous state as, for example, in diseases where the community does not develop after the first event).

pwp.gt.relapses <- coxph(Surv(TGAP, STATUS) ~ DISEASE_COURSE + AGE + SEX
                         + RACE + TIME_SINCE_DIAGNOSIS +
                           strata(STATUS), data = data_layout_1)

pwp.gt.relapses
#> Call:
#> coxph(formula = Surv(TGAP, STATUS) ~ DISEASE_COURSE + AGE + SEX + 
#>     RACE + TIME_SINCE_DIAGNOSIS + strata(STATUS), data = data_layout_1)
#> 
#>                            coef  exp(coef)   se(coef)       z       p
#> DISEASE_COURSESPMS   -0.5636383  0.5691346  0.0408425 -13.800 < 2e-16
#> AGE                   0.0028712  1.0028754  0.0015265   1.881 0.05998
#> SEXMale              -0.0327096  0.9678196  0.0305313  -1.071 0.28401
#> RACEWhite             0.1422456  1.1528598  0.0526099   2.704 0.00686
#> TIME_SINCE_DIAGNOSIS  0.0011251  1.0011257  0.0007544   1.491 0.13586
#> 
#> Likelihood ratio test=271.4  on 5 df, p=< 2.2e-16
#> n= 6347, number of events= 5092

When gap time scale is used, the PWP model gives a HR:0.56 for the SPMS group compared with the RRMS group.

Frailty model

The frailty model, introduces a random covariate or frailty term into the model that induces dependence among the recurrent event times.

frailty.relapses <- coxph(Surv(TSTOP, STATUS) ~ DISEASE_COURSE + AGE + SEX
                          + RACE + TIME_SINCE_DIAGNOSIS + frailty(ID),
                          data = data_layout_1)

tidy(frailty.relapses, exp = TRUE, conf.int = TRUE)
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 DISEASE_COURSESPMS      0.774  0.0404   40.0      2.57e-10    0.715     0.838
#> 2 AGE                     1.00   0.00152   0.156    6.93e- 1    0.998     1.00 
#> 3 SEXMale                 0.999  0.0305    0.000679 9.79e- 1    0.941     1.06 
#> 4 RACEWhite               1.01   0.0525    0.0431   8.35e- 1    0.912     1.12 
#> 5 TIME_SINCE_DIAGNOSIS    1.00   0.000761  2.17     1.40e- 1    1.00      1.00 
#> 6 frailty(ID)            NA     NA         0.000620 8.54e- 1   NA        NA

Conditional on the unmeasured heterogeneity and covariates, the frailty model indicates that the SPMS group had a reducing risk of 23% than the RRMS (HR: 0.77; 95% CI: 0.71, 0.83).

In order to verify if the use of frailty models is justified in our study, it is possible to use the emfrail function from frailtyEM package for getting the frailty variance or Kendall’s tau. A variance close to 0 means no heterogeneity between patients.

frailty_var <- emfrail(Surv(TSTOP, STATUS) ~ cluster(ID) + DISEASE_COURSE
                       + AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS, 
                       data = data_layout_1) 

summary(frailty_var)
#> Call: 
#> emfrail(formula = Surv(TSTOP, STATUS) ~ cluster(ID) + DISEASE_COURSE + 
#>     AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS, data = data_layout_1)
#> 
#> Regression coefficients:
#>                           coef exp(coef)  se(coef)         z    p
#> DISEASE_COURSESPMS   -0.255557  0.774485  0.040421 -6.322323 0.00
#> AGE                   0.000600  1.000600  0.001519  0.394967 0.69
#> SEXMale              -0.000795  0.999205  0.030528 -0.026039 0.98
#> RACEWhite             0.010910  1.010969  0.052527  0.207696 0.84
#> TIME_SINCE_DIAGNOSIS  0.001122  1.001123  0.000761  1.474168 0.14
#> Estimated distribution: gamma / left truncation: FALSE 
#> 
#> Fit summary:
#> Commenges-Andersen test for heterogeneity: p-val  1.55e-145 
#> no-frailty Log-likelihood: -39801.63 
#> Log-likelihood: -39801.63 
#> LRT: 1/2 * pchisq(0), p-val>0.5
#> 
#> Frailty summary:
#>                    estimate lower 95% upper 95%
#> Var[Z]                 0.00     0.000     0.001
#> Kendall's tau          0.00     0.000     0.001
#> Median concordance     0.00     0.000     0.000
#> E[logZ]                0.00    -0.001     0.000
#> Var[logZ]              0.00     0.000     0.001
#> theta              38640.84   998.881       Inf
#> Confidence intervals based on the likelihood function

In our example, there is no heterogeneity, so the use of frailty models would not be justified. The differences detected in the frailty model estimates with other conditional models may be due to model assumptions rather than heterogeneity.