Skip to contents
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
suppressWarnings({
  suppressMessages({
library(revents)
library(dplyr)
library(survival)
library(broom)
library(MASS)
  })
})
# Load data layout corresponding to the third layout
data("data_layout_3", package = "revents")
data_layout_3 <- revents::data_layout_3
# set scientific notation to FALSE
options(scipen = 999)

Poisson regression

The Poisson model can be used as baseline model for recurrent events data. However, the Poisson model assumes that the mean and the variance are equal, which is not always the case in recurrent events data and does not account for dependence between events and event timing.

poisson.relapses <- glm(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX +  RACE + TIME_SINCE_DIAGNOSIS, 
  family = poisson(link = "log"), 
  data = data_layout_3)

tidy(poisson.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.275  0.0880     -14.7   1.11e-48    0.231     0.327
#> 2 DISEASE_COURSESPMS      0.450  0.0486     -16.4   1.12e-60    0.409     0.494
#> 3 AGE                     1.00   0.00176      2.17  2.98e- 2    1.00      1.01 
#> 4 SEXMale                 0.972  0.0355      -0.788 4.31e- 1    0.907     1.04 
#> 5 RACEWhite               1.16   0.0608       2.41  1.61e- 2    1.03      1.31 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000910     1.86  6.32e- 2    1.00      1.00

In a standard Poisson regression model, the mean and the variance are assumed to be equal. For calculating the underdispersion or overdispersion ratio, we need to calculate the residual deviance and the residual degrees of freedom. This ratio is calculated as the ratio of the residual deviance to the residual degrees of freedom.

# Get the residual degrees of freedom
residual_deviance <- deviance(poisson.relapses)
residual_df <- df.residual(poisson.relapses)

 # Calculate the ratio
over.under.dispersion_ratio <- residual_deviance / residual_df
over.under.dispersion_ratio
#> [1] 0.9431019

In this simulated dataset, the dispersion ratio was close to 1 (i.e., the Poisson model is appropiate). However, in cases of overdispersion (dispersion ratio > 1), which are more common, the quasi-Poisson model adjusts for it by incorporating a dispersion parameter that scales the variance, leading to more reliable standard error estimates.

Quassi-poisson model

qpoisson.relapses <- glm(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS,
  family = quasipoisson(link = "log"),
  data = data_layout_3)

tidy(qpoisson.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.275  0.0808     -16.0   1.54e-52    0.235     0.322
#> 2 DISEASE_COURSESPMS      0.450  0.0447     -17.9   3.44e-64    0.412     0.491
#> 3 AGE                     1.00   0.00162      2.36  1.82e- 2    1.00      1.01 
#> 4 SEXMale                 0.972  0.0326      -0.858 3.91e- 1    0.912     1.04 
#> 5 RACEWhite               1.16   0.0559       2.62  8.92e- 3    1.04      1.29 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000835     2.02  4.33e- 2    1.00      1.00

Negative Binomial model

The NB regression provides an improved model for recurrent events data compared with Poisson regression as this model assumes that each individual has their own underlying event rate over time and also may include a random component that follows a gamma distribution for accounting subject heterogeneity.

nb.relapses <- glm.nb(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS,
  data = data_layout_3)

tidy(nb.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.275  0.0880     -14.7   1.12e-48    0.231     0.327
#> 2 DISEASE_COURSESPMS      0.450  0.0486     -16.4   1.13e-60    0.409     0.494
#> 3 AGE                     1.00   0.00176      2.17  2.98e- 2    1.00      1.01 
#> 4 SEXMale                 0.972  0.0355      -0.788 4.31e- 1    0.907     1.04 
#> 5 RACEWhite               1.16   0.0608       2.41  1.61e- 2    1.03      1.31 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000910     1.86  6.32e- 2    1.00      1.00

In this study, with constant relapse rate (see descriptive analyses), the use of NB model can be accurate and recommended also for its ability to account for overdispersion and heterogeneity. However, the NB model, as other count-based models, does not consider event timing.