Skip to contents
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
suppressWarnings({
  suppressMessages({
library(revents)
library(dplyr)
library(survival)
library(broom)
library(MASS)
  })
})
# Load data layout corresponding to the third layout
data("data_layout_3", package = "revents")
data_layout_3 <- revents::data_layout_3
# set scientific notation to FALSE
options(scipen = 999)

Poisson regression

The Poisson model can be used as baseline model for recurrent events data. However, the Poisson model assumes that the mean and the variance are equal, which is not always the case in recurrent events data and does not account for dependence between events and event timing.

poisson.relapses <- glm(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX +  RACE + TIME_SINCE_DIAGNOSIS, 
  family = poisson(link = "log"), 
  data = data_layout_3)

tidy(poisson.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.376  0.0760     -12.9   7.10e-38    0.324     0.436
#> 2 DISEASE_COURSESPMS      0.564  0.0404     -14.2   1.44e-45    0.521     0.611
#> 3 AGE                     1.00   0.00152      1.88  6.02e- 2    1.00      1.01 
#> 4 SEXMale                 0.980  0.0305      -0.667 5.05e- 1    0.923     1.04 
#> 5 RACEWhite               1.12   0.0525       2.11  3.47e- 2    1.01      1.24 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000759     1.50  1.35e- 1    1.00      1.00

In a standard Poisson regression model, the mean and the variance are assumed to be equal. For calculating the underdispersion or overdispersion ratio, we need to calculate the residual deviance and the residual degrees of freedom. This ratio is calculated as the ratio of the residual deviance to the residual degrees of freedom.

# Get the residual degrees of freedom
residual_deviance <- deviance(poisson.relapses)
residual_df <- df.residual(poisson.relapses)

 # Calculate the ratio
over.under.dispersion_ratio <- residual_deviance / residual_df
over.under.dispersion_ratio
#> [1] 0.6508522

As the dispersion ratio is different from 1, the Poisson model is not recommended. However, the quasi-Poisson model corrects for under or overdispersion by incorporating a dispersion parameter that scales the variance, providing more accurate estimates.

Quassi-poisson model

qpoisson.relapses <- glm(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS,
  family = quasipoisson(link = "log"),
  data = data_layout_3)

tidy(qpoisson.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.376  0.0605     -16.2   8.49e-54    0.334     0.423
#> 2 DISEASE_COURSESPMS      0.564  0.0321     -17.8   9.72e-64    0.530     0.601
#> 3 AGE                     1.00   0.00121      2.36  1.83e- 2    1.00      1.01 
#> 4 SEXMale                 0.980  0.0243      -0.839 4.02e- 1    0.934     1.03 
#> 5 RACEWhite               1.12   0.0418       2.66  8.02e- 3    1.03      1.21 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000603     1.88  6.01e- 2    1.00      1.00

Although the estimated coefficients are similar to the Poisson model, the quasi-Poisson model provides more accurate estimates by accounting for underspersion. These differences can be found in the 95% CI of the coefficients.

Negative Binomial model

The NB regression provides an improved model for recurrent events data compared with Poisson regression as this model assumes that each individual has their own underlying event rate over time and also may include a random component for accounting subject heterogeneity.

nb.relapses <- glm.nb(
  COUNT ~ offset(log(LENGHT.TIME)) + DISEASE_COURSE + AGE + SEX + RACE + TIME_SINCE_DIAGNOSIS,
  data = data_layout_3)

tidy(nb.relapses, exp = TRUE, conf.int = TRUE) 
#> # A tibble: 6 × 7
#>   term                 estimate std.error statistic  p.value conf.low conf.high
#>   <chr>                   <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
#> 1 (Intercept)             0.376  0.0760     -12.9   7.13e-38    0.324     0.436
#> 2 DISEASE_COURSESPMS      0.564  0.0404     -14.2   1.44e-45    0.521     0.611
#> 3 AGE                     1.00   0.00152      1.88  6.02e- 2    1.00      1.01 
#> 4 SEXMale                 0.980  0.0305      -0.667 5.05e- 1    0.923     1.04 
#> 5 RACEWhite               1.12   0.0525       2.11  3.47e- 2    1.01      1.24 
#> 6 TIME_SINCE_DIAGNOSIS    1.00   0.000759     1.50  1.35e- 1    1.00      1.00

In this study, with constant relapse rate (see descriptive analyses), the use of NB model can be accurate and recommended also for its ability to account for over or underdispersion and heterogeneity. However, the NB model does not consider event timing.