SIR modelling with conmat

library(conmat)
library(deSolve)
library(tidyr)
library(ggplot2)
library(dplyr)
library(purrr)

Introduction: What is an SIR model?

SIR (Susceptible, Infected, Recovered) models, sometimes known as compartmental models, are a mathematical modelling technique used to understand facets of an epidemic. They can help answer questions like:

  • What is the duration of an epidemic?
  • What is the total number of infected people?
  • How does the disease spread?
  • What is the reproductive number?
  • What is the impact of a public health intervention?

The SIR refers to the number of:

  • S: susceptible
  • I: infected
  • R: recovered / removed

people at a given time point. We can model how these numbers change at each time step, based on initial population numbers, and other parameters like how infection spreads (is it more likely to infect younger or older people?),

An SIR Model with homogenous mixing

We start with a complicated version of a relatively simple model: an age-stratified SIR Model, but with all age groups acting exactly the same.

We will use 17 age groups, each in 5 year age bands, and turn these into a conmat_population object. This is an object that knows which columns represent age and population, which is used by other functions within conmat.

homogeneous_population <- data.frame(
  age = seq(0, 80, by = 10),
  population = rep(100, times = 9)
) |>
  as_conmat_population(
    age = age,
    population = population
  )

homogeneous_population
#> # A tibble: 9 × 2 (conmat_population)
#>  - age: age
#>  - population: population
#>     age population
#>   <dbl>      <dbl>
#> 1     0        100
#> 2    10        100
#> 3    20        100
#> 4    30        100
#> 5    40        100
#> 6    50        100
#> 7    60        100
#> 8    70        100
#> 9    80        100

Then, we extrapolate these into a set of contact matrices, which we can construct using setting_prediction_matrix. We set these as matrices of 1 - the contact rate is homogenous and exactly the same.

age_breaks_0_80_plus <- c(seq(0, 80, by = 10), Inf)
mat_ones <- matrix(1, nrow = 9, ncol = 9)

# Relative number of contacts between individuals in 2 age categories
# Think of as P(contact)
homogeneous_contact <- setting_prediction_matrix(
  home = mat_ones,
  work = mat_ones,
  school = mat_ones,
  other = mat_ones,
  age_breaks = age_breaks_0_80_plus
)

homogeneous_contact
#> 
#> ── Setting Prediction Matrices ─────────────────────────────────────────────────
#> A list of matrices containing the model predicted contact rate between ages in
#> each setting.
#> There are 9 age breaks, ranging 0-80+ years, with a regular 10 year interval
#> • home: a 9x9 <matrix>
#> • work: a 9x9 <matrix>
#> • school: a 9x9 <matrix>
#> • other: a 9x9 <matrix>
#> • all: a 9x9 <matrix>
#> ℹ Access each <matrix> with `x$name`
#> ℹ e.g., `x$home`

Similarly, we construct a set of transmission matrices, which provide the probability of transmission for each age group, using transmission_probability_matrix. These all have the same transmission probability - 0.05 (1 in 20).

mat_05 <- matrix(0.05, nrow = 9, ncol = 9)
transmission_matrix <- transmission_probability_matrix(
  home = mat_05,
  work = mat_05,
  school = mat_05,
  other = mat_05,
  age_breaks = age_breaks_0_80_plus
)

transmission_matrix
#> 
#> ── Transmission Probability Matrices ───────────────────────────────────────────
#> A list of matrices, each <matrix> containing the relative probability of
#> individuals in a given age group infecting an individual in another age group,
#> for that setting.
#> There are 9 age breaks, ranging 0-80+ years, with a regular 10 year interval
#> • home: a 9x9 <matrix>
#> • work: a 9x9 <matrix>
#> • school: a 9x9 <matrix>
#> • other: a 9x9 <matrix>
#> ℹ Access each <matrix> with `x$name`
#> ℹ e.g., `x$home`

We also need to set up our population structures. We’ll have all the S states, then I, then R. Since we’re using deSolve to solve this system, we need to make sure this order stays the same throughout!

S0 <- rep(999, times = 9)
I0 <- rep(1, times = 9)
R0 <- rep(0, times = 9)
initial_condition <- c(S0, I0, R0)
names(initial_condition) <- paste(
  rep(c("S0", "I0", "R0"), each = 9),
  age_breaks_0_80_plus[1:9],
  sep = "_"
)

initial_condition
#>  S0_0 S0_10 S0_20 S0_30 S0_40 S0_50 S0_60 S0_70 S0_80  I0_0 I0_10 I0_20 I0_30 
#>   999   999   999   999   999   999   999   999   999     1     1     1     1 
#> I0_40 I0_50 I0_60 I0_70 I0_80  R0_0 R0_10 R0_20 R0_30 R0_40 R0_50 R0_60 R0_70 
#>     1     1     1     1     1     0     0     0     0     0     0     0     0 
#> R0_80 
#>     0

For an SIR model, we need to compute the force of infection, which is λ(t) = βI(t). The β term is the product of the probability of infection given contact, and the probability of contact, for which we can use the matrices we have just defined:

parameters <- list(
  "transmission_matrix" = transmission_matrix,
  "homogeneous_contact" = homogeneous_contact,
  "gamma" = 1,
  "s_indexes" = 1:9,
  "i_indexes" = 10:18,
  "r_indexes" = 19:27
)

parameters
#> $transmission_matrix
#> 
#> ── Transmission Probability Matrices ───────────────────────────────────────────
#> A list of matrices, each <matrix> containing the relative probability of
#> individuals in a given age group infecting an individual in another age group,
#> for that setting.
#> There are 9 age breaks, ranging 0-80+ years, with a regular 10 year interval
#> • home: a 9x9 <matrix>
#> • work: a 9x9 <matrix>
#> • school: a 9x9 <matrix>
#> • other: a 9x9 <matrix>
#> ℹ Access each <matrix> with `x$name`
#> ℹ e.g., `x$home`
#> 
#> $homogeneous_contact
#> 
#> ── Setting Prediction Matrices ─────────────────────────────────────────────────
#> A list of matrices containing the model predicted contact rate between ages in
#> each setting.
#> There are 9 age breaks, ranging 0-80+ years, with a regular 10 year interval
#> • home: a 9x9 <matrix>
#> • work: a 9x9 <matrix>
#> • school: a 9x9 <matrix>
#> • other: a 9x9 <matrix>
#> • all: a 9x9 <matrix>
#> ℹ Access each <matrix> with `x$name`
#> ℹ e.g., `x$home`
#> 
#> $gamma
#> [1] 1
#> 
#> $s_indexes
#> [1] 1 2 3 4 5 6 7 8 9
#> 
#> $i_indexes
#> [1] 10 11 12 13 14 15 16 17 18
#> 
#> $r_indexes
#> [1] 19 20 21 22 23 24 25 26 27

Now we construct a function for the age structured SIR model, to pass to deSolve. This calculates the force of infection for each setting.

age_structured_sir <- function(time, state, parameters) {
  # Calculate the force of infection for each setting:
  # unstructured SIR beta is age_group_n / pop_n

  N_by_age <- map_dbl(
    .x = parameters$s_indexes,
    .f = function(i) {
      current_indexes_to_sum <- c(
        parameters$s_indexes[i],
        parameters$i_indexes[i],
        parameters$r_indexes[i]
      )
      sum(state[current_indexes_to_sum])
    }
  )

  # normalise by the age population
  N_infected_by_age <- state[parameters$i_indexes] / N_by_age

  # functional method for takign the product of two matrices
  product <- function(transmission, contact) {
    map2(transmission, contact, `*`)
  }

  age_normalise <- function(beta) {
    # matrix multiply by infected and normalise by age population
    map(beta, function(beta) {
      beta %*% N_infected_by_age
    })
  }

  lambdas <- tibble(
    setting = names(parameters$transmission_matrix),
    transmission_matrix = parameters$transmission_matrix,
    homogeneous_contact = parameters$homogeneous_contact[1:4]
  ) %>%
    mutate(
      beta = product(transmission_matrix, homogeneous_contact),
      lambda = age_normalise(beta)
    )

  # Combine them all into one term for ease of computation
  lambda_total <- Reduce("+", lambdas$lambda)

  # Don't forget to normalise your infection rate by the population!
  dSdt <- -lambda_total * state[parameters$s_indexes]

  dIdt <- lambda_total * state[parameters$s_indexes] -
    parameters$gamma * state[parameters$i_indexes]

  dRdt <- parameters$gamma * state[parameters$i_indexes]

  return(
    list(
      c(
        dSdt,
        dIdt,
        dRdt
      )
    )
  )
}

Then we solve the ODE like so:

times <- seq(0, 20, by = 0.1)
homogeneous_soln <- ode(
  y = initial_condition,
  times = times,
  func = age_structured_sir,
  parms = parameters
)

# Have to convert ode output to a data frame to do any plotting
homogeneous_soln <- as.data.frame(homogeneous_soln) %>% as_tibble()

Now, let’s compare this to an SIR model with no stratification - as in, no age groups:

parameters_sir <- c("beta" = 1.8, "gamma" = 1)
initial_condition_sir <- c("S" = 8991, "I" = 9, "R" = 0)

sir <- function(time, state, parameters) {
  N <- sum(state)
  lambda_total <- parameters["beta"] * state["I"]
  dSdt <- -lambda_total / N * state["S"]
  dIdt <- parameters["beta"] / N * state["S"] * state["I"] - parameters["gamma"] * state["I"]
  dRdt <- parameters["gamma"] * state["I"]

  return(list(c(dSdt, dIdt, dRdt)))
}

sir_soln <- ode(
  y = initial_condition_sir,
  times = times,
  func = sir,
  parms = parameters_sir
)

sir_soln <- as_tibble(as.data.frame(sir_soln))
ungrouped_structure <- sir_soln %>%
  pivot_longer(cols = -time)

# we are going to tidy up ODE output a few times, so wrap it into a function:
tidy_ode <- function(ode_soln) {
  ode_soln %>%
    pivot_longer(cols = -time) %>%
    mutate(parent_state = substr(name, 1, 1)) %>%
    group_by(time, parent_state) %>%
    summarise(value = sum(value)) %>%
    ungroup() %>%
    rename(name = parent_state)
}

# For the stratified model, we have to add up all the age categories together for a fair comparison.
grouped_structure <- tidy_ode(homogeneous_soln)
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.

combined_solutions <- bind_rows(
  "non_structured" = ungrouped_structure,
  "age_structured" = grouped_structure,
  .id = "type"
) %>%
  mutate(
    name = factor(name, levels = c("S", "I", "R"))
  )

combined_solutions
#> # A tibble: 1,206 × 4
#>    type            time name     value
#>    <chr>          <dbl> <fct>    <dbl>
#>  1 non_structured   0   S     8991    
#>  2 non_structured   0   I        9    
#>  3 non_structured   0   R        0    
#>  4 non_structured   0.1 S     8989.   
#>  5 non_structured   0.1 I        9.75 
#>  6 non_structured   0.1 R        0.937
#>  7 non_structured   0.2 S     8987.   
#>  8 non_structured   0.2 I       10.6  
#>  9 non_structured   0.2 R        1.95 
#> 10 non_structured   0.3 S     8986.   
#> # ℹ 1,196 more rows

Now let’s plot these two models approaches - the age structure and the non age structured:

gg_combined_solutions <- ggplot(
  combined_solutions,
  aes(x = time, y = value, colour = name, linetype = type)
) +
  geom_line() +
  labs(x = "Time", y = "Value", colour = "State", linetype = "Model")

gg_combined_solutions

Voila! These lines are the same! We can double check this by plotting them as facets:

gg_combined_solutions + facet_wrap(~type)

So, we have successfully collapsed our stratified model down to the non-stratified model, which is a great sense check for every time you write out a complicated model.

Comparison to other age matrices

Now that we’ve established an age-structured SIR model, we can repeat the process with conmat matrices. This process is the same as in the vignette, “Data Sources”.

world_data <- socialmixr::wpp_age() %>%
  mutate(
    new_lower_age = if_else(lower.age.limit >= 75, 75L, lower.age.limit)
  ) %>%
  group_by(new_lower_age, country, year) %>%
  summarise(
    population = sum(population)
  )
#> `summarise()` has grouped output by 'new_lower_age', 'country'. You can
#> override using the `.groups` argument.
germany_2015 <- age_population(
  data = world_data,
  location_col = country,
  location = "Germany",
  age_col = new_lower_age,
  year_col = year,
  year = 2015
)

germany_2015
#> # A tibble: 16 × 6 (conmat_population)
#>  - age: lower.age.limit
#>  - population: population
#>    new_lower_age country  year population lower.age.limit upper.age.limit
#>            <int> <chr>   <int>      <dbl>           <dbl>           <dbl>
#>  1             0 Germany  2015    3517800               0               4
#>  2             5 Germany  2015    3507779               5               9
#>  3            10 Germany  2015    3693474              10              14
#>  4            15 Germany  2015    4101901              15              19
#>  5            20 Germany  2015    4571184              20              24
#>  6            25 Germany  2015    5213301              25              29
#>  7            30 Germany  2015    5058402              30              34
#>  8            35 Germany  2015    4782027              35              39
#>  9            40 Germany  2015    5190881              40              44
#> 10            45 Germany  2015    6805444              45              49
#> 11            50 Germany  2015    6920702              50              54
#> 12            55 Germany  2015    5998415              55              59
#> 13            60 Germany  2015    5091795              60              64
#> 14            65 Germany  2015    4216896              65              69
#> 15            70 Germany  2015    4225748              70              74
#> 16            75 Germany  2015    8812050              75              79

Now let’s construct a non-homogenous contact matrix, and transmission probability matrix from the data we have on Germany.

age_breaks_socialmixr <- c(seq(0, 75, by = 5), Inf)

germany_contacts <- extrapolate_polymod(
  population = germany_2015,
  age_breaks = age_breaks_socialmixr
)

n_finite_states <- length(age_breaks_socialmixr) - 1
socialmixr_matrix <- matrix(0.1761765,
  nrow = n_finite_states,
  ncol = n_finite_states
)

transmission_matrix <- transmission_probability_matrix(
  home = socialmixr_matrix,
  work = socialmixr_matrix,
  school = socialmixr_matrix,
  other = socialmixr_matrix,
  age_breaks = age_breaks_socialmixr
)

parameters <- list(
  "transmission_matrix" = transmission_matrix,
  "homogeneous_contact" = germany_contacts,
  "gamma" = 1,
  "s_indexes" = 1:n_finite_states,
  "i_indexes" = (n_finite_states + 1):(2 * n_finite_states),
  "r_indexes" = (2 * n_finite_states + 1):(3 * n_finite_states)
)

S0 <- germany_2015$population
I0 <- rep(1, times = n_finite_states)
R0 <- rep(0, times = n_finite_states)
initial_condition <- c(S0, I0, R0)
names(initial_condition) <- paste(
  rep(c("S0", "I0", "R0"), each = n_finite_states),
  age_breaks_socialmixr[1:n_finite_states],
  sep = "_"
)

Then, similar to above, we solve the ODE

times <- seq(0, 100, by = 0.1)
germany_soln <- ode(
  y = initial_condition,
  times = times,
  func = age_structured_sir,
  parms = parameters
)

# Have to convert ode output to a data frame to do any plotting
germany_soln <- as_tibble(as.data.frame(germany_soln))

head(germany_soln)
#> # A tibble: 6 × 49
#>    time     S0_0    S0_5  S0_10  S0_15  S0_20  S0_25  S0_30  S0_35  S0_40  S0_45
#>   <dbl>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1   0   3517800   3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> 2   0.1 3517800.  3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> 3   0.2 3517800.  3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> 4   0.3 3517800.  3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> 5   0.4 3517799.  3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> 6   0.5 3517799.  3.51e6 3.69e6 4.10e6 4.57e6 5.21e6 5.06e6 4.78e6 5.19e6 6.81e6
#> # ℹ 38 more variables: S0_50 <dbl>, S0_55 <dbl>, S0_60 <dbl>, S0_65 <dbl>,
#> #   S0_70 <dbl>, S0_75 <dbl>, I0_0 <dbl>, I0_5 <dbl>, I0_10 <dbl>, I0_15 <dbl>,
#> #   I0_20 <dbl>, I0_25 <dbl>, I0_30 <dbl>, I0_35 <dbl>, I0_40 <dbl>,
#> #   I0_45 <dbl>, I0_50 <dbl>, I0_55 <dbl>, I0_60 <dbl>, I0_65 <dbl>,
#> #   I0_70 <dbl>, I0_75 <dbl>, R0_0 <dbl>, R0_5 <dbl>, R0_10 <dbl>, R0_15 <dbl>,
#> #   R0_20 <dbl>, R0_25 <dbl>, R0_30 <dbl>, R0_35 <dbl>, R0_40 <dbl>,
#> #   R0_45 <dbl>, R0_50 <dbl>, R0_55 <dbl>, R0_60 <dbl>, R0_65 <dbl>, …
tail(germany_soln)
#> # A tibble: 6 × 49
#>    time     S0_0    S0_5  S0_10  S0_15  S0_20  S0_25  S0_30  S0_35  S0_40  S0_45
#>   <dbl>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1  99.5 1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> 2  99.6 1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> 3  99.7 1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> 4  99.8 1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> 5  99.9 1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> 6 100   1154567. 439683. 2.19e5 2.62e5 4.97e5 6.03e5 5.77e5 5.75e5 5.86e5 5.79e5
#> # ℹ 38 more variables: S0_50 <dbl>, S0_55 <dbl>, S0_60 <dbl>, S0_65 <dbl>,
#> #   S0_70 <dbl>, S0_75 <dbl>, I0_0 <dbl>, I0_5 <dbl>, I0_10 <dbl>, I0_15 <dbl>,
#> #   I0_20 <dbl>, I0_25 <dbl>, I0_30 <dbl>, I0_35 <dbl>, I0_40 <dbl>,
#> #   I0_45 <dbl>, I0_50 <dbl>, I0_55 <dbl>, I0_60 <dbl>, I0_65 <dbl>,
#> #   I0_70 <dbl>, I0_75 <dbl>, R0_0 <dbl>, R0_5 <dbl>, R0_10 <dbl>, R0_15 <dbl>,
#> #   R0_20 <dbl>, R0_25 <dbl>, R0_30 <dbl>, R0_35 <dbl>, R0_40 <dbl>,
#> #   R0_45 <dbl>, R0_50 <dbl>, R0_55 <dbl>, R0_60 <dbl>, R0_65 <dbl>, …

germany_soln_long <- germany_soln %>%
  tidy_ode() %>%
  mutate(type = "age_structured")
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.

germany_soln_long
#> # A tibble: 3,003 × 4
#>     time name        value type          
#>    <dbl> <chr>       <dbl> <chr>         
#>  1   0   I           16    age_structured
#>  2   0   R            0    age_structured
#>  3   0   S     81707799    age_structured
#>  4   0.1 I           18.4  age_structured
#>  5   0.1 R            1.72 age_structured
#>  6   0.1 S     81707795.   age_structured
#>  7   0.2 I           21.1  age_structured
#>  8   0.2 R            3.69 age_structured
#>  9   0.2 S     81707790.   age_structured
#> 10   0.3 I           24.3  age_structured
#> # ℹ 2,993 more rows
gg_germany_sir <- ggplot(
  germany_soln_long,
  aes(x = time, y = value / sum(initial_condition), colour = name)
) +
  geom_line() +
  labs(x = "Time", y = "Proportion")

gg_germany_sir

Let’s compare to the Prem matrices. Prem only has 16 age classes so we do need to re-do our population.

# NOTE - consider ways to present this data nicer
# str(prem_germany_contact_matrices)
as_setting_prediction_matrix(
  prem_germany_contact_matrices,
  age_breaks = seq(0, 80, by = 5)
)
#> 
#> ── Setting Prediction Matrices ─────────────────────────────────────────────────
#> A list of matrices containing the model predicted contact rate between ages in
#> each setting.
#> There are 16 age breaks, ranging 0-80 years, with a regular 5 year interval
#> • home: a 16x16 <matrix>
#> • work: a 16x16 <matrix>
#> • school: a 16x16 <matrix>
#> • other: a 16x16 <matrix>
#> • all: a 16x16 <matrix>
#> ℹ Access each <matrix> with `x$name`
#> ℹ e.g., `x$home`

So we go through a similar process, setting up parameters, and solving the ODE

parameters_prem <- list(
  "transmission_matrix" = transmission_matrix,
  "homogeneous_contact" = prem_germany_contact_matrices,
  "gamma" = 1,
  "s_indexes" = 1:n_finite_states,
  "i_indexes" = (n_finite_states + 1):(2 * n_finite_states),
  "r_indexes" = (2 * n_finite_states + 1):(3 * n_finite_states)
)

prem_soln <- ode(
  y = initial_condition,
  times = times,
  func = age_structured_sir,
  parms = parameters_prem
)

# Have to convert ode output to a data frame to do any plotting
prem_soln <- as_tibble(as.data.frame(prem_soln))

tail(prem_soln)
#> # A tibble: 6 × 49
#>    time     S0_0    S0_5  S0_10  S0_15  S0_20  S0_25  S0_30  S0_35  S0_40  S0_45
#>   <dbl>    <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
#> 1  99.5 2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> 2  99.6 2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> 3  99.7 2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> 4  99.8 2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> 5  99.9 2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> 6 100   2757450.  2.53e6 1.86e6 1.01e6 1.79e6 2.64e6 2.48e6 1.84e6 1.87e6 3.01e6
#> # ℹ 38 more variables: S0_50 <dbl>, S0_55 <dbl>, S0_60 <dbl>, S0_65 <dbl>,
#> #   S0_70 <dbl>, S0_75 <dbl>, I0_0 <dbl>, I0_5 <dbl>, I0_10 <dbl>, I0_15 <dbl>,
#> #   I0_20 <dbl>, I0_25 <dbl>, I0_30 <dbl>, I0_35 <dbl>, I0_40 <dbl>,
#> #   I0_45 <dbl>, I0_50 <dbl>, I0_55 <dbl>, I0_60 <dbl>, I0_65 <dbl>,
#> #   I0_70 <dbl>, I0_75 <dbl>, R0_0 <dbl>, R0_5 <dbl>, R0_10 <dbl>, R0_15 <dbl>,
#> #   R0_20 <dbl>, R0_25 <dbl>, R0_30 <dbl>, R0_35 <dbl>, R0_40 <dbl>,
#> #   R0_45 <dbl>, R0_50 <dbl>, R0_55 <dbl>, R0_60 <dbl>, R0_65 <dbl>, …
germany_aggregated <- tidy_ode(germany_soln)
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.

# For the stratified model, we have to add up all the age categories together for a fair comparison.
prem_aggregated <- tidy_ode(prem_soln)
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.

conmat_prem_soln <- bind_rows(
  conmat = germany_aggregated,
  prem = prem_aggregated,
  .id = "type"
) %>%
  mutate(name = factor(name, levels = c("S", "I", "R")))

head(conmat_prem_soln)
#> # A tibble: 6 × 4
#>   type    time name        value
#>   <chr>  <dbl> <fct>       <dbl>
#> 1 conmat   0   I           16   
#> 2 conmat   0   R            0   
#> 3 conmat   0   S     81707799   
#> 4 conmat   0.1 I           18.4 
#> 5 conmat   0.1 R            1.72
#> 6 conmat   0.1 S     81707795.
tail(conmat_prem_soln)
#> # A tibble: 6 × 4
#>   type   time name    value
#>   <chr> <dbl> <fct>   <dbl>
#> 1 prem   99.9 I     2.82e-4
#> 2 prem   99.9 R     3.15e+7
#> 3 prem   99.9 S     5.02e+7
#> 4 prem  100   I     2.73e-4
#> 5 prem  100   R     3.15e+7
#> 6 prem  100   S     5.02e+7
ggplot(conmat_prem_soln, aes(x = time, y = value, colour = type)) +
  geom_line() +
  labs(x = "Time", y = "Value", colour = "Model") +
  facet_wrap(~name, nrow = 1)

These are really different, but we have to be careful about why. The contact matrices might refer to the same quantity, but if we dive a little deeper, we find out that might not be the case…

Calculating reproductive number - R0

To fairly compare a dynamic disease model that differs only by it’s contact matrices, it’s important to remember that the (i, j)th element of one of these matrices is the relative number of contacts between individuals of age i and age j. But, what the number is relative to might be different, and this will lead to different basic reproduction numbers, which will give misleading model conclusions.

At this point, it is important to point out the two definitions of a next generation matrix.

  1. The next generation of the offspring distribution assuming infinite lifetime, which probabilists will be used to, and
  2. The number of newly infected individuals over the course of one generation of infections, which infectious diseases modellers will be used to.

conmat calculates the first of these in it’s functions (such as generate_ngm), hence why the arguments to these functions have no concept of an infectious period (which is analogous to ‘death’ in a branching process).

Following the approach of Diekmann, Heesterbrook and Roberts (2009), one can think of the NGM generated by conmat as only the transmissions term of Equation 2.9. So, to ensure both models have the same value of R0, we can multiply each matrix by a scaling factor to give a target R0.

To target R0 = 1.5 for example,

calculate_R0 <- function(multiplier, transmission_matrices, contact_matrices) {
  total_matrix <- transmission_matrices$home * contact_matrices$home +
    transmission_matrices$work * contact_matrices$work +
    transmission_matrices$school * contact_matrices$school +
    transmission_matrices$other * contact_matrices$other

  abs(Re(eigen(total_matrix * multiplier)$values[1]) - 1.5)
}

scaling_factor <- function(contact_matrix) {
  optimize(
    f = calculate_R0,
    interval = c(0.001, 5),
    transmission_matrices = transmission_matrix,
    contact_matrices = contact_matrix
  )
}

scaling_factor_prem <- scaling_factor(prem_germany_contact_matrices)
scaling_factor_socialmixr <- scaling_factor(germany_contacts)

scaling_factor_prem$minimum
#> [1] 0.9999758
scaling_factor_socialmixr$minimum
#> [1] 0.590031

We can adjust our contact matrices with these factors, and then our R0s will be the same, meaning that the only difference between the two models should be differences in the contact matrices.

prem_germany_contact_matrices <- lapply(prem_germany_contact_matrices, `*`, scaling_factor_prem$minimum)
germany_contacts <- lapply(germany_contacts, `*`, scaling_factor_socialmixr$minimum)

parameters$homogeneous_contact <- germany_contacts
germany_soln <- ode(
  y = initial_condition,
  times = times,
  func = age_structured_sir,
  parms = parameters
)

parameters$homogeneous_contact <- prem_germany_contact_matrices
prem_soln <- ode(
  y = initial_condition,
  times = times,
  func = age_structured_sir,
  parms = parameters
)

germany_aggregated <- tidy_ode(as_tibble(as.data.frame(germany_soln)))
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.
prem_aggregated <- tidy_ode(as_tibble(as.data.frame(prem_soln)))
#> `summarise()` has grouped output by 'time'. You can override using the
#> `.groups` argument.

conmat_prem_soln <- bind_rows(
  conmat = germany_aggregated,
  prem = prem_aggregated,
  .id = "type"
) %>%
  mutate(name = factor(name, levels = c("S", "I", "R")))
library(scales)
#> 
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#> 
#>     discard
conmat_prem_soln %>%
  filter(time <= 50) %>%
  ggplot(aes(x = time, y = value, colour = type)) +
  geom_line() +
  facet_wrap(~name,
    ncol = 1,
    scales = "free_y",
    labeller = labeller(
      name = c(
        S = "Susceptible",
        I = "Infected",
        R = "Recovered"
      )
    )
  ) +
  scale_y_continuous(
    labels = label_number(scale_cut = cut_si("")),
    n.breaks = 3
  ) +
  scale_colour_brewer(palette = "Dark2") +
  labs(x = "Time", y = "Population", colour = "Model") +
  theme_minimal()

So now we have as fair of a comparison of the two matrices as we will get, and yet, there are significant differences in the dynamics of the two models.