---
title: "Example Pipeline"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Example Pipeline}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(conmat)
```

This vignette outlines a basic workflow of:

* Creating a new synthetic matrix by extrapolating from POLYMOD data to a new age distribution
* Generating a Next Generation Matrix
* Applying Vaccination Rates
* Comparing R0 before and after vaccination

## Create a new synthetic matrix from all POLYMOD data

We can create a synthetic matrix from all POLYMOD data by using the `extrapolate_polymod` function. First, let's extract an age distribution from the ABS data.

```{r fairfield}
fairfield <- abs_age_lga("Fairfield (C)")
fairfield
```

Note that this is a `conmat_population` object, which is just a data frame that knows which columns represent the `age` and `population` information.

We then extrapolate this to home, work, school, other and all settings, using the full POLYMOD data. This gives us a setting prediction matrix.

```{r extrapolate-fairfield}
age_breaks_0_80_plus <- c(seq(0, 80, by = 5), Inf)
synthetic_fairfield_5y <- extrapolate_polymod(
  population = fairfield,
  age_breaks = age_breaks_0_80_plus
)
synthetic_fairfield_5y
synthetic_fairfield_5y$home
```

By full POLYMOD data, we mean these data:

```{r polymod-setting-population}
polymod_setting <- get_polymod_setting_data()
polymod_population <- get_polymod_population()

polymod_setting
polymod_setting$home
polymod_population
```

The `extrapolate_polymod()` function does the following:

* Uses a model of the contact rate (`polymod_setting_models`) that has already been fit to the full POLYMOD data above
* Predicts from that model to the provided Fairfield population data

It also has options to predict to specified age brackets, defaulting to 5 year age groups up to 75, then 75 and older.

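To make the default age brackets concrete, here is a minimal base R sketch of 5 year breaks up to 75 with an open-ended final group. The example ages and the names `age_breaks_0_75_plus`, `example_ages`, and `age_group` are made up for illustration, and `cut()` is used only to show how ages fall into brackets; it is not necessarily how conmat groups ages internally.

```{r}
# 5 year age breaks up to 75, then an open-ended 75+ group
age_breaks_0_75_plus <- c(seq(0, 75, by = 5), Inf)
age_breaks_0_75_plus

# hypothetical ages, for illustration only
example_ages <- c(0, 4, 5, 37, 74, 75, 92)

# right = FALSE gives left-closed brackets: [0,5), [5,10), ..., [75,Inf)
age_group <- cut(example_ages, breaks = age_breaks_0_75_plus, right = FALSE)
table(age_group)
```

The same pattern, with `seq(0, 80, by = 5)`, gives the `age_breaks_0_80_plus` breaks used throughout this vignette.
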
This object, `synthetic_fairfield_5y`, contains a matrix of predictions for each of the settings, home, work, school, other, and all settings, which is summarised when you print the object to the console:

```{r}
synthetic_fairfield_5y
```

You can see more detail by using `str` if you like:

```{r}
str(synthetic_fairfield_5y)
```

## Generating a Next Generation Matrix

Once infected, a person can transmit an infectious disease to another, creating generations of infected individuals. We can define a matrix describing the number of newly infected individuals in given categories, such as age, for consecutive generations. This matrix is called a "next generation matrix" (NGM).

We can generate an NGM using the population data:

```{r}
fairfield_ngm_age_data <- generate_ngm(
  fairfield,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

Or if you've already got the fitted settings contact matrices, then you can pass those to `generate_ngm` instead:

```{r}
fairfield_ngm <- generate_ngm(
  synthetic_fairfield_5y,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

However, note that in these cases the age breaks specified in `generate_ngm` must be the same as the age breaks specified in the synthetic contact matrix. Otherwise it will error, as it would be trying to multiply incompatible matrices.

You can also specify your own transmission matrix, like so:

```{r}
# using our own transmission matrix
new_transmission_matrix <- get_setting_transmission_matrices(
  age_breaks = age_breaks_0_80_plus,
  # is normally 0.5
  asymptomatic_relative_infectiousness = 0.75
)

new_transmission_matrix

fairfield_ngm_0_80_new_tmat <- generate_ngm(
  synthetic_fairfield_5y,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5,
  setting_transmission_matrix = new_transmission_matrix
)
```

We can also generate an NGM for Australian-specific data like so, which refits and extrapolates the data based on the Australian state or LGA provided.

```{r ngm-fairfield}
ngm_fairfield <- generate_ngm_oz(
  lga_name = "Fairfield (C)",
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

The output of this is a matrix for each of the settings, where each value is the number of newly infected individuals:

```{r}
ngm_fairfield$home
str(ngm_fairfield)
```

## Applying Vaccination Rates

It is important to understand the effect of vaccination on the next generation of infections. We can use `apply_vaccination()` to apply the reduction in acquisition and transmission in each age group to the next generation matrices. It takes two key arguments:

1. The next generation matrix
2. The vaccination effect data

The vaccination effect could look like the following:

```{r print-vaccination-effect}
vaccination_effect_example_data
```

Each row contains the following information for an age band:

* Coverage - the % vaccinated
* Acquisition - the probability of acquiring COVID
* Transmission - the probability of transmission

Then you need to specify the columns in the vaccination effect data frame related to coverage, acquisition, and transmission.

```{r}
# Apply vaccination effect to next generation matrices
ngm_nsw_vacc <- apply_vaccination(
  ngm = ngm_fairfield,
  data = vaccination_effect_example_data,
  coverage_col = coverage,
  acquisition_col = acquisition,
  transmission_col = transmission
)

ngm_nsw_vacc
```

## Fitting a new model with asymmetric terms

In the examples so far we have focussed on using `extrapolate_polymod` to fit the contact model. This is very useful because it takes only a few lines of code:

```{r}
#| eval: FALSE
fairfield <- abs_age_lga("Fairfield (C)")

age_breaks_0_80_plus <- c(seq(0, 80, by = 5), Inf)

synthetic_fairfield_5y <- extrapolate_polymod(
  population = fairfield,
  age_breaks = age_breaks_0_80_plus
)
```

It also runs quite quickly, since it uses a pre-computed model, `polymod_setting_models` (see `?polymod_setting_models` for more details).

Under the hood, `extrapolate_polymod` takes this already fit model for each setting (home, work, school, other) and uses it, together with the provided data, to predict the new contact rates.

So the process is:

1. Create a model that predicts contact rate for each setting
2. Predict to a new population using that model

Let's show each step and unpack them.

First let's create a model that predicts contact rate for each setting:

```{r}
polymod_setting_data <- get_polymod_setting_data()
polymod_population <- get_polymod_population()

contact_setting_model_not_sym <- fit_setting_contacts(
  contact_data_list = polymod_setting_data,
  population = polymod_population,
  symmetrical = FALSE
)
```

Here, we first get the POLYMOD setting data (`polymod_setting_data`) and the POLYMOD population (`polymod_population`), and use them to create a model for each setting. These data look like this, if you are interested.

```{r}
polymod_setting_data
polymod_population
```

We also specify the `symmetrical = FALSE` option; by default this is `TRUE`. Briefly, this changes some of the terms we use in creating the model, to use terms that aren't strictly symmetric.

Now that we've got our model, we can predict to our Fairfield data, like so:

```{r}
fairfield_hh <- get_abs_per_capita_household_size(lga = "Fairfield (C)")
fairfield_hh

contact_model_pred <- predict_setting_contacts(
  population = fairfield,
  contact_model = contact_setting_model_not_sym,
  age_breaks = age_breaks_0_80_plus,
  per_capita_household_size = fairfield_hh
)
```

* `population` is our population to predict to
* `contact_model` is our contact rate model for each setting
* `age_breaks` are our age breaks to predict to
* `per_capita_household_size` is the household size for that population. In our case we have a helper function, `get_abs_per_capita_household_size`, which works for each LGA in Australia.

Alternatively, you can use the `estimate_setting_contacts` function to do a similar task:

```{r}
contact_model_pred_est <- estimate_setting_contacts(
  contact_data_list = polymod_setting_data,
  survey_population = polymod_population,
  prediction_population = fairfield,
  age_breaks = age_breaks_0_80_plus,
  per_capita_household_size = fairfield_hh,
  symmetrical = FALSE
)
```

This is a bit briefer than the two step process, and might be preferable if you do not need to keep the fitted model as a separate object.
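
Returning to the last step of the workflow listed at the top of this vignette, the reproduction number implied by a next generation matrix is its dominant eigenvalue, so a before and after comparison can be sketched in base R. Everything in this sketch is an assumption for illustration: the 3 x 3 `toy_ngm`, the two multiplier vectors, the helper `dominant_eigenvalue()`, and the row and column convention are made up, and the scaling shown is not necessarily how `apply_vaccination()` computes its result.

```{r}
# toy 3 x 3 next generation matrix (made-up values, not conmat output).
# assumed convention: entry [i, j] is the expected number of new infections
# in age group i caused by one infected person in age group j
toy_ngm <- matrix(
  c(
    0.8, 0.3, 0.1,
    0.3, 0.9, 0.2,
    0.1, 0.2, 0.5
  ),
  nrow = 3,
  byrow = TRUE
)

# hypothetical helper: the dominant eigenvalue (spectral radius) of a matrix
dominant_eigenvalue <- function(m) {
  max(Mod(eigen(m, only.values = TRUE)$values))
}

# hypothetical multipliers: the share of acquisition and transmission
# that remains in each age group after vaccination
acquisition_multiplier <- c(1.0, 0.8, 0.7)
transmission_multiplier <- c(1.0, 0.7, 0.6)

# scale rows by acquisition and columns by transmission
toy_ngm_vacc <- diag(acquisition_multiplier) %*%
  toy_ngm %*%
  diag(transmission_multiplier)

r_before <- dominant_eigenvalue(toy_ngm)
r_after <- dominant_eigenvalue(toy_ngm_vacc)
c(before = r_before, after = r_after)
```

With real output, the same helper could be applied to a single matrix taken from the objects before and after `apply_vaccination()`, assuming both are plain numeric matrices with matching age groups.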