---
title: "Example Pipeline"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Example Pipeline}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(conmat)
```

This vignette outlines a basic workflow of:

* Creating a new synthetic matrix by extrapolating from POLYMOD data to a new age distribution
* Generating a Next Generation Matrix
* Applying Vaccination Rates
* Comparing R0 before and after vaccination

## Create a new synthetic matrix from all POLYMOD data

We can create a synthetic matrix from all POLYMOD data by using the `extrapolate_polymod` function. First, let's extract an age distribution from the ABS data.

```{r fairfield}
fairfield <- abs_age_lga("Fairfield (C)")
fairfield
```

Note that this is a `conmat_population` object, which is just a data frame that knows which columns represent the `age` and `population` information.

We then extrapolate this to home, work, school, other and all settings, using the full POLYMOD data. This gives us a setting prediction matrix.

```{r extrapolate-fairfield}
age_breaks_0_80_plus <- c(seq(0, 80, by = 5), Inf)
synthetic_fairfield_5y <- extrapolate_polymod(
  population = fairfield,
  age_breaks = age_breaks_0_80_plus
)
synthetic_fairfield_5y
synthetic_fairfield_5y$home
```

By full POLYMOD data, we mean these data:

```{r polymod-setting-population}
polymod_setting <- get_polymod_setting_data()

polymod_population <- get_polymod_population()

polymod_setting
polymod_setting$home
polymod_population
```

The `extrapolate_polymod()` function does the following:

* Uses a model of the contact rate (`polymod_setting_models`) that has already been fitted to the full POLYMOD data above
* Uses that model to predict contact rates for the provided Fairfield population data

It also has options to predict to specified age brackets, defaulting to 5-year age groups up to 75, then 75 and older.
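To make the role of the age breaks concrete, here is a minimal base R sketch of how a vector of breaks such as `c(seq(0, 80, by = 5), Inf)` partitions individual ages into groups. It uses `cut()` purely as an illustration; how conmat groups and labels ages internally is not shown in this vignette, and the names `age_breaks_example`, `example_ages`, and `example_age_groups` are made up for this example.

```{r}
# Illustration only: base R `cut()` stands in for however conmat bins ages.
age_breaks_example <- c(seq(0, 80, by = 5), Inf)

# A handful of made-up ages, including values on the boundaries
example_ages <- c(0, 4, 5, 37, 79, 80, 95)

# right = FALSE makes each group closed on the left, e.g. ages 5 to just under 10
example_age_groups <- cut(
  example_ages,
  breaks = age_breaks_example,
  right = FALSE
)

data.frame(age = example_ages, age_group = example_age_groups)

# 18 break points (0, 5, ..., 80, Inf) define 17 age groups
nlevels(example_age_groups)
```

Because the final break is `Inf`, everyone aged 80 or older falls into a single open-ended group.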

This object, `synthetic_fairfield_5y`, contains a matrix of predictions for each of the settings, home, work, school, other, and all settings, which is summarised when you print the object to the console:

```{r}
synthetic_fairfield_5y
```

You can see more detail by using `str` if you like:

```{r}
str(synthetic_fairfield_5y)
```

## Generating a Next Generation Matrix

Once infected, a person can transmit an infectious disease to another, creating generations of infected individuals. We can define a matrix describing the number of newly infected individuals in given categories, such as age, for consecutive generations. This matrix is called a "next generation matrix" (NGM).
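As a small illustration of this idea, the sketch below builds a made-up two-group NGM in base R and uses it to step from one generation of infections to the next. The numbers, the group names, and the orientation (rows are the newly infected group, columns are the infecting group) are assumptions chosen for the example, not output from conmat.

```{r}
# Toy two-group NGM with made-up numbers. Entry [i, j] is taken to be the
# expected number of new infections in group i caused by one infected
# person in group j (an assumed orientation for this illustration).
toy_ngm <- matrix(
  c(1.2, 0.3,
    0.4, 0.8),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(to = c("child", "adult"), from = c("child", "adult"))
)

# Start with 10 infected children and no infected adults
toy_generation_0 <- c(child = 10, adult = 0)

# Each matrix multiplication moves forward one generation
toy_generation_1 <- toy_ngm %*% toy_generation_0
toy_generation_2 <- toy_ngm %*% toy_generation_1

toy_generation_1
toy_generation_2

# The dominant eigenvalue of an NGM is the reproduction number
toy_r <- max(Mod(eigen(toy_ngm)$values))
toy_r
```

In this toy example the dominant eigenvalue is 1.4, so the number of infections grows by roughly 40% per generation once the age distribution of cases settles down.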

We can generate an NGM using the population data:

```{r}
fairfield_ngm_age_data <- generate_ngm(
  fairfield,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

Or, if you've already got the fitted setting contact matrices, you can pass them to `generate_ngm()` instead:

```{r}
fairfield_ngm <- generate_ngm(
  synthetic_fairfield_5y,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

However, note that in these cases the age breaks specified in `generate_ngm()` must be the same as the age breaks used to create the synthetic contact matrix. Otherwise it will error, as it is trying to multiply incompatible matrices.
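The kind of failure to expect can be sketched in base R. The example below assumes, for illustration only, that a contact matrix and a transmission matrix are combined element by element; the matrices are filled with placeholder values, and the exact error message produced by conmat may differ.

```{r}
# Placeholder matrices: 17 groups for 5-year breaks up to 80+,
# 16 groups for 5-year breaks up to 75+
toy_contacts_17 <- matrix(1, nrow = 17, ncol = 17)
toy_transmission_17 <- matrix(0.5, nrow = 17, ncol = 17)
toy_transmission_16 <- matrix(0.5, nrow = 16, ncol = 16)

# Matching age breaks: the element-wise product works
dim(toy_contacts_17 * toy_transmission_17)

# Mismatched age breaks: base R refuses to combine the matrices.
# tryCatch() captures the error so this chunk still runs to completion.
toy_mismatch <- tryCatch(
  toy_contacts_17 * toy_transmission_16,
  error = function(e) e
)
inherits(toy_mismatch, "error")
conditionMessage(toy_mismatch)
```

Storing the age breaks in a single object, as done with `age_breaks_0_80_plus` in this vignette, and reusing it in every call is a simple way to avoid this mismatch.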

You can also specify your own transmission matrix, like so:

```{r}
# using our own transmission matrix
new_transmission_matrix <- get_setting_transmission_matrices(
  age_breaks = age_breaks_0_80_plus,
  # is normally 0.5
  asymptomatic_relative_infectiousness = 0.75
)

new_transmission_matrix

fairfield_ngm_0_80_new_tmat <- generate_ngm(
  synthetic_fairfield_5y,
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5,
  setting_transmission_matrix = new_transmission_matrix
)
```

We can also generate an NGM directly for an Australian state or LGA with `generate_ngm_oz()`, which refits and extrapolates the data for the state or LGA provided:

```{r ngm-fairfield}
ngm_fairfield <- generate_ngm_oz(
  lga_name = "Fairfield (C)",
  age_breaks = age_breaks_0_80_plus,
  R_target = 1.5
)
```

The output of this is a matrix for each of the settings, where each value is the number of newly infected individuals:

```{r}
ngm_fairfield$home
str(ngm_fairfield)
```
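One way to think about the `R_target` argument used throughout this section is as a rescaling: the matrix is multiplied by a constant so that its dominant eigenvalue, the reproduction number, equals the target. The base R sketch below shows that idea on a made-up matrix. It is an assumption about the general technique, not a description of conmat's internals, and `dominant_eigenvalue()` is a helper defined here for the example.

```{r}
# Helper for this example: the dominant eigenvalue of a square matrix
dominant_eigenvalue <- function(m) {
  max(Mod(eigen(m)$values))
}

# Made-up, unscaled two-group matrix
toy_unscaled <- matrix(
  c(2.0, 0.5,
    1.0, 1.5),
  nrow = 2,
  byrow = TRUE
)
dominant_eigenvalue(toy_unscaled)

# Rescale so that the dominant eigenvalue equals the target value
toy_r_target <- 1.5
toy_scaled <- toy_unscaled * toy_r_target / dominant_eigenvalue(toy_unscaled)
dominant_eigenvalue(toy_scaled)
```

Multiplying a matrix by a constant multiplies all of its eigenvalues by the same constant, which is why a single scaling factor is enough to hit the target while leaving the relative pattern between age groups unchanged.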


## Applying Vaccination Rates

It is important to understand the effect of vaccination on the next generation of infections. We can use `apply_vaccination()` to apply the reduction in acquisition and transmission in each age group to the next generation matrices.

It takes two key arguments:

1. The next generation matrix
2. The vaccination effect data

The vaccination effect could look like the following:

```{r print-vaccination-effect}
vaccination_effect_example_data
```

Each row describes one age band:

* Coverage - % vaccinated
* Acquisition - the probability of acquiring COVID
* Transmission - the probability of transmission

Then you need to specify the columns in the vaccination effect data frame related to coverage, acquisition, and transmission.

```{r}
# Apply vaccination effect to next generation matrices
ngm_nsw_vacc <- apply_vaccination(
  ngm = ngm_fairfield,
  data = vaccination_effect_example_data,
  coverage_col = coverage,
  acquisition_col = acquisition,
  transmission_col = transmission
)

ngm_nsw_vacc
```
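To build some intuition for what a vaccination effect does to an NGM, and to compare the reproduction number before and after vaccination, here is a minimal base R sketch on a made-up two-group matrix. It assumes that the `acquisition` and `transmission` values act as proportional reductions among vaccinated people, and that rows of the NGM correspond to the group acquiring infection and columns to the group transmitting it. These are assumptions made for the illustration; see `?apply_vaccination` for what the package actually does.

```{r}
# Made-up two-group NGM (rows: acquiring group, columns: transmitting group)
toy_ngm_unvaccinated <- matrix(
  c(1.2, 0.3,
    0.4, 0.8),
  nrow = 2,
  byrow = TRUE
)

# Made-up vaccination effect data: only adults are vaccinated
toy_vaccination_effect <- data.frame(
  age_band = c("child", "adult"),
  coverage = c(0.0, 0.8),
  acquisition = c(0.0, 0.5),
  transmission = c(0.0, 0.4)
)

# Average reduction per group, weighted by coverage (assumed interpretation)
toy_acquisition_multiplier <- 1 -
  toy_vaccination_effect$coverage * toy_vaccination_effect$acquisition
toy_transmission_multiplier <- 1 -
  toy_vaccination_effect$coverage * toy_vaccination_effect$transmission

# Entry [i, j] scales infections acquired by group i from group j
toy_reduction <- outer(toy_acquisition_multiplier, toy_transmission_multiplier)
toy_ngm_vaccinated <- toy_ngm_unvaccinated * toy_reduction

# Compare the reproduction number before and after vaccination
toy_r_before <- max(Mod(eigen(toy_ngm_unvaccinated)$values))
toy_r_after <- max(Mod(eigen(toy_ngm_vaccinated)$values))
c(before = toy_r_before, after = toy_r_after)
```

The same comparison could in principle be made on a setting matrix from `ngm_fairfield` and its counterpart in `ngm_nsw_vacc`, by taking the dominant eigenvalue of each.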

## Fitting a new model with asymmetric terms

In the examples so far we have focussed on using `extrapolate_polymod()` to obtain contact matrices. This is very useful because it takes only a few lines of code:

```{r}
#| eval: FALSE
fairfield <- abs_age_lga("Fairfield (C)")
age_breaks_0_80_plus <- c(seq(0, 80, by = 5), Inf)
synthetic_fairfield_5y <- extrapolate_polymod(
  population = fairfield,
  age_breaks = age_breaks_0_80_plus
)
```

It also runs quite quickly, since it uses a pre-computed model, `polymod_setting_models` (see `?polymod_setting_models` for more details).

Under the hood, `extrapolate_polymod()` takes this already fitted model for each setting (home, work, school, other) and uses it, together with the provided population data, to predict the new contact rates.

So the process is:

1. Create a model that predicts contact rate for each setting
2. Predict to a new population using that model
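As a rough, package-independent analogy for these two steps, the sketch below fits a simple Poisson regression to made-up survey data and then predicts to a new set of ages. It uses base R's `glm()` and `predict()` as stand-ins; the models conmat fits are more elaborate, and every name and number here is invented for the illustration.

```{r}
# Step 1: fit a model to (made-up) survey data.
# Each row is a respondent age and the number of contacts they reported.
toy_survey <- data.frame(
  age = c(5, 15, 25, 35, 45, 55, 65, 75),
  contacts = c(12, 18, 14, 13, 11, 9, 6, 4)
)
toy_contact_model <- glm(
  contacts ~ age,
  family = poisson(),
  data = toy_survey
)

# Step 2: predict to a new population using that model.
# type = "response" returns expected counts rather than log counts.
toy_new_population <- data.frame(age = c(10, 30, 50, 70))
toy_predicted_contacts <- predict(
  toy_contact_model,
  newdata = toy_new_population,
  type = "response"
)

cbind(toy_new_population, predicted_contacts = toy_predicted_contacts)
```

Separating the two steps means the fitted model can be reused to predict to many different populations without refitting.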

Let's show each step and unpack them.

First let's create a model that predicts contact rate for each setting:

```{r}
polymod_setting_data <- get_polymod_setting_data()
polymod_population <- get_polymod_population()

contact_setting_model_not_sym <- fit_setting_contacts(
  contact_data_list = polymod_setting_data,
  population = polymod_population,
  symmetrical = FALSE
)
```

Here, we first get the POLYMOD setting data (`polymod_setting_data`) and the POLYMOD population (`polymod_population`), and use them to create a model for each setting. These data look like this, if you are interested:

```{r}
polymod_setting_data
polymod_population
```

We also specify the `symmetrical = FALSE` option; by default this is `TRUE`. Briefly, this changes some of the terms we use in creating the model, to use terms that aren't strictly symmetric.

Now that we've got our model, we can predict to our Fairfield data, like so:

```{r}
fairfield_hh <- get_abs_per_capita_household_size(lga = "Fairfield (C)")
fairfield_hh
contact_model_pred <- predict_setting_contacts(
  population = fairfield,
  contact_model = contact_setting_model_not_sym,
  age_breaks = age_breaks_0_80_plus,
  per_capita_household_size = fairfield_hh
)
```

* `population` is our population to predict to
* `contact_model` is our contact rate model for each setting
* `age_breaks` are our age breaks to predict to
* `per_capita_household_size` is the household size for that population. In our case we have a helper function, `get_abs_per_capita_household_size()`, which works for each LGA in Australia.

Alternatively, you can use the `estimate_setting_contacts()` function to do a similar task:

```{r}
contact_model_pred_est <- estimate_setting_contacts(
  contact_data_list = polymod_setting_data,
  survey_population = polymod_population,
  prediction_population = fairfield,
  age_breaks = age_breaks_0_80_plus,
  per_capita_household_size = fairfield_hh,
  symmetrical = FALSE
)
```

This is a bit briefer than the two step process, and might be preferable to creating a separate model.

<!-- ## Comparing R0 before and post vaccination rates -->


<!-- ## A note on transmission matrices -->

<!-- The initial phase of COVID-19 pandemic saw a significant age dependence in the distribution of confirmed cases with fewer confirmed cases among children. This might have been a result of younger ages being less susceptible to infection and/or less likely to exhibit clinical signs when infected [ [Davies et al](https://www.nature.com/articles/s41591-020-0962-9)] . A population's clinical fraction and age-varying susceptibility to infection both profoundly influence the likelihood that an infectious agent would spread there. This likelihood that an infectious agent would spread from an infected source to a new susceptible host and infect the host is known as transmission probability. -->

<!-- The age and setting specific relative per-contact transmission probability matrices when combined with contact matrices could be used to produce the setting-specific relative next generation matrices (NGMs) which is used to obtain the distribution of numbers of new cases in each generation of infection from any arbitrary initial number of introduced infections. Conmat creates these age and setting specific relative per-contact transmission probability matrices through `get_setting_transmission_matrices()` by making use of clinical fraction and relative susceptibility parameters from Davies et al available through `davies_age_extended`.  -->