---
title: "Data Sources"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Sources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
We provide access to a variety of data sources in `conmat`. Most of these are centred around Australian data, as the package was initially created for disease modelling work in Australia. This vignette gives a quick tour of the data sources available in `conmat`.
```{r setup}
library(conmat)
```
## World data
We provide functions to clean up world population data from `socialmixr`.
```{r}
world_data <- socialmixr::wpp_age()
head(world_data)
```
We can tidy the data up, filtering down to a specified location and year with the `age_population` function:
```{r}
nz_2015 <- age_population(
  data = world_data,
  location_col = country,
  location = "New Zealand",
  age_col = lower.age.limit,
  year_col = year,
  year = 2015
)
nz_2015
```
This returns a `conmat_population` object, which is a data frame that knows which columns represent `age` and `population` information. This is useful for other modelling parts of the `conmat` package.
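To illustrate the idea of a data frame that "knows" its age and population columns, here is a minimal base R sketch. It is not how `conmat` implements `conmat_population`; the helper `as_toy_population()`, the attribute names, and the numbers are all hypothetical and exist only for this example.

```{r toy-population-sketch}
# Toy data with made-up numbers; not real population data
toy_pop <- data.frame(
  lower.age.limit = c(0, 5, 10),
  population = c(100, 120, 110)
)

# Hypothetical helper: record which columns hold age and population
as_toy_population <- function(data, age_col, population_col) {
  attr(data, "age_col") <- age_col
  attr(data, "population_col") <- population_col
  class(data) <- c("toy_population", class(data))
  data
}

toy_pop <- as_toy_population(toy_pop, "lower.age.limit", "population")

# Downstream code looks the columns up instead of hard-coding their names
age_values <- toy_pop[[attr(toy_pop, "age_col")]]
pop_values <- toy_pop[[attr(toy_pop, "population_col")]]
total_pop <- sum(pop_values)
total_pop
```

Storing the column names with the data means later functions do not need the user to repeat them on every call.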
## Australian Bureau of Statistics (ABS) data
### Accessing Functions
We provide two functions to access population age data at the LGA (Local Government Area) and state level. The data come in 5 year age bins: 0, 5, and so on up to 85+. The returned objects are `conmat_population` tibbles, which means that they know which columns represent the `age` and `population` information. Because functions inside `conmat` refer to these columns frequently, this lets them work more smoothly.
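As a rough sketch of what 5 year bins with an open-ended 85+ group mean in practice, the base R snippet below maps some made-up individual ages to the lower limit of their bin. The vector `ages` and the labelling scheme are assumptions for illustration; they are not taken from the ABS data or from `conmat` internals.

```{r age-bins-sketch}
# Made-up individual ages
ages <- c(0, 4, 5, 37, 84, 85, 102)

# Lower limit of the 5 year bin, with everything from 85 upwards pooled
lower_age_limit <- pmin(floor(ages / 5) * 5, 85)

# Human-readable labels such as "0-4" and "85+"
age_group <- ifelse(
  lower_age_limit == 85,
  "85+",
  paste0(lower_age_limit, "-", lower_age_limit + 4)
)

data.frame(ages, lower_age_limit, age_group)
```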
#### `abs_age_lga()`
```{r abs-age-lga}
fairfield <- abs_age_lga(lga_name = "Fairfield (C)")
fairfield
```
Note that this is a `conmat_population` object, as shown by the header printed in red at the top of the data frame. The header gives the `age` and `population` columns, stating `age: lower.age.limit` and `population: population`, which indicates the columns that hold each variable.
Also note that `abs_age_lga` requires you to know the exact name of the LGA. You can see the names in the dataset `abs_lga_lookup`:
```{r abs-lga-lookup}
abs_lga_lookup
```
And if you're not sure about the exact name of a place, you can combine `agrepl` with `filter` to match on similar-ish characters, like so:
```{r abs-lga-lookup-filter}
library(dplyr)
abs_lga_lookup %>%
  filter(agrepl("Sydney", lga))
```
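The approximate matching above can be tried in isolation. The sketch below uses base R's `agrepl()` on a small hand-written vector of names (illustrative only, not pulled from `abs_lga_lookup`) to show that a misspelt pattern can still match where exact matching with `grepl()` does not.

```{r agrepl-sketch}
# Hand-written example names; an assumption for illustration
lga_names <- c(
  "Sydney (C)",
  "North Sydney (A)",
  "Fairfield (C)",
  "Melbourne (C)"
)

# Exact matching fails on a misspelt pattern
exact_match <- grepl("Sydny", lga_names)

# Approximate matching tolerates a small number of edits
fuzzy_match <- agrepl("Sydny", lga_names)

lga_names[fuzzy_match]
```

If the matches are too loose or too strict, the `max.distance` argument of `agrepl()` controls how many edits are tolerated.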
#### `abs_age_state()`
This takes an abbreviated state name and also returns a `conmat_population` object.
```{r abs-age-state}
abs_age_state(state_name = "NSW")
```
You can see these state names with:
```{r}
unique(abs_lga_lookup$state)
```
Note that "OT" stands for "other territories".
### ABS data
We also provide other ABS data, listed below. You can read the full details of each dataset in its help file by writing, for example, `?abs_education_state`.
#### Education by state data for 2006-2020
```{r abs-education-state}
abs_education_state
```
#### Education by state data for 2020
```{r abs-education-state-2020}
abs_education_state_2020
```
#### Employment by LGA for 2016
```{r abs-employ-age-lga}
abs_employ_age_lga
```
#### Number of people in each household by LGA for 2016
```{r abs-household-lga}
abs_household_lga
```
#### LGA age population for 2016 for all states and LGAs
```{r abs-pop-age-lga-2016}
abs_pop_age_lga_2016
```
#### LGA age population for 2020 for all states and LGAs
```{r abs-pop-age-lga-2020}
abs_pop_age_lga_2020
```
#### State age population for 2020
```{r abs-state-age}
abs_state_age
```
## Epidemiology / disease modelling data
### Transmission probabilities from Eyre
A dataset containing data digitised from "The impact of SARS-CoV-2 vaccination on Alpha & Delta variant transmission", by David W Eyre, Donald Taylor, Mark Purver, David Chapman, Tom Fowler, Koen B Pouwels, A Sarah Walker, and Tim EA Peto.
```{r eyre-transmission-probabilities}
eyre_transmission_probabilities
```
We can visualise the data like so:
```{r eyre-transmission-probabilities-plot}
library(ggplot2)
library(stringr)
library(dplyr)
eyre_transmission_probabilities %>%
  group_by(
    setting,
    case_age_5y,
    contact_age_5y
  ) %>%
  summarise(
    across(
      probability,
      mean
    ),
    .groups = "drop"
  ) %>%
  rename(
    case_age = case_age_5y,
    contact_age = contact_age_5y
  ) %>%
  mutate(
    across(
      ends_with("age"),
      ~ factor(.x,
        levels = str_sort(
          unique(.x),
          numeric = TRUE
        )
      )
    )
  ) %>%
  ggplot(
    aes(
      x = case_age,
      y = contact_age,
      fill = probability
    )
  ) +
  facet_wrap(~setting) +
  geom_tile() +
  scale_fill_viridis_c() +
  coord_fixed() +
  theme_minimal() +
  theme(
    axis.text = element_text(angle = 45, hjust = 1)
  )
```
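The plotting code converts the age groups to factors whose levels are ordered with `str_sort(numeric = TRUE)`. The short sketch below shows why, using made-up age labels (an assumption; the labels in `eyre_transmission_probabilities` may look different): a plain alphabetical sort compares the labels character by character, while the numeric sort treats the digits as numbers.

```{r str-sort-sketch}
library(stringr)

# Made-up age group labels, for illustration only
age_labels <- c("10-14", "0-4", "5-9")

# Alphabetical ordering puts "10-14" before "5-9"
alphabetical <- str_sort(age_labels)

# Numeric ordering sorts by the value of the leading number
numeric_order <- str_sort(age_labels, numeric = TRUE)

# Factor levels follow the numeric order, so plot axes do too
age_factor <- factor(age_labels, levels = numeric_order)
levels(age_factor)
```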