---
title: "Data Sources"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Sources}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

We provide access to several data sources in `conmat`. Most of these are centred around Australian data, as the package was initially created for disease modelling work in Australia. This vignette gives a quick tour of the data sources available in `conmat`.

```{r setup}
library(conmat)
```

## World data

We provide functions to clean up world population data from `socialmixr`.

```{r}
world_data <- socialmixr::wpp_age()
head(world_data)
```

We can tidy the data up, filtering down to a specified location and year with the `age_population()` function:

```{r}
nz_2015 <- age_population(
  data = world_data,
  location_col = country,
  location = "New Zealand",
  age_col = lower.age.limit,
  year_col = year,
  year = 2015
)
nz_2015
```

This returns a `conmat_population` object, which is a data frame that knows which columns represent `age` and `population` information. Other modelling functions in the `conmat` package rely on this information.

## Australian Bureau of Statistics (ABS) data

### Accessing Functions

We provide two functions to access population age data at the LGA (Local Government Area) level and the state level. Both return data in 5-year age bins starting at 0, 5, and so on up to 85+. These data are `conmat_population` tibbles, which means that they know which columns represent the `age` and `population` information. Functions inside `conmat` refer to these columns frequently, so recording them makes those functions work more smoothly.

#### `abs_age_lga()`

```{r abs-age-lga}
fairfield <- abs_age_lga(lga_name = "Fairfield (C)")
fairfield
```

Note that this is a `conmat_population` object, which prints in red at the top of the data frame.
This printed header reports the `age` and `population` columns as `age: lower.age.limit` and `population: population`, indicating which column holds each variable.

Also note that `abs_age_lga()` requires the exact name of the LGA. You can see the available names in the dataset `abs_lga_lookup`:

```{r abs-lga-lookup}
abs_lga_lookup
```

If you're not sure of the exact name of a place, you can combine `agrepl()` with `filter()` to find approximate matches, like so:

```{r abs-lga-lookup-filter}
library(dplyr)
abs_lga_lookup %>%
  filter(agrepl("Sydney", lga))
```

#### `abs_age_state()`

This takes the abbreviated state name, and also returns a `conmat_population` object.

```{r abs-age-state}
abs_age_state(state_name = "NSW")
```

You can see these state names with:

```{r}
unique(abs_lga_lookup$state)
```

Note that "OT" stands for "other territories".

### ABS data

We provide other ABS data, listed below. You can read the full details of each dataset in its help file, for example by running `?abs_education_state`.

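As a rough sketch of how to browse these datasets, assuming `conmat` is installed: `data(package = ...)` from base R lists the datasets a package ships, and `?` opens the help page for one of them. The chunk is marked `eval = FALSE` because help pages are meant for interactive sessions, and `conmat_datasets` is only an illustrative variable name.

```{r abs-help, eval = FALSE}
# List the names of all datasets shipped with conmat
conmat_datasets <- data(package = "conmat")$results[, "Item"]
conmat_datasets

# Open the help page for one of the datasets
?abs_education_state
```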
#### Education by state data for 2006-2020

```{r abs-education-state}
abs_education_state
```

#### Education by state data for 2020

```{r abs-education-state-2020}
abs_education_state_2020
```

#### Employment by LGA for 2016

```{r abs-employ-age-lga}
abs_employ_age_lga
```

#### Number of people in each household by LGA for 2016

```{r abs-household-lga}
abs_household_lga
```

#### LGA age population for 2016 for all states and LGAs

```{r abs-pop-age-lga-2016}
abs_pop_age_lga_2016
```

#### LGA age population for 2020 for all states and LGAs

```{r abs-pop-age-lga-2020}
abs_pop_age_lga_2020
```

#### State age population for 2020

```{r abs-state-age}
abs_state_age
```

## Epidemiology / disease modelling data

### Transmission probabilities from Eyre

A dataset containing data digitised from "The impact of SARS-CoV-2 vaccination on Alpha & Delta variant transmission", by David W Eyre, Donald Taylor, Mark Purver, David Chapman, Tom Fowler, Koen B Pouwels, A Sarah Walker, and Tim EA Peto.

```{r eyre-transmission-probabilities}
eyre_transmission_probabilities
```

We can visualise the data by averaging the probability within each setting and pair of age groups, then plotting the result as a heatmap:

```{r eyre-transmission-probabilities-plot}
library(ggplot2)
library(stringr)
library(dplyr)

eyre_transmission_probabilities %>%
  # average the probability for each setting and case/contact age pair
  group_by(
    setting,
    case_age_5y,
    contact_age_5y
  ) %>%
  summarise(
    across(
      probability,
      mean
    ),
    .groups = "drop"
  ) %>%
  rename(
    case_age = case_age_5y,
    contact_age = contact_age_5y
  ) %>%
  # order the age groups numerically rather than alphabetically
  mutate(
    across(
      ends_with("age"),
      ~ factor(.x,
        levels = str_sort(
          unique(.x),
          numeric = TRUE
        )
      )
    )
  ) %>%
  ggplot(
    aes(
      x = case_age,
      y = contact_age,
      fill = probability
    )
  ) +
  facet_wrap(~setting) +
  geom_tile() +
  scale_fill_viridis_c() +
  coord_fixed() +
  theme_minimal() +
  theme(
    axis.text = element_text(angle = 45, hjust = 1)
  )
```
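
The plotting code converts the age columns to factors whose levels are ordered with `str_sort(numeric = TRUE)`. The minimal sketch below illustrates why that step matters. The age-band labels are hypothetical and chosen only for illustration; the labels in `eyre_transmission_probabilities` may be formatted differently.

```{r age-band-ordering}
library(stringr)

# Hypothetical age-band labels, for illustration only
age_bands <- c("10-14", "0-4", "5-9")

# Numeric-aware sorting compares the leading numbers as numbers,
# so "5-9" is placed before "10-14"
ordered_bands <- str_sort(age_bands, numeric = TRUE)

# Use the sorted labels as factor levels so plot axes follow age order
age_factor <- factor(age_bands, levels = ordered_bands)
levels(age_factor)
```

Without explicit levels, `factor()` orders labels as character strings, which would typically place "10-14" ahead of "5-9" on the plot axes.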