---
title: "Generating a matched cohort"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Generate a matched age and sex cohort}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The CohortConstructor package includes a function to obtain an age- and sex-matched cohort: `matchCohorts()`. In this vignette, we explore how to use this function.

## Create mock data

We will first use the `mockCohortConstructor()` function from the CohortConstructor package to create mock data.

```{r setup, message = FALSE, warning = FALSE, eval = FALSE}
library(CohortConstructor)
library(dplyr)

cdm <- mockCohortConstructor(nPerson = 1000)
```

As we will use `cohort1` to explore `matchCohorts()`, let us first use `cohort_set()` from the CDMConnector package to explore this cohort:

```{r, eval = FALSE}
CDMConnector::cohort_set(cdm$cohort1)
```

# Use matchCohorts() to create an age-sex matched cohort

Let us first see an example of how this function works. We need to provide the `cohort`, which is the table in the `cdm` object containing the cohort of interest, and the `name` of the newly generated table that will contain both the original cohort and the matched cohort. We will also use the argument `cohortId` to specify that we only want a matched cohort for `cohort_definition_id = 1`.

```{r, eval = FALSE}
cdm$matched_cohort1 <- matchCohorts(
  cohort = cdm$cohort1,
  cohortId = 1,
  name = "matched_cohort1")

CDMConnector::cohort_set(cdm$matched_cohort1)
```

Notice that the generated table contains two cohorts: `cohort_definition_id = 1` (original cohort) and `cohort_definition_id = 4` (matched cohort). The *target_cohort_name* column indicates which is the original cohort. *match_sex* and *match_year_of_birth* take boolean values (`TRUE`/`FALSE`) indicating whether or not we have matched for sex and age.
*match_status* indicates whether it is the original cohort (`target`) or the matched cohort (`matched`). *target_cohort_id* indicates the cohort id of the original cohort.

Check the exclusion criteria applied to generate the new cohorts by using `cohort_attrition()` from the CDMConnector package:

```{r, eval = FALSE}
# Original cohort
CDMConnector::cohort_attrition(cdm$matched_cohort1) |> filter(cohort_definition_id == 1)

# Matched cohort
CDMConnector::cohort_attrition(cdm$matched_cohort1) |> filter(cohort_definition_id == 4)
```

Briefly, from the original cohort, we first exclude individuals that do not have a match, and then individuals whose matching pair is not in observation at the assigned *cohort_start_date*. For the matched cohort, we start from the whole database and first exclude individuals that are in the original cohort. Afterwards, we exclude individuals that do not have a match, then individuals that are not in observation at the assigned *cohort_start_date*, and finally we remove as many individuals as required to fulfil the ratio.

Notice that matching pairs are randomly assigned, so the generated cohorts will probably change every time you run this function. Use `set.seed()` to avoid this.

## matchSex parameter

`matchSex` is a boolean parameter (`TRUE`/`FALSE`) indicating whether we want to match by sex (`TRUE`) or not (`FALSE`).

## matchYear parameter

`matchYear` is another boolean parameter (`TRUE`/`FALSE`) indicating whether we want to match by age (`TRUE`) or not (`FALSE`).

Notice that if `matchSex = FALSE` and `matchYear = FALSE`, we will obtain an unmatched comparator cohort.

## ratio parameter

The default matching ratio is 1:1 (`ratio = 1`). Use `cohort_count()` from CDMConnector to check that the matching has been done as desired.

```{r, eval = FALSE}
CDMConnector::cohort_count(cdm$matched_cohort1)
```

You can modify the `ratio` parameter to tailor your matched cohort.
`ratio` can take values from 1 to `Inf`.

```{r, eval = FALSE}
cdm$matched_cohort2 <- matchCohorts(
  cohort = cdm$cohort1,
  cohortId = 1,
  name = "matched_cohort2",
  ratio = Inf)

CDMConnector::cohort_count(cdm$matched_cohort2)
```

## Generate matched cohorts simultaneously across multiple cohorts

All these functionalities can be applied to multiple cohorts simultaneously. Use the `cohortId` parameter to specify the cohorts of interest. If set to `NULL`, all the cohorts present in `cohort` will be matched.

```{r, eval = FALSE}
cdm$matched_cohort3 <- matchCohorts(
  cohort = cdm$cohort1,
  cohortId = c(1,3),
  name = "matched_cohort3",
  ratio = 2)

CDMConnector::cohort_set(cdm$matched_cohort3) |> arrange(cohort_definition_id)

CDMConnector::cohort_count(cdm$matched_cohort3) |> arrange(cohort_definition_id)
```

Notice that each cohort has its own matched cohort, independent of the other cohorts.
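
To make the matching ideas used throughout this vignette more concrete (exact matching on sex and year of birth, random assignment of pairs, a cap given by `ratio`, and reproducibility through `set.seed()`), here is a minimal base R sketch on toy data. It is not the CohortConstructor implementation: the helper `match_exact()` and the data frames `target` and `pool` are hypothetical names introduced only for illustration, and the sketch ignores the observation-period checks described above.

```{r, eval = FALSE}
# Toy target cohort and pool of candidate comparators (made-up data)
target <- data.frame(
  subject_id    = 1:4,
  sex           = c("F", "F", "M", "M"),
  year_of_birth = c(1980, 1980, 1975, 1990)
)
pool <- data.frame(
  subject_id    = 101:108,
  sex           = c("F", "F", "F", "M", "M", "M", "F", "M"),
  year_of_birth = c(1980, 1980, 1980, 1975, 1975, 1960, 1990, 1975)
)

# Hypothetical helper, not part of CohortConstructor: exact matching on
# sex and year of birth, with at most `ratio` comparators per target
match_exact <- function(target, pool, ratio = 1, seed = 1) {
  set.seed(seed)  # fixing the seed makes the random pairing reproducible
  strata <- function(d) paste(d$sex, d$year_of_birth, sep = "_")
  pairs <- lapply(split(target, strata(target)), function(t_s) {
    # Candidates that share sex and year of birth with this stratum
    p_s <- pool[strata(pool) == strata(t_s)[1], ]
    n_use <- min(nrow(p_s), ratio * nrow(t_s))
    if (n_use == 0) return(NULL)  # targets without any match are dropped
    # Put the candidates in random order and keep the first n_use
    chosen <- p_s$subject_id[sample.int(nrow(p_s))][seq_len(n_use)]
    # Assign comparators to targets in turn, so that every target gets
    # one comparator before any target gets a second one
    data.frame(
      target_id  = t_s$subject_id[(seq_len(n_use) - 1) %% nrow(t_s) + 1],
      matched_id = chosen
    )
  })
  out <- do.call(rbind, pairs)
  rownames(out) <- NULL
  out
}

# 1:1 matching: subject 4 has no candidate in the pool, so it is dropped
match_exact(target, pool, ratio = 1, seed = 42)

# ratio = Inf keeps every available candidate in each stratum
match_exact(target, pool, ratio = Inf, seed = 42)
```

Calling the helper twice with the same `seed` returns the same pairs, while changing the seed may change which candidates are selected. This mirrors the behaviour described for the random assignment of matching pairs.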