---
title: "Path to Automation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Path to Automation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{css, echo = FALSE}
#pkgdown-sidebar{
display:none;
}
.wide .table{
font-size: 8px;
overflow: visible
}
```
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```
The initial release of {sdtm.oak} provides a framework for modular programming of SDTM in R and sets the stage for potential automation of SDTM creation following the standardized SDTM specification. In the future, the automation workflow could involve preparing specifications and then making automated function calls to generate SDTM domains.
A future automation workflow could look like this:

1. Prepare the SDTM specification: users define the raw data source, target SDTM domain, target SDTM variables, and the algorithms used for automation. A template is still under development; a draft is described in this article.
2. Prepare the SDTM controlled terminology: users define the controlled terms applicable to the study. A template is still under development.
3. Run an automated process that reads the specification and makes {sdtm.oak} function calls, producing either the code required to generate the SDTM datasets or the datasets themselves.
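
As a rough sketch of step 3, and not the package's actual automation, the following base R code turns specification rows into function-call text. The `spec` values, the `build_call()` helper, and the `raw_var`/`tgt_var` argument names are assumptions made for illustration; the column names follow the draft specification described later in this article.

```r
# Hypothetical two-row specification. Column names follow the draft spec in
# this article; the values are made up for illustration.
spec <- data.frame(
  raw_dataset = c("vs_raw", "vs_raw"),
  raw_variable = c("SYS_BP", "PULSE"),
  target_sdtm_variable = c("VSORRES", "VSORRES"),
  mapping_algorithm = c("assign_no_ct", "assign_no_ct"),
  stringsAsFactors = FALSE
)

# Build one call per spec row as text. The argument names raw_var and tgt_var
# are assumptions; check them against the package reference before use.
build_call <- function(row) {
  sprintf(
    '%s(raw_dat = %s, raw_var = "%s", tgt_var = "%s")',
    row$mapping_algorithm, row$raw_dataset, row$raw_variable,
    row$target_sdtm_variable
  )
}

# Apply the helper to every row of the specification.
generated_code <- vapply(
  seq_len(nrow(spec)),
  function(i) build_call(spec[i, ]),
  character(1)
)

cat(generated_code, sep = "\n")
```

Generating text rather than executing calls directly keeps the output reviewable, which matches the option of producing "the code required to generate SDTM datasets" before producing the datasets themselves.
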
This article provides an overview of the metadata and a draft version of the standard SDTM specification. We plan to demonstrate the creation of standard SDTM specifications from the CDISC Library in collaboration with CDISC COSA. Sponsors may need to establish their own tools to generate this SDTM specification from their metadata repository (MDR) in order to use the automation features of {sdtm.oak}. This concept draws inspiration from Roche's existing implementation of SDTM automation using OAK, and it requires further development.
Throughout this article, the term "metadata" refers to the specific metadata
used by {sdtm.oak}. This article aims to give users a more detailed
understanding of that metadata.
In general, metadata can be defined as "data about data." It does not include
any patient-level data. Instead, the metadata provides a blueprint of the data
that needs to be collected during a study.
## Standards Metadata
The standards metadata used in {sdtm.oak} is sourced from the CDISC Library, a
sponsor MDR, or any other form of documentation where standards are maintained.
This metadata provides information on the following:
- The relationship between Data Collection Standards (eCRF & eDT), SDTM mapping,
and Controlled Terminology
- Machine-readable standard SDTM mappings
- Algorithms and associated metadata required for the SDTM automation of
standards in the study.
In upcoming releases of {sdtm.oak}, we plan to use the standards metadata and
customize it to meet the study requirements.
## Study Definition Metadata
Study Definition Metadata is also referred to as Study Metadata. Study Definition
Metadata provides information about the eCRF and eDT data collected in the study.
**eCRF Metadata**
The eCRF design metadata is fetched from the EDC system. This metadata includes:

- Forms Metadata: Identifier, eCRF label, repeating format, and other properties of the eCRF.
- Fields Metadata: Identifier, question label, datatype, and other
properties of data collection fields in the study.
- Data Dictionaries: Identifier and the controlled terms collected at the
source.
- Visits: Name of the visits as defined in the EDC.
**eDT Metadata**
eDT Metadata is the blueprint metadata that describes the data collected as part of an external data transfer (from clinical sites to the sponsor). This includes:

- Dataset name, label, repeating properties, etc.
- Variable name, datatype, label and associated codelist, etc.
## Study SDTM Mappings Metadata (specifications)
Study SDTM mappings metadata is the study SDTM specification. To develop the SDTM domains, {sdtm.oak} requires the user to prepare this metadata. A conventional SDTM specification has one tab per domain and is organized target-first: for each target (SDTM domain and variable), it lists the source (raw dataset and raw variable) and the SDTM mapping. The SDTM specification for {sdtm.oak} is instead organized source-first: for each raw source, it defines the SDTM mappings, algorithms, and associated metadata. The table below lists the columns in the SDTM mapping specification with their descriptions.
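
To illustrate the source-first layout, here is a minimal sketch in base R. The rows are hypothetical (the `SYS_BP` raw variable and all values are made up); only the column names come from the draft specification. One raw variable carries two mappings, ordered by `annotation_ordinal`, and grouping by raw source recovers everything that source feeds.

```r
# Hypothetical source-first specification rows: each row is one mapping
# attached to a raw source, not one row per target variable.
spec <- data.frame(
  raw_dataset = c("VTLS1", "VTLS1", "DEM"),
  raw_variable = c("SYS_BP", "SYS_BP", "SEX_001"),
  annotation_ordinal = c(1, 2, 1),
  target_domain = c("VS", "VS", "DM"),
  target_sdtm_variable = c("VSTESTCD", "VSORRES", "SEX"),
  mapping_algorithm = c("hardcode_ct", "assign_no_ct", "assign_ct"),
  stringsAsFactors = FALSE
)

# Group the mappings by their raw source (dataset.variable).
source_key <- paste(spec$raw_dataset, spec$raw_variable, sep = ".")
by_source <- split(spec, source_key)

# All target variables fed by the single raw variable VTLS1.SYS_BP.
targets_for_sys_bp <- by_source[["VTLS1.SYS_BP"]]$target_sdtm_variable
print(targets_for_sys_bp)
```

In a target-first specification the same information would be spread across the VS and DM tabs; here it is keyed by where the data comes from.
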
```{r echo = FALSE, results = "asis"}
library(knitr)
definition <- data.frame(
Variable_Name = c(
"study_number",
"raw_source_model",
"raw_dataset",
"raw_dataset_ordinal",
"raw_dataset_label",
"raw_variable",
"raw_variable_label",
"raw_variable_ordinal",
"raw_variable_type",
"raw_data_format",
"study_specific",
"annotation_ordinal",
"mapping_is_dataset",
"annotation_text",
"target_domain",
"target_sdtm_variable",
"target_sdtm_variable_role",
"target_sdtm_variable_codelist_code",
paste0(
"target_sdtm_variable_",
"controlled_terms_or_format"
),
"target_sdtm_variable_ordinal",
"origin",
"mapping_algorithm",
"sub_algorithm",
"target_hardcoded_value",
"target_term_value",
"condition_add_raw_dat",
"condition_add_tgt_dat",
"merge_type",
"merge_left",
"merge_right",
"merge_condition",
"unduplicate_keys",
"groupby_keys"
),
`Description_of_the_variable` = c(
"Study Number",
"Data Collection model",
"Name of the raw or source dataset",
"Ordinal of the raw dataset as defined in EDC or eDT specification",
"Label of the raw or source dataset",
"Name of the raw variable",
"Label of the raw variable",
paste(
"Ordinal of the variable as defined in the eCRF or",
"eDT specification"
),
"Type of the Raw Variable",
"Data format of the raw variable",
paste(
"`TRUE` indicates that the source is study specific. ",
"`FALSE` indicates that the raw variable is part of data standards"
),
"Ordinal of the SDTM mappings for the particular raw source",
paste(
"Indicates if the SDTM mapping is at the dataset level. ",
"`TRUE` indicates that it is dataset level mapping."
),
"SDTM mapping text or annotation text",
"Name of the target domain.",
"Name of the target SDTM variable",
"CDISC Role for the SDTM target variable defined in the annotation.",
paste(
"NCI or sponsor code of the codelist assigned to the ",
"SDTM target variable defined in the annotation."
),
paste(
"Controlled terms or format for the target variable ",
"defined in the annotation (as defined per CDISC).",
"`target_sdtm_variable_controlled_terms_or_format` is required ",
"for SDTM Define.xml"
),
"Ordinal of the target SDTM variable",
"Origin of the metadata source; values are subject to controlled terminology",
"Mapping Algorithm",
"The sub-algorithm (scenario) of the source-to-target mapping",
"Text (Hardcoded value) that applies to the target.",
paste(
"CDISC Submission value or sponsor value which represents a",
"hardcoded text"
),
paste(
"Condition that has to be applied to the raw dataset",
"before applying a mapping. Can be a valid R filter statement."
),
paste(
"Condition that has to be applied to the target dataset",
"before applying a mapping. Can be a valid R filter statement."
),
"Specifies the type of join",
"Specifies the left component of the merge",
"Specifies the right component of the merge",
paste(
"Specify the condition of the join (e.g. a specific ",
"variable that should match in the components of the merge)"
),
paste(
"Raw variables that should be used to determine whether ",
"an observation in the source data is a duplicate record and ",
"subject to being removed"
),
paste(
"Raw Variables or aggregation functions (i.e. earliest, ",
"latest) to group source data records before mapping to SDTM"
)
),
Example_Values = c(
"test_study",
"eCRF or eDT",
"VTLS1, DEM",
"1, 2, 3, etc",
"Vital Signs,
Demographics",
"SEX_001,
BRTHDD",
"Systolic Blood Pressure,
Birth Day",
"1, 2, 3, etc",
"Text Box,
Date control",
"$200,
dd MON YYYY",
"TRUE, FALSE",
"1, 2, 3, etc",
"TRUE, FALSE",
"VS.VSORRES when VSTESTCD = 'SYSBP'",
"VS, MH",
"VSORRES, MHSTDTC",
"Topic Variable,
Grouping Qualifier,
Identifier Variable",
"C66742
C66790",
"(AGEU)
ISO 8601
(SEX)",
"1, 2, 3",
"Derived,
Assigned,
Collected,
Predecessor",
"condition_add
assign_ct
ae_aerel
hardcode_ct",
"assign_no_ct
hardcode_ct",
"ALZHEIMER'S DISEASE HISTORY",
"Y,
beats/min,
INFORMED CONSENT OBTAINED",
paste(
"Map qualifier CMSTRTPT. Annotation text: If MDPRIOR == 1",
"then CM.CMSTRTPT = 'BEFORE'.",
"raw_dat parameter: condition_add(cm_raw, MDPRIOR == 1)"
),
paste(
"Map qualifier CMDOSFRQ. Annotation text: If CMTRT is not null,",
"then map the collected value in raw dataset cm_raw and",
"raw variable MDFRQ to CMDOSFRQ.",
"tgt_dat parameter: condition_add(., !is.na(CMTRT))"
),
"left_join
right_join
full_join
visit_join
subject_join",
"VTLS1",
"VACREC",
"VTLS1.SUBJECT = VACREC.SUBJECT,
MD1.MDNUM = VACREC.MDNUM",
"VTLS1.SUBJECT,
VTLS1.DATAPAGEID",
"TXINF1.DATAPGID,
Earliest"
),
Association_with_mapping_Algorithms = c(
"Generic Use",
"Generic Use",
"Required for all mapping algorithms",
"Generic Use",
"Generic Use",
"Generic Use",
"Generic Use",
"Generic Use",
"Required for all mapping algorithms",
"Required for all mapping algorithms",
"Generic Use",
"Required for all mapping algorithms",
"Required for all mapping algorithms",
"Generic Use",
"Required for all mapping algorithms",
"Required for all mapping algorithms",
"Required for all mapping algorithms",
"Required for all mapping algorithms",
"Generic Use",
"Required for all mapping algorithms",
"Used for define.xml",
"Required for all mapping algorithms",
"Only when Mapping Algorithm is
condition_add
dataset_level",
"assign_no_ct
hardcode_no_ct",
"hardcode_ct",
"condition_add",
"condition_add",
"MERGE",
"MERGE",
"MERGE",
"MERGE",
"REMOVE_DUP",
"GROUP_BY"
),
stringsAsFactors = TRUE
)
knitr::kable(definition)
```
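
The `condition_add_raw_dat` and `unduplicate_keys` columns hold text that an automated process would have to interpret. The sketch below shows one possible interpretation in base R; it is not how {sdtm.oak} implements these steps. The datasets and values are made up, and stripping the dataset prefix from the keys is an assumption about how such entries could be read.

```r
# Hypothetical raw dataset for concomitant medications.
cm_raw <- data.frame(
  SUBJECT = c("001", "002", "003"),
  MDPRIOR = c(1, 0, 1),
  stringsAsFactors = FALSE
)

# A condition stored as text in the spec, evaluated as an R filter statement
# against the columns of the raw dataset.
condition_text <- "MDPRIOR == 1"
keep <- eval(parse(text = condition_text), envir = cm_raw)
cm_filtered <- cm_raw[keep %in% TRUE, , drop = FALSE] # treat NA as FALSE

# Hypothetical raw vital signs dataset with one duplicated record.
vtls1 <- data.frame(
  SUBJECT = c("001", "001", "002"),
  DATAPAGEID = c(10, 10, 11),
  SYS_BP = c(120, 120, 135),
  stringsAsFactors = FALSE
)

# Keys stored as "DATASET.VARIABLE" text; split them and drop the prefix.
unduplicate_keys <- "VTLS1.SUBJECT, VTLS1.DATAPAGEID"
keys <- trimws(strsplit(unduplicate_keys, ",")[[1]])
keys <- sub("^[^.]+\\.", "", keys)

# Keep the first record for each combination of key values.
is_dup <- duplicated(vtls1[, keys, drop = FALSE])
vtls1_unique <- vtls1[!is_dup, , drop = FALSE]

print(cm_filtered)
print(vtls1_unique)
```

Evaluating free text with `eval(parse())` is convenient for a sketch, but a production process would likely validate the condition before running it.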