Type: | Package |
Title: | SDTM Data Transformation Engine |
Version: | 0.2.0 |
Maintainer: | Rammprasad Ganapathy <ganapathy.rammprasad@gene.com> |
Description: | An Electronic Data Capture system (EDC) and Data Standard agnostic solution that enables the pharmaceutical programming community to develop Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) datasets in R. The reusable algorithms concept in 'sdtm.oak' provides a framework for modular programming and can potentially automate the conversion of raw clinical data to SDTM through standardized SDTM specifications. SDTM is one of the required standards for data submission to the Food and Drug Administration (FDA) in the United States and Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. SDTM standards are implemented following the SDTM Implementation Guide as defined by CDISC https://www.cdisc.org/standards/foundational/sdtmig. |
Language: | en-US |
License: | Apache License (≥ 2) |
Copyright: | F. Hoffmann-La Roche AG, Pattern Institute, Atorus Research LLC and Transition Technologies Science sp. z o.o. |
BugReports: | https://github.com/pharmaverse/sdtm.oak/issues |
URL: | https://pharmaverse.github.io/sdtm.oak/, https://github.com/pharmaverse/sdtm.oak |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.2) |
Imports: | admiraldev (≥ 1.1.0), dplyr (≥ 1.0.0), purrr (≥ 1.0.1), tidyr (≥ 1.2.0), rlang (≥ 1.0.2), tibble (≥ 3.2.0), vctrs (≥ 0.5.0), stringr (≥ 1.4.0), assertthat, pillar, cli |
Suggests: | knitr, htmltools, lifecycle, magrittr, rmarkdown, spelling, testthat (≥ 3.1.7), DT, readr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
NeedsCompilation: | no |
Packaged: | 2025-05-22 16:27:38 UTC; ganapar1 |
Author: | Rammprasad Ganapathy [aut, cre],
Adam Forys [aut],
Edgar Manukyan [aut],
Rosemary Li [aut],
Preetesh Parikh [aut],
Lisa Houterloot [aut],
Yogesh Gupta [aut],
Omar Garcia [aut],
Ramiro Magno |
Repository: | CRAN |
Date/Publication: | 2025-05-22 16:50:07 UTC |
sdtm.oak: SDTM Data Transformation Engine
Description
An Electronic Data Capture system (EDC) and Data Standard agnostic solution that enables the pharmaceutical programming community to develop Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) datasets in R. The reusable algorithms concept in 'sdtm.oak' provides a framework for modular programming and can potentially automate the conversion of raw clinical data to SDTM through standardized SDTM specifications. SDTM is one of the required standards for data submission to the Food and Drug Administration (FDA) in the United States and Pharmaceuticals and Medical Devices Agency (PMDA) in Japan. SDTM standards are implemented following the SDTM Implementation Guide as defined by CDISC https://www.cdisc.org/standards/foundational/sdtmig.
Author(s)
Maintainer: Rammprasad Ganapathy ganapathy.rammprasad@gene.com
Authors:
Adam Forys
Edgar Manukyan
Rosemary Li
Preetesh Parikh
Lisa Houterloot
Yogesh Gupta
Omar Garcia ogcalderon@cdisc.org
Ramiro Magno rmagno@pattern.institute (ORCID)
Kamil Sijko kamil.sijko@ttsi.com.pl (ORCID)
Shiyu Chen Shiyu.Chen@atorusresearch.com
Mohsin Uzzama mohsin.uzzama2@gmail.com
Other contributors:
Pattern Institute [copyright holder, funder]
F. Hoffmann-La Roche AG [copyright holder, funder]
Pfizer Inc [copyright holder, funder]
Transition Technologies Science [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/pharmaverse/sdtm.oak/issues
Explicit Dot Pipe
Description
This operator pipes an object forward into a function or call expression using an explicit placement of the dot (.) placeholder. Unlike magrittr's %>% operator, %.>% does not automatically place the left-hand side (lhs) as the first argument in the right-hand side (rhs) call. This operator provides a simpler alternative to the use of braces with magrittr, while achieving similar behavior.
Usage
lhs %.>% rhs
Arguments
lhs |
A value to be piped forward. |
rhs |
A function call that utilizes the dot (.) placeholder. |
Details
The %.>% operator is used to pipe the lhs value into the rhs function call. Within the rhs expression, the placeholder . represents the position where lhs will be inserted. This provides more control over where the lhs value appears in the rhs function call, compared to the magrittr pipe operator which always places lhs as the first argument of rhs.
Unlike magrittr's pipe, which may require the use of braces to fully control the placement of lhs in nested function calls, %.>% simplifies this by directly allowing multiple usages of the dot placeholder without requiring braces. For example, the following expression using magrittr's pipe and braces:
library(magrittr)
1:10 %>% { c(min(.), max(.)) }
can be written as:
1:10 %.>% c(min(.), max(.))
without needing additional braces.
Downside
The disadvantage of %.>% is that you always need to use the dot placeholder, even when piping to the first argument of the right-hand side (rhs).
Value
The result of evaluating rhs with the dot placeholder (.) bound to lhs.
Examples
# Equivalent to `subset(head(iris), 1:nrow(head(iris)) %% 2 == 0)`
head(iris) %.>% subset(., 1:nrow(.) %% 2 == 0)
# Equivalent to `c(min(1:10), max(1:10))`
1:10 %.>% c(min(.), max(.))
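As a rough illustration of the mechanism described in Details, an explicit-dot pipe can be sketched in a few lines of base R. The operator name %dot% is hypothetical and this is not sdtm.oak's implementation; the sketch only assumes that the right-hand side is captured unevaluated and then evaluated with . bound to the left-hand side.

```r
# Hypothetical operator for illustration only; not sdtm.oak's source code.
`%dot%` <- function(lhs, rhs) {
  # Capture the right-hand side without evaluating it.
  rhs_expr <- substitute(rhs)
  # Evaluate it with `.` bound to the left-hand side; any other symbol is
  # looked up in the caller's environment.
  eval(rhs_expr, envir = list(. = lhs), enclos = parent.frame())
}

# The dot may appear several times and in any argument position.
res <- 1:10 %dot% c(min(.), max(.))
res
```

Because nothing is inserted automatically, the dot has to be written out even when the value goes to the first argument, which is the trade-off noted under Downside.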
Add ISO 8601 parsing problems
Description
add_problems() annotates the returned value of create_iso8601() with possible parsing problems. This annotation consists of a tibble of problems, one row for each parsing failure (see Details section).
Usage
add_problems(x, is_problem, dtc)
Arguments
x |
A character vector of date-times in ISO 8601 format; typically, the
output of |
is_problem |
A |
dtc |
A list of |
Details
This function annotates its input x, a vector of date-times in ISO 8601 format, by creating an attribute named problems. This attribute's value is a tibble of parsing problems. The problematic date/times are indicated by the logical vector passed as argument to is_problem.
The attribute problems in the returned value will contain a first column named ..i that indicates the index of the problematic date/time in x, and as many extra columns as there were inputs (passed in dtc). If dtc is named, then those names are used to name the extra columns; otherwise they get named sequentially like so: ..var1, ..var2, etc.
Value
Either x without any modification, if no parsing problems exist, or an annotated x, meaning having a problems attribute that holds parsing issues (see the Details section).
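The annotation described above can be approximated in base R. This is a minimal sketch under stated assumptions: annotate_problems() is a hypothetical helper, it stores a plain data frame rather than a tibble, and it only handles a dtc list that is either fully named or fully unnamed.

```r
# Hypothetical helper mirroring the documented behaviour; not the package source.
annotate_problems <- function(x, is_problem, dtc) {
  # No parsing failures: return the input untouched.
  if (!any(is_problem)) {
    return(x)
  }
  # Unnamed inputs get sequential names: ..var1, ..var2, ...
  if (is.null(names(dtc))) {
    names(dtc) <- paste0("..var", seq_along(dtc))
  }
  idx <- which(is_problem)
  # One row per failure: the index in `x` plus the offending raw inputs.
  problems <- data.frame(
    ..i = idx,
    lapply(dtc, function(v) v[idx]),
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
  attr(x, "problems") <- problems
  x
}

iso <- c("2020-01-01", NA)
raw <- list(c("2020-01-01", "20200102x"))
out <- annotate_problems(iso, is_problem = is.na(iso), dtc = raw)
attr(out, "problems")
```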
Detect problems with the parsing of date/times
Description
any_problems() takes a list of capture matrices (see parse_dttm()) and reports on parsing problems by means of predicate values. A FALSE value indicates that the parsing was successful and a TRUE value a parsing failure in at least one of the inputs to create_iso8601(). Note that this is an internal function to be used in the context of create_iso8601() source code and hence each capture matrix corresponds to one input to create_iso8601().
Usage
any_problems(cap_matrices, dtc, .cutoff_2000 = 68L)
Arguments
cap_matrices |
A list of capture matrices in the sense of the returned
value by |
dtc |
A list of |
.cutoff_2000 |
An integer value. Two-digit years smaller or equal to
|
Value
A logical whose length matches the number of underlying date/times passed as inputs to create_iso8601(), i.e. whose length matches the number of rows of the capture matrices in cap_matrices.
Assert capture matrix
Description
assert_capture_matrix() is an internal helper function aiding with the checking of an internal R object that contains the parsing results as returned by parse_dttm(): the capture matrix.
This function checks that the capture matrix is a matrix and that it contains six columns: year, mon, mday, hour, min and sec.
Usage
assert_capture_matrix(m)
Arguments
m |
A character matrix. |
Value
This function throws an error unless m is a character matrix whose columns include (at least) year, mon, mday, hour, min and sec.
Otherwise, it returns m invisibly.
Assert a codelist code
Description
assert_ct_clst() asserts the validity of a codelist code in the context of a controlled terminology specification.
Usage
assert_ct_clst(ct_spec, ct_clst, optional = FALSE)
Arguments
ct_spec |
Either a data frame encoding a controlled terminology data set, or
|
ct_clst |
A string with a to-be asserted codelist code, or |
optional |
A scalar logical, indicating whether |
Value
The function throws an error if ct_clst is not a valid codelist code given the controlled terminology data set; otherwise, ct_clst is returned invisibly.
Assert a controlled terminology specification
Description
assert_ct_spec() checks whether ct_spec is a data frame and whether it contains the variables codelist_code, collected_value, term_synonyms, and term_value. In addition, it checks that the data frame is not empty (i.e. has at least one row), and that the columns codelist_code and term_value do not contain any NA values.
Usage
assert_ct_spec(ct_spec, optional = FALSE)
Arguments
ct_spec |
A data frame to be asserted as a valid controlled terminology data set. |
Value
The function throws an error if ct_spec is not a valid controlled terminology data set; otherwise, ct_spec is returned invisibly.
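The checks listed in the Description can be spelled out in base R. The helper check_ct_spec() below is hypothetical and its error messages are invented for illustration; only the four variable names and the conditions come from the text above.

```r
# Hypothetical validator; a sketch of the documented checks, not the package code.
check_ct_spec <- function(ct_spec) {
  required <- c("codelist_code", "collected_value", "term_synonyms", "term_value")
  if (!is.data.frame(ct_spec)) {
    stop("`ct_spec` must be a data frame.")
  }
  missing_vars <- setdiff(required, names(ct_spec))
  if (length(missing_vars) > 0L) {
    stop("Missing variables: ", paste(missing_vars, collapse = ", "))
  }
  if (nrow(ct_spec) == 0L) {
    stop("`ct_spec` must have at least one row.")
  }
  if (anyNA(ct_spec$codelist_code) || anyNA(ct_spec$term_value)) {
    stop("`codelist_code` and `term_value` must not contain NA.")
  }
  # Valid input is handed back invisibly, as described under Value.
  invisible(ct_spec)
}

good_spec <- data.frame(
  codelist_code = "C71113",
  collected_value = "/day",
  term_synonyms = NA_character_,
  term_value = "QD",
  stringsAsFactors = FALSE
)
checked <- check_ct_spec(good_spec)
```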
Assert date time character formats
Description
assert_dtc_fmt() takes a character vector of date/time formats and checks if the formats are supported, meaning it checks if they are one of the formats listed in column fmt of dtc_formats, failing with an error otherwise.
Usage
assert_dtc_fmt(fmt)
Arguments
fmt |
A character vector. |
Assert dtc format
Description
assert_dtc_format() is an internal helper function aiding with the checking of the .format parameter of create_iso8601().
Usage
assert_dtc_format(.format)
Arguments
.format |
The argument of |
Value
This function throws an error if .format is neither of the following:
- A character vector of formats permitted by assert_dtc_fmt();
- A list of character vectors of formats permitted by assert_dtc_fmt().
Otherwise, it returns .format invisibly.
Derive an ISO8601 date-time variable
Description
assign_datetime() maps one or more variables with date/time components in a raw dataset to a target SDTM variable following the ISO8601 format.
Usage
assign_datetime(
tgt_dat = NULL,
tgt_var,
raw_dat,
raw_var,
raw_fmt,
raw_unk = c("UN", "UNK"),
id_vars = oak_id_vars(),
.warn = TRUE
)
Arguments
tgt_dat |
Target dataset: a data frame to be merged against |
tgt_var |
The target SDTM variable: a single string indicating the name of variable to be derived. |
raw_dat |
The raw dataset (dataframe); must include the
variables passed in |
raw_var |
The raw variable(s): a character vector indicating the name(s)
of the raw variable(s) in |
raw_fmt |
A date/time parsing format. Either a character vector or a
list of character vectors. If a character vector is passed then each
element is taken as parsing format for each variable indicated in
|
raw_unk |
A character vector of string literals to be regarded as missing values during parsing. |
id_vars |
Key variables to be used in the join between the raw dataset
( |
.warn |
Whether to warn about parsing failures. |
Value
The returned data set depends on the value of tgt_dat:
- If no target dataset is supplied, meaning that tgt_dat defaults to NULL, then the returned data set is raw_dat, selected for the variables indicated in id_vars, and a new extra column: the derived variable, as indicated in tgt_var.
- If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
Examples
# `md1`: an example raw data set.
md1 <-
tibble::tribble(
~oak_id, ~raw_source, ~patient_number, ~MDBDR, ~MDEDR, ~MDETM,
1L, "MD1", 375, NA, NA, NA,
2L, "MD1", 375, "15-Sep-20", NA, NA,
3L, "MD1", 376, "17-Feb-21", "17-Feb-21", NA,
4L, "MD1", 377, "4-Oct-20", NA, NA,
5L, "MD1", 377, "20-Jan-20", "20-Jan-20", "10:00:00",
6L, "MD1", 377, "UN-UNK-2019", "UN-UNK-2019", NA,
7L, "MD1", 377, "20-UNK-2019", "20-UNK-2019", NA,
8L, "MD1", 378, "UN-UNK-2020", "UN-UNK-2020", NA,
9L, "MD1", 378, "26-Jan-20", "26-Jan-20", "07:00:00",
10L, "MD1", 378, "28-Jan-20", "1-Feb-20", NA,
11L, "MD1", 378, "12-Feb-20", "18-Feb-20", NA,
12L, "MD1", 379, "10-UNK-2020", "20-UNK-2020", NA,
13L, "MD1", 379, NA, NA, NA,
14L, "MD1", 379, NA, "17-Feb-20", NA
)
# Using the raw data set `md1`, derive the variable CMSTDTC from MDBDR using
# the parsing format (`raw_fmt`) `"d-m-y"` (day-month-year), while allowing
# for the presence of special date component values (e.g. `"UN"` or `"UNK"`),
# indicating that these values are missing/unknown (unk).
cm1 <-
assign_datetime(
tgt_var = "CMSTDTC",
raw_dat = md1,
raw_var = "MDBDR",
raw_fmt = "d-m-y",
raw_unk = c("UN", "UNK")
)
cm1
# Inspect parsing failures associated with derivation of CMSTDTC.
problems(cm1$CMSTDTC)
# `cm_inter`: an example target data set.
cm_inter <-
tibble::tibble(
oak_id = 1L:14L,
raw_source = "MD1",
patient_number = c(
375, 375, 376, 377, 377, 377, 377, 378,
378, 378, 378, 379, 379, 379
),
CMTRT = c(
"BABY ASPIRIN",
"CORTISPORIN",
"ASPIRIN",
"DIPHENHYDRAMINE HCL",
"PARCETEMOL",
"VOMIKIND",
"ZENFLOX OZ",
"AMITRYPTYLINE",
"BENADRYL",
"DIPHENHYDRAMINE HYDROCHLORIDE",
"TETRACYCLINE",
"BENADRYL",
"SOMINEX",
"ZQUILL"
),
CMINDC = c(
"NA",
"NAUSEA",
"ANEMIA",
"NAUSEA",
"PYREXIA",
"VOMITINGS",
"DIARHHEA",
"COLD",
"FEVER",
"LEG PAIN",
"FEVER",
"COLD",
"COLD",
"PAIN"
)
)
# Same derivation as above but now involving the merging with the target
# data set `cm_inter`.
cm2 <-
assign_datetime(
tgt_dat = cm_inter,
tgt_var = "CMSTDTC",
raw_dat = md1,
raw_var = "MDBDR",
raw_fmt = "d-m-y"
)
cm2
# Inspect parsing failures associated with derivation of CMSTDTC.
problems(cm2$CMSTDTC)
# Derive CMSTDTC using both MDEDR and MDETM variables.
# Note that the format `"d-m-y"` is used for parsing MDEDR and `"H:M:S"` for
# MDETM (correspondence is by positional matching).
cm3 <-
assign_datetime(
tgt_var = "CMSTDTC",
raw_dat = md1,
raw_var = c("MDEDR", "MDETM"),
raw_fmt = c("d-m-y", "H:M:S"),
raw_unk = c("UN", "UNK")
)
cm3
# Inspect parsing failures associated with derivation of CMSTDTC.
problems(cm3$CMSTDTC)
Derive an SDTM variable
Description
- assign_no_ct() maps a variable in a raw dataset to a target SDTM variable that has no terminology restrictions.
- assign_ct() maps a variable in a raw dataset to a target SDTM variable following controlled terminology recoding.
Usage
assign_no_ct(
tgt_dat = NULL,
tgt_var,
raw_dat,
raw_var,
id_vars = oak_id_vars()
)
assign_ct(
tgt_dat = NULL,
tgt_var,
raw_dat,
raw_var,
ct_spec,
ct_clst,
id_vars = oak_id_vars()
)
Arguments
tgt_dat |
Target dataset: a data frame to be merged against |
tgt_var |
The target SDTM variable: a single string indicating the name of variable to be derived. |
raw_dat |
The raw dataset (dataframe); must include the
variables passed in |
raw_var |
The raw variable: a single string indicating the name of the
raw variable in |
id_vars |
Key variables to be used in the join between the raw dataset
( |
ct_spec |
Study controlled terminology specification: a dataframe with a
minimal set of columns, see |
ct_clst |
A codelist code indicating which subset of the controlled terminology to apply in the derivation. |
Value
The returned data set depends on the value of tgt_dat:
- If no target dataset is supplied, meaning that tgt_dat defaults to NULL, then the returned data set is raw_dat, selected for the variables indicated in id_vars, and a new extra column: the derived variable, as indicated in tgt_var.
- If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
Examples
md1 <-
tibble::tibble(
oak_id = 1:14,
raw_source = "MD1",
patient_number = 101:114,
MDIND = c(
"NAUSEA", "NAUSEA", "ANEMIA", "NAUSEA", "PYREXIA",
"VOMITINGS", "DIARHHEA", "COLD",
"FEVER", "LEG PAIN", "FEVER", "COLD", "COLD", "PAIN"
)
)
assign_no_ct(
tgt_var = "CMINDC",
raw_dat = md1,
raw_var = "MDIND"
)
cm_inter <-
tibble::tibble(
oak_id = 1:14,
raw_source = "MD1",
patient_number = 101:114,
CMTRT = c(
"BABY ASPIRIN",
"CORTISPORIN",
"ASPIRIN",
"DIPHENHYDRAMINE HCL",
"PARCETEMOL",
"VOMIKIND",
"ZENFLOX OZ",
"AMITRYPTYLINE",
"BENADRYL",
"DIPHENHYDRAMINE HYDROCHLORIDE",
"TETRACYCLINE",
"BENADRYL",
"SOMINEX",
"ZQUILL"
),
CMROUTE = c(
"ORAL",
"ORAL",
NA,
"ORAL",
"ORAL",
"ORAL",
"INTRAMUSCULAR",
"INTRA-ARTERIAL",
NA,
"NON-STANDARD",
"RANDOM_VALUE",
"INTRA-ARTICULAR",
"TRANSDERMAL",
"OPHTHALMIC"
)
)
# Controlled terminology specification
(ct_spec <- read_ct_spec_example("ct-01-cm"))
assign_ct(
tgt_dat = cm_inter,
tgt_var = "CMINDC",
raw_dat = md1,
raw_var = "MDIND",
ct_spec = ct_spec,
ct_clst = "C66729"
)
# Variables are derived in sequence from multiple input sources.
# For each target variable, only missing (`NA`) values are filled
# during each step—previously assigned (non-missing) values are retained.
cm_raw <-
tibble::tibble(
oak_id = 1:4,
raw_source = "cm_raw",
patient_number = 370L + oak_id,
PATNUM = patient_number,
IT.CMTRT = c("BABY ASPIRIN", "CORTISPORIN", NA, NA),
IT.CMTRTOTH = c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)
)
cm_raw
# Derivation of `CMTRT` first from `IT.CMTRT` and then from `IT.CMTRTOTH`.
assign_no_ct(
raw_dat = cm_raw,
raw_var = "IT.CMTRT",
tgt_var = "CMTRT"
) |>
assign_no_ct(
raw_dat = cm_raw,
raw_var = "IT.CMTRTOTH",
tgt_var = "CMTRT"
)
# Derivation of `CMTRT` first from `IT.CMTRTOTH` and then from `IT.CMTRT`.
assign_no_ct(
raw_dat = cm_raw,
raw_var = "IT.CMTRTOTH",
tgt_var = "CMTRT"
) |>
assign_no_ct(
raw_dat = cm_raw,
raw_var = "IT.CMTRT",
tgt_var = "CMTRT"
)
# Another example of variables derived in sequence from multiple input
# sources but now with controlled terminology remapping, in this case,
# CDISC Dose Unit (C71620) recoding.
cm_raw2 <- tibble::tibble(
oak_id = c(1:3, 6, 8:10, 12:14),
raw_source = "cm_raw",
patient_number = c(rep(375L, 2), 376:377, rep(378L, 3), rep(379L, 3)),
PATNUM = patient_number,
`IT.DOSUO` = c(NA, NA, NA, NA, NA, "Other Dose Unit", "cap", NA, NA, NA),
`IT.CMDOSU` = c("mg", "Gram", NA, "Tablet", "g", "mg", NA, "IU", "mL", "%")
)
assign_ct(
raw_dat = cm_raw2,
raw_var = "IT.DOSUO",
tgt_var = "CMDOSU",
ct_spec = ct_spec,
ct_clst = "C71620",
# Dose Unit
id_vars = oak_id_vars()
) |>
assign_ct(
raw_dat = cm_raw2,
raw_var = "IT.CMDOSU",
tgt_var = "CMDOSU",
ct_spec = ct_spec,
ct_clst = "C71620",
id_vars = oak_id_vars()
)
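The sequential examples above rely on the rule that a later step only fills values that are still missing. A minimal base R sketch of that rule follows; fill_missing() is a hypothetical helper, not part of sdtm.oak, and it works on plain vectors rather than on joined data sets.

```r
# Hypothetical helper: keep values already assigned, fill only the NA slots.
fill_missing <- function(current, candidate) {
  ifelse(is.na(current), candidate, current)
}

# Same values as IT.CMTRT and IT.CMTRTOTH in the cm_raw example.
cmtrt <- c("BABY ASPIRIN", "CORTISPORIN", NA, NA)
cmtrtoth <- c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)

# The order of the two steps decides which source wins where both have a value.
first_then_other <- fill_missing(cmtrt, cmtrtoth)
other_then_first <- fill_missing(cmtrtoth, cmtrt)
first_then_other
other_then_first
```

The two results differ only in the first element, where both sources hold a value.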
Calculate minimum and maximum date and time in the data frame
Description
This function derives the earliest or latest date-time per patient as an ISO 8601 date-time.
Usage
cal_min_max_date(
raw_dataset,
date_variable,
time_variable,
val_type = "min",
date_format,
time_format
)
Arguments
raw_dataset |
Raw source data frame |
date_variable |
Single character string. Name of the date variable |
time_variable |
Single character string. Name of the time variable |
val_type |
Single character string determining whether to look for the earliest or the latest datetime combination. Permitted values: "min", "max". Default to "min". |
date_format |
Format of source date variable |
time_format |
Format of source time variable |
Value
Data frame with 2 columns: unique patient_number and datetime variable column storing the earliest/latest datetime.
Examples
ex_raw <- tibble::tribble(
~patient_number, ~EX_ST_DT, ~EX_ST_TM,
"001", "25-04-2022", "10:20",
"001", "25-04-2022", "10:15",
"001", "25-04-2022", "10:19",
"002", "26-05-2022", "UNK:UNK",
"002", "26-05-2022", "05:59"
)
min <- cal_min_max_date(ex_raw,
date_variable = "EX_ST_DT",
time_variable = "EX_ST_TM",
val_type = "min",
date_format = "dd-mmm-yyyy",
time_format = "H:M"
)
max <- cal_min_max_date(ex_raw,
date_variable = "EX_ST_DT",
time_variable = "EX_ST_TM",
val_type = "max",
date_format = "dd-mmm-yyyy",
time_format = "H:M"
)
Coalesce capture matrices
Description
coalesce_capture_matrices() combines several capture matrices into one. Each argument of ... should be a capture matrix in the sense of the output by complete_capture_matrix(), meaning a character matrix of six columns whose names are: year, mon, mday, hour, min and sec.
Usage
coalesce_capture_matrices(...)
Arguments
... |
A sequence of capture matrices. |
Value
A single capture matrix whose values have been coalesced in the sense of coalesce().
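Coalescing here means taking, cell by cell, the first non-missing value across the inputs. The sketch below shows that idea in base R; coalesce_matrices() is a hypothetical stand-in and assumes all inputs share the same dimensions and column order.

```r
# Hypothetical stand-in for illustration; assumes equally shaped matrices.
coalesce_matrices <- function(...) {
  Reduce(function(acc, m) {
    # Cells still missing in the accumulated result are taken from `m`.
    gaps <- is.na(acc)
    acc[gaps] <- m[gaps]
    acc
  }, list(...))
}

cols <- c("year", "mon", "mday", "hour", "min", "sec")
dates <- matrix(c("2020", "01", "15", NA, NA, NA), nrow = 1, dimnames = list(NULL, cols))
times <- matrix(c(NA, NA, NA, "10", "30", NA), nrow = 1, dimnames = list(NULL, cols))

combined <- coalesce_matrices(dates, times)
combined
```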
Complete a capture matrix
Description
complete_capture_matrix() completes the missing, if any, columns of the capture matrix.
Usage
complete_capture_matrix(m)
Arguments
m |
A character matrix that might be missing one or more of the
following columns: |
Value
A character matrix that contains the columns year, mon, mday, hour, min and sec. Any other existing columns are dropped.
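A base R sketch of the completion step, assuming that absent columns are filled with NA (the fill value is an assumption; the text above only states which columns end up present). complete_matrix() is a hypothetical name.

```r
# Hypothetical helper; fills absent columns with NA and drops unknown ones.
complete_matrix <- function(m) {
  cols <- c("year", "mon", "mday", "hour", "min", "sec")
  out <- matrix(
    NA_character_,
    nrow = nrow(m), ncol = length(cols),
    dimnames = list(NULL, cols)
  )
  # Copy over only the recognised columns that are present in the input.
  present <- intersect(cols, colnames(m))
  out[, present] <- m[, present]
  out
}

partial <- matrix(
  c("2020", "05", "ignored"),
  nrow = 1,
  dimnames = list(NULL, c("year", "mon", "other"))
)
completed <- complete_matrix(partial)
completed
```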
Add filtering tags to a data set
Description
condition_add() tags records in a data set, indicating which rows match the specified conditions, resulting in a conditioned data frame. Learn how to integrate conditioned data frames in your SDTM domain derivation in vignette("cnd_df").
Usage
condition_add(dat, ..., .na = NA, .dat2 = rlang::env())
Arguments
dat |
A data frame. |
... |
Conditions to filter the data frame. |
.na |
Return value to be used when the conditions evaluate to |
.dat2 |
An optional environment to look for variables involved in
logical expression passed in |
Value
A conditioned data frame, meaning a tibble with an additional class cnd_df and a logical vector attribute indicating matching rows.
Examples
(df <- tibble::tibble(x = 1L:3L, y = letters[x]))
# Mark rows for which `x` is greater than `1`
(cnd_df <- condition_add(dat = df, x > 1L))
Does a vector contain the raw dataset key variables?
Description
contains_oak_id_vars() evaluates whether a character vector x contains the raw dataset key variable names, i.e. the so-called Oak identifier variables; these are defined by the return value of oak_id_vars().
Usage
contains_oak_id_vars(x)
Arguments
x |
A character vector. |
Value
A logical scalar value.
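In base R terms the check amounts to a set-inclusion test. The sketch below assumes the key variables are oak_id, raw_source and patient_number, as used in the example data sets in this manual; has_id_vars() is a hypothetical name.

```r
# Assumed key variable names, taken from the example data sets in this manual.
id_vars <- c("oak_id", "raw_source", "patient_number")

# Hypothetical helper: TRUE only if every key variable name occurs in `x`.
has_id_vars <- function(x, keys = id_vars) {
  all(keys %in% x)
}

all_present <- has_id_vars(c("oak_id", "raw_source", "patient_number", "MDIND"))
some_absent <- has_id_vars(c("oak_id", "MDIND"))
```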
Convert date or time collected values to ISO 8601
Description
create_iso8601() converts vectors of dates, times or date-times to ISO 8601 format. Learn more in vignette("iso_8601").
Usage
create_iso8601(
...,
.format,
.fmt_c = fmt_cmp(),
.na = NULL,
.cutoff_2000 = 68L,
.check_format = FALSE,
.warn = TRUE
)
Arguments
... |
Character vectors of dates, times or date-times' components. |
.format |
Parsing format(s). Either a character vector or a list of
character vectors. If a character vector is passed then each element is
taken as parsing format for each vector passed in |
.fmt_c |
A list of regexps to use when parsing |
.na |
A character vector of string literals to be regarded as missing values during parsing. |
.cutoff_2000 |
An integer value. Two-digit years smaller or equal to
|
.check_format |
Whether to check the formats passed in |
.warn |
Whether to warn about parsing failures. |
Value
A vector of dates, times or date-times in ISO 8601 format.
Examples
# Converting dates
create_iso8601(c("2020-01-01", "20200102"), .format = "y-m-d")
create_iso8601(c("2020-01-01", "20200102"), .format = "ymd")
create_iso8601(c("2020-01-01", "20200102"), .format = list(c("y-m-d", "ymd")))
# Two-digit years are supported
create_iso8601(c("20-01-01", "200101"), .format = list(c("y-m-d", "ymd")))
# `.cutoff_2000` sets the cutoff for two-digit to four-digit year conversion
# Default is at 68.
create_iso8601(c("67-01-01", "68-01-01", "69-01-01"), .format = "y-m-d")
# Change it to 80.
create_iso8601(c("79-01-01", "80-01-01", "81-01-01"), .format = "y-m-d", .cutoff_2000 = 80)
# Converting times
create_iso8601("15:10", .format = "HH:MM")
create_iso8601("2:10", .format = "HH:MM")
create_iso8601("2:1", .format = "HH:MM")
create_iso8601("02:01:56", .format = "HH:MM:SS")
create_iso8601("020156.5", .format = "HHMMSS")
# Converting date-times
create_iso8601("12 NOV 202015:15", .format = "dd mmm yyyyHH:MM")
# Indicate allowed missing values to make the parsing pass
create_iso8601("U DEC 201914:00", .format = "dd mmm yyyyHH:MM")
create_iso8601("U DEC 201914:00", .format = "dd mmm yyyyHH:MM", .na = "U")
create_iso8601("NOV 2020", .format = "m y")
create_iso8601(c("MAR 2019", "MaR 2020", "mar 2021"), .format = "m y")
create_iso8601("2019-04-041045-", .format = "yyyy-mm-ddHHMM-")
create_iso8601("20200507null", .format = "ymd(HH:MM:SS)")
create_iso8601("20200507null", .format = "ymd((HH:MM:SS)|null)")
# Fractional seconds
create_iso8601("2019-120602:20:13.1230001", .format = "y-mdH:M:S")
# Use different reserved characters in the format specification
# Here we change "H" to "x" and "M" to "w", for hour and minute, respectively.
create_iso8601("14H00M", .format = "HHMM")
create_iso8601("14H00M", .format = "xHwM", .fmt_c = fmt_cmp(hour = "x", min = "w"))
# Alternative formats with unknown values
datetimes <- c("UN UNK 201914:00", "UN JAN 2021")
format <- list(c("dd mmm yyyy", "dd mmm yyyyHH:MM"))
create_iso8601(datetimes, .format = format, .na = c("UN", "UNK"))
# Dates and times may come in many format variations
fmt <- "dd MMM yyyy HH nn ss"
fmt_cmp <- fmt_cmp(mon = "MMM", min = "nn", sec = "ss")
create_iso8601("05 feb 1985 12 55 02", .format = fmt, .fmt_c = fmt_cmp)
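The .cutoff_2000 examples can be read against the common two-digit year convention, in which years at or below the cutoff are placed in the 2000s and the rest in the 1900s. The sketch below assumes that convention applies here; expand_two_digit_year() is a hypothetical helper and does not call into sdtm.oak.

```r
# Hypothetical helper illustrating the assumed cutoff rule.
expand_two_digit_year <- function(yy, cutoff_2000 = 68L) {
  yy <- as.integer(yy)
  # At or below the cutoff: 20xx; above it: 19xx.
  ifelse(yy <= cutoff_2000, 2000L + yy, 1900L + yy)
}

default_cutoff <- expand_two_digit_year(c("67", "68", "69"))
custom_cutoff <- expand_two_digit_year(c("79", "80", "81"), cutoff_2000 = 80L)
default_cutoff
custom_cutoff
```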
Recode according to controlled terminology
Description
ct_map() recodes a vector following a controlled terminology.
Usage
ct_map(
x,
ct_spec = NULL,
ct_clst = NULL,
from = ct_spec_vars("from"),
to = ct_spec_vars("to")
)
Arguments
x |
A character vector of terms to be recoded following a controlled terminology. |
ct_spec |
A tibble providing a controlled terminology specification. |
ct_clst |
A character vector indicating a set of possible controlled
terminology codelists codes to be used for recoding. By default ( |
from |
A character vector of column names indicating the variables containing values to be matched against for terminology recoding. |
to |
A single string indicating the column whose values are to be recoded into. |
Value
A character vector of terminology recoded values from x. If no match is found in the controlled terminology spec provided in ct_spec, then x values are returned in uppercase. If ct_spec is not provided, x is returned unchanged.
Examples
# A few example terms.
terms <-
c(
"/day",
"Yes",
"Unknown",
"Prior",
"Every 2 hours",
"Percentage",
"International Unit"
)
# Load a controlled terminology example
(ct_spec <- read_ct_spec_example("ct-01-cm"))
# Use all possible matching terms in the controlled terminology.
ct_map(x = terms, ct_spec = ct_spec)
# Note that if the controlled terminology mapping is restricted to a codelist
# code, e.g. C71113, then only `"/day"` and `"Every 2 hours"` get mapped to
# `"QD"` and `"Q2H"`, respectively; remaining terms won't match given the
# codelist code restriction, and will be mapped to an uppercase version of
# the original terms.
ct_map(x = terms, ct_spec = ct_spec, ct_clst = "C71113")
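The recode-or-uppercase behaviour described under Value can be sketched with a plain lookup in base R. recode_terms() is a hypothetical helper, and the two-row lookup only repeats the "/day" and "Every 2 hours" mappings mentioned in the comment above; it is not the bundled controlled terminology.

```r
# Hypothetical helper: recode matched terms, uppercase everything else.
recode_terms <- function(x, from, to) {
  idx <- match(x, from)
  ifelse(is.na(idx), toupper(x), to[idx])
}

recoded <- recode_terms(
  c("/day", "Every 2 hours", "Yes"),
  from = c("/day", "Every 2 hours"),
  to = c("QD", "Q2H")
)
recoded
```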
Controlled terminology mappings
Description
ct_mappings() takes a controlled terminology specification and returns the mappings in the form of a tibble in long format, i.e. the recoding of values in the from column to the to column values, one mapping per row.
The resulting mappings are unique, i.e. if from values are duplicated in two from columns, the first column indicated in from takes precedence, and only that mapping is retained in the controlled terminology map.
Usage
ct_mappings(ct_spec, from = ct_spec_vars("from"), to = ct_spec_vars("to"))
Arguments
ct_spec |
Controlled terminology specification as a
tibble. Each row is for a mapped controlled term.
Controlled terms are expected in the column indicated by |
from |
A character vector of column names indicating the variables containing values to be recoded. |
to |
A single string indicating the column whose values are to be recoded into. |
Value
A tibble with two columns, from and to, indicating the mapping of values, one per row.
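The precedence rule can be made concrete with a small base R sketch. long_mappings() is a hypothetical helper that returns a data frame rather than a tibble, and dropping NA entries in the from columns is an assumption. Stacking the from columns in the order given and keeping the first occurrence of each value yields the "first column wins" behaviour.

```r
# Hypothetical helper; a sketch of the long-format mapping with precedence.
long_mappings <- function(ct_spec, from, to) {
  # One from/to block per `from` column, stacked in the order given.
  pieces <- lapply(from, function(col) {
    data.frame(from = ct_spec[[col]], to = ct_spec[[to]], stringsAsFactors = FALSE)
  })
  long <- do.call(rbind, pieces)
  # Assumption: missing `from` values carry no mapping and are dropped.
  long <- long[!is.na(long$from), , drop = FALSE]
  # Keep the first occurrence, so earlier `from` columns take precedence.
  long <- long[!duplicated(long$from), , drop = FALSE]
  rownames(long) <- NULL
  long
}

toy_spec <- data.frame(
  collected_value = c("a", "b"),
  term_synonyms = c("b", "c"),
  term_value = c("A", "B"),
  stringsAsFactors = FALSE
)
mappings <- long_mappings(
  toy_spec,
  from = c("collected_value", "term_synonyms"),
  to = "term_value"
)
mappings
```

In the toy data, "b" appears in both from columns; only the mapping from collected_value ("b" to "B") is retained.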
Find the path to an example controlled terminology file
Description
ct_spec_example() resolves the local path to an example controlled terminology file.
Usage
ct_spec_example(example)
Arguments
example |
A string with either the basename, file name, or relative path
to a controlled terminology file bundled with |
Value
The local path to an example file if example is supplied, or a character vector of example file names.
Examples
# Get the local path to controlled terminology example file 01
# Using the basename only:
ct_spec_example("ct-01-cm")
# Using the file name:
ct_spec_example("ct-01-cm.csv")
# Using the relative path:
ct_spec_example("ct/ct-01-cm.csv")
# If no example is provided it returns a vector of possible choices.
ct_spec_example()
Controlled terminology variables
Description
ct_spec_vars() returns the mandatory variables to be present in a data set representing a controlled terminology. By default, it returns all required variables.
If only the subset of variables used for matching terms are needed, then request this subset of variables by passing the argument value "from". If only the mapping-to variable is to be requested, then simply pass "to". If only the codelist code variable name is needed then pass "ct_clst".
Usage
ct_spec_vars(set = c("all", "ct_clst", "from", "to"))
Arguments
set |
A scalar character (string), one of: |
Conditioned tibble pillar print method
Description
Conditioned tibble pillar print method
Usage
## S3 method for class 'cnd_df'
ctl_new_rowid_pillar(controller, x, width, ...)
Arguments
controller |
The object of class |
x |
A simple (one-dimensional) vector. |
width |
The available width, can be a vector for multiple tiers. |
... |
These dots are for future extensions and must be empty. |
Value
A character vector to print the tibble which is a conditioned dataframe.
Output a Dataset in a Vignette in the sdtm.oak Format
Description
Output a dataset in a vignette with the pre-specified sdtm.oak format.
Usage
dataset_oak_vignette(dataset, display_vars = NULL, filter = NULL)
Arguments
dataset |
Dataset to output in the vignette |
display_vars |
Variables selected to demonstrate the outcome of the mapping. Permitted values: list of variables. Default is NULL. If |
filter |
Filter condition. The specified condition is applied to the dataset before it is displayed. Permitted values: a condition. |
Value
An HTML table.
Derive Baseline Flag or Last Observation Before Exposure Flag
Description
Derive the baseline flag variable (--BLFL) or the last observation before exposure flag (--LOBXFL), from the observation date/time (--DTC), and a DM domain reference date/time.
Usage
derive_blfl(
sdtm_in,
dm_domain,
tgt_var,
ref_var,
baseline_visits = character(),
baseline_timepoints = character()
)
Arguments
sdtm_in |
Input SDTM domain. |
dm_domain |
DM domain with the reference variable |
tgt_var |
Name of variable to be derived ( |
ref_var |
vector of a date/time from the Demographics (DM) dataset, which serves as a point of comparison for other observations in the study. Common choices for this reference variable include "RFSTDTC" (the date/time of the first study treatment) or "RFXSTDTC" (the date/time of the first exposure to the study drug). |
baseline_visits |
A character vector specifying the baseline visits within the study. These visits are identified as critical points for data collection at the start of the study, before any intervention is applied. This allows the function to assign the baseline flag if the --DTC matches the reference date. |
baseline_timepoints |
A character vector of timepoint values in --TPT specifying the timepoints during the baseline visits when key assessments or measurements were taken. This allows the function to assign the baseline flag if the --DTC matches the reference date. |
Details
The derivation is as follows:
1. Remove records where the result (--ORRES) is missing. Also, exclude records with results labeled as "ND" (No Data) or "NOT DONE" in the --ORRES column, which indicate that the measurement or observation was not completed.
2. Remove records where the status (--STAT) indicates the observation or test was not performed, marked as "NOT DONE".
3. Divide the date and time column (--DTC) and the reference date/time variable (ref_var) into separate date and time components. Ignore any seconds recorded in the time component, focusing only on hours and minutes for further calculations.
4. Set partial or missing dates to NA.
5. Set partial or missing times to NA.
6. Filter on rows that have domain and reference dates not equal to NA. (Referred to as X)
7. Filter X on rows with domain date (--DTC) prior to (less than) reference date. (Referred to as A)
8. Filter X on rows with domain date (--DTC) equal to reference date but domain and reference times not equal to NA and domain time prior to (less than) reference time. (Referred to as B)
9. Filter X on rows with domain date (--DTC) equal to reference date but domain and/or reference time equal to NA and:
   - VISIT is in baseline visits list (if it exists) and
   - xxTPT is in baseline timepoints list (if it exists). (Referred to as C)
10. Combine the rows from A, B, and C to get a data frame of pre-reference date observations. Sort the rows by USUBJID, --STAT, and --ORRES.
11. Group by USUBJID and --TESTCD and filter on the rows that have maximum value from --DTC. Keep only the oak id variables and --TESTCD (because these are the unique values). Remove any duplicate rows. Assign the baseline flag variable, --BLFL, or the last observation before exposure flag (--LOBXFL) variable to these rows.
12. Join the baseline flag onto the input dataset based on oak id vars.
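The derivation steps can be illustrated with a minimal base R sketch for a single subject. This is not the implementation used by derive_blfl(); the data, the column names and the simplifications (one reference date/time, complete dates, only the VISIT fallback for set C) are assumptions made for illustration.

```r
# Hypothetical vital signs records for one subject (illustrative data only).
vs <- data.frame(
  USUBJID  = "S1",
  VSTESTCD = c("SYSBP", "SYSBP", "SYSBP", "PULSE", "DIABP"),
  VSORRES  = c("120", "118", "121", "ND", "80"),
  VSDTC    = c("2020-09-27T09:00", "2020-09-28T10:05", "2020-09-28T10:30",
               "2020-09-27T09:00", "2020-09-28"),
  VISIT    = "SCREENING"
)
ref_dtc <- "2020-09-28T10:10"      # assumed reference date/time from DM
baseline_visits <- "SCREENING"

# Drop records without a usable result.
x <- vs[!is.na(vs$VSORRES) & !vs$VSORRES %in% c("ND", "NOT DONE"), ]

# Split into date and time (hh:mm) parts; a missing time becomes NA.
x$date <- substr(x$VSDTC, 1, 10)
x$time <- ifelse(nchar(x$VSDTC) >= 16, substr(x$VSDTC, 12, 16), NA_character_)
ref_date <- substr(ref_dtc, 1, 10)
ref_time <- substr(ref_dtc, 12, 16)

# Sets A, B and C from the derivation.
in_a <- x$date < ref_date
in_b <- x$date == ref_date & !is.na(x$time) & x$time < ref_time
in_c <- x$date == ref_date & is.na(x$time) & x$VISIT %in% baseline_visits
pre <- x[in_a | in_b | in_c, ]

# Keep the latest pre-reference record per test and flag it.
last_dtc <- tapply(pre$VSDTC, pre$VSTESTCD, max)
flagged <- pre[pre$VSDTC == unname(last_dtc[pre$VSTESTCD]), ]
vs$VSBLFL <- ifelse(row.names(vs) %in% row.names(flagged), "Y", NA_character_)
vs
```

Here the 10:05 SYSBP record is flagged through set B, and the date-only DIABP record through set C because its visit is listed as a baseline visit.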
Value
Modified input data frame with baseline flag variable --BLFL or last observation before exposure flag --LOBXFL added.
Examples
dm <- tibble::tribble(
~USUBJID, ~RFSTDTC, ~RFXSTDTC,
"test_study-375", "2020-09-28T10:10", "2020-09-28T10:10",
"test_study-376", "2020-09-21T11:00", "2020-09-21T11:00",
"test_study-377", NA, NA,
"test_study-378", "2020-01-20T10:00", "2020-01-20T10:00",
"test_study-379", NA, NA,
)
dm
sdtm_in <-
tibble::tribble(
~DOMAIN, ~oak_id, ~raw_source, ~patient_number, ~USUBJID, ~VSDTC, ~VSTESTCD, ~VSORRES, ~VSSTAT, ~VISIT,
"VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-01T13:31", "DIABP", "90", NA, "SCREENING",
"VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T11:20", "DIABP", "90", NA, "SCREENING",
"VS", 1L, "VTLS1", 375L, "test_study-375", "2020-09-28T10:10", "PULSE", "ND", NA, "SCREENING",
"VS", 2L, "VTLS1", 375L, "test_study-375", "2020-10-01T13:31", "PULSE", "85", NA, "SCREENING",
"VS", 1L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:10", "SYSBP", "120", NA, "SCREENING",
"VS", 2L, "VTLS2", 375L, "test_study-375", "2020-09-28T10:05", "SYSBP", "120", NA, "SCREENING",
"VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "DIABP", "75", NA, "SCREENING",
"VS", 1L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", NA, "NOT DONE", "SCREENING",
"VS", 2L, "VTLS1", 376L, "test_study-376", "2020-09-20", "PULSE", "110", NA, "SCREENING",
"VS", 2L, "VTLS1", 378L, "test_study-378", "2020-01-20T10:00", "PULSE", "110", NA, "SCREENING",
"VS", 3L, "VTLS1", 378L, "test_study-378", "2020-01-21T11:00", "PULSE", "105", NA, "SCREENING"
)
sdtm_in
# Example 1:
observed_output <- derive_blfl(
sdtm_in = sdtm_in,
dm_domain = dm,
tgt_var = "VSLOBXFL",
ref_var = "RFXSTDTC",
baseline_visits = c("SCREENING")
)
observed_output
# Example 2:
observed_output2 <- derive_blfl(
sdtm_in = sdtm_in,
dm_domain = dm,
tgt_var = "VSLOBXFL",
ref_var = "RFXSTDTC",
baseline_timepoints = c("PRE-DOSE")
)
observed_output2
# Example 3: Output is the same as Example 2
observed_output3 <- derive_blfl(
sdtm_in = sdtm_in,
dm_domain = dm,
tgt_var = "VSLOBXFL",
ref_var = "RFXSTDTC",
baseline_visits = c("SCREENING"),
baseline_timepoints = c("PRE-DOSE")
)
observed_output3
Derive the sequence number (--SEQ) variable
Description
derive_seq() creates a new identifier variable: the sequence number (--SEQ).
This function adds a newly derived variable to tgt_dat, namely the sequence number (--SEQ) whose name is the one provided in tgt_var. An integer sequence is generated that uniquely identifies each record within the domain.
Prior to the derivation of tgt_var, the data frame tgt_dat is sorted according to grouping variables indicated in rec_vars.
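The sort-then-number idea can be sketched in base R as follows. This is an illustration rather than the body of derive_seq(): the data are made up, and restarting the numbering for each subject is an assumption based on the usual SDTM convention for --SEQ.

```r
# Hypothetical records; column names are illustrative.
vs <- data.frame(
  USUBJID  = c("S2", "S1", "S1", "S2", "S1"),
  VSTESTCD = c("PULSE", "SYSBP", "DIABP", "DIABP", "PULSE"),
  VSDTC    = c("2020-01-02", "2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03")
)
rec_vars <- c("USUBJID", "VSTESTCD", "VSDTC")
start_at <- 1L

# Sort the data frame by the record-level variables.
vs <- vs[do.call(order, unname(as.list(vs[rec_vars]))), ]

# Number the records within each subject (assumed behaviour), starting at start_at.
vs$VSSEQ <- ave(seq_len(nrow(vs)), vs$USUBJID, FUN = seq_along) + start_at - 1L
vs
```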
Usage
derive_seq(
tgt_dat,
tgt_var,
rec_vars,
sbj_vars = sdtm.oak::sbj_vars(),
start_at = 1L
)
Arguments
tgt_dat |
The target dataset, a data frame. |
tgt_var |
The target SDTM variable: a single string indicating the name
of the sequence number ( |
rec_vars |
A character vector of record-level identifier variables. |
sbj_vars |
A character vector of subject-level identifier variables. |
start_at |
The sequence numbering starts at this value (default is |
Value
Returns the data frame supplied in tgt_dat with the newly derived variable, i.e. the sequence number (--SEQ), whose name is that passed in tgt_var. This variable is of type integer.
Examples
# A VS raw data set example
(vs <- read_domain_example("vs"))
# Derivation of VSSEQ
rec_vars <- c("STUDYID", "USUBJID", "VSTESTCD", "VSDTC", "VSTPTNUM")
derive_seq(tgt_dat = vs, tgt_var = "VSSEQ", rec_vars = rec_vars)
# An APSC raw data set example
(apsc <- read_domain_example("apsc"))
# Derivation of APSEQ
derive_seq(
tgt_dat = apsc,
tgt_var = "APSEQ",
rec_vars = c("STUDYID", "RSUBJID", "SCTESTCD"),
sbj_vars = c("STUDYID", "RSUBJID")
)
derive_study_day performs study day calculation
Description
This function takes an input data frame and a reference data frame (which is the DM domain in most cases), and calculates the study day from the reference date and the target date. In case of unexpected conditions, such as a reference date that is not unique for each patient, or reference and input dates that are not actual dates, NA will be returned for those records.
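As a rough sketch of the arithmetic involved, the usual SDTM study day convention has no day 0: dates on or after the reference date are counted from 1, and earlier dates are negative. The helper study_day() below is hypothetical and only illustrates that rule; it is not the code of derive_study_day().

```r
# Hypothetical helper illustrating the SDTM study day rule (no day 0).
study_day <- function(target, reference) {
  diff_days <- as.integer(as.Date(target) - as.Date(reference))
  # On or after the reference date: add 1; before it: keep the negative difference.
  ifelse(diff_days >= 0L, diff_days + 1L, diff_days)
}

days <- study_day(
  target    = c("2012-01-01", "2012-04-14", "2012-04-14"),
  reference = c("2012-02-01", "2012-04-14", NA)
)
days
```

A missing reference date propagates to NA, in line with the behaviour described for unexpected conditions.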
Usage
derive_study_day(
sdtm_in,
dm_domain,
tgdt,
refdt,
study_day_var,
merge_key = "USUBJID"
)
Arguments
sdtm_in |
Input data frame that contains the target date. |
dm_domain |
Reference data frame that contains the reference date. |
tgdt |
Target date from |
refdt |
Reference date from |
study_day_var |
New study day variable name in the output. For example, AESTDY for AE domain and CMSTDY for CM domain. |
merge_key |
Character to represent the merging key between |
Value
Data frame that takes all columns from sdtm_in and a new variable to represent the calculated study day.
Examples
ae <- data.frame(
USUBJID = c("study123-123", "study123-124", "study123-125"),
AESTDTC = c("2012-01-01", "2012-04-14", "2012-04-14")
)
dm <- data.frame(
USUBJID = c("study123-123", "study123-124", "study123-125"),
RFSTDTC = c("2012-02-01", "2012-04-14", NA)
)
ae$AESTDTC <- as.Date(ae$AESTDTC)
dm$RFSTDTC <- as.Date(dm$RFSTDTC)
derive_study_day(ae, dm, "AESTDTC", "RFSTDTC", "AESTDY")
Find the path to an example SDTM domain file
Description
domain_example() resolves the local path to an SDTM domain example file. The domain example files were imported from pharmaversesdtm. See Details section for available datasets.
Usage
domain_example(example)
Arguments
example |
A string with either the basename, file name, or relative path
to a SDTM domain example file bundled with |
Details
Datasets were obtained from pharmaversesdtm but are originally sourced from the CDISC pilot project or have been constructed ad-hoc by the admiral team. These datasets are bundled with {sdtm.oak}, thus obviating a dependence on {pharmaversesdtm}.
Example SDTM domains
- "ae": Adverse Events (AE) data set.
- "apsc": Associated Persons Subject Characteristics (APSC) data set.
- "cm": Concomitant Medications (CM) data set.
- "vs": Vital Signs (VS) data set.
Value
The local path to an example file if example is supplied, or a character vector of example file names.
Source
See https://cran.r-project.org/package=pharmaversesdtm.
Examples
# If no example is provided it returns a vector of possible choices.
domain_example()
# Get the local path to the Concomitant Medication dataset file.
domain_example("cm")
# Local path to the Adverse Events dataset file.
domain_example("ae")
Extract date part from ISO8601 date/time variable
Description
The date part is extracted from an ISO8601 date/time variable. By default, partial or missing dates are set to NA.
Usage
dtc_datepart(dtc, partial_as_na = TRUE)
Arguments
dtc |
Character vector containing ISO8601 date/times. |
partial_as_na |
Logical |
Value
Character vector containing ISO8601 dates.
Date/time collection formats
Description
Date/time collection formats
Usage
dtc_formats
Format
A tibble of 20 formats with three variables:
- fmt: Format string.
- type: Whether a date, time or date-time.
- description: Description of which date-time components are parsed.
Examples
dtc_formats
Extract time part from ISO 8601 date/time variable
Description
The time part is extracted from an ISO 8601 date/time variable. By default, partial or missing times are set to NA, and seconds are ignored and not extracted.
Usage
dtc_timepart(dtc, partial_as_na = TRUE, ignore_seconds = TRUE)
Arguments
dtc |
Character vector containing ISO 8601 date/times. |
partial_as_na |
Logical |
ignore_seconds |
Logical |
Value
Character vector containing ISO 8601 times.
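The described behaviour can be approximated in base R with a few regular expressions. The helper time_part() below is a hypothetical sketch, not the implementation of dtc_timepart(); in particular, what counts as a "partial" time here (anything other than hh:mm, or hh:mm:ss when seconds are kept) is an assumption.

```r
# Hypothetical helper: extract the time part of ISO 8601 date/times.
time_part <- function(dtc, partial_as_na = TRUE, ignore_seconds = TRUE) {
  # Everything after the "T" separator; no separator means no time part.
  out <- ifelse(grepl("T", dtc), sub("^.*T", "", dtc), NA_character_)
  # Optionally drop the seconds, keeping hh:mm only.
  if (ignore_seconds) out <- sub("^([0-9]{2}:[0-9]{2}):.*$", "\\1", out)
  complete <- if (ignore_seconds) "^[0-9]{2}:[0-9]{2}$" else "^[0-9]{2}:[0-9]{2}:[0-9]{2}$"
  # Partial or missing times become NA.
  if (partial_as_na) out[!grepl(complete, out)] <- NA_character_
  out
}

times <- time_part(c("2021-12-25T12:30:59", "2021-12-25T12", "2021-12-25", NA))
times_sec <- time_part("2021-12-25T12:30:59", ignore_seconds = FALSE)
```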
Convert a parsed date/time format to regex
Description
dttm_fmt_to_regex() takes a tibble of parsed date/time format components (as returned by parse_dttm_fmt()), and a mapping of date/time component formats to regexps and generates a single regular expression with groups for matching each of the date/time components.
Usage
dttm_fmt_to_regex(
fmt,
fmt_regex = fmt_rg(),
fmt_c = fmt_cmp(),
anchored = TRUE
)
Arguments
fmt |
A format string (scalar) to be parsed by |
fmt_regex |
A named character vector of regexps, one for each date/time component. |
anchored |
Whether the final regex should be anchored, i.e. bounded by
|
Value
A string containing a regular expression for matching date/time components according to a format.
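To make the idea concrete, here is a simplified base R sketch that turns a date format into a regular expression with one capture group per component. The helper fmt_to_regex() and its component patterns are hypothetical; it handles only year, month and day with plain delimiters such as "-" or "/", and is not the algorithm used by dttm_fmt_to_regex().

```r
# Hypothetical, simplified format-to-regex conversion (dates only).
fmt_to_regex <- function(fmt, anchored = TRUE) {
  rx <- gsub("y+", "([0-9]{2,4})", fmt)  # year component
  rx <- gsub("m+", "([0-9]{1,2})", rx)   # month component
  rx <- gsub("d+", "([0-9]{1,2})", rx)   # day component
  if (anchored) paste0("^", rx, "$") else rx
}

rx <- fmt_to_regex("y-m-d")
# Each capture group returns one date component.
parts <- regmatches("2020-01-06", regexec(rx, "2020-01-06"))[[1]][-1]
```

Anchoring with "^" and "$" forces the whole string to match the format, so "2020-01-06T10:00" would not match the anchored pattern.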
Evaluate conditions
Description
eval_conditions() evaluates a set of conditions in the context of a data frame and an optional environment.
The utility of this function is to provide an easy way to generate a logical vector of matching records from a set of logical conditions involving variables in a data frame (dat) and optionally in a supplementary environment (.env). The set of logical conditions are provided as expressions to be evaluated in the context of dat and .env.
Variables are looked up in dat, then in .env, then in the calling function's environment, followed by its parent environments.
Usage
eval_conditions(dat, ..., .na = NA, .env = rlang::caller_env())
Arguments
dat |
A data frame |
... |
A set of logical conditions, e.g. |
.na |
Return value to be used when the conditions evaluate to |
.env |
An optional environment to look for variables involved in logical
expression passed in |
Value
A logical vector reflecting matching rows in dat.
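The lookup order described above (data frame first, then an enclosing environment) can be reproduced with base R's eval(). The helper eval_cond() is a hypothetical, single-condition sketch; eval_conditions() itself accepts several conditions and may be implemented differently.

```r
# Hypothetical helper: evaluate one quoted condition against a data frame.
eval_cond <- function(dat, cond, env = parent.frame(), na = NA) {
  # Columns of `dat` are found first; other names fall back to `env`.
  res <- eval(cond, envir = dat, enclos = env)
  res[is.na(res)] <- na   # replacement value for conditions evaluating to NA
  res
}

dat <- data.frame(x = c(1, 5, NA))
threshold <- 3            # lives outside the data frame

res_default <- eval_cond(dat, quote(x > threshold))
res_false   <- eval_cond(dat, quote(x > threshold), na = FALSE)
```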
Find gap intervals in integer sequences
Description
find_int_gap() determines the start and end positions for gap intervals in a sequence of integers. By default, the interval range to look for gaps is defined by the minimum and maximum values of x; specify xmin and xmax to change the range explicitly.
Usage
find_int_gap(x, xmin = min(x), xmax = max(x))
Arguments
x |
An integer vector. |
xmin |
Left endpoint integer value. |
xmax |
Right endpoint integer value. |
Value
A tibble of gap intervals of two columns:
- start: left endpoint
- end: right endpoint
If no gap intervals are found then an empty tibble is returned.
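A base R sketch of gap detection is shown below. It assumes that the endpoints of a gap are the first and last missing integers of each run; the helper int_gaps() is hypothetical and returns a plain data frame rather than a tibble.

```r
# Hypothetical helper: locate runs of missing integers in x.
int_gaps <- function(x, xmin = min(x), xmax = max(x)) {
  absent <- setdiff(seq(xmin, xmax), x)
  if (length(absent) == 0L) {
    return(data.frame(start = integer(), end = integer()))
  }
  # A new run starts whenever consecutive missing values are not adjacent.
  run <- cumsum(c(1L, diff(absent) != 1L))
  data.frame(
    start = as.integer(tapply(absent, run, min)),
    end   = as.integer(tapply(absent, run, max))
  )
}

gaps <- int_gaps(c(1L, 2L, 5L, 9L))   # missing: 3-4 and 6-8
no_gaps <- int_gaps(1:3)
```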
Regexps for date/time format components
Description
fmt_cmp() creates a character vector of patterns to match individual format date/time components.
Usage
fmt_cmp(
sec = "S+",
min = "M+",
hour = "H+",
mday = "d+",
mon = "m+",
year = "y+"
)
Arguments
sec |
A string pattern for matching the second format component. |
min |
A string pattern for matching the minute format component. |
hour |
A string pattern for matching the hour format component. |
mday |
A string pattern for matching the month day format component. |
mon |
A string pattern for matching the month format component. |
year |
A string pattern for matching the year format component. |
Value
A named character vector of date/time format patterns. This is a vector of six elements, one for each date/time component.
Examples
# Regexps to parse format components
fmt_cmp()
fmt_cmp(year = "yyyy")
Regexps for date/time components
Description
fmt_rg() creates a character vector of named patterns to match individual date/time components.
Usage
fmt_rg(
sec = "(\\b\\d|\\d{2})(\\.\\d*)?",
min = "(\\b\\d|\\d{2})",
hour = "\\d?\\d",
mday = "\\b\\d|\\d{2}",
mon = stringr::str_glue("\\d\\d|{months_abb_regex()}"),
year = "(\\d{2})?\\d{2}",
na = NULL,
sec_na = na,
min_na = na,
hour_na = na,
mday_na = na,
mon_na = na,
year_na = na
)
Arguments
sec |
Regexp for the second component. |
min |
Regexp for the minute component. |
hour |
Regexp for the hour component. |
mday |
Regexp for the month day component. |
mon |
Regexp for the month component. |
year |
Regexp for the year component. |
na |
Regexp of alternatives, useful to match special values coding for missingness. |
sec_na |
Same as |
min_na |
Same as |
hour_na |
Same as |
mday_na |
Same as |
mon_na |
Same as |
year_na |
Same as |
Value
A named character vector of named patterns (regexps) for matching each date/time component.
Convert date/time components into ISO8601 format
Description
format_iso8601() takes a character matrix of date/time components and converts each component to ISO8601 format. In practice this entails converting years to a four digit number, and month, day, hours, minutes and seconds to two-digit numbers. Not available (NA) components are converted to "-".
Usage
format_iso8601(m, .cutoff_2000 = 68L)
Arguments
m |
A character matrix of date/time components. It must have six
named columns: |
.cutoff_2000 |
An integer value. Two-digit years smaller or equal to
|
Value
A character vector with date-times following the ISO8601 format.
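The padding and the "-" placeholder can be illustrated with a small base R sketch. The column names follow the six components returned by parse_dttm(); the helper pad2() is hypothetical, and the sketch leaves out two-digit year handling and any truncation of trailing missing components.

```r
# Hypothetical matrix of parsed components, one row per date/time.
m <- matrix(
  c("2020", "1", "6",  "9", "5", NA,
    "2021", NA,  "15", NA,  NA,  NA),
  ncol = 6, byrow = TRUE,
  dimnames = list(NULL, c("year", "mon", "mday", "hour", "min", "sec"))
)

# Zero-pad to two digits; missing components become "-".
pad2 <- function(x) ifelse(is.na(x), "-", sprintf("%02d", as.integer(x)))

iso <- paste0(
  ifelse(is.na(m[, "year"]), "-", m[, "year"]), "-",
  pad2(m[, "mon"]), "-", pad2(m[, "mday"]), "T",
  pad2(m[, "hour"]), ":", pad2(m[, "min"]), ":", pad2(m[, "sec"])
)
iso
```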
A function to generate oak_id_vars
Description
A function to generate oak_id_vars
Usage
generate_oak_id_vars(raw_dat, pat_var, raw_src)
Arguments
raw_dat |
The raw dataset (dataframe) |
pat_var |
Variable that holds the patient number |
raw_src |
Name of the raw source |
Value
dataframe
Examples
raw_dataset <-
tibble::tribble(
~patnum, ~MDRAW,
101L, "BABY ASPIRIN",
102L, "CORTISPORIN",
103L, NA_character_,
104L, "DIPHENHYDRAMINE HCL"
)
# Generate oak_id_vars
generate_oak_id_vars(
raw_dat = raw_dataset,
pat_var = "patnum",
raw_src = "Concomitant Medication"
)
Function to generate final SDTM domain and supplemental domain SUPP--
Description
Function to generate final SDTM domain and supplemental domain SUPP--
Usage
generate_sdtm_supp(
sdtm_dataset,
idvar = NULL,
supp_qual_info,
qnam_var,
label_var,
orig_var
)
Arguments
sdtm_dataset |
SDTM output used to split supplemental domains. |
idvar |
Variable name for IDVAR variable. |
supp_qual_info |
User-defined data frame of specifications for suppquals
which contains |
qnam_var |
Variable name in user-defined |
label_var |
Variable name in user-defined |
orig_var |
Variable name in user-defined |
Value
List of SDTM domain with suppquals dropped and corresponding supplemental domain.
Examples
dm <- read_domain_example("dm")
supp_qual_info <- read.csv(system.file("spec/suppqual_spec.csv", package = "sdtm.oak"))
dm_suppdm <-
generate_sdtm_supp(
dm,
idvar = NULL,
supp_qual_info = supp_qual_info,
qnam_var = "Variable",
label_var = "Label",
orig_var = "Origin"
)
Get the conditioning vector from a conditioned data frame
Description
get_cnd_df_cnd() extracts the conditioning vector from a conditioned data frame, i.e. from an object of class cnd_df.
Usage
get_cnd_df_cnd(dat)
Arguments
dat |
A conditioned data frame ( |
Value
The conditioning vector (cnd) if dat is a conditioned data frame (cnd_df), otherwise NULL.
See Also
new_cnd_df(), is_cnd_df(), get_cnd_df_cnd_sum(), rm_cnd_df().
Get the summary of the conditioning vector from a conditioned data frame
Description
get_cnd_df_cnd_sum() extracts the tally of the conditioning vector from a conditioned data frame.
Usage
get_cnd_df_cnd_sum(dat)
Arguments
dat |
A conditioned data frame ( |
Value
A vector of three elements providing the sum of TRUE, FALSE, and NA values in the conditioning vector (cnd), or NULL if dat is not a conditioned data frame (cnd_df).
See Also
new_cnd_df(), is_cnd_df(), get_cnd_df_cnd(), rm_cnd_df().
Derive an SDTM variable with a hardcoded value
Description
- hardcode_no_ct() maps a hardcoded value to a target SDTM variable that has no terminology restrictions.
- hardcode_ct() maps a hardcoded value to a target SDTM variable with controlled terminology recoding.
Usage
hardcode_no_ct(
tgt_dat = NULL,
tgt_val,
raw_dat,
raw_var,
tgt_var,
id_vars = oak_id_vars()
)
hardcode_ct(
tgt_dat = NULL,
tgt_val,
raw_dat,
raw_var,
tgt_var,
ct_spec,
ct_clst,
id_vars = oak_id_vars()
)
Arguments
tgt_dat |
Target dataset: a data frame to be merged against |
tgt_val |
The target SDTM value to be hardcoded into the variable
indicated in |
raw_dat |
The raw dataset (dataframe); must include the
variables passed in |
raw_var |
The raw variable: a single string indicating the name of the
raw variable in |
tgt_var |
The target SDTM variable: a single string indicating the name of variable to be derived. |
id_vars |
Key variables to be used in the join between the raw dataset
( |
ct_spec |
Study controlled terminology specification: a dataframe with a
minimal set of columns, see |
ct_clst |
A codelist code indicating which subset of the controlled
terminology to apply in the derivation. This parameter is optional, if left
as |
Value
The returned data set depends on the value of tgt_dat:
- If no target dataset is supplied, meaning that tgt_dat defaults to NULL, then the returned data set is raw_dat, selected for the variables indicated in id_vars, and a new extra column: the derived variable, as indicated in tgt_var.
- If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
Examples
md1 <-
tibble::tribble(
~oak_id, ~raw_source, ~patient_number, ~MDRAW,
1L, "MD1", 101L, "BABY ASPIRIN",
2L, "MD1", 102L, "CORTISPORIN",
3L, "MD1", 103L, NA_character_,
4L, "MD1", 104L, "DIPHENHYDRAMINE HCL"
)
# Derive a new variable `CMCAT` by overwriting `MDRAW` with the
# hardcoded value "GENERAL CONCOMITANT MEDICATIONS".
hardcode_no_ct(
tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
raw_dat = md1,
raw_var = "MDRAW",
tgt_var = "CMCAT"
)
cm_inter <-
tibble::tribble(
~oak_id, ~raw_source, ~patient_number, ~CMTRT, ~CMINDC,
1L, "MD1", 101L, "BABY ASPIRIN", NA,
2L, "MD1", 102L, "CORTISPORIN", "NAUSEA",
3L, "MD1", 103L, "ASPIRIN", "ANEMIA",
4L, "MD1", 104L, "DIPHENHYDRAMINE HCL", "NAUSEA",
5L, "MD1", 105L, "PARACETAMOL", "PYREXIA"
)
# Derive a new variable `CMCAT` by overwriting `MDRAW` with the
# hardcoded value "GENERAL CONCOMITANT MEDICATIONS" with a prior join to
# the target dataset `cm_inter`.
hardcode_no_ct(
tgt_dat = cm_inter,
tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
raw_dat = md1,
raw_var = "MDRAW",
tgt_var = "CMCAT"
)
# Controlled terminology specification
(ct_spec <- read_ct_spec_example("ct-01-cm"))
# Hardcoding of `CMCAT` with the value `"GENERAL CONCOMITANT MEDICATIONS"`
# involving terminology recoding. `NA` values in `MDRAW` are preserved in
# `CMCAT`.
hardcode_ct(
tgt_dat = cm_inter,
tgt_var = "CMCAT",
raw_dat = md1,
raw_var = "MDRAW",
tgt_val = "GENERAL CONCOMITANT MEDICATIONS",
ct_spec = ct_spec,
ct_clst = "C66729"
)
# Variables are derived in sequence from multiple input sources.
# For each target variable, only missing (`NA`) values are filled
# during each step—previously assigned (non-missing) values are retained.
cm_raw <-
tibble::tibble(
oak_id = 1:4,
raw_source = "cm_raw",
patient_number = 370 + oak_id,
PATNUM = patient_number,
IT.CMTRT = c("BABY ASPIRIN", "CORTISPORIN", NA, NA),
IT.CMTRTOTH = c("Other Treatment - ", NA, "Other Treatment - Baby Aspirin", NA)
)
cm_raw
# Hardcoding of values of `CMCAT` is based firstly on the presence of missing
# values (`NA`) in `IT.CMTRT` and only secondly on `IT.CMTRTOTH`.
hardcode_no_ct(
tgt_val = "General Concomitant Medications",
raw_dat = cm_raw,
raw_var = "IT.CMTRT",
tgt_var = "CMCAT"
) |>
hardcode_no_ct(
tgt_val = "Other General Concomitant Medications",
raw_dat = cm_raw,
raw_var = "IT.CMTRTOTH",
tgt_var = "CMCAT"
)
# Note that hardcoding application is reversed in this example, this impacts
# the result.
hardcode_no_ct(
tgt_val = "Other General Concomitant Medications",
raw_dat = cm_raw,
raw_var = "IT.CMTRTOTH",
tgt_var = "CMCAT"
) |>
hardcode_no_ct(
tgt_val = "General Concomitant Medications",
raw_dat = cm_raw,
raw_var = "IT.CMTRT",
tgt_var = "CMCAT"
)
Determine Indices for Recoding
Description
index_for_recode() identifies the positions of elements in x that match any of the values specified in the from vector. This function is primarily used to facilitate the recoding of values by pinpointing which elements in x correspond to the from values and thus need to be replaced or updated.
Usage
index_for_recode(x, from)
Arguments
x |
A vector of values in which to search for matches. |
from |
A vector of values to match against the elements in |
Value
An integer vector of the same length as x, containing the indices of the matched values from the from vector. If an element in x does not match any value in from, the corresponding position in the output will be NA. This index information is critical for subsequent recoding operations.
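Base R's match() produces an index vector of exactly this shape, which makes it a convenient way to picture how such indices drive a recode. Whether index_for_recode() relies on match() internally is an assumption; the vectors below are made up.

```r
x    <- c("male", "female", "unknown", "male")
from <- c("male", "female")
to   <- c("M", "F")

# Position in `from` of each element of `x`; NA where there is no match.
idx <- match(x, from)

# Use the index to recode, leaving unmatched values untouched.
recoded <- ifelse(is.na(idx), x, to[idx])
recoded
```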
Inform on the mappability of terms to controlled terminology
Description
inform_on_ct_mappability() checks whether all values in x can be mapped using the controlled terminology terms in from. It raises an informative message if any values in x are not mappable.
Usage
inform_on_ct_mappability(x, from)
Arguments
x |
A character vector of terms to be checked. |
from |
A character vector of valid controlled terminology terms. |
Value
Invisibly returns TRUE if all terms are mappable; otherwise, prints an informative message and returns FALSE invisibly.
Check if a data frame is a conditioned data frame
Description
is_cnd_df() checks whether a data frame is a conditioned data frame, i.e. of class cnd_df.
Usage
is_cnd_df(dat)
Arguments
dat |
A data frame. |
Value
TRUE if dat is a conditioned data frame (class cnd_df), otherwise FALSE.
See Also
new_cnd_df(), get_cnd_df_cnd(), get_cnd_df_cnd_sum(), rm_cnd_df().
Identify CT mappable terms
Description
is_ct_mappable() returns a logical vector indicating whether each element of x is found in the from values used for controlled terminology recoding. Empty strings (blanks) and NA values are treated specially and are considered mappable terms, even though they might not be.
This function is useful for checking in advance which terms in a vector can be recoded given a specified controlled terminology mapping.
Usage
is_ct_mappable(x, from)
Arguments
x |
A character vector of terms to be evaluated for recoding. |
from |
A character vector of controlled terminology terms that |
Value
A logical vector of the same length as x, where TRUE indicates the corresponding term in x is found in from, and FALSE otherwise.
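The special treatment of blanks and NA described for this function can be sketched in one line of base R. The helper mappable() is hypothetical, and treating only the empty string "" as a blank is an assumption.

```r
# Hypothetical helper: NA and empty strings count as mappable.
mappable <- function(x, from) is.na(x) | x == "" | x %in% from

flags <- mappable(c("Y", "", NA, "MAYBE"), from = c("Y", "N"))
flags
```

Only "MAYBE" is reported as not mappable, since it is neither blank, NA, nor one of the terms in from.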
This function is used to check if a --DTC variable is in ISO8601 format
Description
This function is used to check if a --DTC variable is in ISO8601 format
Usage
is_iso8601(dtc_var)
Arguments
dtc_var |
A vector of the date and time values |
Value
A logical value indicating if input is in ISO8601 format
Is it a --SEQ variable name
Description
is_seq_name() returns which variable names end in "SEQ".
Usage
is_seq_name(x)
Arguments
x |
A character vector. |
Value
A logical vector.
Format as an ISO8601 month
Description
iso8601_mon() converts a character vector whose values represent numeric or abbreviated month names to zero-padded numeric months.
Usage
iso8601_mon(x)
Arguments
x |
A character vector. |
Value
A character vector.
Convert NA to "-"
Description
iso8601_na() takes a character vector and converts NA values to "-".
Usage
iso8601_na(x)
Arguments
x |
A character vector. |
Value
A character vector.
Format as ISO8601 seconds
Description
iso8601_sec() converts a character vector whose values represent seconds.
Usage
iso8601_sec(x)
Arguments
x |
A character vector. |
Value
A character vector.
Truncate a partial ISO8601 date-time
Description
iso8601_truncate() converts a character vector of ISO8601 dates, times or date-times that might be partial and truncates the format by removing those missing components.
Usage
iso8601_truncate(x, empty_as_na = TRUE)
Arguments
x |
A character vector. |
Value
A character vector.
Format as an ISO8601 two-digit number
Description
iso8601_two_digits() converts a single digit or two digit number into a two digit, 0-padded, number. Failing to parse the input as a two digit number results in NA.
Usage
iso8601_two_digits(x)
Arguments
x |
A character vector. |
Value
A character vector of the same size as x.
Format as an ISO8601 four-digit year
Description
iso8601_year() converts a character vector whose values represent years to four-digit years.
Usage
iso8601_year(x, cutoff_2000 = 68L)
Arguments
x |
A character vector. |
cutoff_2000 |
A non-negative integer value. Two-digit years smaller or
equal to |
Value
A character vector.
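A possible reading of the cutoff, sketched in base R: two-digit years up to and including the cutoff are placed in the 2000s, and larger ones in the 1900s. This mirrors the convention R's strptime() documents for "%y", but it is an assumption about iso8601_year(), whose argument description is cut short above; the helper four_digit_year() is hypothetical.

```r
# Hypothetical helper: expand two-digit years around a cutoff.
four_digit_year <- function(x, cutoff_2000 = 68L) {
  yy <- as.integer(x)
  two_digit <- !is.na(yy) & nchar(x) <= 2L
  # Assumed rule: <= cutoff maps to 20xx, otherwise to 19xx.
  century <- ifelse(yy[two_digit] <= cutoff_2000, 2000L, 1900L)
  yy[two_digit] <- yy[two_digit] + century
  out <- sprintf("%04d", yy)
  out[is.na(yy)] <- NA_character_
  out
}

years <- four_digit_year(c("68", "69", "1999", "5"))
years
```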
Regex for months' abbreviations
Description
months_abb_regex() generates a regex that matches month abbreviations. For finer control, the case can be specified with parameter case.
Usage
months_abb_regex(x = month.abb, case = c("any", "upper", "lower", "title"))
Arguments
x |
A character vector of three-letter month abbreviations. Default is
|
case |
A string scalar: |
Value
A regex as a string.
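For illustration, a comparable pattern can be assembled in base R from month.abb, with case handled at match time through ignore.case. This is a sketch of the general idea only; the regex actually returned by months_abb_regex() may be built differently.

```r
# Alternation of the twelve English month abbreviations.
month_rx <- paste(month.abb, collapse = "|")

x <- c("11-JUN-2023", "24-oct-2023", "UNK-2023")
# Extract the month abbreviation regardless of case; non-matches are dropped.
found <- regmatches(x, regexpr(month_rx, x, ignore.case = TRUE))
# Convert the matched abbreviations to month numbers.
month_num <- match(toupper(found), toupper(month.abb))
```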
Mutate method for conditioned data frames
Description
mutate.cnd_df() is an S3 method to be dispatched by mutate generic on conditioned data frames. This function implements a conditional mutate by only changing rows for which the condition stored in the conditioned data frame is TRUE.
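The effect of a conditional mutate can be pictured in base R as an assignment restricted to the rows where the condition is TRUE. This sketch only mimics the outcome for an existing column; it does not use the cnd_df class or dplyr, and the data are made up.

```r
dat <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))
cnd <- c(TRUE, FALSE, NA)

# which() keeps only the TRUE positions, so FALSE and NA rows are left as they are.
rows <- which(cnd)
dat$y[rows] <- toupper(dat$y[rows])
dat
```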
Usage
## S3 method for class 'cnd_df'
mutate(
.data,
...,
.by = NULL,
.keep = c("all", "used", "unused", "none"),
.before = NULL,
.after = NULL
)
Arguments
.data |
A conditioned data frame. |
... |
< The value can be:
|
.by |
Not used when |
.keep |
Control which columns from
|
.before |
Not used, use |
.after |
Control where new columns should appear, i.e. after which columns. |
Value
A conditioned data frame, meaning a tibble with mutated values.
Create a data frame with filtering tags
Description
new_cnd_df() creates a conditioned data frame, classed cnd_df, meaning that this function extends the data frame passed as argument by storing a logical vector cnd (as attribute) that marks rows for posterior conditional transformation by methods that support conditioned data frames.
Usage
new_cnd_df(dat, cnd, .warn = TRUE)
Arguments
dat |
A data frame. |
cnd |
A logical vector. Length must match the number of rows in |
.warn |
Whether to warn about creating a new conditioned data frame
in case that |
Value
A data frame dat with the additional class "cnd_df" and the following attributes:
- cnd: The logical vector passed as argument cnd: TRUE values mark rows in dat to be used for transformations; rows marked with FALSE are not transformed; and NA mark rows whose transformations are to be applied resulting in NA.
- cnd_sum: An integer vector of three elements providing the sum of TRUE, FALSE and NA values in cnd, respectively.
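The structure described above, a data frame carrying an extra class and two attributes, can be mocked up in base R with structure(). This is an illustrative sketch of the data layout, not a substitute for new_cnd_df(); whether the real cnd_sum vector is named is not stated here, so it is left unnamed.

```r
dat <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))
cnd <- c(TRUE, FALSE, NA, TRUE)

# Attach the condition and its tally (TRUE, FALSE, NA counts) as attributes.
cnd_dat <- structure(
  dat,
  class   = c("cnd_df", class(dat)),
  cnd     = cnd,
  cnd_sum = c(sum(cnd, na.rm = TRUE), sum(!cnd, na.rm = TRUE), sum(is.na(cnd)))
)

inherits(cnd_dat, "cnd_df")
attr(cnd_dat, "cnd_sum")
```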
See Also
is_cnd_df(), get_cnd_df_cnd(), get_cnd_df_cnd_sum(), rm_cnd_df().
Calculate Reference dates in ISO8601 character format.
Description
Derive RFSTDTC, RFENDTC, RFXENDTC, RFXSTDTC, etc. based on the input dates and time.
Usage
oak_cal_ref_dates(
ds_in = dm,
der_var,
min_max = "min",
ref_date_config_df,
raw_source
)
Arguments
ds_in |
Data frame. DM domain. |
der_var |
Character string. The reference date to be derived. |
min_max |
Minimum or Maximum date to be calculated based on the input. Default set to Minimum. Values should be min or max. |
ref_date_config_df |
Data frame which has the details of the variables to be used for the calculation of reference dates. Should have the columns listed below:
- raw_dataset_name: Name of the raw dataset.
- date_var: Date variable name from the raw dataset.
- time_var: Time variable name from the raw dataset.
- dformat: Format of the date collected in raw data.
- tformat: Format of the time collected in raw data.
- sdtm_var_name: Reference variable name. |
raw_source |
List containing all the raw datasets. |
Details
Populate Reference date variables in demographic domain in ISO8601 character format.
Value
DM data frame with the reference dates populated.
Examples
dm <- tibble::tribble(
~patient_number, ~USUBJID, ~SUBJID, ~SEX,
"001", "XXXX-001", "001", "F",
"002", "XXXX-002", "002", "M",
"003", "XXXX-003", "003", "M"
)
ref_date_config_df <- tibble::tribble(
~raw_dataset_name, ~date_var, ~time_var, ~dformat, ~tformat, ~sdtm_var_name,
"ex1_raw", "EX_ST_DT1", "EX_ST_TM1", "dd-mm-yyyy", "H:M", "RFSTDTC",
"ex2_raw", "EX_ST_DT2", NA, "dd-mmm-yyyy", NA, "RFSTDTC",
"ex1_raw", "EX_EN_DT1", "EX_EN_TM1", "dd-mm-yyyy", "H:M", "RFENDTC",
"ex2_raw", "EX_ST_DT2", NA, "dd-mmm-yyyy", NA, "RFENDTC"
)
ex1_raw <- tibble::tribble(
~patient_number, ~EX_ST_DT1, ~EX_EN_DT1, ~EX_ST_TM1, ~EX_EN_TM1,
"001", "15-05-2023", "15-05-2023", "10:20", "11:00",
"001", "15-05-2023", "15-05-2023", "9:15", "10:00",
"001", "15-05-2023", "15-05-2023", "8:19", "09:00",
"002", "02-10-2023", "02-10-2023", "UNK:UNK", NA,
"002", "03-11-2023", "03-11-2023", "11:19", NA
)
ex2_raw <- tibble::tribble(
~patient_number, ~EX_ST_DT2,
"001", "11-JUN-2023",
"002", "24-OCT-2023",
"002", "25-JUL-2023",
"002", "30-OCT-2023",
"002", "UNK-OCT-2023"
)
raw_source <- list(ex1_raw = ex1_raw, ex2_raw = ex2_raw)
dm_df <- oak_cal_ref_dates(dm,
der_var = "RFSTDTC",
min_max = "min",
ref_date_config_df = ref_date_config_df,
raw_source
)
Raw dataset keys
Description
oak_id_vars() is a helper function providing the variable (column) names to be regarded as keys in tibbles representing raw datasets. By default, the set of names is oak_id, raw_source, and patient_number. Extra variable names may be indicated and passed in extra_vars which are appended to the default names.
Usage
oak_id_vars(extra_vars = NULL)
Arguments
extra_vars |
A character vector of extra column names to be appended to the default names: oak_id, raw_source, and patient_number. |
Value
A character vector of column names to be regarded as keys in raw datasets.
Parse a date, time, or date-time
Description
parse_dttm() extracts date and time components. parse_dttm() wraps around parse_dttm_(), which is not vectorized over fmt.
Usage
parse_dttm_(
dttm,
fmt,
fmt_c = fmt_cmp(),
na = NULL,
sec_na = na,
min_na = na,
hour_na = na,
mday_na = na,
mon_na = na,
year_na = na
)
parse_dttm(
dttm,
fmt,
fmt_c = fmt_cmp(),
na = NULL,
sec_na = na,
min_na = na,
hour_na = na,
mday_na = na,
mon_na = na,
year_na = na
)
Arguments
dttm |
A character vector of dates, times or date-times. |
fmt |
In the case of
|
na , sec_na , min_na , hour_na , mday_na , mon_na , year_na |
A character vector of alternative values to allow during matching. This can be used to indicate different forms of missing values to be found during the parsing date-time strings. |
Value
A character matrix of six columns: "year", "mon", "mday", "hour", "min" and "sec". Each row corresponds to an element in dttm. Each element of the matrix is the parsed date/time component.
Parse a date/time format
Description
parse_dttm_fmt() parses a date/time format, meaning it will try to parse the components of the format fmt that refer to date/time components. parse_dttm_fmt_() is similar to parse_dttm_fmt() but is not vectorized over fmt.
Usage
parse_dttm_fmt_(fmt, pattern)
parse_dttm_fmt(fmt, patterns = fmt_cmp())
Arguments
fmt |
A format string (scalar) to be parsed by |
pattern , patterns |
A string (in the case of |
Value
A tibble of seven columns:
- fmt_c: date/time format component. Values are either "year", "mon", "mday", "hour", "min", "sec", or NA.
- pat: Regexp used to parse the date/time component.
- cap: The captured substring from the format.
- start: Start position in the format string for this capture.
- end: End position in the format string for this capture.
- len: Length of the capture (number of chars).
- ord: Ordinal of this date/time component in the format string.
Each row is for either a date/time format component or a "delimiter" string or pattern in-between format components.
Retrieve date/time parsing problems
Description
problems() is a companion helper function to create_iso8601(). It
retrieves ISO 8601 parsing problems from an object of class iso8601, as
returned by create_iso8601(), which carries a problems attribute when
parsing failures occurred.
Usage
problems(x = .Last.value)
Arguments
x |
An object of class iso8601, as typically obtained from a call to
|
Value
If there are no parsing problems in x
, then the returned value is
NULL
; otherwise, a tibble of parsing failures
is returned. Each row corresponds to a parsing problem. There will be a
first column named ..i
indicating the position(s) in the inputs to the
create_iso8601()
call that resulted in failures; remaining columns
correspond to the original input values passed on to create_iso8601()
,
with columns being automatically named ..var1
, ..var2
, and so on, if
the inputs to create_iso8601()
were unnamed, otherwise, the original
variable names are used instead.
Examples
dates <-
c(
"2020-01-01",
"2020-02-11",
"2020-01-06",
"2020-0921",
"2020/10/30",
"2020-12-05",
"20231225"
)
# By inspecting the problematic dates it can be understood that
# the `.format` parameter needs to be updated to include other variations.
iso8601_dttm <- create_iso8601(dates, .format = "y-m-d")
problems(iso8601_dttm)
# Including more parsing formats addresses the previous problems
formats <- c("y-m-d", "y-md", "y/m/d", "ymd")
iso8601_dttm2 <- create_iso8601(dates, .format = list(formats))
# So now `problems()` returns `NULL` because there are no more parsing issues.
problems(iso8601_dttm2)
# If you pass named arguments when calling `create_iso8601()` then those names
# will be used as column names in the problems object.
iso8601_dttm3 <- create_iso8601(date = dates, .format = "y-m-d")
problems(iso8601_dttm3)
Parallel sequence generation
Description
pseq()
is similar to seq()
but conveniently accepts integer vectors as
inputs to from
and to
, allowing for parallel generation of sequences.
The result is the union of the generated sequences.
Usage
pseq(from, to)
Arguments
from |
An integer vector. The starting value(s) of the sequence(s). |
to |
An integer vector. The ending value(s) of the sequence(s). |
Value
An integer vector.
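As a rough illustration of parallel sequence generation, the sketch below pairs each element of from with the matching element of to and concatenates the resulting sequences. pseq_sketch() is a hypothetical base R stand-in, not the package's code; in particular, it makes no assumption about how the package treats overlapping ranges, so the example uses ranges that do not overlap.

```r
# Hypothetical stand-in for pseq(); not the package's own code.
pseq_sketch <- function(from, to) {
  stopifnot(length(from) == length(to))
  # Build one sequence per (from, to) pair, then flatten into a single vector.
  unlist(Map(seq, from, to))
}

# Two ranges generated in one call: 1..3 and 10..12.
pseq_sketch(from = c(1L, 10L), to = c(3L, 12L))
```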
Read in a controlled terminology
Description
read_ct_spec()
imports a controlled terminology specification data set as a
tibble.
Usage
read_ct_spec(file = cli::cli_abort("`file` must be specified"))
Arguments
file |
A path to a file containing a controlled terminology specification data set. The following are expected of this file:
|
Value
A tibble with a controlled terminology specification.
Examples
# Get the local path to one of the controlled terminology example files.
path <- ct_spec_example("ct-01-cm")
# Import it to R.
read_ct_spec(file = path)
Read an example controlled terminology specification
Description
read_ct_spec_example()
imports one of the bundled controlled terminology
specification data sets as a tibble into R.
Usage
read_ct_spec_example(example)
Arguments
example |
The file name of a controlled terminology data set bundled
with |
Value
A tibble with a controlled terminology specification data set, or a character vector of example file names.
Examples
# Leave the `example` parameter missing to list the available example files.
read_ct_spec_example()
# Read an example controlled terminology spec file.
read_ct_spec_example("ct-01-cm.csv")
# You may omit the file extension.
read_ct_spec_example("ct-01-cm")
Read an example SDTM domain
Description
read_domain_example()
imports one of the bundled SDTM domain examples
as a tibble into R. See domain_example()
for
possible choices.
Usage
read_domain_example(example)
Arguments
example |
The name of SDTM domain example, e.g. |
Value
A tibble with an SDTM domain dataset, or a character vector of example file names.
See Also
Examples
# Leave the `example` parameter missing to list the available example files.
read_domain_example()
# Read the example Concomitant Medication domain.
read_domain_example("cm")
# Read the example Adverse Events domain.
read_domain_example("ae")
Recode values
Description
recode()
recodes values in x
by matching elements in from
onto values
in to
.
Usage
recode(x, from = unique(na.omit(x)), to = from, .no_match = x, .na = NA)
Arguments
x |
An atomic vector of values to be recoded. |
from |
A vector of values to be matched in |
to |
A vector of values to be used as replacement for values in |
.no_match |
Value to be used as replacement when cases in |
.na |
Value to be used to recode missing values. |
Value
A vector of recoded values.
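The roles of from, to, .no_match and .na can be illustrated with a small base R sketch built on match(). recode_sketch() is a hypothetical stand-in that follows the argument descriptions above; it is not the package's implementation, and the example values are made up.

```r
# Hypothetical stand-in for recode(); not the package's own code.
recode_sketch <- function(x, from, to, .no_match = x, .na = NA) {
  # Position of each element of x in `from` (NA when there is no match).
  idx <- match(x, from)
  # Matched values take the corresponding `to`; the rest take `.no_match`.
  out <- ifelse(is.na(idx), .no_match, to[idx])
  # Missing inputs are recoded to `.na`.
  out[is.na(x)] <- .na
  out
}

sex <- c("male", "female", NA, "unknown")

# By default, unmatched values are kept as they are and NA stays NA.
recode_sketch(sex, from = c("male", "female"), to = c("M", "F"))

# Unmatched and missing values can be given explicit replacements.
recode_sketch(sex, from = c("male", "female"), to = c("M", "F"),
              .no_match = "?", .na = "missing")
```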
regmatches() with NA
Description
reg_matches()
is a thin wrapper around regmatches()
that returns
NA
instead of character(0)
when matching fails.
Usage
reg_matches(x, m, invert = FALSE)
Arguments
x |
A character vector. |
m |
An object with match data. |
invert |
A logical scalar. If |
Value
A list of character vectors with the matched substrings, or NA
if
matching failed.
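The difference from plain regmatches() can be seen in a short base R sketch. reg_matches_sketch() is a hypothetical stand-in, not the package's code, and it assumes the match data m comes from gregexpr(), so that the result is a list with one element per input string.

```r
# Hypothetical stand-in for reg_matches(); not the package's own code.
reg_matches_sketch <- function(x, m, invert = FALSE) {
  matches <- regmatches(x, m, invert = invert)
  # Replace empty results, i.e. character(0), with NA.
  matches[lengths(matches) == 0L] <- NA_character_
  matches
}

x <- c("abc123", "no digits")
m <- gregexpr("[0-9]+", x)

regmatches(x, m)         # second element is character(0)
reg_matches_sketch(x, m) # second element is NA
```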
Utility function to assemble a regex of alternative patterns
Description
regex_or()
takes a set of patterns and binds them with the Or ("|"
)
pattern for an easy regex of alternative patterns.
Usage
regex_or(x, .open = FALSE, .close = FALSE)
Arguments
x |
A character vector of alternative patterns. |
.open |
Whether the resulting regex should start with |
.close |
Whether the resulting regex should end with |
Value
A character scalar of the resulting regex.
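For the default case, where both .open and .close are FALSE, the result can be sketched with paste(). regex_or_sketch() is a hypothetical stand-in, not the package's code, and it deliberately leaves out the .open and .close options.

```r
# Hypothetical stand-in for regex_or() with its default arguments only.
regex_or_sketch <- function(x) {
  # Join the alternatives with the regex "or" operator.
  paste(x, collapse = "|")
}

pattern <- regex_or_sketch(c("jan", "feb", "mar"))
pattern

# The combined pattern matches any one of the alternatives.
grepl(pattern, c("feb", "apr"))
```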
Remove the cnd_df class from a data frame
Description
This function removes the cnd_df
class, along with its attributes, if
applicable.
Usage
rm_cnd_df(dat)
Arguments
dat |
A data frame. |
Value
The input dat
without the cnd_df
class and associated attributes.
See Also
new_cnd_df()
, is_cnd_df()
, get_cnd_df_cnd()
,
get_cnd_df_cnd_sum()
.
Subject-level key variables
Description
sbj_vars()
returns the set of variable names that uniquely define
a subject.
Usage
sbj_vars()
Value
A character vector of variable names.
Examples
sbj_vars()
Derive an SDTM variable
Description
sdtm_assign()
is an internal function packing the same functionality as
assign_no_ct()
and assign_ct()
together but aimed at developers only.
As a user please use either assign_no_ct()
or assign_ct()
.
Usage
sdtm_assign(
tgt_dat = NULL,
tgt_var,
raw_dat,
raw_var,
ct_spec = NULL,
ct_clst = NULL,
id_vars = oak_id_vars()
)
Arguments
tgt_dat |
Target dataset: a data frame to be merged against |
tgt_var |
The target SDTM variable: a single string indicating the name of the variable to be derived. |
raw_dat |
The raw dataset (dataframe); must include the
variables passed in |
raw_var |
The raw variable: a single string indicating the name of the
raw variable in |
ct_spec |
Study controlled terminology specification: a dataframe with a
minimal set of columns, see |
ct_clst |
A codelist code indicating which subset of the controlled
terminology to apply in the derivation. This parameter is optional, if left
as |
id_vars |
Key variables to be used in the join between the raw dataset
( |
Value
The returned data set depends on the value of tgt_dat:
- If no target dataset is supplied, meaning that tgt_dat defaults to NULL, then the returned data set is raw_dat, selected for the variables indicated in id_vars, and a new extra column: the derived variable, as indicated in tgt_var.
- If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
Derive an SDTM variable with a hardcoded value
Description
sdtm_hardcode()
is an internal function packing the same functionality as
hardcode_no_ct()
and hardcode_ct()
together but aimed at developers only.
As a user please use either hardcode_no_ct()
or hardcode_ct()
.
Usage
sdtm_hardcode(
tgt_dat = NULL,
tgt_var,
raw_dat,
raw_var,
tgt_val,
ct_spec = NULL,
ct_clst = NULL,
id_vars = oak_id_vars()
)
Arguments
tgt_dat |
Target dataset: a data frame to be merged against |
tgt_var |
The target SDTM variable: a single string indicating the name of the variable to be derived. |
raw_dat |
The raw dataset (dataframe); must include the
variables passed in |
raw_var |
The raw variable: a single string indicating the name of the
raw variable in |
tgt_val |
The target SDTM value to be hardcoded into the variable
indicated in |
ct_spec |
Study controlled terminology specification: a dataframe with a
minimal set of columns, see |
ct_clst |
A codelist code indicating which subset of the controlled
terminology to apply in the derivation. This parameter is optional, if left
as |
id_vars |
Key variables to be used in the join between the raw dataset
( |
Value
The returned data set depends on the value of tgt_dat:
- If no target dataset is supplied, meaning that tgt_dat defaults to NULL, then the returned data set is raw_dat, selected for the variables indicated in id_vars, and a new extra column: the derived variable, as indicated in tgt_var.
- If the target dataset is provided, then it is merged with the raw data set raw_dat by the variables indicated in id_vars, with a new column: the derived variable, as indicated in tgt_var.
SDTM join
Description
sdtm_join()
is a special join between a raw data set and a target data
set. This function supports conditioned data frames.
Usage
sdtm_join(raw_dat, tgt_dat = NULL, id_vars = oak_id_vars())
Arguments
raw_dat |
The raw dataset: a dataframe or a conditioned data frame. Must
include the variables passed in |
tgt_dat |
Target dataset: a data frame or a conditioned data frame to be
merged against |
id_vars |
Key variables to be used in the join between the raw dataset
( |
Value
A data frame, or a conditioned data frame if at least one of the input data sets is a conditioned data frame.
Generate case insensitive regexps
Description
str_to_anycase()
takes a character vector of word strings as input, and
generates regular expressions that match those strings in any case.
Usage
str_to_anycase(x)
Arguments
x |
A character vector of strings consisting of word characters. |
Value
A character vector.
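One way to build such a regular expression is to turn every character into a bracket expression holding its lower-case and upper-case forms. The sketch below does that in base R; str_to_anycase_sketch() is a hypothetical stand-in, and the exact form of the regexps returned by the package is not shown in this manual, so the output here should be read as one possible encoding only.

```r
# Hypothetical stand-in for str_to_anycase(); not the package's own code.
str_to_anycase_sketch <- function(x) {
  vapply(strsplit(x, ""), function(chars) {
    # "j" becomes "[jJ]", "a" becomes "[aA]", and so on.
    paste0("[", tolower(chars), toupper(chars), "]", collapse = "")
  }, character(1))
}

pattern <- str_to_anycase_sketch("jan")
pattern

# The pattern matches the word regardless of case.
grepl(pattern, c("JAN", "Jan", "jun"))
```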
Conditioned tibble header print method
Description
Conditioned tibble header print method. This S3 method adds an extra line
in the header of a tibble that indicates the tibble is a conditioned tibble
(# Cond. tbl:) followed by the tally of the conditioning vector: number
of TRUE, FALSE and NA values, e.g., 1/1/1.
Usage
## S3 method for class 'cnd_df'
tbl_sum(x, ...)
Arguments
x |
A conditioned tibble of class |
... |
Additional arguments passed to the default print method. |
Value
A character vector with header values of the conditioned data frame.
See Also
ctl_new_rowid_pillar.cnd_df()
.
Examples
df <- data.frame(x = c(1L, NA_integer_, 3L))
(cnd_df <- condition_add(dat = df, x >= 2L))
pillar::tbl_sum(cnd_df)
Convert two-digit to four-digit years
Description
yy_to_yyyy()
converts two-digit years to four-digit years.
Usage
yy_to_yyyy(x, cutoff_2000 = 68L)
Arguments
x |
An integer vector of years. |
cutoff_2000 |
An integer value. Two-digit years smaller or equal to
|
Value
An integer vector.
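The sketch below illustrates the conversion under an explicit assumption: it follows the POSIX convention used by the %y format of strptime(), in which two-digit years 00 to 68 map to 2000 to 2068 and 69 to 99 map to 1969 to 1999. That this matches the package's handling of cutoff_2000 is assumed rather than confirmed here, as is the choice to leave values of 100 or more untouched. yy_to_yyyy_sketch() is a hypothetical stand-in, not the package's code.

```r
# Hypothetical stand-in for yy_to_yyyy(); not the package's own code.
yy_to_yyyy_sketch <- function(x, cutoff_2000 = 68L) {
  # Only values in 0..99 are treated as two-digit years (an assumption).
  two_digit <- !is.na(x) & x >= 0L & x < 100L
  # Assumed rule: at or below the cutoff add 2000, otherwise add 1900.
  century <- ifelse(x[two_digit] <= cutoff_2000, 2000L, 1900L)
  x[two_digit] <- x[two_digit] + century
  x
}

yy_to_yyyy_sketch(c(0L, 68L, 69L, 99L, 2024L))
```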
Convert an integer to a zero-padded character vector
Description
zero_pad_whole_number()
takes non-negative integer values and converts
them to character with zero padding. Negative numbers and numbers with
more than n digits
are converted to NA.
Usage
zero_pad_whole_number(x, n = 2L)
Arguments
x |
An integer vector. |
n |
Number of digits in the output, including zero padding. |
Value
A character vector.
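The padding rule and the two NA cases described above can be sketched in base R with formatC(). zero_pad_sketch() is a hypothetical stand-in written for illustration, not the package's implementation.

```r
# Hypothetical stand-in for zero_pad_whole_number(); not the package's own code.
zero_pad_sketch <- function(x, n = 2L) {
  # Valid inputs are non-missing, non-negative and fit within n digits.
  ok <- !is.na(x) & x >= 0L & x < 10^n
  out <- rep(NA_character_, length(x))
  # Left-pad the valid values with zeros up to a width of n characters.
  out[ok] <- formatC(x[ok], width = n, flag = "0", format = "d")
  out
}

# 123 has more than two digits and -1 is negative, so both become NA.
zero_pad_sketch(c(1L, 12L, 123L, -1L))

# A wider field accommodates larger numbers.
zero_pad_sketch(c(7L, 123L), n = 4L)
```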