Title: | Helper Files for the CTN-0094 Relational Database |
Version: | 1.0.4 |
Date: | 2023-11-21 |
Description: | Engineered features and "helper" functions ancillary to the 'public.ctn0094data' package, extending this package for ease of use (see https://CRAN.R-project.org/package=public.ctn0094data). This 'public.ctn0094data' package contains harmonized datasets from some of the National Institute of Drug Abuse's Clinical Trials Network (NIDA's CTN) projects. Specifically, the CTN-0094 project is to harmonize and de-identify clinical trials data from the CTN-0027, CTN-0030, and CTN-51 studies for opioid use disorder. This current version is built from 'public.ctn0094data' v. 1.0.6. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
Depends: | R (≥ 2.10), public.ctn0094data |
Imports: | dplyr, magrittr, purrr, tibble, tidyr, utils |
Suggests: | knitr, kableExtra, rmarkdown, scales, stringr |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
URL: | https://ctn-0094.github.io/public.ctn0094extra/ |
BugReports: | https://github.com/CTN-0094/public.ctn0094extra/issues |
NeedsCompilation: | no |
Packaged: | 2023-11-21 21:21:53 UTC; gabrielodom |
Author: | Gabriel Odom |
Maintainer: | Gabriel Odom <gabriel.j.odom@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-11-22 10:00:02 UTC |
Create a Subject History Table by Protocol
Description
Create a Subject History Table by Protocol
Usage
CreateCTN30ProtocolHistory(
randCTN30_df,
start_int = -30,
phase1Len_int = 98,
phase2Len_int = 168
)
Arguments
randCTN30_df |
A data frame of subject randomization days for CTN-0030.
This data will have columns for |
start_int |
When should the protocol timeline start? Defaults to 30 days before consent (allowing for self-reported drug use via Timeline Follow- Back data). |
phase1Len_int |
When should Phase I end per protocol? Defaults to 98 days (14 weeks). |
phase2Len_int |
When should Phase II end per protocol? Defaults to 168 days (24 weeks). |
Details
We may want to perform SQL-like operations on a set of tables. This
data table will form the "backbone" for future join operations. It creates
one record per person in a study for each day in that study, and it is
designed to work with subject-specific two-arm protocols (where Phase I
and Phase II treatments could start on different days for each subject).
For subjects who consented to treatment but were never randomized, we use
an "intent to treat" philosophy and assign them an empty protocol timeline
starting on the day of consent (day 0) until phase1Len_int
.
NOTE: We expect this function to be used specifically to create a
potential visit backbone for CTN-0030. For studies with fixed protocol
lengths, please use CreateSubjectHistory()
instead.
Value
A tibble with columns who
and when
. Each subject will
have one row for each day in the study range.
Examples
# Subject A started Phase I on day 2 and was switched from treatment
# Phase I to Phase II on day 45. Subject B started Phase I on day 6 and
# was never switched to Phase II (either because they dropped out of the
# study or because the treatment given to them in Phase I worked).
rand_df <- data.frame(
who = c("A", "A", "B", "B"),
which = c( 1, 2, 1, 2),
when = c( 2, 45, 6, NA)
)
# Based on this example data above, we expect to see potential contact
# days per the protocol for subject A to range from -30 to 45 + 168.
# For subject B, we expect this range to be from -30 to 6 + 98
CreateCTN30ProtocolHistory(rand_df)
Create a Subject History Table
Description
Create a Subject History Table
Usage
CreateHistory(
rawData_ls,
personsTable = "everybody",
firstDay_int = NULL,
lastDay_int = NULL
)
Arguments
rawData_ls |
a list of tibbles returned by |
personsTable |
What is the name of the data table that contains all the
subject IDs? Defaults to |
firstDay_int |
(OPTIONAL) What should be the first "day" counter for all subjects? This may be beneficial to set if you wish to have the History ignore all days before a certain point. |
lastDay_int |
(OPTIONAL) What should be the last "day" counter for all subjects? |
Details
We may want to perform SQL-like operations on a set of tables. This data table will form the "backbone" for future join operations. It creates one record per person for each day in the study. The default behavior is to set the date range to the smallest observed value for "when" across all data tables to the largest observed value for "when" across all tables, inclusive. Necessarily, this is very sensitive to outliers or coding errors (for instance, if one subject has a "first day" 200 days before Study Day 0, then this History table will include days -200 to -1 for ALL subjects, regardless of any recorded contact in this period).
If this behavior is not desirable, you must specify a study range. For
example, you may have a short recruitment period followed by a 6-month
clinical trial. In this instance, you may want to ignore any data more
than 30 days prior to Study Day 0 for each subject or more than a few
weeks after the end of the trial. Thus, you would set
firstDay_int = -30L
and lastDay_int = 6 * 30 + 14
.
Value
A tibble with columns who
and when
. Each subject will
have one row for each day in the study range.
Examples
data_ls <- loadRawData(c("tlfb", "all_drugs", "everybody"))
CreateHistory(rawData_ls = data_ls, firstDay_int = -30)
Create a Subject History Table by Protocol
Description
Create a Subject History Table by Protocol
Usage
CreateProtocolHistory(start_vec, end_vec, persons_df = "everybody")
Arguments
start_vec |
a named integer vector with the number of days before subject consent when the subject history should start, per protocol |
end_vec |
a named integer vector with the length of the study phase of interest, per protocol |
persons_df |
Either the name of the data frame that contains all the
subject IDs and their clinical trials (which defaults to
|
Details
We may want to perform SQL-like operations on a set of tables. This
data table will form the "backbone" for future join operations. It creates
one record per person in each study for each day in those studies (when
persons_df
is set to "everybody"
), or it creates one record
per person contained in the table persons_df
for each day in those
studies. The default behavior is to use the supplied "everybody"
table for all consenting subjects in the CTN-0027, CTN-0030, and CTN-0051
clinical trials. However, users may only care about a smaller subset of
these patients, so a subset of the "everybody"
data frame can be
supplied to the persons_df
argument if desired.
NOTE: this function is only appropriate for trial with fixed start and
end days (such as CTN-0027 or CTN-0051). For studies with variable-length
(i.e., subject-specific) protocol lengths, please use
CreateSubjectProtocolHistory()
instead.
Value
A tibble with columns who
, project
, and when
.
Each subject will have one row for each day in the study range.
Examples
start_int <- c(`27` = -30L, `51` = -30L)
end_int <- c(`27` = 168L, `51` = 168L)
CreateProtocolHistory(
start_vec = start_int, end_vec = end_int
)
Code Empty Visit Values as "Missing" as Appropriate
Description
Given a complete timeline of potential subject visits per study protocol, mark certain visits as "Missing"
Usage
MarkMissing(timeline_df, windowWidth = 7, daysGrace = 0)
Arguments
timeline_df |
A data frame with columns |
windowWidth |
How many days are expected between clinic visits? Defaults to 7, representing weekly clinic visits. |
daysGrace |
How many days late are subjects allowed to be for their weekly visit. Defaults to 0. Under this default behavior with weekly visits, a subject who visits the clinic on days 8 and 14 instead of days 7 and 14 will have a missing visit imputed for day 7. |
Details
Most definitions of opioid use disorder treatment success or failure partially depend on a tally of the number of missed clinic visits. For example, a definition of early treatment failure could be "3 or more UDS positive for non-study opioids or missing visits within the first 28 days of randomization". Given a table of subject visits by day over the entire protocol timeline, this function will estimate when each subject missed a clinic visit (unfortunately, missed visits can often be improperly recorded in the patient logs; if such information is complete, using this function is unnecessary).
This estimation is conducted as follows: (1) first, for each subject, a
regular grid of days is spread from the randomization day to the end of
treatment by windowWidth
; (2) next, we iterate over each day in
this regular grid, and at each step we check the next windowWidth
plus daysGrace
days for a visit in that range, and we mark the day
at the end of the window as "missing" if there are no visits in that
range; (3) and finally, we combine these subject-specific data tables.
Value
A copy of timeline_df
with the column visitYM
added.
This column is a copy of the visit
column with additional cells
marking if a subject should have attended the clinic but did not.
Examples
# TO DO
Mark Use Day by Subject
Description
Mark Use Day by Subject
Usage
MarkUse(
targetDrugs_char,
drugs_df = NULL,
reportSource = c("TFB", "UDSAB", "UDS"),
retainEmptyRows = FALSE
)
Arguments
targetDrugs_char |
A character vector including which drugs should be counted against the subject |
drugs_df |
A data frame with columns |
reportSource |
A character vector matching the source of the reported
drug use. The options must be from Timeline Followback ( |
retainEmptyRows |
A logical flag to force rows for participants who did
not have UDS positive for the substances listed in |
Details
This function is basically just a fancy wrapper around some dplyr code. We just don't want the user to have to 1) know dplyr, or 2) write the code themselves.
Value
A modification of the drugs_df
data set: the columns are
"who"
, "when"
, and "source"
; each row corresponds
to one use day per subject per use source (if, for instance, there is drug
use for a particular day recorded in both TFB and UDS, then that day will
have two rows in the resulting data set).
Examples
MarkUse(c("Crack", "Pcp", "Opioid"))
Derived Induction Delay Data
Description
This data set measures the number of days from a participant's randomization until they received their first dose of study drug.
Usage
data(derived_inductDelay)
Format
A tibble with 2,492 rows and 3 variables:
- who
Patient ID
- treatment
What treatment is prescribed: "Inpatient BUP", "Inpatient NR-NTX", "Methadone", "Outpatient BUP", "Outpatient BUP + EMM", "Outpatient BUP + SMM"
- inductDelay
How many days after being assigned to a treatment arm did the participant receive their first dose of study drug? Missing values indicate that the subject never received their first dose.
Details
This data set is a derived data set. The inputs are the treatment
and randomization data sets. The code to calculate the induction delay is
given in scripts/create_inductDelay_20210729.R
. The treatment arm is
also included in this data set because the induction delay depends on the
type of treatment. For example, inpatient treatment arms may have
different protocols than outpatient treatment arms for determining how
many days the subject must wait after randomization before treatment.
Derived Patient Race and Ethnicity Data
Description
Summarize the patients' self-reported race and ethnicity into four groups: "Non-Hispanic White", "Non-Hispanic Black", "Hispanic", and "Other".
Usage
data(derived_raceEthnicity)
Format
A tibble with 3,560 rows and columns:
- who
Patient ID
- race
Self-reported race. Options are "White", "Black", "Other", and "Refused/missing".
- is_hispanic
Self-reported Hispanic/Latino ethnicity.
- race_ethnicity
Derived composite marker of race and ethnicity.
Details
This data set contains a summary of self-reported race and ethnicity
in four levels. Of note, the "Other" category includes 2 participants who
marked "no" to the question of Hispanic/Latino ethnicity but refused to
answer their race. Also, the "Other" category includes 33 participants for
whom all information about race and ethnicity is missing. This data set is
a derived data set; the script used to create it is
"scripts/create_raceEthnicity_20220816.R"
.
Imputed Patient Visit Data
Description
Given a series of weekly clinic visits described per protocol, this data marks subjects as present or missing.
Usage
data(derived_visitImputed)
Format
A tibble with 87,891 rows and columns:
- who
Patient ID
- when
Study day
- visitImputed
-
Marked as
"Present"
if the subject visited the clinic on that day, or"Missing"
if the subject did not visit the clinic on a day we would have expected them to (based on regular weekly visits).
Details
This contains planned visits. Not all appointments were kept. We
indicate if an appointment was kept on a certain day by marking the
subject as "Present"
on that day. If the subject goes more than 7 days
without a clinic visit, we mark the subject as "Missing"
on days that
are multiples of 7 from the randomization day. For subjects without a
randomization day, weekly visits after day of consent are marked as
"Missing"
instead. This data set is a derived data set; the script used
to create it is "scripts/create_visitImputed_20210909.R"
.
NOTE: because our window is a strict weekly window, a subject who shows up for their clinic visit one or more days late will still be marked as missing on the day they were supposed to appear. This means that some subjects will be marked as having missed their weekly clinic visit on one day, but be present in the clinic the next.
Patient UDS Opioid Weekly Pattern Data
Description
Show the pattern of positive, negative, and missing urine drug screen (UDS) results for opioids by patient over the study protocol. Study "Week 1" starts the day after randomization (for patients who were randomized) or the day after signed consent (for patients who were not randomized).
Usage
data(derived_weeklyOpioidPattern)
Format
A tibble with 3,560 rows and columns:
- who
Patient ID
- startWeek
The start of the "word" is how many weeks before randomization? This should be -4 for most people, but can be as high as -8. Note that week 0 is included, so a value of -4 represents data in the 5th week before randomization; that is, 29-35 days prior to randomization. Most subjects have timeline follow-back data 30 days before consent, and delays between consent and randomization are common.
- randWeek1
Week of first randomization (1, if randomized; NA if not)
- randWeek2
Week of second randomization (only for CTN-0030)
- endWeek
The end of the "word" is how many weeks after randomization? This depends on the study protocol, but should be close to 16 or 24 weeks for most subjects.
- Baseline
A character string of symbols from
startWeek
to the last week beforerandWeek1
. Symbols are as defined inPhase_1
.- Phase_1
A character string of symbols from
randWeek1
toendWeek
(for subjects from CTN-0027 and CTN-0051) or the last week beforerandWeek2
(for CTN-0030). These symbols are:"+"
= the subject's UDS showed presence of an opioid in that week;"-"
= the subject's UDS showed absence of an opioid in that week;"*"
= two or more UDS in the same week, where >= 1 UDS was positive and >= 1 UDS was negative;"o"
if the subject was supposed to visit a clinic per the study protocol but did not visit or did not complete a UDS screen; and"_"
to represent weeks wherein the subject was not scheduled to provide UDS.- Phase_2
A character string of symbols from
randWeek2
toendWeek
for subjects from CTN-0030 only. Symbols are as defined inPhase_1
.
Details
This data set contains a "word" describing weekly non-study opioid
use patterns as measured by UDS. Based on the substances screened in this
data set, our list of substances classified as an opioid is: Oxymorphone,
Opium, Fentanyl, Hydromorphone, Codeine, Suboxone, Tramadol, Morphine,
Buprenorphine, Hydrocodone, Opioid, Methadone, Oxycodone, and Heroin. UDS
results indicating the presence of one or more of these substances will be
marked with "+"
for that week. UDS results negative for these
substances will be marked with "-"
. If, by study protocol, the
subject was supposed to visit a clinic to complete a UDS but they did not,
then the visit for that week will be marked with "o"
. If, by study
protocol, the subject was NOT supposed to visit a clinic to complete a UDS
for a given week, then visit for that week will be marked with "_"
.
This data set is a derived data set; the script used to create it is
"scripts/create_weeklyOpioidPattern_20211123.R"
.
NOTE: because our window is a strict weekly window, a subject could have
both positive and negative UDS within the same 7-day period. In this case,
the week is marked as "*"
. Depending on the definition of treatment
failure or treatment success desired, these dual-status indicators can be
re-coded to "+"
or "-"
as appropriate. Also, some studies
include a baseline UDS in the week of consent (before the subject was
randomized to a treatment arm). Some subjects were randomized in the same
week as the week wherein consent was signed, while other subjects were
randomized weeks later. We represent the weeks before consent and the
variable number of weeks between consent and randomization with "_"
if there were no baseline UDS visits in those weeks (this represents the
pre-study period). For subjects who were never randomized, the weeks
before consent are also marked as pre-study ("_"
), and all
subsequent protocol weeks are marked as missing.
Patient TLFB Opioid Weekly Pattern Data
Description
Show the pattern of positive and negative patient self-report (timeline follow-back, TLFB) results for opioids by patient over the study protocol. Study "Week 1" starts the day after randomization (for patients who were randomized) or the day after signed consent (for patients who were not randomized).
Usage
data(derived_weeklyTLFBPattern)
Format
A tibble with 3,560 rows and columns:
- who
Patient ID
- startWeek
The start of the "word" is how many weeks before randomization? This should be -4 for most people, but can be as high as -8. Note that week 0 is included, so a value of -4 represents data in the 5th week before randomization; that is, 29-35 days prior to randomization. Most subjects have timeline follow-back data 30 days before consent, and delays between consent and randomization are common.
- randWeek1
Week of first randomization (1, if randomized; NA if not)
- randWeek2
Week of second randomization (only for CTN-0030)
- endWeek
The end of the "word" is how many weeks after randomization? This depends on the study protocol, but should be close to 16 or 24 weeks for most subjects.
- Baseline
A character string of symbols from
startWeek
to the last week beforerandWeek1
. Symbols are as defined inPhase_1
.- Phase_1
A character string of symbols from
randWeek1
toendWeek
(for subjects from CTN-0027 and CTN-0051) or the last week beforerandWeek2
(for CTN-0030). These symbols are:"+"
= the subject reported the use of an opioid for two or more days in that week;"-"
= the subject did not report use of an opioid in that week;"*"
= the subject reported one day of opioid use in that week;"o"
if the subject was supposed to report TLFB but did not; and"_"
to represent weeks wherein the subject was not scheduled to provide TLFB (for example, more than 30 days before randomization).- Phase_2
A character string of symbols from
randWeek2
toendWeek
for subjects from CTN-0030 only. Symbols are as defined inPhase_1
.
Details
This data set contains a "word" describing weekly non-study opioid
use patterns as reported by the subject in TLFB. Based on the substances
reported by the subjects, our list of substances classified as an opioid
is: non-study Buprenorphine, non-study Methadone, heroin, and "opioids"
(which includes Oxymorphone, Opium, Fentanyl, Hydromorphone, Codeine,
Suboxone, Tramadol, Morphine, Hydrocodone, and Oxycodone). TLFB reporting
indicating the presence of two or more use days of these substances will
be marked with "+"
for that week. TLFB results positive for these
substances for 0 days will be marked with "-"
. TLFB results
positive for these substances for 1 day in the week will be marked with
"*"
for that week. This data set is a derived data set; the script
used to create it is
"scripts/create_weeklyTLFBPattern_20220511.R"
.
NOTE: some studies collected more TLFB data than others. Also, all times
are marked starting with the week of randomization. We represent the weeks
before randomization with "_"
if no TLFB data was collected. For
subjects who were never randomized, all subsequent protocol weeks are
marked as missing ("o"
).
Load Data Sets into a List
Description
Load Data Sets into a List
Usage
loadRawData(dataNames_char)
Arguments
dataNames_char |
Names of data sets to load |
Details
We may want to perform SQL-like operations on a set of tables without loading each table into R's Global Environment separately. This function loads these data sets into a self-destructing environment and then returns a named list of these data sets.
Value
Loads data sets specified into the current function environment for further evaluation (unused) and then returns these data sets as a named list
Examples
loadRawData(c("tlfb", "all_drugs"))