Title: | Rolling Entry Matching |
Version: | 2.0.4 |
Date: | 2025-04-15 |
Description: | Functions to perform propensity score matching on rolling entry interventions for which a suitable "entry" date is not observed for nonparticipants. For more details, please reference Witman et al. (2018) <doi:10.1111/1475-6773.13086>. |
License: | MIT + file LICENSE |
URL: | https://github.com/RTIInternational/rollmatch |
LazyData: | true |
Depends: | R (≥ 3.0.2) |
Imports: | dplyr (≥ 0.5.0), magrittr (≥ 1.5.0), stats |
Suggests: | testthat (≥ 1.0.2) |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-15 10:22:17 UTC; rchew |
Author: | Rob Chew [aut, cre], Kasey Jones [aut], Mahin Manley [aut], Allison Witman [res], Chris Beadles [res], Yiyan Liu [res], Ann Larson [res] |
Maintainer: | Rob Chew <rchew@rti.org> |
Repository: | CRAN |
Date/Publication: | 2025-04-15 18:30:02 UTC |
Add the balancing table to the final output
Description
Add the balancing table to the final output
Usage
add_balance_table(scored_data, vars, tm, id, combined_output, treat, matches)
Arguments
scored_data |
The dataframe from score_data() |
vars |
See rollmatch() |
tm |
See rollmatch() |
id |
See rollmatch() |
combined_output |
A list of output for the rollmatch package. See make_output |
treat |
See rollmatch() |
matches |
Dataframe containing the matches from comparison_pool |
Value
output
returns a list with the additional output:
balance |
The balancing table. |
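As a rough illustration of what a balancing table summarizes, the sketch below computes group means and standard deviations for one covariate in base R. The data frame `toy`, the helper `balance_sketch()`, and the output column names are hypothetical; this is not the package's internal implementation, and it covers only the full treatment and control groups, not the matched groups.

```r
# Hypothetical toy data: a treatment indicator and one covariate.
toy <- data.frame(
  treat = c(1, 1, 1, 0, 0, 0),
  age   = c(70, 72, 74, 65, 75, 85)
)

# Mean and standard deviation of each covariate by treatment group.
# balance_sketch() is an illustrative helper, not a rollmatch function.
balance_sketch <- function(df, treat, vars) {
  rows <- lapply(vars, function(v) {
    t_vals <- df[[v]][df[[treat]] == 1]
    c_vals <- df[[v]][df[[treat]] == 0]
    data.frame(
      variable       = v,
      treatment_mean = mean(t_vals),
      treatment_sd   = sd(t_vals),
      control_mean   = mean(c_vals),
      control_sd     = sd(c_vals)
    )
  })
  do.call(rbind, rows)
}

balance <- balance_sketch(toy, treat = "treat", vars = "age")
balance
```

Large gaps between the treatment and control columns for a variable suggest that the groups are not well balanced on it.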
Examples
## Not run:
data(package="rollmatch", "rem_synthdata_small")
reduced_data <- reduce_data(data = rem_synthdata_small, treat = "treat",
tm = "quarter", entry = "entry_q",
id = "indiv_id", lookback = 1)
fm <- as.formula(treat ~ qtr_pmt + yr_pmt + age)
vars <- all.vars(fm)
scored_data <- score_data(model_type = "logistic", match_on = "logit", fm = fm,
reduced_data = reduced_data, treat = "treat",
tm = "quarter", entry = "entry_q", id = "indiv_id")
comparison_pool <- compare_pool(scored_data, treat = "treat",
tm = "quarter", entry = "entry_q",
id = "indiv_id")
trimmed_pool <- trim_pool(alpha = .2, comparison_pool = comparison_pool,
scored_data = scored_data, treat = "treat",
tm = "quarter", standard_deviation = 'average')
matches <- create_matches(trimmed_pool = trimmed_pool, tm = "quarter",
num_matches = 3, replacement = TRUE)
matches <- add_matches_columns(matches)
combined_output <- make_output(scored_data = scored_data,
data = rem_synthdata_small,
matches = matches,
treat = "treat", tm = "quarter",
entry = "entry_q", id = "indiv_id", lookback = 1)
# Add balance table to the output
output <- add_balance_table(scored_data = scored_data, vars = vars,
tm = "quarter", id = "indiv_id",
combined_output = combined_output,
treat = "treat", matches = matches)
## End(Not run)
Create Additional Columns for the Matches Dataset
Description
This function takes a dataframe containing match information and adds additional columns to indicate the match rank, total matches for a given treatment ID, treatment weight, control matches, and control weight.
Usage
add_matches_columns(matches)
Arguments
matches |
Dataframe containing the matches from comparison_pool. Each row represents a match, and there should be columns for 'treat_id' and possibly 'control_id' if control matches are to be calculated. |
Value
A dataframe containing the original match information along with additional columns: 'match_rank', 'total_matches', 'treatment_weight', 'control_matches', and 'control_weight'.
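The bookkeeping columns described above can be sketched in base R as follows. The input data frame and its 'difference' column are hypothetical, and the weight columns are omitted because their formulas are not stated here; treat this as an illustration of the idea rather than the package's own code.

```r
# Hypothetical matches: one row per (treatment, control) pair.
matches <- data.frame(
  treat_id   = c("t1", "t1", "t2"),
  control_id = c("c1", "c2", "c1"),
  difference = c(0.05, 0.01, 0.02)
)

# Order so that the closest match comes first within each treatment ID.
matches <- matches[order(matches$treat_id, matches$difference), ]

# Rank of each match within its treatment ID (1 = closest).
matches$match_rank <- ave(matches$difference, matches$treat_id,
                          FUN = function(x) rank(x, ties.method = "first"))

# Number of matches found for each treatment ID.
matches$total_matches <- ave(matches$difference, matches$treat_id,
                             FUN = length)

# Number of times each control ID is used across all treatment IDs.
matches$control_matches <- ave(matches$difference, matches$control_id,
                               FUN = length)
matches
```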
Examples
print('See add_balance_table for full example')
Run checks on variable lookback
Description
Run checks on variable lookback
Usage
check_lookback(data, lookback, entry)
Arguments
lookback |
See rollmatch() |
Create a dataframe of comparisons between all treatment and control data.
Description
Create a dataframe of comparisons between all treatment and control data.
Usage
compare_pool(scored_data, treat, tm, entry, id)
Arguments
scored_data |
The dataframe from score_data() |
tm |
See rollmatch() |
entry |
See rollmatch() |
id |
See rollmatch() |
Value
Dataframe comparing all treatment and control data
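A minimal base R sketch of such a comparison pool is shown below. It assumes that each treated row is paired with every control row from the same time period and that the distance is the absolute score difference; the toy data and the column names 'treat_score', 'control_score', and 'difference' are hypothetical.

```r
# Hypothetical scored data: one row per individual-period with a score.
scored <- data.frame(
  indiv_id = c("t1", "t2", "c1", "c1", "c2"),
  treat    = c(1, 1, 0, 0, 0),
  quarter  = c(1, 2, 1, 2, 1),
  score    = c(0.60, 0.40, 0.55, 0.35, 0.80)
)

treated  <- scored[scored$treat == 1, c("indiv_id", "quarter", "score")]
controls <- scored[scored$treat == 0, c("indiv_id", "quarter", "score")]
names(treated)  <- c("treat_id", "quarter", "treat_score")
names(controls) <- c("control_id", "quarter", "control_score")

# Pair every treated row with every control row from the same period.
pool <- merge(treated, controls, by = "quarter")

# Absolute score distance for each candidate pair.
pool$difference <- abs(pool$treat_score - pool$control_score)
pool
```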
Examples
print('See add_balance_table for full example')
Algorithm to find best matches from the comparison pool
Description
Algorithm to find best matches from the comparison pool
Usage
create_matches(trimmed_pool, tm, num_matches = 3, replacement = TRUE)
Arguments
trimmed_pool |
Dataframe containing the pool from which matches should be found |
tm |
See rollmatch() |
num_matches |
See rollmatch() |
replacement |
See rollmatch() |
Value
Dataframe containing top matches
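The selection step can be sketched in base R as picking the closest candidates per treated unit. The helper `top_matches()`, the toy pool, and the 'difference' column are hypothetical, and the sketch covers only matching with replacement; it is not the package's algorithm.

```r
# Hypothetical trimmed pool of candidate pairs with score distances.
pool <- data.frame(
  treat_id   = c("t1", "t1", "t1", "t2", "t2"),
  control_id = c("c1", "c2", "c3", "c1", "c3"),
  difference = c(0.03, 0.01, 0.02, 0.04, 0.05)
)

# Keep the num_matches closest controls per treated unit.
# Controls may be reused across treated units (matching with replacement).
top_matches <- function(pool, num_matches) {
  pool <- pool[order(pool$treat_id, pool$difference), ]
  rank_in_group <- ave(pool$difference, pool$treat_id, FUN = seq_along)
  pool[rank_in_group <= num_matches, ]
}

best <- top_matches(pool, num_matches = 2)
best
```

Matching without replacement would additionally need to remove a control from the pool once it has been assigned, which this sketch does not attempt.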
Examples
print('See add_balance_table for full example')
Combine the results of rollmatch into a tidy list for output
Description
Combine the results of rollmatch into a tidy list for output
Usage
make_output(scored_data, data, matches, treat, tm, entry, id, lookback)
Arguments
scored_data |
The dataframe from score_data() |
data |
See rollmatch() |
matches |
Dataframe containing the matches from comparison_pool |
tm |
See rollmatch() |
entry |
See rollmatch() |
id |
See rollmatch() |
lookback |
See rollmatch() |
Value
output
returns a list. See rollmatch()
Examples
print('See add_balance_table for full example')
Preprocessing Step to Rolling Entry Matching
Description
Preprocessing Step to Rolling Entry Matching
Usage
reduce_data(data, treat, tm, entry, id, lookback = 1)
Arguments
data |
Original dataset before reduce_data() is run. |
treat |
String for name of treatment variable in data. |
tm |
String for time period indicator variable name in data. |
entry |
String for name of time period in which the participant enrolled in the intervention (in the same units as the tm variable). |
id |
String for individual id variable name in data. |
lookback |
The number of time periods to look back before the time period of enrollment (1, 2, ...). |
Value
reduced_data
returns a dataset of reduced data ready for propensity scoring and for use in the function score_data()
Examples
data(package="rollmatch", "rem_synthdata_small")
reduced_data <- reduce_data(data = rem_synthdata_small, treat = "treat",
tm = "quarter", entry = "entry_q",
id = "indiv_id", lookback = 1)
reduced_data
Synthetic dataset to illustrate rolling entry
Description
This dataset represents a synthetic population of individuals who resemble Medicare fee-for-service patients in terms of age, race, spending, inpatient visits, ED visits, chronic conditions, and dual eligibility. The quasi-panel dataset contains multiple observations of non-participants (one for each entry period). Participants enter the data once in the baseline period immediately preceding their unique entry into the intervention. Time-varying covariates (e.g., health conditions, spending, utilization) are dynamic for each entry period's non-participant observations.
Usage
rem_synthdata
Format
A data frame with 254,400 observations and 20 variables:
- indiv_id
The unique identifier for each individual.
- entry_q
The period in which the individual enrolled in treatment / entered the intervention.
- lq
Last baseline quarter before entry into the intervention.
- quarter
Time variable, indicating the quarter that the variables are measured.
- treat
Treatment indicator variable (=1 if in treatment group and =0 if in control group).
- age
The patient's age.
- is_black
Race indicator variable (=1 if identified as Black, =0 if not).
- is_disabled
Physical disability indicator variable (=1 if identified as disabled, =0 if not).
- is_esrd
Disease indicator variable (=1 if identified as having End Stage Renal Disease (ESRD), =0 if not).
- is_hispanic
Ethnicity indicator variable (=1 if identified as Hispanic, =0 if not).
- is_male
Gender indicator variable (=1 if identified as Male, =0 if not).
- is_white
Race indicator variable (=1 if identified as White, =0 if not).
- lq_ed
Indicates the person had an ED visit during LQ.
- lq_ip
Indicates the person had an inpatient stay during LQ.
- yr_ed2
Count of ED visits during quarters LQ-5 to LQ-1.
- yr_ip2
Count of inpatient stays during quarters LQ-4 to LQ-1.
- months_dual
Number of months of dual Medicare-Medicaid eligibility in the previous year.
- chron_num
Number of chronic conditions.
- qtr_pmt
Payments during the quarter.
- yr_pmt
Payments during the previous 4 quarters.
Synthetic dataset to illustrate rolling entry (small)
Description
This dataset represents a synthetic population of individuals who resemble
Medicare fee-for-service patients in terms of age, race, spending,
inpatient visits, ED visits, chronic conditions, and dual eligibility.
The quasi-panel dataset contains multiple observations of non-participants
(one for each entry period). Participants enter the data once in the baseline
period immediately preceding their unique entry into the intervention.
Time-varying covariates (e.g., health conditions, spending, utilization) are
dynamic for each entry period's non-participant observations.
This is a smaller version of rem_synthdata.
Usage
rem_synthdata_small
Format
A data frame with 12,720 observations and 20 variables:
- indiv_id
The unique identifier for each individual.
- entry_q
The period in which the individual enrolled in treatment / entered the intervention.
- lq
Last baseline quarter before entry into the intervention.
- quarter
Time variable, indicating the quarter that the variables are measured.
- treat
Treatment indicator variable (=1 if in treatment group and =0 if in control group).
- age
The patient's age.
- is_black
Race indicator variable (=1 if identified as Black, =0 if not).
- is_disabled
Physical disability indicator variable (=1 if identified as disabled, =0 if not).
- is_esrd
Disease indicator variable (=1 if identified as having End Stage Renal Disease (ESRD), =0 if not).
- is_hispanic
Ethnicity indicator variable (=1 if identified as Hispanic, =0 if not).
- is_male
Gender indicator variable (=1 if identified as Male, =0 if not).
- is_white
Race indicator variable (=1 if identified as White, =0 if not).
- lq_ed
Indicates the person had an ED visit during LQ.
- lq_ip
Indicates the person had an inpatient stay during LQ.
- yr_ed2
Count of ED visits during quarters LQ-5 to LQ-1.
- yr_ip2
Count of inpatient stays during quarters LQ-4 to LQ-1.
- months_dual
Number of months of dual Medicare-Medicaid eligibility in the previous year.
- chron_num
Number of chronic conditions.
- qtr_pmt
Payments during the quarter.
- yr_pmt
Payments during the previous 4 quarters.
Rolling entry matching
Description
rollmatch is the last of 3 main functions in the rollmatch package.
It implements a comparison group selection
methodology for interventions with rolling participant entry over time.
A difficulty in evaluating rolling entry interventions is that a suitable
"entry" date is not observed for non-participants. This method, called
rolling entry matching, assigns potential comparison non-participants
multiple counterfactual entry periods, which allows participants and
non-participants to be matched on data immediately preceding each
participant's specific entry period, rather than on data from a fixed
pre-intervention period.
Usage
rollmatch(
scored_data,
data,
treat,
tm,
entry,
id,
vars,
lookback,
alpha = 0,
standard_deviation = "average",
num_matches = 3,
replacement = TRUE
)
Arguments
scored_data |
Output from score_data() or the output from reduce_data() with propensity scores labeled "score". |
data |
Original dataset before reduce_data() was run. |
treat |
String for name of treatment variable in data. |
tm |
String for time period indicator variable name in data. |
entry |
String for name of time period in which the participant enrolled in the intervention (in the same units as the tm variable). |
id |
String for individual id variable name in data. |
vars |
Vector of column names used in the propensity score algorithm. This is used when creating the balance table. |
lookback |
The number of time periods to look back before the time period of enrollment (1, 2, ...). |
alpha |
Part of the pre-specified distance within which to allow
matching. The caliper width is calculated as the |
standard_deviation |
String. 'average' for average pooled standard deviation, 'weighted' for weighted pooled standard deviation, and 'None' to not use a standard deviation multiplication. Default is "average". |
num_matches |
Number of comparison beneficiary matches to attempt to assign to each treatment beneficiary. Default is 3. |
replacement |
Assign comparison beneficiaries with replacement (TRUE)
or without replacement (FALSE). If |
Details
Rolling entry matching requires preliminary steps. This package will assist the user in steps 2 and 3. First, a quasi-panel dataset is constructed containing multiple observations of non-participants (one for each entry period). Participants enter the data once in the baseline period immediately preceding their unique entry into the intervention. Time-varying covariates (e.g., health conditions, spending, utilization) are dynamic for each entry period's non-participant observations. The user of rollmatch is expected to have already created this quasi-panel dataset. Second, the pool of potential comparisons for each participant is restricted to those that have the same "entry period" into the intervention (see function "reduce_data"). Third, a predicted probability of treatment is obtained for participants and non-participants (e.g., through a propensity score model). The user can use function "score_data" to complete this step, or use their own propensity score calculation.
The final step consists of the matching algorithm. The algorithm selects the best matched comparison(s) for each participant from the pool of non-participants with the same entry period. This is completed via the function "rollmatch".
Value
rollmatch returns an object of class "rollmatch".
An object of class "rollmatch" is a list containing the following components:
model |
The output of the model used to estimate the distance measure. |
scores |
The propensity scores used in the matching algorithm. |
data |
The original dataset with all matches added. |
summary |
A basic summary table with counts of matched and unmatched data. |
ids_not_matched |
A vector of the treatment IDs that were not matched. |
total_not_matched |
The number of treatment IDs not matched. |
matched_data |
R data.frame of matches with scores, matching information, and the weights of the individuals |
balance |
Table showing the full treatment, full control, matched treatment, and matched comparison group means and standard deviations for the variables used in the model. |
Examples
data(package="rollmatch", "rem_synthdata_small")
reduced_data <- reduce_data(data = rem_synthdata_small, treat = "treat",
tm = "quarter", entry = "entry_q",
id = "indiv_id", lookback = 1)
fm <- as.formula(treat ~ qtr_pmt + yr_pmt + age)
vars <- all.vars(fm)
scored_data <- score_data(reduced_data = reduced_data,
model_type = "logistic", match_on = "logit",
fm = fm, treat = "treat",
tm = "quarter", entry = "entry_q", id = "indiv_id")
output <- rollmatch(scored_data, data=rem_synthdata_small, treat = "treat",
tm = "quarter", entry = "entry_q", id = "indiv_id",
vars = vars, lookback = 1, alpha = .2,
standard_deviation = "average", num_matches = 3,
replacement = TRUE)
output
Run checks on variable inputs
Description
Run checks on variable inputs
Usage
run_checks_one(data, treat, tm, entry, id)
Arguments
data |
See rollmatch() |
treat |
See rollmatch() |
tm |
See rollmatch() |
entry |
See rollmatch() |
id |
See rollmatch() |
Run checks on variable inputs
Description
Run checks on variable inputs
Usage
run_checks_two(data, alpha, standard_deviation, num_matches, replacement)
Arguments
data |
See rollmatch() |
alpha |
See rollmatch() |
standard_deviation |
See rollmatch() |
num_matches |
See rollmatch() |
replacement |
See rollmatch() |
Create propensity scores using a logistic or probit regression model
Description
Create propensity scores using a logistic or probit regression model
Usage
score_data(reduced_data, model_type, match_on, fm, treat, tm, entry, id)
Arguments
reduced_data |
Dataframe of reduced treatment and comparison data. See output of reduce_data(). |
model_type |
Use logistic regression ("logistic") or probit regression ("probit") to estimate the predicted probability of participating. |
match_on |
Match on estimated propensity score ("pscore") or logit of estimated propensity score ("logit"). |
fm |
A |
treat |
String for name of treatment variable in data. |
tm |
String for time period indicator variable name in data. |
entry |
String for name of time period in which the participant enrolled in the intervention (in the same units as the tm variable). |
id |
String for individual id variable name in data. |
Value
A copy of reduced_data input with added propensity scores
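For readers who want to see the underlying idea without the package, the sketch below fits a logistic model with stats::glm() and derives both matching scales: the propensity score and its logit. The toy data and the column names 'pscore' and 'logit' are hypothetical; score_data() may differ in its details.

```r
# Hypothetical toy data; column names are illustrative only.
set.seed(1)
toy <- data.frame(
  treat = rep(c(0, 1), each = 20),
  age   = c(rnorm(20, mean = 70, sd = 5), rnorm(20, mean = 73, sd = 5))
)

# Logistic regression of treatment on the covariate.
# Use binomial(link = "probit") instead for a probit model.
fit <- glm(treat ~ age, data = toy, family = binomial(link = "logit"))

# Estimated propensity score (predicted probability of treatment).
toy$pscore <- as.numeric(predict(fit, type = "response"))

# Logit of the propensity score, the alternative matching scale.
toy$logit <- qlogis(toy$pscore)
head(toy)
```

For a logistic model, the logit of the propensity score equals the linear predictor, so predict(fit, type = "link") gives the same values.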
Examples
## Not run:
data(package="rollmatch", "rem_synthdata_small")
fm <- as.formula(treat ~ qtr_pmt + age + is_male + is_white)
reduced_data <- reduce_data(data = rem_synthdata_small, treat = "treat",
tm = "quarter", entry = "entry_q",
id = "indiv_id", lookback = 1)
scored_data <- score_data(reduced_data = reduced_data,
model_type = "logistic", match_on = "logit",
fm = fm, treat = "treat", tm = "quarter",
entry = "entry_q", id = "indiv_id")
head(scored_data)
## End(Not run)
Use a caliper to trim the data to only observations within threshold
Description
Use a caliper to trim the data to only observations within threshold
Usage
trim_pool(
alpha,
comparison_pool,
scored_data,
treat,
tm,
standard_deviation = "average"
)
Arguments
alpha |
See rollmatch() |
comparison_pool |
Dataframe of comparison data to be trimmed from compare_pool() |
scored_data |
Dataframe of results from score_data() |
treat |
See rollmatch() |
tm |
See rollmatch() |
standard_deviation |
See rollmatch() |
Value
Dataframe of the trimmed comparisons based on the alpha value
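The sketch below illustrates caliper trimming in base R under stated assumptions: the caliper is taken to be alpha multiplied by a pooled standard deviation of the scores, "average" pooling is taken to be the square root of the mean of the two group variances, and "weighted" pooling weights the variances by degrees of freedom. These formulas and all object names are assumptions for illustration, not a description of the package's internals.

```r
# Hypothetical score vectors for the two groups.
treat_scores   <- c(0.2, 0.4, 0.6)
control_scores <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# Assumed "average" pooling: square root of the mean of the two variances.
pooled_sd_average <- sqrt((var(treat_scores) + var(control_scores)) / 2)

# Assumed "weighted" pooling: variances weighted by degrees of freedom.
n_t <- length(treat_scores)
n_c <- length(control_scores)
pooled_sd_weighted <- sqrt(
  ((n_t - 1) * var(treat_scores) + (n_c - 1) * var(control_scores)) /
    (n_t + n_c - 2)
)

# Caliper width under the "average" assumption.
alpha   <- 0.5
caliper <- alpha * pooled_sd_average

# All candidate pairs and their score distances.
pool <- expand.grid(treat_score = treat_scores,
                    control_score = control_scores)
pool$difference <- abs(pool$treat_score - pool$control_score)

# Keep only the pairs whose distance falls inside the caliper.
trimmed <- pool[pool$difference <= caliper, ]
trimmed
```

A smaller alpha gives a narrower caliper, which keeps fewer but closer candidate pairs.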
Examples
print('See add_balance_table for full example')