Title: | Regression Discontinuity Design Application |
Version: | 1.3.3 |
Author: | Ze Jin [aut], Wang Liao [aut], Irena Papst [aut], Wenyu Zhang [aut], Kimberly Hochstedler [aut], Felix Thoemmes [aut, cre] |
Maintainer: | Felix Thoemmes <fjt36@cornell.edu> |
Description: | Estimation of both single- and multiple-assignment Regression Discontinuity Designs (RDDs). Provides both parametric (global) and non-parametric (local) estimation choices for both sharp and fuzzy designs, along with power analysis and assumption checks. Introductions to the underlying logic and analysis of RDDs are in Thistlethwaite, D. L., Campbell, D. T. (1960) <doi:10.1037/h0044319> and Lee, D. S., Lemieux, T. (2010) <doi:10.1257/jel.48.2.281>. |
Depends: | R (≥ 3.2.3) |
Imports: | AER (≥ 1.2-5), sandwich (≥ 2.3-4), lmtest (≥ 0.9-35), Formula (≥ 1.2-1), shiny (≥ 0.14), R.utils (≥ 2.6.0), plot3D (≥ 1.1.1), sp (≥ 1.3.1), DT (≥ 0.2) |
Suggests: | foreign (≥ 0.8-67), devtools (≥ 1.12.0), testthat (≥ 1.0.2), roxygen2 (≥ 5.0.1), knitr (≥ 1.14), rmarkdown (≥ 1.1.9012) |
VignetteBuilder: | knitr |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Collate: | 'attr_check.R' 'bw_ik09.R' 'bw_ik12.R' 'data.R' 'wt_kern.R' 'dc_test.R' 'treat_assign.R' 'wt_kern_bivariate.R' 'mfrd_est.R' 'var_center.R' 'rd_est.R' 'mrd_est.R' 'mrd_impute.R' 'mrd_power.R' 'mrd_sens_bw.R' 'mrd_sens_cutoff.R' 'plot.mfrd.R' 'predict.rd.R' 'plot.rd.R' 'print.mfrd.R' 'print.rd.R' 'rd_impute.R' 'rd_power.R' 'rd_sens_bw.R' 'rd_sens_cutoff.R' 'rd_type.R' 'rddapp-package.R' 'sens_plot.R' 'shiny_run.R' 'summary.mfrd.R' 'summary.mrd.R' 'summary.mrdi.R' 'summary.mrdp.R' 'summary.rd.R' 'summary.rdp.R' |
NeedsCompilation: | no |
Packaged: | 2025-07-23 18:18:22 UTC; fjt36 |
Repository: | CRAN |
Date/Publication: | 2025-07-23 18:30:02 UTC |
Regression Discontinuity Design Application
Description
rddapp: A package for regression discontinuity designs (RDDs).
Details
The rddapp package provides a set of functions for the analysis of the regression-discontinuity design (RDD). The three main parts are: estimation of effects of interest, power analysis, and assumption checks.
Estimation
A variety of designs can be estimated in various ways. The single-assignment RDD (both sharp and fuzzy) can be analyzed using either a parametric (global) or a non-parametric (local) approach. The multiple-assignment RDD (both sharp and fuzzy) can likewise be analyzed using parametric or non-parametric estimation. For the multiple-assignment RDD, a further analysis choice is whether to estimate effects based on univariate scaling, the centering approach, or the frontier approach. The frontier approach can currently only be estimated using parametric regression with bootstrapped standard errors.
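To make the parametric (global) versus non-parametric (local) distinction concrete for a sharp single-assignment design, the following base R sketch fits both variants with lm(). It does not call any rddapp function; the simulated data, the bandwidth h = 0.5, and all object names are assumptions made for this illustration only.

```r
set.seed(1)
n <- 2000
x <- runif(n, -1, 1)                  # assignment variable, cutoff at 0
t <- as.numeric(x >= 0)               # sharp design: cutoff fully determines treatment
y <- 1 + 0.5 * x + 2 * t + rnorm(n)   # simulated outcome, true discontinuity is 2
d <- data.frame(y = y, x = x, t = t)

# Parametric (global): one linear model on all observations,
# allowing a different slope on each side of the cutoff
fit_global <- lm(y ~ t * x, data = d)
est_global <- unname(coef(fit_global)["t"])

# Non-parametric (local): the same model, restricted to a window of
# half-width h around the cutoff and weighted by a triangular kernel
h <- 0.5                              # hypothetical bandwidth, not an optimal one
d$w <- pmax(0, 1 - abs(d$x) / h)
fit_local <- lm(y ~ t * x, data = d[d$w > 0, ], weights = w)
est_local <- unname(coef(fit_local)["t"])

round(c(global = est_global, local = est_local), 2)
```

The local fit trades precision for robustness to misspecification away from the cutoff, which is why the choice of bandwidth matters.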
Power analysis
Statistical power can be estimated for both the single- and multiple-assignment RDD (both sharp and fuzzy), including all parametric and non-parametric estimators mentioned in the estimation section. All power analyses are based on a simulation approach, which means that the user has to provide all necessary parameters for a data-generating model.
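The simulation approach can be sketched in base R as follows: generate data repeatedly from a fully specified model, estimate the effect each time, and report the share of replications in which the estimate is significant. This is only an outline of the general idea, not the package's implementation; the function sim_power(), its arguments, and the data-generating model are hypothetical.

```r
set.seed(2)

# Hypothetical helper: empirical power of a global linear sharp RD estimator
sim_power <- function(num_rep = 200, n = 200, effect = 1,
                      alpha = c(0.001, 0.01, 0.05)) {
  p_values <- replicate(num_rep, {
    x <- rnorm(n)                          # assignment variable
    t <- as.numeric(x >= 0)                # sharp assignment at cutoff 0
    y <- 0.5 * x + effect * t + rnorm(n)   # user-specified data-generating model
    summary(lm(y ~ t + x))$coefficients["t", "Pr(>|t|)"]
  })
  # Empirical power: proportion of replications with p below each alpha
  power <- sapply(alpha, function(a) mean(p_values < a))
  names(power) <- paste0("alpha_", alpha)
  power
}

power <- sim_power()
power
```

Because every quantity in the data-generating model is chosen by the analyst, the resulting power is only as realistic as those choices.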
Assumption checks
An important part of any RDD analysis is checking the underlying assumptions. The package provides McCrary's sorting test (to identify violations of assignment rules), checks for discontinuities in other baseline covariates, sensitivity checks of the chosen bandwidth parameter for non-parametric models, and so-called placebo tests, which examine the treatment effect at other cut-points along the assignment variable.
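The logic of a placebo test can be illustrated with a short base R sketch: the same local linear estimator is applied at the true cut-point and at cut-points where no treatment changes, and only the former should show a jump. The helper jump_at(), the bandwidth, and the simulated data are assumptions for illustration and do not reproduce the package's own sensitivity functions.

```r
set.seed(3)
n <- 2000
x <- runif(n, -1, 1)
y <- 1 + 0.5 * x + 2 * (x >= 0) + rnorm(n)   # the only true jump is at x = 0

# Hypothetical helper: local linear estimate of the jump at a given cut-point
jump_at <- function(cut, h = 0.3) {
  xc <- x - cut
  d <- data.frame(y = y, xc = xc,
                  t = as.numeric(xc >= 0),
                  w = pmax(0, 1 - abs(xc) / h))   # triangular kernel weights
  d <- d[d$w > 0, ]
  unname(coef(lm(y ~ t * xc, data = d, weights = w))["t"])
}

cuts <- c(-0.5, 0, 0.5)     # two placebo cut-points around the true one
jumps <- sapply(cuts, jump_at)
round(setNames(jumps, cuts), 2)
```

With h = 0.3, the windows around the placebo cut-points do not overlap the true cutoff, so their estimates should be close to zero.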
Author(s)
Ze Jin zj58@cornell.edu, Wang Liao wl483@cornell.edu, Irena Papst ip98@cornell.edu, Wenyu Zhang wz258@cornell.edu, Kimberly Hochstedler kah343@cornell.edu, Felix Thoemmes fjt36@cornell.edu
Carolina Abecedarian Project and the Carolina Approach to Responsive Education (CARE), 1972-1992
Description
A dataset containing a subset of children from the CARE trial on early childhood intervention. The randomized controlled trial was subsetted to mimic a regression-discontinuity design in which treatment was assigned only to mothers whose IQ was below 85.
Usage
CARE
Format
A data frame with 81 rows and 5 variables:
- SUBJECT
Unique ID variable
- DC_TRT
Day Care (Preschool) Treatment Group, 1 = Treatment, 0 = Control
- APGAR5
APGAR ("Appearance, Pulse, Grimace, Activity, and Respiration") score at 5 minutes after birth
- MOMWAIS0
Biological mother's WAIS (Wechsler Adult Intelligence Scale) full-scale score at subject's birth
- SBIQ48
Subject's Stanford Binet IQ score at 48 months
Source
http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/4091
Examples
data("CARE")
head(CARE)
Attrition Checks
Description
attr_check reports missing data on the treatment variable, the assignment variable, and the outcome.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::attr_check().
Usage
attr_check(x1, y, t, x2 = NULL)
Arguments
x1 |
A numeric object containing the assignment variable. |
y |
A numeric object containing the outcome variable, with the same dimensionality
as |
t |
A numeric object containing the treatment variable (coded as 0 for untreated and 1 for treated), with the same dimensionality
as |
x2 |
A numeric object containing the secondary assignment variable. |
Value
attr_check returns a list containing the amount and percentage of missing data for all variables and subgroups, by treatment.
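The kind of summary described here can be approximated in base R as shown below. This is a sketch of the idea (counts and percentages of missing values, overall and by treatment group), not the internal function's actual output format; the toy vectors and the helper missing_summary() are hypothetical.

```r
# Toy data with missing values (hypothetical)
x1 <- c(0.2, NA, -0.4, 0.9, -1.1, 0.3)   # assignment variable
y  <- c(1.5, 2.0, NA, 3.1, NA, 2.2)      # outcome
t  <- c(1, 1, 0, 1, 0, NA)               # treatment, 0 = untreated, 1 = treated

# Count and percentage of missing values for one variable
missing_summary <- function(v) {
  c(n_missing = sum(is.na(v)), pct_missing = 100 * mean(is.na(v)))
}

# Overall missingness for each variable (one column per variable)
overall <- sapply(list(x1 = x1, y = y, t = t), missing_summary)

# Missing outcomes split by observed treatment group
y_missing_by_t <- tapply(is.na(y), t, sum)

overall
y_missing_by_t
```

Differential missingness across treatment groups, as in this toy example, is the pattern such a check is meant to surface.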
Imbens-Kalyanaraman 2009 Optimal Bandwidth Calculation
Description
bw_ik09 calculates the Imbens-Kalyanaraman (2009) optimal bandwidth for local linear regression in regression discontinuity designs.
It is based on the IKbandwidth function in the "rdd" package.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::bw_ik09().
Usage
bw_ik09(X, Y, cutpoint = NULL, verbose = FALSE, kernel = "triangular")
Arguments
X |
A numeric vector containing the running variable. |
Y |
A numeric vector containing the outcome variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
verbose |
A logical value indicating whether to print more information to the terminal.
The default is |
kernel |
A string indicating which kernel to use. Options are |
Value
bw_ik09 returns a numeric value specifying the optimal bandwidth.
References
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Imbens-Kalyanaraman 2012 Optimal Bandwidth Calculation
Description
bw_ik12 calculates the Imbens-Kalyanaraman (2012) optimal bandwidth for local linear regression in regression discontinuity designs.
It is based on a function in the "rddtools" package.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::bw_ik12().
Usage
bw_ik12(X, Y, cutpoint = NULL, verbose = FALSE, kernel = "triangular")
Arguments
X |
A numeric vector containing the running variable. |
Y |
A numeric vector containing the outcome variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
verbose |
A logical value indicating whether to print more information to the terminal.
The default is |
kernel |
A string indicating which kernel to use. Options are |
Value
bw_ik12 returns a numeric value specifying the optimal bandwidth.
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Stigler, M., Quast, B. (2016). rddtools: A toolbox for regression discontinuity in R.
McCrary Sorting Test
Description
dc_test implements the McCrary (2008) sorting test to identify violations of assignment rules.
It is based on the DCdensity function in the "rdd" package.
Usage
dc_test(
runvar,
cutpoint,
bin = NULL,
bw = NULL,
verbose = TRUE,
plot = TRUE,
ext.out = FALSE,
htest = FALSE,
level = 0.95,
digits = max(3, getOption("digits") - 3),
timeout = 30
)
Arguments
runvar |
A numeric vector containing the running variable. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bin |
A numeric value containing the binwidth. The default is |
bw |
A numeric value containing bandwidth to use. If no bandwidth is supplied, the default uses bandwidth selection calculation from McCrary (2008). |
verbose |
A logical value indicating whether to print diagnostic information to
the terminal. The default is |
plot |
A logical value indicating whether to plot the histogram and density estimations.
The default is |
ext.out |
A logical value indicating whether to return extended output.
The default is |
htest |
A logical value indicating whether to return an |
level |
A numerical value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display in all output.
The default is |
timeout |
A non-negative numerical value specifying the maximum number of seconds that
expressions in the function are allowed to run. The default is 30. Specify |
Value
If ext.out is FALSE, dc_test returns a numeric value specifying the p-value of the McCrary (2008) sorting test.
Additional output is enabled when ext.out is TRUE.
In this case, dc_test returns a list with the following elements:
theta |
The estimated log difference in heights of the density curve at the cutpoint. |
se |
The standard error of |
z |
The z statistic of the test. |
p |
The p-value of the test. A p-value below the significance threshold indicates that the user can reject the null hypothesis of no sorting. |
binsize |
The calculated size of bins for the test. |
bw |
The calculated bandwidth for the test. |
cutpoint |
The cutpoint used. |
data |
A dataframe for the binning of the histogram. Columns are |
References
McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2), 698-714. doi:10.1016/j.jeconom.2007.05.005.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Examples
set.seed(12345)
# No discontinuity
x <- runif(1000, -1, 1)
dc_test(x, 0)
# Discontinuity
x <- runif(1000, -1, 1)
x <- x + 2 * (runif(1000, -1, 1) > 0 & x < 0)
dc_test(x, 0)
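For intuition about what a sorting test looks for, the base R sketch below compares the number of observations just left and just right of the cutpoint using an exact binomial test. This is a deliberately crude stand-in that assumes a locally flat density; it is not McCrary's (2008) density estimator and not what dc_test computes. The helper crude_density_check() and its window argument are hypothetical.

```r
set.seed(12345)
# Running variable without sorting
x_clean <- runif(1000, -1, 1)
# Running variable where about half of the units below the cutoff are moved above it
x_sorted <- runif(1000, -1, 1)
x_sorted <- x_sorted + 2 * (runif(1000, -1, 1) > 0 & x_sorted < 0)

# Hypothetical helper: compare counts in equal-width windows around the cutpoint
crude_density_check <- function(runvar, cutpoint = 0, window = 0.25) {
  left  <- sum(runvar >= cutpoint - window & runvar < cutpoint)
  right <- sum(runvar >= cutpoint & runvar < cutpoint + window)
  # With no sorting and a locally flat density, both counts should be similar
  binom.test(c(right, left), p = 0.5)$p.value
}

p_clean  <- crude_density_check(x_clean)
p_sorted <- crude_density_check(x_sorted)
c(clean = p_clean, sorted = p_sorted)
```

A small p-value for the sorted variable reflects the missing mass just below the cutoff, which is the kind of discontinuity in the density that the McCrary test is designed to detect more carefully.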
Multivariate Frontier Regression Discontinuity Estimation
Description
mfrd_est implements the frontier approach for multivariate regression discontinuity estimation in Wong, Steiner, and Cook (2013).
It is based on the MFRDD code in Stata from Wong, Steiner, and Cook (2013).
Usage
mfrd_est(
y,
x1,
x2,
c1,
c2,
t.design = NULL,
local = 0.15,
front.bw = NA,
m = 10,
k = 5,
kernel = "triangular",
ngrid = 250,
margin = 0.03,
boot = NULL,
cluster = NULL,
stop.on.error = TRUE
)
Arguments
y |
A numeric object containing outcome variable. |
x1 |
A numeric object containing the first assignment variable. |
x2 |
A numeric object containing the second assignment variable. |
c1 |
A numeric value containing the cutpoint at which assignment to the treatment is determined for |
c2 |
A numeric value containing the cutpoint at which assignment to the treatment is determined for |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013).
If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. Options are |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may cause long computational time. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. This grid is used to impute potential outcomes along the frontier, as in Wong, Steiner, and Cook (2013). The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates. |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. It is suggested that data with a discrete running variable be clustered by each unique value of the running variable (Lee and Card, 2008). |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
Value
mfrd_est returns an object of class "mfrd".
The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class mfrd is a list containing the following components:
w |
Numeric vector specifying the weight of frontier 1 and frontier 2, respectively. |
est |
Numeric matrix of the estimate of the discontinuity in the outcome under a complete model (no prefix), heterogeneous treatment (ht) effects model, and treatment (t) only model, for the parametric case and for each corresponding bandwidth. Estimates with suffix "ev1" and "ev2" correspond to expected values for each frontier, under a given model. Estimates with suffix "ate" correspond to average treatment effects across both frontiers, under a given model. |
d |
Numeric matrix of the effect size (Cohen's d) for estimate. |
se |
Numeric matrix of the standard error for each corresponding bandwidth, if applicable. |
m_s |
A list containing estimates for the complete model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
m_h |
A list containing estimates for the heterogeneous treatments model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
m_t |
A list containing estimates for the treatment only model, under parametric
and non-parametric (optimal, half, and double bandwidth) cases. A list of
coefficient estimates, residuals, effects, weights (in the non-parametric case),
|
dat_h |
A list containing four data frames, one for each case: parametric or non-parametric (optimal, half, and double bandwidth). Each data frame contains functions and densities for each frontier and treatment model. |
dat |
A data frame containing the outcome ( |
obs |
List of the number of observations used in each model. |
impute |
A logical value indicating whether multiple imputation is used or not. |
call |
The matched call. |
front.bw |
Numeric vector of each bandwidth used to estimate the density at the frontier for the three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). |
References
Wong, V., Steiner, P., and Cook, T. (2013). Analyzing regression discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. doi:10.3102/1076998611432172.
Lee, D. and Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Examples
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
mfrd_est(y = y, x1 = x1, x2 = x2, c1 = 0, c2 = 0, t.design = c("geq", "geq"))
Multivariate Regression Discontinuity Estimation
Description
mrd_est estimates treatment effects in a multivariate regression discontinuity design (MRDD) with two assignment variables, including the frontier average treatment effect (tau_MRD) and frontier-specific effects (tau_R and tau_M) simultaneously.
Usage
mrd_est(
formula,
data,
subset = NULL,
cutpoint = NULL,
bw = NULL,
front.bw = NA,
m = 10,
k = 5,
kernel = "triangular",
se.type = "HC1",
cluster = NULL,
verbose = FALSE,
less = FALSE,
est.cov = FALSE,
est.itt = FALSE,
local = 0.15,
ngrid = 250,
margin = 0.03,
boot = NULL,
method = c("center", "univ", "front"),
t.design = NULL,
stop.on.error = TRUE
)
Arguments
formula |
The formula of the MRDD; a symbolic description of the model to
be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined. The default is c(0, 0). |
bw |
A vector specifying the bandwidths at which to estimate the RD for non-parametric models.
Possible values are |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013). If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
verbose |
A logical value indicating whether to print additional information to
the terminal, including results of instrumental variable regression,
and outputs from background regression models. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may cause long computational time. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates.
This argument is not optional if method is |
method |
A string specifying the method to estimate the RD effect. Options are |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
Value
mrd_est returns an object of class "mrd".
The function summary is used to obtain and print a summary of the estimated regression discontinuity. The object of class mrd is a list containing the following components for each estimated treatment effect, tau_MRD or tau_R and tau_M:
type |
A string denoting either |
call |
The matched call. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp MRDD or the Wald estimator in the fuzzy MRDD, for each corresponding bandwidth, if applicable. |
se |
Numeric vector of the standard error for each corresponding bandwidth, if applicable. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth, if applicable. |
bw |
Numeric vector of each bandwidth used in estimation. |
z |
Numeric vector of the z statistic for each corresponding bandwidth, if applicable. |
p |
Numeric vector of the p-value for each corresponding bandwidth, if applicable. |
obs |
Vector of the number of observations within the corresponding bandwidth, if applicable. |
cov |
The names of covariates. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
impute |
A logical value indicating whether multiple imputation is used or not. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
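The est component above refers to the Wald estimator for fuzzy designs. As a reminder of what that estimator is, the base R sketch below divides the jump in the outcome at the cutoff by the jump in the probability of receiving treatment. It uses a single assignment variable and a global linear specification to stay short; the simulated compliance rates and all object names are assumptions, and the code does not reproduce the package's instrumental variable regression.

```r
set.seed(4)
n <- 5000
x <- runif(n, -1, 1)
z <- as.numeric(x >= 0)                        # assignment implied by the cutoff
# Fuzzy design: the cutoff shifts the probability of treatment, not treatment itself
t <- rbinom(n, 1, ifelse(z == 1, 0.8, 0.2))    # treatment actually received
y <- 1 + 0.5 * x + 2 * t + rnorm(n)            # true effect of treatment is 2

# Jump in the outcome at the cutoff (intention-to-treat effect)
jump_y <- coef(lm(y ~ z * x))["z"]
# Jump in the probability of treatment at the cutoff (first stage)
jump_t <- coef(lm(t ~ z * x))["z"]

# Wald estimator: ratio of the two discontinuities
wald <- unname(jump_y / jump_t)
round(c(itt = unname(jump_y), first_stage = unname(jump_t), wald = wald), 2)
```

In a sharp design the first-stage jump equals 1, so the Wald estimator reduces to the discontinuity in the outcome.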
References
Wong, V. C., Steiner, P. M., Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. https://journals.sagepub.com/doi/10.3102/1076998611432172.
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical Software, 16(9), 1-16. doi:10.18637/jss.v016.i09
Examples
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
# centering
mrd_est(y ~ x1 + x2 | cov, method = "center", t.design = c("geq", "geq"))
# univariate
mrd_est(y ~ x1 + x2 | cov, method = "univ", t.design = c("geq", "geq"))
# frontier
mrd_est(y ~ x1 + x2 | cov, method = "front", t.design = c("geq", "geq"))
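The idea behind the centering approach can be sketched in base R without the package. The sketch assumes that, with t.design = c("geq", "geq"), a unit is treated when at least one assignment variable is at or above its cutpoint; under that assumption, both centered variables can be collapsed into a single assignment variable with a cutoff at zero, following the description in Wong, Steiner, and Cook (2013). The object names tr and zc are hypothetical, and the package's own centering code may differ in detail.

```r
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
c1 <- 0   # cutpoint for x1
c2 <- 0   # cutpoint for x2

# Assumed assignment rule: treated if either variable is at or above its cutpoint
tr <- as.numeric(x1 >= c1 | x2 >= c2)

# Centering: subtract each cutpoint, then keep the larger centered value,
# so that a single variable with cutoff 0 reproduces the assignment rule
zc <- pmax(x1 - c1, x2 - c2)

# The collapsed variable determines treatment exactly
all((zc >= 0) == (tr == 1))
table(treated = tr)
```

Once the two variables are collapsed in this way, the problem can be analyzed like a single-assignment RDD on zc.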
Multiple Imputation of Multivariate Regression Discontinuity Estimation
Description
mrd_impute estimates treatment effects in a multivariate regression discontinuity design (MRDD) with imputed missing values.
Usage
mrd_impute(
formula,
data,
subset = NULL,
cutpoint = NULL,
bw = NULL,
front.bw = NA,
m = 10,
k = 5,
kernel = "triangular",
se.type = "HC1",
cluster = NULL,
impute = NULL,
verbose = FALSE,
less = FALSE,
est.cov = FALSE,
est.itt = FALSE,
local = 0.15,
ngrid = 250,
margin = 0.03,
boot = NULL,
method = c("center", "univ", "front"),
t.design = NULL,
stop.on.error = TRUE
)
Arguments
formula |
The formula of the MRDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined.
The default is |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
front.bw |
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each
of three effects models (complete model, heterogeneous treatment model, and treatment only model)
detailed in Wong, Steiner, and Cook (2013). If |
m |
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for |
k |
A non-negative integer specifying the number of folds for cross-validation to determine |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
impute |
An optional vector of length n containing a grouping variable that specifies the imputed variables with missing values. |
verbose |
A logical value indicating whether to print additional information to
the terminal. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
local |
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15. |
ngrid |
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may cause long computational time. |
margin |
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. The default is 0.03. |
boot |
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates.
This argument is not optional if method is |
method |
A string specifying the method to estimate the RD effect. Options are |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
stop.on.error |
A logical value indicating whether to remove bootstraps which cause error in the |
Value
mrd_impute returns an object of class "mrd", or "mrdi" for the "front" method.
The function summary is used to obtain and print a summary of the estimated regression discontinuity. The object of class mrd is a list containing the following components for each estimated treatment effect, tau_MRD or tau_R and tau_M:
call |
The matched call. |
type |
A string denoting either |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp MRDD or the Wald estimator in the fuzzy MRDD, for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
df |
Numeric vector of the degrees of freedom computed using Barnard and Rubin (1999) adjustment for imputation. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth. |
impute |
A logical value indicating whether multiple imputation is used or not. |
References
Wong, V. C., Steiner, P. M., Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. https://journals.sagepub.com/doi/10.3102/1076998611432172.
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Barnard, J., Rubin, D. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948-955.
Examples
set.seed(12345)
x1 <- runif(300, -1, 1)
x2 <- runif(300, -1, 1)
cov <- rnorm(300)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(300)
imp <- rep(1:3, each = 100)
# all examples below have smaller numbers of m to keep run-time low
# centering
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "center", t.design = c("geq", "geq"), m = 3)
# univariate
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "univ", t.design = c("geq", "geq"), m = 3)
# frontier - don't run due to computation time
## Not run: mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "front",
boot = 1000, t.design = c("geq", "geq"), m = 3)
## End(Not run)
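Estimates from several imputed data sets are conventionally combined with Rubin's rules, and the base R sketch below shows that pooling step for a single effect. It assumes that pooling follows the standard rules and leaves out the Barnard and Rubin (1999) degrees-of-freedom adjustment mentioned above; the numeric inputs are made up for illustration and are not output from mrd_impute.

```r
# Hypothetical estimates and standard errors from m = 3 imputed data sets
est <- c(2.1, 1.9, 2.3)
se  <- c(0.30, 0.28, 0.33)
m   <- length(est)

pooled_est <- mean(est)                  # pooled point estimate
within     <- mean(se^2)                 # average within-imputation variance
between    <- var(est)                   # between-imputation variance
total      <- within + (1 + 1 / m) * between   # total variance (Rubin's rules)
pooled_se  <- sqrt(total)

c(estimate = pooled_est, se = pooled_se)
```

The pooled standard error is larger than the average per-imputation standard error whenever the estimates differ across imputations, reflecting the extra uncertainty due to missing data.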
Power Analysis of Multivariate Regression Discontinuity
Description
mrd_power computes the empirical probability that a resulting parameter estimate of the MRD is significant, i.e. the empirical power (1 - beta).
Usage
mrd_power(
num.rep = 100,
sample.size = 100,
x1.dist = "normal",
x1.para = c(0, 1),
x2.dist = "normal",
x2.para = c(0, 1),
x1.cut = 0,
x2.cut = 0,
x1.fuzzy = c(0, 0),
x2.fuzzy = c(0, 0),
x1.design = NULL,
x2.design = NULL,
coeff = c(0.1, 0.5, 0.5, 1, rep(0.1, 9)),
eta.sq = 0.5,
alpha.list = c(0.001, 0.01, 0.05)
)
Arguments
num.rep |
A non-negative integer specifying the number of repetitions used to calculate the empirical power. The default is 100. |
sample.size |
A non-negative integer specifying the number of observations in each sample. The default is 100. |
x1.dist |
A string specifying the distribution of the first assignment variable, |
x1.para |
A numeric vector of length 2 specifying parameters of the distribution of the first assignment variable, |
x2.dist |
A string specifying the distribution of the second assignment variable, |
x2.para |
A numeric vector of length 2 specifying parameters of the distribution of the second assignment variable, |
x1.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined for the first assignment variable, |
x2.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined for the second assignment variable, |
x1.fuzzy |
A numeric vector of length 2 specifying the probabilities to be
assigned to the control condition, in terms of the first
assignment variable, |
x2.fuzzy |
A numeric vector of length 2 specifying the probabilities to be assigned to the control, in terms of the second
assignment variable, |
x1.design |
A string specifying the treatment option according to design for |
x2.design |
A string specifying the treatment option according to design for |
coeff |
A numeric vector specifying coefficients of variables in the linear model to generate data. Coefficients are in the following order:
The default is |
eta.sq |
A numeric value specifying the expected partial eta-squared of the linear model with respect to the treatment itself. It is used to control the variance of noise in the linear model. The default is 0.50. |
alpha.list |
A numeric vector containing significance levels (between 0 and 1) used to calculate the empirical alpha.
The default is |
Value
mrd_power returns an object of class "mrdp" containing the number of successful iterations, mean, variance, and power (with alpha of 0.001, 0.01, and 0.05) for six estimators. The function summary is used to obtain and print a summary of the power analysis.
The six estimators are as follows:
- The 1st estimator, Linear, provides results of the linear regression estimator of combined RD using the centering approach.
- The 2nd estimator, Opt, provides results of the local linear regression estimator of combined RD using the centering approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
- The 3rd estimator, Linear, provides results of the linear regression estimator of separate RD in terms of x1 using the univariate approach.
- The 4th estimator, Opt, provides results of the local linear regression estimator of separate RD in terms of x1 using the univariate approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
- The 5th estimator, Linear, provides results of the linear regression estimator of separate RD in terms of x2 using the univariate approach.
- The 6th estimator, Opt, provides results of the local linear regression estimator of separate RD in terms of x2 using the univariate approach, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
## Not run:
summary(mrd_power(x1.design = "l", x2.design = "l"))
summary(mrd_power(x1.dist = "uniform", x1.cut = 0.5,
x1.design = "l", x2.design = "l"))
summary(mrd_power(x1.fuzzy = c(0.1, 0.1), x1.design = "l", x2.design = "l"))
## End(Not run)
Bandwidth Sensitivity Simulation for Multivariate Regression Discontinuity
Description
mrd_sens_bw refits the supplied model with varying bandwidths. All other aspects of the model are held constant.
Usage
mrd_sens_bw(object, approach = c("center", "univ1", "univ2"), bws)
Arguments
object |
An object returned by |
approach |
A string of the approaches to be refitted,
choosing from |
bws |
A positive numeric vector of the bandwidths for refitting an |
Value
mrd_sens_bw returns a dataframe containing the estimate est and standard error se for each supplied bandwidth and for the Imbens-Kalyanaraman (2012) optimal bandwidth, bw, and for each supplied approach, model. Approaches are either user specified ("usr") or based on the optimal bandwidth ("origin").
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
set.seed(12345)
x1 <- runif(10000, -1, 1)
x2 <- rnorm(10000, 10, 2)
cov <- rnorm(10000)
y <- 3 + 2 * x1 + 1 * x2 + 3 * cov + 10 * (x1 >= 0) + 5 * (x2 >= 10) + rnorm(10000)
# front.bw argument was supplied to speed up the example
# users should choose appropriate values for front.bw
mrd <- mrd_est(y ~ x1 + x2 | cov,
  cutpoint = c(0, 10), t.design = c("geq", "geq"), front.bw = c(1, 1, 1))
mrd_sens_bw(mrd, approach = "univ1", bws = seq(0.1, 1, length.out = 3))
Cutoff Sensitivity Simulation for Multivariate Regression Discontinuity
Description
mrd_sens_cutoff refits the supplied model with varying cutoff(s). All other aspects of the model, such as the automatically calculated bandwidth, are held constant.
Usage
mrd_sens_cutoff(object, cutoffs)
Arguments
object |
An object returned by |
cutoffs |
A two-column numeric matrix of paired cutoff values
to be used for refitting an |
Value
mrd_sens_cutoff returns a dataframe containing the estimate est and standard error se for each pair of cutoffs (A1 and A2) and for each model. A1 contains varying cutoffs for assignment 1 and A2 contains varying cutoffs for assignment 2. The model column contains the approach (either centering, univariate 1, or univariate 2) for determining the cutoff and the parametric model (linear, quadratic, or cubic) or non-parametric bandwidth setting (Imbens-Kalyanaraman 2012 optimal, half, or double) used for estimation.
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
set.seed(12345)
x1 <- runif(5000, -1, 1)
x2 <- rnorm(5000, 10, 2)
cov <- rnorm(5000)
y <- 3 + 2 * x1 + 1 * x2 + 3 * cov + 10 * (x1 >= 0) + 5 * (x2 >= 10) + rnorm(5000)
# front.bw argument was supplied to speed up the example
# users should choose appropriate values for front.bw
mrd <- mrd_est(y ~ x1 + x2 | cov,
  cutpoint = c(0, 10), t.design = c("geq", "geq"), front.bw = c(1, 1, 1))
mrd_sens_cutoff(mrd, expand.grid(A1 = seq(-.5, .5, length.out = 3), A2 = 10))
Plot the Multivariate Frontier Regression Discontinuity
Description
plot.mfrd plots a 3D illustration of the bivariate frontier regression discontinuity design (RDD).
Usage
## S3 method for class 'mfrd'
plot(
x,
model = c("m_s", "m_h", "m_t"),
methodname = c("Param", "bw", "Half-bw", "Double-bw"),
gran = 10,
raw_data = TRUE,
color_surface = FALSE,
...
)
Arguments
x |
An |
model |
A string containing the model specification. Options include one of |
methodname |
A string containing the method specification.
Options include one of |
gran |
A non-negative integer specifying the granularity of the surface grid (i.e. the desired number of predicted points before and after the cutoff, along each assignment variable). The default is 10. |
raw_data |
A logical value indicating whether the raw data points are plotted. The default is |
color_surface |
A logical value indicating whether the treated surface is colored. The default is |
... |
Additional graphic arguments passed to |
Examples
set.seed(12345)
x1 <- runif(1000, -1, 1)
x2 <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(1000)
model <- mrd_est(y ~ x1 + x2, cutpoint = c(0, 0), t.design = c("geq", "geq"))
plot(model$front$tau_MRD, "m_s", "Param")
Plot the Regression Discontinuity
Description
plot.rd plots the relationship between the running variable and the outcome. It is based on the plot.RD function in the "rdd" package.
Usage
## S3 method for class 'rd'
plot(
x,
preds = NULL,
fit_line = c("linear", "quadratic", "cubic", "optimal", "half", "double"),
fit_ci = c("area", "dot", "hide"),
fit_ci_level = 0.95,
bin_n = 20,
bin_level = 0.95,
bin_size = c("shade", "size"),
quant_bin = TRUE,
xlim = NULL,
ylim = NULL,
include_rugs = FALSE,
...
)
Arguments
x |
An |
preds |
An optional vector of predictions generated by |
fit_line |
A character vector specifying models to be shown as fitted lines.
Options are |
fit_ci |
A string specifying whether and how to plot prediction confidence intervals
around the fitted lines. Options are |
fit_ci_level |
A numeric value between 0 and 1 specifying the confidence level of prediction CIs. The default is 0.95. |
bin_n |
An integer specifying the number of bins for binned data points. If |
bin_level |
A numeric value between 0 and 1 specifying the confidence level for CIs around binned data points. The default is 0.95. |
bin_size |
A string specifying how to plot the number of observations in each bin, by |
quant_bin |
A logical value indicating whether the data are binned by quantiles. The default is |
xlim |
An optional numeric vector containing the x-axis limits. |
ylim |
An optional numeric vector containing the y-axis limits. |
include_rugs |
A logical value indicating whether to include the 1d plot for both axes. The default is |
... |
Additional graphic arguments passed to |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Examples
set.seed(12345)
dat <- data.frame(x = runif(1000, -1, 1), cov = rnorm(1000))
dat$tr <- as.integer(dat$x >= 0)
dat$y <- 3 + 2 * dat$x + 3 * dat$cov + 10 * (dat$x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x + tr | cov, data = dat, cutpoint = 0, t.design = "geq")
plot(rd)
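The quant_bin and bin_n arguments refer to binning the data by quantiles. As a rough illustration of that idea only, and not the code plot.rd runs, the base R sketch below cuts a simulated running variable into bin_n quantile bins and computes the mean of x and y in each. The object names (edges, bin, binned) are made up for this sketch, and for simplicity it pools both sides of the cutoff rather than binning each side separately.

```r
# Illustrative only: quantile binning of a running variable in base R.
set.seed(12345)
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)

bin_n <- 20
# Bin edges at equally spaced quantiles, so each bin holds roughly the same count.
edges <- quantile(x, probs = seq(0, 1, length.out = bin_n + 1))
bin <- cut(x, breaks = edges, include.lowest = TRUE)

# One row per bin: mean position, mean outcome, and number of observations.
binned <- data.frame(
  x_mean = as.numeric(tapply(x, bin, mean)),
  y_mean = as.numeric(tapply(y, bin, mean)),
  n = as.integer(table(bin))
)
head(binned)
```

With equal-width bins instead, the edges would come from seq(min(x), max(x), length.out = bin_n + 1) and the counts per bin would vary.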
Predict the Regression Discontinuity
Description
predict.rd makes predictions of means and standard deviations of RDs at different cutoffs.
Usage
## S3 method for class 'rd'
predict(object, gran = 50, ...)
Arguments
object |
An |
gran |
A non-negative integer specifying the granularity of the data points (i.e. the desired number of predicted points). The default is 50. |
... |
Additional arguments passed to |
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
tr <- as.integer(x >= 0)
rd <- rd_est(y ~ x + tr | cov, cutpoint = 0, t.design = "geq")
predict(rd)
Print the Multivariate Frontier Regression Discontinuity
Description
print.mfrd prints a very basic summary of the multivariate frontier regression discontinuity. It is based on the print.RD function in the "rdd" package.
Usage
## S3 method for class 'mfrd'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x |
An |
digits |
A non-negative integer specifying the number of digits to print.
The default is |
... |
Additional arguments passed to |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Print the Regression Discontinuity
Description
print.rd prints a basic summary of the regression discontinuity. print.rd is based on the print.RD function in the "rdd" package.
Usage
## S3 method for class 'rd'
print(x, digits = max(3, getOption("digits") - 3), ...)
Arguments
x |
An |
digits |
A non-negative integer specifying the number of digits to print.
The default is |
... |
Additional arguments passed to |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Regression Discontinuity Estimation
Description
rd_est estimates both sharp and fuzzy RDDs using parametric and non-parametric (local linear) models. It is based on the RDestimate function in the "rdd" package. Sharp RDDs (both parametric and non-parametric) are estimated using lm in the stats package. Fuzzy RDDs (both parametric and non-parametric) are estimated using two-stage least-squares ivreg in the AER package.
For non-parametric models, Imbens-Kalyanaraman optimal bandwidths can be used,
Usage
rd_est(
formula,
data,
subset = NULL,
cutpoint = NULL,
bw = NULL,
kernel = "triangular",
se.type = "HC1",
cluster = NULL,
verbose = FALSE,
less = FALSE,
est.cov = FALSE,
est.itt = FALSE,
t.design = NULL
)
Arguments
formula |
The formula of the RDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector of length n specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
verbose |
A logical value indicating whether to print additional information to
the terminal. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
t.design |
A string specifying the treatment option according to design.
Options are |
Value
rd_est returns an object of class "rd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class rd is a list containing the following components:
type |
A string denoting either |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp RDD or the Wald estimator in the fuzzy RDD, for each corresponding bandwidth. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
call |
The matched call. |
na.action |
The number of observations removed from fitting due to missingness. |
impute |
A logical value indicating whether multiple imputation is used or not. |
model |
For a sharp design, a list of the |
frame |
Returns the dataframe used in fitting the model. |
References
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281.
Imbens, G., Lemieux, T. (2008). Regression discontinuity designs: A guide to practice. Journal of Econometrics, 142(2), 615-635. doi:10.1016/j.jeconom.2007.05.001.
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Angrist, J. D., Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist's companion. Princeton, NJ: Princeton University Press.
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd_est(y ~ x, t.design = "geq")
# Efficiency gains can be made by including covariates (review SEs in "summary" output).
rd_est(y ~ x | cov, t.design = "geq")
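The description above states that sharp designs are fitted with lm. As a hedged illustration of what a parametric (linear) sharp-RD fit can look like in base R, the sketch below centers the running variable at the cutpoint and reads the discontinuity off the treatment coefficient. The exact model specification inside rd_est may differ, and the standard error here is the classical lm one rather than the HC1 robust default; the names xc, tr, and fit are chosen for this sketch only.

```r
# Illustrative sharp RD fit with base R; not the internals of rd_est.
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)

cutpoint <- 0
xc <- x - cutpoint          # center the running variable at the cutpoint
tr <- as.integer(xc >= 0)   # "geq" design: treated at or above the cutpoint

# Linear model with separate slopes on each side of the cutpoint.
fit <- lm(y ~ tr + xc + tr:xc + cov)
est <- unname(coef(fit)["tr"])
se <- unname(sqrt(diag(vcov(fit)))["tr"])  # classical SE (assumption: no robust correction)
c(est = est, se = se, z = est / se)
```

Because xc is zero at the cutpoint, the coefficient on tr is the estimated jump in the outcome exactly at the cutpoint.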
Multiple Imputation of Regression Discontinuity Estimation
Description
rd_impute estimates treatment effects in an RDD with imputed missing values.
Usage
rd_impute(
formula,
data,
subset = NULL,
cutpoint = NULL,
bw = NULL,
kernel = "triangular",
se.type = "HC1",
cluster = NULL,
impute = NULL,
verbose = FALSE,
less = FALSE,
est.cov = FALSE,
est.itt = FALSE,
t.design = NULL
)
Arguments
formula |
The formula of the RDD; a symbolic description of the model to be fitted. This is supplied in the
format of |
data |
An optional data frame containing the variables in the model. If not found in |
subset |
An optional vector specifying a subset of observations to be used in the fitting process. |
cutpoint |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
bw |
A vector specifying the bandwidths at which to estimate the RD.
Possible values are |
kernel |
A string indicating which kernel to use. Options are |
se.type |
This specifies the robust standard error calculation method to use,
from the "sandwich" package. Options are,
as in |
cluster |
An optional vector specifying clusters within which the errors are assumed
to be correlated. This will result in reporting cluster robust SEs. This option overrides
anything specified in |
impute |
An optional vector of length n, indexing whole imputations. |
verbose |
A logical value indicating whether to print additional information to
the terminal. The default is |
less |
Logical. If |
est.cov |
Logical. If |
est.itt |
Logical. If |
t.design |
A string specifying the treatment option according to design.
Options are |
Value
rd_impute returns an object of class "rd". The functions summary and plot are used to obtain and print a summary and plot of the estimated regression discontinuity. The object of class rd is a list containing the following components:
call |
The matched call. |
impute |
A logical value indicating whether multiple imputation is used or not. |
type |
A string denoting either |
cov |
The names of covariates. |
bw |
Numeric vector of each bandwidth used in estimation. |
obs |
Vector of the number of observations within the corresponding bandwidth. |
model |
For a sharp design, a list of the |
frame |
Returns the model frame used in fitting. |
na.action |
The observations removed from fitting due to missingness. |
est |
Numeric vector of the estimate of the discontinuity in the outcome under a sharp RDD or the Wald estimator in the fuzzy RDD, for each corresponding bandwidth. |
d |
Numeric vector of the effect size (Cohen's d) for each estimate. |
se |
Numeric vector of the standard error for each corresponding bandwidth. |
z |
Numeric vector of the z statistic for each corresponding bandwidth. |
df |
Numeric vector of the degrees of freedom computed using Barnard and Rubin (1999) adjustment for imputation. |
p |
Numeric vector of the p-value for each corresponding bandwidth. |
ci |
The matrix of the 95% confidence interval for each corresponding bandwidth. |
References
Lee, D. S., Card, D. (2008). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003.
Imbens, G., Kalyanaraman, K. (2009). Optimal bandwidth choice for the regression discontinuity estimator (Working Paper No. 14726). National Bureau of Economic Research. https://www.nber.org/papers/w14726.
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Barnard, J., Rubin, D. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948-955.
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x < 0) + rnorm(1000)
group <- rep(1:10, each = 100)
rd_impute(y ~ x, impute = group, t.design = "l")
# Efficiency gains can be made by including covariates (review SEs in "summary" output).
rd_impute(y ~ x | cov, impute = group, t.design = "l")
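The df component is described as using the Barnard and Rubin (1999) adjustment. The sketch below shows, under stated assumptions, how estimates from several imputed data sets are commonly pooled with Rubin's rules and how that degrees-of-freedom adjustment is computed. The per-imputation estimates, standard errors, and complete-data degrees of freedom are made-up numbers, and this is not taken from the rd_impute source.

```r
# Hypothetical per-imputation results from m = 5 fits (illustrative numbers).
est <- c(9.8, 10.1, 10.3, 9.9, 10.0)
se <- c(0.20, 0.22, 0.19, 0.21, 0.20)
n_com_df <- 996                     # assumed complete-data degrees of freedom

m <- length(est)
q_bar <- mean(est)                  # pooled estimate
w <- mean(se^2)                     # within-imputation variance
b <- var(est)                       # between-imputation variance
t_var <- w + (1 + 1 / m) * b        # total variance

lambda <- (1 + 1 / m) * b / t_var   # share of total variance due to missingness
df_old <- (m - 1) / lambda^2        # large-sample degrees of freedom
df_obs <- (n_com_df + 1) / (n_com_df + 3) * n_com_df * (1 - lambda)
df_br <- 1 / (1 / df_old + 1 / df_obs)  # Barnard-Rubin small-sample adjustment

pooled <- c(est = q_bar, se = sqrt(t_var), df = df_br)
pooled
```

The adjusted degrees of freedom can never exceed either the large-sample value or the complete-data value, which keeps the reference distribution conservative in small samples.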
Power Analysis of Regression Discontinuity
Description
rd_power computes the empirical probability that a resulting parameter estimate of the RD is significant, i.e. the empirical power (1 - beta).
Usage
rd_power(
num.rep = 100,
sample.size = 100,
x.dist = "normal",
x.para = c(0, 1),
x.cut = 0,
x.fuzzy = c(0, 0),
x.design = NULL,
coeff = c(0.3, 1, 0.2, 0.3),
eta.sq = 0.5,
alpha.list = c(0.001, 0.01, 0.05)
)
Arguments
num.rep |
A non-negative integer specifying the number of repetitions used to calculate the empirical power. The default is 100. |
sample.size |
A non-negative integer specifying the number of observations in each sample. The default is 100. |
x.dist |
A string specifying the distribution of the assignment variable, |
x.para |
A numeric vector of length 2 specifying parameters of the distribution of the assignment variable, |
x.cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
x.fuzzy |
A numeric vector of length 2 specifying the probabilities to be assigned to the control, in terms of the
assignment variable, |
x.design |
A string specifying the treatment option according to design.
Options are |
coeff |
A numeric vector specifying coefficients of variables in the linear model to generate data. Coefficients are in the following order:
The default is |
eta.sq |
A numeric value specifying the expected partial eta-squared of the linear model with respect to the treatment itself. It is used to control the variance of noise in the linear model. The default is 0.50. |
alpha.list |
A numeric vector containing significance levels (between 0 and 1) used to calculate the empirical alpha.
The default is |
Value
rd_power returns an object of class "rdp" containing the mean, variance, and power (with alpha of 0.001, 0.01, and 0.05) for two estimators. The function summary is used to obtain and print a summary of the power analysis. The two estimators are:
The 1st estimator, Linear, provides results of the linear regression estimator.
The 2nd estimator, Opt, provides results of the local linear regression estimator of RD, with the optimal bandwidth in the Imbens and Kalyanaraman (2012) paper.
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
## Not run:
summary(rd_power(x.design = "l"))
summary(rd_power(x.dist = "uniform", x.cut = 0.5, x.design = "l"))
summary(rd_power(x.fuzzy = c(0.1, 0.1), x.design = "l"))
## End(Not run)
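Empirical power is the share of simulated samples in which the treatment effect is found significant. The base R sketch below illustrates that idea for a sharp design with treatment assigned below the cutpoint. It is a simplified stand-in, not the data-generating model or estimators of rd_power: the effect size, the coefficients, and the names num_rep, sample_size, and alpha_list are assumptions made for the illustration.

```r
# Illustrative empirical power by simulation; not the rd_power implementation.
set.seed(12345)
num_rep <- 200
sample_size <- 100
alpha_list <- c(0.001, 0.01, 0.05)
effect <- 1   # hypothetical treatment effect, in units of the noise SD

p_values <- replicate(num_rep, {
  x <- rnorm(sample_size)
  tr <- as.integer(x < 0)   # "l" design: treated below the cutpoint 0
  y <- 0.3 + effect * tr + 0.2 * x + rnorm(sample_size)
  fit <- lm(y ~ tr + x)
  summary(fit)$coefficients["tr", "Pr(>|t|)"]
})

# Empirical power: share of replications rejecting at each significance level.
power <- sapply(alpha_list, function(a) mean(p_values < a))
names(power) <- alpha_list
power
```

Setting effect to 0 turns the same loop into a check of the empirical type I error rate at each level.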
Bandwidth Sensitivity Simulation for Regression Discontinuity
Description
rd_sens_bw refits the supplied model with varying bandwidths. All other aspects of the model are held constant.
Usage
rd_sens_bw(object, bws)
Arguments
object |
An object returned by |
bws |
A positive numeric vector of the bandwidths for refitting an |
Value
rd_sens_bw returns a dataframe containing the estimate est and standard error se for each supplied bandwidth and for the Imbens-Kalyanaraman (2012) optimal bandwidth, bw, and for each supplied approach, model. Approaches are either user specified ("usr") or based on the optimal bandwidth ("origin").
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x | cov, t.design = "geq")
rd_sens_bw(rd, bws = seq(.1, 1, length.out = 5))
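To make the idea of refitting at several bandwidths concrete, the base R sketch below fits a local linear model with triangular kernel weights for each bandwidth and collects the estimate and standard error. It is an assumption-laden illustration, not the code behind rd_sens_bw: the helper fit_bw is hypothetical, covariates are omitted, and the standard errors are the classical weighted-lm ones.

```r
# Illustrative bandwidth sensitivity loop in base R; not the rd_sens_bw internals.
set.seed(12345)
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)
tr <- as.integer(x >= 0)

# Local linear fit at cutpoint 0 with triangular kernel weights for bandwidth bw.
fit_bw <- function(bw) {
  w <- pmax(0, 1 - abs(x) / bw)   # weight decays linearly to 0 at distance bw
  keep <- w > 0                   # drop observations outside the bandwidth
  fit <- lm(y ~ tr + x + tr:x, weights = w, subset = keep)
  s <- summary(fit)$coefficients["tr", ]
  c(bw = bw, est = unname(s["Estimate"]), se = unname(s["Std. Error"]))
}

bws <- seq(0.1, 1, length.out = 5)
sens <- as.data.frame(t(sapply(bws, fit_bw)))
sens
```

Narrow bandwidths use fewer observations and so tend to give noisier estimates, which is the trade-off a sensitivity table of this kind is meant to expose.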
Cutoff Sensitivity Simulation for Regression Discontinuity
Description
rd_sens_cutoff refits the supplied model with varying cutoff(s). All other aspects of the model, such as the automatically calculated bandwidth, are held constant.
Usage
rd_sens_cutoff(object, cutoffs)
Arguments
object |
An object returned by |
cutoffs |
A numeric vector of cutoff values to be used for refitting
an |
Value
rd_sens_cutoff returns a dataframe containing the estimate est and standard error se for each cutoff value (A1). Column A1 contains varying cutoffs on the assignment variable. The model column contains the parametric model (linear, quadratic, or cubic) or non-parametric bandwidth setting (Imbens-Kalyanaraman 2012 optimal, half, or double) used for estimation.
References
Imbens, G., Kalyanaraman, K. (2012). Optimal bandwidth choice for the regression discontinuity estimator. The Review of Economic Studies, 79(3), 933-959. https://academic.oup.com/restud/article/79/3/933/1533189.
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
rd <- rd_est(y ~ x | cov, t.design = "geq")
rd_sens_cutoff(rd, seq(-.5, .5, length.out = 10))
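The logic of a cutoff sensitivity check can be sketched in base R by refitting a simple linear RD model while pretending the cutoff lies at each of several placebo values. This is only an illustration under simplifying assumptions (global linear model, no covariates, classical standard errors); the helper fit_cut is hypothetical and is not part of rddapp.

```r
# Illustrative placebo-cutoff loop in base R; not the rd_sens_cutoff internals.
set.seed(12345)
x <- runif(1000, -1, 1)
y <- 3 + 2 * x + 10 * (x >= 0) + rnorm(1000)

# Refit a global linear RD model as if the cutoff were at `cut`.
fit_cut <- function(cut) {
  xc <- x - cut
  tr <- as.integer(xc >= 0)
  s <- summary(lm(y ~ tr + xc + tr:xc))$coefficients["tr", ]
  c(A1 = cut, est = unname(s["Estimate"]), se = unname(s["Std. Error"]))
}

cutoffs <- seq(-0.5, 0.5, length.out = 11)
sens <- as.data.frame(t(sapply(cutoffs, fit_cut)))
sens[which.max(sens$est), ]
```

In this simulation the true discontinuity sits at 0, so the estimated jump should peak there and shrink as the placebo cutoff moves away.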
Determine Type of Regression Discontinuity Design
Description
rd_type cross-tabulates observations based on (1) a binary treatment and (2) one or two assignments and their cutoff values. This is an internal function and is typically not directly invoked by the user. It can be accessed using the triple colon, as in rddapp:::rd_type().
Usage
rd_type(
data,
treat,
assign_1,
cutoff_1,
operator_1 = NULL,
assign_2 = NULL,
cutoff_2 = NULL,
operator_2 = NULL
)
Arguments
data |
A |
treat |
A string specifying the name of the numeric treatment variable (treated = positive values). |
assign_1 |
A string specifying the variable name of the primary assignment. |
cutoff_1 |
A numeric value containing the cutpoint at which assignment to the treatment is determined, for the primary assignment. |
operator_1 |
The operator specifying the treatment option according to design for the primary assignment.
Options are
|
assign_2 |
An optional string specifying the variable name of the secondary assignment. |
cutoff_2 |
An optional numeric value containing the cutpoint at which assignment to the treatment is determined, for the secondary assignment. |
operator_2 |
The operator specifying the treatment option according to design for the secondary assignment.
Options are
|
Value
rd_type returns a list of two elements:
crosstab |
The cross-table as a data.frame. Columns in the dataframe include treatment rules, number of observations in the control condition, number of observations in the treatment condition, and the probability of an observation being in treatment or control. |
type |
A string specifying the type of design used, either |
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
df <- data.frame(cbind(y, x, t = x>=0))
rddapp:::rd_type(df, 't', 'x', 0, 'geq')
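The cross-tabulation idea can be illustrated in base R: tabulate the assignment rule against observed treatment status, and call the design sharp when every unit's status matches the rule. The function classify_design, its return labels "sharp" and "fuzzy", and the simulated 10% non-compliance are all assumptions of this sketch rather than the behavior of rd_type.

```r
# Illustrative sharp-vs-fuzzy check in base R; not the rd_type implementation.
set.seed(12345)
x <- runif(1000, -1, 1)
treat_sharp <- as.integer(x >= 0)

# Hypothetical non-compliance: flip treatment status for a random 10% of units.
flip <- sample(length(x), 100)
treat_fuzzy <- treat_sharp
treat_fuzzy[flip] <- 1L - treat_fuzzy[flip]

classify_design <- function(treat, assign, cutoff) {
  rule <- factor(assign >= cutoff, levels = c(FALSE, TRUE),
                 labels = c("below cutoff", "at or above cutoff"))
  status <- factor(treat > 0, levels = c(FALSE, TRUE),
                   labels = c("control", "treated"))
  tab <- table(rule, status)
  # Sharp: no treated units below the cutoff and no controls at or above it.
  sharp <- tab["below cutoff", "treated"] == 0 &&
    tab["at or above cutoff", "control"] == 0
  list(crosstab = tab, type = if (sharp) "sharp" else "fuzzy")
}

classify_design(treat_sharp, x, 0)$type
classify_design(treat_fuzzy, x, 0)$crosstab
```

Treating positive values as treated follows the description of the treat argument above; the "geq" rule is hard-coded here to keep the sketch short.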
Plot the Simulated Estimates for Sensitivity Analyses
Description
sens_plot plots the sensitivity analysis for cutpoints or bandwidths.
Usage
sens_plot(
sim_results,
level = 0.95,
x = c("A1", "A2", "bw"),
plot_models = unique(sim_results$model),
yrange = NULL
)
Arguments
sim_results |
A |
level |
A numeric value between 0 and 1 specifying the confidence level for CIs (assuming a normal sampling distribution). The default is 0.95. |
x |
A string of the column name of the varying parameter in |
plot_models |
A character vector specifying the models to be plotted (i.e. models estimated with
different approaches). Possible values are |
yrange |
An optional numeric vector specifying the range of the y-axis. |
Examples
set.seed(12345)
x <- runif(1000, -1, 1)
cov <- rnorm(1000)
y <- 3 + 2 * x + 3 * cov + 10 * (x >= 0) + rnorm(1000)
m <- rd_est(y ~ x | cov, t.design = "geq")
sim_cutoff <- rd_sens_cutoff(m, seq(-.5, .5, length.out = 10))
sens_plot(sim_cutoff, x = "A1", plot_models = c("linear", "optimal"))
sim_bw <- rd_sens_bw(m, seq(.1, 1, length.out = 10))
sens_plot(sim_bw, x = "bw")
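The level argument describes confidence intervals that assume a normal sampling distribution. A minimal sketch of that calculation from est and se columns is shown below; the data frame sim holds made-up values, and the column names lwr and upr are chosen for this illustration rather than taken from sens_plot.

```r
# Hypothetical sensitivity results with est and se columns (illustrative numbers).
sim <- data.frame(
  bw = c(0.25, 0.5, 1),
  est = c(9.7, 10.1, 10.0),
  se = c(0.40, 0.25, 0.15)
)

level <- 0.95
z <- qnorm(1 - (1 - level) / 2)   # two-sided normal critical value
sim$lwr <- sim$est - z * sim$se   # lower confidence bound
sim$upr <- sim$est + z * sim$se   # upper confidence bound
sim
```

Raising level widens every interval by the same multiplicative factor, since only the critical value z changes.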
Launch the R Shiny App for "rddapp"
Description
shiny_run launches the R Shiny application for "rddapp".
Usage
shiny_run(app_name = "shinyrdd")
Arguments
app_name |
A string specifying the name of the R Shiny app. The default is |
Examples
## Not run:
shiny_run()
shiny_run("shinyrdd")
## End(Not run)
Summarize the Multivariate Frontier Regression Discontinuity
Description
summary.mfrd is a summary method for class "mfrd". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'mfrd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.mfrd returns a list containing the following components:
coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Summarize the Multivariate Regression Discontinuity
Description
summary.mrd is a summary method for class "mrd". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'mrd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.mrd returns a list which has the following components depending on methods implemented in the "mrd" object:
center_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
univR_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
univM_coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth and/or parametric model. |
front_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
front_ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
front_t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Summarize the Multiple Imputation of Multivariate Regression Discontinuity
Description
summary.mrdi is a summary method for class "mrdi". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'mrdi'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.mrdi returns a list which has the following components:
coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the complete model. |
ht_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the heterogeneous treatment model. |
t_coefficients |
A matrix containing estimates and confidence intervals (if applicable) for the treatment only model. |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Summarize the Power Analysis of Multivariate Regression Discontinuity
Description
summary.mrdp is a summary method for class "mrdp". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'mrdp'
summary(object, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.mrdp returns a list which has the following components:
coefficients |
A matrix containing the mean, variance, and empirical alpha of each estimator. |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Summarize the Regression Discontinuity
Description
summary.rd is a summary method for class "rd". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'rd'
summary(object, level = 0.95, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
level |
A numeric value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95. |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.rd returns a list with the following components:
coefficients |
A matrix containing bandwidths, number of observations, estimates, SEs, confidence intervals, z-values and p-values for each estimated bandwidth. |
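The columns listed above are related in a standard way. The sketch below shows how a confidence interval, z-value, and p-value follow from an estimate, its standard error, and level, assuming a normal (Wald) approximation. The function wald_row and its inputs are hypothetical; the package may compute these quantities differently.

```r
# One row of a coefficient table under a normal approximation (assumption)
wald_row <- function(est, se, level = 0.95) {
  z    <- est / se
  crit <- qnorm(1 - (1 - level) / 2)   # two-sided critical value
  c(Estimate = est, SE = se,
    lower = est - crit * se, upper = est + crit * se,
    z = z, p = 2 * pnorm(-abs(z)))
}

wald_row(est = 0.5, se = 0.2)                # 95% interval
wald_row(est = 0.5, se = 0.2, level = 0.90)  # narrower 90% interval
```

Raising level widens the interval but leaves the z-value and p-value unchanged.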
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Summarize the Power Analysis of Regression Discontinuity
Description
summary.rdp is a summary method for class "rdp". It is based on the summary.RD function in the "rdd" package.
Usage
## S3 method for class 'rdp'
summary(object, digits = max(3, getOption("digits") - 3), ...)
Arguments
object |
An object of class |
digits |
A non-negative integer specifying the number of digits to display.
The default is |
... |
Additional arguments passed to |
Value
summary.rdp returns a list with the following components:
coefficients |
A matrix containing the mean, variance, and empirical alpha of each estimator. |
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Treatment Assignment for Regression Discontinuity
Description
treat_assign computes the treatment variable, t, based on the cutoff of the assignment variable, x.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::treat_assign().
Usage
treat_assign(x, cut = 0, t.design = "l")
Arguments
x |
A numeric vector containing the assignment variable, |
cut |
A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0. |
t.design |
A string specifying the treatment option according to design.
Options are |
Value
treat_assign returns the treatment variable as a vector according to the design, where 1 denotes the treated group and 0 denotes the control group.
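The cutoff rule can be sketched in a few lines of base R. This is a hypothetical stand-in, not the body of rddapp:::treat_assign: it assumes that the default design "l" treats units whose assignment score lies strictly below the cutoff, and it uses an illustrative logical argument, below, in place of the package's t.design codes.

```r
# Hypothetical stand-in for a cutoff-based assignment rule.
# below = TRUE : treated when x is strictly below the cutoff (assumed "l" design)
# below = FALSE: treated when x is strictly above the cutoff
assign_treatment <- function(x, cut = 0, below = TRUE) {
  treated <- if (below) x < cut else x > cut
  as.integer(treated)   # 1 = treated, 0 = control
}

x <- c(-2, -0.5, 0, 0.5, 2)
assign_treatment(x)            # 1 1 0 0 0
assign_treatment(x, cut = 1)   # 1 1 1 1 0
```

How observations that sit exactly on the cutoff are handled depends on the design option; the sketch assigns them to control.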
Assignment Centering for Multivariate Frontier Regression Discontinuity
Description
var_center computes the univariate assignment variable, x, based on the cutoffs of two assignment variables, x1 and x2.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::var_center().
Usage
var_center(x, cut = c(0, 0), t.design = NULL, t.plot = FALSE)
Arguments
x |
Data frame or matrix of two assignment variables,
where the first column is |
cut |
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined.
The default is |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
t.plot |
A logical value indicating whether to calculate the univariate treatment variable, |
Value
var_center returns the univariate assignment variable as a vector according to the design.
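One common way to collapse two assignment variables into one is to center each at its cutoff and keep the smaller centered value. The sketch below illustrates that idea only. It assumes a design in which a unit is treated when either variable falls below its cutoff, so the result is negative exactly for treated units. The function center_min is hypothetical and is not confirmed to match what var_center does for every t.design setting.

```r
# Hypothetical centering rule: the minimum of the two centered scores.
center_min <- function(x, cut = c(0, 0)) {
  x <- as.matrix(x)
  pmin(x[, 1] - cut[1], x[, 2] - cut[2])
}

scores <- cbind(x1 = c(1, -1, 2, 3), x2 = c(2, 3, -0.5, 0.5))
center_min(scores)                  #  1.0 -1.0 -0.5  0.5
center_min(scores, cut = c(1, 1))   #  0.0 -2.0 -1.5 -0.5
```

The resulting vector can then be treated like the assignment variable of a single-cutoff design with cutoff 0.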
Kernel Weight Calculation
Description
wt_kern calculates the appropriate kernel weights for a vector. This is useful when, for instance, one wishes to perform local regression. It is based on the kernelwts function in the "rdd" package.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::wt_kern().
Usage
wt_kern(X, center, bw, kernel = "triangular")
Arguments
X |
A numeric vector containing the input |
center |
A numeric value specifying the point from which distances should be calculated. |
bw |
A numeric value specifying the bandwidth. |
kernel |
A string indicating which kernel to use. Options are |
Value
wt_kern returns a vector of weights with length equal to that of the X input (one weight per element of X).
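As an illustration of what kernel weighting does, the following base-R sketch computes triangular-kernel weights, the default kernel in the Usage signature. It is not the package's implementation: the function tri_weights is hypothetical, and the weights are left unnormalised because the text does not say whether wt_kern rescales them.

```r
# Hypothetical triangular-kernel weights (unnormalised; an assumption).
tri_weights <- function(X, center, bw) {
  u <- abs(X - center) / bw   # distance from the center in bandwidth units
  pmax(0, 1 - u)              # linear decay, zero at and beyond one bandwidth
}

X <- c(-2, -1, 0, 0.5, 3)
tri_weights(X, center = 0, bw = 2)   # 0.00 0.50 1.00 0.75 0.00
```

Observations at the center receive the largest weight, and those one bandwidth or more away receive none, which is what restricts a local regression to the neighbourhood of the cutoff.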
References
Drew Dimmery (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd
Bivariate Kernel Weight Calculation
Description
wt_kern_bivariate calculates the appropriate weights for two variables for nonparametric Multivariate Frontier Regression Discontinuity estimation. Kernel weights are calculated based on the L1 distance of the two variables from the frontiers.
This is an internal function and is typically not directly invoked by the user.
It can be accessed using the triple colon, as in rddapp:::wt_kern_bivariate().
Usage
wt_kern_bivariate(
X1,
X2,
center1,
center2,
bw,
kernel = "triangular",
t.design = NULL
)
Arguments
X1 |
The input x1 values for the first vector. This variable represents the axis along which kernel weighting should be performed; the first assignment variable in an MRDD. |
X2 |
The input x2 values for the second vector. |
center1 |
A numeric value specifying the point from which distances should be calculated for the first vector, |
center2 |
A numeric value specifying the point from which distances should be calculated for the second vector, |
bw |
A numeric vector specifying the bandwidths for each of three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). |
kernel |
A string indicating which kernel to use. Options are |
t.design |
A character vector of length 2 specifying the treatment option according to design.
The first entry is for |
Value
wt_kern_bivariate returns a matrix of weights and distances with length equal to that of the X1 and X2 input. The first and second weights and distances are calculated with respect to all frontiers of different treatments. The third weight and distance are calculated with respect to the overall frontier of treatment versus non-treatment.