Title: | Covariate Balance Tables and Plots |
Version: | 4.6.0 |
Description: | Generate balance tables and plots for covariates of groups preprocessed through matching, weighting or subclassification, for example, using propensity scores. Includes integration with 'MatchIt', 'WeightIt', 'MatchThem', 'twang', 'Matching', 'optmatch', 'CBPS', 'ebal', 'cem', 'sbw', and 'designmatch' for assessing balance on the output of their preprocessing functions. Users can also specify data for balance assessment not generated through the above packages. Also included are methods for assessing balance in clustered or multiply imputed data sets or data sets with multi-category, continuous, or longitudinal treatments. |
Depends: | R (≥ 4.1.0) |
Imports: | stats, utils, grid, grDevices, ggplot2 (≥ 3.5.0), gtable (≥ 0.3.6), gridExtra (≥ 2.3), chk (≥ 0.10.0), rlang (≥ 1.1.5), crayon (≥ 1.5.3) |
Suggests: | MatchIt (≥ 4.0.0), WeightIt (≥ 1.0.0), twang (≥ 1.6), twangContinuous, Matching, optmatch, ebal, CBPS (≥ 0.17), designmatch, optweight, mice (≥ 3.8.0), MatchThem (≥ 0.9.3), cem (≥ 1.1.30), sbw (≥ 1.1.5), gbm (≥ 2.1.7), brglm2 (≥ 0.9), caret, knitr, rmarkdown, testthat (≥ 3.0.0) |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
VignetteBuilder: | knitr |
URL: | https://ngreifer.github.io/cobalt/, https://github.com/ngreifer/cobalt |
BugReports: | https://github.com/ngreifer/cobalt/issues |
Config/testthat/edition: | 3 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-04-15 19:55:37 UTC; NoahGreifer |
Author: | Noah Greifer |
Maintainer: | Noah Greifer <noah.greifer@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-15 20:40:02 UTC |
cobalt: Covariate Balance Tables and Plots
Description
A set of tools for assessing covariate balance in observational studies numerically and graphically. The functions provide integration with the major R packages used for balancing covariates, including MatchIt, WeightIt, twang, CBPS, and many others, and support objects not made using these packages. They support binary, multi-category and continuous treatments, point and longitudinal treatments, and clustered and multiply imputed data.
The main functions of cobalt are the following:
- bal.tab() - generate tables of balance statistics before and after matching, weighting, or subclassification
- bal.plot() - generate plots to assess balance visually on one covariate at a time
- love.plot() - generate plots to summarize and report balance statistics
Other functions include get.w() for extracting weights from objects produced by other packages, col_w_smd() (and friends documented on the same page) for computing (weighted) balance statistics outside of bal.tab(), bal.compute() for computing scalar balance statistics efficiently, and splitfactor() for splitting factor variables in a dataset into dummy variables.
cobalt has several vignettes, which can be accessed using vignette(package = "cobalt") or by visiting the website at https://ngreifer.github.io/cobalt/.
Author(s)
Maintainer: Noah Greifer noah.greifer@gmail.com (ORCID)
See Also
Useful links:
Report bugs at https://github.com/ngreifer/cobalt/issues
Efficiently compute scalar balance statistics
Description
These functions are primarily designed for programmers who want to quickly compute one of several scalar (single number) sample balance statistics, e.g., for use in selecting a tuning parameter when estimating balancing weights. bal.compute() computes a scalar balance statistic from the supplied inputs. bal.init() initializes the input so that when bal.compute() is used on the output along with a set of weights, the computation of the balance statistic is fast. vignette("optimizing-balance") provides an overview and more examples of how to use these functions. available.stats() returns the balance statistics available for the given treatment type.
Usage
bal.compute(x, ...)
## S3 method for class 'bal.init'
bal.compute(x, weights = NULL, ...)
## Default S3 method:
bal.compute(x, treat = NULL, stat, s.weights = NULL, weights = NULL, ...)
bal.init(x, treat = NULL, stat, s.weights = NULL, ...)
available.stats(treat.type = "binary")
Arguments
x |
for |
... |
other arguments used to specify options for the balance statistic. See Details for which arguments are allowed with each balance statistic. Ignored for the |
weights |
a vector of balancing weights to compute the weighted statistics. |
treat |
a vector containing the treatment variable. Can be |
stat |
string; the name of the statistic to compute. See Details. |
s.weights |
optional; a vector of sampling weights. |
treat.type |
string; the treatment type, either |
Details
The following list contains the allowable balance statistics that can be supplied to bal.init() or the default method of bal.compute(), the additional arguments that can be used with each one, and the treatment types allowed with each one. For all balance statistics, lower values indicate better balance. Target balance refers to the similarity between a sample and the same sample but weighted.
smd.mean, smd.max, smd.rms
- The mean, maximum, or root-mean-squared absolute standardized mean difference, computed using col_w_smd(). The other allowable arguments include estimand ("ATE", "ATT", or "ATC") to select the estimand (default is "ATE"), focal to identify the focal treatment group when the ATT is the estimand and the treatment has more than two categories, and pairwise to select whether mean differences should be computed between each pair of treatment groups or between each treatment group and the target group identified by estimand (default TRUE). Can be used with binary and multi-category treatments and for target balance.
ks.mean, ks.max, ks.rms
- The mean, maximum, or root-mean-squared Kolmogorov-Smirnov statistic, computed using col_w_ks(). The other allowable arguments include estimand ("ATE", "ATT", or "ATC") to select the estimand (default is "ATE"), focal to identify the focal treatment group when the ATT is the estimand and the treatment has more than two categories, and pairwise to select whether statistics should be computed between each pair of treatment groups or between each treatment group and the target group identified by estimand (default TRUE). Can be used with binary and multi-category treatments and for target balance.
ovl.mean, ovl.max, ovl.rms
- The mean, maximum, or root-mean-squared overlapping coefficient complement, computed using col_w_ovl(). The other allowable arguments include estimand ("ATE", "ATT", or "ATC") to select the estimand (default is "ATE"), integrate to select whether integration is done using integrate() (TRUE) or a Riemann sum (FALSE, the default), focal to identify the focal treatment group when the ATT is the estimand and the treatment has more than two categories, and pairwise to select whether statistics should be computed between each pair of treatment groups or between each treatment group and the target group identified by estimand (default TRUE). Can be used with binary and multi-category treatments and for target balance.
mahalanobis
- The Mahalanobis distance between the treatment group means. This is similar to smd.rms but the covariates are standardized to remove correlations between them and de-emphasize redundant covariates. The other allowable arguments include estimand ("ATE", "ATT", or "ATC") to select the estimand (default is "ATE") and focal to identify the focal treatment group when the ATT is the estimand. Can only be used with binary treatments and for target balance.
energy.dist
- The total energy distance between each treatment group and the target sample, which is a scalar measure of the similarity between two multivariate distributions. The other allowable arguments include estimand ("ATE", "ATT", "ATC", or NULL) to select the estimand (default is NULL), focal to identify the focal treatment group when the ATT is the estimand and the treatment has more than two categories, and improved to select whether the "improved" energy distance should be used when estimand = "ATE", which emphasizes differences between treatment groups in addition to differences between each treatment group and the target sample (default TRUE). When estimand = NULL, only the energy distance between the treatment groups will be computed (i.e., as opposed to the energy distance between each treatment group and the target sample). Can be used with binary and multi-category treatments and for target balance.
kernel.dist
- The kernel distance between the treatment groups, which is a scalar measure of the similarity between two multivariate distributions. Can only be used with binary treatments.
l1.med
- The median L1 statistic computed across a random selection of possible coarsenings of the data. The other allowable arguments include estimand ("ATE", "ATT", or "ATC") to select the estimand (default is "ATE"), focal to identify the focal treatment group when the ATT is the estimand and the treatment has more than two categories, l1.min.bin (default 2) and l1.max.bin (default 12) to select the minimum and maximum number of bins with which to bin continuous variables, and l1.n (default 101) to select the number of binnings used to select the binning at the median. covs should be supplied without splitting factors into dummies to ensure the binning works correctly; for simplicity, the .covs argument can be supplied, which will override covs but isn't used by other statistics. Can be used with binary and multi-category treatments.
r2, r2.2, r2.3
- The post-weighting R^2 of a model for the treatment. The other allowable arguments include poly to add polynomial terms of the supplied order to the model and int (default FALSE) to add two-way interactions between covariates into the model. Using r2.2 is a shortcut to requesting squares, and using r2.3 is a shortcut to requesting cubes. Can be used with binary and continuous treatments. For binary treatments, the McKelvey and Zavoina R^2 from a logistic regression is used; for continuous treatments, the R^2 from a linear regression is used.
p.mean, p.max, p.rms
- The mean, maximum, or root-mean-squared absolute Pearson correlation between the treatment and covariates, computed using col_w_corr(). Can only be used with continuous treatments.
s.mean, s.max, s.rms
- The mean, maximum, or root-mean-squared absolute Spearman correlation between the treatment and covariates, computed using col_w_corr(). Can only be used with continuous treatments.
distance.cov, distance.cor
- The distance covariance or distance correlation, respectively, between the scaled covariates and treatment, which is a scalar measure of the independence of two possibly multivariate distributions. The distance correlation is scale-free and ranges from 0 (completely independent) to 1 (perfectly associated). Can only be used with continuous treatments.
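As a rough illustration of what the .mean, .max, and .rms suffixes refer to, the base R sketch below collapses a vector of per-covariate statistics into a single number. summarize_stat() is a hypothetical helper written for this example, not a cobalt function, and the input values are made up.

```r
# Hypothetical helper: collapse per-covariate balance statistics
# (e.g., standardized mean differences) into one scalar.
summarize_stat <- function(stats, how = c("mean", "max", "rms")) {
  how <- match.arg(how)
  a <- abs(stats)  # all three summaries work on absolute values
  switch(how,
         mean = mean(a),
         max  = max(a),
         rms  = sqrt(mean(a^2)))
}

# Made-up standardized mean differences for three covariates
smds <- c(age = -0.3, educ = 0.1, re74 = 0.2)

summarize_stat(smds, "mean")
summarize_stat(smds, "max")
summarize_stat(smds, "rms")
```

The maximum penalizes the single worst covariate, while the mean and root-mean-square summarize imbalance across all covariates, with the root-mean-square giving more weight to larger values.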
Although statistics can be computed directly using bal.compute() alone, the intended workflow is to use bal.init() to initialize a bal.init object, which can then be passed to bal.compute() many times with different sets of weights, thereby minimizing the processing that bal.init() does because it is only done once. In contrast, using bal.compute() on covariates directly (i.e., using the default method) calls bal.init() internally each time, which can slow down evaluation. When speed isn't of interest or to calculate a balance statistic outside the context of balance optimization, the default method of bal.compute() can be a quick shortcut to avoid having to create a bal.init object first.
Value
For bal.compute(), a single numeric value. For bal.init(), a bal.init object containing the components created in the initialization and the function used to compute the balance statistic. For available.stats(), a character vector of available statistics.
See Also
See vignette("optimizing-balance") for references and definitions of some of the above quantities.
Examples
# Select the optimal number of subclasses for
# subclassification:
library(cobalt)
data("lalonde", package = "cobalt")
covs <- c("age", "educ", "race", "married",
          "nodegree", "re74", "re75")

# Estimate propensity score
p <- glm(reformulate(covs, "treat"),
         data = lalonde,
         family = "binomial")$fitted.values

# Function to compute subclassification weights
subclass_ATE <- function(treat, p, nsub) {
  m <- MatchIt::matchit(treat ~ 1,
                        data = lalonde,
                        distance = p,
                        method = "subclass",
                        estimand = "ATE",
                        subclass = nsub)
  m$weights
}

# Initialize balance statistic; largest KS statistic
init <- bal.init(lalonde[covs],
                 treat = lalonde$treat,
                 stat = "ks.max",
                 estimand = "ATE")

# Statistic prior to subclassification:
bal.compute(init)

# Testing 4 to 50 subclasses
nsubs <- 4:50
stats <- vapply(nsubs, function(n) {
  w <- subclass_ATE(lalonde$treat, p, n)
  bal.compute(init, w)
}, numeric(1L))

plot(stats ~ nsubs)

# 6 subclasses give the lowest ks.max value (.238)
nsubs[which.min(stats)]
stats[which.min(stats)]

# See which statistics are available
available.stats("binary")
available.stats("multinomial")
available.stats("continuous")
available.stats("target")
Visualize Distributional Balance
Description
Generates density plots, bar graphs, or scatterplots displaying distributional balance between treatment and covariates using ggplot2.
Usage
bal.plot(
x,
var.name,
...,
which,
which.sub = NULL,
cluster = NULL,
which.cluster = NULL,
imp = NULL,
which.imp = NULL,
which.treat = NULL,
which.time = NULL,
mirror = FALSE,
type = "density",
colors = NULL,
grid = FALSE,
sample.names,
position = "right",
facet.formula = NULL,
disp.means = getOption("cobalt_disp.means", FALSE),
alpha.weight = TRUE
)
Arguments
x |
the object for which balance is to be assessed; can be any object for which there is support in |
var.name |
|
... |
other arguments to define the variable, treatment, and weights. Some inputs are required depending on the method. See Additional Arguments. Can also be used to supply the |
which |
whether to display distributional balance for the adjusted ( |
which.sub |
|
cluster |
optional; a vector of cluster membership, or the name of a variable in an available data set passed to |
which.cluster |
if clusters are used, which cluster(s) to display. Can be cluster names or numerical indices for which to display balance. Indices correspond to the alphabetical order of cluster names. If |
imp |
optional; a vector of imputation indices, or the name of a variable in an available data set passed to |
which.imp |
if imputations are used, which imputation(s) to display. Must be numerical indices for which to display balance. If |
which.treat |
which treatment groups to display. If |
which.time |
for longitudinal treatments, which time points to display. Can be treatment names or time period indices. If |
mirror |
|
type |
|
colors |
a vector of colors for the plotted densities/histograms. See 'Color Specification' at |
grid |
|
sample.names |
|
position |
the position of the legend. This can be any value that would be appropriate as an argument to |
facet.formula |
a |
disp.means |
|
alpha.weight |
|
Details
bal.plot() uses ggplot2::ggplot() from the ggplot2 package, and (invisibly) returns a "ggplot" object. For categorical treatments with continuous covariates or continuous treatments with categorical covariates, density plots are created using ggplot2::geom_density(), histograms are created using ggplot2::geom_histogram(), and empirical CDF plots are created using ggplot2::geom_step(); for categorical treatments with categorical covariates, bar graphs are created using ggplot2::geom_bar(); for continuous treatments with continuous covariates, scatterplots are created using ggplot2::geom_point().
For continuous treatments with continuous covariates, four additional lines are presented for aid in balance assessment. The red line is the linear fit line. The blue line is a smoothing curve generated with ggplot2::geom_smooth() with method = "auto". The horizontal black line is a reference line intercepting the (unweighted) treatment mean. The vertical black line is a reference line intercepting the (unweighted) covariate mean. Balance is indicated by the flatness of both fit lines and whether they pass through the intersection of the two black reference lines.
When multiple plots are to be displayed (i.e., when requesting subclass balance, cluster balance, or imputation balance, or when multiple sets of weights are provided or which = "both", or when treatment is longitudinal), the plots will be displayed in a grid using ggplot2::facet_grid(). Subclassification cannot be used with clusters or multiply imputed data.
To change the plot and axis titles, use ggplot2::labs(). Because the output is a ggplot object, other elements can be changed using ggplot2 functions; see here for an example.
Value
A "ggplot" object, returned invisibly.
Additional Arguments
bal.plot() works like bal.tab() in that it can take a variety of types of inputs and yield the same output for each. Depending on what kind of input is given, different additional parameters are required in .... For details on what is required and allowed for each additional input and their defaults, see the help file for the bal.tab() method associated with the input. The following are the required additional arguments based on each input type:
- For matchit objects: None
- For weightit objects: None
- For ps, ps.cont, mnps, and iptw objects: (stop.method; see defaults).
- For Match objects: formula and data or covs and treat.
- For optmatch objects: formula and data or covs (treat is not required).
- For CBPS objects: None
- For ebalance objects: formula and data or covs and treat.
- For formulas: data
- For data.frames: treat
- For designmatch objects: formula and data or covs and treat.
- For sbw objects: None
- For mimids and wimids objects: None, but an argument to which.imp should be specified.
- For other objects processed through bal.tab()'s default method, whichever arguments are required to identify treatment, variables, and a conditioning method (if any).
See Also
Examples
data("lalonde", package = "cobalt")

# Nearest neighbor matching
m.out <- MatchIt::matchit(treat ~ age + educ + race + married +
                            nodegree + re74 + re75,
                          data = lalonde)

bal.plot(m.out, "age", which = "both")
bal.plot(m.out, "re74", which = "both", type = "ecdf")
bal.plot(m.out, "race", which = "both")
bal.plot(m.out, "distance", which = "both", mirror = TRUE,
         type = "histogram", colors = c("white", "black"))

# Entropy balancing with a continuous treatment
w.out <- WeightIt::weightit(re75 ~ age + I(age^2) + educ +
                              race + married + nodegree,
                            data = lalonde, method = "ebal")

bal.plot(w.out, "age", which = "both")
bal.plot(w.out, "married", which = "both")
Display Balance Statistics in a Table
Description
Generates balance statistics on covariates in relation to an observed treatment variable. It is a generic function that dispatches to the method corresponding to the class of the first argument.
Usage
bal.tab(x, ...)
## # Arguments common across all input types:
## bal.tab(x,
## stats,
## int = FALSE,
## poly = 1,
## distance = NULL,
## addl = NULL,
## data = NULL,
## continuous,
## binary,
## s.d.denom,
## thresholds = NULL,
## weights = NULL,
## cluster = NULL,
## imp = NULL,
## pairwise = TRUE,
## s.weights = NULL,
## abs = FALSE,
## subset = NULL,
## quick = TRUE,
## ...)
Arguments
x |
an input object on which to assess balance. Can be the output of a call to a balancing function in another package or a formula or data frame. Input to this argument will determine which |
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
Details
bal.tab() performs various calculations on the data objects given. This page details the arguments and calculations that are used across bal.tab() methods.
With Binary Point Treatments
Balance statistics can be requested with the stats argument. The default balance statistic for mean differences for continuous variables is the standardized mean difference, which is the difference in the means divided by a measure of spread (i.e., a d-type effect size measure). This is the default because it puts the mean differences on the same scale for comparison with each other and with a given threshold. For binary variables, the default balance statistic is the raw difference in proportion. Although standardized differences in proportion can be computed, raw differences in proportion for binary variables are already on the same scale, and computing the standardized difference in proportion can obscure the true difference in proportion by dividing the difference in proportion by a number that is itself a function of the observed proportions.
Standardized mean differences are calculated using col_w_smd() as follows: the numerator is the mean of the treated group minus the mean of the control group, and the denominator is a measure of spread calculated in accordance with the argument to s.d.denom or the default of the specific method used. Common approaches in the literature include using the standard deviation of the treated group or using the "pooled" standard deviation (i.e., the square root of the mean of the group variances) in calculating standardized mean differences. The computed spread bal.tab() uses is always that of the full, unadjusted sample (i.e., before matching, weighting, or subclassification), as recommended by Stuart (2010).
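The calculation described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the code behind col_w_smd(): w_smd() is a hypothetical helper, the treatment is assumed to be coded 0/1, and only the "pooled" and "treated" denominators are shown. The point is that the weights enter the numerator only; the spread in the denominator comes from the unadjusted sample.

```r
# Hypothetical helper: (weighted) standardized mean difference with
# the spread taken from the unweighted sample.
w_smd <- function(x, treat, w = NULL, s.d.denom = c("pooled", "treated")) {
  s.d.denom <- match.arg(s.d.denom)
  if (is.null(w)) w <- rep(1, length(x))

  # Numerator: weighted treated mean minus weighted control mean
  m1 <- weighted.mean(x[treat == 1], w[treat == 1])
  m0 <- weighted.mean(x[treat == 0], w[treat == 0])

  # Denominator: computed WITHOUT weights, i.e., before adjustment
  s <- switch(s.d.denom,
              pooled  = sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2),
              treated = sd(x[treat == 1]))

  (m1 - m0) / s
}

# Made-up data: four control units followed by four treated units
x_demo     <- c(1, 2, 3, 4, 5, 6, 7, 8)
treat_demo <- c(0, 0, 0, 0, 1, 1, 1, 1)
w_demo     <- c(1, 1, 1, 3, 3, 1, 1, 1)

w_smd(x_demo, treat_demo)          # before adjustment
w_smd(x_demo, treat_demo, w_demo)  # after weighting; same denominator
```

Because the denominator is held fixed, any change between the two values reflects a change in the mean difference itself rather than a change in the spread.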
Prior to computation, all variables are checked for variable type, which allows users to differentiate balance statistic calculations based on type using the arguments to continuous and binary. First, if a given covariate is numeric and has only 2 levels, it is converted into a binary (0,1) variable. If 0 is a value in the original variable, it retains its value and the other value is converted to 1; otherwise, the lower value is converted to 0 and the other to 1. Next, if the covariate is not numeric or logical (i.e., is a character or factor variable), it will be split into new binary variables, named with the original variable and the value, separated by an underscore. Otherwise, the covariate will be used as is and treated as a continuous variable.
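The type-checking rules above can be sketched as follows. process_covariate() is a hypothetical helper for illustration only; it assumes no missing values and does not reproduce how cobalt handles these steps internally.

```r
# Hypothetical helper applying the three rules to one covariate.
process_covariate <- function(x, name) {
  # Rule 1: numeric with exactly two values -> binary (0,1)
  if (is.numeric(x) && length(unique(x)) == 2) {
    vals <- sort(unique(x))
    # If 0 is present it stays 0; otherwise the lower value becomes 0
    zero <- if (0 %in% vals) 0 else vals[1]
    out <- data.frame(as.numeric(x != zero))
    names(out) <- name
    return(out)
  }

  # Rule 2: character or factor -> one dummy per level, named var_value
  if (is.character(x) || is.factor(x)) {
    x <- as.factor(x)
    out <- as.data.frame(lapply(levels(x), function(l) as.numeric(x == l)))
    names(out) <- paste(name, levels(x), sep = "_")
    return(out)
  }

  # Rule 3: everything else is used as is (treated as continuous)
  out <- data.frame(as.numeric(x))
  names(out) <- name
  out
}

process_covariate(c(2, 5, 5, 2), "dose")                # 2 -> 0, 5 -> 1
process_covariate(c(-1, 0, 0, -1), "flag")              # 0 stays 0, -1 -> 1
process_covariate(factor(c("a", "b", "a")), "race")     # race_a, race_b
```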
When weighting or matching are used, an "effective sample size" is calculated for each group using the following formula: (\sum w)^2 / \sum w^2. The effective sample size is "approximately the number of observations from a simple random sample that yields an estimate with sampling variation equal to the sampling variation obtained with the weighted comparison observations" (Ridgeway et al., 2016). The calculated number tends to underestimate the true effective sample size of the weighted samples. The number depends on the variability of the weights, so sometimes trimming units with large weights can actually increase the effective sample size, even though units are being down-weighted. When matching is used, an additional "unweighted" sample size will be displayed indicating the total number of units contributing to the weighted sample.
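The formula is short enough to check by hand. The sketch below applies it to made-up weights for a single group; ess() is a hypothetical helper name, and the trimming threshold of 3 is arbitrary.

```r
# Effective sample size for one group's weights: (sum w)^2 / sum(w^2)
ess <- function(w) {
  sum(w)^2 / sum(w^2)
}

w_equal   <- rep(1, 100)                 # no variability in the weights
w_varied  <- c(rep(1, 90), rep(10, 10))  # a few units with large weights
w_trimmed <- pmin(w_varied, 3)           # trim large weights down to 3

ess(w_equal)    # equals the number of units when all weights are equal
ess(w_varied)   # highly variable weights shrink the ESS
ess(w_trimmed)  # trimming reduces variability and raises the ESS
```

With these weights, trimming lowers the total weight but raises the effective sample size, which is the behavior described in the paragraph above.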
When subclassification is used, the balance tables for each subclass stored in $Subclass.Balance use values calculated as described above. For the aggregate balance table stored in $Balance.Across.Subclass, the values of each statistic are computed as a weighted average of the statistic across subclasses, weighted by the proportion of units in each subclass. See class-bal.tab.subclass for more details.
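A minimal sketch of the aggregation step, assuming the per-subclass statistics and subclass sizes are already available. across_subclass() is a hypothetical helper, the numbers are made up, and which units are counted toward each subclass's proportion is left open here; see class-bal.tab.subclass for the actual rules.

```r
# Hypothetical helper: average a statistic across subclasses,
# weighting each subclass by its share of units.
across_subclass <- function(stat_by_subclass, n_by_subclass) {
  prop <- n_by_subclass / sum(n_by_subclass)
  sum(stat_by_subclass * prop)
}

# Made-up example: two subclasses with 30 and 10 units
smd_by_subclass <- c(0.1, 0.3)
n_by_subclass   <- c(30, 10)

across_subclass(smd_by_subclass, n_by_subclass)
```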
With Continuous Point Treatments
When continuous treatment variables are considered, the balance statistic calculated is the Pearson correlation between the covariate and treatment. The correlation after adjustment is computed using col_w_cov() as the weighted covariance between the covariate and treatment divided by the product of the standard deviations of the unweighted covariate and treatment. This is analogous to how the weighted standardized mean difference uses an unweighted measure of spread in its denominator, and it avoids the analogous paradox (i.e., where the covariance decreases but is accompanied by a change in the standard deviations, thereby distorting the actual resulting balance computed using the weighted standard deviations). This can sometimes yield correlations greater than 1 in absolute value; these usually indicate degenerate cases anyway.
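The following base R sketch shows the idea of a weighted covariance scaled by unweighted standard deviations. It is an illustration under assumptions, not the code behind col_w_cov(): w_corr() and sd_pop() are hypothetical helpers, the data are made up, and divide-by-n formulas are used throughout so that the unweighted result matches cor() exactly; the normalization used by cobalt may differ.

```r
# Population (divide-by-n) standard deviation, computed without weights
sd_pop <- function(v) sqrt(mean((v - mean(v))^2))

# Hypothetical helper: weighted covariance over unweighted spreads
w_corr <- function(x, treat, w = NULL) {
  if (is.null(w)) w <- rep(1, length(x))
  w <- w / sum(w)  # normalize weights to sum to 1

  # Numerator: weighted covariance around the weighted means
  cov_w <- sum(w * (x - sum(w * x)) * (treat - sum(w * treat)))

  # Denominator: product of the UNWEIGHTED standard deviations
  cov_w / (sd_pop(x) * sd_pop(treat))
}

# Made-up continuous treatment and covariate
dose_demo <- c(1, 2, 3, 4, 5, 6)
age_demo  <- c(20, 25, 23, 30, 34, 33)
wts_demo  <- c(2, 1, 1, 1, 1, 2)

w_corr(age_demo, dose_demo)            # equals cor(age_demo, dose_demo)
w_corr(age_demo, dose_demo, wts_demo)  # weighted numerator, same denominator
```

Because the denominator does not change with the weights, nothing bounds the ratio to [-1, 1], which is why values above 1 in absolute value are possible.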
With Multi-Category Point Treatments
For information on using bal.tab() with multi-category treatments, see class-bal.tab.multi. Essentially, bal.tab() compares pairs of treatment groups in a standard way.
With Longitudinal Treatments
For information on using bal.tab() with longitudinal treatments, see class-bal.tab.msm and vignette("longitudinal-treat"). Essentially, bal.tab() summarizes balance at each time point and summarizes across time points.
With Clustered or Multiply Imputed Data
For information on using bal.tab() with clustered data, see class-bal.tab.cluster. For information on using bal.tab() with multiply imputed data, see class-bal.tab.imp.
quick
Calculations can take some time, especially when there are many variables, interactions, or clusters. When certain values are not printed, by default they are not computed. In particular, summary tables are not computed when their display has not been requested. This can speed up the overall production of the output when these values are not to be used later. However, when they are to be used later, such as when output is to be further examined with print() or is to be used in some other way after the original call to bal.tab(), it may be useful to compute them even if they are not to be printed initially. To do so, users can set quick = FALSE, which will cause bal.tab() to calculate all values and components it can. Note that love.plot() is fully functional even when quick = TRUE and values are requested that are otherwise not computed in bal.tab() with quick = TRUE.
Missing Data
If there is missing data in the covariates (i.e., NAs in the covariates provided to bal.tab()), a few additional things happen. A warning will appear mentioning that missing values were present in the data set. The computed balance summaries will be for the variables ignoring the missing values. New variables will be created representing missingness indicators for each variable, named var: <NA> (with var replaced by the actual name of the variable). If int = TRUE, balance for the pairwise interactions between the missingness indicators will also be computed. These variables are treated like regular variables once created.
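To make the naming of the missingness indicators concrete, here is a small base R sketch. add_missing_indicators() is a hypothetical helper, and it assumes indicators are only created for covariates that actually contain NA values; it is not how bal.tab() builds them internally.

```r
# Hypothetical helper: append a 0/1 missingness indicator named
# "var: <NA>" for every covariate that contains missing values.
add_missing_indicators <- function(covs) {
  for (v in names(covs)) {
    if (anyNA(covs[[v]])) {
      covs[[paste0(v, ": <NA>")]] <- as.numeric(is.na(covs[[v]]))
    }
  }
  covs
}

# Made-up covariates; only `age` has a missing value
covs_demo <- data.frame(age = c(31, NA, 45), educ = c(12, 10, 16))

covs_with_ind <- add_missing_indicators(covs_demo)
names(covs_with_ind)
covs_with_ind[["age: <NA>"]]
```

Balance on such an indicator is a difference in the proportion of missing values between groups, so it can be read like any other binary covariate.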
Value
An object of class "bal.tab". The use of continuous treatments, subclasses, clusters, and/or imputations will also cause the object to inherit other classes. The class "bal.tab" has its own print() method (print.bal.tab()), which formats the output nicely and in accordance with print-related options given in the call to bal.tab(), and which can be called with its own options.
For scenarios with binary point treatments and no subclasses, imputations, or clusters, the following are the elements of the bal.tab object:
Balance |
A data frame containing balance information for each covariate. Balance contains the following columns, with additional columns present when other balance statistics are requested, and some columns omitted when not requested:
|
Balanced.Means |
If a threshold on mean differences is specified, a table tallying the number of variables that exceed or are within the threshold. |
Max.Imbalance.Means |
If a threshold on mean differences is specified, a table displaying the variable with the greatest absolute mean difference. |
Observations |
A table displaying the sample sizes before and after adjusting. Often the effective sample size (ESS) will be displayed. See Details. |
call |
The original function call, if adjustment was performed by a function in another package. |
If the treatment is continuous, instead of producing mean differences, bal.tab() will produce correlations between the covariates and the treatment. The default corresponding entries in the output will be "Corr.Un", "Corr.Adj", and "R.Threshold" (and accordingly for the balance tally and maximum imbalance tables).
If multiple weights are supplied, "Adj" in Balance will be replaced by the provided names of the sets of weights, and extra columns will be added for each set of weights. Additional columns and rows for other items in the output will be created as well.
For bal.tab output with subclassification, see class-bal.tab.subclass.
References
Ridgeway, G., McCaffrey, D., Morral, A., Burgette, L., & Griffin, B. A. (2016). Toolkit for Weighting and Analysis of Nonequivalent Groups: A tutorial for the twang package. R vignette. RAND.
Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1-21. doi:10.1214/09-STS313
See Also
For information on the use of bal.tab() with specific types of objects, use the following links:
- bal.tab.matchit() for the method for objects returned by MatchIt.
- bal.tab.weightit() for the method for weightit and weightitMSM objects returned by WeightIt.
- bal.tab.ps() for the method for ps, mnps, and iptw objects returned by twang and for ps.cont objects returned by twangContinuous.
- bal.tab.Match() for the method for objects returned by Matching.
- bal.tab.optmatch() for the method for objects returned by optmatch.
- bal.tab.cem.match() for the method for objects returned by cem.
- bal.tab.CBPS() for the method for objects returned by CBPS.
- bal.tab.ebalance() for the method for objects returned by ebal.
- bal.tab.designmatch() for the method for objects returned by designmatch.
- bal.tab.mimids() for the method for objects returned by MatchThem.
- bal.tab.sbwcau() for the method for objects returned by sbw.
- bal.tab.formula() and bal.tab.data.frame() for the methods for formula and data frame interfaces when the user has covariate values and weights (including matching weights) or subclasses or wants to evaluate balance on an unconditioned data set. For data that corresponds to a longitudinal treatment (i.e., to be analyzed with a marginal structural model), see bal.tab.time.list().
See vignette("faq") for answers to frequently asked questions about bal.tab().
Examples
## See individual pages above for examples with
## different inputs, or see `vignette("cobalt")`
Balance Statistics for CBPS Objects
Description
Generates balance statistics for CBPS and CBMSM objects from the CBPS package.
Usage
## S3 method for class 'CBPS'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.CBPS() and bal.tab.CBMSM() generate a list of balance summaries for the CBPS or CBMSM object given and function similarly to CBPS::balance().
Value
For point treatments, if clusters are not specified, an object of class "bal.tab" containing balance summaries for the CBPS object. See bal.tab() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
If CBPS() is used with multi-category treatments, an object of class "bal.tab.multi" containing balance summaries for each pairwise treatment comparison and a summary of balance across pairwise comparisons. See bal.tab.multi() for details.
If CBMSM() is used for longitudinal treatments, an object of class "bal.tab.msm" containing balance summaries for each time period and a summary of balance across time periods. See class-bal.tab.msm for details.
See Also
- bal.tab() for details of calculations.
- class-bal.tab.cluster for more information on clustered data.
- bal.tab.multi() for more information on multi-category treatments.
- class-bal.tab.msm for more information on longitudinal treatments.
Examples
library(cobalt)
data("lalonde", package = "cobalt")

## Using CBPS() for generating covariate balancing
## propensity score weights
cbps.out <- CBPS::CBPS(treat ~ age + educ + married + race +
                         nodegree + re74 + re75,
                       data = lalonde)

bal.tab(cbps.out)
Balance Statistics for Matching Objects
Description
Generates balance statistics for output objects from Matching.
Usage
## S3 method for class 'Match'
bal.tab(
x,
formula = NULL,
data = NULL,
treat = NULL,
covs = NULL,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
formula |
a |
data |
a data frame containing variables named in |
treat |
a vector of treatment statuses. See Details. |
covs |
a data frame of covariate values for which to check balance. See Details. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab() generates a list of balance summaries for the object given, and functions similarly to Matching::MatchBalance(). The input to bal.tab.Match() must include either both formula and data or both covs and treat. Using the formula + data inputs mirrors how Matching::MatchBalance() is used.
cobalt functions do not support Match objects with sampling weights, i.e., with an argument passed to the weights argument of Matching::Match().
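As a minimal sketch of the covs + treat alternative described above, the block below passes a treatment vector and a covariate data frame directly instead of formula and data. The simulated data set and all variable names in it are illustrative assumptions, not part of the package. The Matching and cobalt calls run only if both packages are installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(123)
n <- 200
covs <- data.frame(age  = rnorm(n, 30, 8),
                   educ = rnorm(n, 11, 2))
treat <- rbinom(n, 1, plogis(-0.5 + 0.05 * (covs$age - 30)))

# Propensity scores from a logistic regression
p.score <- glm(treat ~ age + educ, data = covs,
               family = binomial)$fitted.values

if (requireNamespace("Matching", quietly = TRUE) &&
    requireNamespace("cobalt", quietly = TRUE)) {
  m.out <- Matching::Match(Tr = treat, X = p.score)

  # Using treat and covs instead of formula and data
  print(cobalt::bal.tab(m.out, treat = treat, covs = covs))
}
```

Either pair of inputs gives bal.tab() the treatment and covariates that the Match object itself does not store.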
Value
If clusters and imputations are not specified, an object of class "bal.tab" containing balance summaries for the given object. See bal.tab() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
See Also
bal.tab() for details of calculations.
Examples
library(cobalt)
library(Matching)
data("lalonde", package = "cobalt")

p.score <- glm(treat ~ age + educ + race +
                 married + nodegree + re74 + re75,
               data = lalonde, family = "binomial")$fitted.values
Match.out <- Match(Tr = lalonde$treat, X = p.score)

## Using formula and data
bal.tab(Match.out, formula = treat ~ age + educ + race +
          married + nodegree + re74 + re75, data = lalonde)
Balance Statistics for cem Objects
Description
Generates balance statistics for cem.match objects from cem.
Usage
## S3 method for class 'cem.match'
bal.tab(
x,
data,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
data |
a data frame containing variables named in other arguments. An argument to |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.cem.match() generates a list of balance summaries for the cem.match object given, and functions similarly to cem::imbalance().
Value
If clusters and imputations are not specified, an object of class "bal.tab" containing balance summaries for the cem.match object. See bal.tab() for details.
If imputations are specified, an object of class "bal.tab.imp" containing balance summaries for each imputation and a summary of balance across imputations. See class-bal.tab.imp for details.
If cem() is used with multi-category treatments, an object of class "bal.tab.multi" containing balance summaries for each pairwise treatment comparison. See bal.tab.multi() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
See Also
bal.tab() for details of calculations.
Examples
library(cobalt)
data("lalonde", package = "cobalt")

## Coarsened exact matching
cem.out <- cem::cem("treat", data = lalonde, drop = "re78")

bal.tab(cem.out, data = lalonde, un = TRUE,
        stats = c("m", "k"))
Balance Statistics for Other Objects
Description
Generates balance statistics using an object for which there is not a defined method.
Usage
## Default S3 method:
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
An object containing information about conditioning. See Details. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
other arguments that would be passed to |
Details
bal.tab.default() processes its input and attempts to extract enough information from it to display covariate balance for x. The purpose of this method is to allow users who have created their own objects containing conditioning information (i.e., weights, subclasses, treatments, covariates, etc.) to access the capabilities of bal.tab() without having a special method written for them. By including the correct items in x, bal.tab.default() can present balance tables as if the input were the output of one of the specifically supported packages (e.g., MatchIt, twang, etc.).
The function will search x for the following named items and attempt to process them:
- treat: A vector (numeric, character, factor) containing the values of the treatment for each unit or the name of the column in data containing them. Essentially the same input to treat in bal.tab.data.frame().
- treat.list: A list of vectors (numeric, character, factor) containing, for each time point, the values of the treatment for each unit or the name of the column in data containing them. Essentially the same input to treat.list in bal.tab.time.list().
- covs: A data.frame containing the values of the covariates for each unit. Essentially the same input to covs in bal.tab.data.frame().
- covs.list: A list of data.frames containing, for each time point, the values of the covariates for each unit. Essentially the same input to covs.list in bal.tab.time.list().
- formula: A formula with the treatment variable as the response and the covariates for which balance is to be assessed as the terms. Essentially the same input to formula in bal.tab.formula().
- formula.list: A list of formulas with, for each time point, the treatment variable as the response and the covariates for which balance is to be assessed as the terms. Essentially the same input to formula.list in bal.tab.time.list().
- data: A data.frame containing variables with the names used in other arguments and components (e.g., formula, weights, etc.). Essentially the same input to data in bal.tab.formula(), bal.tab.data.frame(), or bal.tab.time.list().
- weights: A vector, list, or data.frame containing weights for each unit or a string containing the names of the weights variables in data. Essentially the same input to weights in bal.tab.data.frame() or bal.tab.time.list().
- distance: A vector, formula, or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. Essentially the same input to distance in bal.tab.data.frame().
- distance.list: A list of vectors or data.frames containing, for each time point, distance values (e.g., propensity scores) for each unit or a string containing the name of the distance variable in data. Essentially the same input to distance.list in bal.tab.time.list().
- subclass: A vector containing subclass membership for each unit or a string containing the name of the subclass variable in data. Essentially the same input to subclass in bal.tab.data.frame().
- match.strata: A vector containing matching stratum membership for each unit or a string containing the name of the matching stratum variable in data. Essentially the same input to match.strata in bal.tab.data.frame().
- estimand: A character vector; whether the desired estimand is the "ATT", "ATC", or "ATE" for each set of weights. Essentially the same input to estimand in bal.tab.data.frame().
- s.weights: A vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in data. Essentially the same input to s.weights in bal.tab.data.frame() or bal.tab.time.list().
- focal: The name of the focal treatment when multi-category treatments are used. Essentially the same input to focal in bal.tab.data.frame().
- call: A call object containing the function call, usually generated by using match.call() inside the function that created x.
Any of these items can also be supplied directly to bal.tab.default(), e.g., bal.tab.default(x, formula = treat ~ x1 + x2). If supplied, it will override the object with the same role in x. In addition, any arguments to bal.tab.formula(), bal.tab.data.frame(), and bal.tab.time.list() are allowed and perform the same function.
At least some inputs containing information to create the treatment and covariates are required (e.g., formula and data or covs and treat). All other arguments are optional and have the same defaults as those in bal.tab.data.frame() or bal.tab.time.list(). If treat.list, covs.list, or formula.list are supplied in x or as an argument to bal.tab.default(), the function will proceed considering a longitudinal treatment. Otherwise, it will proceed considering a point treatment.
bal.tab.default(), like other bal.tab() methods, is just a shortcut to supply arguments to bal.tab.data.frame() or bal.tab.time.list(). Therefore, any matters regarding argument priority or function are described in the documentation for these methods.
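The following sketch illustrates the two ways of supplying items described above: storing them as named components of x, and passing one directly as an argument to override the stored component. The simulated data, the variable names x1 and x2, and the hand-built list are illustrative assumptions. The bal.tab() calls run only if cobalt is installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(42)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.4))
dat$treat <- rbinom(n, 1, plogis(0.6 * dat$x1 - 0.4 * dat$x2))

ps <- glm(treat ~ x1 + x2, data = dat,
          family = binomial)$fitted.values

# A hand-built object holding named items that bal.tab.default()
# searches for: formula, data, weights, distance, and estimand
x <- list(formula  = treat ~ x1 + x2,
          data     = dat,
          weights  = ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps)),
          distance = ps,
          estimand = "ATE")

if (requireNamespace("cobalt", quietly = TRUE)) {
  # Balance using only the items stored in x
  print(cobalt::bal.tab(x, un = TRUE))

  # A directly supplied formula overrides the formula stored in x
  print(cobalt::bal.tab(x, formula = treat ~ x1, un = TRUE))
}
```

Storing formula and data here plays the same role as storing treat and covs; either pair is enough to define the treatment and covariates.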
Value
For point treatments, if clusters and imputations are not specified, an object of class "bal.tab" containing balance summaries for the specified treatment and covariates. See bal.tab() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
If imputations are specified, an object of class "bal.tab.imp" containing balance summaries for each imputation and a summary of balance across imputations, just as with clusters. See class-bal.tab.imp for details.
If multi-category treatments are used, an object of class "bal.tab.multi" containing balance summaries for each pairwise treatment comparison and a summary of balance across pairwise comparisons. See bal.tab.multi() for details.
If longitudinal treatments are used, an object of class "bal.tab.msm" containing balance summaries at each time point. Each balance summary is its own bal.tab object. See class-bal.tab.msm for more details.
See Also
- bal.tab.formula() and bal.tab.time.list() for additional arguments to be supplied.
- bal.tab() for output and details of calculations.
- class-bal.tab.cluster for more information on clustered data.
- class-bal.tab.imp for more information on multiply imputed data.
- bal.tab.multi() for more information on multi-category treatments.
Examples
library(cobalt)
data("lalonde", package = "cobalt")
covs <- subset(lalonde, select = -c(treat, re78))

## Writing a function that produces output for direct
## use in bal.tab.default()
ate.weights <- function(treat, covs) {
  data <- data.frame(treat, covs)
  formula <- formula(data)
  ps <- glm(formula, data = data,
            family = "binomial")$fitted.values
  weights <- treat/ps + (1 - treat)/(1 - ps)
  call <- match.call()
  out <- list(treat = treat,
              covs = covs,
              distance = ps,
              weights = weights,
              estimand = "ATE",
              call = call)
  return(out)
}

out <- ate.weights(lalonde$treat, covs)

bal.tab(out, un = TRUE)
Balance Statistics for designmatch Objects
Description
Generates balance statistics for output objects from designmatch.
Usage
## S3 method for class 'designmatch'
bal.tab(
x,
formula = NULL,
data = NULL,
treat = NULL,
covs = NULL,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
the output of a call to |
formula |
a |
data |
a data frame containing variables named in |
treat |
a vector of treatment statuses. See Details. |
covs |
a data frame of covariate values for which to check balance. See Details. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab() generates a list of balance summaries for the object given, and functions similarly to designmatch::meantab(). Note that output objects from designmatch do not have their own class; bal.tab() first checks whether the object meets the criteria to be treated as a designmatch object before dispatching the correct method. Renaming or removing items from the output object can create unintended consequences.
The input to bal.tab.designmatch() must include either both formula and data or both covs and treat. Using the covs + treat input mirrors how designmatch::meantab() is used (note that to see identical results to meantab(), s.d.denom must be set to "pooled").
Value
If clusters and imputations are not specified, an object of class "bal.tab" containing balance summaries for the given object. See bal.tab() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
See Also
bal.tab() for details of calculations.
Examples
library(cobalt)
library(designmatch)
data("lalonde", package = "cobalt")

covariates <- as.matrix(lalonde[c("age", "educ", "re74", "re75")])
treat <- lalonde$treat
dmout <- bmatch(treat,
                total_groups = sum(treat == 1),
                mom = list(covs = covariates,
                           tols = absstddif(covariates,
                                            treat, .05)))

## Using treat and covs
bal.tab(dmout, treat = treat, covs = covariates)
Balance Statistics for ebalance Objects
Description
Generates balance statistics for output objects from ebal.
Usage
## S3 method for class 'ebalance'
bal.tab(
x,
formula = NULL,
data = NULL,
treat = NULL,
covs = NULL,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
an |
formula |
a |
data |
a data frame containing variables named in |
treat |
a vector of treatment statuses. See Details. |
covs |
a data frame of covariate values for which to check balance. See Details. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab() generates a list of balance summaries for the object given. The input to bal.tab.ebalance() must include either both formula and data or both covs and treat.
Value
If clusters and imputations are not specified, an object of class "bal.tab" containing balance summaries for the given object. See bal.tab() for details.
If clusters are specified, an object of class "bal.tab.cluster" containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster for details.
See Also
bal.tab() for details of calculations.
Examples
library(cobalt)
data("lalonde", package = "cobalt")

covs <- subset(lalonde, select = -c(re78, treat))
covs0 <- splitfactor(covs)

e.out <- ebal::ebalance(lalonde$treat, covs0)

## Using formula and data
bal.tab(e.out, formula = treat ~ age + educ + race +
          married + nodegree + re74 + re75,
        data = lalonde)

## Using treat and covs
bal.tab(e.out, treat = lalonde$treat, covs = covs)
Balance Statistics for Data Sets
Description
Generates balance statistics for unadjusted, matched, weighted, or stratified data using either a data.frame or formula interface.
Usage
## S3 method for class 'formula'
bal.tab(
x,
data = NULL,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
subclass = NULL,
match.strata = NULL,
method,
estimand = NULL,
focal = NULL,
...
)
## S3 method for class 'data.frame'
bal.tab(
x,
treat,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
subclass = NULL,
match.strata = NULL,
method,
estimand = NULL,
focal = NULL,
...
)
## S3 method for class 'matrix'
bal.tab(
x,
treat,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
subclass = NULL,
match.strata = NULL,
method,
estimand = NULL,
focal = NULL,
...
)
Arguments
x |
either a |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, bal.tab() will look in the argument to data, if specified. |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
subclass |
optional; either a vector containing subclass membership for each unit or a string containing the name of the subclass variable in |
match.strata |
optional; either a vector containing matching stratum membership for each unit or a string containing the name of the matching stratum variable in |
method |
|
estimand |
|
focal |
the name of the focal treatment when multi-category treatments are used. See |
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
treat |
either a vector containing treatment status values for each unit or a string containing the name of the treatment variable in |
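As a brief sketch of the thresholds argument listed above, the block below builds a named vector whose name identifies the statistic and whose value is the threshold. The abbreviation "m" for mean differences follows the stats = c("m", "k") usage in the cem example of this manual. The simulated data, its variable names, and the 0.1 cutoff are illustrative assumptions. The bal.tab() call runs only if cobalt is installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(11)
n <- 300
dat <- data.frame(age  = rnorm(n, 30, 8),
                  educ = rnorm(n, 11, 2))
dat$treat <- rbinom(n, 1, plogis(0.06 * (dat$age - 30)))

# A named vector: the name is the statistic, the value is the threshold
thr <- c(m = 0.1)

if (requireNamespace("cobalt", quietly = TRUE)) {
  print(cobalt::bal.tab(treat ~ age + educ, data = dat,
                        thresholds = thr))
}
```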
Details
bal.tab.data.frame() generates a list of balance summaries for the covariates and treatment status values given. bal.tab.formula() does the same but uses a formula interface instead. When the formula interface is used, the formula and data are reshaped into a treatment vector and data.frame of covariates and then simply passed through the data.frame method.
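To illustrate the idea of reshaping a formula and data into a treatment vector and a covariate data.frame, here is a small base R sketch. It is not the package's internal code; the use of all.vars(), the simulated data, and the variable names are assumptions chosen for illustration. The bal.tab() calls run only if cobalt is installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(7)
dat <- data.frame(treat = rbinom(50, 1, 0.5),
                  age   = rnorm(50, 30, 8),
                  educ  = rnorm(50, 11, 2))

f <- treat ~ age + educ

# Split the formula into a treatment vector and a covariate data frame
vars  <- all.vars(f)      # response first, then the covariates
treat <- dat[[vars[1]]]
covs  <- dat[vars[-1]]

if (requireNamespace("cobalt", quietly = TRUE)) {
  # Both calls describe the same treatment and covariates
  print(cobalt::bal.tab(f, data = dat))
  print(cobalt::bal.tab(covs, treat = treat))
}
```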
If weights, subclass and match.strata are all NULL, balance information will be presented only for the unadjusted sample.
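The sketch below shows what an unadjusted balance check looks like for one continuous covariate. The hand computation assumes the usual definition of a pooled standardized mean difference (difference in group means divided by the square root of the average of the two group variances); see bal.tab() for the package's exact calculations. The simulated data and variable names are illustrative assumptions, and the bal.tab() call runs only if cobalt is installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(99)
n <- 400
dat <- data.frame(age = rnorm(n, 30, 8))
dat$treat <- rbinom(n, 1, plogis(0.05 * (dat$age - 30)))

age1 <- dat$age[dat$treat == 1]
age0 <- dat$age[dat$treat == 0]

# Pooled SD: square root of the average of the two group variances
sd.pooled <- sqrt((var(age1) + var(age0)) / 2)
smd.age <- (mean(age1) - mean(age0)) / sd.pooled
smd.age

if (requireNamespace("cobalt", quietly = TRUE)) {
  # No weights, subclass, or match.strata are supplied, so only the
  # unadjusted sample is summarized
  print(cobalt::bal.tab(treat ~ age, data = dat, s.d.denom = "pooled"))
}
```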
The argument to match.strata corresponds to a factor vector containing the name or index of each pair/stratum for units conditioned through matching, for example, using the optmatch package. If more than one of weights, subclass, or match.strata are specified, bal.tab() will attempt to figure out which one to apply. Currently only one of these can be applied at a time. bal.tab() behaves differently depending on whether subclasses are used in conditioning or not. If they are used, bal.tab() creates balance statistics for each subclass and for the sample in aggregate. See class-bal.tab.subclass for more information.
Multiple sets of weights can be supplied simultaneously by entering a data.frame or a character vector containing the names of weight variables found in data or a list of weights vectors or names. The arguments to method, s.d.denom, and estimand, if any, must be either the same length as the number of sets of weights or of length one, where the sole entry is applied to all sets. When standardized differences are computed for the unadjusted group, they are computed using the first entry to s.d.denom or estimand. When only one set of weights is supplied, the output for the adjusted group will simply be called "Adj", but otherwise will be named after each corresponding set of weights. Specifying multiple sets of weights will also add components to other outputs of bal.tab().
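A minimal sketch of supplying two sets of weights at once follows. It builds ATE and ATT inverse probability weights from a logistic propensity score and names both weight columns in weights, with one estimand entry per set. The simulated data and the names x1, x2, w.ate, and w.att are illustrative assumptions. The bal.tab() call runs only if cobalt is installed.

```r
# Simulated data; all names and values are illustrative assumptions
set.seed(2024)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(0.5 * dat$x1 - 0.3 * dat$x2))

ps <- glm(treat ~ x1 + x2, data = dat,
          family = binomial)$fitted.values

# Two sets of inverse probability weights
dat$w.ate <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))
dat$w.att <- ifelse(dat$treat == 1, 1, ps / (1 - ps))

if (requireNamespace("cobalt", quietly = TRUE)) {
  # One estimand per set of weights, in the same order as weights
  print(cobalt::bal.tab(treat ~ x1 + x2, data = dat,
                        weights  = c("w.ate", "w.att"),
                        estimand = c("ATE", "ATT"),
                        un = TRUE))
}
```

Because two sets are supplied, the adjusted columns in the output are named after the weight variables rather than "Adj".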
Value
For point treatments, if clusters and imputations are not specified, an object of class "bal.tab"
containing balance summaries for the specified treatment and covariates. See bal.tab()
for details.
If imputations are specified, an object of class "bal.tab.imp"
containing balance summaries for each imputation and a summary of balance across imputations. See class-bal.tab.imp
for details.
If multi-category treatments are used, an object of class "bal.tab.multi"
containing balance summaries for each pairwise treatment comparison. See bal.tab.multi()
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
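The following sketch shows how the class of the returned object depends on the inputs described above. The choice of covariates, and the use of race as a cluster variable, are for illustration only.

```r
library(cobalt)
data("lalonde", package = "cobalt")

# Binary point treatment, no clusters or imputations
b1 <- bal.tab(treat ~ age + educ + married, data = lalonde)

# Same comparison, with race (illustratively) treated as a cluster variable
b2 <- bal.tab(treat ~ age + educ + married, data = lalonde,
              cluster = "race")

# Multi-category treatment: race has three levels
b3 <- bal.tab(race ~ age + educ + married, data = lalonde)

class(b1)
class(b2)
class(b3)
```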
See Also
- bal.tab() for details of calculations.
- class-bal.tab.cluster for more information on clustered data.
- class-bal.tab.imp for more information on multiply imputed data.
- bal.tab.multi() for more information on multi-category treatments.
Examples
data("lalonde", package = "cobalt")
lalonde$p.score <- glm(treat ~ age + educ + race, data = lalonde,
family = "binomial")$fitted.values
covariates <- subset(lalonde, select = c(age, educ, race))
## Propensity score weighting using IPTW
lalonde$iptw.weights <- ifelse(lalonde$treat==1,
1/lalonde$p.score,
1/(1-lalonde$p.score))
# data frame interface:
bal.tab(covariates, treat = "treat", data = lalonde,
weights = "iptw.weights", s.d.denom = "pooled")
# Formula interface:
bal.tab(treat ~ age + educ + race, data = lalonde,
weights = "iptw.weights", s.d.denom = "pooled")
## Propensity score subclassification
lalonde$subclass <- findInterval(lalonde$p.score,
quantile(lalonde$p.score,
(0:6)/6), all.inside = TRUE)
# data frame interface:
bal.tab(covariates, treat = "treat", data = lalonde,
subclass = "subclass", disp.subclass = TRUE,
s.d.denom = "pooled")
# Formula interface:
bal.tab(treat ~ age + educ + race, data = lalonde,
subclass = "subclass", disp.subclass = TRUE,
s.d.denom = "pooled")
Balance Statistics for MatchIt Objects
Description
Generates balance statistics for matchit
objects from MatchIt.
Usage
## S3 method for class 'matchit'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
method,
...
)
Arguments
x |
a |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
method |
a character vector containing the method of adjustment. Ignored unless subclassification was used in the original call to |
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.matchit()
generates a list of balance summaries for the matchit
object given, and functions similarly to MatchIt::summary.matchit()
. bal.tab()
behaves differently depending on whether subclasses are used in conditioning or not. If they are used, bal.tab()
creates balance statistics for each subclass and for the sample in aggregate; see class-bal.tab.subclass
for more information.
Value
If subclassification is used and method
is set to "subclassification"
, an object of class "bal.tab.subclass"
containing balance summaries within and across subclasses. See class-bal.tab.subclass
for details.
If matching is used and clusters are not specified, an object of class "bal.tab"
containing balance summaries for the matchit
object. See bal.tab()
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
See Also
- bal.tab() for details of calculations.
Examples
library(MatchIt)
data("lalonde", package = "cobalt")
## Nearest Neighbor matching
m.out1 <- matchit(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = lalonde, method = "nearest")
bal.tab(m.out1, un = TRUE,
thresholds = c(m = .1, v = 2))
## Subclassification
m.out2 <- matchit(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = lalonde, method = "subclass")
bal.tab(m.out2, disp.subclass = TRUE)
Balance Statistics for MatchThem Objects
Description
Generates balance statistics for mimids
and wimids
objects from MatchThem.
Usage
## S3 method for class 'mimids'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.mimids()
and bal.tab.wimids()
generate a list of balance summaries for the mimids
or wimids
object given.
Value
If clusters are not specified, an object of class "bal.tab.imp"
containing balance summaries for each imputation and a summary of balance across imputations. See class-bal.tab.imp
for details.
If clusters are specified, an object of class "bal.tab.imp.cluster"
containing summaries between and across all clusters and imputations.
See Also
- bal.tab() for details of calculations.
Examples
library(MatchThem)
data("lalonde_mis", package = "cobalt")
#Imputing the missing data
imp <- mice::mice(lalonde_mis, m = 5,
print = FALSE)
#Matching using within-imputation propensity scores
mt.out1 <- matchthem(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = imp, approach = "within")
bal.tab(mt.out1)
#Matching using across-imputation average propensity scores
mt.out2 <- matchthem(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = imp, approach = "across")
bal.tab(mt.out2)
#Weighting using within-imputation propensity scores
wt.out <- weightthem(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = imp, approach = "within",
estimand = "ATT")
bal.tab(wt.out)
Balance Statistics for optmatch Objects
Description
Generates balance statistics for output objects from optmatch.
Usage
## S3 method for class 'optmatch'
bal.tab(
x,
formula = NULL,
data = NULL,
treat = NULL,
covs = NULL,
estimand = NULL,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
an |
formula |
a |
data |
a data frame containing variables named in |
treat |
a vector of treatment statuses. See Details. |
covs |
a data frame of covariate values for which to check balance. See Details. |
estimand |
|
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab()
generates a list of balance summaries for the object given. The input to bal.tab.optmatch()
must include either both formula
and data
or just covs
(treat
is not necessary).
Value
If clusters and imputations are not specified, an object of class "bal.tab"
containing balance summaries for the given object. See bal.tab()
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
See Also
bal.tab()
for details of calculations.
Examples
data("lalonde", package = "cobalt")
lalonde$prop.score <- glm(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = lalonde, family = binomial)$fitted.values
pm <- optmatch::pairmatch(treat ~ prop.score, data = lalonde)
## Using formula and data; LHS of formula not required
bal.tab(pm, formula = ~ age + educ + race +
married + nodegree + re74 + re75,
data = lalonde)
## Using covs
covs <- subset(lalonde, select = -c(re78, treat))
bal.tab(pm, covs = covs)
Balance Statistics for twang Objects
Description
Generates balance statistics for ps
, mnps
, and iptw
objects from twang and for ps.cont
objects from twangContinuous.
Usage
## S3 method for class 'ps'
bal.tab(
x,
stop.method,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
stop.method |
a string containing the names of the stopping methods used in the original call to |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.ps()
generates a list of balance summaries for the input object given, and functions similarly to twang::bal.table()
. The variances used in the denominator of the standardized mean differences are weighted in twang::bal.table()
, where they are computed using survey::svyvar()
, but are unweighted here (except when s.weights
are specified, in which case col_w_sd()
is used). twang also uses "all" as the default s.d.denom
when the estimand is the ATE; the default here is "pooled". For these reasons, results may differ slightly between the two packages.
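To get a feel for how much the choice of denominator can matter, the following sketch compares s.d.denom = "pooled" with s.d.denom = "all" on the unadjusted sample using the formula interface. It does not call twang, so it illustrates only the difference in default denominators, not the weighted-versus-unweighted variance difference. The covariate set is an arbitrary choice.

```r
library(cobalt)
data("lalonde", package = "cobalt")

f <- treat ~ age + educ + re74 + re75

# Denominator used by default here for the ATE
b.pooled <- bal.tab(f, data = lalonde, s.d.denom = "pooled")

# Denominator from the full sample, as twang uses by default for the ATE
b.all <- bal.tab(f, data = lalonde, s.d.denom = "all")

# Side-by-side unadjusted standardized mean differences
cbind(pooled = b.pooled$Balance$Diff.Un,
      all    = b.all$Balance$Diff.Un)
```

Supplying s.d.denom explicitly in this way is one option when results need to be compared across the two packages.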
Value
For binary or continuous point treatments, if clusters are not specified, an object of class "bal.tab"
containing balance summaries for the ps
object. See bal.tab()
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
If mnps()
is used with multi-category treatments, an object of class "bal.tab.multi"
containing balance summaries for each pairwise treatment comparison and a summary of balance across pairwise comparisons. See bal.tab.multi()
for details.
See Also
- bal.tab() for details of calculations.
- class-bal.tab.cluster for more information on clustered data.
- bal.tab.multi() for more information on multi-category treatments.
- class-bal.tab.msm for more information on longitudinal treatments.
Examples
library(twang)
data("lalonde", package = "cobalt")
## Using ps() for generalized boosted modeling
ps.out <- ps(treat ~ age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
stop.method = c("ks.mean", "es.mean"),
estimand = "ATT", verbose = FALSE)
bal.tab(ps.out, stop.method = "ks.mean", un = TRUE,
stats = c("m", "ks"),
thresholds = c(m = .1))
Balance Statistics for sbw Objects
Description
Generates balance statistics for sbwcau
objects from sbw.
Usage
## S3 method for class 'sbwcau'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
an |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.sbwcau()
generates a list of balance summaries for the sbwcau
object given, and functions similarly to sbw::summarize()
.
Value
If clusters are not specified, an object of class "bal.tab"
containing balance summaries for the sbwcau
object. See bal.tab()
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
See Also
- bal.tab() for details of calculations.
Examples
library(sbw)
data("lalonde", package = "cobalt")
## Stable balancing weights for the ATT
sbw.out <- sbw(splitfactor(lalonde, drop.first = "if2"),
ind = "treat",
bal = list(bal_cov = c("age", "educ", "race_black",
"race_hispan", "race_white",
"married", "nodegree",
"re74", "re75"),
bal_alg = FALSE,
bal_tol = .001),
par = list(par_est = "att"))
bal.tab(sbw.out, un = TRUE, poly = 2)
Balance Statistics for Longitudinal Datasets
Description
Generates balance statistics for data coming from a longitudinal treatment scenario. The primary input is in the form of a list of formulas or data.frame
s containing the covariates at each time point. bal.tab()
automatically classifies this list as either a formula.list
or data.frame.list
, respectively.
Usage
## S3 method for class 'formula.list'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
## S3 method for class 'data.frame.list'
bal.tab(
x,
treat.list,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
either a list of data frames containing all the covariates to be assessed at each time point or a list of formulas with the treatment for each time period on the left and the covariates for which balance is to be displayed on the right. Covariates to be assessed at multiple points must be included in the entries for each time point. Data must be in the "wide" format, with one row per unit. If a formula list is supplied, an argument to |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
treat.list |
treatment status for each unit at each time point. This can be specified as a list or data frame of vectors, each of which contains the treatment status of each individual at each time point, or a list or vector of the names of variables in |
Details
bal.tab.formula.list()
and bal.tab.data.frame.list()
generate a list of balance summaries for each time point based on the treatments and covariates provided. All data must be in the "wide" format, with exactly one row per unit and columns representing variables at different time points. See the WeightIt::weightitMSM()
documentation for an example of how to transform long data into wide data using reshape()
.
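As a small self-contained illustration of the wide format, the sketch below converts a toy long data set to one row per unit with base R's reshape(). The data and the column names id, time, A, and X are made up for this example.

```r
# Toy long-format data: three units observed at two time points
long <- data.frame(
  id   = rep(1:3, each = 2),
  time = rep(1:2, times = 3),
  A    = c(0, 1, 1, 1, 0, 0),              # treatment at each time point
  X    = c(2.1, 2.5, 3.0, 2.8, 1.7, 1.9)   # time-varying covariate
)

# One row per unit; time-varying columns become A_1, X_1, A_2, X_2
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "time",
                v.names = c("A", "X"), sep = "_")
wide
```

The resulting columns can then be referenced in the formulas or data frames supplied for each time point.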
Multiple sets of weights can be supplied simultaneously by entering a data frame or a character vector containing the names of weight variables found in data
or a list thereof. When only one set of weights is supplied, the output for the adjusted group will simply be called "Adj"
, but otherwise will be named after each corresponding set of weights. Specifying multiple sets of weights will also add components to other outputs of bal.tab()
.
Value
An object of class bal.tab.msm
containing balance summaries at each time point. Each balance summary is its own bal.tab
object. See class-bal.tab.msm
for more details.
See the bal.tab() base methods
for more detailed information on the value of the bal.tab
objects produced for each time point.
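The sketch below uses a small simulated two-period data set (not from any package) to show the class of the returned object. The variable names and the data-generating model are assumptions made for illustration, and the Time.Balance component name is taken from the class-bal.tab.msm documentation rather than verified here.

```r
library(cobalt)

# Simulated wide-format data with two treatment periods
set.seed(123)
n <- 200
d <- data.frame(X_0 = rnorm(n))
d$A_1 <- rbinom(n, 1, plogis(0.5 * d$X_0))
d$X_1 <- d$X_0 + 0.3 * d$A_1 + rnorm(n)
d$A_2 <- rbinom(n, 1, plogis(0.5 * d$X_1 + 0.5 * d$A_1))

# One formula per time point; earlier treatments and covariates
# appear as covariates for later treatments
b.msm <- bal.tab(list(A_1 ~ X_0,
                      A_2 ~ X_1 + A_1 + X_0),
                 data = d)

class(b.msm)

# Assumed component holding one balance summary per time point
length(b.msm$Time.Balance)
```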
See Also
- bal.tab() for details of calculations.
- class-bal.tab.msm for output and related options.
- class-bal.tab.cluster for more information on clustered data.
- class-bal.tab.imp for more information on multiply imputed data.
- bal.tab.multi() for more information on multi-category treatments.
Examples
data("msmdata", package = "WeightIt")
## Estimating longitudinal propensity scores and weights
ps1 <- glm(A_1 ~ X1_0 + X2_0,
data = msmdata,
family = "binomial")$fitted.values
w1 <- ifelse(msmdata$A_1 == 1, 1 / ps1, 1 / (1 - ps1))
ps2 <- glm(A_2 ~ X1_1 + X2_1 +
A_1 + X1_0 + X2_0,
data = msmdata,
family = "binomial")$fitted.values
w2 <- ifelse(msmdata$A_2 == 1, 1 / ps2, 1 / (1 - ps2))
ps3 <- glm(A_3 ~ X1_2 + X2_2 +
A_2 + X1_1 + X2_1 +
A_1 + X1_0 + X2_0,
data = msmdata,
family = "binomial")$fitted.values
w3 <- ifelse(msmdata$A_3 == 1, 1 / ps3, 1 / (1 - ps3))
w <- w1 * w2 * w3
# Formula interface plus addl:
bal.tab(list(A_1 ~ X1_0 + X2_0,
A_2 ~ X1_1 + X2_1 +
A_1 + X1_0 + X2_0,
A_3 ~ X1_2 + X2_2 +
A_2 + X1_1 + X2_1 +
A_1 + X1_0 + X2_0),
data = msmdata,
weights = w,
distance = list(~ps1, ~ps2, ~ps3),
addl = ~X1_0 * X2_0,
un = TRUE)
# data frame interface:
bal.tab(list(msmdata[c("X1_0", "X2_0")],
msmdata[c("X1_1", "X2_1", "A_1", "X1_0", "X2_0")],
msmdata[c("X1_2", "X2_2", "A_2", "X1_1", "X2_1",
"A_1", "X1_0", "X2_0")]),
treat.list = msmdata[c("A_1", "A_2", "A_3")],
weights = w,
distance = list(~ps1, ~ps2, ~ps3),
un = TRUE)
Balance Statistics for WeightIt Objects
Description
Generates balance statistics for weightit
and weightitMSM
objects from WeightIt.
Usage
## S3 method for class 'weightit'
bal.tab(
x,
stats,
int = FALSE,
poly = 1,
distance = NULL,
addl = NULL,
data = NULL,
continuous,
binary,
s.d.denom,
thresholds = NULL,
weights = NULL,
cluster = NULL,
imp = NULL,
pairwise = TRUE,
s.weights = NULL,
abs = FALSE,
subset = NULL,
quick = TRUE,
...
)
Arguments
x |
a |
stats |
|
int |
|
poly |
|
distance |
an optional formula or data frame containing distance values (e.g., propensity scores) or a character vector containing their names. If a formula or variable names are specified, |
addl |
an optional formula or data frame containing additional covariates for which to present balance or a character vector containing their names. If a formula or variable names are specified, |
data |
an optional data frame containing variables named in other arguments. For some input object types, this is required. |
continuous |
whether mean differences for continuous variables should be standardized ( |
binary |
whether mean differences for binary variables (i.e., difference in proportion) should be standardized ( |
s.d.denom |
|
thresholds |
a named vector of balance thresholds, where the name corresponds to the statistic (i.e., in |
weights |
a vector, list, or |
cluster |
either a vector containing cluster membership for each unit or a string containing the name of the cluster membership variable in |
imp |
either a vector containing imputation indices for each unit or a string containing the name of the imputation index variable in |
pairwise |
whether balance should be computed for pairs of treatments or for each treatment against all groups combined. See |
s.weights |
Optional; either a vector containing sampling weights for each unit or a string containing the name of the sampling weight variable in |
abs |
|
subset |
a |
quick |
|
... |
for some input types, other arguments that are required or allowed. Otherwise, further arguments to control display of output. See display options for details. |
Details
bal.tab.weightit()
generates a list of balance summaries for the weightit
object given.
Value
For point treatments, if clusters and imputations are not specified, an object of class "bal.tab"
containing balance summaries for the weightit
object. See bal.tab()
for details.
If imputations are specified, an object of class "bal.tab.imp"
containing balance summaries for each imputation and a summary of balance across imputations. See class-bal.tab.imp
for details.
If weightit()
is used with multi-category treatments, an object of class "bal.tab.multi"
containing balance summaries for each pairwise treatment comparison. See bal.tab.multi()
for details.
If weightitMSM()
is used for longitudinal treatments, an object of class "bal.tab.msm"
containing balance summaries for each time period. See class-bal.tab.msm
for details.
If clusters are specified, an object of class "bal.tab.cluster"
containing balance summaries within each cluster and a summary of balance across clusters. See class-bal.tab.cluster
for details.
See Also
- bal.tab() for details of calculations.
Examples
library(WeightIt)
data("lalonde", package = "cobalt")
## Basic propensity score weighting
w.out1 <- weightit(treat ~ age + educ + race +
married + nodegree + re74 + re75,
data = lalonde, method = "glm")
bal.tab(w.out1, un = TRUE,
thresholds = c(m = .1, v = 2))
## Weighting with a multi-category treatment
w.out2 <- weightit(race ~ age + educ + married +
nodegree + re74 + re75,
data = lalonde, method = "glm",
estimand = "ATE")
bal.tab(w.out2, un = TRUE)
bal.tab(w.out2, un = TRUE, pairwise = FALSE)
## IPW for longitudinal treatments
data("msmdata", package = "WeightIt")
wmsm.out <- weightitMSM(list(A_1 ~ X1_0 + X2_0,
A_2 ~ X1_1 + X2_1 +
A_1 + X1_0 + X2_0,
A_3 ~ X1_2 + X2_2 +
A_2 + X1_1 + X2_1 +
A_1 + X1_0 + X2_0),
data = msmdata,
method = "glm")
bal.tab(wmsm.out)
Balance Statistics in bal.tab and love.plot
Description
bal.tab()
and love.plot()
display balance statistics for the included covariates. The stats
argument in each of these functions controls which balance statistics are to be displayed. The argument to stats
should be a character vector with the names of the desired balance statistics.
This page describes all of the available balance statistics and how to request them. Abbreviations are allowed, so you can use the first few letters of each balance statistic to request it instead of typing out its whole name. That convention is used throughout the documentation. For example, to request mean differences and variance ratios in bal.tab()
or love.plot()
, you could include stats = c("m", "v")
. In addition, the thresholds
argument uses the same naming conventions and can be used to request balance thresholds on each statistic. For example, to request a balance threshold of .1 for mean differences, you could include thresholds = c(m = .1)
.
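A quick way to check the abbreviation convention is to request the same statistics by full name and by abbreviation and compare the results. The sketch below does this with an arbitrary set of covariates.

```r
library(cobalt)
data("lalonde", package = "cobalt")

f <- treat ~ age + educ + re74

# Full statistic names
b.full <- bal.tab(f, data = lalonde,
                  stats = c("mean.diffs", "variance.ratios"))

# Abbreviated names for the same statistics
b.abbr <- bal.tab(f, data = lalonde,
                  stats = c("m", "v"))

# The two balance tables should contain the same values
all.equal(b.full$Balance, b.abbr$Balance)
```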
Below, each allowable entry to stats
and thresholds
is described, along with other details or options that accompany it.
Binary/Multi-Category Treatments
"mean.diffs"
Mean differences as computed by col_w_smd(). Can be abbreviated as "m". Setting the arguments continuous and binary to either "std" or "raw" will determine whether standardized mean differences or raw mean differences are calculated for continuous and categorical variables, respectively. When standardized mean differences are requested, the s.d.denom argument controls how the standardization occurs. When abs = TRUE, negative values become positive. Mean differences are requested by default when no entry to stats is provided.
"variance.ratios"
Variance ratios as computed by col_w_vr(). Can be abbreviated as "v". Will not be computed for binary variables. When abs = TRUE, values less than 1 will have their inverse taken. When used with love.plot, the x-axis scale will be logged so that, e.g., .5 is as far away from 1 as 2 is.
"ks.statistics"
Kolmogorov-Smirnov (KS) statistics as computed by col_w_ks().
"ovl.coefficients"
Overlapping (OVL) statistics as computed by col_w_ovl(). Can be abbreviated as "ovl". Additional arguments passed to col_w_ovl(), such as integrate or bw, can be supplied to bal.tab() or love.plot().
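To make the standardized mean difference concrete, the sketch below computes it by hand for a single continuous covariate with a pooled denominator and compares it with col_w_smd(). The hand formula assumes the pooled standard deviation is the square root of the average of the two group variances and that unweighted variances use the usual n - 1 denominator; treat the comparison as a check of that assumption rather than a statement of the package's internals.

```r
library(cobalt)
data("lalonde", package = "cobalt")

x  <- lalonde$age
tr <- lalonde$treat

# Hand-computed SMD: difference in group means divided by the
# square root of the average of the two group variances
smd.hand <- (mean(x[tr == 1]) - mean(x[tr == 0])) /
  sqrt((var(x[tr == 1]) + var(x[tr == 0])) / 2)

# Same quantity from cobalt's column-wise helper
smd.pkg <- col_w_smd(lalonde["age"], treat = tr,
                     std = TRUE, s.d.denom = "pooled")

c(hand = smd.hand, cobalt = unname(smd.pkg))
```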
Continuous Treatments
"correlations"
Pearson correlations as computed by col_w_cov(). Can be abbreviated as "cor". Setting the arguments continuous and binary to either "std" or "raw" will determine whether correlations or covariances are calculated for continuous and categorical variables, respectively (they are both "std" by default). When correlations are requested, the s.d.denom argument controls how the standardization occurs. When abs = TRUE, negative values become positive. Pearson correlations are requested by default when no entry to stats is provided.
"spearman.correlations"
Spearman correlations as computed by col_w_cov(). Can be abbreviated as "sp". All arguments are the same as those for "correlations". When abs = TRUE, negative values become positive.
"distance.correlations"
Distance correlations as computed by col_w_dcov(). Can be abbreviated as "dis". Setting the arguments continuous and binary to either "std" or "raw" will determine whether distance correlations or distance covariances are calculated for continuous and categorical variables, respectively (they are both "std" by default). When distance correlations are requested, the s.d.denom argument controls how the standardization occurs.
"mean.diffs.target"
Mean differences computed between the weighted and unweighted sample to ensure the weighted sample is representative of the original population. Can be abbreviated as "m". Setting the arguments continuous and binary to either "std" or "raw" will determine whether standardized mean differences or raw mean differences are calculated for continuous and categorical variables, respectively. The standardization factor will be computed in the unweighted sample. When abs = TRUE, negative values become positive. This statistic is only computed for the adjusted samples.
"ks.statistics.target"
KS statistics computed between the weighted and unweighted sample to ensure the weighted sample is representative of the original population. Can be abbreviated as "ks". This statistic is only computed for the adjusted samples.
"ovl.coefficients.target"
Overlapping coefficients computed between the weighted and unweighted sample to ensure the weighted sample is representative of the original population. Can be abbreviated as "ovl". This statistic is only computed for the adjusted samples.
If a statistic is requested in thresholds, it will automatically be placed in stats. For example, bal.tab(..., stats = "m", thresholds = c(v = 2)) will display both mean differences and variance ratios, and the variance ratios will have a balance threshold set to 2.
Examples
data(lalonde)

#Binary treatments
bal.tab(treat ~ age + educ + married + re74, data = lalonde,
        stats = c("m", "v", "ks"))
love.plot(treat ~ age + educ + married + re74, data = lalonde,
          stats = c("m", "v", "ks"), binary = "std",
          thresholds = c(m = .1, v = 2))

#Continuous treatments
bal.tab(re75 ~ age + educ + married + re74, data = lalonde,
        stats = c("cor", "sp"))
love.plot(re75 ~ age + educ + married + re74, data = lalonde,
          thresholds = c(cor = .1, sp = .1))
Compute Balance and Summary Statistics for Covariates
Description
These functions quickly compute balance statistics for the given covariates. They are used internally by bal.tab(), but they are also available for use in programming without having to call bal.tab().
col_w_mean() computes the (weighted) means for a set of covariates and weights and is essentially a weighted version of colMeans().
col_w_sd() computes the (weighted) standard deviations for a set of covariates and weights.
col_w_smd() computes the (weighted) (absolute) (standardized) difference in means for a set of covariates, a binary treatment, and weights.
col_w_vr() computes the (weighted) variance ratio for a set of covariates, a binary treatment, and weights.
col_w_ks() computes the (weighted) Kolmogorov-Smirnov (KS) statistic for a set of covariates, a binary treatment, and weights.
col_w_ovl() computes the complement of the (weighted) overlapping coefficient for a set of covariates, a binary treatment, and weights (based on Franklin et al., 2014).
col_w_cov() and col_w_corr() compute the (weighted) (absolute) treatment-covariate covariance or correlation for a set of covariates, a continuous treatment, and weights.
col_w_dcov() and col_w_dcorr() compute the (weighted) treatment-covariate distance covariance or distance correlation for a set of covariates, a continuous treatment, and weights.
Usage
col_w_mean(
  mat,
  weights = NULL,
  s.weights = NULL,
  subset = NULL,
  na.rm = TRUE,
  ...
)

col_w_sd(
  mat,
  weights = NULL,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  na.rm = TRUE,
  ...
)

col_w_smd(
  mat,
  treat,
  weights = NULL,
  std = TRUE,
  s.d.denom = "pooled",
  abs = FALSE,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  weighted.weights = weights,
  na.rm = TRUE,
  ...
)

col_w_vr(
  mat,
  treat,
  weights = NULL,
  abs = FALSE,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  na.rm = TRUE,
  ...
)

col_w_ks(
  mat,
  treat,
  weights = NULL,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  na.rm = TRUE,
  ...
)

col_w_ovl(
  mat,
  treat,
  weights = NULL,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  na.rm = TRUE,
  integrate = TRUE,
  steps = 1001L,
  ...
)

col_w_cov(
  mat,
  treat,
  weights = NULL,
  type = "pearson",
  std = FALSE,
  s.d.denom = "all",
  abs = FALSE,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  weighted.weights = weights,
  na.rm = TRUE,
  ...
)

col_w_corr(
  mat,
  treat,
  weights = NULL,
  type = "pearson",
  s.d.denom = "all",
  abs = FALSE,
  s.weights = NULL,
  bin.vars,
  subset = NULL,
  weighted.weights = weights,
  na.rm = TRUE,
  ...
)

col_w_dcov(
  mat,
  treat,
  weights = NULL,
  std = FALSE,
  s.d.denom = "all",
  s.weights = NULL,
  subset = NULL,
  weighted.weights = weights,
  na.rm = TRUE,
  ...
)

col_w_dcorr(
  mat,
  treat,
  weights = NULL,
  s.d.denom = "all",
  s.weights = NULL,
  subset = NULL,
  weighted.weights = weights,
  na.rm = TRUE,
  ...
)
Arguments
mat |
a numeric matrix or a data frame containing the covariates for which the statistic is to be computed. If a data frame, |
weights |
|
s.weights |
|
subset |
a |
na.rm |
|
... |
for all functions, additional arguments supplied to |
bin.vars |
a vector used to denote whether each variable is binary or not. Can be a |
treat |
a vector of treatment status for each individual. For |
std |
|
s.d.denom |
for
For |
abs |
|
weighted.weights |
for |
integrate |
|
steps |
for |
type |
for |
Details
col_w_mean() computes column weighted means for a matrix of variables. It is similar to colMeans() but (optionally) incorporates weights. weights and s.weights are multiplied together prior to being used, and there is no distinction between them. This could be used to compute the weighted means of each covariate in the general population to examine the degree to which a weighting method has left the weighted samples resembling the original population.
col_w_sd() computes column weighted standard deviations for a matrix of variables. weights and s.weights are multiplied together prior to being used, and there is no distinction between them. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula; the standard deviation is the square root of this variance.
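The binary-variable rule can be sketched in a few lines of base R. This is only an illustration of the p(1-p) formula stated above, not cobalt's internal code; the helper name w_sd_binary_sketch and the toy data are assumptions.

```r
# Hypothetical helper (not part of cobalt): weighted SD of a 0/1 variable
w_sd_binary_sketch <- function(x, w = rep(1, length(x))) {
  p <- sum(w * x) / sum(w)  # (weighted) proportion of 1s
  sqrt(p * (1 - p))         # SD is the square root of p(1-p)
}

x <- c(1, 0, 1, 1)          # toy binary covariate
w <- c(1, 1, 2, 2)          # toy weights
w_sd_binary_sketch(x, w)    # here p = 5/6, so the variance is 5/36
```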
col_w_smd() computes the mean difference for each covariate between treatment groups defined by treat. These mean differences can optionally be weighted, standardized, and/or in absolute value. The standardization factor is computed using the unweighted standard deviation or variance when s.weights are absent, and is computed using the s.weights-weighted standard deviation or variance when s.weights are present, except when s.d.denom = "weighted", in which case the product of weighted.weights and s.weights (if present) is used to weight the standardization factor. The standardization factor is computed using the whole sample even when subset is used. Note that unlike bal.tab(), col_w_smd() requires the user to specify whether each individual variable should be standardized using std rather than relying on continuous or binary. The weighted mean difference is computed using the product of weights and s.weights, if specified. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula.
col_w_vr() computes the variance ratio for each covariate between treatment groups defined by treat. When abs = TRUE, pmax(out, 1/out) is applied to the output so that the ratio is always greater than or equal to 1. For binary variables, the variance is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula. Note that in bal.tab(), variance ratios are not computed for binary variables, while here, they are (but likely should not be interpreted). weights and s.weights are multiplied together prior to being used, and there is no distinction between them. Because of how the weighted variance is computed, exactly balanced groups may have variance ratios that differ slightly from 1.
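The effect of abs = TRUE on variance ratios can be seen by applying the same pmax() expression to a few made-up ratios (the values and covariate names below are purely illustrative):

```r
# Made-up variance ratios for four covariates
vr <- c(age = 0.5, educ = 2, re74 = 1, re75 = 0.25)

# Ratios below 1 are replaced by their inverse, so every value is >= 1
vr_abs <- pmax(vr, 1 / vr)
vr_abs
```

After the transformation, a ratio of 0.5 and a ratio of 2 are treated as equally imbalanced.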
col_w_ks() computes the KS statistic for each covariate using the method implemented in twang. The KS statistics can optionally be weighted. For binary variables, the KS statistic is just the difference in proportions. weights and s.weights are multiplied together prior to being used, and there is no distinction between them.
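A weighted KS statistic is the largest absolute gap between the weighted empirical CDFs of the two groups. The base R sketch below illustrates that idea and shows that it reduces to the difference in proportions for a binary variable. The helper name w_ks_sketch is hypothetical, and this is not necessarily how col_w_ks() or twang implement the computation.

```r
# Hypothetical helper (not part of cobalt): weighted two-group KS statistic
w_ks_sketch <- function(x, treat, w = rep(1, length(x))) {
  # Normalize the weights to sum to 1 within each treatment group
  w1 <- ifelse(treat == 1, w, 0) / sum(w[treat == 1])
  w0 <- ifelse(treat == 0, w, 0) / sum(w[treat == 0])
  ord <- order(x)
  # Running difference between the two weighted ECDFs
  gap <- abs(cumsum(w1[ord] - w0[ord]))
  # Evaluate the gap only after all ties at a given value have been added
  last_of_value <- !duplicated(x[ord], fromLast = TRUE)
  max(gap[last_of_value])
}

# Toy binary covariate: proportion of 1s is 1 among treated, 1/3 among controls
x <- c(1, 1, 1, 0, 0, 1)
treat <- c(1, 1, 1, 0, 0, 0)
w_ks_sketch(x, treat)
```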
col_w_ovl() computes the complement of the overlapping coefficient as described by Franklin et al. (2014). It does so by computing the density of the covariate in the treated and control groups, then finding the area where those densities overlap, and subtracting that number from 1, yielding a value between 0 and 1 where 1 indicates complete imbalance and 0 indicates perfect balance. density() is used to model the density in each group. The bandwidth of the covariate in the smaller treatment group is used for both groups. The area of overlap can be computed using integrate(), which quickly and accurately computes the integral, or using a midpoint Riemann sum with 1000 partitions, which approximates the area more slowly. A reason to prefer the Riemann sum is that integrate() can fail for unknown reasons, though Riemann sums will fail with some extreme distributions. When either method fails, the resulting value will be NA. For binary variables, the complement of the overlapping coefficient is just the difference in proportions. weights and s.weights are multiplied together prior to being used, and there is no distinction between them. The weights are used to compute the weighted density by supplying them to the weights argument of density().
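The procedure described above can be sketched for an unweighted continuous covariate as follows. This is a rough illustration, not cobalt's implementation: the helper name ovl_complement_sketch is hypothetical, the bandwidth rule (bw.nrd0()) and the evaluation range are assumptions, and a simple Riemann sum on a regular grid stands in for the midpoint rule.

```r
# Hypothetical helper (not part of cobalt): complement of the overlapping coefficient
ovl_complement_sketch <- function(x, treat, steps = 1001L) {
  # Use the bandwidth from the smaller group for both groups (bw.nrd0 is an assumption)
  smaller <- if (sum(treat == 1) <= sum(treat == 0)) 1 else 0
  bw <- stats::bw.nrd0(x[treat == smaller])
  # Evaluate both densities on one common grid
  rng <- range(x) + c(-4, 4) * bw
  d1 <- stats::density(x[treat == 1], bw = bw, from = rng[1], to = rng[2], n = steps)
  d0 <- stats::density(x[treat == 0], bw = bw, from = rng[1], to = rng[2], n = steps)
  # Area under the pointwise minimum of the two densities, via a Riemann sum
  step <- d1$x[2] - d1$x[1]
  overlap <- sum(pmin(d1$y, d0$y)) * step
  1 - overlap
}

set.seed(123)
x0 <- rnorm(200)
treat <- rep(0:1, each = 200)
ovl_same <- ovl_complement_sketch(c(x0, x0), treat)        # identical groups
ovl_far  <- ovl_complement_sketch(c(x0, x0 + 100), treat)  # non-overlapping groups
```

Identical groups give a value near 0 and groups with no common support give a value near 1, matching the interpretation given above.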
col_w_cov() computes the covariances between a continuous treatment and the covariates to assess balance for a continuous treatment as recommended in Austin (2019). These covariances can optionally be weighted or in absolute value or can be requested as correlations (i.e., standardized covariances). Each correlation is computed as the covariance between the treatment and covariate divided by a standardization factor, which is equal to the square root of the product of the variance of treatment and the variance of the covariate. The standardization factor is computed using the unweighted variances when s.weights are absent, and is computed using the sampling weighted variances when s.weights are present, except when s.d.denom = "weighted", in which case the product of weighted.weights and s.weights (if present) is used to weight the standardization factor. Because the numerator and the standardization factor can be computed with different weights, the computed correlation can be greater than 1 or less than -1. The standardization factor is always computed using the whole sample even when subset is used. The covariance is computed using the product of weights and s.weights, if specified. The variance of binary variables is computed as p(1-p), where p is the (weighted) proportion of 1s, while the variance of continuous variables is computed using the standard formula.
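To make the standardization concrete, the sketch below divides a weighted covariance by a standardization factor taken from the unweighted sample. It is a simplified illustration in base R rather than cobalt's code: the helper name w_cov_sketch, the simulated data, and the use of normalized weights without a small-sample correction are all assumptions.

```r
# Hypothetical helper (not part of cobalt): weighted covariance with normalized weights
w_cov_sketch <- function(x, a, w = rep(1, length(x))) {
  w <- w / sum(w)
  sum(w * (x - sum(w * x)) * (a - sum(w * a)))
}

set.seed(42)
a <- rnorm(100)              # simulated continuous treatment
x <- 0.5 * a + rnorm(100)    # simulated covariate
w <- runif(100, 0.5, 2)      # made-up balancing weights

# Standardization factor computed in the unweighted sample
denom <- sqrt(w_cov_sketch(x, x) * w_cov_sketch(a, a))

r_unweighted <- w_cov_sketch(x, a) / denom     # equals the ordinary Pearson correlation
r_weighted   <- w_cov_sketch(x, a, w) / denom  # weighted numerator, unweighted denominator
```

Because r_weighted mixes a weighted numerator with an unweighted denominator, nothing constrains it to lie in [-1, 1].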
col_w_corr() is a wrapper for col_w_cov() with std set to TRUE.
col_w_dcov() computes the distance covariances between a continuous treatment and the covariates to assess balance for a continuous treatment. A multivariate version is described by Huling et al. (2023) for computing a scalar value that represents the balance for all covariates simultaneously; the statistic computed here is for one covariate at a time. The distance covariances can optionally be weighted or can be requested as distance correlations (i.e., standardized distance covariances). The distance correlations are computed as the distance covariance between the treatment and covariate divided by a standardization factor, which is equal to the square root of the product of the distance variance of treatment and the distance variance of the covariate, where the distance variance is the distance covariance of a variable with itself. The standardization factor is computed using the unweighted distance variances when s.weights are absent, and is computed using the sampling weighted distance variances when s.weights are present, except when s.d.denom = "weighted", in which case the product of weighted.weights and s.weights (if present) is used to weight the standardization factor. Because the numerator and the standardization factor can be computed with different weights, the computed distance correlation can be greater than 1. The standardization factor is always computed using the whole sample even when subset is used. The distance covariance is computed using the product of weights and s.weights, if specified.
col_w_dcorr() is a wrapper for col_w_dcov() with std set to TRUE.
Value
A vector of balance statistics, one for each variable in mat. If mat has column names, the output will be named as well.
References
Austin, P. C. (2019). Assessing covariate balance when using the generalized propensity score with quantitative or continuous exposures. Statistical Methods in Medical Research, 28(5), 1365–1377. doi:10.1177/0962280218756159
Franklin, J. M., Rassen, J. A., Ackermann, D., Bartels, D. B., & Schneeweiss, S. (2014). Metrics for covariate balance in cohort studies of causal effects. Statistics in Medicine, 33(10), 1685–1699. doi:10.1002/sim.6058
Huling, J. D., Greifer, N., & Chen, G. (2023). Independence Weights for Causal Inference with Continuous Treatments. Journal of the American Statistical Association, 0(0), 1–14. doi:10.1080/01621459.2023.2213485
What Works Clearinghouse. (2020). WWC Procedures Handbook (Version 4.1). Retrieved from https://ies.ed.gov/ncee/wwc/Handbooks
See Also
Examples
data("lalonde", package = "cobalt")

treat <- lalonde$treat
covs <- subset(lalonde, select = -c(treat, re78))
covs0 <- splitfactor(covs, drop.first = "if2")
bin.vars <- c(FALSE, FALSE, TRUE, TRUE, TRUE,
              TRUE, TRUE, FALSE, FALSE)
W <- WeightIt::weightit(treat ~ covs, method = "glm",
                        estimand = "ATE")
weights <- W$weights

round(data.frame(
  m0 = col_w_mean(covs0, weights = weights, subset = treat == 0),
  sd0 = col_w_sd(covs0, weights = weights,
                 bin.vars = bin.vars, subset = treat == 0),
  m1 = col_w_mean(covs0, weights = weights, subset = treat == 1),
  sd1 = col_w_sd(covs0, weights = weights,
                 bin.vars = bin.vars, subset = treat == 1),
  smd = col_w_smd(covs0, treat = treat, weights = weights,
                  std = TRUE, bin.vars = bin.vars),
  vr = col_w_vr(covs0, treat = treat, weights = weights,
                bin.vars = bin.vars),
  ks = col_w_ks(covs0, treat = treat, weights = weights,
                bin.vars = bin.vars),
  ovl = col_w_ovl(covs0, treat = treat, weights = weights,
                  bin.vars = bin.vars),
  row.names = colnames(covs0)
), 4)

# Compare to bal.tab():
bal.tab(covs, treat = treat, weights = weights,
        disp = c("m", "sd"),
        stats = c("m", "v", "ks", "ovl"),
        estimand = "ATE", method = "weighting",
        binary = "std")
Using bal.tab() with Clustered Data
Description
When using bal.tab() with clustered data, the output will be different from the case with single-level data, and there are some options that are common across all bal.tab() methods. This page outlines the outputs and options in this case.
There are two main components of the output of bal.tab() with clustered data: the within-cluster balance summaries and the across-cluster balance summary. The within-cluster balance summaries display balance for units within each cluster separately.
The across-cluster balance summary pools information across the within-cluster balance summaries to simplify balance assessment. It provides a combination (e.g., mean or maximum) of each balance statistic for each covariate across all clusters. This allows you to see how bad the worst imbalance is and what balance looks like on average. The balance summary will not be computed if longitudinal treatments, multi-category treatments, or multiply imputed data are used.
To use the thresholds argument of bal.tab() with clustered data when the across-cluster balance summary is displayed, cluster.fun must be supplied and set to a single string, which is not the default.
Allowable arguments
There are four arguments for each bal.tab() method that can handle clustered data: cluster, which.cluster, cluster.summary, and cluster.fun.
cluster
A vector of cluster membership. This can be a factor, character, or numeric vector. This argument is required to let bal.tab() know that the data is clustered. If a data argument is specified, this can also be the name of a variable in data that contains cluster membership.
which.cluster
This is a display option that does not affect computation. If .all (the default), all clusters in cluster will be displayed. If .none, no clusters will be displayed. Otherwise, can be a vector of cluster names or numerical indices for which to display balance. Indices correspond to the alphabetical order of cluster names (or the order of cluster levels if a factor).
cluster.summary
This is a display option that does not affect computation. If TRUE, the balance summary across clusters will be displayed. The default is TRUE, and if which.cluster is .none, it will automatically be set to TRUE.
cluster.fun
This is a display option that does not affect computation. Can be "min", "mean", or "max" and corresponds to which function is used in the across-cluster summary to combine results across clusters. For example, if cluster.fun = "mean", the mean balance statistic across clusters will be displayed. The default when abs = FALSE in the bal.tab() call is to display all three. The default when abs = TRUE in the bal.tab() call is to display just the mean and maximum absolute balance statistic.
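The way the across-cluster summary combines within-cluster statistics can be illustrated in base R with a small matrix of made-up standardized mean differences. The numbers, covariate names, and cluster labels below are hypothetical, and the code only mimics the combination step; it does not call bal.tab().

```r
# Made-up standardized mean differences: rows are covariates, columns are clusters
smd <- rbind(age  = c(A =  0.10, B = 0.05, C = -0.40),
             educ = c(A = -0.25, B = 0.30, C =  0.02))

# As with abs = FALSE: minimum, mean, and maximum of the signed statistics
summary_signed <- t(apply(smd, 1, function(s) {
  c(min = min(s), mean = mean(s), max = max(s))
}))

# As with abs = TRUE: mean and maximum of the absolute statistics
summary_abs <- t(apply(abs(smd), 1, function(s) {
  c(mean = mean(s), max = max(s))
}))

summary_signed
summary_abs
```

Setting cluster.fun to a single string corresponds to keeping just one of these columns, which is why a threshold can then be applied unambiguously.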
Output
The output is a bal.tab.cluster object, which inherits from bal.tab. It has the following elements:
- Cluster.Balance: For each cluster, a regular bal.tab object containing a balance table, a sample size summary, and other balance assessment tools, depending on which options are specified.
- Cluster.Summary: The balance summary across clusters. This will include the combination of each balance statistic for each covariate across all clusters according to the value of cluster.fun.
- Observations: A table of sample sizes or effective sample sizes for each cluster before and after adjustment.
As with other methods, multiple weights can be specified, and values for all weights will appear in all tables.
See Also
- vignette("segmented-data") for examples
Using bal.tab() with Multiply Imputed Data
Description
When using bal.tab() with multiply imputed data, the output will be different from the case with a single data set. Multiply imputed data can be used with all bal.tab() methods, and the mimids and wimids methods for MatchThem objects automatically incorporate multiply imputed data. This page outlines the outputs and options available with multiply imputed data.
There are two main components of the output of bal.tab() with multiply imputed data: the within-imputation balance summaries and the across-imputation balance summary. The within-imputation balance summaries display balance for units within each imputed data set separately. In general, this will not be very useful because interest rarely lies in the qualities of any individual imputed data set.
The across-imputation balance summary pools information across the within-imputation balance summaries to simplify balance assessment. It provides the average, smallest, and largest balance statistic for each covariate across all imputations. This allows you to see how bad the worst imbalance is and what balance looks like on average across the imputations. The summary behaves differently depending on whether abs is specified as TRUE or FALSE. When abs = TRUE, the across-imputation balance summary will display the mean absolute balance statistics and the maximum absolute balance statistics. When abs = FALSE, the across-imputation balance summary will display the minimum, mean, and maximum of the balance statistic in its original form.
To use the thresholds argument of bal.tab() with multiply imputed data when the across-imputation balance summary is displayed, imp.fun must be supplied and set to a single string, which is not the default. See vignette("segmented-data") for details.
Allowable arguments
There are four arguments for each bal.tab() method that can handle multiply imputed data: imp, which.imp, imp.summary, and imp.fun.
imp
A vector of imputation membership. This can be a factor, character, or numeric vector. This argument is required to let bal.tab() know that the data is multiply imputed unless MatchThem objects are used. If a data argument is specified, this can also be the name of a variable in data that contains imputation membership. If the data argument is a mids object, the output of a call to mice(), imp does not need to be specified and will automatically be extracted from the mids object.
which.imp
This is a display option that does not affect computation. If .all, all imputations in imp will be displayed. If .none (the default), no imputations will be displayed. Otherwise, can be a vector of imputation indices for which to display balance.
imp.summary
This is a display option that does not affect computation. If TRUE, the balance summary across imputations will be displayed. The default is TRUE, and if which.imp is .none, it will automatically be set to TRUE.
imp.fun
This is a display option that does not affect computation. Can be "min", "mean", or "max" and corresponds to which function is used in the across-imputation summary to combine results across imputations. For example, if imp.fun = "mean", the mean balance statistic across imputations will be displayed. The default when abs = FALSE in the bal.tab() call is to display all three. The default when abs = TRUE in the bal.tab() call is to display just the mean and maximum absolute balance statistic.
Output
The output is a bal.tab.imp object, which inherits from bal.tab. It has the following elements:
- Imputation.Balance: For each imputation, a regular bal.tab object containing a balance table, a sample size summary, and other balance assessment tools, depending on which options are specified.
- Balance.Across.Imputations: The balance summary across imputations. This will include the combination of each balance statistic for each covariate across all imputations according to the value of imp.fun.
- Observations: A table of sample sizes or effective sample sizes averaged across imputations before and after adjustment.
As with other methods, multiple weights can be specified, and values for all weights will appear in all tables.
See Also
- vignette("segmented-data") for examples
Using bal.tab() with Longitudinal Treatments
Description
When using bal.tab() with longitudinal treatments, the output will be different from the case with point treatments, and there are some options that are common across all bal.tab() methods for dealing with longitudinal data. This page outlines the outputs and options in this case.
There are two main components of the output of bal.tab() with longitudinal treatments: the time-point-specific balance summaries and the across-time-points balance summary. The time-point-specific balance summaries are standard point treatment balance summaries at each time point.
The across-time-points balance summary is, for each variable, the greatest imbalance across all time-point-specific balance summaries. If the greatest observed imbalance is tolerable, then all other imbalances for that variable will be tolerable too, so focusing on reducing the greatest imbalance is sufficient for reducing imbalance overall. The balance summary will not be computed if multi-category treatments or multiply imputed data are used.
Allowable arguments
There are two additional arguments for each bal.tab() method that can handle longitudinal treatments: which.time and msm.summary.
which.time
This is a display option that does not affect computation. If .all (the default), all time points will be displayed. If .none, no time points will be displayed. Otherwise, can be a vector of treatment names or indices for which to display balance.
msm.summary
This is a display option that does not affect computation. If TRUE, the balance summary across time points will be displayed. The default is TRUE, and if which.time is .none, it will automatically be set to TRUE.
Output
The output is a bal.tab.msm object, which inherits from bal.tab. It has the following elements:
- Time.Balance: For each time point, a regular bal.tab object containing a balance table, a sample size summary, and other balance assessment tools, depending on which options are specified.
- Balance.Across.Times: The balance summary across time points. This will include the maximum balance statistic(s) for each covariate across all time points.
- Observations: A table of sample sizes or effective sample sizes for each time point before and after adjustment.
As with other methods, multiple weights can be specified, and values for all weights will appear in all tables.
Note
The balance tables presented here are not the same as those recommended by Jackson (2016) and computed in his R package, confoundr, as these do not take into account treatment history. The balance statistics presented here should be used with caution and may not reflect balance in an accurate way.
References
Jackson, J. W. (2016). Diagnostics for Confounding of Time-varying and Other Joint Exposures. Epidemiology, 27(6), 859–869. doi:10.1097/EDE.0000000000000547
See Also
- vignette("longitudinal-treat") for examples
Using bal.tab() with Multi-Category Treatments
Description
When using bal.tab() with multi-category treatments, the output will be different from the case with binary or continuous treatments, and there are some options that are common across all bal.tab() methods. This page outlines the outputs and options in this case.
There are two main components of the output of bal.tab() with multi-category treatments: the two-group treatment comparisons and the balance summary. The two-group treatment comparisons are standard binary treatment comparisons either for pairs of groups (e.g., for treatments A, B, and C, "A vs. B", "A vs. C", and "B vs. C") or for each group against all the groups (i.e., the entire sample).
The balance summary is, for each variable, the greatest imbalance across all two-group comparisons. So, for variable X1, if "A vs. B" had a standardized mean difference of 0.52, "A vs. C" had a standardized mean difference of 0.17, and "B vs. C" had a standardized mean difference of 0.35, the balance summary would have 0.52 for the value of the standardized mean difference for X1. The same goes for other variables and other measures of balance. If the greatest observed imbalance is tolerable, then all other imbalances for that variable will be tolerable too, so focusing on reducing the greatest imbalance is sufficient for reducing imbalance overall. (Note that when s.d.denom = "pooled", i.e., when the estimand is the ATE, the pooled standard deviation in the denominator will be the average of the standard deviations across all treatment groups, not just those used in the pairwise comparison.) The balance summary will not be computed if multiply imputed data are used.
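The X1 example above amounts to taking the maximum absolute statistic across the comparisons, which can be written directly in base R (the vector name smd_X1 is just for illustration):

```r
# Standardized mean differences for X1 from the example in the text
smd_X1 <- c("A vs. B" = 0.52, "A vs. C" = 0.17, "B vs. C" = 0.35)

# The balance summary reports the greatest absolute imbalance
summary_X1 <- max(abs(smd_X1))
worst_pair <- names(which.max(abs(smd_X1)))

summary_X1   # 0.52
worst_pair   # "A vs. B"
```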
Allowable arguments
There are four arguments for each bal.tab() method that can handle multi-category treatments: pairwise, focal, which.treat, and multi.summary.
pairwise
Whether to compute the two-group comparisons pairwise or not. If TRUE, bal.tab() will compute comparisons for each pair of treatments. This can be valuable if treatments are to be compared with one another (which is often the case). If FALSE, bal.tab() will compute balance for each treatment group against the full unadjusted sample when focal is NULL and for each non-focal group against the focal group otherwise.
focal
When one group is to be compared to multiple control groups in an ATT analysis, the group considered "treated" is the focal group. By specifying the name or index of the treatment condition considered focal, bal.tab() will only compute and display pairwise balance for treatment comparisons that include the focal group when pairwise = FALSE.
which.treat
This is a display option that does not affect computation. When displaying the bal.tab output, which treatments should be displayed? If a vector of length 1 is entered, all comparisons involving that treatment group will be displayed. If a vector of length 2 or more is entered, all comparisons involving treatments that both appear in the input will be displayed. For example, inputting "A" will display "A vs. B" and "A vs. C", while entering c("A", "B") will only display "A vs. B". .none indicates no treatment comparisons will be displayed, and .all indicates all treatment comparisons will be displayed. .none is the default.
multi.summary
If TRUE, the balance summary across all comparisons will be computed and displayed. This includes one row for each covariate with the maximum balance statistic across all pairwise comparisons. Note that, if variance ratios or KS statistics are requested in addition to mean differences, the displayed values may not come from the same pairwise comparisons; that is, the greatest standardized mean difference and the greatest variance ratio may not come from the same comparison. The default is TRUE, and if which.treat is .none, it will automatically be set to TRUE.
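The selection rule described for which.treat can be mimicked in base R for three hypothetical treatments A, B, and C. The function shown_comparisons() below is an illustrative stand-in written for this example; it is not part of cobalt and does not handle .all or .none.

```r
treatments <- c("A", "B", "C")
pairs <- combn(treatments, 2, simplify = FALSE)  # A-B, A-C, B-C

# Hypothetical helper (not part of cobalt): which comparisons would be displayed
shown_comparisons <- function(which.treat) {
  keep <- vapply(pairs, function(p) {
    if (length(which.treat) == 1) {
      which.treat %in% p        # any comparison involving that treatment
    } else {
      all(p %in% which.treat)   # both treatments must appear in the input
    }
  }, logical(1))
  vapply(pairs[keep], paste, character(1), collapse = " vs. ")
}

shown_comparisons("A")          # "A vs. B" "A vs. C"
shown_comparisons(c("A", "B"))  # "A vs. B"
```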
Output
The output is a bal.tab.multi object, which inherits from bal.tab. It has the following elements:
- Pair.Balance: For each pair of treatment groups, a regular bal.tab object containing a balance table, a sample size summary, and other balance assessment tools, depending on which options are specified. If pairwise is FALSE, the comparisons will be between each group and the groups combined (labeled "All") when focal is NULL and between each non-focal group and the focal group otherwise.
- Balance.Across.Pairs: The balance summary across two-group comparisons. This will include the greatest (i.e., maximum) absolute balance statistic(s) for each covariate across all comparisons computed. Thresholds can be requested for each balance measure as with binary treatments.
- Observations: A table of sample sizes or effective sample sizes for each treatment group before and after adjustment.
As with other methods, multiple weights can be specified, and values for all weights will appear in all tables.
Note
In versions 4.3.1 and earlier, setting pairwise = FALSE would compare each group to the full adjusted sample. Now, each group is compared to the full unadjusted sample (unadjusted except for s.weights, if supplied).
In versions 4.3.1 and earlier, pairwise was ignored with non-NULL focal and was automatically set to FALSE. Now, pairwise can be specified and its default is TRUE, so balance between all treatment groups will be computed by default rather than only between each non-focal group and the focal group. To recover previous functionality, set pairwise = FALSE with non-NULL focal.
See Also
- vignette("segmented-data") for examples
Using bal.tab() with Subclassified Data
Description
When using bal.tab() with subclassified data, i.e., data split into subclasses where balance may hold, the output will be different from the standard, non-subclassified case, and there is an additional option for controlling display. This page outlines the outputs and options in this case.
There are two main components of the output of bal.tab() with subclassified data: the balance within subclasses and the balance summary across subclasses. The within-subclass balance displays are essentially standard balance displays for each subclass, except that only "adjusted" values are available, because the subclassification itself is the adjustment.
The balance summary is, for each variable, like a weighted average of the balance statistics across subclasses. This is computed internally by assigning each individual a weight based on their subclass and treatment group membership and then computing weighted balance statistics as usual with these weights. This summary is the same as the one that would be obtained if subclasses were supplied to the match.strata argument rather than to subclass. Because the means and mean differences are additive, their computed values will be weighted averages of the subclass-specific values, but for other statistics, the computed values will not be.
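The additivity claim for mean differences can be checked with a small simulation in base R. The sketch assumes ATT-style subclassification weights (treated units get weight 1; control units get the ratio of treated to control units in their subclass); the weights cobalt actually uses depend on the estimand and may differ, and all object names and simulated data are hypothetical.

```r
set.seed(1)
n <- 200
subclass <- sample(1:4, n, replace = TRUE)
treat <- rbinom(n, 1, 0.2 + 0.1 * subclass)  # treatment more likely in higher subclasses
x <- rnorm(n, mean = subclass)               # covariate related to subclass

# Assumed ATT-style weights based on subclass and treatment group membership
n1 <- tapply(treat == 1, subclass, sum)
n0 <- tapply(treat == 0, subclass, sum)
w <- ifelse(treat == 1, 1, unname((n1 / n0)[as.character(subclass)]))

# Weighted mean difference computed "as usual" with these weights
diff_weighted <- weighted.mean(x[treat == 1], w[treat == 1]) -
  weighted.mean(x[treat == 0], w[treat == 0])

# Weighted average of the subclass-specific mean differences
diff_by_subclass <- tapply(x[treat == 1], subclass[treat == 1], mean) -
  tapply(x[treat == 0], subclass[treat == 0], mean)
diff_averaged <- sum(diff_by_subclass * n1 / sum(n1))

all.equal(diff_weighted, diff_averaged)
```

The same identity does not hold for statistics such as variance ratios or KS statistics, which is why the summary is only "like" a weighted average in general.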
Allowable arguments
There are three arguments for bal.tab()
that relate to subclasses: subclass
, which.subclass
, and subclass.summary
.
subclass
For the data.frame and formula methods of bal.tab(), a vector of subclass membership or the name of the variable in data containing subclass membership. When using subclassification with a function compatible with cobalt, such as matchit() in MatchIt, this argument can be omitted because the subclasses are in the output object.
which.subclass
This is a display option that does not affect computation. If .all, all subclasses in subclass will be displayed. If .none (the default), no subclasses will be displayed. Otherwise, can be a vector of subclass indices for which to display balance.
subclass.summary
This is a display option that does not affect computation. If TRUE, the balance summary across subclasses will be displayed. The default is TRUE, and if which.subclass is .none, it will automatically be set to TRUE.
Output
The output is a bal.tab.subclass
object, which inherits from bal.tab
. It has the following elements:
- Subclass.Balance: A list of data frames containing balance information for each covariate in each subclass.
- Balance.Across.Subclass: A data frame containing balance statistics for each covariate aggregated across subclasses and for the original sample (i.e., unadjusted). See bal.tab() for details on what this includes.
- Observations: A table of sample sizes in each subclass and overall.
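As a rough illustration of the arguments and output elements described on this page, the sketch below forms five propensity score subclasses by hand and supplies them through subclass. The propensity score model, the quintile cut points, and the object names (ps, cuts, b) are assumptions made for the example and are not part of cobalt.

```r
library(cobalt)
data("lalonde", package = "cobalt")

# Hypothetical propensity score model, used only to create subclasses
ps <- fitted(glm(treat ~ age + educ + married + re74,
                 data = lalonde, family = binomial))

# Five subclasses based on quintiles of the score among treated units
cuts <- quantile(ps[lalonde$treat == 1], probs = c(.2, .4, .6, .8))
lalonde$subclass <- findInterval(ps, cuts) + 1

# Display every subclass as well as the across-subclass summary
b <- bal.tab(treat ~ age + educ + married + re74, data = lalonde,
             subclass = "subclass",
             which.subclass = .all, subclass.summary = TRUE)

names(b$Subclass.Balance)        # one data frame per subclass
head(b$Balance.Across.Subclass)  # balance aggregated across subclasses
b$Observations                   # sample sizes by subclass
```

Because which.subclass and subclass.summary only affect display, the stored object can be printed again later with different values for them.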
See Also
Options for Displaying bal.tab()
Output
Description
Several additional arguments can be passed to bal.tab()
that control the display of the output; these arguments are documented here. Not all arguments are applicable to all uses of bal.tab()
; for example, which.subclass
, which controls which subclasses are displayed when subclassification is used, won't do anything when subclassification is not used. Note that when quick = TRUE
is set in the call to bal.tab()
(which is the default), setting any of these arguments to FALSE
can prevent some values from being computed, which can have unintended effects.
Allowed arguments
disp.bal.tab
logical; whether to display the table of balance statistics. Default is TRUE, so the balance table is displayed.
imbalanced.only
logical; whether to display only the covariates that failed to meet at least one of the balance thresholds. Default is FALSE, so all covariates are displayed.
un
logical; whether to print statistics for the unadjusted sample as well as for the adjusted sample. Default is FALSE, so only the statistics for the adjusted sample are displayed.
disp
character; which distribution summary statistic(s) should be reported. Allowable options include "means" and "sds". Multiple options are allowed. Abbreviations allowed.
stats
character; which statistic(s) should be reported. See stats to see which options are available. Multiple options are allowed. Abbreviations allowed. For binary and multi-category treatments, the default is "mean.diffs" (i.e., [standardized] mean differences), and for continuous treatments, the default is "correlations" (i.e., treatment-covariate Pearson correlations).
factor_sep
character; the string used to separate factor variables from their levels when variable names are printed. Default is "_".
int_sep
character; the string used to separate two variables involved in an interaction when variable names are printed. Default is " * ". Older versions of cobalt used "_".
disp.call
logical; whether to display the function call from the original input object, if present. Default is FALSE, so the function call is not displayed.
When subclassification is used
which.subclass
Which subclasses (if any) should be displayed. If .all, all subclasses will be displayed. If .none (the default), no subclasses will be displayed. Otherwise, can be a vector of subclass indices for which to display balance.
subclass.summary
logical; whether to display the balance summary across subclasses. If TRUE, the balance summary across subclasses will be displayed. The default is TRUE, and if which.subclass is .none, it will automatically be set to TRUE.
When the treatment is multi-category
which.treat
For which treatments or treatment combinations balance tables should be displayed. If a vector of length 1 is entered, all comparisons involving that treatment group will be displayed. If a vector of length 2 or more is entered, all comparisons involving treatments that both appear in the input will be displayed. For example, setting which.treat = "A" will display "A vs. B" and "A vs. C", while setting which.treat = c("A", "B") will only display "A vs. B". .none indicates no treatment comparisons will be displayed, and .all indicates all treatment comparisons will be displayed. Default is .none. See bal.tab.multi().
multi.summary
logical; whether to display the balance summary across all treatment pairs. This includes one row for each covariate with the maximum balance statistic across all pairwise comparisons. Note that, if variance ratios or KS statistics are requested, the displayed values may not come from the same pairwise comparisons; that is, the greatest standardized mean difference and the greatest variance ratio may not come from the same comparison. Default is TRUE when which.treat is .none and FALSE otherwise. See bal.tab.multi().
When clusters are present
which.cluster
For which clusters balance tables should be displayed. If .all, all clusters in cluster will be displayed. If .none, no clusters will be displayed. Otherwise, can be a vector of cluster names or numerical indices for which to display balance. Indices correspond to the alphabetical order of cluster names (or the order of cluster levels if a factor). Default is .all. See class-bal.tab.cluster.
cluster.summary
logical; whether to display the balance summary across clusters. Default is TRUE when which.cluster is .none and FALSE otherwise (note the default for which.cluster is .all). See class-bal.tab.cluster.
cluster.fun
Which function is used in the across-cluster summary to combine results across clusters. Can be "min", "mean", or "max". For example, if cluster.fun = "mean" the mean balance statistic across clusters will be displayed. The default when abs = FALSE in the bal.tab() call is to display all three. The default when abs = TRUE in the bal.tab() call is to display just the mean and maximum absolute balance statistic. See class-bal.tab.cluster.
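The cluster display options might be combined as in the following sketch. Using race as the clustering variable is purely an assumption for illustration; any grouping variable in the data could take its place. The display arguments are given in the bal.tab() call itself so that the across-cluster summary is computed even with the default quick = TRUE.

```r
library(cobalt)
data("lalonde", package = "cobalt")

# race stands in for a cluster identifier (illustrative choice only)
b <- bal.tab(treat ~ age + educ + married + re74, data = lalonde,
             cluster = "race",
             which.cluster = .none,  # hide the individual clusters
             cluster.fun = "max")    # summarize with the maximum across clusters
b
```

Since which.cluster is .none here, cluster.summary takes its default of TRUE and only the across-cluster summary is printed.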
When multiple imputations are present
which.imp
For which imputations balance tables should be displayed. If .all, all imputations in imp will be displayed. If .none, no imputations will be displayed. Otherwise, can be a vector of imputation indices for which to display balance. Default is .none. See class-bal.tab.imp.
imp.summary
logical; whether to display the balance summary across imputations. Default is TRUE when which.imp is .none and FALSE otherwise. See class-bal.tab.imp.
imp.fun
Which function is used in the across-imputation summary to combine results across imputations. Can be "min", "mean", or "max". For example, if imp.fun = "mean" the mean balance statistic across imputations will be displayed. The default when abs = FALSE in the bal.tab() call is to display all three. The default when abs = TRUE in the bal.tab() call is to display just the mean and maximum absolute balance statistic. See class-bal.tab.imp.
When the treatment is longitudinal
which.time
For which time points balance tables should be displayed. If .all, all time points will be displayed. If .none, no time points will be displayed. Otherwise, can be a vector of treatment names or indices for which to display balance. Default is .none. See class-bal.tab.msm.
msm.summary
logical; whether to display the balance summary across time points. Default is TRUE when which.time is .none and FALSE otherwise. See class-bal.tab.msm.
Setting options globally
Display options can be specified as arguments in each call, but if you find you frequently set one to something other than its default, you can instead set it as a global option (for the present R session) using set.cobalt.options()
and retrieve it using get.cobalt.options()
. Note that global options cannot be set for which.subclass
, which.cluster
, which.imp
, which.treat
, or which.time
.
Note
When calling bal.tab()
using do.call()
, if you are using .all
or .none
as inputs to arguments, you need to use alist()
rather than list()
to group the arguments. For example, do.call(bal.tab, list(., which.cluster = .none))
will produce an error, but do.call(bal.tab, alist(., which.cluster = .none))
should work correctly.
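The difference between list() and alist() in this Note can be seen in base R alone. The sketch below does not call bal.tab(); it only shows how each function treats the bare name .none, assuming no object of that name exists in the session. The names args_list and args_alist are hypothetical.

```r
# list() evaluates its arguments immediately; with no object called .none
# available, evaluation fails and the error message is captured instead
args_list <- tryCatch(list(which.cluster = .none),
                      error = function(e) conditionMessage(e))

# alist() stores the argument unevaluated, so the bare name survives
args_alist <- alist(which.cluster = .none)

is.character(args_list)              # an error message was captured
is.symbol(args_alist$which.cluster)  # .none is kept as a symbol
```

Passing the unevaluated symbol through do.call() lets bal.tab() see .none as it was typed, which is what the alist() form above relies on.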
See Also
Convenient Formula Generation
Description
f.build()
returns a formula
of the form y ~ x1 + x2 + ...
from a data frame input. It can be much quicker to use f.build()
than to hand-write the precise formula, which may contain errors. It can be used in place of a formula in, for example, glm()
, matchit()
, or bal.tab()
. It provides similar functionality to reformulate()
.
Usage
f.build(y = NULL, rhs = NULL)
Arguments
y |
the quoted name of the response (left hand side) variable in the formula. Only one variable is supported. If missing, |
rhs |
a data frame whose variable names will be the terms on the right hand side of the formula, or a character vector whose values will be the terms on the right hand side of the formula. If missing, the argument to |
Value
a formula
object.
See Also
Examples
data(lalonde)
covs <- subset(lalonde, select = -c(treat, re78))
lm(f.build("treat", covs), data = lalonde)
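For comparison, the base R function reformulate() mentioned in the Description can build the same kind of formula from a character vector of term names. The sketch below uses a small stand-in data frame (an assumption for illustration) rather than lalonde, and does not call f.build().

```r
# Stand-in covariate data frame; the column names are arbitrary
covs <- data.frame(age = c(25, 40), educ = c(12, 16), married = c(0, 1))

# Build treat ~ age + educ + married from the column names
f <- reformulate(names(covs), response = "treat")
f
```

Unlike f.build(), reformulate() needs the term names extracted first (here with names()), since it does not accept a data frame directly.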
Extract Weights from Preprocessing Objects
Description
Extracts weights from the outputs of preprocessing functions.
Usage
get.w(x, ...)
## S3 method for class 'matchit'
get.w(x, ...)
## S3 method for class 'ps'
get.w(x, stop.method = NULL, estimand, s.weights = FALSE, ...)
## S3 method for class 'mnps'
get.w(x, stop.method = NULL, s.weights = FALSE, ...)
## S3 method for class 'ps.cont'
get.w(x, s.weights = FALSE, ...)
## S3 method for class 'iptw'
get.w(x, stop.method = NULL, s.weights = FALSE, ...)
## S3 method for class 'Match'
get.w(x, ...)
## S3 method for class 'CBPS'
get.w(x, estimand, ...)
## S3 method for class 'CBMSM'
get.w(x, ...)
## S3 method for class 'ebalance'
get.w(x, treat, ...)
## S3 method for class 'optmatch'
get.w(x, estimand, ...)
## S3 method for class 'cem.match'
get.w(x, estimand, ...)
## S3 method for class 'weightit'
get.w(x, s.weights = FALSE, ...)
## S3 method for class 'designmatch'
get.w(x, treat, estimand, ...)
## S3 method for class 'mimids'
get.w(x, ...)
## S3 method for class 'wimids'
get.w(x, ...)
## S3 method for class 'sbwcau'
get.w(x, ...)
Arguments
x |
output from the corresponding preprocessing packages. |
... |
arguments passed to other methods. |
stop.method |
the name of the stop method used in the original call to |
estimand |
if weights are computed using the propensity score (i.e., for the If weights are computed using subclasses/matching strata (i.e., for the |
s.weights |
whether the sampling weights included in the original call to the fitting function should be included in the weights. If |
treat |
a vector of treatment status for each unit. This is required for methods that include |
Details
The output of get.w()
can be used in calls to the formula and data frame methods of bal.tab()
(see example below). In this way, the output of multiple preprocessing packages can be viewed simultaneously and compared. The weights can also be used in weights
statements in regression methods to compute weighted effects.
twang has a function called get.weights()
that performs the same function on ps
objects but offers slightly finer control. Note that the weights generated by get.w()
for ps
objects do not include sampling weights by default.
When sampling weights are used with CBPS()
in CBPS, the returned weights will already have the sampling weights incorporated. To retrieve the balancing weights on their own, divide the returned weights by the original sampling weights. For other packages, the balancing weights are returned separately unless s.weights = TRUE
, which means they must be multiplied by the sampling weights for effect estimation.
When Match()
in Matching is used with CommonSupport = TRUE
, the returned weights will be incorrect. This option is not recommended by the package authors.
Value
A vector or data frame of weights for each unit. These may be matching weights or balancing weights.
Examples
data("lalonde", package = "cobalt")
m.out <- MatchIt::matchit(treat ~ age + educ + race,
                          data = lalonde,
                          estimand = "ATT")
w.out <- WeightIt::weightit(treat ~ age + educ + race,
                            data = lalonde,
                            estimand = "ATT")
bal.tab(treat ~ age + educ + race, data = lalonde,
        weights = data.frame(matched = get.w(m.out),
                             weighted = get.w(w.out)),
        method = c("matching", "weighting"),
        estimand = "ATT")
Data from National Supported Work Demonstration and PSID, as analyzed by Dehejia and Wahba (1999).
Description
This is a subsample of the data from the treated group in the National Supported Work Demonstration (NSW) and the comparison sample from the Panel Study of Income Dynamics (PSID). This data was previously analyzed extensively by Lalonde (1986) and Dehejia and Wahba (1999).
Usage
lalonde
lalonde_mis
Format
A data frame with 614 observations (185 treated, 429 control). There are 9 variables measured for each individual.
"treat" is the treatment assignment (1=treated, 0=control).
"age" is age in years.
"educ" is education in number of years of schooling.
"race" is the individual's race/ethnicity, (Black, Hispanic, or White). Note some other versions of this dataset use indicator variables
black and hispan instead of a single race variable.
"married" is an indicator for married (1=married, 0=not married).
"nodegree" is an indicator for whether the individual lacks a high school degree (1=no degree, 0=degree).
"re74" is income in 1974, in U.S. dollars.
"re75" is income in 1975, in U.S. dollars.
"re78" is income in 1978, in U.S. dollars.
"treat" is the treatment variable, "re78" is the outcome, and the others are pre-treatment covariates.
An object of class data.frame
with 614 rows and 9 columns.
Details
lalonde_mis
is the same dataset with missing values in three of the covariates.
References
Lalonde, R. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76: 604-620.
Dehejia, R.H. and Wahba, S. (1999). Causal Effects in Nonexperimental Studies: Re-Evaluating the Evaluation of Training Programs. Journal of the American Statistical Association 94: 1053-1062.
Display Balance Statistics in a Love Plot
Description
Generates a "Love" plot graphically displaying covariate balance before and after adjusting. Options are available for producing publication-ready plots. Detailed examples are available in vignette("love.plot")
.
Usage
love.plot(
x,
stats,
abs,
agg.fun = NULL,
var.order = NULL,
drop.missing = TRUE,
drop.distance = FALSE,
thresholds = NULL,
line = FALSE,
stars = "none",
grid = FALSE,
limits = NULL,
colors = NULL,
shapes = NULL,
alpha = 1,
size = 3,
wrap = 30,
var.names = NULL,
title,
sample.names,
labels = FALSE,
position = "right",
themes = NULL,
...
)
Arguments
x |
the valid input to a call to |
stats |
|
abs |
|
agg.fun |
if balance is to be displayed across clusters or imputations rather than within a single cluster or imputation, which summarizing function ("mean", "max", or "range") of the balance statistics should be used. If "range" is entered, |
var.order |
a |
drop.missing |
|
drop.distance |
|
thresholds |
|
line |
|
stars |
when mean differences are to be displayed, which variable names should have a star (i.e., an asterisk) next to them. Allowable values are "none", "std" (for variables with mean differences that have been standardized), or "raw" (for variables with mean differences that have not been standardized). If "raw", the x-axis title will be "Standardized Mean Differences". Otherwise, it will be "Mean Differences". Ignored when mean differences are not displayed. See Details for an explanation of the purpose of this option. |
grid |
|
limits |
|
colors |
the colors of the points on the plot. See 'Color Specification' at |
shapes |
the shapes of the points on the plot. Must be one or two numbers between 1 and 25 or the name of a valid shape. See the |
alpha |
|
size |
|
wrap |
|
var.names |
an optional object providing alternate names for the variables in the plot, which will otherwise be the variable names as they are stored. This may be useful when variables have ugly names. See Details on how to specify |
title |
|
sample.names |
|
labels |
|
position |
the position of the legend. When |
themes |
an optional list of |
... |
additional arguments passed to
Additionally, any of the |
Details
love.plot
can be used with clusters, imputations, and multi-category and longitudinal treatments in addition to the standard case. Setting the corresponding which.
argument to .none
will aggregate across that dimension. When aggregating, an argument should be specified to agg.fun
referring to whether the mean, minimum ("min"), or maximum ("max") balance statistic or range ("range", the default) of balance statistics for each covariate should be presented in the plot. See vignette("segmented-data")
for examples.
With subclasses, balance will be displayed for the unadjusted sample and the aggregated subclassified sample. If disp.subclass
is TRUE
, each subclass will be displayed additionally as a number on the plot.
Variable order using var.order
The order that the variables are presented in depends on the argument to var.order
. If NULL
, the default, they will be displayed in the same order as in the call to bal.tab()
, which is the order of the underlying data set. If "alphabetical", they will be displayed in alphabetical order. If "unadjusted", they will be ordered by the balance statistic of the unadjusted sample. To order by the values of the adjusted sample, "adjusted" can be supplied if only one set of weights (or subclasses) is specified; otherwise, the name of the set of weights should be specified.
If multiple stats
are requested, the order will be determined by the first entry to stats
(e.g., if both "mean.diffs" and "ks.statistics" are requested, and var.order = "unadjusted"
, the variables will be displayed in order of the unadjusted mean differences for both plots). If multiple plots are produced simultaneously (i.e., for individual clusters or imputations), var.order
can only be NULL
or "alphabetical".
If a love.plot
object is supplied, the plot being drawn will use the variable order in the supplied love.plot
object. This can be useful when making more than one plot and the variable order should be the same across plots.
Variable names using var.names
The default in love.plot()
is to present variables as they are named in the output of the call to bal.tab()
, so it is important to know this output before specifying alternate variable names when using var.names
, as the displayed variable names may differ from those in the original data.
There are several ways to specify alternate names for presentation in the displayed plot using the var.names
argument by specifying a list of old and new variable names, pairing the old name with the new name. You can do this in three ways: 1) use a vector or list of new variable names, with the names
of the values being the old variable names; 2) use a data frame with exactly one column containing the new variable names and the row names containing the old variable names; or 3) use a data frame with two columns, the first (or the one named "old") containing the old variable names and the second (or the one named "new") containing the new variable names. If a variable in the output from bal.tab()
is not provided in the list of old variable names, love.plot()
will use the original old variable name.
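The three accepted forms can be written out as follows. The variable names and labels are placeholders chosen for illustration, and the object names (v_vec, v_df1, v_df2) are hypothetical; each object carries the same old-to-new mapping.

```r
# 1) Named character vector: names are the old names, values the new ones
v_vec <- c(age = "Age", educ = "Years of Education")

# 2) One-column data frame: row names hold the old names
v_df1 <- data.frame(new = c("Age", "Years of Education"),
                    row.names = c("age", "educ"))

# 3) Two-column data frame with columns named "old" and "new"
v_df2 <- data.frame(old = c("age", "educ"),
                    new = c("Age", "Years of Education"))
```

Any one of these could then be passed as var.names in a love.plot() call; variables left out of the mapping keep their original names.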
love.plot()
can replace old variable names with new ones based on exact matching for the name strings or matching using the variable name components. For example, if a factor variable "X"
with levels "a"
, "b"
, and "c"
is displayed with love.plot()
, the variables "X_a"
, "X_b"
, and "X_c"
will be displayed. You can enter replacement names for all three variables individually with var.names
, or you can simply specify a replacement name for "X"
, and "X"
will be replaced by the given name in all instances it appears, including not just factor expansions, but also polynomials and interactions requested with int = TRUE
in the original bal.tab()
call. In an interaction with another variable, say "Y"
, there are several ways to replace the name of the interaction term "X_a * Y"
. If the entire string ("X_a * Y"
) is included in var.names
, the entire string will be replaced. If "X_a"
is included in var.names
, only it will be replaced (and it will be replaced everywhere else it appears). If "X"
is included in var.names
, only it will be replaced (and it will be replaced everywhere else it appears). See example at var.names()
.
Stars and the x-axis label with mean differences
When mean differences are to be displayed, love.plot()
attempts to figure out the appropriate label for the x-axis. If all mean differences are standardized, the x-axis label will be "Standardized Mean Differences". If all mean differences are raw (i.e., unstandardized), the x-axis label will be "Mean Differences". Otherwise, love.plot()
turns to the stars
argument. If "raw", the x-axis label will be "Standardized Mean Differences" (i.e., because un-starred variables have standardized mean differences displayed). If "std", the x-axis label will be "Mean Differences" (i.e., because un-starred variables have raw mean differences displayed). If "none", the x-axis label will be "Mean Differences" and a warning will be issued recommending the use of stars
.
The default is to display standardized mean differences for continuous variables, raw mean differences for binary variables, and no stars, so this warning will be issued in most default uses of love.plot()
. The purpose of this is to correct behavior of previous versions of cobalt in which the default x-axis label was "Mean Differences", even when standardized mean differences were displayed, yielding a potentially misleading plot. This warning requires the user to think about what values are being displayed. The idea of using stars
is that the user can, in a caption for the plot, explain that variables with an asterisk have standardized (or raw) mean differences displayed, in contrast to un-starred variables.
Value
When only one type of balance statistic is requested, the returned object is a standard ggplot
object that can be manipulated using ggplot2 syntax. This facilitates changing fonts, background colors, and features of the legend outside of what love.plot()
provides automatically.
When more than one type of balance statistic is requested, the plot is constructed using gridExtra::arrangeGrob()
in gridExtra
, which arranges multiple plots and their shared legend into one plot. Because the output of arrangeGrob
is a gtable
object, its features cannot be manipulated in the standard way. Use the themes
argument to change theme elements of the component plots. The original plots are stored in the "plots"
attribute of the output object.
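When a single statistic is requested, the returned plot can be extended with ordinary ggplot2 layers, as in the sketch below. The weights here are odds (ATT) weights from a hand-fitted logistic regression, an assumption made only so that the example has an adjusted sample to display; the object names ps, w, and p are hypothetical.

```r
library(cobalt)
library(ggplot2)
data("lalonde", package = "cobalt")

# Illustrative ATT weights from a simple logistic propensity score model
ps <- fitted(glm(treat ~ age + educ + married + re74,
                 data = lalonde, family = binomial))
w <- ifelse(lalonde$treat == 1, 1, ps / (1 - ps))

# One statistic requested, so a ggplot object is returned
p <- love.plot(treat ~ age + educ + married + re74, data = lalonde,
               weights = w, stats = "mean.diffs", stars = "raw")

# Standard ggplot2 modifications apply to the result
p + theme(legend.position = "bottom") +
  labs(title = "Covariate balance before and after weighting")
```

With more than one statistic requested this approach would not apply, and theme changes would go through the themes argument instead.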
Note
love.plot
can also be called by using plot()
or autoplot()
on a bal.tab
object. If used in this way, some messages may appear twice. It is recommended that you just use love.plot()
instead.
See Also
bal.tab()
, vignette("love.plot")
Examples
data("lalonde", package = "cobalt")
## Propensity score weighting
w.out1 <- WeightIt::weightit(treat ~ age + educ + race + married +
                               nodegree + re74 + re75,
                             data = lalonde)
love.plot(w.out1, thresholds = c(m = .1), var.order = "unadjusted")
## Using alternate variable names
v <- data.frame(old = c("age", "educ", "race_black", "race_hispan",
                        "race_white", "married", "nodegree", "re74",
                        "re75", "distance"),
                new = c("Age", "Years of Education", "Black",
                        "Hispanic", "White", "Married", "No Degree",
                        "Earnings 1974", "Earnings 1975",
                        "Propensity Score"))
love.plot(w.out1, stats = "m", thresholds = c(m = .1),
          var.order = "unadjusted", var.names = v)
## Using multiple stats
love.plot(w.out1, stats = c("m", "ks"),
          thresholds = c(m = .1, ks = .05),
          var.order = "unadjusted", var.names = v, stars = "raw",
          position = "bottom", wrap = 20)
## Changing visual elements
love.plot(w.out1, thresholds = c(m = .1),
          var.order = "unadjusted", var.names = v, abs = TRUE,
          shapes = c("triangle filled", "circle"),
          colors = c("red", "blue"), line = TRUE,
          grid = FALSE, sample.names = c("Original", "Weighted"),
          stars = "raw", position = "top")
Print Results of a Call to bal.tab()
Description
Prints bal.tab()
output in a clean way. Provides options for printing.
Usage
## S3 method for class 'bal.tab'
print(
x,
imbalanced.only,
un,
disp.bal.tab,
disp.call,
stats,
disp.thresholds,
disp,
which.subclass,
subclass.summary,
which.imp,
imp.summary,
imp.fun,
which.treat,
multi.summary,
which.time,
msm.summary,
which.cluster,
cluster.summary,
cluster.fun,
digits = max(3L, getOption("digits") - 3),
...
)
Arguments
x |
a |
imbalanced.only |
|
un |
|
disp.bal.tab |
|
disp.call |
|
stats |
|
disp.thresholds |
|
disp |
|
which.subclass |
when used with subclassification, which subclass(es) to display. If |
subclass.summary |
|
which.imp |
when used with multiply imputed data, which imputation(s) to display. If |
imp.summary |
|
imp.fun |
|
which.treat |
when used with multi-category treatments, which treatments to display. See |
multi.summary |
|
which.time |
when used with longitudinal treatments, which time periods to display. See |
msm.summary |
|
which.cluster |
when used with clustered data, which cluster(s) to display. If |
cluster.summary |
|
cluster.fun |
|
digits |
the number of digits to display. |
... |
further arguments passed to or from other methods. |
Details
Simply calling bal.tab()
will print its results, but it can be useful to store the results into an object and print them again later, possibly with different print options specified. The print()
function automatically dispatches the correct method for the bal.tab
object given.
Any parameter used in bal.tab()
for calculations, such as int
, addl
, or distance
, cannot be used with print()
; only those parameters listed above, those that solely determine printing options, can be used. To change computation options, a new call to bal.tab()
must be performed.
Prior versions of print()
had separate methods for each bal.tab
class. Now they are dispatched internally.
Note
Unless quick = FALSE
in the original call to bal.tab()
(which is not the default), some values may not be calculated, in which case using print()
will not display these values even when requested. For example, if stats = "m"
and quick = TRUE
in the original call to bal.tab()
(the default for both), setting stats = "ks"
in print()
will not print the KS statistics because they were not calculated.
See Also
display-options
for further information on some of these options.
Examples
data("lalonde", package = "cobalt")
w.out <- WeightIt::weightit(treat ~ age + educ + married +
                              race + re74 + re75,
                            data = lalonde)
b <- bal.tab(w.out, stats = c("m", "v", "ks"),
             un = TRUE, v.threshold = 2)
print(b, un = FALSE, stats = c("m", "v"),
      disp.thresholds = c(v = FALSE))
Set and Get Options in cobalt
Description
Makes it easier to set cobalt options. set.cobalt.options()
is essentially a wrapper for options()
but performs several checks, and get.cobalt.options()
is essentially a wrapper for getOption()
.
Usage
set.cobalt.options(..., default = FALSE)
get.cobalt.options(...)
Arguments
... |
For For |
default |
if |
Details
When an option is set to NULL
, it is set to its default value. The defaults are not displayed but are listed on the help pages where they appear. Most options correspond to display options, which are documented on the display-options help page. Some others (e.g., continuous
and binary
) are described on the bal.tab()
help page.
See Also
- display-options for some arguments that can be set via options.
Examples
# Set un to be TRUE to always display unadjusted
# balance measures and set binary to "std" to
# produce standardized mean differences for
# binary variables.
set.cobalt.options(un = TRUE, binary = "std")
# Note: the above is equivalent to:
# options(cobalt_un = TRUE, cobalt_binary = "std")
# but performs some additional checks
get.cobalt.options("un", "binary")
# Note: the above is equivalent to:
# getOption("cobalt_un")
# getOption("cobalt_binary")
# Return all cobalt options to their defaults
set.cobalt.options(default = TRUE)
# View all available options
get.cobalt.options()
Split and Unsplit Factors into Dummy Variables
Description
splitfactor()
splits factor variables into dummy (0/1) variables. This can be useful when functions do not process factor variables well or require numeric matrices to operate. unsplitfactor()
combines dummy variables into factor variables, undoing the operation of splitfactor()
.
Usage
splitfactor(
data,
var.name,
drop.level = NULL,
drop.first = TRUE,
drop.singleton = FALSE,
drop.na = TRUE,
sep = "_",
replace = TRUE,
split.with = NULL,
check = TRUE
)
unsplitfactor(
data,
var.name,
dropped.level = NULL,
dropped.na = TRUE,
sep = "_",
replace = TRUE
)
Arguments
data |
A |
var.name |
For |
drop.level |
The name of a level of |
drop.first |
Whether to drop the first dummy created for each factor. If |
drop.singleton |
Whether to drop a factor variable if it only has one level. |
drop.na |
If |
sep |
A character separating the stem from the value of the variable for each dummy. For example, for |
replace |
Whether to replace the original variable(s) with the new variable(s) ( |
split.with |
A list of vectors or factors with lengths equal to the number of columns of |
check |
Whether to make sure the variables specified in |
dropped.level |
The value of each original factor variable whose dummy was dropped when the variable was split. If left empty and a dummy was dropped, the resulting factor will have the value |
dropped.na |
If |
Details
If there are NA
s in the variable to be split, the new variables created by splitfactor()
will have NA
where the original variable is NA
.
When using unsplitfactor()
on a data.frame
that was generated with splitfactor()
, the arguments dropped.na
and sep
are unnecessary.
If split.with
is supplied, the elements will be split in the same way data
is. For example, if data
contained a 4-level factor that was to be split, the entries of split.with
at the same index as the factor would be duplicated so that the resulting entries have the same length as the number of columns of data
after being split. The resulting values are stored in the "split.with"
attribute of the output object. See Examples.
Value
For splitfactor()
, a data.frame
containing the original data set with the newly created dummies. For unsplitfactor()
, a data.frame
containing the original data set with the newly created factor variables.
See Also
Examples
data("lalonde", package = "cobalt")
lalonde.split <- splitfactor(lalonde, "race",
                             replace = TRUE,
                             drop.first = TRUE)
# A data set with "race_hispan" and "race_white" instead
# of "race".
lalonde.unsplit <- unsplitfactor(lalonde.split, "race",
                                 replace = TRUE,
                                 dropped.level = "black")
all.equal(lalonde, lalonde.unsplit) #TRUE
# Demonstrating the use of split.with:
to.split <- list(letters[1:ncol(lalonde)],
                 1:ncol(lalonde))
lalonde.split <- splitfactor(lalonde, split.with = to.split,
                             drop.first = FALSE)
attr(lalonde.split, "split.with")
Extract Variable Names from bal.tab
Objects
Description
This function extracts variable names from a bal.tab
object for use in specifying alternate variable names in love.plot()
. Optionally, a file can be written for easy editing of names.
Usage
var.names(b, type, file = NULL, minimal = FALSE)
Arguments
b |
a |
type |
the type of output desired. Can either be |
file |
optional; a file name to save the output if |
minimal |
whether the output should contain all variable names (i.e., all rows that appear in the output of |
Details
The goal of the function is to make supplying new variable names to the var.names argument in love.plot() easier. Rather than manually creating a vector or data.frame with all the variable names that one desires to change, one can use var.names() to extract variable names from a bal.tab object and edit the output. Importantly, the output can be saved to a CSV file, which can be easily edited and read back into R for use in love.plot(), as demonstrated in the Examples.
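The save, edit, and read-back workflow can be sketched in base R without cobalt. Everything here is an assumption made for illustration: the table vn is built by hand rather than by var.names(), the temporary file stands in for a user-chosen file name, and the programmatic change stands in for manual editing.

```r
# Hypothetical table of old and new names with "old" and "new" columns
vn <- data.frame(old = c("age", "race", "married"),
                 new = c("age", "race", "married"),
                 stringsAsFactors = FALSE)

# Write to a temporary CSV; in practice the file would be edited by hand
csv.file <- tempfile(fileext = ".csv")
write.csv(vn, csv.file, row.names = FALSE)

# Stand-in for manual editing: read the file back and change one name
vn.edited <- read.csv(csv.file, stringsAsFactors = FALSE)
vn.edited$new[vn.edited$old == "age"] <- "Age (Years)"

vn.edited
```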
When minimal = TRUE, only a minimal set of variables will be output. For example, if the variables analyzed in bal.tab() are age, race, and married, and int = TRUE in bal.tab(), many variables will appear in the output, including expansions of the factor variables, the polynomial terms, and the interactions. Rather than renaming all of these variables individually, one can rename just the three base variables, and all variables that arise from them will be accordingly renamed. Setting minimal = TRUE requests only these base variables.
Value
If type = "vec", a character vector with the variable names both as the names and the entries.
If type = "df", a data.frame with two columns called "old" and "new", each with the variables as the entries.
If file is not NULL, the output will be returned invisibly.
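The two output shapes described above can be mimicked in base R to show how renaming works. These objects are built by hand for illustration and are not produced by var.names(); the variable names are hypothetical.

```r
vars <- c("age", "race", "married")  # hypothetical variable names

# Shape described for type = "vec": names and entries are identical
v <- setNames(vars, vars)

# Shape described for type = "df": "old" and "new" columns, identical at first
d <- data.frame(old = vars, new = vars, stringsAsFactors = FALSE)

# Renaming changes an entry (or the "new" column); the name (or the
# "old" column) keeps the original variable name
v["age"] <- "Age (Years)"
d$new[d$old == "age"] <- "Age (Years)"

v
d
```

In both shapes the original name remains available as the lookup key, which is what allows an edited vector or data.frame to be matched back to the variables.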
Note
Not all programs can properly read the Unicode characters for the polynomial terms when requested. These may appear strange in, e.g., Excel, but R will process the characters correctly.
Examples
data(lalonde, package = "cobalt")

b1 <- bal.tab(treat ~ age + race + married, data = lalonde,
              int = TRUE)
v1 <- var.names(b1, type = "vec", minimal = TRUE)
v1["age"] <- "Age (Years)"
v1["race"] <- "Race/Eth"
v1["married"] <- "Married"
love.plot(b1, var.names = v1)

## Not run:
b2 <- bal.tab(treat ~ age + race + married + educ + nodegree +
                re74 + re75 + I(re74==0) + I(re75==0),
              data = lalonde)
var.names(b2, file = "varnames.csv")

## Manually edit the CSV (e.g., in Excel), then save it.
v2 <- read.csv("varnames.csv")
love.plot(b2, var.names = v2)
## End(Not run)