Title: | Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests |
Version: | 0.28.0 |
Description: | Compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and machine learning models in R. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference. Details can be found in Arel-Bundock, Greifer, and Heiss (2024) <doi:10.18637/jss.v111.i09>. |
License: | GPL (≥ 3) |
Copyright: | inst/COPYRIGHTS |
Encoding: | UTF-8 |
URL: | https://marginaleffects.com/ |
BugReports: | https://github.com/vincentarelbundock/marginaleffects/issues |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 3.6.0) |
Imports: | backports, checkmate, data.table (≥ 1.16.4), generics, Formula, insight (≥ 1.3.0), methods, rlang, Rcpp (≥ 1.0.0) |
LinkingTo: | Rcpp, RcppEigen |
Suggests: | AER, Amelia, afex, aod, bench, betareg, BH, bife, biglm, blme, boot, brglm2, brms, brmsmargins, broom, car, carData, causaldata, clarify, cjoint, cobalt, collapse, conflicted, countrycode, covr, crch, DALEXtra, DCchoice, dbarts, distributional, dfidx, dplyr, emmeans, equivalence, estimatr, fixest, flexsurv, fmeffects, fontquiver, future, future.apply, fwb, gam, gamlss, gamlss.dist, geepack, ggdist, ggokabeito, ggplot2, ggrepel, glmmTMB, glmtoolbox, glmx, haven, here, itsadug, ivreg, kableExtra, lme4, lmerTest, logistf, magrittr, MatchIt, MASS, mclogit, MCMCglmm, mhurdle, missRanger, mgcv, mice, miceadds, mlogit, mlr3verse, modelbased, modelsummary, multcomp, mvtnorm, nanoparquet, nlme, nnet, numDeriv, nycflights13, optmatch, ordbetareg, ordinal, parameters, parsnip, partykit, patchwork, pkgdown, phylolm, pbapply, plm, polspline, posterior, pscl, purrr, quantreg, Rchoice, REndo, rcmdcheck, remotes, reticulate, rmarkdown, rms, robust, robustbase, robustlmm, rsample, rstanarm, rstantools, rstpm2, rstudioapi, rsvg, sampleSelection, sandwich, scam, spelling, speedglm, splines, survey, survival, svglite, systemfit, systemfonts, tibble, tictoc, tidymodels, tidyr, tidyverse, tinysnapshot (≥ 0.0.7), tinytable (≥ 0.6.1), tinytest, titanic, truncreg, tsModel, withr, workflows, yaml, xgboost, testthat (≥ 3.0.0), altdoc, knitr, quarto |
Collate: | 'RcppExports.R' 'backtransform.R' 'broom.R' 'by.R' 'comparisons.R' 'complete_levels.R' 'config.R' 'construct_call.R' 'datagrid.R' 'equivalence.R' 'get_ci.R' 'get_coef.R' 'get_contrast_data.R' 'get_contrast_data_character.R' 'get_contrast_data_factor.R' 'get_contrast_data_logical.R' 'get_contrast_data_numeric.R' 'get_contrasts.R' 'get_dataset.R' 'get_degrees_of_freedom.R' 'get_draws.R' 'get_group_names.R' 'get_jacobian.R' 'get_labels.R' 'get_model_matrix.R' 'get_model_matrix_attribute.R' 'get_modeldata.R' 'get_predict.R' 'get_se_delta.R' 'get_vcov.R' 'github_issue.R' 'hush.R' 'hypotheses.R' 'hypotheses_joint.R' 'hypothesis.R' 'hypothesis_formula.R' 'hypothesis_function.R' 'hypothesis_matrix.R' 'hypothesis_string.R' 'imputation.R' 'inferences.R' 'inferences_boot.R' 'inferences_conformal.R' 'inferences_fwb.R' 'inferences_rsample.R' 'inferences_simulation.R' 'matrix_apply.R' 'mean_or_mode.R' 'methods.R' 'set_coef.R' 'methods_AER.R' 'methods_MASS.R' 'methods_MCMCglmm.R' 'methods_Rchoice.R' 'methods_WeightIt.R' 'methods_afex.R' 'methods_aod.R' 'methods_betareg.R' 'methods_bife.R' 'methods_biglm.R' 'methods_nnet.R' 'methods_brglm2.R' 'sanity_model.R' 'methods_brms.R' 'methods_crch.R' 'methods_dataframe.R' 'methods_dbarts.R' 'methods_fixest.R' 'methods_flexsurv.R' 'methods_gamlss.R' 'methods_glmmTMB.R' 'methods_glmtoolbox.R' 'methods_glmx.R' 'methods_lme4.R' 'methods_mclogit.R' 'methods_mgcv.R' 'methods_mhurdle.R' 'methods_mlm.R' 'methods_mlogit.R' 'methods_mlr3.R' 'methods_nlme.R' 'methods_ordinal.R' 'methods_plm.R' 'methods_pscl.R' 'methods_quantreg.R' 'methods_rms.R' 'methods_robustlmm.R' 'methods_rstanarm.R' 'methods_rstpm2.R' 'methods_sampleSelection.R' 'methods_scam.R' 'methods_stats.R' 'methods_survey.R' 'methods_survival.R' 'methods_systemfit.R' 'methods_tidymodels.R' 'methods_tobit1.R' 'modelarchive.R' 'multcomp.R' 'myTryCatch.R' 'package.R' 'plot.R' 'plot_build.R' 'plot_comparisons.R' 'plot_predictions.R' 'plot_slopes.R' 'predictions.R' 'print.R' 'recall.R' 'sanitize_comparison.R' 'sanitize_condition.R' 'sanitize_conf_level.R' 'sanitize_estimator.R' 'sanitize_hypothesis.R' 'sanitize_hypothesis_formula.R' 'sanitize_interaction.R' 'sanitize_newdata.R' 'sanitize_numderiv.R' 'sanitize_reserved.R' 'sanitize_type.R' 'sanitize_variables.R' 'sanitize_vcov.R' 'sanity.R' 'sanity_by.R' 'sanity_dots.R' 'settings.R' 'slopes.R' 'sort.R' 'tinytest.R' 'type_dictionary.R' 'unpack_matrix_cols.R' 'utils.R' 'zzz.R' |
Language: | en-US |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr, quarto |
NeedsCompilation: | yes |
Packaged: | 2025-06-24 23:39:50 UTC; vincent |
Author: | Vincent Arel-Bundock
|
Maintainer: | Vincent Arel-Bundock <vincent.arel-bundock@umontreal.ca> |
Repository: | CRAN |
Date/Publication: | 2025-06-25 11:50:02 UTC |
Comparisons Between Predictions Made With Different Regressor Values
Description
Predict the outcome variable at different regressor values (e.g., college
graduates vs. others), and compare those predictions by computing a difference,
ratio, or some other function. comparisons()
can return many quantities of
interest, such as contrasts, differences, risk ratios, changes in log odds, lift,
slopes, elasticities, etc.
-
comparisons()
: unit-level (conditional) estimates. -
avg_comparisons()
: average (marginal) estimates.
variables
identifies the focal regressors whose "effect" we are interested in. comparison
determines how predictions with different regressor values are compared (difference, ratio, odds, etc.). The newdata
argument and the datagrid()
function control where statistics are evaluated in the predictor space: "at observed values", "at the mean", "at representative values", etc.
See the comparisons chapter on the package website for worked examples and case studies:
Usage
comparisons(
model,
newdata = NULL,
variables = NULL,
comparison = "difference",
type = NULL,
vcov = TRUE,
by = FALSE,
conf_level = 0.95,
transform = NULL,
cross = FALSE,
wts = FALSE,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
eps = NULL,
numderiv = "fdforward",
...
)
avg_comparisons(
model,
newdata = NULL,
variables = NULL,
type = NULL,
vcov = TRUE,
by = TRUE,
conf_level = 0.95,
comparison = "difference",
transform = NULL,
cross = FALSE,
wts = FALSE,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
eps = NULL,
numderiv = "fdforward",
...
)
Arguments
model |
Model object |
newdata |
Grid of predictor values at which we evaluate the comparisons.
|
variables |
Focal variables
|
comparison |
How should pairs of predictions be compared? Difference, ratio, odds ratio, or user-defined functions.
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
transform |
string or function. Transformation applied to unit-level estimates and confidence intervals just before the function returns results. Functions must accept a vector and return a vector of the same length. Support string shortcuts: "exp", "ln" |
cross |
|
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
hypothesis |
specify a hypothesis test or custom contrast using a number , formula, string equation, vector, matrix, or function.
|
equivalence |
Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below. |
df |
Degrees of freedom used to compute p values and confidence intervals.
|
eps |
NULL or numeric value which determines the step size to use when
calculating numerical derivatives: (f(x+eps)-f(x))/eps. When |
numderiv |
string or list of strings indicating the method to use to for the numeric differentiation used in to compute delta method standard errors.
|
... |
Additional arguments are passed to the |
Value
A data.frame
with one row per estimate. This data frame is pretty-printed by default, but users can interact with it as a regular data frame, with functions like nrow()
, head()
, colnames()
, etc. Values can be extracted using standard [,]
or $
operators, and manipulated using external packages like dplyr
or data.table
.
Columns may include:
-
rowid
: row number of thenewdata
data frame -
group
: (optional) value of the grouped outcome (e.g., categorical outcome models) -
term
: the focal variable. -
estimate
: an estimate of the prediction, counterfactual comparison, or slope. -
std.error
: standard errors computed via the delta method. -
p.value
: p value associated to theestimate
column. The null is determined by thehypothesis
argument (0 by default). -
s.value
: Shannon information transforms of p values. See the S values vignette at https://marginaleffects.com the marginaleffects website. -
conf.low
: lower bound of the confidence (or credible) interval defined by theconf_level
argument. -
conf.high
: upper bound of the confidence (or credible) interval defined by theconf_level
argument. -
predicted_lo
: predicted outcome for the "low" value of the focal predictor in a counterfactual comparison. -
predicted_hi
: predicted outcome for the "high" value of the focal predictor in a counterfactual comparison.
See ?print.marginaleffects
for printing options.
Functions
-
avg_comparisons()
: Average comparisons
Standard errors using the delta method
Standard errors for all quantities estimated by marginaleffects
can be obtained via the delta method. This requires differentiating a function with respect to the coefficients in the model using a finite difference approach. In some models, the delta method standard errors can be sensitive to various aspects of the numeric differentiation strategy, including the step size. By default, the step size is set to 1e-8
, or to 1e-4
times the smallest absolute model coefficient, whichever is largest.
marginaleffects
can delegate numeric differentiation to the numDeriv
package, which allows more flexibility. To do this, users can pass arguments to the numDeriv::jacobian
function through a global option. For example:
-
options(marginaleffects_numDeriv = list(method = "simple", method.args = list(eps = 1e-6)))
-
options(marginaleffects_numDeriv = list(method = "Richardson", method.args = list(eps = 1e-5)))
-
options(marginaleffects_numDeriv = NULL)
See the "Uncertainty" chapter on the marginaleffects
website for more details on the computation of standard errors, bootstrapping, and more:
https://marginaleffects.com/chapters/uncertainty.html
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
comparison argument functions
The following transformations can be applied by supplying one of the shortcut strings to the
comparison
argument.
hi
is a vector of adjusted predictions for the "high" side of the
contrast. lo
is a vector of adjusted predictions for the "low" side of the
contrast. y
is a vector of adjusted predictions for the original data. x
is the predictor in the original data. eps
is the step size to use to
compute derivatives and elasticities.
Shortcut | Function |
difference | \(hi, lo) hi - lo |
differenceavg | \(hi, lo) mean(hi - lo) |
dydx | \(hi, lo, eps) (hi - lo)/eps |
eyex | \(hi, lo, eps, y, x) (hi - lo)/eps * (x/y) |
eydx | \(hi, lo, eps, y, x) ((hi - lo)/eps)/y |
dyex | \(hi, lo, eps, x) ((hi - lo)/eps) * x |
dydxavg | \(hi, lo, eps) mean((hi - lo)/eps) |
eyexavg | \(hi, lo, eps, y, x) mean((hi - lo)/eps * (x/y)) |
eydxavg | \(hi, lo, eps, y, x) mean(((hi - lo)/eps)/y) |
dyexavg | \(hi, lo, eps, x) mean(((hi - lo)/eps) * x) |
ratio | \(hi, lo) hi/lo |
ratioavg | \(hi, lo) mean(hi)/mean(lo) |
lnratio | \(hi, lo) log(hi/lo) |
lnratioavg | \(hi, lo) log(mean(hi)/mean(lo)) |
lnor | \(hi, lo) log((hi/(1 - hi))/(lo/(1 - lo))) |
lnoravg | \(hi, lo) log((mean(hi)/(1 - mean(hi)))/(mean(lo)/(1 - mean(lo)))) |
lift | \(hi, lo) (hi - lo)/lo |
liftavg | \(hi, lo) (mean(hi - lo))/mean(lo) |
expdydx | \(hi, lo, eps) ((exp(hi) - exp(lo))/exp(eps))/eps |
expdydxavg | \(hi, lo, eps) mean(((exp(hi) - exp(lo))/exp(eps))/eps) |
Bayesian posterior summaries
By default, credible intervals in bayesian models are built as equal-tailed intervals. This can be changed to a highest density interval by setting a global option:
options("marginaleffects_posterior_interval" = "eti")
options("marginaleffects_posterior_interval" = "hdi")
By default, the center of the posterior distribution in bayesian models is identified by the median. Users can use a different summary function by setting a global option:
options("marginaleffects_posterior_center" = "mean")
options("marginaleffects_posterior_center" = "median")
When estimates are averaged using the by
argument, the tidy()
function, or
the summary()
function, the posterior distribution is marginalized twice over.
First, we take the average across units but within each iteration of the
MCMC chain, according to what the user requested in by
argument or
tidy()/summary()
functions. Then, we identify the center of the resulting
posterior using the function supplied to the
"marginaleffects_posterior_center"
option (the median by default).
Equivalence, Inferiority, Superiority
\theta
is an estimate, \sigma_\theta
its estimated standard error, and [a, b]
are the bounds of the interval supplied to the equivalence
argument.
Non-inferiority:
-
H_0
:\theta \leq a
-
H_1
:\theta > a
-
t=(\theta - a)/\sigma_\theta
p: Upper-tail probability
Non-superiority:
-
H_0
:\theta \geq b
-
H_1
:\theta < b
-
t=(\theta - b)/\sigma_\theta
p: Lower-tail probability
Equivalence: Two One-Sided Tests (TOST)
p: Maximum of the non-inferiority and non-superiority p values.
Thanks to Russell V. Lenth for the excellent emmeans
package and documentation which inspired this feature.
Prediction types
The type
argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects
package. Admissible values for type
depend on the model object. When users specify an incorrect value for type
, marginaleffects
will raise an informative error with a list of valid type
values for the specific model object. The first entry in the list in that error message is the default type.
The invlink(link)
is a special type defined by marginaleffects
. It is available for some (but not all) models, and only for the predictions()
function. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type="invlink(link)"
will not always be equivalent to the average of estimates with type="response"
. This type is default when calling predictions()
. It is available—but not default—when calling avg_predictions()
or predictions()
with the by
argument.
Some of the most common type
values are:
response, link, E, Ep, average, class, conditional, count, cum.prob, cumhaz, cumprob, density, detection, disp, ev, expected, expvalue, fitted, hazard, invlink(link), latent, latent_N, linear, linear.predictor, linpred, location, lp, mean, numeric, p, ppd, pr, precision, prediction, prob, probability, probs, quantile, risk, rmst, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob
Order of operations
Behind the scenes, the arguments of marginaleffects
functions are evaluated in this order:
-
newdata
-
variables
-
comparison
andslope
-
by
-
vcov
-
hypothesis
-
transform
Parallel computation
The slopes()
and comparisons()
functions can use parallelism to
speed up computation. Operations are parallelized for the computation of
standard errors, at the model coefficient level. There is always
considerable overhead when using parallel computation, mainly involved
in passing the whole dataset to the different processes. Thus, parallel
computation is most likely to be useful when the model includes many parameters
and the dataset is relatively small.
Warning: In many cases, parallel processing will not be useful at all.
To activate parallel computation, users must load the future.apply
package,
call plan()
function, and set a global option.
options(marginaleffects_parallel = TRUE)
: parallelize delta method computation of standard errors.
options(marginaleffects_parallel_inferences = TRUE)
: parallelize "rsample"
or "fwb"
bootstrap computation in inferences()
.
options(marginaleffects_parallel_packages = TRUE)
: vector of strings with the names of modeling packages used to fit the model, ex: c("survival", "splines")
For example:
library(future.apply) plan("multisession", workers = 4) options(marginaleffects_parallel = FALSE) options(marginaleffects_parallel_inferences = TRUE) options(marginaleffects_parallel_packages = c("survival", "splines")) slopes(model)
To disable parallelism in marginaleffects
altogether, you can set a global option:
options(marginaleffects_parallel = FALSE)
Global options
The behavior of marginaleffects
functions can be modified by setting global options.
Disable some safety checks and warnings:
-
options(marginaleffects_startup_message = FALSE)
Disable the startup message printed on
library(marginaleffects)
.
-
options(marginaleffects_safe = FALSE)
Disable safety checks and warnings.
-
options(marginaleffects_print_omit = c("p.value", "s.value"))
Omit some columns from the printed output.
Enforce lean return objects, sans information about the original model and data, and other ancillary attributes. Note that this will disable some advanced post-processing features and functions like hypotheses.
options(marginaleffects_lean = TRUE)
Other options:
-
marginaleffects_plot_gray
: logical. IfTRUE
, the default color of the plot is gray. Default isFALSE
.
References
Arel-Bundock V, Greifer N, Heiss A (2024). “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software, 111(9), 1-32. doi:10.18637/jss.v111.i09 doi:10.18637/jss.v111.i09
Greenland S. 2019. "Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values." The American Statistician. 73(S1): 106–114.
Cole, Stephen R, Jessie K Edwards, and Sander Greenland. 2020. "Surprise!" American Journal of Epidemiology 190 (2): 191–93. doi:10.1093/aje/kwaa136
Examples
library(marginaleffects)
# Linear model
tmp <- mtcars
tmp$am <- as.logical(tmp$am)
mod <- lm(mpg ~ am + factor(cyl), tmp)
avg_comparisons(mod, variables = list(cyl = "reference"))
avg_comparisons(mod, variables = list(cyl = "sequential"))
avg_comparisons(mod, variables = list(cyl = "pairwise"))
# GLM with different scale types
mod <- glm(am ~ factor(gear), data = mtcars)
avg_comparisons(mod, type = "response")
avg_comparisons(mod, type = "link")
# Contrasts at the mean
comparisons(mod, newdata = "mean")
# Contrasts between marginal means
comparisons(mod, newdata = "balanced")
# Contrasts at user-specified values
comparisons(mod, newdata = datagrid(am = 0, gear = tmp$gear))
comparisons(mod, newdata = datagrid(am = unique, gear = max))
m <- lm(mpg ~ hp + drat + factor(cyl) + factor(am), data = mtcars)
comparisons(m, variables = "hp", newdata = datagrid(FUN_factor = unique, FUN_numeric = median))
# Numeric contrasts
mod <- lm(mpg ~ hp, data = mtcars)
avg_comparisons(mod, variables = list(hp = 1))
avg_comparisons(mod, variables = list(hp = 5))
avg_comparisons(mod, variables = list(hp = c(90, 100)))
avg_comparisons(mod, variables = list(hp = "iqr"))
avg_comparisons(mod, variables = list(hp = "sd"))
avg_comparisons(mod, variables = list(hp = "minmax"))
# using a function to specify a custom difference in one regressor
dat <- mtcars
dat$new_hp <- 49 * (dat$hp - min(dat$hp)) / (max(dat$hp) - min(dat$hp)) + 1
modlog <- lm(mpg ~ log(new_hp) + factor(cyl), data = dat)
fdiff <- function(x) data.frame(x, x + 10)
avg_comparisons(modlog, variables = list(new_hp = fdiff))
# Adjusted Risk Ratio
mod <- glm(vs ~ mpg, data = mtcars, family = binomial)
avg_comparisons(mod, comparison = "lnratioavg", transform = exp)
# Adjusted Risk Ratio: Manual specification of the `comparison`
avg_comparisons(
mod,
comparison = function(hi, lo) log(mean(hi) / mean(lo)),
transform = exp)
# cross contrasts
mod <- lm(mpg ~ factor(cyl) * factor(gear) + hp, data = mtcars)
avg_comparisons(mod, variables = c("cyl", "gear"), cross = TRUE)
# variable-specific contrasts
avg_comparisons(mod, variables = list(gear = "sequential", hp = 10))
# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect
mod <- lm(mpg ~ wt + drat, data = mtcars)
comparisons(
mod,
newdata = "mean",
hypothesis = "wt = drat")
# same hypothesis test using row indices
comparisons(
mod,
newdata = "mean",
hypothesis = "b1 - b2 = 0")
# same hypothesis test using numeric vector of weights
comparisons(
mod,
newdata = "mean",
hypothesis = c(1, -1))
# two custom contrasts using a matrix of weights
lc <- matrix(
c(
1, -1,
2, 3),
ncol = 2)
comparisons(
mod,
newdata = "mean",
hypothesis = lc)
# Effect of a 1 group-wise standard deviation change
# First we calculate the SD in each group of `cyl`
# Second, we use that SD as the treatment size in the `variables` argument
library(dplyr)
mod <- lm(mpg ~ hp + factor(cyl), mtcars)
tmp <- mtcars %>%
group_by(cyl) %>%
mutate(hp_sd = sd(hp))
avg_comparisons(mod,
variables = list(hp = function(x) data.frame(x, x + tmp$hp_sd)),
by = "cyl")
# `by` argument
mod <- lm(mpg ~ hp * am * vs, data = mtcars)
comparisons(mod, by = TRUE)
mod <- lm(mpg ~ hp * am * vs, data = mtcars)
avg_comparisons(mod, variables = "hp", by = c("vs", "am"))
library(nnet)
mod <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)
by <- data.frame(
group = c("3", "4", "5"),
by = c("3,4", "3,4", "5"))
comparisons(mod, type = "probs", by = by)
Create a data.frame with all factor or character levels
Description
model.matrix
breaks when newdata
includes a factor
variable, but not all levels are present in the data. This is bad for us
because we often want to get predictions with one (or few) rows, where some
factor levels are inevitably missing.
Usage
complete_levels(x, character_levels = NULL)
Data grids
Description
Generate a data grid of user-specified values for use in the newdata
argument of the predictions()
, comparisons()
, and slopes()
functions. This is useful to define where in the predictor space we want to evaluate the quantities of interest. Ex: the predicted outcome or slope for a 37 year old college graduate.
Usage
datagrid(
...,
model = NULL,
newdata = NULL,
by = NULL,
grid_type = "mean_or_mode",
response = FALSE,
FUN_character = NULL,
FUN_factor = NULL,
FUN_logical = NULL,
FUN_numeric = NULL,
FUN_integer = NULL,
FUN_binary = NULL,
FUN_other = NULL
)
Arguments
... |
named arguments with vectors of values or functions for user-specified variables.
|
model |
Model object |
newdata |
data.frame (one and only one of the |
by |
character vector with grouping variables within which |
grid_type |
character. Determines the functions to apply to each variable. The defaults can be overridden by defining individual variables explicitly in
|
response |
Logical should the response variable be included in the grid, even if it is not specified explicitly. |
FUN_character |
the function to be applied to character variables. |
FUN_factor |
the function to be applied to factor variables. This only applies if the variable in the original data is a factor. For variables converted to factor in a model-fitting formula, for example, |
FUN_logical |
the function to be applied to logical variables. |
FUN_numeric |
the function to be applied to numeric variables. |
FUN_integer |
the function to be applied to integer variables. |
FUN_binary |
the function to be applied to binary variables. |
FUN_other |
the function to be applied to other variable types. |
Details
If datagrid
is used in a predictions()
, comparisons()
, or slopes()
call as the
newdata
argument, the model is automatically inserted in the model
argument of datagrid()
call, and users do not need to specify either the model
or newdata
arguments. The same behavior will occur when the value supplied to newdata=
is a function call which starts with "datagrid". This is intended to allow users to create convenience shortcuts like:
library(marginaleffects) mod <- lm(mpg ~ am + vs + factor(cyl) + hp, mtcars) datagrid_bal <- function(...) datagrid(..., grid_type = "balanced") predictions(model, newdata = datagrid_bal(cyl = 4))
If users supply a model, the data used to fit that model is retrieved using
the insight::get_data
function.
Value
A data.frame
in which each row corresponds to one combination of the named
predictors supplied by the user via the ...
dots. Variables which are not
explicitly defined are held at their mean or mode.
Examples
# The output only has 2 rows, and all the variables except `hp` are at their
# mean or mode.
datagrid(newdata = mtcars, hp = c(100, 110))
# We get the same result by feeding a model instead of a data.frame
mod <- lm(mpg ~ hp, mtcars)
datagrid(model = mod, hp = c(100, 110))
# Use in `marginaleffects` to compute "Typical Marginal Effects". When used
# in `slopes()` or `predictions()` we do not need to specify the
#`model` or `newdata` arguments.
slopes(mod, newdata = datagrid(hp = c(100, 110)))
# datagrid accepts functions
datagrid(hp = range, cyl = unique, newdata = mtcars)
comparisons(mod, newdata = datagrid(hp = fivenum))
# The full dataset is duplicated with each observation given counterfactual
# values of 100 and 110 for the `hp` variable. The original `mtcars` includes
# 32 rows, so the resulting dataset includes 64 rows.
dg <- datagrid(newdata = mtcars, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
# We get the same result by feeding a model instead of a data.frame
mod <- lm(mpg ~ hp, mtcars)
dg <- datagrid(model = mod, hp = c(100, 110), grid_type = "counterfactual")
nrow(dg)
tinytest
helper
Description
tinytest
helper
Usage
expect_margins(
results,
margins_object,
se = TRUE,
tolerance = 1e-05,
verbose = FALSE
)
tinytest
helper
Description
tinytest
helper
Usage
expect_predictions(object, se = TRUE, n_row = NULL, n_col = NULL)
tinytest
helper
Description
tinytest
helper
Usage
expect_slopes(object, n_unique = NULL, pct_na = 5, se = TRUE, ...)
Get a named vector of coefficients from a model object
Description
Mostly for internal use, but can be useful because the output is consistent across model classes.
Usage
get_coef(model, ...)
## Default S3 method:
get_coef(model, ...)
## S3 method for class 'polr'
get_coef(model, ...)
## S3 method for class 'multinom_weightit'
get_coef(model, ...)
## S3 method for class 'afex_aov'
get_coef(model, ...)
## S3 method for class 'betareg'
get_coef(model, ...)
## S3 method for class 'multinom'
get_coef(model, ...)
## S3 method for class 'brmultinom'
get_coef(model, ...)
## S3 method for class 'bracl'
get_coef(model, ...)
## S3 method for class 'brmsfit'
get_coef(model, ...)
## S3 method for class 'data.frame'
get_coef(model, ...)
## S3 method for class 'gamlss'
get_coef(model, ...)
## S3 method for class 'glmmTMB'
get_coef(model, ...)
## S3 method for class 'glmgee'
get_coef(model, ...)
## S3 method for class 'merMod'
get_coef(model, ...)
## S3 method for class 'lmerModLmerTest'
get_coef(model, ...)
## S3 method for class 'lmerMod'
get_coef(model, ...)
## S3 method for class 'mblogit'
get_coef(model, ...)
## S3 method for class 'gam'
get_coef(model, ...)
## S3 method for class 'mlm'
get_coef(model, ...)
## S3 method for class 'selection'
get_coef(model, ...)
## S3 method for class 'scam'
get_coef(model, ...)
## S3 method for class 'nls'
get_coef(model, ...)
## S3 method for class 'svyolr'
get_coef(model, ...)
## S3 method for class 'systemfit'
get_coef(model, ...)
## S3 method for class 'workflow'
get_coef(model, ...)
Arguments
model |
Model object |
... |
Additional arguments are passed to the |
Value
A named vector of coefficients. The names must match those of the variance matrix.
Download and Read Datasets from marginaleffects
or Rdatasets
Description
Downloads a dataset from the marginaleffects
or the Rdatasets archives, and return it as a data frame. Opens the documentation as an HTML page. Search available datasets.
https://vincentarelbundock.github.io/Rdatasets/
Usage
get_dataset(dataset = "thornton", package = NULL, docs = FALSE, search = NULL)
Arguments
dataset |
String. Name of the dataset to download.
|
package |
String. Package name that originally published the data. |
docs |
Logical. If TRUE open the documentation using |
search |
Regular expression. Download the dataset index from Rdatasets; search the "Package", "Item", and "Title" columns; and return the matching rows. |
Value
A data frame containing the dataset. library(marginaleffects)
Examples
dat <- get_dataset("Titanic", "Stat2Data")
head(dat)
get_dataset(search = "(?i)titanic")
# View documentation in the browser
get_dataset("Titanic", "Stat2Data", docs = TRUE)
Extract Posterior Draws or Bootstrap Resamples from marginaleffects
Objects
Description
Extract Posterior Draws or Bootstrap Resamples from marginaleffects
Objects
Usage
get_draws(x, shape = "long")
Arguments
x |
An object produced by a |
shape |
string indicating the shape of the output format:
|
Details
If DxP and PxD and the names returned by coef(x)
are unique, marginaleffects
sets parameter names to those names. Otherwise, it sets them to b1
, b2
, etc.
Value
A data.frame with drawid
and draw
columns.
Get levels of the outcome variable in grouped or multivariate models
Description
Get levels of the outcome variable in grouped or multivariate models
Usage
get_group_names(model, ...)
## Default S3 method:
get_group_names(model, ...)
## S3 method for class 'polr'
get_group_names(model, ...)
## S3 method for class 'multinom'
get_group_names(model, ...)
## S3 method for class 'bracl'
get_group_names(model, ...)
## S3 method for class 'brmsfit'
get_group_names(model, ...)
## S3 method for class 'mblogit'
get_group_names(model, type, ...)
## S3 method for class 'mlm'
get_group_names(model, ...)
## S3 method for class 'clm'
get_group_names(model, ...)
## S3 method for class 'hurdle'
get_group_names(model, type = "count", ...)
## S3 method for class 'svyolr'
get_group_names(model, ...)
Arguments
model |
Model object |
... |
Additional arguments are passed to the |
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
Value
A character vector
Get a named model matrix
Description
Get a named model matrix
Usage
get_model_matrix(model, newdata)
## Default S3 method:
get_model_matrix(model, newdata)
Arguments
model |
Model object |
newdata |
Grid of predictor values at which we evaluate the slopes.
|
Get predicted values from a model object (internal function)
Description
Get predicted values from a model object (internal function)
Usage
get_predict(model, newdata, type, ...)
## Default S3 method:
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'polr'
get_predict(model, newdata = insight::get_data(model), type = "probs", ...)
## S3 method for class 'glmmPQL'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'MCMCglmm'
get_predict(model, newdata, type = "response", ndraws = 1000, ...)
## S3 method for class 'afex_aov'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'glimML'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'betareg'
get_predict(model, newdata, type = "response", ...)
## S3 method for class 'bife'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'biglm'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'multinom'
get_predict(model, newdata = insight::get_data(model), type = "probs", ...)
## S3 method for class 'brmultinom'
get_predict(model, newdata = insight::get_data(model), type = "probs", ...)
## S3 method for class 'brmsfit'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'crch'
get_predict(model, newdata = NULL, type = "location", ...)
## S3 method for class 'bart'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'fixest'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'flexsurvreg'
get_predict(model, newdata, type, ...)
## S3 method for class 'gamlss'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'glmmTMB'
get_predict(
model,
newdata = insight::get_data(model),
type = "response",
newparams = NULL,
...
)
## S3 method for class 'glmgee'
get_predict(model, newdata, ...)
## S3 method for class 'merMod'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'lmerModLmerTest'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'lmerMod'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'mblogit'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'mhurdle'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'mlogit'
get_predict(model, newdata, ...)
## S3 method for class 'Learner'
get_predict(model, newdata, type = NULL, ...)
## S3 method for class 'clm'
get_predict(model, newdata = insight::get_data(model), type = "prob", ...)
## S3 method for class 'rq'
get_predict(model, newdata = insight::get_data(model), type = NULL, ...)
## S3 method for class 'rms'
get_predict(model, newdata = insight::get_data(model), type = NULL, ...)
## S3 method for class 'orm'
get_predict(model, newdata = insight::get_data(model), type = NULL, ...)
## S3 method for class 'lrm'
get_predict(model, newdata = insight::get_data(model), type = NULL, ...)
## S3 method for class 'ols'
get_predict(model, newdata = insight::get_data(model), type = NULL, ...)
## S3 method for class 'rlmerMod'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'stanreg'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'stpm2'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'pstpm2'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'gsm'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'aft'
get_predict(model, newdata = NULL, ...)
## S3 method for class 'lm'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'glm'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
## S3 method for class 'svyolr'
get_predict(model, newdata = insight::get_data(model), type = "probs", ...)
## S3 method for class 'svyglm'
get_predict(
model,
newdata = insight::get_data(model),
type = "response",
se.fit = FALSE,
...
)
## S3 method for class 'coxph'
get_predict(model, newdata = insight::get_data(model), type = "lp", ...)
## S3 method for class 'model_fit'
get_predict(model, newdata, type = NULL, ...)
## S3 method for class 'workflow'
get_predict(model, newdata, type = NULL, ...)
## S3 method for class 'tobit1'
get_predict(model, newdata = insight::get_data(model), type = "response", ...)
Arguments
model |
Model object |
newdata |
Grid of predictor values at which we evaluate the slopes.
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
... |
Additional arguments are passed to the |
Value
A data.frame of predicted values with a number of rows equal to the
number of rows in newdata
and columns "rowid" and "estimate". A "group"
column is added for multivariate models or models with categorical outcomes.
Take a summary()
style vcov
argument and convert it to
insight::get_varcov()
Description
Take a summary()
style vcov
argument and convert it to
insight::get_varcov()
Usage
get_varcov_args(model, vcov)
Get a named variance-covariance matrix from a model object
Description
Mostly for internal use, but can be useful because the output is consistent across model classes.
Usage
get_vcov(model, ...)
## Default S3 method:
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'MCMCglmm'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'afex_aov'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'glimML'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'biglm'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'brmsfit'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'bart'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'gamlss'
get_vcov(model, ...)
## S3 method for class 'glmmTMB'
get_vcov(model, ...)
## S3 method for class 'mhurdle'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'Learner'
get_vcov(model, ...)
## S3 method for class 'orm'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'stpm2'
get_vcov(model, ...)
## S3 method for class 'pstpm2'
get_vcov(model, ...)
## S3 method for class 'gsm'
get_vcov(model, ...)
## S3 method for class 'aft'
get_vcov(model, ...)
## S3 method for class 'scam'
get_vcov(model, vcov = NULL, ...)
## S3 method for class 'systemfit'
get_vcov(model, ...)
## S3 method for class 'model_fit'
get_vcov(model, type = NULL, ...)
## S3 method for class 'workflow'
get_vcov(model, type = NULL, ...)
Arguments
model |
Model object |
... |
Additional arguments are passed to the |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
Value
A named square matrix of variance and covariances. The names must match the coefficient names.
(Non-)Linear Tests for Null Hypotheses, Joint Hypotheses, Equivalence, Non Superiority, and Non Inferiority
Description
Uncertainty estimates are calculated as first-order approximate standard errors for linear or non-linear functions of a vector of random variables with known or estimated covariance matrix. In that sense, hypotheses
emulates the behavior of the excellent and well-established car::deltaMethod and car::linearHypothesis functions, but it supports more models; requires fewer dependencies; expands the range of tests to equivalence and superiority/inferiority; and offers convenience features like robust standard errors.
To learn more, read the hypothesis tests vignette, visit the package website, or scroll down this page for a full list of vignettes:
Warning #1: Tests are conducted directly on the scale defined by the type
argument. For some models, it can make sense to conduct hypothesis or equivalence tests on the "link"
scale instead of the "response"
scale which is often the default.
Warning #2: For hypothesis tests on objects produced by the marginaleffects
package, it is safer to use the hypothesis
argument of the original function. Using hypotheses()
may not work in certain environments, in lists, or when working programmatically with *apply style functions.
Warning #3: The tests assume that the hypothesis
expression is (approximately) normally distributed, which for non-linear functions of the parameters may not be realistic. More reliable confidence intervals can be obtained using the inferences()
function with method = "boot"
.
Usage
hypotheses(
model,
hypothesis = NULL,
vcov = NULL,
conf_level = NULL,
df = NULL,
equivalence = NULL,
joint = FALSE,
joint_test = "f",
multcomp = FALSE,
numderiv = "fdforward",
...
)
Arguments
model |
Model object or object generated by the |
hypothesis |
specify a hypothesis test or custom contrast using a number , formula, string equation, vector, matrix, or function.
|
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
NULL or numeric value between 0 and 1. Confidence level to use to build a confidence interval. When |
df |
Degrees of freedom used to compute p values and confidence intervals.
|
equivalence |
Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below. |
joint |
Joint test of statistical significance. The null hypothesis value can be set using the
|
joint_test |
A character string specifying the type of test, either "f" or "chisq". The null hypothesis is set by the |
multcomp |
Logical or string. If |
numderiv |
string or list of strings indicating the method to use to for the numeric differentiation used in to compute delta method standard errors.
|
... |
Additional arguments are passed to the |
Joint hypothesis tests
The test statistic for the joint Wald test is calculated as (R * theta_hat - r)' * inv(R * V_hat * R') * (R * theta_hat - r) / Q, where theta_hat is the vector of estimated parameters, V_hat is the estimated covariance matrix, R is a Q x P matrix for testing Q hypotheses on P parameters, r is a Q x 1 vector for the null hypothesis, and Q is the number of rows in R. If the test is a Chi-squared test, the test statistic is not normalized.
The p-value is then calculated based on either the F-distribution (for F-test) or the Chi-squared distribution (for Chi-squared test). For the F-test, the degrees of freedom are Q and (n - P), where n is the sample size and P is the number of parameters. For the Chi-squared test, the degrees of freedom are Q.
Equivalence, Inferiority, Superiority
\theta
is an estimate, \sigma_\theta
its estimated standard error, and [a, b]
are the bounds of the interval supplied to the equivalence
argument.
Non-inferiority:
-
H_0
:\theta \leq a
-
H_1
:\theta > a
-
t=(\theta - a)/\sigma_\theta
p: Upper-tail probability
Non-superiority:
-
H_0
:\theta \geq b
-
H_1
:\theta < b
-
t=(\theta - b)/\sigma_\theta
p: Lower-tail probability
Equivalence: Two One-Sided Tests (TOST)
p: Maximum of the non-inferiority and non-superiority p values.
Thanks to Russell V. Lenth for the excellent emmeans
package and documentation which inspired this feature.
Examples
library(marginaleffects)
mod <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)
hypotheses(mod)
# Test of equality between coefficients
hypotheses(mod, hypothesis = "hp = wt")
# Non-linear function
hypotheses(mod, hypothesis = "exp(hp + wt) = 0.1")
# Robust standard errors
hypotheses(mod, hypothesis = "hp = wt", vcov = "HC3")
# b1, b2, ... shortcuts can be used to identify the position of the
# parameters of interest in the output of
hypotheses(mod, hypothesis = "b2 = b3")
# wildcard
hypotheses(mod, hypothesis = "b* / b2 = 1")
# term names with special characters have to be enclosed in backticks
hypotheses(mod, hypothesis = "`factor(cyl)6` = `factor(cyl)8`")
mod2 <- lm(mpg ~ hp * drat, data = mtcars)
hypotheses(mod2, hypothesis = "`hp:drat` = drat")
# predictions(), comparisons(), and slopes()
mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)
cmp <- comparisons(mod, newdata = "mean")
hypotheses(cmp, hypothesis = "b1 = b2")
mfx <- slopes(mod, newdata = "mean")
hypotheses(cmp, hypothesis = "b2 = 0.2")
pre <- predictions(mod, newdata = datagrid(hp = 110, mpg = c(30, 35)))
hypotheses(pre, hypothesis = "b1 = b2")
# The `hypothesis` argument can be used to compute standard errors for fitted values
mod <- glm(am ~ hp + mpg, data = mtcars, family = binomial)
f <- function(x) predict(x, type = "link", newdata = mtcars)
p <- hypotheses(mod, hypothesis = f)
head(p)
f <- function(x) predict(x, type = "response", newdata = mtcars)
p <- hypotheses(mod, hypothesis = f)
head(p)
# Complex aggregation
# Step 1: Collapse predicted probabilities by outcome level, for each individual
# Step 2: Take the mean of the collapsed probabilities by group and `cyl`
library(dplyr)
library(MASS)
library(dplyr)
library(magrittr)
dat <- transform(mtcars, gear = factor(gear))
mod <- polr(gear ~ factor(cyl) + hp, dat)
aggregation_fun <- function(x) {
predictions(x, vcov = FALSE) %>%
mutate(group = ifelse(group %in% c("3", "4"), "3 & 4", "5")) %>%
summarize(estimate = sum(estimate), .by = c("rowid", "cyl", "group")) %>%
summarize(estimate = mean(estimate), .by = c("cyl", "group")) %>%
rename(term = cyl)
}
hypotheses(mod, hypothesis = aggregation_fun)
# Equivalence, non-inferiority, and non-superiority tests
mod <- lm(mpg ~ hp + factor(gear), data = mtcars)
p <- predictions(mod, newdata = "median")
hypotheses(p, equivalence = c(17, 18))
mfx <- avg_slopes(mod, variables = "hp")
hypotheses(mfx, equivalence = c(-.1, .1))
cmp <- avg_comparisons(mod, variables = "gear", hypothesis = ~pairwise)
hypotheses(cmp, equivalence = c(0, 10))
# joint hypotheses: character vector
model <- lm(mpg ~ as.factor(cyl) * hp, data = mtcars)
hypotheses(model, joint = c("as.factor(cyl)6:hp", "as.factor(cyl)8:hp"))
# joint hypotheses: regular expression
hypotheses(model, joint = "cyl")
# joint hypotheses: integer indices
hypotheses(model, joint = 2:3)
# joint hypotheses: different null hypotheses
hypotheses(model, joint = 2:3, hypothesis = 1)
hypotheses(model, joint = 2:3, hypothesis = 1:2)
# joint hypotheses: marginaleffects object
cmp <- avg_comparisons(model)
hypotheses(cmp, joint = "cyl")
# Multiple comparison adjustment
# p values and family-wise confidence intervals
cmp <- avg_comparisons(model)
hypotheses(cmp, multcomp = "hochberg")
(EXPERIMENTAL) Bootstrap, Conformal, and Simulation-Based Inference
Description
Warning: This function is experimental. It may be renamed, the user interface may change, or the functionality may migrate to arguments in other marginaleffects
functions.
Apply this function to a marginaleffects
object to change the inferential method used to compute uncertainty estimates.
Usage
inferences(
x,
method,
R = 1000,
conf_type = "perc",
conformal_test = NULL,
conformal_calibration = NULL,
conformal_score = "residual_abs",
estimator = NULL,
...
)
Arguments
x |
Object produced by one of the core |
method |
String
|
R |
Number of resamples, simulations, or cross-validation folds. |
conf_type |
String: type of bootstrap interval to construct.
|
conformal_test |
Data frame of test data for conformal prediction. |
conformal_calibration |
Data frame of calibration data for split conformal prediction ( |
conformal_score |
String. Warning: The
|
estimator |
Function that accepts a data frame, fits a model, applies a marginaleffects function, and returns the object. Only supported with method="rsample" or method="boot". When method="rsample", the output must include a "term" column. This is not always the case for predictions(), in which case users may have to create the column manually. |
... |
|
Details
When method="simulation"
, we conduct simulation-based inference following the method discussed in Krinsky & Robb (1986):
Draw
R
sets of simulated coefficients from a multivariate normal distribution with mean equal to the original model's estimated coefficients and variance equal to the model's variance-covariance matrix (classical, "HC3", or other).Use the
R
sets of coefficients to computeR
sets of estimands: predictions, comparisons, slopes, or hypotheses.Take quantiles of the resulting distribution of estimands to obtain a confidence interval and the standard deviation of simulated estimates to estimate the standard error.
When method="fwb"
, drawn weights are supplied to the model fitting function's weights
argument; if the model doesn't accept non-integer weights, this method should not be used. If weights were included in the original model fit, they are extracted by weights()
and multiplied by the drawn weights. These weights are supplied to the wts
argument of the estimation function (e.g., comparisons()
).
Value
A marginaleffects
object with simulation or bootstrap resamples and objects attached.
References
Krinsky, I., and A. L. Robb. 1986. "On Approximating the Statistical Properties of Elasticities." Review of Economics and Statistics 68 (4): 715–9.
King, Gary, Michael Tomz, and Jason Wittenberg. "Making the most of statistical analyses: Improving interpretation and presentation." American journal of political science (2000): 347-361
Dowd, Bryan E., William H. Greene, and Edward C. Norton. "Computation of standard errors." Health services research 49.2 (2014): 731-750.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification." arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Barber, Rina Foygel, Emmanuel J. Candes, Aaditya Ramdas, and Ryan J. Tibshirani. 2020. "Predictive Inference with the Jackknife+." arXiv. http://arxiv.org/abs/1905.02928.
Examples
## Not run:
library(marginaleffects)
library(magrittr)
set.seed(1024)
mod <- lm(Sepal.Length ~ Sepal.Width * Species, data = iris)
# bootstrap
avg_predictions(mod, by = "Species") %>%
inferences(method = "boot")
avg_predictions(mod, by = "Species") %>%
inferences(method = "rsample")
# Fractional (bayesian) bootstrap
avg_slopes(mod, by = "Species") %>%
inferences(method = "fwb") %>%
get_draws("rvar") %>%
data.frame()
# Simulation-based inference
slopes(mod) %>%
inferences(method = "simulation") %>%
head()
# Two-step estimation procedure: Propensity score + G-Computation
lalonde <- get_dataset("lalonde")
estimator <- function(data) {
# Step 1: Estimate propensity scores
fit1 <- glm(treat ~ age + educ + race, family = binomial, data = data)
ps <- predict(fit1, type = "response")
# Step 2: Fit weighted outcome model
m <- lm(re78 ~ treat * (re75 + age + educ + race),
data = data, weight = ps
)
# Step 3: Compute average treatment effect by G-computation
avg_comparisons(m, variables = "treat", wts = ps, vcov = FALSE)
}
inferences(lalonde, method = "rsample", estimator = estimator)
## End(Not run)
Print a marginaleffects object in knitr
Description
Print a marginaleffects object in knitr
Usage
knit_print.marginaleffects(x, ...)
Value
A string with class knit_asis
to be printed in Rmarkdown or Quarto documents.
Plot Conditional or Marginal Comparisons
Description
Plot comparisons on the y-axis against values of one or more predictors (x-axis, colors/shapes, and facets).
The by
argument is used to plot marginal comparisons, that is, comparisons made on the original data, but averaged by subgroups. This is analogous to using the by
argument in the comparisons()
function.
The condition
argument is used to plot conditional comparisons, that is, comparisons made on a user-specified grid. This is analogous to using the newdata
argument and datagrid()
function in a comparisons()
call. All variables whose values are not specified explicitly are treated as usual by datagrid()
, that is, they are held at their mean or mode (or rounded mean for integers). This includes grouping variables in mixed-effects models, so analysts who fit such models may want to specify the groups of interest using the condition
argument, or supply model-specific arguments to compute population-level estimates. See details below.
See the "Plots" vignette and website for tutorials and information on how to customize plots:
https://marginaleffects.com/bonus/plot.html
https://marginaleffects.com
Usage
plot_comparisons(
model,
variables = NULL,
condition = NULL,
by = NULL,
newdata = NULL,
type = NULL,
vcov = NULL,
conf_level = 0.95,
wts = FALSE,
comparison = "difference",
transform = NULL,
rug = FALSE,
gray = getOption("marginaleffects_plot_gray", default = FALSE),
draw = TRUE,
...
)
Arguments
model |
Model object |
variables |
Name of the variable whose contrast we want to plot on the y-axis. |
condition |
Conditional slopes
|
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
newdata |
When |
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
comparison |
How should pairs of predictions be compared? Difference, ratio, odds ratio, or user-defined functions.
|
transform |
string or function. Transformation applied to unit-level estimates and confidence intervals just before the function returns results. Functions must accept a vector and return a vector of the same length. Support string shortcuts: "exp", "ln" |
rug |
TRUE displays tick marks on the axes to mark the distribution of raw data. |
gray |
FALSE grayscale or color plot |
draw |
|
... |
Additional arguments are passed to the |
Value
A ggplot2
object
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
Examples
mod <- lm(mpg ~ hp * drat * factor(am), data = mtcars)
plot_comparisons(mod, variables = "hp", condition = "drat")
plot_comparisons(mod, variables = "hp", condition = c("drat", "am"))
plot_comparisons(mod, variables = "hp", condition = list("am", "drat" = 3:5))
plot_comparisons(mod, variables = "am", condition = list("hp", "drat" = range))
plot_comparisons(mod, variables = "am", condition = list("hp", "drat" = "threenum"))
# marginal comparisons
plot_comparisons(mod, variables = "hp", by = "am")
# marginal comparisons on a counterfactual grid
plot_comparisons(mod,
variables = "hp",
by = "am",
newdata = datagrid(am = 0:1, grid_type = "counterfactual")
)
Plot Conditional or Marginal Predictions
Description
Plot predictions on the y-axis against values of one or more predictors (x-axis, colors/shapes, and facets).
The by
argument is used to plot marginal predictions, that is, predictions made on the original data, but averaged by subgroups. This is analogous to using the by
argument in the predictions()
function.
The condition
argument is used to plot conditional predictions, that is, predictions made on a user-specified grid. This is analogous to using the newdata
argument and datagrid()
function in a predictions()
call. All variables whose values are not specified explicitly are treated as usual by datagrid()
, that is, they are held at their mean or mode (or rounded mean for integers). This includes grouping variables in mixed-effects models, so analysts who fit such models may want to specify the groups of interest using the condition
argument, or supply model-specific arguments to compute population-level estimates. See details below.
See the "Plots" vignette and website for tutorials and information on how to customize plots:
https://marginaleffects.com/bonus/plot.html
https://marginaleffects.com
Usage
plot_predictions(
model,
condition = NULL,
by = NULL,
newdata = NULL,
type = NULL,
vcov = NULL,
conf_level = 0.95,
wts = FALSE,
transform = NULL,
points = 0,
rug = FALSE,
gray = getOption("marginaleffects_plot_gray", default = FALSE),
draw = TRUE,
...
)
Arguments
model |
Model object |
condition |
Conditional predictions.
|
by |
Marginal predictions
|
newdata |
When |
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
transform |
A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results. For bayesian models, this function is applied to individual draws from the posterior distribution, before computing summaries. |
points |
Number between 0 and 1 which controls the transparency of raw data points. 0 (default) does not display any points. Warning: The points displayed are raw data, so the resulting plot is not a "partial residual plot." |
rug |
TRUE displays tick marks on the axes to mark the distribution of raw data. |
gray |
FALSE grayscale or color plot |
draw |
|
... |
Additional arguments are passed to the |
Value
A ggplot2
object or data frame (if draw=FALSE
)
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
Prediction types
The type
argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects
package. Admissible values for type
depend on the model object. When users specify an incorrect value for type
, marginaleffects
will raise an informative error with a list of valid type
values for the specific model object. The first entry in the list in that error message is the default type.
The invlink(link)
is a special type defined by marginaleffects
. It is available for some (but not all) models, and only for the predictions()
function. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type="invlink(link)"
will not always be equivalent to the average of estimates with type="response"
. This type is default when calling predictions()
. It is available—but not default—when calling avg_predictions()
or predictions()
with the by
argument.
Some of the most common type
values are:
response, link, E, Ep, average, class, conditional, count, cum.prob, cumhaz, cumprob, density, detection, disp, ev, expected, expvalue, fitted, hazard, invlink(link), latent, latent_N, linear, linear.predictor, linpred, location, lp, mean, numeric, p, ppd, pr, precision, prediction, prob, probability, probs, quantile, risk, rmst, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob
Examples
mod <- lm(mpg ~ hp + wt, data = mtcars)
plot_predictions(mod, condition = "wt")
mod <- lm(mpg ~ hp * wt * am, data = mtcars)
plot_predictions(mod, condition = c("hp", "wt"))
plot_predictions(mod, condition = list("hp", wt = "threenum"))
plot_predictions(mod, condition = list("hp", wt = range))
# marginal predictions
mod <- lm(mpg ~ hp * am, data = mtcars)
plot_predictions(mod, by = "am")
# marginal predictions on a counterfactual grid
plot_predictions(mod,
by = "am",
newdata = datagrid(am = 0:1, grid_type = "counterfactual")
)
Plot Conditional or Marginal Slopes
Description
Plot slopes on the y-axis against values of one or more predictors (x-axis, colors/shapes, and facets).
The by
argument is used to plot marginal slopes, that is, slopes made on the original data, but averaged by subgroups. This is analogous to using the by
argument in the slopes()
function.
The condition
argument is used to plot conditional slopes, that is, slopes computed on a user-specified grid. This is analogous to using the newdata
argument and datagrid()
function in a slopes()
call. All variables whose values are not specified explicitly are treated as usual by datagrid()
, that is, they are held at their mean or mode (or rounded mean for integers). This includes grouping variables in mixed-effects models, so analysts who fit such models may want to specify the groups of interest using the condition
argument, or supply model-specific arguments to compute population-level estimates. See details below.
See the "Plots" vignette and website for tutorials and information on how to customize plots:
https://marginaleffects.com/bonus/plot.html
https://marginaleffects.com
Usage
plot_slopes(
model,
variables = NULL,
condition = NULL,
by = NULL,
newdata = NULL,
type = NULL,
vcov = NULL,
conf_level = 0.95,
wts = FALSE,
slope = "dydx",
rug = FALSE,
gray = getOption("marginaleffects_plot_gray", default = FALSE),
draw = TRUE,
...
)
Arguments
model |
Model object |
variables |
Name of the variable whose marginal effect (slope) we want to plot on the y-axis. |
condition |
Conditional slopes
|
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
newdata |
When |
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
slope |
string indicates the type of slope or (semi-)elasticity to compute:
|
rug |
TRUE displays tick marks on the axes to mark the distribution of raw data. |
gray |
FALSE grayscale or color plot |
draw |
|
... |
Additional arguments are passed to the |
Value
A ggplot2
object
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
Examples
library(marginaleffects)
mod <- lm(mpg ~ hp * drat * factor(am), data = mtcars)
plot_slopes(mod, variables = "hp", condition = "drat")
plot_slopes(mod, variables = "hp", condition = c("drat", "am"))
plot_slopes(mod, variables = "hp", condition = list("am", "drat" = 3:5))
plot_slopes(mod, variables = "am", condition = list("hp", "drat" = range))
plot_slopes(mod, variables = "am", condition = list("hp", "drat" = "threenum"))
# marginal slopes
plot_slopes(mod, variables = "hp", by = "am")
# marginal slopes on a counterfactual grid
plot_slopes(mod,
variables = "hp",
by = "am",
newdata = datagrid(am = 0:1, grid_type = "counterfactual")
)
alias to get_draws()
keep forever for backward compatibility with JSS
Description
alias to get_draws()
keep forever for backward compatibility with JSS
Usage
posterior_draws(x, shape = "long")
Predictions
Description
Outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or factor levels (a.k.a. "reference grid").
-
predictions()
: unit-level (conditional) estimates. -
avg_predictions()
: average (marginal) estimates.
The newdata
argument and the datagrid()
function can be used to control where statistics are evaluated in the predictor space: "at observed values", "at the mean", "at representative values", etc.
See the predictions vignette and package website for worked examples and case studies:
Usage
predictions(
model,
newdata = NULL,
variables = NULL,
vcov = TRUE,
conf_level = 0.95,
type = NULL,
by = FALSE,
byfun = NULL,
wts = FALSE,
transform = NULL,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
numderiv = "fdforward",
...
)
avg_predictions(
model,
newdata = NULL,
variables = NULL,
vcov = TRUE,
conf_level = 0.95,
type = NULL,
by = TRUE,
byfun = NULL,
wts = FALSE,
transform = NULL,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
numderiv = "fdforward",
...
)
Arguments
model |
Model object |
newdata |
Grid of predictor values at which we evaluate predictions.
|
variables |
Counterfactual variables.
|
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
byfun |
A function such as |
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
transform |
A function applied to unit-level adjusted predictions and confidence intervals just before the function returns results. For bayesian models, this function is applied to individual draws from the posterior distribution, before computing summaries. |
hypothesis |
specify a hypothesis test or custom contrast using a number , formula, string equation, vector, matrix, or function.
|
equivalence |
Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below. |
df |
Degrees of freedom used to compute p values and confidence intervals.
|
numderiv |
string or list of strings indicating the method to use to for the numeric differentiation used in to compute delta method standard errors.
|
... |
Additional arguments are passed to the |
Value
A data.frame
with one row per estimate. This data frame is pretty-printed by default, but users can interact with it as a regular data frame, with functions like nrow()
, head()
, colnames()
, etc. Values can be extracted using standard [,]
or $
operators, and manipulated using external packages like dplyr
or data.table
.
Columns may include:
-
rowid
: row number of thenewdata
data frame -
group
: (optional) value of the grouped outcome (e.g., categorical outcome models) -
term
: the focal variable. -
estimate
: an estimate of the prediction, counterfactual comparison, or slope. -
std.error
: standard errors computed via the delta method. -
p.value
: p value associated to theestimate
column. The null is determined by thehypothesis
argument (0 by default). -
s.value
: Shannon information transforms of p values. See the S values vignette at https://marginaleffects.com the marginaleffects website. -
conf.low
: lower bound of the confidence (or credible) interval defined by theconf_level
argument. -
conf.high
: upper bound of the confidence (or credible) interval defined by theconf_level
argument. -
predicted_lo
: predicted outcome for the "low" value of the focal predictor in a counterfactual comparison. -
predicted_hi
: predicted outcome for the "high" value of the focal predictor in a counterfactual comparison.
See ?print.marginaleffects
for printing options.
Functions
-
avg_predictions()
: Average predictions
Standard errors using the delta method
Standard errors for all quantities estimated by marginaleffects
can be obtained via the delta method. This requires differentiating a function with respect to the coefficients in the model using a finite difference approach. In some models, the delta method standard errors can be sensitive to various aspects of the numeric differentiation strategy, including the step size. By default, the step size is set to 1e-8
, or to 1e-4
times the smallest absolute model coefficient, whichever is largest.
marginaleffects
can delegate numeric differentiation to the numDeriv
package, which allows more flexibility. To do this, users can pass arguments to the numDeriv::jacobian
function through a global option. For example:
-
options(marginaleffects_numDeriv = list(method = "simple", method.args = list(eps = 1e-6)))
-
options(marginaleffects_numDeriv = list(method = "Richardson", method.args = list(eps = 1e-5)))
-
options(marginaleffects_numDeriv = NULL)
See the "Uncertainty" chapter on the marginaleffects
website for more details on the computation of standard errors, bootstrapping, and more:
https://marginaleffects.com/chapters/uncertainty.html
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
Bayesian posterior summaries
By default, credible intervals in bayesian models are built as equal-tailed intervals. This can be changed to a highest density interval by setting a global option:
options("marginaleffects_posterior_interval" = "eti")
options("marginaleffects_posterior_interval" = "hdi")
By default, the center of the posterior distribution in bayesian models is identified by the median. Users can use a different summary function by setting a global option:
options("marginaleffects_posterior_center" = "mean")
options("marginaleffects_posterior_center" = "median")
When estimates are averaged using the by
argument, the tidy()
function, or
the summary()
function, the posterior distribution is marginalized twice over.
First, we take the average across units but within each iteration of the
MCMC chain, according to what the user requested in by
argument or
tidy()/summary()
functions. Then, we identify the center of the resulting
posterior using the function supplied to the
"marginaleffects_posterior_center"
option (the median by default).
Equivalence, Inferiority, Superiority
\theta
is an estimate, \sigma_\theta
its estimated standard error, and [a, b]
are the bounds of the interval supplied to the equivalence
argument.
Non-inferiority:
-
H_0
:\theta \leq a
-
H_1
:\theta > a
-
t=(\theta - a)/\sigma_\theta
p: Upper-tail probability
Non-superiority:
-
H_0
:\theta \geq b
-
H_1
:\theta < b
-
t=(\theta - b)/\sigma_\theta
p: Lower-tail probability
Equivalence: Two One-Sided Tests (TOST)
p: Maximum of the non-inferiority and non-superiority p values.
Thanks to Russell V. Lenth for the excellent emmeans
package and documentation which inspired this feature.
Prediction types
The type
argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects
package. Admissible values for type
depend on the model object. When users specify an incorrect value for type
, marginaleffects
will raise an informative error with a list of valid type
values for the specific model object. The first entry in the list in that error message is the default type.
The invlink(link)
is a special type defined by marginaleffects
. It is available for some (but not all) models, and only for the predictions()
function. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type="invlink(link)"
will not always be equivalent to the average of estimates with type="response"
. This type is default when calling predictions()
. It is available—but not default—when calling avg_predictions()
or predictions()
with the by
argument.
Some of the most common type
values are:
response, link, E, Ep, average, class, conditional, count, cum.prob, cumhaz, cumprob, density, detection, disp, ev, expected, expvalue, fitted, hazard, invlink(link), latent, latent_N, linear, linear.predictor, linpred, location, lp, mean, numeric, p, ppd, pr, precision, prediction, prob, probability, probs, quantile, risk, rmst, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob
Order of operations
Behind the scenes, the arguments of marginaleffects
functions are evaluated in this order:
-
newdata
-
variables
-
comparison
andslope
-
by
-
vcov
-
hypothesis
-
transform
Parallel computation
The slopes()
and comparisons()
functions can use parallelism to
speed up computation. Operations are parallelized for the computation of
standard errors, at the model coefficient level. There is always
considerable overhead when using parallel computation, mainly involved
in passing the whole dataset to the different processes. Thus, parallel
computation is most likely to be useful when the model includes many parameters
and the dataset is relatively small.
Warning: In many cases, parallel processing will not be useful at all.
To activate parallel computation, users must load the future.apply
package,
call plan()
function, and set a global option.
options(marginaleffects_parallel = TRUE)
: parallelize delta method computation of standard errors.
options(marginaleffects_parallel_inferences = TRUE)
: parallelize "rsample"
or "fwb"
bootstrap computation in inferences()
.
options(marginaleffects_parallel_packages = TRUE)
: vector of strings with the names of modeling packages used to fit the model, ex: c("survival", "splines")
For example:
library(future.apply) plan("multisession", workers = 4) options(marginaleffects_parallel = FALSE) options(marginaleffects_parallel_inferences = TRUE) options(marginaleffects_parallel_packages = c("survival", "splines")) slopes(model)
To disable parallelism in marginaleffects
altogether, you can set a global option:
options(marginaleffects_parallel = FALSE)
Global options
The behavior of marginaleffects
functions can be modified by setting global options.
Disable some safety checks and warnings:
-
options(marginaleffects_startup_message = FALSE)
Disable the startup message printed on
library(marginaleffects)
.
-
options(marginaleffects_safe = FALSE)
Disable safety checks and warnings.
-
options(marginaleffects_print_omit = c("p.value", "s.value"))
Omit some columns from the printed output.
Enforce lean return objects, sans information about the original model and data, and other ancillary attributes. Note that this will disable some advanced post-processing features and functions like hypotheses.
options(marginaleffects_lean = TRUE)
Other options:
-
marginaleffects_plot_gray
: logical. IfTRUE
, the default color of the plot is gray. Default isFALSE
.
References
Arel-Bundock V, Greifer N, Heiss A (2024). “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software, 111(9), 1-32. doi:10.18637/jss.v111.i09 doi:10.18637/jss.v111.i09
Greenland S. 2019. "Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values." The American Statistician. 73(S1): 106–114.
Cole, Stephen R, Jessie K Edwards, and Sander Greenland. 2020. "Surprise!" American Journal of Epidemiology 190 (2): 191–93. doi:10.1093/aje/kwaa136
Examples
# Adjusted Prediction for every row of the original dataset
mod <- lm(mpg ~ hp + factor(cyl), data = mtcars)
pred <- predictions(mod)
head(pred)
# Adjusted Predictions at User-Specified Values of the Regressors
predictions(mod, newdata = datagrid(hp = c(100, 120), cyl = 4))
m <- lm(mpg ~ hp + drat + factor(cyl) + factor(am), data = mtcars)
predictions(m, newdata = datagrid(FUN_factor = unique, FUN_numeric = median))
# Average Adjusted Predictions (AAP)
library(dplyr)
mod <- lm(mpg ~ hp * am * vs, mtcars)
avg_predictions(mod)
predictions(mod, by = "am")
# Conditional Adjusted Predictions
plot_predictions(mod, condition = "hp")
# Counterfactual predictions with the `variables` argument
# the `mtcars` dataset has 32 rows
mod <- lm(mpg ~ hp + am, data = mtcars)
p <- predictions(mod)
head(p)
nrow(p)
# average counterfactual predictions
avg_predictions(mod, variables = "am")
# counterfactual predictions obtained by replicating the entire for different
# values of the predictors
p <- predictions(mod, variables = list(hp = c(90, 110)))
nrow(p)
# hypothesis test: is the prediction in the 1st row equal to the prediction in the 2nd row
mod <- lm(mpg ~ wt + drat, data = mtcars)
predictions(
mod,
newdata = datagrid(wt = 2:3),
hypothesis = "b1 = b2")
# same hypothesis test using row indices
predictions(
mod,
newdata = datagrid(wt = 2:3),
hypothesis = "b1 - b2 = 0")
# same hypothesis test using numeric vector of weights
predictions(
mod,
newdata = datagrid(wt = 2:3),
hypothesis = c(1, -1))
# two custom contrasts using a matrix of weights
lc <- matrix(
c(
1, -1,
2, 3),
ncol = 2)
predictions(
mod,
newdata = datagrid(wt = 2:3),
hypothesis = lc)
# `by` argument
mod <- lm(mpg ~ hp * am * vs, data = mtcars)
predictions(mod, by = c("am", "vs"))
library(nnet)
nom <- multinom(factor(gear) ~ mpg + am * vs, data = mtcars, trace = FALSE)
# first 5 raw predictions
p <- predictions(nom, type = "probs")
head(p)
# average predictions
avg_predictions(nom, type = "probs", by = "group")
by <- data.frame(
group = c("3", "4", "5"),
by = c("3,4", "3,4", "5"))
predictions(nom, type = "probs", by = by)
# sum of predicted probabilities for combined response levels
mod <- multinom(factor(cyl) ~ mpg + am, data = mtcars, trace = FALSE)
by <- data.frame(
by = c("4,6", "4,6", "8"),
group = as.character(c(4, 6, 8)))
predictions(mod, newdata = "mean", byfun = sum, by = by)
Print marginaleffects
objects
Description
This function controls the text which is printed to the console when one of the core marginalefffects
functions is called and the object is returned: predictions()
, comparisons()
, slopes()
, hypotheses()
, avg_predictions()
, avg_comparisons()
, avg_slopes()
.
All of those functions return standard data frames. Columns can be extracted by name, predictions(model)$estimate
, and all the usual data manipulation functions work out-of-the-box: colnames()
, head()
, subset()
, dplyr::filter()
, dplyr::arrange()
, etc.
Some of the data columns are not printed by default. You can disable pretty printing and print the full results as a standard data frame using the style
argument or by applying as.data.frame()
on the object. See examples below.
Usage
## S3 method for class 'marginaleffects'
print(
x,
style = getOption("marginaleffects_print_style", default = "summary"),
digits = getOption("marginaleffects_print_digits", default = 3),
p_eps = getOption("marginaleffects_print_p_eps", default = 0.001),
topn = getOption("marginaleffects_print_topn", default = 5),
nrows = getOption("marginaleffects_print_nrows", default = 30),
ncols = getOption("marginaleffects_print_ncols", default = 30),
type = getOption("marginaleffects_print_type", default = TRUE),
column_names = getOption("marginaleffects_print_column_names", default = FALSE),
...
)
Arguments
x |
An object produced by one of the |
style |
"summary", "data.frame", or "tinytable" |
digits |
The number of digits to display. |
p_eps |
p values smaller than this number are printed in "<0.001" style. |
topn |
The number of rows to be printed from the beginning and end of tables with more than |
nrows |
The number of rows which will be printed before truncation. |
ncols |
The maximum number of column names to display at the bottom of the printed output. |
type |
boolean: should the type be printed? |
column_names |
boolean: should the column names be printed? |
... |
Other arguments are currently ignored. |
Examples
library(marginaleffects)
mod <- lm(mpg ~ hp + am + factor(gear), data = mtcars)
p <- predictions(mod, by = c("am", "gear"))
p
subset(p, am == 1)
print(p, style = "data.frame")
data.frame(p)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Method to raise model-specific warnings and errors
Description
Method to raise model-specific warnings and errors
Usage
## S3 method for class 'glimML'
sanitize_model_specific(model, ...)
## S3 method for class 'betareg'
sanitize_model_specific(model, ...)
sanitize_model_specific(model, ...)
## Default S3 method:
sanitize_model_specific(
model,
vcov = NULL,
calling_function = "marginaleffects",
...
)
## S3 method for class 'brmsfit'
sanitize_model_specific(model, ...)
## S3 method for class 'bart'
sanitize_model_specific(model, ...)
## S3 method for class 'fixest'
sanitize_model_specific(
model,
vcov = TRUE,
calling_function = "predictions",
...
)
## S3 method for class 'glmmTMB'
sanitize_model_specific(model, vcov = TRUE, re.form, ...)
## S3 method for class 'merMod'
sanitize_model_specific(model, re.form, vcov = TRUE, ...)
## S3 method for class 'mblogit'
sanitize_model_specific(model, calling_function = "marginaleffects", ...)
## S3 method for class 'mlogit'
sanitize_model_specific(model, calling_function = NULL, ...)
## S3 method for class 'clm'
sanitize_model_specific(model, ...)
## S3 method for class 'plm'
sanitize_model_specific(model, ...)
## S3 method for class 'plm'
sanitize_model_specific(model, ...)
## S3 method for class 'rqs'
sanitize_model_specific(model, ...)
## S3 method for class 'svyolr'
sanitize_model_specific(model, wts = FALSE, by = FALSE, ...)
## S3 method for class 'svyglm'
sanitize_model_specific(model, wts = FALSE, by = FALSE, ...)
## S3 method for class 'coxph'
sanitize_model_specific(model, vcov, ...)
Arguments
model |
Model object |
... |
Additional arguments are passed to the |
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
Value
A warning, an error, or nothing
Internal function to set coefficients
Description
Set the coefficients in a model to different values and return the modified object (internal function)
Usage
set_coef(model, coefs, ...)
## Default S3 method:
set_coef(model, coefs, ...)
## S3 method for class 'tobit'
set_coef(model, coefs, ...)
## S3 method for class 'polr'
set_coef(model, coefs, ...)
## S3 method for class 'glmmPQL'
set_coef(model, coefs, ...)
## S3 method for class 'hetprob'
set_coef(model, coefs, ...)
## S3 method for class 'ivpml'
set_coef(model, coefs, ...)
## S3 method for class 'afex_aov'
set_coef(model, coefs, ...)
## S3 method for class 'glimML'
set_coef(model, coefs, ...)
## S3 method for class 'betareg'
set_coef(model, coefs, ...)
## S3 method for class 'multinom'
set_coef(model, coefs, ...)
## S3 method for class 'crch'
set_coef(model, coefs, ...)
## S3 method for class 'hxlr'
set_coef(model, coefs, ...)
## S3 method for class 'data.frame'
set_coef(model, coefs, ...)
## S3 method for class 'flexsurvreg'
set_coef(model, coefs, ...)
## S3 method for class 'gamlss'
set_coef(model, coefs, ...)
## S3 method for class 'glmmTMB'
set_coef(model, coefs, ...)
## S3 method for class 'glmgee'
set_coef(model, coefs, ...)
## S3 method for class 'glmx'
set_coef(model, coefs, ...)
## S3 method for class 'merMod'
set_coef(model, coefs, ...)
## S3 method for class 'lmerModLmerTest'
set_coef(model, coefs, ...)
## S3 method for class 'lmerMod'
set_coef(model, coefs, ...)
## S3 method for class 'mlm'
set_coef(model, coefs, ...)
## S3 method for class 'lme'
set_coef(model, coefs, ...)
## S3 method for class 'hurdle'
set_coef(model, coefs, ...)
## S3 method for class 'zeroinfl'
set_coef(model, coefs, ...)
## S3 method for class 'rlmerMod'
set_coef(model, coefs, ...)
## S3 method for class 'stpm2'
set_coef(model, coefs, ...)
## S3 method for class 'pstpm2'
set_coef(model, coefs, ...)
## S3 method for class 'gsm'
set_coef(model, coefs, ...)
## S3 method for class 'aft'
set_coef(model, coefs, ...)
## S3 method for class 'selection'
set_coef(model, coefs, ...)
## S3 method for class 'scam'
set_coef(model, coefs, ...)
## S3 method for class 'glm'
set_coef(model, coefs, ...)
## S3 method for class 'lm'
set_coef(model, coefs, ...)
## S3 method for class 'nls'
set_coef(model, coefs, ...)
## S3 method for class 'svyolr'
set_coef(model, coefs, ...)
## S3 method for class 'survreg'
set_coef(model, coefs, ...)
## S3 method for class 'model_fit'
set_coef(model, coefs, ...)
## S3 method for class 'workflow'
set_coef(model, coefs, ...)
Arguments
model |
object to modify |
coefs |
vector of coefficients to insert in the model object |
Details
To compute the variance of marginal effects we need to take the Jacobian with
Value
Model object of the same class as the model
argument, but with
different stored coefficients.
Slopes (aka Partial derivatives, Marginal Effects, or Trends)
Description
Partial derivative of the regression equation with respect to a regressor of interest.
-
slopes()
: unit-level (conditional) estimates. -
avg_slopes()
: average (marginal) estimates.
The newdata
argument and the datagrid()
function can be used to control where statistics are evaluated in the predictor space: "at observed values", "at the mean", "at representative values", etc.
See the slopes vignette and package website for worked examples and case studies:
Usage
slopes(
model,
newdata = NULL,
variables = NULL,
type = NULL,
by = FALSE,
vcov = TRUE,
conf_level = 0.95,
slope = "dydx",
wts = FALSE,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
eps = NULL,
numderiv = "fdforward",
...
)
avg_slopes(
model,
newdata = NULL,
variables = NULL,
type = NULL,
by = TRUE,
vcov = TRUE,
conf_level = 0.95,
slope = "dydx",
wts = FALSE,
hypothesis = NULL,
equivalence = NULL,
df = Inf,
eps = NULL,
numderiv = "fdforward",
...
)
Arguments
model |
Model object |
newdata |
Grid of predictor values at which we evaluate the slopes.
|
variables |
Focal variables
|
type |
string indicates the type (scale) of the predictions used to
compute contrasts or slopes. This can differ based on the model
type, but will typically be a string such as: "response", "link", "probs",
or "zero". When an unsupported string is entered, the model-specific list of
acceptable values is returned in an error message. When |
by |
Aggregate unit-level estimates (aka, marginalize, average over). Valid inputs:
|
vcov |
Type of uncertainty estimates to report (e.g., for robust standard errors). Acceptable values:
|
conf_level |
numeric value between 0 and 1. Confidence level to use to build a confidence interval. |
slope |
string indicates the type of slope or (semi-)elasticity to compute:
|
wts |
logical, string or numeric: weights to use when computing average predictions, contrasts or slopes. These weights only affect the averaging in
|
hypothesis |
specify a hypothesis test or custom contrast using a number , formula, string equation, vector, matrix, or function.
|
equivalence |
Numeric vector of length 2: bounds used for the two-one-sided test (TOST) of equivalence, and for the non-inferiority and non-superiority tests. See Details section below. |
df |
Degrees of freedom used to compute p values and confidence intervals.
|
eps |
NULL or numeric value which determines the step size to use when
calculating numerical derivatives: (f(x+eps)-f(x))/eps. When |
numderiv |
string or list of strings indicating the method to use to for the numeric differentiation used in to compute delta method standard errors.
|
... |
Additional arguments are passed to the |
Details
A "slope" or "marginal effect" is the partial derivative of the regression equation with respect to a variable in the model. This function uses automatic differentiation to compute slopes for a vast array of models, including non-linear models with transformations (e.g., polynomials). Uncertainty estimates are computed using the delta method.
Numerical derivatives for the slopes
function are calculated
using a simple epsilon difference approach: \partial Y / \partial X = (f(X + \varepsilon/2) - f(X-\varepsilon/2)) / \varepsilon
,
where f is the predict()
method associated with the model class, and
\varepsilon
is determined by the eps
argument.
Value
A data.frame
with one row per estimate. This data frame is pretty-printed by default, but users can interact with it as a regular data frame, with functions like nrow()
, head()
, colnames()
, etc. Values can be extracted using standard [,]
or $
operators, and manipulated using external packages like dplyr
or data.table
.
Columns may include:
-
rowid
: row number of thenewdata
data frame -
group
: (optional) value of the grouped outcome (e.g., categorical outcome models) -
term
: the focal variable. -
estimate
: an estimate of the prediction, counterfactual comparison, or slope. -
std.error
: standard errors computed via the delta method. -
p.value
: p value associated to theestimate
column. The null is determined by thehypothesis
argument (0 by default). -
s.value
: Shannon information transforms of p values. See the S values vignette at https://marginaleffects.com the marginaleffects website. -
conf.low
: lower bound of the confidence (or credible) interval defined by theconf_level
argument. -
conf.high
: upper bound of the confidence (or credible) interval defined by theconf_level
argument. -
predicted_lo
: predicted outcome for the "low" value of the focal predictor in a counterfactual comparison. -
predicted_hi
: predicted outcome for the "high" value of the focal predictor in a counterfactual comparison.
See ?print.marginaleffects
for printing options.
Functions
-
avg_slopes()
: Average slopes
Standard errors using the delta method
Standard errors for all quantities estimated by marginaleffects
can be obtained via the delta method. This requires differentiating a function with respect to the coefficients in the model using a finite difference approach. In some models, the delta method standard errors can be sensitive to various aspects of the numeric differentiation strategy, including the step size. By default, the step size is set to 1e-8
, or to 1e-4
times the smallest absolute model coefficient, whichever is largest.
marginaleffects
can delegate numeric differentiation to the numDeriv
package, which allows more flexibility. To do this, users can pass arguments to the numDeriv::jacobian
function through a global option. For example:
-
options(marginaleffects_numDeriv = list(method = "simple", method.args = list(eps = 1e-6)))
-
options(marginaleffects_numDeriv = list(method = "Richardson", method.args = list(eps = 1e-5)))
-
options(marginaleffects_numDeriv = NULL)
See the "Uncertainty" chapter on the marginaleffects
website for more details on the computation of standard errors, bootstrapping, and more:
https://marginaleffects.com/chapters/uncertainty.html
Model-Specific Arguments
Some model types allow model-specific arguments to modify the nature of
marginal effects, predictions, marginal means, and contrasts. Please report
other package-specific predict()
arguments on Github so we can add them to
the table below.
https://github.com/vincentarelbundock/marginaleffects/issues
Package | Class | Argument | Documentation |
brms | brmsfit | ndraws | brms::posterior_predict |
re_formula | brms::posterior_predict | ||
lme4 | merMod | re.form | lme4::predict.merMod |
allow.new.levels | lme4::predict.merMod | ||
glmmTMB | glmmTMB | re.form | glmmTMB::predict.glmmTMB |
allow.new.levels | glmmTMB::predict.glmmTMB | ||
zitype | glmmTMB::predict.glmmTMB | ||
mgcv | bam | exclude | mgcv::predict.bam |
gam | exclude | mgcv::predict.gam | |
robustlmm | rlmerMod | re.form | robustlmm::predict.rlmerMod |
allow.new.levels | robustlmm::predict.rlmerMod | ||
MCMCglmm | MCMCglmm | ndraws | |
sampleSelection | selection | part | sampleSelection::predict.selection |
Bayesian posterior summaries
By default, credible intervals in bayesian models are built as equal-tailed intervals. This can be changed to a highest density interval by setting a global option:
options("marginaleffects_posterior_interval" = "eti")
options("marginaleffects_posterior_interval" = "hdi")
By default, the center of the posterior distribution in bayesian models is identified by the median. Users can use a different summary function by setting a global option:
options("marginaleffects_posterior_center" = "mean")
options("marginaleffects_posterior_center" = "median")
When estimates are averaged using the by
argument, the tidy()
function, or
the summary()
function, the posterior distribution is marginalized twice over.
First, we take the average across units but within each iteration of the
MCMC chain, according to what the user requested in by
argument or
tidy()/summary()
functions. Then, we identify the center of the resulting
posterior using the function supplied to the
"marginaleffects_posterior_center"
option (the median by default).
Equivalence, Inferiority, Superiority
\theta
is an estimate, \sigma_\theta
its estimated standard error, and [a, b]
are the bounds of the interval supplied to the equivalence
argument.
Non-inferiority:
-
H_0
:\theta \leq a
-
H_1
:\theta > a
-
t=(\theta - a)/\sigma_\theta
p: Upper-tail probability
Non-superiority:
-
H_0
:\theta \geq b
-
H_1
:\theta < b
-
t=(\theta - b)/\sigma_\theta
p: Lower-tail probability
Equivalence: Two One-Sided Tests (TOST)
p: Maximum of the non-inferiority and non-superiority p values.
Thanks to Russell V. Lenth for the excellent emmeans
package and documentation which inspired this feature.
Prediction types
The type
argument determines the scale of the predictions used to compute quantities of interest with functions from the marginaleffects
package. Admissible values for type
depend on the model object. When users specify an incorrect value for type
, marginaleffects
will raise an informative error with a list of valid type
values for the specific model object. The first entry in the list in that error message is the default type.
The invlink(link)
is a special type defined by marginaleffects
. It is available for some (but not all) models, and only for the predictions()
function. With this link type, we first compute predictions on the link scale, then we use the inverse link function to backtransform the predictions to the response scale. This is useful for models with non-linear link functions as it can ensure that confidence intervals stay within desirable bounds, ex: 0 to 1 for a logit model. Note that an average of estimates with type="invlink(link)"
will not always be equivalent to the average of estimates with type="response"
. This type is default when calling predictions()
. It is available—but not default—when calling avg_predictions()
or predictions()
with the by
argument.
Some of the most common type
values are:
response, link, E, Ep, average, class, conditional, count, cum.prob, cumhaz, cumprob, density, detection, disp, ev, expected, expvalue, fitted, hazard, invlink(link), latent, latent_N, linear, linear.predictor, linpred, location, lp, mean, numeric, p, ppd, pr, precision, prediction, prob, probability, probs, quantile, risk, rmst, scale, survival, unconditional, utility, variance, xb, zero, zlink, zprob
Parallel computation
The slopes()
and comparisons()
functions can use parallelism to
speed up computation. Operations are parallelized for the computation of
standard errors, at the model coefficient level. There is always
considerable overhead when using parallel computation, mainly involved
in passing the whole dataset to the different processes. Thus, parallel
computation is most likely to be useful when the model includes many parameters
and the dataset is relatively small.
Warning: In many cases, parallel processing will not be useful at all.
To activate parallel computation, users must load the future.apply
package,
call plan()
function, and set a global option.
options(marginaleffects_parallel = TRUE)
: parallelize delta method computation of standard errors.
options(marginaleffects_parallel_inferences = TRUE)
: parallelize "rsample"
or "fwb"
bootstrap computation in inferences()
.
options(marginaleffects_parallel_packages = TRUE)
: vector of strings with the names of modeling packages used to fit the model, ex: c("survival", "splines")
For example:
library(future.apply) plan("multisession", workers = 4) options(marginaleffects_parallel = FALSE) options(marginaleffects_parallel_inferences = TRUE) options(marginaleffects_parallel_packages = c("survival", "splines")) slopes(model)
To disable parallelism in marginaleffects
altogether, you can set a global option:
options(marginaleffects_parallel = FALSE)
Order of operations
Behind the scenes, the arguments of marginaleffects
functions are evaluated in this order:
-
newdata
-
variables
-
comparison
andslope
-
by
-
vcov
-
hypothesis
-
transform
Global options
The behavior of marginaleffects
functions can be modified by setting global options.
Disable some safety checks and warnings:
-
options(marginaleffects_startup_message = FALSE)
Disable the startup message printed on
library(marginaleffects)
.
-
options(marginaleffects_safe = FALSE)
Disable safety checks and warnings.
-
options(marginaleffects_print_omit = c("p.value", "s.value"))
Omit some columns from the printed output.
Enforce lean return objects, sans information about the original model and data, and other ancillary attributes. Note that this will disable some advanced post-processing features and functions like hypotheses.
options(marginaleffects_lean = TRUE)
Other options:
-
marginaleffects_plot_gray
: logical. IfTRUE
, the default color of the plot is gray. Default isFALSE
.
References
Arel-Bundock V, Greifer N, Heiss A (2024). “How to Interpret Statistical Models Using marginaleffects for R and Python.” Journal of Statistical Software, 111(9), 1-32. doi:10.18637/jss.v111.i09 doi:10.18637/jss.v111.i09
Greenland S. 2019. "Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values." The American Statistician. 73(S1): 106–114.
Cole, Stephen R, Jessie K Edwards, and Sander Greenland. 2020. "Surprise!" American Journal of Epidemiology 190 (2): 191–93. doi:10.1093/aje/kwaa136
Examples
# Unit-level (conditional) Marginal Effects
mod <- glm(am ~ hp * wt, data = mtcars, family = binomial)
mfx <- slopes(mod)
head(mfx)
# Average Marginal Effect (AME)
avg_slopes(mod, by = TRUE)
# Marginal Effect at the Mean (MEM)
slopes(mod, newdata = datagrid())
# Marginal Effect at User-Specified Values
# Variables not explicitly included in `datagrid()` are held at their means
slopes(mod, newdata = datagrid(hp = c(100, 110)))
# Group-Average Marginal Effects (G-AME)
# Calculate marginal effects for each observation, and then take the average
# marginal effect within each subset of observations with different observed
# values for the `cyl` variable:
mod2 <- lm(mpg ~ hp * cyl, data = mtcars)
avg_slopes(mod2, variables = "hp", by = "cyl")
# Marginal Effects at User-Specified Values (counterfactual)
# Variables not explicitly included in `datagrid()` are held at their
# original values, and the whole dataset is duplicated once for each
# combination of the values in `datagrid()`
mfx <- slopes(mod,
newdata = datagrid(
hp = c(100, 110),
grid_type = "counterfactual"))
head(mfx)
# Heteroskedasticity robust standard errors
mfx <- slopes(mod, vcov = sandwich::vcovHC(mod))
head(mfx)
# hypothesis test: is the `hp` marginal effect at the mean equal to the `drat` marginal effect
mod <- lm(mpg ~ wt + drat, data = mtcars)
slopes(
mod,
newdata = "mean",
hypothesis = "wt = drat")
# same hypothesis test using row indices
slopes(
mod,
newdata = "mean",
hypothesis = "b1 - b2 = 0")
# same hypothesis test using numeric vector of weights
slopes(
mod,
newdata = "mean",
hypothesis = c(1, -1))
# two custom contrasts using a matrix of weights
lc <- matrix(
c(
1, -1,
2, 3),
ncol = 2)
colnames(lc) <- c("Contrast A", "Contrast B")
slopes(
mod,
newdata = "mean",
hypothesis = lc)