Title: | Tidy Data and 'Geoms' for Bayesian Models |
Version: | 3.0.7 |
Maintainer: | Matthew Kay <mjskay@northwestern.edu> |
Description: | Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands. |
Depends: | R (≥ 4.0.0) |
Imports: | methods, ggdist (≥ 3.0.0), dplyr (≥ 1.1.0), tidyr (≥ 1.0.0), ggplot2 (≥ 3.3.5), coda, rlang (≥ 0.3.0), arrayhelpers, tidyselect, tibble, magrittr, posterior (≥ 1.0.1), withr, cli, vctrs |
Suggests: | knitr, testthat, purrr (≥ 0.2.3), forcats, vdiffr (≥ 1.0.0), svglite, rstan (≥ 2.17.0), rstantools (≥ 2.1.0), runjags, rjags, jagsUI, rstanarm (≥ 2.19.2), emmeans, broom (≥ 0.4.3), MCMCglmm, bayesplot, modelr, brms (≥ 2.16.0), cowplot, covr, rmarkdown, ggrepel, bindrcpp, RColorBrewer, gganimate, gifski, png, ragg, pkgdown, distributional, transformr |
License: | GPL (≥ 3) |
Language: | en-US |
BugReports: | https://github.com/mjskay/tidybayes/issues |
URL: | https://mjskay.github.io/tidybayes/, https://github.com/mjskay/tidybayes/ |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
Collate: | "ggdist-curve_interval.R" "ggdist-cut_cdf_qi.R" "ggdist-geom_slabinterval.R" "ggdist-geom_dotsinterval.R" "ggdist-geom_interval.R" "ggdist-geom_lineribbon.R" "ggdist-geom_pointinterval.R" "ggdist-lkjcorr_marginal.R" "ggdist-parse_dist.R" "ggdist-scales.R" "ggdist-stat_slabinterval.R" "ggdist-stat_dist_slabinterval.R" "ggdist-stat_sample_slabinterval.R" "ggdist-stat_dotsinterval.R" "ggdist-stat_pointinterval.R" "ggdist-stat_interval.R" "ggdist-stat_lineribbon.R" "ggdist-student_t.R" "ggdist-theme_ggdist.R" "ggdist-tidy_format_translators.R" "tidybayes-package.R" "add_draws.R" "combine_chains.R" "compare_levels.R" "compose_data.R" "density_bins.R" "emmeans_comparison.R" "epred_draws.R" "epred_rvars.R" "flip_aes.R" "gather_draws.R" "gather_emmeans_draws.R" "gather_pairs.R" "gather_rvars.R" "gather_variables.R" "get_variables.R" "global_variables.R" "linpred_draws.R" "linpred_rvars.R" "nest_rvars.R" "onAttach.R" "point_interval.R" "predict_curve.R" "predicted_draws.R" "predicted_rvars.R" "recover_types.R" "residual_draws.R" "sample_draws.R" "spread_draws.R" "spread_rvars.R" "summarise_draws.R" "testthat.R" "tidy_draws.R" "tidybayes-models.R" "ungather_draws.R" "unspread_draws.R" "util.R" "x_at_y.R" "deprecated.R" |
NeedsCompilation: | no |
Packaged: | 2024-09-15 05:33:44 UTC; matth |
Author: | Matthew Kay [aut, cre], Timothy Mastny [ctb] |
Repository: | CRAN |
Date/Publication: | 2024-09-15 06:20:02 UTC |
Tidy Data and 'Geoms' for Bayesian Models
Description
tidybayes
is an R package that aims to make it easy to integrate
popular Bayesian modeling methods into a tidy data + ggplot workflow.
Details
Tidy data frames (one observation per row) are particularly convenient for
use in a variety of R data manipulation and visualization packages (Wickham 2014).
However, when using Bayesian modeling functions like JAGS or Stan in R,
we often have to translate this data into a form the model understands,
and then after running the model, translate the resulting sample (or
predictions) into a more tidy format for use with other R functions.
tidybayes
aims to simplify these two common (often tedious)
operations. It also provides a variety of ggplot geometries aimed
at making the visualization of model output easier.
For a comprehensive overview of the package, see vignette("tidybayes")
.
For overviews aimed at the rstanarm
and brms
packages, see
vignette("tidy-rstanarm")
and vignette("tidy-brms")
. For an overview
of the majority of geoms in the ggdist/tidybayes family, see
vignette("slabinterval", package = "ggdist")
.
For a list of supported models, see tidybayes-models.
Author(s)
Maintainer: Matthew Kay mjskay@northwestern.edu
Other contributors:
Timothy Mastny tim.mastny@gmail.com [contributor]
References
Wickham, Hadley. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23. doi:10.18637/jss.v059.i10.
See Also
Useful links:
Report bugs at https://github.com/mjskay/tidybayes/issues
Add draws to a data frame in tidy format
Description
Add draws from a matrix of draws (usually draws from a predictive distribution) to a data frame in tidy format. This is
a generic version of add_predicted_draws()
that can be used with model types
that have their own prediction functions that are not yet supported by tidybayes.
Usage
add_draws(data, draws, value = ".value")
Arguments
data |
Data frame to add draws to, with M rows. |
draws |
N by M matrix of draws, with M columns corresponding to the M rows in |
value |
The name of the output column; default |
Details
Given a data frame with M rows and an N by M matrix of N draws, adds a .row
, .draw
, and .value
column (or another name if value
is set) to data
, and expands data
into a long-format dataframe of draws.
add_epred_draws(df, m)
is roughly equivalent to add_draws(df, posterior_epred(m, newdata = df))
, except
that add_epred_draws
standardizes argument names and values across packages and has additional features for some
model types (like handling ordinal responses and distributional parameters in brms).
add_predicted_draws(df, m)
is roughly equivalent to add_draws(df, posterior_predict(m, newdata = df))
, except
that add_predicted_draws
standardizes argument names and values across packages.
Value
A data frame (actually, a tibble) with a .row
column (a
factor grouping rows from the input data
), a .draw
column (a unique index corresponding to each draw
from the distribution), and a column with its name specified by the value
argument (default is .value
)
containing the values of draws from draws
. The data frame is grouped by all rows in data
plus the .row
column.
Author(s)
Matthew Kay
See Also
add_predicted_draws()
, add_draws()
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(brms)
library(modelr)
theme_set(theme_light())
m_mpg = brm(mpg ~ hp * cyl, data = mtcars,
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# plot posterior predictive intervals
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
# the line below is roughly equivalent to add_epred_draws(m_mpg), except
# that it does not standardize arguments across model types.
add_draws(posterior_epred(m_mpg, newdata = .)) %>%
ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
stat_lineribbon(aes(y = .value), alpha = 0.25) +
geom_point(data = mtcars) +
scale_fill_brewer(palette = "Greys")
## End(Not run)
Add draws from the posterior fit, predictions, or residuals of a model to a data frame
Description
Given a data frame and a model, adds draws from the linear/link-level predictor, the expectation of the posterior predictive, the posterior predictive, or the residuals of a model to the data frame in a long format.
Usage
add_epred_draws(
newdata,
object,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
epred_draws(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
## Default S3 method:
epred_draws(
object,
newdata,
...,
value = ".epred",
seed = NULL,
category = NULL
)
## S3 method for class 'stanreg'
epred_draws(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
## S3 method for class 'brmsfit'
epred_draws(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
add_linpred_draws(
newdata,
object,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL,
n
)
linpred_draws(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL,
n,
scale
)
## Default S3 method:
linpred_draws(
object,
newdata,
...,
value = ".linpred",
seed = NULL,
category = NULL
)
## S3 method for class 'stanreg'
linpred_draws(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
## S3 method for class 'brmsfit'
linpred_draws(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
dpar = NULL
)
add_predicted_draws(
newdata,
object,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
n
)
predicted_draws(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
n,
prediction
)
## Default S3 method:
predicted_draws(
object,
newdata,
...,
value = ".prediction",
seed = NULL,
category = ".category"
)
## S3 method for class 'stanreg'
predicted_draws(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category"
)
## S3 method for class 'brmsfit'
predicted_draws(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category"
)
add_residual_draws(
newdata,
object,
...,
value = ".residual",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
n
)
residual_draws(
object,
newdata,
...,
value = ".residual",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category",
n,
residual
)
## Default S3 method:
residual_draws(object, newdata, ...)
## S3 method for class 'brmsfit'
residual_draws(
object,
newdata,
...,
value = ".residual",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
category = ".category"
)
Arguments
newdata |
Data frame to generate predictions from. |
object |
A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like |
... |
Additional arguments passed to the underlying prediction method for the type of model given. |
value |
The name of the output column:
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
re_formula |
formula containing group-level effects to be considered in the prediction.
If |
category |
For some ordinal, multinomial, and multivariate models (notably, |
dpar |
For |
n |
(Deprecated). Use |
scale |
(Deprecated). Use the appropriate function ( |
prediction , residual |
(Deprecated). Use |
Details
Consider a model like:
\begin{array}{rcl}
y &\sim& \textrm{SomeDist}(\theta_1, \theta_2)\\
f_1(\theta_1) &=& \alpha_1 + \beta_1 x\\
f_2(\theta_2) &=& \alpha_2 + \beta_2 x
\end{array}
This model has:
an outcome variable,
y
a response distribution,
\textrm{SomeDist}
, having parameters\theta_1
(with link functionf_1
) and\theta_2
(with link functionf_2
)a single predictor,
x
coefficients
\alpha_1
,\beta_1
,\alpha_2
, and\beta_2
We fit this model to some observed data, y_\textrm{obs}
, and predictors,
x_\textrm{obs}
. Given new values of predictors, x_\textrm{new}
,
supplied in the data frame newdata
, the functions for posterior draws are
defined as follows:
-
add_predicted_draws()
adds draws from the posterior predictive distribution,p(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
, to the data. It corresponds torstanarm::posterior_predict()
orbrms::posterior_predict()
. -
add_epred_draws()
adds draws from the expectation of the posterior predictive distribution, aka the conditional expectation,E(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
, to the data. It corresponds torstanarm::posterior_epred()
orbrms::posterior_epred()
. Not all models support this function. -
add_linpred_draws()
adds draws from the posterior linear predictors to the data. It corresponds torstanarm::posterior_linpred()
orbrms::posterior_linpred()
. Depending on the model type and additional parameters passed, this may be:The untransformed linear predictor, e.g.
p(f_1(\theta_1) | x_\textrm{new}, y_\textrm{obs})
=p(\alpha_1 + \beta_1 x_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
. This is returned byadd_linpred_draws(transform = FALSE)
for brms and rstanarm models. It is analogous totype = "link"
inpredict.glm()
.The inverse-link transformed linear predictor, e.g.
p(\theta_1 | x_\textrm{new}, y_\textrm{obs})
=p(f_1^{-1}(\alpha_1 + \beta_1 x_\textrm{new}) | x_\textrm{new}, y_\textrm{obs})
. This is returned byadd_linpred_draws(transform = TRUE)
for brms and rstanarm models. It is analogous totype = "response"
inpredict.glm()
.
NOTE:
add_linpred_draws(transform = TRUE)
andadd_epred_draws()
may be equivalent but are not guaranteed to be. They are equivalent when the expectation of the response distribution is equal to its first parameter, i.e. whenE(y) = \theta_1
. Many distributions have this property (e.g. Normal distributions, Bernoulli distributions), but not all. If you want the expectation of the posterior predictive, it is best to useadd_epred_draws()
if available, and if not available, verify this property holds prior to usingadd_linpred_draws()
. -
add_residual_draws()
adds draws from residuals,p(y_\textrm{obs} - y_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
, to the data. It corresponds tobrms::residuals.brmsfit()
.
The corresponding functions without add_
as a prefix are alternate spellings
with the opposite order of the first two arguments: e.g. add_predicted_draws(newdata, object)
versus predicted_draws(object, newdata)
. This facilitates use in data
processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_
are preferred.
Value
A data frame (actually, a tibble) with a .row
column (a
factor grouping rows from the input newdata
), .chain
column (the chain
each draw came from, or NA
if the model does not provide chain information),
.iteration
column (the iteration the draw came from, or NA
if the model does
not provide iteration information), and a .draw
column (a unique index corresponding to each draw
from the distribution). In addition, epred_draws
includes a column with its name specified by
the epred
argument (default ".epred"
); linpred_draws
includes a column with its name
specified by the linpred
argument (default ".linpred"
), and
predicted_draws
contains a column with its name specified by the .prediction
argument (default
".prediction"
). For convenience, the resulting data frame comes grouped by the original input rows.
Author(s)
Matthew Kay
See Also
add_draws()
for the variant of these functions for use with packages that do not have
explicit support for these functions yet. See spread_draws()
for manipulating posteriors directly.
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(brms)
library(modelr)
theme_set(theme_light())
m_mpg = brm(mpg ~ hp * cyl, data = mtcars,
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# draw 100 lines from the posterior means and overplot them
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
# NOTE: only use ndraws here when making spaghetti plots; for
# plotting intervals it is always best to use all draws (omit ndraws)
add_epred_draws(m_mpg, ndraws = 100) %>%
ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) +
geom_point(data = mtcars)
# plot posterior predictive intervals
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
add_predicted_draws(m_mpg) %>%
ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
stat_lineribbon(aes(y = .prediction), .width = c(.99, .95, .8, .5), alpha = 0.25) +
geom_point(data = mtcars) +
scale_fill_brewer(palette = "Greys")
## End(Not run)
Add rvar
s for the linear predictor, posterior expectation, posterior predictive, or residuals of a model to a data frame
Description
Given a data frame and a model, adds rvar
s of draws from the linear/link-level predictor,
the expectation of the posterior predictive, or the posterior predictive to
the data frame.
Usage
add_epred_rvars(
newdata,
object,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
## Default S3 method:
epred_rvars(
object,
newdata,
...,
value = ".epred",
seed = NULL,
dpar = NULL,
columns_to = NULL
)
## S3 method for class 'stanreg'
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
## S3 method for class 'brmsfit'
epred_rvars(
object,
newdata,
...,
value = ".epred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
add_linpred_rvars(
newdata,
object,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
## Default S3 method:
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
seed = NULL,
dpar = NULL,
columns_to = NULL
)
## S3 method for class 'stanreg'
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
## S3 method for class 'brmsfit'
linpred_rvars(
object,
newdata,
...,
value = ".linpred",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
dpar = NULL,
columns_to = NULL
)
add_predicted_rvars(
newdata,
object,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
## Default S3 method:
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
seed = NULL,
columns_to = NULL
)
## S3 method for class 'stanreg'
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
## S3 method for class 'brmsfit'
predicted_rvars(
object,
newdata,
...,
value = ".prediction",
ndraws = NULL,
seed = NULL,
re_formula = NULL,
columns_to = NULL
)
Arguments
newdata |
Data frame to generate predictions from. |
object |
A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like |
... |
Additional arguments passed to the underlying prediction method for the type of model given. |
value |
The name of the output column:
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
re_formula |
formula containing group-level effects to be considered in the prediction.
If |
dpar |
For |
columns_to |
For some models, such as ordinal, multinomial, and multivariate models (notably, |
Details
Consider a model like:
\begin{array}{rcl}
y &\sim& \textrm{SomeDist}(\theta_1, \theta_2)\\
f_1(\theta_1) &=& \alpha_1 + \beta_1 x\\
f_2(\theta_2) &=& \alpha_2 + \beta_2 x
\end{array}
This model has:
an outcome variable,
y
a response distribution,
\textrm{SomeDist}
, having parameters\theta_1
(with link functionf_1
) and\theta_2
(with link functionf_2
)a single predictor,
x
coefficients
\alpha_1
,\beta_1
,\alpha_2
, and\beta_2
We fit this model to some observed data, y_\textrm{obs}
, and predictors,
x_\textrm{obs}
. Given new values of predictors, x_\textrm{new}
,
supplied in the data frame newdata
, the functions for posterior draws are
defined as follows:
-
add_predicted_rvars()
addsrvar
s containing draws from the posterior predictive distribution,p(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
, to the data. It corresponds torstanarm::posterior_predict()
orbrms::posterior_predict()
. -
add_epred_rvars()
addsrvar
s containing draws from the expectation of the posterior predictive distribution, aka the conditional expectation,E(y_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
, to the data. It corresponds torstanarm::posterior_epred()
orbrms::posterior_epred()
. Not all models support this function. -
add_linpred_rvars()
addsrvar
s containing draws from the posterior linear predictors to the data. It corresponds torstanarm::posterior_linpred()
orbrms::posterior_linpred()
. Depending on the model type and additional parameters passed, this may be:The untransformed linear predictor, e.g.
p(f_1(\theta_1) | x_\textrm{new}, y_\textrm{obs})
=p(\alpha_1 + \beta_1 x_\textrm{new} | x_\textrm{new}, y_\textrm{obs})
. This is returned byadd_linpred_rvars(transform = FALSE)
for brms and rstanarm models. It is analogous totype = "link"
inpredict.glm()
.The inverse-link transformed linear predictor, e.g.
p(\theta_1 | x_\textrm{new}, y_\textrm{obs})
=p(f_1^{-1}(\alpha_1 + \beta_1 x_\textrm{new}) | x_\textrm{new}, y_\textrm{obs})
. This is returned byadd_linpred_rvars(transform = TRUE)
for brms and rstanarm models. It is analogous totype = "response"
inpredict.glm()
.
NOTE:
add_linpred_rvars(transform = TRUE)
andadd_epred_rvars()
may be equivalent but are not guaranteed to be. They are equivalent when the expectation of the response distribution is equal to its first parameter, i.e. whenE(y) = \theta_1
. Many distributions have this property (e.g. Normal distributions, Bernoulli distributions), but not all. If you want the expectation of the posterior predictive, it is best to useadd_epred_rvars()
if available, and if not available, verify this property holds prior to usingadd_linpred_rvars()
.
The corresponding functions without add_
as a prefix are alternate spellings
with the opposite order of the first two arguments: e.g. add_predicted_rvars(newdata, object)
versus predicted_rvars(object, newdata)
. This facilitates use in data
processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_
are preferred.
Value
A data frame (actually, a tibble) equal to the input newdata
with
additional columns added containing rvar
s representing the requested predictions or fits.
Author(s)
Matthew Kay
See Also
add_predicted_draws()
for the analogous functions that use a long-data-frame-of-draws
format instead of a data-frame-of-rvar
s format. See spread_rvars()
for manipulating posteriors directly.
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(posterior)
library(brms)
library(modelr)
theme_set(theme_light())
m_mpg = brm(mpg ~ hp * cyl, data = mtcars, family = lognormal(),
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# Look at mean predictions for some cars (epred) and compare to
# the exponeniated mu parameter of the lognormal distribution (linpred).
# Notice how they are NOT the same. This is because exp(mu) for a
# lognormal distribution is equal to its median, not its mean.
mtcars %>%
select(hp, cyl, mpg) %>%
add_epred_rvars(m_mpg) %>%
add_linpred_rvars(m_mpg, value = "mu") %>%
mutate(expmu = exp(mu), .epred - expmu)
# plot intervals around conditional means (epred_rvars)
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
add_epred_rvars(m_mpg) %>%
ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
stat_lineribbon(aes(dist = .epred), .width = c(.95, .8, .5), alpha = 1/3) +
geom_point(aes(y = mpg), data = mtcars) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Set2")
# plot posterior predictive intervals (predicted_rvars)
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
add_predicted_rvars(m_mpg) %>%
ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) +
stat_lineribbon(aes(dist = .prediction), .width = c(.95, .8, .5), alpha = 1/3) +
geom_point(aes(y = mpg), data = mtcars) +
scale_color_brewer(palette = "Dark2") +
scale_fill_brewer(palette = "Set2")
## End(Not run)
Combine the chain and iteration columns of tidy data frames of draws
Description
Combines the chain and iteration columns of a tidy data frame of draws from a Bayesian model fit into a new column
that can uniquely identify each draw. Generally speaking not needed for pure tidybayes code, as tidybayes
functions now automatically include a .draw
column, but can be useful when interacting with packages that
do not provide such a column.
Usage
combine_chains(data, chain = .chain, iteration = .iteration, into = ".draw")
Arguments
data |
Tidy data frame of draws with columns representing the chain and iteration of each draw. |
chain |
Bare name of column in |
iteration |
Bare name of column in |
into |
Name (as a character vector) of the column to combine chains into. The default, |
Value
A data frame of tidy draws with a combined iteration column
Author(s)
Matthew Kay
See Also
Examples
library(magrittr)
library(coda)
data(line, package = "coda")
# The `line` posterior has two chains with 200 iterations each:
line %>%
tidy_draws() %>%
summary()
# combine_chains combines the chain and iteration column into the .draw column.
line %>%
tidy_draws() %>%
combine_chains() %>%
summary()
Compare the value of draws of some variable from a Bayesian model for different levels of a factor
Description
Given posterior draws from a Bayesian model in long format (e.g. as
returned by spread_draws()
), compare the value of a variable in those draws
across different paired combinations of levels of a factor.
Usage
compare_levels(
data,
variable,
by,
fun = `-`,
comparison = "default",
draw_indices = c(".chain", ".iteration", ".draw"),
ignore_groups = ".row"
)
Arguments
data |
Long-format |
variable |
Bare (unquoted) name of a column in data representing the
variable to compare across levels. Can be a numeric variable (as in
long-data-frame-of-draws format) or a |
by |
Bare (unquoted) name of a column in data that is a
|
fun |
Binary function to use for comparison. For each pair of levels of
|
comparison |
One of (a) the comparison types |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
ignore_groups |
character vector of names of groups to ignore by
default in the input grouping. This is primarily provided to make it
easier to pipe output of |
Details
This function simplifies conducting comparisons across levels of some
variable in a tidy data frame of draws. It applies fun
to all
values of variable
for each pair of levels of by
as selected
by comparison
. By default, all pairwise comparisons are generated if
by
is an unordered factor
and ordered comparisons are made if
by
is ordered
.
The included comparison
types are:
-
ordered
: compare each leveli
with leveli - 1
; e.g.fun(i, i - 1)
-
pairwise
: compare each level ofby
with every other level. -
control
: compare each level ofby
with the first level ofby
. If you wish to compare with a different level, you can first applyrelevel()
toby
to set the control (reference) level. -
default
: useordered
ifis.ordered(by)
andpairwise
otherwise.
Value
A data.frame
with the same columns as data
, except
that the by
column contains a symbolic representation of the
comparison of pairs of levels of by
in data
, and
variable
contains the result of that comparison.
Author(s)
Matthew Kay
See Also
emmeans_comparison()
to use emmeans
-style contrast methods with
compare_levels()
.
Examples
library(dplyr)
library(ggplot2)
data(RankCorr, package = "ggdist")
# Let's do all pairwise comparisons of b[i,1]:
RankCorr %>%
spread_draws(b[i,j]) %>%
filter(j == 1) %>%
compare_levels(b, by = i) %>%
median_qi()
# Or let's plot all comparisons against the first level (control):
RankCorr %>%
spread_draws(b[i,j]) %>%
filter(j == 1) %>%
compare_levels(b, by = i, comparison = control) %>%
ggplot(aes(x = b, y = i)) +
stat_halfeye()
# Or let's plot comparisons of all levels of j within
# all levels of i
RankCorr %>%
spread_draws(b[i,j]) %>%
group_by(i) %>%
compare_levels(b, by = j) %>%
ggplot(aes(x = b, y = j)) +
stat_halfeye() +
facet_grid(cols = vars(i))
Compose data for input into a Bayesian model
Description
Compose data into a list suitable to be passed into a Bayesian model (JAGS, BUGS, Stan, etc).
Usage
compose_data(..., .n_name = n_prefix("n"))
Arguments
... |
Data to be composed into a list suitable for being passed into
Stan, JAGS, etc. Named arguments will have their name used as the |
.n_name |
A function that is used to form dimension index variables (a variable
whose value is number of levels in a factor or the length of a data frame in
|
Details
This function recursively translates each argument into list elements using
as_data_list()
, merging all resulting lists together. By
default this means that:
numerics are included as-is.
logicals are translated into numeric using
as.numeric()
.factors are translated into numeric using
as.numeric()
, and an additional element named.n_name(argument_name)
is added with the number of levels in the factor. The default.n_name
function prefixes"n_"
before the factor name; e.g. a factor namedfoo
will have an element namedn_foo
added containing the number of levels infoo
.character vectors are converted into factors then translated into numeric in the same manner as factors are.
lists are translated by translating all elements of the list (recursively) and adding them to the result.
data.frames are translated by translating every column of the data.frame and adding them to the result. A variable named
"n"
(or.n_name(argument_name)
if the data.frame is passed as a named argumentargument_name
) is also added containing the number of rows in the data frame.-
NULL
values are dropped. Setting a named argument toNULL
can be used to drop that item from the resulting list (if an unwanted element was added to the list by a previous argument, such as a column from a data frame that is not needed in the model). all other types are dropped (and a warning given)
As in functions like dplyr::mutate()
, each expression is evaluated in an
environment containing the data list built up so far.
For example, this means that if the first argument to compose_data
is a data frame, subsequent arguments can include direct references to columns
from that data frame. This allows you, for example, to easily use
x_at_y()
to generate indices for nested models.
If you wish to add support for additional types not described above,
provide an implementation of as_data_list()
for the type. See
the implementations of as_data_list.numeric
,
as_data_list.logical
, etc for examples.
Value
A list where each element is a translated variable as described above.
Author(s)
Matthew Kay
See Also
x_at_y()
, spread_draws()
,
gather_draws()
.
Examples
library(magrittr)
df = data.frame(
plot = factor(paste0("p", rep(1:8, times = 2))),
site = factor(paste0("s", rep(1:4, each = 2, times = 2)))
)
# without changing `.n_name`, compose_data() will prefix indices
# with "n" by default
df %>%
compose_data()
# you can use n_prefix() to define a different prefix (e.g. "N"):
df %>%
compose_data(.n_name = n_prefix("N"))
# If you have nesting, you may want a nested index, which can be generated using x_at_y()
# Here, site[p] will give the site for plot p
df %>%
compose_data(site = x_at_y(site, plot))
Data lists for input into Bayesian models
Description
Functions used by compose_data()
to create lists of data suitable for
input into a Bayesian modeling function. These functions typically should not be called directly
(instead use compose_data()
), but are exposed for the rare cases in which
you may need to provide your own conversion routines for a data type not already
supported (see Details).
Usage
data_list(...)
as_data_list(object, name = "", ...)
## Default S3 method:
as_data_list(object, name = "", ...)
## S3 method for class 'numeric'
as_data_list(object, name = "", scalar_as_array = FALSE, ...)
## S3 method for class 'logical'
as_data_list(object, name = "", ...)
## S3 method for class 'factor'
as_data_list(object, name = "", .n_name = n_prefix("n"), ...)
## S3 method for class 'character'
as_data_list(object, name = "", ...)
## S3 method for class 'list'
as_data_list(object, name = "", ...)
## S3 method for class 'data.frame'
as_data_list(object, name = "", .n_name = n_prefix("n"), ...)
## S3 method for class 'data_list'
as_data_list(object, name = "", ...)
Arguments
... |
Additional arguments passed to other implementations of
|
object |
The object to convert (see Details). |
name |
The name of the element in the returned list corresponding to this object. |
scalar_as_array |
If |
.n_name |
A function that is used to form variables storing the number of
rows in data frames or the number of levels in factors in |
Details
data_list
creates a list with class c("data_list", "list")
instead of c("list")
, but largely otherwise acts like the list()
function.
as_data_list
recursively translates its first argument into list elements,
concatenating all resulting lists together. By default this means that:
numerics are included as-is.
logicals are translated into numeric using
as.numeric()
.factors are translated into numeric using
as.numeric()
, and an additional element named.n_name(name)
is added with the number of levels in the factor.character vectors are converted into factors then translated into numeric in the same manner as factors are.
lists are translated by translating all elements of the list (recursively) and adding them to the result.
data.frames are translated by translating every column of the data.frame and adding them to the result. A variable named
"n"
(or.n_name(name)
ifname
is not""
) is also added containing the number of rows in the data frame.all other types are dropped (and a warning given)
If you wish to add support for additional types not described above, provide
an implementation of as_data_list()
for the type. See the
implementations of as_data_list.numeric
, as_data_list.logical
,
etc for examples.
Value
An object of class c("data_list", "list")
, where each element
is a translated variable as described above.
Author(s)
Matthew Kay
See Also
Examples
# Typically these functions should not be used directly.
# See the compose_data function for examples of how to translate
# data in lists for input to Bayesian modeling functions.
Density bins and histogram bins as data frames
Description
Generates a data frame of bins representing the kernel density (or
histogram) of a vector, suitable for use in generating predictive
distributions for visualization. These functions were originally
designed for use with the now-deprecated predict_curve()
, and
may be deprecated in the future.
Usage
density_bins(x, n = 101, ...)
histogram_bins(x, n = 30, breaks = n, ...)
Arguments
x |
A numeric vector |
n |
Number of bins |
... |
|
breaks |
Used to set bins for |
Details
These functions are simple wrappers to density()
and
hist()
that compute density estimates and return their results
in a consistent format: a data frame of bins suitable for use with
the now-deprecated predict_curve()
.
density_bins
computes a kernel density estimate using
density()
.
histogram_bins
computes a density histogram using hist()
.
Value
A data frame representing bins and their densities with the following columns:
mid |
Bin midpoint |
lower |
Lower endpoint of each bin |
upper |
Upper endpoint of each bin |
density |
Density estimate of the bin |
Author(s)
Matthew Kay
See Also
See add_predicted_draws()
and stat_lineribbon()
for a better approach. These
functions may be deprecated in the future.
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(brms)
library(modelr)
theme_set(theme_light())
m_mpg = brm(mpg ~ hp * cyl, data = mtcars)
step = 1
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, by = step)) %>%
add_predicted_draws(m_mpg) %>%
summarise(density_bins(.prediction), .groups = "drop") %>%
ggplot() +
geom_rect(aes(
xmin = hp - step/2, ymin = lower, ymax = upper, xmax = hp + step/2,
fill = ordered(cyl), alpha = density
)) +
geom_point(aes(x = hp, y = mpg, fill = ordered(cyl)), shape = 21, data = mtcars) +
scale_alpha_continuous(range = c(0, 1)) +
scale_fill_brewer(palette = "Set2")
## End(Not run)
Use emmeans contrast methods with compare_levels
Description
Convert emmeans contrast methods into comparison functions
suitable for use with compare_levels()
.
Usage
emmeans_comparison(method, ...)
Arguments
method |
An emmeans-style contrast method. One of: (1) a string specifying
the name of an emmeans contrast method, like
|
... |
Arguments passed on to the contrast method. |
Details
Given an emmeans contrast method name as a string
(e.g., "pairwise"
, "trt.vs.ctrl"
, etc) or an emmeans-style contrast function
(e.g., emmeans::pairwise.emmc, emmeans::trt.vs.ctrl.emmc, etc), emmeans_comparison()
returns a new function that can be used in the comparison
argument to compare_levels()
to compute those contrasts.
Value
A function that takes a single argument, var
, containing a variable
to generate contrasts for (e.g., a factor or a character vector) and returns
a function that generates a list of named unevaluated expressions representing
different contrasts of that variable. This function is suitable to be used
as the comparison
argument in compare_levels()
.
Author(s)
Matthew Kay
See Also
compare_levels()
, emmeans::contrast-methods.
See gather_emmeans_draws()
for a different approach to using emmeans
with
tidybayes
.
Examples
library(dplyr)
library(ggplot2)
data(RankCorr, package = "ggdist")
# emmeans contrast methods return matrices. E.g. the "eff" comparison
# compares each level to the average of all levels:
emmeans:::eff.emmc(c("a","b","c","d"))
# tidybayes::compare_levels() can't use a contrast matrix like this
# directly; it takes arbitrary expressions of factor levels. But
# we can use `emmeans_comparison` to generate the equivalent expressions:
emmeans_comparison("eff")(c("a","b","c","d"))
# We can use the "eff" comparison type with `compare_levels()` as follows:
RankCorr %>%
spread_draws(b[i,j]) %>%
filter(j == 1) %>%
compare_levels(b, by = i, comparison = emmeans_comparison("eff")) %>%
median_qi()
Extract draws of variables in a Bayesian model fit into a tidy data format
Description
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames.
Usage
gather_draws(
model,
...,
regex = FALSE,
sep = "[, ]",
ndraws = NULL,
seed = NULL,
draw_indices = c(".chain", ".iteration", ".draw"),
n
)
spread_draws(
model,
...,
regex = FALSE,
sep = "[, ]",
ndraws = NULL,
seed = NULL,
draw_indices = c(".chain", ".iteration", ".draw"),
n
)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Expressions in the form of
|
regex |
If |
sep |
Separator used to separate dimensions in variable names, as a regular expression. |
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
n |
(Deprecated). Use |
Details
Imagine a JAGS or Stan fit named model
. The model may contain a variable named
b[i,v]
(in the JAGS or Stan language) with dimension i
in 1:100
and
dimension v
in 1:3
. However, the default format for draws returned from
JAGS or Stan in R will not reflect this indexing structure, instead
they will have multiple columns with names like "b[1,1]"
, "b[2,1]"
, etc.
spread_draws
and gather_draws
provide a straightforward
syntax to translate these columns back into properly-indexed variables in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as it does so.
spread_draws
and gather_draws
return data frames already grouped by
all dimensions used on the variables you specify.
The difference between spread_draws
is that names of variables in the model will
be spread across the data frame as column names, whereas gather_draws
will
gather variables into a single column named ".variable"
and place values of variables into a
column named ".value"
. To use naming schemes from other packages (such as broom
), consider passing
results through functions like to_broom_names()
or to_ggmcmc_names()
.
For example, spread_draws(model, a[i], b[i,v])
might return a grouped
data frame (grouped by i
and v
), with:
column
".chain"
: the chain number.NA
if not applicable to the model type; this is typically only applicable to MCMC algorithms.column
".iteration"
: the iteration number. Guaranteed to be unique within-chain only.NA
if not applicable to the model type; this is typically only applicable to MCMC algorithms.column
".draw"
: a unique number for each draw from the posterior. Order is not guaranteed to be meaningful.column
"i"
: value in1:5
column
"v"
: value in1:10
column
"a"
: value of"a[i]"
for draw".draw"
column
"b"
: value of"b[i,v]"
for draw".draw"
gather_draws(model, a[i], b[i,v])
on the same model would return a grouped
data frame (grouped by i
and v
), with:
column
".chain"
: the chain numbercolumn
".iteration"
: the iteration numbercolumn
".draw"
: the draw numbercolumn
"i"
: value in1:5
column
"v"
: value in1:10
, orNA
if".variable"
is"a"
.column
".variable"
: value inc("a", "b")
.column
".value"
: value of"a[i]"
(when".variable"
is"a"
) or"b[i,v]"
(when".variable"
is"b"
) for draw".draw"
spread_draws
and gather_draws
can use type information
applied to the model
object by recover_types()
to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the v
dimension
in the original data frame data
was a factor with levels c("a","b","c")
,
then we could use recover_types
before spread_draws
:
model %>% recover_types(data) spread_draws(model, b[i,v])
Which would return the same data frame as above, except the "v"
column
would be a value in c("a","b","c")
instead of 1:3
.
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable d[i]
with the same i
subscript
as b[i,v]
, and a variable x
with no subscripts, we could do this:
spread_draws(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_draws(model, x) %>% inner_join(spread_draws(model, d[i])) %>% inner_join(spread_draws(model, b[i,v])) %>% group_by(i,v)
Similarly, this:
gather_draws(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows( gather_draws(model, x), gather_draws(model, d[i]), gather_draws(model, b[i,v]) )
The c
and cbind
functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts i
and v
, we could do either of these:
spread_draws(model, c(w, x, y, z)[i,v])
spread_draws(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_draws(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()
-style syntax is currently also
faster (though that may change).
Dimensions can be omitted from the resulting data frame by leaving their names
blank; e.g. spread_draws(model, b[,v])
will omit the first dimension of
b
from the output. This is useful if a dimension is known to contain all
the same value in a given model.
The shorthand ..
can be used to specify one column that should be put
into a wide format and whose names will be the base variable name, plus a dot
("."), plus the value of the dimension at ..
. For example:
spread_draws(model, b[i,..])
would return a grouped data frame
(grouped by i
), with:
column
".chain"
: the chain numbercolumn
".iteration"
: the iteration numbercolumn
".draw"
: the draw numbercolumn
"i"
: value in1:20
column
"b.1"
: value of"b[i,1]"
for draw".draw"
column
"b.2"
: value of"b[i,2]"
for draw".draw"
column
"b.3"
: value of"b[i,3]"
for draw".draw"
An optional clause in the form | wide_dimension
can also be used to put
the data frame into a wide format based on wide_dimension
. For example, this:
spread_draws(model, b[i,v] | v)
is roughly equivalent to this:
spread_draws(model, b[i,v]) %>% spread(v,b)
The main difference between using the |
syntax instead of the
..
syntax is that the |
syntax respects prototypes applied to
dimensions with recover_types()
, and thus can be used to get
columns with nicer names. For example:
model %>% recover_types(data) %>% spread_draws(b[i,v] | v)
would return a grouped data frame
(grouped by i
), with:
column
".chain"
: the chain numbercolumn
".iteration"
: the iteration numbercolumn
".draw"
: the draw numbercolumn
"i"
: value in1:20
column
"a"
: value of"b[i,1]"
for draw".draw"
column
"b"
: value of"b[i,2]"
for draw".draw"
column
"c"
: value of"b[i,3]"
for draw".draw"
The shorthand .
can be used to specify columns that should be nested
into vectors, matrices, or n-dimensional arrays (depending on how many dimensions
are specified with .
).
For example, spread_draws(model, a[.], b[.,.])
might return a
data frame, with:
column
".chain"
: the chain number.column
".iteration"
: the iteration number.column
".draw"
: a unique number for each draw from the posterior.column
"a"
: a list column of vectors.column
"b"
: a list column of matrices.
Ragged arrays are turned into non-ragged arrays with
missing entries given the value NA
.
Finally, variable names can be regular expressions by setting regex = TRUE
; e.g.:
spread_draws(model, `b_.*`[i], regex = TRUE)
Would return a tidy data frame with variables starting with b_
and having one dimension.
Value
A data frame.
Author(s)
Matthew Kay
See Also
spread_rvars()
, recover_types()
, compose_data()
.
Examples
library(dplyr)
library(ggplot2)
data(RankCorr, package = "ggdist")
RankCorr %>%
spread_draws(b[i, j])
RankCorr %>%
spread_draws(b[i, j], tau[i], u_tau[i])
RankCorr %>%
gather_draws(b[i, j], tau[i], u_tau[i])
RankCorr %>%
gather_draws(tau[i], typical_r) %>%
median_qi()
Extract a tidy data frame of draws of posterior distributions of "estimated marginal means" (emmeans/lsmeans) from a Bayesian model fit.
Description
Extract draws from the result of a call to emmeans::emmeans()
(formerly lsmeans
)
or emmeans::ref_grid()
applied to a Bayesian model.
Usage
gather_emmeans_draws(object, value = ".value", ...)
## Default S3 method:
gather_emmeans_draws(object, value = ".value", ...)
## S3 method for class 'emm_list'
gather_emmeans_draws(object, value = ".value", grid = ".grid", ...)
Arguments
object |
An |
value |
The name of the output column to use to contain the values of draws. Defaults to |
... |
Additional arguments passed to the underlying method for the type of object given. |
grid |
If |
Details
emmeans::emmeans()
provides a convenient syntax for generating draws from "estimated marginal means" from a model,
and can be applied to various Bayesian models, like rstanarm::stanreg-objects and
MCMCglmm::MCMCglmm()
. Given a emmeans::ref_grid()
object as returned by functions like
emmeans::ref_grid()
or emmeans::emmeans()
applied to a Bayesian model,
gather_emmeans_draws
returns a tidy format data frame of draws from
the marginal posterior distributions generated by emmeans::emmeans()
.
Value
A tidy data frame of draws. The columns of the reference grid are returned as-is, with an
additional column called .value
(by default) containing marginal draws. The resulting data
frame is grouped by the columns from the reference grid to make use of summary functions like
point_interval()
straightforward.
If object
is an emmeans::emm_list()
, which contains estimates from different reference grids,
an additional column with the default name of ".grid"
is added to indicate the reference grid for each row in the output.
The name of this column is controlled by the grid
argument.
Author(s)
Matthew Kay
See Also
Examples
## Not run:
library(dplyr)
library(magrittr)
library(brms)
library(emmeans)
# Here's an example dataset with a categorical predictor (`condition`) with several levels:
set.seed(5)
n = 10
n_condition = 5
ABC = tibble(
condition = rep(c("A","B","C","D","E"), n),
response = rnorm(n * 5, c(0,1,2,1,-1), 0.5)
)
m = brm(response ~ condition, data = ABC,
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# Once we've fit the model, we can use emmeans() (and functions
# from that package) to get whatever marginal distributions we want.
# For example, we can get marginal means by condition:
m %>%
emmeans(~ condition) %>%
gather_emmeans_draws() %>%
median_qi()
# or we could get pairwise differences:
m %>%
emmeans(~ condition) %>%
contrast(method = "pairwise") %>%
gather_emmeans_draws() %>%
median_qi()
# see the documentation of emmeans() for more examples of types of
# contrasts supported by that packge.
## End(Not run)
Gather pairwise combinations of values from key/value columns in a long-format data frame
Description
Fast method for producing combinations of values in a value column for different levels of a key column, assuming long-format (tidy) data with an equal number of values per key. Among other things, this is useful for producing scatter-plot matrices.
Usage
gather_pairs(
data,
key,
value,
row = ".row",
col = ".col",
x = ".x",
y = ".y",
triangle = c("lower only", "upper only", "lower", "upper", "both only", "both")
)
Arguments
data |
Tidy data frame. |
key |
Bare name of column in |
value |
Bare name of column in |
row |
Character vector giving the name of the output column identifying rows in the matrix
of pairs (takes values of |
col |
Character vector giving the name of the output column identifying columns in the matrix
of pairs (takes values of |
x |
Character vector giving the name of the output column with x values in the matrix
of pairs (takes values of |
y |
Character vector giving the name of the output column with y values in the matrix
of pairs (takes values of |
triangle |
Should the upper or lower triangle of the matrix of all possible combinations be returned?
The default, This method is particularly useful for constructing scatterplot matrices. See examples below. |
Value
A tidy data frame of combinations of values in key
and value
, with columns row
and col
(default names ".row"
and ".col"
) containing values from key
,
and columns y
and x
(default names ".y"
and ".x"
) containing values
from value
.
Author(s)
Matthew Kay
See Also
Examples
library(ggplot2)
library(dplyr)
t_a = rnorm(100)
t_b = rnorm(100, t_a * 2)
t_c = rnorm(100)
df = rbind(
data.frame(g = "a", t = t_a),
data.frame(g = "b", t = t_b),
data.frame(g = "c", t = t_c)
)
df %>%
gather_pairs(g, t, row = "g_row", col = "g_col", x = "t_x", y = "t_y") %>%
ggplot(aes(t_x, t_y)) +
geom_point() +
facet_grid(vars(g_row), vars(g_col))
df %>%
gather_pairs(g, t, triangle = "upper") %>%
ggplot(aes(.x, .y)) +
geom_point() +
facet_grid(vars(.row), vars(.col))
df %>%
gather_pairs(g, t, triangle = "both") %>%
ggplot(aes(.x, .y)) +
geom_point() +
facet_grid(vars(.row), vars(.col))
data(line, package = "coda")
line %>%
tidy_draws() %>%
gather_variables() %>%
gather_pairs(.variable, .value) %>%
ggplot(aes(.x, .y)) +
geom_point(alpha = .25) +
facet_grid(vars(.row), vars(.col))
line %>%
tidy_draws() %>%
gather_variables() %>%
gather_pairs(.variable, .value) %>%
ggplot(aes(.x, .y, color = factor(.chain))) +
geom_density_2d(alpha = .5) +
facet_grid(vars(.row), vars(.col))
Extract draws from a Bayesian model into tidy data frames of random variables
Description
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames of posterior::rvar objects.
Usage
gather_rvars(model, ..., ndraws = NULL, seed = NULL)
spread_rvars(model, ..., ndraws = NULL, seed = NULL)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Expressions in the form of
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
Details
Imagine a JAGS or Stan fit named model
. The model may contain a variable named
b[i,v]
(in the JAGS or Stan language) with dimension i
in 1:100
and
dimension v
in 1:3
. However, the default format for draws returned from
JAGS or Stan in R will not reflect this indexing structure, instead
they will have multiple columns with names like "b[1,1]"
, "b[2,1]"
, etc.
spread_rvars
and gather_rvars
provide a straightforward
syntax to translate these columns back into properly-indexed rvar
s in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as it does so.
spread_rvars
will spread names of variables in the model across the data frame as column names,
whereas gather_rvars
will gather variable names into a single column named ".variable"
and place
values of variables into a column named ".value"
. To use naming schemes from other packages
(such as broom
), consider passing
results through functions like to_broom_names()
or to_ggmcmc_names()
.
For example, spread_rvars(model, a[i], b[i,v])
might return a data frame with:
column
"i"
: value in1:5
column
"v"
: value in1:10
column
"a"
:rvar
containing draws from"a[i]"
column
"b"
:rvar
containing draws from"b[i,v]"
gather_rvars(model, a[i], b[i,v])
on the same model would return a data frame with:
column
"i"
: value in1:5
column
"v"
: value in1:10
, orNA
on rows where".variable"
is"a"
.column
".variable"
: value inc("a", "b")
.column
".value"
:rvar
containing draws from"a[i]"
(when".variable"
is"a"
) or"b[i,v]"
(when".variable"
is"b"
)
spread_rvars
and gather_rvars
can use type information
applied to the model
object by recover_types()
to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the v
dimension
in the original data frame data
was a factor with levels c("a","b","c")
,
then we could use recover_types
before spread_rvars
:
model %>% recover_types(data) spread_rvars(model, b[i,v])
Which would return the same data frame as above, except the "v"
column
would be a value in c("a","b","c")
instead of 1:3
.
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable d[i]
with the same i
subscript
as b[i,v]
, and a variable x
with no subscripts, we could do this:
spread_rvars(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_rvars(model, x) %>% inner_join(spread_rvars(model, d[i])) %>% inner_join(spread_rvars(model, b[i,v]))
Similarly, this:
gather_rvars(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows( gather_rvars(model, x), gather_rvars(model, d[i]), gather_rvars(model, b[i,v]) )
The c
and cbind
functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts i
and v
, we could do either of these:
spread_rvars(model, c(w, x, y, z)[i,v])
spread_rvars(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()
-style syntax is currently also slightly
faster (though that may change).
Dimensions can be left nested in the resulting rvar
objects by leaving their names
blank; e.g. spread_rvars(model, b[i,])
will place the first index (i
) into
rows of the data frame but leave the second index nested in the b
column
(see Examples below).
Value
A data frame.
Author(s)
Matthew Kay
See Also
spread_draws()
, recover_types()
, compose_data()
. See also
posterior::rvar()
and posterior::as_draws_rvars()
, the functions that power
spread_rvars
and gather_rvars
.
Examples
library(dplyr)
data(RankCorr, package = "ggdist")
RankCorr %>%
spread_rvars(b[i, j])
# leaving an index out nests the index in the column containing the rvar
RankCorr %>%
spread_rvars(b[i, ])
RankCorr %>%
spread_rvars(b[i, j], tau[i], u_tau[i])
# gather_rvars places variables and values in a longer format data frame
RankCorr %>%
gather_rvars(b[i, j], tau[i], typical_r)
Gather variables from a tidy data frame of draws from variables into a single column
Description
Given a data frame such as might be returned by tidy_draws()
or spread_draws()
,
gather variables and their values from that data frame into a ".variable"
and ".value"
column.
Usage
gather_variables(data, exclude = c(".chain", ".iteration", ".draw", ".row"))
Arguments
data |
A data frame with variable names spread across columns, such as one returned by
|
exclude |
A character vector of names of columns to be excluded from the gather. Default ignores several meta-data column names used in tidybayes. |
Details
This function gathers every column except grouping columns and those matching the expression
exclude
into key/value columns ".variable"
and ".value"
.
Imagine a data frame data
as returned by spread_draws(fit, a[i], b[i,v])
, like this:
column
".chain"
: the chain numbercolumn
".iteration"
: the iteration numbercolumn
".draw"
: the draw numbercolumn
"i"
: value in1:5
column
"v"
: value in1:10
column
"a"
: value of"a[i]"
for draw number".draw"
column
"b"
: value of"b[i,v]"
for draw number".draw"
gather_variables(data)
on that data frame would return a grouped
data frame (grouped by i
and v
), with:
column
".chain"
: the chain numbercolumn
".iteration"
: the iteration numbercolumn
".draw"
: the draw numbercolumn
"i"
: value in1:5
column
"v"
: value in1:10
column
".variable"
: value inc("a", "b")
.column
".value"
: value of"a[i]"
(when".variable"
is"a"
; repeated for every value of"v"
) or"b[i,v]"
(when".variable"
is"b"
) for draw number".draw"
In this example, this call:
gather_variables(data)
Is roughly equivalent to:
data %>% gather(.variable, .value, -c(.chain, .iteration, .draw, i, v)) %>% group_by(.variable, .add = TRUE)
Value
A data frame.
Author(s)
Matthew Kay
See Also
Examples
library(dplyr)
data(RankCorr, package = "ggdist")
RankCorr %>%
spread_draws(b[i,v], tau[i]) %>%
gather_variables() %>%
median_qi()
# the first three lines below are roughly equivalent to ggmcmc::ggs(RankCorr)
RankCorr %>%
tidy_draws() %>%
gather_variables() %>%
median_qi()
Get the names of the variables in a fitted Bayesian model
Description
Get a character vector of the names of the variables in a variety of fitted
Bayesian model types. All models supported by tidy_draws()
are
supported.
Usage
get_variables(model)
## Default S3 method:
get_variables(model)
## S3 method for class 'mcmc'
get_variables(model)
## S3 method for class 'mcmc.list'
get_variables(model)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
Details
This function is often useful for inspecting a model interactively in order
to construct calls to spread_draws()
or gather_draws()
in order to extract draws from models in a tidy format.
Value
A character vector of variable names in the fitted model.
Author(s)
Matthew Kay
See Also
spread_draws()
, gather_draws()
.
Examples
data(line, package = "coda")
get_variables(line)
data(RankCorr, package = "ggdist")
get_variables(RankCorr)
Prefix function generator for composing dimension index columns
Description
Generates a function for generating names of index columns for factors in
compose_data()
by prefixing a character vector to the original
column name.
Usage
n_prefix(prefix)
Arguments
prefix |
Character vector to be prepended to column names by
Returns a function. The function returned takes a character vector,
|
See Also
The .n_name
argument of compose_data()
.
Examples
library(magrittr)
df = data.frame(
plot = factor(paste0("p", rep(1:8, times = 2))),
site = factor(paste0("s", rep(1:4, each = 2, times = 2)))
)
# without changing `.n_name`, compose_data() will prefix indices
# with "n" by default
df %>%
compose_data()
# you can use n_prefix() to define a different prefix (e.g. "N"):
df %>%
compose_data(.n_name = n_prefix("N"))
Nest and unnest rvar
columns in data frames
Description
Converts between data-frame-of-rvar
s format and long-data-frame-of-draws
formats by nesting or unnesting all columns containing posterior::rvar
objects.
Usage
nest_rvars(data)
unnest_rvars(data)
Arguments
data |
A data frame to nest or unnest.
|
Value
For nest_rvars()
, returns a data frame without .chain
, .iteration
, and
.draw
columns, where all non-grouped columns have been converted to rvar
s.
For unnest_rvars()
, returns a data frame with .chain
, .iteration
, and
.draw
columns added, where every rvar
column in the input has been
converted to (one or more) columns containing draws from those rvar
s
in long format. The result is grouped by all non-rvar
columns in the
input; this ensures that nest_rvars(unnest_rvars(x))
returns x
.
Examples
library(dplyr)
data(RankCorr, package = "ggdist")
# here's a data frame with some rvars
rvar_df = RankCorr %>%
spread_rvars(b[i,], tau[i])
rvar_df
# we can unnest it into long format.
# note how the result is grouped by all non-rvar input columns,
# and nested indices in `b` are converted into columns.
draws_df = rvar_df %>%
unnest_rvars()
draws_df
# calling nest_rvars() again on the result of unnest_rvars()
# recovers the original data frame
nest_rvars(draws_df)
Deprecated: Prediction curves for arbitrary functions of posteriors
Description
Deprecated function for generating prediction curves (or a density for a prediction curve).
Usage
predict_curve(data, formula, summary = median, ...)
predict_curve_density(
data,
formula,
summary = function(...) density_bins(..., n = n),
n = 50,
...
)
Arguments
data |
A |
formula |
A formula specifying the prediction curve. The left-hand side
of the formula should be a name representing the name of the column that
will hold the predicted response in the returned data frame. The right-hand
side is an expression that may include numeric columns from |
summary |
The function to apply to summarize each predicted response.
Useful functions (if you just want a curve) might be |
... |
Variables defining the curve. The right-hand side of
|
n |
For |
Details
This function is deprecated. Use modelr::data_grid()
combined
with point_interval()
or dplyr::do()
and
density_bins()
instead.
The function generates a predictive curve given posterior draws
(data
), an expression (formula
), and a set of variables
defining the curve (...
). For every group in data
(if it is a
grouped data frame—see group_by(); otherwise the entire data
frame is taken at once), and for each combination of values in ...
,
the right-hand side of formula
is evaluated and its results passed to
the summary
function. This allows a predictive curve to be generated,
given (e.g.) some samples of coefficients in data
and a set of
predictors defining the space of the curve in ...
.
Given a summary function like median()
or mean()
,
this function will produce the median (resp. mean) prediction at each point
on the curve.
Given a summary function like density_bins()
, this function will
produce a predictive distribution for each point on the curve.
predict_curve_density
is a shorthand for such a call, with a
convenient argument for adjusting the number of bins per point on the
curve.
Value
If formula
is in the form lhs ~ rhs
and summary
is a function that returns a single value, such as median
or
mode
, then predict_curve
returns a data.frame
with a
column for each group in data
(if it was grouped), a column for each
variable in ...
, and a column named lhs
with the value of
summary(rhs)
evaluated for every group in data
and combination
of variables in ...
.
If summary
is a function that returns a data.frame
, such as
density_bins()
, predict_curve
has the same set of columns
as above, except that in place of the lhs
column is a set of columns
named lhs.x
for every column named x
returned by
summary
. For example, density_bins()
returns a data frame
with the columns mid
, lower
, upper
, and density
,
so the data frame returned by predict_curve
with summary = density_bins
will have columns lhs.mid
, lhs.lower
,
lhs.upper
, and lhs.density
in place of lhs
.
Author(s)
Matthew Kay
See Also
See density_bins()
.
Examples
# Deprecated; see examples for density_bins
Decorate a model fit or sample with data types recovered from the input data
Description
Decorate a Bayesian model fit or a sample from it with types for
variable and dimension data types. Meant to be used before calling
spread_draws()
or gather_draws()
so that the values returned by
those functions are translated back into useful data types.
Usage
recover_types(model, ...)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Lists (or data frames) providing data prototypes used to convert
columns returned by |
Details
Each argument in ...
specifies a list or data.frame. The model
is decorated with a list of constructors that can convert a numeric column
into the data types in the lists in ...
.
Then, when spread_draws()
or gather_draws()
is called on the decorated
model
, each list entry with the same name as the variable or a dimension
in variable_spec
is a used as a prototype for that variable or dimension —
i.e., its type is taken to be the expected type of that variable or dimension.
Those types are used to translate numeric values of variables back into
useful values (for example, levels of a factor).
The most common use of recover_types
is to automatically translate
dimensions of a variable that correspond to levels of a factor in the original data back into
levels of that factor. The simplest way to do this is to pass in the data
frame from which the original data came.
Supported types of prototypes are factor, ordered, and logical. For example:
if
prototypes$v
is a factor, the v column in the returned draws is translated into a factor usingfactor(v, labels=levels(prototypes$v), ordered=is.ordered(prototypes$v))
.if
prototypes$v
is a logical, the v column is translated into a logical usingas.logical(v)
.
Additional data types can be supported by providing a custom implementation
of the generic function as_constructor
.
Value
A decorated version of model
.
Author(s)
Matthew Kay
See Also
spread_draws()
, gather_draws()
, compose_data()
.
Examples
## Not run:
library(dplyr)
library(magrittr)
library(rstan)
# Here's an example dataset with a categorical predictor (`condition`) with several levels:
set.seed(5)
n = 10
n_condition = 5
ABC = tibble(
condition = factor(rep(c("A","B","C","D","E"), n)),
response = rnorm(n * 5, c(0,1,2,1,-1), 0.5)
)
# We'll fit the following model to it:
stan_code = "
data {
int<lower=1> n;
int<lower=1> n_condition;
int<lower=1, upper=n_condition> condition[n];
real response[n];
}
parameters {
real overall_mean;
vector[n_condition] condition_zoffset;
real<lower=0> response_sd;
real<lower=0> condition_mean_sd;
}
transformed parameters {
vector[n_condition] condition_mean;
condition_mean = overall_mean + condition_zoffset * condition_mean_sd;
}
model {
response_sd ~ cauchy(0, 1); // => half-cauchy(0, 1)
condition_mean_sd ~ cauchy(0, 1); // => half-cauchy(0, 1)
overall_mean ~ normal(0, 5);
//=> condition_mean ~ normal(overall_mean, condition_mean_sd)
condition_zoffset ~ normal(0, 1);
for (i in 1:n) {
response[i] ~ normal(condition_mean[condition[i]], response_sd);
}
}
"
m = stan(model_code = stan_code, data = compose_data(ABC), control = list(adapt_delta=0.99),
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# without using recover_types(), the `condition` column returned by spread_draws()
# will be an integer:
m %>%
spread_draws(condition_mean[condition]) %>%
median_qi()
# If we apply recover_types() first, subsequent calls to other tidybayes functions will
# automatically back-convert factors so that they are labeled with their original levels
# (assuming the same name is used)
m %<>% recover_types(ABC)
# now the `condition` column with be a factor with levels "A", "B", "C", ...
m %>%
spread_draws(condition_mean[condition]) %>%
median_qi()
## End(Not run)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- ggdist
axis_titles_bottom_left
,curve_interval
,cut_cdf_qi
,dlkjcorr_marginal
,dstudent_t
,facet_title_left_horizontal
,facet_title_right_horizontal
,from_broom_names
,from_ggmcmc_names
,geom_dots
,geom_dotsinterval
,geom_interval
,geom_lineribbon
,geom_pointinterval
,geom_slab
,geom_slabinterval
,GeomDots
,GeomDotsinterval
,GeomInterval
,GeomLineribbon
,GeomPointinterval
,GeomSlab
,GeomSlabinterval
,guide_colorbar2
,guide_colourbar2
,hdci
,hdi
,marginalize_lkjcorr
,mean_hdci
,mean_hdi
,mean_qi
,median_hdci
,median_hdi
,median_qi
,Mode
,mode_hdci
,mode_hdi
,mode_qi
,parse_dist
,plkjcorr_marginal
,point_interval
,pstudent_t
,qi
,qlkjcorr_marginal
,qstudent_t
,r_dist_name
,rlkjcorr_marginal
,rstudent_t
,scale_interval_alpha_continuous
,scale_interval_alpha_discrete
,scale_interval_color_continuous
,scale_interval_color_discrete
,scale_interval_colour_continuous
,scale_interval_colour_discrete
,scale_interval_linetype_continuous
,scale_interval_linetype_discrete
,scale_interval_size_continuous
,scale_interval_size_discrete
,scale_point_alpha_continuous
,scale_point_alpha_discrete
,scale_point_color_continuous
,scale_point_color_discrete
,scale_point_colour_continuous
,scale_point_colour_discrete
,scale_point_fill_continuous
,scale_point_fill_discrete
,scale_point_size_continuous
,scale_point_size_discrete
,scale_slab_alpha_continuous
,scale_slab_alpha_discrete
,scale_slab_color_continuous
,scale_slab_color_discrete
,scale_slab_colour_continuous
,scale_slab_colour_discrete
,scale_slab_fill_continuous
,scale_slab_fill_discrete
,scale_slab_linetype_continuous
,scale_slab_linetype_discrete
,scale_slab_shape_continuous
,scale_slab_shape_discrete
,scale_slab_size_continuous
,scale_slab_size_discrete
,stat_ccdfinterval
,stat_cdfinterval
,stat_dist_ccdfinterval
,stat_dist_cdfinterval
,stat_dist_dots
,stat_dist_dotsinterval
,stat_dist_eye
,stat_dist_gradientinterval
,stat_dist_halfeye
,stat_dist_interval
,stat_dist_lineribbon
,stat_dist_pointinterval
,stat_dist_slab
,stat_dist_slabinterval
,stat_dots
,stat_dotsinterval
,stat_eye
,stat_gradientinterval
,stat_halfeye
,stat_histinterval
,stat_interval
,stat_lineribbon
,stat_pointinterval
,stat_sample_slabinterval
,stat_slab
,stat_slabinterval
,StatDistSlabinterval
,StatInterval
,StatPointinterval
,StatSampleSlabinterval
,StatSlabinterval
,theme_ggdist
,theme_tidybayes
,to_broom_names
,to_ggmcmc_names
- posterior
Sample draws from a tidy-format data frame of draws
Description
Given a tidy-format data frame of draws with a column indexing each draw, subsample the data frame to a given size based on a column indexing draws, ensuring that rows in sub-groups of a grouped data frame are sampled from the same draws.
Usage
sample_draws(data, ndraws, draw = ".draw", seed = NULL)
Arguments
data |
Data frame to sample from |
ndraws |
The number of draws to return, or |
draw |
The name of the column indexing the draws; default |
seed |
A seed to use when subsampling draws (i.e. when |
Details
sample_draws()
makes it easier to sub-sample a grouped, tidy-format data frame of draws. On a grouped data frame,
the naive approach of using filter
with the .draw
column will give incorrect results as it will
select a different sample within each group. sample_draws()
ensures the same sample is selected within
each group.
Author(s)
Matthew Kay
Examples
## Not run:
library(ggplot2)
library(dplyr)
library(brms)
library(modelr)
theme_set(theme_light())
m_mpg = brm(mpg ~ hp * cyl, data = mtcars,
# 1 chain / few iterations just so example runs quickly
# do not use in practice
chains = 1, iter = 500)
# draw 100 fit lines from the posterior and overplot them
mtcars %>%
group_by(cyl) %>%
data_grid(hp = seq_range(hp, n = 101)) %>%
add_epred_draws(m_mpg) %>%
# NOTE: only use sample_draws here when making spaghetti plots; for
# plotting intervals it is always best to use all draws
sample_draws(100) %>%
ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) +
geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) +
geom_point(data = mtcars)
## End(Not run)
Summaries of draws in grouped_df
objects
Description
An implementation of posterior::summarise_draws()
for grouped data frames
(dplyr::grouped_df
objects) such as returned by dplyr::group_by()
and
the various grouped-data-aware functions in tidybayes, such as spread_draws()
,
gather_draws()
, add_epred_draws()
, and add_predicted_draws()
.
This function provides a quick way to get a variety of summary statistics
and diagnostics on draws.
Usage
## S3 method for class 'grouped_df'
summarise_draws(.x, ...)
Arguments
.x |
A grouped data frame ( |
... |
Name-value pairs of summary or diagnostic functions. The provided names will be used as the names of the columns in the result unless the function returns a named vector, in which case the latter names are used. The functions can be specified in any format supported by as_function(). See Examples. |
Details
While posterior::summarise_draws()
can operate on tidy data frames of
draws in the posterior::draws_df()
format, that format does not support
grouping columns. This provides an implementation of summarise_draws()
that does support grouped data tables, essentially applying
posterior::summarise_draws()
to every sub-table of .x
implied by the
groups defined on the data frame.
See posterior::summarise_draws()
for
more details on the summary statistics and diagnostics you can use with this
function. If you just want point summaries and intervals (not diagnostics),
particularly for plotting, see point_interval()
, which returns long-format
data tables more suitable for that purpose (especially if you want to plot
multiple uncertainty levels).
Value
A data frame (actually, a tibble) with all grouping
columns from .x
, a "variable"
column containing variable names from .x
,
and the remaining columns containing summary statistics and diagnostics.
Author(s)
Matthew Kay
See Also
posterior::summarise_draws()
, point_interval()
Examples
library(posterior)
library(dplyr)
d = posterior::example_draws()
# The default posterior::summarise_draws() summarises all variables without
# splitting out indices:
summarise_draws(d)
# The grouped_df implementation of summarise_draws() in tidybayes can handle
# output from spread_draws(), which is a grouped data table with the indices
# (here, `i`) left as columns:
d %>%
spread_draws(theta[i]) %>%
summarise_draws()
# Summary functions can also be provided, as in posterior::summarise_draws():
d %>%
spread_draws(theta[i]) %>%
summarise_draws(median, mad, rhat, ess_tail)
Get a sample of posterior draws from a model as a tibble
Description
Extract draws from a Bayesian fit into a wide-format data frame with a
.chain
, .iteration
, and .draw
column, as well as all variables
as columns. This function does not parse indices from variable names
(e.g. for variable names like "x[1]"
); see spread_draws()
or
gather_draws()
for functions that parse variable indices.
Usage
tidy_draws(model, ...)
## Default S3 method:
tidy_draws(model, ...)
## S3 method for class 'draws'
tidy_draws(model, ...)
## S3 method for class 'data.frame'
tidy_draws(model, ...)
## S3 method for class 'mcmc.list'
tidy_draws(model, ...)
## S3 method for class 'stanfit'
tidy_draws(model, ...)
## S3 method for class 'stanreg'
tidy_draws(model, ...)
## S3 method for class 'runjags'
tidy_draws(model, ...)
## S3 method for class 'jagsUI'
tidy_draws(model, ...)
## S3 method for class 'brmsfit'
tidy_draws(model, ...)
## S3 method for class 'CmdStanFit'
tidy_draws(model, ...)
## S3 method for class 'CmdStanMCMC'
tidy_draws(model, ...)
## S3 method for class 'matrix'
tidy_draws(model, ...)
## S3 method for class 'MCMCglmm'
tidy_draws(model, ...)
Arguments
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Further arguments passed to other methods (mostly unused). |
Details
This function can be useful for quick glances at models (especially
combined with gather_variables()
and median_qi()
), and for models with
parameters without indices in their names (like "x[1]"
).
spread_draws()
and gather_draws()
, which do parse variable name indices,
call this function internally if their input is not already a tidy data frame.
To provide support for new models in tidybayes,
you must provide an implementation of this function or an implementation
of coda::as.mcmc.list()
(tidy_draws
should work on any model
with an implementation of coda::as.mcmc.list()
)
tidy_draws()
can be applied to a data frame that is already a tidy-format data frame
of draws, provided it has one row per draw. In other words, it can be applied to data frames
that have the same format it returns, and it will return the same data frame back, while
checking to ensure the .chain
, .iteration
, and .draw
columns are all
integers (converting if possible) and that the .draw
column is unique. This allows
you to pass already-tidy-format data frames into other tidybayes functions, like
spread_draws()
or gather_draws()
. This functionality can be useful if the
tidying step is expensive: you can tidy once, possibly subsetting to some
particular variables of interest, then call spread_draws()
or gather_draws()
repeatedly to extract variables and indices from the already-tidied data frame.
Value
A data frame (actually, a tibble) with a .chain
column,
.iteration
column, .draw
column, and one column for every variable in model
.
Author(s)
Matthew Kay
See Also
spread_draws()
or gather_draws()
, which use this function
internally and provides a friendly interface for extracting tidy data frames from model fits.
Examples
library(magrittr)
data(line, package = "coda")
line %>%
tidy_draws()
Deprecated functions, arguments, and column names in tidybayes
Description
Deprecated functions, arguments, and column names and their alternatives are listed below. Many of the deprecations are due to a naming scheme overhaul in tidybayes version 1.0 (see Deprecated Functions and Deprecated Arguments and Column Names below) or due to the deprecation of horizontal shortcut geoms and stats in tidybayes 2.1 (see Deprecated Horizontal Shortcut Geoms and Stats).
Deprecated Functions
Several deprecated versions of functions use slightly different output
formats (e.g., they use names like term
and estimate
where new
functions use .variable
and .value
; or they set .iteration
even
when iteration information is not available — new functions always set .draw
but may not set .iteration
), so be careful when upgrading to new function names.
See Deprecated Arguments and Column Names, below, for more information.
Functions deprecated in tidybayes 3.0:
-
fitted_draws
andadd_fitted_draws
are deprecated because their names were confusing: it was unclear to many users if these functions returned draws from the posterior predictive, the mean of the posterior predictive, or the linear predictor (and depending on model type it might have been either of the latter). Useepred_draws()
/add_epred_draws()
if you want the expectation of the posterior predictive and uselinpred_draws()
/add_linpred_draws()
if you want the linear predictor.
Functions deprecated in tidybayes 1.0:
-
spread_samples
,extract_samples
, andtidy_samples
are deprecated names forspread_draws()
. The spread/gather terminology better distinguishes the resulting data frame format, and draws is more correct terminology than samples for describing multiple realizations from a posterior distribution. -
gather_samples
is a deprecated name forgather_draws()
, reflecting a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
unspread_samples
is a deprecated name forunspread_draws()
, reflecting a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
ungather_samples
is a deprecated name forungather_draws()
, reflecting a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
fitted_samples
/add_fitted_samples
are deprecated names forfitted_draws
/add_fitted_draws
, reflecting a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. (though see the note above about the deprecation offitted_draws
in favor ofepred_draws()
andlinpred_draws()
). -
predicted_samples
/add_predicted_samples
are deprecated names forpredicted_draws()
/add_predicted_draws()
, reflecting a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
gather_lsmeans_samples
andgather_emmeans_samples
are deprecated aliases forgather_emmeans_draws()
. The new name (estimated marginal means) is more appropriate for Bayesian models than the old name (least-squares means), and reflects the naming of the neweremmeans
package. It also reflects a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
as_sample_tibble
andas_sample_data_frame
are deprecated aliases fortidy_draws()
. The original intent ofas_sample_tibble
was to be used primarily internally (hence its less user-friendly name); however, increasingly I have come across use cases oftidy_draws
that warrant a more user-friendly name. It also reflects a package-wide move to using draws instead of samples for describing multiple realizations from a distribution. -
ggeye
is deprecated: for a package whose goal is flexible and customizable visualization, monolithic functions are inflexible and do not sufficiently capitalize on users' existing knowledge of ggplot; instead, I think it is more flexible to design geoms and stats that can used within a complete ggplot workflow.stat_eye()
offers a horizontal eye plot geom that can be used instead ofggeye
. See the sections below for additional deprecated functions, including horizontal geoms, stats, and point_intervals
Deprecated Eye Geom Spellings
geom_eye
, geom_eyeh
, and geom_halfeyeh
are deprecated spellings of stat_eye()
and
stat_halfeye()
from before name standardization of stats and geoms. Use those functions instead.
Deprecated Horizontal Shortcut Geoms and Stats
Due to the introduction of automatic orientation detection in tidybayes 2.1,
shortcut geoms and stats (which end in h
) are no longer necessary, and are
deprecated. In most cases, these can simply be replaced with the same
geom without the h
suffix and they will remain horizontal; e.g.
stat_halfeyeh(...)
can simply be replaced with stat_halfeye(...)
.
If automatic orientation detection fails, override it with the orientation
parameter; e.g. stat_halfeye(orientation = "horizontal")
.
These deprecated stats and geoms include:
-
stat_eyeh
/stat_dist_eyeh
-
stat_halfeyeh
/stat_dist_halfeyeh
-
geom_slabh
/stat_slabh
/stat_dist_slabh
-
geom_intervalh
/stat_intervalh
/stat_dist_intervalh
-
geom_pointintervalh
/stat_pointintervalh
/stat_dist_pointintervalh
-
stat_gradientintervalh
/stat_dist_gradientintervalh
-
stat_cdfintervalh
/stat_dist_cdfintervalh
-
stat_ccdfintervalh
/stat_dist_ccdfintervalh
-
geom_dotsh
/stat_dotsh
/stat_dist_dotsh
-
geom_dotsintervalh
/stat_intervalh
/stat_dist_intervalh
-
stat_histintervalh
Deprecated Horizontal Point/Interval Functions
These functions ending in h
(e.g., point_intervalh
, median_qih
)
used to be needed for use with ggstance::stat_summaryh
, but are
no longer necessary because ggplot2::stat_summary()
supports
automatic orientation detection, so they have been deprecated.
They behave identically to the corresponding function without the h
,
except that when passed a vector, they return a data frame with
x
/xmin
/xmax
instead of y
/ymin
/ymax
.
-
point_intervalh
-
mean_qih
/median_qih
/mode_qih
-
mean_hdih
/median_hdih
/mode_hdih
-
mean_hdcih
/median_hdcih
/mode_hdcih
Deprecated Arguments and Column Names
Arguments deprecated in tidybayes 3.0 are:
The
n
argument is now calledndraws
inpredicted_draws()
,linpred_draws()
, etc. This prevents some bugs due to partial matching of argument names wheren
might be mistaken fornewdata
.The
value
argument inlinpred_draws()
is now spelledlinpred
and defaults to".linpred"
in the same way that thepredicted_draws()
andepred_draws()
functions work.The
scale
argument inlinpred_draws()
is no longer allowed (usetransform
instead) as this naming scheme only made sense whenlinpred_draws()
was an alias forfitted_draws()
, which it no longer is (see note above about the deprecation offitted_draws()
).
Versions of tidybayes before version 1.0 used a different naming scheme for several arguments and output columns.
Arguments and column names deprecated in tidybayes 1.0 are:
-
term
is now.variable
-
estimate
is now.value
-
pred
is now.prediction
-
conf.low
is now.lower
-
conf.high
is now.upper
-
.prob
is now.width
The
.draw
column was added, and should be used instead of.chain
and.iteration
to uniquely identify draws when you do not care about chains. (.chain
and.iteration
are still provided for identifying draws within chains, if desired).
To translate to/from the old naming scheme in output, use to_broom_names()
and from_broom_names()
.
Many of these names were updated in version 1.0 in order to make terminology more consistent and in order to satisfy these criteria:
Ignore compatibility with broom names on the assumption an adapter function can be created.
Use names that could be compatible with frequentist approaches (hence
.width
instead of.prob
).Always precede with "." to avoid collisions with variable names in models.
No abbreviations (remembering if something is abbreviated or not can be a pain).
No two-word names (multi-word names can always be standardized on and used in documentation, but I think data frame output should be succinct).
Names should be nouns (I made an exception for lower/upper because they are common).
Author(s)
Matthew Kay
Models supported by tidybayes
Description
Tidybayes supports two classes of models and sample formats: Models/formats that provide prediction functions, and those that do not.
All Supported Models/Sample Formats
All supported models/formats support the base tidybayes sample extraction functions, such as
tidy_draws()
, spread_draws()
, gather_draws()
, spread_rvars()
, and gather_rvars()
.
These models/formats include:
-
rstan models
-
cmdstanr models
-
brms::brm()
models -
rstanarm models
-
runjags::runjags()
models -
rjags::jags.model()
models, if sampled usingrjags::coda.samples()
-
jagsUI::jags()
models -
MCMCglmm::MCMCglmm()
models -
coda::mcmc()
andcoda::mcmc.list()
objects, which are output by several model types. -
posterior::draws objects
Any object with an implementation of
posterior::as_draws_df()
orposterior::as_draws()
. For a list of those available in your environment, runmethods(as_draws_df)
ormethods(as_draws)
Any object with an implementation of
coda::as.mcmc.list()
. For a list of those available in your environment, runmethods(as.mcmc.list)
If you install the tidybayes.rethinking package, models from the rethinking package are also supported.
Models Supporting Prediction
In addition, the following models support fit and prediction extraction functions, such as
add_epred_draws()
, add_predicted_draws()
, add_linpred_draws()
, add_epred_rvars()
,
add_predicted_rvars()
, and add_linpred_rvars()
:
-
brms::brm()
models -
rstanarm models
any package with implementations of
rstantools::posterior_epred()
,rstantools::posterior_predict()
, orrstantools::posterior_linpred()
that include an argument callednewdata
which takes a data frame of predictors.
If your model type is not in the above list, you may still be able to use the add_draws()
function to turn matrices of predictive draws (or fit draws) into tidy data frames. Or,
you can wrap output from a prediction function in posterior::rvar()
and add it
to a data frame so long as that output is a matrix with draws as rows.
If you install the tidybayes.rethinking package, models from the rethinking package are also supported.
Extending tidybayes
To include basic support for new models, one need only implement the tidy_draws()
generic function
for that model. Alternatively, objects that support posterior::as_draws()
or coda::as.mcmc.list()
will automatically be supported by tidy_draws()
.
To include support for estimation and prediction, one must either implement the epred_draws()
,
predicted_draws()
, and linpred_draws()
functions or their correspond functions from
rstantools: rstantools::posterior_epred()
, rstantools::posterior_predict()
, and
rstantools::posterior_linpred()
. If you take the latter approach, you should include
newdata
and ndraws
arguments that work as documented in predicted_draws()
.
Turn tidy data frames of variables from a Bayesian model back into untidy data
Description
Inverse operations of spread_draws()
and gather_draws()
, giving
results that look like tidy_draws()
.
Usage
ungather_draws(
data,
...,
variable = ".variable",
value = ".value",
draw_indices = c(".chain", ".iteration", ".draw"),
drop_indices = FALSE
)
unspread_draws(
data,
...,
draw_indices = c(".chain", ".iteration", ".draw"),
drop_indices = FALSE
)
Arguments
data |
A tidy data frame of draws, such as one output by |
... |
Expressions in the form of
|
variable |
The name of the column in |
value |
The name of the column in |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
drop_indices |
Drop the columns specified by |
Details
These functions take symbolic specifications of variable names and dimensions in the same format as
spread_draws()
and gather_draws()
and invert the tidy data frame back into
a data frame whose column names are variables with dimensions in them.
Value
A data frame.
Author(s)
Matthew Kay
See Also
spread_draws()
, gather_draws()
, tidy_draws()
.
Examples
library(dplyr)
data(RankCorr, package = "ggdist")
# We can use unspread_draws to allow us to manipulate draws with tidybayes
# and then transform the draws into a form we can use with packages like bayesplot.
# Here we subset b[i,j] to just values of i in 1:2 and j == 1, then plot with bayesplot
RankCorr %>%
spread_draws(b[i,j]) %>%
filter(i %in% 1:2, j == 1) %>%
unspread_draws(b[i,j], drop_indices = TRUE) %>%
bayesplot::mcmc_areas()
# As another example, we could use compare_levels to plot all pairwise comparisons
# of b[1,j] for j in 1:3
RankCorr %>%
spread_draws(b[i,j]) %>%
filter(i == 1, j %in% 1:3) %>%
compare_levels(b, by = j) %>%
unspread_draws(b[j], drop_indices = TRUE) %>%
bayesplot::mcmc_areas()
Generate lookup vectors for composing nested indices
Description
Generates a lookup vector such that x_at_y(x, y)[y] == x
. Particularly useful
for generating lookup tables for nested indices in conjunction with compose_data()
.
Usage
x_at_y(x, y, missing = NA)
Arguments
x |
Values in the resulting lookup vector. There should be only
one unique value of |
y |
Keys in the resulting lookup vector. Should be factors or integers. |
missing |
Missing levels from |
Details
x_at_y(x, y)
returns a vector k
such that k[y] == x
. It also
fills in missing values in y
: if y
is an integer, k
will contain
entries for all values from 1
to max(y)
; if y
is a factor,
k
will contain entries for all values from 1
to nlevels(y)
.
Missing values are replaced with missing
(default NA
).
Author(s)
Matthew Kay
See Also
Examples
library(magrittr)
df = data.frame(
plot = factor(paste0("p", rep(1:8, times = 2))),
site = factor(paste0("s", rep(1:4, each = 2, times = 2)))
)
# turns site into a nested index: site[p] gives the site for plot p
df %>%
compose_data(site = x_at_y(site, plot))