Type: | Package |
Title: | Tidy Dataframes and Expressions with Statistical Details |
Version: | 1.7.0 |
Maintainer: | Indrajeet Patil <patilindrajeet.science@gmail.com> |
Description: | Utilities for producing dataframes with rich details for the most common types of statistical approaches and tests: parametric, nonparametric, robust, and Bayesian t-test, one-way ANOVA, correlation analyses, contingency table analyses, and meta-analyses. The functions are pipe-friendly and provide a consistent syntax to work with tidy data. These dataframes additionally contain expressions with statistical details, and can be used in graphing packages. This package also forms the statistical processing backend for 'ggstatsplot'. References: Patil (2021) <doi:10.21105/joss.03236>. |
License: | MIT + file LICENSE |
URL: | https://indrajeetpatil.github.io/statsExpressions/, https://github.com/IndrajeetPatil/statsExpressions |
BugReports: | https://github.com/IndrajeetPatil/statsExpressions/issues |
Depends: | R (≥ 4.3.0), stats |
Imports: | afex (≥ 1.4-1), BayesFactor (≥ 0.9.12-4.7), bayestestR (≥ 0.15.3), correlation (≥ 0.8.7), datawizard (≥ 1.1.0), dplyr (≥ 1.1.4), effectsize (≥ 1.0.0), glue (≥ 1.8.0), insight (≥ 1.2.0), magrittr (≥ 2.0.3), parameters (≥ 0.25.0), performance (≥ 0.13.0), PMCMRplus (≥ 1.9.12), purrr (≥ 1.0.4), rlang (≥ 1.1.6), rstantools (≥ 2.4.0), tidyr (≥ 1.3.1), withr (≥ 3.0.2), WRS2 (≥ 1.1-6), zeallot (≥ 0.1.0) |
Suggests: | ggplot2, knitr, metaBMA, metafor, metaplus (≥ 1.0-6), patrick, rmarkdown, survival, testthat (≥ 3.2.3), utils |
VignetteBuilder: | knitr |
Config/Needs/check: | anthonynorth/roxyglobals |
Config/roxyglobals/unique: | TRUE |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-05-09 21:34:26 UTC; indrajeetpatil |
Author: | Indrajeet Patil |
Repository: | CRAN |
Date/Publication: | 2025-05-09 22:20:02 UTC |
statsExpressions: Tidy Dataframes and Expressions with Statistical Details
Description
The {statsExpressions} package has two key aims:
to provide a consistent syntax to do statistical analysis with tidy data (in pipe-friendly manner),
to provide statistical expressions (pre-formatted in-text statistical results) for plotting functions.
Statistical packages exhibit substantial diversity in terms of their syntax and expected input type. This can make it difficult to switch from one statistical approach to another. For example, some functions expect vectors as inputs, while others expect dataframes. Depending on whether it is a repeated measures design or not, different functions might expect data to be in wide or long format. Some functions can internally omit missing values, while other functions error in their presence. Furthermore, if someone wishes to utilize the objects returned by these packages downstream in their workflow, this is not straightforward either because even functions from the same package can return a list, a matrix, an array, a dataframe, etc., depending on the function.
This is where {statsExpressions} comes in: it can be thought of as a unified portal through which most of the functionality in these underlying packages can be accessed, with a simpler interface and no requirement to change data format.
This package forms the statistical processing backend for the {ggstatsplot} package.
For more documentation, see the dedicated website.
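To make the heterogeneity described above concrete, the following base R sketch (it does not use {statsExpressions} itself) runs two analyses on the built-in `sleep` data and inspects what comes back. The helper `htest_to_df()` is a hypothetical name introduced only for this illustration; it is not part of any package.

```r
# Two base R analyses of the same data return differently shaped objects.
tt <- stats::t.test(extra ~ group, data = sleep) # list of class "htest"
fit <- stats::aov(extra ~ group, data = sleep)   # object of class "aov"/"lm"

class(tt)
class(fit)

# Hypothetical helper (assumption, for illustration only): flatten an
# "htest" object into a one-row data frame that is easy to use downstream.
htest_to_df <- function(x) {
  data.frame(
    statistic = unname(x$statistic),
    df        = unname(x$parameter),
    p.value   = x$p.value,
    method    = x$method
  )
}

tt_df <- htest_to_df(tt)
tt_df
```

Every additional test family would need its own adapter of this kind, which is the sort of bookkeeping a unified interface is meant to remove.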
Details
statsExpressions
Author(s)
Maintainer: Indrajeet Patil patilindrajeet.science@gmail.com (ORCID) [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/IndrajeetPatil/statsExpressions/issues
Template for expressions with statistical details
Description
Creates an expression from a data frame containing statistical details.
Ideally, this data frame would come from having run tidy_model_parameters()
on your model object.
This function is currently not stable and should not be used outside of this package context.
Usage
add_expression_col(
data,
paired = FALSE,
statistic.text = NULL,
effsize.text = NULL,
prior.type = NULL,
n = NULL,
n.text = ifelse(paired, list(quote(italic("n")["pairs"])),
list(quote(italic("n")["obs"]))),
digits = 2L,
digits.df = 0L,
digits.df.error = digits.df,
...
)
Arguments
data |
A data frame containing details from the statistical analysis. It should contain some or all of the following columns:
|
paired |
Logical that decides whether the experimental design is
repeated measures/within-subjects or between-subjects. The default is
|
statistic.text |
A character that specifies the relevant test statistic.
For example, for tests with t-statistic, |
effsize.text |
A character that specifies the relevant effect size. |
prior.type |
The type of prior. |
n |
An integer specifying the sample size used for the test. |
n.text |
A character that specifies the design, which will determine
what the |
digits, digits.df, digits.df.error |
Number of decimal places to display
for the parameters (default: |
... |
Currently ignored. |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
set.seed(123)
# creating a data frame with stats results
stats_df <- cbind.data.frame(
statistic = 5.494,
df = 29.234,
p.value = 0.00001,
estimate = -1.980,
conf.level = 0.95,
conf.low = -2.873,
conf.high = -1.088,
method = "Student's t-test"
)
# expression for *t*-statistic with Cohen's *d* as effect size
# note that the plotmath expressions need to be quoted
add_expression_col(
data = stats_df,
statistic.text = list(quote(italic("t"))),
effsize.text = list(quote(italic("d"))),
n = 32L,
n.text = list(quote(italic("n")["no.obs"])),
digits = 3L,
digits.df = 3L
)
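For readers unfamiliar with plotmath, the sketch below shows in base R how an expression carrying statistical details can be assembled from plain numbers. It is only an illustration of the general idea and makes no claim about how `add_expression_col()` is implemented; the template and the object name `stat_expr` are assumptions.

```r
# Values copied from the `stats_df` example above
statistic <- 5.494
df <- 29.234
p.value <- 0.00001

# bquote() substitutes everything wrapped in .() into the plotmath template;
# plotmath's list(...) renders its arguments separated by commas.
stat_expr <- bquote(
  list(
    italic("t")(.(round(df, 2))) == .(round(statistic, 2)),
    italic("p") == .(format(p.value, digits = 2))
  )
)
stat_expr

# The result is an unevaluated call that graphics functions can render,
# for example: plot(1, main = stat_expr)
```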
Tidy version of the "Bugs" dataset.
Description
Tidy version of the "Bugs" dataset.
Usage
bugs_long
Format
A data frame with 372 rows and 6 variables
subject. Dummy identity number for each participant.
gender. Participant's gender (Female, Male).
region. Region of the world the participant was from.
education. Level of education.
condition. Condition of the experiment the participant gave a rating for (LDLF: low frighteningness and low disgustingness; LFHD: low frighteningness and high disgustingness; HFLD: high frighteningness and low disgustingness; HFHD: high frighteningness and high disgustingness).
desire. The desire to kill an arthropod was indicated on a scale from 0 to 10.
Details
This data set, "Bugs", provides the extent to which men and women want to kill arthropods that vary in frighteningness (low, high) and disgustingness (low, high). Each participant rates their attitudes towards all arthropods. Subset of the data reported by Ryan et al. (2013).
References
Ryan, R. S., Wilde, M., & Crist, S. (2013). Compared to a small, supervised lab experiment, a large, unsupervised web-based experiment on a previously unknown effect has benefits that outweigh its potential costs. Computers in Human Behavior, 29(4), 1295-1301.
Examples
dim(bugs_long)
head(bugs_long)
dplyr::glimpse(bugs_long)
Data frame and expression for distribution properties
Description
Parametric, non-parametric, robust, and Bayesian measures of centrality.
Usage
centrality_description(
data,
x,
y,
type = "parametric",
conf.level = 0.95,
tr = 0.2,
digits = 2L,
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable in |
y |
The response (or outcome or dependent) variable from |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
digits |
Number of digits for rounding or significant figures. May also
be |
... |
Currently ignored. |
Details
This function describes a distribution for the y variable for each level of the grouping variable in x by a set of indices (e.g., measures of centrality, dispersion, range, skewness, kurtosis, etc.). It additionally returns an expression containing a specified centrality measure. The function internally relies on the datawizard::describe_distribution() function.
Centrality measures
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Type | Measure | Function used |
Parametric | mean | datawizard::describe_distribution() |
Non-parametric | median | datawizard::describe_distribution() |
Robust | trimmed mean | datawizard::describe_distribution() |
Bayesian | MAP | datawizard::describe_distribution() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# for reproducibility
set.seed(123)
# ----------------------- parametric -----------------------
centrality_description(iris, Species, Sepal.Length, type = "parametric")
# ----------------------- non-parametric -------------------
centrality_description(mtcars, am, wt, type = "nonparametric")
# ----------------------- robust ---------------------------
centrality_description(ToothGrowth, supp, len, type = "robust")
# ----------------------- Bayesian -------------------------
centrality_description(sleep, group, extra, type = "bayes")
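As a rough cross-check of the measures listed in the table above, the base R sketch below computes one value per approach for each group. It is an approximation under stated assumptions: the helper `centrality()` is hypothetical, and the MAP is taken as the peak of a default kernel density estimate, which need not match what `datawizard::describe_distribution()` reports.

```r
# One centrality value per statistical approach for a numeric vector
centrality <- function(x, tr = 0.2) {
  dens <- stats::density(x)
  c(
    mean         = mean(x),                  # parametric
    median       = stats::median(x),         # non-parametric
    trimmed_mean = mean(x, trim = tr),       # robust
    map          = dens$x[which.max(dens$y)] # rough MAP: peak of kernel density
  )
}

# Apply the helper to each level of the grouping variable
res <- sapply(split(iris$Sepal.Length, iris$Species), centrality)
res
```

The result is a matrix with one row per measure and one column per species.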
Contingency table analyses
Description
Parametric and Bayesian one-way and two-way contingency table analyses.
Usage
contingency_table(
data,
x,
y = NULL,
paired = FALSE,
type = "parametric",
counts = NULL,
ratio = NULL,
alternative = "two.sided",
digits = 2L,
conf.level = 0.95,
sampling.plan = "indepMulti",
fixed.margin = "rows",
prior.concentration = 1,
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The variable to use as the rows in the contingency table. |
y |
The variable to use as the columns in the contingency table.
Default is |
paired |
Logical indicating whether data came from a within-subjects or
repeated measures design study (Default: |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
counts |
The variable in data containing counts, or |
ratio |
A vector of proportions: the expected proportions for the
proportion test (should sum to |
alternative |
A character string specifying the alternative hypothesis;
Controls the type of CI returned: |
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
sampling.plan |
Character describing the sampling plan. Possible options:
|
fixed.margin |
For the independent multinomial sampling plan, which
margin is fixed ( |
prior.concentration |
Specifies the prior concentration parameter, set
to |
... |
Additional arguments (currently ignored). |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
Contingency table analyses
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
two-way table
Hypothesis testing
Type | Design | Test | Function used |
Parametric/Non-parametric | Unpaired | Pearson's chi-squared test | stats::chisq.test() |
Bayesian | Unpaired | Bayesian Pearson's chi-squared test | BayesFactor::contingencyTableBF() |
Parametric/Non-parametric | Paired | McNemar's chi-squared test | stats::mcnemar.test() |
Bayesian | Paired | No | No |
Effect size estimation
Type | Design | Effect size | CI available? | Function used |
Parametric/Non-parametric | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
Bayesian | Unpaired | Cramer's V | Yes | effectsize::cramers_v() |
Parametric/Non-parametric | Paired | Cohen's g | Yes | effectsize::cohens_g() |
Bayesian | Paired | No | No | No |
one-way table
Hypothesis testing
Type | Test | Function used |
Parametric/Non-parametric | Goodness of fit chi-squared test | stats::chisq.test() |
Bayesian | Bayesian Goodness of fit chi-squared test | (custom) |
Effect size estimation
Type | Effect size | CI available? | Function used |
Parametric/Non-parametric | Pearson's C | Yes | effectsize::pearsons_c() |
Bayesian | No | No | No |
Examples
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
#### -------------------- association test ------------------------ ####
# ------------------------ frequentist ---------------------------------
# unpaired
set.seed(123)
contingency_table(
data = mtcars,
x = am,
y = vs,
paired = FALSE
)
# paired
paired_data <- tibble(
response_before = structure(c(1L, 2L, 1L, 2L), levels = c("no", "yes"), class = "factor"),
response_after = structure(c(1L, 1L, 2L, 2L), levels = c("no", "yes"), class = "factor"),
Freq = c(65L, 25L, 5L, 5L)
)
set.seed(123)
contingency_table(
data = paired_data,
x = response_before,
y = response_after,
paired = TRUE,
counts = Freq
)
# ------------------------ Bayesian -------------------------------------
# unpaired
set.seed(123)
contingency_table(
data = mtcars,
x = am,
y = vs,
paired = FALSE,
type = "bayes"
)
# paired
set.seed(123)
contingency_table(
data = paired_data,
x = response_before,
y = response_after,
paired = TRUE,
counts = Freq,
type = "bayes"
)
#### -------------------- goodness-of-fit test -------------------- ####
# ------------------------ frequentist ---------------------------------
set.seed(123)
contingency_table(
data = as.data.frame(HairEyeColor),
x = Eye,
counts = Freq
)
# ------------------------ Bayesian -------------------------------------
set.seed(123)
contingency_table(
data = as.data.frame(HairEyeColor),
x = Eye,
counts = Freq,
ratio = c(0.2, 0.2, 0.3, 0.3),
type = "bayes"
)
}
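The frequentist tests named in the tables above are base R functions and can be called directly, which is useful for checking results. The sketch below uses the same data as the examples; the object names are illustrative, and the paired counts are rebuilt as a plain data frame so the block does not depend on any package.

```r
# Two-way table, unpaired design: Pearson's chi-squared test
chi <- stats::chisq.test(table(mtcars$am, mtcars$vs))

# Two-way table, paired design: McNemar's test on before/after counts
paired_counts <- data.frame(
  response_before = c("no", "yes", "no", "yes"),
  response_after  = c("no", "no", "yes", "yes"),
  Freq            = c(65L, 25L, 5L, 5L)
)
paired_tab <- stats::xtabs(
  Freq ~ response_before + response_after,
  data = paired_counts
)
mcn <- stats::mcnemar.test(paired_tab)

# One-way table: goodness-of-fit test (equal expected proportions by default)
eye_counts <- margin.table(HairEyeColor, margin = 2)
gof <- stats::chisq.test(eye_counts)

chi$p.value
mcn$p.value
gof$p.value
```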
Correlation analyses
Description
Parametric, non-parametric, robust, and Bayesian correlation test.
Usage
corr_test(
data,
x,
y,
type = "parametric",
digits = 2L,
conf.level = 0.95,
tr = 0.2,
bf.prior = 0.707,
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The column in |
y |
The column in |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
bf.prior |
A number between |
... |
Additional arguments (currently ignored). |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
Correlation analyses
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing and Effect size estimation
Type | Test | CI available? | Function used |
Parametric | Pearson's correlation coefficient | Yes | correlation::correlation() |
Non-parametric | Spearman's rank correlation coefficient | Yes | correlation::correlation() |
Robust | Winsorized Pearson's correlation coefficient | Yes | correlation::correlation() |
Bayesian | Bayesian Pearson's correlation coefficient | Yes | correlation::correlation() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# for reproducibility
set.seed(123)
# ----------------------- parametric -----------------------
corr_test(mtcars, wt, mpg, type = "parametric")
# ----------------------- non-parametric -------------------
corr_test(mtcars, wt, mpg, type = "nonparametric")
# ----------------------- robust ---------------------------
corr_test(mtcars, wt, mpg, type = "robust")
# ----------------------- Bayesian -------------------------
corr_test(mtcars, wt, mpg, type = "bayes")
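For comparison, the parametric and non-parametric coefficients from the table above can be obtained with base R's `stats::cor.test()`. This is a sketch for cross-checking only; it is not a claim about what `correlation::correlation()` does internally, and `exact = FALSE` is set here simply because the data contain ties.

```r
# Pearson's product-moment correlation with a confidence interval
pearson <- stats::cor.test(mtcars$wt, mtcars$mpg, method = "pearson")

# Spearman's rank correlation; exact p-values are unavailable with ties
spearman <- stats::cor.test(
  mtcars$wt, mtcars$mpg,
  method = "spearman", exact = FALSE
)

c(
  pearson  = unname(pearson$estimate),
  spearman = unname(spearman$estimate)
)
pearson$conf.int
```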
Switch the type of statistics.
Description
Relevant mostly for the {ggstatsplot} and {statsExpressions} packages, where different statistical approaches are supported via this argument: parametric, non-parametric, robust, and Bayesian. This switch function converts strings entered by users to a common pattern for convenience.
Usage
extract_stats_type(type)
stats_type_switch(type)
Arguments
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
Examples
extract_stats_type("p")
extract_stats_type("bf")
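The idea of normalising user input by its initial letter can be sketched in a few lines of base R. The function name `normalize_stats_type()` and the exact strings it returns are assumptions made for this illustration; they are not taken from the package source.

```r
# Hypothetical re-implementation: map any string starting with
# p / n / r / b to one canonical label, and fail loudly otherwise.
normalize_stats_type <- function(type) {
  switch(substr(tolower(type), 1L, 1L),
    p = "parametric",
    n = "nonparametric",
    r = "robust",
    b = "bayes",
    stop("`type` must start with 'p', 'n', 'r', or 'b'.", call. = FALSE)
  )
}

normalize_stats_type("p")
normalize_stats_type("bf")
normalize_stats_type("Robust")
```

Matching on the first letter keeps the interface forgiving while still rejecting unrelated input.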
Edgar Anderson's Iris Data in long format.
Description
Edgar Anderson's Iris Data in long format.
Usage
iris_long
Format
A data frame with 600 rows and 5 variables
id. Dummy identity number for each flower (150 flowers in total).
Species. The species are Iris setosa, versicolor, and virginica.
condition. Factor giving a detailed description of the attribute (Four levels: "Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width").
attribute. What attribute is being measured ("Sepal" or "Petal").
measure. What aspect of the attribute is being measured ("Length" or "Width").
value. Value of the measurement.
Details
This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica.
This is a modified dataset from the {datasets} package.
Examples
dim(iris_long)
head(iris_long)
dplyr::glimpse(iris_long)
Convert long/tidy data frame to wide format
Description
This conversion is helpful mostly for repeated measures designs, where removing NAs by participant can be a bit tedious.
Usage
long_to_wide_converter(
data,
x,
y,
subject.id = NULL,
paired = TRUE,
spread = TRUE,
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
subject.id |
Relevant in case of a repeated measures or within-subjects
design ( |
paired |
Logical that decides whether the experimental design is
repeated measures/within-subjects or between-subjects. The default is
|
spread |
Logical that decides whether the data frame needs to be
converted from long/tidy to wide (default: |
... |
Currently ignored. |
Value
A data frame with NAs removed while respecting the between-or-within-subjects nature of the dataset.
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# for reproducibility
library(statsExpressions)
set.seed(123)
# repeated measures design
long_to_wide_converter(
bugs_long,
condition,
desire,
subject.id = subject,
paired = TRUE
)
# independent measures design
long_to_wide_converter(mtcars, cyl, wt, paired = FALSE)
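Why removing NAs "by participant" matters is easiest to see on a tiny data set. The base R sketch below uses made-up scores (an assumption for illustration, not data from the package) and shows the manual steps that the converter is meant to spare: reshape to wide, then drop every participant with an incomplete set of measurements.

```r
# Made-up repeated measures data: participant 2 is missing condition "a"
long <- data.frame(
  subject   = rep(1:3, each = 2),
  condition = rep(c("a", "b"), times = 3),
  score     = c(1.5, 2.0, NA, 3.1, 2.2, 2.9)
)

# One row per participant, one column per condition
wide <- stats::reshape(
  long,
  idvar = "subject", timevar = "condition", direction = "wide"
)

# Dropping incomplete rows removes the whole participant, not just one cell
wide_complete <- wide[stats::complete.cases(wide), ]
wide_complete
```

Dropping only the NA row in the long format would leave participant 2 with a single measurement, which breaks the pairing that within-subjects tests rely on.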
Random-effects meta-analysis
Description
Parametric, robust, and Bayesian random-effects meta-analysis.
Usage
meta_analysis(
data,
type = "parametric",
random = "mixture",
digits = 2L,
conf.level = 0.95,
...
)
Arguments
data |
A data frame. It must contain columns named
|
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
random |
The type of random effects distribution. One of "normal", "t-dist", "mixture", for standard normal, |
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
... |
Additional arguments passed to the respective meta-analysis function. |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
Random-effects meta-analysis
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing and Effect size estimation
Type | Test | CI available? | Function used |
Parametric | Meta-analysis via random-effects models | Yes | metafor::rma() |
Robust | Meta-analysis via robust random-effects models | Yes | metaplus::metaplus() |
Bayesian | Meta-analysis via Bayesian random-effects models | Yes | metaBMA::meta_random() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Note
Important: The function assumes that you have already downloaded the needed package ({metafor}, {metaplus}, or {metaBMA}) for meta-analysis. If they are not available, you will be asked to install them.
Examples
set.seed(123)
library(statsExpressions)
# let's use `mag` dataset from `{metaplus}`
data(mag, package = "metaplus")
dat <- dplyr::rename(mag, estimate = yi, std.error = sei)
# ----------------------- parametric ----------------------------------------
meta_analysis(dat)
# ----------------------- robust --------------------------------------------
meta_analysis(dat, type = "robust", random = "normal")
# ----------------------- Bayesian ------------------------------------------
meta_analysis(dat, type = "bayes")
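The examples above pass a data frame with `estimate` and `std.error` columns. To show what a random-effects pooled estimate looks like for data of that shape, here is a base R sketch of the classical DerSimonian-Laird estimator. It is a swapped-in textbook method chosen because it needs no extra packages; the meta-analysis packages used by this function may rely on different estimators, and both the helper `dl_random_effects()` and the numbers are made up for illustration.

```r
# Made-up effect sizes and standard errors, purely for illustration
dat_toy <- data.frame(
  estimate  = c(0.30, 0.10, 0.45, 0.25),
  std.error = c(0.10, 0.15, 0.20, 0.12)
)

# Hypothetical helper: DerSimonian-Laird random-effects pooling
dl_random_effects <- function(estimate, std.error, conf.level = 0.95) {
  w <- 1 / std.error^2                        # inverse-variance weights
  mu_fixed <- sum(w * estimate) / sum(w)      # fixed-effect pooled estimate
  q <- sum(w * (estimate - mu_fixed)^2)       # Cochran's Q
  k <- length(estimate)
  # Between-study variance, truncated at zero
  tau2 <- max(0, (q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))
  w_star <- 1 / (std.error^2 + tau2)          # random-effects weights
  mu <- sum(w_star * estimate) / sum(w_star)
  se <- sqrt(1 / sum(w_star))
  z <- stats::qnorm(1 - (1 - conf.level) / 2)
  data.frame(
    estimate = mu, std.error = se,
    conf.low = mu - z * se, conf.high = mu + z * se,
    tau2 = tau2
  )
}

res_dl <- dl_random_effects(dat_toy$estimate, dat_toy$std.error)
res_dl
```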
Movie information and user ratings from IMDB.
Description
Movie information and user ratings from IMDB.
Usage
movies_long
Format
A data frame with 1,579 rows and 8 variables
title. Title of the movie.
year. Year of release.
budget. Total budget (if known) in US dollars
length. Length in minutes.
rating. Average IMDB user rating.
votes. Number of IMDB users who rated this movie.
mpaa. MPAA rating.
genre. Different genres of movies (action, animation, comedy, drama, documentary, romance, short).
Details
Modified dataset from the {ggplot2movies} package.
Source
https://CRAN.R-project.org/package=ggplot2movies
Examples
dim(movies_long)
head(movies_long)
dplyr::glimpse(movies_long)
One-sample tests
Description
Parametric, non-parametric, robust, and Bayesian one-sample tests.
Usage
one_sample_test(
data,
x,
type = "parametric",
test.value = 0,
alternative = "two.sided",
digits = 2L,
conf.level = 0.95,
tr = 0.2,
bf.prior = 0.707,
effsize.type = "g",
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
A numeric variable from the data frame |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
test.value |
A number indicating the true value of the mean (Default:
|
alternative |
a character string specifying the alternative
hypothesis, must be one of |
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
tr |
Trim level for the mean when carrying out |
bf.prior |
A number between |
effsize.type |
Type of effect size needed for parametric tests. The
argument can be |
... |
Currently ignored. |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
One-sample tests
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
Hypothesis testing
Type | Test | Function used |
Parametric | One-sample Student's t-test | stats::t.test() |
Non-parametric | One-sample Wilcoxon test | stats::wilcox.test() |
Robust | Bootstrap-t method for one-sample test | WRS2::trimcibt() |
Bayesian | One-sample Student's t-test | BayesFactor::ttestBF() |
Effect size estimation
Type | Effect size | CI available? | Function used |
Parametric | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
Non-parametric | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
Robust | trimmed mean | Yes | WRS2::trimcibt() |
Bayes Factor | difference | Yes | bayestestR::describe_posterior() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# for reproducibility
set.seed(123)
# ----------------------- parametric -----------------------
one_sample_test(mtcars, wt, test.value = 3)
# ----------------------- non-parametric -------------------
one_sample_test(mtcars, wt, test.value = 3, type = "nonparametric")
# ----------------------- robust ---------------------------
one_sample_test(mtcars, wt, test.value = 3, type = "robust")
# ----------------------- Bayesian -------------------------
one_sample_test(mtcars, wt, test.value = 3, type = "bayes")
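The parametric and non-parametric rows of the tables above map onto base R calls, so the reported statistics can be checked by hand. The sketch below also computes a one-sample Cohen's d from its textbook definition; that manual formula is an assumption of this illustration and is not a statement about how `effectsize::cohens_d()` is implemented.

```r
x <- mtcars$wt
mu <- 3 # same value as `test.value` in the examples above

# One-sample Student's t-test
tt <- stats::t.test(x, mu = mu)

# One-sample Wilcoxon test; exact = FALSE because the data contain ties
wil <- stats::wilcox.test(x, mu = mu, exact = FALSE)

# Textbook one-sample Cohen's d: standardized distance from the test value
cohens_d <- (mean(x) - mu) / stats::sd(x)

c(t = unname(tt$statistic), df = unname(tt$parameter), d = cohens_d)
wil$p.value
```

Because the t-statistic is the same standardized distance scaled by the square root of the sample size, d equals t divided by sqrt(n).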
One-way analysis of variance (ANOVA)
Description
Parametric, non-parametric, robust, and Bayesian one-way ANOVA.
Usage
oneway_anova(
data,
x,
y,
subject.id = NULL,
type = "parametric",
paired = FALSE,
digits = 2L,
conf.level = 0.95,
effsize.type = "omega",
var.equal = FALSE,
bf.prior = 0.707,
tr = 0.2,
nboot = 100L,
...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
subject.id |
Relevant in case of a repeated measures or within-subjects
design ( |
type |
A character specifying the type of statistical approach:
You can specify just the initial letter. |
paired |
Logical that decides whether the experimental design is
repeated measures/within-subjects or between-subjects. The default is
|
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
effsize.type |
Type of effect size needed for parametric tests. The
argument can be |
var.equal |
a logical variable indicating whether to treat the
two variances as being equal. If |
bf.prior |
A number between |
tr |
Trim level for the mean when carrying out |
nboot |
Number of bootstrap samples for computing confidence interval
for the effect size (Default: |
... |
Additional arguments (currently ignored). |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
One-way ANOVA
The table below provides a summary of:
statistical test carried out for inferential statistics
type of effect size estimate and a measure of uncertainty for this estimate
functions used internally to compute these details
between-subjects
Hypothesis testing
Type | No. of groups | Test | Function used |
Parametric | > 2 | Fisher's or Welch's one-way ANOVA | stats::oneway.test() |
Non-parametric | > 2 | Kruskal-Wallis one-way ANOVA | stats::kruskal.test() |
Robust | > 2 | Heteroscedastic one-way ANOVA for trimmed means | WRS2::t1way() |
Bayes Factor | > 2 | Fisher's ANOVA | BayesFactor::anovaBF() |
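As a rough illustration of the parametric and non-parametric rows above, the base-R functions named in the table can be called directly. This is a minimal sketch, not a description of how oneway_anova() calls them internally; the object names (dat, welch, fisher, kw) are made up for the example.

```r
# Base-R sketch of the tests named in the between-subjects table.
# cyl is stored as numeric in mtcars, so it is converted to a factor first.
dat <- transform(mtcars, cyl = factor(cyl))

# var.equal = FALSE gives Welch's ANOVA, var.equal = TRUE gives Fisher's ANOVA
welch  <- stats::oneway.test(wt ~ cyl, data = dat, var.equal = FALSE)
fisher <- stats::oneway.test(wt ~ cyl, data = dat, var.equal = TRUE)

# Rank-based alternative
kw <- stats::kruskal.test(wt ~ cyl, data = dat)

# Numerator and denominator degrees of freedom for each parametric variant
welch$parameter
fisher$parameter
```

With three groups and 32 observations, the Fisher variant has 2 and 29 degrees of freedom, whereas Welch's variant estimates the denominator degrees of freedom from the data.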
Effect size estimation
Type | No. of groups | Effect size | CI available? | Function used |
Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::eta_squared(), effectsize::omega_squared() |
Non-parametric | > 2 | rank epsilon squared | Yes | effectsize::rank_epsilon_squared() |
Robust | > 2 | Explanatory measure of effect size | Yes | WRS2::t1way() |
Bayes Factor | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
within-subjects
Hypothesis testing
Type | No. of groups | Test | Function used |
Parametric | > 2 | One-way repeated measures ANOVA | afex::aov_ez() |
Non-parametric | > 2 | Friedman rank sum test | stats::friedman.test() |
Robust | > 2 | Heteroscedastic one-way repeated measures ANOVA for trimmed means | WRS2::rmanova() |
Bayes Factor | > 2 | One-way repeated measures ANOVA | BayesFactor::anovaBF() |
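The non-parametric row can be tried out with base R alone. The sketch below assumes a wide layout in which each row is one subject and each column one condition; treating the four measurement columns of the built-in iris data this way is an illustrative choice, and the names wide and fr are hypothetical.

```r
# Wide layout assumed: each row is one subject (block), each column one condition.
wide <- as.matrix(iris[, 1:4])

# Friedman rank sum test across the four conditions
fr <- stats::friedman.test(wide)

fr$statistic
fr$parameter # degrees of freedom: number of conditions - 1
fr$p.value
```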
Effect size estimation
Type | No. of groups | Effect size | CI available? | Function used |
Parametric | > 2 | partial eta-squared, partial omega-squared | Yes | effectsize::eta_squared(), effectsize::omega_squared() |
Non-parametric | > 2 | Kendall's coefficient of concordance | Yes | effectsize::kendalls_w() |
Robust | > 2 | Algina-Keselman-Penfield robust standardized difference average | Yes | WRS2::wmcpAKP() |
Bayes Factor | > 2 | Bayesian R-squared | Yes | performance::r2_bayes() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# for reproducibility
set.seed(123)
library(statsExpressions)
# ----------------------- parametric -------------------------------------
# between-subjects
oneway_anova(
  data = mtcars,
  x = cyl,
  y = wt
)
# within-subjects design
oneway_anova(
  data = iris_long,
  x = condition,
  y = value,
  subject.id = id,
  paired = TRUE
)
# ----------------------- non-parametric ----------------------------------
# between-subjects
oneway_anova(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "np"
)
# within-subjects design
oneway_anova(
  data = iris_long,
  x = condition,
  y = value,
  subject.id = id,
  paired = TRUE,
  type = "np"
)
# ----------------------- robust -------------------------------------
# between-subjects
oneway_anova(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "r"
)
# within-subjects design
oneway_anova(
  data = iris_long,
  x = condition,
  y = value,
  subject.id = id,
  paired = TRUE,
  type = "r"
)
# ----------------------- Bayesian -------------------------------------
# between-subjects
oneway_anova(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "bayes"
)
# within-subjects design
oneway_anova(
  data = iris_long,
  x = condition,
  y = value,
  subject.id = id,
  paired = TRUE,
  type = "bayes"
)
p-value adjustment method text
Description
Prepares text describing which p-value adjustment method was used.
Usage
p_adjust_text(p.adjust.method)
Arguments
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
Value
Standardized text description for what method was used.
Examples
p_adjust_text("none")
p_adjust_text("BY")
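The method names used in the examples of this manual ("none", "holm", "bonferroni", "BH", "BY", "fdr", "hommel") all appear in base R's stats::p.adjust.methods. As a small sketch of what such an adjustment does to a set of p-values, the snippet below applies stats::p.adjust() to three made-up values; it does not call p_adjust_text(), which only produces the descriptive label.

```r
# Hypothetical raw p-values from three pairwise comparisons
p_raw <- c(0.01, 0.02, 0.03)

# Method names accepted by stats::p.adjust()
stats::p.adjust.methods

# Holm: step-down correction, never more conservative than Bonferroni
p_holm <- stats::p.adjust(p_raw, method = "holm")

# Bonferroni: multiply every p-value by the number of comparisons
p_bonf <- stats::p.adjust(p_raw, method = "bonferroni")

# "none" returns the p-values unchanged
p_none <- stats::p.adjust(p_raw, method = "none")

p_holm # 0.03 0.04 0.04
p_bonf # 0.03 0.06 0.09
```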
Multiple pairwise comparison for one-way design
Description
Calculate parametric, non-parametric, robust, and Bayes Factor pairwise comparisons between group levels with corrections for multiple testing.
Usage
pairwise_comparisons(
  data,
  x,
  y,
  subject.id = NULL,
  type = "parametric",
  paired = FALSE,
  var.equal = FALSE,
  tr = 0.2,
  bf.prior = 0.707,
  p.adjust.method = "holm",
  digits = 2L,
  ...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
subject.id |
Relevant in case of a repeated measures or within-subjects
design ( |
type |
A character specifying the type of statistical approach: "parametric", "nonparametric", "robust", or "bayes".
You can specify just the initial letter. |
paired |
Logical that decides whether the experimental design is
repeated measures/within-subjects or between-subjects. The default is
FALSE. |
var.equal |
a logical variable indicating whether to treat the
two variances as being equal. If |
tr |
Trim level for the mean when carrying out |
bf.prior |
A number between |
p.adjust.method |
Adjustment method for p-values for multiple
comparisons. Possible methods are: |
digits |
Number of digits for rounding or significant figures. May also
be |
... |
Additional arguments passed to other methods. |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
Pairwise comparison tests
The tables below summarize:
- the statistical test carried out for inferential statistics
- the type of effect size estimate and a measure of uncertainty for this estimate
- the functions used internally to compute these details
between-subjects
Hypothesis testing
Type | Equal variance? | Test | p-value adjustment? | Function used |
Parametric | No | Games-Howell test | Yes | PMCMRplus::gamesHowellTest() |
Parametric | Yes | Student's t-test | Yes | stats::pairwise.t.test() |
Non-parametric | No | Dunn test | Yes | PMCMRplus::kwAllPairsDunnTest() |
Robust | No | Yuen's trimmed means test | Yes | WRS2::lincon() |
Bayesian | NA | Student's t-test | NA | BayesFactor::ttestBF() |
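To make the equal-variance parametric row concrete, the base-R function it names can be run on its own. This is a sketch under stated assumptions: it shows stats::pairwise.t.test() with its default pooled standard deviation and Holm adjustment, not the exact call that pairwise_comparisons() makes internally; pw is a hypothetical name.

```r
# Pairwise Student's t-tests between the three cylinder groups,
# with Holm-adjusted p-values (pooled SD is the default for unpaired data)
pw <- stats::pairwise.t.test(
  x = mtcars$wt,
  g = mtcars$cyl,
  p.adjust.method = "holm"
)

# Lower-triangular matrix of adjusted p-values:
# rows are the 2nd..last group levels, columns the 1st..second-to-last
pw$p.value
```

The p-value matrix has one cell per pair of groups; cells above the diagonal are NA because each comparison is reported only once.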
Effect size estimation
Not supported.
within-subjects
Hypothesis testing
Type | Test | p-value adjustment? | Function used |
Parametric | Student's t-test | Yes | stats::pairwise.t.test() |
Non-parametric | Durbin-Conover test | Yes | PMCMRplus::durbinAllPairsTest() |
Robust | Yuen's trimmed means test | Yes | WRS2::rmmcp() |
Bayesian | Student's t-test | NA | BayesFactor::ttestBF() |
Effect size estimation
Not supported.
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
References
For more, see: https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/pairwise.html
Examples
# for reproducibility
set.seed(123)
library(statsExpressions)
#------------------- between-subjects design ----------------------------
# parametric
# if `var.equal = TRUE`, then Student's t-test will be run
pairwise_comparisons(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "parametric",
  var.equal = TRUE,
  paired = FALSE,
  p.adjust.method = "none"
)
# if `var.equal = FALSE`, then Games-Howell test will be run
pairwise_comparisons(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "parametric",
  var.equal = FALSE,
  paired = FALSE,
  p.adjust.method = "bonferroni"
)
# non-parametric (Dunn test)
pairwise_comparisons(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "nonparametric",
  paired = FALSE,
  p.adjust.method = "none"
)
# robust (Yuen's trimmed means *t*-test)
pairwise_comparisons(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "robust",
  paired = FALSE,
  p.adjust.method = "fdr"
)
# Bayes Factor (Student's *t*-test)
pairwise_comparisons(
  data = mtcars,
  x = cyl,
  y = wt,
  type = "bayes",
  paired = FALSE
)
#------------------- within-subjects design ----------------------------
# parametric (Student's *t*-test)
pairwise_comparisons(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  type = "parametric",
  paired = TRUE,
  p.adjust.method = "BH"
)
# non-parametric (Durbin-Conover test)
pairwise_comparisons(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  type = "nonparametric",
  paired = TRUE,
  p.adjust.method = "BY"
)
# robust (Yuen's trimmed means *t*-test)
pairwise_comparisons(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  type = "robust",
  paired = TRUE,
  p.adjust.method = "hommel"
)
# Bayes Factor (Student's *t*-test)
pairwise_comparisons(
  data = bugs_long,
  x = condition,
  y = desire,
  subject.id = subject,
  type = "bayes",
  paired = TRUE
)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Expressions with statistics for tidy regression data frames
Description
Expressions with statistics for tidy regression data frames
Usage
tidy_model_expressions(
  data,
  statistic = NULL,
  digits = 2L,
  effsize.type = "omega",
  ...
)
Arguments
data |
A tidy data frame from regression model object (see
|
statistic |
Which statistic is to be displayed (either |
digits |
Number of digits for rounding or significant figures. May also
be |
effsize.type |
Type of effect size needed for parametric tests. The
argument can be |
... |
Currently ignored. |
Details
When any of the necessary numeric column values (estimate, statistic, p.value) are missing, for these rows, a NULL is returned instead of an expression with empty strings.
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# setup
set.seed(123)
library(statsExpressions)
# extract a tidy data frame
df <- tidy_model_parameters(lm(wt ~ am * cyl, mtcars))
# create a column containing expression; the expression will depend on `statistic`
tidy_model_expressions(df, statistic = "t")
tidy_model_expressions(df, statistic = "z")
tidy_model_expressions(df, statistic = "chi")
Convert {parameters} package output to {tidyverse} conventions
Description
Convert {parameters} package output to {tidyverse} conventions
Usage
tidy_model_parameters(model, ...)
Arguments
model |
Statistical Model. |
... |
Arguments passed to or from other methods. Non-documented arguments are
|
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
library(statsExpressions)
model <- lm(mpg ~ wt + cyl, data = mtcars)
tidy_model_parameters(model)
Two-sample tests
Description
Parametric, non-parametric, robust, and Bayesian two-sample tests.
Usage
two_sample_test(
  data,
  x,
  y,
  subject.id = NULL,
  type = "parametric",
  paired = FALSE,
  alternative = "two.sided",
  digits = 2L,
  conf.level = 0.95,
  effsize.type = "g",
  var.equal = FALSE,
  bf.prior = 0.707,
  tr = 0.2,
  nboot = 100L,
  ...
)
Arguments
data |
A data frame (or a tibble) from which variables specified are to
be taken. Other data types (e.g., matrix, table, array, etc.) will not
be accepted. Additionally, grouped data frames from |
x |
The grouping (or independent) variable from |
y |
The response (or outcome or dependent) variable from |
subject.id |
Relevant in case of a repeated measures or within-subjects
design ( |
type |
A character specifying the type of statistical approach: "parametric", "nonparametric", "robust", or "bayes".
You can specify just the initial letter. |
paired |
Logical that decides whether the experimental design is
repeated measures/within-subjects or between-subjects. The default is
FALSE. |
alternative |
a character string specifying the alternative
hypothesis, must be one of |
digits |
Number of digits for rounding or significant figures. May also
be |
conf.level |
Scalar between |
effsize.type |
Type of effect size needed for parametric tests. The
argument can be |
var.equal |
a logical variable indicating whether to treat the
two variances as being equal. If |
bf.prior |
A number between |
tr |
Trim level for the mean when carrying out |
nboot |
Number of bootstrap samples for computing confidence interval
for the effect size (Default: 100L). |
... |
Currently ignored. |
Value
The returned tibble data frame can contain some or all of the following columns (the exact columns will depend on the statistical test):
- statistic: the numeric value of a statistic
- df: the numeric value of a parameter being modeled (often degrees of freedom for the test)
- df.error and df: relevant only if the statistic in question has two degrees of freedom (e.g. anova)
- p.value: the two-sided p-value associated with the observed statistic
- method: the name of the inferential statistical test
- estimate: estimated value of the effect size
- conf.low: lower bound for the effect size estimate
- conf.high: upper bound for the effect size estimate
- conf.level: width of the confidence interval
- conf.method: method used to compute confidence interval
- conf.distribution: statistical distribution for the effect
- effectsize: the name of the effect size
- n.obs: number of observations
- expression: pre-formatted expression containing statistical details
For examples, see data frame output vignette.
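Because the exact set of columns depends on the test, code that consumes the result is safer when it selects only the columns that are actually present. The sketch below illustrates one way to do that with a call taken from the examples of this page; the names res and wanted are hypothetical, and the requireNamespace() guard is only there so the snippet does nothing when the package is not installed.

```r
# Guarded so the snippet is a no-op when statsExpressions is not installed.
if (requireNamespace("statsExpressions", quietly = TRUE)) {
  res <- statsExpressions::two_sample_test(ToothGrowth, supp, len, type = "parametric")

  # Columns vary by test, so keep only the documented ones that are present.
  wanted <- c("statistic", "p.value", "estimate", "conf.low", "conf.high")
  print(res[, intersect(wanted, names(res))])
}
```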
Two-sample tests
The tables below summarize:
- the statistical test carried out for inferential statistics
- the type of effect size estimate and a measure of uncertainty for this estimate
- the functions used internally to compute these details
between-subjects
Hypothesis testing
Type | No. of groups | Test | Function used |
Parametric | 2 | Student's or Welch's t-test | stats::t.test() |
Non-parametric | 2 | Mann-Whitney U test | stats::wilcox.test() |
Robust | 2 | Yuen's test for trimmed means | WRS2::yuen() |
Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
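The parametric and non-parametric rows map onto base-R calls that can be tried directly. The following is a minimal sketch using the same ToothGrowth data as the examples on this page; it is not the internal code of two_sample_test(), and the object names are made up. exact = FALSE is passed to stats::wilcox.test() only to avoid the warning about exact p-values with tied observations.

```r
# Welch's t-test (unequal variances) versus Student's t-test (equal variances)
welch   <- stats::t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
student <- stats::t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)

# Mann-Whitney U test (called the Wilcoxon rank sum test in base R)
mw <- stats::wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)

student$parameter # n1 + n2 - 2 degrees of freedom
welch$parameter   # Welch-Satterthwaite approximation
```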
Effect size estimation
Type | No. of groups | Effect size | CI available? | Function used |
Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::akp.effect() |
Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
within-subjects
Hypothesis testing
Type | No. of groups | Test | Function used |
Parametric | 2 | Student's t-test | stats::t.test() |
Non-parametric | 2 | Wilcoxon signed-rank test | stats::wilcox.test() |
Robust | 2 | Yuen's test on trimmed means for dependent samples | WRS2::yuend() |
Bayesian | 2 | Student's t-test | BayesFactor::ttestBF() |
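For the within-subjects rows, the same base-R functions are used with paired = TRUE. The sketch below uses the built-in sleep data, where the same ten subjects appear in both groups in the same order; this dataset and the names g1, g2, paired_t, and signed_rank are illustrative choices, not part of the package.

```r
# Two measurements per subject; rows are ordered by subject within each group
g1 <- sleep$extra[sleep$group == "1"]
g2 <- sleep$extra[sleep$group == "2"]

# Paired Student's t-test
paired_t <- stats::t.test(g1, g2, paired = TRUE)

# Wilcoxon signed-rank test (exact = FALSE avoids the ties warning)
signed_rank <- stats::wilcox.test(g1, g2, paired = TRUE, exact = FALSE)

paired_t$parameter # number of pairs - 1
```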
Effect size estimation
Type | No. of groups | Effect size | CI available? | Function used |
Parametric | 2 | Cohen's d, Hedges' g | Yes | effectsize::cohens_d(), effectsize::hedges_g() |
Non-parametric | 2 | r (rank-biserial correlation) | Yes | effectsize::rank_biserial() |
Robust | 2 | Algina-Keselman-Penfield robust standardized difference | Yes | WRS2::wmcpAKP() |
Bayesian | 2 | difference | Yes | bayestestR::describe_posterior() |
Citation
Patil, I., (2021). statsExpressions: R Package for Tidy Dataframes and Expressions with Statistical Details. Journal of Open Source Software, 6(61), 3236, https://doi.org/10.21105/joss.03236
Examples
# ----------------------- within-subjects -------------------------------------
# data
library(statsExpressions)
df <- dplyr::filter(bugs_long, condition %in% c("LDLF", "LDHF"))
# for reproducibility
set.seed(123)
# ----------------------- parametric ---------------------------------------
two_sample_test(df, condition, desire, subject.id = subject, paired = TRUE, type = "parametric")
# ----------------------- non-parametric -----------------------------------
two_sample_test(df, condition, desire, subject.id = subject, paired = TRUE, type = "nonparametric")
# ----------------------- robust --------------------------------------------
two_sample_test(df, condition, desire, subject.id = subject, paired = TRUE, type = "robust")
# ----------------------- Bayesian ---------------------------------------
two_sample_test(df, condition, desire, subject.id = subject, paired = TRUE, type = "bayes")
# ----------------------- between-subjects -------------------------------------
# for reproducibility
set.seed(123)
# ----------------------- parametric ---------------------------------------
# unequal variance
two_sample_test(ToothGrowth, supp, len, type = "parametric")
# equal variance
two_sample_test(ToothGrowth, supp, len, type = "parametric", var.equal = TRUE)
# ----------------------- non-parametric -----------------------------------
two_sample_test(ToothGrowth, supp, len, type = "nonparametric")
# ----------------------- robust --------------------------------------------
two_sample_test(ToothGrowth, supp, len, type = "robust")
# ----------------------- Bayesian ---------------------------------------
two_sample_test(ToothGrowth, supp, len, type = "bayes")