Type: Package
Title: Save Output of Statistical Tests
Version: 0.6.3
Maintainer: Willem Sleegers <w.sleegers@me.com>
Description: Save the output of statistical tests in an organized file that can be shared with others or used to report statistics in scientific papers.
URL: https://willemsleegers.github.io/tidystats/
BugReports: https://github.com/WillemSleegers/tidystats/issues
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
LazyData: true
Depends: R (≥ 4.1.0)
Imports: dplyr, tidyr, purrr, stringr, readr, jsonlite, tibble, checkmate
Suggests: BayesFactor, knitr, lme4, lmerTest, rmarkdown, effectsize, effsize, Hmisc, afex, emmeans, irr, testthat, MEMSS, lavaan, methods, nlme, rlang, tidyselect
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-04-06 10:18:42 UTC; willem
Author: Willem Sleegers [aut, cre]
Repository: CRAN
Date/Publication: 2025-04-06 10:30:02 UTC
Add statistical output to a tidystats list
Description
add_stats() is used to add the output of a statistical test to a tidystats list.
Usage
add_stats(
list,
output,
identifier = NULL,
type = NULL,
preregistered = NULL,
notes = NULL,
args = NULL,
class = NULL
)
Arguments
list: A tidystats list.
output: Output of a statistical test.
identifier: A string identifying the model. Automatically created if not provided.
type: A string specifying the type of analysis: primary, secondary, or exploratory.
preregistered: A boolean specifying whether the analysis was preregistered or not.
notes: A string specifying additional information.
args: A list of additional arguments to customize which statistics should be extracted. See 'Details' for a list of supported functions and their arguments.
class: A string to manually specify the class of the object so that tidystats knows how to extract the statistics. See 'Details' for a list of classes that are supported.
Details
Many functions to perform statistical tests (e.g., t.test(), lm()) return an object containing the statistics. These objects can be stored in variables and used with add_stats() to extract the statistics and add them to a list.
The list can be saved to a file using the write_stats() function.
For a list of supported functions, see vignette("supported-functions", package = "tidystats").
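To see what add_stats() has to work with, the object returned by a base R test can be inspected directly. The sketch below reruns the paired t-test from the Examples and pulls out the raw components of the resulting htest object. It uses only base R and does not show how tidystats itself names or organizes these values; the list name raw is purely illustrative.

```r
# Paired t-test on the built-in sleep data, as in the Examples
sleep_wide <- reshape(
  sleep,
  direction = "wide",
  idvar = "ID",
  timevar = "group",
  sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)

# The result is a list of class "htest"; its elements hold the statistics
raw <- list(
  method = sleep_test$method,
  t = unname(sleep_test$statistic),
  df = unname(sleep_test$parameter),
  p = sleep_test$p.value,
  ci_level = attr(sleep_test$conf.int, "conf.level"),
  ci = as.numeric(sleep_test$conf.int)
)
str(raw)
```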
Examples
# Conduct analyses
sleep_wide <- reshape(
sleep,
direction = "wide",
idvar = "ID",
timevar = "group",
sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_D9 <- lm(weight ~ group)
lm_D9_confint <- confint(lm_D9)
npk_aov <- aov(yield ~ block + N * P * K, npk)
# Create an empty list to store the statistics in
statistics <- list()
# Add statistics to the list
statistics <- statistics |>
add_stats(sleep_test, type = "primary", preregistered = TRUE) |>
add_stats(lm_D9) |>
add_stats(lm_D9_confint, class = "confint") |>
add_stats(npk_aov, notes = "An ANOVA example")
Count the number of observations
Description
count_data() returns the number and proportion of observations for categorical variables.
Usage
count_data(data, ..., na.rm = FALSE, pct = FALSE)
Arguments
data: A data frame.
...: One or more unquoted (categorical) column names from the data frame, separated by commas.
na.rm: A boolean specifying whether missing values (including NaN) should be removed.
pct: A boolean indicating whether to calculate percentages instead of proportions. The default is FALSE.
Details
The data frame can be grouped using dplyr::group_by() so that the number of observations will be calculated within each group level.
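The grouped counts described above can be approximated in base R with table() and prop.table(). This sketch uses a small hypothetical data frame in place of quote_source and computes the proportion of each sex within each source. Note that table() drops missing values unless useNA = "ifany" is given, and that the output layout differs from that of count_data().

```r
# Hypothetical toy data standing in for quote_source
df <- data.frame(
  source = c("Washington", "Washington", "Washington", "Bin Laden", "Bin Laden"),
  sex = c("female", "male", "female", "male", NA)
)

# Counts of sex within each source, keeping missing values as their own level
counts <- table(df$source, df$sex, useNA = "ifany")

# Proportions within each source (each row sums to 1); x 100 gives percentages
props <- prop.table(counts, margin = 1)
pcts <- 100 * props
counts
round(pcts, 1)
```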
Examples
count_data(quote_source, source)
count_data(quote_source, source, sex)
count_data(quote_source, source, sex, na.rm = TRUE)
count_data(quote_source, source, sex, na.rm = TRUE, pct = TRUE)
# Use dplyr::group_by() to calculate proportions within a group
quote_source |>
dplyr::group_by(source) |>
count_data(sex)
Create a custom statistic
Description
custom_stat() is used together with the custom_stats() function to add statistics from unsupported functions via add_stats(). See the custom_stats() function for more information.
Usage
custom_stat(
name,
value,
symbol = NULL,
subscript = NULL,
interval = NULL,
level = NULL,
lower = NULL,
upper = NULL
)
Arguments
name: A string specifying the name of the statistic.
value: The numeric value of the statistic.
symbol: A string specifying the symbol to use when reporting the statistic.
subscript: A string specifying a subscript to use when reporting the statistic.
interval: A string specifying the type of interval if the statistic is a ranged statistic (e.g., a 95% confidence interval).
level: A numeric value between 0 and 1 indicating the level of the interval.
lower: The numeric value of the lower bound of the statistic.
upper: The numeric value of the upper bound of the statistic.
Examples
# Example 1: A single mean value
x <- rnorm(1000, mean = 0, sd = 1)
M <- mean(x)
custom_stat(name = "mean", value = M, symbol = "M")
# Example 2: A mean with a 95% confidence interval
x <- rnorm(1000, mean = 0, sd = 1)
M <- mean(x)
se <- sd(x) / sqrt(length(x))
# Normal-approximation 95% CI: M +/- 1.96 * SE
CI <- c(M - 1.96 * se, M + 1.96 * se)
custom_stat(
name = "mean",
value = M,
symbol = "M",
interval = "CI",
level = .95,
lower = CI[1],
upper = CI[2]
)
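The 1.96 in Example 2 is the normal-distribution critical value for a 95% interval. A minimal sketch that derives the multiplier from the level instead, so the bounds passed to lower and upper stay consistent with the level argument if it changes. This is still a normal approximation; a t-based interval would use qt() instead.

```r
set.seed(1)  # reproducible draw
x <- rnorm(1000, mean = 0, sd = 1)

level <- 0.95
M <- mean(x)
se <- sd(x) / sqrt(length(x))

# Two-sided critical value: qnorm(0.975) is about 1.959964 for level = 0.95
z <- qnorm(1 - (1 - level) / 2)
CI <- M + c(-1, 1) * z * se
CI
```

The results can then be passed on as custom_stat(name = "mean", value = M, symbol = "M", interval = "CI", level = level, lower = CI[1], upper = CI[2]).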
Create a collection of custom statistics
Description
custom_stats() is used to create a collection of statistics from unsupported functions to add to a list via add_stats().
Usage
custom_stats(method, statistics)
Arguments
method: A string specifying the method used to obtain the statistics.
statistics: A vector of statistics created with custom_stat().
Details
custom_stats() supports adding a single statistic or a group of statistics. Multiple groups of statistics are not (yet) supported.
Examples
# Example: BIC Bayes factor (approx.)
# Run the analysis
lm1 <- lm(Fertility ~ ., data = swiss)
lm2 <- update(lm1, . ~ . - Examination)
# BIC approximation: BF10 (evidence for lm1 over lm2) = exp((BIC(lm2) - BIC(lm1)) / 2)
BF10 <- exp((BIC(lm2) - BIC(lm1)) / 2)
# Create the custom statistics
BIC_BFs <- custom_stats(
method = "BIC Bayes factor",
statistics = c(
custom_stat(name = "BF", value = BF10, subscript = "10"),
custom_stat(name = "BF", value = 1 / BF10, subscript = "01")
)
)
# Create an empty list
statistics <- list()
# Add the custom statistics to the list
statistics <- add_stats(statistics, BIC_BFs)
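The Bayes factor in this example relies on the BIC approximation to the marginal likelihood, p(data | model) ≈ exp(-BIC / 2), so the Bayes factor for model A over model B is exp((BIC_B - BIC_A) / 2). A base-R sketch that computes it in both directions, which makes it easy to check which subscript (10 or 01) belongs to which model. The helper bf_bic() is hypothetical and not part of tidystats.

```r
# Full model (all predictors) and reduced model without Examination
lm1 <- lm(Fertility ~ ., data = swiss)
lm2 <- update(lm1, . ~ . - Examination)

# Hypothetical helper: BIC-approximated Bayes factor for model_a over model_b
bf_bic <- function(model_a, model_b) {
  exp((BIC(model_b) - BIC(model_a)) / 2)
}

bf_full_vs_reduced <- bf_bic(lm1, lm2)  # evidence for including Examination
bf_reduced_vs_full <- bf_bic(lm2, lm1)  # evidence for dropping it
c(full_vs_reduced = bf_full_vs_reduced, reduced_vs_full = bf_reduced_vs_full)
```

The two values are reciprocals, so only one needs to be computed; storing both, as custom_stats() does above, simply saves the reader the inversion.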
Calculate common descriptive statistics
Description
describe_data() returns a set of common descriptive statistics (e.g., number of observations, mean, standard deviation) for one or more numeric variables.
Usage
describe_data(data, ..., na.rm = TRUE, short = FALSE)
Arguments
data: A data frame.
...: One or more unquoted column names from the data frame.
na.rm: A boolean indicating whether missing values (including NaN) should be excluded in calculating the descriptives. The default is TRUE.
short: A boolean indicating whether only a subset of descriptives should be reported. The default is FALSE.
Details
The data can be grouped using dplyr::group_by() so that descriptives will be calculated for each group level.
Skew and kurtosis are based on the datawizard::skewness() and datawizard::kurtosis() functions (Komsta & Novomestky, 2015).
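For intuition about the grouped output, a base-R sketch that computes a few of the same kinds of descriptives (missing count, N, mean, SD) per group with split() and lapply(). The data are a hypothetical toy example, describe_sketch() is a hypothetical helper, N here counts non-missing values, and the exact definitions and column names used by describe_data() may differ.

```r
# Hypothetical toy data: one numeric variable, two groups, one missing value
df <- data.frame(
  source = c("A", "A", "A", "B", "B"),
  response = c(1, 5, NA, 7, 9)
)

# Basic descriptives for one numeric vector
describe_sketch <- function(x, na.rm = TRUE) {
  n_missing <- sum(is.na(x))
  if (na.rm) x <- x[!is.na(x)]
  c(missing = n_missing, N = length(x), M = mean(x), SD = sd(x))
}

# Apply per group level and bind the results (one row per group)
res <- do.call(rbind, lapply(split(df$response, df$source), describe_sketch))
res
```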
Examples
describe_data(quote_source, response)
describe_data(quote_source, response, na.rm = FALSE)
quote_source |>
dplyr::group_by(source) |>
describe_data(response)
quote_source |>
dplyr::group_by(source) |>
describe_data(response, short = TRUE)
A Many Labs replication of Lorge & Curtiss (1936)
Description
Data of multiple studies from the Many Labs project (Klein et al., 2014) replicating Lorge & Curtiss (1936).
Usage
quote_source
Format
A data frame with 6343 rows and 15 columns:
- ID: participant number
- source: attributed source of the quote (Washington or Bin Laden)
- response: evaluation of the quote on a 9-point Likert scale, with 1 indicating disagreement and 9 indicating agreement
- age: participant's age
- sex: participant's sex
- citizenship: participant's citizenship
- race: participant's race
- major: participant's major
- native_language: participant's native language
- referrer: location where the study was conducted
- compensation: how the participant was compensated for their participation
- recruitment: how the participant was recruited
- separation: description of how the study was administered in terms of participant isolation
- us_or_international: whether the study was conducted in the US or outside of the US (international)
- lab_or_online: whether the study was conducted in the lab or online
Details
Lorge and Curtiss (1936) examined how a quotation is perceived when it is attributed to a liked or disliked individual. The quotation of interest was: "I hold it that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms are in the physical world." In one condition the quotation was attributed to Thomas Jefferson, a liked individual, and in the other condition it was attributed to Vladimir Lenin, a disliked individual. More agreement was observed when the quotation was attributed to Jefferson than when it was attributed to Lenin. In the replication studies, the quotation was: "I have sworn to only live free, even if I find bitter the taste of death." This quotation was attributed to either George Washington, the liked individual, or Osama Bin Laden, the disliked individual.
References
Lorge, I., & Curtiss, C. C. (1936). Prestige, suggestion, and attitudes. The Journal of Social Psychology, 7, 386-402. doi:10.1080/00224545.1936.9919891
Klein, R. A., et al. (2014). Investigating variation in replicability: A "Many Labs" replication project. Social Psychology, 45(3), 142-152. doi:10.1027/1864-9335/a000178
Read a .json file that was produced with write_stats()
Description
read_stats() can read a .json file containing statistics that was produced using tidystats. It returns a list containing the statistics, with the identifier as the name for each list element.
Usage
read_stats(file)
Arguments
file: A string specifying the path to the tidystats data file.
Examples
# A simple example, assuming there is a file called 'statistics.json'
## Not run:
statistics <- read_stats("statistics.json")
## End(Not run)
# A working example
statistics <- read_stats(
file = system.file("statistics.json", package = "tidystats")
)
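Because a tidystats file is JSON, a named list survives a round trip with its names intact, which is what allows read_stats() to return elements named by identifier. A sketch using jsonlite (already among the package's Imports) directly; the nested structure below is a simplified, hypothetical layout, not the schema that write_stats() actually produces.

```r
library(jsonlite)

# Hypothetical, simplified list keyed by identifier (not tidystats' real schema)
sleep_test <- t.test(extra ~ group, data = sleep)
statistics <- list(
  sleep_test = list(
    method = sleep_test$method,
    statistics = list(
      list(name = "t", value = unname(sleep_test$statistic)),
      list(name = "p", value = sleep_test$p.value)
    )
  )
)

# Write to a temporary file and read it back as a plain nested list
path <- file.path(tempdir(), "statistics_sketch.json")
write_json(statistics, path, auto_unbox = TRUE, digits = 6, pretty = TRUE)
back <- read_json(path, simplifyVector = FALSE)

names(back)                            # identifiers become list names
back$sleep_test$statistics[[2]]$value  # p-value, written with at most 6 decimals
```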
Helper functions in tidystats
Description
Functions used under the hood in the tidystats package.
Usage
tidy_matrix(m, symmetric = TRUE)
add_statistic(
list,
name,
value,
symbol = NULL,
subscript = NULL,
interval = NULL,
level = NULL,
lower = NULL,
upper = NULL
)
symbol(
x = c("alpha", "chi_squared", "delta", "guttmans_lambda", "K_squared", "lambda",
"p_hat", "R_squared", "sigma", "t_squared", "tau")
)
expect_equal_models(model, expected_tidy_model, tolerance = 0.001)
write_test_stats(x, path, digits = 6)
Arguments
m: A matrix.
Functions
- tidy_matrix(): Function to convert matrix objects to a tidy data frame.
- add_statistic(): Function to add a statistic to a list. It helps create the list and ignores NULL values.
- symbol(): Function to return symbols in ASCII.
- expect_equal_models(): Function to compare tidied models during testing.
- write_test_stats(): Function to save tidied statistics to a file. Since these files are used during testing, it's important to only store files with correctly tidied statistics, which is why the function prompts for confirmation.
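The ideas behind tidy_matrix() and add_statistic() can be sketched in base R. Both functions below are hypothetical stand-ins, not the package's internal code: tidy_matrix_sketch() keeps only the lower triangle of a symmetric matrix (one row per unique pair), and add_statistic_sketch() appends a statistic while dropping any fields left as NULL. The numbers passed to add_statistic_sketch() are arbitrary.

```r
# Hypothetical stand-in for tidy_matrix(): matrix -> long data frame
tidy_matrix_sketch <- function(m, symmetric = TRUE) {
  # In a symmetric matrix each off-diagonal pair appears twice; keep one copy
  keep <- if (symmetric) lower.tri(m) else matrix(TRUE, nrow(m), ncol(m))
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    name1 = rownames(m)[idx[, "row"]],
    name2 = colnames(m)[idx[, "col"]],
    value = m[idx]
  )
}

# Hypothetical stand-in for add_statistic(): drop NULL fields, then append
add_statistic_sketch <- function(x, name, value, symbol = NULL,
                                 interval = NULL, level = NULL,
                                 lower = NULL, upper = NULL) {
  stat <- list(name = name, value = value, symbol = symbol,
               interval = interval, level = level,
               lower = lower, upper = upper)
  stat <- Filter(Negate(is.null), stat)
  x[[length(x) + 1]] <- stat
  x
}

cor_m <- cor(mtcars[, c("mpg", "wt", "hp")])
long <- tidy_matrix_sketch(cor_m)
long

# Arbitrary illustrative values
stat_list <- list()
stat_list <- add_statistic_sketch(stat_list, "SD", 1.2, symbol = "SD")
stat_list <- add_statistic_sketch(
  stat_list, "mean", 0.1, symbol = "M",
  interval = "CI", level = 0.95, lower = -0.2, upper = 0.4
)
str(stat_list)
```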
Tidy the output of a statistics object
Description
tidy_stats() is used to convert the output of a statistical object to a list of organized statistics. The tidy_stats() function is automatically run when add_stats() is used, so there is generally no need to use this function explicitly. It can be used, however, to peek at how the output of a specific analysis will be organized.
Usage
## S3 method for class 'BFBayesFactor'
tidy_stats(x, args = NULL)
## S3 method for class 'rcorr'
tidy_stats(x, args = NULL)
tidy_stats(x, args = NULL)
## S3 method for class 'htest'
tidy_stats(x, args = NULL)
## S3 method for class 'pairwise.htest'
tidy_stats(x, args = NULL)
## S3 method for class 'lm'
tidy_stats(x, args = NULL)
## S3 method for class 'glm'
tidy_stats(x, args = NULL)
## S3 method for class 'anova'
tidy_stats(x, args = NULL)
## S3 method for class 'aov'
tidy_stats(x, args = NULL)
## S3 method for class 'aovlist'
tidy_stats(x, args = NULL)
## S3 method for class 'confint'
tidy_stats(x, args = NULL)
## S3 method for class 'afex_aov'
tidy_stats(x, args = NULL)
## S3 method for class 'mixed'
tidy_stats(x, args = NULL)
## S3 method for class 'brmsfit'
tidy_stats(x, args = NULL)
## S3 method for class 'effectsize_difference'
tidy_stats(x, args = NULL)
## S3 method for class 'effsize'
tidy_stats(x, args = NULL)
## S3 method for class 'emmGrid'
tidy_stats(x, args = NULL)
## S3 method for class 'summary_emm'
tidy_stats(x, args = NULL)
## S3 method for class 'emm_list'
tidy_stats(x, args = NULL)
## S3 method for class 'icclist'
tidy_stats(x, args = NULL)
## S3 method for class 'lavaan'
tidy_stats(x, args = NULL)
## S3 method for class 'lmerMod'
tidy_stats(x, args = NULL)
## S3 method for class 'lmerModLmerTest'
tidy_stats(x, args = NULL)
## S3 method for class 'lme'
tidy_stats(x, args = NULL)
## S3 method for class 'nlme'
tidy_stats(x, args = NULL)
## S3 method for class 'anova.lme'
tidy_stats(x, args = NULL)
## S3 method for class 'gls'
tidy_stats(x, args = NULL)
## S3 method for class 'psych'
tidy_stats(x, args = NULL)
## S3 method for class 'tidystats'
tidy_stats(x, args = NULL)
## S3 method for class 'tidystats_descriptives'
tidy_stats(x, args = NULL)
## S3 method for class 'tidystats_counts'
tidy_stats(x, args = NULL)
Arguments
x: The output of a statistical test.
Methods (by class)
- tidy_stats(BFBayesFactor): tidy_stats method for class 'BFBayesFactor'
- tidy_stats(rcorr): tidy_stats method for class 'rcorr'
- tidy_stats(htest): tidy_stats method for class 'htest'
- tidy_stats(pairwise.htest): tidy_stats method for class 'pairwise.htest'
- tidy_stats(lm): tidy_stats method for class 'lm'
- tidy_stats(glm): tidy_stats method for class 'glm'
- tidy_stats(anova): tidy_stats method for class 'anova'
- tidy_stats(aov): tidy_stats method for class 'aov'
- tidy_stats(aovlist): tidy_stats method for class 'aovlist'
- tidy_stats(confint): tidy_stats method for class 'confint'
- tidy_stats(afex_aov): tidy_stats method for class 'afex_aov'
- tidy_stats(mixed): tidy_stats method for class 'mixed'
- tidy_stats(brmsfit): tidy_stats method for class 'brmsfit'
- tidy_stats(effectsize_difference): tidy_stats method for class 'effectsize_difference'
- tidy_stats(effsize): tidy_stats method for class 'effsize'
- tidy_stats(emmGrid): tidy_stats method for class 'emmGrid'
- tidy_stats(summary_emm): tidy_stats method for class 'summary_emm'
- tidy_stats(emm_list): tidy_stats method for class 'emm_list'
- tidy_stats(icclist): tidy_stats method for class 'icclist'
- tidy_stats(lavaan): tidy_stats method for class 'lavaan'
- tidy_stats(lmerMod): tidy_stats method for class 'lmerMod'
- tidy_stats(lmerModLmerTest): tidy_stats method for class 'lmerModLmerTest'
- tidy_stats(lme): tidy_stats method for class 'lme'
- tidy_stats(nlme): tidy_stats method for class 'nlme'
- tidy_stats(anova.lme): tidy_stats method for class 'anova.lme'
- tidy_stats(gls): tidy_stats method for class 'gls'
- tidy_stats(psych): tidy_stats method for class 'psych'
- tidy_stats(tidystats): tidy_stats method for class 'tidystats'
- tidy_stats(tidystats_descriptives): tidy_stats method for class 'tidystats_descriptives'
- tidy_stats(tidystats_counts): tidy_stats method for class 'tidystats_counts'
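tidy_stats() is an S3 generic, so the method used depends on the class of the object. This is presumably also why add_stats() accepts a class argument: confint() on an lm fit returns a plain matrix with no 'confint' class to dispatch on. A minimal sketch of that dispatch pattern with a hypothetical generic, tidy_sketch(); its methods and output format are illustrative only.

```r
# Hypothetical generic following the same S3 pattern as tidy_stats()
tidy_sketch <- function(x, args = NULL) UseMethod("tidy_sketch")

# Method for "htest" objects such as those returned by t.test()
tidy_sketch.htest <- function(x, args = NULL) {
  list(
    method = x$method,
    statistic = unname(x$statistic),
    df = unname(x$parameter),
    p = x$p.value
  )
}

# Method for confidence-interval matrices tagged with class "confint"
tidy_sketch.confint <- function(x, args = NULL) {
  data.frame(term = rownames(x), lower = x[, 1], upper = x[, 2],
             row.names = NULL)
}

# Fallback when no method matches the object's class
tidy_sketch.default <- function(x, args = NULL) {
  stop("No tidy_sketch() method for class: ", paste(class(x), collapse = ", "))
}

res_t <- tidy_sketch(t.test(extra ~ group, data = sleep))

ci <- confint(lm(mpg ~ wt, data = mtcars))       # a plain matrix
untagged <- try(tidy_sketch(ci), silent = TRUE)  # falls through to the default
ci_tagged <- structure(ci, class = "confint")    # analogous to class = "confint"
res_ci <- tidy_sketch(ci_tagged)
res_ci
```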
Convert a tidystats list to a data frame
Description
tidy_stats_to_data_frame() converts a tidystats list to a data frame, which can then be used to extract specific statistics using standard subsetting functions (e.g., dplyr::filter()).
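The conversion can be pictured as flattening a nested list into one row per statistic, tagged with the identifier it came from. A base-R sketch with a hypothetical two-level list (identifier, then named statistics). to_df_sketch() and the list layout are assumptions for illustration, and the statistic_name column only mirrors the name used in the dplyr::filter() call in the Examples.

```r
# Two quick analyses with base R
sleep_test <- t.test(extra ~ group, data = sleep)
fit <- lm(mpg ~ wt, data = mtcars)
fs <- summary(fit)$fstatistic

# Hypothetical nested list: identifier -> named statistics
statistics <- list(
  sleep_test = list(
    t = unname(sleep_test$statistic),
    df = unname(sleep_test$parameter),
    p = sleep_test$p.value
  ),
  mtcars_lm = list(
    F = unname(fs["value"]),
    p = unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))
  )
)

# Flatten to one row per statistic
to_df_sketch <- function(x) {
  rows <- lapply(names(x), function(id) {
    s <- x[[id]]
    data.frame(
      identifier = id,
      statistic_name = names(s),
      value = unlist(s, use.names = FALSE)
    )
  })
  do.call(rbind, rows)
}

stats_df <- to_df_sketch(statistics)
stats_df[stats_df$statistic_name == "p", ]  # base-R analogue of filtering p-values
```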
Usage
tidy_stats_to_data_frame(x)
Arguments
x: A tidystats list.
Examples
# Conduct analyses
sleep_wide <- reshape(
sleep,
direction = "wide",
idvar = "ID",
timevar = "group",
sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_D9 <- lm(weight ~ group)
npk_aov <- aov(yield ~ block + N * P * K, npk)
# Create an empty list to store the statistics in
statistics <- list()
# Add statistics
statistics <- statistics |>
add_stats(sleep_test, type = "primary", preregistered = TRUE) |>
add_stats(lm_D9) |>
add_stats(npk_aov, notes = "An ANOVA example")
# Convert the list to a data frame
df <- tidy_stats_to_data_frame(statistics)
# Select all the p-values
dplyr::filter(df, statistic_name == "p")
Write a tidystats list to a file
Description
write_stats() writes a tidystats list to a .json file.
Usage
write_stats(x, path, digits = 6)
Arguments
x: A tidystats list.
path: A string specifying the path or connection to write to.
digits: The number of decimal places to use. The default is 6.
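Because digits counts decimal places, very small values keep few significant digits after rounding. A base-R comparison of rounding to 6 decimal places versus 6 significant digits; the value is an arbitrary illustrative number, and this sketch does not reproduce the internal rounding of write_stats().

```r
p <- 0.0000123456789  # arbitrary small value, e.g. a p-value

round(p, 6)   # 6 decimal places: only two significant digits survive (1.2e-05)
signif(p, 6)  # 6 significant digits (1.23457e-05)
round(p, 8)   # more decimal places retain more precision (1.235e-05)
```

If small p-values need to keep more precision in the saved file, a larger digits value is the lever this argument provides.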
Examples
# Conduct a statistical test
sleep_wide <- reshape(
sleep,
direction = "wide",
idvar = "ID",
timevar = "group",
sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)
# Create an empty list
statistics <- list()
# Add statistics to the list
statistics <- add_stats(statistics, sleep_test)
# Save the statistics to a file
dir <- tempdir()
write_stats(statistics, file.path(dir, "statistics.json"))