Help for package tidystats

Type: Package
Title: Save Output of Statistical Tests
Version: 0.6.3
Maintainer: Willem Sleegers <w.sleegers@me.com>
Description: Save the output of statistical tests in an organized file that can be shared with others or used to report statistics in scientific papers.
URL: https://willemsleegers.github.io/tidystats/
BugReports: https://github.com/WillemSleegers/tidystats/issues
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
LazyData: true
Depends: R (≥ 4.1.0)
Imports: dplyr, tidyr, purrr, stringr, readr, jsonlite, tibble, checkmate
Suggests: BayesFactor, knitr, lme4, lmerTest, rmarkdown, effectsize, effsize, Hmisc, afex, emmeans, irr, testthat, MEMSS, lavaan, methods, nlme, rlang, tidyselect
VignetteBuilder: knitr
NeedsCompilation:
Packaged: 2025-04-06 10:18:42 UTC; willem
Author: Willem Sleegers [aut, cre]
Repository: CRAN
Date/Publication: 2025-04-06 10:30:02 UTC

Add statistical output to a tidystats list

Description

add_stats() is used to add the output of a statistical test to a tidystats list.

Usage

add_stats(
  list,
  output,
  identifier = NULL,
  type = NULL,
  preregistered = NULL,
  notes = NULL,
  args = NULL,
  class = NULL
)

Arguments

list

A tidystats list.

output

Output of a statistical test.

identifier

A string identifying the model. Automatically created if not provided.

type

A string specifying the type of analysis: primary, secondary, or exploratory.

preregistered

A boolean specifying whether the analysis was preregistered or not.

notes

A string specifying additional information.

args

A list of additional arguments to customize which statistics should be extracted. See 'Details' for a list of supported functions and their arguments.

class

A string to manually specify the class of the object so that tidystats knows how to extract the statistics. See 'Details' for a list of classes that are supported.

Details

Many functions to perform statistical tests (e.g., t.test(), lm()) return an object containing the statistics. These objects can be stored in variables and used with add_stats() to extract the statistics and add them to a list.

The list can be saved to a file using the write_stats() function.

For a list of supported functions, see vignette("supported-functions", package = "tidystats").

Examples

# Conduct analyses
sleep_wide <- reshape(
  sleep,
  direction = "wide",
  idvar = "ID",
  timevar = "group",
  sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)

ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_D9 <- lm(weight ~ group)
lm_D9_confint <- confint(lm_D9)

npk_aov <- aov(yield ~ block + N * P * K, npk)

# Create an empty list to store the statistics in
statistics <- list()

# Add statistics to the list
statistics <- statistics |>
  add_stats(sleep_test, type = "primary", preregistered = TRUE) |>
  add_stats(lm_D9) |>
  add_stats(lm_D9_confint, class = "confint") |>
  add_stats(npk_aov, notes = "An ANOVA example")

Count the number of observations

Description

count_data() returns the number and proportion of observations for categorical variables.

Usage

count_data(data, ..., na.rm = FALSE, pct = FALSE)

Arguments

data

A data frame.

...

One or more unquoted (categorical) column names from the data frame, separated by commas.

na.rm

A boolean specifying whether missing values (including NaN) should be removed.

pct

A boolean indicating whether to calculate percentages instead of proportions. The default is FALSE.

Details

The data frame can be grouped using dplyr::group_by() so that the number of observations will be calculated within each group level.

Examples

count_data(quote_source, source)
count_data(quote_source, source, sex)
count_data(quote_source, source, sex, na.rm = TRUE)
count_data(quote_source, source, sex, na.rm = TRUE, pct = TRUE)

# Use dplyr::group_by() to calculate proportions within a group
quote_source |>
  dplyr::group_by(source) |>
  count_data(sex)

Create a custom statistic

Description

custom_stat() is used together with the custom_stats() function to add statistics from unsupported functions via add_stats(). See the custom_stats() function for more information.

Usage

custom_stat(
  name,
  value,
  symbol = NULL,
  subscript = NULL,
  interval = NULL,
  level = NULL,
  lower = NULL,
  upper = NULL
)

Arguments

name

A string specifying the name of the statistic.

value

The numeric value of the statistic.

symbol

A string specifying the symbol of the statistic to use when reporting the statistic.

subscript

A string specifying a subscript to use when reporting the statistic.

interval

A string specifying the type of interval if the statistic is a ranged statistic (e.g., "CI" for a 95% confidence interval).

level

A numeric value between 0 and 1 indicating the level of the interval.

lower

The numeric value of the lower bound of the statistic.

upper

The numeric value of the upper bound of the statistic.

Examples

# Example 1: A single mean value
sample <- rnorm(1000, mean = 0, sd = 1)
mean <- mean(sample)

custom_stat(name = "mean", value = mean, symbol = "M")

# Example 2: A mean with a 95% confidence interval
sample <- rnorm(1000, mean = 0, sd = 1)
mean <- mean(sample)
se <- sd(sample) / sqrt(length(sample))
CI <- c(mean - 1.96 * se, mean + 1.96 * se)

custom_stat(
  name = "mean",
  value = mean,
  symbol = "M",
  interval = "CI",
  level = .95,
  lower = CI[1],
  upper = CI[2]
)

Create a collection of custom statistics

Description

custom_stats() is used to create a collection of statistics from unsupported functions to add to a list via add_stats().

Usage

custom_stats(method, statistics)

Arguments

method

A string specifying the method used to obtain the statistics.

statistics

A vector of statistics created with custom_stat().

Details

custom_stats() supports adding a single statistic or a group of statistics. Multiple groups of statistics are not (yet) supported.

Examples

# Example: BIC Bayes factor (approx.)
# Run the analysis
lm1 <- lm(Fertility ~ ., data = swiss)
lm2 <- update(lm1, . ~ . - Examination)

BF10 <- 1 / exp((BIC(lm2) - BIC(lm1)) / 2)

# Create the custom statistics
BIC_BFs <- custom_stats(
  method = "BIC Bayes factor",
  statistics = c(
    custom_stat(name = "BF", value = BF10, subscript = "10"),
    custom_stat(name = "BF", value = 1 / BF10, subscript = "01")
  )
)

# Create an empty list
statistics <- list()

# Add the custom statistics to the list
statistics <- add_stats(statistics, BIC_BFs)

Calculate common descriptive statistics

Description

describe_data() returns a set of common descriptive statistics (e.g., number of observations, mean, standard deviation) for one or more numeric variables.

Usage

describe_data(data, ..., na.rm = TRUE, short = FALSE)

Arguments

data

A data frame.

...

One or more unquoted column names from the data frame.

na.rm

A boolean indicating whether missing values (including NaN) should be excluded when calculating the descriptives. The default is TRUE.

short

A boolean indicating whether only a subset of descriptives should be reported. If set to TRUE, only the N, M, and SD will be returned. The default is FALSE.

Details

The data can be grouped using dplyr::group_by() so that descriptives will be calculated for each group level.

Skew and kurtosis are based on the datawizard::skewness() and datawizard::kurtosis() functions (Komsta & Novomestky, 2015).

Examples

describe_data(quote_source, response)

describe_data(quote_source, response, na.rm = FALSE)

quote_source |>
  dplyr::group_by(source) |>
  describe_data(response)

quote_source |>
  dplyr::group_by(source) |>
  describe_data(response, short = TRUE)

A Many Labs replication of Lorge & Curtiss (1936)

Description

Data of multiple studies from the Many Labs project (Klein et al., 2014) replicating Lorge & Curtiss (1936).

Usage

quote_source

Format

A data frame with 6343 rows and 15 columns:

ID: participant number
source: attributed source of the quote: Washington or Bin Laden
response: evaluation of the quote on a 9-point Likert scale, with 1 indicating disagreement and 9 indicating agreement
age: participant's age
sex: participant's sex
citizenship: participant's citizenship
race: participant's race
major: participant's major
native_language: participant's native language
referrer: location of where the study was conducted
compensation: how the participant was compensated for their participation
recruitment: how the participant was recruited
separation: description of how the study was administered in terms of participant isolation
us_or_international: whether the study was conducted in the US or outside of the US (international)
lab_or_online: whether the study was conducted in the lab or online

Details

Lorge and Curtiss (1936) examined how a quotation is perceived when it is attributed to a liked or disliked individual. The quotation of interest was: "I hold it that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms are in the physical world." In one condition the quotation was attributed to Thomas Jefferson, a liked individual, and in the other condition it was attributed to Vladimir Lenin, a disliked individual. More agreement was observed when the quotation was attributed to Jefferson than Lenin. In the replication studies, the quotation was: "I have sworn to only live free, even if I find bitter the taste of death." This quotation was attributed to either George Washington, the liked individual, or Osama Bin Laden, the disliked individual.

References

Lorge, I., & Curtiss, C. C. (1936). Prestige, suggestion, and attitudes. The Journal of Social Psychology, 7, 386-402. doi:10.1080/00224545.1936.9919891

Klein, R. A., et al. (2014). Investigating variation in replicability: A "Many Labs" replication project. Social Psychology, 45(3), 142-152. doi:10.1027/1864-9335/a000178

Read a .json file that was produced with write_stats()

Description

read_stats() can read a .json file containing statistics that was produced using tidystats. It returns a list containing the statistics, with the identifier as the name for each list element.

Usage

read_stats(file)

Arguments

file

A string specifying the path to the tidystats data file.

Examples

# A simple example, assuming there is a file called 'statistics.json'
## Not run: 
statistics <- read_stats("statistics.json")

## End(Not run)

# A working example
statistics <- read_stats(
  file = system.file("statistics.json", package = "tidystats")
)
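
The description above states that the returned list uses the identifier of each analysis as the element name. A minimal sketch of inspecting that structure, assuming tidystats is installed; the variable name identifiers is illustrative and not part of the package:

```r
library(tidystats)

# Read the example file that ships with the package
statistics <- read_stats(
  file = system.file("statistics.json", package = "tidystats")
)

# Each element is named after the identifier of an analysis
identifiers <- names(statistics)
identifiers

# Inspect the top-level structure of the first analysis via its identifier
str(statistics[[identifiers[1]]], max.level = 1)
```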

Helper functions in tidystats

Description

Functions used under the hood in the tidystats package.

Usage

tidy_matrix(m, symmetric = TRUE)

add_statistic(
  list,
  name,
  value,
  symbol = NULL,
  subscript = NULL,
  interval = NULL,
  level = NULL,
  lower = NULL,
  upper = NULL
)

symbol(
  x = c("alpha", "chi_squared", "delta", "guttmans_lambda", "K_squared", "lambda",
    "p_hat", "R_squared", "sigma", "t_squared", "tau")
)

expect_equal_models(model, expected_tidy_model, tolerance = 0.001)

write_test_stats(x, path, digits = 6)

Arguments

m

A matrix.

Functions

tidy_matrix(): Function to convert matrix objects to a tidy data frame.
add_statistic(): Function to add a statistic to a list. It helps create the list and ignores NULL values.
symbol(): Function to return symbols in ASCII.
expect_equal_models(): Function to compare tidied models during testing.
write_test_stats(): Function to save tidied statistics to a file. Since these files are used during testing, it's important to only store files with correctly tidied statistics, hence the prompt.

Tidy the output of a statistics object

Description

tidy_stats() is used to convert the output of a statistical object to a list of organized statistics. It is run automatically when add_stats() is used, so there is generally no need to call it explicitly. It can be used, however, to peek at how the output of a specific analysis will be organized.

Usage

## S3 method for class 'BFBayesFactor'
tidy_stats(x, args = NULL)

## S3 method for class 'rcorr'
tidy_stats(x, args = NULL)

tidy_stats(x, args = NULL)

## S3 method for class 'htest'
tidy_stats(x, args = NULL)

## S3 method for class 'pairwise.htest'
tidy_stats(x, args = NULL)

## S3 method for class 'lm'
tidy_stats(x, args = NULL)

## S3 method for class 'glm'
tidy_stats(x, args = NULL)

## S3 method for class 'anova'
tidy_stats(x, args = NULL)

## S3 method for class 'aov'
tidy_stats(x, args = NULL)

## S3 method for class 'aovlist'
tidy_stats(x, args = NULL)

## S3 method for class 'confint'
tidy_stats(x, args = NULL)

## S3 method for class 'afex_aov'
tidy_stats(x, args = NULL)

## S3 method for class 'mixed'
tidy_stats(x, args = NULL)

## S3 method for class 'brmsfit'
tidy_stats(x, args = NULL)

## S3 method for class 'effectsize_difference'
tidy_stats(x, args = NULL)

## S3 method for class 'effsize'
tidy_stats(x, args = NULL)

## S3 method for class 'emmGrid'
tidy_stats(x, args = NULL)

## S3 method for class 'summary_emm'
tidy_stats(x, args = NULL)

## S3 method for class 'emm_list'
tidy_stats(x, args = NULL)

## S3 method for class 'icclist'
tidy_stats(x, args = NULL)

## S3 method for class 'lavaan'
tidy_stats(x, args = NULL)

## S3 method for class 'lmerMod'
tidy_stats(x, args = NULL)

## S3 method for class 'lmerModLmerTest'
tidy_stats(x, args = NULL)

## S3 method for class 'lme'
tidy_stats(x, args = NULL)

## S3 method for class 'nlme'
tidy_stats(x, args = NULL)

## S3 method for class 'anova.lme'
tidy_stats(x, args = NULL)

## S3 method for class 'gls'
tidy_stats(x, args = NULL)

## S3 method for class 'psych'
tidy_stats(x, args = NULL)

## S3 method for class 'tidystats'
tidy_stats(x, args = NULL)

## S3 method for class 'tidystats_descriptives'
tidy_stats(x, args = NULL)

## S3 method for class 'tidystats_counts'
tidy_stats(x, args = NULL)

Arguments

x

The output of a statistical test.

Methods (by class)

tidy_stats(BFBayesFactor): tidy_stats method for class 'BFBayesFactor'
tidy_stats(rcorr): tidy_stats method for class 'rcorr'
tidy_stats(htest): tidy_stats method for class 'htest'
tidy_stats(pairwise.htest): tidy_stats method for class 'pairwise.htest'
tidy_stats(lm): tidy_stats method for class 'lm'
tidy_stats(glm): tidy_stats method for class 'glm'
tidy_stats(anova): tidy_stats method for class 'anova'
tidy_stats(aov): tidy_stats method for class 'aov'
tidy_stats(aovlist): tidy_stats method for class 'aovlist'
tidy_stats(confint): tidy_stats method for class 'confint'
tidy_stats(afex_aov): tidy_stats method for class 'afex_aov'
tidy_stats(mixed): tidy_stats method for class 'mixed'
tidy_stats(brmsfit): tidy_stats method for class 'brmsfit'
tidy_stats(effectsize_difference): tidy_stats method for class 'effectsize_difference'
tidy_stats(effsize): tidy_stats method for class 'effsize'
tidy_stats(emmGrid): tidy_stats method for class 'emmGrid'
tidy_stats(summary_emm): tidy_stats method for class 'summary_emm'
tidy_stats(emm_list): tidy_stats method for class 'emm_list'
tidy_stats(icclist): tidy_stats method for class 'icclist'
tidy_stats(lavaan): tidy_stats method for class 'lavaan'
tidy_stats(lmerMod): tidy_stats method for class 'lmerMod'
tidy_stats(lmerModLmerTest): tidy_stats method for class 'lmerModLmerTest'
tidy_stats(lme): tidy_stats method for class 'lme'
tidy_stats(nlme): tidy_stats method for class 'nlme'
tidy_stats(anova.lme): tidy_stats method for class 'anova.lme'
tidy_stats(gls): tidy_stats method for class 'gls'
tidy_stats(psych): tidy_stats method for class 'psych'
tidy_stats(tidystats): tidy_stats method for class 'tidystats'
tidy_stats(tidystats_descriptives): tidy_stats method for class 'tidystats_descriptives'
tidy_stats(tidystats_counts): tidy_stats method for class 'tidystats_counts'
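
Examples

The description notes that tidy_stats() can be used to peek at how an analysis will be organized. A minimal sketch of doing so for an 'htest' object, assuming tidystats is installed; the variable names sleep_test and tidied are illustrative, and the exact elements of the result may differ between package versions:

```r
library(tidystats)

# Run a Welch two-sample t-test on the built-in sleep data
sleep_test <- t.test(extra ~ group, data = sleep)

# Tidy the output directly, without adding it to a tidystats list
tidied <- tidy_stats(sleep_test)

# Peek at the top-level organization of the tidied statistics
str(tidied, max.level = 1)
```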

Convert a tidystats list to a data frame

Description

tidy_stats_to_data_frame() converts a tidystats list to a data frame, which can then be used to extract specific statistics using standard subsetting functions (e.g., dplyr::filter()).

Usage

tidy_stats_to_data_frame(x)

Arguments

x

A tidystats list.

Examples

# Conduct analyses
sleep_wide <- reshape(
  sleep,
  direction = "wide",
  idvar = "ID",
  timevar = "group",
  sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)

ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_D9 <- lm(weight ~ group)

npk_aov <- aov(yield ~ block + N * P * K, npk)

# Create an empty list to store the statistics in
statistics <- list()

# Add statistics
statistics <- statistics |>
  add_stats(sleep_test, type = "primary", preregistered = TRUE) |>
  add_stats(lm_D9) |>
  add_stats(npk_aov, notes = "An ANOVA example")

# Convert the list to a data frame
df <- tidy_stats_to_data_frame(statistics)

# Select all the p-values
dplyr::filter(df, statistic_name == "p")

Write a tidystats list to a file

Description

write_stats() writes a tidystats list to a .json file.

Usage

write_stats(x, path, digits = 6)

Arguments

x

A tidystats list.

path

A string specifying the path or connection to write to.

digits

The number of decimal places to use. The default is 6.

Examples

# Conduct a statistical test
sleep_wide <- reshape(
  sleep,
  direction = "wide",
  idvar = "ID",
  timevar = "group",
  sep = "_"
)
sleep_test <- t.test(sleep_wide$extra_1, sleep_wide$extra_2, paired = TRUE)

# Create an empty list
statistics <- list()

# Add statistics to the list
statistics <- add_stats(statistics, sleep_test)

# Save the statistics to a file
dir <- tempdir()
write_stats(statistics, file.path(dir, "statistics.json"))
