Type: Package
Title: Judd, McClelland, & Ryan Formatting for ANOVA Output
Version: 3.0.0
Date: 2024-02-06
Description: Produces ANOVA tables in the format used by Judd, McClelland, and Ryan (2017, ISBN: 978-1138819832) in their introductory textbook, Data Analysis. This includes proportional reduction in error and formatting to ease the transition between the book and R.
License: GPL (≥ 3)
URL: https://github.com/UCLATALL/supernova
BugReports: https://github.com/UCLATALL/supernova/issues
Depends: R (≥ 3.4.0)
Imports: cli, methods, pillar (≥ 1.5.0), purrr, rlang, stringr, tibble, vctrs
Suggests: car, covr, dplyr (≥ 1.0.0), ggplot2, lintr, lme4, magrittr, readr, remotes, testthat (≥ 2.1.0), tidyr, vdiffr
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2024-02-07 02:13:20 UTC; adamblake
Author: Adam Blake [cre, aut], Jeff Chrabaszcz [aut], Ji Son [aut], Jim Stigler [aut]
Maintainer: Adam Blake <adam@coursekata.org>
Repository: CRAN
Date/Publication: 2024-02-07 11:40:02 UTC

ANOVA table with nicer column names.

Description

ANOVA table with nicer column names.

Usage

anova_tbl(model)

Arguments

model

A model fitted by lm or lmer.

Value

An ANOVA table with standard column names.


Plotting method for pairwise objects.

Description

Plotting method for pairwise objects.

Usage

autoplot.pairwise(object, ...)

## S3 method for class 'pairwise'
plot(x, y, ...)

Arguments

object

A pairwise object.

...

Additional arguments passed to the plotting geom.

x

A pairwise object.

y

Ignored, required for compatibility with the plot() generic.

Details

This function requires an optional dependency: ggplot2. When this package is installed, calling autoplot() or plot() on a pairwise object will generate a plot of the pairwise comparisons. The plot shows the differences between the groups, with error bars representing the confidence intervals. The x-axis is labeled with the type of confidence interval used and the values of the differences, and the y-axis is labeled with the groups being compared. A dashed line at 0 is included to help visualize the differences.

Examples

if (require(ggplot2)) {
  # generate the plot immediately
  pairwise(lm(mpg ~ factor(am) + disp, data = mtcars), plot = TRUE)

  # or save the object and plot it later
  p <- pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))
  plot(p)
}

Paste, Concatenate, add End-Of-Line and Print

Description

Paste, Concatenate, add End-Of-Line and Print

Usage

cat_line(...)

Arguments

...

Character vectors to paste together.

Value

None (invisible NULL).


Check that the arguments are compatible with the rest of the pairwise code.

Description

Check that the arguments are compatible with the rest of the pairwise code.

Usage

check_pairwise_args(fit, alpha)

check_aov_compat(fit)

check_not_empty(fit)

Arguments

fit

A model fit by lm() or aov() (or similar).

alpha

A single double value indicating the alpha to use for the tests.

Functions


Drop a term from the given model

Description

This function is needed to re-fit the models for Type III SS. If you have a model with an interaction term (e.g. y ~ a + b + a:b), when you try to refit without one of the lower-order terms (e.g. y ~ a + a:b), lm() will add it back in. This function uses a fitting function that operates underneath lm() to circumvent this behavior. (It is very similar to drop1().)

Usage

drop_term(fit, term)

Arguments

fit

The model to update.

term

The term to drop from the model.

Value

An object of the class lm.
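
The idea can be sketched in base R as follows. This is not the package's implementation: the data frame dat, the recoding of am and vs as factors, and the choice of lm.fit() as the lower-level fitting function are all assumptions made for illustration. The sketch first shows that a naive refit keeps the rank of the full model, then drops the design-matrix columns for one term directly.

```r
# Recode two mtcars predictors as factors (illustrative data choice).
dat <- transform(mtcars, am = factor(am), vs = factor(vs))
full <- lm(mpg ~ am * vs, data = dat)

# Naive refit without the main effect of vs: the interaction columns are
# re-coded to absorb the missing term, so the rank does not change.
naive <- lm(mpg ~ am + am:vs, data = dat)
same_rank <- full$rank == naive$rank

# Lower-level sketch: remove the columns of the full design matrix that
# belong to the term, then fit the reduced matrix with lm.fit().
X <- model.matrix(full)
term_index <- match("vs", attr(terms(full), "term.labels"))
keep <- attr(X, "assign") != term_index
reduced <- lm.fit(X[, keep, drop = FALSE], dat$mpg)

# The increase in residual SS is the SS attributed to the dropped term.
ss_vs <- sum(reduced$residuals^2) - sum(residuals(full)^2)
```

Note that lm.fit() returns a plain list rather than an object of class lm, so a real implementation would need extra steps to rebuild a full model object.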


Print the output of lm() with the fitted equation.

Description

Print the output of lm() with the fitted equation.

Usage

equation(x, digits = max(3L, getOption("digits") - 3L))

Arguments

x

The fitted linear model to print.

digits

The minimal number of significant digits.

Value

Invisibly return the fitted linear model.


Find the categorical variables in a model

Description

Find the categorical variables in a model

Usage

find_categorical_vars(fit)

Arguments

fit

A model fit by lm() or aov() (or similar).

Value

A character vector of the categorical variables in the model. Note these are not terms, they are variables, e.g. interactions are not included here, only the variables they are comprised of.


We have to insert spaces where terms were removed from the part model.

Description

We have to insert spaces where terms were removed from the part model.

Usage

formula_string(obj, part, term)

Build a formula from terms

Description

Build a formula from terms

Usage

frm_build(lhs, rhs, env = parent.frame())

Arguments

lhs

The outcome term for the left-hand side.

rhs

The terms for the right-hand side.

env

The environment to assign to the formula (defaults to calling environment).

Value

The right-hand side terms are joined with +. Then, the right-hand side is joined to the left and returned as a formula.
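
A rough base R equivalent of the behavior described above is sketched here. The function name build_formula is hypothetical and the body is an assumption about how such a helper could work, not the package's code.

```r
# Hypothetical helper: join the right-hand side terms with "+", attach the
# outcome, and convert the string to a formula in the caller's environment.
build_formula <- function(lhs, rhs, env = parent.frame()) {
  rhs_string <- paste(rhs, collapse = " + ")
  as.formula(paste(lhs, "~", rhs_string), env = env)
}

frm <- build_formula("mpg", c("hp", "wt", "hp:wt"))
frm_text <- deparse(frm)
```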

See Also

formula_extraction formula_expansion


Expand a formula

Description

Expand a formula

Usage

frm_expand(frm)

Arguments

frm

A formula that may have compact terms like a * b.

Value

The expanded formula where terms like a * b are expanded to a + b + a:b.
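
The same kind of expansion can be reproduced with base R tools, which may help when checking what a compact formula stands for. This sketch uses terms() and reformulate() from the stats package; it is an illustration under those assumptions and not necessarily how frm_expand() is written.

```r
# terms() expands compact operators such as * into individual term labels.
frm <- y ~ a * b
expanded_terms <- attr(terms(frm), "term.labels")

# Rebuild a formula from the expanded labels.
expanded <- reformulate(expanded_terms, response = "y")
expanded_text <- deparse(expanded)
```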

See Also

formula_building formula_extraction


Extracting from formulae

Description

These tools extract parts from formulae. The only function that extracts from the left-hand side is frm_outcome. The rest only extract from the right-hand side. The word term is used to denote functions that extract full terms from the formula, whereas var denotes functions that extract the variables the formula uses. For example, the formula y ~ a * b + (1 | group) has terms a, b, a:b, and 1 | group. The same formula has variables a, b, and group.

Usage

frm_outcome(frm)

frm_terms(frm)

frm_interaction_terms(frm)

frm_fixed_terms(frm)

frm_random_terms(frm)

frm_vars(frm)

frm_random_vars(frm)

frm_fixed_vars(frm)

Arguments

frm

The formula to extract values from

Details

These tools are ONLY tested against models and formulae that are explicitly supported. See the README and test cases for more information.

Value

The function name and parameters should be descriptive enough (see Description above). The extracted parts are always strings.
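
The distinction between terms and variables drawn in the Description can be seen with base R alone. The sketch below uses terms() and all.vars() on the example formula; it only illustrates the distinction and is not the package's extraction code.

```r
frm <- y ~ a * b + (1 | group)

# Left-hand side: the outcome.
outcome <- deparse(frm[[2]])

# Full terms on the right-hand side, including the interaction.
term_labels <- attr(terms(frm), "term.labels")

# Variables used on the right-hand side, regardless of which term uses them.
rhs_vars <- all.vars(frm[[3]])
```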

See Also

formula_building formula_expansion


Remove a term or variable from the right-hand side of a formula

Description

Remove a term or variable from the right-hand side of a formula

Usage

frm_remove_term(frm, term)

frm_remove_var(frm, var)

Arguments

frm

The formula to modify.

term, var

The term or variable to drop.

Value

The formula with the term removed.

See Also

formula_building formula_expansion formula_extraction


Get the string representation of the formula.

Description

Get the string representation of the formula.

Usage

frm_string(frm)

Arguments

frm

The formula (or something that can be coerced to a formula).

Value

A character string of the formula.


Generate a List of Models for Computing Different Types of Sums of Squares

Description

This function will return a list of lists where the top-level keys (names) of the items indicate the component of the full model (i.e. the term) that the generated models can be used to test. At each of these keys is a list with both the complex and simple models that can be compared to test the component. The complex models always include the target term, and the simple models are identical to the complex except the target term is removed. Thus, when the models are compared (e.g. using anova, except for Type III; see details below), the resulting values will show the effect of adding the target term to the model. There are three generally used approaches to determining what the appropriate comparison models should be, called Type I, II, and III. See the sections below for more information on these types.

Usage

generate_models(model, type = 3)

## S3 method for class 'formula'
generate_models(model, type = 3)

## S3 method for class 'lm'
generate_models(model, type = 3)

Arguments

model

The model to generate the models from, of the type lm(), aov(), or formula().

type

The type of sums of squares to calculate:

  • Use 1, I, or sequential for Type I.

  • Use 2, II, or hierarchical for Type II.

  • Use 3, III, or orthogonal for Type III.

Value

A list of the augmented models for each term, where the associated term is the key for each model in the list.

Type I

For Type I SS, or sequential SS, each term is considered in order after the preceding terms are considered. Consider the example model

Y ~ A + B + A:B

where ":" indicates an interaction. To determine the Type I effect of A, we would compare the model Y ~ A to the same model without the term: Y ~ NULL. For B, we compare Y ~ A + B to Y ~ A; and for A:B, we compare Y ~ A + B + A:B to Y ~ A + B. Incidentally, the anova() function that ships with the base installation of R computes Type I statistics.
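
The sequential comparisons described for Type I can be checked by hand in base R. The sketch below assumes the mtcars data with am recoded as a factor (the names dat and m0 to m3 are illustrative) and compares each model to the one before it.

```r
dat <- transform(mtcars, am = factor(am))

# Each model adds one term to the previous one, in formula order.
m0 <- lm(mpg ~ 1, data = dat)
m1 <- lm(mpg ~ hp, data = dat)
m2 <- lm(mpg ~ hp + am, data = dat)
m3 <- lm(mpg ~ hp + am + hp:am, data = dat)

# Sum of squares gained at each step of the sequence.
ss_sequential <- anova(m0, m1, m2, m3)[["Sum of Sq"]][-1]

# The same values as reported by anova() on the full model.
ss_type1 <- anova(m3)[["Sum Sq"]][1:3]
```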

Type II

For Type II SS, or hierarchical SS, each term is considered in the presence of all of the terms that do not include it. For example, consider the three-way factorial model

Y ~ A + B + C + A:B + A:C + B:C + A:B:C

where ":" indicates an interaction. The effect of A is found by comparing Y ~ B + C + B:C + A to Y ~ B + C + B:C (the only terms included are those that do not include A). For B, the comparison models would be Y ~ A + C + A:C + B and Y ~ A + C + A:C; for A:B, the models would be Y ~ A + B + C + A:C + B:C + A:B and Y ~ A + B + C + A:C + B:C; and so on.
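
For a smaller case, the Type II comparison for one term can be written out directly. This sketch assumes a two-way model mpg ~ hp + am + hp:am on mtcars, with am recoded as a factor; the object names are illustrative.

```r
dat <- transform(mtcars, am = factor(am))

# Type II effect of hp: keep only the terms that do not contain hp (here,
# just am), then compare the models with and without hp.
simple_hp <- lm(mpg ~ am, data = dat)
complex_hp <- lm(mpg ~ am + hp, data = dat)
ss_hp <- anova(simple_hp, complex_hp)[["Sum of Sq"]][2]

# Type II effect of the interaction: every other term is kept.
simple_int <- lm(mpg ~ am + hp, data = dat)
complex_int <- lm(mpg ~ am + hp + hp:am, data = dat)
ss_int <- anova(simple_int, complex_int)[["Sum of Sq"]][2]
```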

Type III

For Type III SS, or orthogonal SS, each term is considered in the presence of all of the other terms. For example, consider the two-way factorial model

Y ~ A + B + A:B

where ":" indicates an interaction between the terms. The effect of A is found by comparing Y ~ B + A:B + A to Y ~ B + A:B; for B, the comparison models would be Y ~ A + A:B + B and Y ~ A + A:B; and for A:B, the models would be Y ~ A + B + A:B and Y ~ A + B.

Unfortunately, anova() cannot be used to compare Type III models. anova() does not allow for violation of the principle of marginality, which is the rule that interactions should only be tested in the context of their lower-order terms. When an interaction term is present in a model, anova() will automatically add in the lower-order terms, making a model like Y ~ A + A:B unable to be compared: it will add the lower-order term B, and thus use the model Y ~ A + B + A:B instead. To get the appropriate statistics for Type III comparisons, use drop1() with the full scope, i.e. drop1(model_fit, scope = . ~ .).
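
The drop1() call mentioned above can be tried on a small model. The sketch assumes mtcars with am recoded as a factor; dat, fit, and type3 are illustrative names.

```r
dat <- transform(mtcars, am = factor(am))
fit <- lm(mpg ~ hp * am, data = dat)

# Drop each term in turn while keeping all of the others, including the
# interaction when a main effect is dropped.
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3_rows <- type3[c("hp", "am", "hp:am"), c("Df", "Sum of Sq", "F value")]
```

The row for the highest-order term matches the sequential result, because every lower-order term is already in the comparison model. The rows for the lower-order terms depend on the contrast coding in effect when the model is fit.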

Examples

# create all type 2 comparison models
models <- generate_models(
  lm(mpg ~ hp * factor(am), data = mtcars),
  type = 2
)

# compute the SS for the hp term
anova_hp <- anova(models$hp$simple, models$hp$complex)
anova_hp[["Sum of Sq"]][[2]]

Insert a row of data into a table.

Description

Insert a row of data into a table.

Usage

insert_row(df, insert_at, contents)

Arguments

df

The original data.frame.

insert_at

The row in which to insert the data.

contents

The row of contents to insert (should be a vector of length ncol(df)).

Value

The original data.frame with the row of data inserted.


Insert a horizontal rule in a table for pretty printing

Description

Insert a horizontal rule in a table for pretty printing

Usage

insert_rule(df, insert_at)

Arguments

df

The original data.frame

insert_at

The row in which to insert the dashes.

Value

The original data.frame with the horizontal rule inserted.


Get all pairs for a given vector

Description

The output of this function should match the pairs you get when you run TukeyHSD.

Usage

level_pairs(levels)

Arguments

levels

The vector to get pairs for. It is called levels because it was written for the purpose of comparing levels of a factor to one another with multiple comparisons.

Value

A tibble with two columns, group 1 and group 2, where each row is a unique pair.
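
A base R sketch of the pairing is given below. It is an assumption about the ordering, chosen so that the labels line up with TukeyHSD(), and it returns a data.frame with hypothetical column names group_1 and group_2 rather than the tibble the package produces.

```r
# Levels of a three-level factor (cylinder counts in mtcars).
lvls <- c("4", "6", "8")

# combn() yields each unordered pair once; the later level goes first so
# that the difference reads "later minus earlier", as in TukeyHSD().
pairs <- t(combn(lvls, 2))
pair_table <- data.frame(group_1 = pairs[, 2], group_2 = pairs[, 1])
pair_labels <- paste(pair_table$group_1, pair_table$group_2, sep = "-")
```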


Remove cases with missing values.

Description

Remove cases with missing values.

Usage

listwise_delete(obj, vars)

## S3 method for class 'data.frame'
listwise_delete(obj, vars = names(obj))

## S3 method for class 'lm'
listwise_delete(obj, vars = all.vars(formula(obj)))

Arguments

obj

The data.frame or lm object to process.

vars

The variables to consider.

Value

For data.frames, the vars are checked for missing values. If one is found on any of the variables, the entire row is removed (list-wise deletion). For linear models, the model is refit after the underlying data have been processed.
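
For the data.frame case, the behavior described above resembles filtering with complete.cases() in base R. The sketch below uses a small made-up data frame and is not the package's implementation.

```r
# Made-up data with missing values in different columns.
df <- data.frame(
  y = c(1, NA, 3, 4),
  a = c("x", "y", NA, "z"),
  b = c(NA, 2, 3, 4)
)

# Only y and a are considered, so the NA in b does not remove row 1.
vars <- c("y", "a")
kept <- df[complete.cases(df[, vars, drop = FALSE]), , drop = FALSE]
```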


Find and return the lower triangle of a matrix

Description

Same as lower.tri() except it returns the values from the matrix (rather than a positional matrix that lets you look up the values).

Usage

lower_tri(x, diag = FALSE)

Arguments

x

a matrix or other R object with length(dim(x)) == 2. For back compatibility reasons, when the above is not fulfilled, as.matrix(x) is called first.

diag

logical. Should the diagonal be included?

Value

The values in the lower triangular part of the matrix.
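
The difference from lower.tri() is easiest to see side by side. This sketch uses only base R and indexes the matrix with the logical positions to obtain the values.

```r
m <- matrix(1:9, nrow = 3)

# lower.tri() returns a logical matrix of positions.
positions <- lower.tri(m)

# Indexing with those positions returns the values themselves.
values <- m[positions]
values_with_diag <- m[lower.tri(m, diag = TRUE)]
```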


Get the means and counts for each categorical term in the model

Description

Get the means and counts for each categorical term in the model

Usage

means_and_counts(fit, term)

Arguments

fit

A model fit by lm() or aov() (or similar).

term

If NULL, use each categorical term in the model. Otherwise, only use the given term.

Value

A list of the means and counts for each level of each term.


Constructor for pairwise comparison tables

Description

Constructor for pairwise comparison tables

Usage

new_pairwise_tbl(tbl, term, fit, fwer, alpha, correction)

Arguments

tbl

A tibble-like object.

term

The term the table describes.

fit

The linear model the term comes from.

fwer

The family-wise error-rate for the group of tests in the table.

alpha

The alpha to use when computing the family-wise error-rate.

correction

The type of alpha correction the tests in the table use.

Value

A tibble sub-classed as pairwise_comparison_tbl. These have custom printers and retain their attributes when subsetted.


number vector

Description

This creates a formatted double vector. You can specify the number of digits you want the value to display after the decimal point, and the underlying value will not change. Additionally, you can explicitly set whether scientific notation should be used and whether numbers with an absolute value less than 1 should contain a leading 0.

Usage

number(x = numeric(), digits = 3L, scientific = FALSE, leading_zero = TRUE)

is_number(x)

as_number(x)

Arguments

x

  • For number(): A numeric vector

  • For is_number(): An object to test

  • For as_number(): An object to coerce to a number

digits

The number of digits to display after the decimal point.

scientific

Whether the number should be represented with scientific notation (e.g. 1e2)

leading_zero

Whether a leading zero should be used on numbers with an absolute value less than 1 (e.g. 0.001 rather than .001).

Value

An S3 vector of class supernova_number. It should behave like a double, but be formatted consistently.

Examples

number(1:5, digits = 3)
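
The separation between the displayed value and the stored value can be imitated in base R. The sketch below uses formatC() and sub(); it illustrates the idea only and says nothing about how the supernova_number class formats values internally.

```r
x <- c(0.00123, 12.5, -0.4567)

# Fixed number of digits after the decimal point; x itself is not changed.
shown <- formatC(x, digits = 3, format = "f")

# Optionally strip the leading zero from values between -1 and 1.
no_leading_zero <- sub("^(-?)0\\.", "\\1.", shown)
```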

Pad x to length of y

Description

Pad x to length of y

Usage

pad(x, y, after = length(x), pad = NA)

Arguments

x

The vector to pad.

y

The vector with target length.

after

A subscript, after which the padding is to be appended.

pad

The value to pad the vector with.

Value

The padded vector.


Pad x to a given output length

Description

Pad x to a given output length

Usage

pad_len(x, output_length, after = length(x), pad = NA)

Arguments

x

The vector to pad.

output_length

The length to pad the vector to.

after

A subscript, after which the padding is to be appended.

pad

The value to pad the vector with.

Value

The padded vector.
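
The padding arguments map naturally onto append() in base R. The function pad_len_sketch below is a hypothetical stand-in written for illustration, assuming that no padding is added when x is already long enough.

```r
# Hypothetical stand-in: insert enough copies of pad after the given
# position to reach the requested length.
pad_len_sketch <- function(x, output_length, after = length(x), pad = NA) {
  n_missing <- max(output_length - length(x), 0)
  append(x, rep(pad, n_missing), after = after)
}

padded_end <- pad_len_sketch(1:3, 5)
padded_start <- pad_len_sketch(1:3, 5, after = 0, pad = 0L)
```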


Compute all pairwise comparisons between category levels

Description

This function is useful for generating and testing all pairwise comparisons of categorical terms in a linear model. This can be done in base R using functions like pairwise.t.test and TukeyHSD, but these functions are inconsistent both in their output format and their general approach to pairwise comparisons. pairwise() will return a consistent table format, and will make consistent decisions about how to calculate error terms and confidence intervals. See the Details section below for more on how the models are tested (and why your output might not match other functions).

Usage

pairwise(
  fit,
  correction = "Tukey",
  term = NULL,
  alpha = 0.05,
  var_equal = TRUE,
  plot = FALSE
)

pairwise_t(fit, term = NULL, alpha = 0.05, correction = "none")

pairwise_bonferroni(fit, term = NULL, alpha = 0.05)

pairwise_tukey(fit, term = NULL, alpha = 0.05)

Arguments

fit

A model fit by lm() or aov() (or similar).

correction

The type of correction (if any) to perform to maintain the family-wise error-rate specified by alpha:

  • Tukey: computes Tukey's Honestly Significant Differences (see TukeyHSD())

  • Bonferroni: computes pairwise t-tests and then applies a Bonferroni correction

  • none: computes pairwise t-tests and reports the uncorrected statistics

term

If NULL, use each categorical term in the model. Otherwise, only use the given term.

alpha

The family-wise error-rate to restrict the tests to. If "none" is given for correction, this value is the value for each test (and is used to calculate the family-wise error-rate for the group of tests).

var_equal

If TRUE (default), treat the variances between each group as being equal, otherwise the Welch or Satterthwaite method is used to appropriately weight the variances. Note: currently only TRUE is supported. Alternative methods are forthcoming.

plot

Setting plot to TRUE will automatically call plot on the returned object.

Details

For simple one-way models where a single categorical variable predicts an outcome, you will get output similar to other methods of computing pairwise comparisons. Essentially, the differences on the outcome between each of the groups defined by the categorical variable are compared with the requested test, and their confidence intervals and p-values are adjusted by the requested correction.

However, when more than two variables are entered into the model, the outcome will diverge somewhat from other methods of computing pairwise comparisons. For traditional pairwise tests you need to estimate an error term, usually by pooling the standard deviation of the groups being compared. This means that when you have other predictors in the model, their presence is ignored when running these tests. For the functions in this package, we instead compute the pooled standard error by using the mean squared error (MSE) from the full model fit.

Let's take a concrete example to explain that. If we are predicting a car's miles-per-gallon (mpg) based on whether it has an automatic or manual transmission (am), we can create that linear model and get the pairwise comparisons like this:

pairwise(lm(mpg ~ factor(am), data = mtcars))

The output of this code will have one table showing the comparison of manual and automatic transmissions with regard to miles-per-gallon. The pooled standard error is the same as the square root of the MSE from the full model.

In these data the am variable did not have any other values than automatic and manual, but we can imagine situations where the predictor has more than two levels. In these cases, the pooled SD would be calculated by taking the MSE of the full model (not of each group) and then weighting it based on the size of the groups in question (divide by n).

To improve our model, we might add the car's displacement (disp) as a quantitative predictor:

pairwise(lm(mpg ~ factor(am) + disp, data = mtcars))

Note that the output still only has a table for am. This is because we can't do a pairwise comparison using disp because there are no groups to compare. Most functions will drop or not let you use this variable during pairwise comparisons. Instead, pairwise() uses the same approach as in the 3+ groups situation: we use the MSE for the full model and then weight it by the size of the groups being compared. Because we are using the MSE for the full model, the effect of disp is accounted for in the error term even though we are not explicitly comparing different displacements. Importantly, the interpretation of the outcome is different than in other traditional t-tests. Instead of saying, "there is a difference in miles-per-gallon based on the type of transmission," we must add that this difference is found "after accounting for displacement."
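
The role of the full-model MSE can be illustrated in base R. The weighting used below, MSE multiplied by (1/n1 + 1/n2), is the textbook form for a two-group comparison and is an assumption here; the text above only says the MSE is weighted by group size, so the exact expression used by pairwise() may differ. The names dat and se_from_mse are illustrative.

```r
dat <- transform(mtcars, am = factor(am))
n <- table(dat$am)

# Standard error of the difference between the two am groups, built from
# the MSE of whatever model is supplied.
se_from_mse <- function(fit) {
  mse <- sum(residuals(fit)^2) / df.residual(fit)
  sqrt(mse * (1 / n[["0"]] + 1 / n[["1"]]))
}

one_way <- lm(mpg ~ am, data = dat)
with_disp <- lm(mpg ~ am + disp, data = dat)

se_one_way <- se_from_mse(one_way)
se_with_disp <- se_from_mse(with_disp)
```

Adding disp lowers the MSE of the full model, so the error term for the am comparison shrinks even though disp itself is never compared across groups.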

Value

A list of tables organized by the terms in the model. For each term (categorical terms only, as splitting on a continuous variable is generally uninformative), the table describes all of the pairwise-comparisons possible.


Paste together lines of text.

Description

The lines are joined together with a newline (\n) character.

Usage

paste_line(...)

Arguments

...

Vectors of lines of text.

Value

Check out the paste function for more information.


Refit a model, dropping any non-categorical terms.

Description

Refit a model, dropping any non-categorical terms.

Usage

refit_categorical(fit)

Arguments

fit

A model fit by lm() or aov() (or similar).

Value

A linear model that only has categorical predictors.


Rename a column in a data frame

Description

Rename a column in a data frame

Usage

rename(data, col_name, replacement)

Arguments

data

A data frame to modify.

col_name

A character vector of columns to rename.

replacement

A character vector of replacement column names.

Value

Returns the renamed data frame.


Convert SS type parameter to the corresponding numeric value

Description

Convert SS type parameter to the corresponding numeric value

Usage

resolve_type(type)

Arguments

type

The value to convert, either string or numeric.

Value

The numeric value corresponding to the input.
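
One way such a conversion could work is sketched below, using the accepted values listed for the type argument of generate_models(). The function resolve_type_sketch is hypothetical, and details such as case-insensitive matching and the error message are assumptions.

```r
# Hypothetical stand-in: map numeric, Roman numeral, or descriptive names
# onto the numeric SS type.
resolve_type_sketch <- function(type) {
  key <- tolower(trimws(as.character(type)))
  if (key %in% c("1", "i", "sequential")) return(1)
  if (key %in% c("2", "ii", "hierarchical")) return(2)
  if (key %in% c("3", "iii", "orthogonal")) return(3)
  stop("Unknown sums of squares type: ", type)
}

resolved <- vapply(list(1, "II", "orthogonal"), resolve_type_sketch, numeric(1))
```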


A template for a row in an ANOVA table.

Description

A template for a row in an ANOVA table.

Usage

row_blank(
  term = NA_character_,
  description = NA_character_,
  ss = NA_real_,
  df = NA_integer_,
  ms = ss/df,
  f = NA_real_,
  pre = NA_real_,
  p = NA_real_
)

Arguments

term

The name of the term the row describes.

description

An optional, short description of the term (pedagogical).

ss

The sum of squares for the term (defaults to blank)

df

The degrees of freedom the term uses (defaults to blank).

ms

The mean square for the term (defaults to ss / df)

f

Fisher's F statistic for the term in the model (defaults to blank).

pre

The proportional reduction of error the term provides (defaults to blank).

p

The p-value of the F (and PRE) for the term in the model (defaults to blank).

Value

A tibble_row of length 1 with all of the variables initialized.


Compute and construct an ANOVA table row for an error term

Description

Compute and construct an ANOVA table row for an error term

Usage

row_error(term, description, fit)

Arguments

term

The name of the term the row describes (e.g. Error or Total).

description

An optional, short description of the term (pedagogical).

fit

The model we are describing error from.

Value

A tibble_row with the properties initialized. The code has been written to be as simple and understandable as possible. Please take a look at the source and offer any suggestions for improvement!


Compute and construct an ANOVA table row for a term.

Description

"Term" is loosely defined here and is probably better understood as "everything in the table that is not an error row."

Usage

row_term(name, description, models, term)

Arguments

description

An optional, short description of the term (pedagogical).

models

The models created by generate_models() for comparison.

term

The term to compute the row for.

Value

A tibble_row with the properties initialized. The code has been written to be as simple and understandable as possible. Please take a look at the source and offer any suggestions for improvement!


Select terms based on the user's term specification

Description

Before returning the selection, ensure that the term we are subsetting on exists.

Usage

select_terms(fit, term = NULL)

Arguments

fit

A model fit by lm() or aov() (or similar).

term

If NULL, use each categorical term in the model. Otherwise, only use the given term.

Value

A character vector of terms to run analyses on.


supernova

Description

An alternative set of summary statistics for ANOVA. Sums of squares, degrees of freedom, mean squares, and F value are all computed with Type III sums of squares, but for fully-between subjects designs you can set the type to I or II. This function adds to the output table the proportional reduction in error, an explicit summary of the whole model, separate formatting of p values, and is intended to match the output used in Judd, McClelland, and Ryan (2017).

Usage

supernova(fit, type = 3, verbose = TRUE)

## S3 method for class 'lm'
supernova(fit, type = 3, verbose = TRUE)

## S3 method for class 'lmerMod'
supernova(fit, type = 3, verbose = FALSE)

Arguments

fit

A model fit by lm() or lme4::lmer()

type

The type of sums of squares to calculate (see generate_models()). Defaults to the widely used Type III SS.

verbose

If FALSE, the description column is suppressed.

Value

An object of the class supernova, which has a clean print method for displaying the ANOVA table in the console as well as a named list:

tbl

The ANOVA table as a data.frame

fit

The original lm or lmer object being tested

models

Models created by generate_models

References

Judd, C. M., McClelland, G. H., & Ryan, C. S. (2017). Data Analysis: A Model Comparison Approach to Regression, ANOVA, and Beyond (3rd ed.). New York: Routledge. ISBN: 978-1138819832

Examples

supernova(lm(mpg ~ disp, data = mtcars))

change_p_decimals <- supernova(lm(mpg ~ disp, data = mtcars))
print(change_p_decimals, pcut = 8)
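
The quantities in the whole-model row can be reproduced by hand, which may help when relating the table to the textbook. This sketch uses base R only and assumes the standard definition of proportional reduction in error (PRE) as the model sum of squares divided by the total sum of squares; it does not call supernova().

```r
fit <- lm(mpg ~ disp, data = mtcars)

# Sums of squares for the empty model (Total), the error, and the model.
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
ss_error <- sum(residuals(fit)^2)
ss_model <- ss_total - ss_error

df_model <- length(coef(fit)) - 1
df_error <- df.residual(fit)

# Proportional reduction in error, F, and its p-value.
pre <- ss_model / ss_total
f_value <- (ss_model / df_model) / (ss_error / df_error)
p_value <- pf(f_value, df_model, df_error, lower.tail = FALSE)
```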

Update a model in the environment the model was created in

Description

stats::update() will perform the update in parent.frame() by default, but this can cause problems when the update is called by another function (so the parent frame is no longer the environment the user is in).

Usage

update_in_env(object, formula., ...)

Arguments

object

An existing fit from a model function such as lm(), glm() and many others.

formula.

Changes to the formula – see update.formula for details.

...

Additional arguments to the call, or arguments with changed values. Use name = NULL to remove the argument name.

Value

The updated model is returned.
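
The problem and one possible remedy can be sketched in base R. The approach below, building the updated call with evaluate = FALSE and evaluating it in the environment of the model formula, is an assumption about how such a helper could work; update_in_env_sketch and make_fit are hypothetical names.

```r
# Hypothetical stand-in: build the updated call without evaluating it, then
# evaluate it where the model formula was created.
update_in_env_sketch <- function(object, formula., ...) {
  call <- stats::update(object, formula., ..., evaluate = FALSE)
  eval(call, environment(stats::formula(object)))
}

# The data live only inside this function, so they are not visible from
# the place where the update is requested.
make_fit <- function() {
  local_data <- data.frame(y = c(1, 3, 2, 5, 4), x = 1:5, z = c(2, 1, 4, 3, 5))
  lm(y ~ x, data = local_data)
}

fit <- make_fit()
refit <- update_in_env_sketch(fit, . ~ . + z)
```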


Extract the variables from a model formula

Description

Extract the variables from a model formula

Usage

variables(object)

## S3 method for class 'supernova'
variables(object)

## S3 method for class 'formula'
variables(object)

## S3 method for class 'lm'
variables(object)

## S3 method for class 'lmerMod'
variables(object)

Arguments

object

A formula, lm or supernova object

Value

A list containing the outcome and predictor variables in the model.