Title: | Helper Tools for Teaching Statistical Data Analysis |
Description: | Provides functions and data-sets that are helpful for teaching statistics and data analysis. It was originally designed for use when teaching students in the Psychology Department at Nottingham Trent University. |
Version: | 0.1.0 |
License: | GPL-3 |
Maintainer: | Mark Andrews <mark.andrews@ntu.ac.uk> |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.2 |
Imports: | ggplot2 (≥ 3.3.5), dplyr (≥ 1.0.7), tidyr (≥ 1.1.3), magrittr (≥ 2.0.1), formula.tools (≥ 1.7.1), GGally (≥ 2.1.2), purrr (≥ 0.3.4), ggthemes (≥ 4.2.4), psych (≥ 2.1.6), plyr (≥ 1.8.6), effsize (≥ 0.8.1), rlang (≥ 0.4.11), fastDummies (≥ 1.6.3), ez (≥ 4.4), tidyselect (≥ 1.1.1) |
Depends: | R (≥ 2.10) |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2021-09-14 21:42:51 UTC; andrews |
Author: | Mark Andrews [aut, cre], Jens Roeser [aut] |
Repository: | CRAN |
Date/Publication: | 2021-09-15 09:20:05 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Anthropometric data from US Army Personnel
Description
Data on the height, weight, and handedness of men and women of different ages and races.
Usage
ansur
Format
A data frame with 6068 observations on 9 variables.
- subjectid
Unique ID of the person
- gender
Binary variable indicating the subject's sex: male or female.
- height
Height in centimeters.
- weight
Weight in kilograms.
- handedness
Categorical variable indicating whether the person is left-handed, right-handed, or both.
- age
Age in years
- race
Race, with categories like white, black, hispanic.
- height_tercile
The tercile of the person's height.
- age_tercile
The tercile of the person's age.
Source
This data is a transformed version of data sets obtained from the Anthropometric Survey of US Army Personnel (ANSUR 2 or ANSUR II).
Cohen's d and Hedges' g effect size
Description
This is a wrapper to the effsize::cohen.d() function.
Usage
cohen_d(...)
Arguments
... |
A comma separated list of arguments. See |
Value
A list of class effsize as returned by effsize::cohen.d().
Examples
cohen_d(weight ~ gender, data = ansur)
cohen_d(age ~ gender, data = schizophrenia)
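To make the effect size itself concrete, the following is a minimal base R sketch of Cohen's d with a pooled standard deviation, followed by a commonly used approximate small-sample correction that yields Hedges' g. It does not call effsize::cohen.d(), and its numbers are not guaranteed to match that function under every option. The helper name cohen_d_sketch and the use of the built-in ToothGrowth data are assumptions made for illustration.

```r
# Hypothetical helper (not part of this package): Cohen's d from two numeric vectors.
cohen_d_sketch <- function(a, b) {
  n_a <- length(a)
  n_b <- length(b)
  # Pooled standard deviation across both groups
  s_pooled <- sqrt(((n_a - 1) * var(a) + (n_b - 1) * var(b)) / (n_a + n_b - 2))
  (mean(a) - mean(b)) / s_pooled
}

oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

d <- cohen_d_sketch(oj, vc)
# Approximate small-sample correction factor that turns d into Hedges' g
g <- d * (1 - 3 / (4 * (length(oj) + length(vc)) - 9))
c(cohen_d = d, hedges_g = g)
```

Because the correction factor is slightly below 1, g is always a little closer to zero than d, and the two converge as the sample size grows.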
Calculate Cronbach's alpha for sets of psychometric scale items
Description
This function calculates Cronbach's alpha for one or more sets of psychometric scale items. Each item is a variable in a data frame. Each set of items is defined by a tidy selection of columns.
Usage
cronbach(.data, ..., .ci = 0.95)
Arguments
.data |
A data frame with columns that are psychometric items. |
... |
A set of comma separated tidy selectors that selects sets of
columns from |
.ci |
The confidence level of the confidence interval to calculate (by default, 0.95). |
Value
A data frame whose rows are psychometric scales. For each scale, it gives Cronbach's alpha and the lower and upper bounds of the confidence interval on alpha.
Examples
# Return the Cronbach alpha and 95% ci for two scales.
# The first scale, named `x`, is identified by all items beginning with `x_`.
# The second scale, named `y`, is identified by the consecutive items from `y_1` to `y_10`.
cronbach(test_psychometrics,
x = starts_with('x'),
y = y_1:y_10)
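As a rough illustration of the statistic being reported, the following base R sketch computes Cronbach's alpha from the item variances and the variance of the summed score. It is not the package's implementation and does not compute the confidence interval. The helper name cronbach_alpha_sketch and the simulated items are assumptions made for illustration.

```r
# Hypothetical helper (not part of this package): Cronbach's alpha for a set of items.
cronbach_alpha_sketch <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of the summed scale score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Simulated items: a shared latent score plus item-specific noise (illustrative data only)
set.seed(101)
latent <- rnorm(50)
items <- sapply(1:5, function(i) latent + rnorm(50))
colnames(items) <- paste0("x_", 1:5)

alpha <- cronbach_alpha_sketch(items)
alpha
```

Alpha rises when the items covary strongly relative to their individual variances, which is why it is read as a measure of internal consistency.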
A density plot
Description
This is a wrapper to the typical ggplot based density plot, i.e., using geom_density. A continuous variable, x, is required as an input. Optionally, a by categorical variable can be provided.
Usage
densityplot(
x,
data,
by = NULL,
position = "stack",
facet = NULL,
facet_type = "wrap",
alpha = 1,
xlab = NULL,
ylab = NULL
)
Arguments
x |
The numeric variable that is to be density plotted. |
data |
A data frame with at least one numeric variable (the |
by |
A categorical variable by which to group the |
position |
If the |
facet |
A character string or character vector. If provided, we
|
facet_type |
By default, this takes the value of |
alpha |
The transparency of the filled density curves. This is
probably only required when using |
xlab |
The label of the x-axis (defaults to the |
ylab |
The label of the y-axis (defaults to the |
Value
A ggplot2::ggplot object, which may be modified with further ggplot2 commands.
Examples
densityplot(x = age, data = schizophrenia, by = gender)
Calculate descriptive statistics
Description
This function is a lightweight wrapper to dplyr's summarize function. It can be used to calculate any descriptive or summary statistic for any variable in the data set. Optionally, a by grouping variable can be used, and then the summary statistics are calculated for each subgroup defined by the different values of the by variable.
Usage
describe(data, by = NULL, ...)
Arguments
data |
A data frame |
by |
A grouping variable. If included, the |
... |
Arguments of functions applied to variables, e.g. |
Value
A tibble data frame with each row providing descriptive statistics for selected variables for each value of the grouping by variable.
Examples
describe(faithfulfaces, avg = mean(faithful), stdev = sd(faithful))
describe(faithfulfaces, by = face_sex, avg = mean(faithful), stdev = sd(faithful))
Apply multiple descriptive functions to multiple variables
Description
This function is a wrapper to dplyr's summarize used with the across function. For each variable in a set of variables, calculate each summary statistic from a list of summary statistic functions. Optionally, group the variables by a grouping variable, and then calculate the statistics. Optionally, the tibble that is returned by default, which is in a wide format, can be pivoted to a long format.
Usage
describe_across(data, variables, functions, by = NULL, pivot = FALSE)
Arguments
data |
A data frame |
variables |
A vector of variables in |
functions |
A list of summary statistic functions. If it is a named list, which is recommended, the names of the functions will be used to make the names of the returned data frame. |
by |
A grouping variable. If included, the |
pivot |
A logical variable indicating if the wide format da |
Value
A tibble data frame. If pivot = FALSE, which is the default, the data frame contains one row per value of the by variable, or just one row overall if there is no by variable. If pivot = TRUE, there will be k + 1 columns if there is no by variable, or k + 2 columns if there is a by variable, where k is the number of functions.
Examples
describe_across(faithfulfaces,
variables = c(trustworthy, faithful),
functions = list(avg = mean, stdev = sd),
pivot = TRUE)
describe_across(faithfulfaces,
variables = c(trustworthy, faithful),
functions = list(avg = mean, stdev = sd),
by = face_sex)
describe_across(faithfulfaces,
variables = c(trustworthy, faithful),
functions = list(avg = mean, stdev = sd),
by = face_sex,
pivot = TRUE)
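The k + 1 column count for the long layout can be illustrated with a small base R sketch. This is not how the function is implemented (it wraps dplyr), and the column name variable, the object names, and the use of the built-in mtcars data are assumptions made for illustration.

```r
variables <- c("mpg", "hp")                  # columns of the built-in mtcars data
functions <- list(avg = mean, stdev = sd)    # k = 2 named summary functions

# Long layout: one row per variable, one column per function, plus a variable column
long <- data.frame(
  variable = variables,
  t(sapply(mtcars[variables], function(v) sapply(functions, function(f) f(v)))),
  row.names = NULL
)
long
ncol(long)   # k + 1 columns when there is no grouping variable
```

With a grouping variable, the same table would carry one extra column identifying the group, giving k + 2 columns.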
Analysis of variance
Description
This is a wrapper to the ez::ezANOVA() function.
Usage
ez_anova(
data,
dv,
wid,
within = NULL,
within_full = NULL,
within_covariates = NULL,
between = NULL,
between_covariates = NULL,
observed = NULL,
diff = NULL,
reverse_diff = FALSE,
type = 2,
white.adjust = FALSE,
detailed = FALSE,
return_aov = FALSE
)
Arguments
data |
Data frame containing the data to be analyzed. |
dv |
Name of the column in |
wid |
Name of the column in |
within |
Names of columns in |
within_full |
Same as within, but intended to specify the full within-Ss design in cases where the data have not already been collapsed to means per condition specified by |
within_covariates |
Names of columns in |
between |
Names of columns in |
between_covariates |
Names of columns in |
observed |
Names of columns in |
diff |
Names of any variables to collapse to a difference score. If a single value, may be specified by name alone; if multiple values, must be specified as a .() list. |
reverse_diff |
Logical. If TRUE, triggers reversal of the difference collapse requested by |
type |
Numeric value (either |
white.adjust |
Only affects behaviour if the design contains only between-Ss predictor variables. If not FALSE, the value is passed as the white.adjust argument to Anova, which provides heteroscedasticity correction. |
detailed |
Logical. If TRUE, returns extra information (sums of squares columns, intercept row, etc.) in the ANOVA table. |
return_aov |
Logical. If TRUE, computes and returns an aov object corresponding to the requested ANOVA (useful for computing post-hoc contrasts). |
Value
A list containing one or more components as returned by ez::ezANOVA().
Examples
ez_anova(data = selfesteem2_long,
dv = score,
wid = id,
within = c(time, treatment),
detailed = TRUE,
return_aov = TRUE)
Faithfulness from a Photo?
Description
Ratings from a facial photo and actual faithfulness.
Usage
faithfulfaces
Format
A data frame with 170 observations on the following 7 variables.
- sex_dimorph
Rating of sexual dimorphism (masculinity for males, femininity for females)
- attractive
Rating of attractiveness
- cheater
Was the face subject unfaithful to a partner?
- trustworthy
Rating of trustworthiness
- faithful
Rating of faithfulness
- face_sex
Sex of face (female or male)
- rater_sex
Sex of rater (female or male)
Details
College students were asked to look at a photograph of an opposite-sex adult face and to rate the person, on a scale from 1 (low) to 10 (high), for attractiveness. They were also asked to rate trustworthiness, faithfulness, and sexual dimorphism (i.e., how masculine a male face is and how feminine a female face is). Overall, 68 students (34 males and 34 females) rated 170 faces (88 men and 82 women).
Source
This data set was taken from the Stats2Data R package. From the description in that package, the original is based on G. Rhodes et al. (2012), "Women can judge sexual unfaithfulness from unfamiliar men's faces," Biology Letters, November 2012. All of the 68 raters were heterosexual Caucasians, as were the 170 persons who were rated. (We have deleted 3 subjects with missing values and 16 subjects who were over age 35.)
Show the dummy code of a categorical variable
Description
For each value of a categorical variable, show the binary code used in a regression model to represent its value. This is a wrapper to the fastDummies::dummy_cols() function.
Usage
get_dummy_code(Df, variable)
Arguments
Df |
A data frame |
variable |
A categorical variable (e.g. character vector or factor). |
Value
A data frame whose rows provide the dummy code for each distinct value of variable.
Examples
get_dummy_code(PlantGrowth, group)
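For comparison, the binary code that a regression model uses can also be inspected directly in base R with model.matrix(). The sketch below shows the default treatment coding that lm() applies to the three-level factor group. The output layout of get_dummy_code() may differ from this, and the object names are assumptions made for illustration.

```r
# Treatment (dummy) coding that lm() would use by default for the factor `group`
design <- model.matrix(~ group, data = PlantGrowth)

# One row per distinct level: the intercept plus one indicator per non-reference level
codes <- unique(cbind(PlantGrowth["group"], design))
codes
```

The reference level (ctrl) is represented by zeros in both indicator columns, and each other level switches on exactly one indicator.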
A histogram
Description
This is a wrapper to the typical ggplot based histogram, i.e., using geom_histogram. A continuous variable, x, is required as an input. Optionally, a by categorical variable can be provided.
Usage
histogram(
x,
data,
by = NULL,
position = "stack",
facet = NULL,
facet_type = "wrap",
bins = 10,
alpha = 1,
xlab = NULL,
ylab = NULL
)
Arguments
x |
The numeric variable that is to be histogrammed. |
data |
A data frame with at least one numeric variable (the |
by |
A categorical variable by which to group the |
position |
If the |
facet |
A character string or character vector. If provided, we
|
facet_type |
By default, this takes the value of |
bins |
The number of bins to use in the histogram. |
alpha |
The transparency of the filled histogram bars. This is
probably only required when using |
xlab |
The label of the x-axis (defaults to the |
ylab |
The label of the y-axis (defaults to the |
Value
A ggplot2::ggplot object, which may be modified with further ggplot2 commands.
Examples
histogram(x = age, data = schizophrenia, by = gender, bins = 20)
histogram(x = age, data = schizophrenia, by = gender, position = 'identity', bins = 20, alpha = 0.7)
histogram(x = age, data = schizophrenia, by = gender, position = 'dodge', bins = 20)
histogram(x = weight, bins = 20, data = ansur, facet = height_tercile)
histogram(x = weight, bins = 20, data = ansur,
facet = c(height_tercile, age_tercile), facet_type = 'grid')
Make an interaction line plot
Description
Make an interaction line plot
Usage
interaction_line_plot(y, x, by, data, ylim = NULL, xlab = NULL, ylab = NULL)
Arguments
y |
A continuous variable to be plotted along the y-axis |
x |
A continuous variable to be plotted along the x-axis |
by |
A categorical variable by which we split the data and create one line plot for each resulting group |
data |
A data frame with the |
ylim |
A vector of limits for the y-axis |
xlab |
The label of the x-axis (defaults to the |
ylab |
The label of the y-axis (defaults to the |
Value
A ggplot2::ggplot object, which may be modified with further ggplot2 commands.
Examples
interaction_line_plot(y = score, x = time, by = treatment,
data = selfesteem2_long, ylim = c(70, 100))
interaction_line_plot(y = score, x = time, by = treatment,
data = selfesteem2_long,
xlab = 'measurement time',
ylab = 'self esteem score',
ylim = c(70, 100))
Job Satisfaction Data for Two-Way ANOVA
Description
Contains the job satisfaction score organized by gender and education level.
This data set was taken from the datarium R package.
Usage
data("jobsatisfaction")
Format
A data frame with 58 rows and 3 columns.
Examples
data(jobsatisfaction)
jobsatisfaction
Paired samples t-test
Description
A wrapper to stats::t.test() with paired = TRUE.
Usage
paired_t_test(y1, y2, data, ...)
Arguments
y1 |
A numeric vector of observations |
y2 |
A numeric vector of observations, where each value of y2 is assumed to be paired, such as by repeated measures, with the corresponding value of y1. |
data |
A data frame with |
... |
Additional arguments passed to |
Value
A list with class "htest" as returned by stats::t.test().
Examples
paired_t_test(y1, y2, data = pairedsleep)
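The underlying test can be reproduced in base R. The sketch below reshapes the built-in datasets::sleep data into two paired vectors and shows that the paired test is the same as a one-sample t-test on the within-patient differences. It assumes that the rows of sleep are ordered by patient within each drug group, and the object names are chosen for illustration.

```r
# Two paired vectors from the built-in sleep data (one value per patient and drug)
y1 <- sleep$extra[sleep$group == 1]
y2 <- sleep$extra[sleep$group == 2]

paired <- t.test(y1, y2, paired = TRUE)
# Equivalent one-sample test on the within-patient differences
one_sample <- t.test(y1 - y2, mu = 0)

paired$statistic
one_sample$statistic
```

Both calls give the same t-statistic, degrees of freedom, and p-value, because pairing reduces the problem to testing whether the mean difference is zero.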
Paired sleep data
Description
Data which show the effect of two soporific drugs (increase in hours of sleep compared to control) on 10 patients.
Usage
pairedsleep
Format
A data frame with 10 observations on the following 3 variables.
- ID
The patient ID.
- y1
The increase in hours, relative to control, for drug 1.
- y2
The increase in hours, relative to control, for drug 2.
Source
This data is a transformed version of datasets::sleep.
A pairs plot
Description
This is a wrapper to the GGally based pairs plot of a list of variables, displayed as scatterplots for pairs of continuous variables, density functions on the diagonal, and boxplots for pairs of continuous and categorical variables. Optionally, a by categorical variable can be provided.
Usage
pairs_plot(variables, data, by = NULL)
Arguments
variables |
A vector of variable names |
data |
The data frame. |
by |
An optional variable, usually categorical (factor or character), by which the data are grouped and coloured. |
Value
A GGally::ggpairs plot.
Examples
# A simple pairs plot
pairs_plot(variables = c("sex_dimorph", "attractive"),
data = faithfulfaces)
# A pairs plot with grouping variable
pairs_plot(variables = c("sex_dimorph", "attractive"),
by = face_sex,
data = faithfulfaces)
Pairwise t-test
Description
This is a wrapper to the pairwise.t.test function. The p-value adjustment is "bonferroni" by default. Other possible values are "holm", "hochberg", "hommel", "BH", "BY", "fdr", "none". See stats::p.adjust().
Usage
pairwise_t_test(formula, data, p_adj = "bonferroni")
Arguments
formula |
A two sided formula with one variable on either side, e.g. y ~
x, where the left hand side, dependent, variable is a numeric variable in
|
data |
A data frame that contains the dependent and independent variables. |
p_adj |
The p-value adjustment method (see Description). |
Value
An object of class pairwise.htest as returned by stats::pairwise.t.test().
Examples
data_df <- dplyr::mutate(vizverb, IV = interaction(task, response))
pairwise_t_test(time ~ IV, data = data_df)
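To show what the default "bonferroni" adjustment does, the following base R sketch calls stats::pairwise.t.test() directly on the built-in PlantGrowth data, once without adjustment and once with the Bonferroni method. The choice of data set and the object names are assumptions made for illustration.

```r
raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "none")
bonf <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                        p.adjust.method = "bonferroni")

n_comparisons <- sum(!is.na(raw$p.value))   # pairwise comparisons among the groups
raw$p.value
bonf$p.value

# Bonferroni: multiply each raw p-value by the number of comparisons, capped at 1
all.equal(bonf$p.value, pmin(raw$p.value * n_comparisons, 1))
```

The other adjustment methods listed in the Description are less conservative than Bonferroni in general, so their adjusted p-values are never larger.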
Recode specified values by new values
Description
Recode specified values by new values
Usage
re_code(x, from, to)
Arguments
x |
A vector, including a column of a data frame |
from |
The set of old values to be replaced by new ones |
to |
The set of new values to replace the old ones |
Value
A vector that is the input vector but with old values replaced by new ones.
Examples
# Replace any occurrence of 1 and 2 with 101 and 201, respectively
x <- c(1, 2, 3, 4, 5, 1, 2)
re_code(x, from = c(1, 2), to = c(101, 201))
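The same kind of value-for-value replacement can be sketched in base R with match(). This is only an illustration of the idea, not the package's implementation. The helper name recode_sketch is hypothetical, and it assumes that from and to have the same length and that to has the same type as x.

```r
# Hypothetical helper (not part of this package): replace old values by new ones.
recode_sketch <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  idx <- match(x, from)      # position of each element of x in `from`, or NA
  hit <- !is.na(idx)         # elements of x that have a replacement
  replace(x, hit, to[idx[hit]])
}

x <- c(1, 2, 3, 4, 5, 1, 2)
recode_sketch(x, from = c(1, 2), to = c(101, 201))
```

Values of x that do not appear in from are left unchanged.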
A two dimensional scatterplot
Description
This function is a wrapper around the typical ggplot command to create two dimensional scatterplots, i.e. using geom_point. It provides the option of colouring points by a third variable, one that is usually, though not necessarily, categorical. Also, it provides the option of placing the line of best fit on the scatterplot. If points are coloured by a categorical variable, then a different line of best fit for each value of the categorical variable is provided.
Usage
scatterplot(
x,
y,
data,
by = NULL,
best_fit_line = FALSE,
xlab = NULL,
ylab = NULL
)
Arguments
x |
A numeric variable in |
y |
A numeric variable in |
data |
A data frame with the |
by |
An optional variable, usually categorical (factor or character), by which the points in the scatterplot are grouped and coloured. |
best_fit_line |
A logical variable indicating if the line of best fit should be shown or not. |
xlab |
The label of the x-axis (defaults to the |
ylab |
The label of the y-axis (defaults to the |
Value
A ggplot2::ggplot object, which may be modified with further ggplot2 commands.
Examples
scatterplot(x = attractive, y = trustworthy, data = faithfulfaces)
scatterplot(x = attractive, y = trustworthy, data = faithfulfaces,
xlab = 'attractiveness', ylab = 'trustworthiness')
scatterplot(x = attractive, y = trustworthy, data = faithfulfaces,
by = face_sex)
scatterplot(x = trustworthy, y = faithful, data = faithfulfaces,
by = face_sex, best_fit_line = TRUE)
Make a scatterplot matrix
Description
Make a scatterplot matrix
Usage
scatterplot_matrix(.data, ..., .by = NULL, .bins = 10)
Arguments
.data |
A data frame |
... |
A comma separated list of tidyselections of columns. This can be as simple as a set of column names. |
.by |
An optional categorical variable by which to group and colour the points. |
.bins |
The number of bins in the histograms on the diagonal of the matrix. |
Value
A GGally::ggpairs plot.
Examples
data_df <- test_psychometrics %>%
total_scores(x = starts_with('x_'),
y = starts_with('y_'),
z = starts_with('z_'))
scatterplot_matrix(data_df, x, y, z)
Age of Onset of Schizophrenia Data
Description
Data on sex differences in the age of onset of schizophrenia.
Usage
schizophrenia
Format
A data frame with 251 observations on the following 2 variables.
- age
Age at the time of diagnosis.
- gender
A categorical variable with values female and male.
Details
A sex difference in the age of onset of schizophrenia was noted by Kraepelin (1919). Subsequently, epidemiological studies of the disorder have consistently shown an earlier onset in men than in women. One model that has been suggested to explain this observed difference is known as the subtype model, which postulates two types of schizophrenia, one characterised by early onset, typical symptoms and poor premorbid competence, and the other by late onset, atypical symptoms, and good premorbid competence. The early onset type is assumed to be largely a disorder of men and the late onset largely a disorder of women.
Source
This data set was taken from the HSAUR R package. From the description in that package, the original is E. Kraepelin (1919), Dementia Praecox and Paraphrenia. Livingstone, Edinburgh.
Self-Esteem Score Data for One-way Repeated Measures ANOVA
Description
The dataset contains the self-esteem scores of 10 individuals at three time points during a specific diet, recorded to determine whether their self-esteem improved.
One-way repeated measures ANOVA can be performed in order to determine the effect of time on the self-esteem score.
This data set was taken from the datarium R package.
Usage
data("selfesteem")
Format
A data frame with 10 rows and 4 columns.
Examples
data(selfesteem)
selfesteem
Self Esteem Score Data for Two-way Repeated Measures ANOVA
Description
Data are the self esteem score of 12 individuals enrolled in 2 successive short-term trials (4 weeks) - control (placebo) and special diet trials.
The self esteem score was recorded at three time points: at the beginning (t1), midway (t2) and at the end (t3) of the trials.
The same 12 participants are enrolled in the two different trials with enough time between trials.
Two-way repeated measures ANOVA can be performed in order to determine whether there is interaction between time and treatment on the self esteem score.
This data set was taken from the datarium R package.
Usage
data("selfesteem2")
Format
A data frame with 24 rows and 5 columns.
Examples
data(selfesteem2)
selfesteem2
Self Esteem Score Data for Two-way Repeated Measures ANOVA: Long format
Description
Data are the self esteem score of 12 individuals enrolled in 2 successive short-term trials (4 weeks) - control (placebo) and special diet trials.
The self esteem score was recorded at three time points: at the beginning (t1), midway (t2) and at the end (t3) of the trials.
The same 12 participants are enrolled in the two different trials with enough time between trials.
Two-way repeated measures ANOVA can be performed in order to determine whether there is interaction between time and treatment on the self esteem score.
This data set was converted from the selfesteem2 data taken from the datarium R package.
Usage
data("selfesteem2_long")
Format
A data frame with 72 rows and 4 columns.
- id
Unique ID of the person
- treatment
Binary variable indicating the treatment condition: Diet or ctr.
- time
A categorical variable indicating the time of measurement: beginning (t1), midway (t2) and at the end (t3)
- score
Self-esteem score
Examples
data(selfesteem2_long)
selfesteem2_long
Shapiro-Wilk normality test
Description
This function is a wrapper around stats::shapiro.test().
It implements the Shapiro-Wilk test that tests the null hypothesis that a sample of values is a sample from a normal distribution.
This function can be applied to single vectors or groups of vectors.
Usage
shapiro_test(y, by = NULL, data)
Arguments
y |
A numeric variable whose normality is being tested. |
by |
An optional grouping variable |
data |
A data frame containing |
Value
A tibble data frame with one row for each value of the by variable, or one row overall if there is no by variable. For the y variable whose normality is being tested, each row gives the Shapiro-Wilk statistic and the corresponding p-value for the subset of values corresponding to that value of the by variable, or for all values if there is no by variable.
Examples
shapiro_test(faithful, data = faithfulfaces)
shapiro_test(faithful, by = face_sex, data = faithfulfaces)
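The grouped version of the test can be sketched in base R by splitting the outcome by the grouping variable and applying stats::shapiro.test() to each subset. This is an illustration only, not the package's implementation. The built-in ToothGrowth data and the column names of the result are assumptions made for illustration.

```r
# Split the outcome by the grouping variable and test each subset separately
groups <- split(ToothGrowth$len, ToothGrowth$supp)
tests <- lapply(groups, shapiro.test)

# Collect the W statistic and p-value into one row per group
results <- data.frame(
  supp = names(tests),
  statistic = sapply(tests, function(tst) unname(tst$statistic)),
  p_value = sapply(tests, function(tst) tst$p.value),
  row.names = NULL
)
results
```

A small p-value in a row is evidence against normality for that group only, so groups should be read one row at a time.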
Descriptive statistics for variables with missing values
Description
Most descriptive statistic functions like base::sum(), base::mean(), stats::median(), etc., do not skip NA values when computing the results and so always return NA if there is at least one NA in the input vector. The NA values can always be skipped by setting the na.rm argument to TRUE. While this is usually simple to do, in some cases, such as when a function is being passed to another function, setting na.rm = TRUE in that function requires creating a new anonymous function. The functions here, which all end in _xna, are wrappers to common statistics functions, but with na.rm = TRUE.
Usage
sum_xna(...)
mean_xna(...)
median_xna(...)
iqr_xna(...)
sd_xna(...)
var_xna(...)
Arguments
... |
Arguments to a descriptive statistic function |
Value
A numeric vector, usually with one element, that provides the result of a descriptive statistics function applied to a vector after the NA values have been removed.
Functions
- mean_xna: The arithmetic mean for vectors with missing values.
- median_xna: The median for vectors with missing values.
- iqr_xna: The interquartile range for vectors with missing values.
- sd_xna: The standard deviation for vectors with missing values.
- var_xna: The variance for vectors with missing values.
Examples
set.seed(10101)
# Make a vector of random numbers
x <- runif(10, min = 10, max = 20)
# Concatenate with a NA value
x1 <- c(NA, x)
sum(x)
sum(x1) # Will be NA
sum_xna(x1) # Will be same as sum(x)
stopifnot(sum_xna(x1) == sum(x))
stopifnot(mean_xna(x1) == mean(x))
stopifnot(median_xna(x1) == median(x))
stopifnot(iqr_xna(x1) == IQR(x))
stopifnot(sd_xna(x1) == sd(x))
stopifnot(var_xna(x1) == var(x))
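The situation described above, where a function is passed to another function, can be sketched in base R as follows. The wrapper mean_skip_na is a hypothetical stand-in written in the spirit of mean_xna, and the built-in airquality data is used only because some of its columns contain NA values.

```r
# Hypothetical wrapper in the spirit of mean_xna(): mean() with na.rm fixed to TRUE
mean_skip_na <- function(...) mean(..., na.rm = TRUE)

cols <- airquality[c("Ozone", "Solar.R", "Wind")]   # Ozone and Solar.R contain NAs

plain <- sapply(cols, mean)                          # NA wherever a column has an NA
with_anonymous <- sapply(cols, function(v) mean(v, na.rm = TRUE))
with_wrapper <- sapply(cols, mean_skip_na)           # same result, no anonymous function

identical(with_anonymous, with_wrapper)
```

The wrapper keeps the call site short, which is the convenience that the _xna functions are meant to provide.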
Independent samples t-test
Description
A wrapper to stats::t.test() with var.equal = TRUE.
Usage
t_test(formula, data)
Arguments
formula |
A two sided formula with one variable on either side, e.g. y ~
x, where the left hand side, dependent, variable is a numeric variable in
|
data |
A data frame that contains the dependent and independent variables. |
Value
A list with class "htest" as returned by stats::t.test().
Examples
t_test(trustworthy ~ face_sex, data = faithfulfaces)
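The effect of var.equal = TRUE can be seen by calling stats::t.test() directly. The sketch below, which uses the built-in ToothGrowth data as an assumption for illustration, compares the equal-variance (Student) test with the Welch test that stats::t.test() runs by default, and checks that the equal-variance test agrees with the slope test in a linear model.

```r
# Student's t-test: assumes equal variances in the two groups
student <- t.test(len ~ supp, data = ToothGrowth, var.equal = TRUE)
# Welch's t-test: the default in stats::t.test(), without the equal-variance assumption
welch <- t.test(len ~ supp, data = ToothGrowth)

student$parameter   # degrees of freedom: n1 + n2 - 2
welch$parameter     # Welch-Satterthwaite degrees of freedom

# The equal-variance test matches the t-statistic of the slope in a linear model
lm_t <- summary(lm(len ~ supp, data = ToothGrowth))$coefficients["suppVC", "t value"]
all.equal(abs(unname(student$statistic)), abs(lm_t))
```

The agreement with lm() is one reason the equal-variance form is convenient when teaching the t-test as a special case of regression.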
Psychometrics raw data for testing or demo purposes
Description
Typical psychometrics raw data files have multiple psychometric variables (scales), each with multiple constituent items.
In this data set, there are three psychometric variables, each with 10 constituent items.
The variables can be labelled x, y, and z.
The constituent items of x, y and z are x_1, x_2 ... x_10, y_1, y_2 ... y_10, z_1, z_2 ... z_10, respectively.
Usage
data('test_psychometrics')
Format
A data frame with 44 rows and 30 columns
Examples
data(test_psychometrics)
test_psychometrics
Calculate the total scores from sets of scores
Description
Calculate the total scores from sets of scores
Usage
total_scores(.data, ..., .method = "mean", .append = FALSE)
Arguments
.data |
A data frame with columns to be summed or averaged over. |
... |
A comma separated set of named tidy selectors, each of which selects a set of columns to which to apply the totalling function. |
.method |
The method used to calculate the total. Must be one of "mean", "sum", or "sum_like". The "mean" is the arithmetic mean, skipping missing values. The "sum" is the sum, skipping missing values. The "sum_like" is the arithmetic mean, again skipping missing values, multiplied by the number of elements, including missing values. |
.append |
Logical. If FALSE, just the totals are returned. If TRUE, the totals are appended as new columns to the original data frame. |
Value
A new data frame with columns representing the total scores.
Examples
# Calculate the mean of all items beginning with `x_` and separately all items beginning with `y_`
total_scores(test_psychometrics, x = starts_with('x'), y = starts_with('y'))
# Calculate the sum of all items beginning with `z_` and separately all items beginning with `x_`
total_scores(test_psychometrics, .method = 'sum', z = starts_with('z'), x = starts_with('x_'))
# Calculate the mean of all items from `x_1` to `y_10`
total_scores(test_psychometrics, xy = x_1:y_10)
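The difference between the three .method options can be illustrated with a small base R sketch that follows the definitions given in the Arguments section. It is not the package's implementation, and the item scores and object names are made up for illustration.

```r
# Illustrative item scores for three respondents on a four-item scale
items <- rbind(
  a = c(1, 2, NA, 3),
  b = c(4, 4, 4, 4),
  c = c(NA, NA, 5, 1)
)

mean_score <- rowMeans(items, na.rm = TRUE)    # "mean": skip missing values
sum_score <- rowSums(items, na.rm = TRUE)      # "sum": skip missing values
sum_like_score <- mean_score * ncol(items)     # "sum_like": mean times number of items

cbind(mean_score, sum_score, sum_like_score)
```

For a respondent with no missing items, "sum" and "sum_like" agree. For respondents with missing items, "sum_like" rescales the total as if every item had been answered at the respondent's own average.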
A Tukey box-and-whisker plot
Description
This function is a wrapper around a typical ggplot based box-and-whisker plot, i.e. using geom_boxplot, which implements the Tukey variant of the box-and-whisker plot. The y variable is the outcome variable whose distribution is represented by the box-and-whisker plot. If the x variable is missing, then a single box-and-whisker plot using all values of y is shown. If an x variable is used, this is used as the independent variable and one box-and-whisker plot is provided for each set of y values that correspond to each unique value of x. For this reason, x is usually a categorical variable. If x is a continuous numeric variable, it ideally should have relatively few unique values, so that each value of x corresponds to a sufficiently large set of y values.
Usage
tukeyboxplot(
y,
x,
data,
by = NULL,
jitter = FALSE,
box_width = 1/3,
jitter_width = 1/5,
xlab = NULL,
ylab = NULL
)
Arguments
y |
The outcome variable |
x |
The optional independent/predictor/grouping variable |
data |
The data frame with the |
by |
An optional variable, usually categorical (factor or character), by which the points in the box-and-whisker plots are grouped and coloured. |
jitter |
A logical variable, defaulting to |
box_width |
The width of box in each box-and-whisker plot. The default
used, |
jitter_width |
The width of the jitter relative to box width. For
example, set |
xlab |
The label of the x-axis (defaults to the |
ylab |
The label of the y-axis (defaults to the |
Value
A ggplot2::ggplot object, which may be modified with further ggplot2 commands.
Examples
# A single box-and-whisker plot
tukeyboxplot(y = time, data = vizverb)
# One box-and-whisker plot for each value of a categorical variable
tukeyboxplot(y = time, x = task, data = vizverb)
# Box-and-whisker plots with jitters
tukeyboxplot(y = time, x = task, data = vizverb, jitter = TRUE)
# `tukeyboxplot` can be used with a continuous numeric variable too
tukeyboxplot(y = len, x = dose, data = ToothGrowth)
tukeyboxplot(y = len, x = dose, data = ToothGrowth,
by = supp, jitter = TRUE, box_width = 0.5, jitter_width = 1)
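The Tukey rule behind these plots can be inspected numerically with grDevices::boxplot.stats() in base R. The sketch below uses made-up data with one extreme value. Note that this is an illustration of the rule rather than of geom_boxplot itself, which computes the box edges as quartiles and so may differ slightly from the hinges that boxplot.stats() reports.

```r
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)   # illustrative data with one extreme value

bs <- boxplot.stats(y, coef = 1.5)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # points more than 1.5 box lengths beyond the hinges, drawn individually
```

The whiskers stop at the most extreme observations that are still within 1.5 box lengths of the hinges, so the value 50 is reported as an outlying point instead of stretching the upper whisker.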
Visual versus Verbal Perception and Responses
Description
An experiment studying the interaction between visual versus verbal perception and visual versus verbal responses.
Usage
vizverb
Format
A data frame with 80 observations on the following 5 variables.
- subject
Subject identifying number (s1 to s20)
- task
Describe a diagram (visual) or a sentence (verbal)
- response
Point response (visual) or say response (verbal)
- time
Response time (in seconds)
Details
Subjects carried out two kinds of tasks. One task was visual (describing a diagram), and the other was classed as verbal (reading and describing a sentence). They reported the results either by pointing (a "visual" response), or speaking (a "verbal" response). Time to complete each task was recorded in seconds.
Source
This data set was taken from the Stats2Data R package. From the description in that package, the original data appear to have been collected in a Mount Holyoke College psychology class based replication of an experiment by Brooks, L., R. (1968) "Spatial and verbal components of the act of recall," Canadian J. Psych. V 22, pp. 349 - 368.