Type: Package
Title: Interactive Tutorials and Data for "Discovering Statistics Using R and RStudio"
Version: 0.2.2
Language: en-GB
Maintainer: Andy Field <andyf@sussex.ac.uk>
Description: Interactive 'R' tutorials and datasets for the textbook Field (2026), "Discovering Statistics Using R and RStudio", https://www.discovr.rocks/. Interactive tutorials cover general workflow in 'R' and 'RStudio', summarizing data, visualizing data, fitting models and bias, correlation, the general linear model (GLM), moderation, mediation, missing values, comparing means using the GLM (analysis of variance), comparing adjusted means (analysis of covariance), factorial designs, repeated measures designs, exploratory factor analysis (EFA). There are no functions, only datasets and interactive tutorials.
License: GPL-3
URL: https://www.discovr.rocks, https://github.com/profandyfield/discovr
BugReports: https://github.com/profandyfield/discovr/issues
Depends: learnr (≥ 0.11.4), R (≥ 4.2.0)
Imports: ggplot2, glue, grDevices, scales
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.3.2
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-06-04 16:11:19 UTC; andyfield
Author: Andy Field [aut, cre, cph]
Repository: CRAN
Date/Publication: 2025-06-06 13:00:10 UTC

discovr: Resources for Discovering Statistics Using R and RStudio (Field, 2023)

Description

The discovr package contains interactive learnr tutorials and datasets that accompany my textbook Discovering Statistics Using R and RStudio.

Who is the package aimed at?

Anyone teaching from or reading Discovering Statistics Using R and RStudio should find these resources useful.

Interactive tutorials

Getting started:

I recommend working through this tutorial on how to install, set up and work within R and RStudio before starting the interactive tutorials.

Running a tutorial:

To run a tutorial, execute:

learnr::run_tutorial("name_of_tutorial", package = "discovr")

Replace name_of_tutorial with the name of the tutorial (shown in bold below). For example, to load the tutorial discovr_02, execute:

learnr::run_tutorial("discovr_02", package = "discovr")
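If the tutorial names are not to hand, they can be looked up from within R. The following is a minimal sketch, assuming the learnr and discovr packages are installed; list_tutorials() is a hypothetical helper written for this illustration, wrapping learnr::available_tutorials().

```r
# Hypothetical helper (not part of discovr or learnr): returns the table of
# tutorials shipped with a package, or NULL when learnr or the package is
# not installed.
list_tutorials <- function(pkg = "discovr") {
  if (!requireNamespace("learnr", quietly = TRUE) ||
      !requireNamespace(pkg, quietly = TRUE)) {
    return(NULL)
  }
  learnr::available_tutorials(package = pkg)
}

# Print the available tutorial names (NULL if the packages are missing).
list_tutorials()
```

Any name listed can then be passed as the first argument to learnr::run_tutorial().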

Workflow:

The tutorials are self-contained (you practise code in code boxes) so you don't need to use RStudio at the same time. However, to get the most from them I would recommend that you create an RStudio project and within that open (and save) a new R Markdown file each time you work through a tutorial. Within that Markdown file, replicate parts of the code from the tutorial (in code chunks) and use Markdown to write notes about what you have done, and to reflect on things that you have struggled with, or note useful tips to help you remember things. Basically, write a learning journal. This workflow has the advantage of not just teaching you the code that you need to do certain things, but also giving you practice in using RStudio itself.

Datasets

See the book or data descriptions for more details. This is a list of available datasets within the package. Raw CSV files are available from the book's website.
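The dataset names can also be listed from within R. This is a minimal sketch using only base R; list_package_datasets() is a hypothetical helper written for this illustration and is not part of discovr.

```r
# Hypothetical helper: list the datasets bundled with an installed package.
# Returns a character vector of dataset names, or character(0) if the
# package is not installed.
list_package_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  unname(utils::data(package = pkg)$results[, "Item"])
}

list_package_datasets("discovr")

# The package sets LazyData: true, so once it is installed a dataset can be
# inspected directly, for example:
# head(discovr::acdc)
```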

Smart Alex solutions

Solutions for end of chapter tasks are available at www.discovr.rocks/solutions/alex/.

Labcoat Leni solutions

Solutions for the Labcoat Leni tasks are available at www.discovr.rocks/solutions/leni/.

Chapter code

Although I recommend working through the interactive solutions, each book chapter has online code and a downloadable R Markdown file available from www.discovr.rocks/solutions/leni/.

Colour palettes

A colour blind-friendly palette based on Okabe and Ito, plus colour themes based around the studio albums of my favourite band, Iron Maiden. If you're wondering why some albums are missing, here's the explanation: X Factor (would basically be 8 shades of grey), Fear of the Dark (terrible album), The Book of Souls (would be 8 shades of black). The following palettes exist.

References

Author(s)

Maintainer: Andy Field andyf@sussex.ac.uk [copyright holder]

See Also

Useful links:


Oxoby (2008) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

acdc

Format

A tibble with 36 rows and 2 variables.

Details

AC/DC are one of the best-selling hard rock bands in history, with around 100 million certified sales, and an estimated 200 million actual sales. In 1980 their original singer Bon Scott died of alcohol poisoning and choking on his own vomit. He was replaced by Brian Johnson who has been their singer ever since. Debate rages with unerring frequency within the rock music press over who is the better frontman. The conventional wisdom is that Bon Scott was better although personally, and I seem to be somewhat in the minority here, I prefer Brian Johnson. Anyway, Robert Oxoby in a playful paper decided to put this argument to bed once and for all (Oxoby, 2008). He used a task from experimental economics called the ultimatum game, in which individuals are assigned the role of either proposer or responder and paired randomly. Proposers are allocated $10 from which they have to make a financial offer to the responder (e.g., $2). The responder can accept or reject this offer. If the offer is rejected neither party gets any money, but if the offer is accepted the responder keeps the offered amount (e.g., $2), and the proposer keeps the original amount minus what they offered (e.g., $8). For half of the participants the song 'It's a long way to the top' sung by Bon Scott was playing in the background, for the remainder 'Shoot to thrill' sung by Brian Johnson was playing. Oxoby measured the offers made by proposers, and the minimum offers that responders accepted (called the minimum acceptable offer). He reasoned that people would accept lower offers and propose higher offers when listening to something they like (because of the 'feel-good factor' the music creates). Therefore, by comparing the value of offers made and the minimum acceptable offers in the two groups he could see whether people have more of a feel-good factor when listening to Bon or Brian. There were 18 people per group.
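The payoff rule of the ultimatum game described above can be sketched in a few lines of R. The function name ultimatum_payoff() and its arguments are hypothetical, chosen for this illustration; they do not come from the package or from Oxoby (2008).

```r
# Payoffs for one round of the ultimatum game.
# `pot` is the amount allocated to the proposer (10 dollars in the study),
# `offer` is what the proposer offers, `accepted` is the responder's decision.
ultimatum_payoff <- function(offer, accepted, pot = 10) {
  stopifnot(is.numeric(offer), offer >= 0, offer <= pot, is.logical(accepted))
  if (accepted) {
    # Responder keeps the offer; proposer keeps the remainder.
    c(proposer = pot - offer, responder = offer)
  } else {
    # A rejected offer leaves both parties with nothing.
    c(proposer = 0, responder = 0)
  }
}

ultimatum_payoff(offer = 2, accepted = TRUE)   # proposer keeps 8, responder gets 2
ultimatum_payoff(offer = 2, accepted = FALSE)  # both get 0
```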

These data are approximated from graphs within Oxoby (2008). The object contains the following variables:

Source

www.discovr.rocks/csv/acdc.csv

References


Album sales data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

album_sales

Format

A tibble with 200 rows and 5 variables.

Details

Fictitious data that imagines a world where I have a cool job in the music industry. Except, it's not that cool because my job is to predict album sales (broadly defined in some way that accounts for physical sales, streams and digital sales). In my little fantasy I collect data from 200 releases (albums). For each one, I have information about the amount spent advertising the album, the number of sales, the number of plays per week that songs from the album had on radio, and a rating of the image of the band. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/album_sales.csv


Alien scents

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

alien_scents

Format

A tibble with 50 rows and 4 variables.

Details

The aliens, excited by humans' apparent inability to train sniffer dogs to detect them (see sniffer_dogs), decided to move their invasion plan forward. Aliens are far too wedded to p-values in small samples. They decided that they could make themselves even harder to detect by fooling the sniffer dogs by masking their alien smell. After extensive research they agreed that the two most effective masking scents would be human pheromones (which they hoped would make them smell human-like) and fox pheromones (because they are a powerful, distracting smell for dogs). The aliens started smearing themselves with humans and foxes and prepared to invade. Meanwhile, the top-secret government agency for Training Extra-terrestrial Reptile Detection (TERD) had got wind of their plan and set about testing how effective it would be. They trained 50 sniffer dogs. During training, these dogs were rewarded for making vocalizations while sniffing alien space lizards. On the test trials, the 50 dogs were allowed to sniff 9 different entities for 1 minute each: 3 alien space lizards, 3 shapeshifting alien space lizards who had taken on humanoid form, and 3 humans. Within each type of entity, 1 had no masking scent, 1 was smothered in human pheromones and 1 wore fox pheromones. The number of vocalizations made during each 1-minute sniffing session was recorded.

Source

www.discovr.rocks/csv/alien_scents.csv


A Matter of Life and Death palette

Description

Colour palette based on Iron Maiden's A Matter of Life and Death album sleeve.

Usage

amolad_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_amolad(n, type = "discrete", reverse = FALSE, ...)

scale_colour_amolad(n, type = "discrete", reverse = FALSE, ...)

scale_fill_amolad(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(amolad_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_amolad()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_amolad()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_amolad()


Video games and aggression example 1

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

angry_pigs

Format

A tibble with 336 rows and 4 variables.

Details

Angry Birds is a video game in which you fire birds at pigs. A (fabricated) study was set up in which people played Angry Birds and a control game (Tetris) over a 2-year period (1 year per game). They were put in a pen of pigs for a day before the study, and after 1 month, 6 months and 12 months. Their violent acts towards the pigs were counted. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/speed_date.csv


Video games and aggression example 2

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

angry_real

Format

A tibble with 504 rows and 4 variables.

Details

Angry Birds is a video game in which you fire birds at pigs. A (fabricated) study was set up in which people played Angry Birds and a control game (Tetris) over a 2-year period (1 year per game). The participants' violent acts in everyday life were monitored before the study, and after 1 month, 6 months and 12 months. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/speed_date.csv


Animal bride data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

animal_bride

Format

A tibble with 20 rows and 3 variables.

Details

Fictitious data inspired by two news stories that I enjoyed. The first was about a Sudanese man who was forced to marry a goat after being caught having sex with it. I'm not sure he treated the goat to a nice dinner in a posh restaurant before taking advantage of her, but either way you have to feel sorry for the goat. I'd barely had time to recover from that story when another appeared about an Indian man forced to marry a dog to atone for stoning two dogs and stringing them up in a tree 15 years earlier. Why anyone would think it's a good idea to enter a dog into matrimony with a man with a history of violent behaviour towards dogs is beyond me. Still, I wondered whether a goat or dog made a better spouse. I found (but not really) some other people who had been forced to marry goats and dogs and measured their life satisfaction and, also, how much they like animals. The data contains the following variables:

Source

www.discovr.rocks/csv/animal_bride.csv


Dancing cats and dogs data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

animal_dance

Format

A tibble with 270 rows and 3 variables.

Details

Fictional data about dancing cats and dogs. A researcher was interested in whether animals could be trained to dance. He took 200 cats and 70 dogs and tried to train them to line-dance by giving them either food or affection as a reward for dance-like behaviour. At the end of the week he counted how many animals could line-dance and how many could not. The object contains the following variables:

Source

www.discovr.rocks/csv/animal_dance.csv


Beckham (1929) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

beckham_1929

Format

A tibble with 16 rows and 5 variables.

Details

During my psychology degree I spent a lot of time reading about the civil rights movement in the USA. Instead of reading psychology, I read about Malcolm X and Martin Luther King Jr. For this reason I find Beckham's 1929 study of black Americans a fascinating historical piece of research. Beckham was a black American who founded the psychology laboratory at Howard University, Washington, DC and his wife Ruth was the first black woman ever to be awarded a PhD (also in psychology) at the University of Minnesota. To put some context on the study, it was published 35 years before the Jim Crow laws were finally overthrown by the Civil Rights Act of 1964, and in a time when black Americans were segregated, openly discriminated against and victims of the most abominable violations of civil liberties and human rights (I recommend James Baldwin's superb The Fire Next Time for an insight into the times). The language of the study and the data from it are an uncomfortable reminder of the era in which it was conducted.

Beckham sought to measure the psychological state of 3443 black Americans with three questions. He asked them to answer yes or no to whether they thought black Americans were happy, whether they personally were happy as a black American, and whether black Americans should be happy. Beckham did no formal statistical analysis of his data (Fisher's article containing the popularized version of the chi-square test was published only 7 years earlier in a statistics journal that would not have been read by psychologists). I love this study, because it demonstrates that you do not need elaborate methods to answer important and far-reaching questions; with just three questions, Beckham told the world an enormous amount about very real and important psychological and sociological phenomena. These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/beckham_1929.csv

References


Big hairy spider data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

big_hairy_spider

Format

A tibble with 24 rows and 3 variables.

Details

Is arachnophobia (fear of spiders) specific to real spiders or will pictures of spiders evoke similar levels of anxiety? Twelve arachnophobes were asked to play with a big hairy tarantula with big fangs and an evil look in its eight eyes and at a different point in time were shown only photos of the same spider. The participants' anxiety was measured in each case. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/big_hairy_spider.csv


The biggest liar data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

biggest_liar

Format

A tibble with 68 rows and 4 variables.

Details

Fictional data based on the World's Biggest Liar competition held annually at the Santon Bridge Inn in Wasdale (in the Lake District, UK). Each year locals are encouraged to attempt to tell the biggest lie in the world. I wanted to test a theory that more creative people will be able to create taller tales. I gathered together 68 past contestants from this competition and noted where they were placed in the competition (first, second, third, etc.); I also gave them a creativity questionnaire (maximum score 60). The data set has four variables

Source

www.discovr.rocks/csv/biggest_liar.csv


Brave New World palette

Description

Colour palette based on Iron Maiden's Brave New World album sleeve.

Usage

bnw_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_bnw(n, type = "discrete", reverse = FALSE, ...)

scale_colour_bnw(n, type = "discrete", reverse = FALSE, ...)

scale_fill_bnw(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(bnw_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_bnw()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_bnw()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_bnw()

Bronstein et al. (2019) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

bronstein_2019

Format

A tibble with 947 rows and 5 variables.

Details

The rapid increase in 'fake news' and misinformation is a worrying trend in recent years. Perhaps more worrying is how widely some of this news is taken as fact. Researchers have started to look at what characteristics predict susceptibility to fake news. Bronstein et al. (2019) hypothesised that delusion-prone individuals may be more likely to believe fake news because of their tendency to engage in less analytic and open-minded thinking. They conducted two online studies that got merged into a single analysis to test this hypothesis. This object is a subset of variables from their data (I have changed the variable names to match the constructs measured rather than the scales used to measure them). The full dataset is available at doi:10.1016/j.jarmac.2018.09.005.

Source

www.discovr.rocks/csv/bronstein_2019.csv

References


Bronstein et al. (2019) data with missing values inserted

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

bronstein_miss_2019

Format

A tibble with 947 rows and 5 variables.

Details

A version of the Bronstein et al. (2019) fake news data (bronstein_2019) but with missing values inserted using MCAR amputation (with the help of the mice package and ampute() function). For details of variables see bronstein_2019.
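To make the idea of MCAR (missing completely at random) concrete, here is a minimal base-R sketch in which every cell has the same chance of being set to NA, regardless of its value. This is not the ampute() call used to build the dataset; make_mcar(), its prop argument and the toy data are all hypothetical and exist only for illustration.

```r
# Hypothetical helper: delete values completely at random.
# Each cell is set to NA with probability `prop`, independently of its own
# value and of every other variable (the defining property of MCAR).
make_mcar <- function(data, prop = 0.1) {
  stopifnot(is.data.frame(data), prop >= 0, prop <= 1)
  for (col in names(data)) {
    drop <- stats::runif(nrow(data)) < prop
    data[[col]][drop] <- NA
  }
  data
}

set.seed(42)
toy <- data.frame(x = rnorm(100), y = rnorm(100))  # made-up data
toy_miss <- make_mcar(toy, prop = 0.2)

# Proportion of missing values in each column (close to 0.2 on average).
colMeans(is.na(toy_miss))
```

The ampute() function in the mice package gives finer control, for example over missing-data patterns, than this cell-wise sketch does.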

Source

www.discovr.rocks/csv/bronstein_miss_2019.csv

References


Dancing cats data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

cat_dance

Format

A tibble with 200 rows and 2 variables.

Details

Fictional data about dancing cats. A researcher was interested in whether animals could be trained to dance. He took 200 cats and tried to train them to line-dance by giving them either food or affection as a reward for dance-like behaviour. At the end of the week he counted how many animals could line-dance and how many could not. The object contains the following variables:

Source

www.discovr.rocks/csv/cat_dance.csv


Cat regression data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

cat_reg

Format

A tibble with 200 rows and 7 variables.

Details

Fictional data illustrating how the chi-square test is a linear model. It's about line dancing cats. The object contains the following variables:
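One way to see the link between the chi-square test and a linear model is to fit a Poisson (log-linear) model without the interaction term and compare it with chisq.test(). This is a sketch under stated assumptions: the counts below are made up for illustration and are not the cat_reg data, and the book's own treatment may differ in detail.

```r
# Hypothetical counts (NOT the cat_reg data): reward type by dance outcome.
counts <- data.frame(
  reward = factor(c("food", "food", "affection", "affection")),
  dance  = factor(c("yes", "no", "yes", "no")),
  n      = c(30, 20, 40, 110)
)

# Pearson chi-square test on the 2 x 2 table (no continuity correction).
tab <- xtabs(n ~ reward + dance, data = counts)
chi <- chisq.test(tab, correct = FALSE)

# Log-linear model with main effects only, i.e. the independence model.
indep <- glm(n ~ reward + dance, family = poisson, data = counts)

# The model's fitted counts match the expected counts from the chi-square test.
expected_from_glm <- xtabs(fitted(indep) ~ reward + dance, data = counts)
all.equal(as.vector(expected_from_glm), as.vector(chi$expected), tolerance = 1e-6)

# The sum of squared Pearson residuals matches the chi-square statistic.
sum(residuals(indep, type = "pearson")^2)
unname(chi$statistic)
```

The chi-square statistic therefore measures how badly the model without the interaction fits, which is why the test can be framed as a question about the interaction term in a linear model.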

Source

www.discovr.rocks/csv/cat_regression.csv


Catterplot data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

catterplot

Format

A tibble with 78 rows and 2 variables.

Details

Fictional data for plotting a catterplot. The object contains the following variables:

Source

www.discovr.rocks/csv/catterplot.csv


Cetinkaya and Domjan (2006) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

cetinkaya_2006

Format

A tibble with 59 rows and 6 variables.

Details

Some quail develop fetishes. Really. In studies where a terrycloth object acts as a sign that a mate will shortly become available, some quail start to direct their sexual behaviour towards the terrycloth object. In evolutionary terms, this fetishistic behaviour seems counterproductive because sexual behaviour becomes directed towards something that cannot provide reproductive success. However, perhaps this behaviour serves to prepare the organism for the 'real' mating behaviour.

Cetinkaya and Domjan (2006) sexually conditioned male quail. All quail experienced the terrycloth stimulus and an opportunity to mate, but for some the terrycloth stimulus immediately preceded the mating opportunity (paired group) whereas others experienced a 2-hour delay (this acted as a control group because the terrycloth stimulus did not predict a mating opportunity). In the paired group, quail were classified as fetishistic or not depending on whether they engaged in sexual behaviour with the terrycloth object.

During a test trial the quail mated with a female and the researchers measured the percentage of eggs fertilized, the time spent near the terrycloth object, the latency to initiate copulation, and copulatory efficiency. If this fetishistic behaviour provides an evolutionary advantage then we would expect the fetishistic quail to fertilize more eggs, initiate copulation faster and be more efficient in their copulations. These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/cetinkaya_2006.csv

References


Chamorro-Premuzic et al. (2008) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

chamorro_premuzic

Format

A tibble with 430 rows and 12 variables.

Details

There is some evidence that students tend to pick courses of lecturers they perceive to be enthusiastic and good communicators. In a fascinating study, Tomas Chamorro-Premuzic and his colleagues (Chamorro-Premuzic, Furnham, Christopher, Garwood, & Martin, 2008) tested the hypothesis that students tend to like lecturers who are like themselves. The authors measured students' own personalities using a very well-established measure (the NEO-FFI) which measures five fundamental personality traits: neuroticism, extroversion, openness to experience, agreeableness and conscientiousness. Students also completed a questionnaire in which they were given descriptions (e.g., 'warm: friendly, warm, sociable, cheerful, affectionate, outgoing') and asked to rate how much they wanted to see this in a lecturer from -5 (I don't want this characteristic at all) through 0 (the characteristic is not important) to +5 (I really want this characteristic in my lecturer). The characteristics were the same as those measured by the NEO-FFI. As such, the authors had a measure of how much a student had each of the five core personality characteristics, but also a measure of how much they wanted to see those same characteristics in their lecturer. These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/chamorro_premuzic.csv

References


Child aggression data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

child_aggression

Format

A tibble with 666 rows and 6 variables.

Details

A study was carried out to explore the relationship between aggression and several potential predicting factors in 666 children who had an older sibling. Variables measured were parenting_style (high score = bad parenting practices), computer_games (high score = more time spent playing computer games), television (high score = more time spent watching television), diet (high score = the child has a good diet low in harmful additives), and sibling_aggression (high score = more aggression seen in their older sibling). Past research indicated that parenting style and sibling aggression were good predictors of the level of aggression in the younger child. The data contain the following variables:

Source

www.discovr.rocks/csv/child_aggression.csv


Coldwell, Pike and Dunn (2006) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

coldwell_2006

Format

A tibble with 118 rows and 9 variables.

Details

Coldwell, Pike and Dunn (2006) investigated whether household chaos predicted children's problem behaviour over and above parenting. From 118 families they recorded the age and gender of the youngest child (child_age and child_gender). They measured dimensions of the child's perceived relationship with their mum: (1) warmth/enjoyment (child_warmth), and (2) anger/hostility (child_anger). Higher scores indicate more warmth/enjoyment and anger/hostility respectively. They measured the mum's perceived relationship with her child, resulting in dimensions of positivity (mum_pos) and negativity (mum_neg). Household chaos (chaos) was assessed. The outcome variable was the child's adjustment (sdq): the higher the score, the more problem behaviour the child was reported to be displaying. These data are from this study. The data contain the following variables:

Source

www.discovr.rocks/csv/coldwell_2006.csv


Cosmetic surgery data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

cosmetic

Format

A tibble with 1376 rows and 7 variables.

Details

Fictitious example based on quality of life predicted from undergoing cosmetic surgery. Cosmetic surgery is on the increase. For example, in the USA, there was a 1600% increase in cosmetic surgical and non-surgical treatments between 1992 and 2002. There are two main reasons to have cosmetic surgery: (1) to help a physical problem; and (2) to change your external appearance when there is no underlying physical pathology. This example uses fictitious data to look at the effects of cosmetic surgery on quality of life. The variables in the data are:

Source

www.discovr.rocks/csv/cosmetic.csv


Daniels (2012) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

daniels_2012

Format

A tibble with 4 rows and 7 variables.

Details

Women (and increasingly men) are bombarded with 'idealized' images in the media and there is a growing concern about how these images affect our perceptions of ourselves. Daniels (2012) showed young women images of successful female athletes (e.g., Anna Kournikova) in which they were either playing sport (performance athlete images) or posing in bathing suits (sexualized images). Participants completed a short writing exercise after viewing these images. Each participant saw only one type of image, but several examples. Daniels then coded these written exercises and identified themes, one of which was whether women self-objectified (i.e., commented on their own appearance/attractiveness). Daniels hypothesized that women who viewed the sexualized images (n = 140) would self-objectify (i.e., this theme would be present in what they wrote) more than those who viewed the performance athlete pictures (n = 117, despite what the participants section of the paper implies). These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/daniels_2012.csv


Subliminal messages data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

dark_lord

Format

A tibble with 64 rows and 3 variables.

Details

Both Ozzy Osbourne and Judas Priest have been accused of putting backward masked messages on their albums that subliminally influence poor unsuspecting teenagers into doing things like blowing their heads off with shotguns. A psychologist was interested in whether backward masked messages could have an effect. He created a version of Taylor Swift's 'Shake It Off' that contained the masked message 'deliver your soul to the dark lord' repeated in the chorus. He took this version, and the original, and played one version (randomly) to a group of 32 people. Six months later he played them whatever version they hadn't heard the time before. So, each person heard both the original and the version with the masked message, but at different points in time. The psychologist measured the number of satanic intrusions the person had in the week after listening to each version. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/dark_lord.csv


Davey et al. (2003) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

davey_2003

Format

A tibble with 60 rows and 4 variables.

Details

Many of us have experienced that feeling after we have left the house of wondering whether we remembered to lock the door, close the window, or remove the bodies from the fridge in case the police turn up. However, some people with obsessive compulsive disorder (OCD) check things so excessively that they might, for example, take hours to leave the house. One theory is that this checking behaviour is caused by the mood you are in (positive or negative) interacting with the rules you use to decide when to stop a task (do you continue until you feel like stopping, or until you have done the task as best as you can?). Davey et al. (2003) tested this hypothesis by asking participants to think of as many things as they could that they should check before going on holiday (checks) after putting them into a negative, positive or neutral mood (mood). Within each mood group, half of the participants were instructed to generate as many items as they could, whereas the remainder were asked to generate items for as long as they felt like continuing the task (stop_rule). These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/davey_2003.csv
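
Examples

The hypothesis described above is about an interaction between mood and stop rule, so a natural starting point is a two-way factorial model. The sketch below uses the column names given in the description (checks, mood, stop_rule); fit_checking_model() is a hypothetical helper, not part of discovr.

```r
# Two-way factorial model for data shaped like discovr::davey_2003.
fit_checking_model <- function(data) {
  data <- as.data.frame(data)
  # Treat both predictors as categorical
  data$mood <- factor(data$mood)
  data$stop_rule <- factor(data$stop_rule)
  # Main effects of mood and stop rule plus their interaction
  lm(checks ~ mood * stop_rule, data = data)
}

# Only runs if discovr is installed
if (requireNamespace("discovr", quietly = TRUE)) {
  checking_model <- fit_checking_model(discovr::davey_2003)
  # Sequential (Type I) sums of squares; the order of terms matters
  # only if the cell sizes are unequal
  print(anova(checking_model))
}
```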


DF beta data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

df_beta

Format

A tibble with 30 rows and 3 variables.

Details

Fictitious data to illustrate the DF Beta statistic. The tibble contains the following variables:

Source

www.discovr.rocks/csv/df_beta.csv


Dance of Death palette

Description

Colour palette based on Iron Maiden's Dance of Death album sleeve.

Usage

dod_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_dod(n, type = "discrete", reverse = FALSE, ...)

scale_colour_dod(n, type = "discrete", reverse = FALSE, ...)

scale_fill_dod(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(discovr)
library(ggplot2)
library(scales)

# Preview eight colours from the palette
show_col(dod_pal()(8))

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_dod()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_dod()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_dod()

Dog training data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

dog_training

Format

A tibble with 668 rows and 3 variables.

Details

Fictional data about dogs being trained to vocalize whenever they sniff an alien life form. Essentially dogs were trained using food rewards. On each trial they sniffed an alien and if they made a vocalization they were rewarded with food. These data show how vocalizations change over blocks of these training trials. The tibble contains the following variables:

Source

www.discovr.rocks/csv/dog_training.csv


Download festival data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

download

Format

A tibble with 810 rows and 5 variables.

Details

Fictional data about people stinking at music festivals. A biologist was worried about the potential health effects of music festivals. She went to the Download Music Festival and measured the hygiene of 810 concert-goers over the three days of the festival. She tried to measure every person on every day but, because it was difficult to track people down, there were missing data on days 2 and 3. Hygiene was measured using a standardized technique that results in a score ranging between 0 (you smell like a corpse that's been left to rot up a skunk's arse) and 4 (you smell of sweet roses on a fresh spring day). I know from bitter experience that sanitation is not always great at these places and so the biologist predicted that personal hygiene would go down dramatically over the three days of the festival. The object contains the following variables:

Source

www.discovr.rocks/csv/download_festival.csv


Iron Maiden Spotify song features data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

eddiefy

Format

A tibble with 173 rows and 17 variables.

Details

A dataset containing the song features data from the Spotify API for the studio albums (1980-2015) of the greatest band ever, Iron Maiden. Data were obtained using the spotifyr package.

Source

www.discovr.rocks/csv/eddiefy.csv
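
Examples

A quick way to get a feel for the song features is to average them within each album. The sketch below uses only the columns that appear in the palette examples elsewhere in this manual (energy, valence, album_name); summarise_albums() is a hypothetical helper, not part of discovr.

```r
# Mean energy and valence per album for data shaped like discovr::eddiefy.
summarise_albums <- function(data) {
  # cbind() on the left-hand side averages both features within each album
  aggregate(cbind(energy, valence) ~ album_name, data = data, FUN = mean)
}

# Only runs if discovr is installed
if (requireNamespace("discovr", quietly = TRUE)) {
  album_means <- summarise_albums(discovr::eddiefy)
  # Albums ordered from lowest to highest mean valence
  print(album_means[order(album_means$valence), ])
}
```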


Eel data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

eel

Format

A tibble with 113 rows and 4 variables.

Details

Lo, Wong, Leung, Law, and Yip (2004) describe a case of a 50-year-old man who reported to the emergency department of a hospital with abdominal pain. An X-ray of the man's abdomen revealed the shadow of an eel. The patient claimed that he inserted the eel to 'relieve constipation'. I'm no medic, but this 'remedy' appears counterintuitive. However, it is an empirical question.

To test the hypothesis that an eel might cure constipation, we could do a randomized controlled trial. Our outcome variable would be 'cured' vs. 'not cured'. The main predictor variable would be the intervention condition (eel treatment arm vs. waiting list/no treatment arm). We might also factor in how many days the patient had been constipated before treatment (a proxy of symptom severity). The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/eel.csv
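
Examples

Because the outcome is binary ('cured' vs. 'not cured'), the trial described above would typically be analysed with logistic regression. The sketch below is a minimal version. The column names cured, intervention and duration are assumptions, because the variable list is not shown here, and fit_eel_model() is a hypothetical helper, not part of discovr.

```r
# Logistic regression for data shaped like discovr::eel.
# ASSUMPTION: columns are named `cured`, `intervention` and `duration`.
fit_eel_model <- function(data) {
  data <- as.data.frame(data)
  # For a factor outcome, glm() treats the first level as "failure"
  # and all other levels as "success"
  data$cured <- factor(data$cured)
  glm(cured ~ intervention + duration, data = data, family = binomial())
}

# Only runs if discovr is installed
if (requireNamespace("discovr", quietly = TRUE)) {
  eel_model <- fit_eel_model(discovr::eel)
  # Exponentiated coefficients are odds ratios
  print(exp(coef(eel_model)))
}
```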


Elephant football data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

elephooty

Format

A tibble with 120 rows and 4 variables.

Details

Fictional data about elephant football. The highlight of the elephant calendar is the annual elephant soccer event in Nepal. A heated argument burns between the African and Asian elephants. In 2010, the president of the Asian Elephant Football Association, an elephant named Boji, claimed that Asian elephants were more talented than their African counterparts. The head of the African Elephant Soccer Association, an elephant called Tunc, issued a press statement that read 'I make it a matter of personal pride never to take seriously any remark made by something that looks like an enormous scrotum'. I was called in to settle things. I collected data from the two types of elephants (Asian or African) over a season and recorded how many goals each elephant scored and how many years of experience the elephant had. The data set has four variables:

Source

www.discovr.rocks/csv/elephooty.csv


Escape from inside data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

escape

Format

A tibble with 68 rows and 4 variables.

Details

In my teens I was in a band called Andromeda. I sang, we had a guitarist called Malcolm. We learnt several Queen and Iron Maiden songs and we were truly awful. Suffice it to say, you'd be hard pushed to recognize which Iron Maiden and Queen songs we were trying to play. It's common for bands to tire of cover versions and to get lofty ambitions to write their own tunes. I wrote one called ‘Escape From Inside’ about the film The Fly that contained the rhyming couplet of 'I am a fly, I want to die' – the great lyricists of the time quaked in their boots at the young new talent on the scene. The only thing we did that resembled the activities of a 'proper' band was to split up due to 'musical differences': Malcolm wanted to write 15-part symphonies about a boy's journey to worship electricity pylons, whereas I wanted to write songs about flies and dying (preferably both). When we could not agree on a musical direction the split became inevitable. Had I had the power of statistics in my hands back then, rather than split up we could have tested empirically the best musical direction for the band. This study imagines such a world. A study was conducted to see whether I wrote better songs than my old bandmate Malcolm, and whether this depended on the type of song (a symphony or song about flies). The outcome variable was the number of screams elicited from audience members during the songs.

Source

www.discovr.rocks/csv/escape.csv


Essay mark data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

essay_marks

Format

A tibble with 45 rows and 4 variables.

Details

Fictional data about essay marks. A student was interested in whether there was a positive relationship between the time spent doing an essay and the mark received. He got 45 of his friends and timed how long they spent writing an essay (hours) and the percentage they got in the essay (essay). He also translated these grades into their degree classifications (grade): in the UK, a student can get a first-class mark (the best), an upper-second-class mark, a lower second, a third, a pass or a fail (the worst). The data set has four variables:

Source

www.discovr.rocks/csv/essay_marks.csv
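
Examples

The student's hypothesis is directional (more hours, higher mark), so one reasonable first look is a one-sided correlation test. The sketch below uses the column names given in the description (hours, essay); test_essay_relationship() is a hypothetical helper, not part of discovr, and Pearson's r is just one possible choice of coefficient.

```r
# One-sided test of a positive relationship between time spent and mark
# for data shaped like discovr::essay_marks.
test_essay_relationship <- function(data) {
  # alternative = "greater" tests whether the correlation is above zero
  cor.test(data$hours, data$essay, alternative = "greater", method = "pearson")
}

# Only runs if discovr is installed
if (requireNamespace("discovr", quietly = TRUE)) {
  print(test_essay_relationship(discovr::essay_marks))
}
```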


Exam anxiety data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

exam_anxiety

Format

A tibble with 103 rows and 5 variables.

Details

A psychologist was interested in the effects of exam stress on exam performance. She devised and validated a questionnaire to assess state anxiety relating to exams (called the Exam Anxiety Questionnaire, or EAQ). This scale produced a measure of anxiety scored out of 100. Anxiety was measured before an exam, and the percentage mark of each student on the exam was used to assess the exam performance. These data are fictional. The data contain the following variables:

Source

www.discovr.rocks/csv/exam_anxiety.csv


Field (2006) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

field_2006

Format

A tibble with 381 rows and 3 variables.

Details

Early in my career I looked at the effect of giving children information about entities. In one study (Field, 2006), I used three novel entities (the quoll, quokka and cuscus) and children were told threat information about one of the entities, positive information about another, and given no information about the third (our control). After the information I asked the children to place their hands in three wooden boxes each of which they believed contained one of the aforementioned entities. The data from the study has three variables:

Source

www.discovr.rocks/csv/gallup_2003.csv

References

Field, A. P. (2006). The behavioral inhibition system and the verbal information pathway to children’s fears. Journal of Abnormal Psychology, 115, 742–752. doi:10.1037/0021-843x.115.4.742


The Final Frontier palette

Description

Colour palette based on Iron Maiden's The Final Frontier album sleeve.

Usage

frontier_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_frontier(n, type = "discrete", reverse = FALSE, ...)

scale_colour_frontier(n, type = "discrete", reverse = FALSE, ...)

scale_fill_frontier(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(discovr)
library(ggplot2)
library(scales)

# Preview eight colours from the palette
show_col(frontier_pal()(8))

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_frontier()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_frontier()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_frontier()

Gallup et al. (2003) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

gallup_2003

Format

A tibble with 15 rows and 3 variables.

Details

It's something of a wonder how evolution managed to produce such a monstrosity as the human penis. One theory is sperm competition: the human penis has an unusually large glans (the 'bell-end') compared to other primates, and this may have evolved so that the penis can displace seminal fluid from other males by 'scooping it out' during intercourse. Armed with various devices from Hollywood Exotic Novelties, an artificial vagina from California Exotic Novelties, and some water and cornstarch, Gallup et al. (2003) put this theory to the test. They loaded the artificial vagina with 2.6 ml of fake sperm and inserted one of three female sex toys into it before withdrawing it: a control phallus that had no coronal ridge (i.e., no bell-end), a phallus with a minimal coronal ridge (small bell-end) and a phallus with a coronal ridge. They measured sperm displacement as a percentage: 100% means that all the sperm was displaced, and 0% means that none of the sperm was displaced. If the human penis evolved as a sperm displacement device then Gallup et al. predicted: (1) that having a bell-end would displace more sperm than not; and (2) that the phallus with the larger coronal ridge would displace more sperm than the phallus with the minimal coronal ridge. The data from the study has three variables:

Source

www.discovr.rocks/csv/gallup_2003.csv


Gelman & Weakliem (2009) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

gelman_2009

Format

A tibble with 548 rows and 3 variables.

Details

Apparently there are more beautiful women in the world than there are handsome men. Satoshi Kanazawa explains this finding in terms of good-looking parents being more likely to have a baby daughter as their first child than a baby son. Perhaps more controversially, he suggests that, from an evolutionary perspective, beauty is a more valuable trait for women than for men (Kanazawa, 2007). In a playful and very informative paper, Andrew Gelman and David Weakliem discuss various statistical errors and misunderstandings, some of which have implications for Kanazawa's claims. The 'playful' part of the paper is that to illustrate their point they collected data on the 50 most beautiful celebrities (as listed by People magazine) of 1995-2000. They counted how many male and female children they had as of 2007. If Kanazawa is correct, these beautiful people would have produced more girls than boys. These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/gelman_2009.csv


Glastonbury festival data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

glastonbury

Format

A tibble with 810 rows and 5 variables.

Details

More fictional data about people stinking at music festivals. The same biologist who was worried about the potential health effects of music festivals and collected data at a heavy metal festival (Download Festival) was worried that her findings might not generalize. To find out whether the type of music a person likes predicts whether hygiene decreases over the festival, the biologist measured hygiene over the three days of the Glastonbury Music Festival, which has an eclectic clientele. Her hygiene measure ranged between 0 (you smell like you've bathed in sewage) and 4 (you smell like you've bathed in freshly baked bread). The biologist coded the festival-goers' musical affiliations into the categories 'hipster' (people who mainly like alternative music), 'metalhead' (people who like heavy metal), and 'raver' (people who like dance/ambient stuff). Anyone not falling into these categories was labelled 'no subculture'. The object contains the following variables:

Source

www.discovr.rocks/csv/glastonbury.csv


Beer goggles effect data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

goggles

Format

A tibble with 48 rows and 4 variables.

Details

Fictional data about the beer goggles effect. An anthropologist was interested in the effects of facial attractiveness on the beer-goggles effect. She randomly selected 48 participants. Participants were randomly subdivided into three groups of 16: (1) a placebo group drank 500 ml of alcohol-free beer; (2) a low-dose group drank 500 ml of average strength beer (4% ABV); and (3) a high-dose group drank 500 ml of strong beer (7% ABV). Within each group, half (n = 8) rated the attractiveness of 50 photos of unattractive faces on a scale from 0 (pass me a paper bag) to 10 (pass me their phone number) and the remaining half rated 50 photos of attractive faces. The outcome for each participant was their median rating across the 50 photos. The data set has four variables:

Source

www.discovr.rocks/csv/goggles.csv


Beer goggles and lighting data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

goggles_lighting

Format

A tibble with 208 rows and 4 variables.

Details

Fictional data about the moderating effect of lighting on the beer goggles effect. In a previous example we came across the beer-goggles effect, which suggests that alcohol impairs judgements of facial attractiveness. In this fictional follow-up study a sample of 26 people are given doses of alcohol (0 pints, 2 pints, 4 pints and 6 pints of lager) over four different weeks. They are asked to rate a bunch of photos of faces in either dim or bright lighting. The outcome measure was the mean attractiveness rating (out of 100) of the faces and the predictors were the dose of alcohol and the lighting conditions. The data set has four variables:

Source

www.discovr.rocks/csv/goggles.csv


Grades data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

grades

Format

A tibble with 25 rows and 3 variables.

Details

Fictional data about stats grades. As a statistics lecturer I am interested in the factors that determine whether a student will do well on a statistics course. Imagine I took 25 students and looked at their grades for my statistics module at the end of their first year at university: first class, upper second class, lower second class, third class, pass and fail. I also asked these students what grade they got in their high school maths exams. In the UK, GCSEs are school exams taken at age 16 that are graded A, B, C, D, E or F (an A grade is the best). The data set has three variables:

Source

www.discovr.rocks/csv/grades.csv


Hangover cure data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

hangover

Format

A tibble with 15 rows and 4 variables.

Details

A marketing manager tested the benefit of soft drinks for curing hangovers. He took 15 people and got them drunk. The next morning as they awoke, dehydrated and feeling as though they'd licked a camel's sandy feet clean with their tongue, he gave five of them water to drink, five of them Lucozade (a very nice glucose-based UK drink) and the remaining five a leading brand of cola. He measured how well they felt (on a scale from 0 = I feel like death to 10 = I feel really full of beans and healthy) two hours later. He measured how drunk the person got the night before on a scale of 0 = as sober as a nun to 10 = flapping about like a haddock out of water on the floor in a puddle of their own vomit. These data are fictional. The object contains the following variables:

Source

www.discovr.rocks/csv/hangover.csv


Hiccups data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

hiccups

Format

A tibble with 60 rows and 3 variables.

Details

People have many methods for stopping hiccups (a surprise, holding your breath), and medical science has put its collective mind to the task too. The official treatment methods include tongue-pulling manoeuvres, massage of the carotid artery, and, believe it or not, digital rectal massage (Fesmire, 1988). Let's say we wanted to put digital rectal massage to the test (erm, as a cure for hiccups). We took 15 hiccup sufferers, and during a bout of hiccups administered each of the three procedures (in random order and at intervals of 5 minutes) after taking a baseline of how many hiccups they had per minute. We counted the number of hiccups in the minute after each procedure. These data are fictional. The object contains the following variables:

Source

www.discovr.rocks/csv/hiccups.csv


Hill et al. (2007) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

hill_2007

Format

A tibble with 503 rows and 4 variables.

Details

Hill et al. (2007) examined whether providing children with a leaflet based on the theory of planned behaviour increased their exercise. There were four different interventions (intervention): a control group, a leaflet, a leaflet and quiz, and a leaflet and a plan. A total of 503 children from 22 different classrooms were sampled (classroom). The 22 classrooms were randomly assigned to the four different conditions. After the intervention, children were asked "On average over the last three weeks, I have exercised energetically for at least 30 minutes ___ times per week" (post_exercise). The data from the study has three variables:

Source

www.discovr.rocks/csv/hill_2007.csv
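
As a minimal sketch of the clustered design described in the Details above, the following base R code counts how many classrooms fall in each condition and averages post-intervention exercise by condition. The column names classroom, intervention and post_exercise come from the description; the helper name summarise_hill and the toy rows are hypothetical, included only so the sketch runs without the package installed.

```r
# Hypothetical helper (not part of discovr): summarise a data frame that has
# the columns `classroom`, `intervention` and `post_exercise` named above.
summarise_hill <- function(data) {
  # Number of distinct classrooms in each condition
  n_classrooms <- tapply(data$classroom, data$intervention,
                         function(x) length(unique(x)))
  # Mean post-intervention exercise in each condition
  mean_exercise <- tapply(data$post_exercise, data$intervention, mean)
  data.frame(
    intervention  = names(n_classrooms),
    n_classrooms  = as.vector(n_classrooms),
    mean_exercise = as.vector(mean_exercise)
  )
}

# Toy stand-in rows, invented for illustration only (not the real data)
toy <- data.frame(
  classroom     = c(1, 1, 2, 3, 3, 4),
  intervention  = c("Control", "Control", "Control",
                    "Leaflet", "Leaflet", "Leaflet"),
  post_exercise = c(1, 3, 2, 2, 4, 6)
)
toy_summary <- summarise_hill(toy)
print(toy_summary)

# With the package installed, the same helper can be applied to the real data
if (requireNamespace("discovr", quietly = TRUE)) {
  print(summarise_hill(discovr::hill_2007))
}
```

Because whole classrooms, not individual children, were randomised, the classroom count per condition is the number of independent units in each arm.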

References


Honesty lab data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

honesty_lab

Format

A tibble with 100 rows and 3 variables.

Details

Fictional data about the honesty lab. Imagine we were interested in how people evaluated dishonest acts. Participants evaluate the dishonesty of acts based on watching videos of people confessing to those acts. Imagine we took 100 people and showed them a random dishonest act described by the perpetrator. They then evaluated the honesty of the act (from 0 = appalling behaviour to 10 = it's OK really) and how much they liked the person (0 = not at all, 10 = a lot). The data set has three variables

Source

www.discovr.rocks/csv/honesty_lab.csv


Ice bucket challenge data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

ice_bucket

Format

A tibble with 23,230 rows and 1 variable.

Details

Google data relating to the ice bucket challenge from 2014. Golfer Chris Kennedy tipped a bucket of iced water on his head to raise awareness of the disease amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig's disease). The idea is that you are challenged and have 24 hours to post a video of you having a bucket of iced water poured over your head; in this video you also challenge at least three other people. If you fail to complete the challenge your forfeit is to donate to charity (in this case ALS). The CSV file contains the number of days after Chris Kennedy's initial ice bucket challenge that each of 2,323,452 ice bucket challenge videos was uploaded to YouTube. The data here contain a randomly selected 1% of the original data (23,230 cases).
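
The 1% subsample described above can be sketched in base R as follows. This is an illustration only: how the packaged sample was actually drawn is not documented here, so simple random sampling without replacement is an assumption, and the simulated values stand in for the original upload days.

```r
set.seed(2014)  # arbitrary seed so the sketch is reproducible

n_original <- 2323452  # number of videos in the original data (from the text)
n_sample   <- 23230    # number of cases in the packaged data (from the text)

# Simulated stand-in for the original 'days since first challenge' values;
# the Poisson distribution and its mean are invented for illustration only.
original_days <- rpois(n_original, lambda = 30)

# Draw a simple random sample without replacement
sampled_days <- sample(original_days, size = n_sample, replace = FALSE)

# Proportion of the original cases that were retained (about 0.01)
sample_fraction <- length(sampled_days) / n_original
print(round(sample_fraction, 4))
```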

Source

www.discovr.rocks/csv/ice_bucket.csv


Iron Maiden palette

Description

Colour palette based on Iron Maiden's eponymous album sleeve.

Usage

im_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_im(n, type = "discrete", reverse = FALSE, ...)

scale_colour_im(n, type = "discrete", reverse = FALSE, ...)

scale_fill_im(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(im_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_im()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_im()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_im()

Cloak of invisibility data (pre-post design)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

invisibility_base

Format

A tibble with 80 rows and 4 variables.

Details

In invisibility_cloak we compared the number of mischievous acts in people who had invisibility cloaks to those without. Imagine we replicated that study, but changed the design so that we recorded the number of mischievous acts in these participants before the study began as well as during the study. The data contains the following variables:

Source

www.discovr.rocks/csv/invisibility_base.csv


Cloak of invisibility data (independent design)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

invisibility_cloak

Format

A tibble with 24 rows and 3 variables.

Details

I got very excited by two news stories implying that scientists had made Harry Potter's cloak of invisibility. Although the newspapers overstated the case, I imagined a future in which we have cloaks of invisibility to test out. Given my slightly mischievous streak, the future me is interested in the effect that wearing a cloak of invisibility has on the tendency for mischief. I take 24 participants and place them in an enclosed community. The community is riddled with hidden cameras so that we can record mischievous acts. Half of the participants are given cloaks of invisibility; they are told not to tell anyone else about their cloak and that they could wear it whenever they liked. I measure how many mischievous acts they performed in one week. The object contains the following variables:

Source

www.discovr.rocks/csv/invisibility.csv


Cloak of invisibility data (repeated measures design)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

invisibility_rm

Format

A tibble with 24 rows and 3 variables.

Details

I got very excited by two news stories implying that scientists had made Harry Potter's cloak of invisibility. Although the newspapers overstated the case, I imagined a future in which we have cloaks of invisibility to test out. Given my slightly mischievous streak, the future me is interested in the effect that wearing a cloak of invisibility has on the tendency for mischief. I take 12 participants and place them in an enclosed community. The community is riddled with hidden cameras so that we can record mischievous acts. For one week the participants are given cloaks of invisibility; during a different week they are not. I measure how many mischievous acts they performed in each week. These data are the same as in invisibility_cloak but arranged in a repeated measures design. The object contains the following variables:

Source

www.discovr.rocks/csv/invisibility_rm.csv


Jiminy Cricket data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

jiminy_cricket

Format

A tibble with 500 rows and 4 variables.

Details

Fictitious data inspired by my honeymoon at Disney in Orlando. The one blip in my tolerance of Disney was their obsession with dreams coming true and wishing upon a star. Dreams are good, but a completely blinkered view that they'll come true without any work on your part is not. I think it highly unlikely that merely 'wishing upon a star' will make my dream come true. I wonder if the seismic increase in youth internalizing disorders (Twenge, 2000, 2011) is, in part, caused by millions of Disney children reaching the rather depressing realization that 'wishing upon a star' didn't work. Anyway, imagine that I collected some data from 250 people on their level of success using a composite measure involving their salary, quality of life and how closely their life matches their aspirations. This gave me a score from 0 (complete failure) to 100 (complete success). I then implemented an intervention: I told people that for the next 5 years they should either wish upon a star for their dreams to come true or work as hard as they could to make their dreams come true. I measured their success again 5 years later. People were randomly allocated to these two instructions. The data contains the following variables:

Source

www.discovr.rocks/csv/jiminy_cricket.csv


Johns et al. (2012) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

johns_2012

Format

A tibble with 160 rows and 4 variables.

Details

It is believed that males have a biological predisposition towards the colour red because it is sexually salient. The theory suggests that women use the colour red as a proxy signal for genital colour to indicate ovulation and sexual proceptivity. If this hypothesis is true then using the colour red in this way would have to attract men (otherwise it's a pointless strategy). In a novel study, Johns, Hargrave, and Newton-Fisher (2012) tested this idea by manipulating the colour of four pictures of female genitalia to make them increasing shades of red (pale pink, light pink, dark pink, red). Heterosexual males rated the resulting 16 pictures from 0 (unattractive) to 100 (attractive). These are the data from that study. The data contains the following variables:

Source

www.discovr.rocks/csv/johns_2012.csv

References


Killers palette

Description

Colour palette based on Iron Maiden's Killers album sleeve.

Usage

killers_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_killers(n, type = "discrete", reverse = FALSE, ...)

scale_colour_killers(n, type = "discrete", reverse = FALSE, ...)

scale_fill_killers(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(killers_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_killers()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_killers()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_killers()

Lambert et al. (2012) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

lambert_2012

Format

A tibble with 240 rows and 6 variables.

Details

Lambert et al. (2012) found that pornography is related to infidelity. This object contains the data from that study.

Source

www.discovr.rocks/csv/lambert_2012.csv

References


Massar et al. (2012) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

massar_2012

Format

A tibble with 83 rows and 4 variables.

Details

Everyone likes a good gossip from time to time, but apparently it has an evolutionary function. One school of thought is that gossip is used as a way to derogate sexual competitors – especially by questioning their appearance and sexual behaviour. Apparently men rate gossiped-about women as less attractive, and they are more influenced by the gossip if it came from a woman with a high mate value (i.e. attractive and sexually desirable). Karlijn Massar and her colleagues hypothesized that if this theory is true then (1) younger women will gossip more because there is more mate competition at younger ages; and (2) this relationship will be mediated by the mate value of the person (because for those with high mate value gossiping for the purpose of sexual competition will be more effective). These are the data from that study.

Eighty-three women aged from 20 to 50 (age) completed questionnaire measures of their tendency to gossip (gossip) and their sexual desirability (mate_value).
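
The mediation hypothesis above (age predicts gossip through mate value) can be sketched with three ordinary regressions in base R. This is a minimal illustration under stated assumptions, not necessarily the analysis used in the book: it gives point estimates only, with no confidence interval for the indirect effect. The column names age, gossip and mate_value come from the description; the helper name fit_mediation_paths and the simulated data are hypothetical.

```r
# Hypothetical helper (not part of discovr): fit the regressions that make up
# a simple mediation model, using the column names given in the text above.
fit_mediation_paths <- function(data) {
  total  <- lm(gossip ~ age, data = data)               # total effect of age
  a_path <- lm(mate_value ~ age, data = data)           # age -> mediator
  direct <- lm(gossip ~ age + mate_value, data = data)  # direct effect + b path
  a <- unname(coef(a_path)["age"])
  b <- unname(coef(direct)["mate_value"])
  c(
    total    = unname(coef(total)["age"]),
    direct   = unname(coef(direct)["age"]),
    indirect = a * b
  )
}

# Simulated stand-in data, invented for illustration only (not the real data)
set.seed(83)
toy <- data.frame(age = runif(83, min = 20, max = 50))
toy$mate_value <- 5 - 0.05 * toy$age + rnorm(83, sd = 0.5)
toy$gossip     <- 1 + 0.4 * toy$mate_value + rnorm(83, sd = 0.5)
toy_effects <- fit_mediation_paths(toy)
print(toy_effects)

# With the package installed, the same helper can be applied to the real data
if (requireNamespace("discovr", quietly = TRUE)) {
  print(fit_mediation_paths(discovr::massar_2012))
}
```

With complete data and least-squares fits, the total effect equals the direct effect plus the indirect effect, which is a useful check on the output.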

Source

www.discovr.rocks/csv/massar_2012.csv

References


McNulty et al. (2008) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

mcnulty_2008

Format

A tibble with 164 rows and 5 variables.

Details

McNulty et al. (2008) found a relationship between a person's attractiveness and how much support they give their partner among newlywed heterosexual couples. These data simulate the results of that study. The object contains the following variables:

Source

www.discovr.rocks/csv/mcnulty_2008.csv

References


Are men like dogs data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

men_dogs

Format

A tibble with 40 rows and 3 variables.

Details

A psychologist was interested in the cross-species differences between men and dogs. She observed a group of dogs and a group of men in a naturalistic setting (20 of each). She classified several behaviours as being dog-like (urinating against trees and lampposts, attempts to copulate with anything that moved, and attempts to lick their own genitals). For each man and dog she counted the number of dog-like behaviours displayed in a 24-hour period. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/men_dogs.csv


Metal music and anger

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

metal

Format

A tibble with 90 rows and 4 variables.

Details

People have claimed that listening to heavy metal, because of its aggressive sonic palette and often violent or emotionally negative lyrics, leads to angry and aggressive behaviour. As a very non-violent metal fan this accusation bugs me (BTW there are some real data on this in sharman_2015). Imagine I designed a study to test this possibility. I took groups of self-classifying metalheads and non-metalheads (fan) and assigned them randomly to listen to 15 minutes of either the sound of an angle grinder scraping a sheet of metal (control noise), metal music, or pop music (soundtrack). Each person rated their anger on a scale ranging from 0 (All you need is love, da, da, da-da-da) to 100 (—- me, I'm all out of enemies). These data are fictitious.

Source

www.discovr.rocks/csv/metal.csv


Metal health

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

metal_health

Format

A tibble with 2506 rows and 2 variables.

Details

Lacourse et al. (2001) conducted a study to see whether suicide risk was related to listening to heavy metal music. They devised a scale to measure preference for bands falling into the category of heavy metal. This scale included heavy metal bands (Black Sabbath, Iron Maiden), speed metal bands (Slayer, Metallica), death/black metal bands (Obituary, Burzum) and gothic bands (Marilyn Manson, Sisters of Mercy). They then used this (and other variables) as predictors of suicide risk based on a scale measuring suicidal ideation etc. These data are from a fictitious replication. There are two variables representing scores on the scales described above:

Source

www.discovr.rocks/csv/metal_health.csv

References


Metallica data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

metallica

Format

A tibble with 7 rows and 9 variables.

Details

The data show various pieces of information about past and present members of the band Metallica that may or may not be accurate at the time of writing (2019). The data contains the following variables:

Source

www.discovr.rocks/csv/metallica.csv


Miller et al. (2007) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

miller_2007

Format

A tibble with 296 rows and 4 variables.

Details

Miller and colleagues (2007) tested the hidden-estrus theory, which suggests that unlike other female mammals, humans do not experience an estrus phase during which they are more sexually receptive, proceptive, selective and attractive. If this theory is wrong then human men should find women more attractive during the fertile phase of their menstrual cycle than during the pre-fertile (menstrual) and post-fertile (luteal) phases. Miller used the tips obtained by dancers at a lap dancing club as a proxy for their sexual attractiveness and also recorded the phase of the dancer's menstrual cycle during a given shift, and whether they were using hormonal contraceptives. Dancers provided data from between 9 and 29 of their shifts.

Source

www.discovr.rocks/csv/miller_2007.csv

References


Imagery and advertising example

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

mixed_attitude

Format

A tibble with 180 rows and 5 variables.

Details

A marketing researcher was interested in the effects of types of imagery (positive, negative or neutral) on perceptions of different types of drink (beer, wine, water). Participants viewed videos of different drink products in the context of positive, negative or neutral imagery and then rated the products on a scale from –100 (extremely dislike) through 0 (neutral) to 100 (extremely like). Those who identify as men and women might respond differently to the products, so participants self-reported their gender (a between-group variable). The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/speed_date.csv


Murder in the streets data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

murder

Format

A tibble with 36 rows and 3 variables.

Details

Fictitious data about murder. A sociologist wanted to compare murder rates (murder) each month in a year at three high-profile locations in London (street). The data contains the following variables:

Source

www.discovr.rocks/csv/murder.csv
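
A minimal sketch of the comparison described in the Details above: averaging the monthly murder counts for each location. The column names murder and street come from the description; the helper name mean_murders_by_street and the toy rows are hypothetical, included only so the sketch runs without the package installed.

```r
# Hypothetical helper (not part of discovr): mean monthly murder count for
# each street, using the column names `murder` and `street` named above.
mean_murders_by_street <- function(data) {
  aggregate(murder ~ street, data = data, FUN = mean)
}

# Toy stand-in rows, invented for illustration only (not the real data)
toy <- data.frame(
  street = c("Street A", "Street A", "Street B", "Street B"),
  murder = c(2, 4, 1, 3)
)
toy_means <- mean_murders_by_street(toy)
print(toy_means)

# With the package installed, the same helper can be applied to the real data
if (requireNamespace("discovr", quietly = TRUE)) {
  print(mean_murders_by_street(discovr::murder))
}
```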


Muris et al. (2008) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

muris_2008

Format

A tibble with 70 rows and 6 variables.

Details

Anxious people tend to interpret ambiguous information in a negative way. For example, being highly anxious myself, if I overheard a student saying "Andy Field's lectures are really different" I would assume that different meant rubbish, but it could also mean 'refreshing' or 'innovative'. Muris, Huijding, Mayer, and Hameetman (2008) addressed how these interpretational biases develop in children. Children imagined that they were astronauts who had discovered a new planet. They were given scenarios about their time on the planet (e.g., On the street, you encounter a spaceman. He has a toy handgun and he fires at you …) and the child had to decide whether a positive (You laugh: it is a water pistol and the weather is fine anyway) or negative (Oops, this hurts! The pistol produces a red beam which burns your skin!) outcome occurred. After each response the child was told whether their choice was correct. Half of the children were always told that the negative interpretation was correct, and the remainder were told that the positive interpretation was correct.

Over 30 scenarios children were trained to interpret their experiences on the planet as negative or positive. Muris et al. then measured interpretational biases in everyday life to see whether the training had created a bias to interpret things negatively. In doing so, they could ascertain whether children might learn interpretational biases through feedback (e.g., from parents). The data contains the following variables:

Source

www.discovr.rocks/csv/muris_2008.csv

References


Internet addiction scale (IAS) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

nichols_2004

Format

A tibble with 207 rows and 38 variables.

Details

The increasing popularity (and usefulness) of the Internet has led to the serious problem of internet addiction. To research this construct it's helpful to be able to measure it, so Laura Nichols and Richard Nicki developed the Internet Addiction Scale, IAS (Nichols & Nicki, 2004). This 36-item questionnaire contains items such as "I have stayed on the Internet longer than I intended to" and "My grades/work have suffered because of my Internet use", to which responses are made on a five-point scale (never, rarely, sometimes, frequently, always). The authors dropped two items because they had low means and variances, and dropped three others because of relatively low correlations with other items. They performed a principal component analysis on the remaining 31 items (N = 207).

Source

www.discovr.rocks/csv/nichols_2004.csv

References


The Number of the Beast palette

Description

Colour palette based on Iron Maiden's The Number of the Beast album sleeve.

Usage

nob_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_nob(n, type = "discrete", reverse = FALSE, ...)

scale_colour_nob(n, type = "discrete", reverse = FALSE, ...)

scale_fill_nob(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(nob_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_nob()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_nob()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_nob()

The notebook data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

notebook

Format

A tibble with 40 rows and 3 variables.

Details

Fictitious data about the film The Notebook. Imagine that a film company director was interested in whether there was really such a thing as a 'chick flick' (a film that has the stereotype of appealing to women more than to men). He took 20 people who mostly self-identify as men and 20 who mostly self-identify as women and showed half of each sample a film that was supposed to be a 'chick flick' (The Notebook). The other half watched a documentary about notebooks as a control. In all cases the company director measured participants' arousal as an indicator of how much they enjoyed the film. The data contains the following variables:

Source

www.discovr.rocks/csv/notebook.csv


OCD data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

ocd

Format

A tibble with 30 rows and 4 variables.

Details

Fictitious data about interventions for obsessive compulsive disorder. Obsessive compulsive disorder (OCD) is a mental health problem characterized by intrusive images or thoughts that the sufferer finds abhorrent. These thoughts lead the sufferer to engage in activities to neutralize the unpleasantness of these thoughts (these activities can be mental or physical). A group of clinical psychologists were interested in the efficacy of two different interventions for OCD offered at their clinic: cognitive behaviour therapy (CBT) and behaviour therapy (BT). A group who were awaiting treatment acted as a control (a no treatment condition, NT). To gauge the success of therapy, the clinical psychologists measured two outcomes: the occurrence of obsession-related behaviours (actions) and the occurrence of obsession-related cognitions (thoughts) on a single day. Service users were randomly assigned to group 1 (CBT), group 2 (BT) or group 3 (NT). The data contains the following variables:

Source

www.discovr.rocks/csv/ocd.csv


Colourblind-friendly palette

Description

Colour palette based on Color Universal Design by Okabe and Ito https://jfly.uni-koeln.de/color/.

Usage

okabe_ito_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_oi(n, type = "discrete", reverse = FALSE, ...)

scale_colour_oi(n, type = "discrete", reverse = FALSE, ...)

scale_fill_oi(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale.

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(okabe_ito_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_oi()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_oi()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_oi()
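
The example okabe_ito_pal()(8) above suggests that the palette helper returns a function, which is then called with the number of colours required. The following is a minimal sketch of that general pattern, not the package's implementation: sketch_pal() is a hypothetical helper written for illustration, and the colours are taken from base R's grDevices::palette.colors(), which ships an "Okabe-Ito" palette.

```r
# Hypothetical helper: NOT part of discovr. It mimics the general shape of a
# palette factory that has 'type' and 'reverse' arguments.
sketch_pal <- function(colours, type = c("discrete", "continuous"), reverse = FALSE) {
  type <- match.arg(type)
  colours <- unname(colours)
  if (reverse) colours <- rev(colours)

  if (type == "discrete") {
    # Return the first n colours; fail clearly if too many are requested.
    function(n) {
      if (n > length(colours)) {
        stop("This palette has only ", length(colours), " colours.")
      }
      colours[seq_len(n)]
    }
  } else {
    # Interpolate between the colours to obtain any number of shades.
    grDevices::colorRampPalette(colours)
  }
}

# Base R (>= 4.0.0) provides the Okabe-Ito colours.
oi <- grDevices::palette.colors(n = 8, palette = "Okabe-Ito")

discrete_oi <- sketch_pal(oi)
continuous_oi <- sketch_pal(oi, type = "continuous")

discrete_oi(3)     # the first three colours
continuous_oi(20)  # twenty interpolated colours
```

A function of this shape (one integer in, a vector of colour values out) is what the palette argument of ggplot2::discrete_scale() expects.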

Ong et al. (2011) data: wide/messy format

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

ong_2011

Format

A tibble with 275 rows and 12 variables.

Details

A study by Ong et al. (2011) examining the relationship between a person's narcissism and other people's ratings of their profile picture on Facebook. The pictures were rated on each of four dimensions: coolness, glamour, fashionableness, and attractiveness. In addition, each person was measured on introversion/extroversion and narcissism. These data are in messy/wide format. The data contains the following variables:

Source

www.discovr.rocks/csv/ong_2011.csv

References


Ong et al. (2011) data: tidy format

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

ong_tidy

Format

A tibble with 1100 rows and 9 variables.

Details

A study by Ong et al. (2011) examining the relationship between a person's narcissism and other people's ratings of their profile picture on Facebook. The pictures were rated on each of four dimensions: coolness, glamour, fashionableness, and attractiveness. In addition, each person was measured on introversion/extroversion and narcissism. These data are in tidy format. The data contains the following variables:

Source

www.discovr.rocks/csv/ong_2011_tidy.csv
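
Examples

The wide version has 275 rows and the tidy version has 1100, which is consistent with the four rating dimensions being stacked into rows (275 × 4 = 1100). The sketch below shows that kind of wide-to-long restructuring on a toy data frame using base R. It does not use the ong_2011 or ong_tidy tibbles, and every column name in it is hypothetical.

```r
# Toy wide-format data: one row per person, one column per rating dimension.
# These column names are made up; they are not taken from ong_2011.
wide <- data.frame(
  id         = 1:3,
  narcissism = c(30, 41, 25),
  cool       = c(3, 4, 2),
  glam       = c(2, 5, 3),
  fashion    = c(4, 4, 1),
  attractive = c(3, 5, 2)
)

rating_cols <- c("cool", "glam", "fashion", "attractive")

# Stack the four rating columns into a single 'rating' column, with
# 'rating_type' recording which dimension each value came from.
tidy <- reshape(
  wide,
  direction = "long",
  varying   = rating_cols,
  v.names   = "rating",
  timevar   = "rating_type",
  times     = rating_cols,
  idvar     = "id"
)
rownames(tidy) <- NULL

nrow(wide)  # one row per person
nrow(tidy)  # one row per person per rating dimension
```

The same restructuring is often done with tidyr::pivot_longer(); base R's reshape() is used here only to keep the sketch free of extra dependencies.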

References


Penalty kicks data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

penalty

Format

A tibble with 75 rows and 5 variables.

Details

Fictional data set looking at predictors of success of penalty takers in soccer (or whatever sport you enjoy). The outcome variable is whether a penalty is scored or missed. Based on (imaginary) past research there are two factors that reliably predict whether a penalty kick will be missed or scored: (1) the extent to which the penalty taker is prone to worry (measured using the Penn State Worry Questionnaire, PSWQ); and (2) the past success rate of the penalty taker. State anxiety is also likely to have detrimental effects on performance, so it was also measured. The data contain the following variables:

Source

www.discovr.rocks/csv/penalty.csv
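
Examples

The outcome described above is binary (scored or missed), so one natural analysis is a logistic regression. The sketch below fits one to simulated data. It does not use the penalty tibble, and the column names (pswq, previous, anxious, scored) and the simulated effects are assumptions made purely for illustration.

```r
set.seed(42)
n <- 75

# Simulated predictors; names and distributions are hypothetical.
sim <- data.frame(
  pswq     = round(rnorm(n, mean = 20, sd = 8)),      # worry score
  previous = round(runif(n, min = 0, max = 100)),     # past success rate (%)
  anxious  = round(rnorm(n, mean = 70, sd = 10))      # state anxiety
)

# Simulated outcome: more worry lowers, and past success raises, the
# probability of scoring. Anxiety is given no effect in this toy example.
log_odds <- 0.5 - 0.15 * (sim$pswq - 20) + 0.04 * (sim$previous - 50)
sim$scored <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression: a GLM with a binomial family (logit link by default).
penalty_mod <- glm(scored ~ previous + pswq + anxious,
                   data = sim, family = binomial())

summary(penalty_mod)
exp(coef(penalty_mod))  # coefficients expressed as odds ratios
```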


Piece of Mind palette

Description

Colour palette based on Iron Maiden's Piece of Mind album sleeve.

Usage

pom_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_pom(n, type = "discrete", reverse = FALSE, ...)

scale_colour_pom(n, type = "discrete", reverse = FALSE, ...)

scale_fill_pom(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale.

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(pom_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_pom()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_pom()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_pom()

Powerslave palette

Description

Colour palette based on Iron Maiden's Powerslave album sleeve.

Usage

power_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_power(n, type = "discrete", reverse = FALSE, ...)

scale_colour_power(n, type = "discrete", reverse = FALSE, ...)

scale_fill_power(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale.

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(power_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_power()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_power()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_power()

No Prayer for the Dying palette

Description

Colour palette based on Iron Maiden's No Prayer for the Dying album sleeve.

Usage

prayer_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_prayer(n, type = "discrete", reverse = FALSE, ...)

scale_colour_prayer(n, type = "discrete", reverse = FALSE, ...)

scale_fill_prayer(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale.

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(prayer_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_prayer()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_prayer()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_prayer()

Profile picture data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

profile_pic

Format

A tibble with 80 rows and 4 variables.

Details

A researcher was interested in the effect of profile pictures on social media on unsolicited attention. She took 40 people who had profiles on a social networking website; 17 of them had a relationship status of 'single' and the remaining 23 had their status as 'in a relationship'. She asked these people to set their profile picture to a photo of them on their own (alone) and to count how many friend requests they got from random strangers over 3 weeks, then to switch it to a photo of them very obviously as part of a romantic couple and record their friend requests from random strangers over 3 weeks. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/profile_pic.csv


Pub data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

pubs

Format

A tibble with 8 rows and 2 variables.

Details

Data illustrating the difference between an outlier and an influential case. The data came to me via David Hitchin, and he in turn got it from Dr Richard Roberts. I have no idea whether it's real or fictitious. The tibble contains the following variables:

Source

www.discovr.rocks/csv/pubs.csv


Puppy therapy data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

puppies

Format

A tibble with 15 rows and 3 variables.

Details

Despite the increase in puppies on my campus (which can only be a good thing) to reduce stress, the evidence base is pretty mixed. Imagine we wanted to contribute to this literature by running a study in which we randomized people into three groups (dose): (1) a control group, which could be a treatment as usual, a no treatment (no puppies) or ideally some kind of placebo group (we could give people in this group a cat disguised as a dog); (2) 15 minutes of puppy therapy (a low-dose group); and (3) 30 minutes of puppy contact (a high-dose group). The dependent variable was a measure of happiness ranging from 0 (as unhappy as I can possibly imagine) to 10 (as happy as I can possibly imagine). The design of this study mimics a very simple randomized controlled trial (as used in pharmacological, medical and psychological intervention trials) because people are randomized into a control group or groups containing the active intervention (in this case puppies, but in other cases a drug or a surgical procedure). The tibble contains the following variables:

Source

www.discovr.rocks/csv/puppies.csv
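
Examples

The design described above (one outcome, three randomized groups) can be analysed by comparing group means with the general linear model. The sketch below does this on simulated data. It does not use the puppies tibble, and the column names (dose, happiness), group labels and simulated group means are assumptions made for illustration.

```r
set.seed(1)

group_labels <- c("control", "15 mins", "30 mins")

# Simulated data: 3 groups of 5 people, happiness bounded between 0 and 10.
sim_puppies <- data.frame(
  dose = factor(rep(group_labels, each = 5), levels = group_labels)
)
raw <- rnorm(15, mean = rep(c(2, 3, 5), each = 5), sd = 1.5)
sim_puppies$happiness <- pmin(10, pmax(0, round(raw)))

# A linear model with a categorical predictor. With the default treatment
# contrasts, the intercept is the control group mean and the other two
# coefficients are differences from the control group.
puppy_lm <- lm(happiness ~ dose, data = sim_puppies)

coef(puppy_lm)
anova(puppy_lm)  # overall F-test comparing the three group means
```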


More puppy therapy data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

puppy_love

Format

A tibble with 30 rows and 4 variables.

Details

The researchers who conducted the puppy therapy study in puppies suddenly realized that a participant's love of dogs would affect whether puppy therapy would affect happiness. Therefore, they repeated the study on different participants, but included a self-report measure of love of puppies from 0 (I am a weird person who hates puppies, please be deeply suspicious of me) to 7 (puppies are the best thing ever, one day I might marry one). The tibble contains the following variables:

Source

www.discovr.rocks/csv/puppy_love.csv


R exam data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

r_exam

Format

A tibble with 100 rows and 6 variables.

Details

Fictitious data relating to an R exam at two universities. The tibble contains the following variables:

Source

www.discovr.rocks/csv/r_exam.csv


R Anxiety Questionnaire (RAQ)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

raq

Format

A tibble with 2,571 rows and 24 variables.

Details

Fictitious data relating to a fictional questionnaire about R anxiety. I can't stress enough how fictional this example is. Like, don't email me for the questionnaire: the whole thing is a figment of my mind (and some data simulation). I thought this would be obvious from the questions, but apparently not. Imagine that I wanted to design a questionnaire to measure a trait that I termed 'R anxiety'. I devised a questionnaire to measure various aspects of students' anxiety towards learning R, the RAQ. I generated (in my imagination) questions based on interviews (that never happened in real life) with anxious and non-anxious students and came up with 23 possible questions to include. Each question was a statement followed by a five-point Likert scale: strongly disagree = 1, disagree, neither agree nor disagree, agree and strongly agree (SD, D, N, A and SA respectively). What's more, I wanted to know whether anxiety about R could be broken down into specific forms of anxiety. In other words, what latent variables contribute to anxiety about R?

With a little help from a few lecturer friends (this never happened in real life) I collected 2571 completed questionnaires. The data are stored in this object with 2,571 rows and 24 columns.

Source

www.discovr.rocks/csv/raq.csv


Reality TV example

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

reality_tv

Format

A tibble with 32 rows and 4 variables.

Details

A researcher hypothesized that reality TV show contestants start off with personality disorders that are exacerbated by being forced to spend time with people as attention-seeking as them. To test this hypothesis, she gave eight contestants a questionnaire measuring personality disorders before and after they entered the show. A second group of eight people were given the questionnaires at the same time; these people were short-listed to go on the show, but never did. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/speed_date.csv


Roaming cats data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

roaming_cats

Format

A tibble with 60 rows and 4 variables.

Details

Fictional data about roaming cats. I was interested in the relationship between the sex of a cat and how much time it spent away from home. I had heard that male cats disappeared for substantial amounts of time on long-distance roams around the neighbourhood (something about hormones driving them to find mates) whereas female cats tended to be more homebound. The data set has four variables.

Source

www.discovr.rocks/csv/roaming_cats.csv


Roller-coaster data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

rollercoaster

Format

A tibble with 20 rows and 3 variables.

Details

Fictional data based on a study by Meston & Frohlich (2003) that showed that heterosexual people rate a picture of someone of the opposite sex as more attractive after riding a roller-coaster compared to before. Imagine we took 20 people as they came off the Rockit roller-coaster at Universal Studios in Orlando and asked them to rate the attractiveness of people in a series of photographs on a scale of 0 (looks like Jabba the Hutt) to 10 (looks like Princess Leia or Han Solo). The mean of their attractiveness ratings was the outcome. We also recorded their fear during the ride using a device that collates various indicators of physiological arousal and returns a value from 0, chill, to 10, terrified. This variable is the predictor. The prediction was that fear would be positively associated with ratings of attractiveness.

Source

www.discovr.rocks/csv/rollercoaster.csv


Santa's log data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

santas_log

Format

A tibble with 400 rows and 4 variables.

Details

Let's begin with a Christmas tale. A year ago Santa was resting in his workshop studying his nice and naughty lists. He noticed a name on the naughty list in bold, upper case letters. It said ANDY FIELD OF UNIVERSITY OF SUSSEX. He went to look up the file of this Andy Field character. He stared into his snow globe, and as the mists cleared he saw a sad, lonely, friend-less character walking across campus. Under one arm a box of chocolates, under the other a small pink Hippo. As he walked the campus he enticed the young students around him to follow him by offering chocolate. Like the Pied Piper, he led them to a large hall. Once inside, the boys and girls' eyes glistened in anticipation of more chocolate. Instead he unleashed a monologue about the general linear model of such fearsome tedium that Santa began to wonder how anyone could have grown to be so soulless and cruel.

Santa dusted off his sleigh and whizzed through the night sky to the Sussex campus. Once there he confronted the evil fiend that he had seen in his globe. "You've been a naughty boy," he said. "I give you a choice. Give up teaching statistics, or I will be forced to let the Krampus pay you a visit."

Andy looked sad, "But I love statistics," he said to Santa, "It's cool."

Santa pulled out a candy cane, from it emerged a screen. Just as he was about to instruct the screen to call the Krampus, an incoming message appeared: some presents had not been delivered last Christmas!

What was Santa to do? How could he find out what determines whether presents get delivered or not? He panicked.

Just then, Santa heard a sad little voice. It said, "I can help you".

"How?" replied Santa.

"My students," he replied, "they can save Christmas. All they need are some data."

With that, Santa looked into his candy screen at the elves who had called him, and turned to Andy. "Tell them what you need."

Andy discovered that to deliver presents Santa uses a large team of elves, and that at each house they usually consume treats. The treats might be Christmas pudding, or sometimes mulled wine. He also discovered that they consume different quantities. Sometimes nothing is left, but other times there might be 1, 2, 3 or even 4 pieces of pudding or glasses of mulled wine. The Elves transmitted a log of 400 of the previous year's deliveries. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/santas_log.csv


Self-help book data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

self_help

Format

A tibble with 20 rows and 3 variables.

Details

'Pop psychology' books sometimes spout nonsense that is unsubstantiated by science. I took 20 people in relationships and randomly assigned them to one of two groups. One group read the famous popular psychology book Women are from Bras and men are from Penis, and the other read Marie Claire. The outcome variable was their relationship happiness after their assigned reading. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/self_help.csv


Self-help book vs statistics book data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

self_help_dsur

Format

A tibble with 1000 rows and 3 variables.

Details

Twaddle and Sons, the publishers of Women are from Bras and men are from Penis, were upset about my claims that their book was as useful as a paper umbrella. They ran their own experiment (N = 500) in which relationship happiness was measured after participants had read their book and after reading the book you are currently reading. (Participants read the books in counterbalanced order with a six-month delay.) The (fictional) data contains the following variables:
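
One layout consistent with 500 participants and 1000 rows is a long format with one row per participant per book. The sketch below builds such a skeleton. The column names, the book labels and the odd/even counterbalancing rule are all assumptions made for illustration, not the contents of the packaged data.

```r
# Hypothetical layout: column names and labels are assumptions.
n_participants <- 500

long <- expand.grid(
  book = c("self_help", "statistics"),
  id = seq_len(n_participants),
  stringsAsFactors = FALSE
)

# Counterbalancing: odd ids read the self-help book first, even ids read it
# second (an arbitrary rule chosen only to make the sketch deterministic)
read_self_help_first <- long$id %% 2 == 1
long$order <- ifelse(read_self_help_first == (long$book == "self_help"),
                     "first", "second")

nrow(long)                    # 500 participants x 2 books = 1000 rows
table(long$book, long$order)  # each book is read first by half the sample
```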

Source

www.discovr.rocks/csv/self_help_dsur.csv


Senjutsu palette

Description

Colour palette based on Iron Maiden's Senjutsu album inner gatefold sleeve.

Usage

senjutsu_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_senjutsu(n, type = "discrete", reverse = FALSE, ...)

scale_colour_senjutsu(n, type = "discrete", reverse = FALSE, ...)

scale_fill_senjutsu(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(discovr)
library(ggplot2)
library(scales)

# Preview eight colours from the palette (discovr must be loaded first)
show_col(senjutsu_pal()(8))

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_senjutsu()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, colour = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_senjutsu()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_senjutsu()
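
To make the roles of n, type and reverse more concrete, here is a simplified sketch of how a palette helper of this kind could be written in base R. It is an assumption about the general approach, not discovr's implementation: the hex codes are invented, the name demo_pal is hypothetical, and unlike the packaged palette functions it returns colours directly rather than a palette function.

```r
# Hypothetical hex codes chosen for illustration; NOT the discovr colours.
demo_cols <- c("#2D1E2F", "#6B2D5C", "#B3364A", "#E8743B", "#F2C14E")

# Simplified palette helper (not discovr's code). Returns n colours.
demo_pal <- function(n = length(demo_cols),
                     type = c("discrete", "continuous"),
                     reverse = FALSE) {
  type <- match.arg(type)
  cols <- if (reverse) rev(demo_cols) else demo_cols
  if (type == "discrete") {
    if (n > length(cols)) {
      stop("A discrete palette cannot supply more than ",
           length(cols), " colours.")
    }
    cols[seq_len(n)]
  } else {
    # Interpolate between the anchor colours for continuous use
    grDevices::colorRampPalette(cols)(n)
  }
}

demo_pal(3)                       # first three anchor colours
demo_pal(3, reverse = TRUE)       # last three, in reverse order
demo_pal(9, type = "continuous")  # nine interpolated colours
```

The discrete branch refuses to invent colours beyond the anchors, whereas the continuous branch interpolates to any length.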

Sharman & Dingle (2015) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

sharman_2015

Format

A tibble with 117 rows and 4 variables.

Details

There's a perception that listening to extreme music causes anger and associated behavioural problems. As an avid Metal fan and fairly non-angry type of person this stereotype bothers me. Luckily science has come to the rescue. Sharman & Dingle (2015) tested 39 fans of extreme music (metal). Their heart rate was measured at baseline, during a subsequent anger induction and while subsequently listening to music of their choice (which included a lot of bands listed at various points in the acknowledgements of my books). They collected subjective measures too, but this data file contains only the heart rate data from the study.

Source

www.discovr.rocks/csv/sharman_2015.csv

References


Shopping and exercise data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

shopping

Format

A tibble with 10 rows and 3 variables.

Details

According to some highly unscientific research done by a UK department store chain and reported in Marie Claire magazine, shopping is good for you. They found that the average woman spends 150 minutes and walks 2.6 miles when she shops, burning off around 385 calories. In contrast, men spend only about 50 minutes shopping, covering 1.5 miles. This was based on strapping a pedometer on a mere 10 participants. Although I don't have the actual data, some simulated data based on these means are in this file.

Source

www.discovr.rocks/csv/shopping_exercise.csv


Somewhere in Time palette

Description

Colour palette based on Iron Maiden's Somewhere in Time album sleeve.

Usage

sit_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_sit(n, type = "discrete", reverse = FALSE, ...)

scale_colour_sit(n, type = "discrete", reverse = FALSE, ...)

scale_fill_sit(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(discovr)
library(ggplot2)
library(scales)

# Preview eight colours from the palette (discovr must be loaded first)
show_col(sit_pal()(8))

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_sit()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, colour = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_sit()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_sit()

Sniffer dogs

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

sniffer_dogs

Format

A tibble with 32 rows and 3 variables.

Details

When the alien invasion comes we'll need spaniels (or possibly other dogs, but let's hope it's mainly spaniels because spaniels are cool) to help us to identify the space lizards. The top-secret government agency for Training Extra-terrestrial Reptile Detection (TERD) was put together to test the plausibility of training sniffer dogs to detect aliens. Over many trials 8 of their best dogs (Milton, Woofy, Ramsey, Mr. Snifficus III, Willock, The Venerable Dr. Waggy, Lord Scenticle, and Professor Nose) were recruited for a pilot study. During training, these dogs were rewarded for making vocalizations while sniffing alien space lizards (which they happened to have a few of in Hangar 18). On the test trial, the 8 dogs were allowed to sniff 4 entities for 1 minute each: an alien space lizard, a shapeshifting alien space lizard who had taken on humanoid form and worked undetected as a statistics lecturer, a human, and a human mannequin. The number of vocalizations made during each 1-minute sniffing session was recorded. For more alien lizard and sniffer dog adventures see alien_scents.
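
The 32 rows follow from crossing 8 dogs with 4 entities in a repeated-measures layout. The sketch below rebuilds that skeleton. The dog names come from the description, but the entity labels and column names are hypothetical shorthand rather than the values stored in the dataset.

```r
dogs <- c("Milton", "Woofy", "Ramsey", "Mr. Snifficus III", "Willock",
          "The Venerable Dr. Waggy", "Lord Scenticle", "Professor Nose")

# Hypothetical shorthand labels for the four stimuli in the description
entities <- c("alien", "shapeshifter", "human", "mannequin")

# One row per dog per entity; vocalization counts would be a third column
design <- expand.grid(entity = entities, dog = dogs, stringsAsFactors = FALSE)
nrow(design)  # 8 dogs x 4 entities = 32 rows
```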

Source

www.discovr.rocks/csv/sniffer_dogs.csv


Social anxiety data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

social_anxiety

Format

A tibble with 134 rows and 4 variables.

Details

Anxiety disorders take on different shapes and forms, and each disorder is believed to be distinct and have unique causes. We can summarize the disorders and some popular theories as follows:

Social anxiety and obsessive compulsive disorder are seen as distinct disorders having different causes. However, there are some similarities. They both involve some kind of attentional bias: attention to bodily sensation in social anxiety and attention to things that could have negative consequences in OCD. They both involve repetitive thinking styles: social phobics ruminate about social encounters after the event (known as post-event processing), and people with OCD have recurring intrusive thoughts and images. They both involve safety behaviours (i.e. trying to avoid the thing that makes you anxious).

This might lead us to think that, rather than being different disorders, they are manifestations of the same core processes (Field & Cartwright-Hatton, 2008). One way to research this possibility would be to see whether social anxiety can be predicted from measures of other anxiety disorders. If social anxiety disorder and OCD are distinct we should expect that measures of OCD will not predict social anxiety. However, if there are core processes underlying all anxiety disorders, then measures of OCD should predict social anxiety. The data contains three variables:

Source

www.discovr.rocks/csv/social_anxiety.csv

References


Social media and grammar data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

social_media

Format

A tibble with 100 rows and 4 variables.

Details

Imagine we conducted an experiment in which a group of 25 people were encouraged to message their friends and post on social media using their mobiles over a six-month period. A second group of 25 people were banned from messaging and social media for the same period by being given armbands that administered painful shocks in the presence of microwaves (like those emitted from phones). The outcome was a percentage score on a grammatical test that was administered both before and after the intervention. The first independent variable was, therefore, social media use (encouraged or banned) and the second was the time at which grammatical ability was assessed (baseline or after 6 months). These data are fictional. The object contains the following variables:

Source

www.discovr.rocks/csv/social_media.csv


Soya and sperm counts data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

soya

Format

A tibble with 80 rows and 3 variables.

Details

I read a story in a newspaper (yes, back when they existed) claiming that the chemical genistein, which is naturally occurring in soya, was linked to lowered sperm counts in Western males. When you read the actual study, it had been conducted on rats, it found no link to lowered sperm counts, but there was evidence of abnormal sexual development in male rats (probably because genistein acts like oestrogen). As journalists tend to do, a study showing no link between soya and sperm counts was used as the scientific basis for an article about soya being the cause of declining sperm counts in Western males. Imagine the rat study was enough for us to want to test this idea in humans. We recruit 80 males and split them into four groups that vary in the number of soya 'meals' (a dinner containing 75g of soya) they ate per week over a year: no soya meals (i.e., none in the whole year), one per week (52 over the year), four per week (208 over the year), and seven per week (364 over the year). At the end of the year, participants produced some sperm that I could count (when I say 'I', I mean someone else in a laboratory as far away from me as humanly possible). The fictitious data contain the following variables:
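
The yearly meal totals quoted above are the weekly frequencies multiplied by 52 weeks. The short check below reproduces them, and also shows the per-group size under the assumption of equal groups, which the description does not state explicitly. The group labels are illustrative only.

```r
# Weekly soya meals in each of the four groups (labels are illustrative)
meals_per_week <- c(none = 0, one = 1, four = 4, seven = 7)

# Meals over a 52-week year: 0, 52, 208 and 364
meals_per_year <- meals_per_week * 52
meals_per_year

# Group size if the 80 participants were split equally (an assumption)
n_per_group <- 80 / length(meals_per_week)
n_per_group  # 20
```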

Source

www.discovr.rocks/csv/soya.csv


Speed dating data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

speed_date

Format

A tibble with 180 rows and 5 variables.

Details

Imagine a scientist designed a study to look at the interplay between looks, personality and dating strategies on evaluations of a date. She set up a speed-dating night with 9 tables, at each of which sat a 'date'. All the dates were stooges selected to vary in their attractiveness (high, average and low), their personality (high charisma, average charisma, writes statistics books), and also the strategy they were told to employ during the conversation (normal or playing hard to get). The dates were trained before the study to act charismatically to varying degrees, and also how to act in a way that made them seem unobtainable (hard to get) or not. As such, across the nine dates/stooges there were three 'high attractive' people, one of whom acted charismatically, one who acted normally (average) and another who acted with low charisma; likewise for the three average-looking dates and the three low-attractiveness dates. Therefore, each participant attending a speed-dating night would be exposed to all combinations of attractiveness and charisma (these are repeated measures).

Upon arrival participants were randomly assigned a blue or red sticker. For the participants with the red sticker the stooges played hard to get (unobtainable) and for those with a blue sticker they acted normally. Over the course of a few nights 20 people attended, spent 5 minutes with each of the 9 'dates' and then rated how much they'd like to have a proper date with the person as a percentage (100% = 'I'd pay large sums of money for their phone number', 0% = 'I'd pay a large sum of money for a plane ticket to get me as far away from them as possible'). The (fictional) data contains the following variables:
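
The design is mixed: attractiveness and charisma are repeated measures (3 x 3 = 9 dates per participant), while strategy varies between participants. The sketch below lays out that structure and arrives at the 180 rows reported under Format. Column names and level labels are assumptions, and strategy is assigned by alternating ids purely so the sketch is deterministic, whereas the study described random assignment.

```r
# Column names and level labels are assumptions made for illustration
design <- expand.grid(
  looks = c("high", "average", "low"),
  personality = c("high", "average", "low"),
  id = 1:20,
  stringsAsFactors = FALSE
)

# Between-participants factor: alternate by id here for determinism
# (the study assigned red or blue stickers at random)
design$strategy <- ifelse(design$id %% 2 == 0, "hard_to_get", "normal")

nrow(design)  # 20 participants x 9 dates = 180 rows
```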

Source

www.discovr.rocks/csv/speed_date.csv


Seventh Son of a Seventh Son palette

Description

Colour palette based on Iron Maiden's Seventh Son of a Seventh Son album sleeve.

Usage

ssoass_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_ssoass(n, type = "discrete", reverse = FALSE, ...)

scale_colour_ssoass(n, type = "discrete", reverse = FALSE, ...)

scale_fill_ssoass(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(discovr)
library(ggplot2)
library(scales)

# Preview eight colours from the palette (discovr must be loaded first)
show_col(ssoass_pal()(8))

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply theme to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_ssoass()

# Plot some data and apply theme to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, colour = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_ssoass()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_ssoass()

Stalking therapy

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

stalker

Format

A tibble with 50 rows and 4 variables.

Details

Some fictional data about therapy for stalking. A few years back I was stalked. You'd think they could have found someone a bit more interesting to stalk, but apparently times were hard. It could have been a lot worse, but it wasn't particularly pleasant. I imagined a world in which a psychologist tried two different therapies on different groups of stalkers (25 stalkers in each treatment). To the first group he gave cruel-to-be-kind therapy (every time the stalkers followed him around, or sent him a letter, the psychologist attacked them with a cattle prod). The second therapy was psychodyshamic therapy, in which stalkers were hypnotized and regressed into their childhood to discuss their penis (or lack of penis), their father's penis, their dog's penis, the seventh penis of a seventh penis, and any other penis that sprang to mind. The psychologist measured the number of hours stalking in one week both before (stalk_pre) and after (stalk_post) treatment. The object contains the following variables:

Source

www.discovr.rocks/csv/stalker.csv


Students and lecturers data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

students

Format

A tibble with 10 rows and 7 variables.

Details

Some fictional data about students and lecturers. The object contains the following variables:

Source

www.discovr.rocks/csv/students.csv


Superhero data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

superhero

Format

A tibble with 30 rows and 3 variables.

Details

Children wearing superhero costumes are more likely to harm themselves because of the unrealistic impression of invincibility that these costumes could create. For example, children have reported to hospital with severe injuries because of trying 'to initiate flight without having planned for landing strategies' (Davies, Surridge, Hole, & Munro-Davies, 2007). I can relate to the imagined power that a costume bestows upon you; indeed, I have been known to dress up as Fisher by donning a beard and glasses and trailing a goat around on a lead in the hope that it might make me more knowledgeable about statistics. These fictional data contain the severity of injury (on a scale from 0, no injury, to 100, death) for children reporting to the accident and emergency department at hospitals, and information on which superhero costume they were wearing (hero): Spiderman, Superman, the Hulk or a teenage mutant ninja turtle. The fictitious data contain the following variables:

Source

www.discovr.rocks/csv/superhero.csv


Supermodel data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

supermodel

Format

A tibble with 231 rows and 4 variables.

Details

A fashion student was interested in factors that predicted the salaries of male and female catwalk models. She collected data from 231 models (supermodel.csv). For each model she asked them their salary per day (salary), their age (age), their length of experience as models (years), and their industry status as a model as their percentile position rated by a panel of experts (status). The fictitious data contain the following variables:

Source

www.discovr.rocks/csv/supermodel.csv


Switch: games console injuries

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

switch

Format

A tibble with 120 rows and 5 variables.

Details

Fictional data about injuries while playing video games on a console. There are reports of increases in injuries related to playing games consoles. These injuries were attributed mainly to muscle and tendon strains. A researcher hypothesized that a stretching warm-up before playing games would help lower injuries, and that athletes would be less susceptible to injuries because their regular activity makes them more flexible. She took 60 athletes and 60 non-athletes (athlete); half of them played on a Nintendo Switch and half watched others playing as a control (switch), and within these groups half did a 5-minute stretch routine before playing/watching whereas the other half did not (stretch). The outcome was a pain score out of 10 (where 0 is no pain, and 10 is severe pain) after playing for 4 hours (injury).
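
The description implies a fully crossed 2 x 2 x 2 between-participants design. The sketch below enumerates the eight cells and the number of participants in each, following the halving described above. The factor names athlete, switch and stretch come from the description, but the level labels are hypothetical.

```r
# Factor names follow the description; level labels are assumptions
cells <- expand.grid(
  athlete = c("athlete", "non_athlete"),
  switch  = c("playing", "watching"),
  stretch = c("stretch", "no_stretch"),
  stringsAsFactors = FALSE
)

# 120 participants spread evenly over the 8 cells
cells$n <- 120 / nrow(cells)
cells
```

Each cell holds 15 participants, so the 60 athletes and 60 non-athletes are each divided into four groups of 15.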

Source

www.discovr.rocks/csv/switch.csv


Tablet sales data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tablets

Format

A tibble with 240 rows and 4 variables.

Details

A company owner was interested in how to make his brand of (computer) tablets more desirable. He collected data on how cool people perceived a product's advertising to be, how cool they thought the product was, and how desirable they found the product. Am I showing my age by using the word 'cool'? The fictitious data contain the following variables:

Source

www.discovr.rocks/csv/tablets.csv


Tea data (small sample)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tea_15

Format

A tibble with 15 rows and 3 variables.

Details

One of my favourite activities, especially when trying to do brain-melting things like writing statistics books, is drinking tea. I am English, after all. Fortunately, tea improves your cognitive function – well, it does in old Chinese people at any rate (Feng, Gwee, Kua, & Ng, 2010). I may not be Chinese and I'm not that old, but I nevertheless enjoy the idea that tea might help me think. Here are some (fictional) data based on Feng et al.'s study that measured the number of cups of tea drunk per day and cognitive functioning (out of 80) in 15 people.

Source

www.discovr.rocks/csv/tea_makes_you_brainy_15.csv

References


Tea data (large sample)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tea_716

Format

A tibble with 716 rows and 3 variables.

Details

One of my favourite activities, especially when trying to do brain-melting things like writing statistics books, is drinking tea. I am English, after all. Fortunately, tea improves your cognitive function – well, it does in old Chinese people at any rate (Feng, Gwee, Kua, & Ng, 2010). I may not be Chinese and I'm not that old, but I nevertheless enjoy the idea that tea might help me think. Here are some (fictional) data based on Feng et al.'s study that measured the number of cups of tea drunk per day and cognitive functioning (out of 80) in 716 people.
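
The two tea datasets differ in sample size (15 versus 716). The text does not say what the pair is meant to demonstrate; one thing such a pair can show is that the same correlation can be non-significant in a small sample yet significant in a large one. The sketch below uses the standard t-test for a Pearson correlation. The helper name p_for_r is hypothetical and the value r = 0.08 is an assumed figure, not an estimate from either dataset.

```r
# Two-sided p-value for a Pearson correlation r in a sample of size n,
# using t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom
p_for_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

r_demo <- 0.08  # hypothetical value, not taken from the data

p_small <- p_for_r(r_demo, n = 15)   # well above .05
p_large <- p_for_r(r_demo, n = 716)  # below .05
c(small = p_small, large = p_large)
```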

Source

www.discovr.rocks/csv/tea_makes_you_brainy_716.csv

References


Method of teaching data (3 groups)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

teach_method

Format

A tibble with 30 rows and 3 variables.

Details

To test how different teaching methods affected students' knowledge I took three statistics modules where I taught the same material. For one module I wandered around with a large cane and beat anyone who asked daft questions or got questions wrong (punish). In the second I encouraged students to discuss things that they found difficult and gave anyone working hard a nice sweet (reward). In the final module I neither punished nor rewarded students' efforts (indifferent). I measured the students' exam marks (percentage). These fictional data contain the following variables:

Source

www.discovr.rocks/csv/teach_method.csv


Method of teaching data (2 groups)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

teaching

Format

A tibble with 20 rows and 3 variables.

Details

The data show the score (out of 20) for 20 different students, some of whom are biologically male and others biologically female, and some of whom were taught using positive reinforcement (being nice) and others using punishment (electric shock).

Source

www.discovr.rocks/csv/teaching.csv


Messaging apps and grammar example

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

text_messages

Format

A tibble with 100 rows and 4 variables

Details

Text messaging and Twitter encourage communication using abbreviated forms of words (if u no wat I mean). A researcher wanted to see the effect this had on children’s understanding of grammar. One group of 25 children was encouraged to send text messages on their mobile phones over a 6-month period. A second group of 25 was forbidden from sending text messages for the same period (to ensure adherence, this group were given armbands that administered painful shocks in the presence of a phone signal). The outcome was a score on a grammatical test (as a percentage) that was measured both before and after the experiment. The (fictional) data contains the following variables:

Source

www.discovr.rocks/csv/speed_date.csv


Tol muted palette

Description

Colour palette used in the book based on Paul Tol's muted palette https://sronpersonalpages.nl/~pault/data/colourschemes.pdf.

Usage

tol_muted_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_tol(n, type = "discrete", reverse = FALSE, ...)

scale_colour_tol(n, type = "discrete", reverse = FALSE, ...)

scale_fill_tol(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colors

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(tol_muted_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including Fear of the Dark because it's not in any way classic.
# No Prayer for the Dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply the palette to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_tol()

# Plot some data and apply the palette to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, colour = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_tol()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_tol()
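
The palette function can also be called on its own, without a plot. The following is a hedged sketch that uses only the arguments listed under Usage; it assumes discovr is installed, and it prints the colours rather than stating them because the manual does not list the hex values or say which colours come first after reversing.

```r
library(discovr)

# tol_muted_pal() returns a palette function; calling that function with a
# number returns that many colours, as in show_col(tol_muted_pal()(8)).
default_cols <- tol_muted_pal()(4)
default_cols

# reverse is documented as "reverse order". Print the result to see which
# colours are returned first once the palette is reversed.
reversed_cols <- tol_muted_pal(reverse = TRUE)(4)
reversed_cols
```

Printing the colours like this is a quick way to check a palette before passing the same arguments to scale_colour_tol() or scale_fill_tol().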

The Teaching of Statistics for Scientific Experiments—Revised (TOSSE-R) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tosser

Format

A tibble with 239 rows and 29 variables.

Details

Fictitious data relating to a fictional questionnaire about The Teaching of Statistics for Scientific Experiments. Again, I stress that this example is fictional. I thought the name of the questionnaire would give it away; I mean, no-one is calling a questionnaire TOSSER, are they? Don't email me for the questionnaire: it's all made up and you definitely don't want to base your research upon it. Imagine I wanted to revise the 'Teaching of Statistics for Scientific Experiments' (TOSSE) questionnaire, which is (I mean, it isn't because I made it up) based on Bland's theory that says that good research methods lecturers should have: (1) a profound love of statistics; (2) an enthusiasm for experimental design; (3) a love of teaching; and (4) a complete absence of normal interpersonal skills. These characteristics should be related (i.e., correlated). The revised version of this questionnaire (TOSSE-R) was given to 239 research methods lecturers to see if it supported Bland's theory. Each question was a statement followed by a five-point Likert scale: strongly disagree = 1, disagree, neither agree nor disagree, agree and strongly agree (SD, D, N, A and SA respectively). The data contain the following variables:

Source

www.discovr.rocks/csv/tosser.csv


Tuk et al. (2011) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tuk_2011

Format

A tibble with 102 rows and 3 variables

Details

Visceral factors that require us to engage in self-control (such as a filling bladder) can affect our inhibitory abilities in unrelated domains. In a fascinating study by Tuk, Trampe, and Warlop (2011) participants were given five cups of water: one group was asked to drink them all, whereas another was asked to take a sip from each. This manipulation led one group to have full bladders and the other group relatively empty ones (urgency). Later on, these participants were given eight trials on which they had to choose between a small financial reward that they would receive soon (SS) or a large financial reward for which they would wait longer (LL). The researchers counted the number of trials on which participants chose the LL reward as an indicator of inhibitory control (ll_sum). The data contain three variables:

Source

www.discovr.rocks/csv/tuk_2011.csv

References


Mobile phone use and brain tumour data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tumour

Format

A tibble with 102 rows and 3 variables

Details

Mobile phones emit microwaves, and so holding one next to your brain for large parts of the day is a bit like sticking your brain in a microwave oven and pushing the 'cook until well done' button. If we wanted to test this experimentally, we could get six groups of people and strap a mobile phone on their heads, then by remote control turn the phones on for a certain amount of time each day. After six months, we measure the size of any tumour (in mm^3) close to the site of the phone antenna (just behind the ear). The six groups experienced 0, 1, 2, 3, 4 or 5 hours per day of phone microwaves for six months. The fictitious data contains three variables:

Source

www.discovr.rocks/csv/tumour.csv


Tutor marking data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

tutor_marks

Format

A tibble with 32 rows and 3 variables.

Details

It is common for lecturers to obtain reputations for being ‘hard’ or ‘light’ markers, but there is often little to substantiate these reputations. A group of students investigated the consistency of marking by submitting the same essays to four different lecturers. The outcome was the percentage mark given by each lecturer and the predictor was the lecturer who marked the essay. The fictitious data contain three variables:

Source

www.discovr.rocks/csv/tutor_marks.csv


Van Bourg et al. (2020) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

van_bourg_2020

Format

A tibble with 201 rows and 6 variables.

Details

Pet dogs often engage in behaviours helpful to their owners (mine likes to cuddle me when I’ve had a bad day, and in fact when I’ve had a good day, and now I think of it, pretty much any day regardless of how good or bad it’s been). It’s unclear whether these behaviours are truly prosocial. Can a dog engage in prosocial behaviours that haven’t been explicitly trained? Van Bourg et al. (2020) addressed this question by trapping some dogs’ owners in boxes! In the study 60 dogs were tested in three conditions, all of which involved being in a room with a large restrainer box (a large acrylic box with holes in the side that could be closed by resting a foam board door across its opening). Each dog had three experiences in the room and each time the experimenters were interested in whether the dog would open the restrainer box within 120 seconds. The order of the three experiences was counterbalanced so different dogs completed the experiences in different orders.

This data contains a subset of variables from the study, but the full dataset is available in the supplementary materials of the paper doi:10.1371/journal.pone.0231742.s001. The data contains the following variables

Source

www.discovr.rocks/csv/van_bourg_2020.csv

References


Video game and aggression data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

video_games

Format

A tibble with 442 rows and 4 variables

Details

Video games are among the favourite online activities for young people. These games have been linked to increased aggression in youths. Another predictor of aggression and conduct problems is callous-unemotional traits such as lack of guilt, lack of empathy, and callous use of others for personal gain. Imagine that a scientist explored the relationship between playing violent video games and aggression. She measured aggressive behaviour, callous-unemotional traits, and the number of hours per week spent playing video games in 442 youths. These fictitious data contain the following variables:

Source

www.discovr.rocks/csv/video_games.csv


Virtual XI palette

Description

Colour palette based on Iron Maiden's Virtual XI album sleeve.

Usage

virtual_pal(n, type = c("discrete", "continuous"), reverse = FALSE)

scale_color_virtual(n, type = "discrete", reverse = FALSE, ...)

scale_colour_virtual(n, type = "discrete", reverse = FALSE, ...)

scale_fill_virtual(n, type = "discrete", reverse = FALSE, ...)

Arguments

n

number of colours

type

discrete or continuous

reverse

reverse order, Default: FALSE

...

Arguments passed on to ggplot2::discrete_scale

aesthetics

The names of the aesthetics that this scale works with.

scale_name

[Deprecated] The name of the scale that should be used for error messages associated with this scale.

palette

A palette function that when called with a single integer argument (the number of levels in the scale) returns the values that they should take (e.g., scales::pal_hue()).

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the default breaks (the scale limits)

  • A character vector of breaks

  • A function that takes the limits as input and returns breaks as output. Also accepts rlang lambda function notation.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

limits

One of:

  • NULL to use the default scale values

  • A character vector that defines possible values of the scale and their order

  • A function that accepts the existing (automatic) values and returns new ones. Also accepts rlang lambda function notation.

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

na.translate

Unlike continuous scales, discrete scales can easily show missing values, and do so by default. If you want to remove missing values from a discrete scale, specify na.translate = FALSE.

na.value

If na.translate = TRUE, what aesthetic value should the missing values be displayed as? Does not apply to position scales where NA is always placed at the far right.

drop

Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE includes the levels in the factor. Please note that to display every level in a legend, the layer should use show.legend = TRUE.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

call

The call used to construct the scale for reporting messages.

super

The super class to use for the constructed scale

Value

A discrete or continuous scale.

Examples

library(scales)
show_col(virtual_pal()(8))

library(discovr)
library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including Fear of the Dark because it's not in any way classic.
# No Prayer for the Dying was pushing its luck too if I'm honest.

classic_era <- subset(eddiefy, year < 1992, select = c("energy", "valence", "album_name"))

# Plot some data and apply the palette to color (note US English)

ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_color_virtual()

# Plot some data and apply the palette to colour (note UK English)

ggplot(classic_era, aes(x = energy, y = valence, colour = album_name)) +
  geom_point(size = 2) +
  theme_minimal() +
  scale_colour_virtual()

# Plot some data and apply theme to fill

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_virtual()

Williams' questionnaire of organizational ability data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

williams

Format

A tibble with 239 rows and 29 variables.

Details

Dr Sian Williams (University of Brighton) devised a questionnaire to measure organizational ability. She predicted five factors to do with organizational ability: (1) preference for organization; (2) goal achievement; (3) planning approach; (4) acceptance of delays; and (5) preference for routine. These dimensions are theoretically independent. Williams's questionnaire contains 28 items using a seven-point Likert scale (1 = strongly disagree, 4 = neither, 7 = strongly agree). She gave it to 239 people.

Source

www.discovr.rocks/csv/williams.csv


Xbox: games console injuries

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

xbox

Format

A tibble with 40 rows and 4 variables.

Details

Fictional data about injuries while playing video games on a console. A researcher was interested in what factors contributed to injuries resulting from game console use. She tested 40 participants who were randomly assigned to either an active or static game played on either a Nintendo Switch or Xbox One Kinect. At the end of the session their physical condition was evaluated on an injury severity scale.

Source

www.discovr.rocks/csv/xbox.csv


Zhang et al. (2013) (subsample)

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

zhang_sample

Format

A tibble with 52 rows and 4 variables

Details

Statistics and maths anxiety are common and affect people's performance on maths and stats assignments; women in particular can lack confidence in mathematics (Field, 2010). Zhang, Schmader and Hall (2013) did an intriguing study in which students completed a maths test in which some put their own name on the test booklet, whereas others were given a booklet that already had either a male or female name on. Participants in the latter two conditions were told that they would use this other person's name for the purpose of the test. Women who completed the test using a different name performed significantly better than those who completed the test using their own name. (There were no such significant effects for men.) The data are a random subsample of Zhang et al.'s data with the following variables:

Source

www.discovr.rocks/csv/zhang_2013_subsample.csv

References


Zibarras et al. (2008) data

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

zibarras_2008

Format

A tibble with 207 rows and 12 variables.

Details

Zibarras, Port, and Woods (2008) looked at the relationship between personality and creativity. They used the Hogan Development Survey (HDS), which measures 11 dysfunctional dispositions of employed adults: being volatile, mistrustful, cautious, detached, passive_aggressive, arrogant, manipulative, dramatic, eccentric, perfectionist, and dependent.

Source

www.discovr.rocks/csv/zibarras_2008.csv

References


Zombie growth model

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

zombie_growth

Format

A tibble with 564 rows and 5 variables.

Details

In the story within Field (2016) a lot of people get turned into zombies. At the end of the book it is revealed that one of the central characters, Alice, uses a gene therapy that she invented to restore the zombies back to a human state. This dataset relates to her second study, in which she tracked efficacy over 12 months after the treatment. The data contain measures from 141 zombies measured at four timepoints (baseline and 1-, 6- and 12-month follow-up). Zombies were randomly assigned to two arms of the trial (wait list vs. gene therapy) and the outcome was how much they resembled their pre-zombie state (as a percentage).
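
The row count under Format can be checked against the design with a line of arithmetic. This sketch assumes the data are stored in long format (one row per zombie per timepoint), which the manual does not state explicitly.

```r
# Row count implied by the design, assuming one row per zombie per timepoint
n_zombies    <- 141
n_timepoints <- 4    # baseline, 1, 6 and 12 months

n_zombies * n_timepoints
# 564, which matches the number of rows given under Format
```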

Source

www.discovr.rocks/csv/zombie_growth.csv

References


Zombie rehab

Description

A dataset from Field, A. P. (2026). Discovering statistics using R and RStudio (2nd ed.). London: Sage.

Usage

zombie_rehab

Format

A tibble with 190 rows and 6 variables.

Details

In the story within Field (2016) a lot of people get turned into zombies. At the end of the book it is revealed that one of the central characters, Alice, uses a gene therapy that she invented to restore the zombies back to a human state. This dataset relates to her first attempt at an efficacious gene therapy. It contains data from 190 zombies treated at 10 different clinics. Zombies were randomly assigned to two arms of the trial (wait list vs. gene therapy) and the outcome was how much they resembled their pre-zombie state (as a percentage).

Source

www.discovr.rocks/csv/zombie_rehab.csv

References