Title: | Functions to Facilitate Exploratory Data Analysis |
Version: | 1.0.3 |
Description: | Functions for descriptive statistics, data management, and data visualization. |
Depends: | R (≥ 3.5.0) |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.2 |
License: | MIT + file LICENSE |
VignetteBuilder: | knitr |
BugReports: | https://github.com/rkabacoff/qacBase/issues |
URL: | https://github.com/rkabacoff/qacBase |
Suggests: | rmarkdown, knitr, kableExtra |
Imports: | ggplot2, dplyr, tidyr, ggcorrplot, multcompView, PMCMRplus, crayon, purrr, haven, rlang, ggExtra, patchwork |
NeedsCompilation: | no |
Packaged: | 2022-02-09 21:57:41 UTC; rkaba |
Author: | Kabacoff Robert [aut, cre], Barich Griffen [ctb], Jamrog Kelly [ctb], Kravchenko Elizaveta [ctb], Kuruvilla Jacob [ctb], Liu Lex [ctb], Nakamura Shota [ctb], Pham Kim [ctb], Rodriguez Belen [ctb], Ross Shane [ctb], Russo Chris [ctb], Corpuz Frederick [ctb], Juradat Nurah [ctb], Karp Harrison [ctb], Koech Kevin [ctb], Peters Anna [ctb], Shah Dhhyey [ctb], Stevenson Kenneth [ctb], Thomas-Franz Kaitlyn [ctb], Zheng Jiner [ctb], Aldarmaki Ahmed [ctb], Alneyadi Mohammed [ctb], Altai Chossis [ctb], Colorado Sofia [ctb], Northrop Blake [ctb], Peretz Shea [ctb], Qin Cher [ctb], Tuhabonye Emma [ctb], Wong Phillip [ctb] |
Maintainer: | Kabacoff Robert <rkabacoff@wesleyan.edu> |
Repository: | CRAN |
Date/Publication: | 2022-02-09 22:20:02 UTC |
qacBase: Functions to Facilitate Exploratory Data Analysis
Description
Functions for descriptive statistics, data management, and data visualization.
Author(s)
Maintainer: Kabacoff Robert rkabacoff@wesleyan.edu
Other contributors:
Barich Griffen [contributor]
Jamrog Kelly [contributor]
Kravchenko Elizaveta [contributor]
Kuruvilla Jacob [contributor]
Liu Lex [contributor]
Nakamura Shota [contributor]
Pham Kim [contributor]
Rodriguez Belen [contributor]
Ross Shane [contributor]
Russo Chris [contributor]
Corpuz Frederick [contributor]
Juradat Nurah [contributor]
Karp Harrison [contributor]
Koech Kevin [contributor]
Peters Anna [contributor]
Shah Dhhyey [contributor]
Stevenson Kenneth [contributor]
Thomas-Franz Kaitlyn [contributor]
Zheng Jiner [contributor]
Aldarmaki Ahmed [contributor]
Alneyadi Mohammed [contributor]
Altai Chossis [contributor]
Colorado Sofia [contributor]
Northrop Blake [contributor]
Peretz Shea [contributor]
Qin Cher [contributor]
Tuhabonye Emma [contributor]
Wong Phillip [contributor]
See Also
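Useful links:
- https://github.com/rkabacoff/qacBase
- Report bugs at https://github.com/rkabacoff/qacBase/issues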
Barcharts
Description
Create barcharts for all categorical variables in a data frame.
Usage
barcharts(
data,
fill = "deepskyblue2",
color = "grey30",
labels = TRUE,
sort = TRUE,
maxcat = 20,
abbrev = 20
)
Arguments
data |
data frame |
fill |
fill color for bars |
color |
color for bar labels |
labels |
if |
sort |
if |
maxcat |
numeric. barcharts with more than this number of bars will not be plotted. |
abbrev |
numeric. abbreviate bar labels to at most this character length. |
Value
a ggplot graph
Examples
barcharts(cars74)
Automobile characteristics
Description
Cars dataset with features including make, model, year, engine, and other properties of the car used to predict its price.
Usage
cardata
Format
A data frame with 11914 rows and 16 variables. The variables are as follows:
- make
car brand
- model
model given by its brand
- year
year of manufacture
- engine_fuel_type
type of fuel required by its manufacturer
- engine_hp
engine horse power
- engine_cylinders
number of cylinders
- transmission_type
automatic vs. manual
- driven_wheels
AWD, FWD, AWD
- number_of_doors
Number of Doors
- market_category
Luxury, Performance, Hatchback, etc.
- vehicle_size
Compact, Midsize, Large
- vehicle_style
Type of Vehicle: Sedan, SUV, Coupe, etc.
- highway_mpg
highway miles per gallon
- city_mpg
city miles per gallon
- popularity
Popularity index
- msrp
manufacturer's suggested retail price
Details
This package contains a detailed car dataset.
Source
Taken from Kaggle https://www.kaggle.com/CooperUnion/cardataset.
Examples
summary(cardata)
Motor Trend car road tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Usage
cars74
Format
A data frame with 32 rows and 11 variables. The variables are as follows:
- auto
Automobile name (the row names of mtcars)
- mpg
Miles/(US) gallon
- cyl
Number of cylinders
- disp
Displacement (cu.in.)
- hp
Gross horsepower
- drat
Rear axle ratio
- wt
Weight (1000 lbs)
- qsec
1/4 mile time
- vs
Engine cylinder configuration
- am
Transmission type
- gear
Number of forward gears
- carb
Number of carburetors
Details
This dataset is the mtcars dataset that comes with base R. However, cyl, vs, am, gear and carb have been converted to factors and rownames have been converted to the variable auto.
A description of the variables by Soren Heitmann can be found here.
Source
Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391-411.
Examples
summary(cars74)
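The conversion described in Details can be approximated in base R. The following is a minimal sketch, not the package's own code: the object name cars74_like is hypothetical, and the factor levels are left as the raw numeric codes from mtcars, which may differ from the labels used in cars74.

```r
# Sketch: build a cars74-like data frame from base R's mtcars.
# cars74_like is a hypothetical name used only for illustration.
cars74_like <- data.frame(auto = rownames(mtcars), mtcars,
                          stringsAsFactors = FALSE)
rownames(cars74_like) <- NULL  # row names now live in the auto column

# Convert the five variables named in Details to factors.
factor_vars <- c("cyl", "vs", "am", "gear", "carb")
cars74_like[factor_vars] <- lapply(cars74_like[factor_vars], factor)

str(cars74_like)
```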
Detailed description of a data frame
Description
contents provides a comprehensive description of a data frame, including summary statistics for both quantitative and categorical variables.
Usage
contents(data, digits = 2, maxcat = 10, label_length = 20)
Arguments
data |
a data frame |
digits |
number of decimal digits for statistics. |
maxcat |
maximum number of levels of a character/factor variable to print. |
label_length |
maximum length of factor level label to print. Longer labels will be truncated. |
Details
Prints a comprehensive description of a data frame via several tables, a general summary table and tables that provide a breakdown of quantitative and categorical variables.
Value
a list with 6 components:
- dfname
name of data frame
- nrow
number of rows
- ncol
number of columns
- overall
data frame of overall dataset characteristics
- qvars
data frame with summary statistics for quantitative variables
- cvars
data frame with summary statistics for categorical variables
Examples
contents(cars74)
Correlation matrix plot
Description
Create a correlation matrix for all quantitative variables in a data frame.
Usage
cor_plot(
data,
method = c("pearson", "kendall", "spearman"),
sort = FALSE,
axis_text_size = 12,
number_text_size = 3,
legend = FALSE
)
Arguments
data |
data frame |
method |
a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman". |
sort |
logical. If |
axis_text_size |
size for axis labels (default=12). |
number_text_size |
size for correlation coefficient labels (default=3). |
legend |
logical, if TRUE the legend is displayed. (default=FALSE) |
Details
The cor_plot function will only select quantitative variables from a data frame. Categorical variables are ignored. The correlation matrix is presented as a lower triangle matrix. Missing values are deleted in listwise fashion.
Value
a ggplot graph
Note
This function is a wrapper for the ggcorrplot function.
Examples
cor_plot(cars74)
cor_plot(cars74, sort=TRUE)
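The steps listed in Details (keep quantitative variables, delete incomplete rows listwise, show the lower triangle) can be sketched with base R alone. This is an illustration under those stated assumptions, not the code that cor_plot runs; the object names are hypothetical.

```r
# Sketch: lower-triangle correlation matrix with listwise deletion.
num <- mtcars[vapply(mtcars, is.numeric, logical(1))]  # quantitative only
num <- num[complete.cases(num), ]                      # listwise deletion

r <- round(cor(num, method = "pearson"), 2)
r[upper.tri(r)] <- NA          # keep the lower triangle and the diagonal
print(r, na.print = "")
```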
Two-way frequency table
Description
This function creates a two way frequency table.
Usage
crosstab(
data,
rowvar,
colvar,
type = c("freq", "percent", "rowpercent", "colpercent"),
total = TRUE,
na.rm = TRUE,
digits = 2,
chisquare = FALSE,
plot = FALSE
)
Arguments
data |
data frame |
rowvar |
row factor (unquoted) |
colvar |
column factor (unquoted) |
type |
statistics to print. Options are |
total |
logical. if TRUE, includes total percents. |
na.rm |
logical. if TRUE, deletes cases with missing values. |
digits |
number of decimal digits to report for percents. |
chisquare |
logical. If |
plot |
logical. If |
Details
Given a data frame, a row factor, a column factor, and a type (frequencies, cell percents, row percents, or column percents) the function provides the requested cross-tabulation.
If na.rm = FALSE, a level labeled <NA> is added. If total = TRUE, a level labeled Total is added. If chisquare = TRUE, a chi-square test of independence is performed.
Value
If plot=TRUE, returns a ggplot2 graph.
Otherwise the function returns a list with the following components:
- table
(table). Table of frequencies or percents
- type
(character). Type of table to print
- total
(logical). If TRUE, print row and/or column totals
- digits
(numeric). Number of digits to print
- rowname
(character). Row variable name
- colname
(character). Column variable name
- chisquare
(character). If chisquare=TRUE, contains the results of the Chi-square test. NULL otherwise.
Examples
# print frequencies
crosstab(mtcars, cyl, gear)
# print cell percents
crosstab(cardata, vehicle_size, driven_wheels)
crosstab(cardata, vehicle_size, driven_wheels,
plot=TRUE)
crosstab(cardata, driven_wheels, vehicle_size,
type="colpercent", plot=TRUE, chisquare=TRUE)
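For readers who want to see what the four type options compute, here is a rough base R equivalent using table() and prop.table(). It is a sketch only: the object names are hypothetical, and the layout and labeling of the package's printed output may differ.

```r
# Sketch: the four kinds of two-way table, built with base R.
freq <- table(cyl = mtcars$cyl, gear = mtcars$gear)          # "freq"

cell_pct <- round(100 * prop.table(freq), 2)                 # "percent"
row_pct  <- round(100 * prop.table(freq, margin = 1), 2)     # "rowpercent"
col_pct  <- round(100 * prop.table(freq, margin = 2), 2)     # "colpercent"

addmargins(freq)   # frequencies with row and column totals

# Chi-square test of independence; small expected counts trigger a
# warning for this table, which is suppressed here.
chi <- suppressWarnings(chisq.test(freq))
chi
```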
Density plots
Description
Create density plots for all quantitative variables in a data frame.
Usage
densities(data, fill = "deepskyblue2", adjust = 1)
Arguments
data |
data frame |
fill |
fill color for density plots |
adjust |
a factor multiplied by the smoothing bandwidth. See details. |
Details
The densities function will only plot quantitative variables from a data frame. Categorical variables are ignored.
The adjust parameter multiplies the smoothing bandwidth. For example, adjust = 2 will make the density plots twice as smooth, and adjust = 1/2 will make the density plots half as smooth (i.e., twice as spiky).
Value
a ggplot graph
Examples
densities(cars74)
densities(cars74, adjust=2)
densities(cars74, adjust=1/2)
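The effect of adjust on the bandwidth can be checked directly with stats::density(), which takes an argument of the same name. This sketch assumes densities passes adjust through to a kernel density estimator in the same way; it only inspects the bandwidths and does not draw anything.

```r
# Sketch: adjust scales the kernel bandwidth used by density().
x <- mtcars$mpg

d_default <- density(x, adjust = 1)
d_smooth  <- density(x, adjust = 2)     # bandwidth doubled
d_spiky   <- density(x, adjust = 1/2)   # bandwidth halved

c(default = d_default$bw, doubled = d_smooth$bw, halved = d_spiky$bw)
```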
Visualize a data frame
Description
df_plot visualizes the variables in a data frame.
Usage
df_plot(data)
Arguments
data |
a data frame. |
Details
For each variable, the plot displays
- type (numeric, integer, factor, ordered factor, logical, or date)
- percent of available (and missing) cases
Variables are sorted by type and the total number of variables and cases are printed in the caption.
Value
a ggplot2 graph
See Also
For more descriptive statistics on a data frame see contents.
Examples
df_plot(cars74)
Test of group differences
Description
One-way analysis (ANOVA or Kruskal-Wallis Test) with post-hoc comparisons and plots
Usage
groupdiff(
data,
y,
x,
method = c("anova", "kw"),
digits = 2,
horizontal = FALSE,
posthoc = FALSE
)
Arguments
data |
a data frame. |
y |
a numeric response variable |
x |
a categorical explanatory variable. It will be coerced to a factor. |
method |
character. Either |
digits |
Number of significant digits to print. |
horizontal |
logical. If |
posthoc |
logical. If |
Details
The groupdiff function performs one of two analyses:
anova
A one-way analysis of variance, with TukeyHSD post-hoc comparisons.
kw
A Kruskal Wallis Rank Sum Test, with Conover Test post-hoc comparisons.
In each case, summary statistics and grouped boxplots are provided. In the parametric case, the statistics are n, mean, and standard deviation. In the nonparametric case the statistics are n, median, and median absolute deviation. If posthoc = TRUE, pairwise comparisons are superimposed on the boxplots. Groups that share a letter are not significantly different (p < .05), controlling for multiple comparisons.
Value
a list with 3 components:
result
omnibus test
summarystats
summary statistics
plot
ggplot2 graph
See Also
kwAllPairsConoverTest, multcompLetters.
Examples
# parametric analysis
groupdiff(cars74, hp, gear)
# nonparametric analysis
groupdiff(cardata, popularity, vehicle_style, posthoc=TRUE,
method="kw", horizontal=TRUE)
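The two analyses described in Details can be approximated with functions from the stats package. This is a sketch under that assumption: it reproduces the omnibus tests and summary statistics, but it omits the Conover post-hoc test and the compact letter display, which the package obtains from PMCMRplus and multcompView. Object names are hypothetical.

```r
# Sketch: parametric and nonparametric one-way comparisons in base R.
d <- transform(mtcars, gear = factor(gear))

# Parametric: one-way ANOVA with Tukey HSD post-hoc comparisons.
fit <- aov(hp ~ gear, data = d)
summary(fit)
tukey <- TukeyHSD(fit)
tukey

# n, mean, and standard deviation by group.
aggregate(hp ~ gear, data = d,
          FUN = function(v) c(n = length(v), mean = mean(v), sd = sd(v)))

# Nonparametric: Kruskal-Wallis rank sum test.
kw <- kruskal.test(hp ~ gear, data = d)
kw

# n, median, and median absolute deviation by group.
aggregate(hp ~ gear, data = d,
          FUN = function(v) c(n = length(v), median = median(v), mad = mad(v)))
```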
Histograms
Description
Create histograms for all quantitative variables in a data frame.
Usage
histograms(data, fill = "deepskyblue2", color = "white", bins = 30)
Arguments
data |
data frame |
fill |
fill color for histogram bars |
color |
border color for histogram bars |
bins |
number of bins (bars) for the histograms |
Details
The histograms function will only plot quantitative variables from a data frame. Categorical variables are ignored.
Value
a ggplot graph
Examples
histograms(cars74)
histograms(cars74, bins=15, fill="darkred")
List object sizes and types
Description
lso lists object sizes and types.
Usage
lso(
pos = 1,
pattern,
order.by = "Size",
decreasing = TRUE,
head = TRUE,
n = 10
)
Arguments
pos |
a number specifying the environment as a position in the search list. |
pattern |
an optional regular expression. Only names matching pattern are returned. glob2rx can be used to convert wildcard patterns to regular expressions. |
order.by |
column to sort the list by. Values are |
decreasing |
logical. If |
head |
logical. Should output be limited to |
n |
if |
Details
This function lists the sizes and types of all objects in an environment. By default, the list describes the objects in the current environment, presented in descending order by object size and reported in megabytes (Mb).
Value
a data.frame with four columns (Type, Size, Rows, Columns) and object names as row names.
Author(s)
Based on postings by Petr Pikal and David Hinds to the r-help list in 2004 and modified by Dirk Eddelbuettel, Patrick McCann, and Rob Kabacoff.
References
https://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session/.
Examples
data(cardata)
data(cars74)
lso()
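The idea behind this kind of listing can be sketched with ls(), mget(), and object.size(). The helper below is hypothetical and far simpler than lso: it reports only Type and Size, and it assumes 1 Mb = 1024^2 bytes, which may not match the package's convention.

```r
# Sketch: list objects in an environment by size (hypothetical helper).
list_objects <- function(env = globalenv()) {
  nms <- ls(envir = env)
  if (length(nms) == 0) {
    return(data.frame(Type = character(), Size = numeric()))
  }
  objs <- mget(nms, envir = env)
  out <- data.frame(
    Type = vapply(objs, function(o) class(o)[1], character(1)),
    Size = vapply(objs, function(o) as.numeric(object.size(o)) / 1024^2,
                  numeric(1)),           # size in Mb (assumed 1024^2 bytes)
    row.names = nms
  )
  out[order(out$Size, decreasing = TRUE), , drop = FALSE]
}

# Use a scratch environment so the example does not depend on the workspace.
demo_env <- new.env()
assign("small", 1:10, envir = demo_env)
assign("big", mtcars, envir = demo_env)
res <- list_objects(demo_env)
res
```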
Normalize numeric variables
Description
Normalize the numeric variables in a data frame
Usage
normalize(data, new_min = 0, new_max = 1)
Arguments
data |
a data frame. |
new_min |
minimum for the transformed variables. |
new_max |
maximum for the transformed variables. |
Details
normalize transforms all the numeric variables in a data frame to have the same minimum and maximum values. By default, this will be a minimum of 0 and maximum of 1. Character variables and factors are left unchanged.
Value
a data frame
Note
Use this function to transform variables into a given range. The default is [0, 1], but [-1, 1], [0, 100], or any other range is permissible.
Examples
head(cars74)
cars74_st <- normalize(cars74)
head(cars74_st)
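The transformation described in Details is ordinary min-max rescaling. The sketch below shows the arithmetic in base R; the helper name rescale is hypothetical, and a constant column (where the maximum equals the minimum) would yield NaN here, a case the package may handle differently.

```r
# Sketch: min-max rescaling of every numeric column to [new_min, new_max].
rescale <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min) + new_min
}

num_cols <- vapply(mtcars, is.numeric, logical(1))
scaled <- mtcars
scaled[num_cols] <- lapply(mtcars[num_cols], rescale,
                           new_min = -1, new_max = 1)

# Each rescaled column now spans the requested range.
sapply(scaled[c("mpg", "hp")], range)
```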
Get help on a package
Description
phelp provides help on an installed package.
Usage
phelp(pckg)
Arguments
pckg |
The name of a package |
Details
This function provides help on an installed package. The package does not have to be loaded. The package name does not need to be entered with quotes.
Value
No return value, called for side effects.
Examples
phelp(stats)
Plot a crosstab object
Description
This function plots the results of a calculated two-way frequency table.
Usage
## S3 method for class 'crosstab'
plot(x, size = 3.5, ...)
Arguments
x |
An object of class |
size |
numeric. Size of bar text labels. |
... |
not currently used. |
Value
a ggplot2 graph
Examples
tbl <- crosstab(cars74, cyl, gear, type = "freq")
plot(tbl)
tbl <- crosstab(cars74, cyl, gear, type = "colpercent")
plot(tbl)
Plot a tab object
Description
Plot a frequency or cumulative frequency table
Usage
## S3 method for class 'tab'
plot(x, fill = "deepskyblue2", size = 3.5, ...)
Arguments
x |
An object of class |
fill |
Fill color for bars |
size |
numeric. Size of bar text labels. |
... |
Parameters passed to a function |
Value
a ggplot2 graph
Examples
tbl1 <- tab(cars74, carb)
plot(tbl1)
tbl2 <- tab(cars74, carb, sort = TRUE)
plot(tbl2)
tbl3 <- tab(cars74, carb, cum=TRUE)
plot(tbl3)
Print a contents object
Description
print.contents prints the results of the contents function.
Usage
## S3 method for class 'contents'
print(x, ...)
Arguments
x |
an object of class |
... |
not used. |
Value
No return value, called for side effects.
Examples
testdata <- data.frame(height=c(4, 5, 3, 2, 100),
weight=c(39, 88, NA, 15, -2),
names=c("Bill","Dean", "Sam", NA, "Jane"),
race=c('b', 'w', 'w', 'o', 'b'))
x <- contents(testdata)
print(x)
Print a crosstab object
Description
This function prints the results of a calculated two-way frequency table.
Usage
## S3 method for class 'crosstab'
print(x, ...)
Arguments
x |
An object of class |
... |
not currently used. |
Value
No return value, called for side effects
Examples
mycrosstab <- crosstab(mtcars, cyl, gear, type = "freq", digits = 2)
print(mycrosstab)
mycrosstab <- crosstab(mtcars, cyl, gear, type = "rowpercent", digits = 3)
print(mycrosstab)
Print a tab object
Description
Print the results of calculating a frequency table
Usage
## S3 method for class 'tab'
print(x, ...)
Arguments
x |
An object of class |
... |
Parameters passed to the print function |
Value
No return value, called for side effects
Examples
frequency <- tab(cardata, make, sort = TRUE, na.rm = FALSE)
print(frequency)
Summary statistics for a quantitative variable
Description
This function provides descriptive statistics for a quantitative variable alone or separately by groups. Any function that returns a single numeric value can be used.
Usage
qstats(data, x, ..., stats = c("n", "mean", "sd"), na.rm = TRUE, digits = 2)
Arguments
data |
data frame |
x |
numeric variable in data (unquoted) |
... |
list of grouping variables |
stats |
statistics to calculate (any function that produces a
numeric value), Default: |
na.rm |
if |
digits |
number of decimal digits to print, Default: 2 |
Value
a data frame, where columns are grouping variables (optional) and statistics
Examples
# If no keyword arguments are provided, default values are used
qstats(mtcars, mpg, am, gear)
# You can supply as many (or no) grouping variables as needed
qstats(mtcars, mpg)
qstats(mtcars, mpg, am, cyl)
# You can specify your own functions (e.g., median,
# median absolute deviation, minimum, maximum)
qstats(mtcars, mpg, am, gear,
stats = c("median", "mad", "min", "max"))
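Grouped summaries of this kind can be sketched with stats::aggregate(). The example below is an approximation, not the package's implementation: it takes a named list of functions rather than a character vector of names, and the helper names stat_funs and summarise_one are hypothetical.

```r
# Sketch: several single-value statistics for mpg, by am and gear.
stat_funs <- list(
  n    = function(v) sum(!is.na(v)),
  mean = function(v) mean(v, na.rm = TRUE),
  sd   = function(v) sd(v, na.rm = TRUE)
)

# Apply every statistic to one group's values.
summarise_one <- function(v) vapply(stat_funs, function(f) f(v), numeric(1))

res <- aggregate(mpg ~ am + gear, data = mtcars, FUN = summarise_one)
res <- do.call(data.frame, res)   # flatten the matrix column into columns
res[-(1:2)] <- round(res[-(1:2)], 2)
res
```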
R Colors
Description
Plot a grid of R colors and their associated names
Usage
rcolors(color = NULL, cex = 0.6)
Arguments
color |
character. A text string used to search for specific color variations (see examples.) |
cex |
numeric. text size for color labels. |
Details
By default rcolors plots the basic 502 distinct colors provided by the colors function. If a color name or part of a name is provided, only colors with matching names are plotted.
Value
No return value, called for side effects
References
This function is adapted from code published by Karl W. Broman.
Examples
rcolors()
rcolors("blue")
rcolors("red")
rcolors("dark")
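The name lookup described in Details can be illustrated with grDevices::colors() and grep(). This sketch assumes the search is a simple pattern match on color names, which the source does not confirm, and it lists the names without plotting them.

```r
# Sketch: find R color names that contain a search string.
all_colors <- colors(distinct = TRUE)   # one name per distinct color
length(all_colors)

blues <- grep("blue", all_colors, value = TRUE)
head(blues)
```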
Recode one or more variables
Description
recodes recodes the values of one or more variables in a data frame.
Usage
recodes(data, vars, from, to)
Arguments
data |
a data frame. |
vars |
character vector of variable names. |
from |
a vector of values or conditions (see Details). |
to |
a vector of replacement values. |
Details
- For each variable in the vars parameter, values are checked against the list of values in the from vector. If a value matches, it is replaced with the corresponding entry in the to vector.
- Once a given observation's value matches a from value, it is recoded. That particular observation will not be recoded again by that recodes() statement (i.e., no chaining).
- One or more values in the from vector can be an expression, using the dollar sign ($) to represent the variable being recoded. If the expression evaluates to TRUE, the corresponding to value is returned.
- If the number of values in the to vector is less than the from vector, the values are recycled. This lets you convert several values to a single outcome value (e.g., NA).
- If the to values are numeric, the resulting recoded variable will be numeric. If the variable being recoded is a factor and the to values are character values, the resulting variable will remain a factor. If the variable being recoded is a character variable and the to values are character values, the resulting variable will remain a character variable.
Value
a data frame
Note
See the vignette for detailed examples.
Examples
df <- data.frame(x = c(1, 5, 7, 3, 0),
y = c(9, 0, 5, 9, 2),
z = c(1, 1, 2, 2, 1)
)
df <- recodes(df,
vars = c("x", "y"),
from = 0, to = NA)
df <- recodes(df,
vars = "z",
from = c(1, 2), to = c("pass", "fail"))
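The "no chaining" and recycling rules from Details can be made concrete with a small base R sketch. The helper recode_once is hypothetical and covers only plain value matching on a single vector; it does not implement the $ expressions or the factor handling that recodes supports.

```r
# Sketch: recode a vector so each element is changed at most once.
recode_once <- function(x, from, to) {
  to <- rep_len(to, length(from))        # recycle to, as described in Details
  out <- x
  done <- rep(FALSE, length(x))          # elements that were already recoded
  for (i in seq_along(from)) {
    hit <- !done & !is.na(x) & x == from[i]
    out[hit] <- to[i]
    done <- done | hit
  }
  out
}

x <- c(1, 2, 3, 2, 1)

# 1 becomes 2 and 2 becomes 3; the new 2s are not recoded again to 3.
recode_once(x, from = c(1, 2), to = c(2, 3))

# A single replacement value is recycled across several from values.
recode_once(c(0, 5, 99), from = c(0, 99), to = NA)
```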
Scatterplot
Description
Create a scatter plot between two quantitative variables.
Usage
scatter(
data,
x,
y,
outlier = 3,
alpha = 1,
digits = 3,
title,
margin = "none",
stats = TRUE,
point_color = "deepskyblue2",
outlier_color = "violetred1",
line_color = "grey30",
margin_color = "deepskyblue2"
)
Arguments
data |
data frame |
x |
quantitative predictor variable |
y |
quantitative response variable |
outlier |
number. Observations with studentized residuals larger than this value are flagged. If set to 0, observations are not flagged. |
alpha |
Transparency of data points. A numeric value between 0 (completely transparent) and 1 (completely opaque). |
digits |
Number of significant digits in displayed statistics. |
title |
Optional title. |
margin |
Marginal plots. If specified, parameter can be
|
stats |
logical. If |
point_color |
Color used for points. |
outlier_color |
Color used to identify outliers (see the |
line_color |
Color for regression line. |
margin_color |
Fill color for margin boxplots, density plots, or histograms. |
Details
The scatter function generates a scatterplot between two quantitative variables, along with a line of best fit and a 95% confidence interval. By default, regression statistics (b, r, r2, p) are printed and outliers (observations with studentized residuals > 3) are flagged. Optionally, variable distributions (histograms, boxplots, violin plots, density plots) can be added to the plot margins.
Value
a ggplot2 graph
Note
Variable names do not have to be quoted.
Examples
scatter(cars74, hp, mpg)
scatter(cars74, wt, hp)
p <- scatter(ggplot2::mpg, displ, hwy,
margin="histogram",
title="Engine Displacement vs. Highway Mileage")
plot(p)
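The statistics and outlier flags described in Details can be reproduced approximately with lm(). This sketch makes two assumptions the source does not settle: that the studentized residuals are externally studentized (rstudent) and that the threshold is applied to their absolute value.

```r
# Sketch: regression statistics and outlier flags for mpg on hp.
fit <- lm(mpg ~ hp, data = mtcars)

b  <- unname(coef(fit)["hp"])                          # slope
r  <- cor(mtcars$hp, mtcars$mpg)                       # correlation
r2 <- summary(fit)$r.squared                           # R squared
p  <- summary(fit)$coefficients["hp", "Pr(>|t|)"]      # p-value for slope
signif(c(b = b, r = r, r2 = r2, p = p), 3)

# Flag observations whose studentized residual exceeds 3 in absolute value.
flagged <- abs(rstudent(fit)) > 3
rownames(mtcars)[flagged]

# 95% confidence band for the fitted line.
ci <- predict(fit, interval = "confidence", level = 0.95)
head(ci)
```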
Skewness
Description
Calculate the skewness of a numeric variable
Usage
skewness(x, na.rm = TRUE)
Arguments
x |
numeric vector. |
na.rm |
if |
Value
a number
Examples
skewness(mtcars$mpg)
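The source does not state which skewness estimator is used. As an illustration only, the sketch below computes the common moment-based coefficient, the third central moment divided by the 3/2 power of the second; the helper name skew_moment is hypothetical and its result may differ from skewness() if the package applies a bias correction.

```r
# Sketch: moment-based skewness, m3 / m2^(3/2) (assumed estimator).
skew_moment <- function(x, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)   # second central moment
  m3 <- mean((x - m)^3)   # third central moment
  m3 / m2^(3/2)
}

skew_moment(mtcars$mpg)
skew_moment(c(1, 2, 3, 4, 5))   # symmetric data
```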
Standardize numeric variables
Description
Standardize the numeric variables in a data frame
Usage
standardize(data, mean = 0, sd = 1, include_dummy = FALSE)
Arguments
data |
a data frame. |
mean |
mean of the transformed variables. |
sd |
standard deviation of the transformed variables. |
include_dummy |
logical. If |
Details
standardize transforms all the numeric variables in a data frame to have the same mean and standard deviation. By default, this will be a mean of 0 and standard deviation of 1. Character variables and factors are left unchanged. By default, dummy coded variables are also left unchanged. Use include_dummy=TRUE to transform these variables as well.
Value
a data frame
Examples
head(cars74)
cars74_st <- standardize(cars74)
head(cars74_st)
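The transformation described in Details is a z-score rescaled to a target mean and standard deviation. The sketch below shows the arithmetic in base R. The helper names are hypothetical, and the rule used to detect dummy variables (all non-missing values are 0 or 1) is an assumption about how the package identifies them.

```r
# Sketch: standardize numeric, non-dummy columns to a target mean and sd.
rescale_z <- function(x, new_mean = 0, new_sd = 1) {
  z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  z * new_sd + new_mean
}

# Assumed definition of a dummy coded variable: only 0/1 values.
is_dummy <- function(x) all(x[!is.na(x)] %in% c(0, 1))

is_num <- vapply(mtcars, is.numeric, logical(1))
dummy  <- vapply(mtcars, is_dummy, logical(1))
target <- is_num & !dummy

out <- mtcars
out[target] <- lapply(mtcars[target], rescale_z, new_mean = 50, new_sd = 10)

round(colMeans(out[c("mpg", "hp")]), 6)
```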
Frequency distribution for a categorical variable
Description
Function to calculate frequency distributions for categorical variables
Usage
tab(
data,
x,
sort = FALSE,
maxcat = NULL,
minp = NULL,
na.rm = FALSE,
total = FALSE,
digits = 2,
cum = FALSE,
plot = FALSE
)
Arguments
data |
A data frame |
x |
A factor variable in the data frame. |
sort |
logical. Sort levels from high to low. |
maxcat |
Maximum number of categories to be included. Smaller categories will be combined into an "Other" category. |
minp |
Minimum proportion for a category to be included. Categories representing smaller proportions will be combined into an "Other" category. maxcat and minp cannot both be specified. |
na.rm |
logical. Removes missing values when TRUE. |
total |
logical. Include a total category when TRUE. |
digits |
Number of digits the percents should be rounded to. |
cum |
logical. If |
plot |
logical. If |
Details
The function tab will calculate the frequency distribution for a categorical variable and output a data frame with three columns: level, n, percent.
Value
If plot = TRUE, returns a ggplot2 bar chart. Otherwise returns a data frame.
Examples
tab(cars74, carb)
tab(cars74, carb, plot=TRUE)
tab(cars74, carb, sort=TRUE)
tab(cars74, carb, sort=TRUE, plot=TRUE)
tab(cars74, carb, cum=TRUE)
tab(cars74, carb, cum=TRUE, plot=TRUE)
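A frequency distribution with the three columns named in Details can be built from table() in a few lines. This is a sketch, not the package's code: the cumulative column names cum_n and cum_percent are hypothetical, and the "Other" grouping from maxcat and minp is not shown.

```r
# Sketch: frequency distribution for carb, sorted from high to low.
counts <- table(mtcars$carb)

freq <- data.frame(
  level   = names(counts),
  n       = as.vector(counts),
  percent = round(100 * as.vector(counts) / sum(counts), 2)
)
freq <- freq[order(freq$n, decreasing = TRUE), ]

# Cumulative counts and percents (hypothetical column names).
freq$cum_n       <- cumsum(freq$n)
freq$cum_percent <- round(100 * freq$cum_n / sum(freq$n), 2)
freq
```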
Time spent watching television - 2017
Description
This is a data set detailing TV usage on days surveyed as determined by the 2017 American Time Use Survey. The data set includes demographic information, as well as details regarding employment and family makeup, where applicable. Information on days surveyed, as well as whether the day is a holiday, is also included.
Usage
tv
Format
A data frame with 10,223 rows and 21 variables. The variables are as follows:
- id
ID of respondent
- weight
ATUS final weight
- youngest_child
Age of the youngest child in the household that is less than 18 years old (if applicable). Range: 1-17; if no child in household: NA
- age
Age of respondent
- sex
Sex of respondent
- job
Status of employment of the respondent. Direct transcription from original codebook: 1 = Employed, at work, 2 = Employed, absent, 3 = Unemployed, on layoff, 4 = Unemployed, looking, 5 = Not in the labor force.
- m_job
The response to question, “in the last seven days did you have more than one job?” Returns NA if no job.
- f_job
Does the respondent have a full time job or a part time job? (NA if no job)
- educ
Are you enrolled in high school, college, or university? (NA if not currently enrolled)
- educ2
If yes to educ, are you enrolled in high school or upper schooling? (NA if not currently enrolled)
- partner
Presence of the respondent's spouse or unmarried partner in the household with 1 = Spouse present 2 = Unmarried partner present 3 = No spouse/unmarried partner present
- pr_job
Answer to the question, “does your partner have a job?” (NA if not applicable)
- salary
Weekly earnings at the respondent’s main job, two decimals implied
- children
Number of children under 18 in the household
- pr_job_f
Part time/full time job status of partner, if applicable (NA if partner unemployed or no partner)
- job_hours
Total hours usually worked per week (-4: Hours vary)
- day
Day of the week about which the respondent was interviewed (Monday through Friday)
- holiday
Notes if the respondent was interviewed on a holiday
- elder_care
Total time spent providing elder care that day by the respondent, in minutes
- child_time
Total time spent during diary day providing secondary childcare for household children younger than 13, in minutes
- tv
Minutes spent watching TV
Details
For more information regarding the key visit https://www.bls.gov/tus/atusintcodebk17.pdf. This data is retrieved from the American Time Use Survey, made available through the Bureau of Labor Statistics https://www.bls.gov/tus/datafiles_2017.htm.
Examples
summary(tv)
hist(tv$tv, col="skyblue")
Univariate plot
Description
Generates a descriptive graph for a quantitative variable.
Usage
univariate_plot(
data,
x,
bins = 30,
fill = "deepskyblue",
pointcolor = "black",
density = TRUE,
densitycolor = "grey",
alpha = 0.2,
seed = 1234
)
Arguments
data |
a data frame. |
x |
a variable name (without quotes). |
bins |
number of histogram bins. |
fill |
fill color for the histogram and boxplot. |
pointcolor |
point color for the jitter plot. |
density |
logical. Plot a filled density curve over the histogram. (default=TRUE) |
densitycolor |
fill color for density curve. |
alpha |
Alpha transparency (0-1) for the density curve and jittered points. |
seed |
pseudorandom number seed for jittered plot. |
Details
univariate_plot generates a plot containing three graphs: a histogram (with an optional density curve), a horizontal jittered point plot, and a horizontal box plot. The subtitle contains descriptive statistics, including the mean, standard deviation, median, minimum, maximum, and skew.
Value
a ggplot2 graph
Note
The graphs are created with ggplot2 and then assembled into a single plot through the patchwork package. Missing values are deleted.
Examples
univariate_plot(mtcars, mpg)
univariate_plot(cardata, city_mpg, fill="lightsteelblue",
pointcolor="lightsteelblue", densitycolor="lightpink",
alpha=.6)