Type: | Package |
Title: | Easily Extracting Information About Your Data |
Version: | 0.0.13 |
Description: | Makes it easy to display descriptive information on a data set. Getting an easy overview of a data set by displaying and visualizing sample information in different tables (e.g., time and scope conditions). The package also provides publishable 'LaTeX' code to present the sample information. |
License: | GPL-3 |
URL: | https://github.com/cosimameyer/overviewR |
BugReports: | https://github.com/cosimameyer/overviewR/issues |
Depends: | R (≥ 3.5.0) |
Imports: | data.table (≥ 1.14.2), dplyr (≥ 1.0.0), ggplot2 (≥ 3.3.2), ggrepel (≥ 0.8.2), ggvenn (≥ 0.1.8), rlang, tibble (≥ 3.0.1), tidyr |
Suggests: | countrycode, covr, devtools, knitr, magrittr, pkgdown, rmarkdown, spelling, testthat, xtable |
VignetteBuilder: | knitr, rmarkdown |
Encoding: | UTF-8 |
Language: | en-US |
LazyData: | true |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2023-02-15 07:41:16 UTC; cosima |
Author: | Cosima Meyer [cre, aut], Dennis Hammerschmidt [aut] |
Maintainer: | Cosima Meyer <cosima.meyer@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-02-15 07:50:02 UTC |
.overview_heat
Description
Internal function that generates the 'overview_heat' plot
Usage
.overview_heat(
dat = NULL,
id = NULL,
time = NULL,
label = FALSE,
perc = FALSE,
col_low = NULL,
col_high = NULL,
xaxis = NULL,
yaxis = NULL,
theme_plot = NULL,
exp_total = NULL,
col_names = NULL
)
Arguments
dat |
The data set |
id |
The scope (e.g., country codes or individual IDs). The axis is ordered in ascending order by default. |
time |
The time (e.g., time periods given by years, months, ...) |
label |
If TRUE (default), the total number of observations/percentages of observations are displayed. If FALSE, it returns no labels. |
perc |
If FALSE (default) plot returns the total number of observations per time-scope-unit. If TRUE, it returns the number of observations per time-scope-unit in percentage |
col_low |
Hex color code for the lowest value (default is "#dceaf2") |
col_high |
Hex color code for the highest value (default is "#2A5773") |
xaxis |
Label of your x axis ("Time frame" is default) |
yaxis |
Label of your y axis ("Sample" is default) |
theme_plot |
Previously generated theme |
exp_total |
Expected total number of observations (i.e., the maximum) per time unit. |
col_names |
The column names (containing id and time) |
Value
A ggplot
.overview_tab
Description
Internal function that calculates the 'overview_tab' for data.table objects
Usage
.overview_tab(dat = NULL, id = NULL, time = NULL, col_names = NULL)
Arguments
dat |
Your data set |
id |
Scope (e.g., country codes or individual IDs) |
time |
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)'). |
col_names |
The column names (containing id and time) |
Value
A data.table
calculate_share_non_row_wise
Description
Function used in 'overview_na' to calculate the column-wise share of NA
Usage
calculate_share_non_row_wise(dat = NULL)
Arguments
dat |
Data frame |
Value
The function returns a data set that has the information on the column-wise NA share
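The column-wise NA share can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name `na_share_by_column` and the output column names `variable` and `percentage` are assumptions made for illustration (`na_count` follows the naming used elsewhere in this manual).

```r
# Hypothetical helper (not part of overviewR): column-wise NA count and share.
na_share_by_column <- function(dat) {
  na_matrix <- is.na(dat)  # logical matrix, TRUE where a value is missing
  data.frame(
    variable = names(dat),
    na_count = unname(colSums(na_matrix)),
    percentage = unname(colMeans(na_matrix)) * 100,
    stringsAsFactors = FALSE
  )
}

# Made-up example data
example_dat <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

na_share_by_column(example_dat)
```

Here column `a` has two of four values missing (50%), `b` has one (25%), and `c` has none.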
calculate_share_row_wise
Description
Function used in 'overview_na' to calculate the share of NA row-wise
Usage
calculate_share_row_wise(dat = NULL)
Arguments
dat |
Data frame |
Value
The function returns a data set that has the information on the row-wise NA share
find_int_runs
Description
Function used in 'overview_tab' to find running integers
Usage
find_int_runs(run = NULL)
Arguments
run |
Variable (integer) that should be checked for consecutive values |
Value
The function returns a data set
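The idea of finding runs of consecutive integers can be sketched in base R as follows. This is an illustration only and may differ from what 'find_int_runs' does internally: the helper name `collapse_int_runs` and the string output format are assumptions.

```r
# Hypothetical helper (not part of overviewR): collapse integers into
# a string of consecutive runs.
collapse_int_runs <- function(x) {
  x <- sort(unique(x))  # sort() also drops NA values
  if (length(x) == 0) {
    return("")
  }
  # A new run starts wherever the gap to the previous value is not exactly 1
  run_id <- cumsum(c(TRUE, diff(x) != 1))
  runs <- split(x, run_id)
  labels <- vapply(
    runs,
    function(r) {
      if (length(r) == 1) as.character(r) else paste(min(r), max(r), sep = "-")
    },
    character(1)
  )
  paste(labels, collapse = ", ")
}

collapse_int_runs(c(1990, 1991, 1992, 1995, 1997, 1998))
# "1990-1992, 1995, 1997-1998"
```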
overview_add_na_output
Description
Function used in 'overview_na' to generate a new data frame with na_count and percentage share of NAs for each row
Usage
overview_add_na_output(dat_result = NULL, dat = NULL)
Arguments
dat_result |
Data.frame from 'overview_na' |
dat |
Data frame |
Value
The function returns a data set that has the information on the row-wise NA share
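A minimal base R sketch of adding a row-wise NA count and percentage share to a data frame is shown below. It is not the package's code: the helper name `add_row_na_share` and the column name `percentage` are assumptions, while `na_count` is taken from the description above.

```r
# Hypothetical helper (not part of overviewR): append row-wise NA information.
add_row_na_share <- function(dat) {
  na_matrix <- is.na(dat)  # computed before the new columns are added
  dat$na_count <- unname(rowSums(na_matrix))
  dat$percentage <- unname(rowMeans(na_matrix)) * 100
  dat
}

# Made-up example data
example_dat <- data.frame(
  a = c(1, NA, NA),
  b = c("x", NA, "z")
)

add_row_na_share(example_dat)
```

The second row is fully missing (100%), the third row is missing one of two values (50%).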
overview_crossplot
Description
This function plots a ggplot to visualize a cross table plot.
Usage
overview_crossplot(
dat,
id,
time,
cond1,
cond2,
threshold1,
threshold2,
xaxis = "Condition 1",
yaxis = "Condition 2",
label = FALSE,
color = FALSE,
dot_size = 2,
fontsize = 2.5
)
Arguments
dat |
Your data set |
id |
Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot. |
time |
Your time (e.g., time periods given by years, months, ...) |
cond1 |
Variable that describes the first condition |
cond2 |
Variable that describes the second condition |
threshold1 |
A threshold for cond1 |
threshold2 |
A threshold for cond2 |
xaxis |
Label of the x axis ("Condition 1" is default) |
yaxis |
Label of the y axis ("Condition 2" is default) |
label |
Label of the observations. Overlapping labels are avoided by using 'ggrepel' |
color |
Color of the different observation groups |
dot_size |
Optional argument that defines the dot size (default is 2) |
fontsize |
If label is TRUE, the fontsize argument sets the text size of the labels (default is 2.5) |
Value
A ggplot figure that presents the sample information visually in a cross table
Examples
data(toydata)
overview_crossplot(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
overview_crosstab
Description
Sorts a data set conditionally into a cross table. This can be helpful to get a sense of the time and scope conditions of a data set. Note that if the data set has multiple observations per id-time unit, the function automatically aggregates them using the mean.
Usage
overview_crosstab(dat, cond1, cond2, threshold1, threshold2, id, time)
Arguments
dat |
A data set object |
cond1 |
Variable that describes the first condition |
cond2 |
Variable that describes the second condition |
threshold1 |
A threshold for cond1 |
threshold2 |
A threshold for cond2 |
id |
Scope (e.g., country codes or individual IDs) |
time |
Time (e.g., time periods given by years, months, ...) |
Value
A data frame object that contains a summary of the data set that can
later be converted to a 'LaTeX' output using overview_latex
Examples
data(toydata)
overview_crosstab(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
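The logic described above (aggregate duplicate id-time units by their mean, then sort units by two thresholds) can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are made up, and whether the package compares with `>=` or `>` is not stated in this manual, so `>=` is an assumption.

```r
# Made-up data with two observations for unit "A" in 2000
dat <- data.frame(
  id = c("A", "A", "B", "C"),
  time = c(2000, 2000, 2000, 2001),
  cond1 = c(10, 30, 5, 40),
  cond2 = c(1, 3, 9, 8)
)

# Step 1: aggregate duplicates on the id-time unit using the mean
agg <- aggregate(cbind(cond1, cond2) ~ id + time, data = dat, FUN = mean)

# Step 2: classify each unit relative to the thresholds (assumed comparison: >=)
threshold1 <- 15
threshold2 <- 5
agg$cond1_high <- agg$cond1 >= threshold1
agg$cond2_high <- agg$cond2 >= threshold2

# Step 3: count the id-time units in each of the four cells
tab <- table(cond1_high = agg$cond1_high, cond2_high = agg$cond2_high)
tab
```

Unit "A" in 2000 enters with the means 20 and 2, so it falls into the cell that meets the first threshold but not the second.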
overview_heat
Description
This function plots a heat map to visualize the coverage of the time-scope-units of the data. Options include total number of cases per time-scope-unit or relative number in percentage.
Usage
overview_heat(
dat,
id,
time,
perc = FALSE,
exp_total = NULL,
xaxis = "Time frame",
yaxis = "Sample",
col_low = "#dceaf2",
col_high = "#2A5773",
label = TRUE
)
Arguments
dat |
The data set |
id |
The scope (e.g., country codes or individual IDs). The axis is ordered in ascending order by default. |
time |
The time (e.g., time periods given by years, months, ...) |
perc |
If FALSE (default) plot returns the total number of observations per time-scope-unit. If TRUE, it returns the number of observations per time-scope-unit in percentage |
exp_total |
Expected total number of observations (i.e., the maximum) per time unit. |
xaxis |
Label of your x axis ("Time frame" is default) |
yaxis |
Label of your y axis ("Sample" is default) |
col_low |
Hex color code for the lowest value (default is "#dceaf2") |
col_high |
Hex color code for the highest value (default is "#2A5773") |
label |
If TRUE (default), the total number of observations/percentages of observations are displayed. If FALSE, it returns no labels. |
Value
A ggplot figure that presents sample coverage visually
Examples
data(toydata)
overview_heat(toydata, ccode, year, perc = TRUE, exp_total = 12)
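How 'perc' and 'exp_total' relate can be illustrated with a small base R sketch: the count of observations per time-scope-unit is divided by the expected total. The data and the column names `n` and `percentage` are made up for illustration, and the package may compute this differently.

```r
# Made-up data: one row per observation
dat <- data.frame(
  id = c("A", "A", "A", "B", "B"),
  year = c(1990, 1990, 1991, 1990, 1990)
)

# Assumed expected maximum of observations per id-year unit
exp_total <- 4

# Count observations per id-year unit (units without observations get 0)
counts <- as.data.frame(table(id = dat$id, year = dat$year), responseName = "n")

# Share of the expected total, in percent
counts$percentage <- counts$n / exp_total * 100
counts
```

In the example above, `exp_total = 12` would correspond to twelve expected monthly observations per country-year.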
overview_latex
Description
Produces a 'LaTeX' output for output obtained via overview_tab and overview_crosstab
Usage
overview_latex(
obj,
title = "Time and scope of the sample",
id = "Sample",
time = "Time frame",
crosstab = FALSE,
cond1 = "Condition 1",
cond2 = "Condition 2",
save_out = FALSE,
file_path,
label = "tab:tab1",
fontsize,
file,
path
)
Arguments
obj |
Overview object produced by overview_tab or overview_crosstab |
title |
Caption of the table (default is "Time and scope of the sample") |
id |
The name of the left column (default is "Sample"), will be ignored if crosstab is TRUE |
time |
The name of the right column (default is "Time frame"), will be ignored if crosstab is TRUE |
crosstab |
Logical argument; if TRUE, produces a cross table output (default is FALSE) |
cond1 |
Description for the first condition (character), will be ignored if crosstab is FALSE |
cond2 |
Description for the second condition (character), will be ignored if crosstab is FALSE |
save_out |
Optional argument, exports the output table as a .tex file, default is FALSE |
file_path |
Specifies the path and file name (.tex) where you store your output |
label |
Specifies the label (default is "tab:tab1") |
fontsize |
Specifies the font size (all 'LaTeX' font sizes such as "scriptsize" or "small" work) |
file |
This argument is deprecated. Please use "file_path" instead and add the full path. |
path |
This argument is deprecated. Please use "file_path" instead and add the full path. |
Value
A 'LaTeX' output that can either be copy-pasted into a text document or exported directly as a .tex file
Examples
data(toydata)
overview_object <- overview_tab(dat = toydata, id = ccode, time = year)
overview_latex(
obj = overview_object,
title = "Some nice title",
crosstab = FALSE
)
overview_object <- overview_tab(dat = toydata, id = ccode, time = year)
overview_latex(
obj = overview_object,
title = "Some nice title",
file_path = "some/path_to/your_output_file.tex"
)
overview_ct_object <- overview_crosstab(
dat = toydata,
cond1 = gdp,
cond2 = population,
threshold1 = 25000,
threshold2 = 27000,
id = ccode,
time = year
)
overview_latex(
obj = overview_ct_object,
title = "Some nice title for a cross tab",
crosstab = TRUE,
cond1 = "Name of first condition",
cond2 = "Name of second condition"
)
overview_na
Description
This function plots a ggplot to visualize the distribution of NAs across all variables in the data set.
Usage
overview_na(
dat,
yaxis = "Variables",
perc = TRUE,
row_wise = FALSE,
add = FALSE
)
Arguments
dat |
Your data set |
yaxis |
Label of your y axis ("Variables" is default) |
perc |
If TRUE (default) plot returns the number of NAs in percentage |
row_wise |
If TRUE (FALSE is default), the plot returns the number of NAs per row |
add |
If TRUE (FALSE is default) it generates a new data frame with na_count and percentage share of NAs for each row |
Value
Depending on the selection, the function returns a ggplot figure that presents the distribution of NAs in the data set or adds the information on the row-wise NA share
Examples
data(toydata)
overview_na(toydata, perc = FALSE)
overview_overlap
Description
Provides an overview of the overlap of two data sets. Cautionary note: This function is still preliminary and can only compare two data sets. We are working on an extension that allows comparing multiple data sets.
Usage
overview_overlap(
dat1,
dat2,
dat1_id,
dat2_id,
dat1_name = "Data set 1",
dat2_name = "Data set 2",
plot_type = "bar"
)
Arguments
dat1 |
A first data set object |
dat2 |
A second data set object |
dat1_id |
Scope (e.g., country codes or individual IDs) of dat1. Both ID variables must be coded in exactly the same way for the data sets to match correctly. |
dat2_id |
Scope (e.g., country codes or individual IDs) of dat2. Both ID variables must be coded in exactly the same way for the data sets to match correctly. |
dat1_name |
Name of dat1 ("Data set 1" is the default) |
dat2_name |
Name of dat2 ("Data set 2" is the default) |
plot_type |
Type of plot ("bar" and "venn" are the two options). "venn" relies on the ggvenn function. |
Value
A ggplot2 object (a bar chart by default) that shows the overlap of two data sets.
Examples
## Not run:
data(toydata)
toydata2 <- toydata[which(toydata$year > 1992), ]
overview_overlap(
dat1 = toydata, dat2 = toydata2, dat1_id = ccode,
dat2_id = ccode
)
## End(Not run)
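Why both ID variables need identical coding can be seen in a short base R sketch of the underlying set comparison. This is only an illustration with made-up ID vectors, not the package's implementation.

```r
# Made-up ID vectors standing in for the ID columns of two data sets
ids1 <- c("AGO", "BEN", "FRA", "GBR", "RWA")
ids2 <- c("BEN", "FRA", "USA")

overlap <- list(
  both = intersect(ids1, ids2),      # IDs present in both data sets
  only_dat1 = setdiff(ids1, ids2),   # IDs only in the first data set
  only_dat2 = setdiff(ids2, ids1)    # IDs only in the second data set
)
lengths(overlap)

# Differently coded IDs do not match: "fra" is not the same as "FRA"
intersect(ids1, c("ben", "fra"))
```

The last call returns an empty vector, which is why differently coded IDs would show no overlap even when they refer to the same units.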
overview_plot
Description
This function plots a ggplot to visualize the distribution of scope objects across the time frame.
Usage
overview_plot(
dat,
id,
time,
xaxis = "Time frame",
yaxis = "Sample",
asc = TRUE,
color,
dot_size = 2
)
Arguments
dat |
Your data set |
id |
Your scope (e.g., country codes or individual IDs). If the id variable contains NAs, they will not be included in the plot. |
time |
Your time (e.g., time periods given by years, months, ...) |
xaxis |
Label of the x axis ("Time frame" is default) |
yaxis |
Label of the y axis ("Sample" is default) |
asc |
Sorting the y axis in ascending order ("TRUE" is default) |
color |
Optional argument that defines the color |
dot_size |
Optional argument that defines the dot size (default is 2) |
Value
A ggplot figure that presents the sample information visually
Examples
data(toydata)
overview_plot(dat = toydata, id = ccode, time = year)
overview_plot_absolute
Description
Function used in 'overview_na' to plot the absolute share of NA values
Usage
overview_plot_absolute(
dat_result = NULL,
theme_plot = NULL,
yaxis = NULL,
xaxis = NULL
)
Arguments
dat_result |
Data frame |
theme_plot |
Theme for the plot (pre-defined) |
yaxis |
Name for yaxis |
xaxis |
Name for xaxis |
Value
The function returns a ggplot
overview_plot_percentage
Description
Function used in 'overview_na' to plot the percentage share of NA values
Usage
overview_plot_percentage(
dat_result = NULL,
theme_plot = NULL,
yaxis = NULL,
xaxis = NULL
)
Arguments
dat_result |
Data frame |
theme_plot |
Theme for the plot (pre-defined) |
yaxis |
Name for yaxis |
xaxis |
Name for xaxis |
Value
The function returns a ggplot
overview_tab
Description
Provides an overview table for the time and scope conditions of a data set. If a data.table object is provided, the function uses data.table's syntax to perform the evaluation
Usage
overview_tab(
dat,
id,
time = list(year = NULL, month = NULL, day = NULL),
complex_date = FALSE
)
Arguments
dat |
A data frame or data table object |
id |
Scope (e.g., country codes or individual IDs) |
time |
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)'). |
complex_date |
Boolean argument identifying if there is a more complex (list-wise) date_time parameter (FALSE is the default) |
Value
A data frame object that contains a summary of a sample that
can later be converted to a 'LaTeX' output using overview_latex
Examples
# With option 1 (and also 2):
data(toydata)
output_table <- overview_tab(dat = toydata, id = ccode, time = year)
# With option 3:
overview_tab(dat = toydata, id = ccode, time = list(
year = toydata$year,
month = toydata$month, day = toydata$day
), complex_date = TRUE)
overview_tab_df
Description
Internal function that calculates the 'overview_tab' for data.frame objects
Usage
overview_tab_df(dat2 = NULL, dat = NULL, id = NULL, time = NULL)
Arguments
dat2 |
Your data set |
dat |
Your data set |
id |
Scope (e.g., country codes or individual IDs) |
time |
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)'). |
Value
A data.frame
overview_tab_dt
Description
Internal function that calculates the 'overview_tab' for data.table objects
Usage
overview_tab_dt(dat = NULL, id = NULL, time = NULL, col_names = NULL)
Arguments
dat |
Your data set |
id |
Scope (e.g., country codes or individual IDs) |
time |
Time (e.g., time periods given by years, months, ...). There are three options to add a date variable: 1) a character vector containing **one** time variable, 2) a time variable following the YYYY-MM-DD format, or 3) a list containing multiple time variables ('time = list(year = NULL, month = NULL, day = NULL)'). |
col_names |
The column names (containing id and time) |
Value
A data.table
theme_heat_plot
Description
Defines the theme for the 'overview_heat' plot function
Usage
theme_heat_plot()
Value
A theme for the 'overview_heat' plot
theme_na_plot
Description
Defines the theme for the 'overview_na' plot function
Usage
theme_na_plot()
Value
A theme for the 'overview_na' plot
Cross-sectional data for countries
Description
Small, artificially generated toy data set that comes in a cross-sectional format where the unit of analysis is either country-year or country-year-month. It provides artificial information for five countries (Angola, Benin, France, Rwanda, and the UK) for a time span from 1990 to 1999 to illustrate the use of the package.
Usage
data(toydata)
Format
An object of class "data.frame"
- ccode
ISO3 country code (as character) for the countries in the sample (Angola, Benin, France, Rwanda, and UK)
- year
A value between 1990 and 1999
- month
An abbreviation (MMM) for month (character)
- gdp
A fake value for GDP (randomly generated)
- population
A fake value for population (randomly generated)
References
This data set was artificially created for the overviewR package.
Examples
data(toydata)
head(toydata)