Title: | Compact and Flexible Summaries of Data |
Version: | 2.1.5 |
Description: | A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README. |
License: | GPL-3 |
URL: | https://docs.ropensci.org/skimr/ (website), https://github.com/ropensci/skimr/ |
BugReports: | https://github.com/ropensci/skimr/issues |
Depends: | R (≥ 3.1.2) |
Imports: | cli, dplyr (≥ 0.8.0), knitr (≥ 1.2), magrittr (≥ 1.5), pillar (≥ 1.6.4), purrr, repr, rlang (≥ 0.3.1), stats, stringr (≥ 1.1), tibble (≥ 2.0.0), tidyr (≥ 1.0), tidyselect (≥ 1.0.0), vctrs |
Suggests: | covr, crayon, data.table, dtplyr, extrafont, haven, lubridate, rmarkdown, testthat (≥ 2.0.0), withr |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Collate: | 'deprecated.R' 'dplyr.R' 'stats.R' 'skim_with.R' 'get_skimmers.R' 'reshape.R' 'sfl.R' 'skim.R' 'skim_obj.R' 'skim_print.R' 'skimr-package.R' 'summary.R' 'utils.R' 'vctrs.R' |
NeedsCompilation: | no |
Packaged: | 2022-12-23 10:37:38 UTC; elinwaring |
Author: | Elin Waring [cre, aut], Michael Quinn [aut], Amelia McNamara [aut], Eduardo Arino de la Rubia [aut], Hao Zhu [aut], Julia Lowndes [ctb], Shannon Ellis [aut], Hope McLeod [ctb], Hadley Wickham [ctb], Kirill Müller [ctb], RStudio, Inc. [cph] (Spark functions), Connor Kirkpatrick [ctb], Scott Brenstuhl [ctb], Patrick Schratz [ctb], lbusett [ctb], Mikko Korpela [ctb], Jennifer Thompson [ctb], Harris McGehee [ctb], Mark Roepke [ctb], Patrick Kennedy [ctb], Daniel Possenriede [ctb], David Zimmermann [ctb], Kyle Butts [ctb], Bastian Torges [ctb], Rick Saporta [ctb], Henry Morgan Stewart [ctb] |
Maintainer: | Elin Waring <elin.waring@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-12-23 11:10:02 UTC |
Skim a data frame
Description
This package provides an alternative to the default summary functions
within R. The package's API is tidy: functions take data frames, return
data frames, and can work as part of a pipeline. The returned skimr
object is subsettable and offers human-readable output.
Details
skimr is opinionated, providing a strong set of summary statistics
that are generated for a variety of different data types. It also
provides an API for customization. Users can change both the functions
dispatched and the way the results are formatted.
Deprecated functions from skimr v1
Description
Skimr used to offer functions that combined skimming with a secondary effect,
like reshaping the data, building a list or printing the results. Some of
these behaviors are no longer necessary. skim() always returns a wide
data frame. Others have been replaced by functions that do a single thing.
partition() creates a list-like object from a skimmed data frame.
Usage
skim_to_wide(.data, ...)
skim_to_list(.data, ...)
skim_format(...)
Arguments
.data |
A tibble, or an object that can be coerced into a tibble. |
... |
Columns to select for skimming. When none are provided, the default is to skim all columns. |
Value
Either a skim_df or a skim_list object.
Functions
- skim_to_wide(): skim() always produces a wide data frame.
- skim_to_list(): partition() creates a list.
- skim_format(): print() and skim_with() set options.
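The replacements listed above can be sketched as follows. This is a minimal illustration that uses only functions documented in this manual; the variable names are arbitrary, and the built-in iris data is assumed as input.

```r
library(skimr)

# v1 used skim_to_wide(iris); in v2, skim() already returns a wide data frame.
wide <- skim(iris)

# v1 used skim_to_list(iris); in v2, partition() splits the skim_df by type.
by_type <- partition(wide)

# The list has one entry per data type found in `iris`.
names(by_type)
```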
Fix unicode histograms on Windows
Description
This function changes your session's locale to address issues with printing histograms on Windows.
Usage
fix_windows_histograms()
Details
There are known issues with printing the spark-histogram characters when printing a data frame, appearing like this: "<U+2582><U+2585><U+2587>". This longstanding problem originates in the low-level code for printing dataframes.
See Also
Only show a subset of summary statistics after skimming
Description
This function is a variant of dplyr::select() designed to work with
skim_df objects. When using focus(), skimr metadata columns are kept,
and skimr print methods are still utilized. Otherwise, the signature and
behavior are identical to dplyr::select().
Usage
focus(.data, ...)
Arguments
.data |
A |
... |
One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables. |
Examples
# Compare
iris %>%
  skim() %>%
  dplyr::select(n_missing)
iris %>%
  skim() %>%
  focus(n_missing)
# This is equivalent to
iris %>%
  skim() %>%
  dplyr::select(skim_variable, skim_type, n_missing)
View default skimmer names and functions
Description
These utility functions look up the currently-available defaults for one or
more skim_type's. They work with all defaults in the skimr package, as
well as the defaults in any package that extends skimr. See
get_skimmers() for writing lookup methods for different types.
Usage
get_default_skimmers(skim_type = NULL)
get_one_default_skimmer(skim_type)
get_default_skimmer_names(skim_type = NULL)
get_one_default_skimmer_names(skim_type)
get_sfl(skim_type)
Arguments
skim_type |
The class of the column being skimmed. |
Details
The functions differ in return type and whether or not the result is in
a list. get_default_skimmers() and get_one_default_skimmer() return
functions. The former returns functions within a typed list, i.e.
list(numeric = list(...functions...)).
get_default_skimmer_names() and get_one_default_skimmer_names() return
names instead: a list of character vectors or a single character vector,
respectively.
get_sfl() returns the skimmer function list for a particular skim_type.
It differs from get_default_skimmers() in that the returned sfl contains
a list of functions and a skim_type.
Functions
- get_one_default_skimmer(): Get the functions associated with one skim_type.
- get_default_skimmer_names(): Get the names of the functions used in one or more skim_type's.
- get_one_default_skimmer_names(): Get the names of the functions used in one skim_type.
- get_sfl(): Get the sfl for a skim_type.
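Examples
A short sketch of the lookup functions described above. It assumes the "numeric" skim type, which this manual lists among the defaults; the variable names are illustrative only.

```r
library(skimr)

# Names of the default skimmers for every supported type (a named list).
all_names <- get_default_skimmer_names()

# Names for a single type, returned as a character vector.
numeric_names <- get_one_default_skimmer_names("numeric")

# The skimmers themselves for one type, returned as a list.
numeric_funs <- get_one_default_skimmer("numeric")

numeric_names
```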
Retrieve the summary functions for a specific data type
Description
These functions are used to set the default skimming functions for a data
type. They are combined with the base skim function list (sfl) in
skim_with() to create the summary tibble for each type.
Usage
get_skimmers(column)
## Default S3 method:
get_skimmers(column)
## S3 method for class 'numeric'
get_skimmers(column)
## S3 method for class 'factor'
get_skimmers(column)
## S3 method for class 'character'
get_skimmers(column)
## S3 method for class 'logical'
get_skimmers(column)
## S3 method for class 'complex'
get_skimmers(column)
## S3 method for class 'Date'
get_skimmers(column)
## S3 method for class 'POSIXct'
get_skimmers(column)
## S3 method for class 'difftime'
get_skimmers(column)
## S3 method for class 'Timespan'
get_skimmers(column)
## S3 method for class 'ts'
get_skimmers(column)
## S3 method for class 'list'
get_skimmers(column)
## S3 method for class 'AsIs'
get_skimmers(column)
## S3 method for class 'haven_labelled'
get_skimmers(column)
modify_default_skimmers(skim_type, new_skim_type = NULL, new_funs = list())
Arguments
column |
An atomic vector or list. A column from a data frame. |
skim_type |
A character scalar. The class of the object with default skimmers. |
new_skim_type |
The type to assign to the looked up set of skimmers. |
new_funs |
Replacement functions for those in |
Details
When creating your own set of skimming functions, call sfl() within a
get_skimmers() method for your particular type. Your call to sfl() should
also provide a matching class in the skim_type argument. Otherwise, it
will not be possible to dynamically reassign your default functions when
working interactively.
Call get_default_skimmers() to see the summary functions currently
supported for each type. Call get_default_skimmer_names() to see just
the names of these functions. Use modify_default_skimmers() to change
the skim_type or the functions of a default sfl. This is useful
for creating new default sfl's.
Value
A skim_function_list
object.
Methods (by class)
- get_skimmers(default): The default method for skimming data. Only used when a column's data type doesn't match currently installed types. Call get_default_skimmer_names() to see these defaults.
- get_skimmers(numeric): Summary functions for numeric columns, covering both double() and integer() classes: mean(), sd(), quantile() and inline_hist().
- get_skimmers(factor): Summary functions for factor columns: is.ordered(), n_unique() and top_counts().
- get_skimmers(character): Summary functions for character columns. Also, the default for unknown columns: min_char(), max_char(), n_empty(), n_unique() and n_whitespace().
- get_skimmers(logical): Summary functions for logical/boolean columns: mean(), which produces rates for each value, and top_counts().
- get_skimmers(complex): Summary functions for complex columns: mean().
- get_skimmers(Date): Summary functions for Date columns: min(), max(), median() and n_unique().
- get_skimmers(POSIXct): Summary functions for POSIXct columns: min(), max(), median() and n_unique().
- get_skimmers(difftime): Summary functions for difftime columns: min(), max(), median() and n_unique().
- get_skimmers(Timespan): Summary functions for Timespan columns: min(), max(), median() and n_unique().
- get_skimmers(ts): Summary functions for ts columns: min(), max(), median() and n_unique().
- get_skimmers(list): Summary functions for list columns: n_unique(), list_min_length() and list_max_length().
- get_skimmers(AsIs): Summary functions for AsIs columns: n_unique(), list_min_length() and list_max_length().
- get_skimmers(haven_labelled): Summary functions for haven_labelled columns. Finds the appropriate skimmers for the underlying data in the vector.
See Also
Examples
# Defining default skimming functions for a new class, `my_class`.
# Note that the skim_type argument is required for dynamic reassignment.
get_skimmers.my_class <- function(column) {
  sfl(
    skim_type = "my_class",
    mean,
    sd
  )
}
# Integer and double columns are both "numeric" and are treated the same
# by default. To switch this behavior in another package, add a method.
get_skimmers.integer <- function(column) {
  sfl(
    skim_type = "integer",
    p50 = ~ stats::quantile(
      .,
      probs = .50, na.rm = TRUE, names = FALSE, type = 1
    )
  )
}
x <- mtcars[c("gear", "carb")]
class(x$carb) <- "integer"
skim(x)
## Not run:
# In a package, to revert to the V1 behavior of skimming separately with the
# same functions, assign the numeric `get_skimmers`.
get_skimmers.integer <- skimr::get_skimmers.numeric
# Or, in a local session, use `skim_with` to create a different `skim`.
new_skim <- skim_with(integer = skimr::get_skimmers.numeric())
# To apply a set of skimmers from an old type to a new type
get_skimmers.new_type <- function(column) {
  modify_default_skimmers("old_type", new_skim_type = "new_type")
}
## End(Not run)
Provide a default printing method for knitr.
Description
Instead of standard R output, knitr and RMarkdown documents will have
formatted knitr::kable() output on return. You can disable this by setting
the chunk option render = normal_print.
Usage
## S3 method for class 'skim_df'
knit_print(x, options = NULL, ...)
## S3 method for class 'skim_list'
knit_print(x, options = NULL, ...)
## S3 method for class 'one_skim_df'
knit_print(x, options = NULL, ...)
## S3 method for class 'summary_skim_df'
knit_print(x, options = NULL, ...)
Arguments
x |
An R object to be printed |
options |
Options passed into the print function. |
... |
Additional arguments passed to the S3 method. Currently ignored,
except two optional arguments |
Details
The summary statistics for the original data frame can be disabled by setting
the knitr chunk option skimr_include_summary = FALSE. See
knitr::opts_chunk for more information. You can change the number of digits
shown in the printed table with the skimr_digits chunk option.
Alternatively, you can call collapse() or yank() to get the particular
skim_df objects and format them however you like. One warning, though:
because histograms contain unicode characters, they can have unexpected
print results, as R has varying levels of unicode support. This affects
Windows users most commonly. Call vignette("Using_fonts") for more details.
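As one possible way to format a subtable yourself, the sketch below pulls the numeric statistics with yank() and hands them to knitr::kable(). Converting with as.data.frame() first and passing digits = 2 are choices made for this illustration, not requirements stated by the package.

```r
library(skimr)

# Pull out only the numeric subtable from the skimmed data.
numeric_stats <- yank(skim(iris), "numeric")

# Convert to a plain data frame, then format it with knitr::kable().
tbl <- knitr::kable(as.data.frame(numeric_stats), digits = 2)
tbl
```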
Value
A knit_asis object, which knitr uses when rendering.
Methods (by class)
- knit_print(skim_df): Default knitr print for skim_df objects.
- knit_print(skim_list): Default knitr print for a skim_list.
- knit_print(one_skim_df): Default knitr print within a partitioned skim_df.
- knit_print(summary_skim_df): Default knitr print for skim_df summaries.
See Also
Mutate a skim_df
Description
dplyr::mutate() currently drops attributes, but we need to keep them around
for other skim behaviors. Otherwise the behavior is exactly the same. For
more information, see https://github.com/tidyverse/dplyr/issues/3429.
Usage
## S3 method for class 'skim_df'
mutate(.data, ...)
Arguments
.data |
A |
... |
Name-value pairs of expressions, each with length 1 or the same
length as the number of rows in the group, if using The arguments in |
Value
A skim_df object, which also inherits the class(es) of the input
data. In many ways, the object behaves like a tibble::tibble().
See Also
dplyr::mutate() for the function's expected behavior.
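Examples
A minimal sketch of mutating a skim_df. The derived column name numeric.range is made up for this illustration; the source columns numeric.p0 and numeric.p100 are assumed to be present because iris has numeric columns.

```r
library(skimr)

skimmed <- skim(iris)

# Add a derived statistic; rows for non-numeric variables get NA.
with_range <- dplyr::mutate(
  skimmed,
  numeric.range = numeric.p100 - numeric.p0
)

# The result should still carry the skim_df class.
class(with_range)
```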
Separate a big skim_df into smaller data frames, by type.
Description
The data frames produced by skim() are wide and sparse, filled with
columns that are mostly NA. For that reason, it can be convenient to
work with "by type" subsets of the original data frame. These smaller
subsets have their NA columns removed.
Usage
partition(data)
bind(data)
yank(data, skim_type)
Arguments
data |
A |
skim_type |
A character scalar. The subtable to extract from a
|
Details
partition() creates a list of smaller skim_df data frames. Each entry
in the list is a data type from the original skim_df. The inverse of
partition() is bind(), which takes the list and produces the original
skim_df. While partition() keeps all of the subtables as list entries,
yank() gives you a single subtable for a data type.
Value
A skim_list of skim_df's, by type.
Functions
- bind(): The inverse of a partition(). Rebuild the original skim_df.
- yank(): Extract a subtable from a skim_df with a particular type.
Examples
# Create a wide skimmed data frame (a skim_df)
skimmed <- skim(iris)
# Separate into a list of subtables by type
separate <- partition(skimmed)
# Put back together
identical(bind(separate), skimmed)
# > TRUE
# Alternatively, get the subtable of a particular type
yank(skimmed, "factor")
Print skim objects
Description
skimr has custom print methods for all supported objects. Default printing
methods for knitr / rmarkdown documents are also provided.
Usage
## S3 method for class 'skim_df'
print(
x,
include_summary = TRUE,
n = Inf,
width = Inf,
summary_rule_width = getOption("skimr_summary_rule_width", default = 40),
...
)
## S3 method for class 'skim_list'
print(x, n = Inf, width = Inf, ...)
## S3 method for class 'summary_skim_df'
print(x, .summary_rule_width = 40, ...)
Arguments
x |
Object to format or print. |
include_summary |
Whether a summary of the data frame should be printed |
n |
Number of rows to show. If |
width |
Width of text output to generate. This defaults to |
summary_rule_width |
Width of Data Summary cli rule, defaults to 40. |
... |
Passed on to |
.summary_rule_width |
the width for the main rule above the summary. |
Methods (by class)
- print(skim_df): Print a skimmed data frame (skim_df from skim()).
- print(skim_list): Print a skim_list, a list of skim_df objects.
- print(summary_skim_df): Print method for a summary_skim_df object.
Printing options
For better or for worse, skimr often produces more output than can fit in
the standard R console. Fortunately, most modern environments like RStudio
and Jupyter support more than 80 character outputs. Call
options(width = 90) to get a better experience with skimr.
The print methods in skimr wrap those in the tibble
package. You can control printing behavior using the same global options.
Behavior in dplyr pipelines
Printing a skim_df requires specific columns that might be dropped when
using dplyr::select() or dplyr::summarize() on a skim_df. In those
cases, this method falls back to tibble::print.tbl().
Options for controlling print behavior
You can control the width of the rule line for the printed subtables with an option:
skimr_table_header_width.
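The printing advice above can be tried as in the following sketch. It relies only on the width global option and the include_summary argument shown in Usage; the value 90 is the one suggested in the text, and restoring the old options afterwards is a courtesy added for this illustration.

```r
library(skimr)

# Widen the console so more summary columns fit on one line.
old <- options(width = 90)

# Print the skimmed data without the data summary header.
print(skim(iris), include_summary = FALSE)

# Restore the previous width.
options(old)
```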
See Also
tibble::trunc_mat() for a list of global options for customizing
print formatting. crayon::has_color() for the variety of issues that
affect tibble's color support.
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- magrittr
- tidyselect: contains, ends_with, everything, matches, num_range, one_of, starts_with
Skimr printing within Jupyter notebooks
Description
This reproduces printed results in the console. By default Jupyter kernels
render the final object in the cell. We want the version printed by
skimr instead of the data that it contains.
Usage
## S3 method for class 'skim_df'
repr_text(obj, ...)
## S3 method for class 'skim_list'
repr_text(obj, ...)
## S3 method for class 'one_skim_df'
repr_text(obj, ...)
Arguments
obj |
The object to print and then return the output. |
... |
ignored. |
Value
None. invisible(NULL).
Create a skimr function list
Description
This constructor is used to create a named list of functions. It also lets
you pass NULL to identify a skimming function that you wish to remove.
Only functions that return a single value, working with dplyr::summarize(),
can be used within sfl.
Usage
sfl(..., skim_type = "")
Arguments
... |
Inherited from dplyr::data_masking() for dplyr version 1 or later or dplyr::funs() for older versions of dplyr. A list of functions specified by:
|
skim_type |
A character scalar. This is used to match locally-provided
skimmers with defaults. See |
Details
sfl() will automatically generate callables and names for a variety of
inputs, including functions, formulas and strings. Nonetheless, we recommend
providing names when reasonable to get better skim() output.
Value
A skimr_function_list, which contains a list of fun_calls,
returned by dplyr::funs(), and a list of skimming functions to drop.
See Also
dplyr::funs(), skim_with() and get_skimmers().
Examples
# sfl's can take a variety of input formats and will generate names
# if not provided.
sfl(mad, "var", ~ length(.)^2)
# But these can generate unpredictable names in your output.
# Better to set your own names.
sfl(mad = mad, variance = "var", length_sq = ~ length(.)^2)
# sfl's can remove individual skimmers from defaults by passing NULL.
sfl(hist = NULL)
# When working interactively, you don't need to set a type.
# But you should when defining new defaults with `get_skimmers()`.
get_skimmers.my_new_class <- function(column) {
  sfl(n_missing, skim_type = "my_new_class")
}
Skim a data frame, getting useful summary statistics
Description
skim() is an alternative to summary(), quickly providing a broad
overview of a data frame. It handles data of all types, dispatching a
different set of summary functions based on the types of columns in the data
frame.
Usage
skim(data, ..., .data_name = NULL)
skim_tee(data, ..., skim_fun = skim)
skim_without_charts(data, ..., .data_name = NULL)
Arguments
data |
A tibble, or an object that can be coerced into a tibble. |
... |
Columns to select for skimming. When none are provided, the default is to skim all columns. |
.data_name |
The name to use for the data. Defaults to the same as data. |
skim_fun |
The skim function used. |
skim |
The skimming function to use in |
Details
Each call produces a skim_df, which is fundamentally a tibble with a
special print method. One unusual feature of this data frame is a
pseudo-namespace for columns. skim() computes statistics by data type, and it
stores them in the data frame as <type>.<statistic>. These types are
stripped when printing the results. The "base" skimmers (n_missing and
complete_rate) are the only columns that don't follow this behavior.
See skim_with() for more details on customizing skim() and
get_default_skimmers() for a list of default functions.
If you just want to see the printed output, call skim_tee() instead.
This function returns the original data. skim_tee() uses the default
skim(), but you can replace it with the skim_fun argument.
The data frame produced by skim is wide and sparse. To avoid type coercion
skimr uses a type namespace for all summary statistics. Columns for numeric
summary statistics all begin with numeric; columns for factor summary
statistics begin with factor; and so on.
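The type namespace can be inspected directly from the column names. The sketch below is illustrative: it assumes the iris data, and the regular expression is simply one convenient way to pick out the numeric-prefixed columns.

```r
library(skimr)

skimmed <- skim(iris)

# Typed statistics carry a "<type>." prefix; base skimmers do not.
names(skimmed)

# Collect every numeric statistic by its prefix.
numeric_cols <- grep("^numeric\\.", names(skimmed), value = TRUE)
numeric_cols
```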
See partition() and yank() for methods for transforming this wide data
frame. The first function splits it into a list, with each entry
corresponding to a data type. The latter pulls a single subtable for a
particular type from the skim_df.
skim() is designed to operate in pipes and to generally play nicely with
other tidyverse functions. This means that you can use tidyselect helpers
within skim to select or drop specific columns for summary. You can also
further work with a skim_df using dplyr functions in a pipeline.
Value
A skim_df object, which also inherits the class(es) of the input
data. In many ways, the object behaves like a tibble::tibble().
Customizing skim
skim() is an intentionally simple function, with minimal arguments like
summary(). Nonetheless, this package provides two broad approaches to
how you can customize skim()'s behavior. You can customize the functions
that are called to produce summary statistics with skim_with().
Unicode rendering
If the rendered examples show unencoded values such as <U+2587> you will
need to change your locale to allow proper rendering. Please review the
Using Skimr vignette for more information
(vignette("Using_skimr", package = "skimr")).
Otherwise, we export skim_without_charts() to produce summaries without the
spark graphs. These are the source of the unicode dependency.
Examples
skim(iris)
# Use tidyselect
skim(iris, Species)
skim(iris, starts_with("Sepal"))
skim(iris, where(is.numeric))
# Skim also works groupwise
iris %>%
  dplyr::group_by(Species) %>%
  skim()
# Which five numeric columns have the greatest mean value?
# Look in the `numeric.mean` column.
iris %>%
  skim() %>%
  dplyr::select(numeric.mean) %>%
  dplyr::top_n(5)
# Which of my columns have missing values? Use the base skimmer n_missing.
iris %>%
  skim() %>%
  dplyr::filter(n_missing > 0)
# Use skim_tee to view the skim results and
# continue using the original data.
chickwts %>%
  skim_tee() %>%
  dplyr::filter(feed == "sunflower")
# Produce a summary without spark graphs
iris %>%
  skim_without_charts()
Functions for accessing skim_df attributes
Description
These functions simplify access to attributes contained within a skim_df.
While all attributes are read-only, being able to extract this information
is useful for different analyses. These functions should always be preferred
over calling base R's attribute functions.
Usage
data_rows(object)
data_cols(object)
df_name(object)
dt_key(object)
group_names(object)
base_skimmers(object)
skimmers_used(object)
Arguments
object |
A |
Value
Data contained within the requested skimr attribute.
Functions
- data_rows(): Get the number of rows in the skimmed data frame.
- data_cols(): Get the number of columns in the skimmed data frame.
- df_name(): Get the name of the skimmed data frame. This is only available in contexts where the name can be looked up. This is often not the case within a pipeline.
- dt_key(): Get the key of the skimmed data.table. This is only available in contexts where data is of class data.table.
- group_names(): Get the names of the groups in the original data frame. Only available if the data was grouped. Otherwise, NULL.
- base_skimmers(): Get the names of the base skimming functions used.
- skimmers_used(): Get the names of the skimming functions used, separated by data type.
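Examples
A brief sketch of the accessors listed above, applied to a skim of the built-in iris data (150 rows, 5 columns). The variable name is illustrative.

```r
library(skimr)

skimmed <- skim(iris)

data_rows(skimmed)      # number of rows in the original data
data_cols(skimmed)      # number of columns in the original data
base_skimmers(skimmed)  # names of the base skimmers
skimmers_used(skimmed)  # skimmer names, grouped by data type
```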
Test if an object is compatible with skimr
Description
Objects within skimr
are identified by a class, but they require additional
attributes and data columns for all operations to succeed. These checks help
ensure this. While they have some application externally, they are mostly
used internally.
Usage
has_type_column(object)
has_variable_column(object)
has_skimr_attributes(object)
has_skim_type_attribute(object)
has_skimmers(object)
is_data_frame(object)
is_skim_df(object)
is_one_skim_df(object)
is_skim_list(object)
could_be_skim_df(object)
assert_is_skim_df(object)
assert_is_skim_list(object)
assert_is_one_skim_df(object)
Arguments
object |
Any |
Details
Most notably, a skim_df has columns skim_type and skim_variable, and it
has the following special attributes:
- data_rows: n rows in the original data
- data_cols: original number of columns
- df_name: name of the original data frame
- dt_key: name of the key if original is a data.table
- groups: if there were group variables
- base_skimmers: names of functions applied to all skim types
- skimmers_used: names of functions used to skim each type
The functions in these checks work like all.equal(). They return TRUE if
the check passes, or otherwise notify why the check failed. This makes them
more useful when throwing errors.
Functions
- has_type_column(): Does the object have the skim_type column?
- has_variable_column(): Does the object have the skim_variable column?
- has_skimr_attributes(): Does the object have the appropriate skimr attributes?
- has_skim_type_attribute(): Does the object have a skim_type attribute? This makes it a one_skim_df.
- has_skimmers(): Does the object have skimmers?
- is_data_frame(): Is the object a data frame?
- is_skim_df(): Is the object a skim_df?
- is_one_skim_df(): Is the object a one_skim_df? This is similar to a skim_df, but does not have the type column. That is stored as an attribute instead.
- is_skim_list(): Is the object a skim_list?
- could_be_skim_df(): Is this a data frame with skim_variable and skim_type columns?
- assert_is_skim_df(): Stop if the object is not a skim_df.
- assert_is_skim_list(): Stop if the object is not a skim_list.
- assert_is_one_skim_df(): Stop if the object is not a one_skim_df.
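Examples
A hedged sketch of the checks in use. Wrapping the results in isTRUE() is a choice made here so that any explanatory information attached to a failed check does not get in the way of a plain logical test; the tryCatch() wrapper exists only to show the error message without halting the session.

```r
library(skimr)

skimmed <- skim(iris)

# Passing check: a skimmed data frame is a skim_df.
ok <- isTRUE(is_skim_df(skimmed))

# Failing check: a plain data frame is not a skim_df.
not_ok <- isTRUE(is_skim_df(iris))

# The assert_* variants stop with an error instead; catch it to inspect it.
msg <- tryCatch(
  {
    assert_is_skim_df(iris)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```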
Set or add the summary functions for a particular type of data
Description
While skim is designed around having an opinionated set of defaults, you can use this function to change the summary statistics that it returns.
Usage
skim_with(
...,
base = sfl(n_missing = n_missing, complete_rate = complete_rate),
append = TRUE
)
Arguments
... |
One or more ( |
base |
An |
append |
Whether the provided options should be in addition to the
defaults already in |
Details
skim_with() is a closure: a function that returns a new function. This
lets you have several skimming functions in a single R session, but it
also means that you need to assign the return of skim_with() before
you can use it.
You assign values within skim_with by using the sfl() helper (skimr
function list). This helper behaves mostly like dplyr::funs(), but lets
you also identify which skimming functions you want to remove, by setting
them to NULL. Assign an sfl to each column type that you wish to modify.
Functions that summarize all data types, and always return the same type
of value, can be assigned to the base argument. The default base skimmers
compute the number of missing values n_missing() and the rate of values
being complete, i.e. not missing, complete_rate().
When append = TRUE and local skimmers have names matching the names of
entries in the default skim_function_list, the values in the default list
are overwritten. Similarly, if NULL values are passed within sfl(), these
default skimmers are dropped. Otherwise, if append = FALSE, only the
locally-provided skimming functions are used.
Note that append only applies to the typed skimmers (i.e. non-base).
See get_default_skimmer_names() for a list of defaults.
Value
A new skim() function. This is callable. See skim() for more
details.
Examples
# Use new functions for numeric functions. If you don't provide a name,
# one will be automatically generated.
my_skim <- skim_with(numeric = sfl(median, mad), append = FALSE)
my_skim(faithful)
# If you want to remove a particular skimmer, set it to NULL
# This removes the inline histogram
my_skim <- skim_with(numeric = sfl(hist = NULL))
my_skim(faithful)
# This works with multiple skimmers. Just match names to overwrite
my_skim <- skim_with(numeric = sfl(iqr = IQR, p25 = NULL, p75 = NULL))
my_skim(faithful)
# Alternatively, set `append = FALSE` to replace the skimmers of a type.
my_skim <- skim_with(numeric = sfl(mean = mean, sd = sd), append = FALSE)
# Skimmers are unary functions. Partially apply arguments during assignment.
# For example, you might want to remove NA values.
my_skim <- skim_with(numeric = sfl(iqr = ~ IQR(., na.rm = TRUE)))
# Set multiple types of skimmers simultaneously.
my_skim <- skim_with(numeric = sfl(mean), character = sfl(length))
# Or pass the same as a list, unquoting the input.
my_skimmers <- list(numeric = sfl(mean), character = sfl(length))
my_skim <- skim_with(!!!my_skimmers)
# Use the v1 base skimmers instead.
my_skim <- skim_with(base = sfl(
  missing = n_missing,
  complete = n_complete,
  n = length
))
# Remove the base skimmers entirely
my_skim <- skim_with(base = NULL)
Functions for working with the vctrs package
Description
These make it clear that we need to use the tibble behavior when joining,
concatenating or casting skim_df objects. For a better discussion of
why this is important and how these functions work, see:
https://vctrs.r-lib.org/reference/howto-faq-coercion-data-frame.html.
Usage
## S3 method for class 'skim_df.skim_df'
vec_ptype2(x, y, ...)
## S3 method for class 'skim_df.tbl_df'
vec_ptype2(x, y, ...)
## S3 method for class 'tbl_df.skim_df'
vec_ptype2(x, y, ...)
## S3 method for class 'skim_df.skim_df'
vec_cast(x, to, ...)
## S3 method for class 'skim_df.tbl_df'
vec_cast(x, to, ...)
## S3 method for class 'tbl_df.skim_df'
vec_cast(x, to, ...)
Details
vec_ptype2.* handles finding common prototypes between skim_df and
similar objects. vec_cast.* handles casting between objects. Note that
as of dplyr 1.0.2, dplyr::bind_rows() does not fully support combining
attributes and vctrs::vec_rbind() is preferred instead.
Summary statistic functions
Description
skimr provides extensions to a variety of functions with R's stats package
to simplify creating summaries of data. All functions are vectorized over the
first argument. Additional arguments should be set in the sfl() that sets
the appropriate skimmers for a data type. You can use these, along with other
vectorized R functions, for creating custom sets of summary functions for
a given data type.
Usage
n_missing(x)
n_complete(x)
complete_rate(x)
n_whitespace(x)
sorted_count(x)
top_counts(x, max_char = 3, max_levels = 4)
inline_hist(x, n_bins = 8)
n_empty(x)
min_char(x)
max_char(x)
n_unique(x)
ts_start(x)
ts_end(x)
inline_linegraph(x, length.out = 16)
list_lengths_min(x)
list_lengths_median(x)
list_lengths_max(x)
list_min_length(x)
list_max_length(x)
Arguments
x |
A vector |
max_char |
In |
max_levels |
The maximum number of levels to be displayed. |
n_bins |
In |
length.out |
In |
Functions
- n_missing(): Calculate the sum of NA and NULL (i.e. missing) values.
- n_complete(): Calculate the sum of values that are neither NA nor NULL (i.e. not missing).
- complete_rate(): Calculate the rate of complete values; complete values are not missing.
- n_whitespace(): Calculate the number of rows containing only whitespace values using the \s+ regex.
- sorted_count(): Create a contingency table and arrange its levels in descending order. In case of ties, the ordering of results is alphabetical and depends upon the locale. NA is treated as an ordinary value for sorting.
- top_counts(): Compute and collapse a contingency table into a single character scalar. Wraps sorted_count().
- inline_hist(): Generate inline histogram for numeric variables. The character length of the histogram is controlled by the formatting options for character vectors.
- n_empty(): Calculate the number of blank values in a character vector. A "blank" is equal to "".
- min_char(): Calculate the minimum number of characters within a character vector.
- max_char(): Calculate the maximum number of characters within a character vector.
- n_unique(): Calculate the number of unique elements but remove NA.
- ts_start(): Get the start for a time series without the frequency.
- ts_end(): Get the finish for a time series without the frequency.
- inline_linegraph(): Generate inline line graph for time series variables. The character length of the line graph is controlled by the formatting options for character vectors. Based on the function in the pillar package.
- list_lengths_min(): Get the length of the shortest list in a vector of lists.
- list_lengths_median(): Get the median length of the lists.
- list_lengths_max(): Get the maximum length of the lists.
- list_min_length(): Get the length of the shortest list in a vector of lists.
- list_max_length(): Get the length of the longest list in a vector of lists.
See Also
get_skimmers() for customizing the functions called by skim().
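Examples
A small sketch of a few of the summary functions above applied to plain vectors. The input vectors are made up for illustration.

```r
library(skimr)

x <- c(1, NA, 3)
n_missing(x)      # count of missing values
n_complete(x)     # count of non-missing values
complete_rate(x)  # share of non-missing values

words <- c("a", "bbb", "", " ", NA)
n_empty(words)       # elements equal to ""
n_whitespace(words)  # elements made up only of whitespace
n_unique(words)      # distinct values, ignoring NA
```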
Summary function for skim_df
Description
This is a method of the generic function summary().
Usage
## S3 method for class 'skim_df'
summary(object, ...)
Arguments
object |
a skim dataframe. |
... |
Additional arguments affecting the summary produced. Not used. |
Value
A summary of the skim data frame.
Examples
a <- skim(mtcars)
summary(a)
Create "long" skim output
Description
Skim results returned as a tidy long data frame with four columns: variable, type, stat and formatted.
Usage
to_long(.data, ..., skim_fun = skim)
## Default S3 method:
to_long(.data, ..., skim_fun = skim)
## S3 method for class 'skim_df'
to_long(.data, ..., skim_fun = skim)
Arguments
.data |
A data frame or an object that can be coerced into a data frame. |
... |
Columns to select for skimming. When none are provided, the default is to skim all columns. |
skim_fun |
The skim function used. |
Value
A tibble
Methods (by class)
- to_long(default): Skim a data frame and convert the results to a long data frame.
- to_long(skim_df): Transform a skim_df to a long data frame.
Examples
to_long(iris)
to_long(skim(iris))