Title: Mode Estimation
Version: 0.2.1
Description: Determines single or multiple modes (most frequent values). Checks if missing values make this impossible, and returns 'NA' in this case. Dependency-free source code. See Franzese and Iuliano (2019) <doi:10.1016/B978-0-12-809633-8.20354-3>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.2.3
Suggests: devtools, knitr, rmarkdown, stats, testthat (≥ 3.0.0), utils
Config/testthat/edition: 3
URL: https://github.com/lhdjung/moder, https://lhdjung.github.io/moder/
BugReports: https://github.com/lhdjung/moder/issues
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2023-05-05 16:59:41 UTC; lukasjung
Author: Lukas Jung [aut, cre]
Maintainer: Lukas Jung <jung-lukas@gmx.net>
Repository: CRAN
Date/Publication: 2023-05-07 10:30:02 UTC
Possible sets of modes
Description
mode_possible_min() and mode_possible_max() determine the minimal and
maximal sets of modes from among known modes, given the number of missing
values.
Usage
mode_possible_min(x, multiple = FALSE)
mode_possible_max(x, multiple = FALSE)
Arguments
x |
A vector to search for its possible modes. |
multiple |
Boolean. If |
Value
By default, a vector with the minimal or maximal possible sets of
modes (values tied for most frequent) in x. If the functions can't
determine these possible modes because of missing values, they return
NA by default (multiple = FALSE).
See Also
mode_count_range() for the minimal and maximal numbers of possible modes.
They can always be determined, even if the present functions return NA.
Examples
# "a" is guaranteed to be a mode,
# "b" might also be one, but
# "c" is impossible:
mode_possible_min(c("a", "a", "a", "b", "b", "c", NA))
mode_possible_max(c("a", "a", "a", "b", "b", "c", NA))
# Only `8` can possibly be the mode
# because, even if `NA` is `7`, it's
# still less frequent than `8`:
mode_possible_min(c(7, 7, 8, 8, 8, 8, NA))
mode_possible_max(c(7, 7, 8, 8, 8, 8, NA))
# No clear minimal or maximal set
# of modes because `NA` may tip
# the balance between `1` and `2`
# towards a single mode:
mode_possible_min(c(1, 1, 2, 2, 3, 4, 5, NA))
mode_possible_max(c(1, 1, 2, 2, 3, 4, 5, NA))
# With `multiple = TRUE`, the functions
# return all values that might be part of
# the min / max sets of modes; not these
# sets themselves:
mode_possible_min(c(1, 1, 2, 2, 3, 4, 5, NA), multiple = TRUE)
mode_possible_max(c(1, 1, 2, 2, 3, 4, 5, NA), multiple = TRUE)
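The distinction drawn in these examples, between values guaranteed to be modes and values that merely might be, can be checked by brute force on small vectors. The following is a minimal base R sketch, not the package's implementation: possible_modes_brute() and modes_of() are hypothetical helper names, each NA is tried as every known value and as a placeholder for a value not yet seen, and results come back as strings. Because it enumerates every filling, it is only practical for a handful of NAs.

```r
# Modes of a complete vector, returned as strings.
modes_of <- function(v) {
  tab <- table(v)
  names(tab)[tab == max(tab)]
}

# Hypothetical helper: which known values are modes under every
# possible filling of the NAs, and which under at least one?
possible_modes_brute <- function(x) {
  known <- as.character(x[!is.na(x)])
  n_na <- sum(is.na(x))
  if (n_na == 0L) {
    modes <- modes_of(known)
    return(list(guaranteed = modes, possible = modes))
  }
  # Each NA may be a known value or one of `n_na` new placeholder values
  candidates <- c(unique(known), paste0("<new", seq_len(n_na), ">"))
  fillings <- as.matrix(
    expand.grid(rep(list(candidates), n_na), stringsAsFactors = FALSE)
  )
  mode_sets <- lapply(
    seq_len(nrow(fillings)),
    function(i) modes_of(c(known, fillings[i, ]))
  )
  list(
    guaranteed = Reduce(intersect, mode_sets),
    possible = intersect(unique(known), Reduce(union, mode_sets))
  )
}

res <- possible_modes_brute(c("a", "a", "a", "b", "b", "c", NA))
res$guaranteed  # "a"
res$possible    # "a" "b"
```

Note that this sketch always reports the two sets, whereas the documented functions return NA when no clear minimal or maximal set exists.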
All modes
Description
mode_all() returns the set of all modes in a vector.
Usage
mode_all(x, na.rm = FALSE)
Arguments
x |
A vector to search for its modes. |
na.rm |
Boolean. Should missing values in |
Value
A vector with all modes (values tied for most frequent) in x. If
the modes can't be determined because of missing values,
returns NA instead.
See Also
- mode_first() for the first-appearing mode.
- mode_single() for the only mode, or NA if there are more.
Examples
# Both `3` and `4` are the modes:
mode_all(c(1, 2, 3, 3, 4, 4))
# Only `8` is:
mode_all(c(8, 8, 9))
# Can't determine the modes here --
# `9` might be another mode:
mode_all(c(8, 8, 9, NA))
# Either `1` or `2` could be a
# single mode, depending on `NA`:
mode_all(c(1, 1, 2, 2, NA))
# `1` is the most frequent value,
# no matter what `NA` stands for:
mode_all(c(1, 1, 1, 2, NA))
# Ignore `NA`s with `na.rm = TRUE`
# (there should be good reasons for this!):
mode_all(c(8, 8, 9, NA), na.rm = TRUE)
mode_all(c(1, 1, 2, 2, NA), na.rm = TRUE)
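One way to reason about these examples: the modes are certain only if a single value leads every other value, including a value not yet seen, by more than the number of NAs. The base R sketch below encodes that rule. It is an illustration under that assumption, not the package's source code, and mode_all_sketch() is a hypothetical name.

```r
# Hypothetical sketch: all modes of `x`, or NA if missing values
# leave the set of modes undetermined.
mode_all_sketch <- function(x) {
  n_na <- sum(is.na(x))
  known <- x[!is.na(x)]
  if (length(known) == 0L) return(x[NA_integer_])
  values <- unique(known)
  counts <- tabulate(match(known, values))
  top <- max(counts)
  modes <- values[counts == top]
  if (n_na == 0L) return(modes)
  # Highest count among non-modal values; 0 stands for an unseen value
  runner_up <- max(counts[counts < top], 0L)
  # A tie, or a lead that the NAs could close, makes the modes unknown
  if (length(modes) > 1L || top - runner_up <= n_na) {
    return(x[NA_integer_])
  }
  modes
}

mode_all_sketch(c(1, 2, 3, 3, 4, 4))  # 3 4
mode_all_sketch(c(8, 8, 9, NA))       # NA
mode_all_sketch(c(1, 1, 1, 2, NA))    # 1
```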
Modal count
Description
mode_count() counts the modes in a vector. Thin wrapper around mode_all().
Usage
mode_count(x, na.rm = FALSE, max_unique = NULL)
Arguments
x |
A vector to search for its modes. |
na.rm |
Boolean. Should missing values in |
max_unique |
Numeric or string. If the maximum number of unique values
in |
Value
Integer. Number of modes (values tied for most frequent) in x. If
the modes can't be determined because of missing values,
returns NA instead.
Examples
# There are two modes, `3` and `4`:
mode_count(c(1, 2, 3, 3, 4, 4))
# Only one mode, `8`:
mode_count(c(8, 8, 9))
# Can't determine the number of modes
# here -- `9` might be another mode:
mode_count(c(8, 8, 9, NA))
# Either `1` or `2` could be a
# single mode, depending on `NA`:
mode_count(c(1, 1, 2, 2, NA))
# `1` is the most frequent value,
# no matter what `NA` stands for:
mode_count(c(1, 1, 1, 2, NA))
# Ignore `NA`s with `na.rm = TRUE`
# (there should be good reasons for this!):
mode_count(c(8, 8, 9, NA), na.rm = TRUE)
mode_count(c(1, 1, 2, 2, NA), na.rm = TRUE)
Modal count range
Description
mode_count_range() determines the minimal and maximal number
of modes given the number of missing values.
Usage
mode_count_range(x, max_unique = NULL)
Arguments
x |
A vector to search for its possible modes. |
max_unique |
Numeric or string. If the maximum number of unique values
in |
Details
If x is a factor, max_unique should be "known" or there is a
warning. This is because a factor's levels are supposed to include all of
its possible values.
Value
Integer (length 2). Minimal and maximal number of modes (values tied
for most frequent) in x.
Examples
# If `NA` is `7` or `8`, that number is
# the only mode; otherwise, both numbers
# are modes:
mode_count_range(c(7, 7, 8, 8, NA))
# Same result here -- `7` is the only mode
# unless `NA` is secretly `8`, in which case
# there are two modes:
mode_count_range(c(7, 7, 7, 8, 8, NA))
# But now, there is no way for `8` to be
# as frequent as `7`:
mode_count_range(c(7, 7, 7, 7, 8, 8, NA))
# The `NA`s might form a new mode here
# if they are both, e.g., `9`:
mode_count_range(c(7, 7, 8, 8, NA, NA))
# However, if there can be no values beyond
# those already known -- `7` and `8` --
# the `NA`s can't form a new mode.
# Specify this with `max_unique = "known"`:
mode_count_range(c(7, 7, 8, 8, NA, NA), max_unique = "known")
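For small inputs, the reasoning in these examples can be reproduced by trying every possible filling of the NAs and counting the modes each time. The base R sketch below does this. It is a brute-force illustration rather than the package's algorithm, and count_range_brute(), count_modes(), and the allow_new argument (a stand-in for the effect of max_unique = "known") are hypothetical names.

```r
# Number of modes in a complete vector.
count_modes <- function(v) {
  tab <- table(v)
  sum(tab == max(tab))
}

# Hypothetical helper: smallest and largest number of modes over
# all fillings of the NAs. Assumes `x` has at least one element.
count_range_brute <- function(x, allow_new = TRUE) {
  known <- as.character(x[!is.na(x)])
  n_na <- sum(is.na(x))
  if (n_na == 0L) {
    n <- count_modes(known)
    return(c(n, n))
  }
  candidates <- unique(known)
  if (allow_new) {
    # Placeholders for values that are not among the known ones
    candidates <- c(candidates, paste0("<new", seq_len(n_na), ">"))
  }
  stopifnot(length(candidates) > 0L)
  fillings <- as.matrix(
    expand.grid(rep(list(candidates), n_na), stringsAsFactors = FALSE)
  )
  counts <- vapply(
    seq_len(nrow(fillings)),
    function(i) count_modes(c(known, fillings[i, ])),
    integer(1)
  )
  range(counts)
}

count_range_brute(c(7, 7, 8, 8, NA, NA))                     # 1 3
count_range_brute(c(7, 7, 8, 8, NA, NA), allow_new = FALSE)  # 1 2
```

The number of fillings grows exponentially with the number of NAs, so this approach only suits short vectors.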
The first-appearing mode
Description
mode_first() returns the mode that appears first in a vector, i.e., before
any other modes.
Usage
mode_first(x, na.rm = FALSE, accept = FALSE)
Arguments
x |
A vector to search for its first mode. |
na.rm |
Boolean. Should missing values in |
accept |
Boolean. Should the first-appearing value known to be a mode be
accepted? If |
Value
The first mode (most frequent value) in x. If it can't be
determined because of missing values, returns NA instead.
See Also
- mode_all() for the full set of modes.
- mode_single() for the only mode, or NA if there are more.
Examples
# `2` is most frequent:
mode_first(c(1, 2, 2, 2, 3))
# Can't determine the first mode --
# it might be `1` or `2` depending
# on the true value behind `NA`:
mode_first(c(1, 1, 2, 2, NA))
# Ignore `NA`s with `na.rm = TRUE`
# (there should be good reasons for this!):
mode_first(c(1, 1, 2, 2, NA), na.rm = TRUE)
# `1` is the most frequent value,
# no matter what `NA` stands for:
mode_first(c(1, 1, 1, 2, NA))
# By default, the function insists on
# the first mode, so it won't accept the
# first value *known* to be a mode if an
# earlier value might be a mode, too:
mode_first(c(1, 2, 2, NA))
# You may accept the first-known mode:
mode_first(c(1, 2, 2, NA), accept = TRUE)
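For vectors without missing values, the first-appearing mode can be found with a common base R idiom. The sketch below is one such way, offered as an illustration and not as the package's implementation. first_mode_complete() is a hypothetical name, and the function deliberately refuses NAs and empty input rather than reasoning about them.

```r
# Hypothetical sketch: first-appearing mode of a complete vector.
first_mode_complete <- function(x) {
  stopifnot(!anyNA(x), length(x) > 0L)
  # match(x, x) maps each element to the position of its first occurrence
  first_positions <- match(x, x)
  # Count how often each first-occurrence position is hit
  counts <- tabulate(first_positions, nbins = length(x))
  # which.max() picks the earliest position among tied counts
  x[which.max(counts)]
}

first_mode_complete(c(1, 2, 2, 2, 3))  # 2
first_mode_complete(c(4, 4, 3, 3))     # 4
```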
Modal frequency
Description
Call mode_frequency() to get the number of times that a
vector's mode appears in the vector.
See mode_frequency_range() for bounds on an unknown frequency.
Usage
mode_frequency(x, na.rm = FALSE, max_unique = NULL)
Arguments
x |
A vector to check for its modal frequency. |
na.rm |
Boolean. Should missing values in |
max_unique |
Numeric or string. If the maximum number of unique values
in |
Details
By default (na.rm = FALSE), the function returns NA if any
values are missing. That is because missings make the frequency uncertain
even if the mode is known: any missing value may or may not be the mode,
and hence count towards the modal frequency.
Value
Integer (length 1) or NA.
See Also
mode_first(), which the function wraps.
Examples
# The mode, `9`, appears three times:
mode_frequency(c(7, 8, 8, 9, 9, 9))
# With missing values, the frequency
# is unknown, even if the mode isn't:
mode_frequency(c(1, 1, NA))
# You can ignore this problem and
# determine the frequency among known values
# (there should be good reasons for this!):
mode_frequency(c(1, 1, NA), na.rm = TRUE)
Modal frequency range
Description
mode_frequency_range() determines the minimum and maximum
number of times that a vector's mode appears in the vector. The minimum
assumes that no NAs are the mode; the maximum assumes that all NAs are.
Usage
mode_frequency_range(x, max_unique = NULL)
Arguments
x |
A vector to check for its modal frequency. |
max_unique |
Numeric or string. If the maximum number of unique values
in |
Details
If there are no NAs in x, the two return values are identical.
If all x values are NA, the return values are 1 (no two x values
are the same) and the total number of values (all x values are the same).
Value
Integer (length 2).
See Also
mode_frequency(), for the precise frequency (or NA if it can't
be determined).
Examples
# The mode is `7`. It appears four or
# five times because the `NA` might
# also be a `7`:
mode_frequency_range(c(7, 7, 7, 7, 8, 8, NA))
# All of `"c"`, `"d"`, and `"e"` are the modes,
# and each of them appears twice:
mode_frequency_range(c("a", "b", "c", "c", "d", "d", "e", "e"))
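The bounds described above follow from two counts: the highest frequency among known values and the number of NAs. The base R sketch below shows that arithmetic. It is an illustration, not the package's code. frequency_range_sketch() is a hypothetical name, and the sketch ignores max_unique entirely, so it assumes NAs may stand for values not yet seen.

```r
# Hypothetical sketch: lower and upper bound on the modal frequency.
# Assumes `x` has at least one element.
frequency_range_sketch <- function(x) {
  n_na <- sum(is.na(x))
  known <- x[!is.na(x)]
  if (length(known) == 0L) {
    # All values missing: either all differ, or all are the same
    return(c(1L, length(x)))
  }
  top <- max(table(known))
  # Minimum: no NA is the mode. Maximum: every NA is the mode.
  c(top, top + n_na)
}

frequency_range_sketch(c(7, 7, 7, 7, 8, 8, NA))  # 4 5
frequency_range_sketch(c("a", "b", "c", "c", "d", "d", "e", "e"))  # 2 2
```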
Is the mode trivial?
Description
mode_is_trivial() checks whether all values in a given vector
are equally frequent. The mode is not too informative in such cases.
Usage
mode_is_trivial(x, na.rm = FALSE, max_unique = NULL)
Arguments
x |
A vector to search for its modes. |
na.rm |
Boolean. Should missing values in |
max_unique |
Numeric or string. If the maximum number of unique values
in |
Details
The function returns TRUE whenever x has length < 3 because no
value is more frequent than another one. Otherwise, it returns NA in
these cases:
- Some x values are missing and all known values are equal. Thus, it is unknown whether there is a value with a different frequency.
- All known values are modes if the NAs "fill up" the non-modal values exactly, i.e., without any NAs remaining.
- Some NAs remain after "filling up" the non-modal values with NAs (so that they are hypothetically modes), and the number of remaining NAs is divisible by the number of unique known values.
- There are so many missing values that they might form mode-sized groups of values that are not among the known values, and the number of NAs is divisible by the modal frequency so that all (partly hypothetical) values might be equally frequent. You can limit the number of such hypothetical values by specifying max_unique. The function might then return FALSE instead of NA.
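The cases above all ask one question: is there some way to fill in the NAs so that every distinct value ends up equally frequent? The base R sketch below tests this with a single divisibility check. It is an independent illustration, not how the package decides. trivial_possible() and allow_new (standing in for the effect of max_unique = "known") are hypothetical names, and the sketch assumes at least one known value. All values can share a frequency L only if L is at least the current modal frequency, L divides the total length, and the resulting number of distinct values is at least the number of known ones.

```r
# Hypothetical sketch: could all values be equally frequent
# for some filling of the NAs?
trivial_possible <- function(x, allow_new = TRUE) {
  n_total <- length(x)
  counts <- as.vector(table(x))  # table() drops NAs by default
  n_known_unique <- length(counts)
  stopifnot(n_known_unique >= 1L)
  top <- max(counts)
  if (!allow_new) {
    # Only known values: they must share `n_total` evenly
    return(n_total %% n_known_unique == 0 &&
             n_total / n_known_unique >= top)
  }
  # Try every common frequency from the current maximum upwards
  levels <- seq(top, n_total)
  any(n_total %% levels == 0 & n_total / levels >= n_known_unique)
}

trivial_possible(c(7, 7, 7, 8, rep(NA, 5)))                     # TRUE
trivial_possible(c(7, 7, 7, 8, rep(NA, 5)), allow_new = FALSE)  # FALSE
```

A TRUE result here corresponds to a trivial mode being possible, which is the situation in which the documented function cannot rule it out.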
Value
Boolean (length 1).
Examples
# The mode is trivial if
# all values are equal...
mode_is_trivial(c(1, 1, 1))
# ...and even if all unique
# values are equally frequent:
mode_is_trivial(c(1, 1, 2, 2))
# It's also trivial if
# all values are different:
mode_is_trivial(c(1, 2, 3))
# Here, the mode is nontrivial
# because `1` is more frequent than `2`:
mode_is_trivial(c(1, 1, 2))
# Two of the `NA`s might be `8`s, and
# the other three might represent a value
# different from both `7` and `8`. Thus,
# it's possible that all three distinct
# values are equally frequent:
mode_is_trivial(c(7, 7, 7, 8, rep(NA, 5)))
# The same is not true if all values,
# even the missing ones, must represent
# one of the known values:
mode_is_trivial(c(7, 7, 7, 8, rep(NA, 5)), max_unique = "known")
The single mode
Description
mode_single() returns the only mode in a vector. If there are multiple
modes, it returns NA by default.
Usage
mode_single(x, na.rm = FALSE, accept = FALSE, multiple = "NA")
Arguments
x |
A vector to search for its mode. |
na.rm |
Boolean. Should missing values in |
accept |
Boolean. Should the minimum set of modes be accepted to check
for a single mode? If |
multiple |
String or integer (length 1), or a function. What to do if
|
Details
If accept is FALSE (the default), the set of modes is obtained
via mode_all() instead of mode_possible_min(). Set it to TRUE to
avoid returning NA when some, though not all modes are known. The purpose
of the default is to insist on a single mode.

If x is a string vector and multiple is "min" or "max", the mode is
selected lexically, just like min(letters) returns "a". The "mean"
and "median" options return NA with a warning. For factors, "min",
"max", and "median" are errors, but "mean" returns NA with a
warning. These are inconsistencies in base R.

The multiple options "first" and "last" always select the mode that
appears first or last in x. Index numbers, like multiple = 2, allow you
to select more flexibly. If multiple is a function, its output must be
length 1.
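To make the multiple options concrete, the base R sketch below resolves a vector of tied modes into one value. It is a simplified illustration rather than the package's code. resolve_multiple() is a hypothetical name, it covers only a subset of the options, and it assumes that a function passed as multiple receives the vector of modes.

```r
# Hypothetical sketch: reduce several modes to one value.
# `modes` is assumed to be ordered by first appearance in `x`.
resolve_multiple <- function(modes, multiple = "NA") {
  if (length(modes) <= 1L) return(modes)
  if (is.function(multiple)) {
    out <- multiple(modes)
    stopifnot(length(out) == 1L)  # output must be length 1
    return(out)
  }
  # Index numbers select a mode by position
  if (is.numeric(multiple)) return(modes[multiple])
  switch(
    multiple,
    "NA" = modes[NA_integer_],  # NA of the same type as `modes`
    first = modes[1L],
    last = modes[length(modes)],
    min = min(modes),
    max = max(modes),
    stop("unsupported `multiple` option in this sketch")
  )
}

resolve_multiple(c(3, 4))                   # NA
resolve_multiple(c(3, 4), "last")           # 4
resolve_multiple(c("b", "a"), "min")        # "a"
resolve_multiple(c(3, 4), function(m) m[1]) # 3
```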
Value
The only mode (most frequent value) in x. If it can't be determined
because of missing values, NA is returned instead. By default, NA is
also returned if there are multiple modes (multiple = "NA").
See Also
- mode_first() for the first-appearing mode.
- mode_all() for the complete set of modes.
- mode_possible_min() for the minimal set of modes.
Examples
# `8` is the only mode:
mode_single(c(8, 8, 9))
# With more than one mode, the function
# returns `NA`:
mode_single(c(1, 2, 3, 3, 4, 4))
# Can't determine the modes here --
# `9` might be another mode:
mode_single(c(8, 8, 9, NA))
# Accept `8` anyway if it's
# sufficient to just have any mode:
mode_single(c(8, 8, 9, NA), accept = TRUE)
# `1` is the most frequent value,
# no matter what `NA` stands for:
mode_single(c(1, 1, 1, 2, NA))
# Ignore `NA`s with `na.rm = TRUE`
# (there should be good reasons for this!):
mode_single(c(8, 8, 9, NA), na.rm = TRUE)