Type: | Package |
Title: | Statistical Functions for Probability Distributions and Regression |
Version: | 0.2.3 |
Description: | A collection of miscellaneous statistical functions for probability distributions: 'dbern()', 'pbern()', 'qbern()', 'rbern()' for the Bernoulli distribution, and 'distr2name()', 'name2distr()' for distribution names; probability density estimation: 'densityfun()'; most frequent value estimation: 'mfv()', 'mfv1()'; other statistical measures of location: 'cv()' (coefficient of variation), 'midhinge()', 'midrange()', 'trimean()'; construction of histograms: 'histo()', 'find_breaks()'; calculation of the Hellinger distance: 'hellinger()'; use of classical kernels: 'kernelfun()', 'kernel_properties()'; univariate piecewise-constant regression: 'picor()'. |
License: | GPL-3 |
LazyData: | TRUE |
Depends: | R (≥ 3.1.3) |
Imports: | clue, graphics, rpart, stats |
Suggests: | knitr, testthat |
URL: | https://github.com/paulponcet/statip |
BugReports: | https://github.com/paulponcet/statip/issues |
RoxygenNote: | 7.0.0 |
NeedsCompilation: | yes |
Packaged: | 2019-11-17 21:21:11 UTC; YL1101 |
Author: | Paul Poncet [aut, cre], The R Core Team [aut, cph] (C function 'BinDist' copied from package 'stats'), The R Foundation [cph] (C function 'BinDist' copied from package 'stats'), Adrian Baddeley [ctb] (C function 'BinDist' copied from package 'stats') |
Maintainer: | Paul Poncet <paulponcet@yahoo.fr> |
Repository: | CRAN |
Date/Publication: | 2019-11-17 21:40:02 UTC |
Bandwidth calculation
Description
bandwidth computes the bandwidth to be used in the densityfun function.
Usage
bandwidth(x, rule)
Arguments
x |
numeric. The data from which the estimate is to be computed. |
rule |
character. A rule to choose the bandwidth. See |
Value
A numeric value.
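For orientation, package stats ships rule-of-thumb bandwidth selectors such as bw.nrd0(), whose name matches the default bw = "nrd0" of densityfun. The sketch below reproduces that rule by hand on illustrative data. Whether bandwidth(x, rule) dispatches to these selectors is an assumption, not something this page states.

```r
# Silverman's rule of thumb, as computed by stats::bw.nrd0().
# `x` is illustrative data, not taken from the package.
set.seed(1)
x <- rnorm(200)

# 0.9 * min(sd, IQR / 1.34) * n^(-1/5)
bw_manual <- 0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1 / 5)
bw_stats  <- stats::bw.nrd0(x)
```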
Coefficient of variation
Description
Compute the coefficient of variation of a numeric vector x, defined as the ratio of the standard deviation to the mean.
Usage
cv(x, na_rm = FALSE, ...)
Arguments
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the coefficient of variation? |
... |
Additional arguments to be passed to |
Value
A numeric value, the coefficient of variation.
References
https://en.wikipedia.org/wiki/Coefficient_of_variation.
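The definition above can be written directly in base R. The helper name cv_sketch is hypothetical and the code is only a sketch of the formula, not the package's implementation.

```r
# Hypothetical helper mirroring the definition: sd / mean.
cv_sketch <- function(x, na_rm = FALSE) {
  sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

v <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv_v <- cv_sketch(v)

# With a missing value, na_rm = TRUE drops it before computing.
cv_na <- cv_sketch(c(v, NA), na_rm = TRUE)
```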
The Bernoulli distribution
Description
Density, distribution function, quantile function and random generation for the Bernoulli distribution.
Usage
dbern(x, prob, log = FALSE)
qbern(p, prob, lower.tail = TRUE, log.p = FALSE)
pbern(q, prob, lower.tail = TRUE, log.p = FALSE)
rbern(n, prob)
Arguments
x |
numeric. Vector of quantiles. |
prob |
Probability of success on each trial. |
log |
logical. If |
p |
numeric in |
lower.tail |
logical. If |
log.p |
logical. If |
q |
numeric. Vector of quantiles. |
n |
number of observations.
If |
See Also
See the help page of the Binomial distribution.
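Since a Bernoulli variable is a binomial variable with a single trial, the four functions can be cross-checked against the binomial family of package stats with size = 1. A minimal base R sketch, with illustrative variable names:

```r
prob <- 0.3

d <- dbinom(c(0, 1), size = 1, prob = prob)      # P(X = 0), P(X = 1)
p <- pbinom(c(0, 1), size = 1, prob = prob)      # P(X <= 0), P(X <= 1)
q <- qbinom(c(0.5, 0.9), size = 1, prob = prob)  # quantiles at 0.5 and 0.9

set.seed(42)
r <- rbinom(10, size = 1, prob = prob)           # ten draws, each 0 or 1
```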
Kernel density estimation
Description
Return a function performing kernel density estimation.
The difference between density and densityfun is similar to that between approx and approxfun.
Usage
densityfun(
x,
bw = "nrd0",
adjust = 1,
kernel = "gaussian",
weights = NULL,
window = kernel,
width,
n = 512,
from,
to,
cut = 3,
na.rm = FALSE,
...
)
Arguments
x |
numeric. The data from which the estimate is to be computed. |
bw |
numeric. The smoothing bandwidth to be used.
See the eponymous argument of |
adjust |
numeric. The bandwidth used is actually |
kernel , window |
character. A string giving the smoothing kernel to be used.
Authorized kernels are listed in |
weights |
numeric. A vector of non-negative observation weights,
hence of same length as |
width |
this exists for compatibility with S;
if given, and |
n |
The number of equally spaced points at which the density
is to be estimated.
See the eponymous argument of |
from , to |
The left and right-most points of the grid at which the
density is to be estimated;
the defaults are |
cut |
By default, the values of |
na.rm |
logical. If |
... |
Additional arguments for (non-default) methods. |
Value
A function that can be called to generate a density.
Author(s)
Adapted from the density function of package stats.
The C code of BinDist is copied from package stats and authored by the R Core Team with contributions from Adrian Baddeley.
See Also
density and approxfun from package stats.
Examples
x <- rlnorm(1000, 1, 1)
f <- densityfun(x, from = 0)
curve(f(x), xlim = c(0, 20))
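The approx/approxfun analogy can be imitated in base R by interpolating the grid returned by density(). This is only a sketch of the idea; it is not claimed to be how densityfun is implemented, and f_sketch is a made-up name.

```r
set.seed(1)
x <- rlnorm(1000, 1, 1)

# density() returns the estimate on a grid of n = 512 points
d <- density(x, from = 0)

# Turn the grid into a callable function by linear interpolation;
# outside the grid the estimate is taken to be 0 (an assumption).
f_sketch <- approxfun(d$x, d$y, yleft = 0, yright = 0)

vals <- f_sketch(c(1, 2, 5))
```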
Conversion between abbreviated distribution names and proper names
Description
The function distr2name() converts abbreviated distribution names to proper distribution names (e.g. "norm" becomes "Gaussian").
The function name2distr() does the reciprocal operation.
Usage
distr2name(x)
name2distr(x)
Arguments
x |
character. A vector of abbreviated distribution names or proper distribution names. |
Value
A character vector of the same length as x.
Elements of x that are not recognized are returned unchanged, apart from being converted to lowercase.
Examples
distr2name(c("norm", "dnorm", "rhyper", "ppois"))
name2distr(c("Cauchy", "Gaussian", "Generalized Extreme Value"))
Error function
Description
The function erf() computes the error function, defined as erf(x) = 2 * F(x * sqrt(2)) - 1, where F is the Gaussian distribution function.
Usage
erf(x, ...)
Arguments
x |
numeric. A vector of input values. |
... |
Additional arguments to be passed to |
Value
A numeric vector of the same length as x.
References
https://en.wikipedia.org/wiki/Error_function.
See Also
pnorm from package stats.
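The identity erf(x) = 2 * F(x * sqrt(2)) - 1 translates into one line of base R using pnorm for F. The name erf_sketch is hypothetical, chosen to avoid confusion with the package's erf().

```r
# Error function via the standard Gaussian distribution function.
erf_sketch <- function(x) 2 * pnorm(x * sqrt(2)) - 1

e <- erf_sketch(c(-1, 0, 1))
```

The function is odd, so the first and last elements of e differ only in sign, and the middle element is 0.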
Breakpoints to be passed to a Histogram
Description
The function find_breaks() isolates the piece of code of the function truehist() from package MASS that computes the set of breakpoints used to construct the histogram.
Usage
find_breaks(x, nbins = "Scott", h, x0 = -h/1000)
Arguments
x |
numeric. A vector. |
nbins |
integer or character. The suggested number of bins.
Either a positive integer, or a character string naming a rule:
|
h |
numeric. The bin width, a strictly positive number (takes precedence over nbins). |
x0 |
numeric. Shift for the bins -
the breaks are at |
Value
A numeric vector.
See Also
histo() in this package; truehist() from package MASS; hist() from package graphics.
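A rough sketch of how equally spaced breakpoints can be derived from a suggested bin count, a bin width h, and a shift x0. The helper name breaks_sketch and the way h is obtained from pretty() are assumptions made for illustration; this is not a transcription of find_breaks() or truehist().

```r
breaks_sketch <- function(x, nbins = grDevices::nclass.scott(x), h,
                          x0 = -h / 1000) {
  if (missing(h)) {
    # Derive a "round" bin width from the suggested number of bins.
    h <- diff(pretty(range(x), nbins))[1L]
  }
  # Index of the first and last break so that the grid covers range(x).
  first <- floor((min(x) - x0) / h)
  last  <- ceiling((max(x) - x0) / h)
  x0 + h * (first:last)
}

set.seed(1)
x <- rnorm(500)
b <- breaks_sketch(x)

# Counts per cell, using right-closed intervals (b[i], b[i + 1]].
counts <- as.vector(table(cut(x, b)))
```

The default x0 depends on h, which works because R evaluates default arguments lazily, after h has been set inside the function body.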
Hellinger distance
Description
Estimate the Hellinger distance between two random samples whose underlying distributions are continuous.
Usage
hellinger(x, y, lower = -Inf, upper = Inf, method = 1, ...)
Arguments
x |
numeric. A vector giving the first sample. |
y |
numeric. A vector giving the second sample. |
lower |
numeric. Lower limit passed to |
upper |
numeric. Upper limit passed to |
method |
integer. If |
... |
Additional parameters to be passed to |
Details
Probability density functions are estimated with densityfun.
Numerical integration is then performed with integrate.
Value
A numeric value, the Hellinger distance.
References
https://en.wikipedia.org/wiki/Hellinger_distance.
See Also
HellingerDist in package distrEx.
Examples
x <- rnorm(200, 0, 2)
y <- rnorm(1000, 10, 15)
hellinger(x, y, -Inf, Inf)
hellinger(x, y, -Inf, Inf, method = 2)
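The two steps described under Details (density estimation, then numerical integration) can be sketched in base R. The sketch uses the convention H = sqrt(1 - integral of sqrt(f * g)); which formula each method of hellinger() uses is not stated on this page, so treat the normalization, the helper dens_fun, and the finite integration limits as assumptions.

```r
set.seed(1)
x <- rnorm(500, 0, 1)
y <- rnorm(500, 1, 1)

# Common finite range covering both samples (illustrative choice).
lo <- min(x, y) - 3
hi <- max(x, y) + 3

# Hypothetical helper: kernel density estimate as a function.
dens_fun <- function(s, lo, hi) {
  d <- density(s, from = lo, to = hi)
  approxfun(d$x, d$y, yleft = 0, yright = 0)
}
f <- dens_fun(x, lo, hi)
g <- dens_fun(y, lo, hi)

# Bhattacharyya coefficient by numerical integration.
bc <- integrate(function(t) sqrt(f(t) * g(t)), lo, hi,
                subdivisions = 1000L, stop.on.error = FALSE)$value

# Guard against tiny negative values caused by numerical error.
h_sketch <- sqrt(max(0, 1 - bc))
```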
Alternative Histograms
Description
A simplified version of hist() from package graphics.
Usage
histo(x, breaks, ...)
Arguments
x |
numeric. A vector. |
breaks |
numeric. A vector of breakpoints to build the histogram,
possibly given by |
... |
Additional parameters (currently not used). |
Value
An object of class "histogram", which can be plotted by plot.histogram from package graphics.
This object is a list with components:
- breaks: the n+1 cell boundaries;
- counts: n integers giving the number of x inside each cell;
- xname: a string with the actual x argument name.
See Also
find_breaks() in this package; truehist() from package MASS; hist() from package graphics.
Smoothing kernels
Description
The generic function kernelfun creates a smoothing kernel function.
Usage
kernel_properties(name, derivative = FALSE)
kernelfun(name, ...)
## S3 method for class 'function'
kernelfun(name, ...)
## S3 method for class 'character'
kernelfun(name, derivative = FALSE, ...)
.kernelsList()
Arguments
name |
character.
The name of the kernel to be used.
Authorized kernels are listed in |
derivative |
logical. If |
... |
Additional arguments to be passed to the kernel function. |
Value
A function.
See Also
density in package stats.
Examples
kernel_properties("gaussian")
k <- kernelfun("epanechnikov")
curve(k(x), xlim = c(-1, 1))
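For comparison, the textbook Epanechnikov kernel on [-1, 1] can be written by hand. The package may scale its kernels differently, so epan_sketch below is an illustration of what a smoothing kernel function looks like, not a reproduction of kernelfun("epanechnikov").

```r
# Textbook Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], 0 elsewhere.
epan_sketch <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# A kernel is a probability density, so it should integrate to 1.
area <- integrate(epan_sketch, -1, 1)$value
```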
Lag a vector
Description
This function computes a lagged vector, shifting it back or forward.
Usage
lagk(x, k, na = FALSE, cst = FALSE)
Arguments
x |
A vector. |
k |
integer. The number of lags.
If |
na |
logical. If |
cst |
logical.
If |
Value
A vector of the same type and length as x.
Examples
v <- sample(1:10)
print(v)
lagk(v, 1)
lagk(v, 1, na = TRUE)
lagk(v, -2)
lagk(v, -3, na = TRUE)
lagk(v, -3, na = FALSE, cst = TRUE)
lagk(v, -3, na = FALSE)
Most frequent value(s)
Description
The function mfv() returns the most frequent value(s) (or mode(s)) found in a vector.
The function mfv1 returns the first of these values, so that mfv1(x) is identical to mfv(x)[[1L]].
Usage
mfv(x, ...)
## Default S3 method:
mfv(x, na_rm = FALSE, ...)
## S3 method for class 'tableNA'
mfv(x, na_rm = FALSE, ...)
mfv1(x, na_rm = FALSE, ...)
Arguments
x |
Vector of observations (of type numeric, integer, character, factor, or
logical).
|
... |
Additional arguments (currently not used). |
na_rm |
logical. If |
Details
See David Smith's blog post to understand the philosophy followed in the code of mfv for the treatment of missing values.
Value
The function mfv returns a vector of the same type as x.
Note that this vector can be of length > 1 in case of multiple modes.
mfv1 always returns a vector of length 1 (the first of the modes found).
Note
mfv() calls the function tabulate.
References
Dutta S. and Goswami A. (2010). Mode estimation for discrete distributions. Mathematical Methods of Statistics, 19(4):374–384.
Examples
# Basic examples:
mfv(integer(0)) # NaN
mfv(c(3, 3, 3, 2, 4)) # 3
mfv(c(TRUE, FALSE, TRUE)) # TRUE
mfv(c("a", "a", "b", "a", "d")) # "a"
mfv(c("a", "a", "b", "b", "d")) # c("a", "b")
mfv1(c("a", "a", "b", "b", "d")) # "a"
# With missing values:
mfv(c(3, 3, 3, 2, NA)) # 3
mfv(c(3, 3, 2, NA)) # NA
mfv(c(3, 3, 2, NA), na_rm = TRUE) # 3
mfv(c(3, 3, 2, 2, NA)) # NA
mfv(c(3, 3, 2, 2, NA), na_rm = TRUE) # c(2, 3)
mfv1(c(3, 3, 2, 2, NA), na_rm = TRUE) # 2
# With only missing values:
mfv(c(NA, NA)) # NA
mfv(c(NA, NA), na_rm = TRUE) # NaN
# With factors
mfv(factor(c("a", "b", "a")))
mfv(factor(c("a", "b", "a", NA)))
mfv(factor(c("a", "b", "a", NA)), na_rm = TRUE)
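The core idea, counting occurrences with tabulate and keeping the values whose count is maximal, can be sketched in a few lines of base R. The helper modes_sketch is hypothetical; it returns modes in order of first appearance and makes no attempt to reproduce mfv's treatment of missing values or of empty input.

```r
# Hypothetical helper: all most frequent values, same type as the input.
modes_sketch <- function(x) {
  ux <- unique(x)                   # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))  # occurrences of each distinct value
  ux[counts == max(counts)]
}

m_num <- modes_sketch(c(3, 3, 3, 2, 4))
m_chr <- modes_sketch(c("a", "a", "b", "b", "d"))
```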
Midhinge
Description
Compute the midhinge of a numeric vector x, defined as the average of the first and third quartiles.
Usage
midhinge(x, na_rm = FALSE, ...)
Arguments
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the midhinge? |
... |
Additional arguments to be passed to |
Value
A numeric value, the midhinge.
References
https://en.wikipedia.org/wiki/Midhinge.
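A base R sketch of the definition. The helper midhinge_sketch is hypothetical, and it relies on the default quantile type of stats::quantile (type 7); the package may compute quartiles differently.

```r
# Hypothetical helper: average of the first and third quartiles.
midhinge_sketch <- function(x, na_rm = FALSE) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = na_rm, names = FALSE)
  mean(q)
}

mh <- midhinge_sketch(1:9)  # quartiles are 3 and 7 under type 7
```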
Mid-range
Description
Compute the mid-range of a numeric vector x, defined as the mean of the minimum and the maximum.
Usage
midrange(x, na_rm = FALSE)
Arguments
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the mid-range? |
Value
A numeric value, the mid-range.
References
https://en.wikipedia.org/wiki/Mid-range.
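The definition maps onto base R's range(). A minimal sketch, with the hypothetical helper name midrange_sketch:

```r
# Hypothetical helper: mean of the minimum and the maximum.
midrange_sketch <- function(x, na_rm = FALSE) {
  mean(range(x, na.rm = na_rm))
}

mr    <- midrange_sketch(c(1, 5, 10))
mr_na <- midrange_sketch(c(1, NA, 3), na_rm = TRUE)
```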
Piecewise-constant regression
Description
picor looks for a piecewise-constant function as a regression function. The regression is necessarily univariate.
This is essentially a wrapper for rpart (regression tree) and isoreg.
Usage
picor(formula, data, method, min_length = 0, ...)
## S3 method for class 'picor'
knots(Fn, ...)
## S3 method for class 'picor'
predict(object, newdata, ...)
## S3 method for class 'picor'
plot(x, ...)
## S3 method for class 'picor'
print(x, ...)
Arguments
formula |
formula of the model to be fitted. |
data |
optional data frame. |
method |
character. If |
min_length |
integer. The minimal distance between two consecutive knots. |
... |
Additional arguments to be passed to |
object , x , Fn |
An object of class |
newdata |
data.frame to be passed to the |
Value
An object of class "picor", which is a list composed of the following elements:
- formula: the formula passed as an argument;
- x: the numeric vector of predictors;
- y: the numeric vector of responses;
- knots: a numeric vector (possibly of length 0), the knots found;
- values: a numeric vector (of length length(knots)+1), the constant values taken by the regression function between the knots.
Examples
## Not run:
s <- stats::stepfun(c(-1,0,1), c(1., 2., 4., 3.))
x <- stats::rnorm(1000)
y <- s(x)
p <- picor(y ~ x, data.frame(x = x, y = y))
print(p)
plot(p)
## End(Not run)
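One of the two building blocks named in the Description, isoreg from package stats, already yields a piecewise-constant fit when the trend is monotone. The base R sketch below uses illustrative data with a non-decreasing step signal; it does not reproduce picor's knot selection or its min_length handling.

```r
set.seed(1)
x <- sort(runif(200, -2, 2))
# Non-decreasing step signal plus a little noise (illustrative data).
y <- ifelse(x < -1, 1, ifelse(x < 0, 2, 4)) + rnorm(200, sd = 0.1)

ir <- isoreg(x, y)     # isotonic (non-decreasing) least-squares fit
sf <- as.stepfun(ir)   # the fit as a step function
k  <- knots(sf)        # locations where the fitted value may change

fitted_vals <- sf(x)   # piecewise-constant predictions at the data points
```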
Basic plot of a loess object
Description
Plots a loess object fitted on a single explanatory variable.
Usage
## S3 method for class 'loess'
plot(x, ...)
Arguments
x |
An object of class |
... |
Additional graphical arguments. |
See Also
loess from package stats.
Examples
reg <- loess(dist ~ speed, cars)
plot(reg)
Default model predictions
Description
Default method of the predict generic function, which can be used when the model object is empty.
Usage
## Default S3 method:
predict(object, newdata, ...)
Arguments
object |
A model object, possibly empty. |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
... |
Additional arguments. |
Value
A vector of predictions.
See Also
predict from package stats.
Examples
stats::predict(NULL)
stats::predict(NULL, newdata = data.frame(x = 1:2, y = 2:3))
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Alternative Table Creation
Description
Count the occurrences of each factor level or value in a vector.
Usage
tableNA(x)
Arguments
x |
numeric. An atomic vector or a factor. |
Value
An object of class "tableNA", which is the result of tabulate() with three attributes:
- type_of_x: the result of typeof(x);
- is_factor_x: the result of is.factor(x);
- levels: the result of levels(x).
The number of missing values is always reported.
Examples
tableNA(c(1,2,2,1,3))
tableNA(c(1,2,2,1,3, NA))
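For comparison, base R's table() can also report missing values when called with useNA = "always". The result is an ordinary "table" object rather than a "tableNA" object, so this sketch shows the counting behavior only.

```r
x <- c(1, 2, 2, 1, 3, NA)

# useNA = "always" adds a cell for NA even when no value is missing.
tab <- table(x, useNA = "always")
n_per_value <- as.vector(tab)  # counts for 1, 2, 3 and NA, in that order
```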
Tukey's trimean
Description
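Compute the trimean of a numeric vector x, defined as the weighted average (Q1 + 2 * Q2 + Q3) / 4 of the first quartile Q1, the median Q2 and the third quartile Q3.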
Usage
trimean(x, na_rm = FALSE, ...)
Arguments
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the trimean? |
... |
Additional arguments to be passed to |
Value
A numeric value, the trimean.
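Tukey's trimean is the weighted average (Q1 + 2 * median + Q3) / 4. A base R sketch follows; the helper trimean_sketch is hypothetical and relies on the default quantile type of stats::quantile (type 7), which may differ from what the package uses.

```r
# Hypothetical helper: (Q1 + 2 * median + Q3) / 4.
trimean_sketch <- function(x, na_rm = FALSE) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = na_rm, names = FALSE)
  (q[1] + 2 * q[2] + q[3]) / 4
}

tm <- trimean_sketch(1:9)  # quartiles 3, 5, 7 under type 7
```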