Title: | Fit Distributions and Neural Networks to Censored and Truncated Data |
Version: | 0.0.3 |
Description: | Define distribution families and fit them to interval-censored and interval-truncated data, where the truncation bounds may depend on the individual observation. The defined distributions feature density, probability, sampling and fitting methods as well as efficient implementations of the log-density log f(x) and log-probability log P(x0 <= X <= x1) for use in 'TensorFlow' neural networks via the 'tensorflow' package. Allows training parametric neural networks on interval-censored and interval-truncated data with flexible parameterization. Applications include Claims Development in Non-Life Insurance, e.g. modelling reporting delay distributions from incomplete data, see Bücher, Rosenstock (2022) <doi:10.1007/s13385-022-00314-4>. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
BugReports: | https://github.com/AshesITR/reservr/issues |
Depends: | R (≥ 3.5) |
Imports: | assertthat (≥ 0.2.1), generics, glue (≥ 1.3.1), keras3, matrixStats, nloptr, numDeriv, purrr (≥ 0.3.3), R6 (≥ 2.4.1), Rcpp, RcppParallel, rlang (≥ 0.4.5), stats, utils |
Suggests: | covr, callr, colorspace, data.table, dplyr (≥ 0.8.4), evmix, fitdistrplus (≥ 1.0.14), flextable (≥ 0.5.8), formattable (≥ 0.2.0.1), furrr (≥ 0.1.0), ggplot2 (≥ 3.2.1), ggridges (≥ 0.5.2), knitr (≥ 1.28), logKDE (≥ 0.3.2), officer (≥ 0.3.7), patchwork (≥ 1.0.0), reticulate, rmarkdown (≥ 2.1), rstudioapi, tensorflow (≥ 2.0.0), testthat (≥ 2.1.0), tidyr (≥ 1.0.2), tibble, bench, survival, rticles, bookdown |
LinkingTo: | BH, Rcpp, RcppArmadillo, RcppParallel |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
SystemRequirements: | GNU make |
Collate: | 'RcppExports.R' 'fit_util.R' 'distribution_class.R' 'zzz.R' 'interval.R' 'aaa.R' 'blended_transition.R' 'callback_adaptive_lr.R' 'callback_debug_dist_gradients.R' 'check_lengths.R' 'compiler.R' 'dist_bdegp.R' 'dist_beta.R' 'dist_binomial.R' 'fit_blended.R' 'dist_blended.R' 'dist_dirac.R' 'dist_discrete.R' 'dist_empirical.R' 'fit_erlang_mixture.R' 'dist_erlangmix.R' 'dist_exponential.R' 'dist_gamma.R' 'gpd.R' 'dist_genpareto.R' 'dist_lognormal.R' 'fit_mixture.R' 'dist_mixture.R' 'dist_negbinomial.R' 'dist_normal.R' 'pareto.R' 'dist_pareto.R' 'dist_poisson.R' 'dist_translate.R' 'dist_trunc.R' 'dist_uniform.R' 'dist_weibull.R' 'distribution_generics.R' 'distribution_methods.R' 'flatten_params.R' 'integrate.R' 'plot_distributions.R' 'prob_report.R' 'reservr-package.R' 'softmax.R' 'tf_compile.R' 'tf_compile_loss.R' 'tf_constants.R' 'tf_fit.R' 'tf_initialise.R' 'tf_util.R' 'trunc_erlangmix_init.R' 'trunc_obs.R' 'truncate_claims.R' 'weighted_stats.R' |
URL: | https://ashesitr.github.io/reservr/, https://github.com/AshesITR/reservr |
NeedsCompilation: | yes |
Packaged: | 2024-06-24 10:33:37 UTC; alex |
Author: | Alexander Rosenstock [aut, cre, cph] |
Maintainer: | Alexander Rosenstock <alexander.rosenstock@web.de> |
Repository: | CRAN |
Date/Publication: | 2024-06-24 16:40:02 UTC |
Base class for Distributions
Description
Represents a modifiable Distribution family
Active bindings
default_params
Get or set (non-recursive) default parameters of a Distribution
param_bounds
Get or set (non-recursive) parameter bounds (box constraints) of a Distribution
Methods
Public methods
Method new()
Usage
Distribution$new(type, caps, params, name, default_params)
Arguments
type
Type of distribution. This is a string constant for the default implementation. Distributions with non-constant type must override the
get_type()
function.caps
Character vector of capabilities to fuel the default implementations of
has_capability()
andrequire_capability()
. Distributions with dynamic capabilities must override thehas_capability()
function.params
Initial parameter bounds structure, backing the
param_bounds
active binding (usually a list of intervals).name
Name of the Distribution class. Should be
CamelCase
and end with"Distribution"
.default_params
Initial fixed parameters backing the
default_params
active binding (usually a list of numeric / NULLs).
Details
Construct a Distribution instance
Used internally by the dist_*
functions.
Method sample()
Usage
Distribution$sample(n, with_params = list())
Arguments
n
number of samples to draw.
with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
n
. In that case thei
-th sample will use thei
-th parameters.
Details
Sample from a Distribution
Returns
A length n
vector of i.i.d. random samples from the
Distribution with the specified parameters.
Examples
dist_exponential(rate = 2.0)$sample(10)
Method density()
Usage
Distribution$density(x, log = FALSE, with_params = list())
Arguments
x
Vector of points to evaluate the density at.
log
Flag. If
TRUE
, return the log-density instead.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(x)
. In that case, thei
-th density point will use thei
-th parameters.
Details
Density of a Distribution
Returns
A numeric vector of (log-)densities
Examples
dist_exponential()$density(c(1.0, 2.0), with_params = list(rate = 2.0))
Method tf_logdensity()
Usage
Distribution$tf_logdensity()
Details
Compile a TensorFlow function for log-density evaluation
Returns
A tf_function
taking arguments x
and args
returning the
log-density of the Distribution evaluated at x
with parameters args
.
Method probability()
Usage
Distribution$probability( q, lower.tail = TRUE, log.p = FALSE, with_params = list() )
Arguments
q
Vector of points to evaluate the probability function at.
lower.tail
If
TRUE
, return P(X <= q). Otherwise return P(X > q).log.p
If
TRUE
, probabilities are returned aslog(p)
.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(q)
. In that case, thei
-th probability point will use thei
-th parameters.
Details
Cumulative probability of a Distribution
Returns
A numeric vector of (log-)probabilities
Examples
dist_exponential()$probability( c(1.0, 2.0), with_params = list(rate = 2.0) )
Method tf_logprobability()
Usage
Distribution$tf_logprobability()
Details
Compile a TensorFlow function for log-probability evaluation
Returns
A tf_function
taking arguments qmin
, qmax
and args
returning the log-probability of the Distribution evaluated over the
closed interval [qmin
, qmax
] with parameters args
.
Method quantile()
Usage
Distribution$quantile( p, lower.tail = TRUE, log.p = FALSE, with_params = list() )
Arguments
p
Vector of probabilities.
lower.tail
If
TRUE
, return P(X <= q). Otherwise return P(X > q).log.p
If
TRUE
, probabilities are returned aslog(p)
.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(p)
. In that case, thei
-th quantile will use thei
-th parameters.
Details
Quantile function of a Distribution
Returns
A numeric vector of quantiles
Examples
dist_exponential()$quantile(c(0.1, 0.5), with_params = list(rate = 2.0))
Method hazard()
Usage
Distribution$hazard(x, log = FALSE, with_params = list())
Arguments
x
Vector of points.
log
Flag. If
TRUE
, return the log-hazard instead.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(x)
. In that case, thei
-th hazard point will use thei
-th parameters.
Details
Hazard function of a Distribution
Returns
A numeric vector of (log-)hazards
Examples
dist_exponential(rate = 2.0)$hazard(c(1.0, 2.0))
Method diff_density()
Usage
Distribution$diff_density(x, log = FALSE, with_params = list())
Arguments
x
Vector of points.
log
Flag. If
TRUE
, return the gradient of the log-density instead.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(x)
. In that case, thei
-th density point will use thei
-th parameters.
Details
Gradients of the density of a Distribution
Returns
A list structure containing the (log-)density gradients of all
free parameters of the Distribution evaluated at x
.
Examples
dist_exponential()$diff_density( c(1.0, 2.0), with_params = list(rate = 2.0) )
Method diff_probability()
Usage
Distribution$diff_probability( q, lower.tail = TRUE, log.p = FALSE, with_params = list() )
Arguments
q
Vector of points to evaluate the probability function at.
lower.tail
If
TRUE
, return P(X <= q). Otherwise return P(X > q).log.p
If
TRUE
, probabilities are returned aslog(p)
.with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(q)
. In that case, thei
-th probability point will use thei
-th parameters.
Details
Gradients of the cumulative probability of a Distribution
Returns
A list structure containing the cumulative (log-)probability
gradients of all free parameters of the Distribution evaluated at q
.
Examples
dist_exponential()$diff_probability( c(1.0, 2.0), with_params = list(rate = 2.0) )
Method is_in_support()
Usage
Distribution$is_in_support(x, with_params = list())
Arguments
x
Vector of points
with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(x)
. In that case, thei
-th point will use thei
-th parameters.
Details
Determine if a value is in the support of a Distribution
Returns
A logical vector with the same length as x
indicating whether
x
is part of the support of the distribution given its parameters.
Examples
dist_exponential(rate = 1.0)$is_in_support(c(-1.0, 0.0, 1.0))
Method is_discrete_at()
Usage
Distribution$is_discrete_at(x, with_params = list())
Arguments
x
Vector of points
with_params
Distribution parameters to use. Each parameter value can also be a numeric vector of length
length(x)
. In that case, thei
-th point will use thei
-th parameters.
Details
Determine if a value has positive probability
Returns
A logical vector with the same length as x
indicating whether
there is a positive probability mass at x
given the Distribution
parameters.
Examples
dist_dirac(point = 0.0)$is_discrete_at(c(0.0, 1.0))
Method tf_is_discrete_at()
Usage
Distribution$tf_is_discrete_at()
Details
Compile a TensorFlow function for discrete support checking
Returns
A tf_function
taking arguments x
and args
returning whether
the Distribution has a point mass at x
given parameters args
.
Method has_capability()
Usage
Distribution$has_capability(caps)
Arguments
caps
Character vector of capabilities
Details
Check if a capability is present
Returns
A logical vector the same length as caps
.
Examples
dist_exponential()$has_capability("density")
Method get_type()
Usage
Distribution$get_type()
Details
Get the type of a Distribution. Type can be one of discrete
,
continuous
or mixed
.
Returns
A string representing the type of the Distribution.
Examples
dist_exponential()$get_type() dist_dirac()$get_type() dist_mixture(list(dist_dirac(), dist_exponential()))$get_type() dist_mixture(list(dist_dirac(), dist_binomial()))$get_type()
Method get_components()
Usage
Distribution$get_components()
Details
Get the component Distributions of a transformed Distribution.
Returns
A possibly empty list of Distributions
Examples
dist_trunc(dist_exponential())$get_components() dist_dirac()$get_components() dist_mixture(list(dist_exponential(), dist_gamma()))$get_components()
Method is_discrete()
Usage
Distribution$is_discrete()
Details
Check if a Distribution is discrete, i.e. it has a density with respect to the counting measure.
Returns
TRUE
if the Distribution is discrete, FALSE
otherwise.
Note that mixed distributions are not discrete but can have point masses.
Examples
dist_exponential()$is_discrete() dist_dirac()$is_discrete()
Method is_continuous()
Usage
Distribution$is_continuous()
Details
Check if a Distribution is continuous, i.e. it has a density with respect to the Lebesgue measure.
Returns
TRUE
if the Distribution is continuous, FALSE
otherwise.
Note that mixed distributions are not continuous.
Examples
dist_exponential()$is_continuous() dist_dirac()$is_continuous()
Method require_capability()
Usage
Distribution$require_capability( caps, fun_name = paste0(sys.call(-1)[[1]], "()") )
Arguments
caps
Character vector of Capabilities to require
fun_name
Frienly text to use for generating the error message in case of failure.
Details
Ensure that a Distribution has all required capabilities. Will throw an error if any capability is missing.
Returns
Invisibly TRUE
.
Examples
dist_exponential()$require_capability("diff_density")
Method get_dof()
Usage
Distribution$get_dof()
Details
Get the number of degrees of freedom of a Distribution family. Only parameters without a fixed default are considered free.
Returns
An integer representing the degrees of freedom suitable e.g. for AIC calculations.
Examples
dist_exponential()$get_dof() dist_exponential(rate = 1.0)$get_dof()
Method get_placeholders()
Usage
Distribution$get_placeholders()
Details
Get Placeholders of a Distribution family.
Returns a list of free parameters of the family.
Their values will be NULL
.
If the Distribution has Distributions as parameters, placeholders will be computed recursively.
Returns
A named list containing any combination of (named or unnamed)
lists and NULL
s.
Examples
dist_exponential()$get_placeholders() dist_mixture(list(dist_dirac(), dist_exponential()))$get_placeholders()
Method get_params()
Usage
Distribution$get_params(with_params = list())
Arguments
with_params
Optional parameter overrides with the same structure as
dist$get_params()
. Given Parameter values are expected to be length 1.
Details
Get a full list of parameters, possibly including placeholders.
Returns
A list representing the (recursive) parameter structure of the
Distribution with values for specified parameters and NULL
for free
parameters that are missing both in the Distributions parameters and in
with_params
.
Examples
dist_mixture(list(dist_dirac(), dist_exponential()))$get_params( with_params = list(probs = list(0.5, 0.5)) )
Method tf_make_constants()
Usage
Distribution$tf_make_constants(with_params = list())
Arguments
with_params
Optional parameter overrides with the same structure as
dist$tf_make_constants()
. Given Parameter values are expected to be length 1.
Details
Get a list of constant TensorFlow parameters
Returns
A list representing the (recursive) constant parameters of the
Distribution with values sprecified by parameters. Each constant is a
TensorFlow Tensor of dtype floatx
.
Method tf_compile_params()
Usage
Distribution$tf_compile_params(input, name_prefix = "")
Arguments
input
A keras layer to bind all outputs to
name_prefix
Prefix to use for layer names
Details
Compile distribution parameters into tensorflow outputs
Returns
A list with two elements
-
outputs
a flat list of keras output layers, one for each parameter. -
output_inflater
a function taking keras output layers and transforming them into a list structure suitable for passing to the loss function returned bytf_compile_model()
Method get_param_bounds()
Usage
Distribution$get_param_bounds()
Details
Get Interval bounds on all Distribution parameters
Returns
A list representing the free (recursive) parameter structure of
the Distribution with Interval
objects as values representing the
bounds of the respective free parameters.
Examples
dist_mixture( list(dist_dirac(), dist_exponential()), probs = list(0.5, 0.5) )$get_param_bounds() dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_bounds() dist_genpareto()$get_param_bounds() dist_genpareto1()$get_param_bounds()
Method get_param_constraints()
Usage
Distribution$get_param_constraints()
Details
Get additional (non-linear) equality constraints on Distribution parameters
Returns
NULL
if the box constraints specified by
dist$get_param_bounds()
are sufficient, or a function taking full
Distribution parameters and returning either a numeric vector
(which must be 0 for valid parameter combinations) or a list with
elements
-
constraints
: The numeric vector of constraints -
jacobian
: The Jacobi matrix of the constraints with respect to the parameters
Examples
dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_constraints()
Method export_functions()
Usage
Distribution$export_functions( name, envir = parent.frame(), with_params = list() )
Arguments
name
common suffix of the exported functions
envir
Environment to export the functions to
with_params
Optional list of parameters to use as default values for the exported functions
Details
Export sampling, density, probability and quantile functions to plain R functions
Creates new functions in envir
named {r,d,p,q}<name>
which implement
dist$sample
, dist$density
, dist$probability
and dist$quantile
as
plain functions with default arguments specified by with_params
or the
fixed parameters.
The resulting functions will have signatures taking all parameters as separate arguments.
Returns
Invisibly NULL
.
Examples
tmp_env <- new.env(parent = globalenv()) dist_exponential()$export_functions( name = "exp", envir = tmp_env, with_params = list(rate = 2.0) ) evalq( fitdistrplus::fitdist(rexp(100), "exp"), envir = tmp_env )
Method clone()
The objects of this class are cloneable with this method.
Usage
Distribution$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
Other Distributions:
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
# Example for param_bounds:
# Create an Exponential Distribution with rate constrained to (0, 2)
# instead of (0, Inf)
my_exp <- dist_exponential()
my_exp$param_bounds$rate <- interval(c(0, 2))
my_exp$get_param_bounds()
fit_dist(my_exp, rexp(100, rate = 3), start = list(rate = 1))$params$rate
## ------------------------------------------------
## Method `Distribution$sample`
## ------------------------------------------------
dist_exponential(rate = 2.0)$sample(10)
## ------------------------------------------------
## Method `Distribution$density`
## ------------------------------------------------
dist_exponential()$density(c(1.0, 2.0), with_params = list(rate = 2.0))
## ------------------------------------------------
## Method `Distribution$probability`
## ------------------------------------------------
dist_exponential()$probability(
c(1.0, 2.0),
with_params = list(rate = 2.0)
)
## ------------------------------------------------
## Method `Distribution$quantile`
## ------------------------------------------------
dist_exponential()$quantile(c(0.1, 0.5), with_params = list(rate = 2.0))
## ------------------------------------------------
## Method `Distribution$hazard`
## ------------------------------------------------
dist_exponential(rate = 2.0)$hazard(c(1.0, 2.0))
## ------------------------------------------------
## Method `Distribution$diff_density`
## ------------------------------------------------
dist_exponential()$diff_density(
c(1.0, 2.0),
with_params = list(rate = 2.0)
)
## ------------------------------------------------
## Method `Distribution$diff_probability`
## ------------------------------------------------
dist_exponential()$diff_probability(
c(1.0, 2.0),
with_params = list(rate = 2.0)
)
## ------------------------------------------------
## Method `Distribution$is_in_support`
## ------------------------------------------------
dist_exponential(rate = 1.0)$is_in_support(c(-1.0, 0.0, 1.0))
## ------------------------------------------------
## Method `Distribution$is_discrete_at`
## ------------------------------------------------
dist_dirac(point = 0.0)$is_discrete_at(c(0.0, 1.0))
## ------------------------------------------------
## Method `Distribution$has_capability`
## ------------------------------------------------
dist_exponential()$has_capability("density")
## ------------------------------------------------
## Method `Distribution$get_type`
## ------------------------------------------------
dist_exponential()$get_type()
dist_dirac()$get_type()
dist_mixture(list(dist_dirac(), dist_exponential()))$get_type()
dist_mixture(list(dist_dirac(), dist_binomial()))$get_type()
## ------------------------------------------------
## Method `Distribution$get_components`
## ------------------------------------------------
dist_trunc(dist_exponential())$get_components()
dist_dirac()$get_components()
dist_mixture(list(dist_exponential(), dist_gamma()))$get_components()
## ------------------------------------------------
## Method `Distribution$is_discrete`
## ------------------------------------------------
dist_exponential()$is_discrete()
dist_dirac()$is_discrete()
## ------------------------------------------------
## Method `Distribution$is_continuous`
## ------------------------------------------------
dist_exponential()$is_continuous()
dist_dirac()$is_continuous()
## ------------------------------------------------
## Method `Distribution$require_capability`
## ------------------------------------------------
dist_exponential()$require_capability("diff_density")
## ------------------------------------------------
## Method `Distribution$get_dof`
## ------------------------------------------------
dist_exponential()$get_dof()
dist_exponential(rate = 1.0)$get_dof()
## ------------------------------------------------
## Method `Distribution$get_placeholders`
## ------------------------------------------------
dist_exponential()$get_placeholders()
dist_mixture(list(dist_dirac(), dist_exponential()))$get_placeholders()
## ------------------------------------------------
## Method `Distribution$get_params`
## ------------------------------------------------
dist_mixture(list(dist_dirac(), dist_exponential()))$get_params(
with_params = list(probs = list(0.5, 0.5))
)
## ------------------------------------------------
## Method `Distribution$get_param_bounds`
## ------------------------------------------------
dist_mixture(
list(dist_dirac(), dist_exponential()),
probs = list(0.5, 0.5)
)$get_param_bounds()
dist_mixture(
list(dist_dirac(), dist_exponential())
)$get_param_bounds()
dist_genpareto()$get_param_bounds()
dist_genpareto1()$get_param_bounds()
## ------------------------------------------------
## Method `Distribution$get_param_constraints`
## ------------------------------------------------
dist_mixture(
list(dist_dirac(), dist_exponential())
)$get_param_constraints()
## ------------------------------------------------
## Method `Distribution$export_functions`
## ------------------------------------------------
tmp_env <- new.env(parent = globalenv())
dist_exponential()$export_functions(
name = "exp",
envir = tmp_env,
with_params = list(rate = 2.0)
)
evalq(
fitdistrplus::fitdist(rexp(100), "exp"),
envir = tmp_env
)
The Generalized Pareto Distribution (GPD)
Description
These functions provide information about the generalized Pareto distribution
with threshold u
. dgpd
gives the density, pgpd
gives the distribution
function, qgpd
gives the quantile function and rgpd
generates random
deviates.
Usage
rgpd(n = 1L, u = 0, sigmau = 1, xi = 0)
dgpd(x, u = 0, sigmau = 1, xi = 0, log = FALSE)
pgpd(q, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE)
qgpd(p, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE)
Arguments
n |
integer number of observations. |
u |
threshold parameter (minimum value). |
sigmau |
scale parameter (must be positive). |
xi |
shape parameter |
x , q |
vector of quantiles. |
log , log.p |
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
Details
If u
, sigmau
or xi
are not specified, they assume the default values of
0
, 1
and 0
respectively.
The generalized Pareto distribution has density
f(x) = 1 / \sigma_u (1 + \xi z)^(- 1 / \xi - 1)
where z = (x - u) / \sigma_u
and f(x) = exp(-z)
if
\xi
is 0.
The support is x \ge u
for \xi \ge 0
and
u \le x \le u - \sigma_u / \xi
for \xi < 0
.
The Expected value exists if \xi < 1
and is equal to
E(X) = u + \sigma_u / (1 - \xi)
k-th moments exist in general for k\xi < 1
.
Value
rgpd
generates random deviates.
dgpd
gives the density.
pgpd
gives the distribution function.
qgpd
gives the quantile function.
References
https://en.wikipedia.org/wiki/Generalized_Pareto_distribution
Examples
x <- rgpd(1000, u = 1, sigmau = 0.5, xi = 0.1)
xx <- seq(-1, 10, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10))
lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0.1))
plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l")
lines(xx, dgpd(xx, u = 0.5, sigmau = 1, xi = -0.3), col = "blue", lwd = 2)
lines(xx, dgpd(xx, u = 1.5, sigmau = 1, xi = 0.3), col = "red", lwd = 2)
plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l")
lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0), col = "blue", lwd = 2)
lines(xx, dgpd(xx, u = 1, sigmau = 2, xi = 0), col = "red", lwd = 2)
The Pareto Distribution
Description
These functions provide information about the Pareto distribution.
dpareto
gives the density, ppareto
gives the distribution
function, qpareto
gives the quantile function and rpareto
generates random
deviates.
Usage
rpareto(n = 1L, shape = 0, scale = 1)
dpareto(x, shape = 1, scale = 1, log = FALSE)
ppareto(q, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE)
qpareto(p, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE)
Arguments
n |
integer number of observations. |
shape |
shape parameter (must be positive). |
scale |
scale parameter (must be positive). |
x , q |
vector of quantiles. |
log , log.p |
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
Details
If shape
or scale
are not specified, they assume the default values of 1
.
The Pareto distribution with scale \theta
and shape \xi
has density
f(x) = \xi \theta^\xi / (x + \theta)^(\xi + 1)
The support is x \ge 0
.
The Expected value exists if \xi > 1
and is equal to
E(X) = \theta / (\xi - 1)
k-th moments exist in general for k < \xi
.
Value
rpareto
generates random deviates.
dpareto
gives the density.
ppareto
gives the distribution function.
qpareto
gives the quantile function.
References
https://en.wikipedia.org/wiki/Pareto_distribution - named Lomax therein.
Examples
x <- rpareto(1000, shape = 10, scale = 5)
xx <- seq(-1, 10, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10))
lines(xx, dpareto(xx, shape = 10, scale = 5))
plot(xx, dpareto(xx, shape = 10, scale = 5), type = "l")
lines(xx, dpareto(xx, shape = 3, scale = 5), col = "red", lwd = 2)
plot(xx, dpareto(xx, shape = 10, scale = 10), type = "l")
lines(xx, dpareto(xx, shape = 10, scale = 5), col = "blue", lwd = 2)
lines(xx, dpareto(xx, shape = 10, scale = 20), col = "red", lwd = 2)
Convert TensorFlow tensors to distribution parameters recursively
Description
Convert TensorFlow tensors to distribution parameters recursively
Usage
as_params(x)
Arguments
x |
possibly nested list structure of |
Value
A nested list of vectors suitable as distribution parameters
Examples
if (interactive()) {
tf_params <- list(
probs = k_matrix(t(c(0.5, 0.3, 0.2))),
shapes = k_matrix(t(c(1L, 2L, 3L)), dtype = "int32"),
scale = keras3::as_tensor(1.0, keras3::config_floatx())
)
params <- as_params(tf_params)
dist <- dist_erlangmix(vector("list", 3L))
dist$sample(10L, with_params = params)
}
Transition functions for blended distributions
Description
Transition functions for blended distributions
Usage
blended_transition(x, u, eps, .gradient = FALSE, .extend_na = FALSE)
blended_transition_inv(x, u, eps, .component)
Arguments
x |
Points to evaluate at |
u |
Sorted vector of blending thresholds, or rowwise sorted matrix of blending thresholds |
eps |
Corresponding vector or matrix of blending bandwidths.
Must be positive and the same dimensions as |
.gradient |
Also evaluate the gradient with respect to |
.extend_na |
Extend out-of range transitions by the last in-range value (i.e. the corresponding u) or by NA? |
.component |
Component index (up to |
Value
blended_transition
returns a matrix with length(x)
rows and length(u) + 1
columns containing the
transformed values for each of the blending components.
If .gradient
is TRUE, an attribute "gradient"
is attached with the same dimensions, containing the derivative
of the respective transition component with respect to x
.
blended_transition_inv
returns a vector with length(x)
values containing the inverse of the transformed
values for the .component
th blending component.
Examples
library(ggplot2)
xx <- seq(from = 0, to = 20, length.out = 101)
blend_mat <- blended_transition(xx, u = 10, eps = 3, .gradient = TRUE)
ggplot(
data.frame(
x = rep(xx, 2L),
fun = rep(c("p", "q"), each = length(xx)),
y = as.numeric(blend_mat),
relevant = c(xx <= 13, xx >= 7)
),
aes(x = x, y = y, color = fun, linetype = relevant)
) %+%
geom_line() %+%
theme_bw() %+%
theme(
legend.position = "bottom", legend.box = "horizontal"
) %+%
guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+%
scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3))
ggplot(
data.frame(
x = rep(xx, 2L),
fun = rep(c("p'", "q'"), each = length(xx)),
y = as.numeric(attr(blend_mat, "gradient")),
relevant = c(xx <= 13, xx >= 7)
),
aes(x = x, y = y, color = fun, linetype = relevant)
) %+%
geom_line() %+%
theme_bw() %+%
theme(
legend.position = "bottom", legend.box = "horizontal"
) %+%
guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+%
scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3))
Keras Callback for adaptive learning rate with weight restoration
Description
Provides a keras callback similar to keras3::callback_reduce_lr_on_plateau()
but which also restores the weights
to the best seen so far whenever a learning rate reduction occurs, and with slightly more restrictive improvement
detection.
Usage
callback_adaptive_lr(
monitor = "val_loss",
factor = 0.1,
patience = 10L,
verbose = 0L,
mode = c("auto", "min", "max"),
delta_abs = 1e-04,
delta_rel = 0,
cooldown = 0L,
min_lr = 0,
restore_weights = TRUE
)
Arguments
monitor |
quantity to be monitored. |
factor |
factor by which the learning rate will be reduced. |
patience |
number of epochs with no significant improvement after which the learning rate will be reduced. |
verbose |
integer. Set to 1 to receive update messages. |
mode |
Optimisation mode. "auto" detects the mode from the name of |
delta_abs |
Minimum absolute metric improvement per epoch. The learning rate will be reduced if the average
improvement is less than |
delta_rel |
Minimum relative metric improvement per epoch. The learning rate will be reduced if the average
improvement is less than |
cooldown |
number of epochs to wait before resuming normal operation after learning rate has been reduced.
The minimum number of epochs between two learning rate reductions is |
min_lr |
lower bound for the learning rate. If a learning rate reduction would lower the learning rate below
|
restore_weights |
Bool. If TRUE, the best weights will be restored at each learning rate reduction. This is very useful if the metric oscillates. |
Details
Note that while keras3::callback_reduce_lr_on_plateau()
automatically logs the learning rate as a metric 'lr',
this is currently impossible from R.
Thus, if you want to also log the learning rate, you should add keras3::callback_reduce_lr_on_plateau()
with a high
min_lr
to effectively disable the callback but still monitor the learning rate.
Value
A KerasCallback
suitable for passing to keras3::fit()
.
Examples
dist <- dist_exponential()
group <- sample(c(0, 1), size = 100, replace = TRUE)
x <- dist$sample(100, with_params = list(rate = group + 1))
global_fit <- fit(dist, x)
if (interactive()) {
library(keras3)
l_in <- layer_input(shape = 1L)
mod <- tf_compile_model(
inputs = list(l_in),
intermediate_output = l_in,
dist = dist,
optimizer = optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
tf_initialise_model(mod, global_fit$params)
fit_history <- fit(
mod,
x = as_tensor(group, config_floatx()),
y = as_trunc_obs(x),
epochs = 20L,
callbacks = list(
callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4),
callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr
)
)
plot(fit_history)
predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx()))
}
Callback to monitor likelihood gradient components
Description
Provides a keras callback to monitor the individual components of the censored and truncated likelihood. Useful for debugging TensorFlow implementations of Distributions.
Usage
callback_debug_dist_gradients(
object,
data,
obs,
keep_grads = FALSE,
stop_on_na = TRUE,
verbose = TRUE
)
Arguments
object |
A |
data |
Input data for the model. |
obs |
Observations associated to |
keep_grads |
Log actual gradients? (memory hungry!) |
stop_on_na |
Stop if any likelihood component as NaN in its gradients? |
verbose |
Print a message if training is halted? The Message will contain information about which likelihood components have NaN in their gradients. |
Value
A KerasCallback
suitable for passing to keras3::fit()
.
Examples
dist <- dist_exponential()
group <- sample(c(0, 1), size = 100, replace = TRUE)
x <- dist$sample(100, with_params = list(rate = group + 1))
global_fit <- fit(dist, x)
if (interactive()) {
library(keras3)
l_in <- layer_input(shape = 1L)
mod <- tf_compile_model(
inputs = list(l_in),
intermediate_output = l_in,
dist = dist,
optimizer = optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
tf_initialise_model(mod, global_fit$params)
gradient_tracker <- callback_debug_dist_gradients(
mod,
as_tensor(group, config_floatx()),
x,
keep_grads = TRUE
)
fit_history <- fit(
mod,
x = as_tensor(group, config_floatx()),
y = x,
epochs = 20L,
callbacks = list(
callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4),
gradient_tracker,
callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr
)
)
gradient_tracker$gradient_logs[[20]]$dens
plot(fit_history)
predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx()))
}
Construct a BDEGP-Family
Description
Constructs a BDEGP-Family distribution with fixed number of components and blending interval.
Usage
dist_bdegp(n, m, u, epsilon)
Arguments
n |
Number of dirac components, starting with a point mass at 0. |
m |
Number of erlang components, translated by |
u |
Blending cut-off, must be a positive real. |
epsilon |
Blending radius, must be a positive real less than |
Value
A
MixtureDistribution
of-
n
DiracDistribution
s at 0 .. n - 1 and a
BlendedDistribution
object with child Distributionsa
TranslatedDistribution
with offsetn - 0.5
of anErlangMixtureDistribution
withm
shapesand a
GeneralizedParetoDistribution
with shape parameter restricted to [0, 1] and location parameter fixed atu
With breaku
and bandwidthepsilon
.
-
See Also
Other Distributions:
Distribution
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
dist <- dist_bdegp(n = 1, m = 2, u = 10, epsilon = 3)
params <- list(
dists = list(
list(),
list(
dists = list(
list(
dist = list(
shapes = list(1L, 2L),
scale = 1.0,
probs = list(0.7, 0.3)
)
),
list(
sigmau = 1.0,
xi = 0.1
)
),
probs = list(0.1, 0.9)
)
),
probs = list(0.95, 0.05)
)
x <- dist$sample(100, with_params = params)
plot_distributions(
theoretical = dist,
empirical = dist_empirical(x),
.x = seq(0, 20, length.out = 101),
with_params = list(theoretical = params)
)
Beta Distribution
Description
See stats::Beta
Usage
dist_beta(shape1 = NULL, shape2 = NULL, ncp = NULL)
Arguments
shape1 |
First scalar shape parameter, or |
shape2 |
Second scalar shape parameter, or |
ncp |
Scalar non-centrality parameter, or |
Details
All parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
Value
A BetaDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_beta <- dist_beta(shape1 = 2, shape2 = 2, ncp = 0)
x <- d_beta$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_beta,
estimated = d_beta,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "beta")$estimate
)
),
.x = seq(0, 2, length.out = 100)
)
Binomial Distribution
Description
See stats::Binomial
Usage
dist_binomial(size = NULL, prob = NULL)
Arguments
size |
Number of trials parameter (integer), or |
prob |
Success probability parameter, or |
Details
Both parameters can be overridden with
with_params = list(size = ..., prob = ...)
.
Value
A BinomialDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_binom <- dist_binomial(size = 10, prob = 0.5)
x <- d_binom$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_binom,
estimated = d_binom,
with_params = list(
estimated = list(
size = max(x),
prob = mean(x) / max(x)
)
),
.x = 0:max(x)
)
Blended distribution
Description
Blended distribution
Usage
dist_blended(dists, probs = NULL, breaks = NULL, bandwidths = NULL)
Arguments
dists |
A list of k >= 2 component Distributions. |
probs |
k Mixture weight parameters |
breaks |
k - 1 Centers of the blending zones.
|
bandwidths |
k - 1 Radii of the blending zones.
The i-th blending zone will begin at |
Value
A BlendedDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
bd <- dist_blended(
list(
dist_normal(mean = 0.0, sd = 1.0),
dist_genpareto(u = 3.0, sigmau = 1.0, xi = 3.0)
),
breaks = list(3.0),
bandwidths = list(0.5),
probs = list(0.9, 0.1)
)
plot_distributions(
bd,
.x = seq(-3, 10, length.out = 100),
plots = c("d", "p")
)
Dirac (degenerate point) Distribution
Description
A degenerate distribution with all mass at a single point.
Usage
dist_dirac(point = NULL)
Arguments
point |
The point with probability mass 1. |
Details
The parameter can be overridden with
with_params = list(point = ...)
.
Value
A DiracDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_dirac <- dist_dirac(1.5)
d_dirac$sample(2L)
d_dirac$sample(2L, list(point = 42.0))
Discrete Distribution
Description
A full-flexibility discrete distribution with values from 1 to size
.
Usage
dist_discrete(size = NULL, probs = NULL)
Arguments
size |
Number of classes parameter (integer). Required if |
probs |
Vector of probabilties parameter, or |
Details
Parameters can be overridden with
with_params = list(probs = ...)
.
Value
A DiscreteDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_discrete <- dist_discrete(probs = list(0.5, 0.25, 0.15, 0.1))
x <- d_discrete$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_discrete,
estimated = d_discrete,
with_params = list(
estimated = list(
size = max(x),
probs = as.list(unname(table(x)) / 100)
)
),
.x = 0:max(x)
)
Empirical distribution
Description
Creates an empirical distribution object from a sample.
Assumes iid. samples. with_params
should not be used with this
distribution because estimation of the relevant indicators happens during
construction.
Usage
dist_empirical(sample, positive = FALSE, bw = "nrd0")
Arguments
sample |
Sample to build the empirical distribution from |
positive |
Is the underlying distribution known to be positive?
This will effect the density estimation procedure.
|
bw |
Bandwidth parameter for density estimation. Passed to the density
estimation function selected by |
Details
-
sample()
samples iid. fromsample
. This approach is similar to bootstrapping. -
density()
evaluates a kernel density estimate, approximating with zero outside of the known support. This estimate is either obtained using stats::density or logKDE::logdensity_fft, depending onpositive
. -
probability()
evaluates the empirical cumulative density function obtained by stats::ecdf. -
quantile()
evaluates the empirical quantiles using stats::quantile -
hazard()
estimates the hazard rate using the density estimate and the empirical cumulative density function:h(t) = df(t) / (1 - cdf(t))
.
Value
An EmpiricalDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
x <- rexp(20, rate = 1)
dx <- dist_empirical(sample = x, positive = TRUE)
y <- rnorm(20)
dy <- dist_empirical(sample = y)
plot_distributions(
exponential = dx,
normal = dy,
.x = seq(-3, 3, length.out = 100)
)
Erlang Mixture distribution
Description
Erlang Mixture distribution
Usage
dist_erlangmix(shapes, scale = NULL, probs = NULL)
Arguments
shapes |
Shape parameters, a trunc_erlangmix fit, or |
scale |
Common scale parameter, or |
probs |
Mixing probabilities, or |
Value
An ErlangMixtureDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
params <- list(scale = 1.0, probs = list(0.5, 0.3, 0.2), shapes = list(1L, 2L, 3L))
dist <- dist_erlangmix(vector("list", 3L))
x <- dist$sample(20, with_params = params)
d_emp <- dist_empirical(x, positive = TRUE)
plot_distributions(
empirical = d_emp,
theoretical = dist,
with_params = list(
theoretical = params
),
.x = seq(1e-4, 5, length.out = 100)
)
Exponential distribution
Description
See stats::Exponential.
Usage
dist_exponential(rate = NULL)
Arguments
rate |
Scalar rate parameter, or |
Details
The parameter can be overridden with with_params = list(rate = ...)
.
Value
An ExponentialDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
rate <- 1
d_exp <- dist_exponential()
x <- d_exp$sample(20, with_params = list(rate = rate))
d_emp <- dist_empirical(x, positive = TRUE)
plot_distributions(
empirical = d_emp,
theoretical = d_exp,
estimated = d_exp,
with_params = list(
theoretical = list(rate = rate),
estimated = list(rate = 1 / mean(x))
),
.x = seq(1e-4, 5, length.out = 100)
)
Gamma distribution
Description
See stats::GammaDist.
Usage
dist_gamma(shape = NULL, rate = NULL)
Arguments
shape |
Scalar shape parameter, or |
rate |
Scalar rate parameter, or |
Details
Both parameters can be overridden with
with_params = list(shape = ..., rate = ...)
.
Value
A GammaDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
alpha <- 2
beta <- 2
d_gamma <- dist_gamma(shape = alpha, rate = beta)
x <- d_gamma$sample(100)
d_emp <- dist_empirical(x, positive = TRUE)
plot_distributions(
empirical = d_emp,
theoretical = d_gamma,
estimated = d_gamma,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "gamma")$estimate
)
),
.x = seq(1e-3, max(x), length.out = 100)
)
Generalized Pareto Distribution
Description
See evmix::gpd
Usage
dist_genpareto(u = NULL, sigmau = NULL, xi = NULL)
dist_genpareto1(u = NULL, sigmau = NULL, xi = NULL)
Arguments
u |
Scalar location parameter, or |
sigmau |
Scalar scale parameter, or |
xi |
Scalar shape parameter, or |
Details
All parameters can be overridden with
with_params = list(u = ..., sigmau = ..., xi = ...)
.
dist_genpareto1
is equivalent to dist_genpareto
but enforces
bound constraints on xi
to [0, 1]
.
This ensures unboundedness and finite expected value unless xi == 1.0
.
Value
A GeneralizedParetoDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_genpareto <- dist_genpareto(u = 0, sigmau = 1, xi = 1)
x <- d_genpareto$sample(100)
d_emp <- dist_empirical(x)
d_genpareto$export_functions("gpd") # so fitdistrplus finds it
plot_distributions(
empirical = d_emp,
theoretical = d_genpareto,
estimated = d_genpareto,
with_params = list(
estimated = fit(dist_genpareto(), x)$params
),
.x = seq(0, 5, length.out = 100)
)
Log Normal distribution
Description
See stats::Lognormal.
Usage
dist_lognormal(meanlog = NULL, sdlog = NULL)
Arguments
meanlog |
Scalar mean parameter on the log scale,
or |
sdlog |
Scalar standard deviation parameter on the log scale,
or |
Details
Both parameters can be overridden with
with_params = list(meanlog = ..., sdlog = ...)
.
Value
A LognormalDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
mu <- 0
sigma <- 1
d_lnorm <- dist_lognormal(meanlog = mu, sdlog = sigma)
x <- d_lnorm$sample(20)
d_emp <- dist_empirical(x, positive = TRUE)
plot_distributions(
empirical = d_emp,
theoretical = d_lnorm,
estimated = d_lnorm,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "lnorm")$estimate
)
),
.x = seq(1e-3, 5, length.out = 100)
)
Mixture distribution
Description
Parameters of mixing components can be overridden with
with_params = list(dists = list(..., ..., ...))
.
#' Mixing probabilites can be overridden with
with_params = list(probs = list(..., ..., ...))
.
The number of components cannot be overridden.
Usage
dist_mixture(dists = list(), probs = NULL)
Arguments
dists |
A list of mixing distributions. May contain placeholders and duplicates. |
probs |
A list of mixing probabilities with the same length as |
Details
Does not support the quantile()
capability!
Value
A MixtureDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
# A complicated way to define a uniform distribution on \[0, 2\]
dist_mixture(
dists = list(
dist_uniform(min = 0, max = 1),
dist_uniform(min = 1, max = 2)
),
probs = list(0.5, 0.5)
)
Negative binomial Distribution
Description
Usage
dist_negbinomial(size = NULL, mu = NULL)
Arguments
size |
Number of successful trials parameter, or |
mu |
Mean parameter, or |
Details
Both parameters can be overridden with
with_params = list(size = ..., prob = ...)
.
Value
A NegativeBinomialDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_nbinom <- dist_negbinomial(size = 3.5, mu = 8.75)
x <- d_nbinom$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_nbinom,
estimated = d_nbinom,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "nbinom")$estimate
)
),
.x = 0:max(x)
)
Normal distribution
Description
See stats::Normal.
Usage
dist_normal(mean = NULL, sd = NULL)
Arguments
mean |
Scalar mean parameter, or |
sd |
Scalar standard deviation parameter, or |
Details
Both parameters can be overridden with
with_params = list(mean = ..., sd = ...)
.
Value
A NormalDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
mu <- 0
sigma <- 1
d_norm <- dist_normal(mean = mu, sd = sigma)
x <- d_norm$sample(20)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_norm,
estimated = d_norm,
with_params = list(
estimated = list(mean = mean(x), sd = sd(x))
),
.x = seq(-3, 3, length.out = 100)
)
Pareto Distribution
Description
See Pareto
Usage
dist_pareto(shape = NULL, scale = NULL)
Arguments
shape |
Scalar shape parameter, or |
scale |
Scalar scale parameter, or |
Details
Both parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
Value
A ParetoDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_pareto <- dist_pareto(shape = 3, scale = 1)
x <- d_pareto$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_pareto,
estimated = d_pareto,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "pareto")$estimate
)
),
.x = seq(0, 2, length.out = 100)
)
Poisson Distribution
Description
See stats::Poisson
Usage
dist_poisson(lambda = NULL)
Arguments
lambda |
Scalar rate parameter, or |
Details
The parameter can be overridden with
with_params = list(lambda = ...)
.
Value
A PoissonDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_pois <- dist_poisson(lambda = 5.0)
x <- d_pois$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_pois,
estimated = d_pois,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "pois")$estimate
)
),
.x = 0:max(x)
)
Tranlsated distribution
Description
Tranlsated distribution
Usage
dist_translate(dist = NULL, offset = NULL, multiplier = 1)
Arguments
dist |
An underlying distribution, or |
offset |
Offset to be added to each observation, or |
multiplier |
Factor to multiply each observation by, or |
Value
A TranslatedDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
Examples
d_norm <- dist_normal(mean = 0, sd = 1)
d_tnorm <- dist_translate(dist = d_norm, offset = 1)
plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
Truncated distribution
Description
Truncated distribution
Usage
dist_trunc(dist = NULL, min = NULL, max = NULL, offset = 0, max_retry = 100)
Arguments
dist |
An underlying distribution, or |
min |
Minimum value to truncate at (exclusive),
or |
max |
Maxmimum value to truncate at (inclusive),
or |
offset |
Offset to be added to each observation after truncation,
or |
max_retry |
Maximum number of resample attempts when trying to sample with rejection. |
Value
A TruncatedDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_uniform()
,
dist_weibull()
Examples
d_norm <- dist_normal(mean = 0, sd = 1)
d_tnorm <- dist_trunc(dist = d_norm, min = -2, max = 2, offset = 1)
plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
Uniform distribution
Description
See stats::Uniform
Usage
dist_uniform(min = NULL, max = NULL)
Arguments
min |
Lower limit, or |
max |
Upper limit, or |
Details
Both parameters can be overridden with
with_params = list(min = ..., max = ...)
.
Value
A UniformDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_weibull()
Examples
d_unif <- dist_uniform(min = 0, max = 1)
x <- d_unif$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_unif,
estimated = d_unif,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "unif")$estimate
)
),
.x = seq(0, 1, length.out = 100)
)
Weibull Distribution
Description
See stats::Weibull
Usage
dist_weibull(shape = NULL, scale = NULL)
Arguments
shape |
Scalar shape parameter, or |
scale |
Scalar scale parameter, or |
Details
Both parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
Value
A WeibullDistribution
object.
See Also
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
Examples
d_weibull <- dist_weibull(shape = 3, scale = 1)
x <- d_weibull$sample(100)
d_emp <- dist_empirical(x)
plot_distributions(
empirical = d_emp,
theoretical = d_weibull,
estimated = d_weibull,
with_params = list(
estimated = inflate_params(
fitdistrplus::fitdist(x, distr = "weibull")$estimate
)
),
.x = seq(0, 2, length.out = 100)
)
Fit a neural network based distribution model to data
Description
This function delegates most work to keras3::fit.keras.src.models.model.Model()
and performs additional consistency
checks to make sure tf_compile_model()
was called with the appropriate options to support fitting the observations
y
as well as automatically converting y
to a n x 6 matrix needed by the compiled loss function.
Usage
## S3 method for class 'reservr_keras_model'
fit(
object,
x,
y,
batch_size = NULL,
epochs = 10,
verbose = getOption("keras.fit_verbose", default = 1),
callbacks = NULL,
view_metrics = getOption("keras.view_metrics", default = "auto"),
validation_split = 0,
validation_data = NULL,
shuffle = TRUE,
class_weight = NULL,
sample_weight = NULL,
initial_epoch = 0,
steps_per_epoch = NULL,
validation_steps = NULL,
...
)
Arguments
object |
A compiled |
x |
A list of input tensors (predictors) |
y |
A |
batch_size |
Integer or |
epochs |
Integer. Number of epochs to train the model.
An epoch is an iteration over the entire |
verbose |
|
callbacks |
List of |
view_metrics |
View realtime plot of training metrics (by epoch). The
default ( |
validation_split |
Float between 0 and 1.
Fraction of the training data to be used as validation data.
The model will set apart this fraction of the training data,
will not train on it, and will evaluate
the loss and any model metrics
on this data at the end of each epoch.
The validation data is selected from the last samples
in the |
validation_data |
Data on which to evaluate
the loss and any model metrics at the end of each epoch.
The model will not be trained on this data. Thus, note the fact
that the validation loss of data provided using
|
shuffle |
Boolean, whether to shuffle the training data
before each epoch. This argument is
ignored when |
class_weight |
Optional named list mapping class indices (integers, 0-based)
to a weight (float) value, used for weighting the loss function
(during training only).
This can be useful to tell the model to
"pay more attention" to samples from
an under-represented class. When |
sample_weight |
Optional array of weights for
the training samples, used for weighting the loss function
(during training only). You can either pass a flat (1D)
array/vector with the same length as the input samples
(1:1 mapping between weights and samples),
or in the case of temporal data,
you can pass a 2D array (matrix) with shape
|
initial_epoch |
Integer. Epoch at which to start training (useful for resuming a previous training run). |
steps_per_epoch |
Integer or |
validation_steps |
Only relevant if |
... |
Unused. If old arguments are supplied, an error message will be raised informing how to fix the issue. |
Details
Additionally, the default batch_size
is min(nrow(y), 10000)
instead of keras default of 32
because the latter
is a very bad choice for fitting most distributions since the involved loss is much less stable than typical losses
used in machine learning, leading to divergence for small batch sizes.
Value
A history
object that contains all information collected during training.
The model object will be updated in-place as a side-effect.
See Also
predict.reservr_keras_model tf_compile_model keras3::fit.keras.src.models.model.Model
Examples
dist <- dist_exponential()
params <- list(rate = 1.0)
N <- 100L
rand_input <- runif(N)
x <- dist$sample(N, with_params = params)
if (interactive()) {
tf_in <- keras3::layer_input(1L)
mod <- tf_compile_model(
inputs = list(tf_in),
intermediate_output = tf_in,
dist = dist,
optimizer = keras3::optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
tf_fit <- fit(
object = mod,
x = k_matrix(rand_input),
y = x,
epochs = 10L,
callbacks = list(
callback_debug_dist_gradients(mod, k_matrix(rand_input), x, keep_grads = TRUE)
)
)
}
Fit a Blended mixture using an ECME-Algorithm
Description
Fit a Blended mixture using an ECME-Algorithm
Usage
fit_blended(
dist,
obs,
start,
min_iter = 0L,
max_iter = 100L,
skip_first_e = FALSE,
tolerance = 1e-05,
trace = FALSE,
...
)
Arguments
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
... |
Passed to |
Value
A list with elements
-
params
the fitted parameters in the same structure asinit
. -
params_hist
(iftrace
is TRUE) the history of parameters (after each e- and m- step) -
iter
the number of outer EM-iterations -
logLik
the final log-likelihood
See Also
Other distribution fitting functions:
fit_dist()
,
fit_erlang_mixture()
,
fit_mixture()
Examples
dist <- dist_blended(
list(
dist_exponential(),
dist_genpareto()
)
)
params <- list(
probs = list(0.9, 0.1),
dists = list(
list(rate = 2.0),
list(u = 1.5, xi = 0.2, sigmau = 1.0)
),
breaks = list(1.5),
bandwidths = list(0.3)
)
x <- dist$sample(100L, with_params = params)
dist$default_params$breaks <- params$breaks
dist$default_params$bandwidths <- params$bandwidths
if (interactive()) {
fit_blended(dist, x)
}
Fit a general distribution to observations
Description
The default implementation performs maximum likelihood estimation on all placeholder parameters.
Usage
fit_dist(dist, obs, start, ...)
fit_dist_direct(dist, obs, start, ..., .start_with_default = FALSE)
## S3 method for class 'Distribution'
fit(object, obs, start, ...)
Arguments
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
... |
Distribution-specific arguments for the fitting procedure |
.start_with_default |
Before directly optimising the likelihood, use an optimised algorithm for finding better starting values? |
object |
same as parameter |
Details
For Erlang mixture distributions and for Mixture distributions, an EM-Algorithm is instead used to improve stability.
fit()
and fit_dist()
will chose an optimisation method optimized for the specific distribution given.
fit_dist_direct()
can be used to force direct maximisation of the likelihood.
Value
A list with at least the elements
-
params
the fitted parameters in the same structure asinit
. -
logLik
the final log-likelihood
Additional information may be provided depending on dist
.
See Also
Other distribution fitting functions:
fit_blended()
,
fit_erlang_mixture()
,
fit_mixture()
Other distribution fitting functions:
fit_blended()
,
fit_erlang_mixture()
,
fit_mixture()
Examples
x <- rexp(100)
lambda_hat <- 1 / mean(x)
lambda_hat2 <- fit_dist(dist_exponential(), x)$params$rate
identical(lambda_hat, lambda_hat2)
dist <- dist_mixture(list(dist_normal(), dist_translate(dist_exponential(), offset = 6)))
params <- list(
dists = list(list(mean = 5, sd = 1), list(dist = list(rate = 1))), probs = list(0.95, 0.05)
)
set.seed(2000)
u <- runif(100, 10, 20)
x <- dist$sample(100, with_params = params)
obs <- trunc_obs(x = x[x <= u], tmin = -Inf, tmax = u[x <= u])
default_fit <- fit_dist(dist, obs)
direct_fit <- fit_dist_direct(dist, obs)
# NB: direct optimisation steps with pre-run take a few seconds
direct_fit_init <- fit_dist_direct(dist, obs, start = default_fit$params)
direct_fit_auto_init <- fit_dist_direct(dist, obs, .start_with_default = TRUE)
stopifnot(direct_fit_init$logLik == direct_fit_auto_init$logLik)
c(default_fit$logLik, direct_fit$logLik, direct_fit_init$logLik)
Find starting values for distribution parameters
Description
Find starting values for distribution parameters
Usage
## S3 method for class 'MixtureDistribution'
fit_dist_start(dist, obs, dists_start = NULL, ...)
fit_dist_start(dist, obs, ...)
Arguments
dist |
A |
obs |
Observations to fit to. |
dists_start |
List of initial parameters for all component
distributions. If left empty, initialisation will be automatically performed
using |
... |
Additional arguments for the initialisation procedure |
Value
A list of initial parameters suitable for passing to fit_dist()
.
Examples
fit_dist_start(dist_exponential(), rexp(100))
Fit an Erlang mixture using an ECME-Algorithm
Description
Fit an Erlang mixture using an ECME-Algorithm
Usage
fit_erlang_mixture(
dist,
obs,
start,
min_iter = 0L,
max_iter = 100L,
skip_first_e = FALSE,
tolerance = 1e-05,
trace = FALSE,
parallel = FALSE,
...
)
Arguments
dist |
An |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
parallel |
Enable experimental parallel evaluation of expected log-likelihood? |
... |
Passed to |
Value
A list with elements
-
params
the fitted parameters in the same structure asinit
. -
params_hist
(iftrace
is TRUE) the history of parameters (after each e- and m- step). Otherwise an empty list. -
iter
the number of outer EM-iterations -
logLik
the final log-likelihood
See Also
Other distribution fitting functions:
fit_blended()
,
fit_dist()
,
fit_mixture()
Examples
dist <- dist_erlangmix(list(NULL, NULL, NULL))
params <- list(
shapes = list(1L, 4L, 12L),
scale = 2.0,
probs = list(0.5, 0.3, 0.2)
)
x <- dist$sample(100L, with_params = params)
fit_erlang_mixture(dist, x, init = "kmeans")
Fit a generic mixture using an ECME-Algorithm
Description
Fit a generic mixture using an ECME-Algorithm
Usage
fit_mixture(
dist,
obs,
start,
min_iter = 0L,
max_iter = 100L,
skip_first_e = FALSE,
tolerance = 1e-05,
trace = FALSE,
...
)
Arguments
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
... |
Passed to |
Value
A list with elements
-
params
the fitted parameters in the same structure asinit
. -
params_hist
(iftrace
is TRUE) the history of parameters (after each e- and m- step) -
iter
the number of outer EM-iterations -
logLik
the final log-likelihood
See Also
Other distribution fitting functions:
fit_blended()
,
fit_dist()
,
fit_erlang_mixture()
Examples
dist <- dist_mixture(
list(
dist_dirac(0.0),
dist_exponential()
)
)
params <- list(
probs = list(0.1, 0.9),
dists = list(
list(),
list(rate = 1.0)
)
)
x <- dist$sample(100L, with_params = params)
fit_mixture(dist, x)
Flatten / Inflate parameter lists / vectors
Description
Flatten / Inflate parameter lists / vectors
Usage
flatten_params(params)
flatten_params_matrix(params)
flatten_bounds(bounds)
inflate_params(flat_params)
Arguments
params |
A named list of parameters to be flattened.
Should be in a form to be passed as the |
bounds |
List of parameter bounds as returned by
|
flat_params |
A named numeric vector of parameters |
Value
flatten_params
returns a 'flattened' vector of parameters.
It is intended as an adapter for multi-dimensional optimisation functions
to distribution objects.
flatten_params_matrix
returns a 'flattened' matrix of parameters.
It is intended as an adapter for multi-dimensional optimisation functions
to distribution objects. Each column corresponds to one input element.
flatten_bounds
returns a named list of vectors with names lower
and upper
. Containing the upper and lower bounds of each parameter.
inflate_params
returns an 'inflated' list of parameters.
This can be passed as the with_params
argument to most distribution
functions.
Examples
library(ggplot2)
mm <- dist_mixture(list(
dist_exponential(NULL),
dist_lognormal(0.5, NULL)
), list(NULL, 1))
ph <- mm$get_placeholders()
ph_flat <- flatten_params(ph)
ph_reinflated <- inflate_params(ph_flat)
ph_flat[] <- c(1, 1, 6)
ph_sample <- inflate_params(ph_flat)
x <- mm$sample(
100,
with_params = ph_sample
)
emp_cdf <- ecdf(x)
ggplot(data.frame(t = seq(from = min(x), to = max(x), length.out = 100))) %+%
geom_point(aes(x = t, y = emp_cdf(t))) %+%
geom_line(aes(x = t, y = mm$probability(t, with_params = ph_sample)),
linetype = 2)
Adaptive Gauss-Kronrod Quadrature for multiple limits
Description
Integrates fun over the bounds [ lower, upper ] vectorized over lower
and
upper
. Vectorized list structures of parameters can also be passed.
Usage
integrate_gk(
fun,
lower,
upper,
params = list(),
.tolerance = .Machine$double.eps^0.25,
.max_iter = 100L
)
Arguments
fun |
A function to integrate. Must be vectorized and take one or two arguments, the first being points to evaluate at and the second (optionally) being parameters to apply. It must return a numeric vector the same length as its first input. Currently, infinite bounds are not supported. |
lower , upper |
Integration bounds. Must have the same length. |
params |
Parameters to pass as a second argument to |
.tolerance |
Absolute element-wise tolerance. |
.max_iter |
Maximum number of iterations. The number of
integration intervals will be at most |
Details
The integration error is estimated by the Gauss-Kronrod quadrature as the
absolute difference between the 7-point quadrature and the 15-point
quadrature. Integrals that did not converge will be bisected at the midpoint.
The params
object will be recursively subsetted on all numeric vectors with
the same length as the number of observations.
Value
A vector of integrals with the i-th entry containing an approximation of the integral of fun(t, pick_params_at(params, i)) dt over the interval lower[i] to upper[i]
Examples
# Argument recycling and parallel integration of two intervals
integrate_gk(sin, 0, c(pi, 2 * pi))
dist <- dist_exponential()
integrate_gk(
function(x, p) dist$density(x, with_params = p),
lower = 0, upper = 1:10,
params = list(rate = 1 / 1:10)
)
dist$probability(1:10, with_params = list(rate = 1 / 1:10))
Intervals
Description
Intervals
Usage
interval(
range = c(-Inf, Inf),
...,
include_lowest = closed,
include_highest = closed,
closed = FALSE,
integer = FALSE,
read_only = FALSE
)
is.Interval(x)
Arguments
range |
The interval boundaries as a sorted two-element numeric vector. |
... |
First argument is used as the endpoint if |
include_lowest |
Is the lower boundary part of the interval? |
include_highest |
Is the upper boundary part of the interval? |
closed |
Is the interval closed? |
integer |
Is the interval only over the integers? |
read_only |
Make the interval object read-only? |
x |
An object. |
Value
interval
returns an Interval
.
is.Interval
returns TRUE
if x
is an Interval
, FALSE
otherwise.
See Also
interval-operations
Examples
# The real line
interval()
# Closed unit interval
interval(c(0, 1), closed = TRUE)
# Alternative form
interval(0, 1, closed = TRUE)
# Non-negative real line
interval(c(0, Inf), include_lowest = TRUE)
Convex union and intersection of intervals
Description
Convex union and intersection of intervals
Usage
interval_union(..., intervals = list())
interval_intersection(..., intervals = list())
Arguments
... |
appened to |
intervals |
A list of |
Value
interval_union
returns the convex union of all intervals in intervals
.
This is the smallest interval completely containing all intervals.
interval_intersection
returns the set intersection of all intervals in
intervals
. The empty set is represented by the open interval (0, 0).
See Also
interval
Examples
interval_union(
interval(c(0, 1), closed = TRUE),
interval(c(1, 2))
)
interval_union(
interval(c(0, 5)),
interval(c(1, 4), closed = TRUE)
)
# Convex union is not equal to set union:
interval_union(
interval(c(0, 1)),
interval(c(2, 3))
)
# The empty union is {}
interval_union()
interval_intersection(
interval(c(0, 1)),
interval(c(0.5, 2))
)
interval_intersection(
interval(c(0, Inf)),
interval(c(-Inf, 0))
)
interval_intersection(
interval(c(0, Inf), include_lowest = TRUE),
interval(c(-Inf, 0), include_highest = TRUE)
)
interval_intersection(
interval(c(0, 5)),
interval(c(1, 6), closed = TRUE)
)
# The empty intersection is (-Inf, Inf)
interval_intersection()
Test if object is a Distribution
Description
Test if object is a Distribution
Usage
is.Distribution(object)
Arguments
object |
An R object. |
Value
TRUE
if object
is a Distribution, FALSE
otherwise.
Examples
is.Distribution(dist_dirac())
Cast to a TensorFlow matrix
Description
Cast to a TensorFlow matrix
Usage
k_matrix(x, dtype = NULL)
Arguments
x |
Numeric object to be converted to a matrix Tensor. |
dtype |
Type of the elements of the resulting tensor. Defaults to |
Value
A two-dimensional tf.Tensor
with values from x
.
The shape will be (nrow(x), ncol(x))
where x
is first converted to an R matrix via as.matrix()
.
Examples
if (interactive()) {
k_matrix(diag(1:3))
k_matrix(diag(1:3), dtype = "int32")
# Vectors are converted to columns:
k_matrix(1:3)
}
Plot several distributions
Description
Plot several distributions
Usage
plot_distributions(
...,
distributions = list(),
.x,
plots = c("density", "probability", "hazard"),
with_params = list(),
as_list = FALSE
)
Arguments
... |
distribution objects (must be named) |
distributions |
Named list of distribution objects.
This is concatenated with |
.x |
Numeric vector of points to evaluate at. |
plots |
Plots to be created. May be abbreviated. The plots will be stacked in the order given from top to bottom. |
with_params |
list of distribution parameters to be given to each
distribution using |
as_list |
return a list of ggplots instead of a patchwork? |
Value
A stacked patchwork of the requested ggplots
Examples
rate <- 1
x <- rexp(20, rate)
d_emp <- dist_empirical(x, positive = TRUE)
d_exp <- dist_exponential()
plot_distributions(
empirical = d_emp,
theoretical = d_exp,
estimated = d_exp,
with_params = list(
theoretical = list(rate = rate),
estimated = list(rate = 1 / mean(x))
),
.x = seq(1e-4, 5, length.out = 100)
)
Predict individual distribution parameters
Description
Predict individual distribution parameters
Usage
## S3 method for class 'reservr_keras_model'
predict(object, data, as_matrix = FALSE, ...)
Arguments
object |
A compiled and trained |
data |
Input data compatible with the model. |
as_matrix |
Return a parameter matrix instead of a list structure? |
... |
ignored |
Value
A parameter list suitable for the with_params
argument of the distribution family used for the model.
Contains one set of parameters per row in data
.
Examples
if (interactive()) {
dist <- dist_exponential()
params <- list(rate = 1.0)
N <- 100L
rand_input <- runif(N)
x <- dist$sample(N, with_params = params)
tf_in <- keras3::layer_input(1L)
mod <- tf_compile_model(
inputs = list(tf_in),
intermediate_output = tf_in,
dist = dist,
optimizer = keras3::optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
tf_fit <- fit(
object = mod,
x = k_matrix(rand_input),
y = x,
epochs = 10L,
callbacks = list(
callback_debug_dist_gradients(mod, k_matrix(rand_input), x)
)
)
tf_preds <- predict(mod, data = k_matrix(rand_input))
}
Determine probability of reporting under a Poisson arrival Process
Description
Determines the probability that claims occuring under a Poisson process with
arrival intensity expo
and reporting delay distribution dist
during the
time between t_min
and t_max
are reported between tau_min
and
tau_max
.
Usage
prob_report(
dist,
intervals,
expo = NULL,
with_params = list(),
.tolerance = .Machine$double.eps^0.5,
.max_iter = 100L,
.try_compile = TRUE
)
Arguments
dist |
A reporting delay Distribution, or a compiled interval probability function. |
intervals |
A data frame with columns |
expo |
Poisson intensity. If given, must be a vectorised function that
yields the intensity of the claim arrival process at a specified time.
|
with_params |
Parameters of |
.tolerance |
Absolute element-wise tolerance. |
.max_iter |
Maximum number of iterations. The number of
integration intervals will be at most |
.try_compile |
Try compiling the distributions probability function to speed up integration? |
Details
The reporting probability is given by
P(x + d in [tmin, tmax] | x in [xmin, xmax]) = E(P(x + d in [tmin, tmax] | x) | x in [xmin, xmax]) / P(x in [xmin, xmax]) = int_[xmin, xmax] expo(x) P(x + d in [tmin, tmax]) dx = int_[xmin, xmax] expo(x) P(d in [tmin - x, tmax - x]) dx / int_[xmin, xmax] expo(x) dx
prob_report
uses integrate_gk()
to compute the two integrals.
Value
A vector of reporting probabilities, with one entry per row of intervals
.
Examples
dist <- dist_exponential()
ints <- data.frame(
xmin = 0,
xmax = 1,
tmin = seq_len(10) - 1.0,
tmax = seq_len(10)
)
params <- list(rate = rep(c(1, 0.5), each = 5))
prob_report(dist, ints, with_params = params)
Quantiles of Distributions
Description
Produces quantiles corresponding to the given probabilities with configurable distribution parameters.
Usage
## S3 method for class 'Distribution'
quantile(x, probs = seq(0, 1, 0.25), with_params = list(), ..., .start = 0)
Arguments
x |
A |
probs |
Quantiles to compute. |
with_params |
Optional list of distribution parameters. Note that if |
... |
ignored |
.start |
Starting value if quantiles are computed numerically. Must be within the support of |
Details
If x$has_capability("quantile")
is true, this returns the same as x$quantile(probs, with_params = with_params)
.
In this case, with_params
may contain separate sets of parameters for each quantile to be determined.
Otherwise, a numerical estimation of the quantiles is done using the density and probability function.
This method assumes with_params
to cantain only one set of parameters.
The strategy uses two steps:
Find the smallest and largest quantiles in
probs
using a newton method starting from.start
.Find the remaining quantiles with bisection using
stats::uniroot()
.
Value
The quantiles of x
corresponding to probs
with parameters with_params
.
Examples
# With quantiles available
dist <- dist_normal(sd = 1)
qqs <- quantile(dist, probs = rep(0.5, 3), with_params = list(mean = 1:3))
stopifnot(all.equal(qqs, 1:3))
# Without quantiles available
dist <- dist_erlangmix(shapes = list(1, 2, 3), scale = 1.0)
my_probs <- c(0, 0.01, 0.25, 0.5, 0.75, 1)
qqs <- quantile(
dist, probs = my_probs,
with_params = list(probs = list(0.5, 0.3, 0.2)), .start = 2
)
all.equal(dist$probability(qqs, with_params = list(probs = list(0.5, 0.3, 0.2))), my_probs)
# Careful: Numerical estimation of extreme quantiles can result in out-of-bounds values.
# The correct 0-quantile would be 0 in this case, but it was estimated < 0.
qqs[1L]
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
Soft-Max function
Description
Softmax for a vector x is defined as
Usage
softmax(x)
dsoftmax(x)
Arguments
x |
A numeric vector or matrix |
Details
s_i = \exp(x_i) / \sum_k \exp(x_k)
It satisfies sum(s) == 1.0
and can be used to smoothly enforce a sum
constraint.
Value
softmax
returns the softmax of x
; rowwise if x
is a matrix.
dsoftmax
returns the Jacobi-matrix of softmax(x)
at x
. x
must be a vector.
Examples
softmax(c(5, 5))
softmax(diag(nrow = 5, ncol = 6))
Compile a Keras model for truncated data under dist
Description
Compile a Keras model for truncated data under dist
Usage
tf_compile_model(
inputs,
intermediate_output,
dist,
optimizer,
censoring = TRUE,
truncation = TRUE,
metrics = NULL,
weighted_metrics = NULL
)
Arguments
inputs |
List of keras input layers |
intermediate_output |
Intermediate model layer to be used as input to distribution parameters |
dist |
A |
optimizer |
String (name of optimizer) or optimizer instance. See
|
censoring |
A flag, whether the compiled model should support censored
observations. Set to |
truncation |
A flag, whether the compiled model should support truncated
observations. Set to |
metrics |
List of metrics to be evaluated by the model during training and testing. Each of these can be:
Typically you will use
If providing an anonymous R function, you can customize the printed name
during training by assigning |
weighted_metrics |
List of metrics to be evaluated and weighted by
|
Value
A reservr_keras_model
that can be used to train truncated
and censored observations from dist
based on input data from inputs
.
Examples
dist <- dist_exponential()
params <- list(rate = 1.0)
N <- 100L
rand_input <- runif(N)
x <- dist$sample(N, with_params = params)
if (interactive()) {
tf_in <- keras3::layer_input(1L)
mod <- tf_compile_model(
inputs = list(tf_in),
intermediate_output = tf_in,
dist = dist,
optimizer = keras3::optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
}
Initialise model weights to a global parameter fit
Description
Initialises a compiled reservr_keras_model
weights such that the predictions are equal to, or close to, the
distribution parameters given by params
.
Usage
tf_initialise_model(
model,
params,
mode = c("scale", "perturb", "zero", "none")
)
Arguments
model |
A |
params |
A list of distribution parameters compatible with |
mode |
An initialisation mode
|
Value
Invisibly model
with changed weights
Examples
dist <- dist_exponential()
group <- sample(c(0, 1), size = 100, replace = TRUE)
x <- dist$sample(100, with_params = list(rate = group + 1))
global_fit <- fit(dist, x)
if (interactive()) {
library(keras3)
l_in <- layer_input(shape = 1L)
mod <- tf_compile_model(
inputs = list(l_in),
intermediate_output = l_in,
dist = dist,
optimizer = optimizer_adam(),
censoring = FALSE,
truncation = FALSE
)
tf_initialise_model(mod, global_fit$params)
fit_history <- fit(
mod,
x = group,
y = x,
epochs = 200L
)
predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx()))
}
Define a set of truncated observations
Description
If x
is missing, both xmin
and xmax
must be specified.
Usage
trunc_obs(x, xmin = x, xmax = x, tmin = -Inf, tmax = Inf, w = 1)
as_trunc_obs(.data)
truncate_obs(.data, tmin_new = -Inf, tmax_new = Inf, .partial = FALSE)
repdel_obs(.data, accident, delay, time, .truncate = FALSE)
Arguments
x |
Observations |
xmin , xmax |
Censoring bounds. If |
tmin , tmax |
Truncation bounds. May vary per observation. |
w |
Case weights |
.data |
A data frame or numeric vector. |
tmin_new |
New truncation minimum |
tmax_new |
New truncation maximum |
.partial |
Enable partial truncation of censored observations? This could potentially create inconsistent data if the actual observation lies outside of the truncation bounds but the censoring interval overlaps. |
accident |
accident time (unquoted, evaluated in |
delay |
reporting delay (unquoted, evaluated in |
time |
evaluation time (unquoted, evaluated in |
.truncate |
Should claims reported after |
Details
Uncensored observations must satisfy tmin <= xmin = x = xmax <= tmax
.
Censored observations must satisfy tmin <= xmin < xmax <= tmax
and x = NA
.
Value
trunc_obs: A trunc_obs
tibble with columns x
, xmin
, xmax
,
tmin
and tmax
describing possibly interval-censored observations with
truncation
as_trunc_obs
returns a trunc_obs
tibble.
truncate_obs
returns a trunc_obs
tibble with possibly fewer
observations than .data
and updated truncation bounds.
repdel_obs
returns a trunc_obs
tibble corresponding to the reporting
delay observations of each claim. If .truncate
is FALSE
, the result is
guaranteed to have the same number of rows as .data
.
Examples
N <- 100
x <- rexp(N, 0.5)
# Random, observation dependent truncation intervals
tmin <- runif(N, 0, 1)
tmax <- tmin + runif(N, 1, 2)
oob <- x < tmin | x > tmax
x <- x[!oob]
tmin <- tmin[!oob]
tmax <- tmax[!oob]
# Number of observations after truncation
N <- length(x)
# Randomly interval censor 30% of observations
cens <- rbinom(N, 1, 0.3) == 1L
xmin <- x
xmax <- x
xmin[cens] <- pmax(tmin[cens], floor(x[cens]))
xmax[cens] <- pmin(tmax[cens], ceiling(x[cens]))
x[cens] <- NA
trunc_obs(x, xmin, xmax, tmin, tmax)
as_trunc_obs(c(1, 2, 3))
as_trunc_obs(data.frame(x = 1:3, tmin = 0, tmax = 10))
as_trunc_obs(data.frame(x = c(1, NA), xmin = c(1, 2), xmax = c(1, 3)))
truncate_obs(1:10, tmin_new = 2.0, tmax_new = 8.0)
Truncate claims data subject to reporting delay
Description
Truncate claims data subject to reporting delay
Usage
truncate_claims(data, accident, delay, time, .report_col = "report")
Arguments
data |
Full claims data including IBNR |
accident |
Accident times. May be an unquoted column name from |
delay |
Reporting delays. May be an unquoted column name from |
time |
Observation time (scalar number or one per claim).
Claims with |
.report_col |
|
Value
Truncated data
.
The reporting time is stored in a colnumn named by .report_col
unless
.report_col
is NULL
.
If both .report_col
is NULL
and time
contains only Inf
s,
a warning will be issued since data
will be
returned unchanged and no work will be done.
Examples
claims_full <- data.frame(
acc = runif(100),
repdel = rexp(100)
)
tau <- 2.0
truncate_claims(claims_full, acc, repdel, tau)
Compute weighted moments
Description
Compute weighted moments
Usage
weighted_moments(x, w, n = 2L, center = TRUE)
Arguments
x |
Observations |
w |
Case weights (optional) |
n |
Number of moments to calculate |
center |
Calculate centralized moments (default) or noncentralized moments, i.e. E((X - E(X))^k) or E(X^k). |
Value
A vector of length n
where the k
th entry is the k
th weighted
moment of x
with weights w
. If center
is TRUE
the moments are
centralized, i.e. E((X - E(X))^k). The first moment is never centralized.
The moments are scaled with 1 / sum(w), so they are not de-biased.
e.g. the second central weighted moment
weighted_moment(x, w)[2L]
is equal to
var(rep(x, w)) * (sum(w) - 1) / sum(w)
for integer w
See Also
Other weighted statistics:
weighted_quantile()
,
weighted_tabulate()
Examples
weighted_moments(rexp(100))
weighted_moments(c(1, 2, 3), c(1, 2, 3))
c(mean(rep(1:3, 1:3)), var(rep(1:3, 1:3)) * 5 / 6)
Compute weighted quantiles
Description
Compute weighted quantiles
Usage
weighted_quantile(x, w, probs)
weighted_median(x, w)
Arguments
x |
Observations |
w |
Case weights (optional) |
probs |
Quantiles to calculate |
Value
A vector the same length as probs
with the corresponding weighted
quantiles of x
with weight w
. For integer weights, this is equivalent to
quantile(rep(x, w), probs)
The weighted median of x
with weights w
.
For integer weights, this is equivalent to median(rep(x, w))
See Also
Other weighted statistics:
weighted_moments()
,
weighted_tabulate()
Examples
weighted_median(1:6)
weighted_median(1:3, c(1, 4, 9))
weighted_median(1:3, c(9, 4, 1))
weighted_quantile(1:3, c(1, 4, 9), seq(0.0, 1.0, by = 0.25))
quantile(rep(1:3, c(1, 4, 9)), seq(0.0, 1.0, by = 0.25))
Compute weighted tabulations
Description
Computes the sum of w
grouped by bin
.
If w
is missing the result is equivalent to
tabulate(bin, nbins)
Usage
weighted_tabulate(bin, w, nbins = max(1L, bin, na.rm = TRUE))
Arguments
bin |
An integer vector with values from |
w |
Weights per entry in |
nbins |
Number of bins |
Value
A vector with length nbins
where the i
th result is equal to
sum(w[bin == i])
or sum(bin == i)
if w
is missing.
For integer weights, this is equivalent to tabulate(rep(bin, w), nbins)
.
See Also
Other weighted statistics:
weighted_moments()
,
weighted_quantile()
Examples
weighted_tabulate(c(1, 1, 2))
weighted_tabulate(c(1, 1, 2), nbins = 3L)
weighted_tabulate(c(1, 1, 2), w = c(0.5, 0.5, 1), nbins = 3L)