Title: | A Tidy Framework for Changepoint Detection Analysis |
Version: | 1.0.1 |
Description: | Changepoint detection algorithms for R are widespread but have different interfaces and reporting conventions. This makes the comparative analysis of results difficult. We solve this problem by providing a tidy, unified interface for several different changepoint detection algorithms. We also provide consistent numerical and graphical reporting leveraging the 'broom' and 'ggplot2' packages. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | broom, changepoint, changepointGA, cli, dplyr, GA, ggplot2, lifecycle, lubridate, memoise, methods, patchwork, prettyunits, purrr, rlang, scales, segmented, stringr, tibble, tidyr, tsibble, vctrs, wbs, xts, zoo |
Depends: | R (≥ 4.2) |
LazyData: | true |
Suggests: | bench, knitr, here, multitaper, readr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
URL: | https://beanumber.github.io/tidychangepoint/ |
NeedsCompilation: | no |
Packaged: | 2025-07-09 15:07:40 UTC; bbaumer |
Author: | Benjamin S. Baumer
|
Maintainer: | Benjamin S. Baumer <ben.baumer@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-07-09 16:50:02 UTC |
tidychangepoint: A Tidy Framework for Changepoint Detection Analysis
Description
Changepoint detection algorithms for R are widespread but have different interfaces and reporting conventions. This makes the comparative analysis of results difficult. We solve this problem by providing a tidy, unified interface for several different changepoint detection algorithms. We also provide consistent numerical and graphical reporting leveraging the 'broom' and 'ggplot2' packages.
Author(s)
Maintainer: Benjamin S. Baumer ben.baumer@gmail.com (ORCID) [copyright holder]
Authors:
Biviana Marcela Suárez Sierra bmsuarezs@eafit.edu.co (ORCID)
Arrigo Coen (ORCID)
Carlos A. Taimal (ORCID)
Other contributors:
Xueheng Shi [contributor]
See Also
Useful links:
Bayesian Maximum Descriptive Length
Description
Generic function to compute the Bayesian Maximum Descriptive Length for a changepoint detection model.
Usage
BMDL(object, ...)
## Default S3 method:
BMDL(object, ...)
## S3 method for class 'nhpp'
BMDL(object, ...)
Arguments
object |
any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted. |
... |
some methods for this generic function require additional arguments. |
Details
Currently, the BMDL function is only defined for the NHPP model
(see fit_nhpp()
).
Given a changepoint set \tau
, the BMDL is:
BMDL(\tau, NHPP(y | \hat{\theta}_\tau) =
P_{MDL}(\tau) - 2 \ln{ L_{NHPP}(y | \hat{\theta}_\tau) }
- 2 \ln{ g(\hat{\theta}_\tau) }
where P_{MDL}(\tau)
is the MDL()
penalty.
Value
A double
vector of length 1
See Also
Other penalty-functions:
HQC()
,
MBIC()
,
MDL()
,
SIC()
Examples
# Compute the BMDL
BMDL(fit_nhpp(DataCPSim, tau = NULL))
BMDL(fit_nhpp(DataCPSim, tau = c(365, 830)))
Hadley Centre Central England Temperature
Description
Mean annual temperatures in Central England
Usage
CET
Format
An object of class xts
(inherits from zoo
) with 366 rows and 1 columns.
Details
The CET time series is perhaps the longest instrumental record of surface temperatures in the world, commencing in 1659 and spanning 362 years through 2020. The CET series is a benchmark for European climate studies, as it is sensitive to atmospheric variability in the North Atlantic (Parker et al. 1992). This record has been previously analyzed for long-term changes (Plaut et al. 1995; Harvey and Mills 2003; Hillebrand and Proietti 2017); however, to our knowledge, no detailed changepoint analysis of it has been previously conducted. The length of the CET record affords us the opportunity to explore a variety of temperature features.
Source
https://www.metoffice.gov.uk/hadobs/hadcet/
References
Shi, et al. (2022, doi:10.1175/JCLI-D-21-0489.1),
Parker, et al. (1992, doi:10.1002/joc.3370120402)
See Also
Simulated time series data
Description
Randomly-generated time series data, using the stats::rlnorm()
function.
For
rlnorm_ts_1
, there is one changepoint located at 826.For
rlnorm_ts_2
, there are two changepoints, located at 366 and 731.For
rlnorm_ts_3
, there are three changepoints, located at 548, 823, and 973.
Usage
DataCPSim
rlnorm_ts_1
rlnorm_ts_2
rlnorm_ts_3
Format
An object of class numeric
of length 1096.
An object of class ts
of length 1096.
An object of class ts
of length 1096.
An object of class ts
of length 1096.
Details
-
DataCPSim
: Simulated time series of the same length as bogota_pm.
See Also
Examples
plot(rlnorm_ts_1)
plot(rlnorm_ts_2)
plot(rlnorm_ts_3)
changepoints(rlnorm_ts_1)
Hannan–Quinn information criterion
Description
Hannan–Quinn information criterion
Usage
HQC(object, ...)
## Default S3 method:
HQC(object, ...)
## S3 method for class 'logLik'
HQC(object, ...)
Arguments
object |
any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted. |
... |
some methods for this generic function require additional arguments. |
Details
Computes the Hannan-Quinn information criterion for a model M
HQC(\tau, M(y|\hat{\theta}_{\tau})) = 2k \cdot \ln{\ln{n}} - 2 \cdot L_M(y|\hat{\theta}_\tau) \,,
where k
is the number of parameters and n
is the number of observations.
See Also
Other penalty-functions:
BMDL()
,
MBIC()
,
MDL()
,
SIC()
Examples
# Compute the HQC
HQC(fit_meanvar(CET, tau = NULL))
HQC(fit_meanshift_norm_ar1(CET, tau = c(42, 330)))
HQC(fit_trendshift(CET, tau = c(42, 81, 330)))
Modified Bayesian Information Criterion
Description
Generic function to compute the Modified Bayesian Information Criterion for a changepoint detection model.
Usage
MBIC(object, ...)
## Default S3 method:
MBIC(object, ...)
## S3 method for class 'logLik'
MBIC(object, ...)
Arguments
object |
any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted. |
... |
some methods for this generic function require additional arguments. |
Value
A double
vector of length 1
References
Zhang and Seigmmund (2007) for MBIC: doi:10.1111/j.1541-0420.2006.00662.x
See Also
Other penalty-functions:
BMDL()
,
HQC()
,
MDL()
,
SIC()
Maximum Descriptive Length
Description
Generic function to compute the Maximum Descriptive Length for a changepoint detection model.
Usage
MDL(object, ...)
## Default S3 method:
MDL(object, ...)
## S3 method for class 'logLik'
MDL(object, ...)
Arguments
object |
any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted. |
... |
some methods for this generic function require additional arguments. |
Details
P_{MDL}(\tau) = \frac{a(\theta_\tau)}{2} \cdot
\sum_{j=0}^m \log{\left(\tau_j - \tau_{j-1} \right)} + 2 \ln{m} + \sum_{j=2}^m \ln{\tau_j} +
\left( 2 + b(\theta_\tau) \right) \ln{n}
where a(\theta)
is the number of parameters in \theta
that are
fit in each region, and b(\theta)
is the number of parameters
fit to the model as a whole.
These quantities should be base::attributes()
of the object returned by
logLik()
.
Value
A double
vector of length 1
See Also
Other penalty-functions:
BMDL()
,
HQC()
,
MBIC()
,
SIC()
Examples
MDL(fit_meanshift_norm_ar1(CET, tau = c(42, 330)))
MDL(fit_trendshift(CET, tau = c(42, 81, 330)))
Schwarz information criterion
Description
Schwarz information criterion
Usage
SIC(object, ...)
Arguments
object |
a fitted model object for which there exists a
|
... |
optionally more fitted model objects. |
Value
The value of stats::BIC()
See Also
Other penalty-functions:
BMDL()
,
HQC()
,
MBIC()
,
MDL()
Examples
# The SIC is just the BIC
SIC(fit_meanvar(CET, tau = NULL))
BIC(fit_meanvar(CET, tau = NULL))
Convert, retrieve, or verify a model object
Description
Convert, retrieve, or verify a model object
Usage
as.model(object, ...)
## Default S3 method:
as.model(object, ...)
## S3 method for class 'tidycpt'
as.model(object, ...)
is_model(x, ...)
Arguments
object |
|
... |
currently ignored |
x |
An object, typically returned by |
Details
tidycpt objects have a model
component.
The functions documented here are convenience utility functions
for working with the model
components.
as.model()
is especially useful in pipelines to avoid having to use
the $
or [
notation for subsetting.
When applied to a tidycpt object, as.model()
simply returns the
model
component of that object.
However, when applied to a segmenter
object, as.model()
attempts to
converts that object into a mod_cpt model object.
is_model()
checks to see if a model object implements all of the
S3 methods necessary to be considered a model.
Value
-
as.model()
returns a mod_cpt model object
-
is_model()
alogical
vector of length 1
See Also
Other tidycpt-generics:
as.segmenter()
,
changepoints()
,
diagnose()
,
fitness()
,
model_name()
Examples
# Segment a time series using PELT
x <- segment(CET, method = "pelt")
# Retrieve the model component
x |>
as.model()
# Explicitly convert the segmenter to a model
x |>
as.segmenter() |>
as.model()
# Is that model valid?
x |>
as.model() |>
is_model()
# Fit a model directly, without using [segment()]
x <- fit_nhpp(CET, tau = 330)
is_model(x)
Convert, retrieve, or verify a segmenter object
Description
Convert, retrieve, or verify a segmenter object
Usage
as.segmenter(object, ...)
as.seg_cpt(object, ...)
## S3 method for class 'seg_basket'
as.seg_cpt(object, ...)
## S3 method for class 'seg_cpt'
as.seg_cpt(object, ...)
## S3 method for class 'tidycpt'
as.segmenter(object, ...)
## S3 method for class 'ga'
as.seg_cpt(object, ...)
## S3 method for class 'cpt'
as.seg_cpt(object, ...)
## S3 method for class 'cga'
as.seg_cpt(object, ...)
## S3 method for class 'segmented'
as.seg_cpt(object, ...)
## S3 method for class 'wbs'
as.seg_cpt(object, ...)
is_segmenter(object, ...)
Arguments
object |
A tidycpt or |
... |
Arguments passed to methods |
Details
tidycpt objects have a segmenter
component (that is typically
created by a class to segment()
).
The functions documented here are convenience utility functions
for working with the segmenter
components.
as.segmenter()
is especially useful in pipelines to avoid having to use
the $
or [
notation for subsetting.
as.segmenter()
simply returns the segmenter of a tidycpt
object.
as.seg_cpt()
takes a wild-caught segmenter
object of arbitrary class
and converts it into a seg_cpt object.
is_segmenter()
checks to see if a segmenter object implements all of the
S3 methods necessary to be considered a segmenter.
Value
-
as.segmenter()
returns thesegmenter
object of atidycpt
object. Note that this could be of any class, depending on the class returned by the segmenting function.
-
as.seg_cpt()
returns a seg_cpt object
-
is_segmenter()
alogical
vector of length 1
See Also
Other tidycpt-generics:
as.model()
,
changepoints()
,
diagnose()
,
fitness()
,
model_name()
Other segmenter-functions:
fitness()
,
model_args()
,
seg_params()
Examples
# Segment a time series using PELT
x <- segment(CET, method = "pelt")
# Return the segmenter component
x |>
as.segmenter()
# Note the class of this object could be anything
x |>
as.segmenter() |>
class()
# Convert the segmenter into the standardized seg_cpt class
x |>
as.segmenter() |>
as.seg_cpt()
# Is the segmenter valid?
x |>
as.segmenter() |>
is_segmenter()
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Usage
## S3 method for class 'mod_cpt'
as.ts(x, ...)
## S3 method for class 'mod_cpt'
nobs(object, ...)
## S3 method for class 'mod_cpt'
logLik(object, ...)
## S3 method for class 'mod_cpt'
fitted(object, ...)
## S3 method for class 'mod_cpt'
residuals(object, ...)
## S3 method for class 'mod_cpt'
coef(object, ...)
## S3 method for class 'mod_cpt'
augment(x, ...)
## S3 method for class 'mod_cpt'
tidy(x, ...)
## S3 method for class 'mod_cpt'
glance(x, ...)
## S3 method for class 'mod_cpt'
plot(x, ...)
## S3 method for class 'mod_cpt'
print(x, ...)
## S3 method for class 'mod_cpt'
summary(object, ...)
## S3 method for class 'seg_basket'
as.ts(x, ...)
## S3 method for class 'seg_basket'
plot(x, ...)
## S3 method for class 'seg_cpt'
as.ts(x, ...)
## S3 method for class 'seg_cpt'
glance(x, ...)
## S3 method for class 'seg_cpt'
nobs(object, ...)
## S3 method for class 'seg_cpt'
print(x, ...)
## S3 method for class 'seg_cpt'
summary(object, ...)
## S3 method for class 'tidycpt'
as.ts(x, ...)
## S3 method for class 'tidycpt'
augment(x, ...)
## S3 method for class 'tidycpt'
tidy(x, ...)
## S3 method for class 'tidycpt'
glance(x, ...)
## S3 method for class 'tidycpt'
plot(x, use_time_index = FALSE, ylab = NULL, ...)
## S3 method for class 'tidycpt'
print(x, ...)
## S3 method for class 'tidycpt'
summary(object, ...)
## S3 method for class 'meanshift_lnorm'
logLik(object, ...)
## S3 method for class 'nhpp'
logLik(object, ...)
## S3 method for class 'nhpp'
glance(x, ...)
## S3 method for class 'ga'
as.ts(x, ...)
## S3 method for class 'ga'
nobs(object, ...)
## S3 method for class 'cpt'
as.ts(x, ...)
## S3 method for class 'cpt'
logLik(object, ...)
## S3 method for class 'cpt'
nobs(object, ...)
## S3 method for class 'cga'
as.ts(x, ...)
## S3 method for class 'cga'
nobs(object, ...)
## S3 method for class 'segmented'
as.ts(x, ...)
## S3 method for class 'segmented'
nobs(object, ...)
## S3 method for class 'wbs'
as.ts(x, ...)
## S3 method for class 'wbs'
nobs(object, ...)
Arguments
... |
some methods for this generic function require additional arguments. |
object |
any object from which a log-likelihood value, or a contribution to a log-likelihood value, can be extracted. |
use_time_index |
Should the x-axis labels be the time indices? Or the time labels? |
Examples
# Plot a meanshift model fit
plot(fit_meanshift_norm(CET, tau = 330))
#' # Plot a trendshift model fit
plot(fit_trendshift(CET, tau = 330))
#' # Plot a quadratic polynomial model fit
plot(fit_lmshift(CET, tau = 330, deg_poly = 2))
#' # Plot a 4th degree polynomial model fit
plot(fit_lmshift(CET, tau = 330, deg_poly = 10))
# Plot a segmented time series
plot(segment(CET, method = "pelt"))
# Plot a segmented time series and show the time labels on the x-axis
plot(segment(CET, method = "pelt"), use_time_index = TRUE)
# Label the y-axis correctly
segment(CET, method = "pelt") |>
plot(use_time_index = TRUE, ylab = "Degrees Celsius")
# Summarize a tidycpt object
summary(segment(CET, method = "pelt"))
summary(segment(DataCPSim, method = "pelt"))
Convert a date into a year
Description
Convert a date into a year
Usage
as_year(x)
Arguments
x |
an object coercible into a base::Date. See |
Value
A character
vector representing the years of the input
Examples
# Retrieve only the year
as_year("1988-01-01")
Convert changepoint sets to binary strings
Description
Convert changepoint sets to binary strings
Usage
binary2tau(x)
tau2binary(tau, n)
Arguments
x |
A binary string that encodes a changepoint set. See
|
tau |
a numeric vector of changepoint indices |
n |
the length of the original time series |
Details
In order to use GA::ga()
in a genetic algorithm, we need to encoude a
changepoint set as a binary string.
binary2tau()
takes a binary string representation of a changepoint set and
converts it into a set of changepoint indices.
tau2binary()
takes a set of changepoint indices the number of observations
in the time series and converts them into a binary string representation of
that changepoint set.
Value
-
binary2tau()
: aninteger
vector
-
tau2binary()
: aninteger
vector of lengthn
Examples
# Recover changepoint set indices from binary strings
binary2tau(c(0, 0, 1, 0, 1))
binary2tau(round(runif(10)))
# Recover binary strings from changepoint set indices
tau2binary(c(7, 17), n = 24)
tau2binary(binary2tau(c(0, 0, 1, 1, 0, 1)), n = 6)
Particulate matter in Bogotá, Colombia
Description
Particulate matter of less than 2.5 microns of diameter in Bogotá, Colombia.
Usage
bogota_pm
Format
An object of class xts
(inherits from zoo
) with 1096 rows and 1 columns.
Details
Daily readings from 2018-2020 are included.
Examples
class(bogota_pm)
Initialize populations in genetic algorithms
Description
Build an initial population set for genetic algorithms
Usage
build_gabin_population(x, ...)
log_gabin_population(x, ...)
Arguments
x |
a numeric vector coercible into a stats::ts object |
... |
arguments passed to methods |
Details
Genetic algorithms require a method for randomly generating initial
populations (i.e., a first generation).
The default method used by GA::ga()
for changepoint detection is usually
GA::gabin_Population()
, which selects candidate changepoints uniformly at
random with probability 0.5.
This leads to an initial population with excessively large candidate
changepoint sets (on the order of n/2
), which makes the genetic
algorithm slow.
-
build_gabin_population()
takes ats
object and runs several fast changepoint detection algorithms on it, then sets the initial probability to 3 times the average value of the size of the changepoint sets returned by those algorithms. This is a conservative guess as to the likely size of the optimal changepoint set. -
log_gabin_population()
takes ats
object and sets the initial probability to the natural logarithm of the length of the time series.
Value
A function
that can be passed to the population
argument of
GA::ga()
(through segment_ga()
)
See Also
GA::gabin_Population()
, segment_ga()
Examples
# Build a function to generate the population
f <- build_gabin_population(CET)
# Segment the time series using the population generation function
segment(CET, method = "ga", population = f, maxiter = 5)
f <- log_gabin_population(CET)
segment(CET, method = "ga", population = f, maxiter = 10)
Extract changepoints
Description
Retrieve the indices of the changepoints identified by an algorithm or model.
Usage
changepoints(x, ...)
## Default S3 method:
changepoints(x, ...)
## S3 method for class 'mod_cpt'
changepoints(x, ...)
## S3 method for class 'seg_basket'
changepoints(x, ...)
## S3 method for class 'seg_cpt'
changepoints(x, ...)
## S3 method for class 'tidycpt'
changepoints(x, use_labels = FALSE, ...)
## S3 method for class 'ga'
changepoints(x, ...)
## S3 method for class 'cpt'
changepoints(x, ...)
## S3 method for class 'cga'
changepoints(x, ...)
## S3 method for class 'segmented'
changepoints(x, ...)
## S3 method for class 'wbs'
changepoints(x, ...)
Arguments
x |
|
... |
arguments passed to methods |
use_labels |
return the time labels for the changepoints instead of the indices. |
Details
tidycpt objects, as well as their segmenter
and model
components,
implement changepoints()
methods.
Note that this function is not to be confused with wbs::changepoints()
,
which returns different information.
For the default
method, changepoints()
will attempt to return the
cpt_true
attribute, which is set by test_set()
.
Value
a numeric vector of changepoint indices, or, if use_labels
is
TRUE
, a character
of time labels.
See Also
Other tidycpt-generics:
as.model()
,
as.segmenter()
,
diagnose()
,
fitness()
,
model_name()
Examples
cpts <- segment(DataCPSim, method = "ga", maxiter = 5)
changepoints(cpts$segmenter)
# Segment a times series using a genetic algorithm
cpts <- segment(DataCPSim, method = "cga")
changepoints(cpts$segmenter)
cpts <- segment(DataCPSim, method = "segmented")
changepoints(cpts$segmenter)
cpts <- segment(DataCPSim, method = "wbs")
changepoints(cpts$segmenter)
Compare various models or algorithms for a given changepoint set
Description
Compare various models or algorithms for a given changepoint set
Usage
compare_models(x, ...)
compare_algorithms(x, ...)
Arguments
x |
A tidycpt object |
... |
currently ignored |
Details
A tidycpt object has a set of changepoints returned by the
algorithm that segmented the time series.
That changepoint set was obtained using a specific model.
Treating this changepoint set as fixed, the compare_models()
function
fits several common changepoint models to the time series and changepoint
set, and returns the results of glance()
.
Comparing the fits of various models could lead to improved understanding.
Alternatively, compare_algorithms()
runs several fast changepoint detection
algorithms on the original time series, and consolidates the results.
Value
Examples
# Segment a times series using PELT
x <- segment(CET, method = "pelt")
# Compare models
compare_models(x)
# Compare algorithms
compare_algorithms(x)
Use a changepoint set to break a time series into regions
Description
Use a changepoint set to break a time series into regions
Usage
cut_by_tau(x, tau)
split_by_tau(x, tau)
Arguments
x |
A numeric vector |
tau |
a numeric vector of changepoint indices |
Details
A changepoint set tau
of length k
breaks a time series of length
n
into k+1
non-empty regions.
These non-empty regions can be defined by half-open intervals, starting with
1 and ending with n+1
.
cut_by_tau()
splits a set of indices into a base::factor()
of
half-open intervals
split_by_tau()
splits a time series into a named base::list()
of numeric
vectors
Value
-
cut_by_tau()
abase::factor()
of half-open intervals
-
split_by_tau()
a namedbase::list()
of numeric vectors
See Also
Examples
n <- length(CET)
# Return a factor of intervals
cut_by_tau(1:n, tau = pad_tau(c(42, 81, 330), n))
# Return a list of observations
split_by_tau(DataCPSim, c(365, 826))
Retrieve the degrees of freedom from a logLik
object
Description
Retrieve the degrees of freedom from a logLik
object
Usage
deg_free(x)
Arguments
x |
An object that implements a method for |
Value
The df
attribute of the stats::logLik()
of the given object.
Examples
# Retrieve the degrees of freedom model a changepoint model
DataCPSim |>
segment() |>
as.model() |>
deg_free()
Diagnose the fit of a segmented time series
Description
Depending on the input, this function returns a diagnostic plot.
Usage
diagnose(x, ...)
## S3 method for class 'mod_cpt'
diagnose(x, ...)
## S3 method for class 'seg_basket'
diagnose(x, ...)
## S3 method for class 'tidycpt'
diagnose(x, ...)
## S3 method for class 'nhpp'
diagnose(x, ...)
Arguments
x |
A tidycpt object, or a |
... |
currently ignored |
Value
A ggplot2::ggplot()
object
See Also
Other tidycpt-generics:
as.model()
,
as.segmenter()
,
changepoints()
,
fitness()
,
model_name()
Examples
# For meanshift models, show the distribution of the residuals by region
fit_meanshift_norm(CET, tau = 330) |>
diagnose()
# For Coen's algorithm, show the histogram of changepoint selections
x <- segment(DataCPSim, method = "coen", num_generations = 3)
x |>
as.segmenter() |>
diagnose()
# Show various iterations of diagnostic plots
diagnose(segment(DataCPSim))
diagnose(segment(DataCPSim, method = "single-best"))
diagnose(segment(DataCPSim, method = "pelt"))
# Show diagnostic plots for test sets
diagnose(segment(test_set()))
diagnose(segment(test_set(n = 2, sd = 4), method = "pelt"))
# For NHPP models, show the growth in the number of exceedances
diagnose(fit_nhpp(DataCPSim, tau = 826))
diagnose(fit_nhpp(DataCPSim, tau = 826, threshold = 200))
Evaluate candidate changepoints sets
Description
Evaluate candidate changepoints sets
Usage
evaluate_cpts(x, ...)
## S3 method for class 'seg_basket'
evaluate_cpts(x, ...)
## S3 method for class 'list'
evaluate_cpts(x, .data, model_fn, ...)
## S3 method for class 'tbl_df'
evaluate_cpts(x, .data, model_fn, ...)
Arguments
x |
An object to evaluate |
... |
arguments passed to methods |
.data |
A time series |
model_fn |
Name of the function to fit the model.
See, for examples, |
Value
Generate a list of candidate changepoints using a genetic algorithm
Description
Generate a list of candidate changepoints using a genetic algorithm
Usage
evolve_gbmdl(x, mat_cp, these_bmdls)
junta_1_puntos_cambio(padres, mat_cp)
junta_k_puntos_cambio(mat_padres, mat_cp)
mata_1_tau_volado(cp, prob_volado = 0.5)
mata_k_tau_volado(mat_cp)
muta_1_cp_BMDL(
cp,
x,
probs_nuevos_muta0N = c(0.8, 0.1, 0.1),
dist_extremos = 10,
min_num_cpts = 1,
mutation_possibilities = c(-1, 0, 1),
mutation_probs = c(0.3, 0.4, 0.3),
max_num_cp = 20
)
muta_k_cp_BMDL(mat_cp, x)
sim_1_cp_BMDL(x, max_num_cp = 20, prob_inicial = 0.06)
sim_k_cp_BMDL(x, generation_size = 50, max_num_cp = 20)
probs_vec_MDL(vec_MDL, probs_rank0_MDL1 = 0)
selec_k_pares_de_padres(vec_probs)
chromo2tau(chromo)
mat_cp_2_list(mat_cp)
Arguments
x |
A time series object |
mat_cp |
A matrix of potential changepoints |
these_bmdls |
vector of |
padres |
vector de longitud dos con índice de papa e índice de mama |
mat_padres |
matriz de kx2 la cual contiene en sus renglones las parejas de padres |
cp |
vector cromosoma que se va a poner a prueba |
prob_volado |
probabilidad de quitar un tiempo de cambio existente utilizado por mata_k_tau_volado para quitar elementos de más. Se recomienda dejar el valor de 0.5 ya que así al juntar los pc del padre y madre se eliminará la mitad de estos |
probs_nuevos_muta0N |
probabilidades de mutar 0,1,2,...,l hasta cierto numero l; eg si vale c(.5,.2,.2,.1) se tiene una probabilidad 0.5 de mutar 0 (de no mutar), probabilidad 0.2 de mutar 1,, probabilidad 0.2 de mutar 2, y, probabilidad 0.1 de mutar 3. |
dist_extremos |
distancia entre el primer los puntos de cambio v_0 y v_1 al igual que entre v_m y v_{m+1}; distancia minima que debe de haber de un punto de cambio y los valores 1 y T, donde T es la longitud total de la serie |
min_num_cpts |
es la cota inferior del número de puntos de cambio que puede tener un cromosoma |
mutation_possibilities |
vector con mutaciones posibles; eg si mutaciones=c(-1,0,1) entonces un punto de cambio puede ser movido una unidad a la izquierda, puede quedarse igual, o moverse una unidad a la derecha |
mutation_probs |
probabilidades de mutación. Las longitudes de este vector y mutaciones tienen que ser iguales; eg si mutaciones=c(-1,0,1) y probs_muta = c(.2, .6, .2) entonces se tiene una probabilidad .2 de que el punto de cambio se desplace a la izquierda, probabilidad .6 de quedar igual, y probabilidad . 2 de ser movido a la derecha |
max_num_cp |
el máximo número de rebases. Este parámetro se ocupa en particular para que todos los cromosomas quepan en una matriz. |
prob_inicial |
probabilidad de que en la primera generación un punto cualquiera sea punto de cambio. Se recomienda =.5 ya que con esto se distribuyen relativamente uniformes los puntos de cambio |
generation_size |
tamaño de las generaciones |
vec_MDL |
vector con valores MDL OBSERVACIÓN: Esto regresa numeros negativos, los cuales mientras más negativo mejor, ya que dará que es un mejor vector de tiempos de cambio. Es decir, un MDL de -6000 es mejor que -4000 |
probs_rank0_MDL1 |
para medir obtener la probabilidad de los padres se pueden tomar o las probabilidades con respecto a los rangos (como en el artículo) o se pueden tomar las probabilidades con respecto a el MDL. La diferencia radica en que si se toma con respecto al MDL se tendrá que un cromosoma con un gran MDL este tendrá una gran ventaja de ocurrir, en cambio cuando solo se tiene rank esta ventaja gran ventaja se reduce |
vec_probs |
vector de probabilidades de selección de cada uno de los cromosomas |
chromo |
Chromosome, from a row of the matrix |
Details
regresa un vector de tamaño max_num_cp+3
donde la primera entrada es
m, la segunda v_0=1, ...., v_{m+1}=N,0,...,0
por ejemplo: c(4,1,3,8,11,15,20,0,0,0,0)
para m=4
, max\_num\_cp=8
, N=20
.
Se tienen m
puntos de cambio, los cuales
\tau_0=1
y \tau_{m+1}= N+1
,
pero en nuestro caso
tenemos que los vectores cp
tienen c(m,\tau_0=1,\tau_1,...,\tau_{m-1},\tau_m= N,0,0,0)
por lo cual se nosotros:
empieza con el número de puntos de cambio;
la segunda entrada es un uno;
la tercera entrada es el primer punto de cambio;
las siguientes son otros puntos de cambio;
la siguiente entrada después de punto de cambio tiene el valor
N
; ylos siguientes son númores cero hasta llenarlo para que sea de tamaño
max_num_cp
Value
regresa una matriz de las mismas dimensiones que mat_cp, pero con los nuevos cromosomas
el mismo cromosoma sin algunos de sus puntos de cambio
regresa una matriz a la cual se le quitaron a sus cromosomas algunos puntos de cambio
regresa un vector mutado
regreas una mat_cp mutada
regresa una matriz de k
por max_num_cp+3
, la cual en cada renglón tiene
una simulación de un vector de tiempos de cambio
regresa un vector de probabilidades
Examples
mat_cp <- sim_k_cp_BMDL(DataCPSim)
bmdls <- mat_cp |>
mat_cp_2_list() |>
evaluate_cpts(.data = as.ts(DataCPSim), model_fn = fit_nhpp) |>
dplyr::pull(BMDL)
evolve_gbmdl(exceedances(DataCPSim), mat_cp, bmdls)
sim_1_cp_BMDL(exceedances(DataCPSim))
sim_1_cp_BMDL(exceedances(rlnorm_ts_1))
sim_1_cp_BMDL(exceedances(rlnorm_ts_2))
sim_1_cp_BMDL(exceedances(rlnorm_ts_3))
sim_1_cp_BMDL(exceedances(bogota_pm))
sim_k_cp_BMDL(DataCPSim)
chromo <- c(4, 1, 557, 877 , 905, 986, 1096, 0, 0, 0)
chromo2tau(chromo)
Compute exceedances of a threshold for a time series
Description
Compute exceedances of a threshold for a time series
Usage
exceedances(x, ...)
## Default S3 method:
exceedances(x, ...)
## S3 method for class 'nhpp'
exceedances(x, ...)
## S3 method for class 'ts'
exceedances(x, ...)
## S3 method for class 'double'
exceedances(x, threshold = mean(x, na.rm = TRUE), ...)
Arguments
x |
a numeric vector coercible into a stats::ts object |
... |
arguments passed to methods |
threshold |
A value above which to exceed. Default is the |
Value
An ordered integer
vector giving the indices of the values of x
that exceed the threshold
.
Examples
# Retrieve exceedances of the series mean
fit_nhpp(DataCPSim, tau = 826) |>
exceedances()
# Retrieve exceedances of a supplied threshold
fit_nhpp(DataCPSim, tau = 826, threshold = 200) |>
exceedances()
Obtain a descriptive filename for a tidycpt object
Description
Obtain a descriptive filename for a tidycpt object
Usage
file_name(x, data_name_slug = "data")
Arguments
x |
A tidycpt object |
data_name_slug |
character string that will identify the data set used in the file name |
Details
file_name()
generates a random, unique string indicating the algorithm and
fitness()
for a tidycpt object.
Value
A character
string giving a unique file name.
Examples
# Generate a unique name for the file
DataCPSim |>
segment(method = "pelt") |>
file_name()
Fit an ARIMA model
Description
Fit an ARIMA model
Usage
fit_arima(x, tau, ...)
Arguments
x |
A time series |
tau |
a set of indices representing a changepoint set |
... |
currently ignored |
Details
Fits an ARIMA model using stats::arima()
.
Value
A mod_cpt object.
See Also
Other model-fitting:
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Examples
# Fit a mean-variance model
fit_arima(CET, tau = c(42, 330))
Regression-based model fitting
Description
Regression-based model fitting
Usage
fit_lmshift(x, tau, deg_poly = 0, ...)
fit_lmshift_ar1(x, tau, ...)
fit_trendshift(x, tau, ...)
fit_trendshift_ar1(x, tau, ...)
Arguments
x |
A time series |
tau |
a set of indices representing a changepoint set |
deg_poly |
integer indicating the degree of the polynomial spline to be
fit. Passed to |
... |
arguments passed to |
Details
These model-fitting functions use stats::lm()
to fit the corresponding
regression model to a time series, using the changepoints specified by the
tau
argument.
Each changepoint is treated as a categorical fixed-effect, while the deg_poly
argument controls the degree of the polynomial that interacts with those
fixed-effects.
For example, setting deg_poly
equal to 0 will return the same model as
calling fit_meanshift_norm()
, but the latter is faster for larger changepoint
sets because it doesn't have to fit all of the regression models.
Setting deg_poly
equal to 1 fits the trendshift
model.
-
fit_lmshift_ar1()
: will apply auto-regressive lag 1 errors
-
fit_trendshift()
: will fit a line in each region
-
fit_trendshift_ar1()
: will fit a line in each region and autoregress lag 1 errors
Value
A mod_cpt object
See Also
Other model-fitting:
fit_arima()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Examples
# Manually specify a changepoint set
tau <- c(365, 826)
# Fit the model
mod <- fit_lmshift(DataCPSim, tau)
# Retrieve model parameters
logLik(mod)
deg_free(mod)
# Manually specify a changepoint set
cpts <- c(1700, 1739, 1988)
ids <- time2tau(cpts, as_year(time(CET)))
# Fit the model
mod <- fit_lmshift(CET, tau = ids)
# View model parameters
glance(mod)
glance(fit_lmshift(CET, tau = ids, deg_poly = 1))
glance(fit_lmshift_ar1(CET, tau = ids))
glance(fit_lmshift_ar1(CET, tau = ids, deg_poly = 1))
glance(fit_lmshift_ar1(CET, tau = ids, deg_poly = 2))
# Empty changepoint sets are allowed
fit_lmshift(CET, tau = NULL)
# Duplicate changepoints are removed
fit_lmshift(CET, tau = c(42, 42))
Fast implementation of meanshift model
Description
Fast implementation of meanshift model
Usage
fit_meanshift(x, tau, distribution = "norm", ...)
fit_meanshift_norm(x, tau, ...)
fit_meanshift_lnorm(x, tau, ...)
fit_meanshift_norm_ar1(x, tau, ...)
Arguments
x |
A time series |
tau |
a set of indices representing a changepoint set |
distribution |
A character indicating the distribution of the data. Should match R distribution function naming conventions (e.g., "norm" for the Normal distribution, etc.) |
... |
arguments passed to |
Details
fit_meanshift_norm()
returns the same model as fit_lmshift()
with the
deg_poly
argument set to 0.
However, it is faster on large changepoint sets.
fit_meanshift_lnorm()
fit the meanshift model with the assumption of
log-normally distributed data.
fit_meanshift_norm_ar1()
applies autoregressive errors.
Value
A mod_cpt object.
Author(s)
Xueheng Shi, Ben Baumer
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Examples
# Manually specify a changepoint set
tau <- c(365, 826)
# Fit the model
mod <- fit_meanshift_norm_ar1(DataCPSim, tau)
# View model parameters
logLik(mod)
deg_free(mod)
# Manually specify a changepoint set
cpts <- c(1700, 1739, 1988)
ids <- time2tau(cpts, as_year(time(CET)))
# Fit the model
mod <- fit_meanshift_norm(CET, tau = ids)
# Review model parameters
glance(mod)
# Fit an autoregressive model
mod <- fit_meanshift_norm_ar1(CET, tau = ids)
# Review model parameters
glance(mod)
Fit a model for mean and variance
Description
Fit a model for mean and variance
Usage
fit_meanvar(x, tau, ...)
Arguments
x |
A time series |
tau |
a set of indices representing a changepoint set |
... |
currently ignored |
Details
In a mean-variance model, both the means and variances are allowed to vary
across regions.
Thus, this model fits a separate \mu_j
and \sigma_j
for each
region j
.
Value
A mod_cpt object.
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_nhpp()
,
model_args()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Examples
# Fit a mean-variance model
fit_meanvar(CET, tau = c(42, 330))
Fit a non-homogeneous Poisson process model to the exceedances of a time series.
Description
Fit a non-homogeneous Poisson process model to the exceedances of a time series.
Usage
fit_nhpp(x, tau, ...)
Arguments
x |
A time series |
tau |
A vector of changepoints |
... |
currently ignored |
Details
Any time series can be modeled as a non-homogeneous Poisson process of the
locations of the exceedances of a threshold in the series.
This function uses the BMDL criteria to determine the best fit
parameters for each
region defined by the changepoint set tau
.
Value
An nhpp
object, which inherits from mod_cpt.
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
model_args()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Examples
# Fit an NHPP model using the mean as a threshold
fit_nhpp(DataCPSim, tau = 826)
# Fit an NHPP model using other thresholds
fit_nhpp(DataCPSim, tau = 826, threshold = 20)
fit_nhpp(DataCPSim, tau = 826, threshold = 200)
# Fit an NHPP model using changepoints determined by PELT
fit_nhpp(DataCPSim, tau = changepoints(segment(DataCPSim, method = "pelt")))
Fit an NHPP model to one specific region
Description
Fit an NHPP model to one specific region
Usage
fit_nhpp_region(exc, tau_left, tau_right, params = parameters_weibull(), ...)
Arguments
exc |
Output from |
tau_left |
left-most changepoint |
tau_right |
right-most changepoint |
params |
Output from |
... |
arguments passed to |
Details
This is an internal function not to be called by users.
Use fit_nhpp()
.
Value
Modified output from stats::optim()
.
See Also
Retrieve the optimal fitness (or objective function) value used by an algorithm
Description
Retrieve the optimal fitness (or objective function) value used by an algorithm
Usage
fitness(object, ...)
## S3 method for class 'seg_basket'
fitness(object, ...)
## S3 method for class 'seg_cpt'
fitness(object, ...)
## S3 method for class 'tidycpt'
fitness(object, ...)
## S3 method for class 'ga'
fitness(object, ...)
## S3 method for class 'cpt'
fitness(object, ...)
## S3 method for class 'cga'
fitness(object, ...)
## S3 method for class 'segmented'
fitness(object, ...)
## S3 method for class 'wbs'
fitness(object, ...)
Arguments
object |
A |
... |
currently ignored |
Details
Segmenting algorithms use a fitness metric, typically through the use of a penalized objective function, to determine which changepoint sets are more or less optimal. This function returns the value of that metric for the changepoint set implied by the object provided.
Value
A named double
vector with the fitness value.
See Also
Other tidycpt-generics:
as.model()
,
as.segmenter()
,
changepoints()
,
diagnose()
,
model_name()
Other segmenter-functions:
as.segmenter()
,
model_args()
,
seg_params()
Examples
# Segment a times series using a genetic algorithm
x <- segment(DataCPSim, method = "ga", maxiter = 10)
# Retrieve its fitness value
fitness(x)
# Segment a times series using a genetic algorithm
x <- segment(DataCPSim, method = "cga")
# Retrieve its fitness value
fitness(x)
# Segment a time series using Segmented
x <- segment(DataCPSim, method = "segmented")
# Retrieve its fitness
fitness(x)
# Segment a time series using Wild Binary Segmentation
x <- segment(DataCPSim, method = "wbs")
# Retrieve its fitness
fitness(x)
Italian University graduates by disciplinary groups from 1926-2013
Description
Italian University graduates by disciplinary groups during the years 1926-2013.
Usage
italy_grads
Format
An object of class tbl_ts
(inherits from tbl_df
, tbl
, data.frame
) with 88 rows and 17 columns.
Source
https://seriestoriche.istat.it/
Source: Istat- Ministero dell'istruzione pubblica, years 1926-1942
Istat- Rilevazione sulle Università, years 1943-1997
Miur- Rilevazione sulle Università, years 1998-2013
Weibull distribution functions
Description
Weibull distribution functions
Usage
iweibull(x, shape, scale = 1)
mweibull(x, shape, scale = 1)
parameters_weibull(...)
Arguments
x |
A numeric vector |
shape |
Shape parameter for Weibull distribution. See |
scale |
Scale parameter for Weibull distribution. See |
... |
currently ignored |
Details
Intensity function for the Weibull distribution.
iweibull(x) = \left( \frac{shape}{scale} \right) \cdot
\left( \frac{x}{scale} \right)^{shape - 1}
Mean intensity function for the Weibull distribution.
mweibull(x) = \left( \frac{x}{scale} \right)^{shape}
parameters_weibull()
returns a list()
with two components: shape
and scale
, each of which is a list()
of distribution parameters.
These parameters are used to define the prior distributions for the
hyperparameters.
Value
A numeric vector
See Also
Examples
# Compute the intensities and plot them
iweibull(1, shape = 1, scale = 1)
plot(x = 1:10, y = iweibull(1:10, shape = 2, scale = 2))
# Compute various values of the distribution
mweibull(1, shape = 1, scale = 1)
plot(x = 1:10, y = mweibull(1:10, shape = 1, scale = 1))
plot(x = 1:10, y = mweibull(1:10, shape = 1, scale = 2))
plot(x = 1:10, y = mweibull(1:10, shape = 0.5, scale = 2))
plot(x = 1:10, y = mweibull(1:10, shape = 0.5, scale = 100))
plot(x = 1:10, y = mweibull(1:10, shape = 2, scale = 2))
plot(x = 1:10, y = mweibull(1:10, shape = 2, scale = 100))
# Generate prior distribution hyperparameters
parameters_weibull()
Log-Likelihood functions for regions (Weibull)
Description
Log-Likelihood functions for regions (Weibull)
Usage
log_likelihood_region_weibull(t, tau_left, tau_right, theta)
log_prior_region_weibull(theta, params = parameters_weibull())
D_log_prior_region_weibull(theta, params = parameters_weibull())
D_log_likelihood_region_weibull(t, tau_left, tau_right, theta)
Arguments
t |
vector of |
tau_left |
Left endpoint of the region |
tau_right |
Right endpoint of the region |
theta |
numeric vector of parameters for the NHPP model |
params |
Possibly modified output from |
Value
A numeric vector
Algorithmic coverage through tidychangepoint
Description
Algorithmic coverage through tidychangepoint
Usage
ls_models()
ls_pkgs()
ls_methods()
ls_penalties()
ls_cpt_penalties()
ls_coverage()
Value
A tibble::tibble or character
See Also
Examples
# List all model-fitting functions
ls_models()
# List packages supported by tidychangepoint
ls_pkgs()
# List methods supported by segment()
ls_methods()
# List penalty functions provided by tidychangepoint
ls_penalties()
# List penalty functions supported by changepoint
ls_cpt_penalties()
# List combinations of method, model, and penalty supported by tidychangepoint
ls_coverage()
Cumulative distribution of the exceedances of a time series
Description
Cumulative distribution of the exceedances of a time series
Usage
mcdf(x, dist = "weibull")
Arguments
x |
An NHPP |
dist |
Name of the distribution. Currently only |
Value
a numeric vector of length equal to the exceedances of x
See Also
Examples
# Fit an NHPP model using the mean as a threshold
nhpp <- fit_nhpp(DataCPSim, tau = 826)
# Compute the cumulative exceedances of the mean
mcdf(nhpp)
# Fit an NHPP model using another threshold
nhpp <- fit_nhpp(DataCPSim, tau = 826, threshold = 200)
# Compute the cumulative exceedances of the threshold
mcdf(nhpp)
Rainfall in Medellín, Colombia
Description
Rainfall in Medellín, Colombia
Usage
mde_rain
mde_rain_monthly
Format
An object of class spec_tbl_df
(inherits from tbl_df
, tbl
, data.frame
) with 185705 rows and 8 columns.
An object of class xts
(inherits from zoo
) with 444 rows and 1 columns.
Details
Daily rainfall measurements for 13 different weather stations positioned around Medellín, Colombia. Variables:
-
station_id
: -
lat
,long
: latitude and longitude for the weather station -
date
,year
,month
,day
: date variables -
rainfall
: daily rainfall (in cubic centimeters) as measured by the weather station
-
mean_rainfall
: average rainfall across all weather stations
References
Differences between leagues in Major League Baseball
Description
The difference in various statistics between the American League and the National League from 1925 to 2023. Statistics include:
-
PA
: The total number of plate appearances -
hr_rate_diff
: The difference in home runs per plate appearance -
bavg_dff
: The difference in batting average -
obp_diff
: The difference in on-base percentage -
slg_diff
: The difference in slugging percentage
Usage
mlb_diffs
Format
An object of class tbl_ts
(inherits from tbl_df
, tbl
, data.frame
) with 99 rows and 6 columns.
Source
The Lahman
package
Retrieve the arguments that a model-fitting function used
Description
Retrieve the arguments that a model-fitting function used
Usage
model_args(object, ...)
## Default S3 method:
model_args(object, ...)
## S3 method for class 'seg_cpt'
model_args(object, ...)
## S3 method for class 'ga'
model_args(object, ...)
## S3 method for class 'cpt'
model_args(object, ...)
## S3 method for class 'segmented'
model_args(object, ...)
## S3 method for class 'wbs'
model_args(object, ...)
Arguments
object |
A |
... |
currently ignored |
Details
Every model is fit by a model-fitting function, and these functions sometimes
take arguments.
model_args()
recovers the arguments that were passed to
the model fitting function when it was called.
These are especially
important when using a genetic algorithm.
Value
A named list
of arguments, or NULL
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_name()
,
new_fun_cpt()
,
whomademe()
Other segmenter-functions:
as.segmenter()
,
fitness()
,
seg_params()
Examples
# Segment a time series using Coen's algorithm
x <- segment(CET, method = "ga-coen", maxiter = 3)
# Recover the arguments passed to the model-fitting function
x |>
as.segmenter() |>
model_args()
Retrieve the name of the model that a segmenter or model used
Description
Retrieve the name of the model that a segmenter or model used
Usage
model_name(object, ...)
## Default S3 method:
model_name(object, ...)
## S3 method for class 'character'
model_name(object, ...)
## S3 method for class 'mod_cpt'
model_name(object, ...)
## S3 method for class 'seg_basket'
model_name(object, ...)
## S3 method for class 'seg_cpt'
model_name(object, ...)
## S3 method for class 'tidycpt'
model_name(object, ...)
## S3 method for class 'ga'
model_name(object, ...)
## S3 method for class 'cpt'
model_name(object, ...)
## S3 method for class 'cga'
model_name(object, ...)
## S3 method for class 'segmented'
model_name(object, ...)
## S3 method for class 'wbs'
model_name(object, ...)
Arguments
object |
A |
... |
currently ignored |
Details
Every segmenter works by fitting a model to the data. model_name()
returns
the name of a model that can be passed to whomademe()
to retrieve the
model fitting function. These functions must begin with the prefix fit_
.
Note that the model fitting functions exist in tidychangepoint
are are
not necessarily the actual functions used by the segmenter.
Models also implement model_name()
.
Value
A character
vector of length 1.
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
new_fun_cpt()
,
whomademe()
Other tidycpt-generics:
as.model()
,
as.segmenter()
,
changepoints()
,
diagnose()
,
fitness()
Examples
# Segment a time series using PELT
x <- segment(CET, method = "pelt")
# Retrieve the name of the model from the segmenter
x |>
as.segmenter() |>
model_name()
# What function created the model?
x |>
model_name() |>
whomademe()
model_name(x$segmenter)
# Retrieve the name of the model from the model
x |>
as.model() |>
model_name()
Compute model variance
Description
Compute model variance
Usage
model_variance(object, ...)
Arguments
object |
A model object implementing |
... |
currently ignored |
Details
Using the generic functions residuals()
and nobs()
, this function
computes the variance of the residuals.
Note that unlike stats::var()
, it does not use n-1
as the denominator.
Value
A double
vector of length 1
Class for model-fitting functions
Description
Class for model-fitting functions
Usage
new_fun_cpt(x, ...)
validate_fun_cpt(x)
fun_cpt(x, ...)
Arguments
x |
a |
... |
currently ignored |
Details
All model-fitting functions must be registered through a call to fun_cpt()
.
All model-fitting functions must take at least three arguments:
-
x
: a time series, -
tau
: a set of changepoint indices -
...
: other arguments passed to methods
See fit_meanshift_norm()
,
Value
A fun_cpt object.
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
model_name()
,
whomademe()
Examples
# Register a model-fitting function
f <- fun_cpt("fit_meanvar")
# Verify that it now has class `fun_cpt`
str(f)
# Use it
f(CET, 42)
Base class for changepoint models
Description
Create changepoint detection model objects
Usage
new_mod_cpt(
x = numeric(),
tau = integer(),
region_params = tibble::tibble(),
model_params = double(),
fitted_values = double(),
model_name = character(),
...
)
validate_mod_cpt(x)
mod_cpt(x, ...)
Arguments
x |
a numeric vector coercible into a |
tau |
indices of the changepoint set |
region_params |
A |
model_params |
A numeric vector of parameters estimated by the model across the entire data set (not just in each region). |
fitted_values |
Fitted values returned by the model on the original data set. |
model_name |
A |
... |
currently ignored |
Details
Changepoint detection models know how they were created, on what data set, about the optimal changepoint set found, and the parameters that were fit to the model. Methods for various generic reporting functions are provided.
All changepoint detection models inherit from mod_cpt: the
base class for changepoint detection models.
These models are created by one of the fit_*()
functions, or by
as.model()
.
Value
A mod_cpt object
See Also
Examples
cpt <- mod_cpt(CET)
str(cpt)
as.ts(cpt)
changepoints(cpt)
Default class for candidate changepoint sets
Description
Default class for candidate changepoint sets
Usage
new_seg_basket(
x = numeric(),
algorithm = NA,
cpt_list = list(),
seg_params = list(),
model_name = "meanshift_norm",
penalty = "BIC",
...
)
seg_basket(x, ...)
Arguments
x |
a numeric vector coercible into a |
algorithm |
Algorithm used to find the changepoints |
cpt_list |
a possibly empty |
seg_params |
a possibly empty |
model_name |
character indicating the model used to find the changepoints. |
penalty |
character indicating the name of the penalty function used to find the changepoints. |
... |
currently ignored |
Value
A seg_basket()
object.
Examples
seg <- seg_basket(DataCPSim, cpt_list = list(c(365), c(330, 839)))
str(seg)
as.ts(seg)
changepoints(seg)
fitness(seg)
Base class for segmenters
Description
Base class for segmenters
Usage
new_seg_cpt(
x = numeric(),
pkg = character(),
base_class = character(),
algorithm = NA,
changepoints = integer(),
fitness = double(),
seg_params = list(),
model_name = "meanshift_norm",
penalty = "BIC",
...
)
seg_cpt(x, ...)
Arguments
x |
a numeric vector coercible into a |
pkg |
name of the package providing the segmenter |
base_class |
class of the underlying object |
algorithm |
Algorithm used to find the changepoints |
changepoints |
a possibly empty |
fitness |
A named |
seg_params |
a possibly empty |
model_name |
character indicating the model used to find the changepoints. |
penalty |
character indicating the name of the penalty function used to find the changepoints. |
... |
currently ignored |
Value
A seg_cpt object.
Pad and unpad changepoint sets with boundary points
Description
Pad and unpad changepoint sets with boundary points
Usage
pad_tau(tau, n)
unpad_tau(padded_tau)
is_valid_tau(tau, n)
regions_tau(tau, n)
validate_tau(tau, n)
Arguments
tau |
a numeric vector of changepoint indices |
n |
the length of the original time series |
padded_tau |
Output from |
Details
If a time series contains n
observations, we label them from 1
to n
.
Neither the 1st point nor the n
th point can be a changepoint, since the
regions they create on one side would be empty.
However, for dividing the time series into non-empty segments, we start with
1, add n+1
, and then divide the half-open interval [1, n+1)
into
half-open subintervals that define the regions.
pad_tau()
ensures that 1 and n+1
are included.
unpad_tau()
removes 1 and n+1
, should they exist.
is_valid_tau()
checks to see if the supplied set of changepoints is valid
validate_tau()
removes duplicates and boundary values.
Value
-
pad_tau()
: aninteger
vector that starts with 0 and ends inn
.
-
unpad_tau()
: aninteger
vector stripped of its first and last entries.
-
is_valid_tau()
: alogical
if all of the entries are between 2 andn-1
.
-
validate_tau()
: aninteger
vector with only thebase::unique()
entries between 2 andn-1
, inclusive.
Examples
# Anything less than 2 is not allowed
is_valid_tau(0, length(DataCPSim))
is_valid_tau(1, length(DataCPSim))
# Duplicates are allowed
is_valid_tau(c(42, 42), length(DataCPSim))
is_valid_tau(826, length(DataCPSim))
# Anything greater than \eqn{n} (in this case 1096) is not allowed
is_valid_tau(1096, length(DataCPSim))
is_valid_tau(1097, length(DataCPSim))
# Always return a factor with half-open intervals on the right
regions_tau(c(42, 330), 1096)
# Anything less than 2 is not allowed
validate_tau(0, length(DataCPSim))
validate_tau(1, length(DataCPSim))
validate_tau(826, length(DataCPSim))
# Duplicates are removed
validate_tau(c(826, 826), length(DataCPSim))
# Anything greater than \eqn{n} (in this case 1096) is not allowed
validate_tau(1096, length(DataCPSim))
validate_tau(1097, length(DataCPSim))
# Fix many problems
validate_tau(c(-4, 0, 1, 4, 5, 5, 824, 1096, 1097, 182384), length(DataCPSim))
Plot GA information
Description
Plot GA information
Usage
## S3 method for class 'tidyga'
plot(x, ...)
Arguments
x |
A |
... |
currently ignored |
Value
A ggplot2::ggplot()
object.
Examples
x <- segment(DataCPSim, method = "ga-coen", maxiter = 5)
plot(x$segmenter)
Diagnostic plots for seg_basket
objects
Description
Diagnostic plots for seg_basket
objects
Usage
plot_best_chromosome(x)
plot_cpt_repeated(x, i = nrow(x$basket))
Arguments
x |
A |
i |
index of basket to show |
Details
seg_basket()
objects contain baskets of candidate changepoint sets.
plot_best_chromosome()
shows how the size of the candidate changepoint
sets change across the generations of evolution.
plot_cpt_repeated()
shows how frequently individual observations appear in
the best candidate changepoint sets in each generation.
Value
A ggplot2::ggplot()
object
Examples
# Segment a time series using Coen's algorithm
x <- segment(DataCPSim, method = "coen", num_generations = 3)
# Plot the size of the sets during the evolution
x |>
as.segmenter() |>
plot_best_chromosome()
# Segment a time series using Coen's algorithm
x <- segment(DataCPSim, method = "coen", num_generations = 3)
# Plot overall frequency of appearance of changepoints
plot_cpt_repeated(x$segmenter)
# Plot frequency of appearance only up to a specific generation
plot_cpt_repeated(x$segmenter, 5)
Plot the intensity of an NHPP fit
Description
Plot the intensity of an NHPP fit
Usage
plot_intensity(x, ...)
Arguments
x |
An NHPP |
... |
currently ignored |
Value
A ggplot2::ggplot()
object
Examples
# Plot the estimated intensity function
plot_intensity(fit_nhpp(DataCPSim, tau = 826))
# Segment a time series using PELT
mod <- segment(bogota_pm, method = "pelt")
# Plot the estimated intensity function for the NHPP model using the
# changepoints found by PELT
plot_intensity(fit_nhpp(bogota_pm, tau = changepoints(mod)))
Extract the regions from a tidycpt object
Description
Extract the regions from a tidycpt object
Usage
regions(x, ...)
## S3 method for class 'mod_cpt'
regions(x, ...)
## S3 method for class 'tidycpt'
regions(x, ...)
Arguments
x |
An object that has regions |
... |
Currently ignored |
Value
A base::factor()
of intervals indicating the region
Examples
cpt <- fit_meanshift_norm(CET, tau = 330)
regions(cpt)
Retrieve parameters from a segmenter
Description
Retrieve parameters from a segmenter
Usage
seg_params(object, ...)
## S3 method for class 'seg_cpt'
seg_params(object, ...)
## S3 method for class 'ga'
seg_params(object, ...)
## S3 method for class 'cpt'
seg_params(object, ...)
## S3 method for class 'cga'
seg_params(object, ...)
## S3 method for class 'segmented'
seg_params(object, ...)
## S3 method for class 'wbs'
seg_params(object, ...)
Arguments
object |
A |
... |
currently ignored |
Details
Most segmenting algorithms have parameters. This function retrieves an informative set of those parameter values.
Value
A named list
of parameters with their values.
See Also
Other segmenter-functions:
as.segmenter()
,
fitness()
,
model_args()
Examples
# Segment a time series using PELT
x <- segment(CET, method = "pelt")
x |>
as.segmenter() |>
seg_params()
Segment a time series using a variety of algorithms
Description
A wrapper function that encapsulates various algorithms for detecting changepoint sets in univariate time series.
Usage
segment(x, method = "null", ...)
## S3 method for class 'tbl_ts'
segment(x, method = "null", ...)
## S3 method for class 'xts'
segment(x, method = "null", ...)
## S3 method for class 'numeric'
segment(x, method = "null", ...)
## S3 method for class 'ts'
segment(x, method = "null", ...)
Arguments
x |
a numeric vector coercible into a stats::ts object |
method |
a character string indicating the algorithm to use. See Details. |
... |
arguments passed to methods |
Details
Currently, segment()
can use the following algorithms, depending
on the value of the method
argument:
-
pelt
: Uses the PELT algorithm as implemented insegment_pelt()
, which wraps eitherchangepoint::cpt.mean()
orchangepoint::cpt.meanvar()
. Thesegmenter
is of classcpt
. -
binseg
: Uses the Binary Segmentation algorithm as implemented bychangepoint::cpt.meanvar()
. Thesegmenter
is of classcpt
. -
segneigh
: Uses the Segmented Neighborhood algorithm as implemented bychangepoint::cpt.meanvar()
. Thesegmenter
is of classcpt
. -
single-best
: Uses the AMOC criteria as implemented bychangepoint::cpt.meanvar()
. Thesegmenter
is of classcpt
. -
wbs
: Uses the Wild Binary Segmentation algorithm as implemented bywbs::wbs()
. Thesegmenter
is of classwbs
. -
segmented
: Uses the segmented algorithm as implemented bysegmented::segmented()
. Thesegmenter
is of classsegmented
. -
cga
: Uses the Genetic algorithm implemented bysegment_cga()
, which wrapschangepointGA::GA()
. Thesegmenter
is of classcga
. -
ga
: Uses the Genetic algorithm implemented bysegment_ga()
, which wrapsGA::ga()
. Thesegmenter
is of classtidyga
. -
ga-shi
: Uses the genetic algorithm implemented bysegment_ga_shi()
, which wrapssegment_ga()
. Thesegmenter
is of classtidyga
. -
ga-coen
: Uses Coen's heuristic as implemented bysegment_ga_coen()
. Thesegmenter
is of classtidyga
. This implementation supersedes the following one. -
coen
: Uses Coen's heuristic as implemented bysegment_coen()
. Thesegmenter
is of classseg_basket()
. Note that this function is deprecated. -
random
: Uses a random basket of changepoints as implemented bysegment_ga_random()
. Thesegmenter
is of classtidyga
. -
manual
: Uses the vector of changepoints in thetau
argument. Thesegmenter
is of class seg_cpt'. -
null
: The default. Uses no changepoints. Thesegmenter
is of class seg_cpt.
Value
An object of class tidycpt.
See Also
changepoint::cpt.meanvar()
, wbs::wbs()
, GA::ga()
,
segment_ga()
Examples
# Segment a time series using PELT
segment(DataCPSim, method = "pelt")
# Segment a time series using PELT and the BIC penalty
segment(DataCPSim, method = "pelt", penalty = "BIC")
# Segment a time series using Binary Segmentation
segment(DataCPSim, method = "binseg", penalty = "BIC")
# Segment a time series using a random changepoint set
segment(DataCPSim, method = "random")
# Segment a time series using a manually-specified changepoint set
segment(DataCPSim, method = "manual", tau = c(826))
# Segment a time series using a null changepoint set
segment(DataCPSim)
Segment a time series using a genetic algorithm
Description
Segmenting functions for various genetic algorithms
Usage
segment_cga(x, ...)
Arguments
x |
A time series |
... |
arguments passed to |
Details
segment_cga()
uses the genetic algorithm in GA::ga()
to "evolve" a random
set of candidate changepoint sets, using the penalized objective function
specified by penalty_fn
.
By default, the normal meanshift
model is fit (see fit_meanshift_norm()
)
and the BIC penalty is applied.
Value
A cga
object. This is just a changepointGA::GA()
object with an additional
slot for data
(the original time series).
Examples
# Segment a time series using a genetic algorithm
res <- segment_cga(CET)
summary(res)
# Segment a time series using changepointGA
x <- segment(CET, method = "cga")
summary(x)
changepoints(x)
Algoritmo genético de Bayesian MDL a un paso
Description
This implementation is deprecated. Please see
segment_ga_coen()
Usage
segment_coen(
x,
num_generations = 50,
nhpp_dist = c("W", "EW", "GGO", "MO", "GO")[1],
vec_dist_a_priori = c("Gamma", "Gamma"),
mat_phi = matrix(c(1, 3, 2, 1.2), ncol = 2),
generation_size = 50,
max_num_cp = 20,
show_progress_bar = TRUE,
...
)
Arguments
x |
an object coercible into a time series object via |
num_generations |
Number of generations to evolve |
nhpp_dist |
toma valores en c("W","EW","GGO","MO","GO") y es el nombre de la función de tasa del NHPP |
vec_dist_a_priori |
vector de los nobmres de las distribuciones a priori que se utilizan; eg c("Gamma","Gamma") y c("Gamma","Gamma","Gamma") |
mat_phi |
matriz cuyos renglones tiene los parámetros de las distribuciones a priori; cada renglón tiene todos los parametros de una distribución |
generation_size |
tamaño de las generaciones |
max_num_cp |
el máximo número de rebases. Este parámetro se ocupa en particular para que todos los cromosomas quepan en una matriz. |
show_progress_bar |
show the progress bar? |
... |
arguments passed to methods |
Value
A cpt_gbmdl
object
Examples
x <- segment_coen(DataCPSim, num_generations = 5)
Segment a time series using a genetic algorithm
Description
Segmenting functions for various genetic algorithms
Usage
segment_ga(
x,
model_fn = fit_meanshift_norm,
penalty_fn = BIC,
model_fn_args = list(),
...
)
segment_ga_shi(x, ...)
segment_ga_coen(x, ...)
segment_ga_random(x, ...)
Arguments
x |
A time series |
model_fn |
A |
penalty_fn |
A function that evaluates the changepoint set returned by
|
model_fn_args |
A |
... |
arguments passed to |
Details
segment_ga()
uses the genetic algorithm in GA::ga()
to "evolve" a random
set of candidate changepoint sets, using the penalized objective function
specified by penalty_fn
.
By default, the normal meanshift
model is fit (see fit_meanshift_norm()
)
and the BIC penalty is applied.
-
segment_ga_shi()
: Shi's algorithm is the algorithm used in doi:10.1175/JCLI-D-21-0489.1. Note that in order to achieve the reported results you have to run the algorithm for a really long time. Pass the valuesmaxiter
= 50000 andrun
= 10000 toGA::ga()
using the dots.
-
segment_ga_coen()
: Coen's algorithm is the one used in doi:10.1007/978-3-031-47372-2_20. Note that the speed of the algorithm is highly sensitive to the size of the changepoint sets under consideration, with large changepoint sets being slow. Consider setting thepopulation
argument toGA::ga()
to improve performance. Coen's algorithm uses thebuild_gabin_population()
function for this purpose by default.
-
segment_ga_random()
: Randomly select candidate changepoint sets. This is implemented as a genetic algorithm with only one generation (i.e.,maxiter = 1
). Note that this function useslog_gabin_population()
by default.
Value
A tidyga
object. This is just a GA::ga()
object with an additional
slot for data
(the original time series) and model_fn_args
(captures
the model_fn
and penalty_fn
arguments).
References
Shi, et al. (2022, doi:10.1175/JCLI-D-21-0489.1)
Taimal, et al. (2023, doi:10.1007/978-3-031-47372-2_20)
See Also
Examples
# Segment a time series using a genetic algorithm
res <- segment_ga(CET, maxiter = 5)
summary(res)
str(res)
plot(res)
# Segment a time series using Shi's algorithm
x <- segment(CET, method = "ga-shi", maxiter = 5)
str(x)
# Segment a time series using Coen's algorithm
y <- segment(CET, method = "ga-coen", maxiter = 5)
changepoints(y)
# Segment a time series using Coen's algorithm and an arbitrary threshold
z <- segment(CET, method = "ga-coen", maxiter = 5,
model_fn_args = list(threshold = 2))
changepoints(z)
## Not run:
# This will take a really long time!
x <- segment(CET, method = "ga-shi", maxiter = 500, run = 100)
changepoints(x)
# This will also take a really long time!
y <- segment(CET, method = "ga", model_fn = fit_lmshift, penalty_fn = BIC,
popSize = 200, maxiter = 5000, run = 1000,
model_fn_args = list(trends = TRUE),
population = build_gabin_population(CET)
)
## End(Not run)
## Not run:
x <- segment(method = "ga-coen", maxiter = 50)
## End(Not run)
x <- segment(CET, method = "random")
Manually segment a time series
Description
Segment a time series by manually inputting the changepoint set
Usage
segment_manual(x, tau, ...)
Arguments
x |
A time series |
tau |
a set of indices representing a changepoint set |
... |
arguments passed to seg_cpt |
Details
Sometimes you want to see how a manually input set of changepoints performs.
This function takes a time series and a changepoint detection set as inputs
and returns a seg_cpt object representing the segmenter.
Note that by default fit_meanshift_norm()
is used to fit the model and
BIC()
is used as the penalized objective function.
Value
A seg_cpt object
Examples
# Segment a time series manually
segment_manual(CET, tau = c(84, 330))
segment_manual(CET, tau = NULL)
Segment a time series using the PELT algorithm
Description
Segmenting functions for the PELT algorithm
Usage
segment_pelt(x, model_fn = fit_meanvar, ...)
Arguments
x |
A time series |
model_fn |
A |
... |
arguments passed to |
Details
This function wraps either changepoint::cpt.meanvar()
or
changepoint::cpt.mean()
.
Value
A cpt
object returned by changepoint::cpt.meanvar()
or
changepoint::cpt.mean()
Examples
# Segment a time series using PELT
res <- segment_pelt(DataCPSim)
res
str(res)
# Segment as time series while specifying a penalty function
segment_pelt(DataCPSim, penalty = "BIC")
# Segment a time series while specifying a meanshift normal model
segment_pelt(DataCPSim, model_fn = fit_meanshift_norm, penalty = "BIC")
Convert changepoint sets to time indices
Description
Convert changepoint sets to time indices
Usage
tau2time(tau, index)
time2tau(cpts, index)
Arguments
tau |
a numeric vector of changepoint indices |
index |
Index of times, typically returned by |
cpts |
Time series observation labels to be converted to indices |
Value
-
tau2time()
: acharacter
of time labels
-
time2tau()
: aninteger
vector of changepoint indices
See Also
Examples
# Recover the years from a set of changepoint indices
tau2time(c(42, 81, 330), index = as_year(time(CET)))
# Recover the changepoint set indices from the years
time2tau(c(1700, 1739, 1988), index = as_year(time(CET)))
Format the coefficients from a linear model as a tibble
Description
Format the coefficients from a linear model as a tibble
Usage
tbl_coef(mod, ...)
Arguments
mod |
An |
... |
currently ignored |
Value
A tibble::tbl_df object containing the fitted coefficients.
Examples
# Convert a time series into a data frame with indices
ds <- data.frame(y = as.ts(CET), t = 1:length(CET))
# Retrieve the coefficients from a null model
tbl_coef(lm(y ~ 1, data = ds))
# Retrieve the coefficients from a two changepoint model
tbl_coef(lm(y ~ (t >= 42) + (t >= 81), data = ds))
# Retrieve the coefficients from a trendshift model
tbl_coef(lm(y ~ poly(t, 1, raw = TRUE) * (t >= 42) + poly(t, 1, raw = TRUE) * (t >= 81), data = ds))
# Retrieve the coefficients from a quadratic model
tbl_coef(lm(y ~ poly(t, 2, raw = TRUE) * (t >= 42) + poly(t, 2, raw = TRUE) * (t >= 81), data = ds))
Simulate time series with known changepoint sets
Description
Simulate time series with known changepoint sets
Usage
test_set(n = 1, sd = 1, seed = NULL)
Arguments
n |
Number of true changepoints in set |
sd |
Standard deviation passed to |
seed |
Value passed to |
Value
A stats::ts()
object
See Also
Examples
x <- test_set()
plot(x)
changepoints(x)
Container class for tidycpt
objects
Description
Container class for tidycpt
objects
Details
Every tidycpt
object contains:
-
segmenter
: The object returned by the underlying changepoint detection algorithm. These can be of arbitrary class. Useas.segmenter()
to retrieve them. -
model
: A model object inheriting from mod_cpt, as created byas.model()
when called on thesegmenter
. -
elapsed_time
: The clock time that passed while the algorithm was running. -
time_index
: If available, the labels for the time indices of the time series.
Value
A tidycpt object.
Examples
# Segment a time series using PELT
x <- segment(CET, method = "pelt")
class(x)
str(x)
Vectors implementation for logLik
Description
Vectors implementation for logLik
Usage
## S3 method for class 'logLik.logLik'
vec_ptype2(x, y, ...)
## S3 method for class 'logLik.logLik'
vec_cast(x, to, ...)
Arguments
x , y |
Vector types. |
... |
These dots are for future extensions and must be empty. |
to |
Type to cast to. If |
Value
A stats::logLik()
vector.
See Also
Examples
a <- logLik(lm(mpg ~ disp, data = mtcars))
b <- logLik(lm(mpg ~ am, data = mtcars))
vec_ptype2(a, b)
c(a, b)
vec_cast(a, b)
Recover the function that created a model
Description
Recover the function that created a model
Usage
whomademe(x, ...)
Arguments
x |
A |
... |
currently ignored |
Details
Model objects (inheriting from mod_cpt) know the name of the function
that created them.
whomademe()
returns that function.
Value
A function
See Also
Other model-fitting:
fit_arima()
,
fit_lmshift()
,
fit_meanshift()
,
fit_meanvar()
,
fit_nhpp()
,
model_args()
,
model_name()
,
new_fun_cpt()
Examples
# Get the function that made a model
f <- whomademe(fit_meanshift_norm(CET, tau = 42))
str(f)