Type: | Package |
Title: | Fast Change Point Detection via Sequential Gradient Descent |
Version: | 0.16.2 |
Description: | Implements fast change point detection algorithm based on the paper "Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis" by Xianyang Zhang, Trisha Dawn https://proceedings.mlr.press/v206/zhang23b.html. The algorithm is based on dynamic programming with pruning and sequential gradient descent. It is able to detect change points a magnitude faster than the vanilla Pruned Exact Linear Time(PELT). The package includes examples of linear regression, logistic regression, Poisson regression, penalized linear regression data, and whole lot more examples with custom cost function in case the user wants to use their own cost function. |
License: | GPL (≥ 3) |
URL: | https://fastcpd.xingchi.li, https://github.com/doccstat/fastcpd |
BugReports: | https://github.com/doccstat/fastcpd/issues |
Depends: | R (≥ 2.10) |
Imports: | glmnet, Matrix, methods, Rcpp (≥ 0.11.0), stats |
Suggests: | ggplot2, gridExtra, knitr, matrixStats, mvtnorm, rmarkdown, testthat (≥ 3.0.0), xml2 |
LinkingTo: | progress, Rcpp, RcppArmadillo, RcppEigen, testthat |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
Config/testthat/start-first: | examples-fastcpd_arima, examples-fastcpd_ts |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-04-25 04:01:10 UTC; doccstat |
Author: | Xingchi Li |
Maintainer: | Xingchi Li <anthony.li.stat.tamu.edu@lixingchi.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-25 04:20:01 UTC |
Bitcoin Market Price (USD)
Description
The average USD market price across major bitcoin exchanges.
Usage
bitcoin
Format
A data frame with 1354 rows and 2 variables:
- date
POSIXct,POSIXt (TZ: "UTC") from 2019-01-02 to 2023-10-28
- price
The average USD market price across major bitcoin exchanges
Source
<https://www.blockchain.com/explorer/charts/market-price>
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
p <- ggplot2::ggplot(bitcoin, ggplot2::aes(x = date, y = price)) +
ggplot2::geom_line()
print(p)
result <- suppressWarnings(fastcpd.garch(
diff(log(bitcoin$price[600:900])), c(1, 1),
beta = "BIC", cost_adjustment = "BIC"
))
summary(result)
bitcoin$date[result@cp_set + 600]
plot(result)
cp_dates <- bitcoin[600 + result@cp_set + 1, "date"]
ggplot2::ggplot(
data = data.frame(
x = bitcoin$date[600:900], y = bitcoin$price[600:900]
),
ggplot2::aes(x = x, y = y)
) +
ggplot2::geom_line(color = "steelblue") +
ggplot2::geom_vline(
xintercept = cp_dates,
color = "red",
linetype = "dotted",
linewidth = 0.5,
alpha = 0.7
) +
ggplot2::labs(
x = "Year",
y = "Bitcoin price in USD"
) +
ggplot2::annotate(
"text",
x = cp_dates,
y = 2000,
label = as.character(cp_dates),
color = "steelblue"
) +
ggplot2::theme_bw()
}
Find change points efficiently
Description
fastcpd()
takes in formulas, data, families and extra
parameters and returns a fastcpd object.
Usage
fastcpd(
formula = y ~ . - 1,
data,
beta = "MBIC",
cost_adjustment = "MBIC",
family = NULL,
cost = NULL,
cost_gradient = NULL,
cost_hessian = NULL,
line_search = c(1),
lower = rep(-Inf, p),
upper = rep(Inf, p),
pruning_coef = 0,
segment_count = 10,
trim = 0.05,
momentum_coef = 0,
multiple_epochs = function(x) 0,
epsilon = 1e-10,
order = c(0, 0, 0),
p = ncol(data) - 1,
variance_estimation = NULL,
cp_only = FALSE,
vanilla_percentage = 0,
warm_start = FALSE,
...
)
Arguments
formula |
A formula object specifying the model to be fitted. The
(optional) response variable should be on the LHS of the formula, while the
covariates should be on the RHS. The naming of variables used in the formula
should be consistent with the column names in the data frame provided in
|
data |
A data frame of dimension |
beta |
Penalty criterion for the number of change points. This parameter
takes a string value of |
cost_adjustment |
Cost adjustment criterion.
It can be |
family |
Family class of the change point model. It can be
|
cost |
Cost function to be used.
Users can specify a loss function using the second format that will be used
to calculate the cost value. In both formats, the input data is a subset of
the original data frame in the form of a matrix
(a matrix with a single column in the case of a univariate data set).
In the first format, the specified cost function directly calculates the cost
value. |
cost_gradient |
Gradient of the custom cost function. Example usage: cost_gradient = function(data, theta) { ... return(gradient) } The gradient function takes two inputs, the first being a matrix representing
a segment of the data, similar to the format used in the |
cost_hessian |
Hessian of the custom loss function. The Hessian function
takes two inputs, the first being a matrix representing a segment of the
data, similar to the format used in the |
line_search |
If a vector of numeric values is provided, a line search
will be performed to find the optimal step size for each update. Detailed
usage of |
lower |
Lower bound for the parameters. Used to specify the domain of
the parameters after each gradient descent step. If not specified, the lower
bound is set to be |
upper |
Upper bound for the parameters. Used to specify the domain of
the parameters after each gradient descent step. If not specified, the upper
bound is set to be |
pruning_coef |
Pruning coefficient $c_0$ used in the pruning step of the
PELT algorithm with the default value 0. If |
segment_count |
An initial guess of the number of segments. If not specified, the initial guess of the number of segments is 10. The initial guess affects the initial estimates of the parameters in SeGD. |
trim |
Trimming for the boundary change points so that a change point
close to the boundary will not be counted as a change point. This
parameter also specifies the minimum distance between two change points.
If several change points have mutual distances smaller than
|
momentum_coef |
Momentum coefficient to be applied to each update. This parameter is used when the loss function is bad-shaped so that maintaining a momentum from previous update is desired. Default value is 0, meaning the algorithm doesn't maintain a momentum by default. |
multiple_epochs |
A function can be specified such that an adaptive
number of multiple epochs can be utilized to improve the algorithm's
performance. multiple_epochs = function(segment_length) { if (segment_length < 100) 1 else 0 } This function will let SeGD perform parameter updates with an additional epoch for each segment with a length less than 100 and no additional epoch for segments with lengths greater or equal to 100. |
epsilon |
Epsilon to avoid numerical issues. Only used for the Hessian computation in Logistic Regression and Poisson Regression. |
order |
Order of the AR( |
p |
Number of covariates in the model. If not specified, the number of
covariates will be inferred from the data, i.e.,
|
variance_estimation |
An estimate of the variance / covariance matrix for the data. If not specified, the variance / covariance matrix will be estimated using the data. |
cp_only |
If |
vanilla_percentage |
The parameter |
warm_start |
If |
... |
Other parameters for specific models.
|
Value
A fastcpd object.
Gallery
https://github.com/doccstat/fastcpd/tree/main/tests/testthat/examples
References
Xingchi Li, Xianyang Zhang (2024). “fastcpd: Fast Change Point Detection in R.” arXiv:2404.05933, https://arxiv.org/abs/2404.05933.
Xianyang Zhang, Trisha Dawn (2023). “Sequential Gradient Descent and Quasi-Newton's Method for Change-Point Analysis.” In Ruiz, Francisco, Dy, Jennifer, van de Meent, Jan-Willem (eds.), Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, volume 206 series Proceedings of Machine Learning Research, 1129-1143.
See Also
fastcpd.family for the family-specific function;
plot.fastcpd()
for plotting the results,
summary.fastcpd()
for summarizing the results.
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 200
p <- 4
d <- 2
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta_1 <- matrix(runif(8, -3, -1), nrow = p)
theta_2 <- matrix(runif(8, -1, 3), nrow = p)
y <- rbind(
x[1:125, ] %*% theta_1 + mvtnorm::rmvnorm(125, rep(0, d), 3 * diag(d)),
x[126:n, ] %*% theta_2 + mvtnorm::rmvnorm(75, rep(0, d), 3 * diag(d))
)
result_mlm <- fastcpd(
cbind(y.1, y.2) ~ . - 1, cbind.data.frame(y = y, x = x), family = "lm"
)
summary(result_mlm)
}
if (
requireNamespace("mvtnorm", quietly = TRUE) &&
requireNamespace("stats", quietly = TRUE)
) {
set.seed(1)
n <- 400 + 300 + 500
p <- 5
x <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = diag(p))
theta <- rbind(
mvtnorm::rmvnorm(1, mean = rep(0, p - 3), sigma = diag(p - 3)),
mvtnorm::rmvnorm(1, mean = rep(5, p - 3), sigma = diag(p - 3)),
mvtnorm::rmvnorm(1, mean = rep(9, p - 3), sigma = diag(p - 3))
)
theta <- cbind(theta, matrix(0, 3, 3))
theta <- theta[rep(seq_len(3), c(400, 300, 500)), ]
y_true <- rowSums(x * theta)
factor <- c(
2 * stats::rbinom(400, size = 1, prob = 0.95) - 1,
2 * stats::rbinom(300, size = 1, prob = 0.95) - 1,
2 * stats::rbinom(500, size = 1, prob = 0.95) - 1
)
y <- factor * y_true + stats::rnorm(n)
data <- cbind.data.frame(y, x)
huber_threshold <- 1
huber_loss <- function(data, theta) {
residual <- data[, 1] - data[, -1, drop = FALSE] %*% theta
indicator <- abs(residual) <= huber_threshold
sum(
residual^2 / 2 * indicator +
huber_threshold * (
abs(residual) - huber_threshold / 2
) * (1 - indicator)
)
}
huber_loss_gradient <- function(data, theta) {
residual <- c(data[nrow(data), 1] - data[nrow(data), -1] %*% theta)
if (abs(residual) <= huber_threshold) {
-residual * data[nrow(data), -1]
} else {
-huber_threshold * sign(residual) * data[nrow(data), -1]
}
}
huber_loss_hessian <- function(data, theta) {
residual <- c(data[nrow(data), 1] - data[nrow(data), -1] %*% theta)
if (abs(residual) <= huber_threshold) {
outer(data[nrow(data), -1], data[nrow(data), -1])
} else {
0.01 * diag(length(theta))
}
}
huber_regression_result <- fastcpd(
formula = y ~ . - 1,
data = data,
beta = (p + 1) * log(n) / 2,
cost = huber_loss,
cost_gradient = huber_loss_gradient,
cost_hessian = huber_loss_hessian
)
summary(huber_regression_result)
}
set.seed(1)
p <- 1
x <- matrix(rnorm(375 * p, 0, 1), ncol = p)
theta <- rbind(rnorm(p, 0, 1), rnorm(p, 2, 1))
y <- c(
rbinom(200, 1, 1 / (1 + exp(-x[1:200, ] %*% theta[1, , drop = FALSE]))),
rbinom(175, 1, 1 / (1 + exp(-x[201:375, ] %*% theta[2, , drop = FALSE])))
)
data <- data.frame(y = y, x = x)
result_builtin <- suppressWarnings(fastcpd.binomial(data))
logistic_loss <- function(data, theta) {
x <- data[, -1, drop = FALSE]
y <- data[, 1]
u <- x %*% theta
nll <- -y * u + log(1 + exp(u))
nll[u > 10] <- -y[u > 10] * u[u > 10] + u[u > 10]
sum(nll)
}
logistic_loss_gradient <- function(data, theta) {
x <- data[nrow(data), -1, drop = FALSE]
y <- data[nrow(data), 1]
c(-(y - 1 / (1 + exp(-x %*% theta)))) * x
}
logistic_loss_hessian <- function(data, theta) {
x <- data[nrow(data), -1]
prob <- 1 / (1 + exp(-x %*% theta))
(x %o% x) * c((1 - prob) * prob)
}
result_custom <- fastcpd(
formula = y ~ . - 1,
data = data,
epsilon = 1e-5,
cost = logistic_loss,
cost_gradient = logistic_loss_gradient,
cost_hessian = logistic_loss_hessian
)
result_builtin@cp_set
result_custom@cp_set
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 480
p_true <- 6
p <- 50
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta_0 <- rbind(
runif(p_true, -5, -2),
runif(p_true, -3, 3),
runif(p_true, 2, 5),
runif(p_true, -5, 5)
)
theta_0 <- cbind(theta_0, matrix(0, ncol = p - p_true, nrow = 4))
y <- c(
x[1:80, ] %*% theta_0[1, ] + rnorm(80, 0, 1),
x[81:200, ] %*% theta_0[2, ] + rnorm(120, 0, 1),
x[201:320, ] %*% theta_0[3, ] + rnorm(120, 0, 1),
x[321:n, ] %*% theta_0[4, ] + rnorm(160, 0, 1)
)
small_lasso_data <- cbind.data.frame(y, x)
result_no_vp <- fastcpd.lasso(
small_lasso_data,
beta = "BIC",
cost_adjustment = NULL,
pruning_coef = 0
)
summary(result_no_vp)
result_20_vp <- fastcpd.lasso(
small_lasso_data,
beta = "BIC",
cost_adjustment = NULL,
vanilla_percentage = 0.2,
pruning_coef = 0
)
summary(result_20_vp)
}
An S4 class to store the output created with fastcpd()
Description
This S4 class stores the output from fastcpd()
and
fastcpd.family. A fastcpd object consist
of several slots including the call to fastcpd()
, the data used, the
family of the model, the change points, the cost values, the residuals, the
estimated parameters and a boolean indicating whether the model was fitted
with only change points or with change points and parameters, which you can
select using @
.
Slots
call
The call of the function.
data
The data passed to the function.
order
The order of the time series model.
family
The family of the model.
cp_set
The set of change points.
cost_values
The cost function values for each segment.
residuals
The residuals of the model with change points. Used only for built-in families.
thetas
The estimated parameters for each segment. Used only for built-in families.
cp_only
A boolean indicating whether
fastcpd()
was run to return only the change points or the change points with the estimated parameters and cost values for each segment.
Find change points efficiently in AR(p
) models
Description
fastcpd_ar()
and fastcpd.ar()
are
wrapper functions of fastcpd()
to find change points in
AR(p
) models. The function is similar to fastcpd()
except that
the data is by default a one-column matrix or univariate vector
and thus a formula is not required here.
Usage
fastcpd_ar(data, order = 0, ...)
fastcpd.ar(data, order = 0, ...)
Arguments
data |
A numeric vector, a matrix, a data frame or a time series object. |
order |
A positive integer specifying the order of the AR model. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <- fastcpd.ar(x[3 + seq_len(n)], 3)
summary(result)
plot(result)
set.seed(1)
n <- 1000
x <- rep(0, n + 3)
for (i in 1:600) {
x[i + 3] <- 0.6 * x[i + 2] - 0.2 * x[i + 1] + 0.1 * x[i] + rnorm(1, 0, 3)
}
for (i in 601:1000) {
x[i + 3] <- 0.3 * x[i + 2] + 0.4 * x[i + 1] + 0.2 * x[i] + rnorm(1, 0, 3)
}
result <-
fastcpd.ar(x[3 + seq_len(n)], 3, beta = "MDL", cost_adjustment = "MDL")
summary(result)
plot(result)
Find change points efficiently in
ARIMA(p
, d
, q
) models
Description
fastcpd_arima()
and fastcpd.arima()
are
wrapper functions of fastcpd()
to find change points in
ARIMA(p
, d
, q
) models.
The function is similar to fastcpd()
except that the data is by default a one-column matrix or univariate vector
and thus a formula is not required here.
Usage
fastcpd_arima(data, order = 0, ...)
fastcpd.arima(data, order = 0, ...)
Arguments
data |
A numeric vector, a matrix, a data frame or a time series object. |
order |
A vector of length three specifying the order of the ARIMA model. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
n <- 801
w <- rnorm(n + 1, 0, 3)
dx <- rep(0, n + 1)
x <- rep(0, n + 1)
for (i in 1:400) {
dx[i + 1] <- 0.9 * dx[i] + w[i + 1] - 0.1 * w[i]
x[i + 1] <- x[i] + dx[i + 1]
}
for (i in 401:n) {
dx[i + 1] <- -0.6 * dx[i] + w[i + 1] + 0.3 * w[i]
x[i + 1] <- x[i] + dx[i + 1]
}
result <- fastcpd.arima(
diff(x[1 + seq_len(n)]),
c(1, 0, 1),
segment_count = 3,
include.mean = FALSE
)
summary(result)
plot(result)
Find change points efficiently in ARMA(p
, q
) models
Description
fastcpd_arma()
and fastcpd.arma()
are
wrapper functions of fastcpd()
to find change points in
ARMA(p
, q
) models. The function is similar to fastcpd()
except that the data is by default a one-column matrix or univariate vector
and thus a formula is not required here.
Usage
fastcpd_arma(data, order = c(0, 0), ...)
fastcpd.arma(data, order = c(0, 0), ...)
Arguments
data |
A numeric vector, a matrix, a data frame or a time series object. |
order |
A vector of length two specifying the order of the ARMA model. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
n <- 200
w <- rnorm(n + 3, 0, 3)
x <- rep(0, n + 3)
for (i in 1:150) {
x[i + 3] <- 0.1 * x[i + 2] - 0.3 * x[i + 1] + 0.1 * x[i] +
0.1 * w[i + 2] + 0.5 * w[i + 1] + w[i + 3]
}
for (i in 151:n) {
x[i + 3] <- 0.3 * x[i + 2] + 0.1 * x[i + 1] - 0.3 * x[i] -
0.6 * w[i + 2] - 0.1 * w[i + 1] + w[i + 3]
}
result <- suppressWarnings(
fastcpd.arma(
data = x[3 + seq_len(n)],
order = c(3, 2),
segment_count = 3,
lower = c(rep(-1, 3 + 2), 1e-10),
upper = c(rep(1, 3 + 2), Inf),
line_search = c(1, 0.1, 1e-2),
beta = "BIC",
cost_adjustment = "BIC",
trim = 0.025
)
)
summary(result)
plot(result)
Find change points efficiently in logistic regression models
Description
fastcpd_binomial()
and fastcpd.binomial()
are
wrapper functions of fastcpd()
to find change points in
logistic regression models. The function is similar to fastcpd()
except that the data is by default a matrix or data frame with the response
variable as the first column and thus a formula is not required here.
Usage
fastcpd_binomial(data, ...)
fastcpd.binomial(data, ...)
Arguments
data |
A matrix or a data frame with the response variable as the first column. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 500
p <- 4
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta <- rbind(rnorm(p, 0, 1), rnorm(p, 2, 1))
y <- c(
rbinom(300, 1, 1 / (1 + exp(-x[1:300, ] %*% theta[1, ]))),
rbinom(200, 1, 1 / (1 + exp(-x[301:n, ] %*% theta[2, ])))
)
result <- suppressWarnings(fastcpd.binomial(cbind(y, x)))
summary(result)
plot(result)
}
Wrapper functions for fastcpd
Description
Wrapper functions for fastcpd to find change points in various models.
See Also
fastcpd.mean()
, fastcpd.variance()
, fastcpd.mv()
,
fastcpd.meanvariance()
for basic statistics change models;
fastcpd.lm()
, fastcpd.binomial()
, fastcpd.poisson()
,
fastcpd.lasso()
for regression coefficients change models;
fastcpd.ar()
, fastcpd.var()
, fastcpd.arima()
, fastcpd.arma()
,
fastcpd.garch()
for change in time series models.
Find change points efficiently in GARCH(p
, q
) models
Description
fastcpd_garch()
and fastcpd.garch()
are
wrapper functions of fastcpd()
to find change points in
GARCH(p
, q
) models. The function is similar to fastcpd()
except that the data is by default a one-column matrix or univariate vector
and thus a formula is not required here.
Usage
fastcpd_garch(data, order = c(0, 0), ...)
fastcpd.garch(data, order = c(0, 0), ...)
Arguments
data |
A numeric vector, a matrix, a data frame or a time series object. |
order |
A positive integer vector of length two specifying the order of the GARCH model. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
n <- 1501
sigma_2 <- rep(1, n + 1)
x <- rep(0, n + 1)
for (i in seq_len(750)) {
sigma_2[i + 1] <- 20 + 0.8 * x[i]^2 + 0.1 * sigma_2[i]
x[i + 1] <- rnorm(1, 0, sqrt(sigma_2[i + 1]))
}
for (i in 751:n) {
sigma_2[i + 1] <- 1 + 0.1 * x[i]^2 + 0.5 * sigma_2[i]
x[i + 1] <- rnorm(1, 0, sqrt(sigma_2[i + 1]))
}
result <- suppressWarnings(
fastcpd.garch(x[-1], c(1, 1), include.mean = FALSE)
)
summary(result)
plot(result)
Find change points efficiently in penalized linear regression models
Description
fastcpd_lasso()
and fastcpd.lasso()
are wrapper
functions of fastcpd()
to find change points in penalized
linear regression models. The function is similar to fastcpd()
except that the data is by default a matrix or data frame with the response
variable as the first column and thus a formula is not required here.
Usage
fastcpd_lasso(data, ...)
fastcpd.lasso(data, ...)
Arguments
data |
A matrix or a data frame with the response variable as the first column. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
if (
requireNamespace("ggplot2", quietly = TRUE) &&
requireNamespace("mvtnorm", quietly = TRUE)
) {
set.seed(1)
n <- 480
p_true <- 5
p <- 50
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta_0 <- rbind(
runif(p_true, -5, -2),
runif(p_true, -3, 3),
runif(p_true, 2, 5),
runif(p_true, -5, 5)
)
theta_0 <- cbind(theta_0, matrix(0, ncol = p - p_true, nrow = 4))
y <- c(
x[1:80, ] %*% theta_0[1, ] + rnorm(80, 0, 1),
x[81:200, ] %*% theta_0[2, ] + rnorm(120, 0, 1),
x[201:320, ] %*% theta_0[3, ] + rnorm(120, 0, 1),
x[321:n, ] %*% theta_0[4, ] + rnorm(160, 0, 1)
)
result <- fastcpd.lasso(
cbind(y, x),
multiple_epochs = function(segment_length) if (segment_length < 30) 1 else 0
)
summary(result)
plot(result)
# Combine estimated thetas with true parameters
thetas <- result@thetas
thetas <- cbind.data.frame(thetas, t(theta_0))
names(thetas) <- c(
"segment 1", "segment 2", "segment 3", "segment 4",
"segment 1 truth", "segment 2 truth", "segment 3 truth", "segment 4 truth"
)
thetas$coordinate <- c(seq_len(p_true), rep("rest", p - p_true))
# Melt the data frame using base R (i.e., convert from wide to long format)
data_cols <- setdiff(names(thetas), "coordinate")
molten <- data.frame(
coordinate = rep(thetas$coordinate, times = length(data_cols)),
variable = rep(data_cols, each = nrow(thetas)),
value = as.vector(as.matrix(thetas[, data_cols]))
)
# Remove the "segment " and " truth" parts to extract the segment number
molten$segment <- gsub("segment ", "", molten$variable)
molten$segment <- gsub(" truth", "", molten$segment)
# Compute height: the numeric value of the segment plus an offset for truth values
molten$height <- as.numeric(gsub("segment.*", "", molten$segment)) +
0.2 * as.numeric(grepl("truth", molten$variable))
# Create a parameter indicator based on whether the variable corresponds to truth or estimation
molten$parameter <- ifelse(grepl("truth", molten$variable), "truth", "estimated")
p <- ggplot2::ggplot() +
ggplot2::geom_point(
data = molten,
ggplot2::aes(
x = value, y = height, shape = coordinate, color = parameter
),
size = 4
) +
ggplot2::ylim(0.8, 4.4) +
ggplot2::ylab("segment") +
ggplot2::theme_bw()
print(p)
}
Find change points efficiently in linear regression models
Description
fastcpd_lm()
and fastcpd.lm()
are wrapper
functions of fastcpd()
to find change points in linear
regression models. The function is similar to fastcpd()
except that
the data is by default a matrix or data frame with the response variable
as the first column and thus a formula is not required here.
Usage
fastcpd_lm(data, ...)
fastcpd.lm(data, ...)
Arguments
data |
A matrix or a data frame with the response variable as the first column. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 300
p <- 4
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta_0 <- rbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
y <- c(
x[1:100, ] %*% theta_0[1, ] + rnorm(100, 0, 3),
x[101:200, ] %*% theta_0[2, ] + rnorm(100, 0, 3),
x[201:n, ] %*% theta_0[3, ] + rnorm(100, 0, 3)
)
result_lm <- fastcpd.lm(cbind(y, x))
summary(result_lm)
plot(result_lm)
}
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 600
p <- 4
d <- 2
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta_1 <- matrix(runif(8, -3, -1), nrow = p)
theta_2 <- matrix(runif(8, -1, 3), nrow = p)
y <- rbind(
x[1:350, ] %*% theta_1 + mvtnorm::rmvnorm(350, rep(0, d), 3 * diag(d)),
x[351:n, ] %*% theta_2 + mvtnorm::rmvnorm(250, rep(0, d), 3 * diag(d))
)
result_mlm <- fastcpd.lm(cbind.data.frame(y = y, x = x), p.response = 2)
summary(result_mlm)
}
Find change points efficiently in mean change models
Description
fastcpd_mean()
and fastcpd.mean()
are wrapper
functions of fastcpd()
to find the mean change. The function is
similar to fastcpd()
except that the data is by default a matrix or
data frame or a vector with each row / element as an observation and thus a
formula is not required here.
Usage
fastcpd_mean(data, ...)
fastcpd.mean(data, ...)
Arguments
data |
A matrix, a data frame or a vector. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
data <- matrix(c(
rnorm(300, mean = 0, sd = 10),
rnorm(400, mean = 50, sd = 10),
rnorm(300, mean = 2, sd = 10)
))
system.time(result <- fastcpd.mean(data))
summary(result)
plot(result)
set.seed(1)
p <- 3
data <- rbind(
matrix(rnorm(p * 3e+5, mean = 0, sd = 10), ncol = p),
matrix(rnorm(p * 4e+5, mean = 50, sd = 10), ncol = p),
matrix(rnorm(p * 3e+5, mean = 2, sd = 10), ncol = p)
)
system.time(result <- fastcpd.mean(data, r.progress = FALSE, cp_only = TRUE))
summary(result)
Find change points efficiently in mean variance change models
Description
fastcpd_meanvariance()
, fastcpd.meanvariance()
,
fastcpd_mv()
, fastcpd.mv()
are wrapper
functions of fastcpd()
to find the meanvariance change. The
function is similar to fastcpd()
except that the data is by
default a matrix or data frame or a vector with each row / element as an
observation and thus a formula is not required here.
Usage
fastcpd_meanvariance(data, ...)
fastcpd.meanvariance(data, ...)
fastcpd_mv(data, ...)
fastcpd.mv(data, ...)
Arguments
data |
A matrix, a data frame or a vector. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
data <- c(
rnorm(3000, 0, 1),
rnorm(1000, 10, 1),
rnorm(3000, 10, 20),
rnorm(1000, 0, 1)
)
system.time(result <- fastcpd.mv(data))
summary(result)
plot(result)
set.seed(1)
p <- 3
data <- if (requireNamespace("mvtnorm", quietly = TRUE)) {
rbind(
mvtnorm::rmvnorm(2e+5, mean = rep(0, p), sigma = diag(1, p)),
mvtnorm::rmvnorm(1e+5, mean = rep(50, p), sigma = diag(1, p)),
mvtnorm::rmvnorm(2e+5, mean = rep(0, p), sigma = diag(100, p)),
mvtnorm::rmvnorm(2e+5, mean = rep(0, p), sigma = diag(1, p)),
mvtnorm::rmvnorm(1e+5, mean = rep(50, p), sigma = diag(1, p)),
mvtnorm::rmvnorm(2e+5, mean = rep(50, p), sigma = diag(100, p))
)
} else {
rbind(
matrix(rnorm(p * 2e+5, mean = 0, sd = 1), ncol = p),
matrix(rnorm(p * 1e+5, mean = 50, sd = 1), ncol = p),
matrix(rnorm(p * 2e+5, mean = 0, sd = 10), ncol = p),
matrix(rnorm(p * 2e+5, mean = 0, sd = 1), ncol = p),
matrix(rnorm(p * 1e+5, mean = 50, sd = 1), ncol = p),
matrix(rnorm(p * 2e+5, mean = 50, sd = 10), ncol = p)
)
}
system.time(result <- fastcpd.mv(data))
summary(result)
result@thetas[seq_len(p), ]
lapply(result@thetas[seq_len(p^2) + p, ], function(thetas) matrix(thetas, p))
Find change points efficiently in Poisson regression models
Description
fastcpd_poisson()
and fastcpd.poisson()
are
wrapper functions of fastcpd()
to find change points in
Poisson regression models. The function is similar to fastcpd()
except that the data is by default a matrix or data frame with the response
variable as the first column and thus a formula is not required here.
Usage
fastcpd_poisson(data, ...)
fastcpd.poisson(data, ...)
Arguments
data |
A matrix or a data frame with the response variable as the first column. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 1100
p <- 3
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
delta <- rnorm(p)
theta_0 <- c(1, 0.3, -1)
y <- c(
rpois(500, exp(x[1:500, ] %*% theta_0)),
rpois(300, exp(x[501:800, ] %*% (theta_0 + delta))),
rpois(200, exp(x[801:1000, ] %*% theta_0)),
rpois(100, exp(x[1001:1100, ] %*% (theta_0 - delta)))
)
result <- fastcpd.poisson(cbind(y, x))
summary(result)
plot(result)
}
Find change points efficiently in time series data
Description
fastcpd_ts()
and fastcpd.ts()
are wrapper functions for
fastcpd()
to find change points in time series data. The function is
similar to fastcpd()
except that the data is a time series and the
family is one of "ar"
, "var"
, "arma"
, "arima"
or
"garch"
.
Usage
fastcpd_ts(data, family = NULL, order = c(0, 0, 0), ...)
fastcpd.ts(data, family = NULL, order = c(0, 0, 0), ...)
Arguments
data |
A numeric vector, a matrix, a data frame or a time series object. |
family |
A character string specifying the family of the time series.
The value should be one of |
order |
A positive integer or a vector of length less than four
specifying the order of the time series. Possible combinations with
|
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
# Set seed for reproducibility
set.seed(1)
# 1. Define Parameters
n <- 500 # Total length of the time series
cp1 <- 200 # First change point at time 200
cp2 <- 350 # Second change point at time 350
# Define MA(2) coefficients for each segment ensuring invertibility
# MA coefficients affect invertibility; to ensure invertibility, the roots of
# the MA polynomial should lie outside the unit circle.
# Segment 1: Time 1 to cp1
theta1_1 <- 0.5
theta2_1 <- -0.3
# Segment 2: Time (cp1+1) to cp2
theta1_2 <- -0.4
theta2_2 <- 0.25
# Segment 3: Time (cp2+1) to n
theta1_3 <- 0.6
theta2_3 <- -0.35
# Function to check invertibility for MA(2)
is_invertible_ma2 <- function(theta1, theta2) {
# The MA(2) polynomial is: 1 + theta1*z + theta2*z^2 = 0
# Compute the roots of the polynomial
roots <- polyroot(c(1, theta1, theta2))
# Invertible if all roots lie outside the unit circle
all(Mod(roots) > 1)
}
# Verify invertibility for each segment
stopifnot(is_invertible_ma2(theta1_1, theta2_1))
stopifnot(is_invertible_ma2(theta1_2, theta2_2))
stopifnot(is_invertible_ma2(theta1_3, theta2_3))
# 2. Simulate White Noise
e <- rnorm(n + 2, mean = 0, sd = 1) # Extra terms to handle lag
# 3. Initialize the Time Series
y <- numeric(n + 2) # Extra terms for initial lags (y[1], y[2] are zero)
# 4. Apply MA(2) Model with Change Points
for (t in 3:(n + 2)) { # Start from 3 to have enough lags for MA(2)
# Determine current segment
current_time <- t - 2 # Adjust for the extra lags
if (current_time <= cp1) {
theta <- c(theta1_1, theta2_1)
} else if (current_time <= cp2) {
theta <- c(theta1_2, theta2_2)
} else {
theta <- c(theta1_3, theta2_3)
}
# Compute MA(2) value
y[t] <- e[t] + theta[1] * e[t - 1] + theta[2] * e[t - 2]
}
# Remove the initial extra terms
y <- y[3:(n + 2)]
time <- 1:n
# Function to get roots data for plotting
get_roots_data <- function(theta1, theta2, segment) {
roots <- polyroot(c(1, theta1, theta2))
data.frame(
Re = Re(roots),
Im = Im(roots),
Distance = Mod(roots),
Segment = segment
)
}
roots_segment1 <- get_roots_data(theta1_1, theta2_1, "Segment 1")
roots_segment2 <- get_roots_data(theta1_2, theta2_2, "Segment 2")
roots_segment3 <- get_roots_data(theta1_3, theta2_3, "Segment 3")
(roots_data <- rbind(roots_segment1, roots_segment2, roots_segment3))
result <- fastcpd.ts(
y,
"arma",
c(0, 2),
lower = c(-2, -2, 1e-10),
upper = c(2, 2, Inf),
line_search = c(1, 0.1, 1e-2),
trim = 0.04
)
summary(result)
plot(result)
Find change points efficiently in VAR(p
) models
Description
fastcpd_var()
and fastcpd.var()
are
wrapper functions of fastcpd_ts()
to find change points in
VAR(p
) models. The function is similar to fastcpd_ts()
except that the data is by default a matrix with row as an observation
and thus a formula is not required here.
Usage
fastcpd_var(data, order = 0, ...)
fastcpd.var(data, order = 0, ...)
Arguments
data |
A matrix, a data frame or a time series object. |
order |
A positive integer specifying the order of the VAR model. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
n <- 300
p <- 2
theta_1 <- matrix(c(-0.3, 0.6, -0.5, 0.4, 0.2, 0.2, 0.2, -0.2), nrow = p)
theta_2 <- matrix(c(0.3, -0.4, 0.1, -0.5, -0.5, -0.2, -0.5, 0.2), nrow = p)
x <- matrix(0, n + 2, p)
for (i in 1:200) {
x[i + 2, ] <- theta_1 %*% c(x[i + 1, ], x[i, ]) + rnorm(p, 0, 1)
}
for (i in 201:n) {
x[i + 2, ] <- theta_2 %*% c(x[i + 1, ], x[i, ]) + rnorm(p, 0, 1)
}
result <- fastcpd.var(x, 2)
summary(result)
Find change points efficiently in variance change models
Description
fastcpd_variance()
and fastcpd.variance()
are wrapper
functions of fastcpd()
to find the variance change. The
function is similar to fastcpd()
except that the data is by
default a matrix or data frame or a vector with each row / element as an
observation and thus a formula is not required here.
Usage
fastcpd_variance(data, ...)
fastcpd.variance(data, ...)
Arguments
data |
A matrix, a data frame or a vector. |
... |
Other arguments passed to |
Value
A fastcpd object.
See Also
Examples
set.seed(1)
data <- c(rnorm(300, 0, 1), rnorm(400, 0, 10), rnorm(300, 0, 1))
system.time(result <- fastcpd.variance(data))
summary(result)
plot(result)
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
p <- 3
data <- rbind(
mvtnorm::rmvnorm(
3e+5, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
),
mvtnorm::rmvnorm(
4e+5, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
),
mvtnorm::rmvnorm(
3e+5, rep(0, p), crossprod(matrix(runif(p^2) * 2 - 1, p))
)
)
result_time <- system.time(
result <- fastcpd.variance(data, r.progress = FALSE, cp_only = TRUE)
)
print(result_time)
summary(result)
}
Occupancy Detection Data Set
Description
Data set for binary classification of room occupancy from temperature, humidity, light and CO2 measurements. Ground-truth occupancy was obtained from time stamped pictures that were taken every minute.
Usage
occupancy
Format
A data frame with 9752 rows and 7 variables:
- date
Character in the format "YYYY-MM-DD hh:mm:ss" from 2015-02-11 14:48:00 to 2015-02-18 09:19:00
- Temperature
Temperature in Celsius
- Humidity
Humidity
- Light
Light
- CO2
CO2
- HumidityRatio
Humidity Ratio
- Occupancy
Binary variable with values 0 (unoccupied) and 1
Source
<https://github.com/LuisM78/Occupancy-detection-data>
Plot the data and the change points for a fastcpd object
Description
Plot the data and the change points for a fastcpd object
Usage
## S3 method for class 'fastcpd'
plot(
x,
color_max_count = Inf,
data_point_alpha = 0.8,
data_point_linewidth = 0.5,
data_point_size = 1,
legend_position = "none",
panel_background = ggplot2::element_blank(),
panel_border = ggplot2::element_rect(fill = NA, colour = "grey20"),
panel_grid_major = ggplot2::element_line(colour = "grey98"),
panel_grid_minor = ggplot2::element_line(colour = "grey98"),
segment_separator_alpha = 0.8,
segment_separator_color = "grey",
segment_separator_linetype = "dashed",
strip_background = ggplot2::element_rect(fill = "grey85", colour = "grey20"),
xlab = NULL,
ylab = NULL,
...
)
## S4 method for signature 'fastcpd,missing'
plot(
x,
color_max_count = Inf,
data_point_alpha = 0.8,
data_point_linewidth = 0.5,
data_point_size = 1,
legend_position = "none",
panel_background = ggplot2::element_blank(),
panel_border = ggplot2::element_rect(fill = NA, colour = "grey20"),
panel_grid_major = ggplot2::element_line(colour = "grey98"),
panel_grid_minor = ggplot2::element_line(colour = "grey98"),
segment_separator_alpha = 0.8,
segment_separator_color = "grey",
segment_separator_linetype = "dashed",
strip_background = ggplot2::element_rect(fill = "grey85", colour = "grey20"),
xlab = NULL,
ylab = NULL,
...
)
Arguments
x |
A fastcpd object. |
color_max_count |
Maximum number of colors to use for the plotting of segments. |
data_point_alpha |
Alpha of the data points. |
data_point_linewidth |
Linewidth of the data points. |
data_point_size |
Size of the data points. |
legend_position |
Position of the legend. |
panel_background |
Background of the panel. |
panel_border |
Border of the panel. |
panel_grid_major |
Major grid lines of the panel. |
panel_grid_minor |
Minor grid lines of the panel. |
segment_separator_alpha |
Alpha of the segment separator lines. |
segment_separator_color |
Color of the segment separator lines. |
segment_separator_linetype |
Linetype of the segment separator lines. |
strip_background |
Background of the strip. |
xlab |
Label for the x-axis. |
ylab |
Label for the y-axis. |
... |
Ignored. |
Value
No return value, called for plotting.
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
p <- 1
x <- mvtnorm::rmvnorm(300, rep(0, p), diag(p))
theta_0 <- matrix(c(1, -1, 0.5))
y <- c(
x[1:100, ] * theta_0[1, ] + rnorm(100, 0, 1),
x[101:200, ] * theta_0[2, ] + rnorm(100, 0, 1),
x[201:300, ] * theta_0[3, ] + rnorm(100, 0, 1)
)
result <- fastcpd.lm(cbind(y, x))
summary(result)
plot(result)
}
Print the call and the change points for a fastcpd object
Description
Print the call and the change points for a fastcpd object
Usage
## S3 method for class 'fastcpd'
print(x, ...)
## S4 method for signature 'fastcpd'
print(x, ...)
Arguments
x |
A fastcpd object. |
... |
Ignored. |
Value
Return a (temporarily) invisible copy of the fastcpd object. Called primarily for printing the change points in the model.
Show the available methods for a fastcpd object
Description
Show the available methods for a fastcpd object
Usage
## S3 method for class 'fastcpd'
show(object)
## S4 method for signature 'fastcpd'
show(object)
Arguments
object |
A fastcpd object. |
Value
No return value, called for showing a list of available methods for a fastcpd object.
Show the summary of a fastcpd object
Description
Show the summary of a fastcpd object
Usage
## S3 method for class 'fastcpd'
summary(object, ...)
## S4 method for signature 'fastcpd'
summary(object, ...)
Arguments
object |
A fastcpd object. |
... |
Ignored. |
Value
Return a (temporarily) invisible copy of the fastcpd object. Called primarily for printing the summary of the model including the call, the change points, the cost values and the estimated parameters.
Transcription Profiling of 57 Human Bladder Carcinoma Samples
Description
Transcriptome analysis of 57 bladder carcinomas on Affymetrix HG-U95A and HG-U95Av2 microarrays
Usage
transcriptome
Format
A data frame with 2215 rows and 43 variables:
- 3
Individual 3
- 4
Individual 4
- 5
Individual 5
- 6
Individual 6
- 7
Individual 7
- 8
Individual 8
- 9
Individual 9
- 10
Individual 10
- 14
Individual 14
- 15
Individual 15
- 16
Individual 16
- 17
Individual 17
- 18
Individual 18
- 19
Individual 19
- 21
Individual 21
- 22
Individual 22
- 24
Individual 24
- 26
Individual 26
- 28
Individual 28
- 30
Individual 30
- 31
Individual 31
- 33
Individual 33
- 34
Individual 34
- 35
Individual 35
- 36
Individual 36
- 37
Individual 37
- 38
Individual 38
- 39
Individual 39
- 40
Individual 40
- 41
Individual 41
- 42
Individual 42
- 43
Individual 43
- 44
Individual 44
- 45
Individual 45
- 46
Individual 46
- 47
Individual 47
- 48
Individual 48
- 49
Individual 49
- 50
Individual 50
- 51
Individual 51
- 53
Individual 53
- 54
Individual 54
- 57
Individual 57
Source
<https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-TABM-147>
<https://github.com/cran/ecp/tree/master/data>
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
result <- fastcpd.mean(transcriptome$"10", trim = 0.005)
summary(result)
plot(result)
result_all <- fastcpd.mean(
transcriptome,
beta = (ncol(transcriptome) + 1) * log(nrow(transcriptome)) / 2 * 5,
trim = 0
)
plots <- lapply(
seq_len(ncol(transcriptome)), function(i) {
ggplot2::ggplot(
data = data.frame(
x = seq_along(transcriptome[, i]), y = transcriptome[, i]
),
ggplot2::aes(x = x, y = y)
) +
ggplot2::geom_line(color = "steelblue") +
ggplot2::geom_vline(
xintercept = result_all@cp_set,
color = "red",
linetype = "dotted",
linewidth = 0.5,
alpha = 0.7
) +
ggplot2::theme_void()
}
)
if (requireNamespace("gridExtra", quietly = TRUE)) {
gridExtra::grid.arrange(grobs = plots, ncol = 1, nrow = ncol(transcriptome))
}
}
UK Seatbelts Data
Description
Road Casualties in Great Britain 1969–84.
Usage
uk_seatbelts
Format
uk_seatbelts
is a multiple time series, with columns
- DriversKilled
car drivers killed.
- front
front-seat passengers killed or seriously injured.
- rear
rear-seat passengers killed or seriously injured.
- kms
distance driven.
- PetrolPrice
petrol price.
- VanKilled
number of van (‘light goods vehicle’) drivers.
- law
0/1: was the law in effect that month?
Source
R package datasets
Examples
if (requireNamespace("ggplot2", quietly = TRUE)) {
result_ar <- fastcpd.ar(diff(uk_seatbelts[, "drivers"], 12), 1, beta = "BIC")
summary(result_ar)
plot(result_ar)
result_lm <- suppressMessages(fastcpd.lm(
diff(uk_seatbelts[, c("drivers", "kms", "PetrolPrice", "law")], lag = 12)
))
# Compute change point dates:
cp_dates <- as.POSIXlt(as.Date("1969-01-01", format = "%Y-%m-%d"))
cp_dates$mon <- cp_dates$mon + (1 + result_lm@cp_set + 12)
cp_dates <- as.Date(cp_dates)
# Convert the time series to Date objects:
# For a monthly ts object, extract year and month manually.
time_vals <- time(uk_seatbelts)
years <- floor(time_vals)
months <- round((time_vals - years) * 12 + 1)
dates <- as.Date(paste(years, months, "01", sep = "-"), format = "%Y-%m-%d")
# Prepare the data frame for plotting
# 'color' is defined similarly to the original code.
uk_seatbelts_df <- data.frame(
dates = dates,
drivers = as.numeric(uk_seatbelts[, "drivers"]),
color = as.factor((dates < cp_dates[1]) + (dates < cp_dates[2]))
)
p <- ggplot2::ggplot(
data = uk_seatbelts_df,
ggplot2::aes(x = dates, y = drivers, color = color)
) +
ggplot2::geom_line() +
ggplot2::geom_vline(
xintercept = cp_dates,
linetype = "dashed",
color = "red"
) +
ggplot2::scale_x_date(date_labels = "%b %Y", date_breaks = "1 year") +
ggplot2::annotate(
"text",
x = cp_dates,
y = 1025,
label = format(cp_dates, "%b %Y"),
color = "blue"
) +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "none")
print(p)
}
Variance estimation for ARMA model with change points
Description
Estimate the variance for each block and then take the average.
Usage
variance_arma(data, p, q, max_order = p * q)
variance.arma(data, p, q, max_order = p * q)
Arguments
data |
A one-column matrix or a vector. |
p |
The order of the autoregressive part. |
q |
The order of the moving average part. |
max_order |
The maximum order of the AR model to consider. |
Value
A numeric value representing the variance.
Examples
set.seed(1)
n <- 300
w <- rnorm(n + 3, 0, 10)
x <- rep(0, n + 3)
for (i in 1:200) {
x[i + 3] <- 0.1 * x[i + 2] - 0.3 * x[i + 1] + 0.1 * x[i] +
0.1 * w[i + 2] + 0.5 * w[i + 1] + w[i + 3]
}
for (i in 201:n) {
x[i + 3] <- 0.3 * x[i + 2] + 0.1 * x[i + 1] - 0.3 * x[i] -
0.6 * w[i + 2] - 0.1 * w[i + 1] + w[i + 3]
}
(result <- variance.arma(x[-seq_len(3)], p = 3, q = 2))
Variance estimation for linear models with change points
Description
Estimate the variance for each block and then take the average.
Usage
variance_lm(data, d = 1, block_size = ncol(data) - d + 1, outlier_iqr = Inf)
variance.lm(data, d = 1, block_size = ncol(data) - d + 1, outlier_iqr = Inf)
Arguments
data |
A matrix or a data frame with the response variable as the first column. |
d |
The dimension of the response variable. |
block_size |
The size of the blocks to use for variance estimation. |
outlier_iqr |
The number of interquartile ranges to use as a threshold for outlier detection. |
Value
A numeric value representing the variance.
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
n <- 300
p <- 4
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta <- rbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
y <- c(
x[1:100, ] %*% theta[1, ] + rnorm(100, 0, 3),
x[101:200, ] %*% theta[2, ] + rnorm(100, 0, 3),
x[201:n, ] %*% theta[3, ] + rnorm(100, 0, 3)
)
(sigma2 <- variance.lm(cbind(y, x)))
set.seed(1)
n <- 300
p <- 4
d <- 2
x <- mvtnorm::rmvnorm(n, rep(0, p), diag(p))
theta <- cbind(c(1, 3.2, -1, 0), c(-1, -0.5, 2.5, -2), c(0.8, 0, 1, 2))
theta <- cbind(theta, theta)
y <- rbind(
x[1:100, ] %*% theta[, 1:2] +
mvtnorm::rmvnorm(100, rep(0, d), diag(3, d)),
x[101:200, ] %*% theta[, 3:4] +
mvtnorm::rmvnorm(100, rep(0, d), diag(3, d)),
x[201:n, ] %*% theta[, 5:6] +
mvtnorm::rmvnorm(100, rep(0, d), diag(3, d))
)
(sigma <- variance.lm(cbind(y, x), d = d))
}
Variance estimation for mean change models
Description
Implement Rice estimator for variance in mean change models.
Usage
variance_mean(data)
variance.mean(data)
Arguments
data |
A matrix or a data frame with data points as each row. |
Value
A matrix representing the variance-covariance matrix or a numeric value representing the variance.
Examples
if (requireNamespace("mvtnorm", quietly = TRUE)) {
set.seed(1)
p <- 3
data <- rbind(
mvtnorm::rmvnorm(300, mean = rep(0, p), sigma = diag(100, p)),
mvtnorm::rmvnorm(400, mean = rep(50, p), sigma = diag(100, p)),
mvtnorm::rmvnorm(300, mean = rep(2, p), sigma = diag(100, p))
)
(sigma <- variance.mean(data))
}
Variance estimation for median change models
Description
Implement Rice estimator.
Usage
variance_median(data)
variance.median(data)
Arguments
data |
A vector of data points. |
Value
A numeric value representing the variance.
Examples
(sigma2 <- variance.median(well_log))
Well-log Dataset from Numerical Bayesian Methods Applied to Signal Processing
Description
This is the well-known well-log dataset used in many changepoint papers obtained from Alan Turing Institute GitHub repository and licensed under the MIT license.
Usage
well_log
Format
A Time-Series of length 4050.
Source
<https://github.com/alan-turing-institute/TCPD>
Examples
result <- fastcpd.mean(well_log, trim = 0.001)
summary(result)
plot(result)
if (requireNamespace("matrixStats", quietly = TRUE)) {
sigma2 <- variance.median(well_log)
median_loss <- function(data) {
sum(abs(data - matrixStats::colMedians(data))) / sqrt(sigma2) / 2
}
result <- fastcpd(
formula = ~ x - 1,
data = cbind.data.frame(x = well_log),
cost = median_loss,
trim = 0.002
)
summary(result)
segment_starts <- c(1, result@cp_set)
segment_ends <- c(result@cp_set - 1, length(well_log))
residual <- NULL
for (segment_index in seq_along(segment_starts)) {
segment <-
well_log[segment_starts[segment_index]:segment_ends[segment_index]]
residual <- c(residual, segment - median(segment))
}
result@residuals <- matrix(residual)
result@data <- data.frame(x = c(well_log))
plot(result)
}