Type: | Package |
Title: | Efficient Outlier Detection for Large Time Series Databases |
Version: | 1.0.1 |
Maintainer: | Pedro Galeano <pedro.galeano@uc3m.es> |
Description: | Programs for detecting and cleaning outliers in single time series and in time series from homogeneous and heterogeneous databases using an Orthogonal Greedy Algorithm (OGA) for saturated linear regression models. The programs implement the procedures presented in the paper entitled "Efficient Outlier Detection for Large Time Series Databases" by Pedro Galeano, Daniel Peña and Ruey S. Tsay (2025), working paper, Universidad Carlos III de Madrid. Version 1.0.1 contains some improvements to the algorithm, so the results may vary slightly compared to those obtained with version 0.0.1. |
License: | GPL-3 |
Encoding: | UTF-8 |
Depends: | R (≥ 4.3.0) |
Imports: | caret (≥ 6.0-94), forecast (≥ 8.22.0), gsarima (≥ 0.1-5), parallel (≥ 3.6.2), parallelly (≥ 1.37.1), robust (≥ 0.7-4), SLBDD (≥ 0.0.4) |
Suggests: | knitr, rmarkdown |
NeedsCompilation: | no |
Packaged: | 2025-02-27 09:31:31 UTC; PGALEANO |
Author: | Pedro Galeano |
Repository: | CRAN |
Date/Publication: | 2025-02-27 09:50:02 UTC |
Detecting and cleaning outliers in a heterogeneous time series database with OGA
Description
Detects and cleans Additive Outliers (AOs) and Level Shifts (LSs) in the time series of a heterogeneous database, i.e. one in which the series may have different definitions, sample sizes and/or frequencies. The function runs in parallel across the computer's CPU cores.
Usage
db_het_oga(Y)
Arguments
Y |
The database, a |
Details
The function applies the single_oga function to each time series in the database to detect outlier effects and remove them from the series. The process runs in parallel across the computer's CPU cores, which reduces the computation time. The function returns a list of ts objects containing the original series cleaned of the effects of the AOs and LSs, together with the location, size and t-statistic of each detected outlier.
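The per-series, parallel pattern described above can be sketched with base R's parallel package. This is only an illustration under stated assumptions: clean_one is a hypothetical stand-in based on a simple median/MAD rule (it is not single_oga), the toy database is simulated, and the cluster size of 2 is arbitrary. How db_het_oga actually sets up its workers is not described here.

```r
library(parallel)

# Hypothetical stand-in for a per-series cleaner (median/MAD rule).
# It is NOT single_oga; it only illustrates the "one series in, one result out" shape.
clean_one <- function(y) {
  v <- as.numeric(y)
  flagged <- which(abs(v - median(v)) > 5 * mad(v))
  y_clean <- y
  y_clean[flagged] <- median(v)   # ts attributes (frequency, start) are kept
  list(y_clean = y_clean, flagged = flagged)
}

# Toy heterogeneous database: different lengths and frequencies
set.seed(1)
Y <- list(ts(rnorm(120), frequency = 12), ts(rnorm(40), frequency = 4))
Y[[1]][50] <- 25                  # inject one additive outlier

# Process each series on a separate worker, then shut the cluster down
cl <- makeCluster(2)
res <- parLapply(cl, Y, clean_one)
stopCluster(cl)

# Collect a list of cleaned ts objects and a per-series count of flagged points
Y_clean <- lapply(res, `[[`, "y_clean")
n_flagged <- vapply(res, function(r) length(r$flagged), integer(1))
```

Because each series is handled independently, the series may differ in length and frequency without any change to the loop.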
Value
n_AOs |
A |
n_LSs |
A |
AOs |
A |
LSs |
A |
Y_clean |
The cleaned database, a |
result |
A message indicating when the procedure has worked correctly or the problem encountered if the procedure stops. |
Note
The computational cost depends on the size of the database and the level of contamination of the series. The function may take several minutes if the database contains hundreds of series with thousands of observations.
Author(s)
Pedro Galeano.
References
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.
See Also
Examples
# Load FREDMDApril19 dataset from the SLBDD package
data("FREDMDApril19",package="SLBDD")
# Define frequency s, the same for all series
s <- 12
# Define a list with the first 10 time series with frequency s
X <- FREDMDApril19[,1:10]
Y <- lapply(seq_len(ncol(X)), function(k) ts(X[,k],frequency=s))
# Apply the function to Y
out_db_het_oga <- db_het_oga(Y)
Detecting and cleaning outliers in a homogeneous time series database with OGA
Description
Detects and cleans Additive Outliers (AOs) and Level Shifts (LSs) in the time series of a homogeneous database, i.e. one in which all series are defined similarly and have the same length and the same frequency. The function runs in parallel across the computer's CPU cores.
Usage
db_hom_oga(Y,s=NULL)
Arguments
Y |
The database, a |
s |
Optional, the time series frequency, i.e., the number of observations per unit of time ( |
Details
The function applies the single_oga function to each time series in the database to detect outlier effects and remove them from the series. The process runs in parallel across the computer's CPU cores, which reduces the computation time. The function returns a matrix containing the original series cleaned of the effects of the AOs and LSs, together with the location, size and t-statistic of each detected outlier.
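For a homogeneous database, the same idea applies column by column, and the cleaned series can be reassembled into a matrix with the shape of the input. The sketch below assumes a hypothetical stand-in cleaner, clean_one (a median/MAD rule, not single_oga), uses simulated data, and runs sequentially for brevity.

```r
# Hypothetical stand-in for a per-series cleaner (median/MAD rule); NOT single_oga.
clean_one <- function(y) {
  v <- as.numeric(y)
  flagged <- abs(v - median(v)) > 5 * mad(v)
  v[flagged] <- median(v[!flagged])
  v
}

# Toy homogeneous database: 3 series, same length, common frequency s
set.seed(2)
s <- 12
Y <- matrix(rnorm(60 * 3), nrow = 60, ncol = 3)
Y[30, 2] <- 20                    # inject one additive outlier in series 2

# Treat every column as a ts with frequency s and clean it;
# vapply binds the results back into a matrix with one column per series
Y_clean <- vapply(
  seq_len(ncol(Y)),
  function(k) clean_one(ts(Y[, k], frequency = s)),
  numeric(nrow(Y))
)
```

Since all series share the same length, the results fit back into a single matrix, which is not possible in general for a heterogeneous database.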
Value
n_AOs |
A |
n_LSs |
A |
AOs |
A |
LSs |
A |
Y_clean |
The cleaned database, a |
result |
A message indicating when the procedure has worked correctly or the problem encountered if the procedure stops. |
Note
The computational cost depends on the size of the database and the level of contamination of the series. The function may take several minutes if the database contains hundreds of series with thousands of observations.
Author(s)
Pedro Galeano.
References
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.
See Also
Examples
# Load FREDMDApril19 dataset from the SLBDD package
data("FREDMDApril19",package="SLBDD")
# Define frequency s
s <- 12
# Apply the procedure to the first 10 time series in FREDMDApril19
Y <- FREDMDApril19[,1:10]
out_db_hom_oga <- db_hom_oga(Y,s=s)
Detect and clean outlying effects in a single time series with OGA
Description
Algorithm for detecting and cleaning additive outliers and level shifts in a single time series with an Orthogonal Greedy Algorithm (OGA).
Usage
single_oga(yt,s=NULL)
Arguments
yt |
A numeric |
s |
Optional, the time series frequency, i.e., the number of observations per unit of time ( |
Details
The program detects Additive Outliers (AOs) and Level Shifts (LSs) in a time series and cleans the series of their effects. It uses the procedure proposed in the paper 'Efficient outlier detection for large time series databases' by Galeano, Peña and Tsay (2025), which consists of three automatic steps. The first step fits a sufficiently high-order AR model to yt using robust regression to obtain an AR representation and a residual series. An Orthogonal Greedy Algorithm (OGA) procedure is then applied to the residual series to identify a set of potential AOs and LSs and to remove their effects from yt. This set is referred to as the first set of potential outliers. The second step identifies and fits an ARIMA or SARIMA model, depending on whether seasonality is detected, to the outlier-adjusted series from the first step and obtains a new residual series. The OGA procedure is then applied to this new residual series to identify a new set of potential AOs and LSs, if any, which forms the second set of potential outliers. The third step combines the potential outliers from the first and second steps, removing any redundancies, to obtain a final set of potential AOs and LSs, and fits an ARIMA (or SARIMA) model jointly with this final set. Any negligible outlying effects are then discarded. Finally, the effects of the remaining AOs and LSs are removed from the observed time series yt to produce an outlier-free time series.
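The greedy selection at the core of the procedure can be illustrated with a toy orthogonal greedy algorithm on a saturated regression, in which every time point contributes a candidate impulse (AO) and a candidate step (LS) regressor. This is a simplified sketch under stated assumptions, not the package's implementation: the input is simulated white noise standing in for a residual series, no AR filtering of the regressors is applied, and the helper oga with its fixed n_steps argument is hypothetical (the stopping rule and thresholds used by single_oga are not reproduced).

```r
set.seed(3)
n <- 100
e <- rnorm(n)                     # stands in for a residual series
e[40] <- e[40] + 12               # additive outlier at t = 40
e[70:n] <- e[70:n] + 10           # level shift starting at t = 70

# Saturated design: one impulse per time point, one step per start time t >= 2
impulses <- diag(n)
steps <- sapply(2:n, function(t0) as.numeric(seq_len(n) >= t0))
X <- cbind(impulses, steps)
colnames(X) <- c(paste0("AO_", 1:n), paste0("LS_", 2:n))

# Hypothetical helper: at each step pick the regressor most correlated with the
# current residual, then refit least squares on all selected regressors
oga <- function(y, X, n_steps) {
  Xs <- scale(X, center = TRUE, scale = FALSE)
  norms <- sqrt(colSums(Xs^2))
  r <- y - mean(y)
  selected <- integer(0)
  for (i in seq_len(n_steps)) {
    score <- abs(drop(crossprod(Xs, r))) / norms
    score[selected] <- -Inf       # never pick the same regressor twice
    selected <- c(selected, which.max(score))
    fit <- lm.fit(cbind(1, X[, selected, drop = FALSE]), y)
    r <- fit$residuals            # residual is orthogonal to the selected set
  }
  colnames(X)[selected]
}

picked <- oga(e, X, n_steps = 2)
print(picked)
```

Refitting on the full selected set after each pick is what makes the algorithm "orthogonal": the residual carries no remaining component along regressors already chosen, so later picks respond only to effects not yet explained.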
Value
yt_clean |
A |
aos |
A |
lss |
A |
Author(s)
Pedro Galeano.
References
Galeano, P., Peña, D. and Tsay, R. S. (2025). Efficient outlier detection for large time series databases. Working paper, Universidad Carlos III de Madrid.
See Also
Examples
## Load FREDMDApril19 dataset from the SLBDD package
data("FREDMDApril19",package="SLBDD")
Y <- FREDMDApril19
## Define time series yt and frequency s
yt <- Y[,1]
s <- 12
## Apply the function to yt
out_single_oga <- single_oga(yt,s=s)