Type: Package
Title: Temporal Encoder-Masked Probabilistic Ensemble Regressor
Version: 1.0.0
Maintainer: Giancarlo Vercellino <giancarlo.vercellino@gmail.com>
Description: Implements a probabilistic ensemble time-series forecaster that combines an auto-encoder with a neural decision forest whose split variables are learned through a differentiable feature-mask layer. Functions are written with 'torch' tensors and provide CRPS (Continuous Ranked Probability Score) training plus mixture-distribution post-processing.
License: GPL-3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
Imports: torch (≥ 0.11.0), purrr (≥ 1.0.1), imputeTS (≥ 3.3), lubridate (≥ 1.9.2), ggplot2 (≥ 3.5.1), scales (≥ 1.3.0)
URL: https://rpubs.com/giancarlo_vercellino/temper
Suggests: knitr, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Depends: R (≥ 2.10)
NeedsCompilation: no
Packaged: 2025-07-10 14:15:49 UTC; gianc
Author: Giancarlo Vercellino [aut, cre, cph]
Repository: CRAN
Date/Publication: 2025-07-15 11:40:02 UTC
Temporal Encoder–Masked Probabilistic Ensemble Regressor
Description
Temper trains and deploys a hybrid forecasting model that couples a temporal auto-encoder (shrinking a sliding window of length 'past' into a latent representation of size 'latent_dim') with a masked neural decision forest (an ensemble of 'n_trees' soft decision trees of depth 'depth', whose feature-level dropout is governed by 'init_prob' and annealed through a Gumbel–Softmax with parameter 'temperature'). Training minimizes a CRPS loss (Continuous Ranked Probability Score) blended with a reconstruction term ('lambda_rec' × MSE), yielding multi-step probabilistic forecasts and a fan chart. Model weights are optimized with Adam (or one of the alternative optimizers), with optional early stopping.
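The exact loss used internally is not exported; purely as a sketch of the idea, a sample-based CRPS estimator blended with the 'lambda_rec'-weighted reconstruction MSE could be written as follows (crps_sample() and composite_loss() are illustrative names, not functions provided by the package):

library(torch)

# Sample-based CRPS: E|X - y| - 0.5 * E|X - X'|, averaged over the batch.
# 'samples' is a (batch x n_draws) tensor of forecast draws, 'y' a length-batch
# tensor of observed targets.
crps_sample <- function(samples, y) {
  term1 <- torch_mean(torch_abs(samples - y$unsqueeze(2)), dim = 2)
  diffs <- samples$unsqueeze(3) - samples$unsqueeze(2)
  term2 <- 0.5 * torch_mean(torch_abs(diffs), dim = c(2, 3))
  torch_mean(term1 - term2)
}

# Forecast CRPS blended with the auto-encoder reconstruction error.
composite_loss <- function(samples, y, x, x_rec, lambda_rec = 0.3) {
  crps_sample(samples, y) + lambda_rec * nnf_mse_loss(x_rec, x)
}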
Usage
temper(
ts,
future,
past,
latent_dim,
n_trees = 30,
depth = 6,
init_prob = 0.8,
temperature = 0.5,
n_bases = 10,
train_rate = 0.7,
epochs = 30,
optimizer = "adam",
lr = 0.005,
batch = 32,
lambda_rec = 0.3,
patience = 15,
verbose = TRUE,
alpha = 0.1,
dates = NULL,
seed = 42
)
Arguments
ts
Numeric vector of length at least past + future. Represents the input time series in levels (not log-returns). Missing values are automatically imputed using na_kalman.
future
Integer. Number of steps ahead to forecast (the forecast horizon).
past
Integer. Length of the sliding input window compressed by the temporal auto-encoder.
latent_dim
Integer. Size of the latent representation produced by the auto-encoder.
n_trees
Integer. Number of soft decision trees in the masked neural decision forest. Default: 30.
depth
Integer. Depth of each soft decision tree. Default: 6.
init_prob
Numeric in (0, 1). Initial probability that a feature is retained by the differentiable feature-mask layer (a sketch of the masking mechanism follows this argument list). Default: 0.8.
temperature
Positive numeric. Temperature parameter for the Gumbel–Softmax distribution used during feature masking. Lower values lead to harder (closer to binary) masks; higher values encourage smoother gradients. Default: 0.5. |
n_bases
Integer. Default: 10.
train_rate
Numeric in (0, 1). Proportion of the data used for training; the remaining share is used for validation. Default: 0.7.
epochs
Positive integer. Maximum number of training epochs. Have a look at the loss plot to decide the right number of epochs. Default: 30. |
optimizer
Character string. Optimizer used for training: one of "adam", "adamw", "sgd", "rprop", "rmsprop", "adagrad", "asgd", "adadelta". Default: "adam".
lr
Positive numeric. Learning rate for the optimizer. Default: 0.005. |
batch
Positive integer. Mini-batch size used during training. Default: 32. |
lambda_rec
Non-negative numeric. Weight applied to the reconstruction loss relative to the probabilistic CRPS forecasting loss. Default: 0.3. |
patience
Positive integer. Number of consecutive epochs without improvement on the validation CRPS before early stopping is triggered. Default: 15. |
verbose
Logical. If TRUE, progress information is printed during training. Default: TRUE.
alpha
Numeric in (0, 1). Controls the width of the predictive interval shown in the fan chart. Default: 0.1.
dates
Optional vector of dates with one entry per observation of ts; when supplied, it is used for the time axis of the forecast plot. Default: NULL.
seed
Optional integer. Used to seed both R and Torch random number generators for reproducibility. Default: 42. |
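The feature-masking mechanism is internal to the package; purely as an illustration of how 'init_prob' and 'temperature' interact in a Gumbel–Softmax (binary concrete) relaxation, a mask could be drawn as follows (gumbel_mask() is a hypothetical helper, not part of temper):

library(torch)

# Relaxed Bernoulli mask over the input features: a higher 'init_prob' favours
# keeping a feature, a lower 'temperature' pushes the soft mask towards hard 0/1.
gumbel_mask <- function(n_features, init_prob = 0.8, temperature = 0.5) {
  logits <- torch_full(n_features, log(init_prob / (1 - init_prob)))
  u <- torch_rand(n_features)
  noise <- torch_log(u) - torch_log(1 - u)   # logistic noise
  torch_sigmoid((logits + noise) / temperature)
}

round(as_array(gumbel_mask(10)), 3)   # one stochastic mask over ten features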
Value
A named list with four components
- 'loss'
A ggplot in which training and validation CRPS are plotted against epoch number, useful for diagnosing over-/under-fitting.
- 'pred_funs'
A length-'future' list. Each element contains four empirical distribution functions (pdf, cdf, icdf, sampler) created by empfun.
- 'plot'
A ggplot object showing the historical series, median forecast and predictive interval. A print-ready fan chart.
- 'time_log'
An object measuring the wall-clock training time.
Author(s)
Maintainer: Giancarlo Vercellino <giancarlo.vercellino@gmail.com> [copyright holder]
See Also
Useful links:
https://rpubs.com/giancarlo_vercellino/temper
Examples
set.seed(2025)
ts <- cumsum(rnorm(250)) # synthetic price series
fit <- temper(ts, future = 3, past = 20, latent_dim = 5, epochs = 2)
# 80 % predictive interval for the 3-step-ahead forecast
pfun <- fit$pred_funs$t3$pfun
pred_interval_80 <- c(pfun(0.1), pfun(0.9))
# Visual diagnostics
print(fit$plot)
print(fit$loss)
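A further sketch (not part of the original examples) shows how the bundled dummy_set could supply calendar dates for the fan chart; settings are kept deliberately small, and the call assumes the 'dates' argument accepts a Date vector aligned with the series:

data(dummy_set)
fit2 <- temper(dummy_set$MSFT.Close, future = 5, past = 30, latent_dim = 8,
               dates = as.Date(dummy_set$dates), epochs = 2)
print(fit2$plot)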
Tech Stock Time Series Dataset
Description
A multivariate dataset containing closing prices for several major tech stocks over time. Source: Yahoo Finance.
Usage
data(dummy_set)
Format
A data frame with 2133 observations of 4 variables:
- dates
Character vector of dates in "YYYY-MM-DD" format.
- TSLA.Close
Numeric. Closing prices for Tesla.
- MSFT.Close
Numeric. Closing prices for Microsoft.
- MARA.Close
Numeric. Closing prices for MARA Holdings.
Examples
data(dummy_set)
plot(as.Date(dummy_set$dates), dummy_set$TSLA.Close, type = "l")