Type: | Package |
Title: | Ensemble Algorithms for Time Series Forecasting with Modeltime |
Version: | 1.0.4 |
Description: | A 'modeltime' extension that implements time series ensemble forecasting methods including model averaging, weighted averaging, and stacking. These techniques are popular methods to improve forecast accuracy and stability. |
URL: | https://business-science.github.io/modeltime.ensemble/, https://github.com/business-science/modeltime.ensemble |
BugReports: | https://github.com/business-science/modeltime.ensemble/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Depends: | modeltime (≥ 1.2.3), modeltime.resample (≥ 0.2.1), R (≥ 3.5) |
Imports: | tune (≥ 0.1.2), rsample, yardstick, workflows (≥ 0.2.1), recipes (≥ 0.1.15), timetk (≥ 2.5.0), tibble, dplyr (≥ 1.0.0), tidyr, purrr, stringr, rlang (≥ 0.1.2), cli, generics, magrittr, tictoc, parallel, doParallel, foreach, glmnet |
Suggests: | gt, dials, utils, earth, testthat, tidymodels, xgboost, lubridate, knitr, rmarkdown |
RoxygenNote: | 7.3.1 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2024-07-16 21:37:51 UTC; mdancho |
Author: | Matt Dancho [aut, cre], Business Science [cph] |
Maintainer: | Matt Dancho <mdancho@business-science.io> |
Repository: | CRAN |
Date/Publication: | 2024-07-19 15:30:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Value
A function call
Creates an Ensemble Model using Mean/Median Averaging
Description
Creates an Ensemble Model using Mean/Median Averaging
Usage
ensemble_average(object, type = c("mean", "median"))
Arguments
object |
A Modeltime Table |
type |
Specify the type of average ("mean" or "median") |
Details
The input to an ensemble_average() model is always a Modeltime Table,
which contains the models that you will ensemble.
Averaging Methods
The average method computes an unweighted average, with type set to either:
- "mean": Performs averaging using mean(x, na.rm = TRUE) to aggregate each underlying model's forecast at each timestamp.
- "median": Performs averaging using stats::median(x, na.rm = TRUE) to aggregate each underlying model's forecast at each timestamp.
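As a rough illustration of what the two averaging types compute, the sketch below aggregates sub-model forecasts across models at each timestamp in base R. The `forecasts` matrix and its values are made up for illustration; this is not the package's internal implementation.

```r
# Hypothetical forecasts: one row per timestamp, one column per sub-model.
forecasts <- cbind(
    model_1 = c(100, 110, 120),
    model_2 = c(104, NA, 126),
    model_3 = c(90, 112, 150)
)

# "mean": average across sub-models at each timestamp, ignoring missing values
ensemble_mean <- apply(forecasts, 1, mean, na.rm = TRUE)

# "median": middle value across sub-models at each timestamp
ensemble_median <- apply(forecasts, 1, stats::median, na.rm = TRUE)

ensemble_mean
ensemble_median
```

The median is less sensitive to a single sub-model that forecasts far from the others, as in the third timestamp of this made-up data.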
Value
A mdl_time_ensemble object.
Examples
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
# Make an ensemble from a Modeltime Table
ensemble_fit <- m750_models %>%
    ensemble_average(type = "mean")

ensemble_fit

# Forecast with the Ensemble
modeltime_table(
    ensemble_fit
) %>%
    modeltime_forecast(
        new_data    = testing(m750_splits),
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(
        .interactive        = FALSE,
        .conf_interval_show = FALSE
    )
Creates a Stacked Ensemble Model from a Model Spec
Description
A 2-stage stacking regressor that follows:
Stage 1: Sub-models are trained and predicted using modeltime.resample::modeltime_fit_resamples().
Stage 2: A meta-learner (model_spec) is trained on the out-of-sample sub-model predictions using ensemble_model_spec().
Usage
ensemble_model_spec(
object,
model_spec,
kfolds = 5,
param_info = NULL,
grid = 6,
control = control_grid()
)
Arguments
object |
A Modeltime Table. Used for ensemble sub-models. |
model_spec |
A Can be either:
|
kfolds |
K-Fold Cross Validation for tuning the Meta-Learner.
Controls the number of folds used in the meta-learner's cross-validation.
Gets passed to |
param_info |
A |
grid |
Grid specification or grid size for tuning the Meta Learner.
Gets passed to |
control |
An object used to modify the tuning process.
Uses |
Details
Stacked Ensemble Process
Start with a Modeltime Table to define your sub-models.
Step 1: Use modeltime.resample::modeltime_fit_resamples() to perform the sub-model resampling procedure.
Step 2: Use ensemble_model_spec() to define and train the meta-learner.
What goes on inside the Meta Learner?
The Meta-Learner Ensembling Process uses the following basic steps:
- Make Cross-Validation Predictions. Cross-validation predictions are made for each sub-model with modeltime.resample::modeltime_fit_resamples(). The out-of-sample sub-model predictions contained in .resample_results are used as the input to the meta-learner.
- Train a Stacked Regressor (Meta-Learner). The sub-model out-of-sample cross-validation predictions are then modeled using a model_spec with options:
  - Tuning: If the model_spec includes tuning parameters via tune::tune(), the meta-learner is hyperparameter tuned using K-Fold Cross Validation. The parameters and grid can be adjusted using kfolds, grid, and param_info.
  - No-Tuning: If the model_spec does not include tuning parameters via tune::tune(), the meta-learner is not hyperparameter tuned and the model is fitted directly to the sub-model predictions.
- Final Model Selection.
  - If tuned, the final model is selected based on RMSE, then retrained on the full set of out-of-sample predictions.
  - If not tuned, the fitted model from Stage 2 is used.
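The no-tuning path can be sketched in base R as follows. This is a minimal illustration on simulated data, not the package's implementation: the `oos` data frame, the `rmse()` helper, and the two simulated sub-models are all hypothetical stand-ins for the out-of-sample predictions that the resampling step would produce.

```r
set.seed(123)

# Simulated actuals and out-of-sample sub-model predictions (hypothetical data)
n      <- 60
actual <- 100 + cumsum(rnorm(n))
oos <- data.frame(
    actual  = actual,
    model_1 = actual + rnorm(n, mean = 2, sd = 1),   # biased upward
    model_2 = actual + rnorm(n, mean = -1, sd = 2)   # biased downward, noisier
)

# Hypothetical helper: root mean squared error
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Meta-learner without tuning: regress the actuals on the sub-model predictions
meta_learner <- lm(actual ~ model_1 + model_2, data = oos)
stacked      <- predict(meta_learner, newdata = oos)

# Compare each sub-model with the stacked prediction on the same rows
rmse_table <- c(
    model_1 = rmse(oos$actual, oos$model_1),
    model_2 = rmse(oos$actual, oos$model_2),
    stacked = rmse(oos$actual, stacked)
)
rmse_table
```

Because the comparison is made on the same rows the meta-learner was fitted to, the stacked RMSE cannot exceed either sub-model's RMSE here; performance on new data should be checked separately.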
Progress
The best way to follow the training process is to set
control = control_grid(verbose = TRUE), which prints progress messages.
Parallelize
Portions of the process can be parallelized. To parallelize, set
up parallelization using tune via one of the backends such as doFuture.
Then set control = control_grid(allow_par = TRUE).
Value
A mdl_time_ensemble object.
Examples
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
library(glmnet)
# Step 1: Make resample predictions for submodels
resamples_tscv <- training(m750_splits) %>%
    time_series_cv(
        assess      = "2 years",
        initial     = "5 years",
        skip        = "2 years",
        slice_limit = 1
    )

submodel_predictions <- m750_models %>%
    modeltime_fit_resamples(
        resamples = resamples_tscv,
        control   = control_resamples(verbose = TRUE)
    )

# Step 2: Metalearner ----

# * No Metalearner Tuning
ensemble_fit_lm <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg() %>% set_engine("lm"),
        control    = control_grid(verbose = TRUE)
    )

ensemble_fit_lm

# * With Metalearner Tuning ----
ensemble_fit_glmnet <- submodel_predictions %>%
    ensemble_model_spec(
        model_spec = linear_reg(
            penalty = tune(),
            mixture = tune()
        ) %>%
            set_engine("glmnet"),
        grid       = 2,
        control    = control_grid(verbose = TRUE)
    )

ensemble_fit_glmnet
Nested Ensemble Average
Description
Creates an Ensemble Model using Mean/Median Averaging in the Modeltime Nested Forecasting Workflow.
Usage
ensemble_nested_average(
object,
type = c("mean", "median"),
keep_submodels = TRUE,
model_ids = NULL,
control = control_nested_fit()
)
Arguments
object |
A nested modeltime object (inherits class |
type |
One of "mean" for mean averaging or "median" for median averaging |
keep_submodels |
Whether or not to keep the submodels in the nested modeltime table results |
model_ids |
A vector of id's ( |
control |
Controls various aspects of the ensembling process. See |
Details
If we start with a nested modeltime table, we can add ensembles.
nested_modeltime_tbl

# Nested Modeltime Table
Trained on: .splits | Model Errors: [0]
# A tibble: 2 x 5
  id    .actual_data       .future_data      .splits         .modeltime_tables
  <fct> <list>             <list>            <list>          <list>
1 1_1   <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
2 1_3   <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
An ensemble can be added to a Nested modeltime table.
ensem <- nested_modeltime_tbl %>%
    ensemble_nested_average(
        type           = "mean",
        keep_submodels = TRUE,
        control        = control_nested_fit(allow_par = FALSE, verbose = TRUE)
    )
We can then verify the model has been added.
ensem %>% extract_nested_modeltime_table()
This produces an ensemble .model_id 3, which is an ensemble of the first two models.
# A tibble: 3 x 6
  id    .model_id .model         .model_desc               .type .calibration_data
  <fct>     <dbl> <list>         <chr>                     <chr> <list>
1 1_1           1 <workflow>     PROPHET                   Test  <tibble [52 x 4]>
2 1_1           2 <workflow>     XGBOOST                   Test  <tibble [52 x 4]>
3 1_1           3 <ensemble [2]> ENSEMBLE (MEAN): 2 MODELS Test  <tibble [52 x 4]>
Additional ensembles can be added by simply adding onto the nested modeltime table.
Notice that we make use of model_ids to make sure it only uses model IDs 1 and 2.
ensem_2 <- ensem %>%
    ensemble_nested_average(
        type           = "median",
        keep_submodels = TRUE,
        model_ids      = c(1, 2),
        control        = control_nested_fit(allow_par = FALSE, verbose = TRUE)
    )
This returns a 4th model that is a median ensemble of the first two models.
ensem_2 %>% extract_nested_modeltime_table()

# A tibble: 4 x 6
  id    .model_id .model         .model_desc                 .type .calibration_data
  <fct>     <dbl> <list>         <chr>                       <chr> <list>
1 1_1           1 <workflow>     PROPHET                     Test  <tibble [52 x 4]>
2 1_1           2 <workflow>     XGBOOST                     Test  <tibble [52 x 4]>
3 1_1           3 <ensemble [2]> ENSEMBLE (MEAN): 2 MODELS   Test  <tibble [52 x 4]>
4 1_1           4 <ensemble [2]> ENSEMBLE (MEDIAN): 2 MODELS Test  <tibble [52 x 4]>
Value
The nested modeltime table with an ensemble model added.
Nested Ensemble Weighted
Description
Creates an Ensemble Model using Weighted Averaging in the Modeltime Nested Forecasting Workflow.
Usage
ensemble_nested_weighted(
object,
loadings,
scale_loadings = TRUE,
metric = "rmse",
keep_submodels = TRUE,
model_ids = NULL,
control = control_nested_fit()
)
Arguments
object |
A nested modeltime object (inherits class |
loadings |
A vector of weights corresponding to the loadings |
scale_loadings |
If TRUE, divides by the sum of the loadings to proportionally weight the submodels. |
metric |
The accuracy metric used to rank models in the test accuracy table.
Loadings are then applied in order from the best model to the worst.
Default: |
keep_submodels |
Whether or not to keep the submodels in the nested modeltime table results |
model_ids |
A vector of id's ( |
control |
Controls various aspects of the ensembling process. See |
Details
If we start with a nested modeltime table, we can add ensembles.
nested_modeltime_tbl

# Nested Modeltime Table
Trained on: .splits | Model Errors: [0]
# A tibble: 2 x 5
  id    .actual_data       .future_data      .splits         .modeltime_tables
  <fct> <list>             <list>            <list>          <list>
1 1_1   <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
2 1_3   <tibble [104 x 2]> <tibble [52 x 2]> <split [52|52]> <mdl_time_tbl [2 x 5]>
An ensemble can be added to a Nested modeltime table.
ensem <- nested_modeltime_tbl %>%
    ensemble_nested_weighted(
        loadings = c(2, 1),
        control  = control_nested_fit(allow_par = FALSE, verbose = TRUE)
    )
We can then verify the model has been added.
ensem %>% extract_nested_modeltime_table()
This produces an ensemble .model_id 3, which is an ensemble of the first two models.
# A tibble: 3 x 6
  id    .model_id .model         .model_desc                   .type .calibration_data
  <fct>     <dbl> <list>         <chr>                         <chr> <list>
1 1_3           1 <workflow>     PROPHET                       Test  <tibble [52 x 4]>
2 1_3           2 <workflow>     XGBOOST                       Test  <tibble [52 x 4]>
3 1_3           3 <ensemble [2]> ENSEMBLE (WEIGHTED): 2 MODELS Test  <tibble [52 x 4]>
We can verify the loadings have been applied correctly. Note that the loadings are applied in order from the best model to the worst, ranked here by RMSE (the default metric), so the model with the lowest RMSE receives the first loading.
ensem %>%
    extract_nested_modeltime_table(1) %>%
    slice(3) %>%
    pluck(".model", 1)
Note that the XGBoost model gets the 66.7% loading and Prophet gets the 33.3% loading. This is because XGBoost has the lower RMSE in this case.
-- Modeltime Ensemble -------------------------------------------
Ensemble of 2 Models (WEIGHTED)

# Modeltime Table
# A tibble: 2 x 6
  .model_id .model     .model_desc .type .calibration_data .loadings
      <int> <list>     <chr>       <chr> <list>                <dbl>
1         1 <workflow> PROPHET     Test  <tibble [52 x 4]>     0.333
2         2 <workflow> XGBOOST     Test  <tibble [52 x 4]>     0.667
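The ranking-then-weighting logic described here can be sketched in base R. The RMSE values in the `accuracy` table are invented for illustration (the source does not report them), and the code is an assumption about the general idea, not the package's internal implementation.

```r
# Hypothetical test-set accuracy for two sub-models (lower RMSE is better).
# These RMSE values are made up for illustration.
accuracy <- data.frame(
    .model_id   = c(1, 2),
    .model_desc = c("PROPHET", "XGBOOST"),
    rmse        = c(12.4, 9.8)
)

loadings <- c(2, 1)  # applied from the best model to the worst

# Rank models by the chosen metric, then hand out loadings in that order
ranked <- accuracy[order(accuracy$rmse), ]
ranked$.loadings <- loadings / sum(loadings)  # mirrors scale_loadings = TRUE

# Restore the original model order for display
ranked <- ranked[order(ranked$.model_id), ]
ranked
```

With these made-up accuracies the lower-RMSE model receives the larger scaled loading (2/3), which matches the pattern in the printed ensemble.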
Value
The nested modeltime table with an ensemble model added.
Creates a Weighted Ensemble Model
Description
Makes an ensemble by applying loadings to weight sub-model predictions.
Usage
ensemble_weighted(object, loadings, scale_loadings = TRUE)
Arguments
object |
A Modeltime Table |
loadings |
A vector of weights corresponding to the loadings |
scale_loadings |
If TRUE, divides by the sum of the loadings to proportionally weight the submodels. |
Details
The input to an ensemble_weighted() model is always a Modeltime Table,
which contains the models that you will ensemble.
Weighting Method
The weighted method applies loadings by multiplying each sub-model's
prediction by its loading (loading x model prediction) to form the ensemble forecast.
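A small base R sketch of the weighting arithmetic, assuming the weighted predictions are summed across sub-models after the loadings are scaled to sum to 1. The `forecasts` matrix is hypothetical, and this is an illustration rather than the package's internal code.

```r
# Hypothetical forecasts: one row per timestamp, one column per sub-model
forecasts <- cbind(
    model_1 = c(100, 110),
    model_2 = c(104, 114),
    model_3 = c(90, 96)
)

loadings <- c(3, 3, 1)

# Mirrors scale_loadings = TRUE: divide by the sum so the weights total 1
weights <- loadings / sum(loadings)

# loading x model prediction for each sub-model, summed at each timestamp
ensemble_weighted_pred <- as.numeric(forecasts %*% weights)
ensemble_weighted_pred
```

Scaling keeps the ensemble forecast on the same scale as the sub-model forecasts; with unscaled loadings such as c(3, 3, 1) the summed result would be inflated by a factor of 7.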
Value
A mdl_time_ensemble object.
Examples
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
library(dplyr)
library(timetk)
# Make an ensemble from a Modeltime Table
ensemble_fit <- m750_models %>%
    ensemble_weighted(
        loadings       = c(3, 3, 1),
        scale_loadings = TRUE
    )

ensemble_fit

# Forecast with the Ensemble
modeltime_table(
    ensemble_fit
) %>%
    modeltime_forecast(
        new_data    = testing(m750_splits),
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(
        .interactive        = FALSE,
        .conf_interval_show = FALSE
    )
Tidy eval helpers
Description
- sym() creates a symbol from a string and syms() creates a list of symbols from a character vector.
- enquo() and enquos() delay the execution of one or several function arguments. enquo() returns a single quoted expression, which is like a blueprint for the delayed computation. enquos() returns a list of such quoted expressions.
- expr() quotes a new expression locally. It is mostly useful to build new expressions around arguments captured with enquo() or enquos(): expr(mean(!!enquo(arg), na.rm = TRUE)).
- as_name() transforms a quoted variable name into a string. Supplying something else than a quoted variable name is an error.
  That's unlike as_label() which also returns a single string but supports any kind of R object as input, including quoted function calls and vectors. Its purpose is to summarise that object into a single label. That label is often suitable as a default name.
  If you don't know what a quoted expression contains (for instance expressions captured with enquo() could be a variable name, a call to a function, or an unquoted constant), then use as_label(). If you know you have quoted a simple variable name, or would like to enforce this, use as_name().
To learn more about tidy eval and how to use these tools, visit the Metaprogramming section of Advanced R.
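A brief sketch of how these helpers fit together, loading rlang directly (where the helpers are defined). The `column_mean()` function is a hypothetical helper written for this illustration, and `mtcars` is the dataset that ships with R.

```r
library(rlang)

# sym(): build a symbol from a string
col <- sym("mpg")

# expr(): build an expression around the symbol, then evaluate it in a data frame
mean_call <- expr(mean(!!col, na.rm = TRUE))
eval_tidy(mean_call, data = mtcars)

# enquo(): capture a function argument without evaluating it
# (column_mean() is a hypothetical helper, not part of any package)
column_mean <- function(data, var) {
    var <- enquo(var)
    eval_tidy(expr(mean(!!var, na.rm = TRUE)), data = data)
}
column_mean(mtcars, mpg)

# as_name() requires a simple variable name; as_label() accepts any expression
as_name(col)
as_label(expr(mean(mpg, na.rm = TRUE)))
```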
Value
Nothing.