version 2.1.3
SVEMnet implements Self-Validated Ensemble Models (SVEM, Lemkus et al. 2021) and the SVEM whole model test (Karl 2024) using Elastic Net regression via the glmnet package (Friedman et al. 2010). This vignette provides an overview of the package’s functionality and usage.
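The self-validation idea can be sketched in base R. The following is a simplified illustration, not the package's implementation. It assumes the anti-correlated fractional weights described for SVEM (training weights -log(U), validation weights -log(1 - U)), and it swaps in weighted ridge regression on a small, hypothetical penalty grid in place of the elastic net path that SVEMnet obtains from glmnet.

```r
# Minimal base-R sketch of a self-validated ensemble.
# Assumption: weighted ridge regression stands in for the elastic net.
set.seed(123)
X <- scale(as.matrix(iris[, c("Sepal.Width", "Petal.Length", "Petal.Width")]))
y <- iris$Sepal.Length
n <- nrow(X)
Xd <- cbind(Intercept = 1, X)
lambdas <- c(0, 0.01, 0.1, 1, 10)  # hypothetical penalty grid
n_boot <- 50

# Weighted ridge fit; the intercept is left unpenalized.
fit_ridge <- function(w, lambda) {
  P <- diag(c(0, rep(lambda, ncol(Xd) - 1)))
  solve(crossprod(Xd, w * Xd) + P, crossprod(Xd, w * y))
}

boot_coefs <- replicate(n_boot, {
  u <- runif(n)
  w_train <- -log(u)      # fractional training weights
  w_valid <- -log(1 - u)  # anti-correlated validation weights
  fits <- lapply(lambdas, function(l) fit_ridge(w_train, l))
  # Validation-weighted squared error picks the penalty for this bootstrap
  valid_sse <- sapply(fits, function(b) sum(w_valid * (y - Xd %*% b)^2))
  drop(fits[[which.min(valid_sse)]])
})

svem_coef <- rowMeans(boot_coefs)    # ensemble: average over bootstraps
svem_pred <- drop(Xd %*% svem_coef)
```

Averaging the coefficients chosen in each bootstrap is what makes the result an ensemble: no single penalty value is selected for the final model.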
library(SVEMnet)
# Example data
data <- iris
svem_model <- SVEMnet(Sepal.Length ~ ., data = data, relaxed = FALSE, glmnet_alpha = c(1), nBoot = 50)
coef(svem_model)
## Percent of Bootstraps Nonzero
## Sepal.Width 100
## Petal.Length 100
## Petal.Width 92
## Speciesvirginica 88
## Speciesversicolor 84
Generate a plot of actual versus predicted values:
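The plotting call itself does not appear in this text. As a hedged sketch, an actual-versus-predicted plot can be assembled by hand from predict() and base graphics. The object name pred and the axis labels are illustrative choices, and passing the data frame as the second argument of predict() is an assumption.

```r
library(SVEMnet)

# Refit the example model so that this block runs on its own
svem_model <- SVEMnet(Sepal.Length ~ ., data = iris, relaxed = FALSE,
                      glmnet_alpha = c(1), nBoot = 50)

# Assumption: predict() accepts the training data frame as its second argument
pred <- predict(svem_model, iris)

# Actual versus predicted, with a 45-degree reference line
plot(iris$Sepal.Length, pred,
     xlab = "Actual Sepal.Length", ylab = "Predicted Sepal.Length")
abline(0, 1, lty = 2)
```

Points close to the dashed line correspond to observations that the ensemble predicts well.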
Predict outcomes for new data using the predict() function:
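The call that produced the printed predictions is not shown in this text. A sketch of the likely pattern follows, assuming predict() takes the fitted model followed by a data frame with the same columns as the training data. The name new_flowers is hypothetical and refers to three rows picked from iris for illustration.

```r
library(SVEMnet)

# Refit the example model so that this block runs on its own
svem_model <- SVEMnet(Sepal.Length ~ ., data = iris, relaxed = FALSE,
                      glmnet_alpha = c(1), nBoot = 50)

# Hypothetical "new" observations: one flower from each species
new_flowers <- iris[c(1, 51, 101), ]

# Assumption: the data frame is passed as the second argument
predictions <- predict(svem_model, new_flowers)
print(predictions)
```

Because the model is built from random bootstrap weights, the predicted values vary slightly from run to run unless a seed is set.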
## [1] 5.005893 4.751664 4.781659 4.874206 5.056739 5.370057 4.927893 5.026744
## [9] 4.700818 4.901360 5.179281 5.098440 4.778818 4.563729 5.116730 5.480894
## [17] 5.083272 4.978739 5.346366 5.202973 5.170137 5.124973 4.769953 5.037828
## [25] 5.313529 4.895057 5.044132 5.077590 4.955047 4.996748 4.945903 4.972436
## [33] 5.409819 5.361814 4.874206 4.709963 4.934197 5.083893 4.679968 5.026744
## [41] 4.907043 4.296893 4.781659 5.040670 5.462604 4.724510 5.301823 4.853356
## [49] 5.179281 4.904202 6.455221 6.284674 6.520614 5.522888 6.152987 6.135599
## [57] 6.451759 5.153321 6.258141 5.627421 5.093331 5.967893 5.553504 6.302684
## [65] 5.541177 6.189286 6.182982 5.879430 5.776216 5.607191 6.418301 5.777117
## [73] 6.215539 6.306146 6.043052 6.138440 6.323534 6.487156 6.132137 5.398406
## [81] 5.484649 5.440107 5.681729 6.433469 6.182982 6.359212 6.377221 5.809673
## [89] 5.950505 5.624580 5.989365 6.281833 5.702579 5.102475 5.869664 6.049356
## [97] 5.971356 6.043052 4.961924 5.848813 6.947142 6.159724 6.831525 6.647052
## [105] 6.732674 7.333400 5.682163 7.148927 6.587062 7.171376 6.386799 6.303117
## [113] 6.544739 5.959182 6.074800 6.448730 6.626202 7.784708 7.290797 5.942415
## [121] 6.735515 6.040023 7.330558 6.043486 6.840669 7.086375 6.022635 6.196023
## [129] 6.514744 6.895599 6.927534 7.623928 6.487590 6.319187 6.603131 6.920609
## [137] 6.738357 6.677047 6.124327 6.523889 6.585819 6.254491 6.159724 6.878908
## [145] 6.732053 6.275342 5.986336 6.356804 6.622118 6.339416
This is the serial version of the significance test. It is slower than the parallel version, but its code is simpler to read.
test_result <- svem_significance_test(Sepal.Length ~ ., data = data)
print(test_result)
plot(test_result)
SVEM Significance Test p-value:
[1] 0
Whole model test result
Note that there is a parallelized version, svem_significance_test_parallel(), that runs much faster:
# Simulate data
set.seed(1)
n <- 25
X1 <- runif(n)
X2 <- runif(n)
X3 <- runif(n)
X4 <- runif(n)
X5 <- runif(n)
# y depends only on X1 and X2
y <- 1 + X1 + X2 + X1 * X2 + X1^2 + rnorm(n)
data <- data.frame(y, X1, X2, X3, X4, X5)
# Perform the SVEM significance test
test_result <- svem_significance_test_parallel(
  y ~ (X1 + X2 + X3)^2 + I(X1^2) + I(X2^2) + I(X3^2),
  data = data
)
# View the p-value
print(test_result)
SVEM Significance Test p-value:
[1] 0.009399093
test_result2 <- svem_significance_test_parallel(
  y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2),
  data = data
)
# View the p-value
print(test_result2)
SVEM Significance Test p-value:
[1] 0.006475736
# Note that the response does not depend on X4 or X5
test_result3 <- svem_significance_test_parallel(
  y ~ (X4 + X5)^2 + I(X4^2) + I(X5^2),
  data = data
)
# View the p-value
print(test_result3)
SVEM Significance Test p-value:
[1] 0.8968502
# Plot the Mahalanobis distances
plot(test_result, test_result2, test_result3)
Whole Model Test Results for Example 2
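The logic behind a permutation whole-model test can be illustrated with a much simpler stand-in. The sketch below is not the SVEM test: it uses lm() and R^2 as the test statistic, whereas the SVEM test works with the Mahalanobis distances plotted above. The helper perm_p_value() and the data frame sim are hypothetical names, and the data are simulated in the same way as in the example but are not identical to it.

```r
# Plain permutation whole-model test, sketched with lm() and R^2.
# Assumption: this only illustrates the permutation logic, not the SVEM statistic.
set.seed(1)
n <- 25
sim <- data.frame(X1 = runif(n), X2 = runif(n), X4 = runif(n), X5 = runif(n))
sim$y <- 1 + sim$X1 + sim$X2 + sim$X1 * sim$X2 + sim$X1^2 + rnorm(n)

# The response column is assumed to be named "y"
perm_p_value <- function(formula, data, n_perm = 500) {
  r2 <- function(d) summary(lm(formula, data = d))$r.squared
  observed <- r2(data)
  permuted <- replicate(n_perm, {
    d <- data
    d$y <- sample(d$y)  # break any link between response and factors
    r2(d)
  })
  # Share of permuted fits that explain at least as much as the observed fit
  (1 + sum(permuted >= observed)) / (1 + n_perm)
}

p_active   <- perm_p_value(y ~ X1 + X2, sim)  # factors that drive y
p_inactive <- perm_p_value(y ~ X4 + X5, sim)  # factors unrelated to y
```

Shuffling the response keeps its marginal distribution but removes any dependence on the factors, so the permuted fits describe what the statistic looks like when the model has no real signal.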
The package now includes a wrapper for cv.glmnet() for comparing the performance of SVEM with glmnet’s native cross-validation implementation.
Simulations show improved behavior from a relaxed grid search that allows the model to apply a lighter penalty to parameters retained from the initial elastic net fit. This option tends to hurt RMSE on holdout data for cross-validated glmnet, but the SVEM bootstraps average over the additional variability introduced by this option and produce smaller RMSE on holdout data.
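For orientation, the relaxed option can be tried directly in glmnet. This sketch calls cv.glmnet() itself rather than the package's wrapper, whose name is not given in this text. It assumes that glmnet's relax = TRUE argument (available in glmnet 3.0 and later) corresponds to the relaxed fit described above; the 100/50 train/holdout split is a hypothetical choice.

```r
library(glmnet)

set.seed(42)
x <- model.matrix(Sepal.Length ~ ., data = iris)[, -1]  # drop intercept column
y <- iris$Sepal.Length
train <- sample(nrow(iris), 100)                         # hypothetical split

# Cross-validated lasso, with and without the relaxed fit
cv_standard <- cv.glmnet(x[train, ], y[train], alpha = 1)
cv_relaxed  <- cv.glmnet(x[train, ], y[train], alpha = 1, relax = TRUE)

# Root mean squared error on the 50 held-out rows
rmse <- function(pred) sqrt(mean((y[-train] - pred)^2))
rmse_standard <- rmse(predict(cv_standard, newx = x[-train, ], s = "lambda.min"))
rmse_relaxed  <- rmse(predict(cv_relaxed, newx = x[-train, ], s = "lambda.min",
                              gamma = "gamma.min"))
c(standard = rmse_standard, relaxed = rmse_relaxed)
```

A single split like this is too noisy to confirm or refute the simulation findings; it only shows where the relaxed option enters a cross-validated fit.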
Lemkus, T., Gotwalt, C., Ramsey, P., & Weese, M. L. (2021). Self-Validated Ensemble Models for Elastic Net Regression. Chemometrics and Intelligent Laboratory Systems, 219, 104439. DOI: 10.1016/j.chemolab.2021.104439
Karl, A. T. (2024). A Randomized Permutation Whole-Model Test for SVEM. Chemometrics and Intelligent Laboratory Systems, 249, 105122. DOI: 10.1016/j.chemolab.2024.105122
Friedman, J. H., Hastie, T., & Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22. DOI: 10.18637/jss.v033.i01
Gotwalt, C., & Ramsey, P. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Conference.
Ramsey, P., Gaudard, M., & Levin, W. (2021). Accelerating Innovation with Space-Filling Mixture Designs, Neural Networks, and SVEM. JMP Discovery Conference.
Ramsey, P., & Gotwalt, C. (2018). Model Validation Strategies for Designed Experiments Using Bootstrapping Techniques With Applications to Biopharmaceuticals. JMP Discovery Summit Europe.
Ramsey, P., Levin, W., Lemkus, T., & Gotwalt, C. (2021). SVEM: A Paradigm Shift in Design and Analysis of Experiments. JMP Discovery Summit Europe.
Ramsey, P., & McNeill, P. (2023). CMC, SVEM, Neural Networks, DOE, and Complexity: It’s All About Prediction. JMP Discovery Conference.
Karl, A., Wisnowski, J., & Rushing, H. (2022). JMP Pro 17 Remedies for Practical Struggles with Mixture Experiments. JMP Discovery Conference.
Xu, L., Gotwalt, C., Hong, Y., King, C. B., & Meeker, W. Q. (2020). Applications of the Fractional-Random-Weight Bootstrap. The American Statistician, 74(4), 345–358.
Karl, A. T. (2024). SVEMnet: Self-Validated Ensemble Models with Elastic Net Regression. R package.
JMP Help Documentation. Overview of Self-Validated Ensemble Models.