Type: | Package |
Title: | Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data |
Version: | 0.1.10 |
Maintainer: | Jonas Striaukas <jonas.striaukas@gmail.com> |
Description: | The 'midasml' package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models. The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and sparse-group LASSO (sg-LASSO) estimator. For more information on the 'midasml' approach see Babii, Ghysels, and Striaukas (2021, JBES forthcoming) <doi:10.1080/07350015.2021.1899933>. The package is equipped with the fast implementation of the sg-LASSO estimator by means of proximal block coordinate descent. High-dimensional mixed frequency time-series data can also be easily manipulated with functions provided in the package. |
BugReports: | https://github.com/jstriaukas/midasml/issues |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | Matrix, R (≥ 3.5.0) |
Imports: | doRNG, doParallel, foreach, graphics, randtoolbox, snow, methods, lubridate, stats |
Encoding: | UTF-8 |
RoxygenNote: | 7.1.2 |
NeedsCompilation: | yes |
Packaged: | 2022-04-28 23:16:47 UTC; Utilisateur |
Author: | Jonas Striaukas [cre, aut], Andrii Babii [aut], Eric Ghysels [aut], Alex Kostrov [ctb] (Contributions to analytical gradients for non-linear low-dimensional MIDAS estimation code) |
Repository: | CRAN |
Date/Publication: | 2022-04-29 07:00:02 UTC |
midasml
Description
Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data
Author(s)
Jonas Striaukas (maintainer) jonas.striaukas@gmail.com, Andrii Babii andrii@email.unc.edu, Eric Ghysels eghysels@unc.edu
ALFRED monthly and quarterly series vintages
Description
ALFRED monthly and quarterly series vintages
Usage
data(alfred_vintages)
Format
A list
objects
Source
Examples
data(alfred_vintages)
i <- 1
alfred_vintages[[i]] # ith variable
Cross-validation fit for panel sg-LASSO
Description
Does k-fold cross-validation for panel data sg-LASSO regression model.
The function runs sglfit nfolds+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the folds omitted.
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
Usage
cv.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10,
foldid, method = c("pooled", "fe"), nf = NULL, parallel = FALSE, ...)
Arguments
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
nfolds |
number of folds of the cv loop. Default set to |
foldid |
the fold assignments used. |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
parallel |
if |
... |
Other arguments that can be passed to sglfit. |
Details
The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is (casemethod='pooled'
) ||y - ια - xβ||2NT + 2λ Ωγ(β),
where ι∈RNT and α is common intercept to all N items or (case
method='fe'
) ||y - Bα - xβ||2NT + 2λ Ωγ(β),
where B = IN⊗ι and ||u||2NT=<u,u>/NT is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
cv.panel.sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10,
standardize = FALSE, intercept = FALSE)
## Not run:
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10,
standardize = FALSE, intercept = FALSE))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10,
standardize = FALSE, intercept = FALSE, parallel = TRUE))
## End(Not run)
Sorts cross-validation output for panel data regressions
Description
Computes cvm
and cvsd
based on cross-validation fit.
Usage
cv.panel.sglpath(outlist, lambda, x, y, foldid, method, ...)
Arguments
outlist |
list of cross-validation fits. |
lambda |
a sequence of |
x |
regressors |
y |
response |
foldid |
the fold assignment |
method |
'pooled' or 'fe'. |
... |
other arguments passed to |
Value
cvm
and cvsd
.
Cross-validation fit for sg-LASSO
Description
Does k-fold cross-validation for sg-LASSO regression model.
The function runs sglfit nfolds+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the folds omitted.
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
Usage
cv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p,
nfolds = 10, foldid, parallel = FALSE, ...)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
nfolds |
number of folds of the cv loop. Default set to |
foldid |
the fold assignments used. |
parallel |
if |
... |
Other arguments that can be passed to sglfit. |
Details
The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is||y - ια - xβ||2T + 2λ Ωγ(β),
where ι∈RTenter> and ||u||2T=<u,u>/T is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
cv.sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE)
## Not run:
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE))
system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE, parallel = TRUE))
## End(Not run)
Sorts cross-validation output
Description
Computes cvm
and cvsd
based on cross-validation fit.
Usage
cv.sglpath(outlist, lambda, x, y, foldid, ...)
Arguments
outlist |
list of cross-validation fits. |
lambda |
a sequence of |
x |
regressors |
y |
response |
foldid |
the fold assignment. |
... |
other arguments passed to |
Value
cvm
and cvsd
.
Identify data frequency
Description
Identify data frequency
Usage
data_freq(DateVec)
Arguments
DateVec |
n by 6 vector format dates: [year,month,day,hour,min,sec] |
Value
a list of arguments that define data frequency. 'period' - length of two consecutive dates. 'unit' - unit of length measure (=1 year;=2 month;=3 day;=4 hour;=5 minutes;=6 seconds)
Match dates
Description
Change the date to the beginning of the month date.
Usage
dateMatch(x, y)
Arguments
x |
date vector to match with y date vector. |
y |
date vector. |
Value
changed date vector.
Author(s)
Jonas Striaukas
Examples
x <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "day")
set.seed(100)
x <- x[-sample(1:336, 100)]
y <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "month")
dateMatch(x,y)
Transform date vector to numeric matrix
Description
Transform date vector to numeric matrix
Usage
date_vec(s)
Arguments
s |
date vector. |
Value
numeric matrix of date vector.
Computes the difference between two dates.
Description
Computes the difference between two dates.
Usage
diff_time_mf(
time1,
time2,
origin,
units = c("auto", "secs", "mins", "hours", "days", "weeks")
)
Arguments
time1 |
date 1. |
time2 |
date 2. |
origin |
date origin. |
units |
units. |
Value
numeric value of difference in two dates.
Gegenbauer polynomials shifted to [a,b]
Description
For a given set of points in X, computes the orthonormal Gegenbauer polynomials basis of L2 [a,b] for a given degree and \alpha
parameter. The Gegenbauer polynomials are a special case of more general Jacobi polynomials. In turn, you may get Legendre polynomials from Gegenbauer by setting \alpha
= 0, or Chebychev's polynomials
by setting \alpha
= 1/2 or -1/2.
Usage
gb(degree, alpha, a = 0, b = 1, jmax = NULL, X = NULL)
Arguments
degree |
polynomial degree. |
alpha |
Gegenbauer polynomials parameter. |
a |
lower shift value (default - 0). |
b |
upper shift value (default - 1). |
jmax |
number of high-frequency lags. |
X |
optional evaluation grid vector. |
Value
Psi weight matrix with Gegenbauer functions upto degree
.
Author(s)
Jonas Striaukas
Examples
degree <- 3
alpha <- 1
jmax <- 66
gb(degree = degree, alpha = alpha, a = 0, b = 1, jmax = jmax)
Information criteria fit for panel sg-LASSO
Description
Does information criteria for panel data sg-LASSO regression model.
The function runs sglfit 1 time; computes the path solution in lambda
sequence.
Solutions for BIC
, AIC
and AICc
information criteria are returned.
Usage
ic.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p,
method = c("pooled","fe"), nf = NULL, ...)
Arguments
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
... |
Other arguments that can be passed to sglfit. |
Details
The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is (casemethod='pooled'
) ||y - ια - xβ||2NT + 2λ Ωγ(β),
where ι∈RNT and α is common intercept to all N items or (case
method='fe'
) ||y - Bα - xβ||2NT + 2λ Ωγ(β),
where B = IN⊗ι and ||u||2NT=<u,u>/NT is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
ic.panel.sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
ic.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE)
Compute the penalty based on chosen information criteria
Description
Compute the penalty based on chosen information criteria
Usage
ic.pen(ic_choice, df, t)
Arguments
ic_choice |
information criteria choice: BIC, AIC or AICc. |
df |
effective degrees of freedom. |
t |
sample size. |
Value
penalty value.
Information criteria fit for sg-LASSO
Description
Does information criteria for sg-LASSO regression model.
The function runs sglfit 1 time; computes the path solution in lambda
sequence.
Solutions for BIC
, AIC
and AICc
information criteria are returned.
Usage
ic.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, ...)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
... |
Other arguments that can be passed to sglfit. |
Details
The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is||y - ια - xβ||2T + 2λ Ωγ(β),
where ι∈RTenter> and ||u||2T=<u,u>/T is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
ic.sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
ic.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE)
Compute the number of lags
Description
Compute the number of lags
Usage
lag_num(x.lag, period, unit)
Arguments
x.lag |
number of high-frequency lags to construct in high-frequency time units. |
period |
high-frequency data period. |
unit |
units. |
Value
numeric value of the number of lags.
Legendre polynomials shifted to [a,b]
Description
For a given set of points in X, computes the orthonormal Legendre polynomials basis of L2 [a,b] for a given degree.
Usage
lb(degree, a = 0, b = 1, jmax = NULL, X = NULL)
Arguments
degree |
polynomial degree. |
a |
lower shift value (default - 0). |
b |
upper shift value (default - 1). |
jmax |
number of high-frequency lags. |
X |
optional evaluation grid vector. |
Value
Psi weight matrix with Legendre functions upto degree
.
Author(s)
Jonas Striaukas
Examples
degree <- 3
jmax <- 66
lb(degree = degree, a = 0, b = 1, jmax = jmax)
SNP500 returns
Description
SNP500 returns
Usage
data(market_ret)
Format
A data.frame
object.a
Source
market_ret
- FRED
Examples
data(market_ret)
market_ret$snp500ret
MIDAS regression
Description
Fits MIDAS regression model with single high-frequency covariate. Options include linear-in-parameters polynomials (e.g. Legendre) or non-linear polynomials (e.g. exponential Almon). Nonlinear polynomial optimization routines are equipped with analytical gradients, which allows fast and accurate optimization.
Usage
midas.ardl(y, x, z = NULL, loss_choice = c("mse","logit"),
poly_choice = c("legendre","expalmon","beta"),
poly_spec = 0, legendre_degree = 3, nbtrials = 500)
Arguments
y |
response variable. Continuous for |
x |
high-frequency covariate lags. |
z |
other lower-frequency covariate(s) or AR lags (both can be supplied in an appended matrix). Either must be supplied. |
loss_choice |
which loss function to fit: |
poly_choice |
which MIDAS lag polynomial function to use: |
poly_spec |
which Beta density function specification to apply (applicable only for |
legendre_degree |
the degree of legendre polynomials (applicable only for |
nbtrials |
number of initial values tried in multistart optimization. Default is set to |
Details
Several polynomial functional forms are available (poly_choice
): -
beta
: Beta polynomial -
expalmon
: exponential Almon polynomial -
legendre
: Legendre polynomials. The ARDL-MIDAS model is:
yt = μ + Σp ρp yt-p + β Σj ωj(θ)xt-1
where μ, β, θ and ρp are model parameters, p is the number of low-frequency lags and ω is the weight function.
Value
midas.ardl object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
z = rnorm(100)
y = rnorm(100)
midas.ardl(y = y, x = x, z = z)
MIDAS data structure
Description
Creates a MIDAS data structure for a single high-frequency covariate and a single low-frequency dependent variable.
Usage
mixed_freq_data(data.y, data.ydate, data.x, data.xdate, x.lag, y.lag,
horizon, est.start, est.end, disp.flag = TRUE)
Arguments
data.y |
n by 1 low-frequency time series data vector. |
data.ydate |
n by 1 low-frequency time series date vector. |
data.x |
m by 1 high-frequency time series data vector. |
data.xdate |
m by 1 high-frequency time series date vector. |
x.lag |
number of high-frequency lags to construct in high-frequency time units. |
y.lag |
number of low-frequency lags to construct in low-frequency time units. |
horizon |
forecast horizon relative to |
est.start |
estimation start date, taken as the first ... . |
est.end |
estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data. |
disp.flag |
display flag to indicate whether or not to display obtained MIDAS data structure in console. |
Value
a list of MIDAS data structure.
Author(s)
Jonas Striaukas
Examples
data(us_rgdp)
rgdp <- us_rgdp$rgdp
payems <- us_rgdp$payems
payems[-1, 2] <- log(payems[-1, 2]/payems[-dim(payems)[1], 2])*100
payems <- payems[-1, ]
rgdp[-1, 2] <- ((rgdp[-1, 2]/rgdp[-dim(rgdp)[1], 2])^4-1)*100
rgdp <- rgdp[-1, ]
est.start <- as.Date("1990-01-01")
est.end <- as.Date("2002-03-01")
mixed_freq_data(rgdp[,2], as.Date(rgdp[,1]), payems[,2],
as.Date(payems[,1]), x.lag = 9, y.lag = 4, horizon = 1,
est.start, est.end, disp.flag = FALSE)
MIDAS data structure
Description
Creates a MIDAS data structure for a single high-frequency covariate based on low-frequency reference date.
Usage
mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag, horizon,
est.start, est.end, disp.flag = TRUE)
Arguments
data.refdate |
n by 1 date vector. |
data.x |
m by 1 high-frequency time series data vector. |
data.xdate |
m by 1 high-frequency time series date vector. |
x.lag |
number of high-frequency lags to construct in high-frequency time units. |
horizon |
forecast horizon relative to |
est.start |
estimation start date, taken as the first ... . |
est.end |
estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data. |
disp.flag |
display flag to indicate whether or not to display obtained MIDAS data strcuture in console. |
Value
a list of midas data structure.
Author(s)
Jonas Striaukas
Examples
data(us_rgdp)
rgdp <- us_rgdp$rgdp
cfnai <- us_rgdp$cfnai
data.refdate <- rgdp$date
data.x <- cfnai$cfnai
data.xdate <- cfnai$date
est.start <- as.Date("1990-01-01")
est.end <- as.Date("2002-03-01")
mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag = 12, horizon = 1,
est.start, est.end, disp.flag = FALSE)
Compute mode of a vector
Description
Compute mode of a vector
Usage
mode_midasml(data)
Arguments
data |
data vector. |
Value
a list of arguments.
Beginning of the month date
Description
Change the date to the beginning of the month date.
Usage
monthBegin(x)
Arguments
x |
date value. |
Value
changed date value.
Author(s)
Jonas Striaukas
Examples
monthBegin(as.Date("2020-05-15"))
End of the month date
Description
Change the date to the end of the month date.
Usage
monthEnd(x)
Arguments
x |
date value. |
Value
changed date value.
Author(s)
Jonas Striaukas
Examples
monthEnd(as.Date("2020-05-15"))
Computes prediction
Description
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
Usage
## S3 method for class 'cv.panel.sglfit'
predict(
object,
newx,
s = c("lam.min", "lam.1se"),
type = c("response"),
method = c("pooled", "fe"),
...
)
Arguments
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'lam.min' and 'lam.1se'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
Details
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda
indices.
Value
The object returned depends on type.
Computes prediction
Description
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
Usage
## S3 method for class 'cv.sglfit'
predict(object, newx, s = c("lam.min", "lam.1se"), type = c("response"), ...)
Arguments
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'lam.min' and 'lam.1se'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
... |
Not used. Other arguments to predict. |
method |
choose between 'single', 'pooled', and 'fe'. |
Details
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda
indices.
Value
The object returned depends on type.
Computes prediction
Description
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
Usage
## S3 method for class 'ic.panel.sglfit'
predict(
object,
newx,
s = c("bic", "aic", "aicc"),
type = c("response"),
method = c("pooled", "fe"),
...
)
Arguments
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'bic', 'aic', and 'aicc'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
Details
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda
indices.
Value
The object returned depends on type.
Computes prediction
Description
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
Usage
## S3 method for class 'ic.sglfit'
predict(object, newx, s = c("bic", "aic", "aicc"), type = c("response"), ...)
Arguments
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
choose between 'bic', 'aic', and 'aicc'. |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
... |
Not used. Other arguments to predict. |
Details
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda
indices.
Value
The object returned depends on type.
Computes prediction
Description
Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.
Usage
## S3 method for class 'sglpath'
predict(
object,
newx,
s = NULL,
type = c("response"),
method = c("single", "pooled", "fe"),
...
)
Arguments
object |
fitted |
newx |
matrix of new values for x at which predictions are to be made. NOTE: |
s |
value(s) of the penalty parameter |
type |
type of prediction required. Only response is available. Gives predicted response for regression problems. |
method |
choose between 'single', 'pooled', and 'fe'. |
... |
Not used. Other arguments to predict. |
Details
s
is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda
indices.
Value
The object returned depends on type.
Regression fit for panel sg-LASSO
Description
Fits panel data sg-LASSO regression model.
The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice. Options include cross-validation and information criteria.
Usage
reg.panel.sgl(x, y, gamma = NULL, gindex, intercept = TRUE,
method_choice = c("ic","cv"), nfolds = 10,
method = c("pooled", "fe"), nf = NULL,
verbose = FALSE, ...)
Arguments
x |
NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors. |
y |
NT by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
intercept |
whether intercept be fitted ( |
method_choice |
choose between |
nfolds |
number of folds of the cv loop. Default set to |
method |
choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects |
nf |
number of fixed effects. Used only if |
verbose |
flag to print information. |
... |
Other arguments that can be passed to |
Details
The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is either (casemethod='pooled'
) ||y - ια - xβ||2NT + 2λ Ωγ(β),
where ι∈RNT and α is common intercept to all N items or (case
method='fe'
) ||y - Bα - xβ||2NT + 2λ Ωγ(β),
where B = IN⊗ι and ||u||2NT=<u,u>/NT is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
reg.panel.sgl object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
reg.panel.sgl(x = x, y = y,
gindex = gindex, gamma = 0.5,
method = "fe", nf = 10,
standardize = FALSE, intercept = FALSE)
Fit for sg-LASSO regression
Description
Fits sg-LASSO regression model.
The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice
. Options include cross-validation and information criteria.
Usage
reg.sgl(x, y, gamma = NULL, gindex, intercept = TRUE,
method_choice = c("tscv","ic","cv"), verbose = FALSE, ...)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
gindex |
p by 1 vector indicating group membership of each covariate. |
intercept |
whether intercept be fitted ( |
method_choice |
choose between |
verbose |
flag to print information. |
... |
Other arguments that can be passed to |
Details
The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is||y - ια - xβ||2T + 2λ Ωγ(β),
where ι∈RTenter> and ||u||2T=<u,u>/T is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
reg.sgl object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
reg.sgl(x = x, y = y, gamma = 0.5, gindex = gindex)
Real GDP release dates
Description
Real GDP release dates
Usage
data(rgdp_dates)
Format
A list
objects
Source
Examples
data(rgdp_dates)
rgdp_dates$Quarter_q # reference quarters in quarters
rgdp_dates$Quarter_m # reference quarters in months
rgdp_dates$Quarter_d # reference quarters in days
rgdp_dates$`First release` # first release date for the reference
rgdp_dates$`Second release` # second release date for the reference
rgdp_dates$`Third release` # third release date for the reference
Real GDP vintages
Description
Real GDP vintages
Usage
data(rgdp_vintages)
Format
A list
objects
Source
Examples
data(rgdp_vintages)
rgdp_vintages$date # dates
rgdp_vintages$time_series # series, q-q annual rate
rgdp_vintages$realtime_period # real time dates
Fits sg-LASSO regression
Description
Fits sg-LASSO regression model.
The function fits sg-LASSO regression model for a sequence of \lambda
tuning parameter and fixed \gamma
tuning parameter. The optimization is based on block coordinate-descent. Optionally, fixed effects are fitted.
Usage
sglfit(x, y, gamma = 1.0, nlambda = 100L, method = c("single", "pooled", "fe"),
nf = NULL, lambda.factor = ifelse(nobs < nvars, 1e-02, 1e-04),
lambda = NULL, pf = rep(1, nvars), gindex = 1:nvars,
dfmax = nvars + 1, pmax = min(dfmax * 1.2, nvars), standardize = FALSE,
intercept = FALSE, eps = 1e-08, maxit = 1000000L, peps = 1e-08)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
gamma |
sg-LASSO mixing parameter. |
nlambda |
number of |
method |
choose between 'single', 'pooled' and 'fe'; 'single' implies standard sg-LASSO regression, 'pooled' forces the intercept to be fitted, 'fe' computes the fixed effects. User needs to input the number of fixed effects |
nf |
number of fixed effects. Used only if |
lambda.factor |
The factor for getting the minimal |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own |
pf |
the ℓ1 penalty factor of length |
gindex |
p by 1 vector indicating group membership of each covariate. |
dfmax |
the maximum number of variables allowed in the model. Useful for very large |
pmax |
the maximum number of coefficients allowed ever to be nonzero. For example, once βi ≠ 0 for some i ∈ [p], no matter how many times it exits or re-enters the model through the path, it will be counted only once. Default is |
standardize |
logical flag for variable standardization, prior to fitting the model sequence. The coefficients are always returned to the original scale. It is recommended to keep |
intercept |
whether intercept be fitted ( |
eps |
convergence threshold for block coordinate descent. Each inner block coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance. Defaults value is |
maxit |
maximum number of outer-loop iterations allowed at fixed lambda values. Default is |
peps |
convergence threshold for proximal map of sg-LASSO penalty. Each loop continues until G group difference sup-norm, || βkG - βk-1G ||∞, is less than |
Details
The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is||y - ια - xβ||2T + 2λ Ωγ(β),
where ι∈RTenter> and ||u||2T=<u,u>/T is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
sglfit(x = x, y = y, gindex = gindex, gamma = 0.5)
Nodewise LASSO regressions to fit the precision matrix Θ
Description
Fits the precision matrix Θ by running nodewise LASSO regressions.
Usage
thetafit(x, parallel = FALSE, ncores = getOption("mc.cores", NULL),
intercept = FALSE, K = 20, l = 5, seed = NULL, verbose = FALSE,
registerpar = TRUE, ...)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
parallel |
if |
ncores |
number of cores used in parallelization |
intercept |
whether intercept be fitted ( |
K |
number of folds of the cv loop. Default set to |
l |
the gap used to drop observations round test set data. See tscv.sglfit for more details. |
seed |
set a value for seed to control results replication, i.e. |
verbose |
if |
registerpar |
if |
... |
Other arguments that can be passed to tscv.sglfit. |
Details
The function runs tscv.sglfit p
times by regressing j
-th covariate on all other covariates excluding j
-th covariate. The precision matrix is then constructed based on LASSO estimates. Each nodewise LASSO regression tuning parameter λ is optimized using time series cross-validation. See tscv.sglfit for more details on cross-validation implementation.
Value
thetafit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
thetafit(x = x, parallel = FALSE)
Time series cross-validation fit for sg-LASSO
Description
Does k-fold time series cross-validation for sg-LASSO regression model.
The function runs sglfit K+1
times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the test observation k ∈ K
The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min
and lam.1se
. Solutions are computed for a fixed γ
Usage
tscv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p,
K = 20, l = 5, parallel = FALSE, seed = NULL, ...)
Arguments
x |
T by p data matrix, where T and p respectively denote the sample size and the number of regressors. |
y |
T by 1 response variable. |
lambda |
a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on |
gamma |
sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution. |
gindex |
p by 1 vector indicating group membership of each covariate. |
K |
number of observations drawn for the test set. Default set to |
l |
the gap used to drop observations round the test set data point. Default set to |
parallel |
if |
seed |
set a value for seed to control results replication, i.e. |
... |
Other arguments that can be passed to sglfit. |
Details
The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is||y - ια - xβ||2T + 2λ Ωγ(β),
where ι∈RTenter> and ||u||2T=<u,u>/T is the empirical inner product. The penalty function Ωγ(.) is applied on β coefficients and is
Ωγ(β) = γ |β|1 + (1-γ)|β|2,1,
a convex combination of LASSO and group LASSO penalty functions.
Value
tscv.sglfit object.
Author(s)
Jonas Striaukas
Examples
set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE)
## Not run:
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE))
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5,
standardize = FALSE, intercept = FALSE, parallel = TRUE))
## End(Not run)
Sorts time series cross-validation output
Description
Computes cvm
and cvsd
based on times series cross-validation fit.
Usage
tscv.sglpath(outlist, lambda, x, y, foldid, ...)
Arguments
outlist |
list of cross-validation fits. |
lambda |
a sequence of |
x |
regressors |
y |
response |
foldid |
the fold assignment. |
... |
other arguments passed to |
Value
cvm
and cvsd
.
US real GDP data with several high-frequency predictors
Description
US real GDP, Chicago National Activity Index, Nonfarm payrolls and ADS Index
Usage
data(us_rgdp)
Format
A list
object.a
Source
Examples
data(us_rgdp)
us_rgdp$rgdp # - GDP data
us_rgdp$cfnai # - CFNAI predictor data
us_rgdp$payems # - Nonfarm payrolls predictor data
us_rgdp$ads # - ADS predictor data