Help for package midasml

Type:

Package

Title:

Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data

Version:

0.1.10

Maintainer:

Jonas Striaukas <jonas.striaukas@gmail.com>

Description:

The 'midasml' package implements estimation and prediction methods for high-dimensional mixed-frequency (MIDAS) time-series and panel data regression models. The regularized MIDAS models are estimated using orthogonal (e.g. Legendre) polynomials and sparse-group LASSO (sg-LASSO) estimator. For more information on the 'midasml' approach see Babii, Ghysels, and Striaukas (2021, JBES forthcoming) <doi:10.1080/07350015.2021.1899933>. The package is equipped with the fast implementation of the sg-LASSO estimator by means of proximal block coordinate descent. High-dimensional mixed frequency time-series data can also be easily manipulated with functions provided in the package.

BugReports:

https://github.com/jstriaukas/midasml/issues

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Depends:

Matrix, R (≥ 3.5.0)

Imports:

doRNG, doParallel, foreach, graphics, randtoolbox, snow, methods, lubridate, stats

Encoding:

UTF-8

RoxygenNote:

7.1.2

NeedsCompilation:

yes

Packaged:

2022-04-28 23:16:47 UTC; Utilisateur

Author:

Jonas Striaukas [cre, aut], Andrii Babii [aut], Eric Ghysels [aut], Alex Kostrov [ctb] (Contributions to analytical gradients for non-linear low-dimensional MIDAS estimation code)

Repository:

CRAN

Date/Publication:

2022-04-29 07:00:02 UTC

midasml

Description

Estimation and Prediction Methods for High-Dimensional Mixed Frequency Time Series Data

Author(s)

Jonas Striaukas (maintainer) jonas.striaukas@gmail.com, Andrii Babii andrii@email.unc.edu, Eric Ghysels eghysels@unc.edu

ALFRED monthly and quarterly series vintages

Description

ALFRED monthly and quarterly series vintages

Usage

data(alfred_vintages)

Format

A list objects

Source

ALFRED

Examples


data(alfred_vintages)
i <- 1
alfred_vintages[[i]] # ith variable

Cross-validation fit for panel sg-LASSO

Description

Does k-fold cross-validation for panel data sg-LASSO regression model.

The function runs sglfit nfolds+1 times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the folds omitted. The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min and lam.1se. Solutions are computed for a fixed γ

Usage

cv.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, nfolds = 10, 
  foldid, method = c("pooled", "fe"), nf = NULL, parallel = FALSE, ...)

Arguments

x

NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors.

y

NT by 1 response variable.

lambda

a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own λ sequence based on nlambda and γ lambda.factor. It is better to supply, if necessary, a decreasing sequence of lambda values than a single (small) value, as warm-starts are used in the optimization algorithm. The program will ensure that the user-supplied lambda sequence is sorted in decreasing order before fitting the model.

gamma

sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

nfolds

number of folds of the cv loop. Default set to 10.

foldid

the fold assignments used.

method

choose between 'pooled' and 'fe'; 'pooled' forces the intercept to be fitted in sglfit, 'fe' computes the fixed effects. User must input the number of fixed effects nf for method = 'fe', and it is recommended to do so for method = 'pooled'. Program uses supplied nf to construct foldsid. Default is set to method = 'pooled'.

nf

number of fixed effects. Used only if method = 'fe'.

parallel

if TRUE, use parallel foreach to fit each fold. Must register parallel before hand, such as doMC or others. See the example below.

...

Other arguments that can be passed to sglfit.

Details

The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is (case method='pooled')

||y - ια - xβ||²_NT + 2λ Ω_γ(β),
where ι∈R^NT and α is common intercept to all N items or (case method='fe')

||y - Bα - xβ||²_NT + 2λ Ω_γ(β),
where B = I_N⊗ι and ||u||²_NT=<u,u>/NT is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

cv.panel.sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE)
## Not run:  
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE))
system.time(cv.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE, parallel = TRUE))

## End(Not run)

Sorts cross-validation output for panel data regressions

Description

Computes cvm and cvsd based on cross-validation fit.

Usage

cv.panel.sglpath(outlist, lambda, x, y, foldid, method, ...)

Arguments

outlist

list of cross-validation fits.

lambda

a sequence of \lambda parameter.

x

regressors

y

response

foldid

the fold assignment

method

'pooled' or 'fe'.

...

other arguments passed to predict.sgl

Value

cvm and cvsd.

Cross-validation fit for sg-LASSO

Description

Does k-fold cross-validation for sg-LASSO regression model.

Usage

cv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, 
  nfolds = 10, foldid, parallel = FALSE, ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

lambda

gamma

sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

nfolds

number of folds of the cv loop. Default set to 10.

foldid

the fold assignments used.

parallel

if TRUE, use parallel foreach to fit each fold. Must register parallel before hand, such as doMC or others. See the example below.

...

Other arguments that can be passed to sglfit.

Details

The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is

||y - ια - xβ||²_T + 2λ Ω_γ(β),
where ι∈R^Tenter> and ||u||²_T=<u,u>/T is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

cv.sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE)
## Not run:  
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE))
system.time(cv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE, parallel = TRUE))

## End(Not run)

Sorts cross-validation output

Description

Computes cvm and cvsd based on cross-validation fit.

Usage

cv.sglpath(outlist, lambda, x, y, foldid, ...)

Arguments

outlist

list of cross-validation fits.

lambda

a sequence of \lambda parameter.

x

regressors

y

response

foldid

the fold assignment.

...

other arguments passed to predict.sgl

Value

cvm and cvsd.

Identify data frequency

Description

Identify data frequency

Usage

data_freq(DateVec)

Arguments

DateVec

n by 6 vector format dates: [year,month,day,hour,min,sec]

Value

a list of arguments that define data frequency. 'period' - length of two consecutive dates. 'unit' - unit of length measure (=1 year;=2 month;=3 day;=4 hour;=5 minutes;=6 seconds)

Match dates

Description

Change the date to the beginning of the month date.

Usage

dateMatch(x, y)

Arguments

x

date vector to match with y date vector.

y

date vector.

Value

changed date vector.

Author(s)

Jonas Striaukas

Examples

x <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "day")
set.seed(100)
x <- x[-sample(1:336, 100)]
y <- seq(as.Date("2020-01-01"),as.Date("2020-12-01"), by = "month")
dateMatch(x,y)

Transform date vector to numeric matrix

Description

Transform date vector to numeric matrix

Usage

date_vec(s)

Arguments

s

date vector.

Value

numeric matrix of date vector.

Computes the difference between two dates.

Description

Computes the difference between two dates.

Usage

diff_time_mf(
  time1,
  time2,
  origin,
  units = c("auto", "secs", "mins", "hours", "days", "weeks")
)

Arguments

time1

date 1.

time2

date 2.

origin

date origin.

units

units.

Value

numeric value of difference in two dates.

Gegenbauer polynomials shifted to [a,b]

Description

For a given set of points in X, computes the orthonormal Gegenbauer polynomials basis of L2 [a,b] for a given degree and \alpha parameter. The Gegenbauer polynomials are a special case of more general Jacobi polynomials. In turn, you may get Legendre polynomials from Gegenbauer by setting \alpha = 0, or Chebychev's polynomials by setting \alpha = 1/2 or -1/2.

Usage

gb(degree, alpha, a = 0, b = 1, jmax = NULL, X = NULL)

Arguments

degree

polynomial degree.

alpha

Gegenbauer polynomials parameter.

a

lower shift value (default - 0).

b

upper shift value (default - 1).

jmax

number of high-frequency lags.

X

optional evaluation grid vector.

Value

Psi weight matrix with Gegenbauer functions upto degree.

Author(s)

Jonas Striaukas

Examples

degree <- 3
alpha <- 1
jmax <- 66
gb(degree = degree, alpha = alpha, a = 0, b = 1, jmax = jmax)

Information criteria fit for panel sg-LASSO

Description

Does information criteria for panel data sg-LASSO regression model.

The function runs sglfit 1 time; computes the path solution in lambda sequence. Solutions for BIC, AIC and AICc information criteria are returned.

Usage

ic.panel.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, 
                method = c("pooled","fe"), nf = NULL, ...)

Arguments

x

NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors.

y

NT by 1 response variable.

lambda

a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own \lambda sequence based on nlambda and lambda.factor. It is better to supply, if necessary, a decreasing sequence of lambda values than a single (small) value, as warm-starts are used in the optimization algorithm. The program will ensure that the user-supplied \lambda sequence is sorted in decreasing order before fitting the model.

gamma

sg-LASSO mixing parameter. \gamma = 1 gives LASSO solution and \gamma = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

method

nf

number of fixed effects. Used only if method = 'fe'.

...

Other arguments that can be passed to sglfit.

Details

The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is (case method='pooled')

||y - ια - xβ||²_NT + 2λ Ω_γ(β),
where ι∈R^NT and α is common intercept to all N items or (case method='fe')

||y - Bα - xβ||²_NT + 2λ Ω_γ(β),
where B = I_N⊗ι and ||u||²_NT=<u,u>/NT is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

ic.panel.sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
ic.panel.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE)

Compute the penalty based on chosen information criteria

Description

Compute the penalty based on chosen information criteria

Usage

ic.pen(ic_choice, df, t)

Arguments

ic_choice

information criteria choice: BIC, AIC or AICc.

df

effective degrees of freedom.

t

sample size.

Value

penalty value.

Information criteria fit for sg-LASSO

Description

Does information criteria for sg-LASSO regression model.

The function runs sglfit 1 time; computes the path solution in lambda sequence. Solutions for BIC, AIC and AICc information criteria are returned.

Usage

ic.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p,  ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

lambda

gamma

sg-LASSO mixing parameter. \gamma = 1 gives LASSO solution and \gamma = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

...

Other arguments that can be passed to sglfit.

Details

The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is

||y - ια - xβ||²_T + 2λ Ω_γ(β),
where ι∈R^Tenter> and ||u||²_T=<u,u>/T is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

ic.sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
ic.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE)

Compute the number of lags

Description

Compute the number of lags

Usage

lag_num(x.lag, period, unit)

Arguments

x.lag

number of high-frequency lags to construct in high-frequency time units.

period

high-frequency data period.

unit

units.

Value

numeric value of the number of lags.

Legendre polynomials shifted to [a,b]

Description

For a given set of points in X, computes the orthonormal Legendre polynomials basis of L2 [a,b] for a given degree.

Usage

lb(degree, a = 0, b = 1, jmax = NULL, X = NULL)

Arguments

degree

polynomial degree.

a

lower shift value (default - 0).

b

upper shift value (default - 1).

jmax

number of high-frequency lags.

X

optional evaluation grid vector.

Value

Psi weight matrix with Legendre functions upto degree.

Author(s)

Jonas Striaukas

Examples

degree <- 3
jmax <- 66
lb(degree = degree, a = 0, b = 1, jmax = jmax)

SNP500 returns

Description

SNP500 returns

Usage

data(market_ret)

Format

A data.frame object.a

Source

market_ret - FRED

Examples


data(market_ret)
market_ret$snp500ret

MIDAS regression

Description

Fits MIDAS regression model with single high-frequency covariate. Options include linear-in-parameters polynomials (e.g. Legendre) or non-linear polynomials (e.g. exponential Almon). Nonlinear polynomial optimization routines are equipped with analytical gradients, which allows fast and accurate optimization.

Usage

midas.ardl(y, x, z = NULL, loss_choice = c("mse","logit"), 
           poly_choice = c("legendre","expalmon","beta"), 
           poly_spec = 0, legendre_degree = 3, nbtrials = 500)

Arguments

y

response variable. Continuous for loss_choice = "mse", binary for loss_choice = "logit".

x

high-frequency covariate lags.

z

other lower-frequency covariate(s) or AR lags (both can be supplied in an appended matrix). Either must be supplied.

loss_choice

which loss function to fit: loss_choice="mse" fits least squares MIDAS regression, loss_choice="logit" fits logit MIDAS regression.

poly_choice

which MIDAS lag polynomial function to use: poly_choice="expalmon" - exponential Almon polynomials, poly_choice="beta" - Beta density function (need to set poly_spec), poly_choice="legendre" - legendre polynomials (need to set legendre_degree). Default is set to poly_choice="expalmon".

poly_spec

which Beta density function specification to apply (applicable only for poly_choice="beta"). poly_spec = 0 - all three parameters are fitted, poly_spec = 1 (\theta_2,\theta_3) are fitted, poly_spec = 2 (\theta_1,\theta_2) are fitted, poly_spec = 3 (\theta_2) is fitted. Default is set to poly_spec = 0.

legendre_degree

the degree of legendre polynomials (applicable only for legendre="beta"). Default is set to 3.

nbtrials

number of initial values tried in multistart optimization. Default is set to poly_spec = 500.

Details

Several polynomial functional forms are available (poly_choice):

- beta: Beta polynomial
- expalmon: exponential Almon polynomial
- legendre: Legendre polynomials.

The ARDL-MIDAS model is:
y_t = μ + Σ_p ρ_p y_t-p + β Σ_j ω_j(θ)x_t-1
where μ, β, θ and ρ_p are model parameters, p is the number of low-frequency lags and ω is the weight function.

Value

midas.ardl object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
z = rnorm(100)
y = rnorm(100)
midas.ardl(y = y, x = x, z = z)

MIDAS data structure

Description

Creates a MIDAS data structure for a single high-frequency covariate and a single low-frequency dependent variable.

Usage

mixed_freq_data(data.y, data.ydate, data.x, data.xdate, x.lag, y.lag, 
  horizon, est.start, est.end, disp.flag = TRUE)

Arguments

data.y

n by 1 low-frequency time series data vector.

data.ydate

n by 1 low-frequency time series date vector.

data.x

m by 1 high-frequency time series data vector.

data.xdate

m by 1 high-frequency time series date vector.

x.lag

number of high-frequency lags to construct in high-frequency time units.

y.lag

number of low-frequency lags to construct in low-frequency time units.

horizon

forecast horizon relative to data.ydate date in high-frequency time units.

est.start

estimation start date, taken as the first ... .

est.end

estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data.

disp.flag

display flag to indicate whether or not to display obtained MIDAS data structure in console.

Value

a list of MIDAS data structure.

Author(s)

Jonas Striaukas

Examples

data(us_rgdp)
rgdp <- us_rgdp$rgdp
payems <- us_rgdp$payems
payems[-1, 2] <- log(payems[-1, 2]/payems[-dim(payems)[1], 2])*100
payems <- payems[-1, ]
rgdp[-1, 2] <- ((rgdp[-1, 2]/rgdp[-dim(rgdp)[1], 2])^4-1)*100
rgdp <- rgdp[-1, ]
est.start <- as.Date("1990-01-01")
est.end <- as.Date("2002-03-01")
mixed_freq_data(rgdp[,2], as.Date(rgdp[,1]), payems[,2], 
  as.Date(payems[,1]), x.lag = 9, y.lag = 4, horizon = 1, 
  est.start, est.end, disp.flag = FALSE)

MIDAS data structure

Description

Creates a MIDAS data structure for a single high-frequency covariate based on low-frequency reference date.

Usage

mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag, horizon,
  est.start, est.end, disp.flag = TRUE)

Arguments

data.refdate

n by 1 date vector.

data.x

m by 1 high-frequency time series data vector.

data.xdate

m by 1 high-frequency time series date vector.

x.lag

number of high-frequency lags to construct in high-frequency time units.

horizon

forecast horizon relative to data.refdate date in high-frequency time units.

est.start

estimation start date, taken as the first ... .

est.end

estimation end date, taken as the last ... . Remaining data after this date is dropped to out-of-sample evaluation data.

disp.flag

display flag to indicate whether or not to display obtained MIDAS data strcuture in console.

Value

a list of midas data structure.

Author(s)

Jonas Striaukas

Examples

data(us_rgdp)
rgdp <- us_rgdp$rgdp
cfnai <- us_rgdp$cfnai
data.refdate <- rgdp$date
data.x <- cfnai$cfnai
data.xdate <- cfnai$date
est.start <- as.Date("1990-01-01")
est.end <- as.Date("2002-03-01")
mixed_freq_data_single(data.refdate, data.x, data.xdate, x.lag = 12, horizon = 1,
 est.start, est.end, disp.flag = FALSE)

Compute mode of a vector

Description

Compute mode of a vector

Usage

mode_midasml(data)

Arguments

data

data vector.

Value

a list of arguments.

Beginning of the month date

Description

Change the date to the beginning of the month date.

Usage

monthBegin(x)

Arguments

x

date value.

Value

changed date value.

Author(s)

Jonas Striaukas

Examples

monthBegin(as.Date("2020-05-15"))

End of the month date

Description

Change the date to the end of the month date.

Usage

monthEnd(x)

Arguments

x

date value.

Value

changed date value.

Author(s)

Jonas Striaukas

Examples

monthEnd(as.Date("2020-05-15"))

Computes prediction

Description

Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.

Usage

## S3 method for class 'cv.panel.sglfit'
predict(
  object,
  newx,
  s = c("lam.min", "lam.1se"),
  type = c("response"),
  method = c("pooled", "fe"),
  ...
)

Arguments

object

fitted cv.panel.sglfit model object.

newx

matrix of new values for x at which predictions are to be made. NOTE: newx must be a matrix, predict function does not accept a vector or other formats of newx.

s

choose between 'lam.min' and 'lam.1se'.

type

type of prediction required. Only response is available. Gives predicted response for regression problems.

method

choose between 'pooled', and 'fe'.

...

Not used. Other arguments to predict.

Details

s is the new vector at which predictions are to be made. If s is not in the lambda sequence used for fitting the model, the predict function will use linear interpolation to make predictions. The new values are interpolated using a fraction of predicted values from both left and right lambda indices.

Value

The object returned depends on type.

Computes prediction

Description

Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.

Usage

## S3 method for class 'cv.sglfit'
predict(object, newx, s = c("lam.min", "lam.1se"), type = c("response"), ...)

Arguments

object

fitted cv.sglfit model object.

newx

matrix of new values for x at which predictions are to be made. NOTE: newx must be a matrix, predict function does not accept a vector or other formats of newx.

s

choose between 'lam.min' and 'lam.1se'.

type

type of prediction required. Only response is available. Gives predicted response for regression problems.

...

Not used. Other arguments to predict.

method

choose between 'single', 'pooled', and 'fe'.

Details

Value

The object returned depends on type.

Computes prediction

Description

Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.

Usage

## S3 method for class 'ic.panel.sglfit'
predict(
  object,
  newx,
  s = c("bic", "aic", "aicc"),
  type = c("response"),
  method = c("pooled", "fe"),
  ...
)

Arguments

object

fitted ic.panel.sglfit model object.

newx

matrix of new values for x at which predictions are to be made. NOTE: newx must be a matrix, predict function does not accept a vector or other formats of newx.

s

choose between 'bic', 'aic', and 'aicc'.

type

type of prediction required. Only response is available. Gives predicted response for regression problems.

method

choose between 'pooled', and 'fe'.

...

Not used. Other arguments to predict.

Details

Value

The object returned depends on type.

Computes prediction

Description

Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.

Usage

## S3 method for class 'ic.sglfit'
predict(object, newx, s = c("bic", "aic", "aicc"), type = c("response"), ...)

Arguments

object

fitted cv.sglfit model object.

newx

matrix of new values for x at which predictions are to be made. NOTE: newx must be a matrix, predict function does not accept a vector or other formats of newx.

s

choose between 'bic', 'aic', and 'aicc'.

type

type of prediction required. Only response is available. Gives predicted response for regression problems.

...

Not used. Other arguments to predict.

Details

Value

The object returned depends on type.

Computes prediction

Description

Similar to other predict methods, this functions predicts fitted values from a fitted sglfit object.

Usage

## S3 method for class 'sglpath'
predict(
  object,
  newx,
  s = NULL,
  type = c("response"),
  method = c("single", "pooled", "fe"),
  ...
)

Arguments

object

fitted sglfit model object.

newx

matrix of new values for x at which predictions are to be made. NOTE: newx must be a matrix, predict function does not accept a vector or other formats of newx.

s

value(s) of the penalty parameter lambda at which predictions are to be made. Default is the entire sequence used to create the model.

type

type of prediction required. Only response is available. Gives predicted response for regression problems.

method

choose between 'single', 'pooled', and 'fe'.

...

Not used. Other arguments to predict.

Details

Value

The object returned depends on type.

Regression fit for panel sg-LASSO

Description

Fits panel data sg-LASSO regression model.

The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice. Options include cross-validation and information criteria.

Usage

reg.panel.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, 
              method_choice = c("ic","cv"), nfolds = 10, 
              method = c("pooled", "fe"), nf = NULL, 
              verbose = FALSE, ...)

Arguments

x

NT by p data matrix, where NT and p respectively denote the sample size of pooled data and the number of regressors.

y

NT by 1 response variable.

gamma

sg-LASSO mixing parameter. \gamma = 1 gives LASSO solution and \gamma = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

intercept

whether intercept be fitted (TRUE) or set to zero (FALSE). Default is TRUE.

method_choice

choose between ic and cv. ic gives fit based on information criteria (BIC, AIC or AICc) by running ic.fit, while cv gives fit based on cross-validation by running cv.sglfit. If cv is chosen, optional number of folds nfolds can be supplied.

nfolds

number of folds of the cv loop. Default set to 10.

method

nf

number of fixed effects. Used only if method = 'fe'.

verbose

flag to print information.

...

Other arguments that can be passed to sglfit.

Details

The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is either (case method='pooled')

||y - ια - xβ||²_NT + 2λ Ω_γ(β),
where ι∈R^NT and α is common intercept to all N items or (case method='fe')

||y - Bα - xβ||²_NT + 2λ Ω_γ(β),
where B = I_N⊗ι and ||u||²_NT=<u,u>/NT is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

reg.panel.sgl object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
reg.panel.sgl(x = x, y = y, 
  gindex = gindex, gamma = 0.5, 
  method = "fe", nf = 10, 
  standardize = FALSE, intercept = FALSE)

Fit for sg-LASSO regression

Description

Fits sg-LASSO regression model.

The function fits sg-LASSO regression based on chosen tuning parameter selection method_choice. Options include cross-validation and information criteria.

Usage

reg.sgl(x, y, gamma = NULL, gindex, intercept = TRUE, 
        method_choice = c("tscv","ic","cv"), verbose = FALSE, ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

gamma

sg-LASSO mixing parameter. \gamma = 1 gives LASSO solution and \gamma = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

intercept

whether intercept be fitted (TRUE) or set to zero (FALSE). Default is TRUE.

method_choice

choose between tscv ic and cv. tscv fits sg-LASSO based on time series cross-validation (see tscv.sglfit), ic fits sg-LASSO based on information criteria (BIC, AIC or AICc, see ic.sglfit), cv fits sg-LASSO based on cross-validation (see cv.sglfit). Additional arguments for each method choice are passed on to the relevant functions.

verbose

flag to print information.

...

Other arguments that can be passed to sglfit.

Details

Value

reg.sgl object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
reg.sgl(x = x, y = y, gamma = 0.5, gindex = gindex)

Real GDP release dates

Description

Real GDP release dates

Usage

data(rgdp_dates)

Format

A list objects

Source

ALFRED

Examples


data(rgdp_dates)
rgdp_dates$Quarter_q # reference quarters in quarters
rgdp_dates$Quarter_m # reference quarters in months
rgdp_dates$Quarter_d # reference quarters in days
rgdp_dates$`First release` # first release date for the reference
rgdp_dates$`Second release` # second release date for the reference
rgdp_dates$`Third release` # third release date for the reference

Real GDP vintages

Description

Real GDP vintages

Usage

data(rgdp_vintages)

Format

A list objects

Source

ALFRED

Examples


data(rgdp_vintages)
rgdp_vintages$date # dates
rgdp_vintages$time_series # series, q-q annual rate
rgdp_vintages$realtime_period # real time dates

Fits sg-LASSO regression

Description

Fits sg-LASSO regression model. The function fits sg-LASSO regression model for a sequence of \lambda tuning parameter and fixed \gamma tuning parameter. The optimization is based on block coordinate-descent. Optionally, fixed effects are fitted.

Usage

sglfit(x, y, gamma = 1.0, nlambda = 100L, method = c("single", "pooled", "fe"), 
       nf = NULL, lambda.factor = ifelse(nobs < nvars, 1e-02, 1e-04), 
       lambda = NULL, pf = rep(1, nvars), gindex = 1:nvars, 
       dfmax = nvars + 1, pmax = min(dfmax * 1.2, nvars), standardize = FALSE, 
       intercept = FALSE, eps = 1e-08, maxit = 1000000L, peps = 1e-08)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

gamma

sg-LASSO mixing parameter. \gamma = 1 gives LASSO solution and \gamma = 0 gives group LASSO solution.

nlambda

number of \lambda's to use in the regularization path; used if lambda = NULL.

method

choose between 'single', 'pooled' and 'fe'; 'single' implies standard sg-LASSO regression, 'pooled' forces the intercept to be fitted, 'fe' computes the fixed effects. User needs to input the number of fixed effects nf. Default is set to 'single'.

nf

number of fixed effects. Used only if method = 'fe'.

lambda.factor

The factor for getting the minimal \lambda in the \lambda sequence, where min(lambda) = lambda.factor * max(lambda). max(lambda) is the smallest value of lambda for which all coefficients are zero. λ _max is determined for each \gamma tuning parameter separately. The default depends on the relationship between T (the sample size) and p (the number of predictors). If T < p, the default is 0.01. If T > p, the default is 0.0001, closer to zero. The smaller the value of lambda.factor is, the denser is the fit for λ_min. Used only if lambda = NULL.

lambda

a user-supplied lambda sequence. By leaving this option unspecified (recommended), users can have the program compute its own lambda sequence based on nlambda and lambda.factor. It is better to supply, if necessary, a decreasing sequence of lambda values than a single (small) value, as warm-starts are used in the optimization algorithm. The program will ensure that the user-supplied \lambda sequence is sorted in decreasing order before fitting the model.

pf

the ℓ₁ penalty factor of length p used for the adaptive sg-LASSO. Separate ℓ₁ penalty weights can be applied to each coefficient to allow different ℓ₁ + ℓ_2,1 shrinkage. Can be 0 for some variables, which imposes no shrinkage, and results in that variable always be included in the model. Default is 1 for all variables.

gindex

p by 1 vector indicating group membership of each covariate.

dfmax

the maximum number of variables allowed in the model. Useful for very large p when a partial path is desired. Default is p+1. In case method='fe', dfmax is ignored.

pmax

the maximum number of coefficients allowed ever to be nonzero. For example, once β_i ≠ 0 for some i ∈ [p], no matter how many times it exits or re-enters the model through the path, it will be counted only once. Default is min(dfmax*1.2, p).

standardize

logical flag for variable standardization, prior to fitting the model sequence. The coefficients are always returned to the original scale. It is recommended to keep standardize=TRUE. Default is FALSE.

intercept

whether intercept be fitted (TRUE) or set to zero (FALSE). Default is FALSE. In case method='pooled', intercept=TRUE is forced. In case method='fe', intercept=FALSE is forced and entity specific intercepts are fitted in a separate output variable a0.

eps

convergence threshold for block coordinate descent. Each inner block coordinate-descent loop continues until the maximum change in the objective after any coefficient update is less than thresh times the null deviance. Defaults value is 1e-8.

maxit

maximum number of outer-loop iterations allowed at fixed lambda values. Default is 1e6. If the algorithm does not converge, consider increasing maxit.

peps

convergence threshold for proximal map of sg-LASSO penalty. Each loop continues until G group difference sup-norm, || β^k_G - β^k-1_G ||_∞, is less than peps. Defaults value is 1e-8.

Details

Value

sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
sglfit(x = x, y = y, gindex = gindex, gamma = 0.5)

Nodewise LASSO regressions to fit the precision matrix Θ

Description

Fits the precision matrix Θ by running nodewise LASSO regressions.

Usage

thetafit(x, parallel = FALSE, ncores = getOption("mc.cores", NULL), 
         intercept = FALSE, K = 20, l = 5, seed = NULL, verbose = FALSE, 
         registerpar = TRUE, ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

parallel

if TRUE, use parallel foreach to fit nodewise LASSO regressions. Parallel registered within the function.

ncores

number of cores used in parallelization

intercept

whether intercept be fitted (TRUE) or set to zero (FALSE). Default is FALSE.

K

number of folds of the cv loop. Default set to 20.

l

the gap used to drop observations round test set data. See tscv.sglfit for more details.

seed

set a value for seed to control results replication, i.e. set.seed(seed) is used. seed is stored in the output list. Default set to as.numeric(Sys.Date()).

verbose

if TRUE, prints progress bar. Default set to FALSE.

registerpar

if TRUE, register parallelization using registerDoParallel. Default set to TRUE.

...

Other arguments that can be passed to tscv.sglfit.

Details

The function runs tscv.sglfit p times by regressing j-th covariate on all other covariates excluding j-th covariate. The precision matrix is then constructed based on LASSO estimates. Each nodewise LASSO regression tuning parameter λ is optimized using time series cross-validation. See tscv.sglfit for more details on cross-validation implementation.

Value

thetafit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
thetafit(x = x, parallel = FALSE)

Time series cross-validation fit for sg-LASSO

Description

Does k-fold time series cross-validation for sg-LASSO regression model.

The function runs sglfit K+1 times; the first to get the path solution in λ sequence, the rest to compute the fit with each of the test observation k ∈ K The average error and standard deviation over the folds is computed, and the optimal regression coefficients are returned for lam.min and lam.1se. Solutions are computed for a fixed γ

Usage

tscv.sglfit(x, y, lambda = NULL, gamma = 1.0, gindex = 1:p, 
  K = 20, l = 5, parallel = FALSE, seed = NULL, ...)

Arguments

x

T by p data matrix, where T and p respectively denote the sample size and the number of regressors.

y

T by 1 response variable.

lambda

gamma

sg-LASSO mixing parameter. γ = 1 gives LASSO solution and γ = 0 gives group LASSO solution.

gindex

p by 1 vector indicating group membership of each covariate.

K

number of observations drawn for the test set. Default set to 20.

l

the gap used to drop observations round the test set data point. Default set to 5.

parallel

if TRUE, use parallel foreach to fit each fold. Must register parallel before hand, such as doMC or others. See the example below.

seed

set a value for seed to control results replication, i.e. set.seed(seed) is used. seed is stored in the output list. Default set to as.numeric(Sys.Date()).

...

Other arguments that can be passed to sglfit.

Details

The cross-validation is run for sg-LASSO linear model. The sequence of linear regression models implied by λ vector is fit by block coordinate-descent. The objective function is

||y - ια - xβ||²_T + 2λ Ω_γ(β),
where ι∈R^Tenter> and ||u||²_T=<u,u>/T is the empirical inner product. The penalty function Ω_γ(.) is applied on β coefficients and is

Ω_γ(β) = γ |β|₁ + (1-γ)|β|_2,1,
a convex combination of LASSO and group LASSO penalty functions.

Value

tscv.sglfit object.

Author(s)

Jonas Striaukas

Examples

set.seed(1)
x = matrix(rnorm(100 * 20), 100, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(100)
gindex = sort(rep(1:4,times=5))
tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE)
## Not run:  
# Parallel
require(doMC)
registerDoMC(cores = 2)
x = matrix(rnorm(1000 * 20), 1000, 20)
beta = c(5,4,3,2,1,rep(0, times = 15))
y = x%*%beta + rnorm(1000)
gindex = sort(rep(1:4,times=5))
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE))
system.time(tscv.sglfit(x = x, y = y, gindex = gindex, gamma = 0.5, 
  standardize = FALSE, intercept = FALSE, parallel = TRUE))

## End(Not run)

Sorts time series cross-validation output

Description

Computes cvm and cvsd based on times series cross-validation fit.

Usage

tscv.sglpath(outlist, lambda, x, y, foldid, ...)

Arguments

outlist

list of cross-validation fits.

lambda

a sequence of \lambda parameter.

x

regressors

y

response

foldid

the fold assignment.

...

other arguments passed to predict.sgl

Value

cvm and cvsd.

US real GDP data with several high-frequency predictors

Description

US real GDP, Chicago National Activity Index, Nonfarm payrolls and ADS Index

Usage

data(us_rgdp)

Format

A list object.a

Source

rgdp
cfnai
payems
ads

Examples


data(us_rgdp)
us_rgdp$rgdp # - GDP data
us_rgdp$cfnai # - CFNAI predictor data
us_rgdp$payems # - Nonfarm payrolls predictor data 
us_rgdp$ads # - ADS predictor data