Help for package countSTAR

Type:

Package

Title:

Flexible Modeling of Count Data

Version:

1.0.2

Description:

For Bayesian and classical inference and prediction with count-valued data, Simultaneous Transformation and Rounding (STAR) Models provide a flexible, interpretable, and easy-to-use approach. STAR models the observed count data using a rounded continuous data model and incorporates a transformation for greater flexibility. Implicitly, STAR formalizes the commonly-applied yet incoherent procedure of (i) transforming count-valued data and subsequently (ii) modeling the transformed data using Gaussian models. STAR is well-defined for count-valued data, which is reflected in predictive accuracy, and is designed to account for zero-inflation, bounded or censored data, and over- or underdispersion. Importantly, STAR is easy to combine with existing MCMC or point estimation methods for continuous data, which allows seamless adaptation of continuous data models (such as linear regressions, additive models, BART, random forests, and gradient boosting machines) for count-valued data. The package also includes several methods for modeling count time series data, namely via warped Dynamic Linear Models. For more details and background on these methodologies, see the works of Kowal and Canale (2020) <doi:10.1214/20-EJS1707>, Kowal and Wu (2022) <doi:10.1111/biom.13617>, King and Kowal (2022) <doi:10.48550/arXiv.2110.14790>, and Kowal and Wu (2023) <doi:10.48550/arXiv.2110.12316>.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LinkingTo:

Rcpp, RcppArmadillo

Imports:

stats, utils, coda, dbarts, FastGP, gbm, graphics, Matrix, spikeSlabGAM, splines2, randomForest, Rcpp, TruncatedNormal, truncdist, KFAS

RoxygenNote:

7.2.3

Suggests:

knitr, rmarkdown, bayesplot, mgcv

VignetteBuilder:

knitr

URL:

https://bking124.github.io/countSTAR/ https://github.com/bking124/countSTAR

BugReports:

https://github.com/bking124/countSTAR/issues

Depends:

R (≥ 2.10)

LazyData:

true

NeedsCompilation:

yes

Packaged:

2023-06-29 21:33:30 UTC; brian

Author:

Brian King [aut, cre], Dan Kowal [aut]

Maintainer:

Brian King <brianking387@gmail.com>

Repository:

CRAN

Date/Publication:

2023-06-30 18:30:08 UTC

Brent's method for optimization

Description

Implementation for Brent's algorithm for minimizing a univariate function over an interval. The code is based on a function in the stsm package.

Usage

BrentMethod(a = 0, b, fcn, tol = .Machine$double.eps^0.25)

Arguments

a

lower limit for search

b

upper limit for search

fcn

function to minimize

tol

tolerance level for convergence of the optimization procedure

Value

a list of containing the following elements:

fx the minimum value of the input function
x the argument that minimizes the function
iter number of iterations to converge
vx a vector that stores the arguments until convergence

Inverse rounding function

Description

Define the intervals associated with y = j based on the flooring function. The function returns -Inf for j = 0 (or smaller) and Inf for any j >= y_max + 1, where y_max is a known upper bound on the data y (if specified).

Usage

a_j(j, y_max = Inf)

Arguments

j

the integer-valued input(s)

y_max

a fixed and known upper bound for all observations; default is Inf

Value

The (lower) interval endpoint(s) associated with j.

Examples

# Standard cases:
a_j(1)
a_j(20)

# Boundary cases:
a_j(0)
a_j(20, y_max = 15)

Fit Bayesian Additive STAR Model with MCMC

Description

Run the MCMC algorithm for a STAR Bayesian additive model The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.

Usage

bam_star(
  y,
  X_lin,
  X_nonlin,
  splinetype = "orthogonal",
  transformation = "np",
  y_max = Inf,
  nsave = 5000,
  nburn = 5000,
  nskip = 2,
  save_y_hat = FALSE,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

X_lin

n x pL matrix of predictors to be modelled as linear

X_nonlin

n x pNL matrix of predictors to be modelled as nonlinear

splinetype

Type of spline to use for modelling the nonlinear predictors; must be either "orthogonal" (orthogonalized splines–the default) or "thinplate" (low-rank thin plate splines)

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)
"ispline" (transformation is modeled as unknown, monotone function using I-splines)

y_max

a fixed and known upper bound for all observations; default is Inf

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

save_y_hat

logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute

verbose

logical; if TRUE, print time remaining

Details

STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation.

Posterior and predictive inference is obtained via a Gibbs sampler that combines (i) a latent data augmentation step (like in probit regression) and (ii) an existing sampler for a continuous data model.

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter is inferred within the MCMC sampler ('box-cox'). Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. Third, the transformation can be modeled as an unknown, monotone function using I-splines ('ispline'). The Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter of the transformation function.

Value

a list with at least the following elements:

coefficients: the posterior mean of the coefficients
fitted.values: the posterior mean of the conditional expectation of the counts y
post.coefficients: posterior draws of the coefficients
post.fitted.values: posterior draws of the conditional mean of the counts y
post.pred: draws from the posterior predictive distribution of y
post.lambda: draws from the posterior distribution of lambda
post.sigma: draws from the posterior distribution of sigma
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC

In the case of transformation="ispline", the list also contains

post.g: draws from the posterior distribution of the transformation g
post.sigma.gamma: draws from the posterior distribution of sigma.gamma, the prior standard deviation of the transformation g() coefficients

Examples


# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 5, seed=32)
y = sim_dat$y; X = sim_dat$X

# Linear and nonlinear components:
X_lin = as.matrix(X[,-(1:3)])
X_nonlin = as.matrix(X[,(1:3)])

# STAR: nonparametric transformation
fit <- bam_star(y,X_lin, X_nonlin, nburn=1000, nskip=0)

# Posterior mean of each coefficient:
coef(fit)

# WAIC:
fit$WAIC

# MCMC diagnostics:
plot(as.ts(fit$post.coefficients[,1:3]))

# Posterior predictive check:
hist(apply(fit$post.pred, 1,
           function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')

MCMC Algorithm for BART-STAR

Description

Run the MCMC algorithm for a BART model for count-valued responses using STAR. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.

Usage

bart_star(
  y,
  X,
  X_test = NULL,
  y_test = NULL,
  transformation = "np",
  y_max = Inf,
  n.trees = 200,
  sigest = NULL,
  sigdf = 3,
  sigquant = 0.9,
  k = 2,
  power = 2,
  base = 0.95,
  nsave = 5000,
  nburn = 5000,
  nskip = 2,
  save_y_hat = FALSE,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors for test data

y_test

n0 x 1 vector of the test data responses (used for computing log-predictive scores)

transformation

transformation to use for the latent process; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)
"ispline" (transformation is modeled as unknown, monotone function using I-splines)

y_max

a fixed and known upper bound for all observations; default is Inf

n.trees

number of trees to use in BART; default is 200

sigest

positive numeric estimate of the residual standard deviation (see ?bart)

sigdf

degrees of freedom for error variance prior (see ?bart)

sigquant

quantile of the error variance prior that the rough estimate (sigest) is placed at. The closer the quantile is to 1, the more aggressive the fit will be (see ?bart)

k

the number of prior standard deviations E(Y|x) = f(x) is away from +/- 0.5. The response is internally scaled to range from -0.5 to 0.5. The bigger k is, the more conservative the fitting will be (see ?bart)

power

power parameter for tree prior (see ?bart)

base

base parameter for tree prior (see ?bart)

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

save_y_hat

logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute

verbose

logical; if TRUE, print time remaining

Details

Value

a list with the following elements:

post.pred: draws from the posterior predictive distribution of y
post.sigma: draws from the posterior distribution of sigma
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC
post.pred.test: draws from the posterior predictive distribution at the test points X_test (NULL if X_test is not given)
post.fitted.values.test: posterior draws of the conditional mean at the test points X_test (NULL if X_test is not given)
post.mu.test: draws of the conditional mean of z_star at the test points X_test (NULL if X_test is not given)
post.log.pred.test: draws of the log-predictive distribution for each of the n0 test cases (NULL if X_test is not given)
fitted.values: the posterior mean of the conditional expectation of the counts y (NULL if save_y_hat=FALSE)
post.fitted.values: posterior draws of the conditional mean of the counts y (NULL if save_y_hat=FALSE)

In the case of transformation="ispline", the list also contains

post.g: draws from the posterior distribution of the transformation g
post.sigma.gamma: draws from the posterior distribution of sigma.gamma, the prior standard deviation of the transformation g() coefficients

If transformation="box-cox", then the list also contains

post.lambda: draws from the posterior distribution of lambda

Examples


# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X

# BART-STAR with log-transformation:
fit_log = bart_star(y = y, X = X, transformation = 'log',
                    save_y_hat = TRUE, nburn=1000, nskip=0)

# Fitted values
plot_fitted(y = sim_dat$Ey,
            post_y = fit_log$post.fitted.values,
            main = 'Fitted Values: BART-STAR-log')

# WAIC for BART-STAR-log:
fit_log$WAIC

# MCMC diagnostics:
plot(as.ts(fit_log$post.fitted.values[,1:10]))

# Posterior predictive check:
hist(apply(fit_log$post.pred, 1,
           function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')

# BART-STAR with nonparametric transformation:
fit = bart_star(y = y, X = X,
                     transformation = 'np', save_y_hat = TRUE)

# Fitted values
plot_fitted(y = sim_dat$Ey,
            post_y = fit$post.fitted.values,
            main = 'Fitted Values: BART-STAR-np')

# WAIC for BART-STAR-np:
fit$WAIC

# MCMC diagnostics:
plot(as.ts(fit$post.fitted.values[,1:10]))

# Posterior predictive check:
hist(apply(fit$post.pred, 1,
           function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')

MCMC sampler for BART-STAR with a monotone spline model for the transformation

Description

Run the MCMC algorithm for BART model for count-valued responses using STAR. The transformation is modeled as an unknown, monotone function using I-splines. The Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter of the transformation function.

Usage

bart_star_ispline(
  y,
  X,
  X_test = NULL,
  y_test = NULL,
  lambda_prior = 1/2,
  y_max = Inf,
  n.trees = 200,
  sigest = NULL,
  sigdf = 3,
  sigquant = 0.9,
  k = 2,
  power = 2,
  base = 0.95,
  nsave = 5000,
  nburn = 5000,
  nskip = 2,
  save_y_hat = FALSE,
  target_acc_rate = 0.3,
  adapt_rate = 0.75,
  stop_adapt_perc = 0.5,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors for test data

y_test

n0 x 1 vector of the test data responses (used for computing log-predictive scores)

lambda_prior

the prior mean for the transformation g() is the Box-Cox function with parameter lambda_prior

y_max

a fixed and known upper bound for all observations; default is Inf

n.trees

number of trees to use in BART; default is 200

sigest

positive numeric estimate of the residual standard deviation (see ?bart)

sigdf

degrees of freedom for error variance prior (see ?bart)

sigquant

quantile of the error variance prior that the rough estimate (sigest) is placed at. The closer the quantile is to 1, the more aggresive the fit will be (see ?bart)

k

power

power parameter for tree prior (see ?bart)

base

base parameter for tree prior (see ?bart)

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

save_y_hat

logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute

target_acc_rate

target acceptance rate (between zero and one)

adapt_rate

rate of adaptation in RAM sampler (between zero and one)

stop_adapt_perc

stop adapting at the proposal covariance at stop_adapt_perc*nburn

verbose

logical; if TRUE, print time remaining

Value

a list with the following elements:

fitted.values: the posterior mean of the conditional expectation of the counts y
post.fitted.values: posterior draws of the conditional mean of the counts y
post.pred.test: draws from the posterior predictive distribution at the test points X_test
post.fitted.values.test: posterior draws of the conditional mean at the test points X_test
post.pred: draws from the posterior predictive distribution of y
post.sigma: draws from the posterior distribution of sigma
post.mu.test: draws of the conditional mean of z_star at the test points
post.log.like.point: draws of the log-likelihood for each of the n observations
post.log.pred.test: draws of the log-predictive distribution for each of the n0 test cases
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC
post.g: draws from the posterior distribution of the transformation g
post.sigma.gamma: draws from the posterior distribution of sigma.gamma, the prior standard deviation of the transformation g coefficients

STAR Bayesian Linear Regression

Description

Posterior inference for STAR linear model

Usage

blm_star(
  y,
  X,
  X_test = NULL,
  transformation = "np",
  y_max = Inf,
  prior = "gprior",
  use_MCMC = TRUE,
  nsave = 5000,
  nburn = 5000,
  nskip = 0,
  method_sigma = "mle",
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  psi = NULL,
  compute_marg = FALSE
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors for test data

transformation

transformation to use for the latent process; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)
"ispline" (transformation is modeled as unknown, monotone function using I-splines)
"bnp" (Bayesian nonparametric transformation using the Bayesian bootstrap)

y_max

a fixed and known upper bound for all observations; default is Inf

prior

prior to use for the latent linear regression; currently implemented options are "gprior", "horseshoe", and "ridge". Not all modeling options and transformations are available with the latter two priors.

use_MCMC

= TRUE,

nsave

number of MCMC iterations to save (or MC samples to draw if use_MCMC=FALSE)

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

method_sigma

method to estimate the latent data standard deviation in exact sampler; must be one of

"mle" use the MLE from the STAR EM algorithm
"mmle" use the marginal MLE (Note: slower!)

approx_Fz

logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data

approx_Fy

logical; in BNP transformation, approximate the marginal CDF of y using the empirical CDF

psi

prior variance (g-prior)

compute_marg

logical; if TRUE, compute and return the marginal likelihood (only available when using exact sampler, i.e. use_MCMC=FALSE)

Details

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter is inferred within the MCMC sampler ('box-cox'). Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.

The Monte Carlo sampler (use_MCMC=FALSE) produces direct, discrete, and joint draws from the posterior distribution and the posterior predictive distribution of the linear regression model with a g-prior.

Value

a list with at least the following elements:

coefficients: the posterior mean of the regression coefficients
post.beta: posterior draws of the regression coefficients
post.pred: draws from the posterior predictive distribution of y
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC

If test points are passed in, then the list will also have post.predtest, which contains draws from the posterior predictive distribution at test points.

Other elements may be present depending on the choice of prior, transformation, and sampling approach.

Note

The 'bnp' transformation (without the Fy approximation) is slower than the other transformations because of the way the TruncatedNormal sampler must be updated as the lower and upper limits change (due to the sampling of g). Thus, computational improvements are likely available.

Gibbs sampler for STAR linear regression with BNP transformation

Description

Compute MCMC samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior and BNP transformation.

Usage

blm_star_bnpgibbs(
  y,
  X,
  X_test = X,
  y_max = Inf,
  psi = NULL,
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  nsave = 1000,
  nburn = 1000,
  nskip = 0,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors for test data; default is the observed covariates X

y_max

a fixed and known upper bound for all observations; default is Inf

psi

prior variance (g-prior)

approx_Fz

logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data

approx_Fy

logical; in BNP transformation, approximate the marginal CDF of y using the empirical CDF

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

Details

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt'. Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.

Value

a list with the following elements:

coefficients the posterior mean of the regression coefficients
post_beta: nsave x p samples from the posterior distribution of the regression coefficients
post_ytilde: nsave x n0 samples from the posterior predictive distribution at test points X_test
post_g: nsave posterior samples of the transformation evaluated at the unique y values (only applies for 'bnp' transformations)

Monte Carlo sampler for STAR linear regression with a g-prior

Description

Compute direct Monte Carlo samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior.

Usage

blm_star_exact(
  y,
  X,
  X_test = X,
  transformation = "np",
  y_max = Inf,
  psi = NULL,
  method_sigma = "mle",
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  nsave = 5000,
  compute_marg = FALSE
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors for test data

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"bnp" (Bayesian nonparametric transformation using the Bayesian bootstrap)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

y_max

a fixed and known upper bound for all observations; default is Inf

psi

prior variance (g-prior)

method_sigma

method to estimate the latent data standard deviation; must be one of

"mle" use the MLE from the STAR EM algorithm
"mmle" use the marginal MLE (Note: slower!)

approx_Fz

logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data

approx_Fy

logical; in BNP transformation, approximate the marginal CDF of y using the empirical CDF

nsave

number of Monte Carlo simulations

compute_marg

logical; if TRUE, compute and return the marginal likelihood

Details

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt'. Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.

The Monte Carlo sampler produces direct, discrete, and joint draws from the posterior distribution and the posterior predictive distribution of the linear regression model with a g-prior.

Value

a list with the following elements:

coefficients the posterior mean of the regression coefficients
post.beta: nsave x p samples from the posterior distribution of the regression coefficients
post.pred: draws from the posterior predictive distribution of y
post.pred.test: nsave x n0 samples from the posterior predictive distribution at test points X_test (if given, otherwise NULL)
sigma: The estimated latent data standard deviation
post.g: nsave posterior samples of the transformation evaluated at the unique y values (only applies for 'bnp' transformations)
marg.like: the marginal likelihood (if requested; otherwise NULL)

Note

Estimate the remaining time in the MCMC based on previous samples

Description

Estimate the remaining time in the MCMC based on previous samples

Usage

computeTimeRemaining(nsi, timer0, nsims, nrep = 1000)

Arguments

nsi

Current iteration

timer0

Initial timer value, returned from proc.time()[3]

nsims

Total number of simulations

nrep

Print the estimated time remaining every nrep iterations

Value

Table of summary statistics using the function summary

Compute asymptotic confidence intervals for STAR linear regression

Description

For a linear regression model within the STAR framework, compute (asymptotic) confidence intervals for a regression coefficient of interest. Confidence intervals are computed by inverting the likelihood ratio test and profiling the log-likelihood.

Usage

## S3 method for class 'lmstar'
confint(object, parm, level = 0.95, ...)

Arguments

object

Object of class "lmstar" as output by lm_star

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

confidence level; default is 0.95

...

Ignored

Value

A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in

Examples

#Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 2)
y = sim_dat$y; X = sim_dat$X

#Select a transformation:
transformation = 'np'

#Estimate model
fit = lm_star(y~X, transformation=transformation)

#Confidence interval for all parameters
confint(fit)

Compute Simultaneous Credible Bands

Description

Compute (1-alpha)% credible BANDS for a function based on MCMC samples using Crainiceanu et al. (2007)

Usage

credBands(sampFuns, alpha = 0.05)

Arguments

sampFuns

Nsims x m matrix of Nsims MCMC samples and m points along the curve

alpha

confidence level

Value

m x 2 matrix of credible bands; the first column is the lower band, the second is the upper band

Note

The input needs not be curves: the simultaneous credible "bands" may be computed for vectors. The resulting credible intervals will provide joint coverage at the (1-alpha) level across all components of the vector.

Compute the ergodic (running) mean.

Description

Compute the ergodic (running) mean.

Usage

ergMean(x)

Arguments

x

vector for which to compute the running mean

Value

A vector y with each element defined by y[i] = mean(x[1:i])

Examples

# Compare:
ergMean(1:10)
mean(1:10)

# Running mean for iid N(5, 1) samples:
x = rnorm(n = 10^4, mean = 5, sd = 1)
plot(ergMean(x))
abline(h=5)

Compute E(Y^2) for a STAR process

Description

Compute the conditional expectation of Y^2 for a STAR process Y under a generic link function g.

Usage

expectation2_gRcpp(g_a_j, g_a_jp1, mu, sigma, Jmax)

Arguments

g_a_j

Jmax x 1 vector of g(a(j))

g_a_jp1

Jmax x 1 vector of g(a(j + 1))

mu

m x 1 vector of conditional expectations

sigma

m x 1 vector of conditional standard deviations

Jmax

m x 1 vector of maximum integer values to consider

Value

y2_hat m x 1 vector of conditional expectations

Note

This function uses Rcpp for computational efficiency.

Estimate the mean for a STAR process

Description

Estimate the conditional expectation for a STAR process under a generic link function g.

Usage

expectation_gRcpp(g_a_j, g_a_jp1, mu, sigma, Jmax)

Arguments

g_a_j

Jmax x 1 vector of g(a(j))

g_a_jp1

Jmax x 1 vector of g(a(j + 1))

mu

m x 1 vector of conditional expectations

sigma

m x 1 vector of conditional standard deviations

Jmax

m x 1 vector of maximum integer values to consider

Value

y_hat m x 1 vector of conditional expectations

Note

This function uses Rcpp for computational efficiency.

Estimate the mean for a STAR process

Description

Estimate the conditional expectation for a STAR process under the identity link function.

Usage

expectation_identity(a, Jmax, Mu, sigma_t, Offset)

Arguments

a

Jmaxmax-dimensional vector of STAR integers a_j

Jmax

T x m matrix of maximum integer values to consider

Mu

T x m matrix of latent means

sigma_t

T-dimensional vector of time-dependent latent error sd's

Offset

T x m matrix of offsets

Value

Zhat T x m matrix of conditional expectations

Note

This function uses Rcpp for computational efficiency.

Estimate the mean for a STAR process

Description

Estimate the conditional expectation for a STAR process under the log link function.

Usage

expectation_log(a, Jmax, Mu, sigma_t, Offset)

Arguments

a

Jmaxmax-dimensional vector of STAR integers a_j

Jmax

T x m matrix of maximum integer values to consider

Mu

T x m matrix of latent means

sigma_t

T-dimensional vector of time-dependent latent error sd's

Offset

T x m matrix of offsets

Value

Zhat T x m matrix of conditional expectations

Note

This function uses Rcpp for computational efficiency.

Estimate the mean for a STAR process

Description

Estimate the conditional expectation for a STAR process under the square root link function.

Usage

expectation_sqrt(a, Jmax, Mu, sigma_t, Offset)

Arguments

a

Jmaxmax-dimensional vector of STAR integers a_j

Jmax

T x m matrix of maximum integer values to consider

Mu

T x m matrix of latent means

sigma_t

T-dimensional vector of time-dependent latent error sd's

Offset

T x m matrix of offsets

Value

Zhat T x m matrix of conditional expectations

Note

This function uses Rcpp for computational efficiency.

Box-Cox transformation

Description

Evaluate the Box-Cox transformation, which is a scaled power transformation to preserve continuity in the index lambda at zero. Negative values are permitted.

Usage

g_bc(t, lambda)

Arguments

t

argument(s) at which to evaluate the function

lambda

Box-Cox parameter

Value

The evaluation(s) of the Box-Cox function at the given input(s) t.

Note

Special cases include the identity transformation (lambda = 1), the square-root transformation (lambda = 1/2), and the log transformation (lambda = 0).

Examples

# Log-transformation:
g_bc(1:5, lambda = 0); log(1:5)

# Square-root transformation: note the shift and scaling
g_bc(1:5, lambda = 1/2); sqrt(1:5)

Bayesian bootstrap-based transformation

Description

Compute one posterior draw from the smoothed transformation implied by (separate) Bayesian bootstrap models for the CDFs of y and X.

Usage

g_bnp(
  y,
  xtSigmax = rep(0, length(y)),
  zgrid = NULL,
  sigma_epsilon = 1,
  approx_Fz = FALSE
)

Arguments

y

n x 1 vector of observed counts

xtSigmax

n x 1 vector of t(X_i) Sigma_theta X_i, where Sigma_theta is the prior variance

zgrid

optional vector of grid points for evaluating the CDF of z (Fz)

sigma_epsilon

latent standard deviation

approx_Fz

logical; if TRUE, use a normal approximation for Fz, the marginal CDF of the latent z, which is faster and more stable

Value

A smooth monotone function which can be used for evaluations of the transformation at each posterior draw.

Examples

# Sample some data:
y = rpois(n = 200, lambda = 5)
# Compute 200 draws of g on a grid:
t = seq(0, max(y), length.out = 100) # grid
g_post = t(sapply(1:500, function(s) g_bnp(y, approx_Fz = TRUE)(t)))
# Plot together:
plot(t, t, ylim = range(g_post), type='n', ylab = 'g(t)',  main = 'Bayesian bootstrap posterior: g')
apply(g_post, 1, function(g) lines(t, g, col='gray'))
# And the posterior mean of g:
lines(t, colMeans(g_post), lwd=3)

Cumulative distribution function (CDF)-based transformation

Description

Compute a CDF-based transformation using the observed count data. The CDF can be estimated nonparametrically or parametrically based on the Poisson or Negative Binomial distributions. In the parametric case, the parameters are determined based on the moments of y. Note that this is a fixed quantity and does not come with uncertainty quantification.

Usage

g_cdf(y, distribution = "np")

Arguments

y

n x 1 vector of observed counts

distribution

the distribution used for the CDF; must be one of

"np" (empirical CDF)
"pois" (moment-matched marginal Poisson CDF)
"neg-bin" (moment-matched marginal Negative Binomial CDF)

Value

A smooth monotone function which can be used for evaluations of the transformation.

Examples

# Sample some data:
y = rpois(n = 500, lambda = 5)

# Empirical CDF version:
g_np = g_cdf(y, distribution = 'np')

# Poisson version:
g_pois = g_cdf(y, distribution = 'pois')

# Negative binomial version:
g_negbin = g_cdf(y, distribution = 'neg-bin')

# Plot together:
t = 1:max(y) # grid
plot(t, g_np(t), type='l')
lines(t, g_pois(t), lty = 2)
lines(t, g_negbin(t), lty = 3)

Approximate inverse transformation

Description

Compute the inverse function of a transformation g based on a grid search.

Usage

g_inv_approx(g, t_grid)

Arguments

g

the transformation function

t_grid

grid of arguments at which to evaluate the transformation function

Value

A function which can be used for evaluations of the (approximate) inverse transformation function.

Examples

# Sample some data:
y = rpois(n = 500, lambda = 5)

# Empirical CDF transformation:
g_np = g_cdf(y, distribution = 'np')

# Grid for approximation:
t_grid = seq(1, max(y), length.out = 100)

# Approximate inverse:
g_inv = g_inv_approx(g = g_np, t_grid = t_grid)

# Check the approximation:
plot(t_grid, g_inv(g_np(t_grid)), type='p')
lines(t_grid, t_grid)

Inverse Box-Cox transformation

Description

Evaluate the inverse Box-Cox transformation. Negative values are permitted.

Usage

g_inv_bc(s, lambda)

Arguments

s

argument(s) at which to evaluate the function

lambda

Box-Cox parameter

Value

The evaluation(s) of the inverse Box-Cox function at the given input(s) s.

Note

Special cases include the identity transformation (lambda = 1), the square-root transformation (lambda = 1/2), and the log transformation (lambda = 0).

#' @examples # (Inverse) log-transformation: g_inv_bc(1:5, lambda = 0); exp(1:5)

# (Inverse) square-root transformation: note the shift and scaling g_inv_bc(1:5, lambda = 1/2); (1:5)^2

Fitting STAR Gradient Boosting Machines via EM algorithm

Description

Compute the MLEs and log-likelihood for the Gradient Boosting Machines (GBM) STAR model. The STAR model requires a *transformation* and an *estimation function* for the conditional mean given observed data. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility. The estimator in this case is a GBM. Standard function calls including fitted and residuals apply.

Usage

gbm_star(
  y,
  X,
  X.test = NULL,
  transformation = "np",
  y_max = Inf,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000,
  n.trees = 100,
  interaction.depth = 1,
  shrinkage = 0.1,
  bag.fraction = 1
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X.test

m x p matrix of out-of-sample predictors

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

sd_init

add random noise for EM algorithm initialization scaled by sd_init times the Gaussian MLE standard deviation; default is 10

tol

tolerance for stopping the EM algorithm; default is 10^-10;

max_iters

maximum number of EM iterations before stopping; default is 1000

n.trees

Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. Default is 100.

interaction.depth

Integer specifying the maximum depth of each tree (i.e., the highest level of variable interactions allowed). A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is 1.

shrinkage

a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. Default is 0.1.

bag.fraction

the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomnesses into the model fit. If bag.fraction < 1 then running the same model twice will result in similar but different fits. Default is 1 (for a deterministic prediction).

Details

Value

a list with the following elements:

fitted.values: the fitted values at the MLEs (training)
fitted.values.test: the fitted values at the MLEs (testing)
g.hat a function containing the (known or estimated) transformation
sigma.hat the MLE of the standard deviation
mu.hat the MLE of the conditional mean (on the transformed scale)
z.hat the estimated latent data (on the transformed scale) at the MLEs
residuals the Dunn-Smyth residuals (randomized)
residuals_rep the Dunn-Smyth residuals (randomized) for 10 replicates
logLik the log-likelihood at the MLEs
logLik0 the log-likelihood at the MLEs for the *unrounded* initialization
lambda the Box-Cox nonlinear parameter
gbmObj: the object returned by gbm() at the MLEs
and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications

Note

Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deletion of these indices and re-running the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.

References

Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for self‐reported mental health. Biometrics. doi:10.1111/biom.13617

Examples

# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X

# EM algorithm for STAR (using the log-link)
fit_em = gbm_star(y = y, X = X,
                 transformation = 'log')

# Evaluate convergence:
plot(fit_em$logLik_all, type='l', main = 'GBM-STAR-log', xlab = 'Iteration', ylab = 'log-lik')

# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);

# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))

# Log-likelihood at MLEs:
fit_em$logLik

Generalized EM estimation for STAR

Description

Compute MLEs and log-likelihood for a generalized STAR model. The STAR model requires a *transformation* and an *estimation function* for the conditional mean given observed data. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility. The estimator can be any least squares estimator, including nonlinear models. Standard function calls including coefficients(), fitted(), and residuals() apply.

Usage

genEM_star(
  y,
  estimator,
  transformation = "np",
  y_max = Inf,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000
)

Arguments

y

n x 1 vector of observed counts

estimator

a function that inputs data y and outputs a list with two elements:

The fitted values fitted.values
The parameter estimates coefficients

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

sd_init

add random noise for EM algorithm initialization scaled by sd_init times the Gaussian MLE standard deviation; default is 10

tol

tolerance for stopping the EM algorithm; default is 10^-10;

max_iters

maximum number of EM iterations before stopping; default is 1000

Details

The expectation-maximization (EM) algorithm is used to produce maximum likelihood estimators (MLEs) for the parameters defined in the estimator function, such as linear regression coefficients, which define the Gaussian model for the continuous latent data. Fitted values (point predictions), residuals, and log-likelihood values are also available. Inference for the estimators proceeds via classical maximum likelihood. Initialization of the EM algorithm can be randomized to monitor convergence. However, the log-likelihood is concave for all transformations (except 'box-cox'), so global convergence is guaranteed.

Value

a list with the following elements:

coefficients the MLEs of the coefficients
fitted.values the fitted values at the MLEs
g.hat a function containing the (known or estimated) transformation
sigma.hat the MLE of the standard deviation
mu.hat the MLE of the conditional mean (on the transformed scale)
z.hat the estimated latent data (on the transformed scale) at the MLEs
residuals the Dunn-Smyth residuals (randomized)
residuals_rep the Dunn-Smyth residuals (randomized) for 10 replicates
logLik the log-likelihood at the MLEs
logLik0 the log-likelihood at the MLEs for the *unrounded* initialization
lambda the Box-Cox nonlinear parameter
and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications

Note

References

Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for self‐reported mental health. Biometrics. doi:10.1111/biom.13617

Examples

# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X

# Select a transformation:
transformation = 'np'

# Example using GAM as underlying estimator (for illustration purposes only)
if(require("mgcv")){
  fit_em = genEM_star(y = y,
                      estimator = function(y) gam(y ~ s(X1)+s(X2),
                      data=data.frame(y,X)),
                      transformation = transformation)
}

# Fitted coefficients:
coef(fit_em)

# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);

# Log-likelihood at MLEs:
fit_em$logLik

Generalized MCMC Algorithm for STAR

Description

Run the MCMC algorithm for STAR given

a function to initialize model parameters; and
a function to sample (i.e., update) model parameters.

The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.

Usage

genMCMC_star(
  y,
  sample_params,
  init_params,
  transformation = "np",
  y_max = Inf,
  nsave = 5000,
  nburn = 5000,
  nskip = 0,
  save_y_hat = FALSE,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

sample_params

a function that inputs data y and a named list params containing

mu: the n x 1 vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

and outputs an updated list params of samples from the full conditional posterior distribution of coefficients and sigma (and updates mu)

init_params

an initializing function that inputs data y and initializes the named list params of mu, sigma, and coefficients

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

save_y_hat

logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute

verbose

logical; if TRUE, print time remaining

Details

Value

a list with at least the following elements:

post.pred: draws from the posterior predictive distribution of y
post.sigma: draws from the posterior distribution of sigma
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC
post.lambda: draws from the posterior distribution of lambda (NULL unless transformation='box-cox')
fitted.values: the posterior mean of the conditional expectation of the counts y (NULL if save_y_hat=FALSE)
post.fitted.values: posterior draws of the conditional mean of the counts y (NULL if save_y_hat=FALSE)

If the coefficients list from init_params and sample_params contains a named element beta, e.g. for linear regression, then the function output contains

coefficients: the posterior mean of the beta coefficients
post.beta: draws from the posterior distribution of beta
post.othercoefs: draws from the posterior distribution of any other sampled coefficients, e.g. variance terms

If no beta exists in the parameter coefficients, then the output list just contains

coefficients: the posterior mean of all coefficients
post.beta: draws from the posterior distribution of all coefficients

Additionally, if init_params and sample_params have output mu_test, then the sampler will output post.predtest, which contains draws from the posterior predictive distribution at test points.

Examples

# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X

# STAR: log-transformation:
fit_log = genMCMC_star(y = y,
                         sample_params = function(y, params) sample_lm_gprior(y, X, params),
                         init_params = function(y) init_lm_gprior(y, X),
                         transformation = 'log')
# Posterior mean of each coefficient:
coef(fit_log)

# WAIC for STAR-log:
fit_log$WAIC

# MCMC diagnostics:
plot(as.ts(fit_log$post.beta[,1:3]))

# Posterior predictive check:
hist(apply(fit_log$post.pred, 1,
           function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')

MCMC sampler for STAR with a monotone spline model for the transformation

Description

Run the MCMC algorithm for STAR given

a function to initialize model parameters; and
a function to sample (i.e., update) model parameters.

The transformation is modeled as an unknown, monotone function using I-splines. The Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter of the transformation function.

Usage

genMCMC_star_ispline(
  y,
  sample_params,
  init_params,
  lambda_prior = 1/2,
  y_max = Inf,
  nsave = 5000,
  nburn = 5000,
  nskip = 0,
  save_y_hat = FALSE,
  target_acc_rate = 0.3,
  adapt_rate = 0.75,
  stop_adapt_perc = 0.5,
  verbose = TRUE
)

Arguments

y

n x 1 vector of observed counts

sample_params

a function that inputs data y and a named list params containing at least

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

and optionally a fourth element mu_test which contains the vector of conditional means at test points. The output is an updated list params of samples from the full conditional posterior distribution of coefficients and sigma (along with updates of mu and mu_test if applicable)

init_params

an initializing function that inputs data y and initializes the named list params of mu, sigma, coefficients and mu_test (if desired)

lambda_prior

the prior mean for the transformation g() is the Box-Cox function with parameter lambda_prior

y_max

a fixed and known upper bound for all observations; default is Inf

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

save_y_hat

logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute

target_acc_rate

target acceptance rate (between zero and one)

adapt_rate

rate of adaptation in RAM sampler (between zero and one)

stop_adapt_perc

stop adapting at the proposal covariance at stop_adapt_perc*nburn

verbose

logical; if TRUE, print time remaining

Details

If the coefficients list from init_params and sample_params contains a named element beta, e.g. for linear regression, then the function output contains

coefficients: the posterior mean of the beta coefficients
post.beta: draws from the posterior distribution of beta
post.othercoefs: draws from the posterior distribution of any other sampled coefficients, e.g. variance terms

If no beta exists in the parameter coefficients, then the output list just contains

coefficients: the posterior mean of all coefficients
post.beta: draws from the posterior distribution of all coefficients

Additionally, if init_params and sample_params have output mu_test, then the sampler will output post.predtest, which contains draws from the posterior predictive distribution at test points.

Value

A list with at least the following elements:

post.pred: draws from the posterior predictive distribution of y
post.sigma: draws from the posterior distribution of sigma
post.log.like.point: draws of the log-likelihood for each of the n observations
WAIC: Widely-Applicable/Watanabe-Akaike Information Criterion
p_waic: Effective number of parameters based on WAIC
post.g: draws from the posterior distribution of the transformation g
post.sigma.gamma: draws from the posterior distribution of sigma.gamma, the prior standard deviation of the transformation g() coefficients
fitted.values: the posterior mean of the conditional expectation of the counts y (NULL if save_y_hat=FALSE)
post.fitted.values: posterior draws of the conditional mean of the counts y (NULL if save_y_hat=FALSE)

along with other elements depending on the nature of the initialization and sampling functions. See details for more info.

Summarize of effective sample size

Description

Compute the summary statistics for the effective sample size (ESS) across posterior samples for possibly many variables

Usage

getEffSize(postX)

Arguments

postX

An array of arbitrary dimension (nsims x ... x ...), where nsims is the number of posterior samples

Value

Table of summary statistics using the function summary().

Examples

# ESS for iid simulations:
rand_iid = rnorm(n = 10^4)
getEffSize(rand_iid)

# ESS for several AR(1) simulations with coefficients 0.1, 0.2,...,0.9:
rand_ar1 = sapply(seq(0.1, 0.9, by = 0.1), function(x) arima.sim(n = 10^4, list(ar = x)))
getEffSize(rand_ar1)

Initialize the parameters for an additive model

Description

Initialize the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using orthogonalized splines.

Usage

init_bam_orthog(y, X_lin, X_nonlin, B_all = NULL)

Arguments

y

n x 1 vector of data

X_lin

n x pL matrix of predictors to be modelled as linear

X_nonlin

n x pNL matrix of predictors to be modelled as nonlinear

B_all

optional pNL-dimensional list of n x L[j] dimensional basis matrices for each nonlinear term j=1,...,pNL; if NULL, compute internally

Value

a named list params containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Note

The parameters in coefficients are:

beta_lin: the p x 1 linear coefficients, including the linear terms from X_nonlin
f_j: the n x pNL matrix of fitted values for each nonlinear function
theta_j: the pNL-dimensional of nonlinear basis coefficients
sigma_beta: p x 1 vector of linear regression coefficient standard deviations
sigma_theta_j: pNL x 1 vector of nonlinear coefficient standard deviations

Initialize the parameters for an additive model

Description

Initialize the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using low-rank thin plate splines.

Usage

init_bam_thin(y, X_lin, X_nonlin, B_all = NULL)

Arguments

y

n x 1 vector of data

X_lin

n x pL matrix of predictors to be modelled as linear

X_nonlin

n x pNL matrix of predictors to be modelled as nonlinear

B_all

optional pNL-dimensional list of n x L[j] dimensional basis matrices for each nonlinear term j=1,...,pNL; if NULL, compute internally

Value

a named list params containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Note

The parameters in coefficients are:

beta_lin: the p x 1 linear coefficients, including the linear terms from X_nonlin
f_j: the n x pNL matrix of fitted values for each nonlinear function
theta_j: the pNL-dimensional of nonlinear basis coefficients
sigma_beta: p x 1 vector of linear regression coefficient standard deviations
sigma_theta_j: pNL x 1 vector of nonlinear coefficient standard deviations

Initialize linear regression parameters assuming a g-prior

Description

Initialize the parameters for a linear regression model assuming a g-prior for the coefficients.

Usage

init_lm_gprior(y, X, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors at test points (default is NULL)

Value

a named list params containing at least

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Additionally, if X_test is not NULL, then the list includes an element mu_test, the vector of conditional means at the test points

Note

The parameters in coefficients are:

beta: the p x 1 vector of regression coefficients components of beta

Examples

# Simulate data for illustration:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X

# Initialize:
params = init_lm_gprior(y = y, X = X)
names(params)
names(params$coefficients)

Initialize linear regression parameters assuming a horseshoe prior

Description

Initialize the parameters for a linear regression model assuming a horseshoe prior for the (non-intercept) coefficients. The number of predictors p may exceed the number of observations n.

Usage

init_lm_hs(y, X, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors at test points (default is NULL)

Value

a named list params containing at least

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Additionally, if X_test is not NULL, then the list includes an element mu_test, the vector of conditional means at the test points

Note

The parameters in coefficients are:

beta: the p x 1 vector of regression coefficients
sigma_beta: the p x 1 vector of regression coefficient standard deviations (local scale parameters)
xi_sigma_beta: the p x 1 vector of parameter-expansion variables for sigma_beta
lambda_beta: the global scale parameter
xi_lambda_beta: the parameter-expansion variable for lambda_beta components of beta

Initialize linear regression parameters assuming a ridge prior

Description

Initialize the parameters for a linear regression model assuming a ridge prior for the (non-intercept) coefficients. The number of predictors p may exceed the number of observations n.

Usage

init_lm_ridge(y, X, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

X_test

n0 x p matrix of predictors at test points (default is NULL)

Value

a named list params containing at least

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Additionally, if X_test is not NULL, then the list includes an element mu_test, the vector of conditional means at the test points

Note

The parameters in coefficients are:

beta: the p x 1 vector of regression coefficients
sigma_beta: the prior standard deviation for the (non-intercept) components of beta

Initialize the parameters for a simple mean-only model

Description

Initialize the parameters for the model y ~ N(mu0, sigma^2) with a flat prior on mu0.

Usage

init_params_mean(y)

Arguments

y

n x 1 vector of data

Value

a named list params containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Note

The only parameter in coefficients is mu0. Although redundant here, this parametrization is useful in other functions.

Estimate confidence intervals/bands for a STAR process

Description

Compute confidence intervals/bands for the expected value of the count-valued STAR process y based on intervals/bands for the Gaussian process mu.

Usage

interval_gRcpp(g_a_j, g_a_jp1, L_mu, U_mu, sigma, Jmax)

Arguments

g_a_j

Jmax x 1 vector of g(a(j))

g_a_jp1

Jmax x 1 vector of g(a(j + 1))

L_mu

m x 1 vector of lower intervals for mu

U_mu

m x 1 vector of upper intervals for mu

sigma

m x 1 vector of conditional standard deviations

Jmax

m x 1 vector of maximum integer values to consider

Value

LU_y m x 2 vector of intervals for y.

Note

This function uses Rcpp for computational efficiency.

Compute the inverse log-odds

Description

Compute the inverse log-odds

Usage

invlogit(x)

Arguments

x

scalar or vector for which to compute the (componentwise) inverse log-odds

Value

A scalar or vector of values in (0,1)

Fitting frequentist STAR linear model via EM algorithm

Description

Compute the MLEs and log-likelihood for the STAR linear model. The regression coefficients are estimated using least squares within an EM algorithm.

Usage

lm_star(
  formula,
  data = NULL,
  transformation = "np",
  y_max = Inf,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000
)

Arguments

formula

an object of class "formula" (see lm for details on model specification)

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model; like lm, if not found in data, the variables are taken from environment(formula)

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

sd_init

add random noise for EM algorithm initialization scaled by sd_init times the Gaussian MLE standard deviation; default is 10

tol

tolerance for stopping the EM algorithm; default is 10^-10;

max_iters

maximum number of EM iterations before stopping; default is 1000

Details

Standard function calls including coefficients, fitted, and residuals apply. Fitted values are the expectation at the MLEs, and as such are not necessarily count-valued.

Value

an object of class "lmstar", which is a list with the following elements:

coefficients the MLEs of the coefficients
fitted.values the fitted values at the MLEs
g.hat a function containing the (known or estimated) transformation
ginv.hat a function containing the inverse of the transformation
sigma.hat the MLE of the standard deviation
mu.hat the MLE of the conditional mean (on the transformed scale)
z.hat the estimated latent data (on the transformed scale) at the MLEs
residuals the Dunn-Smyth residuals (randomized)
residuals_rep the Dunn-Smyth residuals (randomized) for 10 replicates
logLik the log-likelihood at the MLEs
logLik0 the log-likelihood at the MLEs for the *unrounded* initialization
lambda the Box-Cox nonlinear parameter
and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications

Note

References

Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for self‐reported mental health. Biometrics. doi:10.1111/biom.13617

Examples

# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 3)
y = sim_dat$y; X = sim_dat$X

# Fit model
fit_em = lm_star(y~X)

# Fitted coefficients:
coef(fit_em)
# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);

# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))

Compute the pointwise log-likelihood for STAR

Description

Compute the pointwise log-likelihood for a STAR model. The code here assumes that the transformed real-valued process (z_star) has conditionally independent components with means mu and standard deviations sigma.

Usage

logLikePointRcpp(g_a_j, g_a_jp1, mu, sigma)

Arguments

g_a_j

m x 1 vector of g(a(j))

g_a_jp1

m x 1 vector of g(a(j + 1))

mu

m x 1 vector of conditional expectations

sigma

m x 1 vector of conditional standard deviations

Value

loglike m x 1 log-likelihood value

Note

This function uses Rcpp for computational efficiency.

Compute the log-likelihood for STAR

Description

Compute the log-likelihood for a STAR model. The code here assumes that the transformed real-valued process (z_star) has conditionally independent components with means mu and standard deviations sigma.

Usage

logLikeRcpp(g_a_j, g_a_jp1, mu, sigma)

Arguments

g_a_j

m x 1 vector of g(a(j))

g_a_jp1

m x 1 vector of g(a(j + 1))

mu

m x 1 vector of conditional expectations

sigma

m x 1 vector of conditional standard deviations

Value

loglike scalar log-likelihood value

Note

This function uses Rcpp for computational efficiency.

Compute the log-odds

Description

Compute the log-odds

Usage

logit(x)

Arguments

x

scalar or vector in (0,1) for which to compute the (componentwise) log-odds

Value

A scalar or vector of log-odds

Plot the estimated regression coefficients and credible intervals

Description

Plot the estimated regression coefficients and credible intervals for the linear effects in up to two models.

Usage

plot_coef(
  post_coefficients_1,
  post_coefficients_2 = NULL,
  alpha = 0.05,
  labels = NULL
)

Arguments

post_coefficients_1

Nsims x p matrix of simulations from the posterior distribution of the p coefficients, where Nsims is the number of simulations

post_coefficients_2

Nsims x p matrix of simulations from the posterior distribution of the p coefficients from another model

alpha

confidence level for the credible intervals

labels

p dimensional string of labels for the coefficient names

Value

A plot of regression coefficients and credible intervals for 1-2 models

Plot the fitted values and the data

Description

Plot the fitted values, plus pointwise credible intervals, against the data. For simulations, one may use the true values in place of the data.

Usage

plot_fitted(y, post_y, y_hat = NULL, alpha = 0.05, ...)

Arguments

y

n x 1 vector of data

post_y

Nsims x n matrix of simulated fitted values, where Nsims is the number of simulations

y_hat

n x 1 vector of fitted values; if NULL, use the pointwise sample mean colMeans(post_y)

alpha

confidence level for the credible intervals

...

other arguments for plotting

Value

A plot with the fitted values and the credible intervals against the data

Plot the empirical and model-based probability mass functions

Description

Plot the empirical probability mass function, i.e., the proportion of data values y that equal j for each j=0,1,..., together with the model-based estimate of the probability mass function based on the posterior predictive distribution.

Usage

plot_pmf(y, post.pred, error.bars = FALSE, alpha = 0.05)

Arguments

y

n x 1 vector of data

post.pred

nsave draws from the posterior predictive distribution of y

error.bars

logical; if TRUE, include errors bars on the model-based PMF

alpha

confidence level for the credible intervals

Value

A plot of the empirical PMF of y along with a PMF estimate from the model posterior predictive distribution

pmax() in Rcpp

Description

Compute the pointwise max for two vectors of equal length

Usage

pmaxRcpp(v1, v2)

Arguments

v1

m x 1 vector

v2

m x 1 vector

Value

vm m x 1 vector of pointwise maxima

Note

This function uses Rcpp for computational efficiency.

pmin() in Rcpp

Description

Compute the pointwise min for two vectors of equal length

Usage

pminRcpp(v1, v2)

Arguments

v1

m x 1 vector

v2

m x 1 vector

Value

vm m x 1 vector of pointwise minima

Note

This function uses Rcpp for computational efficiency.

Predict method for response in STAR linear model

Description

Outputs predicted values based on an lmstar fit and optionally prediction intervals based on the the (plug-in) predictive distribution for the STAR linear model

Usage

## S3 method for class 'lmstar'
predict(object, newdata = NULL, interval = FALSE, level = 0.95, N = 1000, ...)

Arguments

object

Object of class "lmstar" as output by lm_star

newdata

An optional matrix/data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

interval

logical; whether or not to include prediction intervals (default FALSE)

level

Level for prediction intervals

N

number of Monte Carlo samples from the posterior predictive distribution used to approximate intervals; default is 1000

...

Ignored

Details

If interval=TRUE, then predict.lmstar uses a Monte Carlo approach to estimating the (plug-in) predictive distribution for the STAR linear model. The algorithm iteratively samples (i) the latent data given the observed data, (ii) the latent predictive data given the latent data from (i), and (iii) (inverse) transforms and rounds the latent predictive data to obtain a draw from the integer-valued predictive distribution.

The appropriate quantiles of these Monte Carlo draws are computed and reported as the prediction interval.

Value

Either a a vector of predictions (if interval=FALSE) or a matrix of predictions and bounds with column names fit, lwr, and upr

Note

The “plug-in" predictive distribution is a crude approximation. Better approaches are available using the Bayesian models, e.g. blm_star, which provide samples from the posterior predictive distribution.

For highly skewed responses, prediction intervals especially at lower levels may not include the predicted value itself, since the mean is often much larger than the median.

Examples

# Simulate data with count-valued response y:
x = seq(0, 1, length.out = 100)
y = rpois(n = length(x), lambda = exp(1.5 + 5*(x -.5)^2))

# Estimate model--assume a quadratic effect (better for illustration purposes)
fit = lm_star(y~x+I(x^2), transformation = 'sqrt')

#Compute the predictive draws for the test points (same as observed points here)
#Also compute intervals using plug-in predictive distribution
y_pred = predict(fit, interval=TRUE)

# Plot the results
plot(x, y, ylim = range(y, y_pred), main = 'STAR: Predictions and 95% PI')
lines(x,y_pred[,"fit"], col='black', type='s', lwd=4)
lines(x, y_pred[,"lwr"], col='darkgray', type='s', lwd=4)
lines(x, y_pred[,"upr"], col='darkgray', type='s', lwd=4)

Compute coefficient p-values for STAR linear regression using likelihood ratio test

Description

For a linear regression model within the STAR framework, compute p-values for regression coefficients using a likelihood ratio test. It also computes a p-value for excluding all predictors, akin to a (partial) F test.

Usage

pvals(object)

Arguments

object

Object of class "lmstar" as output by lm_star

Value

a list of p+1 p-values, one for each predictor as well as the joint p-value excluding all predictors

Examples

# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 2)
y = sim_dat$y; X = sim_dat$X

# Select a transformation:
transformation = 'np'

#Estimate model
fit = lm_star(y~X, transformation = transformation)

#Compute p-values
pvals(fit)

Fit Random Forest STAR with EM algorithm

Description

Compute the MLEs and log-likelihood for the Random Forest STAR model. The STAR model requires a *transformation* and an *estimation function* for the conditional mean given observed data. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility. The estimator in this case is a random forest. Standard function calls including fitted and residuals apply.

Usage

randomForest_star(
  y,
  X,
  X.test = NULL,
  transformation = "np",
  y_max = Inf,
  sd_init = 10,
  tol = 10^-10,
  max_iters = 1000,
  ntree = 500,
  mtry = max(floor(ncol(X)/3), 1),
  nodesize = 5
)

Arguments

y

n x 1 vector of observed counts

X

n x p matrix of predictors

X.test

m x p matrix of out-of-sample predictors

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)
"box-cox" (box-cox transformation with learned parameter)

y_max

a fixed and known upper bound for all observations; default is Inf

sd_init

add random noise for EM algorithm initialization scaled by sd_init times the Gaussian MLE standard deviation; default is 10

tol

tolerance for stopping the EM algorithm; default is 10^-10;

max_iters

maximum number of EM iterations before stopping; default is 1000

ntree

Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. Default is 500.

mtry

Number of variables randomly sampled as candidates at each split. Default is p/3.

nodesize

Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Default is 5.

Details

The expectation-maximization (EM) algorithm is used to produce maximum likelihood estimators (MLEs) for the parameters defined in the The fitted values are computed using out-of-bag samples. As a result, the log-likelihood is based on out-of-bag prediction, and it is similarly straightforward to compute out-of-bag squared and absolute errors.

Value

a list with the following elements:

fitted.values: the fitted values at the MLEs based on out-of-bag samples (training)
fitted.values.test: the fitted values at the MLEs (testing)
g.hat a function containing the (known or estimated) transformation
sigma.hat the MLE of the standard deviation
mu.hat the MLE of the conditional mean (on the transformed scale)
z.hat the estimated latent data (on the transformed scale) at the MLEs
residuals the Dunn-Smyth residuals (randomized)
residuals_rep the Dunn-Smyth residuals (randomized) for 10 replicates
logLik the log-likelihood at the MLEs
logLik0 the log-likelihood at the MLEs for the *unrounded* initialization
lambda the Box-Cox nonlinear parameter
rfObj: the object returned by randomForest() at the MLEs
and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications

Note

Since the random forest produces random predictions, the EM algorithm will never converge exactly.

References

Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for self‐reported mental health. Biometrics. doi:10.1111/biom.13617

Examples


# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X

# EM algorithm for STAR (using the log-link)
fit_em = randomForest_star(y = y, X = X,
                 transformation = 'log',
                 max_iters = 100)

# Fitted values (out-of-bag)
y_hat = fitted(fit_em)
plot(y_hat, y);

# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))

# Log-likelihood at MLEs (out-of-bag):
fit_em$logLik

Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.

Description

Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.

Usage

roaches

Format

## 'roaches' A data frame with 262 obs. of 6 variables:

y: Number of roaches caught
roach1: Pretreatment number of roaches
treatment: Treatment indicator
senior: Indicator for only elderly residents in building
exposure2: Number of days for which the roach traps were used

Source

Gelman and Hill (2007); package 'rstanarm'

Rounding function

Description

Define the rounding operator associated with the floor function. The function also returns zero whenever the input is negative and caps the value at y_max, where y_max is a known upper bound on the data y (if specified).

Usage

round_floor(z, y_max = Inf)

Arguments

z

the real-valued input(s)

y_max

a fixed and known upper bound for all observations; default is Inf

Value

The count-valued output(s) from the rounding function.

Examples


# Floor function:
round_floor(1.5)
round_floor(0.5)

# Special treatmeant of negative numbers:
round_floor(-1)

Sample from a truncated normal distribution

Description

Sample from a truncated normal distribution. Samples are drawn componentwise, so each component of the vector is allowed its own mean, standard deviation, and upper and lower limits. The components are assumed to be independent.

Usage

rtruncnormRcpp(y_lower, y_upper, mu, sigma, u_rand)

Arguments

y_lower

m x 1 vector of lower endpoints

y_upper

m x 1 vector of upper endpoints

mu

m x 1 vector of conditional expectations

sigma

m x 1 vector of conditional standard deviations

u_rand

m x 1 vector of uniform random variables

Value

z_star m x 1 draw from the truncated normal distribution

Note

This function uses Rcpp for computational efficiency.

Sample a Gaussian vector using the fast sampler of BHATTACHARYA et al.

Description

Sample from N(mu, Sigma) where Sigma = solve(crossprod(Phi) + solve(D)) and mu = Sigma*crossprod(Phi, alpha):

Usage

sampleFastGaussian(Phi, Ddiag, alpha)

Arguments

Phi

n x p matrix (of predictors)

Ddiag

p x 1 vector of diagonal components (of prior variance)

alpha

n x 1 vector (of data, scaled by variance)

Value

Draw from N(mu, Sigma), which is p x 1, and is computed in O(n^2*p)

Note

Assumes D is diagonal, but extensions are available

Sample the parameters for an additive model

Description

Sample the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using orthogonalized splines. The sampler draws the linear terms jointly and then samples each vector of nonlinear coefficients using Bayesian backfitting (i.e., conditional on all other nonlinear and linear terms).

Usage

sample_bam_orthog(
  y,
  X_lin,
  X_nonlin,
  params,
  A = 10^4,
  B_all = NULL,
  diagBtB_all = NULL,
  XtX = NULL
)

Arguments

y

n x 1 vector of data

X_lin

n x pL matrix of predictors to be modelled as linear

X_nonlin

n x pNL matrix of predictors to be modelled as nonlinear

params

the named list of parameters containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

A

the prior scale for sigma_beta, which we assume follows a Uniform(0, A) prior.

B_all

optional pNL-dimensional list of n x L[j] dimensional basis matrices for each nonlinear term j=1,...,pNL; if NULL, compute internally

diagBtB_all

optional pNL-dimensional list of diag(crossprod(B_all[[j]])); if NULL, compute internally

XtX

optional p x p matrix of crossprod(X) (one-time cost); if NULL, compute internally

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (and updated mu).

Note

The parameters in coefficients are:

beta_lin: the p x 1 linear coefficients, including the linear terms from X_nonlin
f_j: the n x pNL matrix of fitted values for each nonlinear function
theta_j: the pNL-dimensional of nonlinear basis coefficients
sigma_beta: p x 1 vector of linear regression coefficient standard deviations
sigma_theta_j: pNL x 1 vector of nonlinear coefficient standard deviations

Sample the parameters for an additive model

Description

Sample the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using low-rank thin plate splines. The sampler draws the linear terms jointly and then samples each vector of nonlinear coefficients using Bayesian backfitting (i.e., conditional on all other nonlinear and linear terms).

Usage

sample_bam_thin(
  y,
  X_lin,
  X_nonlin,
  params,
  A = 10^4,
  B_all = NULL,
  BtB_all = NULL,
  XtX = NULL
)

Arguments

y

n x 1 vector of data

X_lin

n x pL matrix of predictors to be modelled as linear

X_nonlin

n x pNL matrix of predictors to be modelled as nonlinear

params

the named list of parameters containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

A

the prior scale for sigma_beta, which we assume follows a Uniform(0, A) prior.

B_all

optional pNL-dimensional list of n x L[j] dimensional basis matrices for each nonlinear term j=1,...,pNL; if NULL, compute internally

BtB_all

optional pNL-dimensional list of crossprod(B_all[[j]]); if NULL, compute internally

XtX

optional p x p matrix of crossprod(X) (one-time cost); if NULL, compute internally

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (and updated mu).

Note

The parameters in coefficients are:

beta_lin: the p x 1 linear coefficients, including the linear terms from X_nonlin
f_j: the n x pNL matrix of fitted values for each nonlinear function
theta_j: the pNL-dimensional of nonlinear basis coefficients
sigma_beta: p x 1 vector of linear regression coefficient standard deviations
sigma_theta_j: pNL x 1 vector of nonlinear coefficient standard deviations

Sample the linear regression parameters assuming a g-prior

Description

Sample the parameters for a linear regression model assuming a g-prior for the coefficients.

Usage

sample_lm_gprior(y, X, params, psi = NULL, XtX = NULL, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

params

the named list of parameters containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

psi

the prior variance for the g-prior

XtX

the p x p matrix of crossprod(X) (one-time cost); if NULL, compute within the function

X_test

matrix of predictors at test points (default is NULL)

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (along with updated mu and mu_test if applicable).

Note

The parameters in coefficients are:

beta: the p x 1 vector of regression coefficients components of beta

Examples

# Simulate data for illustration:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X
# Initialize:
params = init_lm_gprior(y = y, X = X)
# Sample:
params = sample_lm_gprior(y = y, X = X, params = params)
names(params)
names(params$coefficients)

Sample linear regression parameters assuming horseshoe prior

Description

Sample the parameters for a linear regression model assuming a horseshoe prior for the (non-intercept) coefficients. The number of predictors p may exceed the number of observations n.

Usage

sample_lm_hs(y, X, params, XtX = NULL, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

params

the named list of parameters containing

mu n x 1 vector of conditional means (fitted values)
sigma the conditional standard deviation
coefficients a named list of parameters that determine mu

XtX

the p x p matrix of crossprod(X) (one-time cost); if NULL, compute within the function

X_test

matrix of predictors at test points (default is NULL)

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (along with updated mu and mu_test if applicable).

Note

The parameters in coefficients are:

beta the p x 1 vector of regression coefficients
sigma_beta p x 1 vector of regression coefficient standard deviations (local scale parameters)
xi_sigma_beta p x 1 vector of parameter-expansion variables for sigma_beta
lambda_beta the global scale parameter
xi_lambda_beta parameter-expansion variable for lambda_beta components of beta

Sample linear regression parameters assuming a ridge prior

Description

Sample the parameters for a linear regression model assuming a ridge prior for the (non-intercept) coefficients. The number of predictors p may exceed the number of observations n.

Usage

sample_lm_ridge(y, X, params, A = 10^4, XtX = NULL, X_test = NULL)

Arguments

y

n x 1 vector of data

X

n x p matrix of predictors

params

the named list of parameters containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

A

the prior scale for sigma_beta, which we assume follows a Uniform(0, A) prior.

XtX

the p x p matrix of crossprod(X) (one-time cost); if NULL, compute within the function

X_test

matrix of predictors at test points (default is NULL)

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (along with updated mu and mu_test if applicable).

Note

The parameters in coefficients are:

beta: the p x 1 vector of regression coefficients
sigma_beta: the prior standard deviation for the (non-intercept) components of beta

Sample the parameters for a simple mean-only model

Description

Sample the parameters for the model y ~ N(mu0, sigma^2) with a flat prior on mu0 and sigma ~ Unif(0, A).

Usage

sample_params_mean(y, params)

Arguments

y

n x 1 vector of data

params

the named list of parameters containing

mu: vector of conditional means (fitted values)
sigma: the conditional standard deviation
coefficients: a named list of parameters that determine mu

Value

The updated named list params with draws from the full conditional distributions of sigma and coefficients (and updated mu).

Note

The only parameter in coefficients is mu0. Although redundant here, this parametrization is useful in other functions.

Compute Simultaneous Band Scores (SimBaS)

Description

Compute simultaneous band scores (SimBaS) from Meyer et al. (2015, Biometrics). SimBaS uses MC(MC) simulations of a function of interest to compute the minimum alpha such that the joint credible bands at the alpha level do not include zero. This quantity is computed for each grid point (or observation point) in the domain of the function.

Usage

simBaS(sampFuns)

Arguments

sampFuns

Nsims x m matrix of Nsims MCMC samples and m points along the curve

Value

m x 1 vector of simBaS

Note

The input needs not be curves: the simBaS may be computed for vectors to achieve a multiplicity adjustment.

The minimum of the returned value, PsimBaS_t, over the domain t is the Global Bayesian P-Value (GBPV) for testing whether the function is zero everywhere.

Simulate count data from Friedman's nonlinear regression

Description

Simulate data from a negative-binomial distribution with nonlinear mean function.

Usage

simulate_nb_friedman(
  n = 100,
  p = 10,
  r_nb = 1,
  b_int = log(1.5),
  b_sig = log(5),
  sigma_true = sqrt(2 * log(1)),
  seed = NULL
)

Arguments

n

number of observations

p

number of predictors

r_nb

the dispersion parameter of the Negative Binomial dispersion; smaller values imply greater overdispersion, while larger values approximate the Poisson distribution.

b_int

intercept; default is log(1.5).

b_sig

regression coefficients for true signals; default is log(5.0).

sigma_true

standard deviation of the Gaussian innovation; default is zero.

seed

optional integer to set the seed for reproducible simulation; default is NULL which results in a different dataset after each run

Details

The log-expected counts are modeled using the Friedman (1991) nonlinear function with interactions, possibly with additional Gaussian noise (on the log-scale). We assume that half of the predictors are associated with the response, i.e., true signals. For sufficiently large dispersion parameter r_nb, the distribution will approximate a Poisson distribution. Here, the predictor variables are simulated from independent uniform distributions.

Value

A named list with the simulated count response y, the simulated design matrix X, and the true expected counts Ey.

Note

Specifying sigma_true = sqrt(2*log(1 + a)) implies that the expected counts are inflated by 100*a% (relative to exp(X*beta)), in addition to providing additional overdispersion.

Examples

# Simulate and plot the count data:
sim_dat = simulate_nb_friedman(n = 100, p = 10);
plot(sim_dat$y)

Simulate count data from a linear regression

Description

Simulate data from a negative-binomial distribution with linear mean function.

Usage

simulate_nb_lm(
  n = 100,
  p = 10,
  r_nb = 1,
  b_int = log(1.5),
  b_sig = log(2),
  sigma_true = sqrt(2 * log(1)),
  ar1 = 0,
  intercept = FALSE,
  seed = NULL
)

Arguments

n

number of observations

p

number of predictors (including the intercept)

r_nb

the dispersion parameter of the Negative Binomial dispersion; smaller values imply greater overdispersion, while larger values approximate the Poisson distribution.

b_int

intercept; default is log(1.5), which implies the expected count is 1.5 when all predictors are zero

b_sig

regression coefficients for true signals; default is log(2.0), which implies a twofold increase in the expected counts for a one unit increase in x

sigma_true

standard deviation of the Gaussian innovation; default is zero.

ar1

the autoregressive coefficient among the columns of the X matrix; default is zero.

intercept

a Boolean indicating whether an intercept column should be included in the returned design matrix; default is FALSE

seed

optional integer to set the seed for reproducible simulation; default is NULL which results in a different dataset after each run

Details

The log-expected counts are modeled as a linear function of covariates, possibly with additional Gaussian noise (on the log-scale). We assume that half of the predictors are associated with the response, i.e., true signals. For sufficiently large dispersion parameter r_nb, the distribution will approximate a Poisson distribution. Here, the predictor variables are simulated from independent standard normal distributions.

Value

A named list with the simulated count response y, the simulated design matrix X (possibly including the intercept), the true expected counts Ey, and the true regression coefficients beta_true.

Note

Specifying sigma_true = sqrt(2*log(1 + a)) implies that the expected counts are inflated by 100*a% (relative to exp(X*beta)), in addition to providing additional overdispersion.

Examples

# Simulate and plot the count data:
sim_dat = simulate_nb_lm(n = 100, p = 10);
plot(sim_dat$y)

Initialize and reparametrize a spline basis matrix

Description

Following Wand and Ormerod (2008), compute a low-rank thin plate spline basis which is diagonalized such that the prior variance for the nonlinear component is a scalar times a diagonal matrix. Knot locations are determined by quantiles and the penalty is the integrated squared second derivative.

Usage

splineBasis(tau, sumToZero = TRUE, rescale01 = TRUE)

Arguments

tau

m x 1 vector of observed points

sumToZero

logical; if TRUE, enforce a sum-to-zero constraint (useful for additive models)

rescale01

logical; if TRUE, rescale tau to the interval [0,1] prior to computing basis and penalty matrices

Value

B_nl: the nonlinear component of the spline basis matrix

Note

To form the full spline basis matrix, compute cbind(1, tau, B_nl). The sum-to-zero constraint implicitly assumes that the linear term is centered and scaled, i.e., scale(tau).

Estimation for Bayesian STAR spline regression

Description

Compute samples from the predictive distributions of a STAR spline regression model using either a Gibbs sampling approach or exact Monte Carlo sampling (default is Gibbs sampling which scales better for large n)

Usage

spline_star(
  y,
  tau = NULL,
  transformation = "np",
  y_max = Inf,
  psi = NULL,
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  nsave = 1000,
  use_MCMC = TRUE,
  nburn = 1000,
  nskip = 0,
  verbose = TRUE,
  method_sigma = "mle"
)

Arguments

y

n x 1 vector of observed counts

tau

n x 1 vector of observation points; if NULL, assume equally-spaced on [0,1]

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"bnp" (Bayesian nonparametric transformation using the Bayesian bootstrap)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

y_max

a fixed and known upper bound for all observations; default is Inf

psi

prior variance (1/smoothing parameter); if NULL, update in MCMC

approx_Fz

logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data

approx_Fy

logical; in BNP transformation, approximate the marginal CDF of y using the empirical CDF

nsave

number of MCMC iterations to save (or number of Monte Carlo simulations)

use_MCMC

logical; whether to run Gibbs sampler or Monte Carlo (default is TRUE)

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

verbose

logical; if TRUE, print time remaining

method_sigma

method to estimate the latent data standard deviation (only applicable if use_MCMC=FALSE); must be one of

"mle" use the MLE from the STAR EM algorithm (default)
"mmle" use the marginal MLE (Note: slower!)

Details

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt'. Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.

Value

a list with the following elements:

post_ytilde: nsave x n samples from the posterior predictive distribution at the observation points tau
marg_like: the marginal likelihood (only if use_MCMC=FALSE; otherwise NULL)

Note

For the 'bnp' transformation (without the Fy approximation), there are numerical stability issues when psi is modeled as unknown. In this case, it is better to fix psi at some positive number.

Examples

# Simulate some data:
n = 100
tau = seq(0,1, length.out = n)
y = round_floor(exp(1 + rnorm(n)/4 + poly(tau, 4)%*%rnorm(n=4, sd = 4:1)))

# Sample from the predictive distribution of a STAR spline model:
fit = spline_star(y = y, tau = tau)

# Compute 90% prediction intervals:
pi_y = t(apply(fit$post_ytilde, 2, quantile, c(0.05, .95)))

# Plot the results: intervals, median, and smoothed mean
plot(tau, y, ylim = range(pi_y, y))
polygon(c(tau, rev(tau)),c(pi_y[,2], rev(pi_y[,1])),col='gray', border=NA)
lines(tau, apply(fit$post_ytilde, 2, median), lwd=5, col ='black')
lines(tau, smooth.spline(tau, apply(fit$post_ytilde, 2, mean))$y, lwd=5, col='blue')
lines(tau, y, type='p')

Monte Carlo predictive sampler for spline regression

Description

Compute direct Monte Carlo samples from the posterior predictive distribution of a STAR spline regression model.

Usage

spline_star_exact(
  y,
  tau = NULL,
  transformation = "np",
  y_max = Inf,
  psi = 1000,
  method_sigma = "mle",
  approx_Fz = FALSE,
  approx_Fy = FALSE,
  nsave = 1000,
  compute_marg = TRUE
)

Arguments

y

n x 1 vector of observed counts

tau

n x 1 vector of observation points; if NULL, assume equally-spaced on [0,1]

transformation

transformation to use for the latent data; must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"bnp" (Bayesian nonparametric transformation using the Bayesian bootstrap)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

y_max

a fixed and known upper bound for all observations; default is Inf

psi

prior variance (1/smoothing parameter)

method_sigma

method to estimate the latent data standard deviation; must be one of

"mle" use the MLE from the STAR EM algorithm
"mmle" use the marginal MLE (Note: slower!)

approx_Fz

logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data

approx_Fy

logical; in BNP transformation, approximate the marginal CDF of y using the empirical CDF

nsave

number of Monte Carlo simulations

compute_marg

logical; if TRUE, compute and return the marginal likelihood

Details

There are several options for the transformation. First, the transformation can belong to the *Box-Cox* family, which includes the known transformations 'identity', 'log', and 'sqrt'. Second, the transformation can be estimated (before model fitting) using the empirical distribution of the data y. Options in this case include the empirical cumulative distribution function (CDF), which is fully nonparametric ('np'), or the parametric alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin') distributions. For the parametric distributions, the parameters of the distribution are estimated using moments (means and variances) of y. The distribution-based transformations approximately preserve the mean and variance of the count data y on the latent data scale, which lends interpretability to the model parameters. Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'), which is a Bayesian nonparametric model and incorporates the uncertainty about the transformation into posterior and predictive inference.

The Monte Carlo sampler produces direct, discrete, and joint draws from the posterior predictive distribution of the spline regression model at the observed tau points.

Value

a list with the following elements:

post_ytilde: nsave x n samples from the posterior predictive distribution at the observation points tau
marg_like: the marginal likelihood (if requested; otherwise NULL)

Compute the first and second moment of a truncated normal

Description

Given lower and upper endpoints and the mean and standard deviation of a (non-truncated) normal distribution, compute the first and second moment of the truncated normal distribution. All inputs may be scalars or vectors.

Usage

truncnorm_mom(a, b, mu, sig)

Arguments

a

lower endpoint

b

upper endpoint

mu

expected value of the non-truncated normal distribution

sig

standard deviation of the non-truncated normal distribution

Value

a list containing the first moment m1 and the second moment m2

Univariate Slice Sampler from Neal (2008)

Description

Compute a draw from a univariate distribution using the code provided by Radford M. Neal. The documentation below is also reproduced from Neal (2008).

Usage

uni.slice(x0, g, w = 1, m = Inf, lower = -Inf, upper = +Inf, gx0 = NULL)

Arguments

x0

Initial point

g

Function returning the log of the probability density (plus constant)

w

Size of the steps for creating interval (default 1)

m

Limit on steps (default infinite)

lower

Lower bound on support of the distribution (default -Inf)

upper

Upper bound on support of the distribution (default +Inf)

gx0

Value of g(x0), if known (default is not known)

Value

The point sampled, with its log density attached as an attribute.

Note

The log density function may return -Inf for points outside the support of the distribution. If a lower and/or upper bound is specified for the support, the log density function will not be called outside such limits.

Update parameters for warpDLM model with trend DLM

Description

This function serves to update the warpDLM variance parameters when the underlying DLM is a structural model (i.e. local level or local linear trend). It assumes a Unif(0,A=10^4) prior on all standard deviations.

Usage

update_struct(fit, z_star, theta)

Arguments

fit

the KFAS model object describing the DLM

z_star

the latest draw of z*

theta

the latest draw of the latent state(s) theta

Value

A KFAS model object (of class SSModel) updated with the newly sampled variance parameters

Posterior Inference for warpDLM model with latent structural DLM

Description

This function outputs posterior quantities and forecasts from a univariate warpDLM model. Currently two latent DLM specifications are supported: local level and the local linear trend.

Usage

warpDLM(
  y,
  type = c("level", "trend"),
  transformation = c("np", "identity", "log", "sqrt", "pois", "neg-bin"),
  y_max = Inf,
  R0 = 10,
  nsave = 5000,
  nburn = 5000,
  nskip = 1,
  n.ahead = 1
)

Arguments

y

the count-valued time series

type

the type of latent DLM (must be either level or trend)

transformation

transformation to use for the latent process (default is np); must be one of

"identity" (identity transformation)
"log" (log transformation)
"sqrt" (square root transformation)
"np" (nonparametric transformation estimated from empirical CDF)
"pois" (transformation for moment-matched marginal Poisson CDF)
"neg-bin" (transformation for moment-matched marginal Negative Binomial CDF)

y_max

a fixed and known upper bound for all observations; default is Inf

R0

the variance for the initial state theta_0; default is 10

nsave

number of MCMC iterations to save

nburn

number of MCMC iterations to discard

nskip

number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw

n.ahead

number of steps to forecast ahead

Value

A list with the following elements:

V_post: posterior draws of the observation variance
W_post: posterior draws of the state update variance(s)
fc_post: draws from the forecast distribution (of length n.ahead)
post_pred: draws from the posterior predictive distribution of y
g_func: transformation function
g_inv_func: inverse transformation function
KFAS_mod: the final KFAS model representing the latent DLM