Type: | Package |
Title: | Flexible Modeling of Count Data |
Version: | 1.0.2 |
Description: | For Bayesian and classical inference and prediction with count-valued data, Simultaneous Transformation and Rounding (STAR) Models provide a flexible, interpretable, and easy-to-use approach. STAR models the observed count data using a rounded continuous data model and incorporates a transformation for greater flexibility. Implicitly, STAR formalizes the commonly-applied yet incoherent procedure of (i) transforming count-valued data and subsequently (ii) modeling the transformed data using Gaussian models. STAR is well-defined for count-valued data, which is reflected in predictive accuracy, and is designed to account for zero-inflation, bounded or censored data, and over- or underdispersion. Importantly, STAR is easy to combine with existing MCMC or point estimation methods for continuous data, which allows seamless adaptation of continuous data models (such as linear regressions, additive models, BART, random forests, and gradient boosting machines) for count-valued data. The package also includes several methods for modeling count time series data, namely via warped Dynamic Linear Models. For more details and background on these methodologies, see the works of Kowal and Canale (2020) <doi:10.1214/20-EJS1707>, Kowal and Wu (2022) <doi:10.1111/biom.13617>, King and Kowal (2022) <doi:10.48550/arXiv.2110.14790>, and Kowal and Wu (2023) <doi:10.48550/arXiv.2110.12316>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LinkingTo: | Rcpp, RcppArmadillo |
Imports: | stats, utils, coda, dbarts, FastGP, gbm, graphics, Matrix, spikeSlabGAM, splines2, randomForest, Rcpp, TruncatedNormal, truncdist, KFAS |
RoxygenNote: | 7.2.3 |
Suggests: | knitr, rmarkdown, bayesplot, mgcv |
VignetteBuilder: | knitr |
URL: | https://bking124.github.io/countSTAR/ https://github.com/bking124/countSTAR |
BugReports: | https://github.com/bking124/countSTAR/issues |
Depends: | R (≥ 2.10) |
LazyData: | true |
NeedsCompilation: | yes |
Packaged: | 2023-06-29 21:33:30 UTC; brian |
Author: | Brian King [aut, cre], Dan Kowal [aut] |
Maintainer: | Brian King <brianking387@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-06-30 18:30:08 UTC |
Brent's method for optimization
Description
Implementation for Brent's algorithm for minimizing a univariate function over an interval.
The code is based on a function in the stsm
package.
Usage
BrentMethod(a = 0, b, fcn, tol = .Machine$double.eps^0.25)
Arguments
a |
lower limit for search |
b |
upper limit for search |
fcn |
function to minimize |
tol |
tolerance level for convergence of the optimization procedure |
Value
a list of containing the following elements:
-
fx
the minimum value of the input function -
x
the argument that minimizes the function -
iter
number of iterations to converge -
vx
a vector that stores the arguments until convergence
Inverse rounding function
Description
Define the intervals associated with y = j
based on the flooring function.
The function returns -Inf
for j = 0
(or smaller) and Inf
for
any j >= y_max + 1
, where y_max
is a known upper bound on the data y
(if specified).
Usage
a_j(j, y_max = Inf)
Arguments
j |
the integer-valued input(s) |
y_max |
a fixed and known upper bound for all observations; default is |
Value
The (lower) interval endpoint(s) associated with j
.
Examples
# Standard cases:
a_j(1)
a_j(20)
# Boundary cases:
a_j(0)
a_j(20, y_max = 15)
Fit Bayesian Additive STAR Model with MCMC
Description
Run the MCMC algorithm for a STAR Bayesian additive model The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.
Usage
bam_star(
y,
X_lin,
X_nonlin,
splinetype = "orthogonal",
transformation = "np",
y_max = Inf,
nsave = 5000,
nburn = 5000,
nskip = 2,
save_y_hat = FALSE,
verbose = TRUE
)
Arguments
y |
|
X_lin |
|
X_nonlin |
|
splinetype |
Type of spline to use for modelling the nonlinear predictors; must be either "orthogonal" (orthogonalized splines–the default) or "thinplate" (low-rank thin plate splines) |
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
save_y_hat |
logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute |
verbose |
logical; if TRUE, print time remaining |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation.
Posterior and predictive inference is obtained via a Gibbs sampler that combines (i) a latent data augmentation step (like in probit regression) and (ii) an existing sampler for a continuous data model.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. Third, the transformation can be
modeled as an unknown, monotone function using I-splines ('ispline'). The
Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter
of the transformation function.
Value
a list with at least the following elements:
-
coefficients
: the posterior mean of the coefficients -
fitted.values
: the posterior mean of the conditional expectation of the countsy
-
post.coefficients
: posterior draws of the coefficients -
post.fitted.values
: posterior draws of the conditional mean of the countsy
-
post.pred
: draws from the posterior predictive distribution ofy
-
post.lambda
: draws from the posterior distribution oflambda
-
post.sigma
: draws from the posterior distribution ofsigma
-
post.log.like.point
: draws of the log-likelihood for each of then
observations -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC
In the case of transformation="ispline"
, the list also contains
-
post.g
: draws from the posterior distribution of the transformationg
-
post.sigma.gamma
: draws from the posterior distribution ofsigma.gamma
, the prior standard deviation of the transformation g() coefficients
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 5, seed=32)
y = sim_dat$y; X = sim_dat$X
# Linear and nonlinear components:
X_lin = as.matrix(X[,-(1:3)])
X_nonlin = as.matrix(X[,(1:3)])
# STAR: nonparametric transformation
fit <- bam_star(y,X_lin, X_nonlin, nburn=1000, nskip=0)
# Posterior mean of each coefficient:
coef(fit)
# WAIC:
fit$WAIC
# MCMC diagnostics:
plot(as.ts(fit$post.coefficients[,1:3]))
# Posterior predictive check:
hist(apply(fit$post.pred, 1,
function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')
MCMC Algorithm for BART-STAR
Description
Run the MCMC algorithm for a BART model for count-valued responses using STAR. The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.
Usage
bart_star(
y,
X,
X_test = NULL,
y_test = NULL,
transformation = "np",
y_max = Inf,
n.trees = 200,
sigest = NULL,
sigdf = 3,
sigquant = 0.9,
k = 2,
power = 2,
base = 0.95,
nsave = 5000,
nburn = 5000,
nskip = 2,
save_y_hat = FALSE,
verbose = TRUE
)
Arguments
y |
|
X |
|
X_test |
|
y_test |
|
transformation |
transformation to use for the latent process; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
n.trees |
number of trees to use in BART; default is 200 |
sigest |
positive numeric estimate of the residual standard deviation (see ?bart) |
sigdf |
degrees of freedom for error variance prior (see ?bart) |
sigquant |
quantile of the error variance prior that the rough estimate (sigest) is placed at. The closer the quantile is to 1, the more aggressive the fit will be (see ?bart) |
k |
the number of prior standard deviations E(Y|x) = f(x) is away from +/- 0.5. The response is internally scaled to range from -0.5 to 0.5. The bigger k is, the more conservative the fitting will be (see ?bart) |
power |
power parameter for tree prior (see ?bart) |
base |
base parameter for tree prior (see ?bart) |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
save_y_hat |
logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute |
verbose |
logical; if TRUE, print time remaining |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the model in (1) is a Bayesian additive regression tree (BART) model.
Posterior and predictive inference is obtained via a Gibbs sampler that combines (i) a latent data augmentation step (like in probit regression) and (ii) an existing sampler for a continuous data model.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. Third, the transformation can be
modeled as an unknown, monotone function using I-splines ('ispline'). The
Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter
of the transformation function.
Value
a list with the following elements:
-
post.pred
: draws from the posterior predictive distribution ofy
-
post.sigma
: draws from the posterior distribution ofsigma
-
post.log.like.point
: draws of the log-likelihood for each of then
observations -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC -
post.pred.test
: draws from the posterior predictive distribution at the test pointsX_test
(NULL
ifX_test
is not given) -
post.fitted.values.test
: posterior draws of the conditional mean at the test pointsX_test
(NULL
ifX_test
is not given) -
post.mu.test
: draws of the conditional mean of z_star at the test pointsX_test
(NULL
ifX_test
is not given) -
post.log.pred.test
: draws of the log-predictive distribution for each of then0
test cases (NULL
ifX_test
is not given) -
fitted.values
: the posterior mean of the conditional expectation of the countsy
(NULL
ifsave_y_hat=FALSE
) -
post.fitted.values
: posterior draws of the conditional mean of the countsy
(NULL
ifsave_y_hat=FALSE
)
In the case of transformation="ispline"
, the list also contains
-
post.g
: draws from the posterior distribution of the transformationg
-
post.sigma.gamma
: draws from the posterior distribution ofsigma.gamma
, the prior standard deviation of the transformation g() coefficients
If transformation="box-cox"
, then the list also contains
-
post.lambda
: draws from the posterior distribution oflambda
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X
# BART-STAR with log-transformation:
fit_log = bart_star(y = y, X = X, transformation = 'log',
save_y_hat = TRUE, nburn=1000, nskip=0)
# Fitted values
plot_fitted(y = sim_dat$Ey,
post_y = fit_log$post.fitted.values,
main = 'Fitted Values: BART-STAR-log')
# WAIC for BART-STAR-log:
fit_log$WAIC
# MCMC diagnostics:
plot(as.ts(fit_log$post.fitted.values[,1:10]))
# Posterior predictive check:
hist(apply(fit_log$post.pred, 1,
function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')
# BART-STAR with nonparametric transformation:
fit = bart_star(y = y, X = X,
transformation = 'np', save_y_hat = TRUE)
# Fitted values
plot_fitted(y = sim_dat$Ey,
post_y = fit$post.fitted.values,
main = 'Fitted Values: BART-STAR-np')
# WAIC for BART-STAR-np:
fit$WAIC
# MCMC diagnostics:
plot(as.ts(fit$post.fitted.values[,1:10]))
# Posterior predictive check:
hist(apply(fit$post.pred, 1,
function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')
MCMC sampler for BART-STAR with a monotone spline model for the transformation
Description
Run the MCMC algorithm for BART model for count-valued responses using STAR. The transformation is modeled as an unknown, monotone function using I-splines. The Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter of the transformation function.
Usage
bart_star_ispline(
y,
X,
X_test = NULL,
y_test = NULL,
lambda_prior = 1/2,
y_max = Inf,
n.trees = 200,
sigest = NULL,
sigdf = 3,
sigquant = 0.9,
k = 2,
power = 2,
base = 0.95,
nsave = 5000,
nburn = 5000,
nskip = 2,
save_y_hat = FALSE,
target_acc_rate = 0.3,
adapt_rate = 0.75,
stop_adapt_perc = 0.5,
verbose = TRUE
)
Arguments
y |
|
X |
|
X_test |
|
y_test |
|
lambda_prior |
the prior mean for the transformation g() is the Box-Cox function with
parameter |
y_max |
a fixed and known upper bound for all observations; default is |
n.trees |
number of trees to use in BART; default is 200 |
sigest |
positive numeric estimate of the residual standard deviation (see ?bart) |
sigdf |
degrees of freedom for error variance prior (see ?bart) |
sigquant |
quantile of the error variance prior that the rough estimate (sigest) is placed at. The closer the quantile is to 1, the more aggresive the fit will be (see ?bart) |
k |
the number of prior standard deviations E(Y|x) = f(x) is away from +/- 0.5. The response is internally scaled to range from -0.5 to 0.5. The bigger k is, the more conservative the fitting will be (see ?bart) |
power |
power parameter for tree prior (see ?bart) |
base |
base parameter for tree prior (see ?bart) |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
save_y_hat |
logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute |
target_acc_rate |
target acceptance rate (between zero and one) |
adapt_rate |
rate of adaptation in RAM sampler (between zero and one) |
stop_adapt_perc |
stop adapting at the proposal covariance at |
verbose |
logical; if TRUE, print time remaining |
Value
a list with the following elements:
-
fitted.values
: the posterior mean of the conditional expectation of the countsy
-
post.fitted.values
: posterior draws of the conditional mean of the countsy
-
post.pred.test
: draws from the posterior predictive distribution at the test pointsX_test
-
post.fitted.values.test
: posterior draws of the conditional mean at the test pointsX_test
-
post.pred
: draws from the posterior predictive distribution ofy
-
post.sigma
: draws from the posterior distribution ofsigma
-
post.mu.test
: draws of the conditional mean of z_star at the test points -
post.log.like.point
: draws of the log-likelihood for each of then
observations -
post.log.pred.test
: draws of the log-predictive distribution for each of then0
test cases -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC -
post.g
: draws from the posterior distribution of the transformationg
-
post.sigma.gamma
: draws from the posterior distribution ofsigma.gamma
, the prior standard deviation of the transformationg
coefficients
STAR Bayesian Linear Regression
Description
Posterior inference for STAR linear model
Usage
blm_star(
y,
X,
X_test = NULL,
transformation = "np",
y_max = Inf,
prior = "gprior",
use_MCMC = TRUE,
nsave = 5000,
nburn = 5000,
nskip = 0,
method_sigma = "mle",
approx_Fz = FALSE,
approx_Fy = FALSE,
psi = NULL,
compute_marg = FALSE
)
Arguments
y |
|
X |
|
X_test |
|
transformation |
transformation to use for the latent process; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
prior |
prior to use for the latent linear regression; currently implemented options are "gprior", "horseshoe", and "ridge". Not all modeling options and transformations are available with the latter two priors. |
use_MCMC |
= TRUE, |
nsave |
number of MCMC iterations to save (or MC samples to draw if use_MCMC=FALSE) |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
method_sigma |
method to estimate the latent data standard deviation in exact sampler; must be one of
|
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate
the marginal CDF of |
psi |
prior variance (g-prior) |
compute_marg |
logical; if TRUE, compute and return the marginal likelihood (only available when using exact sampler, i.e. use_MCMC=FALSE) |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a linear regression.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
The Monte Carlo sampler (use_MCMC=FALSE
) produces direct, discrete, and joint draws
from the posterior distribution and the posterior predictive distribution
of the linear regression model with a g-prior.
Value
a list with at least the following elements:
-
coefficients
: the posterior mean of the regression coefficients -
post.beta
: posterior draws of the regression coefficients -
post.pred
: draws from the posterior predictive distribution ofy
-
post.log.like.point
: draws of the log-likelihood for each of then
observations -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC
If test points are passed in, then the list will also have post.predtest
,
which contains draws from the posterior predictive distribution at test points.
Other elements may be present depending on the choice of prior, transformation, and sampling approach.
Note
The 'bnp' transformation (without the Fy
approximation) is
slower than the other transformations because of the way
the TruncatedNormal
sampler must be updated as the lower and upper
limits change (due to the sampling of g
). Thus, computational
improvements are likely available.
Gibbs sampler for STAR linear regression with BNP transformation
Description
Compute MCMC samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior and BNP transformation.
Usage
blm_star_bnpgibbs(
y,
X,
X_test = X,
y_max = Inf,
psi = NULL,
approx_Fz = FALSE,
approx_Fy = FALSE,
nsave = 1000,
nburn = 1000,
nskip = 0,
verbose = TRUE
)
Arguments
y |
|
X |
|
X_test |
|
y_max |
a fixed and known upper bound for all observations; default is |
psi |
prior variance (g-prior) |
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate
the marginal CDF of |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a linear regression.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt'. Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
Value
a list with the following elements:
-
coefficients
the posterior mean of the regression coefficients -
post_beta
:nsave x p
samples from the posterior distribution of the regression coefficients -
post_ytilde
:nsave x n0
samples from the posterior predictive distribution at test pointsX_test
-
post_g
:nsave
posterior samples of the transformation evaluated at the uniquey
values (only applies for 'bnp' transformations)
Monte Carlo sampler for STAR linear regression with a g-prior
Description
Compute direct Monte Carlo samples from the posterior and predictive distributions of a STAR linear regression model with a g-prior.
Usage
blm_star_exact(
y,
X,
X_test = X,
transformation = "np",
y_max = Inf,
psi = NULL,
method_sigma = "mle",
approx_Fz = FALSE,
approx_Fy = FALSE,
nsave = 5000,
compute_marg = FALSE
)
Arguments
y |
|
X |
|
X_test |
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
psi |
prior variance (g-prior) |
method_sigma |
method to estimate the latent data standard deviation; must be one of
|
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate
the marginal CDF of |
nsave |
number of Monte Carlo simulations |
compute_marg |
logical; if TRUE, compute and return the marginal likelihood |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a linear regression.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt'. Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
The Monte Carlo sampler produces direct, discrete, and joint draws from the posterior distribution and the posterior predictive distribution of the linear regression model with a g-prior.
Value
a list with the following elements:
-
coefficients
the posterior mean of the regression coefficients -
post.beta
:nsave x p
samples from the posterior distribution of the regression coefficients -
post.pred
: draws from the posterior predictive distribution ofy
-
post.pred.test
:nsave x n0
samples from the posterior predictive distribution at test pointsX_test
(if given, otherwise NULL) -
sigma
: The estimated latent data standard deviation -
post.g
:nsave
posterior samples of the transformation evaluated at the uniquey
values (only applies for 'bnp' transformations) -
marg.like
: the marginal likelihood (if requested; otherwise NULL)
Note
The 'bnp' transformation (without the Fy
approximation) is
slower than the other transformations because of the way
the TruncatedNormal
sampler must be updated as the lower and upper
limits change (due to the sampling of g
). Thus, computational
improvements are likely available.
Estimate the remaining time in the MCMC based on previous samples
Description
Estimate the remaining time in the MCMC based on previous samples
Usage
computeTimeRemaining(nsi, timer0, nsims, nrep = 1000)
Arguments
nsi |
Current iteration |
timer0 |
Initial timer value, returned from |
nsims |
Total number of simulations |
nrep |
Print the estimated time remaining every |
Value
Table of summary statistics using the function summary
Compute asymptotic confidence intervals for STAR linear regression
Description
For a linear regression model within the STAR framework, compute (asymptotic) confidence intervals for a regression coefficient of interest. Confidence intervals are computed by inverting the likelihood ratio test and profiling the log-likelihood.
Usage
## S3 method for class 'lmstar'
confint(object, parm, level = 0.95, ...)
Arguments
object |
Object of class "lmstar" as output by |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
confidence level; default is 0.95 |
... |
Ignored |
Value
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in
Examples
#Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 2)
y = sim_dat$y; X = sim_dat$X
#Select a transformation:
transformation = 'np'
#Estimate model
fit = lm_star(y~X, transformation=transformation)
#Confidence interval for all parameters
confint(fit)
Compute Simultaneous Credible Bands
Description
Compute (1-alpha)% credible BANDS for a function based on MCMC samples using Crainiceanu et al. (2007)
Usage
credBands(sampFuns, alpha = 0.05)
Arguments
sampFuns |
|
alpha |
confidence level |
Value
m x 2
matrix of credible bands; the first column is the lower band, the second is the upper band
Note
The input needs not be curves: the simultaneous credible "bands" may be computed for vectors. The resulting credible intervals will provide joint coverage at the (1-alpha) level across all components of the vector.
Compute the ergodic (running) mean.
Description
Compute the ergodic (running) mean.
Usage
ergMean(x)
Arguments
x |
vector for which to compute the running mean |
Value
A vector y
with each element defined by y[i] = mean(x[1:i])
Examples
# Compare:
ergMean(1:10)
mean(1:10)
# Running mean for iid N(5, 1) samples:
x = rnorm(n = 10^4, mean = 5, sd = 1)
plot(ergMean(x))
abline(h=5)
Compute E(Y^2) for a STAR process
Description
Compute the conditional expectation of Y^2 for a STAR process Y under a generic link function g.
Usage
expectation2_gRcpp(g_a_j, g_a_jp1, mu, sigma, Jmax)
Arguments
g_a_j |
|
g_a_jp1 |
|
mu |
|
sigma |
|
Jmax |
|
Value
y2_hat m x 1
vector of conditional expectations
Note
This function uses Rcpp
for computational efficiency.
Estimate the mean for a STAR process
Description
Estimate the conditional expectation for a STAR process under a generic link function g.
Usage
expectation_gRcpp(g_a_j, g_a_jp1, mu, sigma, Jmax)
Arguments
g_a_j |
|
g_a_jp1 |
|
mu |
|
sigma |
|
Jmax |
|
Value
y_hat m x 1
vector of conditional expectations
Note
This function uses Rcpp
for computational efficiency.
Estimate the mean for a STAR process
Description
Estimate the conditional expectation for a STAR process under the identity link function.
Usage
expectation_identity(a, Jmax, Mu, sigma_t, Offset)
Arguments
a |
|
Jmax |
|
Mu |
|
sigma_t |
|
Offset |
|
Value
Zhat T x m
matrix of conditional expectations
Note
This function uses Rcpp
for computational efficiency.
Estimate the mean for a STAR process
Description
Estimate the conditional expectation for a STAR process under the log link function.
Usage
expectation_log(a, Jmax, Mu, sigma_t, Offset)
Arguments
a |
|
Jmax |
|
Mu |
|
sigma_t |
|
Offset |
|
Value
Zhat T x m
matrix of conditional expectations
Note
This function uses Rcpp
for computational efficiency.
Estimate the mean for a STAR process
Description
Estimate the conditional expectation for a STAR process under the square root link function.
Usage
expectation_sqrt(a, Jmax, Mu, sigma_t, Offset)
Arguments
a |
|
Jmax |
|
Mu |
|
sigma_t |
|
Offset |
|
Value
Zhat T x m
matrix of conditional expectations
Note
This function uses Rcpp
for computational efficiency.
Box-Cox transformation
Description
Evaluate the Box-Cox transformation, which is a scaled power transformation
to preserve continuity in the index lambda
at zero. Negative values are
permitted.
Usage
g_bc(t, lambda)
Arguments
t |
argument(s) at which to evaluate the function |
lambda |
Box-Cox parameter |
Value
The evaluation(s) of the Box-Cox function at the given input(s) t
.
Note
Special cases include
the identity transformation (lambda = 1
),
the square-root transformation (lambda = 1/2
),
and the log transformation (lambda = 0
).
Examples
# Log-transformation:
g_bc(1:5, lambda = 0); log(1:5)
# Square-root transformation: note the shift and scaling
g_bc(1:5, lambda = 1/2); sqrt(1:5)
Bayesian bootstrap-based transformation
Description
Compute one posterior draw from the smoothed transformation
implied by (separate) Bayesian bootstrap models for the CDFs
of y
and X
.
Usage
g_bnp(
y,
xtSigmax = rep(0, length(y)),
zgrid = NULL,
sigma_epsilon = 1,
approx_Fz = FALSE
)
Arguments
y |
|
xtSigmax |
|
zgrid |
optional vector of grid points for evaluating the CDF
of z ( |
sigma_epsilon |
latent standard deviation |
approx_Fz |
logical; if TRUE, use a normal approximation for |
Value
A smooth monotone function which can be used for evaluations of the transformation at each posterior draw.
Examples
# Sample some data:
y = rpois(n = 200, lambda = 5)
# Compute 200 draws of g on a grid:
t = seq(0, max(y), length.out = 100) # grid
g_post = t(sapply(1:500, function(s) g_bnp(y, approx_Fz = TRUE)(t)))
# Plot together:
plot(t, t, ylim = range(g_post), type='n', ylab = 'g(t)', main = 'Bayesian bootstrap posterior: g')
apply(g_post, 1, function(g) lines(t, g, col='gray'))
# And the posterior mean of g:
lines(t, colMeans(g_post), lwd=3)
Cumulative distribution function (CDF)-based transformation
Description
Compute a CDF-based transformation using the observed count data.
The CDF can be estimated nonparametrically or parametrically based on the
Poisson or Negative Binomial distributions. In the parametric case,
the parameters are determined based on the moments of y
.
Note that this is a fixed quantity and does not come with uncertainty quantification.
Usage
g_cdf(y, distribution = "np")
Arguments
y |
|
distribution |
the distribution used for the CDF; must be one of
|
Value
A smooth monotone function which can be used for evaluations of the transformation.
Examples
# Sample some data:
y = rpois(n = 500, lambda = 5)
# Empirical CDF version:
g_np = g_cdf(y, distribution = 'np')
# Poisson version:
g_pois = g_cdf(y, distribution = 'pois')
# Negative binomial version:
g_negbin = g_cdf(y, distribution = 'neg-bin')
# Plot together:
t = 1:max(y) # grid
plot(t, g_np(t), type='l')
lines(t, g_pois(t), lty = 2)
lines(t, g_negbin(t), lty = 3)
Approximate inverse transformation
Description
Compute the inverse function of a transformation g
based on a grid search.
Usage
g_inv_approx(g, t_grid)
Arguments
g |
the transformation function |
t_grid |
grid of arguments at which to evaluate the transformation function |
Value
A function which can be used for evaluations of the (approximate) inverse transformation function.
Examples
# Sample some data:
y = rpois(n = 500, lambda = 5)
# Empirical CDF transformation:
g_np = g_cdf(y, distribution = 'np')
# Grid for approximation:
t_grid = seq(1, max(y), length.out = 100)
# Approximate inverse:
g_inv = g_inv_approx(g = g_np, t_grid = t_grid)
# Check the approximation:
plot(t_grid, g_inv(g_np(t_grid)), type='p')
lines(t_grid, t_grid)
Inverse Box-Cox transformation
Description
Evaluate the inverse Box-Cox transformation. Negative values are permitted.
Usage
g_inv_bc(s, lambda)
Arguments
s |
argument(s) at which to evaluate the function |
lambda |
Box-Cox parameter |
Value
The evaluation(s) of the inverse Box-Cox function at the given input(s) s
.
Note
Special cases include
the identity transformation (lambda = 1
),
the square-root transformation (lambda = 1/2
),
and the log transformation (lambda = 0
).
#' @examples # (Inverse) log-transformation: g_inv_bc(1:5, lambda = 0); exp(1:5)
# (Inverse) square-root transformation: note the shift and scaling g_inv_bc(1:5, lambda = 1/2); (1:5)^2
Fitting STAR Gradient Boosting Machines via EM algorithm
Description
Compute the MLEs and log-likelihood for the Gradient Boosting Machines (GBM) STAR model.
The STAR model requires a *transformation* and an *estimation function* for the conditional mean
given observed data. The transformation can be known (e.g., log or sqrt) or unknown
(Box-Cox or estimated nonparametrically) for greater flexibility.
The estimator in this case is a GBM.
Standard function calls including fitted
and residuals
apply.
Usage
gbm_star(
y,
X,
X.test = NULL,
transformation = "np",
y_max = Inf,
sd_init = 10,
tol = 10^-10,
max_iters = 1000,
n.trees = 100,
interaction.depth = 1,
shrinkage = 0.1,
bag.fraction = 1
)
Arguments
y |
|
X |
|
X.test |
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
sd_init |
add random noise for EM algorithm initialization scaled by |
tol |
tolerance for stopping the EM algorithm; default is 10^-10; |
max_iters |
maximum number of EM iterations before stopping; default is 1000 |
n.trees |
Integer specifying the total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. Default is 100. |
interaction.depth |
Integer specifying the maximum depth of each tree (i.e., the highest level of variable interactions allowed). A value of 1 implies an additive model, a value of 2 implies a model with up to 2-way interactions, etc. Default is 1. |
shrinkage |
a shrinkage parameter applied to each tree in the expansion. Also known as the learning rate or step-size reduction; 0.001 to 0.1 usually work, but a smaller learning rate typically requires more trees. Default is 0.1. |
bag.fraction |
the fraction of the training set observations randomly selected to propose the next tree in the expansion. This introduces randomnesses into the model fit. If bag.fraction < 1 then running the same model twice will result in similar but different fits. Default is 1 (for a deterministic prediction). |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. The Gaussian model in this case is a GBM.
Value
a list with the following elements:
-
fitted.values
: the fitted values at the MLEs (training) -
fitted.values.test
: the fitted values at the MLEs (testing) -
g.hat
a function containing the (known or estimated) transformation -
sigma.hat
the MLE of the standard deviation -
mu.hat
the MLE of the conditional mean (on the transformed scale) -
z.hat
the estimated latent data (on the transformed scale) at the MLEs -
residuals
the Dunn-Smyth residuals (randomized) -
residuals_rep
the Dunn-Smyth residuals (randomized) for 10 replicates -
logLik
the log-likelihood at the MLEs -
logLik0
the log-likelihood at the MLEs for the *unrounded* initialization -
lambda
the Box-Cox nonlinear parameter -
gbmObj
: the object returned by gbm() at the MLEs and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications
Note
Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deletion of these indices and re-running the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.
References
Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for selfâreported mental health. Biometrics. doi:10.1111/biom.13617
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X
# EM algorithm for STAR (using the log-link)
fit_em = gbm_star(y = y, X = X,
transformation = 'log')
# Evaluate convergence:
plot(fit_em$logLik_all, type='l', main = 'GBM-STAR-log', xlab = 'Iteration', ylab = 'log-lik')
# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);
# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))
# Log-likelihood at MLEs:
fit_em$logLik
Generalized EM estimation for STAR
Description
Compute MLEs and log-likelihood for a generalized STAR model. The STAR model requires
a *transformation* and an *estimation function* for the conditional mean
given observed data. The transformation can be known (e.g., log or sqrt) or unknown
(Box-Cox or estimated nonparametrically) for greater flexibility.
The estimator can be any least squares estimator, including nonlinear models.
Standard function calls including
coefficients()
, fitted()
, and residuals()
apply.
Usage
genEM_star(
y,
estimator,
transformation = "np",
y_max = Inf,
sd_init = 10,
tol = 10^-10,
max_iters = 1000
)
Arguments
y |
|
estimator |
a function that inputs data
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
sd_init |
add random noise for EM algorithm initialization scaled by |
tol |
tolerance for stopping the EM algorithm; default is 10^-10; |
max_iters |
maximum number of EM iterations before stopping; default is 1000 |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation.
The expectation-maximization (EM) algorithm is used to produce
maximum likelihood estimators (MLEs) for the parameters defined in the
estimator
function, such as linear regression coefficients,
which define the Gaussian model for the continuous latent data.
Fitted values (point predictions), residuals, and log-likelihood values
are also available. Inference for the estimators proceeds via classical maximum likelihood.
Initialization of the EM algorithm can be randomized to monitor convergence.
However, the log-likelihood is concave for all transformations (except 'box-cox'),
so global convergence is guaranteed.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is estimated within the EM algorithm ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
.
Value
a list with the following elements:
-
coefficients
the MLEs of the coefficients -
fitted.values
the fitted values at the MLEs -
g.hat
a function containing the (known or estimated) transformation -
sigma.hat
the MLE of the standard deviation -
mu.hat
the MLE of the conditional mean (on the transformed scale) -
z.hat
the estimated latent data (on the transformed scale) at the MLEs -
residuals
the Dunn-Smyth residuals (randomized) -
residuals_rep
the Dunn-Smyth residuals (randomized) for 10 replicates -
logLik
the log-likelihood at the MLEs -
logLik0
the log-likelihood at the MLEs for the *unrounded* initialization -
lambda
the Box-Cox nonlinear parameter and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications
Note
Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deletion of these indices and re-running the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.
References
Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for selfâreported mental health. Biometrics. doi:10.1111/biom.13617
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X
# Select a transformation:
transformation = 'np'
# Example using GAM as underlying estimator (for illustration purposes only)
if(require("mgcv")){
fit_em = genEM_star(y = y,
estimator = function(y) gam(y ~ s(X1)+s(X2),
data=data.frame(y,X)),
transformation = transformation)
}
# Fitted coefficients:
coef(fit_em)
# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);
# Log-likelihood at MLEs:
fit_em$logLik
Generalized MCMC Algorithm for STAR
Description
Run the MCMC algorithm for STAR given
a function to initialize model parameters; and
a function to sample (i.e., update) model parameters.
The transformation can be known (e.g., log or sqrt) or unknown (Box-Cox or estimated nonparametrically) for greater flexibility.
Usage
genMCMC_star(
y,
sample_params,
init_params,
transformation = "np",
y_max = Inf,
nsave = 5000,
nburn = 5000,
nskip = 0,
save_y_hat = FALSE,
verbose = TRUE
)
Arguments
y |
|
sample_params |
a function that inputs data
and outputs an updated list |
init_params |
an initializing function that inputs data |
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
save_y_hat |
logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute |
verbose |
logical; if TRUE, print time remaining |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation.
Posterior and predictive inference is obtained via a Gibbs sampler that combines (i) a latent data augmentation step (like in probit regression) and (ii) an existing sampler for a continuous data model.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt', as well as a version in which the Box-Cox parameter
is inferred within the MCMC sampler ('box-cox'). Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
.
Value
a list with at least the following elements:
-
post.pred
: draws from the posterior predictive distribution ofy
-
post.sigma
: draws from the posterior distribution ofsigma
-
post.log.like.point
: draws of the log-likelihood for each of then
observations -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC -
post.lambda
: draws from the posterior distribution oflambda
(NULL unlesstransformation='box-cox'
) -
fitted.values
: the posterior mean of the conditional expectation of the countsy
(NULL
ifsave_y_hat=FALSE
) -
post.fitted.values
: posterior draws of the conditional mean of the countsy
(NULL
ifsave_y_hat=FALSE
)
If the coefficients list from init_params
and sample_params
contains a named element beta
,
e.g. for linear regression, then the function output contains
-
coefficients
: the posterior mean of the beta coefficients -
post.beta
: draws from the posterior distribution ofbeta
-
post.othercoefs
: draws from the posterior distribution of any other sampled coefficients, e.g. variance terms
If no beta
exists in the parameter coefficients, then the output list just contains
-
coefficients
: the posterior mean of all coefficients -
post.beta
: draws from the posterior distribution of all coefficients
Additionally, if init_params
and sample_params
have output mu_test
, then the sampler will output
post.predtest
, which contains draws from the posterior predictive distribution at test points.
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X
# STAR: log-transformation:
fit_log = genMCMC_star(y = y,
sample_params = function(y, params) sample_lm_gprior(y, X, params),
init_params = function(y) init_lm_gprior(y, X),
transformation = 'log')
# Posterior mean of each coefficient:
coef(fit_log)
# WAIC for STAR-log:
fit_log$WAIC
# MCMC diagnostics:
plot(as.ts(fit_log$post.beta[,1:3]))
# Posterior predictive check:
hist(apply(fit_log$post.pred, 1,
function(x) mean(x==0)), main = 'Proportion of Zeros', xlab='');
abline(v = mean(y==0), lwd=4, col ='blue')
MCMC sampler for STAR with a monotone spline model for the transformation
Description
Run the MCMC algorithm for STAR given
a function to initialize model parameters; and
a function to sample (i.e., update) model parameters.
The transformation is modeled as an unknown, monotone function using I-splines. The Robust Adaptive Metropolis (RAM) sampler is used for drawing the parameter of the transformation function.
Usage
genMCMC_star_ispline(
y,
sample_params,
init_params,
lambda_prior = 1/2,
y_max = Inf,
nsave = 5000,
nburn = 5000,
nskip = 0,
save_y_hat = FALSE,
target_acc_rate = 0.3,
adapt_rate = 0.75,
stop_adapt_perc = 0.5,
verbose = TRUE
)
Arguments
y |
|
sample_params |
a function that inputs data
and optionally a fourth element |
init_params |
an initializing function that inputs data |
lambda_prior |
the prior mean for the transformation g() is the Box-Cox function with
parameter |
y_max |
a fixed and known upper bound for all observations; default is |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
save_y_hat |
logical; if TRUE, compute and save the posterior draws of the expected counts, E(y), which may be slow to compute |
target_acc_rate |
target acceptance rate (between zero and one) |
adapt_rate |
rate of adaptation in RAM sampler (between zero and one) |
stop_adapt_perc |
stop adapting at the proposal covariance at |
verbose |
logical; if TRUE, print time remaining |
Details
If the coefficients list from init_params
and sample_params
contains a named element beta
,
e.g. for linear regression, then the function output contains
-
coefficients
: the posterior mean of the beta coefficients -
post.beta
: draws from the posterior distribution ofbeta
-
post.othercoefs
: draws from the posterior distribution of any other sampled coefficients, e.g. variance terms
If no beta
exists in the parameter coefficients, then the output list just contains
-
coefficients
: the posterior mean of all coefficients -
post.beta
: draws from the posterior distribution of all coefficients
Additionally, if init_params
and sample_params
have output mu_test
, then the sampler will output
post.predtest
, which contains draws from the posterior predictive distribution at test points.
Value
A list with at least the following elements:
-
post.pred
: draws from the posterior predictive distribution ofy
-
post.sigma
: draws from the posterior distribution ofsigma
-
post.log.like.point
: draws of the log-likelihood for each of then
observations -
WAIC
: Widely-Applicable/Watanabe-Akaike Information Criterion -
p_waic
: Effective number of parameters based on WAIC -
post.g
: draws from the posterior distribution of the transformationg
-
post.sigma.gamma
: draws from the posterior distribution ofsigma.gamma
, the prior standard deviation of the transformation g() coefficients -
fitted.values
: the posterior mean of the conditional expectation of the countsy
(NULL
ifsave_y_hat=FALSE
) -
post.fitted.values
: posterior draws of the conditional mean of the countsy
(NULL
ifsave_y_hat=FALSE
)
along with other elements depending on the nature of the initialization and sampling functions. See details for more info.
Summarize of effective sample size
Description
Compute the summary statistics for the effective sample size (ESS) across posterior samples for possibly many variables
Usage
getEffSize(postX)
Arguments
postX |
An array of arbitrary dimension |
Value
Table of summary statistics using the function summary()
.
Examples
# ESS for iid simulations:
rand_iid = rnorm(n = 10^4)
getEffSize(rand_iid)
# ESS for several AR(1) simulations with coefficients 0.1, 0.2,...,0.9:
rand_ar1 = sapply(seq(0.1, 0.9, by = 0.1), function(x) arima.sim(n = 10^4, list(ar = x)))
getEffSize(rand_ar1)
Initialize the parameters for an additive model
Description
Initialize the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using orthogonalized splines.
Usage
init_bam_orthog(y, X_lin, X_nonlin, B_all = NULL)
Arguments
y |
|
X_lin |
|
X_nonlin |
|
B_all |
optional |
Value
a named list params
containing
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Note
The parameters in coefficients
are:
-
beta_lin
: thep x 1
linear coefficients, including the linear terms fromX_nonlin
-
f_j
: then x pNL
matrix of fitted values for each nonlinear function -
theta_j
: thepNL
-dimensional of nonlinear basis coefficients -
sigma_beta
:p x 1
vector of linear regression coefficient standard deviations -
sigma_theta_j
:pNL x 1
vector of nonlinear coefficient standard deviations
Initialize the parameters for an additive model
Description
Initialize the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using low-rank thin plate splines.
Usage
init_bam_thin(y, X_lin, X_nonlin, B_all = NULL)
Arguments
y |
|
X_lin |
|
X_nonlin |
|
B_all |
optional |
Value
a named list params
containing
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Note
The parameters in coefficients
are:
-
beta_lin
: thep x 1
linear coefficients, including the linear terms fromX_nonlin
-
f_j
: then x pNL
matrix of fitted values for each nonlinear function -
theta_j
: thepNL
-dimensional of nonlinear basis coefficients -
sigma_beta
:p x 1
vector of linear regression coefficient standard deviations -
sigma_theta_j
:pNL x 1
vector of nonlinear coefficient standard deviations
Initialize linear regression parameters assuming a g-prior
Description
Initialize the parameters for a linear regression model assuming a g-prior for the coefficients.
Usage
init_lm_gprior(y, X, X_test = NULL)
Arguments
y |
|
X |
|
X_test |
|
Value
a named list params
containing at least
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Additionally, if X_test is not NULL, then the list includes an element
mu_test
, the vector of conditional means at the test points
Note
The parameters in coefficients
are:
-
beta
: thep x 1
vector of regression coefficients components ofbeta
Examples
# Simulate data for illustration:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X
# Initialize:
params = init_lm_gprior(y = y, X = X)
names(params)
names(params$coefficients)
Initialize linear regression parameters assuming a horseshoe prior
Description
Initialize the parameters for a linear regression model assuming a
horseshoe prior for the (non-intercept) coefficients. The number of predictors
p
may exceed the number of observations n
.
Usage
init_lm_hs(y, X, X_test = NULL)
Arguments
y |
|
X |
|
X_test |
|
Value
a named list params
containing at least
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Additionally, if X_test is not NULL, then the list includes an element
mu_test
, the vector of conditional means at the test points
Note
The parameters in coefficients
are:
-
beta
: thep x 1
vector of regression coefficients -
sigma_beta
: thep x 1
vector of regression coefficient standard deviations (local scale parameters) -
xi_sigma_beta
: thep x 1
vector of parameter-expansion variables forsigma_beta
-
lambda_beta
: the global scale parameter -
xi_lambda_beta
: the parameter-expansion variable forlambda_beta
components ofbeta
Initialize linear regression parameters assuming a ridge prior
Description
Initialize the parameters for a linear regression model assuming a
ridge prior for the (non-intercept) coefficients. The number of predictors
p
may exceed the number of observations n
.
Usage
init_lm_ridge(y, X, X_test = NULL)
Arguments
y |
|
X |
|
X_test |
|
Value
a named list params
containing at least
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Additionally, if X_test is not NULL, then the list includes an element
mu_test
, the vector of conditional means at the test points
Note
The parameters in coefficients
are:
-
beta
: thep x 1
vector of regression coefficients -
sigma_beta
: the prior standard deviation for the (non-intercept) components ofbeta
Initialize the parameters for a simple mean-only model
Description
Initialize the parameters for the model y ~ N(mu0, sigma^2) with a flat prior on mu0.
Usage
init_params_mean(y)
Arguments
y |
|
Value
a named list params
containing
-
mu
: vector of conditional means (fitted values) -
sigma
: the conditional standard deviation -
coefficients
: a named list of parameters that determinemu
Note
The only parameter in coefficients
is mu0
.
Although redundant here, this parametrization is useful in other functions.
Estimate confidence intervals/bands for a STAR process
Description
Compute confidence intervals/bands for the expected value of the count-valued STAR process y
based on intervals/bands for the Gaussian process mu
.
Usage
interval_gRcpp(g_a_j, g_a_jp1, L_mu, U_mu, sigma, Jmax)
Arguments
g_a_j |
|
g_a_jp1 |
|
L_mu |
|
U_mu |
|
sigma |
|
Jmax |
|
Value
LU_y m x 2
vector of intervals for y
.
Note
This function uses Rcpp
for computational efficiency.
Compute the inverse log-odds
Description
Compute the inverse log-odds
Usage
invlogit(x)
Arguments
x |
scalar or vector for which to compute the (componentwise) inverse log-odds |
Value
A scalar or vector of values in (0,1)
Fitting frequentist STAR linear model via EM algorithm
Description
Compute the MLEs and log-likelihood for the STAR linear model. The regression coefficients are estimated using least squares within an EM algorithm.
Usage
lm_star(
formula,
data = NULL,
transformation = "np",
y_max = Inf,
sd_init = 10,
tol = 10^-10,
max_iters = 1000
)
Arguments
formula |
an object of class " |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame)
containing the variables in the model; like |
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
sd_init |
add random noise for EM algorithm initialization scaled by |
tol |
tolerance for stopping the EM algorithm; default is 10^-10; |
max_iters |
maximum number of EM iterations before stopping; default is 1000 |
Details
Standard function calls including
coefficients
, fitted
, and residuals
apply. Fitted values are the expectation
at the MLEs, and as such are not necessarily count-valued.
Value
an object of class
"lmstar", which is a list with the following elements:
-
coefficients
the MLEs of the coefficients -
fitted.values
the fitted values at the MLEs -
g.hat
a function containing the (known or estimated) transformation -
ginv.hat
a function containing the inverse of the transformation -
sigma.hat
the MLE of the standard deviation -
mu.hat
the MLE of the conditional mean (on the transformed scale) -
z.hat
the estimated latent data (on the transformed scale) at the MLEs -
residuals
the Dunn-Smyth residuals (randomized) -
residuals_rep
the Dunn-Smyth residuals (randomized) for 10 replicates -
logLik
the log-likelihood at the MLEs -
logLik0
the log-likelihood at the MLEs for the *unrounded* initialization -
lambda
the Box-Cox nonlinear parameter and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications
Note
Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deletion of these indices and re-running the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.
References
Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for selfâreported mental health. Biometrics. doi:10.1111/biom.13617
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 3)
y = sim_dat$y; X = sim_dat$X
# Fit model
fit_em = lm_star(y~X)
# Fitted coefficients:
coef(fit_em)
# Fitted values:
y_hat = fitted(fit_em)
plot(y_hat, y);
# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))
Compute the pointwise log-likelihood for STAR
Description
Compute the pointwise log-likelihood for a STAR model. The code here assumes
that the transformed real-valued process (z_star) has conditionally independent
components with means mu
and standard deviations sigma
.
Usage
logLikePointRcpp(g_a_j, g_a_jp1, mu, sigma)
Arguments
g_a_j |
|
g_a_jp1 |
|
mu |
|
sigma |
|
Value
loglike m x 1
log-likelihood value
Note
This function uses Rcpp
for computational efficiency.
Compute the log-likelihood for STAR
Description
Compute the log-likelihood for a STAR model. The code here assumes
that the transformed real-valued process (z_star) has conditionally independent
components with means mu
and standard deviations sigma
.
Usage
logLikeRcpp(g_a_j, g_a_jp1, mu, sigma)
Arguments
g_a_j |
|
g_a_jp1 |
|
mu |
|
sigma |
|
Value
loglike scalar log-likelihood value
Note
This function uses Rcpp
for computational efficiency.
Compute the log-odds
Description
Compute the log-odds
Usage
logit(x)
Arguments
x |
scalar or vector in (0,1) for which to compute the (componentwise) log-odds |
Value
A scalar or vector of log-odds
Plot the estimated regression coefficients and credible intervals
Description
Plot the estimated regression coefficients and credible intervals for the linear effects in up to two models.
Usage
plot_coef(
post_coefficients_1,
post_coefficients_2 = NULL,
alpha = 0.05,
labels = NULL
)
Arguments
post_coefficients_1 |
|
post_coefficients_2 |
|
alpha |
confidence level for the credible intervals |
labels |
|
Value
A plot of regression coefficients and credible intervals for 1-2 models
Plot the fitted values and the data
Description
Plot the fitted values, plus pointwise credible intervals, against the data. For simulations, one may use the true values in place of the data.
Usage
plot_fitted(y, post_y, y_hat = NULL, alpha = 0.05, ...)
Arguments
y |
|
post_y |
|
y_hat |
|
alpha |
confidence level for the credible intervals |
... |
other arguments for plotting |
Value
A plot with the fitted values and the credible intervals against the data
Plot the empirical and model-based probability mass functions
Description
Plot the empirical probability mass function, i.e., the proportion of
data values y
that equal j
for each j=0,1,...
,
together with the model-based estimate of the probability mass function
based on the posterior predictive distribution.
Usage
plot_pmf(y, post.pred, error.bars = FALSE, alpha = 0.05)
Arguments
y |
|
post.pred |
|
error.bars |
logical; if TRUE, include errors bars on the model-based PMF |
alpha |
confidence level for the credible intervals |
Value
A plot of the empirical PMF of y along with a PMF estimate from the model posterior predictive distribution
pmax() in Rcpp
Description
Compute the pointwise max for two vectors of equal length
Usage
pmaxRcpp(v1, v2)
Arguments
v1 |
|
v2 |
|
Value
vm m x 1
vector of pointwise maxima
Note
This function uses Rcpp
for computational efficiency.
pmin() in Rcpp
Description
Compute the pointwise min for two vectors of equal length
Usage
pminRcpp(v1, v2)
Arguments
v1 |
|
v2 |
|
Value
vm m x 1
vector of pointwise minima
Note
This function uses Rcpp
for computational efficiency.
Predict method for response in STAR linear model
Description
Outputs predicted values based on an lmstar fit and optionally prediction intervals based on the the (plug-in) predictive distribution for the STAR linear model
Usage
## S3 method for class 'lmstar'
predict(object, newdata = NULL, interval = FALSE, level = 0.95, N = 1000, ...)
Arguments
object |
Object of class "lmstar" as output by |
newdata |
An optional matrix/data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
interval |
logical; whether or not to include prediction intervals (default FALSE) |
level |
Level for prediction intervals |
N |
number of Monte Carlo samples from the posterior predictive distribution used to approximate intervals; default is 1000 |
... |
Ignored |
Details
If interval=TRUE, then predict.lmstar
uses a Monte Carlo approach to estimating the (plug-in)
predictive distribution for the STAR linear model. The algorithm iteratively samples
(i) the latent data given the observed data, (ii) the latent predictive data given the latent data from (i),
and (iii) (inverse) transforms and rounds the latent predictive data to obtain a
draw from the integer-valued predictive distribution.
The appropriate quantiles of these Monte Carlo draws are computed and reported as the prediction interval.
Value
Either a a vector of predictions (if interval=FALSE) or a matrix of predictions and bounds with column names fit, lwr, and upr
Note
The “plug-in" predictive distribution is a crude approximation. Better
approaches are available using the Bayesian models, e.g. blm_star
, which provide samples
from the posterior predictive distribution.
For highly skewed responses, prediction intervals especially at lower levels may not include the predicted value itself, since the mean is often much larger than the median.
Examples
# Simulate data with count-valued response y:
x = seq(0, 1, length.out = 100)
y = rpois(n = length(x), lambda = exp(1.5 + 5*(x -.5)^2))
# Estimate model--assume a quadratic effect (better for illustration purposes)
fit = lm_star(y~x+I(x^2), transformation = 'sqrt')
#Compute the predictive draws for the test points (same as observed points here)
#Also compute intervals using plug-in predictive distribution
y_pred = predict(fit, interval=TRUE)
# Plot the results
plot(x, y, ylim = range(y, y_pred), main = 'STAR: Predictions and 95% PI')
lines(x,y_pred[,"fit"], col='black', type='s', lwd=4)
lines(x, y_pred[,"lwr"], col='darkgray', type='s', lwd=4)
lines(x, y_pred[,"upr"], col='darkgray', type='s', lwd=4)
Compute coefficient p-values for STAR linear regression using likelihood ratio test
Description
For a linear regression model within the STAR framework, compute p-values for regression coefficients using a likelihood ratio test. It also computes a p-value for excluding all predictors, akin to a (partial) F test.
Usage
pvals(object)
Arguments
object |
Object of class "lmstar" as output by |
Value
a list of p+1 p-values, one for each predictor as well as the joint p-value excluding all predictors
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_lm(n = 100, p = 2)
y = sim_dat$y; X = sim_dat$X
# Select a transformation:
transformation = 'np'
#Estimate model
fit = lm_star(y~X, transformation = transformation)
#Compute p-values
pvals(fit)
Fit Random Forest STAR with EM algorithm
Description
Compute the MLEs and log-likelihood for the Random Forest STAR model.
The STAR model requires a *transformation* and an *estimation function* for the conditional mean
given observed data. The transformation can be known (e.g., log or sqrt) or unknown
(Box-Cox or estimated nonparametrically) for greater flexibility.
The estimator in this case is a random forest.
Standard function calls including fitted
and residuals
apply.
Usage
randomForest_star(
y,
X,
X.test = NULL,
transformation = "np",
y_max = Inf,
sd_init = 10,
tol = 10^-10,
max_iters = 1000,
ntree = 500,
mtry = max(floor(ncol(X)/3), 1),
nodesize = 5
)
Arguments
y |
|
X |
|
X.test |
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
sd_init |
add random noise for EM algorithm initialization scaled by |
tol |
tolerance for stopping the EM algorithm; default is 10^-10; |
max_iters |
maximum number of EM iterations before stopping; default is 1000 |
ntree |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. Default is 500. |
mtry |
Number of variables randomly sampled as candidates at each split. Default is p/3. |
nodesize |
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). Default is 5. |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation.
The expectation-maximization (EM) algorithm is used to produce maximum likelihood estimators (MLEs) for the parameters defined in the The fitted values are computed using out-of-bag samples. As a result, the log-likelihood is based on out-of-bag prediction, and it is similarly straightforward to compute out-of-bag squared and absolute errors.
Value
a list with the following elements:
-
fitted.values
: the fitted values at the MLEs based on out-of-bag samples (training) -
fitted.values.test
: the fitted values at the MLEs (testing) -
g.hat
a function containing the (known or estimated) transformation -
sigma.hat
the MLE of the standard deviation -
mu.hat
the MLE of the conditional mean (on the transformed scale) -
z.hat
the estimated latent data (on the transformed scale) at the MLEs -
residuals
the Dunn-Smyth residuals (randomized) -
residuals_rep
the Dunn-Smyth residuals (randomized) for 10 replicates -
logLik
the log-likelihood at the MLEs -
logLik0
the log-likelihood at the MLEs for the *unrounded* initialization -
lambda
the Box-Cox nonlinear parameter -
rfObj
: the object returned by randomForest() at the MLEs and other parameters that (1) track the parameters across EM iterations and (2) record the model specifications
Note
Since the random forest produces random predictions, the EM algorithm will never converge exactly.
Infinite latent data values may occur when the transformed Gaussian model is highly inadequate. In that case, the function returns the *indices* of the data points with infinite latent values, which are significant outliers under the model. Deletion of these indices and re-running the model is one option, but care must be taken to ensure that (i) it is appropriate to treat these observations as outliers and (ii) the model is adequate for the remaining data points.
References
Kowal, D. R., & Wu, B. (2021). Semiparametric count data regression for selfâreported mental health. Biometrics. doi:10.1111/biom.13617
Examples
# Simulate data with count-valued response y:
sim_dat = simulate_nb_friedman(n = 100, p = 10)
y = sim_dat$y; X = sim_dat$X
# EM algorithm for STAR (using the log-link)
fit_em = randomForest_star(y = y, X = X,
transformation = 'log',
max_iters = 100)
# Fitted values (out-of-bag)
y_hat = fitted(fit_em)
plot(y_hat, y);
# Residuals:
plot(residuals(fit_em))
qqnorm(residuals(fit_em)); qqline(residuals(fit_em))
# Log-likelihood at MLEs (out-of-bag):
fit_em$logLik
Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.
Description
Data on the efficacy of a pest management system at reducing the number of roaches in urban apartments.
Usage
roaches
Format
## 'roaches' A data frame with 262 obs. of 6 variables:
- y
Number of roaches caught
- roach1
Pretreatment number of roaches
- treatment
Treatment indicator
- senior
Indicator for only elderly residents in building
- exposure2
Number of days for which the roach traps were used
Source
Gelman and Hill (2007); package 'rstanarm'
Rounding function
Description
Define the rounding operator associated with the floor function. The function
also returns zero whenever the input is negative and caps the value at y_max
,
where y_max
is a known upper bound on the data y
(if specified).
Usage
round_floor(z, y_max = Inf)
Arguments
z |
the real-valued input(s) |
y_max |
a fixed and known upper bound for all observations; default is |
Value
The count-valued output(s) from the rounding function.
Examples
# Floor function:
round_floor(1.5)
round_floor(0.5)
# Special treatmeant of negative numbers:
round_floor(-1)
Sample from a truncated normal distribution
Description
Sample from a truncated normal distribution. Samples are drawn componentwise, so each component of the vector is allowed its own mean, standard deviation, and upper and lower limits. The components are assumed to be independent.
Usage
rtruncnormRcpp(y_lower, y_upper, mu, sigma, u_rand)
Arguments
y_lower |
|
y_upper |
|
mu |
|
sigma |
|
u_rand |
|
Value
z_star m x 1
draw from the truncated normal distribution
Note
This function uses Rcpp
for computational efficiency.
Sample a Gaussian vector using the fast sampler of BHATTACHARYA et al.
Description
Sample from N(mu, Sigma) where Sigma = solve(crossprod(Phi) + solve(D)) and mu = Sigma*crossprod(Phi, alpha):
Usage
sampleFastGaussian(Phi, Ddiag, alpha)
Arguments
Phi |
|
Ddiag |
|
alpha |
|
Value
Draw from N(mu, Sigma), which is p x 1
, and is computed in O(n^2*p)
Note
Assumes D is diagonal, but extensions are available
Sample the parameters for an additive model
Description
Sample the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using orthogonalized splines. The sampler draws the linear terms jointly and then samples each vector of nonlinear coefficients using Bayesian backfitting (i.e., conditional on all other nonlinear and linear terms).
Usage
sample_bam_orthog(
y,
X_lin,
X_nonlin,
params,
A = 10^4,
B_all = NULL,
diagBtB_all = NULL,
XtX = NULL
)
Arguments
y |
|
X_lin |
|
X_nonlin |
|
params |
the named list of parameters containing
|
A |
the prior scale for |
B_all |
optional |
diagBtB_all |
optional |
XtX |
optional |
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(and updated mu
).
Note
The parameters in coefficients
are:
-
beta_lin
: thep x 1
linear coefficients, including the linear terms fromX_nonlin
-
f_j
: then x pNL
matrix of fitted values for each nonlinear function -
theta_j
: thepNL
-dimensional of nonlinear basis coefficients -
sigma_beta
:p x 1
vector of linear regression coefficient standard deviations -
sigma_theta_j
:pNL x 1
vector of nonlinear coefficient standard deviations
Sample the parameters for an additive model
Description
Sample the parameters for an additive model, which may contain both linear and nonlinear predictors. The nonlinear terms are modeled using low-rank thin plate splines. The sampler draws the linear terms jointly and then samples each vector of nonlinear coefficients using Bayesian backfitting (i.e., conditional on all other nonlinear and linear terms).
Usage
sample_bam_thin(
y,
X_lin,
X_nonlin,
params,
A = 10^4,
B_all = NULL,
BtB_all = NULL,
XtX = NULL
)
Arguments
y |
|
X_lin |
|
X_nonlin |
|
params |
the named list of parameters containing
|
A |
the prior scale for |
B_all |
optional |
BtB_all |
optional |
XtX |
optional |
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(and updated mu
).
Note
The parameters in coefficients
are:
-
beta_lin
: thep x 1
linear coefficients, including the linear terms fromX_nonlin
-
f_j
: then x pNL
matrix of fitted values for each nonlinear function -
theta_j
: thepNL
-dimensional of nonlinear basis coefficients -
sigma_beta
:p x 1
vector of linear regression coefficient standard deviations -
sigma_theta_j
:pNL x 1
vector of nonlinear coefficient standard deviations
Sample the linear regression parameters assuming a g-prior
Description
Sample the parameters for a linear regression model assuming a g-prior for the coefficients.
Usage
sample_lm_gprior(y, X, params, psi = NULL, XtX = NULL, X_test = NULL)
Arguments
y |
|
X |
|
params |
the named list of parameters containing
|
psi |
the prior variance for the g-prior |
XtX |
the |
X_test |
matrix of predictors at test points (default is NULL) |
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(along with updated mu
and mu_test
if applicable).
Note
The parameters in coefficients
are:
-
beta
: thep x 1
vector of regression coefficients components ofbeta
Examples
# Simulate data for illustration:
sim_dat = simulate_nb_lm(n = 100, p = 5)
y = sim_dat$y; X = sim_dat$X
# Initialize:
params = init_lm_gprior(y = y, X = X)
# Sample:
params = sample_lm_gprior(y = y, X = X, params = params)
names(params)
names(params$coefficients)
Sample linear regression parameters assuming horseshoe prior
Description
Sample the parameters for a linear regression model assuming a
horseshoe prior for the (non-intercept) coefficients. The number of predictors
p
may exceed the number of observations n
.
Usage
sample_lm_hs(y, X, params, XtX = NULL, X_test = NULL)
Arguments
y |
|
X |
|
params |
the named list of parameters containing
|
XtX |
the |
X_test |
matrix of predictors at test points (default is NULL) |
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(along with updated mu
and mu_test
if applicable).
Note
The parameters in coefficients
are:
-
beta
thep x 1
vector of regression coefficients -
sigma_beta
p x 1
vector of regression coefficient standard deviations (local scale parameters) -
xi_sigma_beta
p x 1
vector of parameter-expansion variables forsigma_beta
-
lambda_beta
the global scale parameter -
xi_lambda_beta
parameter-expansion variable forlambda_beta
components ofbeta
Sample linear regression parameters assuming a ridge prior
Description
Sample the parameters for a linear regression model assuming a
ridge prior for the (non-intercept) coefficients. The number of predictors
p
may exceed the number of observations n
.
Usage
sample_lm_ridge(y, X, params, A = 10^4, XtX = NULL, X_test = NULL)
Arguments
y |
|
X |
|
params |
the named list of parameters containing
|
A |
the prior scale for |
XtX |
the |
X_test |
matrix of predictors at test points (default is NULL) |
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(along with updated mu
and mu_test
if applicable).
Note
The parameters in coefficients
are:
-
beta
: thep x 1
vector of regression coefficients -
sigma_beta
: the prior standard deviation for the (non-intercept) components ofbeta
Sample the parameters for a simple mean-only model
Description
Sample the parameters for the model y ~ N(mu0, sigma^2) with a flat prior on mu0 and sigma ~ Unif(0, A).
Usage
sample_params_mean(y, params)
Arguments
y |
|
params |
the named list of parameters containing
|
Value
The updated named list params
with draws from the full conditional distributions
of sigma
and coefficients
(and updated mu
).
Note
The only parameter in coefficients
is mu0
.
Although redundant here, this parametrization is useful in other functions.
Compute Simultaneous Band Scores (SimBaS)
Description
Compute simultaneous band scores (SimBaS) from Meyer et al. (2015, Biometrics). SimBaS uses MC(MC) simulations of a function of interest to compute the minimum alpha such that the joint credible bands at the alpha level do not include zero. This quantity is computed for each grid point (or observation point) in the domain of the function.
Usage
simBaS(sampFuns)
Arguments
sampFuns |
|
Value
m x 1
vector of simBaS
Note
The input needs not be curves: the simBaS may be computed for vectors to achieve a multiplicity adjustment.
The minimum of the returned value, PsimBaS_t
,
over the domain t
is the Global Bayesian P-Value (GBPV) for testing
whether the function is zero everywhere.
Simulate count data from Friedman's nonlinear regression
Description
Simulate data from a negative-binomial distribution with nonlinear mean function.
Usage
simulate_nb_friedman(
n = 100,
p = 10,
r_nb = 1,
b_int = log(1.5),
b_sig = log(5),
sigma_true = sqrt(2 * log(1)),
seed = NULL
)
Arguments
n |
number of observations |
p |
number of predictors |
r_nb |
the dispersion parameter of the Negative Binomial dispersion; smaller values imply greater overdispersion, while larger values approximate the Poisson distribution. |
b_int |
intercept; default is log(1.5). |
b_sig |
regression coefficients for true signals; default is log(5.0). |
sigma_true |
standard deviation of the Gaussian innovation; default is zero. |
seed |
optional integer to set the seed for reproducible simulation; default is NULL which results in a different dataset after each run |
Details
The log-expected counts are modeled using the Friedman (1991) nonlinear function
with interactions, possibly
with additional Gaussian noise (on the log-scale). We assume that half of the predictors
are associated with the response, i.e., true signals. For sufficiently large dispersion
parameter r_nb
, the distribution will approximate a Poisson distribution.
Here, the predictor variables are simulated from independent uniform distributions.
Value
A named list with the simulated count response y
, the simulated design matrix X
,
and the true expected counts Ey
.
Note
Specifying sigma_true = sqrt(2*log(1 + a))
implies that the expected counts are
inflated by 100*a
% (relative to exp(X*beta)
), in addition to providing additional
overdispersion.
Examples
# Simulate and plot the count data:
sim_dat = simulate_nb_friedman(n = 100, p = 10);
plot(sim_dat$y)
Simulate count data from a linear regression
Description
Simulate data from a negative-binomial distribution with linear mean function.
Usage
simulate_nb_lm(
n = 100,
p = 10,
r_nb = 1,
b_int = log(1.5),
b_sig = log(2),
sigma_true = sqrt(2 * log(1)),
ar1 = 0,
intercept = FALSE,
seed = NULL
)
Arguments
n |
number of observations |
p |
number of predictors (including the intercept) |
r_nb |
the dispersion parameter of the Negative Binomial dispersion; smaller values imply greater overdispersion, while larger values approximate the Poisson distribution. |
b_int |
intercept; default is log(1.5), which implies the expected count is 1.5 when all predictors are zero |
b_sig |
regression coefficients for true signals; default is log(2.0), which implies a twofold increase in the expected counts for a one unit increase in x |
sigma_true |
standard deviation of the Gaussian innovation; default is zero. |
ar1 |
the autoregressive coefficient among the columns of the X matrix; default is zero. |
intercept |
a Boolean indicating whether an intercept column should be included in the returned design matrix; default is FALSE |
seed |
optional integer to set the seed for reproducible simulation; default is NULL which results in a different dataset after each run |
Details
The log-expected counts are modeled as a linear function of covariates, possibly
with additional Gaussian noise (on the log-scale). We assume that half of the predictors
are associated with the response, i.e., true signals. For sufficiently large dispersion
parameter r_nb
, the distribution will approximate a Poisson distribution.
Here, the predictor variables are simulated from independent standard normal distributions.
Value
A named list with the simulated count response y
, the simulated design matrix X
(possibly including the intercept), the true expected counts Ey
,
and the true regression coefficients beta_true
.
Note
Specifying sigma_true = sqrt(2*log(1 + a))
implies that the expected counts are
inflated by 100*a
% (relative to exp(X*beta)
), in addition to providing additional
overdispersion.
Examples
# Simulate and plot the count data:
sim_dat = simulate_nb_lm(n = 100, p = 10);
plot(sim_dat$y)
Initialize and reparametrize a spline basis matrix
Description
Following Wand and Ormerod (2008), compute a low-rank thin plate spline basis which is diagonalized such that the prior variance for the nonlinear component is a scalar times a diagonal matrix. Knot locations are determined by quantiles and the penalty is the integrated squared second derivative.
Usage
splineBasis(tau, sumToZero = TRUE, rescale01 = TRUE)
Arguments
tau |
|
sumToZero |
logical; if TRUE, enforce a sum-to-zero constraint (useful for additive models) |
rescale01 |
logical; if TRUE, rescale |
Value
B_nl
: the nonlinear component of the spline basis matrix
Note
To form the full spline basis matrix, compute cbind(1, tau, B_nl)
.
The sum-to-zero constraint implicitly assumes that the linear term is
centered and scaled, i.e., scale(tau)
.
Estimation for Bayesian STAR spline regression
Description
Compute samples from the predictive distributions of a STAR spline regression model using either a Gibbs sampling approach or exact Monte Carlo sampling (default is Gibbs sampling which scales better for large n)
Usage
spline_star(
y,
tau = NULL,
transformation = "np",
y_max = Inf,
psi = NULL,
approx_Fz = FALSE,
approx_Fy = FALSE,
nsave = 1000,
use_MCMC = TRUE,
nburn = 1000,
nskip = 0,
verbose = TRUE,
method_sigma = "mle"
)
Arguments
y |
|
tau |
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
psi |
prior variance (1/smoothing parameter); if NULL, update in MCMC |
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate
the marginal CDF of |
nsave |
number of MCMC iterations to save (or number of Monte Carlo simulations) |
use_MCMC |
logical; whether to run Gibbs sampler or Monte Carlo (default is TRUE) |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
verbose |
logical; if TRUE, print time remaining |
method_sigma |
method to estimate the latent data standard deviation (only applicable if
|
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a spline regression.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt'. Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
Value
a list with the following elements:
-
post_ytilde
:nsave x n
samples from the posterior predictive distribution at the observation pointstau
-
marg_like
: the marginal likelihood (only ifuse_MCMC=FALSE
; otherwise NULL)
Note
For the 'bnp' transformation (without the Fy
approximation),
there are numerical stability issues when psi
is modeled as unknown.
In this case, it is better to fix psi
at some positive number.
Examples
# Simulate some data:
n = 100
tau = seq(0,1, length.out = n)
y = round_floor(exp(1 + rnorm(n)/4 + poly(tau, 4)%*%rnorm(n=4, sd = 4:1)))
# Sample from the predictive distribution of a STAR spline model:
fit = spline_star(y = y, tau = tau)
# Compute 90% prediction intervals:
pi_y = t(apply(fit$post_ytilde, 2, quantile, c(0.05, .95)))
# Plot the results: intervals, median, and smoothed mean
plot(tau, y, ylim = range(pi_y, y))
polygon(c(tau, rev(tau)),c(pi_y[,2], rev(pi_y[,1])),col='gray', border=NA)
lines(tau, apply(fit$post_ytilde, 2, median), lwd=5, col ='black')
lines(tau, smooth.spline(tau, apply(fit$post_ytilde, 2, mean))$y, lwd=5, col='blue')
lines(tau, y, type='p')
Monte Carlo predictive sampler for spline regression
Description
Compute direct Monte Carlo samples from the posterior predictive distribution of a STAR spline regression model.
Usage
spline_star_exact(
y,
tau = NULL,
transformation = "np",
y_max = Inf,
psi = 1000,
method_sigma = "mle",
approx_Fz = FALSE,
approx_Fy = FALSE,
nsave = 1000,
compute_marg = TRUE
)
Arguments
y |
|
tau |
|
transformation |
transformation to use for the latent data; must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
psi |
prior variance (1/smoothing parameter) |
method_sigma |
method to estimate the latent data standard deviation; must be one of
|
approx_Fz |
logical; in BNP transformation, apply a (fast and stable) normal approximation for the marginal CDF of the latent data |
approx_Fy |
logical; in BNP transformation, approximate
the marginal CDF of |
nsave |
number of Monte Carlo simulations |
compute_marg |
logical; if TRUE, compute and return the marginal likelihood |
Details
STAR defines a count-valued probability model by (1) specifying a Gaussian model for continuous *latent* data and (2) connecting the latent data to the observed data via a *transformation and rounding* operation. Here, the continuous latent data model is a spline regression.
There are several options for the transformation. First, the transformation
can belong to the *Box-Cox* family, which includes the known transformations
'identity', 'log', and 'sqrt'. Second, the transformation
can be estimated (before model fitting) using the empirical distribution of the
data y
. Options in this case include the empirical cumulative
distribution function (CDF), which is fully nonparametric ('np'), or the parametric
alternatives based on Poisson ('pois') or Negative-Binomial ('neg-bin')
distributions. For the parametric distributions, the parameters of the distribution
are estimated using moments (means and variances) of y
. The distribution-based
transformations approximately preserve the mean and variance of the count data y
on the latent data scale, which lends interpretability to the model parameters.
Lastly, the transformation can be modeled using the Bayesian bootstrap ('bnp'),
which is a Bayesian nonparametric model and incorporates the uncertainty
about the transformation into posterior and predictive inference.
The Monte Carlo sampler produces direct, discrete, and joint draws from the posterior predictive distribution of the spline regression model at the observed tau points.
Value
a list with the following elements:
-
post_ytilde
:nsave x n
samples from the posterior predictive distribution at the observation pointstau
-
marg_like
: the marginal likelihood (if requested; otherwise NULL)
Compute the first and second moment of a truncated normal
Description
Given lower and upper endpoints and the mean and standard deviation of a (non-truncated) normal distribution, compute the first and second moment of the truncated normal distribution. All inputs may be scalars or vectors.
Usage
truncnorm_mom(a, b, mu, sig)
Arguments
a |
lower endpoint |
b |
upper endpoint |
mu |
expected value of the non-truncated normal distribution |
sig |
standard deviation of the non-truncated normal distribution |
Value
a list containing the first moment m1
and the second moment m2
Univariate Slice Sampler from Neal (2008)
Description
Compute a draw from a univariate distribution using the code provided by Radford M. Neal. The documentation below is also reproduced from Neal (2008).
Usage
uni.slice(x0, g, w = 1, m = Inf, lower = -Inf, upper = +Inf, gx0 = NULL)
Arguments
x0 |
Initial point |
g |
Function returning the log of the probability density (plus constant) |
w |
Size of the steps for creating interval (default 1) |
m |
Limit on steps (default infinite) |
lower |
Lower bound on support of the distribution (default -Inf) |
upper |
Upper bound on support of the distribution (default +Inf) |
gx0 |
Value of g(x0), if known (default is not known) |
Value
The point sampled, with its log density attached as an attribute.
Note
The log density function may return -Inf for points outside the support of the distribution. If a lower and/or upper bound is specified for the support, the log density function will not be called outside such limits.
Update parameters for warpDLM model with trend DLM
Description
This function serves to update the warpDLM variance parameters when the underlying DLM is a structural model (i.e. local level or local linear trend). It assumes a Unif(0,A=10^4) prior on all standard deviations.
Usage
update_struct(fit, z_star, theta)
Arguments
fit |
the KFAS model object describing the DLM |
z_star |
the latest draw of z* |
theta |
the latest draw of the latent state(s) theta |
Value
A KFAS model object (of class SSModel) updated with the newly sampled variance parameters
Posterior Inference for warpDLM model with latent structural DLM
Description
This function outputs posterior quantities and forecasts from a univariate warpDLM model. Currently two latent DLM specifications are supported: local level and the local linear trend.
Usage
warpDLM(
y,
type = c("level", "trend"),
transformation = c("np", "identity", "log", "sqrt", "pois", "neg-bin"),
y_max = Inf,
R0 = 10,
nsave = 5000,
nburn = 5000,
nskip = 1,
n.ahead = 1
)
Arguments
y |
the count-valued time series |
type |
the type of latent DLM (must be either level or trend) |
transformation |
transformation to use for the latent process (default is np); must be one of
|
y_max |
a fixed and known upper bound for all observations; default is |
R0 |
the variance for the initial state theta_0; default is 10 |
nsave |
number of MCMC iterations to save |
nburn |
number of MCMC iterations to discard |
nskip |
number of MCMC iterations to skip between saving iterations, i.e., save every (nskip + 1)th draw |
n.ahead |
number of steps to forecast ahead |
Value
A list with the following elements:
-
V_post
: posterior draws of the observation variance -
W_post
: posterior draws of the state update variance(s) -
fc_post
: draws from the forecast distribution (of length n.ahead) -
post_pred
: draws from the posterior predictive distribution ofy
-
g_func
: transformation function -
g_inv_func
: inverse transformation function -
KFAS_mod
: the final KFAS model representing the latent DLM