Type: | Package |
Title: | Two-Part Estimation of Treatment Rules for Semi-Continuous Data |
Version: | 0.0.1 |
Description: | Implements the methodology of Huling, Smith, and Chen (2020) <doi:10.1080/01621459.2020.1801449>, which allows for subgroup identification for semi-continuous outcomes by estimating individualized treatment rules. It uses a two-part modeling framework to handle semi-continuous data by separately modeling the positive part of the outcome and an indicator of whether each outcome is positive, but still results in a single treatment rule. High dimensional data is handled with a cooperative lasso penalty, which encourages the coefficients in the two models to have the same sign. |
URL: | https://github.com/jaredhuling/personalized2part |
BugReports: | https://github.com/jaredhuling/personalized2part/issues |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | personalized, HDtweedie |
LinkingTo: | Rcpp, RcppEigen |
Imports: | Rcpp, foreach, methods |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | yes |
Packaged: | 2020-09-02 20:55:06 UTC; jared |
Author: | Jared Huling |
Maintainer: | Jared Huling <jaredhuling@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2020-09-10 10:00:03 UTC |
Fit a penalized gamma augmentation model via cross fitting
Description
Fits a penalized gamma augmentation model via cross fitting and returns vector of length n of out of sample predictions on the link scale from cross fitting
Usage
HDtweedie_kfold_aug(
x,
y,
trt,
wts = NULL,
K = 10,
p = 1.5,
interactions = FALSE
)
Arguments
x |
an n x p matrix of covariates for the zero part data, where each row is an observation and each column is a predictor. MUST be ordered such that the first n_s rows align with the observations in x_s and s |
y |
a length n vector of responses taking positive values |
trt |
a length n vector of treatment variables with 1 indicating treatment and -1 indicating control |
wts |
a length n vector of sample weights |
K |
number of folds for cross fitting |
p |
tweedie mixing parameter. See |
interactions |
boolean variable of whether or not to fit model with interactions. For predictions, interactions will be integrated out |
Cross validation for hd2part models
Description
Cross validation for hd2part models
Usage
cv.hd2part(
x,
z,
x_s,
s,
weights = rep(1, NROW(x)),
weights_s = rep(1, NROW(x_s)),
offset = NULL,
offset_s = NULL,
lambda = NULL,
type.measure = c("mae", "mse", "sep-auc-mse", "sep-auc-mae"),
nfolds = 10,
foldid = NULL,
grouped = TRUE,
keep = FALSE,
parallel = FALSE,
...
)
Arguments
x |
an n x p matrix of covariates for the zero part data, where each row is an observation and each column is a predictor. MUST be ordered such that the first n_s rows align with the observations in x_s and s |
z |
a length n vector of responses taking values 1 and 0, where 1 indicates the response is positive and zero indicates the response has value 0. MUST be ordered such that the first n_s values align with the observations in x_s and s |
x_s |
an n_s x p matrix of covariates (which is a submatrix of x) for the positive part data, where each row is an observation and each column is a predictor |
s |
a length n_s vector of responses taking strictly positive values |
weights |
a length n vector of observation weights for the zero part data |
weights_s |
a length n_s vector of observation weights for the positive part data |
offset |
a length n vector of offset terms for the zero part data |
offset_s |
a length n_s vector of offset terms for the positive part data |
lambda |
A user supplied lambda sequence. By default, the program computes its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. |
type.measure |
measure to evaluate for cross-validation. Will add more description later |
nfolds |
number of folds for cross-validation. default is 10. 3 is smallest value allowed. |
foldid |
an optional vector of values between 1 and nfold specifying which fold each observation belongs to. |
grouped |
Like in glmnet, this is an experimental argument, with default |
keep |
If |
parallel |
If TRUE, use parallel foreach to fit each fold. Must register parallel before hand, such as doMC. |
... |
other parameters to be passed to |
Examples
set.seed(1)
Fitting subgroup identification models for semicontinuous positive outcomes
Description
Fits subgroup identification models
Usage
fit_subgroup_2part(
x,
y,
trt,
propensity.func = NULL,
propensity.func.positive = NULL,
match.id = NULL,
augment.func.zero = NULL,
augment.func.positive = NULL,
cutpoint = 1,
larger.outcome.better = TRUE,
penalize.ate = TRUE,
y_eps = 1e-06,
...
)
Arguments
x |
The design matrix (not including intercept term) |
y |
The nonnegative response vector |
trt |
treatment vector with each element equal to a 0 or a 1, with 1 indicating treatment status is active. |
propensity.func |
function that inputs the design matrix x and the treatment vector trt and outputs
the propensity score, ie Pr(trt = 1 | X = x). Function should take two arguments 1) x and 2) trt. See example below.
For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion
of patients assigned to the treatment group, i.e.:
|
propensity.func.positive |
function that inputs the design matrix x and the treatment vector trt and outputs
the propensity score for units with positive outcome values, ie Pr(trt = 1 | X = x, Z = 1). Function should take
two arguments 1) x and 2) trt. See example below.
For a randomized controlled trial this can simply be a function that returns a constant equal to the proportion
of patients assigned to the treatment group, i.e.:
|
match.id |
a (character, factor, or integer) vector with length equal to the number of observations in Example 1: Example 2: augment.func <- function(x, y, trt) { data <- data.frame(x, y, trt) lmod <- glm(y ~ x * trt, family = binomial()) ## get predictions when trt = 1 data$trt <- 1 preds_1 <- predict(lmod, data, type = "response") ## get predictions when trt = -1 data$trt <- -1 preds_n1 <- predict(lmod, data, type = "response") ## return predictions averaged over trt return(0.5 * (preds_1 + preds_n1)) } |
augment.func.zero |
(similar to augment.func.positive) function which inputs the
indicators of whether each response is positive ( |
augment.func.positive |
(similar to augment.func.zero) function which inputs the positive part response
(ie all observations in |
cutpoint |
numeric value for patients with benefit scores above which
(or below which if |
larger.outcome.better |
boolean value of whether a larger outcome is better/preferable. Set to |
penalize.ate |
should the treatment main effect (ATE) be penalized too? |
y_eps |
positive value above which observations in |
... |
options to be passed to |
Examples
set.seed(42)
dat <- sim_semicontinuous_data(250, n.vars = 15)
x <- dat$x
y <- dat$y
trt <- dat$trt
prop_func <- function(x, trt)
{
propensmod <- glm(trt ~ x, family = binomial())
propens <- unname(fitted(propensmod))
propens
}
fitted_model <- fit_subgroup_2part(x, y, trt, prop_func, prop_func)
fitted_model
## correlation of estimated covariate-conditional risk ratio and truth
cor(fitted_model$benefit.scores, dat$treatment_risk_ratio, method = "spearman")
Main fitting function for group lasso and cooperative lasso penalized two part models
Description
This function fits penalized two part models with a logistic regression model for the zero part and a gamma regression model for the positive part. Each covariate's effect has either a group lasso or cooperative lasso penalty for its effects for the two consituent models
Usage
hd2part(
x,
z,
x_s,
s,
weights = rep(1, NROW(x)),
weights_s = rep(1, NROW(x_s)),
offset = NULL,
offset_s = NULL,
penalty = c("grp.lasso", "coop.lasso"),
penalty_factor = NULL,
nlambda = 100L,
lambda_min_ratio = ifelse(n_s < p, 0.05, 0.005),
lambda = NULL,
tau = 0,
opposite_signs = FALSE,
flip_beta_zero = FALSE,
intercept_z = FALSE,
intercept_s = FALSE,
strongrule = TRUE,
maxit_irls = 50,
tol_irls = 1e-05,
maxit_mm = 500,
tol_mm = 1e-05,
balance_likelihoods = TRUE
)
Arguments
x |
an n x p matrix of covariates for the zero part data, where each row is an observation and each column is a predictor |
z |
a length n vector of responses taking values 1 and 0, where 1 indicates the response is positive and zero indicates the response has value 0. |
x_s |
an n_s x p matrix of covariates (which is a submatrix of x) for the positive part data, where each row is an observation and each column is a predictor |
s |
a length n_s vector of responses taking strictly positive values |
weights |
a length n vector of observation weights for the zero part data |
weights_s |
a length n_s vector of observation weights for the positive part data |
offset |
a length n vector of offset terms for the zero part data |
offset_s |
a length n_s vector of offset terms for the positive part data |
penalty |
either |
penalty_factor |
a length p vector of penalty adjustment factors corresponding to each covariate. A value of 0 in the jth location indicates no penalization on the jth variable, and any positive value will indicate a multiplicative factor on top of the common penalization amount. The default value is 1 for all variables |
nlambda |
the number of lambda values. The default is 100. |
lambda_min_ratio |
Smallest value for |
lambda |
a user supplied sequence of penalization tuning parameters. By default, the program automatically
chooses a sequence of lambda values based on |
tau |
value between 0 and 1 for sparse group mixing penalty. 0 implies either group lasso or coop lasso and 1 implies lasso |
opposite_signs |
a boolean variable indicating whether the signs of coefficients across models should be encouraged to have
opposite signs instead of the same signs. Default is |
flip_beta_zero |
should we flip the signs of the parameters for the zero part model? Defaults to |
intercept_z |
whether or not to include an intercept in the zero part model. Default is |
intercept_s |
whether or not to include an intercept in the positive part model. Default is |
strongrule |
should a strong rule be used? Defaults to |
maxit_irls |
maximum number of IRLS iterations |
tol_irls |
convergence tolerance for IRLS iterations |
maxit_mm |
maximum number of MM iterations. Note that for |
tol_mm |
convergence tolerance for MM iterations. Note that for |
balance_likelihoods |
should the likelihoods be balanced so variables would enter both models at the same value of lambda
if the penalty were a lasso penalty? Recommended to keep at the default, |
Examples
library(personalized2part)
Fitting function for lasso penalized GLMs
Description
This function fits penalized gamma GLMs
Usage
hdgamma(
x,
y,
weights = rep(1, NROW(x)),
offset = NULL,
penalty_factor = NULL,
nlambda = 100L,
lambda_min_ratio = ifelse(n < p, 0.05, 0.005),
lambda = NULL,
tau = 0,
intercept = TRUE,
strongrule = TRUE,
maxit_irls = 50,
tol_irls = 1e-05,
maxit_mm = 500,
tol_mm = 1e-05
)
Arguments
x |
an n x p matrix of covariates for the zero part data, where each row is an observation and each column is a predictor |
y |
a length n vector of responses taking strictly positive values. |
weights |
a length n vector of observation weights |
offset |
a length n vector of offset terms |
penalty_factor |
a length p vector of penalty adjustment factors corresponding to each covariate. A value of 0 in the jth location indicates no penalization on the jth variable, and any positive value will indicate a multiplicative factor on top of the common penalization amount. The default value is 1 for all variables |
nlambda |
the number of lambda values. The default is 100. |
lambda_min_ratio |
Smallest value for |
lambda |
a user supplied sequence of penalization tuning parameters. By default, the program automatically
chooses a sequence of lambda values based on |
tau |
a scalar numeric value between 0 and 1 (included) which is a mixing parameter for sparse group lasso penalty. 0 indicates group lasso and 1 indicates lasso, values in between reflect different emphasis on group and lasso penalties |
intercept |
whether or not to include an intercept. Default is |
strongrule |
should a strong rule be used? |
maxit_irls |
maximum number of IRLS iterations |
tol_irls |
convergence tolerance for IRLS iterations |
maxit_mm |
maximum number of MM iterations. Note that for |
tol_mm |
convergence tolerance for MM iterations. Note that for |
Examples
library(personalized2part)
Plot method for hd2part fitted objects
Description
Plot method for hd2part fitted objects
Usage
## S3 method for class 'hd2part'
plot(
x,
model = c("zero", "positive"),
xvar = c("loglambda", "norm", "lambda"),
labsize = 0.6,
xlab = iname,
ylab = NULL,
main = paste(model, "model"),
...
)
## S3 method for class 'cv.hd2part'
plot(x, sign.lambda = 1, ...)
Arguments
x |
fitted "hd2part" model object |
model |
either |
xvar |
What is on the X-axis. |
labsize |
size of labels for variable names. If labsize = 0, then no variable names will be plotted |
xlab |
label for x-axis |
ylab |
label for y-axis |
main |
main title for plot |
... |
other graphical parameters for the plot |
sign.lambda |
Either plot against log(lambda) (default) or its negative if |
Examples
set.seed(123)
set.seed(123)
Prediction function for fitted cross validation hd2part objects
Description
Prediction function for fitted cross validation hd2part objects
Usage
## S3 method for class 'cv.hd2part'
predict(
object,
newx,
model = c("zero", "positive"),
s = c("lambda.min", "lambda.1se"),
type = c("link", "model_response", "response", "coefficients", "nonzero"),
...
)
Arguments
object |
fitted |
newx |
Matrix of new values for |
model |
either |
s |
Value(s) of the penalty parameter lambda at which predictions are required. Default is the entire sequence used to create
the model. For |
type |
Type of prediction required. |
... |
arguments to be passed to |
Examples
set.seed(123)
Prediction method for two part fitted objects
Description
Prediction method for two part fitted objects
Usage
## S3 method for class 'hd2part'
predict(
object,
newx,
s = NULL,
model = c("zero", "positive"),
type = c("link", "model_response", "response", "coefficients", "nonzero"),
newoffset = NULL,
...
)
Arguments
object |
fitted "hd2part" model object |
newx |
Matrix of new values for |
s |
Value(s) of the penalty parameter lambda for the zero part at which predictions are required. Default is the entire sequence used to create the model. |
model |
either |
type |
Type of prediction required. |
newoffset |
f an offset is used in the fit, then one must be supplied for making predictions |
... |
not used |
Value
An object depending on the type argument
Examples
set.seed(1)
Generates data from a two part distribution with a point mass at zero and heterogeneous treatment effects
Description
Generates semicontinuous data with heterogeneity of treatment effect
Usage
sim_semicontinuous_data(n.obs = 1000, n.vars = 25)
Arguments
n.obs |
number of observations |
n.vars |
number of variables. Must be at least 10 |
Value
returns list with values y
for outcome, x
for design matrix, trt
for
treatment assignments, betanonzero
for true coefficients for treatment-covariate interactions for model for
whether or not a response is nonzero, betapos
for true coefficients for treatment-covariate interactions
for positive model, treatment_risk_ratio
for the true covariate-conditional treatment effect risk ratio for
each observation, pi.x
for the true underlying propensity score