Type: | Package |
Title: | Evaluate a Longitudinal Surrogate with a Censored Outcome |
Version: | 1.1 |
Date: | 2025-06-05 |
Depends: | R (≥ 4.1.0) |
Description: | Provides influence function-based methods to evaluate a longitudinal surrogate marker in a censored time-to-event outcome setting, with plug-in and targeted maximum likelihood estimation options. Details are described in: Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B <doi:10.1093/jrsssb/qkae119>. A tutorial for this package can be found at https://www.laylaparast.com/survivalsurrogate and a Shiny App implementing the package can be found at https://parastlab.shinyapps.io/survivalsurrogateApp/. |
License: | GPL-2 | GPL-3 [expanded from: GPL] |
Imports: | stats, dplyr, magrittr, glue, mlr3, purrr, SparseM, rBeta2009, data.table, utils, rpart |
Suggests: | mlr3learners |
NeedsCompilation: | no |
Packaged: | 2025-06-05 19:20:21 UTC; parastlm |
Author: | Denis Agniel [aut], Layla Parast [aut, cre] |
Maintainer: | Layla Parast <parast@austin.utexas.edu> |
Repository: | CRAN |
Date/Publication: | 2025-06-05 20:20:01 UTC |
Estimates the proportion of the treatment effect explained
Description
Estimates the proportion of the treatment effect on the censored primary outcome that is explained by the longitudinal surrogate marker, given the estimated influence functions for the overall treatment effect (delta) and the residual treatment effect (delta_s).
Usage
estimate_R(delta_if, delta_s_if, delta = NULL, delta_s = NULL, se_type = "asymptotic",
n_boot = NULL, alpha = 0.05)
Arguments
delta_if |
Influence function estimates for delta. |
delta_s_if |
Influence function estimates for delta_s. |
delta |
(Optional) Defaults to the mean of the delta influence function estimates. |
delta_s |
(Optional) Defaults to the mean of the delta_s influence function estimates. |
se_type |
Type of standard error estimation, choices are "asymptotic" or "bootstrap". |
n_boot |
(Optional unless se_type = "bootstrap") Number of bootstrap samples. |
alpha |
(Optional) Alpha level used for confidence interval. If not provided, this is set to 0.05. |
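As an illustration of what se_type = "bootstrap" could involve, the sketch below resamples two hypothetical influence function vectors with replacement and recomputes the proportion explained in each resample. The simulated vectors, the definition R = 1 - delta_s / delta, and the percentile interval are assumptions made for this example; it is not claimed to reproduce the resampling scheme used inside estimate_R.

```r
set.seed(2)
n <- 400
alpha <- 0.05
n_boot <- 200

# Hypothetical (uncentered) influence function values; not package output.
delta_if <- rnorm(n, mean = 0.3, sd = 1)
delta_s_if <- 0.5 * delta_if + rnorm(n, mean = 0, sd = 0.5)

# Recompute R = 1 - delta_s / delta on each bootstrap resample of observations.
R_boot <- replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)
  1 - mean(delta_s_if[idx]) / mean(delta_if[idx])
})

se_boot <- sd(R_boot)                                   # bootstrap standard error
ci_boot <- quantile(R_boot, c(alpha / 2, 1 - alpha / 2))  # percentile interval
```

Because R is a ratio, resamples in which the mean of delta_if is close to zero can produce extreme values, so a larger n_boot and a check of the bootstrap distribution are sensible in practice.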
Value
A dataframe with the following components:
estimand |
Indicates what estimand is estimated, including the treatment effect (delta), residual treatment effect (delta.s), and proportion of treatment effect explained (R) by the longitudinal surrogate marker. |
estimate |
Estimates of the treatment effect (delta), residual treatment effect (delta.s), and proportion of treatment effect explained (R) by the longitudinal surrogate marker. |
se |
Estimated standard errors. |
ci_l |
Lower bound of the confidence intervals. |
ci_h |
Upper bound of the confidence intervals. |
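To make the relationship between the three estimands concrete, the following sketch computes a proportion explained and an asymptotic standard error from two influence function vectors. It assumes the common definition R = 1 - delta_s / delta and a first-order delta method; the simulated inputs are hypothetical, and this is an illustration rather than the internal implementation of estimate_R.

```r
set.seed(1)
n <- 500
alpha <- 0.05

# Hypothetical (uncentered) influence function values; not package output.
delta_if <- rnorm(n, mean = 0.30, sd = 1)
delta_s_if <- 0.4 * delta_if + rnorm(n, mean = 0, sd = 0.5)

# Point estimates are the means of the influence function values.
delta_hat <- mean(delta_if)
delta_s_hat <- mean(delta_s_if)
R_hat <- 1 - delta_s_hat / delta_hat

# Delta method: linearize R = 1 - delta_s / delta around the estimates.
if_R <- (delta_s_hat / delta_hat^2) * (delta_if - delta_hat) -
  (1 / delta_hat) * (delta_s_if - delta_s_hat)
se_R <- sd(if_R) / sqrt(n)

# Wald-type confidence interval at level 1 - alpha.
z <- qnorm(1 - alpha / 2)
ci <- c(ci_l = R_hat - z * se_R, ci_h = R_hat + z * se_R)
```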
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
p_deltahat <- plugin_delta(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  truncate_e = 0.005,
  verbose = FALSE
)
p_deltahat_s <- plugin_delta_s(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  t0 = t0,
  truncate_pi = 0.005,
  truncate_e = 0.005,
  verbose = FALSE
)
estimate_R(p_deltahat$if_data[[1]]$eif,
           p_deltahat_s$if_data[[1]]$eif)
Example dataset
Description
Example dataset with longitudinal surrogate marker and censored primary outcome
Usage
data("exampledata")
Format
A data frame with 1000 observations on the following 21 variables. Here, the landmark time is 4 and the final time point is 5. The surrogate is measured at the following time points: 0,1,2,3,4.
ID
Unique individual ID
X_0
Baseline covariate vector.
G_0
Treatment indicator vector.
Y_0
Primary outcome variable at time 0.
S_0
Surrogate marker value at time 0.
Y_1
Primary outcome variable at time 1.
S_1
Surrogate marker value at time 1.
Y_2
Primary outcome variable at time 2.
S_2
Surrogate marker value at time 2.
Y_3
Primary outcome variable at time 3.
S_3
Surrogate marker value at time 3.
Y_4
Primary outcome variable at time 4.
S_4
Surrogate marker value at time 4.
Y_5
Primary outcome variable at time 5.
ff
Fold number.
A_0
Observation variable at time 0.
A_1
Observation variable at time 1.
A_2
Observation variable at time 2.
A_3
Observation variable at time 3.
A_4
Observation variable at time 4.
A_5
Observation variable at time 5.
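For readers preparing their own data, the sketch below builds a small toy data frame with the same column layout as exampledata (landmark time 4, final time point 5). The event and censoring times are simulated and hypothetical. The coding of A_t and Y_t follows the argument descriptions in this manual; setting the surrogate to NA once an individual is censored or has had the event is an assumption about what "no longer observable" means, and how Y_t should be coded after censoring is not specified here, so compare against exampledata before relying on this layout.

```r
set.seed(3)
n <- 8
tt <- 5   # final time point
t0 <- 4   # landmark time

# Hypothetical event and censoring times, used only to fill in the columns.
event_time <- sample(1:8, n, replace = TRUE)
cens_time <- sample(1:8, n, replace = TRUE)

toy <- data.frame(
  ID = seq_len(n),
  X_0 = rnorm(n),                 # baseline covariate
  G_0 = rbinom(n, 1, 0.5),        # treatment indicator (1 = treatment)
  ff = rep(1:2, length.out = n)   # crossfitting fold
)

for (k in 0:tt) {
  # A_k = 1 if still under observation at time k (not censored before k).
  toy[[paste0("A_", k)]] <- as.integer(cens_time >= k)
  # Y_k = 1 if the primary event has not occurred by time k.
  toy[[paste0("Y_", k)]] <- as.integer(event_time > k)
}

for (k in 0:t0) {
  # Assumption: the surrogate is NA once censored or after the event.
  observable <- toy[[paste0("A_", k)]] == 1 & toy[[paste0("Y_", k)]] == 1
  toy[[paste0("S_", k)]] <- ifelse(observable, rnorm(n), NA_real_)
}

# Column name vectors in the form expected by the a, y, and s arguments.
a_cols <- paste0("A_", 0:tt)
y_cols <- paste0("Y_", 0:tt)
s_cols <- paste0("S_", 0:t0)
```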
Treatment effect estimation using plug-in estimator
Description
Estimates the treatment effect using the plug-in estimator.
Usage
plugin_delta(data, folds, id, x, g, a = NULL, y, s, binary_lrnr = NULL,
cont_lrnr = NULL, e = NULL, gamma1 = NULL, gamma0 = NULL,
mu1 = NULL, mu0 = NULL, Q1 = NULL, Q0 = NULL,
truncate_e = 1e-12, verbose = FALSE)
Arguments
data |
A dataframe containing all variables needed for estimation. The functions in this package require a specific layout, so you may need to reformat your dataset. At a minimum, the dataframe must contain: (1) a variable giving the folds for crossfitting; (2) a unique observation identifier; (3) baseline covariates (if there are none, include a variable equal to 1 for all observations and supply it to the argument x below); (4) a treatment group indicator, coded 1 for treatment and 0 for control; (5) a set of variables containing the surrogate marker value at each time point up to and including the landmark time, denoted t0; and (6) a set of variables indicating primary outcome status at each time point where the surrogate marker is measured, as well as at later time points up to a final time point t > t0. Optionally, if there is censoring, you may also include a set of variables indicating observation status (1 if not censored, 0 if censored) at all time points. See exampledata for an example where t0 = 4 and t = 5. |
folds |
A string giving the name of the column that defines the crossfitting folds for nuisance estimation (e.g., 'ff' in exampledata). |
id |
A string giving the name of the column with unique unit IDs (e.g., individual ID). |
x |
A character vector of covariate names to adjust for in nuisance estimation. At a minimum this must have one covariate equal to 1 for all individuals. |
g |
A string indicating the column name of the treatment indicator with 1 for treatment and 0 for control. |
a |
(Optional) A character vector of the column names indicating observation status at each time point. Specifically, A_t = 1 if the individual is still under observation (i.e., uncensored) at time t, meaning their outcome status could in principle be measured at that time. A value of 0 means the individual was censored prior to t. If not provided, assumed to be 1 for all time points. |
y |
A character vector of the column names indicating primary outcome status at each time point. Specifically, Y_t = 1 if the primary event (e.g., failure, relapse, death) has not yet occurred by time t, and the individual is still at risk. A value of 0 means the primary event occurred on or before time t. |
s |
A character vector of the column names indicating the surrogate marker values measured at each time point, where the value is NA if the individual is no longer observable at that time point. |
binary_lrnr |
Learner object or specification used for estimating binary nuisance components (e.g., propensity scores, censoring). |
cont_lrnr |
Learner object used for estimating continuous-valued regressions. |
e |
(Optional) Column name of propensity score estimates. If not provided, these will be estimated. |
gamma1 |
(Optional) Vector of column names for censoring probabilities under treatment. If not provided, these will be estimated. |
gamma0 |
(Optional) Vector of column names for censoring probabilities under control. If not provided, these will be estimated. |
mu1 |
(Optional) Names of estimated hazards under treatment. If not provided, these will be estimated. |
mu0 |
(Optional) Names of estimated hazards under control. If not provided, these will be estimated. |
Q1 |
(Optional) Names of estimated conditional means under treatment. If not provided, these will be estimated. |
Q0 |
(Optional) Names of estimated conditional means under control. If not provided, these will be estimated. |
truncate_e |
Numeric truncation level for propensity scores to avoid division by near-zero values. Default is 1e-12. |
verbose |
Logical; if TRUE, print progress messages. |
Value
A dataframe with the following components:
plugin_est |
Plug-in estimate of the treatment effect |
plugin_se |
Estimated standard error of the plug-in estimator, based on the influence function. |
if_data |
A nested data frame containing the influence function contributions for each observation. |
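The roles of the folds column and of truncate_e can be illustrated with a simplified, self-contained sketch. It crossfits a propensity score with a plain logistic regression (glm is swapped in for the mlr3 learners used in the examples) and then truncates the predictions. The simulated data, the column names, and the two-sided truncation to [truncate_e, 1 - truncate_e] are assumptions for illustration and may differ from what plugin_delta does internally.

```r
set.seed(5)
n <- 300
truncate_e <- 0.005

# Hypothetical data: one covariate, a treatment indicator, and fold labels.
dat <- data.frame(x = rnorm(n))
dat$g <- rbinom(n, 1, plogis(0.5 * dat$x))
dat$ff <- sample(rep(1:5, length.out = n))

# Crossfitting: each observation's propensity score is predicted by a model
# fitted only on the other folds.
dat$e_hat <- NA_real_
for (k in sort(unique(dat$ff))) {
  held_out <- dat$ff == k
  fit <- glm(g ~ x, family = binomial(), data = dat[!held_out, ])
  dat$e_hat[held_out] <- predict(fit, newdata = dat[held_out, ],
                                 type = "response")
}

# Truncation keeps the scores away from 0 and 1 so that inverse weights
# such as 1 / e_hat and 1 / (1 - e_hat) stay bounded.
dat$e_trunc <- pmin(pmax(dat$e_hat, truncate_e), 1 - truncate_e)
```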
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
p_deltahat <- plugin_delta(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  truncate_e = 0.005,
  verbose = FALSE
)
p_deltahat
Residual treatment effect estimation using the plug-in estimator
Description
Estimates the residual treatment effect using the plug-in estimator for evaluating a longitudinal surrogate marker.
Usage
plugin_delta_s(data, folds, id, x, g, a = NULL, y, s, binary_lrnr = NULL,
cont_lrnr = NULL, t0 = length(s), e = NULL, gamma1 = NULL, gamma0 = NULL,
mu1 = NULL, mu0 = NULL, pi = NULL, pistar = NULL, Q1 = NULL, Q0 = NULL,
truncate_e = 1e-12, truncate_pi = 1e-12, se_type = "asymptotic",
n_boot = NULL, alpha = 0.05, verbose = FALSE, retain_data = FALSE)
Arguments
data |
A dataframe containing all variables needed for estimation. The functions in this package require a specific layout, so you may need to reformat your dataset. At a minimum, the dataframe must contain: (1) a variable giving the folds for crossfitting; (2) a unique observation identifier; (3) baseline covariates (if there are none, include a variable equal to 1 for all observations and supply it to the argument x below); (4) a treatment group indicator, coded 1 for treatment and 0 for control; (5) a set of variables containing the surrogate marker value at each time point up to and including the landmark time, denoted t0; and (6) a set of variables indicating primary outcome status at each time point where the surrogate marker is measured, as well as at later time points up to a final time point t > t0. Optionally, if there is censoring, you may also include a set of variables indicating observation status (1 if not censored, 0 if censored) at all time points. See exampledata for an example where t0 = 4 and t = 5. |
folds |
A string giving the name of the column that defines the crossfitting folds for nuisance estimation (e.g., 'ff' in exampledata). |
id |
A string giving the name of the column with unique unit IDs (e.g., individual ID). |
x |
A character vector of covariate names to adjust for in nuisance estimation. At a minimum this must have one covariate equal to 1 for all individuals. |
g |
A string indicating the column name of the treatment indicator with 1 for treatment and 0 for control. |
a |
(Optional) A character vector of the column names indicating observation status at each time point. Specifically, A_t = 1 if the individual is still under observation (i.e., uncensored) at time t, meaning their outcome status could in principle be measured at that time. A value of 0 means the individual was censored prior to t. If not provided, assumed to be 1 for all time points. |
y |
A character vector of the column names indicating primary outcome status at each time point. Specifically, Y_t = 1 if the primary event (e.g., failure, relapse, death) has not yet occurred by time t, and the individual is still at risk. A value of 0 means the primary event occurred on or before time t. |
s |
A character vector of the column names indicating the surrogate marker values measured at each time point, where the value is NA if the individual is no longer observable at that time point. |
binary_lrnr |
Learner object or specification used for estimating binary nuisance components (e.g., propensity scores, censoring). |
cont_lrnr |
Learner object used for estimating continuous-valued outcome regressions. |
t0 |
Landmark time after which the surrogate is not observed. |
e |
(Optional) Column name of propensity score estimates. If not provided, these will be estimated. |
gamma1 |
(Optional) Vector of column names for censoring probabilities under treatment. If not provided, these will be estimated. |
gamma0 |
(Optional) Vector of column names for censoring probabilities under control. If not provided, these will be estimated. |
mu1 |
(Optional) Names of estimated hazards under treatment. If not provided, these will be estimated. |
mu0 |
(Optional) Names of estimated hazards under control. If not provided, these will be estimated. |
pi |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k. See Agniel and Parast (2025) for full definition. If not provided, these will be estimated. |
pistar |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k-1. See Agniel and Parast (2025) for full definition. If not provided, these will be estimated. |
Q1 |
(Optional) Names of estimated conditional means under treatment. If not provided, these will be estimated. |
Q0 |
(Optional) Names of estimated conditional means under control. If not provided, these will be estimated. |
truncate_e |
Numeric truncation level for propensity scores to avoid division by near-zero values. Default is 1e-12. |
truncate_pi |
Truncation threshold for surrogate-conditional propensity scores to avoid instability. Default is 1e-12. |
se_type |
Type of standard error estimation, choices are "asymptotic" or "bootstrap". |
n_boot |
(Optional unless se_type = "bootstrap") Number of bootstrap samples. |
alpha |
Significance level for confidence intervals. Default is 0.05. |
verbose |
Logical; if TRUE, print progress messages. |
retain_data |
Logical; if TRUE, retains full dataset with estimated components in the output. |
Value
A dataframe with the following components:
plugin_est |
The plug-in estimate of the residual treatment effect at the landmark time. |
plugin_se |
Standard error of the estimate, based on influence function or bootstrap. |
ci_l |
Lower bound of the confidence interval. |
ci_h |
Upper bound of the confidence interval. |
if_data |
A list containing the dataset with the efficient influence function if retain_data = TRUE. |
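The relationship between plugin_est, plugin_se, alpha, and the interval bounds ci_l and ci_h can be sketched with a standard Wald-type interval. The numeric values below are hypothetical, and whether the package forms its asymptotic intervals in exactly this way is an assumption.

```r
# Hypothetical estimate and standard error; not package output.
plugin_est <- 0.12
plugin_se <- 0.04
alpha <- 0.05

# Normal quantile for a two-sided interval at level 1 - alpha.
z <- qnorm(1 - alpha / 2)

ci_l <- plugin_est - z * plugin_se   # lower bound
ci_h <- plugin_est + z * plugin_se   # upper bound
```

A smaller alpha gives a larger quantile z and therefore a wider interval around the same estimate.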
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
p_deltahat_s <- plugin_delta_s(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  t0 = t0,
  truncate_pi = 0.005,
  truncate_e = 0.005,
  verbose = FALSE
)
p_deltahat_s
Treatment effect estimation using TMLE
Description
Estimates the treatment effect using targeted maximum likelihood estimation (TMLE).
Usage
tmle_delta(data, folds, id, x, g, a = NULL, y, s, binary_lrnr = NULL,
cont_lrnr = NULL, e = NULL, gamma1 = NULL, gamma0 = NULL,
truncate_e = 1e-12, verbose = FALSE)
Arguments
data |
A dataframe containing all variables needed for estimation. The functions in this package require a specific layout, so you may need to reformat your dataset. At a minimum, the dataframe must contain: (1) a variable giving the folds for crossfitting; (2) a unique observation identifier; (3) baseline covariates (if there are none, include a variable equal to 1 for all observations and supply it to the argument x below); (4) a treatment group indicator, coded 1 for treatment and 0 for control; (5) a set of variables containing the surrogate marker value at each time point up to and including the landmark time, denoted t0; and (6) a set of variables indicating primary outcome status at each time point where the surrogate marker is measured, as well as at later time points up to a final time point t > t0. Optionally, if there is censoring, you may also include a set of variables indicating observation status (1 if not censored, 0 if censored) at all time points. See exampledata for an example where t0 = 4 and t = 5. |
folds |
A string giving the name of the column that defines the crossfitting folds for nuisance estimation (e.g., 'ff' in exampledata). |
id |
A string giving the name of the column with unique unit IDs (e.g., individual ID). |
x |
A character vector of covariate names to adjust for in nuisance estimation. At a minimum this must have one covariate equal to 1 for all individuals. |
g |
A string indicating the column name of the treatment indicator with 1 for treatment and 0 for control. |
a |
(Optional) A character vector of the column names indicating observation status at each time point. Specifically, A_t = 1 if the individual is still under observation (i.e., uncensored) at time t, meaning their outcome status could in principle be measured at that time. A value of 0 means the individual was censored prior to t. If not provided, assumed to be 1 for all time points. |
y |
A character vector of the column names indicating primary outcome status at each time point. Specifically, Y_t = 1 if the primary event (e.g., failure, relapse, death) has not yet occurred by time t, and the individual is still at risk. A value of 0 means the primary event occurred on or before time t. |
s |
A character vector of the column names indicating the surrogate marker values measured at each time point, where the value is NA if the individual is no longer observable at that time point. |
binary_lrnr |
Learner object or specification used for estimating binary nuisance components (e.g., propensity scores, censoring). |
cont_lrnr |
Learner object used for estimating continuous-valued outcome regressions. |
e |
(Optional) Column name of propensity score estimates. If not provided, these will be estimated. |
gamma1 |
(Optional) Vector of column names for censoring probabilities under treatment. If not provided, these will be estimated. |
gamma0 |
(Optional) Vector of column names for censoring probabilities under control. If not provided, these will be estimated. |
truncate_e |
Numeric truncation level for propensity scores to avoid division by near-zero values. Default is 1e-12. |
verbose |
Logical; if TRUE, print progress messages. |
Value
A dataframe with the following components:
tmle_est |
TMLE estimate of the treatment effect. |
tmle_se |
Estimated standard error of the TMLE estimator, based on the influence function. |
if_data |
A nested data frame containing the influence function contributions for each observation. |
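For readers unfamiliar with targeted maximum likelihood estimation, the sketch below shows the general idea in a deliberately simplified setting: a single time point, a binary outcome, and no censoring. It is a textbook-style TMLE for an average treatment effect using glm, and it is not the longitudinal algorithm implemented in tmle_delta. All data and variable names are hypothetical.

```r
set.seed(4)
n <- 1000

# Hypothetical data: covariate x, treatment g, binary outcome y.
x <- rnorm(n)
g <- rbinom(n, 1, plogis(0.3 * x))
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * g + 0.5 * x))

# Step 1: initial outcome regression, predicted under each treatment level.
q_fit <- glm(y ~ g + x, family = binomial())
q1 <- predict(q_fit, newdata = data.frame(g = 1, x = x), type = "response")
q0 <- predict(q_fit, newdata = data.frame(g = 0, x = x), type = "response")
qa <- ifelse(g == 1, q1, q0)

# Step 2: propensity score, truncated away from 0 and 1.
e_hat <- fitted(glm(g ~ x, family = binomial()))
e_hat <- pmin(pmax(e_hat, 0.005), 0.995)

# Step 3: targeting step. Fluctuate the initial fit along the "clever
# covariate" h with a one-parameter logistic regression and an offset.
h <- g / e_hat - (1 - g) / (1 - e_hat)
eps <- unname(coef(glm(y ~ -1 + h + offset(qlogis(qa)), family = binomial())))

# Step 4: updated predictions and the resulting estimate.
q1_star <- plogis(qlogis(q1) + eps / e_hat)
q0_star <- plogis(qlogis(q0) - eps / (1 - e_hat))
qa_star <- ifelse(g == 1, q1_star, q0_star)
tmle_est <- mean(q1_star - q0_star)

# Step 5: influence function values and the standard error based on them.
eif <- h * (y - qa_star) + (q1_star - q0_star) - tmle_est
tmle_se <- sd(eif) / sqrt(n)
```

The targeting step is what distinguishes TMLE from a plug-in estimator: after the update, the weighted residual term of the influence function averages to approximately zero, so the plug-in of the updated fit needs no separate bias-correction term.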
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
tml_deltahat <- tmle_delta(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  truncate_e = 0.005,
  verbose = FALSE
)
tml_deltahat
Residual treatment effect estimation using TMLE
Description
Estimates the residual treatment effect using targeted maximum likelihood estimation (TMLE) for evaluating a longitudinal surrogate marker.
Usage
tmle_delta_s(data, folds, id, x, g, a = NULL, y, s, binary_lrnr = NULL,
cont_lrnr = NULL, t0 = length(s), e = NULL, gamma1 = NULL, gamma0 = NULL,
pi = NULL, pistar = NULL, truncate_e = 1e-12, verbose = FALSE,
retain_data = FALSE)
Arguments
data |
A dataframe containing all variables needed for estimation. The functions in this package require a specific layout, so you may need to reformat your dataset. At a minimum, the dataframe must contain: (1) a variable giving the folds for crossfitting; (2) a unique observation identifier; (3) baseline covariates (if there are none, include a variable equal to 1 for all observations and supply it to the argument x below); (4) a treatment group indicator, coded 1 for treatment and 0 for control; (5) a set of variables containing the surrogate marker value at each time point up to and including the landmark time, denoted t0; and (6) a set of variables indicating primary outcome status at each time point where the surrogate marker is measured, as well as at later time points up to a final time point t > t0. Optionally, if there is censoring, you may also include a set of variables indicating observation status (1 if not censored, 0 if censored) at all time points. See exampledata for an example where t0 = 4 and t = 5. |
folds |
A string giving the name of the column that defines the crossfitting folds for nuisance estimation (e.g., 'ff' in exampledata). |
id |
A string giving the name of the column with unique unit IDs (e.g., individual ID). |
x |
A character vector of covariate names to adjust for in nuisance estimation. At a minimum this must have one covariate equal to 1 for all individuals. |
g |
A string indicating the column name of the treatment indicator with 1 for treatment and 0 for control. |
a |
(Optional) A character vector of the column names indicating observation status at each time point. Specifically, A_t = 1 if the individual is still under observation (i.e., uncensored) at time t, meaning their outcome status could in principle be measured at that time. A value of 0 means the individual was censored prior to t. If not provided, assumed to be 1 for all time points. |
y |
A character vector of the column names indicating primary outcome status at each time point. Specifically, Y_t = 1 if the primary event (e.g., failure, relapse, death) has not yet occurred by time t, and the individual is still at risk. A value of 0 means the primary event occurred on or before time t. |
s |
A character vector of the column names indicating the surrogate marker values measured at each time point, where the value is NA if the individual is no longer observable at that time point. |
binary_lrnr |
Learner object or specification used for estimating binary nuisance components (e.g., propensity scores, censoring). |
cont_lrnr |
Learner object used for estimating continuous-valued outcome regressions. |
t0 |
Landmark time after which the surrogate is not observed. |
e |
(Optional) Column name of propensity score estimates. If not provided, these will be estimated. |
gamma1 |
(Optional) Vector of column names for censoring probabilities under treatment. If not provided, these will be estimated. |
gamma0 |
(Optional) Vector of column names for censoring probabilities under control. If not provided, these will be estimated. |
pi |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k. See Agniel and Parast (2025) for full definition. If not provided, these will be estimated. |
pistar |
(Optional) Vector of column names giving estimated 'propensity score' for treatment conditional on future values of the surrogate up to time k-1. See Agniel and Parast (2025) for full definition. If not provided, these will be estimated. |
truncate_e |
Numeric truncation level for propensity scores to avoid division by near-zero values. Default is 1e-12. |
verbose |
Logical; if TRUE, print progress messages. |
retain_data |
Logical; if TRUE, the function returns the full data and influence function values. |
Value
A dataframe with the following components:
tmle_est |
TMLE estimate of the residual treatment effect. |
tmle_se |
Estimated standard error using the empirical standard deviation of the influence function. |
if_data |
(Optional) Full dataset joined with estimated influence function contributions. |
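The phrase "empirical standard deviation of the influence function" can be made concrete with the simplest possible estimator, a sample mean, whose influence function is the observation minus the mean. The sketch below uses simulated, hypothetical data and shows that the influence-function-based standard error reduces to the familiar standard error of the mean in that case; the same recipe, sd(eif) / sqrt(n), is assumed to be what the description of tmle_se refers to.

```r
set.seed(6)
n <- 250

# Hypothetical outcome; the estimand here is simply its mean.
y <- rnorm(n, mean = 2, sd = 3)
est <- mean(y)

# Influence function of the sample mean, evaluated at each observation.
eif <- y - est

# Standard error from the empirical standard deviation of the influence function.
se_if <- sd(eif) / sqrt(n)

# For comparison: the classical standard error of the mean.
se_classic <- sd(y) / sqrt(n)
```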
References
Agniel D and Parast L (2025). "Robust Evaluation of Longitudinal Surrogate Markers with Censored Data." Journal of the Royal Statistical Society: Series B; doi:10.1093/jrsssb/qkae119.
Examples
data(exampledata)
names(exampledata)
library(glue)
library(rpart)
library(mlr3)
library(dplyr)
library(mlr3learners)
tt <- 5
t0 <- 4
yvars <- paste0('Y_', 0:tt)
lrnc <- glue('regr.rpart')
lrnb <- glue('classif.log_reg')
tml_deltahat_s <- tmle_delta_s(
  data = exampledata,
  folds = 'ff',
  id = 'ID',
  x = 'X_0',
  g = 'G_0',
  a = paste0('A_', 0:tt),
  y = yvars,
  s = paste0('S_', 0:t0),
  binary_lrnr = lrn(lrnb, predict_type = 'prob'),
  cont_lrnr = lrn(lrnc),
  t0 = t0,
  truncate_e = 0.005,
  verbose = FALSE
)
tml_deltahat_s