Title: | Regression with Interval-Censored Covariates |
Version: | 0.1.3 |
Description: | Provides functions to simulate and analyze data for a regression model with an interval censored covariate, as described in Morrison et al. (2021) <doi:10.1111/biom.13472>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.1.2 |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Imports: | biglm, dplyr, lubridate, magrittr, stats, pryr, arm, ggplot2, scales |
Suggests: | spelling, rmarkdown, knitr, testthat, markdown, pander |
Language: | en-US |
URL: | https://d-morrison.github.io/rwicc/, https://github.com/d-morrison/rwicc |
BugReports: | https://github.com/d-morrison/rwicc/issues |
NeedsCompilation: | no |
Packaged: | 2022-03-09 01:15:52 UTC; dmorrison |
Author: | Douglas Morrison |
Maintainer: | Douglas Morrison <dmorrison01@ucla.edu> |
Repository: | CRAN |
Date/Publication: | 2022-03-09 21:40:06 UTC |
Convert a pair of simple logistic regression coefficients into a P(Y|T) curve
Description
Converts a pair of simple logistic regression coefficients into a P(Y|T) curve.
Usage
build_phi_function_from_coefs(coefs)
Arguments
coefs |
numeric vector of coefficients |
Value
function(t) P(Y=1|T=t)
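A minimal sketch of what such a function factory might look like, assuming the model is logit P(Y=1|T=t) = coefs[1] + coefs[2] * t. The helper name `phi_from_coefs_sketch` is hypothetical and is not part of rwicc; the coefficient values are the `theta` defaults documented for simulate_interval_censoring().

```r
# Hypothetical stand-in for build_phi_function_from_coefs().
# Assumption: coefs = c(intercept, slope) on the logit scale.
phi_from_coefs_sketch <- function(coefs) {
  stopifnot(is.numeric(coefs), length(coefs) == 2)
  # Return a closure that maps t to P(Y = 1 | T = t).
  function(t) stats::plogis(coefs[1] + coefs[2] * t)
}

phi <- phi_from_coefs_sketch(c(0.986, -3.88))
phi(c(0, 0.5, 1))  # P(Y = 1 | T = t) at three values of t
```

Returning a closure keeps the coefficients bound to the curve, so the result can be passed directly to plotting or integration routines.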
Compute mean window period duration from simple logistic regression coefficients
Description
Computes the mean window period duration from simple logistic regression coefficients.
Usage
compute_mu(theta)
Arguments
theta |
numeric vector of coefficients |
Value
numeric scalar: mean window period duration
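The following sketch illustrates one way such a quantity can be computed. It assumes that the mean window period is the area under the curve P(Y=1|T=t) over t >= 0, and that the curve is logistic with a negative slope; under those assumptions the integral has a closed form. The helper `mu_sketch` is hypothetical and may differ from what compute_mu() does internally.

```r
# Hypothetical helper, not the package's compute_mu().
# Assumptions: theta = c(intercept, slope), slope < 0, and
# mu = integral of plogis(theta[1] + theta[2] * t) over t in [0, Inf).
mu_sketch <- function(theta) {
  stopifnot(length(theta) == 2, theta[2] < 0)
  -log1p(exp(theta[1])) / theta[2]
}

theta <- c(0.986, -3.88)
mu_closed_form <- mu_sketch(theta)

# Cross-check the closed form against numerical integration.
mu_numeric <- stats::integrate(
  function(t) stats::plogis(theta[1] + theta[2] * t),
  lower = 0, upper = Inf
)$value

abs(mu_closed_form - mu_numeric)  # should be close to zero
```

The result is in the same time units as t (years, if the slope is expressed per year).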
Fit a logistic regression model with an interval-censored covariate
Description
This function fits a logistic regression model for a binary outcome Y with an interval-censored covariate T, using an EM algorithm, as described in Morrison et al. (2021), doi: 10.1111/biom.13472.
Usage
fit_joint_model(
participant_level_data,
obs_level_data,
model_formula = stats::formula(Y ~ T),
mu_function = compute_mu,
bin_width = 1,
denom_offset = 0.1,
EM_toler_loglik = 0.1,
EM_toler_est = 1e-04,
EM_max_iterations = Inf,
glm_tolerance = 1e-07,
glm_maxit = 20,
initial_S_estimate_location = 0.25,
coef_change_metric = "max abs rel diff coefs",
verbose = FALSE
)
Arguments
participant_level_data |
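a data.frame or tibble with the following variables (as in the pt_data tibble returned by simulate_interval_censoring()): ID (participant ID), E (enrollment date), L (date of last HIV test prior to seroconversion), R (date of first HIV test after seroconversion) |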
obs_level_data |
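a data.frame or tibble with the following variables (as in the obs_data tibble returned by simulate_interval_censoring()): ID (participant ID), O (dates of biomarker sample collection), Y (MAA classifications of biomarker samples) |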
model_formula |
the functional form for the regression model for p(y|t) (as a formula() object) |
mu_function |
a function taking a vector of regression coefficient estimates as input and outputting an estimate of mu (mean duration of MAA-positive infection). |
bin_width |
the number of days between possible seroconversion dates (should be an integer) |
denom_offset |
an offset value added to the denominator of the hazard estimates to improve numerical stability |
EM_toler_loglik |
the convergence cutoff for the log-likelihood criterion ("Delta_L" in the paper) |
EM_toler_est |
the convergence cutoff for the parameter estimate criterion ("Delta_theta" in the paper) |
EM_max_iterations |
the number of EM iterations to perform before giving up if still not converged. |
glm_tolerance |
the convergence cutoff for the glm fit in the M step |
glm_maxit |
the iterations cutoff for the glm fit in the M step |
initial_S_estimate_location |
determines how seroconversion date is guessed to initialize the algorithm; can be any decimal between 0 and 1; 0.5 = midpoint imputation, 0.25 = 1st quartile, 0 = last negative, etc. |
coef_change_metric |
a string indicating the type of parameter estimate criterion to use:
|
verbose |
whether to print algorithm progress details to the console |
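As an illustration of the initial_S_estimate_location argument described above, the sketch below interpolates between the last negative test date L and the first positive test date R. The helper `initial_S_sketch` and the example dates are hypothetical; the interpolation rule is an assumption based on the argument description.

```r
# Hypothetical helper illustrating initial_S_estimate_location.
# Assumption: the initial guess is L + location * (R - L).
initial_S_sketch <- function(L, R, location = 0.25) {
  stopifnot(location >= 0, location <= 1, all(R >= L))
  L + location * as.numeric(difftime(R, L, units = "days"))
}

# Made-up interval bounds for two participants.
L <- as.Date(c("2001-03-01", "2002-06-15"))
R <- as.Date(c("2001-05-24", "2002-09-07"))

S_last_negative <- initial_S_sketch(L, R, location = 0)
S_midpoint      <- initial_S_sketch(L, R, location = 0.5)
S_first_quart   <- initial_S_sketch(L, R, location = 0.25)
S_midpoint
```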
Value
a list with the following elements:
- Theta: the estimated regression coefficients for the model of p(Y|T)
- Mu: the estimated mean window period (a transformation of Theta)
- Omega: a table with the estimated parameters for the model of p(S|E)
- converged: indicator of whether the algorithm reached its cutoff criteria before reaching the specified maximum iterations. 1 = reached cutoffs, 0 = not.
- iterations: the number of EM iterations completed before the algorithm stopped
- convergence_metrics: the four convergence metrics
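The sketch below shows how two of the stopping criteria could be evaluated after an EM iteration. It is an assumption-laden illustration: the formula for "max abs rel diff coefs" is inferred from its name, the log-likelihood criterion is taken as an absolute difference, and the rule that both cutoffs must be met is assumed. The helper names and the numeric inputs are hypothetical.

```r
# Hypothetical helpers; formulas are inferred from the argument names,
# not taken from the package's internal code.
max_abs_rel_diff <- function(new, old) max(abs((new - old) / old))

em_converged_sketch <- function(theta_new, theta_old, loglik_new, loglik_old,
                                EM_toler_est = 1e-04, EM_toler_loglik = 0.1) {
  delta_theta <- max_abs_rel_diff(theta_new, theta_old)
  delta_L <- abs(loglik_new - loglik_old)
  # Assumed rule: both criteria must fall below their cutoffs.
  converged <- delta_theta < EM_toler_est && delta_L < EM_toler_loglik
  c(delta_theta = delta_theta, delta_L = delta_L,
    converged = as.numeric(converged))
}

# Made-up values from two consecutive iterations.
check <- em_converged_sketch(
  theta_new = c(0.98601, -3.88002), theta_old = c(0.98600, -3.88000),
  loglik_new = -1520.31, loglik_old = -1520.35
)
check
```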
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Examples
## Not run:
# simulate data:
study_data <- simulate_interval_censoring()

# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)
## End(Not run)
Fit model using midpoint imputation
Description
Fit model using midpoint imputation
Usage
fit_midpoint_model(
participant_level_data,
obs_level_data,
maxit = 1000,
tolerance = 1e-08
)
Arguments
participant_level_data |
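a data.frame or tibble with the following variables (as in the pt_data tibble returned by simulate_interval_censoring()): ID (participant ID), E (enrollment date), L (date of last HIV test prior to seroconversion), R (date of first HIV test after seroconversion) |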
obs_level_data |
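a data.frame or tibble with the following variables (as in the obs_data tibble returned by simulate_interval_censoring()): ID (participant ID), O (dates of biomarker sample collection), Y (MAA classifications of biomarker samples) |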
maxit |
maximum iterations, passed to |
tolerance |
convergence criterion, passed to |
Value
a vector of logistic regression coefficient estimates
Examples
# simulate data:
sim_data <- simulate_interval_censoring(
  theta = c(0.986, -3.88),
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = 1,
  hazard_beta = 0.5
)

# fit model using midpoint imputation:
theta_est_midpoint <- fit_midpoint_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)
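To make the idea of midpoint imputation concrete, here is a self-contained base R sketch on made-up toy data. It assumes that T is the time from seroconversion to sample collection, measured in years, and it uses stats::glm() as a stand-in fitting routine; none of this is claimed to match the internals of fit_midpoint_model().

```r
set.seed(1)
n <- 200

# Toy data (made up): interval bounds and one sample date per participant.
L <- as.Date("2001-01-01") + sample(0:300, n, replace = TRUE)
R <- L + 84
S_true <- L + runif(n, 0, 84)               # unobserved in practice
O <- R + sample(0:700, n, replace = TRUE)
T_true <- as.numeric(O - S_true) / 365.25   # years since seroconversion
Y <- rbinom(n, 1, plogis(0.986 - 3.88 * T_true))

# Midpoint imputation: replace the unknown S with the middle of [L, R].
S_mid <- L + as.numeric(R - L) / 2
T_mid <- as.numeric(O - S_mid) / 365.25

# Ordinary logistic regression on the imputed covariate.
fit_mid <- glm(Y ~ T_mid, family = binomial())
coef(fit_mid)
```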
Fit model using uniform imputation
Description
Fit model using uniform imputation
Usage
fit_uniform_model(
participant_level_data,
obs_level_data,
maxit = 1000,
tolerance = 1e-08,
n_imputations = 10
)
Arguments
participant_level_data |
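a data.frame or tibble with the following variables (as in the pt_data tibble returned by simulate_interval_censoring()): ID (participant ID), E (enrollment date), L (date of last HIV test prior to seroconversion), R (date of first HIV test after seroconversion) |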
obs_level_data |
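a data.frame or tibble with the following variables (as in the obs_data tibble returned by simulate_interval_censoring()): ID (participant ID), O (dates of biomarker sample collection), Y (MAA classifications of biomarker samples) |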
maxit |
maximum iterations, passed to |
tolerance |
convergence criterion, passed to |
n_imputations |
number of imputed data sets to create |
Value
a vector of logistic regression coefficient estimates
Examples
# simulate data:
sim_data <- simulate_interval_censoring(
  theta = c(0.986, -3.88),
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = 1,
  hazard_beta = 0.5
)

# fit model using uniform imputation:
theta_est_uniform <- fit_uniform_model(
  obs_level_data = sim_data$obs_data,
  participant_level_data = sim_data$pt_data
)
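A self-contained base R sketch of uniform imputation on made-up toy data follows. It assumes that each imputed data set draws S uniformly from [L, R], that T is measured in years, and that the coefficient estimates are pooled by averaging across imputations. The pooling rule and the use of stats::glm() are assumptions, not a description of fit_uniform_model().

```r
set.seed(2)
n <- 200

# Toy data (made up): interval bounds, sample dates, and outcomes.
L <- as.Date("2001-01-01") + sample(0:300, n, replace = TRUE)
R <- L + 84
O <- R + sample(0:700, n, replace = TRUE)
S_true <- L + runif(n, 0, 84)   # unobserved in practice
Y <- rbinom(n, 1, plogis(0.986 - 3.88 * as.numeric(O - S_true) / 365.25))

n_imputations <- 10

# One column of coefficients per imputed data set.
coefs_by_imputation <- replicate(n_imputations, {
  S_imp <- L + runif(n, 0, as.numeric(R - L))   # S ~ Uniform(L, R)
  T_imp <- as.numeric(O - S_imp) / 365.25
  coef(glm(Y ~ T_imp, family = binomial()))
})

# Assumed pooling rule: average the estimates across imputations.
theta_uniform_sketch <- rowMeans(coefs_by_imputation)
theta_uniform_sketch
```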
Plot estimated and true CDFs for seroconversion date distribution
Description
Plots the estimated and true CDFs for the seroconversion date distribution.
Usage
plot_CDF(true_hazard_alpha, true_hazard_beta, omega.hat)
Arguments
true_hazard_alpha |
The data-generating hazard at the start of the study |
true_hazard_beta |
The change in data-generating hazard per calendar year |
omega.hat |
tibble of estimated discrete hazards |
Value
a ggplot
Examples
## Not run:
hazard_alpha <- 1
hazard_beta <- 0.5

# simulate data:
study_data <- simulate_interval_censoring(
  hazard_alpha = hazard_alpha,
  hazard_beta = hazard_beta
)

# fit model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = study_data$obs_data,
  participant_level_data = study_data$pt_data
)

# plot estimated and true CDFs:
plot1 <- plot_CDF(
  true_hazard_alpha = hazard_alpha,
  true_hazard_beta = hazard_beta,
  omega.hat = EM_algorithm_outputs$Omega
)
print(plot1)
## End(Not run)
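The relationship between a hazard and its CDF can be sketched in base R as follows. The sketch assumes a linear hazard h(t) = alpha + beta * t with t in years since study start, and it represents the estimated discrete hazards as a plain numeric vector rather than the tibble that plot_CDF() expects. Both helper functions are hypothetical.

```r
# Assumption: hazard h(t) = alpha + beta * t, t in years since study start.
true_cdf_sketch <- function(t, alpha, beta) {
  1 - exp(-(alpha * t + beta * t^2 / 2))
}

# CDF implied by a sequence of per-bin discrete hazards.
estimated_cdf_sketch <- function(discrete_hazards) {
  1 - cumprod(1 - discrete_hazards)
}

bin_width_years <- 7 / 365.25
t_grid <- seq(bin_width_years, 1, by = bin_width_years)

# Discrete hazards implied by the true model, standing in for estimates.
surv <- 1 - true_cdf_sketch(c(0, t_grid), alpha = 1, beta = 0.5)
h <- 1 - surv[-1] / surv[-length(surv)]

# The two curves should agree when the hazards come from the true model.
cdf_gap <- max(abs(estimated_cdf_sketch(h) - true_cdf_sketch(t_grid, 1, 0.5)))
cdf_gap
```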
Plot true and estimated curves for P(Y=1|T=t)
Description
Plot true and estimated curves for P(Y=1|T=t)
Usage
plot_phi_curves(
theta_true,
theta.hat_joint,
theta.hat_midpoint,
theta.hat_uniform
)
Arguments
theta_true |
the coefficients of the data-generating model P(Y=1|T=t) |
theta.hat_joint |
the estimated coefficients from the joint model |
theta.hat_midpoint |
the estimated coefficients from midpoint imputation |
theta.hat_uniform |
the estimated coefficients from uniform imputation |
Value
a ggplot
Examples
## Not run:
theta_true <- c(0.986, -3.88)
hazard_alpha <- 1
hazard_beta <- 0.5

# simulate data:
sim_data <- simulate_interval_censoring(
  theta = theta_true,
  study_cohort_size = 4500,
  preconversion_interval_length = 365,
  hazard_alpha = hazard_alpha,
  hazard_beta = hazard_beta
)

# extract the participant-level and observation-level simulated data:
sim_participant_data <- sim_data$pt_data
sim_obs_data <- sim_data$obs_data
rm(sim_data)

# joint model:
EM_algorithm_outputs <- fit_joint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data,
  bin_width = 7,
  verbose = FALSE
)

# midpoint imputation:
theta_est_midpoint <- fit_midpoint_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)

# uniform imputation:
theta_est_uniform <- fit_uniform_model(
  obs_level_data = sim_obs_data,
  participant_level_data = sim_participant_data
)

# plot true and estimated curves:
plot2 <- plot_phi_curves(
  theta_true = theta_true,
  theta.hat_uniform = theta_est_uniform,
  theta.hat_midpoint = theta_est_midpoint,
  theta.hat_joint = EM_algorithm_outputs$Theta
)
print(plot2)
## End(Not run)
rwicc: Regression with Interval-Censored Covariates
Description
The rwicc package implements a regression model with an interval-censored covariate using an EM algorithm, as described in Morrison et al. (2021), doi: 10.1111/biom.13472.
rwicc functions
The main rwicc functions are:
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Inverse survival function for time-to-event variable with linear hazard function
Description
This function determines the seroconversion date corresponding to a provided probability of survival. See doi: 10.1111/biom.13472, Supporting Information, Section A.4.
Usage
seroconversion_inverse_survival_function(u, e, hazard_alpha, hazard_beta)
Arguments
u |
a vector of seroconversion survival probabilities |
e |
a vector of time differences between study start and enrollment (in years) |
hazard_alpha |
the instantaneous hazard of seroconversion on the study start date |
hazard_beta |
the change in hazard per year after study start date |
Value
numeric vector of time differences between study start and seroconversion (in years)
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
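For intuition, an inverse survival function under a linear hazard can be derived by solving a quadratic equation. The sketch below assumes h(t) = hazard_alpha + hazard_beta * t with t in years since study start, hazard_beta not equal to zero, and survival measured from the enrollment time e. It is a hypothetical illustration and is not claimed to reproduce the formula in the cited Supporting Information.

```r
# Survival from enrollment e to time t under h(t) = alpha + beta * t:
# S(t | e) = exp(-(alpha * (t - e) + beta * (t^2 - e^2) / 2)).
survival_sketch <- function(t, e, hazard_alpha, hazard_beta) {
  exp(-(hazard_alpha * (t - e) + hazard_beta * (t^2 - e^2) / 2))
}

# Setting S(t | e) = u and solving the quadratic in t for the root t >= e.
inverse_survival_sketch <- function(u, e, hazard_alpha, hazard_beta) {
  hazard_at_e <- hazard_alpha + hazard_beta * e
  (-hazard_alpha + sqrt(hazard_at_e^2 - 2 * hazard_beta * log(u))) / hazard_beta
}

u <- c(0.9, 0.5, 0.1)
e <- c(0, 0.25, 0.5)
s <- inverse_survival_sketch(u, e, hazard_alpha = 1, hazard_beta = 0.5)

# Round trip: plugging s back into the survival function should recover u.
all.equal(survival_sketch(s, e, 1, 0.5), u)
```

Feeding uniform random draws in as u turns this into an inverse-transform sampler for seroconversion times.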
Simulate a dataset with interval-censored seroconversion dates
Description
simulate_interval_censoring generates a simulated data set from a data-generating model based on the typical structure of a cohort study of HIV biomarker progression, as described in Morrison et al. (2021), doi: 10.1111/biom.13472.
Usage
simulate_interval_censoring(
study_cohort_size = 4500,
hazard_alpha = 1,
hazard_beta = 0.5,
preconversion_interval_length = 84,
theta = c(0.986, -3.88),
probability_of_ever_seroconverting = 0.05,
years_in_study = 10,
max_scheduling_offset = 7,
days_from_study_start_to_recruitment_end = 365,
study_start_date = lubridate::ymd("2001-01-01")
)
Arguments
study_cohort_size |
the number of participants to simulate (N_0 in the paper) |
hazard_alpha |
the hazard (instantaneous risk) of seroconversion at the start date of the cohort study for those participants at risk of seroconversion |
hazard_beta |
the change in hazard per calendar year |
preconversion_interval_length |
the number of days between tests for seroconversion |
theta |
the parameters of a logistic model (with linear functional form) specifying the probability of MAA-positive biomarkers as a function of time since seroconversion |
probability_of_ever_seroconverting |
the probability that each participant is at risk of HIV seroconversion |
years_in_study |
the duration of follow-up for each participant |
max_scheduling_offset |
the maximum divergence of pre-seroconversion followup visits from the prescribed schedule |
days_from_study_start_to_recruitment_end |
the length of the recruitment period |
study_start_date |
the date when the study starts recruitment ("d_0" in the main text). The value of this parameter does not affect the simulation results; it is only necessary as a reference point for generating E, L, R, O, and S. |
Value
A list containing the following two tibbles:
- pt_data: a tibble of participant-level information, with the following columns:
  - ID: participant ID
  - E: enrollment date
  - L: date of last HIV test prior to seroconversion
  - R: date of first HIV test after seroconversion
- obs_data: a tibble of longitudinal observations with the following columns:
  - ID: participant ID
  - O: dates of biomarker sample collection
  - Y: MAA classifications of biomarker samples
References
Morrison, Laeyendecker, and Brookmeyer (2021). "Regression with interval-censored covariates: Application to cross-sectional incidence estimation". Biometrics. doi: 10.1111/biom.13472.
Examples
study_data <- simulate_interval_censoring()
participant_characteristics <- study_data$pt_data
longitudinal_observations <- study_data$obs_data
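The following base R sketch illustrates how an interval [L, R] around a seroconversion date can arise from a periodic testing schedule. The dates are made up, and the way the scheduling offset is applied (a symmetric jitter of whole days) is an assumption; the package's simulation may differ in its details.

```r
set.seed(3)
preconversion_interval_length <- 84
max_scheduling_offset <- 7

E <- as.Date("2001-02-10")   # enrollment date (made up)
S <- as.Date("2002-01-20")   # true seroconversion date (made up)

# Scheduled test dates every 84 days after enrollment, each jittered by up
# to 7 days in either direction (an assumed form of the scheduling offset).
k <- 1:20
jitter_days <- sample(-max_scheduling_offset:max_scheduling_offset,
                      length(k), replace = TRUE)
visits <- c(E, E + k * preconversion_interval_length + jitter_days)

L <- max(visits[visits < S])    # last test before seroconversion
R <- min(visits[visits >= S])   # first test at or after seroconversion
data.frame(E = E, L = L, R = R)
```

Only E, L, and R would be visible to an analyst; S itself stays hidden inside the interval.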