Type: | Package |
Title: | Marginal Hazard Ratio Estimation in Clustered Failure Time Data |
Version: | 1.0.0 |
Description: | Estimation of marginal hazard ratios in clustered failure time data. It implements the weighted generalized estimating equation approach based on a semiparametric marginal proportional hazards model (see Niu, Y. and Peng, Y. (2015), "A new estimating equation approach for marginal hazard ratio estimation"), accounting for within-cluster correlations. Five different correlation structures are supported. The package is designed for researchers in biostatistics and epidemiology who require accurate and efficient estimation methods for survival analysis in clustered data settings. |
Depends: | R (≥ 4.4.0), Matrix |
Imports: | Rcpp, RcppEigen, survival, ggplot2, stats |
LinkingTo: | Rcpp, RcppEigen |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
LazyData: | true |
Maintainer: | Junyi Chen <2655088079@qq.com> |
License: | GPL-3 |
NeedsCompilation: | yes |
Packaged: | 2025-04-09 09:55:26 UTC; NTLDR |
Author: | Junyi Chen [aut, cre], Siqi Zhou [ctb], Shida Li [ctb], Yi Niu [aut] |
Repository: | CRAN |
Date/Publication: | 2025-04-10 14:20:02 UTC |
Diabetes Study Data
Description
A dataset containing clinical information from a diabetes study.
Usage
data(diabetes)
Format
A data frame with 166 rows and 6 variables:
risk
Numeric: Risk score of the patient.
cens
Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).
time
Numeric: Time to event or censoring (in months).
id
Integer: Patient ID.
trt
Binary (0/1): Treatment indicator (1 = treated, 0 = control).
age
Binary (0/1): Age group indicator (1 = older, 0 = younger).
Source
Hypothetical clinical study data.
Examples
data(diabetes)
summary(diabetes)
Generate Simulated Datasets for Cox Proportional Hazards Model
Description
This function generates multiple datasets for survival analysis based on a Cox proportional hazards model.
The baseline hazard function follows either a Weibull or an exponential distribution, depending on the values of lambda.
The function checks whether the largest observed time in each of the control and treatment groups is censored.
If that observation is not censored, it is forced to be censored to maintain the desired censoring rate.
Usage
gendat(
type = "bin",
dimension = 10,
K = 30,
n = 2,
lambda = c(1, 2),
b1 = c(log(2), -0.1),
theta = 8,
censrate = 0.3
)
Arguments
type |
Character. If |
dimension |
Integer. The number of datasets to be generated. |
K |
Integer. The number of clusters (groups) within each dataset. |
n |
Integer. The number of samples within each cluster. |
lambda |
Numeric vector. A two-element vector specifying the parameters for the baseline distribution:
|
b1 |
Vector. The regression coefficient for the covariates, affecting the hazard function. We suggest that the maximum of |
theta |
Numeric. A parameter controlling the dependency structure between survival times within clusters. Higher values indicate stronger within-cluster correlation. |
censrate |
Numeric. The target censoring rate for the dataset. |
Value
A list containing:
- data: A list of data frames, each containing a generated dataset.
- censoringrates: A numeric vector representing the censoring rate for each dataset.
- mean(censoringrates): The mean censoring rate across all datasets.
Examples
# Generate 1 dataset with a binary covariate, 6 clusters, and 10 samples per cluster
print(gendat(type = 'bin', dimension = 1, K = 6, n = 10, lambda = c(1, 2),
b1 = c(log(2),-log(2)), theta = 8, censrate = 0.5))
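The generation scheme described for gendat() can be illustrated with a small base R sketch. It is not the package's implementation. It assumes a Weibull baseline of the form Lambda0(t) = scale * t^shape, a Clayton copula for the within-cluster dependence controlled by theta, and independent exponential censoring with an arbitrary rate. All variable names are hypothetical.

```r
set.seed(1)

# Hypothetical settings; names only loosely mirror the gendat() arguments.
K <- 6; n <- 10            # clusters, subjects per cluster
shape <- 2; scale <- 1     # ASSUMED Weibull baseline: Lambda0(t) = scale * t^shape
beta <- log(2)             # log hazard ratio for a binary treatment
theta <- 8                 # ASSUMED Clayton dependence parameter

id  <- rep(seq_len(K), each = n)
trt <- rep(c(0, 1), length.out = K * n)   # alternate control / treatment

# Marshall-Olkin sampler for a Clayton copula: one gamma frailty per cluster
w <- rep(rgamma(K, shape = 1 / theta, rate = 1), each = n)
u <- (1 + rexp(K * n) / w)^(-1 / theta)   # uniform margins, dependent within cluster

# Invert the marginal survival function
# S(t | x) = exp(-scale * t^shape * exp(beta * x))
t_event <- (-log(u) / (scale * exp(beta * trt)))^(1 / shape)

# Independent exponential censoring (rate chosen arbitrarily for illustration)
t_cens <- rexp(K * n, rate = 0.5)
time <- pmin(t_event, t_cens)
cens <- as.integer(t_event <= t_cens)     # 1 = event occurred, 0 = censored

# Force the largest observed time in each treatment group to be censored
for (g in 0:1) {
  idx <- which(trt == g)
  cens[idx[which.max(time[idx])]] <- 0L
}

sim <- data.frame(id, trt, time, cens)
mean(sim$cens == 0)   # realized censoring rate of this toy dataset
```

Because the dependence enters only through the copula, each subject's marginal distribution still follows the proportional hazards form. This sketch makes no attempt to hit a target censoring rate.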
Kidney Disease Study Data
Description
A dataset containing survival analysis information related to kidney disease patients.
Usage
data(kidney_data)
Format
A data frame with 76 rows and 5 variables:
time
Numeric: Time to event or censoring (in days).
cens
Binary (0/1): Censoring indicator (1 = event occurred, 0 = censored).
age
Numeric: Age of the patient in years.
sex
Binary (0/1): Sex of the patient (1 = male, 0 = female).
type
Categorical (0,1,2,3): Kidney disease type classification.
Source
Hypothetical survival study data.
Examples
data(kidney_data)
summary(kidney_data)
Analysis for Cox Proportional Hazards Models
Description
This function fits Cox proportional hazards models to clustered data and can handle time-dependent covariates. It estimates coefficients, standard errors, and p-values for the specified formula and dataset.
Usage
marcox(
formula,
data,
method = "exchangeable",
sep = NULL,
col_id = "id",
div = NULL,
k_value = 1,
plot_x = NULL,
x_axis = "Time",
y_axis = "Survival Rates",
size = 0.5
)
Arguments
formula |
A model formula that uses the |
data |
A file path or the dataset (matrix) to be analyzed. If a file path is provided, the file is loaded into a matrix. The file should be in a tabular format (e.g., .csv, .txt). |
method |
The method employed to solve the correlation coefficient:
|
sep |
Character. The |
col_id |
Character. The name of the column that identifies the clusters. |
div |
Integer. The number of observation points per sample. If provided, the data will be divided accordingly. If the data has complex observational situations, please preprocess the data before using this function. |
k_value |
The value of k, used only with the k-dependent correlation structure. The default value is 1. |
plot_x |
A character string specifying the column name of the covariate for which survival curves are generated; if not provided, no survival curves will be produced. |
x_axis |
A character string specifying the title for the x-axis. |
y_axis |
A character string specifying the title for the y-axis. |
size |
The size of the generated survival curve. |
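To make the roles of method and k_value more concrete, the following base R sketch builds two working correlation matrices for a cluster of size m. It is an illustration only. The helper names are hypothetical, and the banded form used for the k-dependent structure (one parameter per lag up to k, zero beyond) is an assumption about a common definition, not something taken from the package.

```r
# Exchangeable: every pair of distinct cluster members shares one correlation rho
exchangeable_corr <- function(m, rho) {
  R <- matrix(rho, m, m)
  diag(R) <- 1
  R
}

# k-dependent (ASSUMED banded form): pairs at lag l share rho[l] for l <= k,
# and pairs further apart than k are treated as uncorrelated
k_dependent_corr <- function(m, rho) {
  k <- length(rho)
  lag <- abs(outer(seq_len(m), seq_len(m), "-"))
  R <- matrix(0, m, m)
  for (l in seq_len(k)) R[lag == l] <- rho[l]
  diag(R) <- 1
  R
}

R_exch <- exchangeable_corr(3, rho = 0.4)
R_kdep <- k_dependent_corr(4, rho = c(0.5))   # k = 1
R_exch
R_kdep
```

With k = 1 only adjacent observations within a cluster are correlated, whereas the exchangeable structure correlates all pairs equally.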
Details
The marcox() function is designed for survival data analysis using Cox proportional hazards models. It handles both clustered data and time-dependent covariates.
The survival outcome must be defined using the Surv() function in the model formula. Covariates can be included directly, and categorical variables can be converted with the factormar() function.
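For readers unfamiliar with the Surv() formula interface, here is a minimal sketch that uses only the survival package. It fits a standard Cox model with a cluster-robust (sandwich) variance, which corresponds to a working-independence approach and is not the weighted estimating equation method of marcox(). The data are the kidney dataset shipped with survival (columns id, time, status, age, sex), used here purely as a stand-in.

```r
library(survival)

# survival's own kidney catheter data, NOT the kidney_data object documented here
dat <- survival::kidney

# Surv(time, status) defines the outcome; cluster(id) requests a robust variance
# that accounts for repeated observations on the same patient
fit <- coxph(Surv(time, status) ~ age + sex + cluster(id), data = dat)

# Coefficient table with naive and robust standard errors
coef_table <- summary(fit)$coefficients
coef_table
```

Comparing such a fit with one that models the within-cluster correlation explicitly is one way to see what a non-independence working structure contributes.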
Value
A list containing the following components:
- coef: The estimated regression coefficients.
- exp(coef): The exponentiated coefficients (hazard ratios).
- se(coef): The standard errors of the estimated coefficients.
- z: The z-statistics for testing the significance of the coefficients.
- p: The p-values associated with the coefficients.
- (hidden).correlation: Correlation coefficients of the data.
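The relationship between these components can be sketched in base R, assuming the usual Wald construction (z = coef / se, two-sided normal p-value). The source does not spell out how z and p are computed, so this is an assumption, and the numbers below are made up for illustration.

```r
# Made-up estimates and standard errors, for illustration only
coef_hat <- c(sex = -0.9, age = 0.005)
se_hat   <- c(sex = 0.3,  age = 0.01)

hr <- exp(coef_hat)            # hazard ratios, i.e. exp(coef)
z  <- coef_hat / se_hat        # Wald z-statistics (ASSUMED form)
p  <- 2 * pnorm(-abs(z))       # two-sided p-values from the standard normal

# 95% Wald confidence limits for the hazard ratios
lower <- exp(coef_hat - qnorm(0.975) * se_hat)
upper <- exp(coef_hat + qnorm(0.975) * se_hat)

wald_table <- data.frame(
  coef = coef_hat, `exp(coef)` = hr, `se(coef)` = se_hat,
  z = z, p = p, lower95 = lower, upper95 = upper,
  check.names = FALSE
)
wald_table
```

A hazard ratio below 1 indicates a lower hazard for the group coded 1, and its confidence interval excludes 1 exactly when the corresponding p-value is below 0.05.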
Examples
formula <- Surv(time, cens) ~ sex + factormar('type', d_v=c(1,2,3))
r <- marcox(formula, data = kidney_data, div = 2, method = 'exchangeable', plot_x = 'sex')
print(r)
print(r$plot)