Version: | 0.13.0 |
Title: | Analysis of Simulation Studies Including Monte Carlo Error |
Description: | Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 https://www.stata-journal.com/article.html?article=st0200), further extending it with additional performance measures and functionality. |
License: | GPL (≥ 3) |
Depends: | R (≥ 2.10) |
Imports: | checkmate, generics, ggridges, ggplot2, knitr, lifecycle, rlang (≥ 0.4.0), scales, stats |
Suggests: | covr, devtools, dplyr, eha, rmarkdown, rstpm2, survival, testthat, usethis |
URL: | https://ellessenne.github.io/rsimsum/, https://github.com/ellessenne/rsimsum |
BugReports: | https://github.com/ellessenne/rsimsum/issues |
VignetteBuilder: | knitr |
RoxygenNote: | 7.3.1 |
LazyData: | true |
ByteCompile: | true |
Encoding: | UTF-8 |
Language: | en-GB |
NeedsCompilation: | no |
Packaged: | 2024-03-03 09:22:49 UTC; ellessenne |
Author: | Alessandro Gasparini |
Maintainer: | Alessandro Gasparini <alessandro@ellessenne.xyz> |
Repository: | CRAN |
Date/Publication: | 2024-03-03 09:40:02 UTC |
rsimsum: Analysis of Simulation Studies Including Monte Carlo Error
Description
Summarise results from simulation studies and compute Monte Carlo standard errors of commonly used summary statistics. This package is modelled on the 'simsum' user-written command in 'Stata' (White I.R., 2010 https://www.stata-journal.com/article.html?article=st0200), further extending it with additional performance measures and functionality.
Author(s)
Maintainer: Alessandro Gasparini alessandro@ellessenne.xyz (ORCID)
Authors:
Ian R. White
See Also
Useful links:
Report bugs at https://github.com/ellessenne/rsimsum/issues
Example of a simulation study on missing data
Description
A dataset from a simulation study comparing different ways to handle missing covariates when fitting a Cox model (White and Royston, 2009).
One thousand datasets were simulated, each containing normally distributed covariates x and z and a time-to-event outcome.
Both covariates have 20% of their values missing.
Each simulated dataset was analysed in three ways.
A Cox model was fit to the complete cases (CC).
Then two methods of multiple imputation using chained equations (van Buuren, Boshuizen, and Knook, 1999) were used.
The MI_LOGT method multiply imputes the missing values of x and z with the outcome included as \log (t) and d, where t is the survival time and d is the event indicator.
The MI_T method is the same except that \log (t) is replaced by t in the imputation model.
The results are stored in long format.
Usage
MIsim
MIsim2
Format
A data frame with 3,000 rows and 4 variables:
- dataset: Simulated dataset number.
- method: Method used (CC, MI_LOGT or MI_T).
- b: Point estimate.
- se: Standard error of the point estimate.
MIsim2 is an object of class tbl_df (inherits from tbl, data.frame) with 3000 rows and 5 columns.
Note
MIsim2 is a version of the same dataset with the method column split into two columns, m1 and m2.
References
White, I.R., and P. Royston. 2009. Imputing missing covariate values for the Cox model. Statistics in Medicine 28(15):1982-1998 doi:10.1002/sim.3618
Examples
data("MIsim", package = "rsimsum")
data("MIsim2", package = "rsimsum")
autoplot method for multisimsum objects
Description
autoplot can produce a series of plots to summarise the results of simulation studies. See vignette("C-plotting", package = "rsimsum") for further details.
Usage
## S3 method for class 'multisimsum'
autoplot(
object,
par,
type = "forest",
stats = "nsim",
target = NULL,
fitted = TRUE,
scales = "fixed",
top = TRUE,
density.legend = TRUE,
zoom = 1,
zip_ci_colours = "yellow",
...
)
Arguments
object | An object of class multisimsum. |
par | The parameter results to plot. |
type | The type of the plot to be produced. Defaults to "forest". |
stats | Summary statistic to plot, defaults to "nsim". |
target | Target of summary statistic, e.g. 0 for bias. Defaults to NULL. |
fitted | Superimpose a fitted regression line? Defaults to TRUE. |
scales | Should scales be fixed? Defaults to "fixed". |
top | Should the legend for a nested loop plot be on the top side of the plot? Defaults to TRUE. |
density.legend | Should the legend for density and hexbin plots be included? Defaults to TRUE. |
zoom | A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours | A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value "yellow") |
... | Not used. |
Value
A ggplot object.
Examples
data("frailty", package = "rsimsum")
ms <- multisimsum(
data = frailty,
par = "par", true = c(trt = -0.50, fv = 0.75),
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist", x = TRUE
)
library(ggplot2)
autoplot(ms, par = "trt")
autoplot(ms, par = "trt", type = "lolly", stats = "cover")
autoplot(ms, par = "trt", type = "zip")
autoplot(ms, par = "trt", type = "est_ba")
autoplot method for simsum objects
Description
autoplot can produce a series of plots to summarise the results of simulation studies. See vignette("C-plotting", package = "rsimsum") for further details.
Usage
## S3 method for class 'simsum'
autoplot(
object,
type = "forest",
stats = "nsim",
target = NULL,
fitted = TRUE,
scales = "fixed",
top = TRUE,
density.legend = TRUE,
zoom = 1,
zip_ci_colours = "yellow",
...
)
Arguments
object | An object of class simsum. |
type | The type of the plot to be produced. Defaults to "forest". |
stats | Summary statistic to plot, defaults to "nsim". |
target | Target of summary statistic, e.g. 0 for bias. Defaults to NULL. |
fitted | Superimpose a fitted regression line? Defaults to TRUE. |
scales | Should scales be fixed? Defaults to "fixed". |
top | Should the legend for a nested loop plot be on the top side of the plot? Defaults to TRUE. |
density.legend | Should the legend for density and hexbin plots be included? Defaults to TRUE. |
zoom | A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours | A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value "yellow") |
... | Not used. |
Value
A ggplot object.
Examples
data("MIsim", package = "rsimsum")
s <- rsimsum::simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method", x = TRUE
)
library(ggplot2)
autoplot(s)
autoplot(s, type = "lolly")
autoplot(s, type = "est_hex")
autoplot(s, type = "zip", zoom = 0.5)
# Nested loop plot:
data("nlp", package = "rsimsum")
s1 <- rsimsum::simsum(
data = nlp, estvarname = "b", true = 0, se = "se",
methodvar = "model", by = c("baseline", "ss", "esigma")
)
autoplot(s1, stats = "bias", type = "nlp")
autoplot method for summary.multisimsum objects
Description
autoplot method for summary.multisimsum objects
Usage
## S3 method for class 'summary.multisimsum'
autoplot(
object,
par,
type = "forest",
stats = "nsim",
target = NULL,
fitted = TRUE,
scales = "fixed",
top = TRUE,
density.legend = TRUE,
zoom = 1,
zip_ci_colours = "yellow",
...
)
Arguments
object | An object of class summary.multisimsum. |
par | The parameter results to plot. |
type | The type of the plot to be produced. Defaults to "forest". |
stats | Summary statistic to plot, defaults to "nsim". |
target | Target of summary statistic, e.g. 0 for bias. Defaults to NULL. |
fitted | Superimpose a fitted regression line? Defaults to TRUE. |
scales | Should scales be fixed? Defaults to "fixed". |
top | Should the legend for a nested loop plot be on the top side of the plot? Defaults to TRUE. |
density.legend | Should the legend for density and hexbin plots be included? Defaults to TRUE. |
zoom | A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours | A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value "yellow") |
... | Not used. |
Value
A ggplot object.
Examples
data("frailty", package = "rsimsum")
ms <- multisimsum(
data = frailty,
par = "par", true = c(trt = -0.50, fv = 0.75),
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist", x = TRUE
)
sms <- summary(ms)
library(ggplot2)
autoplot(sms, par = "trt")
autoplot method for summary.simsum objects
Description
autoplot method for summary.simsum objects
Usage
## S3 method for class 'summary.simsum'
autoplot(
object,
type = "forest",
stats = "nsim",
target = NULL,
fitted = TRUE,
scales = "fixed",
top = TRUE,
density.legend = TRUE,
zoom = 1,
zip_ci_colours = "yellow",
...
)
Arguments
object | An object of class summary.simsum. |
type | The type of the plot to be produced. Defaults to "forest". |
stats | Summary statistic to plot, defaults to "nsim". |
target | Target of summary statistic, e.g. 0 for bias. Defaults to NULL. |
fitted | Superimpose a fitted regression line? Defaults to TRUE. |
scales | Should scales be fixed? Defaults to "fixed". |
top | Should the legend for a nested loop plot be on the top side of the plot? Defaults to TRUE. |
density.legend | Should the legend for density and hexbin plots be included? Defaults to TRUE. |
zoom | A numeric value between 0 and 1 signalling that a zip plot should zoom on the top x% of the plot (to ease interpretation). Defaults to 1, where the whole zip plot is displayed. |
zip_ci_colours | A string with (1) a hex code to use for plotting coverage probability and its Monte Carlo confidence intervals (the default, with value "yellow") |
... | Not used. |
Value
A ggplot object.
Examples
data("MIsim", package = "rsimsum")
s <- rsimsum::simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method", x = TRUE
)
ss <- summary(s)
library(ggplot2)
autoplot(ss)
autoplot(ss, type = "lolly")
Identify replications with large point estimates, standard errors
Description
dropbig is useful to identify replications with large point estimates or standard errors. Large values are defined as standardised values whose absolute value exceeds a threshold set when calling dropbig. Regular standardisation using mean and standard deviation is implemented, as well as robust standardisation using median and inter-quartile range. In addition, the standardisation process is stratified by data-generating mechanism if by factors are defined.
Usage
dropbig(
data,
estvarname,
se = NULL,
methodvar = NULL,
by = NULL,
max = 10,
semax = 100,
robust = TRUE
)
Arguments
data | A data.frame. |
estvarname | The name of the variable containing the point estimates. |
se | The name of the variable containing the standard errors of the point estimates. |
methodvar | The name of the variable containing the methods to compare. For instance, methods could be the models compared within a simulation study. Can be NULL. |
by | A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.). Can be NULL. |
max | Specifies the maximum acceptable absolute value of the point estimates, after standardisation. Defaults to 10. |
semax | Specifies the maximum acceptable absolute value of the standard error, after standardisation. Defaults to 100. |
robust | Specifies whether to use robust standardisation (using median and inter-quartile range) rather than normal standardisation (using mean and standard deviation). Defaults to TRUE. |
Value
The same data.frame given as input with an additional column named .dropbig identifying rows that are classified as large (.dropbig = TRUE) according to the specified criterion.
Examples
data("frailty", package = "rsimsum")
frailty2 <- subset(frailty, par == "fv")
# Using low values of max, semax for illustration purposes:
dropbig(
data = frailty2, estvarname = "b", se = "se",
methodvar = "model", by = "fv_dist", max = 2, semax = 2
)
# Using regular standardisation:
dropbig(
data = frailty2, estvarname = "b", se = "se",
methodvar = "model", by = "fv_dist", max = 2, semax = 2, robust = FALSE
)
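The standardisation rule described above can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: the helper flag_big() and the toy data are hypothetical, and the robust version here divides by the raw inter-quartile range, which may differ from the scaling that dropbig uses internally.

```r
# Hypothetical helper: flag values whose standardised magnitude exceeds `max`.
flag_big <- function(x, max = 10, robust = TRUE) {
  if (robust) {
    z <- (x - stats::median(x)) / stats::IQR(x) # median / IQR standardisation
  } else {
    z <- (x - mean(x)) / stats::sd(x) # mean / SD standardisation
  }
  abs(z) > max
}

# Toy estimates from two data-generating mechanisms, with one wild value in "B".
toy <- data.frame(
  dgm = rep(c("A", "B"), each = 6),
  b = c(0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 1.9, 2.1, 2.0, 1.8, 2.2, 40)
)

# Standardise within each data-generating mechanism, as `by` would do.
# ave() returns 0/1 here, so the result is converted back to logical.
toy$.dropbig <- stats::ave(toy$b, toy$dgm, FUN = function(v) flag_big(v, max = 3)) == 1
toy
```

In this toy example the non-robust version (robust = FALSE) does not flag the value of 40 at max = 3, because the outlier itself inflates the standard deviation of its group.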
Example of a simulation study on frailty survival models
Description
A dataset from a simulation study comparing frailty flexible parametric models fitted using penalised likelihood to semiparametric frailty models. Both models are fitted assuming a Gamma and a log-Normal frailty. One thousand datasets were simulated, each containing a binary treatment variable with a log-hazard ratio of -0.50. Clustered survival data was simulated assuming 50 clusters of 50 individuals each, with a mixture Weibull baseline hazard function and a frailty following either a Gamma or a Log-Normal distribution. The comparison involves estimates of the log-treatment effect, and estimates of heterogeneity (i.e. the estimated frailty variance).
Usage
frailty
frailty2
Format
A data frame with 16,000 rows and 6 variables:
- i: Simulated dataset number.
- b: Point estimate.
- se: Standard error of the point estimate.
- par: The estimand. trt is the log-treatment effect, fv is the variance of the frailty.
- fv_dist: The true frailty distribution.
- model: Method used ("Cox, Gamma", "Cox, Log-Normal", "RP(P), Gamma", or "RP(P), Log-Normal").
frailty2 is an object of class data.frame with 16000 rows and 7 columns.
Note
frailty2 is a version of the same dataset with the model column split into two columns, m_baseline and m_frailty.
Examples
data("frailty", package = "rsimsum")
data("frailty2", package = "rsimsum")
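The relationship between the model column of frailty and the two columns of frailty2 can be illustrated with a short base R sketch. The toy data frame below is hypothetical and only reuses the method labels listed in the Format section; it is not how the packaged dataset was built, and the actual values stored in m_baseline and m_frailty may be coded differently.

```r
# Toy data reusing the four method labels from the Format section.
toy <- data.frame(
  model = c("Cox, Gamma", "Cox, Log-Normal", "RP(P), Gamma", "RP(P), Log-Normal"),
  stringsAsFactors = FALSE
)

# Split each label on ", " into its baseline-hazard and frailty components.
parts <- do.call(rbind, strsplit(toy$model, ", ", fixed = TRUE))
toy$m_baseline <- parts[, 1]
toy$m_frailty <- parts[, 2]
toy
```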
get_data
Description
Extract data slots from an object of class simsum, summary.simsum, multisimsum, or summary.multisimsum.
Usage
get_data(x, stats = NULL, ...)
Arguments
x | An object of class simsum, summary.simsum, multisimsum, or summary.multisimsum. |
stats | Summary statistics to include; can be a scalar value or a vector, e.g. c("bias", "cover"). Defaults to NULL. |
... | Ignored. |
Value
A data.frame containing summary statistics from a simulation study.
Examples
data(MIsim)
x <- simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method"
)
get_data(x)
# Extracting only bias and coverage:
get_data(x, stats = c("bias", "cover"))
xs <- summary(x)
get_data(xs)
is.multisimsum
Description
Reports whether x is a multisimsum object
Usage
is.multisimsum(x)
Arguments
x | An object to test. |
is.simsum
Description
Reports whether x is a simsum object
Usage
is.simsum(x)
Arguments
x | An object to test. |
is.summary.multisimsum
Description
Reports whether x is a summary.multisimsum object
Usage
is.summary.multisimsum(x)
Arguments
x | An object to test. |
is.summary.simsum
Description
Reports whether x is a summary.simsum object
Usage
is.summary.simsum(x)
Arguments
x | An object to test. |
Create 'kable's
Description
Create tables in LaTeX, HTML, Markdown, or reStructuredText from objects of class simsum, summary.simsum, multisimsum, or summary.multisimsum.
Usage
## S3 method for class 'simsum'
kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'summary.simsum'
kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'multisimsum'
kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'summary.multisimsum'
kable(x, stats = NULL, digits = max(3, getOption("digits") - 3), ...)
kable(x, ...)
Arguments
x | An object of class simsum, summary.simsum, multisimsum, or summary.multisimsum. |
stats | Summary statistics to include. Defaults to NULL. |
digits | Maximum number of digits for numeric columns. |
... | Further arguments passed to knitr::kable(). |
See Also
Analyses of simulation studies with multiple estimands at once, including Monte Carlo error
Description
multisimsum is an extension of simsum() that can handle multiple estimated parameters at once.
multisimsum calls simsum() internally, once for each estimand.
There is only one new argument that must be set when calling multisimsum: par, a string representing the column of data that identifies the different estimands.
Additionally, with multisimsum the argument true can be a named vector, where names correspond to each estimand (see examples).
Otherwise, constant values (or values identified by a column in data) will be utilised.
See vignette("E-custom-inputs", package = "rsimsum") for more details.
Usage
multisimsum(
data,
par,
estvarname,
se = NULL,
true = NULL,
methodvar = NULL,
ref = NULL,
by = NULL,
ci.limits = NULL,
df = NULL,
dropbig = FALSE,
x = FALSE,
control = list()
)
Arguments
data | A data.frame. |
par | The name of the variable in data that identifies the different estimands. |
estvarname | The name of the variable containing the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
se | The name of the variable containing the standard errors of the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
true | The true value of the parameter; this is used in calculations of bias, relative bias, coverage, and mean squared error and is required whenever these performance measures are requested. |
methodvar | The name of the variable containing the methods to compare. For instance, methods could be the models compared within a simulation study. Can be NULL. |
ref | Specifies the reference method against which relative precision will be calculated. Only useful if methodvar is specified. |
by | A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.). Can be NULL. |
ci.limits | Can be used to specify the limits (lower and upper) of confidence intervals used to calculate coverage and bias-eliminated coverage. Useful for non-Wald type estimators (e.g. bootstrap). Defaults to NULL. |
df | Can be used to specify a column containing the replication-specific number of degrees of freedom, which will be used to calculate confidence intervals for coverage (and bias-eliminated coverage) assuming t-distributed critical values (rather than normal theory intervals). Defaults to NULL. |
dropbig | Specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Defaults to FALSE. |
x | Set to TRUE to store the data used to calculate summary statistics in the returned object. Defaults to FALSE. |
control | A list of parameters that control the behaviour of multisimsum(). Defaults to an empty list. |
Details
The following names are not allowed for estvarname, se, methodvar, by, par: stat, est, mcse, lower, upper, :methodvar.
Value
An object of class multisimsum.
Examples
data("frailty", package = "rsimsum")
ms <- multisimsum(
data = frailty,
par = "par", true = c(trt = -0.50, fv = 0.75),
estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
ms
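The idea of summarising each estimand separately, each with its own true value, can be sketched in base R. This is an illustration under simplifying assumptions and not the internals of multisimsum: the toy data are simulated here, all object names are hypothetical, and only the bias and its Monte Carlo standard error are computed.

```r
set.seed(1)

# Toy results in long format: 200 replications for each of two estimands.
toy <- data.frame(
  par = rep(c("trt", "fv"), each = 200),
  b = c(rnorm(200, mean = -0.50, sd = 0.1), rnorm(200, mean = 0.75, sd = 0.2)),
  stringsAsFactors = FALSE
)

# Named vector of true values, one per estimand (as with the `true` argument).
true <- c(trt = -0.50, fv = 0.75)

# Summarise each estimand separately, looking up its true value by name.
res <- lapply(split(toy, toy$par), function(d) {
  p <- as.character(d$par[1])
  data.frame(
    par = p,
    bias = mean(d$b) - true[[p]],
    mcse = stats::sd(d$b) / sqrt(nrow(d))
  )
})
out <- do.call(rbind, res)
out
```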
Example of a simulation study on survival modelling
Description
A dataset from a simulation study with 150 data-generating mechanisms, useful to illustrate nested loop plots. This simulation study aims to compare the Cox model and flexible parametric models in a variety of scenarios: different baseline hazard functions, sample size, and varying amount of heterogeneity unaccounted for in the model (simulated as white noise with a given variance). A Cox model and a Royston-Parmar model with 5 degrees of freedom are fit to each replication.
Usage
nlp
Format
A data frame with 30,000 rows and 10 variables:
- dgm: Data-generating mechanism, 1 to 150.
- i: Simulated dataset number.
- model: Method used, with 1 the Cox model and 2 the RP(5) model.
- b: Point estimate for the log-hazard ratio.
- se: Standard error of the point estimate.
- baseline: Baseline hazard function of the simulated dataset.
- ss: Sample size of the simulated dataset.
- esigma: Standard deviation of the white noise.
- pars: (Ancillary) Parameters of the baseline hazard function.
Note
Further details on this simulation study can be found in the R script used to generate this dataset, available on GitHub: https://github.com/ellessenne/rsimsum/blob/master/data-raw/nlp-data.R
References
Cox D.R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological) 34(2):187-220. doi:10.1007/978-1-4612-4380-9_37
Royston, P. and Parmar, M.K. 2002. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(15):2175-2197 doi:10.1002/sim.1203
Rücker, G. and Schwarzer, G. 2014. Presenting simulation results in a nested loop plot. BMC Medical Research Methodology 14:129 doi:10.1186/1471-2288-14-129
Examples
data("nlp", package = "rsimsum")
Compute number of simulations required
Description
The function nsim computes the number of simulations B to perform based on the accuracy of an estimate of interest, using the following equation:
B = \left( \frac{(Z_{1 - \alpha / 2} + Z_{1 - \theta}) \sigma}{\delta} \right) ^ 2,
where \delta is the specified level of accuracy of the estimate of interest you are willing to accept (i.e. the permissible difference from the true value \beta), Z_{1 - \alpha / 2} is the (1 - \alpha / 2) quantile of the standard normal distribution, Z_{1 - \theta} is the (1 - \theta) quantile of the standard normal distribution with (1 - \theta) being the power to detect a specific difference from the true value as significant, and \sigma ^ 2 is the variance of the parameter of interest.
Usage
nsim(alpha, sigma, delta, power = 0.5)
Arguments
alpha | Significance level. Must be a value between 0 and 1. |
sigma | Standard deviation of the parameter of interest, i.e. the square root of the variance \sigma ^ 2 (the examples below pass the square root of the variance). Must be greater than 0. |
delta | Specified level of accuracy of the estimate of interest you are willing to accept. Must be greater than 0. |
power | Power to detect a specific difference from the true value as significant. Must be a value between 0 and 1. Defaults to 0.5, i.e. a power of 50%. |
Value
A scalar value B representing the number of simulations to perform based on the accuracy required.
References
Burton, A., Altman, D.G., Royston, P., et al. 2006. The design of simulation studies in medical statistics. Statistics in Medicine 25:4279-4292 doi:10.1002/sim.2673
Examples
# Number of simulations required to produce an estimate to within 5%
# accuracy of the true coefficient of 0.349 with a 5% significance level,
# assuming the variance of the estimate is 0.0166 and 50% power:
nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 5 / 100, power = 0.5)
# Number of simulations required to produce an estimate to within 1%
# accuracy of the true coefficient of 0.349 with a 5% significance level,
# assuming the variance of the estimate is 0.0166 and 50% power:
nsim(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 1 / 100, power = 0.5)
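To show how the equation in the Description maps onto R code, the sketch below re-implements it in base R. The function nsim_sketch() is a hypothetical name used only for illustration; it assumes that sigma is the standard deviation and that power equals (1 - \theta), and it is not guaranteed to match the package's own nsim in every detail (for instance in input checking).

```r
# Hypothetical re-implementation of the equation in the Description.
# `sigma` is the standard deviation, i.e. the square root of the variance.
nsim_sketch <- function(alpha, sigma, delta, power = 0.5) {
  z_alpha <- stats::qnorm(1 - alpha / 2) # Z_{1 - alpha / 2}
  z_power <- stats::qnorm(power)         # Z_{1 - theta}, with power = 1 - theta
  ((z_alpha + z_power) * sigma / delta)^2
}

# Same inputs as the first example above.
b <- nsim_sketch(alpha = 0.05, sigma = sqrt(0.0166), delta = 0.349 * 5 / 100)
b

# With 50% power the second quantile is zero, so only Z_{1 - alpha / 2} matters.
stats::qnorm(0.5)
```

In practice the result would be rounded up to a whole number of replications, for example with ceiling(b).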
print.multisimsum
Description
Print method for multisimsum objects
Usage
## S3 method for class 'multisimsum'
print(x, ...)
Arguments
x | An object of class multisimsum. |
... | Ignored. |
Examples
data(frailty)
ms <- multisimsum(
data = frailty, par = "par", true = c(
trt = -0.50,
fv = 0.75
), estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
ms
data("frailty", package = "rsimsum")
frailty$true <- ifelse(frailty$par == "trt", -0.50, 0.75)
ms <- multisimsum(data = frailty, par = "par", estvarname = "b", true = "true")
ms
print.simsum
Description
Print method for simsum objects
Usage
## S3 method for class 'simsum'
print(x, ...)
Arguments
x | An object of class simsum. |
... | Ignored. |
Examples
data("MIsim")
x <- simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method"
)
x
MIsim$true <- 0.5
x <- simsum(data = MIsim, estvarname = "b", true = "true", se = "se")
x
print.summary.multisimsum
Description
Print method for summary.multisimsum objects
Usage
## S3 method for class 'summary.multisimsum'
print(x, digits = 4, mcse = TRUE, ...)
Arguments
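x | An object of class summary.multisimsum. |
digits | Number of significant digits used for printing. Defaults to 4. |
mcse | Should Monte Carlo standard errors be reported? If FALSE, confidence intervals based on Monte Carlo standard errors are reported instead; if NULL, only the values are printed. Defaults to TRUE. |
... | Ignored. |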
Examples
data(frailty)
ms <- multisimsum(
data = frailty, par = "par", true = c(
trt = -0.50,
fv = 0.75
), estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
sms <- summary(ms, stats = c("bias", "cover", "mse"))
sms
# Printing fewer significant digits:
print(sms, digits = 3)
# Printing confidence intervals:
print(sms, digits = 3, mcse = FALSE)
# Printing values only:
print(sms, mcse = NULL)
print.summary.simsum
Description
Print method for summary.simsum objects
Usage
## S3 method for class 'summary.simsum'
print(x, digits = 4, mcse = TRUE, ...)
Arguments
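x | An object of class summary.simsum. |
digits | Number of significant digits used for printing. Defaults to 4. |
mcse | Should Monte Carlo standard errors be reported? If FALSE, confidence intervals based on Monte Carlo standard errors are reported instead; if NULL, only the values are printed. Defaults to TRUE. |
... | Ignored. |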
Examples
data("MIsim")
x <- simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method"
)
xs <- summary(x)
xs
# Printing fewer significant digits:
print(xs, digits = 2)
# Printing confidence intervals:
print(xs, mcse = FALSE)
# Printing values only:
print(xs, mcse = NULL)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
- generics
Example of a simulation study on survival modelling
Description
A dataset from a simulation study assessing the impact of misspecifying the baseline hazard in survival models on regression coefficients.
One thousand datasets were simulated, each containing a binary treatment variable with a log-hazard ratio of -0.50.
Survival data was simulated for two different sample sizes, 50 and 250 individuals, and under two different baseline hazard functions, exponential and Weibull.
Subsequently, a Cox model (Cox, 1972), a fully parametric exponential model, and a Royston-Parmar (Royston and Parmar, 2002) model with two degrees of freedom were fit to each simulated dataset.
See vignette("B-relhaz", package = "rsimsum") for more information.
Usage
relhaz
Format
A data frame with 1,200 rows and 6 variables:
- dataset: Simulated dataset number.
- n: Sample size of the simulated dataset.
- baseline: Baseline hazard function of the simulated dataset.
- model: Method used (Cox, Exp, or RP(2)).
- theta: Point estimate for the log-hazard ratio.
- se: Standard error of the point estimate.
References
Cox D.R. 1972. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological) 34(2):187-220. doi:10.1007/978-1-4612-4380-9_37
Royston, P. and Parmar, M.K. 2002. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21(15):2175-2197 doi:10.1002/sim.1203
Examples
data("relhaz", package = "rsimsum")
Analyses of simulation studies including Monte Carlo error
Description
simsum() computes performance measures for simulation studies in which each simulated data set yields point estimates by one or more analysis methods.
Bias, relative bias, empirical standard error and precision relative to a reference method can be computed for each method.
If, in addition, model-based standard errors are available then simsum() can compute the average model-based standard error, the relative error in the model-based standard error, the coverage of nominal confidence intervals, the coverage under the assumption that there is no bias (bias-eliminated coverage), and the power to reject a null hypothesis.
Monte Carlo errors are available for all estimated quantities.
Usage
simsum(
data,
estvarname,
se = NULL,
true = NULL,
methodvar = NULL,
ref = NULL,
by = NULL,
ci.limits = NULL,
df = NULL,
dropbig = FALSE,
x = FALSE,
control = list()
)
Arguments
data | A data.frame. |
estvarname | The name of the variable containing the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
se | The name of the variable containing the standard errors of the point estimates. Note that some column names are forbidden: these are listed below in the Details section. |
true | The true value of the parameter; this is used in calculations of bias, relative bias, coverage, and mean squared error and is required whenever these performance measures are requested. |
methodvar | The name of the variable containing the methods to compare. For instance, methods could be the models compared within a simulation study. Can be NULL. |
ref | Specifies the reference method against which relative precision will be calculated. Only useful if methodvar is specified. |
by | A vector of variable names to compute performance measures by a list of factors. Factors listed here are the (potentially several) data-generating mechanisms used to simulate data under different scenarios (e.g. sample size, true distribution of a variable, etc.). Can be NULL. |
ci.limits | Can be used to specify the limits (lower and upper) of confidence intervals used to calculate coverage and bias-eliminated coverage. Useful for non-Wald type estimators (e.g. bootstrap). Defaults to NULL. |
df | Can be used to specify a column containing the replication-specific number of degrees of freedom, which will be used to calculate confidence intervals for coverage (and bias-eliminated coverage) assuming t-distributed critical values (rather than normal theory intervals). Defaults to NULL. |
dropbig | Specifies that point estimates or standard errors beyond the maximum acceptable values should be dropped. Defaults to FALSE. |
x | Set to TRUE to store the data used to calculate summary statistics in the returned object. Defaults to FALSE. |
control | A list of parameters that control the behaviour of simsum(). Defaults to an empty list. |
Details
The following names are not allowed for any column in data that is passed to simsum(): stat, est, mcse, lower, upper, :methodvar, :true.
Value
An object of class simsum.
References
White, I.R. 2010. simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal 10(3): 369-385. https://www.stata-journal.com/article.html?article=st0200
Morris, T.P., White, I.R. and Crowther, M.J. 2019. Using simulation studies to evaluate statistical methods. Statistics in Medicine, doi:10.1002/sim.8086
Gasparini, A. 2018. rsimsum: Summarise results from Monte Carlo simulation studies. Journal of Open Source Software 3(26):739, doi:10.21105/joss.00739
Examples
data("MIsim", package = "rsimsum")
s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method", ref = "CC")
# If 'ref' is not specified, the reference method is inferred
s <- simsum(data = MIsim, estvarname = "b", true = 0.5, se = "se", methodvar = "method")
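To make the quantities in the Description concrete, the sketch below computes a few performance measures and their Monte Carlo standard errors by hand in base R, using the standard formulas reported in Morris, White and Crowther (2019). The toy data and object names are hypothetical, and this is not the package's own code.

```r
set.seed(42)

# Toy simulation results: 1,000 point estimates with model-based standard errors.
true <- 0.5
n_rep <- 1000
b <- rnorm(n_rep, mean = true, sd = 0.1)
se <- rep(0.1, n_rep)

# Bias and its Monte Carlo standard error.
bias <- mean(b) - true
bias_mcse <- stats::sd(b) / sqrt(n_rep)

# Empirical standard error and its Monte Carlo standard error.
empse <- stats::sd(b)
empse_mcse <- empse / sqrt(2 * (n_rep - 1))

# Coverage of nominal 95% Wald intervals and its Monte Carlo standard error.
crit <- stats::qnorm(0.975)
covered <- (b - crit * se <= true) & (true <= b + crit * se)
cover <- mean(covered)
cover_mcse <- sqrt(cover * (1 - cover) / n_rep)

round(c(
  bias = bias, bias_mcse = bias_mcse,
  empse = empse, empse_mcse = empse_mcse,
  cover = cover, cover_mcse = cover_mcse
), 4)
```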
Summarising multisimsum objects
Description
The summary() method for objects of class multisimsum returns confidence intervals for performance measures based on Monte Carlo standard errors.
Usage
## S3 method for class 'multisimsum'
summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
Arguments
object | An object of class multisimsum. |
ci_level | Confidence level for confidence intervals based on Monte Carlo standard errors. Defaults to 0.95. |
df | Degrees of freedom of a t distribution that will be used to calculate confidence intervals based on Monte Carlo standard errors. If NULL (the default), quantiles of a normal distribution are used instead. |
stats | Summary statistics to include; can be a scalar value or a vector (for multiple summary statistics at once), e.g. c("bias", "cover", "mse"). Defaults to NULL. |
... | Ignored. |
Value
An object of class summary.multisimsum.
See Also
multisimsum(), print.summary.multisimsum()
Examples
data(frailty)
ms <- multisimsum(
data = frailty, par = "par", true = c(
trt = -0.50,
fv = 0.75
), estvarname = "b", se = "se", methodvar = "model",
by = "fv_dist"
)
sms <- summary(ms)
sms
Summarising simsum objects
Description
The summary() method for objects of class simsum returns confidence intervals for performance measures based on Monte Carlo standard errors.
Usage
## S3 method for class 'simsum'
summary(object, ci_level = 0.95, df = NULL, stats = NULL, ...)
Arguments
object | An object of class simsum. |
ci_level | Confidence level for confidence intervals based on Monte Carlo standard errors. Defaults to 0.95. |
df | Degrees of freedom of a t distribution that will be used to calculate confidence intervals based on Monte Carlo standard errors. If NULL (the default), quantiles of a normal distribution are used instead. |
stats | Summary statistics to include; can be a scalar value or a vector (for multiple summary statistics at once). Defaults to NULL. |
... | Ignored. |
Value
An object of class summary.simsum.
See Also
simsum(), print.summary.simsum()
Examples
data("MIsim")
object <- simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method"
)
xs <- summary(object)
xs
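The confidence intervals returned by summary() are based on Monte Carlo standard errors. A minimal base R sketch of that kind of interval is shown below. Here mc_ci() is a hypothetical helper, the input numbers are made up, and the sketch assumes normal quantiles when df is NULL and t quantiles otherwise; the package's exact computation may differ.

```r
# Hypothetical helper: interval of the form est +/- critical value * mcse.
mc_ci <- function(est, mcse, ci_level = 0.95, df = NULL) {
  p <- 1 - (1 - ci_level) / 2
  # Normal quantiles unless degrees of freedom for a t distribution are given.
  crit <- if (is.null(df)) stats::qnorm(p) else stats::qt(p, df = df)
  c(lower = est - crit * mcse, upper = est + crit * mcse)
}

# Made-up bias estimate and Monte Carlo standard error.
ci_normal <- mc_ci(est = 0.012, mcse = 0.005)
ci_t <- mc_ci(est = 0.012, mcse = 0.005, df = 10)
ci_normal
ci_t
```

The interval based on the t distribution is wider than the normal one, as expected with few degrees of freedom.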
Turn an object into a tidy dataset
Description
Extract a tidy dataset with results from an object of class simsum, summary.simsum, multisimsum, or summary.multisimsum.
Usage
## S3 method for class 'simsum'
tidy(x, stats = NULL, ...)
## S3 method for class 'summary.simsum'
tidy(x, stats = NULL, ...)
## S3 method for class 'multisimsum'
tidy(x, stats = NULL, ...)
## S3 method for class 'summary.multisimsum'
tidy(x, stats = NULL, ...)
Arguments
x | An object of class simsum, summary.simsum, multisimsum, or summary.multisimsum. |
stats | Summary statistics to include; can be a scalar value or a vector, e.g. c("bias", "cover"). Defaults to NULL. |
... | Ignored. |
Value
A data.frame containing summary statistics from a simulation study.
Examples
data(MIsim)
x <- simsum(
data = MIsim, estvarname = "b", true = 0.5, se = "se",
methodvar = "method"
)
tidy(x)
# Extracting only bias and coverage:
tidy(x, stats = c("bias", "cover"))
xs <- summary(x)
tidy(xs)
Example of a simulation study on the t-test
Description
A dataset from a simulation study with 4 data-generating mechanisms, useful to illustrate custom input of confidence intervals to calculate coverage probability. This simulation study aims to compare the t-test assuming pooled or unpooled variance when the t-test assumptions are (or are not) violated: normality of data, and equality (or not) of variance between groups. The true value of the difference between groups is -1.
Usage
tt
Format
A data frame with 4,000 rows and 8 variables:
- diff: The difference in mean between groups estimated by the t-test;
- se: Standard error of the estimated difference;
- conf.low, conf.high: Confidence interval for the difference in mean as reported by the t-test;
- df: The number of degrees of freedom assumed by the t-test;
- repno: Identifies each replication, between 1 and 500;
- dgm: Identifies each data-generating mechanism: 1 corresponds to normal data with equal variance between the groups, 2 is normal data with unequal variance, 3 and 4 are skewed data (simulated from a Gamma distribution) with equal and unequal variance between groups, respectively;
- method: Analysis method: 1 represents the t-test with pooled variance, while 2 represents the t-test with unpooled variance.
Note
Further details on this simulation study can be found in the R script used to generate this dataset, available on GitHub: https://github.com/ellessenne/rsimsum/blob/master/data-raw/tt-data.R
Examples
data("tt", package = "rsimsum")
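The sketch below mimics, on a much smaller scale, how a dataset like tt could be generated and how the confidence limits reported by the t-test can be used directly to compute coverage. It is a base R illustration under stated assumptions (50 observations per group, normal data with unequal variances, 200 replications, and hypothetical object names), not the script referenced in the Note.

```r
set.seed(2024)

# One replication: two groups with a true mean difference of -1.
one_rep <- function(repno, n = 50) {
  g1 <- rnorm(n, mean = 0, sd = 1)
  g2 <- rnorm(n, mean = 1, sd = 3)
  out <- lapply(1:2, function(method) {
    # method 1: pooled variance; method 2: unpooled (Welch) variance.
    fit <- stats::t.test(g1, g2, var.equal = (method == 1))
    data.frame(
      diff = unname(fit$estimate[1] - fit$estimate[2]),
      se = unname(fit$stderr),
      conf.low = fit$conf.int[1],
      conf.high = fit$conf.int[2],
      df = unname(fit$parameter),
      repno = repno,
      method = method
    )
  })
  do.call(rbind, out)
}

sim <- do.call(rbind, lapply(seq_len(200), one_rep))

# Coverage of the true difference (-1) using the intervals reported by the t-test.
sim$covered <- sim$conf.low <= -1 & -1 <= sim$conf.high
coverage <- tapply(sim$covered, sim$method, mean)
coverage
```

With rsimsum itself, columns such as conf.low and conf.high would be supplied through the ci.limits argument of simsum() rather than summarised by hand.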