Type: | Package |
Title: | Simulation Tools for Small Area Estimation |
Version: | 0.11.0 |
URL: | https://wahani.github.io/saeSim/ |
BugReports: | https://github.com/wahani/saeSim/issues |
Depends: | R(≥ 3.1), methods |
Imports: | dplyr (≥ 0.2), functional, ggplot2, grDevices, MASS, utils, spdep, stats, parallelMap, tibble |
Suggests: | testthat, knitr, rmarkdown, covr |
Description: | Tools for the simulation of data in the context of small area estimation. Combine all steps of your simulation - from data generation over drawing samples to model fitting - in one object. This enables easy modification and combination of different scenarios. You can store your results in a folder or start the simulation in parallel. |
License: | MIT + file LICENSE |
VignetteBuilder: | knitr |
RoxygenNote: | 7.1.1 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2022-02-07 16:19:44 UTC; lswarnholz |
Author: | Sebastian Warnholz [aut, cre], Timo Schmid [aut] |
Maintainer: | Sebastian Warnholz <wahani@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2022-02-07 16:40:02 UTC |
Piping operator
Description
This is the 'pipe operator' from the package 'magrittr'. Use it to chain all operations for the simulation together. See the original documentation for details: %>%.
Usage
lhs %>% rhs
Arguments
lhs |
The value to be piped |
rhs |
A function or expression |
Aggregation function
Description
This function is intended to be used with sim_agg
and not interactively. This is one implementation for aggregating data in a simulation set-up.
Usage
agg_all(groupVars = "idD")
Arguments
groupVars |
variable names as character identifying groups to be aggregated. |
Details
This function follows the split-apply-combine idiom. Each data set is split by the defined variables. Then the variables within each subset are aggregated (reduced to one row). Logical variables are reduced by any
; for characters and factors dummy variables are created and the aggregate is the mean of each dummy; and for numerics the mean (removing NAs).
See Also
Examples
sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% sim_agg(agg_all())
as.data.frame method for sim_setup
Description
Use this method to get a single simulated data.frame out of a sim_setup object.
Usage
## S3 method for class 'sim_setup'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
Arguments
x |
a sim_setup |
row.names |
will have no effect |
optional |
will have no effect |
... |
will have no effect |
Autoplot method
Description
Use this function to produce plots for an object of class sim_setup
and you like to have plots based on ggplot2. At this time it is a ggplot2 implementation which mimics the behavior of smoothScatter
without all the options.
Usage
## S3 method for class 'sim_setup'
autoplot(object, x = "x", y = "y", ...)
Arguments
object |
a sim_setup |
x |
character of variable name in the data on the x-axis |
y |
character of variable name in the data on the y-axis |
... |
is not used |
Examples
## Not run:
autoplot(sim_base_lm())
## End(Not run)
Add id-variables to data
Description
Use this function to add id-variables to your data.
Usage
base_add_id(data, domainId)
Arguments
data |
a data.frame. |
domainId |
variable names in |
Construct data with id-variables
Description
This function constructs a data frame with grouping/id
variables.
Usage
base_id(nDomains = 10, nUnits = 10)
base_id_temporal(nDomains = 10, nUnits = 10, nTime = 10)
Arguments
nDomains |
The number of domains. |
nUnits |
The number of units in each domain. Can have |
nTime |
The number of time points for each units. |
Value
Return a data.frame
with variables idD
as ID-variable for
domains, and idU
as ID-variable for units.
Examples
base_id(2, 2)
base_id(2, c(2, 3))
Compute variables in data
Description
This function is intended to be used with sim_comp_pop
, sim_comp_sample
or sim_comp_agg
and not interactively. This is a wrapper around mutate
Usage
comp_var(...)
Arguments
... |
variables interpreted in the context of that data frame. |
See Also
sim_comp_pop
, sim_comp_sample
, sim_comp_agg
Examples
sim_base_lm() %>% sim_comp_pop(comp_var(yExp = exp(y)))
Generator functions
Description
These functions are intended to be used with sim_gen
and not
interactively. They are designed to draw random numbers according to the
setting of grouping variables.
Usage
gen_norm(mean = 0, sd = 1, name = "e")
gen_v_norm(mean = 0, sd = 1, name = "v")
gen_v_sar(mean = 0, sd = 1, rho = 0.5, type = "rook", name)
gen_v_ar1(mean = 0, sd = 1, rho = 0.5, groupVar = "idD", timeVar = "idT", name)
gen_generic(generator, ..., groupVars = NULL, name)
Arguments
mean |
the mean passed to the random number generator, for example
|
sd |
the standard deviation passed to the random number generator, for example rnorm. |
name |
name of variable as character in which random numbers are stored. |
rho |
the correlation used to create the variance covariance matrix for
a SAR process - see |
type |
either "rook" or "queen". See |
groupVar |
a variable name identifying groups. |
timeVar |
a variable name identifying repeated measurements. |
generator |
a function producing random numbers. |
... |
arguments passed to |
groupVars |
names of variables as character. Identify groups within random numbers are constant. |
Details
gen_norm
is used to draw random numbers from a normal
distribution where all generated numbers are independent.
gen_v_norm
and gen_v_sar
will create an area-level random
component. In the case of v_norm
, the error component will be from a
normal distribution and i.i.d. from an area-level perspective (all units in
an area will have the same value, all areas are independent). v_sar will also
be from a normal distribution, but the errors are correlated. The variance
covariance matrix is constructed for a SAR(1) - spatial/simultanous
autoregressive process. mvrnorm is used for the random number
generation. gen_v_norm
and gen_v_sar
expect a variable
idD
in the data identifying the areas.
gen_generic
can be used if your world is not normal. You can specify
'any' function as generator, like rnorm
. Arguments in
...
are matched by name or position. The first argument of
generator
is expected to be the number of random numbers (not
necessarily named n
) and need not to be specified.
See Also
sim_gen
, sim_gen_x
,
sim_gen_e
, sim_gen_ec
, sim_gen_v
,
sim_gen_vc
, cell2nb
Examples
sim_base() %>% sim_gen_x() %>% sim_gen_e() %>% sim_gen_v() %>% sim_gen(gen_v_sar(name = "vSP"))
# Generic interface
set.seed(1)
dat1 <- sim(base_id() %>%
sim_gen(gen_generic(rnorm, mean = 0, sd = 4, name = "e")))
set.seed(1)
dat2 <- sim(base_id() %>% sim_gen_e())
all.equal(dat1, dat2)
Plotting methods
Description
Use this function to produce plots for an object of class sim_setup
.
Usage
## S3 method for class 'sim_setup'
plot(x, y, ...)
Arguments
x |
a |
y |
will be ignored |
... |
Arguments to be passed to |
See Also
Sampling functions
Description
These functions are intended to be used with sim_sample
and not
interactively. They are wrappers around sample_frac and
sample_n.
Usage
sample_fraction(size, replace = FALSE, weight = NULL, groupVars = NULL)
sample_number(size, replace = FALSE, weight = NULL, groupVars = NULL)
sample_numbers(size, replace = FALSE, groupVars = NULL)
sample_cluster_number(size, replace = FALSE, weight = NULL, groupVars)
sample_cluster_fraction(size, replace = FALSE, weight = NULL, groupVars)
Arguments
size |
< |
replace |
Sample with or without replacement? |
weight |
< |
groupVars |
character with names of variables to be used for grouping. |
Details
sample_numbers
is a vectorized version of sample_number
.
sample_cluster_number
and sample_cluster_fraction
will sample
clusters (all units in a cluster).
Examples
sim_base_lm() %>% sim_sample(sample_number(5))
sim_base_lm() %>% sim_sample(sample_fraction(0.5))
sim_base_lm() %>% sim_sample(sample_cluster_number(5, groupVars = "idD"))
sim_base_lm() %>% sim_sample(sample_cluster_fraction(0.5, groupVars = "idD"))
Show for sim_setup
Description
This is the documentation for the show methods in the package saeSim
. In case you don't know, show
is for S4-classes like print
for S3. If you don't know what that means, don't bother, there is no reason to call show
directly, however there is the need to document it.
Usage
## S4 method for signature 'sim_setup'
show(object)
## S4 method for signature 'summary.sim_setup'
show(object)
Arguments
object |
Any R object |
Details
Will print the head of a sim_setup
to the console, after converting it to a data.frame
.
Start simulation
Description
This function will start the simulation. Use the printing method as long as you are testing the scenario.
Usage
sim(
x,
R = 1,
path = NULL,
overwrite = TRUE,
...,
suffix = NULL,
fileExt = ".csv",
libs = NULL,
exports = NULL
)
Arguments
x |
a |
R |
number of repetitions. |
path |
optional path in which the simulation results can be saved. They
will we coerced to a |
overwrite |
|
... |
arguments passed to |
suffix |
an optional suffix of file names. |
fileExt |
the file extension. Default is ".csv" - alternative it can be ".RData". |
libs |
arguments passed to |
exports |
arguments passed to |
Details
The package parallelMap is utilized as back-end for parallel computations.
Use the argument path
to store the simulation results in a directory.
This may be a good idea for long running simulations and for those using
large data.frame
s. You can use sim_read_data
to read
them in. The return value will change to NULL in each run.
Value
The return value is a list. The elements are the results of each
simulation run, typically of class data.frame
. In case you specified
path
, each element is NULL
.
Examples
setup <- sim_base_lm()
resultList <- sim(setup, R = 1)
# For parallel computations you may need to export objects
localFun <- function() cat("Hello World!")
comp_fun <- function(dat) {
localFun()
dat
}
res <- sim_base_lm() %>%
sim_comp_pop(comp_fun) %>%
sim(R = 2,
mode = "socket", cpus = 2,
exports = "localFun")
str(res)
Aggregation component
Description
One of the components which can be added to a simulation set-up. Aggregating the data is a simulation component which can be used to aggregate the population or sample. The aggregation will simply be done after the sampling, if you haven't specified any sampling component, the population is aggregated (makes sense if you draw samples directly from the model).
Usage
sim_agg(simSetup, aggFun = agg_all())
Arguments
simSetup |
a |
aggFun |
function which controls the aggregation process. At the moment only |
Details
Potentially you can define an aggFun
yourself. Take care that it only has one argument, named dat
, and returns the aggregated data as data.frame
.
See Also
agg_all
, sim_gen
, sim_comp_pop
, sim_sample
, , sim_comp_sample
Examples
# Aggregating the population:
sim_base_lm() %>% sim_agg()
# Aggregating after sampling:
sim_base_lm() %>% sim_sample() %>% sim_agg()
# User aggFun:
sim_base_lm() %>% sim_agg(function(dat) dat[1, ])
Base component
Description
Use the 'sim_base' functions to start a new sim_setup
.
Usage
sim_base(data = base_id(100, 100))
Arguments
data |
a |
Examples
# Example for a linear model:
sim_base() %>% sim_gen_x() %>% sim_gen_e()
Preconfigured set-ups
Description
sim_base_lm()
will start a linear model: One regressor, one error component. sim_base_lmm()
will start a linear mixed model: One regressor, one error component and one random effect for the domain. sim_base_lmc()
and sim_base_lmmc()
add outlier contamination to the scenarios. Use these as a quick start, then you probably want to configure your own scenario.
Usage
sim_base_lm()
sim_base_lmm()
sim_base_lmc()
sim_base_lmmc()
Details
Additional information on the generated variables:
nDomains: 100 domains
nUnits: 100 in each domain
x: is normally distributed with mean of 0 and sd of 4
e: is normally distributed with mean of 0 and sd of 4
v: is normally distributed with mean of 0 and sd of 1, it is a constant within domains
e-cont: as e; probability of unit to be contaminated is 0.05; sd is then 150
v-cont: as v; probability of area to be contaminated is 0.05; sd is then 40
y = 100 + x + v + e
Examples
# The preconfigured set-ups:
sim_base_lm()
sim_base_lmm()
sim_base_lmc()
sim_base_lmmc()
Preconfigured computation components
Description
sim_comp_n
and sim_comp_N
will add the sample and population size in each domain respectively. sim_comp_popMean
and sim_comp_popVar
the population mean and variance of the variable y
. The data is expected to have a variable idD
identifying domains.
Usage
sim_comp_n(simSetup)
sim_comp_N(simSetup)
sim_comp_popMean(simSetup)
sim_comp_popVar(simSetup)
Arguments
simSetup |
a |
Calculation component
Description
One of the components which can be added to a sim_setup
. These functions can be used for adding new variables to the data.
Usage
sim_comp_pop(simSetup, fun = comp_var(), by = "")
sim_comp_sample(simSetup, fun = comp_var(), by = "")
sim_comp_agg(simSetup, fun = comp_var(), by = "")
Arguments
simSetup |
a |
fun |
a function, see details. |
by |
names of variables as character; identifying groups for which fun is applied. |
Details
Potentially you can define a function for computation yourself. Take care that it only has one argument, named dat
, and returns a data.frame
. Use comp_var
for simple data manipulation. Functions added with sim_comp_pop
are applied before sampling; sim_comp_sample
after sampling. Functions added with sim_comp_agg
after aggregation.
See Also
comp_var
, sim_gen
, sim_agg
, sim_sample
, sim_comp_N
, sim_comp_n
, sim_comp_popMean
, sim_comp_popVar
Examples
# Standard behavior
sim_base() %>% sim_gen_x() %>% sim_comp_N()
# Custom data modifications
## Add predicted values of a linear model
library(saeSim)
comp_lm <- function(dat) {
dat$linearPredictor <- predict(lm(y ~ x, data = dat))
dat
}
sim_base_lm() %>% sim_comp_pop(comp_lm)
# or if applied after sampling
sim_base_lm() %>% sim_sample() %>% sim_comp_pop(comp_lm)
Generation component
Description
One of the components which can be added to a sim_setup
.
Usage
sim_gen(simSetup, generator)
sim_gen_generic(simSetup, ...)
Arguments
simSetup |
a |
generator |
generator function used to generate random numbers. |
... |
arguments passed to |
Details
Potentially you can define a generator
yourself. Take care that it has one argument, named dat
, and returns a data.frame
. sim_gen_generic
is a shortcut to gen_generic
.
See Also
gen_norm
, gen_v_norm
, gen_v_sar
, sim_agg
, , sim_comp_pop
, sim_sample
, sim_gen_x
, sim_gen_e
, sim_gen_v
, sim_gen_vc
, sim_gen_ec
Examples
# Data setup for a mixed model
sim_base() %>% sim_gen_x() %>% sim_gen_v() %>% sim_gen_e()
# Adding contamination in the model error
sim_base() %>% sim_gen_x() %>% sim_gen_v() %>% sim_gen_e() %>% sim_gen_ec()
# Simple user defined generator:
gen_myVar <- function(dat) {
dat["myVar"] <- rnorm(nrow(dat))
dat
}
sim_base() %>% sim_gen_x() %>% sim_gen(gen_myVar)
# And a chi-sq(5) distributed 'random-effect':
sim_base() %>% sim_gen_generic(rchisq, df = 5, groupVars = "idD", name = "re")
Generation Component for contamination
Description
One of the components which can be added to a sim_setup
. It is applied after functions added with sim_gen
.
Usage
sim_gen_cont(simSetup, generator, nCont, type, areaVar = NULL, fixed = TRUE)
Arguments
simSetup |
a |
generator |
generator function used to generate random numbers. |
nCont |
gives the number of contaminated observations. Values between 0
and 1 will be treated as probability. If type is 'unit' and length is
larger than 1, the expected length is the number of areas. If type is
'area' and length is larger than 1 the values are interpreted as area
positions; i.e. |
type |
"unit" or "area" - unit- or area-level contamination. |
areaVar |
character with variable name(s) identifying areas. |
fixed |
TRUE fixes the observations which will be contaminated. FALSE will result in a random selection of observations or areas. |
See Also
Examples
sim_base_lm() %>%
sim_gen_cont(gen_norm(name = "e"), nCont = 0.05, type = "unit", areaVar = "idD") %>%
as.data.frame
Preconfigured generation components
Description
These are some preconfigured generation components and all wrappers around sim_gen
and sim_gen_cont
.
Usage
sim_gen_x(simSetup, mean = 0, sd = 4, name = "x")
sim_gen_e(simSetup, mean = 0, sd = 4, name = "e")
sim_gen_ec(
simSetup,
mean = 0,
sd = 150,
name = "e",
nCont = 0.05,
type = "unit",
areaVar = "idD",
fixed = TRUE
)
sim_gen_v(simSetup, mean = 0, sd = 1, name = "v")
sim_gen_vc(
simSetup,
mean = 0,
sd = 40,
name = "v",
nCont = 0.05,
type = "area",
areaVar = "idD",
fixed = TRUE
)
Arguments
simSetup |
a |
mean |
the mean passed to the random number generator, for example
|
sd |
the standard deviation passed to the random number generator, for example rnorm. |
name |
name of variable as character in which random numbers are stored. |
nCont |
gives the number of contaminated observations. Values between 0
and 1 will be treated as probability. If type is 'unit' and length is
larger than 1, the expected length is the number of areas. If type is
'area' and length is larger than 1 the values are interpreted as area
positions; i.e. |
type |
"unit" or "area" - unit- or area-level contamination. |
areaVar |
character with variable name(s) identifying areas. |
fixed |
TRUE fixes the observations which will be contaminated. FALSE will result in a random selection of observations or areas. |
Details
x
: fixed-effect component; e
: model-error; ec
: contaminated model error; v
: random-effect (error constant for each domain); vc
contaminated random-effect. Note that for contamination you are expected to add both, a non-contaminated component and a contaminated component.
Read in simulated data
Description
Functions to read in simulation data from folder. Can be csv or RData files.
Usage
sim_read_data(path, ..., returnList = FALSE)
sim_clear_data(path, ...)
sim_read_list(path)
sim_clear_list(path)
Arguments
path |
path to the files you want to read in. |
... |
arguments passed to |
returnList |
if |
Response component
Description
One of the components which can be added to a sim_setup
.
Usage
sim_resp(simSetup, respFun)
sim_resp_eq(simSetup, ...)
Arguments
simSetup |
a |
respFun |
a function constructing the response variable |
... |
< The value can be:
|
Details
Potentially you can define an respFun
yourself. Take care that it only has one argument, named dat
, and returns the a data.frame
.
See Also
agg_all
, sim_gen
, sim_comp_pop
, sim_sample
, , sim_comp_sample
Examples
base_id() %>% sim_gen_x() %>% sim_gen_e() %>% sim_resp_eq(y = 100 + 2 * x + e)
Sampling component
Description
One of the components which can be added to a sim_setup
. This component can be used to add a sampling mechanism to the simulation set-up. A sample will be drawn after the population is generated (sim_gen
) and variables on the population are computed (sim_comp_pop
).
Usage
sim_sample(simSetup, smplFun = sample_number(size = 5L, groupVars = "idD"))
Arguments
simSetup |
a |
smplFun |
function which controls the sampling process. |
Details
Potentially you can define a smplFun
yourself. Take care that it has one argument, named dat
being the data as data.frame, and returns the sample as data.frame.
See Also
sample_number
, sample_fraction
Examples
# Simple random sample - 5% sample:
sim_base_lm() %>% sim_sample(sample_fraction(0.05))
# Simple random sampling proportional to size - 5% in each domain:
sim_base_lm() %>% sim_sample(sample_fraction(0.05, groupVars = "idD"))
# User defined sampling function:
sample_mySampleFun <- function(dat) {
dat[sample.int(nrow(dat), 10), ]
}
sim_base_lm() %>% sim_sample(sample_mySampleFun)
Add a name to a sim_setup
Description
Use this function to add a name to a sim_setup
in case you are simulating different scenarios. This name will be added if you use the function sim for simulation
Usage
sim_simName(simSetup, name)
Arguments
simSetup |
a |
name |
a character |
Examples
sim_base_lm() %>% sim_simName("newName")
Summary for a sim_setup
Description
Reports a summary of the simulation setup.
Usage
## S4 method for signature 'sim_setup'
summary(object, ...)
Arguments
object |
a |
... |
has no effect. |
Examples
summary(sim_base_lm())