Title: Simulate Data for Discrete Choice Experiments
Version: 0.3.0
Description: Supports simulating choice experiment data for given designs. It helps to quickly test different designs against each other and compare the performance of new models. The goal of 'simulateDCE' is to make it easy to simulate choice experiment datasets using designs from 'NGENE', 'idefix' or 'spdesign'. You have to store the design file(s) in a sub-directory and need to specify certain parameters and the utility functions for the data generating process. For more details on choice experiments see Mariel et al. (2021) <doi:10.1007/978-3-030-62669-3>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: dplyr (≥ 1.1.4), evd, formula.tools, ggplot2, kableExtra, magrittr, mixl, psych, purrr, readr, rmarkdown, stats, utils, stringr, tibble, tictoc, tidyr, future, furrr, qs, data.table
Suggests: knitr, testthat (≥ 3.0.0), rlang
Config/testthat/edition: 3
Depends: R (≥ 4.1.0)
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-07-05 12:59:50 UTC; dj44vuri
Author: Julian Sagebiel ORCID iD [aut, cre]
Maintainer: Julian Sagebiel <julian.sagebiel@idiv.de>
Repository: CRAN
Date/Publication: 2025-07-09 10:30:02 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling rhs(lhs).


Aggregate Simulation Results

Description

Processes the simulation results to extract summaries, coefficients, and graphs.

Usage

aggregateResults(all_designs, fromfolder = NULL)

Arguments

all_designs

A list of simulation results from sim_choice. Can contain different designs but need to have the common structure returned by simchoice

fromfolder

A folder from where to read simulations. If provided, the function will read all .qs files from the folder and process them. The files are usually saved by your earlier work and should be qs files as they are more efficient that rds files.

Value

A list with aggregated results including summary, coefficients, graphs, and power.


Create a Dataset for Choice Experiment Analysis

Description

This function takes a design matrix and generates a dataset for use in choice experiments. It handles blocks, replicates the design for the number of respondents, and assigns respondent IDs.

Usage

createDataset(design, respondents)

Arguments

design

A data frame containing the design matrix for the choice experiment. It should include at least the columns Choice.situation and optionally Block.

respondents

The number of respondents to generate data for.

Details

The function performs the following steps:

Value

A data frame containing the augmented design matrix with additional columns:

ID

A unique identifier for each respondent.

Choice.situation

The original choice situations, replicated for respondents.

Other columns

All original columns in the input design are retained.

Examples

# Example usage:
design <- data.frame(
  Choice.situation = rep(1:12),
  Attribute1 = rnorm(12),
  Attribute2 = sample(1:3, 12, replace = TRUE)
)
result <- createDataset(design, 10)

Title Extracts beta values from an spdesign object

Description

Title Extracts beta values from an spdesign object

Usage

extract_b_values(input_list)

Arguments

input_list

the list where the parameters are stored. Usually this is design$utility

Value

A named list with parameter values which can be used in sim_all

Examples

d <- system.file("extdata", "CSA", "linear", "BLIeff.RDS", package = "simulateDCE")
extract_b_values(readRDS(d)$utility)


Modify bcoeff Names

Description

This function modifies the names of beta coefficients (bcoeff) by removing dots and underscores to ensure consistent naming. If the names are already modified, the function skips processing.

Usage

modify_bcoeff_names(bcoeff)

Arguments

bcoeff

A named list of beta coefficients. The names of this list are the coefficients used in the utility function.

Value

A list containing:

bcoeff

The modified bcoeff list with updated names.

bcoeff_lookup

A tibble mapping the original names to the modified names.


Creates a dataframe with the design.

Description

Creates a dataframe with the design.

Usage

readdesign(design = designfile, designtype = NULL, destype = NULL)

Arguments

design

The path to a design file

designtype

Is it a design created with ngene, spdesign or idefix. use 'ngene', 'spdesign' or 'idefix. Ngene designs should be stored as the standard .ngd output. spdesign should be the spdesign object stored as an RDS file. Idefix objects should also be stored as an RDS file. If designtype is not specified, I try to guess what it is. This is especially helpful if you want to carry out a simulation for both spdesign designs and ngene designs at the same time.

destype

Deprecated. Use designtype instead.

Value

a dataframe

Examples

library(simulateDCE)
mydesign <- readdesign(
  system.file("extdata", "agora", "altscf_eff.ngd", package = "simulateDCE"),
  "ngene"
)

print(mydesign)


Is a wrapper for sim_choice executing the simulation over all designs stored in a specific folder update

Description

Is a wrapper for sim_choice executing the simulation over all designs stored in a specific folder update

Usage

sim_all(
  nosim = 2,
  resps,
  designtype = NULL,
  destype = NULL,
  designpath,
  u,
  bcoeff,
  decisiongroups = c(0, 1),
  manipulations = list(),
  estimate = TRUE,
  chunks = 1,
  utility_transform_type = "simple",
  reshape_type = "auto",
  mode = c("parallel", "sequential"),
  preprocess_function = NULL,
  savefile = NULL
)

Arguments

nosim

Number of runs or simulations. For testing use 2 but once you go serious, use at least 200, for better results use 2000.

resps

Number of respondents you want to simulate

designtype

Is it a design created with ngene, spdesign or idefix. use 'ngene', 'spdesign' or 'idefix. Ngene designs should be stored as the standard .ngd output. spdesign should be the spdesign object stored as an RDS file. Idefix objects should also be stored as an RDS file. If designtype is not specified, I try to guess what it is. This is especially helpful if you want to carry out a simulation for both spdesign designs and ngene designs at the same time.

destype

Deprecated. Use designtype instead.

designpath

The path to the folder where the designs are stored. For example "c:/myfancydec/Designs"

u

A list with utility functions. The list can incorporate as many decision rule groups as you want. However, each group must be in a list in this list. If you just use one group (the normal), this group still has to be in a list in the u list. As a convention name beta coefficients starting with a lower case "b"

bcoeff

List of initial coefficients for the utility function. List content/length can vary based on application. I ideally begins (but does not have to) with b and need be the same as those entered in the utility functions

decisiongroups

A vector showing how decision groups are numerically distributed

manipulations

A variable to alter terms of the utility functions examples may be applying a factor or applying changes to terms selectively for different groups

estimate

If TRUE models will be estimated. If false only a dataset will be simulated. Default is true

chunks

The number of chunks determines how often results should be stored on disk as a safety measure to not loose simulations if models have already been estimated. For example, if no_sim is 100 and chunks = 2, the data will be saved on disk after 50 and after 100 runs.

utility_transform_type

How the utility function you entered is transformed to the utility function required for mixl. You can use the classic way (simple) where parameters have to start with "b" and variables with "alt" or the more flexible (but potentially error prone) way (exact) where parameters and variables are matched exactly what how the are called in the dataset and in the bcoeff list. Default is "simple". In the long run, simple will be deleted, as exact should be downwards compatible.

reshape_type

Must be "auto", "stats" to use the reshape from the stats package or tidyr to use pivot longer. Default is auto and should not bother you. Only change it once you face an error at this position and you may be lucky that it works then.

mode

Set to "parallel" if parts should be run in parallel mode

preprocess_function

= NULL You can supply a function that reads in external data (e.g. GIS coordinates) that will be merged with the simulated dataset. Make sure the the function outputs a data.frame that has a variable called ID which is used for matching.

savefile

Indicate a path if you want to store the results after each design simulation locally. This is useful in case you fear that your computer crashes

Value

A list, with all information on the simulation. This list an be easily processed by the user and in the rmarkdown template.

Examples

library(rlang)
designpath <- system.file("extdata", "SE_DRIVE", package = "simulateDCE")
resps <- 120 # number of respondents
nosim <- 2 # number of simulations to run (about 500 is minimum)

decisiongroups <- c(0, 0.7, 1)

# pass beta coefficients as a list
bcoeff <- list(
  b.preis = -0.01,
  b.lade = -0.07,
  b.warte = 0.02
)

manipulations <- list(
  alt1.x2 = expr(alt1.x2 / 10),
  alt1.x3 = expr(alt1.x3 / 10),
  alt2.x2 = expr(alt2.x2 / 10),
  alt2.x3 = expr(alt2.x3 / 10)
)


# place your utility functions here
ul <- list(
  u1 =

    list(
      v1 = V.1 ~ b.preis * alt1.x1 + b.lade * alt1.x2 + b.warte * alt1.x3,
      v2 = V.2 ~ b.preis * alt2.x1 + b.lade * alt2.x2 + b.warte * alt2.x3
    ),
  u2 = list(
    v1 = V.1 ~ b.preis * alt1.x1,
    v2 = V.2 ~ b.preis * alt2.x1
  )
)


sedrive <- sim_all(
  nosim = nosim,
  resps = resps,
  designpath = designpath,
  u = ul,
  bcoeff = bcoeff,
  decisiongroups = decisiongroups,
  manipulations = manipulations,
  utility_transform_type = "exact",
  mode = "sequential",
  estimate=FALSE
)


Simulate and estimate choices

Description

Simulate and estimate choices

Usage

sim_choice(
  designfile,
  no_sim = 10,
  respondents = 330,
  u,
  designtype = NULL,
  destype = NULL,
  bcoeff,
  decisiongroups = c(0, 1),
  manipulations = list(),
  estimate,
  chunks = 1,
  utility_transform_type = "simple",
  mode = c("parallel", "sequential"),
  preprocess_function = NULL,
  savefile = NULL
)

Arguments

designfile

path to a file containing a design.

no_sim

Number of runs i.e. how often do you want the simulation to be repeated

respondents

Number of respondents. How many respondents do you want to simulate in each run.

u

A list with utility functions. The list can incorporate as many decision rule groups as you want. However, each group must be in a list in this list. If you just use one group (the normal), this group still has to be in a list in the u list. As a convention name beta coefficients starting with a lower case "b"

designtype

Is it a design created with ngene, spdesign or idefix. use 'ngene', 'spdesign' or 'idefix. Ngene designs should be stored as the standard .ngd output. spdesign should be the spdesign object stored as an RDS file. Idefix objects should also be stored as an RDS file. If designtype is not specified, I try to guess what it is. This is especially helpful if you want to carry out a simulation for both spdesign designs and ngene designs at the same time.

destype

Deprecated. Use designtype instead.

bcoeff

List of initial coefficients for the utility function. List content/length can vary based on application. I ideally begins (but does not have to) with b and need be the same as those entered in the utility functions

decisiongroups

A vector showing how decision groups are numerically distributed

manipulations

A variable to alter terms of the utility functions examples may be applying a factor or applying changes to terms selectively for different groups

estimate

If TRUE models will be estimated. If false only a dataset will be simulated. Default is true

chunks

The number of chunks determines how often results should be stored on disk as a safety measure to not loose simulations if models have already been estimated. For example, if no_sim is 100 and chunks = 2, the data will be saved on disk after 50 and after 100 runs.

utility_transform_type

How the utility function you entered is transformed to the utility function required for mixl. You can use the classic way (simple) where parameters have to start with "b" and variables with "alt" or the more flexible (but potentially error prone) way (exact) where parameters and variables are matched exactly what how the are called in the dataset and in the bcoeff list. Default is "simple". In the long run, simple will be deleted, as exact should be downwards compatible.

mode

Set to "parallel" if parts should be run in parallel mode

preprocess_function

= NULL You can supply a function that reads in external data (e.g. GIS coordinates) that will be merged with the simulated dataset. Make sure the the function outputs a data.frame that has a variable called ID which is used for matching.

savefile

Indicate a path if you want to store the results after each design simulation locally. This is useful in case you fear that your computer crashes

Value

a list with all information on the run

Examples

bcoeff <- list(
  basc = -1.2,
  basc2 = -1.4,
  baction = 0.1,
  badvisory = 0.4,
  bpartnertest = 0.3,
  bcomp = 0.02
)
ul <- list(
  u1 =
    list(
     #' # model specification ----------------------------------------------
v1 <- V.1 ~ basc +
  baction      * alt1.b +
  badvisory    * alt1.c +
  bpartnertest * alt1.d +
  bcomp        * alt1.p,

v2 <- V.2 ~ basc2 +
  baction      * alt2.b +
  badvisory    * alt2.c +
  bpartnertest * alt2.d +
  bcomp        * alt2.p,
 v3 <- V.3 ~ 0
    )
)

sim_choice(
  designfile = system.file("extdata", "agora", "altscf_eff.ngd", package = "simulateDCE"),
  no_sim = 2,
  respondents = 144,
  u = ul,
  bcoeff = bcoeff,
  estimate = FALSE
)


Simulate choices based on a data.frame with a design and respondents

Description

Simulate choices based on a data.frame with a design and respondents

Usage

simulate_choices(
  data,
  utility,
  setspp,
  bcoeff,
  decisiongroups = c(0, 1),
  manipulations = list(),
  estimate,
  preprocess_function = NULL
)

Arguments

data

a dataframe that includes a design repeated for the number of observations

utility

a list with the utility functions, one utility function for each alternatives

setspp

an integer, the number of choice sets per person

bcoeff

List of initial coefficients for the utility function. List content/length can vary based on application. I ideally begins (but does not have to) with b and need be the same as those entered in the utility functions

decisiongroups

A vector showing how decision groups are numerically distributed

manipulations

A variable to alter terms of the utility functions examples may be applying a factor or applying changes to terms selectively for different groups

estimate

If TRUE models will be estimated. If false only a dataset will be simulated. Default is true

preprocess_function

= NULL You can supply a function that reads in external data (e.g. GIS coordinates) that will be merged with the simulated dataset. Make sure the the function outputs a data.frame that has a variable called ID which is used for matching.

Value

a data.frame that includes simulated choices and a design

Examples

example_df <- data.frame(
  ID = rep(1:100, each = 4),
  price = rep(c(10, 10, 20, 20), 100),
  quality = rep(c(1, 2, 1, 2), 100)
)

beta <- list(
  bprice   = -0.2,
  bquality =  0.8
)

ut <- list(
  u1 = list(
    v1 = V.1 ~ bprice * price + bquality * quality,
    v2 = V.2 ~ 0
  )
)
simulate_choices(example_df, ut, setspp = 4, bcoeff = beta, estimate = FALSE)