Type: | Package |
Title: | Simulated Maximum Likelihood Estimation of Mixed Logit Models for Large Datasets |
Version: | 1.3.5 |
Description: | Specification and estimation of multinomial logit models. Large datasets and complex models are supported, with an intuitive syntax. Multinomial Logit Models, Mixed models, random coefficients and Hybrid Choice are all supported. For more information, see Molloy et al. (2021) https://www.research-collection.ethz.ch/handle/20.500.11850/477416. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
URL: | https://github.com/joemolloy/fast-mixed-mnl |
BugReports: | https://github.com/joemolloy/fast-mixed-mnl/issues |
Imports: | maxLik, numDeriv, randtoolbox, Rcpp (≥ 0.12.19), readr, sandwich, stats, stringr (≥ 1.3.1) |
Suggests: | knitr, mlogit, rmarkdown, testthat, texreg, xtable |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-05-16 15:09:08 UTC; daniehei |
Author: | Joseph Molloy |
Maintainer: | Daniel Heimgartner <d.heimgartners@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-16 17:50:01 UTC |
Estimate mixed multinomial logit models
Description
Estimate mixed multinomial logit models using (simulated) maximum likelihood estimation. The package supports standard mnl, mixed-logit and hybrid choice. Using compilation to C++, model estimation is significantly faster than in native R code.
Details
This section should provide a more detailed overview of how to use the package, including the most important functions.
Author(s)
Joe Molloy <joseph.molloy@ivt.baug.ethz.ch>.
References
Molloy, J., Becker, F., Schmid, B., & Axhausen, K. W. (2021). mixl: An open-source R package for estimating complex choice models on large datasets. Journal of choice modelling, 39, 100284.
Examples
data("Train", package="mlogit")
head(Train, 3)
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
ASC_B_RND = @ASC_B + draw_2 * @SIGMA_B;
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60 + @B_change * $change_A;
U_B = ASC_B_RND + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train)
#only take starting values that are needed
est <- stats::setNames(c(0,0,0,0,0,0), c("B_price", "B_time", "B_timeB",
"B_change", "ASC_B", "SIGMA_B"))
availabilities <- mixl::generate_default_availabilities(Train, model_spec$num_utility_functions)
model <- mixl::estimate(model_spec, est, Train, availabilities = availabilities, nDraws = 20)
summary(model)
Extract the availabilites matrix from the dataset, using column indicies
Description
Extract the availabilites matrix from the dataset, using column indicies
Usage
av_matrix(data, av_cols)
Arguments
data |
The dataset used in the model |
av_cols |
A vector of the the column indicies of the availabilities for each alternative |
Value
Matrix of availabilities for alternatives and the number of choice observations
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
Train$avail_A <- sample(2, replace=TRUE, size=nrow(Train))-1
Train$avail_B <- sample(2, replace=TRUE, size=nrow(Train))-1
av_matrix(Train, c('avail_A', 'avail_B'))
Check the inputs to the draw function
Description
Check the inputs to the draw function
Usage
check_draw_inputs(draws, nDraws, draw_dimensions, Nindividuals)
Arguments
draws |
The specified Model |
nDraws |
Named vector of proposed start values for the model |
draw_dimensions |
the dataset on which to estimate |
Nindividuals |
The availabilities for the alternatives in the model specification |
Value
A list consisting of the checked draws and Ndraws, both computed if required)
Check the inputs to the estimate function
Description
This function checks the start_vlaues, data, availabilities, draws and fixedparams for validity. If this function runs without error, then the inputs are valid for the maxLikelihood function. These checks are important, because an error in the internal C++ code will cause the Rstudio session to crash. Incidentally, if there is concern of this happening, it is recommended to run the script from the command line, using Rscript.
Usage
check_inputs(
model_spec,
start_values,
data,
availabilities,
draws,
fixedparam,
weights
)
Arguments
model_spec |
The specified Model |
start_values |
Named vector of proposed start values for the model |
data |
the dataset on which to estimate |
availabilities |
The availabilities for the alternatives in the model specification |
draws |
The matrix of random draws |
fixedparam |
Named vector of parameters to be fixed |
weights |
The weights vector |
Value
Nothing
compileUtilityFunction
Deprecated, please see specify_model()
Description
compileUtilityFunction
Deprecated, please see specify_model()
Usage
compileUtilityFunction(...)
Arguments
... |
Parameters to specify_model |
Create a standard set of Halton draws to use in estimation
Description
Create a standard set of Halton draws to use in estimation
Usage
create_halton_draws(Nindividuals, nDraws, draw_dimensions)
Arguments
Nindividuals |
The number individuals in the dataset |
nDraws |
The number of draws needed |
draw_dimensions |
the number of draw dimensions needed |
Value
Matrix of availabilities for alternatives and the number of choice observations
Examples
create_halton_draws(100, 10, 5)
create_halton_draws(100, 100, 20)
Runs a maximum likelihood estimation on a mixl choice model
Description
This function performs a maximum likelihood estimation for choice models speficied using this package.
Usage
estimate(
model_spec,
start_values,
data,
availabilities,
draws,
nDraws,
fixedparam = c(),
num_threads = 1,
weights = NULL,
...
)
Arguments
model_spec |
The object that contains the loglikelihood function and other
variables that help return better error messages. This function is best generated using the
|
start_values |
A named vector of start values for the estimation. A warning and error will be given respectively if to many values are included or some are missing. |
data |
A dataframe of the observations. It must include The columns CHOICE and ID, as well as columns for the variables specified in the utility function. The CHOICE variable must be from 1..k, where k is the number of utility functions |
availabilities |
A 1/0 matrix of availabilities. The dimensions must be |
draws |
A numeric matrix of draws for calculating mixed effects. If there no mixed effects, this should be left null.
If the model specification included mixed effects, either this or |
nDraws |
The number of draws to use in estimating a mixed model.
Only needed if |
fixedparam |
(optional) Coefficients which should be fixed to their starting values during estimation. |
num_threads |
The maximum number of parallel cores to use in estimation. The default is 1. This should only be speficied on machines with an openMP compiler (linux and some OSXs). |
weights |
(optional) A vector of weights (vector length must equal the number of observations). |
... |
futher arguments. such as control are passed to the maximisaiton routine in maxLik.
See |
Details
It is a wrapper for the maxLik function in the maxLik package. And additional arguments can be passed through to this function if required.
Value
a mixl object that contains the results of the estimation
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions)
model <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model)
Extract the availabilites matrix from the dataset using a column name prefix
Description
Extract the availabilites matrix from the dataset using a column name prefix
Usage
extract_av_cols(data, prefix)
Arguments
data |
The dataset used in the model |
prefix |
The prefix of the availability columns, i.e. avail_ |
Value
Matrix of availabilities for alternatives and the number of choice observations
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
Train$avail_A <- sample(2, replace=TRUE, size=nrow(Train))-1
Train$avail_B <- sample(2, replace=TRUE, size=nrow(Train))-1
extract_av_cols(Train, 'avail_')
Extract the individual level data from the dataset for use in posterior analysis
Description
Extract the individual level data from the dataset for use in posterior analysis
Usage
extract_indiv_data(data, data_cols = NULL)
Arguments
data |
The dataset |
data_cols |
The individual level columns of attributes - Can be null to take aggregate for each column |
Value
dataframe of all individual level data for each ID
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
#in this case not actually individual data columns
#an ID column is required here
extract_indiv_data(Train, c('comfort_A', 'comfort_B'))
Generate a ones-matrix of availabilities
Description
Generate a ones-matrix of availabilities
Usage
generate_default_availabilities(data, num_utility_functions)
Arguments
data |
The dataset used in the model |
num_utility_functions |
the number of alternatives in the model |
Value
Ones-matrix of availabilities for alternatives and the number of choice observations
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
generate_default_availabilities(Train, 5)
Calculate the posteriors for a specified and estimated model
Description
Calculate the posteriors for a specified and estimated model
Usage
posteriors(model, indiv_data, code_output_file = NULL)
Arguments
model |
The estimated Model |
indiv_data |
Alternative individual data to use insteaf of that in the dataset |
code_output_file |
An (optional) location where the compiled code should be saved (useful for debugging |
Value
Dataframe of individual-level posteriors
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
ASC_A_RND = @ASC_A + draw_1 * @SIGMA_A1 + draw_7 * @SIGMA_A2;
ASC_B_RND = @ASC_B + draw_2 * @SIGMA_B;
U_A = ASC_A_RND + @B_price * $price_A / 1000
+ @B_time * $time_A / 60 + @B_change * $change_A;
U_B = ASC_B_RND + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
#only take starting values that are needed
est <- stats::setNames(c(-1059.69729, -181.27796, -251.78909,
-241.18878, -86.77386, -173.09451,
291.02618, 142.71793, 332.60909)
, c("B_price", "B_time", "B_timeB", "B_change",
"ASC_A", "ASC_B", "SIGMA_A1", "SIGMA_A2", "SIGMA_B"))
availabilities <- generate_default_availabilities(Train, 2)
model_specification <- specify_model(mnl_test, Train, disable_multicore=T)
model <- estimate(model_specification, est, Train,
availabilities = availabilities, nDraws = 1)
posteriors(model)
Prints the output of a model
Description
print()
is an S3 method for the mixl class.
It creates a model summary and then prints the result
Usage
## S3 method for class 'mixl'
print(x, ...)
Arguments
x |
The model to print |
... |
Options to pass to print |
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model2 <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model2)
Print a model summary
Description
print()
is an S3 method for the summary.mixl class, the output of a model plus goodness of fit metrics
Usage
## S3 method for class 'summary.mixl'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
The summary to print. |
digits |
minimum number of significant digits to be used for most numbers. |
... |
Options to pass to print. |
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model2 <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model2)
Calculate the probabilities for a specified and estimated model. Note that if new data or draws are provided, the model will not be re-estimated
Description
Calculate the probabilities for a specified and estimated model. Note that if new data or draws are provided, the model will not be re-estimated
Usage
probabilities(
model,
data = NULL,
availabilities = NULL,
draws = NULL,
nDraws = NULL,
num_threads = 1
)
Arguments
model |
The estimated Model |
data |
(Optional) New data to use instead of that in the dataset |
availabilities |
(Optional) New availabilites to use |
draws |
(Optional) Optional new set of random draws to use |
nDraws |
(Optional) Optional new number of random draws to use |
num_threads |
Enable parallel computing where available using this many cores |
Value
Dataframe of individual-level posteriors
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
probabilities(model)
#hypothetical scenario where the travel time of option A doubles
Train$time_A = Train$time_A * 2
probabilities(model, Train)
Validate the utility functions against the dataset and generate the optimised logliklihood function
Description
This function takes a utility function description, and generates a optimised C++ version
of the utility function which can be called from R. If the data_names are provided, then the variables
in the function are checked against those provided. If an output_file
is provided, the C++ code is saved there.
See the user guide vignette for how to write valid utility scripts. There is some minimal specific syntax required.
Usage
specify_model(
utility_script,
dataset = NULL,
output_file = NULL,
compile = TRUE,
model_name = "mixl_model",
disable_multicore = T,
...
)
Arguments
utility_script |
The utility script to be compiled |
dataset |
An (optional) dataframe to check if the all the variables are present |
output_file |
An (optional) location where the compiled code should be saved (useful for debugging |
compile |
If compile is false, then the code will not be compiled, but just validated and saved if an |
model_name |
A name for the model, which will be used for saving. Defaults to mixl_model |
disable_multicore |
Depreciated and not used. Mutlicore is now autodetected |
... |
Further parameters to pass to sourceCpp |
Value
An object
which contains the loglikelihood function, and information from the compile process
See Also
browseVignettes("mixl")
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions)
model <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model)
Create a model summary
Description
summary()
is an S3 method for the class mixl, which adds metrics of goodness of fit
Usage
## S3 method for class 'mixl'
summary(object, ...)
Arguments
object |
The mixl output to summarize. |
... |
Options to pass to summarize (currently). |
Value
A summary object for a mixl model
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model2 <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model2)
Return tex formatted output of a model summary. If an output_file parameter is provided, save the object to that location
Description
The returned object can be passed to texreg
's formatting functions (see 'Examples').
Usage
summary_tex(model_summary, output_file)
Arguments
model_summary |
A summary of an estimated Model |
output_file |
Where to save the tex representation |
Value
Formatted texreg object containing the latex table suitable for a research paper. See createTexreg
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model2 <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model2)
tex <- summary_tex(model2)
texreg::screenreg(tex, digits = 5)
Return the the utilities for a set of coefficients
Description
Return the the utilities for a set of coefficients
Usage
utilities(model_spec, beta, data, availabilities, draws, nDraws)
Arguments
model_spec |
The generated model_spec. |
beta |
The coefficients to use in the model when estimating the utilities. |
data |
The dataframe of observations. |
availabilities |
The availabilities of each alternative. |
draws |
For mixed models, a matrix of draws. If none is provided, one is created. |
nDraws |
The number of draws to use or generated. |
Value
Dataframe of utilties for each observation
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
est <- stats::setNames(c(1,1,1,1), c("B_price", "B_time", "B_timeB", "B_change"))
availabilities <- mixl::generate_default_availabilities(Train, 2)
Nindividuals <- length(unique(Train$ID))
utility_script <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60 + @B_change * $change_A;
U_B = @B_price * $price_B / 1000 + @B_timeB * $time_B / 60 ;
"
model_spec <- mixl::specify_model(utility_script, Train)
utilities_matrix = mixl::utilities(model_spec, est, Train, availabilities, NULL)
utilities_matrix
Calculates the Variance-Covariance Matrix of the mixl summary
Description
vcov()
is an S3 method for the summary.mixl class, giving the Variance-Covariance Matrix
Usage
## S3 method for class 'mixl'
vcov(object, eigentol = 1e-12, ...)
Arguments
object |
The summary to print. |
eigentol |
The tolerance value. |
... |
Options to pass to print. |
Examples
data("Train", package="mlogit")
Train$ID <- Train$id
Train$CHOICE <- as.numeric(Train$choice)
mnl_test <- "
U_A = @B_price * $price_A / 1000 + @B_time * $time_A / 60;
U_B = @asc + @B_price * $price_B / 1000 + @B_timeB * $time_B / 60;
"
model_spec <- mixl::specify_model(mnl_test, Train, disable_multicore=T)
#only take starting values that are needed
est <- stats::setNames(c(1, 1,1,1), c("asc", "B_price", "B_time", "B_timeB"))
availabilities <- mixl::generate_default_availabilities(
Train, model_spec$num_utility_functions
)
model2 <- mixl::estimate(model_spec, est, Train, availabilities = availabilities)
print(model2)