Type: | Package |
Title: | Ordered Random Forests |
Version: | 0.1.4 |
Date: | 2022-07-21 |
Author: | Gabriel Okasa [aut, cre], Michael Lechner [ctb] |
Maintainer: | Gabriel Okasa <okasa.gabriel@gmail.com> |
Description: | An implementation of the Ordered Forest estimator as developed in Lechner & Okasa (2019) <doi:10.48550/arXiv.1907.02436>. The Ordered Forest flexibly estimates the conditional probabilities of models with ordered categorical outcomes (so-called ordered choice models). Additionally to common machine learning algorithms the 'orf' package provides functions for estimating marginal effects as well as statistical inference thereof and thus provides similar output as in standard econometric models for ordered choice. The core forest algorithm relies on the fast C++ forest implementation from the 'ranger' package (Wright & Ziegler, 2017) <doi:10.48550/arXiv.1508.04409>. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Depends: | R (≥ 2.10) |
Imports: | ggplot2, ranger, Rcpp, stats, utils, xtable |
RoxygenNote: | 7.2.1 |
Suggests: | knitr, rmarkdown, testthat |
VignetteBuilder: | knitr |
URL: | https://github.com/okasag/orf |
BugReports: | https://github.com/okasag/orf/issues |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Packaged: | 2022-07-21 09:12:03 UTC; okasag |
Repository: | CRAN |
Date/Publication: | 2022-07-23 22:40:02 UTC |
orf: Ordered Random Forests
Description
An implementation of the Ordered Forest estimator as developed
in Lechner & Okasa (2019). The Ordered Forest flexibly
estimates the conditional probabilities of models with ordered
categorical outcomes (so-called ordered choice models).
In addition to the usual machine learning output, the orf
package provides functions for estimating marginal effects and for
statistical inference on them, and thus yields output similar
to that of standard econometric models for ordered choice. The core
forest algorithm relies on the fast C++ forest implementation
from the ranger package (Wright & Ziegler, 2017).
Author(s)
Gabriel Okasa, Michael Lechner
References
Lechner, M., & Okasa, G. (2019). Random Forest Estimation of the Ordered Choice Model. arXiv preprint arXiv:1907.02436. https://arxiv.org/abs/1907.02436
Goller, D., Knaus, M. C., Lechner, M., & Okasa, G. (2021). Predicting Match Outcomes in Football by an Ordered Forest Estimator. A Modern Guide to Sports Economics. Edward Elgar Publishing, 335-355. doi:10.4337/9781789906530.00026
Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. doi:10.18637/jss.v077.i01.
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest with default settings
orf_fit <- orf(X, Y)
# print output of the orf estimation
print(orf_fit)
# show summary of the orf estimation
summary(orf_fit)
# plot the estimated probability distributions
plot(orf_fit)
# predict with the estimated orf
predict(orf_fit)
# estimate marginal effects of the orf
margins(orf_fit)
check X data
Description
Checks the input data as a numeric matrix
Usage
check_X(X)
Arguments
X |
matrix of input features X |
check Y data
Description
Checks the input data as a numeric matrix/vector
Usage
check_Y(Y, X)
Arguments
Y |
matrix of input outcomes Y |
X |
matrix of input features X |
Value
Y
check if Y is discrete
Description
Checks the input data as discrete outcome
Usage
check_discrete_Y(Y)
Arguments
Y |
matrix of input outcomes Y |
Value
Y
check evaluation for margins
Description
Checks the input data of eval in margins
Usage
check_eval(eval)
Arguments
eval |
string, evaluation points for margins |
Value
eval
check honesty
Description
Checks the input for honesty
Usage
check_honesty(honesty)
Arguments
honesty |
logical, if TRUE honest forest is built using 50:50 data split |
Value
honesty
check honesty fraction
Description
Checks the input data of honesty.fraction in orf
Usage
check_honesty_fraction(honesty.fraction, honesty)
Arguments
honesty.fraction |
scalar, share of the data set aside to estimate the effects (default is 0.5) |
honesty |
logical, if data should be split into train and honest sample |
Value
honesty.fraction
check importance
Description
Checks the input for importance
Usage
check_importance(importance)
Arguments
importance |
logical, if TRUE variable importance is conducted |
Value
importance
check inference
Description
Checks the input for inference
Usage
check_inference(inference)
Arguments
inference |
logical, if TRUE the weight based inference is conducted |
Value
inference
check latex
Description
Checks the input for latex
Usage
check_latex(latex)
Arguments
latex |
logical, TRUE if latex summary should be generated |
Value
latex
check min.node.size
Description
Checks the input data of min.node.size
Usage
check_min_node_size(min.node.size, X)
Arguments
min.node.size |
scalar, minimum node size |
X |
matrix of input features X |
Value
min.node.size
check mtry
Description
Checks the input data of mtry
Usage
check_mtry(mtry, X)
Arguments
mtry |
scalar, number of randomly selected features |
X |
matrix of input features X |
Value
mtry
check newdata
Description
Checks the input for newdata for predict.orf
Usage
check_newdata(new_data, X)
Arguments
new_data |
matrix X containing the observations to predict |
X |
matrix of input features X |
check num.trees
Description
Checks the input data of num.trees
Usage
check_num_trees(num.trees)
Arguments
num.trees |
scalar, number of trees to be estimated |
Value
num.trees
check replace
Description
Checks the input for replace
Usage
check_replace(replace)
Arguments
replace |
logical, if TRUE bootstrapping, if FALSE subsampling |
Value
replace
check sample.fraction
Description
Checks the input data of sample.fraction
Usage
check_sample_fraction(sample.fraction, replace)
Arguments
sample.fraction |
scalar, fraction of data used for subsampling |
replace |
logical, if bootstrap or subsampling should be used |
Value
sample.fraction
check prediction type for predict.orf
Description
Checks the input data of type in predict.orf
Usage
check_type(type)
Arguments
type |
string, prediction type for predict.orf |
Value
type
check window size
Description
Checks the input data of window in margins
Usage
check_window(window)
Arguments
window |
scalar, share of SD of X used for margins |
Value
window
Get Forest Weights
Description
get forest weights, i.e. in-sample weights based on honest or train sample produced by the random forest algorithm as defined in Wager & Athey (2018)
Usage
get_forest_weights(forest, honest_data, train_data)
Arguments
forest |
estimated forest object of type ranger |
honest_data |
honest dataframe |
train_data |
train dataframe |
Value
matrix of honest forest weights
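A minimal base-R sketch of what such weights can look like, assuming the usual honest-forest construction: within each tree, an honest observation receives weight 1/(leaf size) if it shares a leaf with the target observation and 0 otherwise, and the weights are then averaged over trees. The function name forest_weights_sketch and the toy leaf-ID matrices are hypothetical; this is not the package's internal code.

```r
# Hypothetical helper: forest weights from leaf memberships (base R only).
# target_leaves: n_target x n_trees matrix of leaf IDs for the target observations
# honest_leaves: n_honest x n_trees matrix of leaf IDs for the honest sample
forest_weights_sketch <- function(target_leaves, honest_leaves) {
  n_trees <- ncol(honest_leaves)
  weights <- matrix(0, nrow(target_leaves), nrow(honest_leaves))
  for (b in seq_len(n_trees)) {
    # TRUE where target i and honest observation j share a leaf in tree b
    same_leaf <- outer(target_leaves[, b], honest_leaves[, b], "==")
    # number of honest observations in the leaf of each target observation
    leaf_size <- rowSums(same_leaf)
    # row-wise division; pmax() guards against leaves without honest observations
    weights <- weights + same_leaf / pmax(leaf_size, 1)
  }
  weights / n_trees
}

# toy example: 4 honest observations, 2 target observations, 2 trees
honest_leaves <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 2))
target_leaves <- cbind(c(1, 2), c(2, 1))
honest_y <- c(1, 2, 3, 4)

w <- forest_weights_sketch(target_leaves, honest_leaves)
rowSums(w)               # each row sums to one
drop(w %*% honest_y)     # weighted outcomes, i.e. honest predictions
```

Multiplying the weight matrix by the honest outcomes reproduces the forest prediction as a weighted average, which is what makes weight-based inference possible.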
Get Honest Predictions
Description
get honest prediction, i.e. fitted values for in sample data (train and honest sample) based on the honest sample
Usage
get_honest(forest, honest_data, train_data)
Arguments
forest |
estimated forest object of type ranger |
honest_data |
honest dataframe |
train_data |
train dataframe |
Value
vector of honest forest predictions
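The idea of an honest prediction can be sketched in base R as follows, assuming that each tree predicts the mean honest outcome of the leaf an observation falls into and that the forest averages these over trees. The name honest_predict_sketch and the toy leaf IDs are hypothetical and only illustrate the principle.

```r
# Hypothetical helper: honest predictions from leaf memberships (base R only).
honest_predict_sketch <- function(honest_y, honest_leaves, target_leaves) {
  n_trees <- ncol(honest_leaves)
  tree_preds <- vapply(seq_len(n_trees), function(b) {
    # mean honest outcome per leaf of tree b
    leaf_means <- tapply(honest_y, honest_leaves[, b], mean)
    # look up the leaf mean for every target observation (NA if the leaf is empty)
    as.numeric(leaf_means[as.character(target_leaves[, b])])
  }, FUN.VALUE = numeric(nrow(target_leaves)))
  # keep matrix shape even for a single target observation
  tree_preds <- matrix(tree_preds, ncol = n_trees)
  # average over trees, ignoring trees whose leaf holds no honest observation
  rowMeans(tree_preds, na.rm = TRUE)
}

# toy example: 4 honest observations, 2 target observations, 2 trees
honest_y <- c(1, 2, 3, 4)
honest_leaves <- cbind(c(1, 1, 2, 2), c(1, 2, 2, 2))
target_leaves <- cbind(c(1, 2), c(2, 1))

honest_fit <- honest_predict_sketch(honest_y, honest_leaves, target_leaves)
honest_fit
```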
Get honest predictions (C++)
Description
Computes honest predictions (fitted values) from the random forest for the train and honest sample based on the honest training sample
Usage
get_honest_C(x, y, z, w)
Arguments
x |
unique_leaves (List)[ntree] |
y |
honest_y (NumericVector)[nrow] |
z |
honest_leaves (NumericMatrix)[nrow, ntree] |
w |
train_leaves (NumericMatrix)[nrow, ntree] |
Get ORF Variance
Description
get variance of ordered random forest predictions based on honest sample splitting as described in Lechner (2018)
Usage
get_orf_variance(
honest_pred,
honest_weights,
train_pred,
train_weights,
Y_ind_honest
)
Arguments
honest_pred |
list of vectors of honest forest predictions |
honest_weights |
list of n x n matrices of honest forest weights |
train_pred |
list of vectors of honest forests predictions from train sample |
train_weights |
list of n x n matrices of honest forest weights from train sample |
Y_ind_honest |
list of vectors of 0-1 outcomes for the honest sample |
Value
vector of ORF variances
Get honest weights (C++)
Description
Computes honest weights from the random forest as in Wager & Athey (2018) for the train and honest sample based on the honest training sample
Usage
get_weights_C(x, y, z)
Arguments
x |
leaf_IDs_train - list of leaf IDs in train data |
y |
leaf_IDs - list of leaf IDs in honest data |
z |
leaf_size - list of leaf sizes in honest data |
honest sample split
Description
Creates honest sample split by randomly selecting prespecified share of observations to belong to honest sample and to training sample
Usage
honest_split(data, honesty.fraction, orf)
Arguments
data |
dataframe or matrix of features and outcomes to be split honestly |
honesty.fraction |
share of sample to belong to honesty sample |
orf |
logical, if honest split should be done for orf or not |
Value
named list of honest and training sample
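A random honest split of this kind can be sketched in base R as below. The function name honest_split_sketch and the list element names trainData and honestData are assumptions made for illustration; the package's own return names may differ.

```r
# Hypothetical helper: random split into a training and an honest sample.
honest_split_sketch <- function(data, honesty.fraction = 0.5) {
  stopifnot(honesty.fraction > 0, honesty.fraction < 1)
  n <- nrow(data)
  n_honest <- round(honesty.fraction * n)
  stopifnot(n_honest >= 1, n_honest < n)
  # rows set aside for honest estimation (not used to grow the forest)
  honest_idx <- sample(n, size = n_honest)
  list(trainData  = data[-honest_idx, , drop = FALSE],
       honestData = data[honest_idx, , drop = FALSE])
}

set.seed(1)
toy <- data.frame(Y = rep(1:3, length.out = 10), X1 = rnorm(10))
parts <- honest_split_sketch(toy, honesty.fraction = 0.5)
nrow(parts$trainData)
nrow(parts$honestData)
```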
Marginal Effects
Description
S3 generic method for estimation of marginal effects
of an Ordered Forest object of class orf.
Usage
margins(forest, eval = NULL, inference = NULL, window = NULL, newdata = NULL)
Arguments
forest |
estimated Ordered Forest object of class orf |
eval |
string, defining evaluation point for marginal effects. These can be one of "mean", "atmean", or "atmedian". (Default is "mean") |
inference |
logical, if TRUE inference on marginal effects will be conducted (default is inherited from the orf object) |
window |
numeric, share of standard deviation of X to be used for evaluation of the marginal effect (default is 0.1) |
newdata |
numeric matrix X containing the new observations for which the marginal effects should be estimated |
Author(s)
Gabriel Okasa
See Also
margins.orf, summary.margins.orf and print.margins.orf
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# estimate default marginal effects of the orf
orf_margins <- margins(orf_fit)
Marginal Effects for the Ordered Forest
Description
S3 method for estimation of marginal effects
of an Ordered Forest object of class orf.
Usage
## S3 method for class 'orf'
margins(forest, eval = NULL, inference = NULL, window = NULL, newdata = NULL)
Arguments
forest |
estimated Ordered Forest object of class orf |
eval |
string, defining evaluation point for marginal effects. These can be one of "mean", "atmean", or "atmedian". (Default is "mean") |
inference |
logical, if TRUE inference on marginal effects will be conducted (default is inherited from the orf object) |
window |
numeric, share of standard deviation of X to be used for evaluation of the marginal effect (default is 0.1) |
newdata |
numeric matrix X containing the new observations for which the marginal effects should be estimated |
Details
margins.orf estimates marginal effects at the mean, at the median, or
the mean marginal effects, depending on the eval argument. It is advised
to increase the number of subsampling replications in the supplied orf
object, as estimating marginal effects is more demanding than a simple
Ordered Forest estimation or prediction. In addition to the estimation
of the marginal effects, weight-based inference for the effects is supported.
Note that the inference procedure is computationally much more demanding
because the forest weights have to be computed. The evaluation
window for the marginal effects can be regulated through the window argument.
Furthermore, new data for which marginal effects should be computed can be supplied,
as long as it lies within the support of X.
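One way to read the window argument is as the half-width of a finite difference around the evaluation point, measured in standard deviations of the covariate. The base-R sketch below makes that assumption explicit for a continuous covariate evaluated at the mean; marginal_effect_sketch and toy_prob are hypothetical names, the probability function is a toy logistic model rather than a forest, and the package's exact treatment (for example of categorical covariates) may differ.

```r
# Hypothetical helper: marginal effect of covariate k at the covariate means,
# approximated by a symmetric finite difference of width 2 * window * sd(X[, k]).
marginal_effect_sketch <- function(predict_fun, X, k, window = 0.1) {
  x_eval <- unname(colMeans(X))      # evaluation point ("atmean")
  h <- window * sd(X[, k])           # share of the standard deviation of X_k
  x_up <- x_eval
  x_down <- x_eval
  x_up[k] <- x_eval[k] + h
  x_down[k] <- x_eval[k] - h
  # change in predicted probability per unit change in X_k
  (predict_fun(x_up) - predict_fun(x_down)) / (x_up[k] - x_down[k])
}

set.seed(42)
X_toy <- cbind(rnorm(200), rnorm(200))
# toy stand-in for a predicted class probability (assumption, not a forest)
toy_prob <- function(x) plogis(x[1] + 0.5 * x[2])

me_x1 <- marginal_effect_sketch(toy_prob, X_toy, k = 1, window = 0.1)
# analytical derivative of the toy model at the same point, for comparison
exact <- dlogis(mean(X_toy[, 1]) + 0.5 * mean(X_toy[, 2]))
c(finite_difference = me_x1, analytical = exact)
```

A larger window smooths the estimate over a wider neighbourhood, while a smaller window stays closer to the local derivative but leaves fewer observations near the evaluation points.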
Value
object of type margins.orf with the following elements
info |
info containing forest inputs and data used |
effects |
marginal effects |
variances |
variances of marginal effects |
errors |
standard errors of marginal effects |
tvalues |
t-values of marginal effects |
pvalues |
p-values of marginal effects |
Author(s)
Gabriel Okasa
See Also
summary.margins.orf, print.margins.orf
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# estimate marginal effects of the orf (default)
orf_margins <- margins(orf_fit)
# estimate marginal effects evaluated at the mean
orf_margins <- margins(orf_fit, eval = "atmean")
# estimate marginal effects with inference
# (orf object has to be estimated with honesty and subsampling)
orf_margins <- margins(orf_fit, inference = TRUE)
# estimate marginal effects with custom window size
orf_margins <- margins(orf_fit, window = 0.5)
# estimate marginal effects for some new data (within support of X)
orf_margins <- margins(orf_fit, newdata = X[1:10, ])
# estimate marginal effects with all custom settings
orf_margins <- margins(orf_fit, eval = "atmedian", inference = TRUE,
                       window = 0.5, newdata = X[1:10, ])
Formatted output for marginal effects with inference
Description
function for creating inference table output for estimated effects which
can be passed into print.margins.orf
Usage
margins_output(x)
Arguments
x |
object of type margins.orf |
Formatted latex output for marginal effects with inference
Description
function for creating latex inference table output for estimated effects which
can be passed into print.margins.orf
Usage
margins_output_latex(x)
Arguments
x |
object of type margins.orf |
Mean Squared Error
Description
computes the mean squared error (MSE) for evaluating the accuracy of ordered/unordered probability predictions
Usage
mse(predictions, observed)
Arguments
predictions |
matrix of predictions (n x categories) |
observed |
vector of observed ordered categorical outcomes (n x 1) |
Value
scalar, sum MSE for given predictions
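As an illustration, the MSE for probability predictions can be sketched as the squared distance between the predicted probabilities and the 0-1 class indicators, summed over classes and averaged over observations (the multi-class Brier score). This is one common definition and is assumed here; mse_sketch is a hypothetical name, and classes are assumed to be coded 1, ..., M.

```r
# Hypothetical helper: MSE of probability predictions (multi-class Brier score).
mse_sketch <- function(predictions, observed) {
  classes <- seq_len(ncol(predictions))
  # n x M matrix of 0-1 indicators for the observed class
  indicators <- outer(observed, classes, "==") * 1
  # sum of squared errors over classes, averaged over observations
  mean(rowSums((indicators - predictions)^2))
}

toy_preds <- rbind(c(0.7, 0.2, 0.1),
                   c(0.1, 0.3, 0.6))
toy_obs <- c(1, 3)
toy_mse <- mse_sketch(toy_preds, toy_obs)
toy_mse   # (0.14 + 0.26) / 2 = 0.2
```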
Simulated Example Dataset
Description
A simulated example dataset with ordered categorical outcome variable containing different types of covariates for illustration purposes.
Usage
odata
Format
A data frame with 1000 rows and 5 variables
Details
For the exact data generating process, see the example below.
Value
Y |
ordered outcome, classes 1, 2, and 3 |
X1 |
continuous covariate, N(0,1) |
X2 |
categorical covariate, values 0, 1, and 2 |
X3 |
binary covariate, values 0 and 1 |
X4 |
continuous covariate, N(0,10) |
Examples
# generate example data
# set seed for replicability
set.seed(123)
# number of observations
n <- 1000
# various covariates
X1 <- rnorm(n, 0, 1) # continuous
X2 <- rbinom(n, 2, 0.5) # categorical
X3 <- rbinom(n, 1, 0.5) # dummy
X4 <- rnorm(n, 0, 10) # noise
# bind into matrix
X <- as.matrix(cbind(X1, X2, X3, X4))
# deterministic component
deterministic <- X1 + X2 + X3
# generate continuous outcome with logistic error
Y <- deterministic + rlogis(n, 0, 1)
# thresholds for continuous outcome
cuts <- quantile(Y, c(0, 1/3, 2/3, 1))
# discretize outcome into ordered classes 1, 2, 3
Y <- as.numeric(cut(Y, breaks = cuts, include.lowest = TRUE))
# save data as a dataframe
odata <- as.data.frame(cbind(Y, X))
# end of data generating
Ordered Forest Estimator
Description
An implementation of the Ordered Forest estimator as developed
in Lechner & Okasa (2019). The Ordered Forest flexibly
estimates the conditional probabilities of models with ordered
categorical outcomes (so-called ordered choice models).
Additionally to common machine learning algorithms the orf
package provides functions for estimating marginal effects as well
as statistical inference thereof and thus provides similar output
as in standard econometric models for ordered choice. The core
forest algorithm relies on the fast C++ forest implementation
from the ranger
package (Wright & Ziegler, 2017).
Usage
orf(
X,
Y,
num.trees = 1000,
mtry = NULL,
min.node.size = NULL,
replace = FALSE,
sample.fraction = NULL,
honesty = TRUE,
honesty.fraction = NULL,
inference = FALSE,
importance = FALSE
)
Arguments
X |
numeric matrix of features |
Y |
numeric vector of outcomes |
num.trees |
scalar, number of trees in a forest, i.e. bootstrap replications (default is 1000 trees) |
mtry |
scalar, number of randomly selected features (default is the square root of the number of features, rounded up to the nearest integer) |
min.node.size |
scalar, minimum node size, i.e. leaf size of a tree (default is 5 observations) |
replace |
logical, if TRUE sampling with replacement, i.e. bootstrap is used to grow the trees, otherwise subsampling without replacement is used (default is set to FALSE) |
sample.fraction |
scalar, subsampling rate (default is 1 for bootstrap and 0.5 for subsampling) |
honesty |
logical, if TRUE honest forest is built using sample splitting (default is set to TRUE) |
honesty.fraction |
scalar, share of observations belonging to honest sample not used for growing the forest (default is 0.5) |
inference |
logical, if TRUE the weight based inference is conducted (default is set to FALSE) |
importance |
logical, if TRUE variable importance measure based on permutation is conducted (default is set to FALSE) |
Details
The Ordered Forest function, orf, estimates the conditional ordered choice
probabilities, i.e. P[Y=m|X=x]. Weight-based inference for the probability
predictions can be conducted as well. If inference is desired,
the Ordered Forest must be estimated with honesty and subsampling.
If only prediction is desired, estimation without honesty and with bootstrapping
is recommended for optimal prediction performance.
To estimate the Ordered Forest, the user must supply the data in the form of a
matrix of covariates X and a vector of outcomes Y to the orf function.
These are the only inputs without defaults. Optional arguments include the
classical forest hyperparameters: the number of trees, num.trees, the number of
randomly selected features, mtry, and the minimum leaf size, min.node.size.
The forest building scheme is regulated by the replace argument: bootstrapping
if replace = TRUE, subsampling if replace = FALSE. For subsampling, the
sample.fraction argument regulates the subsampling rate. An honest forest is
estimated if the honesty argument is set to TRUE, which is also the default.
The fraction of the sample used for the honest estimation is regulated by the
honesty.fraction argument. The default setting conducts a 50:50 sample split,
which is generally advised for optimal performance. The inference procedure of
the Ordered Forest is based on the forest weights and is controlled by the
inference argument. Note that such weight-based inference is computationally
demanding because the forest weights have to be estimated, so longer computation
times are to be expected. Lastly, the importance argument turns the
permutation-based variable importance on and off.
orf is compatible with standard R commands such as predict, margins, plot,
summary and print.
For further details, see the examples below.
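The core idea behind the estimator in Lechner & Okasa (2019) can be sketched in a few lines: estimate the cumulative probabilities P[Y<=m|X=x] by regressing the indicators 1(Y <= m) on X, difference adjacent cumulative probabilities to obtain P[Y=m|X=x], then set negative values to zero and renormalize. The sketch below is an assumption-laden illustration, not the package code: it deliberately swaps the random forest for a logistic regression (stats::glm) so that it runs in base R, and ordered_probs_sketch is a hypothetical name.

```r
# simulate a small ordered outcome with three classes
set.seed(123)
n <- 500
X <- cbind(X1 = rnorm(n), X2 = rbinom(n, 1, 0.5))
latent <- X[, "X1"] + X[, "X2"] + rlogis(n)
Y <- as.numeric(cut(latent, breaks = quantile(latent, c(0, 1/3, 2/3, 1)),
                    include.lowest = TRUE))

# Hypothetical helper: class probabilities from cumulative binary regressions.
# NOTE: glm() is a stand-in for the regression forest used by the real estimator.
ordered_probs_sketch <- function(X, Y) {
  classes <- sort(unique(Y))
  M <- length(classes)
  df <- as.data.frame(X)
  # P[Y <= m | X] for m = 1, ..., M - 1
  cum_probs <- sapply(classes[-M], function(m) {
    df$ind <- as.numeric(Y <= m)
    fit <- glm(ind ~ ., data = df, family = binomial())
    unname(fitted(fit))
  })
  # add the trivial bounds P[Y <= 0] = 0 and P[Y <= M] = 1
  cum_probs <- cbind(0, cum_probs, 1)
  # difference adjacent cumulative probabilities
  probs <- cum_probs[, -1, drop = FALSE] - cum_probs[, -(M + 1), drop = FALSE]
  # separately estimated curves may cross: truncate at zero and renormalize
  probs[probs < 0] <- 0
  probs <- probs / rowSums(probs)
  colnames(probs) <- paste0("class_", classes)
  probs
}

probs <- ordered_probs_sketch(X, Y)
head(round(probs, 3))
```

Because each cumulative probability is estimated separately, the differences are not guaranteed to be non-negative, which is why the truncation and normalization step is needed.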
Value
object of type orf with the following elements
forests |
saved forests trained for |
info |
info containing forest inputs and data used |
predictions |
predicted values for class probabilities |
variances |
variances of predicted values |
importance |
weighted measure of permutation based variable importance |
accuracy |
oob measures for mean squared error and ranked probability score |
Author(s)
Gabriel Okasa
References
Lechner, M., & Okasa, G. (2019). Random Forest Estimation of the Ordered Choice Model. arXiv preprint arXiv:1907.02436. https://arxiv.org/abs/1907.02436
Goller, D., Knaus, M. C., Lechner, M., & Okasa, G. (2021). Predicting Match Outcomes in Football by an Ordered Forest Estimator. A Modern Guide to Sports Economics. Edward Elgar Publishing, 335-355. doi:10.4337/9781789906530.00026
Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. doi:10.18637/jss.v077.i01.
See Also
summary.orf, plot.orf, predict.orf, margins.orf
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest with default parameters
orf_fit <- orf(X, Y)
# estimate Ordered Forest with own tuning parameters
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10)
# estimate Ordered Forest with bootstrapping and without honesty
orf_fit <- orf(X, Y, replace = TRUE, honesty = FALSE)
# estimate Ordered Forest with subsampling and with honesty
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE)
# estimate Ordered Forest with subsampling and with honesty
# with own tuning for subsample fraction and honesty fraction
orf_fit <- orf(X, Y, replace = FALSE, sample.fraction = 0.5,
               honesty = TRUE, honesty.fraction = 0.5)
# estimate Ordered Forest with subsampling and with honesty and with inference
# (for inference, subsampling and honesty are required)
orf_fit <- orf(X, Y, replace = FALSE, honesty = TRUE, inference = TRUE)
# estimate Ordered Forest with simple variable importance measure
orf_fit <- orf(X, Y, importance = TRUE)
# estimate Ordered Forest with all custom settings
orf_fit <- orf(X, Y, num.trees = 2000, mtry = 3, min.node.size = 10,
               replace = TRUE, sample.fraction = 1,
               honesty = FALSE, honesty.fraction = 0,
               inference = FALSE, importance = FALSE)
Plot of the Ordered Forest
Description
plot the probability distributions estimated by the Ordered Forest object of class orf
Usage
## S3 method for class 'orf'
plot(x, ...)
Arguments
x |
estimated Ordered Forest object of class orf |
... |
further arguments (currently ignored) |
Details
plot.orf generates probability distributions, i.e. density plots of the
ordered probabilities estimated by the Ordered Forest for each outcome class considered.
The plots contrast the estimated probability densities with the actually
observed ordered outcome class and thus allow a visual inspection of
the overall in-sample estimation accuracy. The dashed lines mark the means of
the respective probability distributions.
Author(s)
Gabriel Okasa
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# plot the estimated probability distributions
plot(orf_fit)
Predict honest predictions (C++)
Description
Computes honest predictions from the random forest for a test sample based on the honest training sample
Usage
pred_honest_C(x, y, z, w)
Arguments
x |
unique_leaves (List)[ntree] |
y |
honest_y (NumericVector)[nrow] |
z |
honest_leaves (NumericMatrix)[nrow, ntree] |
w |
test_leaves (NumericMatrix)[nrow, ntree] |
Predict ORF Variance
Description
predict variance of ordered random forest predictions based on honest sample splitting as described in Lechner (2018)
Usage
pred_orf_variance(honest_pred, honest_weights, Y_ind_honest)
Arguments
honest_pred |
list of vectors of honest forest predictions |
honest_weights |
list of n x n matrices of honest forest weights |
Y_ind_honest |
list of vectors of 0-1 outcomes for the honest sample |
Value
vector of ORF variances
Predict honest weights (C++)
Description
Computes honest weights from the random forest as in Wager & Athey (2018) for the test sample based on the honest training sample
Usage
pred_weights_C(x, y, z, w)
Arguments
x |
leaf_IDs_test - list of leaf IDs in test data |
y |
leaf_IDs - list of leaf IDs in honest data |
z |
leaf_size - list of leaf sizes in honest data |
w |
binary indicator - equal 1 if marginal effects are being computed, 0 otherwise for normal prediction |
Prediction of the Ordered Forest
Description
Prediction for new observations based on estimated Ordered Forest of class orf
Usage
## S3 method for class 'orf'
predict(object, newdata = NULL, type = NULL, inference = NULL, ...)
Arguments
object |
estimated Ordered Forest object of class orf |
newdata |
numeric matrix X containing the observations for which the outcomes should be predicted |
type |
string specifying the type of prediction: either "probs" or "p" for probabilities, or "class" or "c" for classes (default is "probs") |
inference |
logical, if TRUE variances for the predictions will be estimated (only feasible for probability predictions). |
... |
further arguments (currently ignored) |
Details
predict.orf estimates the conditional ordered choice probabilities,
i.e. P[Y=m|X=x], for new data points (matrix X containing new observations
of covariates) based on the estimated Ordered Forest object of class orf.
Furthermore, weight-based inference for the probability predictions can be
conducted as well. If inference is desired, the supplied Ordered Forest must be
estimated with honesty and subsampling. If only prediction is desired, estimation
without honesty and with bootstrapping is recommended for optimal prediction
performance. In addition to the probability predictions, class predictions can
be obtained using the type argument. In this case, the predicted
class is the class with the highest predicted probability.
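The step from probability predictions to class predictions can be illustrated in base R with max.col, which returns the column index of the largest entry in each row. The matrix toy_probs is made-up data, and breaking ties in favour of the lowest class is an assumption of this sketch.

```r
# toy matrix of predicted class probabilities (rows: observations, columns: classes)
toy_probs <- rbind(c(0.7, 0.2, 0.1),
                   c(0.1, 0.3, 0.6),
                   c(0.2, 0.5, 0.3))

# class with the highest predicted probability; ties go to the first (lowest) class
predicted_class <- max.col(toy_probs, ties.method = "first")
predicted_class   # 1 3 2
```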
Value
object of class orf.prediction with the following elements
info |
info containing forest inputs and data used |
predictions |
predicted values |
variances |
variances of predicted values |
Author(s)
Gabriel Okasa
See Also
summary.orf.prediction, print.orf.prediction
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates for train and test
idx <- sample(seq(1, nrow(odata), 1), 0.8*nrow(odata))
# train set
Y_train <- as.numeric(odata[idx, 1])
X_train <- as.matrix(odata[idx, -1])
# test set
Y_test <- as.numeric(odata[-idx, 1])
X_test <- as.matrix(odata[-idx, -1])
# estimate Ordered Forest
orf_fit <- orf(X_train, Y_train)
# predict the probabilities with the estimated orf
orf_pred <- predict(orf_fit, newdata = X_test)
# predict the probabilities with estimated orf together with variances
orf_pred <- predict(orf_fit, newdata = X_test, inference = TRUE)
# predict the classes with estimated orf
orf_pred <- predict(orf_fit, newdata = X_test, type = "class")
ORF Predictions for Marginal Effects
Description
Fast ORF Predictions for estimation of marginal effects at mean
Usage
predict_forest_preds_for_ME(forest, data, pred_data)
Arguments
forest |
list of ranger forest objects |
data |
list of n x n matrices of indicator data for orf |
pred_data |
list of prediction data (X_mean_up/down) |
Value
list of predictions
Predict Forest Weights
Description
predict forest weights, i.e. out-of-sample weights based on honest or train sample produced by the random forest algorithm as defined in Wager & Athey (2018)
Usage
predict_forest_weights(forest, data, pred_data)
Arguments
forest |
estimated forest object of type ranger |
data |
train (honest) dataframe |
pred_data |
prediction dataframe |
Value
matrix of honest forest weights
ORF Weight Predictions for Marginal Effects
Description
Fast ORF Weight Predictions for estimation of marginal effects at mean
Usage
predict_forest_weights_for_ME(forest, data, pred_data)
Arguments
forest |
list of ranger forest objects |
data |
list of n x n matrices of indicator data for orf |
pred_data |
list of prediction data (X_mean_up/down) |
Value
list of weights
Predict Honest Predictions
Description
predict honest prediction for out of sample data based on the honest sample
Usage
predict_honest(forest, honest_data, test_data)
Arguments
forest |
estimated forest object of type ranger |
honest_data |
honest dataframe |
test_data |
test dataframe |
Value
vector of honest forest predictions
Print of the Ordered Forest Marginal Effects
Description
print of estimated marginal effects of the Ordered Forest of class margins.orf
Usage
## S3 method for class 'margins.orf'
print(x, ...)
Arguments
x |
estimated Ordered Forest Marginal Effect object of type margins.orf |
... |
further arguments (currently ignored) |
Details
print.margins.orf provides a first glimpse of the Ordered Forest
marginal effects, printed directly to the R console. The printed information
contains the results for the marginal effects for each covariate and each outcome class.
Author(s)
Gabriel Okasa
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# estimate marginal effects of the orf
orf_margins <- margins(orf_fit)
# print marginal effects
print(orf_margins)
Print of the Ordered Forest
Description
print of an estimated Ordered Forest object of class orf
Usage
## S3 method for class 'orf'
print(x, ...)
Arguments
x |
estimated Ordered Forest object of class orf |
... |
further arguments (currently ignored) |
Details
print.orf provides a first glimpse of the Ordered Forest estimation,
printed directly to the R console. The printed information contains
the main inputs of the orf function.
Author(s)
Gabriel Okasa
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# print output of the orf estimation
print(orf_fit)
Print of the Ordered Forest Prediction
Description
print of Ordered Forest predictions of class orf.prediction
Usage
## S3 method for class 'orf.prediction'
print(x, ...)
Arguments
x |
predicted Ordered Forest object of class orf.prediction |
... |
further arguments (currently ignored) |
Details
print.orf.prediction provides a first glimpse of the Ordered Forest
prediction, printed directly to the R console. The printed information
contains the main inputs of the predict.orf function.
Author(s)
Gabriel Okasa
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates for train and test
idx <- sample(seq(1, nrow(odata), 1), 0.8*nrow(odata))
# train set
Y_train <- as.numeric(odata[idx, 1])
X_train <- as.matrix(odata[idx, -1])
# test set
Y_test <- as.numeric(odata[-idx, 1])
X_test <- as.matrix(odata[-idx, -1])
# estimate Ordered Forest
orf_fit <- orf(X_train, Y_train)
# predict the probabilities with the estimated orf
orf_pred <- predict(orf_fit, newdata = X_test)
# print the prediction object
print(orf_pred)
Ranked Probability Score
Description
Computes the mean ranked probability score (RPS) for evaluating the accuracy of ordered probability predictions
Usage
rps(predictions, observed)
Arguments
predictions |
matrix of predictions (n x categories) |
observed |
vector of observed ordered categorical outcomes (n x 1) |
Value
scalar, mean RPS for given predictions
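The RPS compares cumulative predicted probabilities with the cumulative 0-1 indicators of the observed class, so that predictions placing mass far from the observed class are penalized more than near misses. The base-R sketch below assumes the common normalization by the number of classes minus one and classes coded 1, ..., M; rps_sketch is a hypothetical name and the package's scaling may differ.

```r
# Hypothetical helper: mean ranked probability score for ordered outcomes.
rps_sketch <- function(predictions, observed) {
  n_classes <- ncol(predictions)
  # n x M matrix of 0-1 indicators for the observed class
  indicators <- outer(observed, seq_len(n_classes), "==") * 1
  # cumulative distributions across the ordered classes (row-wise)
  cum_pred <- t(apply(predictions, 1, cumsum))
  cum_obs <- t(apply(indicators, 1, cumsum))
  # squared distance between the cumulative distributions, averaged over observations
  mean(rowSums((cum_pred - cum_obs)^2) / (n_classes - 1))
}

toy_preds <- rbind(c(0.7, 0.2, 0.1),
                   c(0.1, 0.3, 0.6))
toy_obs <- c(1, 3)
toy_rps <- rps_sketch(toy_preds, toy_obs)
toy_rps   # (0.05 + 0.085) / 2 = 0.0675
```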
Summary of the Ordered Forest Marginal Effects
Description
summary of estimated marginal effects of the Ordered Forest of class margins.orf
Usage
## S3 method for class 'margins.orf'
summary(object, latex = FALSE, ...)
Arguments
object |
estimated Ordered Forest Marginal Effect object of type margins.orf |
latex |
logical, if TRUE latex coded summary will be generated (default is FALSE) |
... |
further arguments (currently ignored) |
Details
summary.margins.orf provides estimation results of the Ordered Forest
marginal effects. The summary contains the results for the marginal effects
for each covariate and each outcome class, optionally with inference.
Furthermore, the summary can be generated as a LaTeX table so that the results
can be included directly in a document.
Author(s)
Gabriel Okasa
Examples
## Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# estimate marginal effects of the orf
orf_margins <- margins(orf_fit)
# summary of marginal effects
summary(orf_margins)
# summary of marginal effects coded in LaTeX
summary(orf_margins, latex = TRUE)
Summary of the Ordered Forest
Description
summary of an estimated Ordered Forest object of class orf
Usage
## S3 method for class 'orf'
summary(object, latex = FALSE, ...)
Arguments
object |
estimated Ordered Forest object of class orf |
latex |
logical, if TRUE latex coded summary will be generated (default is FALSE) |
... |
further arguments (currently ignored) |
Details
summary.orf provides a short summary of the Ordered Forest estimation,
including the input information regarding the values of hyperparameters as
well as the output information regarding the prediction accuracy.
Author(s)
Gabriel Okasa
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates
Y <- as.numeric(odata[, 1])
X <- as.matrix(odata[, -1])
# estimate Ordered Forest
orf_fit <- orf(X, Y)
# show summary of the orf estimation
summary(orf_fit)
# show summary of the orf estimation coded in LaTeX
summary(orf_fit, latex = TRUE)
Summary of the Ordered Forest Prediction
Description
summary of Ordered Forest predictions of class orf.prediction
Usage
## S3 method for class 'orf.prediction'
summary(object, latex = FALSE, ...)
Arguments
object |
predicted Ordered Forest object of class orf.prediction |
latex |
logical, if TRUE latex coded summary will be generated (default is FALSE) |
... |
further arguments (currently ignored) |
Details
summary.orf.prediction provides a summary of the Ordered Forest
prediction, including the input information regarding the values of hyperparameters
as well as the inputs of the predict.orf function.
Author(s)
Gabriel Okasa
Examples
# Ordered Forest
require(orf)
# load example data
data(odata)
# specify response and covariates for train and test
idx <- sample(seq(1, nrow(odata), 1), 0.8*nrow(odata))
# train set
Y_train <- as.numeric(odata[idx, 1])
X_train <- as.matrix(odata[idx, -1])
# test set
Y_test <- as.numeric(odata[-idx, 1])
X_test <- as.matrix(odata[-idx, -1])
# estimate Ordered Forest
orf_fit <- orf(X_train, Y_train)
# predict the probabilities with the estimated orf
orf_pred <- predict(orf_fit, newdata = X_test)
# summary of the prediction object
summary(orf_pred)
# show summary of the orf prediction coded in LaTeX
summary(orf_pred, latex = TRUE)