Type: Package
Title: Cram Method for Efficient Simultaneous Learning and Evaluation
Version: 0.1.0
Date: 2025-05-11
Maintainer: Yanis Vandecasteele <yanisvdc.ensae@gmail.com>
Description: Performs the Cram method, a general and efficient approach to simultaneous learning and evaluation using a generic machine learning algorithm. In a single pass of batched data, the proposed method repeatedly trains a machine learning algorithm and tests its empirical performance. Because it utilizes the entire sample for both learning and evaluation, cramming is significantly more data-efficient than sample-splitting. Unlike cross-validation, Cram evaluates the final learned model directly, providing sharper inference aligned with real-world deployment. The method naturally applies to both policy learning and contextual bandits, where decisions are based on individual features to maximize outcomes. The package includes cram_policy() for learning and evaluating individualized binary treatment rules, cram_ml() to train and assess the population-level performance of machine learning models, and cram_bandit() for on-policy evaluation of contextual bandit algorithms. For all three functions, the package provides estimates of the average outcome that would result if the model were deployed, along with standard errors and confidence intervals for these estimates. Details of the method are described in Jia, Imai, and Li (2024) https://www.hbs.edu/ris/Publication%20Files/2403.07031v1_a83462e0-145b-4675-99d5-9754aa65d786.pdf and Jia et al. (2025) <doi:10.48550/arXiv.2403.07031>.
License: GPL-3
URL: https://github.com/yanisvdc/cramR, https://yanisvdc.github.io/cramR/
BugReports: https://github.com/yanisvdc/cramR/issues
Depends: R (≥ 3.5.0)
Imports: caret (≥ 7.0-1), grf (≥ 2.4.0), glmnet (≥ 4.1.8), stats (≥ 4.3.3), magrittr (≥ 2.0.3), doParallel (≥ 1.0.17), foreach (≥ 1.5.2), DT (≥ 0.33), data.table (≥ 1.16.4), keras (≥ 2.15.0), dplyr (≥ 1.1.4), purrr, R6, rjson, R.devices, itertools, iterators
Suggests: testthat (≥ 3.0.0), covr (≥ 3.5.1), kableExtra (≥ 1.4.0), profvis (≥ 0.4.0), devtools, waldo, knitr, rmarkdown, randomForest, gbm, nnet, withr
Encoding: UTF-8
RoxygenNote: 7.3.2
VignetteBuilder: knitr, rmarkdown
Config/testthat/edition: 3
Config/Needs/citation: yes
NeedsCompilation: no
Packaged: 2025-05-11 23:19:56 UTC; yanis
Author: Yanis Vandecasteele [cre, aut], Michael Lingzhi Li [ctb], Kosuke Imai [ctb], Zeyang Jia [ctb], Longlin Wang [ctb]
Repository: CRAN
Date/Publication: 2025-05-14 08:00:02 UTC

Batch Contextual Epsilon-Greedy Policy

Description

Batch Contextual Epsilon-Greedy Policy

Details

Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.

Super class

cramR::NA

Public fields

epsilon

Probability of selecting a random arm (exploration rate).

batch_size

Number of rounds per batch before updating model parameters.

A_cc

List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.

b_cc

List of reward-weighted context sums (one per arm), updated batch-wise.

class_name

Internal class name identifier.

Methods

Public methods

Inherited methods

Method new()

Constructor for the Batch Epsilon-Greedy policy.

Usage
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
Arguments
epsilon

Numeric between 0 and 1. Probability of random arm selection.

batch_size

Integer. Number of observations between parameter updates.


Method set_parameters()

Initializes the parameter structures for each arm.

Usage
BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)
Arguments
context_params

A list with at least 'd' (number of features) and 'k' (number of arms).


Method get_action()

Chooses an arm based on epsilon-greedy logic and the current estimates.

Usage
BatchContextualEpsilonGreedyPolicy$get_action(t, context)
Arguments
t

Integer time step.

context

A list with contextual features and arm count.

Returns

A list with the selected action.


Method set_reward()

Updates model statistics based on observed reward. Updates occur once per batch.

Usage
BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)
Arguments
t

Integer time step.

context

List of contextual features used for the action.

action

A list with the chosen arm.

reward

A list with the observed reward.

Returns

Updated parameter estimates.


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Batch Contextual Thompson Sampling Policy

Description

Batch Contextual Thompson Sampling Policy

Details

Implements Thompson Sampling for linear contextual bandits with batch updates.

Methods

- 'initialize(v = 0.2, batch_size = 1)': Constructor, sets variance and batch size.
- 'set_parameters(context_params)': Initializes arm-level matrices.
- 'get_action(t, context)': Samples from the posterior and selects action.
- 'set_reward(t, context, action, reward)': Updates posterior statistics using observed feedback.

Super class

cramR::NA

Public fields

sigma

Numeric, posterior variance scale parameter.

batch_size

Integer, size of mini-batches before parameter updates.

A_cc

List of accumulated Gram matrices per arm.

b_cc

List of reward-weighted context sums per arm.

class_name

Internal name of the class.

Methods

Public methods

Inherited methods

Method new()

Constructor for the batch-based Thompson Sampling policy.

Usage
BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)
Arguments
v

Numeric. Standard deviation scaling parameter for posterior sampling.

batch_size

Integer. Number of rounds before parameters are updated.


Method set_parameters()

Initializes per-arm sufficient statistics.

Usage
BatchContextualLinTSPolicy$set_parameters(context_params)
Arguments
context_params

List with entries: 'unique' (feature vector), 'k' (number of arms).


Method get_action()

Samples from the posterior distribution of expected rewards and selects an action.

Usage
BatchContextualLinTSPolicy$get_action(t, context)
Arguments
t

Integer. Time step.

context

List containing the current context and arm information.

Returns

A list with the chosen arm ('choice').


Method set_reward()

Updates Gram matrix and response vector for the chosen arm. Parameters are refreshed every 'batch_size' rounds.

Usage
BatchContextualLinTSPolicy$set_reward(t, context, action, reward)
Arguments
t

Integer. Time step.

context

Context object containing feature info.

action

Chosen action (arm index).

reward

Observed reward for the action.

Returns

Updated internal parameters.


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchContextualLinTSPolicy$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Batch Disjoint LinUCB Policy with Epsilon-Greedy

Description

Batch Disjoint LinUCB Policy with Epsilon-Greedy

Details

Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration, using batched updates.

Methods

- 'initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)': Constructor.
- 'set_parameters(context_params)': Initializes sufficient statistics for each arm.
- 'get_action(t, context)': Selects an arm using UCB scores and epsilon-greedy rule.
- 'set_reward(t, context, action, reward)': Updates statistics and refreshes model at batch intervals.

Super class

cramR::NA

Public fields

alpha

Numeric, UCB exploration strength parameter.

epsilon

Numeric, probability of taking a random exploratory action.

batch_size

Integer, number of rounds per batch update.

A_cc

List of Gram matrices per arm, accumulated across batch.

b_cc

List of reward-weighted context vectors per arm.

class_name

Internal class name identifier.

Methods

Public methods

Inherited methods

Method new()

Constructor for batched LinUCB with epsilon-greedy exploration.

Usage
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
Arguments
alpha

Numeric. UCB width parameter (exploration strength).

epsilon

Numeric. Probability of selecting a random arm.

batch_size

Integer. Number of rounds before updating parameters.


Method set_parameters()

Initialize arm-specific parameter containers.

Usage
BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params

List containing at least 'unique' (feature size) and 'k' (number of arms).


Method get_action()

Chooses an arm based on UCB and epsilon-greedy sampling.

Usage
BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t

Integer timestep.

context

List containing the context for the decision.

Returns

A list with the selected action.


Method set_reward()

Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.

Usage
BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t

Integer timestep.

context

Context object used for decision-making.

action

List containing the chosen action.

reward

List containing the observed reward.

Returns

Updated internal model parameters.


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Contextual Linear Bandit Environment

Description

Contextual Linear Bandit Environment

Details

An R6 class for simulating a contextual linear bandit environment with normally distributed rewards.

Methods

- 'initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)': Constructor.
- 'post_initialization()': Loads correct coefficients based on 'sim_id'.
- 'get_context(t)': Returns context and sets internal reward vector.
- 'get_reward(t, context_common, action)': Returns observed reward for an action.

Super class

cramR::NA -> ContextualLinearBandit

Public fields

rewards

A vector of rewards for each arm in the current round.

betas

Coefficient matrix of the linear reward model (one column per arm).

sigma

Standard deviation of the Gaussian noise added to rewards.

binary

Logical, indicating whether to convert rewards into binary outcomes.

weights

The latent reward scores before noise and/or binarization.

list_betas

A list of coefficient matrices, one per simulation.

sim_id

Index for selecting which simulation's coefficients to use.

class_name

Name of the class for internal tracking.

Methods

Public methods

Inherited methods

Method new()

Usage
ContextualLinearBandit$new(
  k,
  d,
  list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)
Arguments
k

Number of arms

d

Number of features

list_betas

A list of true beta matrices for each simulation

sigma

Standard deviation of Gaussian noise

binary_rewards

Logical, use binary rewards or not


Method post_initialization()

Set the simulation-specific coefficients for the current simulation.

Usage
ContextualLinearBandit$post_initialization()
Returns

No return value; modifies the internal state of the object.


Method get_context()

Usage
ContextualLinearBandit$get_context(t)
Arguments
t

Current time step

Returns

A list containing context vector 'X' and arm count 'k'


Method get_reward()

Usage
ContextualLinearBandit$get_reward(t, context_common, action)
Arguments
t

Current time step

context_common

Context shared across arms

action

Action taken by the policy

Returns

A list with reward and optimal arm/reward info


Method clone()

The objects of this class are cloneable with this method.

Usage
ContextualLinearBandit$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


LinUCB Disjoint Policy with Epsilon-Greedy Exploration

Description

LinUCB Disjoint Policy with Epsilon-Greedy Exploration

Details

Implements the disjoint LinUCB algorithm with upper confidence bounds and epsilon-greedy exploration.

Methods

- 'initialize(alpha = 1.0, epsilon = 0.1)': Create a new LinUCBDisjointPolicyEpsilon object.
- 'set_parameters(context_params)': Initialize arm-level parameters.
- 'get_action(t, context)': Selects an arm using epsilon-greedy UCB.
- 'set_reward(t, context, action, reward)': Updates internal statistics based on observed reward.

Super class

cramR::NA

Public fields

alpha

Numeric, exploration parameter controlling the width of the confidence bound.

epsilon

Numeric, probability of selecting a random action (exploration).

class_name

Internal class name.

Methods

Public methods

Inherited methods

Method new()

Initializes the policy with UCB parameter alpha and exploration rate epsilon.

Usage
LinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1)
Arguments
alpha

Numeric. Controls width of the UCB bonus.

epsilon

Numeric between 0 and 1. Probability of random action selection.


Method set_parameters()

Set arm-specific parameter structures.

Usage
LinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params

A list with context information, typically including the number of unique features.


Method get_action()

Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).

Usage
LinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t

Integer time step.

context

A list with contextual features and number of arms.

Returns

A list containing the selected action.


Method set_reward()

Updates internal statistics using the observed reward for the selected arm.

Usage
LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t

Integer time step.

context

Contextual features for all arms at time t.

action

A list containing the chosen arm.

reward

A list containing the observed reward for the selected arm.

Returns

Updated internal parameters.


Method clone()

The objects of this class are cloneable with this method.

Usage
LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


Cram Bandit: On-policy Statistical Evaluation in Contextual Bandits

Description

Performs the Cram method for On-policy Statistical Evaluation in Contextual Bandits

Usage

cram_bandit(pi, arm, reward, batch = 1, alpha = 0.05)

Arguments

pi

An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, and T × B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (the arm actually chosen for X_j in the history) to context X_j. Please see the vignette for more details.

arm

A vector of length T x B indicating which arm was selected in each context

reward

A vector of observed rewards of length T x B

batch

(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1.

alpha

Significance level for confidence intervals for calculating the empirical coverage. Default is 0.05 (95% confidence).

Value

A list containing:

raw_results

A data frame summarizing key metrics: Empirical Bias on Policy Value, Average relative error on Policy Value, RMSE using relative errors on Policy Value, Empirical Coverage of Confidence Intervals.

interactive_table

An interactive table summarizing the same key metrics in a user-friendly interface.

Examples

# Example with batch size of 1

# Set random seed for reproducibility
set.seed(42)

# Define parameters
T <- 100  # Number of timesteps
K <- 4    # Number of arms

# Simulate a 3D array pi of shape (T, T, K)
# - First dimension: Individuals (context Xj)
# - Second dimension: Time steps (pi_t)
# - Third dimension: Arms (depth)
pi <- array(runif(T * T * K, 0.1, 1), dim = c(T, T, K))

# Normalize probabilities so that each row sums to 1 across arms
for (t in 1:T) {
  for (j in 1:T) {
    pi[j, t, ] <- pi[j, t, ] / sum(pi[j, t, ])
  }
}

# Simulate arm selections (randomly choosing an arm)
arm <- sample(1:K, T, replace = TRUE)

# Simulate rewards (assume normally distributed rewards)
reward <- rnorm(T, mean = 1, sd = 0.5)

result <- cram_bandit(pi, arm, reward, batch=1, alpha=0.05)
result$raw_results
result$interactive_table


Cram Bandit Policy Value Estimate

Description

This function computes the Cram policy value estimate for on-policy evaluation of contextual bandit algorithms.

Usage

cram_bandit_est(pi, reward, arm, batch = 1)

Arguments

pi

An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, and T × B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (the arm actually chosen for X_j in the history) to context X_j. Please see the vignette for more details.

reward

A vector of observed rewards of length T x B

arm

A vector of length T x B indicating which arm was selected in each context

batch

(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1.

Value

The estimated policy value.
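
For illustration, here is a minimal sketch of a call, reusing the kind of simulated inputs shown in the cram_bandit() example above (if the function is not exported in your installed version, it can be accessed as cramR:::cram_bandit_est):

set.seed(42)
T <- 100  # Number of policy updates (batch size B = 1)
K <- 4    # Number of arms

# 3D probability array of shape (T, T, K), normalized across arms
pi <- array(runif(T * T * K, 0.1, 1), dim = c(T, T, K))
for (t in 1:T) {
  for (j in 1:T) {
    pi[j, t, ] <- pi[j, t, ] / sum(pi[j, t, ])
  }
}

arm <- sample(1:K, T, replace = TRUE)   # arms actually chosen
reward <- rnorm(T, mean = 1, sd = 0.5)  # observed rewards

estimate <- cram_bandit_est(pi, reward, arm, batch = 1)
estimate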


Cram Bandit Simulation

Description

This function runs on-policy simulation for contextual bandit algorithms using the Cram method. It evaluates the statistical properties of policy value estimates.

Usage

cram_bandit_sim(
  horizon,
  simulations,
  bandit,
  policy,
  alpha = 0.05,
  do_parallel = FALSE,
  seed = 42
)

Arguments

horizon

An integer specifying the number of timesteps (rounds) per simulation.

simulations

An integer specifying the number of independent Monte Carlo simulations to perform.

bandit

A contextual bandit environment object that generates contexts (feature vectors) and observed rewards for each arm chosen.

policy

A policy object that takes in a context and selects an arm (action) at each timestep.

alpha

Significance level for confidence intervals for calculating the empirical coverage. Default is 0.05 (95% confidence).

do_parallel

Whether to parallelize the simulations. Defaults to FALSE. We recommend keeping it FALSE unless necessary; please see the vignette.

seed

An optional integer to set the random seed for reproducibility. If NULL, no seed is set.

Value

A list containing:

estimates

A table containing the detailed history of estimates and errors for each simulation.

raw_results

A data frame summarizing key metrics: Empirical Bias on Policy Value, Average relative error on Policy Value, RMSE using relative errors on Policy Value, Empirical Coverage of Confidence Intervals.

interactive_table

An interactive table summarizing the same key metrics in a user-friendly interface.

Examples


# Number of time steps
horizon       <- 500L

# Number of simulations
simulations   <- 100L

# Number of arms
k <- 4

# Number of context features
d <- 3

# Reward beta parameters of linear model (the outcome generation models,
# one for each arm, are linear with arm-specific parameters betas)
list_betas <- cramR::get_betas(simulations, d, k)

# Define the contextual linear bandit, where sigma is the scale
# of the noise in the outcome linear model
bandit        <- cramR::ContextualLinearBandit$new(k = k,
                                                    d = d,
                                                    list_betas = list_betas,
                                                    sigma = 0.3)

# Define the policy object (choose between Contextual Epsilon Greedy,
# UCB Disjoint and Thompson Sampling)
policy <- cramR::BatchContextualEpsilonGreedyPolicy$new(epsilon=0.1,
                                                         batch_size=5)
# policy <- cramR::BatchLinUCBDisjointPolicyEpsilon$new(alpha=1.0,epsilon=0.1,batch_size=1)
# policy <- cramR::BatchContextualLinTSPolicy$new(v = 0.1, batch_size=1)


sim <- cram_bandit_sim(horizon, simulations,
                       bandit, policy,
                       alpha=0.05, do_parallel = FALSE)
sim$raw_results
sim$interactive_table



Cram Bandit Variance of the Policy Value Estimate

Description

This function implements the crammed variance estimate of the policy value estimate for the contextual armed bandit on-policy evaluation setting.

Usage

cram_bandit_var(pi, reward, arm, batch = 1)

Arguments

pi

An array of shape (T × B, T, K) or (T × B, T), where T is the number of learning steps (or policy updates), B is the batch size, K is the number of arms, and T × B is the total number of contexts. If 3D, pi[j, t, a] gives the probability that the policy pi_t assigns arm a to context X_j. If 2D, pi[j, t] gives the probability that the policy pi_t assigns arm A_j (the arm actually chosen for X_j in the history) to context X_j. Please see the vignette for more details.

reward

A vector of observed rewards of length T x B

arm

A vector of length T x B indicating which arm was selected in each context

batch

(Optional) A vector or integer. If a vector, gives the batch assignment for each context. If an integer, interpreted as the batch size and contexts are assigned to a batch in the order of the dataset. Default is 1.

Value

The crammed variance estimate of the policy value estimate.
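
As an illustrative sketch, the same simulated pi, arm, and reward constructed in the cram_bandit_est() sketch above can be passed here to obtain the companion variance estimate (again via cramR:::cram_bandit_var if the function is not exported):

variance <- cram_bandit_var(pi, reward, arm, batch = 1)
variance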


Cram Policy Estimator for Policy Value Difference (Delta)

Description

This function returns the cram policy estimator for the policy value difference (delta).

Usage

cram_estimator(X, Y, D, pi, batch_indices, propensity = NULL)

Arguments

X

A matrix or data frame of covariates for each sample.

Y

A vector of outcomes for the n individuals.

D

A vector of binary treatments for the n individuals.

pi

A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy.

batch_indices

A list where each element is a vector of indices corresponding to the individuals in each batch.

propensity

The propensity score function

Value

The estimated policy value difference (Delta).
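
A minimal sketch based on the documented input shapes (the pi layout and batch_indices structure come from the argument descriptions above; treat this as illustrative rather than canonical usage, and use cramR:::cram_estimator if the function is not exported):

set.seed(123)
n <- 100
nb_batch <- 5

X <- matrix(rnorm(n * 3), nrow = n)
D <- rbinom(n, 1, 0.5)
Y <- rnorm(n)

# n x (nb_batch + 1) matrix of binary policy assignments;
# the first column is the baseline policy (all zeros here)
pi <- cbind(0, matrix(rbinom(n * nb_batch, 1, 0.5), nrow = n))

# One vector of row indices per batch
batch_indices <- split(1:n, rep(1:nb_batch, each = n / nb_batch))

delta_hat <- cram_estimator(X, Y, D, pi, batch_indices)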


Cram ML Expected Loss Estimate

Description

This function computes the Cram ML expected loss estimator based on the given loss matrix and batch indices.

Usage

cram_expected_loss(loss, batch_indices)

Arguments

loss

A matrix of loss values with N rows (data points) and nb_batch + 1 columns. The first column is assumed to contain only zeros; the following nb_batch columns contain the losses of each trained model for each individual.

batch_indices

A list where each element is a vector of indices corresponding to a batch.

Value

The Cram ML expected loss estimate
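
A small sketch illustrating the documented loss-matrix layout (first column all zeros, one subsequent column per batch-trained model); the numbers are arbitrary:

set.seed(1)
n <- 60
nb_batch <- 3

# n x (nb_batch + 1) loss matrix: column 1 is zeros (baseline),
# columns 2 to nb_batch + 1 hold per-individual losses of each trained model
loss <- cbind(0, matrix(rexp(n * nb_batch), nrow = n))

batch_indices <- split(1:n, rep(1:nb_batch, each = n / nb_batch))

cram_expected_loss(loss, batch_indices)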


Cram Policy Learning

Description

This function performs the learning part of the Cram Policy method.

Usage

cram_learning(
  X,
  D,
  Y,
  batch,
  model_type = "causal_forest",
  learner_type = "ridge",
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  n_cores = detectCores() - 1,
  propensity = NULL
)

Arguments

X

A matrix or data frame of covariates for each sample.

D

A vector of binary treatment indicators (1 for treated, 0 for untreated).

Y

A vector of outcome values for each sample.

batch

Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

baseline_policy

A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, the baseline policy defaults to a list of zeros with the same length as the number of rows in X.

parallelize_batch

Logical. Whether to parallelize batch processing. The Cram method learns T policies, where T is the number of batches; these are learned in parallel when parallelize_batch is TRUE, and sequentially (using an efficient data.table structure, recommended for lightweight training) when parallelize_batch is FALSE. Defaults to FALSE.

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL.

custom_fit

A custom, user-defined function that outputs a fitted model given training data (allows flexibility). Defaults to NULL.

custom_predict

A custom, user-defined function for making predictions given a fitted model and test data (allows flexibility). Defaults to NULL.

n_cores

Number of cores to use for parallelization when parallelize_batch is set to TRUE. Defaults to detectCores() - 1.

propensity

The propensity score function for the binary treatment indicator (D), i.e. the probability for each unit to receive treatment. Defaults to 0.5 (random assignment).

Value

A list containing:

final_policy_model

The final fitted policy model, depending on model_type and learner_type.

policies

A matrix of learned policies, where each column represents a batch's learned policy and the first column is the baseline policy.

batch_indices

The indices for each batch, either as generated (if batch is an integer) or as provided by the user.

See Also

causal_forest, cv.glmnet, keras_model_sequential
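
As a sketch, cram_learning() accepts the same kind of inputs as cram_policy() (see the cram_policy() example later in this manual); the call below assumes the default causal forest model:

set.seed(42)
X_data <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
D_data <- as.integer(sample(c(0, 1), 100, replace = TRUE))
Y_data <- rnorm(100)

res <- cram_learning(X = X_data, D = D_data, Y = Y_data, batch = 5)

res$final_policy_model  # final fitted policy model
dim(res$policies)       # 100 x 6: baseline policy plus one column per batch
res$batch_indices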


Cram ML: Simultaneous Machine Learning and Evaluation

Description

Performs the Cram method for simultaneous machine learning and evaluation.

Usage

cram_ml(
  data,
  batch,
  formula = NULL,
  caret_params = NULL,
  parallelize_batch = FALSE,
  loss_name = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  alpha = 0.05,
  classify = FALSE
)

Arguments

data

A matrix or data frame of covariates. For supervised learning, must include the target variable specified in formula.

batch

Integer specifying number of batches or vector of pre-defined batch assignments.

formula

Formula for supervised learning (e.g., y ~ .).

caret_params

List of parameters for caret::train() containing:

  • method: Model type (e.g., "rf", "glm", "xgbTree" for supervised learning)

  • Additional method-specific parameters

parallelize_batch

Logical indicating whether to parallelize batch processing (default = FALSE).

loss_name

Name of loss metric (supported: "se", "logloss", "accuracy").

custom_fit

Optional custom model training function.

custom_predict

Optional custom prediction function.

custom_loss

Optional custom loss function.

alpha

Significance level for confidence intervals. Default is 0.05 (95% confidence).

classify

Indicate if this is a classification problem. Defaults to FALSE.

Value

A list containing the evaluation results; as illustrated in the example below, this includes raw_results (a data frame of key metrics), interactive_table (an interactive summary table), and final_ml_model (the final trained ML model).

See Also

train for model training parameters

Examples

# Load necessary libraries
library(caret)

# Set seed for reproducibility
set.seed(42)

# Generate example dataset
X_data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
Y_data <- rnorm(100)  # Continuous target variable for regression
data_df <- data.frame(X_data, Y = Y_data)  # Ensure target variable is included

# Define caret parameters for simple linear regression (no cross-validation)
caret_params_lm <- list(
  method = "lm",
  trControl = trainControl(method = "none")
)

nb_batch <- 5

# Run ML learning function
result <- cram_ml(
  data = data_df,
  formula = Y ~ .,  # Linear regression model
  batch = nb_batch,
  loss_name = 'se',
  caret_params = caret_params_lm
)

result$raw_results
result$interactive_table
result$final_ml_model

Cram Policy: Efficient Simultaneous Policy Learning and Evaluation

Description

This function performs the cram method (simultaneous policy learning and evaluation) for binary policies on data including covariates (X), binary treatment indicator (D) and outcomes (Y).

Usage

cram_policy(
  X,
  D,
  Y,
  batch,
  model_type = "causal_forest",
  learner_type = "ridge",
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  alpha = 0.05,
  propensity = NULL
)

Arguments

X

A matrix or data frame of covariates for each sample.

D

A vector of binary treatment indicators (1 for treated, 0 for non-treated).

Y

A vector of outcome values for each sample.

batch

Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

baseline_policy

A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, defaults to a list of zeros with the same length as the number of rows in X.

parallelize_batch

Logical. Whether to parallelize batch processing. The Cram method learns T policies, where T is the number of batches; these are learned in parallel when parallelize_batch is TRUE, and sequentially (using an efficient data.table structure, recommended for lightweight training) when parallelize_batch is FALSE. Defaults to FALSE.

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL.

custom_fit

A custom, user-defined function that outputs a fitted model given training data (allows flexibility). Defaults to NULL.

custom_predict

A custom, user-defined function for making predictions given a fitted model and test data (allows flexibility). Defaults to NULL.

alpha

Significance level for confidence intervals. Default is 0.05 (95% confidence).

propensity

The propensity score function for binary treatment indicator (D) (probability for each unit to receive treatment). Defaults to 0.5 (random assignment).

Value

A list containing the evaluation results; as illustrated in the example below, this includes raw_results (a data frame of key metrics), interactive_table (an interactive summary table), and final_policy_model (the final fitted policy model).

See Also

causal_forest, cv.glmnet, keras_model_sequential

Examples

# Example data
X_data <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
D_data <- as.integer(sample(c(0, 1), 100, replace = TRUE))
Y_data <- rnorm(100)
nb_batch <- 5

# Perform CRAM policy
result <- cram_policy(X = X_data,
                      D = D_data,
                      Y = Y_data,
                      batch = nb_batch)

# Access results
result$raw_results
result$interactive_table
result$final_policy_model

Cram Policy: Estimator for Policy Value (Psi)

Description

This function returns the cram estimator for the policy value (psi).

Usage

cram_policy_value_estimator(X, Y, D, pi, batch_indices, propensity = NULL)

Arguments

X

A matrix or data frame of covariates for each sample.

Y

A vector of outcomes for the n individuals.

D

A vector of binary treatments for the n individuals.

pi

A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy.

batch_indices

A list where each element is a vector of indices corresponding to the individuals in each batch.

propensity

Propensity score function

Value

The estimated policy value.
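
For illustration, this estimator takes the same inputs as cram_estimator(); reusing the X, Y, D, pi, and batch_indices from the cram_estimator() sketch earlier in this manual:

psi_hat <- cram_policy_value_estimator(X, Y, D, pi, batch_indices)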


Cram Policy Simulation

Description

This function performs the cram method (simultaneous learning and evaluation) on simulation data, for which the data generation process (DGP) is known. The data generation process for X can be given directly as a function or induced by a provided dataset via row-wise bootstrapping. Results are averaged across Monte Carlo replicates for the given DGP.

Usage

cram_simulation(
  X = NULL,
  dgp_X = NULL,
  dgp_D,
  dgp_Y,
  batch,
  nb_simulations,
  nb_simulations_truth = NULL,
  sample_size,
  model_type = "causal_forest",
  learner_type = "ridge",
  alpha = 0.05,
  baseline_policy = NULL,
  parallelize_batch = FALSE,
  model_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  propensity = NULL
)

Arguments

X

Optional. A matrix or data frame of covariates for each sample inducing empirically the DGP for covariates.

dgp_X

Optional. A function to generate covariate data for simulations.

dgp_D

A vectorized function to generate binary treatment assignments for each sample.

dgp_Y

A vectorized function to generate the outcome variable for each sample given the treatment and covariates.

batch

Either an integer specifying the number of batches (which will be created by random sampling) or a vector of length equal to the sample size providing the batch assignment (index) for each individual in the sample.

nb_simulations

The number of simulations (Monte Carlo replicates) to run.

nb_simulations_truth

Optional. The number of additional simulations (Monte Carlo replicates), beyond nb_simulations, to use when calculating the true policy value difference (delta) and the true policy value (psi).

sample_size

The number of samples in each simulation.

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

alpha

Significance level for confidence intervals. Default is 0.05 (95% confidence).

baseline_policy

A list providing the baseline policy (binary 0 or 1) for each sample. If NULL, defaults to a list of zeros with the same length as the number of rows in X.

parallelize_batch

Logical. Whether to parallelize batch processing. The Cram method learns T policies, where T is the number of batches; these are learned in parallel when parallelize_batch is TRUE, and sequentially (using an efficient data.table structure, recommended for lightweight training) when parallelize_batch is FALSE. Defaults to FALSE.

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL.

custom_fit

A custom, user-defined function that outputs a fitted model given training data (allows flexibility). Defaults to NULL.

custom_predict

A custom, user-defined function for making predictions given a fitted model and test data (allows flexibility). Defaults to NULL.

propensity

The propensity score model

Value

A list containing:

avg_proportion_treated

The average proportion of treated individuals across simulations.

avg_delta_estimate

The average delta estimate across simulations.

avg_delta_standard_error

The average standard error of delta estimates.

delta_empirical_bias

The empirical bias of delta estimates.

delta_empirical_coverage

The empirical coverage of delta confidence intervals.

avg_policy_value_estimate

The average policy value estimate across simulations.

avg_policy_value_standard_error

The average standard error of policy value estimates.

policy_value_empirical_bias

The empirical bias of policy value estimates.

policy_value_empirical_coverage

The empirical coverage of policy value confidence intervals.

See Also

causal_forest, cv.glmnet, keras_model_sequential

Examples


set.seed(123)

# dgp_X <- function(n) {
#   data.table::data.table(
#     binary     = rbinom(n, 1, 0.5),
#     discrete   = sample(1:5, n, replace = TRUE),
#     continuous = rnorm(n)
#   )
# }

n <- 100

X_data <- data.table::data.table(
  binary     = rbinom(n, 1, 0.5),
  discrete   = sample(1:5, n, replace = TRUE),
  continuous = rnorm(n)
)


dgp_D <- function(X) rbinom(nrow(X), 1, 0.5)

dgp_Y <- function(D, X) {
  theta <- ifelse(
    X[, binary] == 1 & X[, discrete] <= 2,  # Group 1: High benefit
    1,
    ifelse(X[, binary] == 0 & X[, discrete] >= 4,  # Group 3: Negative benefit
           -1,
           0.1)  # Group 2: Neutral effect
  )
  Y <- D * (theta + rnorm(length(D), mean = 0, sd = 1)) +
    (1 - D) * rnorm(length(D))  # Outcome for untreated
  return(Y)
}

# Parameters
nb_simulations <- 100
nb_simulations_truth <- 200
batch <- 5

# Perform CRAM simulation
result <- cram_simulation(
  X = X_data,
  dgp_D = dgp_D,
  dgp_Y = dgp_Y,
  batch = batch,
  nb_simulations = nb_simulations,
  nb_simulations_truth = nb_simulations_truth,
  sample_size = 500
)

result$raw_results
result$interactive_table



Cram ML: Variance Estimate of the crammed expected loss estimate

Description

This function computes the variance estimator based on the given loss matrix and batch indices.

Usage

cram_var_expected_loss(loss, batch_indices)

Arguments

loss

A matrix of loss values with N rows (data points) and nb_batch + 1 columns. The first column is assumed to contain only zeros; the following nb_batch columns contain the losses of each trained model for each individual.

batch_indices

A list where each element is a vector of indices corresponding to a batch.

Value

The variance estimate of the crammed expected loss estimate
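
As a sketch, the same loss matrix and batch_indices constructed in the cram_expected_loss() example earlier in this manual can be reused here:

cram_var_expected_loss(loss, batch_indices)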


Cram Policy: Variance Estimate of the crammed Policy Value Difference (Delta)

Description

This function estimates the asymptotic variance of the cram estimator for the policy value difference (delta).

Usage

cram_variance_estimator(X, Y, D, pi, batch_indices, propensity = NULL)

Arguments

X

A matrix or data frame of covariates for each sample.

Y

A vector of outcomes for the n individuals.

D

A vector of binary treatments for the n individuals.

pi

A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy.

batch_indices

A list where each element is a vector of indices corresponding to the individuals in each batch.

propensity

The propensity score function

Value

The estimated variance of the policy value difference (Delta)
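
Continuing the cram_estimator() sketch from earlier in this manual, the corresponding variance estimate for delta would be obtained as:

v_delta <- cram_variance_estimator(X, Y, D, pi, batch_indices)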


Cram Policy: Variance Estimate of the crammed Policy Value estimate (Psi)

Description

This function estimates the asymptotic variance of the cram estimator for the policy value (psi).

Usage

cram_variance_estimator_policy_value(
  X,
  Y,
  D,
  pi,
  batch_indices,
  propensity = NULL
)

Arguments

X

A matrix or data frame of covariates for each sample.

Y

A vector of outcomes for the n individuals.

D

A vector of binary treatments for the n individuals.

pi

A matrix of n rows and (nb_batch + 1) columns, where n is the sample size and nb_batch is the number of batches, containing the policy assignment for each individual for each policy. The first column represents the baseline policy.

batch_indices

A list where each element is a vector of indices corresponding to the individuals in each batch.

propensity

Propensity score function

Value

The variance estimate of the crammed Policy Value estimate (Psi)
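
Similarly, reusing the inputs from the cram_estimator() sketch earlier in this manual, a sketch of the variance estimate for the policy value psi:

v_psi <- cram_variance_estimator_policy_value(X, Y, D, pi, batch_indices)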


Cram Policy: Fit Model

Description

This function trains a given unfitted model with the provided data and parameters, according to model type and learner type.

Usage

fit_model(model, X, Y, D, model_type, learner_type, model_params, propensity)

Arguments

model

An unfitted model object, as returned by 'set_model'.

X

A matrix or data frame of covariates for the samples.

Y

A vector of outcome values.

D

A vector of binary treatment indicators (1 for treated, 0 for untreated).

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest".

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression and "fnn" for Feedforward Neural Network. Default is "ridge".

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL.

propensity

The propensity score

Value

The fitted model object.


Cram ML: Fit Model ML

Description

This function trains a given unfitted model with the provided data and parameters, according to model type and learner type.

Usage

fit_model_ml(data, formula, caret_params, classify)

Arguments

data

The dataset

formula

The formula

caret_params

A list of parameters to pass to caret::train().

classify

Indicate if this is a classification problem. Defaults to FALSE

Value

The fitted model object.


Generate Reward Parameters for Simulated Linear Bandits

Description

Creates a list of matrices representing the arm-specific reward-generating parameters (betas) used in contextual linear bandit simulations. Each matrix corresponds to one simulation and contains normalized random coefficients.

Usage

get_betas(simulations, d, k)

Arguments

simulations

Integer. Number of simulations.

d

Integer. Number of features (context dimensions).

k

Integer. Number of arms.

Value

A list of length simulations + 1 (the first element is discarded by the underlying simulation code); each element is a d x k matrix of normalized reward parameters.


Cram ML: Generalized ML Learning

Description

This function performs batch-wise learning for machine learning models.

Usage

ml_learning(
  data,
  formula = NULL,
  batch,
  parallelize_batch = FALSE,
  loss_name = NULL,
  caret_params = NULL,
  custom_fit = NULL,
  custom_predict = NULL,
  custom_loss = NULL,
  n_cores = detectCores() - 1,
  classify = FALSE
)

Arguments

data

A matrix or data frame of features. Must include the target variable.

formula

Formula specifying the relationship between the target and predictors for supervised learning.

batch

Either an integer specifying the number of batches (randomly sampled) or a vector of length equal to the sample size indicating batch assignment for each observation.

parallelize_batch

Logical. Whether to parallelize batch processing. Defaults to 'FALSE'.

loss_name

The name of the loss function to be used (e.g., '"se"', '"logloss"').

caret_params

A list of parameters to pass to the 'caret::train()' function. Required: 'method' (e.g., '"glm"', '"rf"').

custom_fit

A custom function for training user-defined models. Defaults to 'NULL'.

custom_predict

A custom function for making predictions from user-defined models. Defaults to 'NULL'.

custom_loss

Optional custom function for computing the loss of a trained model on the data. Should return a vector containing per-instance losses.

n_cores

Number of CPU cores to use for parallel processing ('parallelize_batch = TRUE'). Defaults to 'detectCores() - 1'.

classify

Indicate if this is a classification problem. Defaults to FALSE

Value

A list containing:

final_ml_model

The final trained ML model.

losses

A matrix of losses where each column represents a batch's trained model. The first column contains zeros (baseline model).

batch_indices

The indices of observations in each batch.
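
A sketch of a direct call, mirroring the caret setup used in the cram_ml() example earlier in this manual:

library(caret)
set.seed(42)
data_df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), Y = rnorm(100))

res <- ml_learning(
  data = data_df,
  formula = Y ~ .,
  batch = 5,
  loss_name = "se",
  caret_params = list(method = "lm", trControl = trainControl(method = "none"))
)

res$final_ml_model
dim(res$losses)  # 100 x 6: zero column plus one column per batch model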


Cram Policy: Predict with the Specified Model

Description

This function performs inference using a trained model, providing flexibility for different types of models such as Causal Forest, Ridge Regression, and Feedforward Neural Networks (FNNs).

Usage

model_predict(model, X, D, model_type, learner_type, model_params)

Arguments

model

A trained model object returned by the 'fit_model' function.

X

A matrix or data frame of covariates for which predictions are required.

D

A vector of binary treatment indicators (1 for treated, 0 for untreated). Optional, depending on the model type.

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest".

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression and "fnn" for Feedforward Neural Network. Default is "ridge".

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL.

Value

A vector of binary policy assignments, depending on the model_type and learner_type.


Cram ML: Predict with the Specified Model

Description

This function performs inference using a trained model

Usage

model_predict_ml(
  model,
  data,
  formula,
  caret_params,
  cram_policy_handle = FALSE
)

Arguments

model

A trained model object returned by the 'fit_model_ml' function.

data

The dataset

formula

The formula

caret_params

The parameters of the caret model

cram_policy_handle

Internal use. Post-process predictions differently for cram policy use. Defaults to FALSE.

Value

Predictions of the model on the data


Cram Policy: Set Model

Description

This function maps the model type and learner type to the corresponding model function.

Usage

set_model(model_type, learner_type, model_params)

Arguments

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

model_params

A list of additional parameters to pass to the model, which can be any parameter defined in the model reference package. Defaults to NULL. For FNNs, the following elements are defined in the model_params list:

input_layer

A list defining the input layer. Must include:

units

Number of units in the input layer.

activation

Activation function for the input layer.

input_shape

Input shape for the layer.

layers

A list of lists, where each sublist specifies a hidden layer with:

units

Number of units in the layer.

activation

Activation function for the layer.

output_layer

A list defining the output layer. Must include:

units

Number of units in the output layer.

activation

Activation function for the output layer (e.g., "linear" or "sigmoid").

compile_args

A list of arguments for compiling the model. Must include:

optimizer

Optimizer for training (e.g., "adam" or "sgd").

loss

Loss function (e.g., "mse" or "binary_crossentropy").

metrics

Optional list of metrics for evaluation (e.g., c("accuracy")).

For other learners (e.g., "ridge" or "causal_forest"), model_params can include relevant hyperparameters.

Value

The instantiated model object or the corresponding model function.
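
As an illustration of the FNN structure described above, a model_params list for a small regression network might look like the following (the layer sizes and the input shape of 5 covariates are arbitrary choices for this sketch):

model_params_fnn <- list(
  input_layer = list(units = 64, activation = "relu", input_shape = c(5)),
  layers = list(
    list(units = 32, activation = "relu")
  ),
  output_layer = list(units = 1, activation = "linear"),
  compile_args = list(optimizer = "adam", loss = "mse")
)

# model <- set_model(model_type = "s_learner", learner_type = "fnn",
#                    model_params = model_params_fnn)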


Validate or Set the Baseline Policy

Description

This function validates a provided baseline policy or sets a default baseline policy of zeros for all individuals.

Usage

test_baseline_policy(baseline_policy, n)

Arguments

baseline_policy

A list representing the baseline policy for each individual. If NULL, a default baseline policy of zeros is created.

n

An integer specifying the number of individuals in the population.

Value

A validated or default baseline policy as a list of numeric values.
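
A brief sketch of both documented behaviors, default creation and validation (assuming the function is accessible, e.g. via cramR::: if it is not exported):

# NULL input: returns a default baseline policy of zeros, one per individual
test_baseline_policy(NULL, n = 5)

# A user-supplied binary baseline policy is validated and returned
test_baseline_policy(as.list(c(0, 1, 0, 1, 1)), n = 5)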


Validate or Generate Batch Assignments

Description

This function validates a provided batch assignment or generates random batch assignments for individuals.

Usage

test_batch(batch, n)

Arguments

batch

Either an integer specifying the number of batches or a vector/list of batch assignments for all individuals.

n

An integer specifying the number of individuals in the population.

Value

A list containing:

batches

A list where each element contains the indices of individuals assigned to a specific batch.

nb_batch

The total number of batches.
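
A brief sketch covering both documented input forms:

# Integer input: individuals are randomly assigned to 4 batches
out <- test_batch(4, n = 20)
length(out$batches)  # 4
out$nb_batch         # 4

# Vector input: a user-supplied batch assignment is validated
out2 <- test_batch(rep(1:4, each = 5), n = 20)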


Cram Policy: Validate User-Provided Parameters for a Model

Description

This function validates user-provided parameters against the formal arguments of a specified model function. It ensures that all user-specified parameters are recognized by the model and raises an error for invalid parameters.

Usage

validate_params(model_function, model_type, learner_type, user_params)

Arguments

model_function

The model function for which parameters are being validated (e.g., grf::causal_forest).

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

user_params

A named list of parameters provided by the user.

Value

A named list of validated parameters that are safe to pass to the model function.


Cram Policy: Validate Parameters for Feedforward Neural Networks (FNNs)

Description

This function validates user-provided parameters for a Feedforward Neural Network (FNN) model. It ensures the correct structure for input_layer, layers, output_layer, compile_args and fit_params.

Usage

validate_params_fnn(model_type, learner_type, model_params, X)

Arguments

model_type

The model type for policy learning. Options include "causal_forest", "s_learner", and "m_learner". Default is "causal_forest". Note: you can also set model_type to NULL and specify custom_fit and custom_predict to use your custom model.

learner_type

The learner type for the chosen model. Options include "ridge" for Ridge Regression, "fnn" for Feedforward Neural Network, and "caret" for Caret. Default is "ridge". If model_type is "causal_forest", set learner_type to NULL; if model_type is "s_learner" or "m_learner", choose among "ridge", "fnn", and "caret".

model_params

A named list of parameters provided by the user for configuring the FNN model.

X

A matrix or data frame of covariates for which the parameters are validated.

Value

A named list of validated parameters merged with defaults for any missing values.