Type: Package
Title: Approximate POMDP Planning Software
Version: 0.6.16
Description: A toolkit for Partially Observable Markov Decision Processes (POMDP). Provides bindings to C++ libraries implementing the SARSOP algorithm (Successive Approximations of the Reachable Space under Optimal Policies), as described in Kurniawati et al. (2008), <doi:10.15607/RSS.2008.IV.009>. This package also provides a high-level interface for generating, solving, and simulating POMDP problems and their solutions.
License: GPL-2
URL: https://github.com/boettiger-lab/sarsop
BugReports: https://github.com/boettiger-lab/sarsop/issues
RoxygenNote: 7.1.1
Imports: xml2, parallel, processx, digest, Matrix
Suggests: testthat, roxygen2, knitr, covr, spelling
LinkingTo: BH
Encoding: UTF-8
Language: en-US
SystemRequirements: mallinfo, hence Linux, MacOS or Windows
NeedsCompilation: yes
Packaged: 2025-04-15 17:11:33 UTC; jovyan
Author: Carl Boettiger
Maintainer: Carl Boettiger <cboettig@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-16 04:50:08 UTC
alphas_from_log
Description
Read alpha vectors from a log file.
Usage
alphas_from_log(meta, log_dir = ".")
Arguments
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log()
log_dir: path to the log directory
Value
a list with a matrix of alpha vectors for each entry in the provided metadata (as returned by sarsop).
Examples
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
log_dir = log)
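# The logged solution can be recovered later without re-solving. A minimal
# sketch, assuming the discount value defined by the example script is among
# the columns recorded in the log's meta.csv:
meta <- meta_from_log(data.frame(discount = discount), log_dir = log)
alphas <- alphas_from_log(meta, log_dir = log)  # one alpha-vector matrix per matching row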
test the APPL binaries
Description
Asserts that the C++ binaries for appl have been compiled successfully
Usage
assert_has_appl()
Value
Will return TRUE if binaries are installed and can be located and executed, and FALSE otherwise.
Examples
assert_has_appl()
compute_policy
Description
Derive the corresponding policy function from the alpha vectors
Usage
compute_policy(
alpha,
transition,
observation,
reward,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
a_0 = 1
)
Arguments
alpha: the matrix of alpha vectors returned by sarsop()
transition: transition matrix, dimension n_s x n_s x n_a
observation: observation matrix, dimension n_s x n_z x n_a
reward: reward matrix, dimension n_s x n_a
state_prior: initial belief state; optional, defaults to uniform over states
a_0: previous action. The belief in the current state depends not only on the observation, but also on the prior belief over states and the action previously taken.
Value
a data frame providing the optimal policy (choice of action) and corresponding value of the action for each possible belief state
Examples
m <- fisheries_matrices()
## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
compute_policy(alpha, m$transition, m$observation, m$reward)
}
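# The state_prior argument lets the same alpha vectors be evaluated under any
# belief. A brief sketch, assuming the example above has been run; the
# non-uniform prior below is purely illustrative:
prior <- dnorm(0:20, 15, 2)   # belief concentrated near a stock size of 15 (21 states by default)
prior <- prior / sum(prior)   # normalize to a proper probability distribution
compute_policy(alpha, m$transition, m$observation, m$reward, state_prior = prior)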
f_from_log
Description
Read transition function from log
Usage
f_from_log(meta)
Arguments
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log()
Details
Note that this function is unique to the fisheries example problem and assumes that the sarsop call was run with logging that specifies a column "model" containing either the string "ricker" (corresponding to a Ricker-type growth function) or "allen" (corresponding to an Allen-type growth function).
Value
the growth function associated with the model indicated.
Examples
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
log_dir = log)
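# Because f_from_log() looks up the "model" column, the sarsop() call above
# would need to record the model name and growth parameters via log_data,
# e.g. log_data = data.frame(model = "ricker", r = 1, K = 15) -- the parameter
# columns here are an assumption matching the fisheries example, not a fixed
# API. With such a log in place, a hedged sketch:
meta <- meta_from_log(data.frame(model = "ricker"), log_dir = log)
f <- f_from_log(meta)   # list of growth functions, one per matching row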
fisheries_matrices
Description
Initialize the transition, observation, and reward matrices given a transition function, reward function, and state space
Usage
fisheries_matrices(
states = 0:20,
actions = states,
observed_states = states,
reward_fn = function(x, a) pmin(x, a),
f = ricker(1, 15),
sigma_g = 0.1,
sigma_m = 0.1,
noise = c("rescaled-lognormal", "lognormal", "uniform", "normal")
)
Arguments
states: sequence of possible states
actions: sequence of possible actions
observed_states: sequence of possible observations
reward_fn: function of x and a giving the reward for taking action a when the state is x
f: transition function of state x and action a
sigma_g: half-width of the uniform growth (process) shock, or the equivalent variance for log-normal noise
sigma_m: half-width of the uniform measurement shock, or the equivalent variance for log-normal noise
noise: distribution for the noise, one of "rescaled-lognormal", "lognormal", "uniform", or "normal"
Details
assumes log-normally distributed observation errors and process errors
Value
list of transition matrix, observation matrix, and reward matrix
Examples
m <- fisheries_matrices()
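# The returned list holds the three arrays used throughout this package;
# a quick check of their dimensions under the default arguments
# (21 states, actions, and observations):
dim(m$transition)    # n_s x n_s x n_a
dim(m$observation)   # n_s x n_z x n_a
dim(m$reward)        # n_s x n_a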
hindcast_pomdp
Description
Compare historical actions to the actions the POMDP policy would have recommended.
Usage
hindcast_pomdp(
transition,
observation,
reward,
discount,
obs,
action,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
alpha = NULL,
...
)
Arguments
transition: transition matrix, dimension n_s x n_s x n_a
observation: observation matrix, dimension n_s x n_z x n_a
reward: reward matrix, dimension n_s x n_a
discount: the discount factor
obs: a given sequence of observations
action: the corresponding sequence of actions
state_prior: initial belief state; optional, defaults to uniform over states
alpha: the matrix of alpha vectors returned by sarsop()
...: additional arguments to sarsop()
Value
a list, containing: a data frame with columns for time, obs, action, and optimal action, and an array containing the posterior belief distribution at each time t
Examples
m <- fisheries_matrices()
## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
sim <- hindcast_pomdp(m$transition, m$observation, m$reward, 0.95,
obs = rnorm(21, 15, .1), action = rep(1, 21),
alpha = alpha)
}
meta_from_log
Description
Load metadata from a log file.
Usage
meta_from_log(
parameters,
log_dir = ".",
metafile = paste0(log_dir, "/meta.csv")
)
Arguments
parameters: a data.frame with the desired parameter values as given in the metafile
log_dir: path to the log directory
metafile: path to the metafile index, assumed to be meta.csv in log_dir
Value
a data.frame with the rows of the matching metadata.
Examples
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
log_dir = log)
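# With the log written, the metadata rows matching a given set of parameter
# values can be retrieved. A minimal sketch, assuming discount is among the
# columns recorded in meta.csv:
meta <- meta_from_log(data.frame(discount = discount), log_dir = log)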
models_from_log
Description
Read model details from log file
Usage
models_from_log(meta, reward_fn = function(x, h) pmin(x, h))
Arguments
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log()
reward_fn: a function f(x, a) giving the reward for taking action a when the system is in state x
Details
assumes transition can be determined by the f_from_log function, which is specific to the fisheries example
Value
a list with an element for each row in the requested metadata; each element is itself a list of the three matrices defining the POMDP problem: transition, observation, and reward.
Examples
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log = tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10,
log_dir = log)
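# As with f_from_log(), this requires that the sarsop() call recorded the model
# name (and growth parameters) via log_data. Under that assumption, a brief sketch:
meta <- meta_from_log(data.frame(model = "ricker"), log_dir = log)
models <- models_from_log(meta)   # list of transition, observation, reward matrices per row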
APPL wrappers
Description
Wrappers for the APPL executables. The pomdpsol function solves a model file and returns the path to the output policy file.
Usage
pomdpsol(
model,
output = tempfile(),
precision = 0.001,
timeout = NULL,
fast = FALSE,
randomization = FALSE,
memory = NULL,
improvementConstant = NULL,
timeInterval = NULL,
stdout = tempfile(),
stderr = tempfile(),
spinner = TRUE
)
polgraph(
model,
policy,
output = tempfile(),
max_depth = 3,
max_branches = 10,
min_prob = 0.001,
stdout = "",
spinner = TRUE
)
pomdpsim(
model,
policy,
output = tempfile(),
steps = 100,
simulations = 3,
stdout = "",
spinner = TRUE
)
pomdpeval(
model,
policy,
output = tempfile(),
steps = 100,
simulations = 3,
stdout = "",
spinner = TRUE
)
pomdpconvert(model, stdout = "", spinner = TRUE)
Arguments
model: file/path to the pomdp model file
output: file/path of the output policy file. This is also returned by the function.
precision: sets targetPrecision, the target precision in solution quality; the run ends when the target precision is reached. The target precision is 1e-3 by default.
timeout: use timeLimit as the timeout in seconds. If the running time exceeds the specified value, pomdpsol writes out a policy and terminates. There is no time limit by default.
fast: logical, default FALSE. Use the fast (but very picky) alternate parser for .pomdp files.
randomization: logical, default FALSE. Turn on randomization for the sampling algorithm.
memory: use memoryLimit as the memory limit in MB. No memory limit by default. If memory usage exceeds the specified value, pomdpsol writes out a policy and terminates. Set the value to less than physical memory to avoid swapping.
improvementConstant: use improvementConstant as the trial improvement factor in the sampling algorithm. At the default of 0.5, a trial terminates at a belief when the gap between its upper and lower bound is 0.5 of the current precision at the initial belief.
timeInterval: use timeInterval as the time interval between two consecutive write-outs of the policy file. If this is not specified, pomdpsol only writes out a policy file upon termination.
stdout: a filename where pomdp run statistics will be stored
stderr: currently ignored
spinner: should we show a spinner while sarsop is running?
policy: file/path to the policy file
max_depth: the maximum horizon of the generated policy graph
max_branches: maximum number of branches to show in the policy graph
min_prob: the minimum probability threshold for a branch to be shown in the policy graph
steps: number of steps for each simulation run
simulations: number of simulation runs
Examples
if(assert_has_appl()){
model <- system.file("models", "example.pomdp", package = "sarsop")
policy <- tempfile(fileext = ".policyx")
pomdpsol(model, output = policy, timeout = 1)
# Other tools
evaluation <- pomdpeval(model, policy, stdout = FALSE)
graph <- polgraph(model, policy, stdout = FALSE)
simulations <- pomdpsim(model, policy, stdout = FALSE)
}
read_policyx
Description
Read a .policyx file created by SARSOP and return alpha vectors and associated actions.
Usage
read_policyx(file = "output.policyx")
Arguments
file: name of the policyx file to be read
Value
a list, first element "vectors" is an n_states x n_vectors array of alpha vectors, second element is a numeric vector "action" of length n_vectors whose i'th element indicates the action corresponding to the i'th alpha vector (column) in the vectors array.
Examples
f <- system.file("extdata", "out.policy", package="sarsop", mustWork = TRUE)
policy <- read_policyx(f)
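# The two elements described under Value can then be accessed directly:
alpha <- policy$vectors    # n_states x n_vectors array of alpha vectors
action <- policy$action    # action associated with each alpha vector (column)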
sarsop
Description
sarsop wraps the tasks of writing the pomdpx file defining the problem, running the pomdpsol (SARSOP) algorithm in C++, and then reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation: a matrix whose columns correspond to actions and rows to states. This function can thus be used at the heart of most POMDP applications.
Usage
sarsop(
transition,
observation,
reward,
discount,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
verbose = TRUE,
log_dir = tempdir(),
log_data = NULL,
cache = TRUE,
...
)
Arguments
transition: transition matrix, dimension n_s x n_s x n_a
observation: observation matrix, dimension n_s x n_z x n_a
reward: reward matrix, dimension n_s x n_a
discount: the discount factor
state_prior: initial belief state; optional, defaults to uniform over states
verbose: logical, should the function include a message with pomdp diagnostics (timings, final precision, end condition)?
log_dir: pomdpx and policyx files will be saved here, along with a metadata file
log_data: a data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns; otherwise, a globally unique id will be generated.
cache: should results from the log directory be cached? Default TRUE. Identical function calls will quickly return previously cached alpha vectors from file rather than re-running.
...: additional arguments to pomdpsol()
Value
a matrix of alpha vectors. The column index indicates the action associated with each alpha vector (1:n_actions); rows indicate the system state, x. Actions for which no alpha vector was found are included as all -Inf, since such actions are never optimal regardless of belief and thus have no corresponding alpha vectors in the alpha_action list.
Examples
## Takes > 5s
## Use example code to generate matrices for pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
simulate a POMDP
Description
Simulate a POMDP given the appropriate matrices.
Usage
sim_pomdp(
transition,
observation,
reward,
discount,
state_prior = rep(1, dim(observation)[[1]])/dim(observation)[[1]],
x0,
a0 = 1,
Tmax = 20,
policy = NULL,
alpha = NULL,
reps = 1,
...
)
Arguments
transition: transition matrix, dimension n_s x n_s x n_a
observation: observation matrix, dimension n_s x n_z x n_a
reward: reward matrix, dimension n_s x n_a
discount: the discount factor
state_prior: initial belief state; optional, defaults to uniform over states
x0: initial state
a0: initial action (defaults to action 1; this can be arbitrary if the observation process is independent of the action taken)
Tmax: duration of the simulation
policy: simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP alpha vectors
alpha: the matrix of alpha vectors returned by sarsop()
reps: number of replicate simulations to compute
...: additional arguments to mclapply
Details
The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, and then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
Value
a data frame with columns for time, state, obs, action, and (discounted) value.
Examples
m <- fisheries_matrices()
discount <- 0.95
## Takes > 5s
if(assert_has_appl()){
alpha <- sarsop(m$transition, m$observation, m$reward, discount, precision = 10)
sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
x0 = 5, Tmax = 20, alpha = alpha)
}
write pomdpx files
Description
A POMDPX file specifies a POMDP problem in terms of the transition, observation, and reward matrices, the discount factor, and the initial belief.
Usage
write_pomdpx(
P,
O,
R,
gamma,
b = rep(1/dim(O)[1], dim(O)[1]),
file = "input.pomdpx",
digits = 12,
digits2 = 12,
format = "f"
)
Arguments
P: transition matrix
O: observation matrix
R: reward matrix
gamma: discount factor
b: initial belief
file: pomdpx file to create
digits: precision to round to before normalizing. Leave at the default, since sarsop seems unable to handle more.
digits2: precision to write the solution to. Leave at the default, since normalizing requires additional precision.
format: floating point format; the sarsop parser does not appear to understand scientific notation.
Examples
m <- fisheries_matrices()
f <- tempfile()
write_pomdpx(m$transition, m$observation, m$reward, 0.95,
file = f)
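# The resulting .pomdpx file is the same format the solver consumes (this is
# what sarsop() does internally), so it can be passed straight to pomdpsol();
# a brief sketch:
if(assert_has_appl()){
  policyfile <- tempfile(fileext = ".policyx")
  pomdpsol(f, output = policyfile, timeout = 1)
  policy <- read_policyx(policyfile)
}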