Type: Package
Title: Interface to 'dgpsi' for Deep and Linked Gaussian Process Emulations
Version: 2.5.0
Maintainer: Deyu Ming <deyu.ming.16@ucl.ac.uk>
Description: Interface to the 'python' package 'dgpsi' for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) <doi:10.1137/20M1323771> and Ming, Williamson, & Guillas (2023) <doi:10.1080/00401706.2022.2124311> and Ming & Williamson (2023) <doi:10.48550/arXiv.2306.01212>. To get started with the package, see https://mingdeyu.github.io/dgpsi-R/.
License: MIT + file LICENSE
URL: https://github.com/mingdeyu/dgpsi-R, https://mingdeyu.github.io/dgpsi-R/
BugReports: https://github.com/mingdeyu/dgpsi-R/issues
Encoding: UTF-8
Depends: R (≥ 4.0)
Imports: reticulate (≥ 1.26), benchmarkme (≥ 1.0.8), utils, ggplot2, ggforce, reshape2, patchwork, lhs, methods, stats, clhs, dplyr, uuid, tidyr, rlang, lifecycle, magrittr, visNetwork, parallel, kableExtra
Suggests: knitr, rmarkdown, MASS, R.utils, spelling
VignetteBuilder: knitr
RoxygenNote: 7.3.2
Language: en-US
NeedsCompilation: no
Packaged: 2024-12-14 23:26:04 UTC; deyu
Author: Deyu Ming [aut, cre, cph], Daniel Williamson [aut]
Repository: CRAN
Date/Publication: 2024-12-14 23:40:02 UTC

dgpsi: Interface to 'dgpsi' for Deep and Linked Gaussian Process Emulations

Description

logo

Interface to the 'python' package 'dgpsi' for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) doi:10.1137/20M1323771 and Ming, Williamson, & Guillas (2023) doi:10.1080/00401706.2022.2124311 and Ming & Williamson (2023) doi:10.48550/arXiv.2306.01212. To get started with the package, see https://mingdeyu.github.io/dgpsi-R/.

Author(s)

Maintainer: Deyu Ming deyu.ming.16@ucl.ac.uk [copyright holder]

Authors:

See Also

Useful links:


Locate the next design point(s) for a (D)GP emulator or a bundle of (D)GP emulators using Active Learning MacKay (ALM)

Description

[Updated]

This function searches from a candidate set to locate the next design point(s) to be added to a (D)GP emulator or a bundle of (D)GP emulators using the Active Learning MacKay (ALM) criterion (see the reference below).

Usage

alm(object, ...)

## S3 method for class 'gp'
alm(
  object,
  x_cand = NULL,
  n_start = 20,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  ...
)

## S3 method for class 'dgp'
alm(
  object,
  x_cand = NULL,
  n_start = 20,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

## S3 method for class 'bundle'
alm(
  object,
  x_cand = NULL,
  n_start = 20,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class bundle.

...

any arguments (with names different from those of arguments used in alm()) that are used by aggregate can be passed here.

x_cand

a matrix (with each row being a design point and column being an input dimension) that gives a candidate set from which the next design point(s) are determined. If object is an instance of the bundle class and aggregate is not supplied, x_cand can also be a list. The list must have a length equal to the number of emulators in object, with each element being a matrix representing the candidate set for a corresponding emulator in the bundle. Defaults to NULL.

n_start

[New] an integer that gives the number of initial design points to be used to determine next design point(s). This argument is only used when x_cand is NULL. Defaults to 20.

batch_size

an integer that gives the number of design points to be chosen. Defaults to 1.

M

[New] the size of the conditioning set for the Vecchia approximation in the criterion calculation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.

workers

the number of processes to be used for design point selection. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1. The argument does not currently support Windows machines when the aggregate function is provided, due to the significant overhead caused by initializing the Python environment for each worker under spawning.

limits

a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. This argument is only used when x_cand = NULL. Defaults to NULL.

int

[New] a bool or a vector of bools that indicates if an input dimension is an integer type. If a single bool is given, it will be applied to all input dimensions. If a vector is provided, it should have a length equal to the input dimensions and will be applied to individual input dimensions. This argument is only used when x_cand = NULL. Defaults to FALSE.

aggregate

an R function that aggregates scores of the ALM across different output dimensions (if object is an instance of the dgp class) or across different emulators (if object is an instance of the bundle class). The function should be specified in the following basic form:

  • the first argument is a matrix representing scores. The rows of the matrix correspond to different design points. The number of columns of the matrix is equal to:

    • the emulator output dimension if object is an instance of the dgp class; or

    • the number of emulators contained in object if object is an instance of the bundle class.

  • the output should be a vector that gives aggregate scores at different design points.

Set to NULL to disable aggregation. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

  1. If x_cand is not NULL:

    • When object is an instance of the gp class, a vector of length batch_size is returned, containing the positions (row numbers) of the next design points from x_cand.

    • When object is an instance of the dgp class, a vector of length batch_size * D is returned, containing the positions (row numbers) of the next design points from x_cand to be added to the DGP emulator.

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a matrix is returned with batch_size rows and a column for each emulator in the bundle, containing the positions (row numbers) of the next design points from x_cand for individual emulators.

  2. If x_cand is NULL:

    • When object is an instance of the gp class, a matrix with batch_size rows is returned, giving the next design points to be evaluated.

    • When object is an instance of the dgp class, a matrix with batch_size * D rows is returned, where:

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a list is returned with a length equal to the number of emulators in the bundle. Each element of the list is a matrix with batch_size rows, where each row represents a design point to be added to the corresponding emulator.

Note

The first column of the matrix supplied to the first argument of aggregate must correspond to the first output dimension of the DGP emulator if object is an instance of the dgp class, and so on for subsequent columns and dimensions. If object is an instance of the bundle class, the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.

References

MacKay, D. J. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590-604.

Examples

## Not run: 

# load packages and the Python env
library(lhs)
library(dgpsi)

# construct a 1D non-stationary function
f <- function(x) {
 sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
}

# generate the initial design
X <- maximinLHS(10,1)
Y <- f(X)

# training a 2-layered DGP emulator with the global connection off
m <- dgp(X, Y, connect = F)

# specify the input range
lim <- c(0,1)

# locate the next design point using ALM
X_new <- alm(m, limits = lim)

# obtain the corresponding output at the located design point
Y_new <- f(X_new)

# combine the new input-output pair to the existing data
X <- rbind(X, X_new)
Y <- rbind(Y, Y_new)

# update the DGP emulator with the new input and output data and refit
m <- update(m, X, Y, refit = TRUE)

# plot the LOO validation
plot(m)

## End(Not run)

Combine layers

Description

[Deprecated]

This function is deprecated and will be removed in the next release, as it is simply a wrapper for the list() function. To construct linked (D)GP structures, please use the updated lgp() function, which provides a simpler and more efficient approach to building (D)GP emulators.

Usage

combine(...)

Arguments

...

a sequence of lists. Each list represents a system layer and contains emulators (produced by gp() or dgp()) in that layer.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A list defining a linked (D)GP structure to be passed to struc of lgp().


Continue training a DGP emulator

Description

This function implements additional training iterations for a DGP emulator.

Usage

continue(
  object,
  N = NULL,
  cores = 1,
  ess_burn = 10,
  verb = TRUE,
  burnin = NULL,
  B = NULL
)

Arguments

object

an instance of the dgp class.

N

additional number of iterations to train the DGP emulator. If set to NULL, the number of iterations is set to 500 if the DGP emulator was constructed without the Vecchia approximation, and is set to 200 if Vecchia approximation was used. Defaults to NULL.

cores

the number of processes to be used to optimize GP components (in the same layer) at each M-step of the training. If set to NULL, the number of processes is set to ⁠(max physical cores available - 1)⁠ if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to ⁠max physical cores available %/% 2⁠. Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to 1.

ess_burn

number of burnin steps for ESS-within-Gibbs at each I-step of the training. Defaults to 10.

verb

a bool indicating if a progress bar will be printed during training. Defaults to TRUE.

burnin

the number of training iterations to be discarded for point estimates calculation. Must be smaller than the overall training iterations so-far implemented. If this is not specified, only the last 25% of iterations are used. This overrides the value of burnin set in dgp(). Defaults to NULL.

B

the number of imputations to produce predictions. Increase the value to account for more imputation uncertainty. This overrides the value of B set in dgp() if B is not NULL. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object.

Note

Examples

## Not run: 

# See dgp() for an example.

## End(Not run)

Restore the serialized emulator

Description

[New]

This function restores the serialized emulator created by serialize().

Usage

deserialize(object)

Arguments

object

the serialized object of an emulator.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bundle of (D)GP emulators.

Note

See the Note section in serialize().

Examples

## Not run: 

library(future)
library(future.apply)
library(dgpsi)

# model
f <- function(x) {
 (sin(7.5*x)+1)/2
}

# training data
X <- seq(0, 1, length = 10)
Y <- sapply(X, f)

# train a DGP emulator
m <- dgp(X, Y, name = "matern2.5")

# testing input data
X_dgp <- seq(0, 1, length = 100)

# serialize the DGP emulator
m_serialized <- serialize(m)

# start a multi-session with three cores for parallel predictions
plan(multisession, workers = 3)

# perform parallel predictions
results <- future_lapply(1:length(X_dgp), function(i) {
  m_deserialized <- deserialize(m_serialized)
  mean_i <- predict(m_deserialized, X_dgp[i])$results$mean
}, future.seed = TRUE)

# reset the future plan to sequential
plan(sequential)

# combine mean predictions
pred_mean <- do.call(rbind, results)

## End(Not run)

Sequential design of a (D)GP emulator or a bundle of (D)GP emulators

Description

[Updated]

This function implements sequential design and active learning for a (D)GP emulator or a bundle of (D)GP emulators, supporting an array of popular methods as well as user-specified approaches. It can also be used as a wrapper for Bayesian optimization methods.

Usage

design(
  object,
  N,
  x_cand,
  y_cand,
  n_sample,
  n_cand,
  limits,
  f,
  reps,
  freq,
  x_test,
  y_test,
  reset,
  target,
  method,
  batch_size,
  eval,
  verb,
  autosave,
  new_wave,
  M_val,
  cores,
  ...
)

## S3 method for class 'gp'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  ...
)

## S3 method for class 'dgp'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  train_N = NULL,
  refit_cores = 1,
  pruning = TRUE,
  control = list(),
  ...
)

## S3 method for class 'bundle'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  train_N = NULL,
  refit_cores = 1,
  ...
)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class bundle.

N

the number of iterations for the sequential design.

x_cand

a matrix (with each row being a design point and column being an input dimension) that gives a candidate set from which the next design points are determined. Defaults to NULL.

y_cand

a matrix (with each row being a simulator evaluation and column being an output dimension) that gives the realizations from the simulator at input positions in x_cand. Defaults to NULL.

n_sample

[New] an integer that gives the size of a sub-set to be sampled from the candidate set x_cand at each step of the sequential design to determine the next design point, if x_cand is not NULL.

Defaults to 200.

n_cand

[Deprecated] this argument is deprecated. Use n_sample instead.

limits

a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. Set limits = NULL if x_cand is supplied. This argument is only used when x_cand is not supplied, i.e., x_cand = NULL. Defaults to NULL. If you provide a custom method function with an argument called limits, the value of limits will be passed to your function.

f

an R function representing the simulator. f must adhere to the following rules:

  • First argument: a matrix where rows correspond to different design points, and columns represent input dimensions.

  • Function output:

    • a matrix where rows correspond to different outputs (matching the input design points) and columns represent output dimensions. If there is only one output dimension, the function should return a matrix with a single column.

    • alternatively, a list where:

      • the first element is the output matrix as described above.

      • additional named elements can optionally update values of arguments with matching names passed via .... This list output is useful if additional arguments to f, method, or eval need to be updated after each sequential design iteration.

See the Note section below for additional details. This argument is required and must be supplied when y_cand = NULL. Defaults to NULL.

reps

an integer that gives the number of repetitions of the located design points to be created and used for evaluations of f. Set the argument to an integer greater than 1 only if f is a stochastic function that can generate different responses given for the same input and the supplied emulator object can deal with stochastic responses, e.g., a (D)GP emulator with nugget_est = TRUE or a DGP emulator with a likelihood layer. The argument is only used when f is supplied. Defaults to 1.

freq

a vector of two integers with the first element indicating the number of iterations taken between re-estimating the emulator hyperparameters, and the second element defining the number of iterations to take between re-calculation of evaluating metrics on the validation set (see x_test below) via the eval function. Defaults to c(1, 1).

x_test

a matrix (with each row being an input testing data point and each column being an input dimension) that gives the testing input data to evaluate the emulator after each freq[2] iterations of the sequential design. Set to NULL for LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.

y_test

the testing output data corresponding to x_test for emulator validation after each freq[2] iterations of the sequential design:

  • if object is an instance of the gp class, y_test is a matrix with only one column and each row contains a testing output data point from the corresponding row of x_test.

  • if object is an instance of the dgp class, y_test is a matrix with its rows containing testing output data points corresponding to the same rows of x_test and columns representing the output dimensions.

  • if object is an instance of the bundle class, y_test is a matrix with each row representing the outputs for the corresponding row of x_test and each column representing the output of the different emulators in the bundle.

Set to NULL for LOO-based emulator validation. Defaults to NULL. This argument is only used if eval = NULL.

reset

A bool or a vector of bools indicating whether to reset the hyperparameters of the emulator(s) to their initial values (as set during initial construction) before re-fitting. The re-fitting occurs based on the frequency specified by freq[1]. This option is useful when hyperparameters are suspected to have converged to a local optimum affecting validation performance.

  • If a single bool is provided, it applies to every iteration of the sequential design.

  • If a vector is provided, its length must equal N (even if the re-fit frequency specified in freq[1] is not 1) and it will apply to the corresponding iterations of the sequential design.

Defaults to FALSE.

target

a number or vector specifying the target evaluation metric value(s) at which the sequential design should terminate. Defaults to NULL, in which case the sequential design stops after N steps. See the Note section below for further details about target.

method

[Updated] an R function that determines the next design points to be evaluated by f. The function must adhere to the following rules:

  • First argument: an emulator object, which can be one of the following:

    • an instance of the gp class (produced by gp());

    • an instance of the dgp class (produced by dgp());

    • an instance of the bundle class (produced by pack()).

  • Second argument (if x_cand is not NULL): a candidate matrix representing a set of potential design points from which the method function selects the next points.

  • Function output:

    • If x_cand is not NULL:

      • for gp or dgp objects, the output must be a vector of row indices corresponding to the selected design points from the candidate matrix (the second argument).

      • for bundle objects, the output must be a matrix containing the row indices of the selected design points from the candidate matrix. Each column corresponds to the indices for an individual emulator in the bundle.

    • If x_cand is NULL:

      • for gp or dgp objects, the output must be a matrix where each row represents a new design point to be added.

      • for bundle objects, the output must be a list with a length equal to the number of emulators in the bundle. Each element in the list is a matrix where rows represent the new design points for the corresponding emulator.

See alm(), mice(), and vigf() for examples of built-in method functions. Defaults to vigf().

batch_size

[New] an integer specifying the number of design points to select in a single iteration. Defaults to 1. This argument is used by the built-in method functions alm(), mice(), and vigf(). If you provide a custom method function with an argument named batch_size, the value of batch_size will be passed to your function.

eval

an R function that computes a customized metric for evaluating emulator performance. The function must adhere to the following rules:

  • First argument: an emulator object, which can be one of the following:

    • an instance of the gp class (produced by gp());

    • an instance of the dgp class (produced by dgp());

    • an instance of the bundle class (produced by pack()).

  • Function output:

    • for gp objects, the output must be a single metric value.

    • for dgp objects, the output can be a single metric value or a vector of metric values with a length equal to the number of output dimensions.

    • for bundle objects, the output can be a single metric value or a vector of metric values with a length equal to the number of emulators in the bundle.

If no custom function is provided, a built-in evaluation metric (RMSE or log-loss, in the case of DGP emulators with categorical likelihoods) will be used. Defaults to NULL. See the Note section below for additional details.

verb

a bool indicating if trace information will be printed during the sequential design. Defaults to TRUE.

autosave

a list that contains configuration settings for the automatic saving of the emulator:

  • switch: a bool indicating whether to enable automatic saving of the emulator during sequential design. When set to TRUE, the emulator in the final iteration is always saved. Defaults to FALSE.

  • directory: a string specifying the directory path where the emulators will be stored. Emulators will be stored in a sub-directory of directory named 'emulator-id'. Defaults to './check_points'.

  • fname: a string representing the base name for the saved emulator files. Defaults to 'check_point'.

  • save_freq: an integer indicating the frequency of automatic saves, measured in the number of iterations. Defaults to 5.

  • overwrite: a bool value controlling the file saving behavior. When set to TRUE, each new automatic save overwrites the previous one, keeping only the latest version. If FALSE, each automatic save creates a new file, preserving all previous versions. Defaults to FALSE.

new_wave

a bool indicating whether the current call to design() will create a new wave of sequential designs or add the next sequence of designs to the most recent wave. This argument is relevant only if waves already exist in the emulator. Creating new waves can improve the visualization of sequential design performance across different calls to design() via draw(), and allows for specifying a different evaluation frequency in freq. However, disabling this option can help limit the number of waves visualized in draw() to avoid issues such as running out of distinct colors for large numbers of waves. Defaults to TRUE.

M_val

[New] an integer that gives the size of the conditioning set for the Vecchia approximation in emulator validations. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.

cores

an integer that gives the number of processes to be used for emulator validation. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1. This argument is only used if eval = NULL.

...

Any arguments with names that differ from those used in design() but are required by f, method, or eval can be passed here. design() will forward relevant arguments to f, method, and eval based on the names of the additional arguments provided.

train_N

the number of training iterations to be used for re-fitting the DGP emulator at each step of the sequential design:

  • If train_N is an integer, the DGP emulator will be re-fitted at each step (based on the re-fit frequency specified in freq[1]) using train_N iterations.

  • If train_N is a vector, its length must be N, even if the re-fit frequency specified in freq[1] is not 1.

  • If train_N is NULL, the DGP emulator will be re-fitted at each step (based on the re-fit frequency specified in freq[1]) using:

    • 100 iterations if the DGP emulator was constructed without the Vecchia approximation, or

    • 50 iterations if the Vecchia approximation was used.

Defaults to NULL.

refit_cores

the number of processes to be used to re-fit GP components (in the same layer of a DGP emulator) at each M-step during the re-fitting. If set to NULL, the number of processes is set to ⁠(max physical cores available - 1)⁠ if the DGP emulator was constructed without the Vecchia approximation. Otherwise, the number of processes is set to ⁠max physical cores available %/% 2⁠. Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to 1.

pruning

a bool indicating if dynamic pruning of DGP structures will be implemented during the sequential design after the total number of design points exceeds min_size in control. The argument is only applicable to DGP emulators (i.e., object is an instance of dgp class) produced by dgp(). Defaults to TRUE.

control

a list that can supply any of the following components to control the dynamic pruning of the DGP emulator:

  • min_size, the minimum number of design points required to trigger dynamic pruning. Defaults to 10 times the number of input dimensions.

  • threshold, the R^2 value above which a GP node is considered redundant. Defaults to 0.97.

  • nexceed, the minimum number of consecutive iterations that the R^2 value of a GP node must exceed threshold to trigger the removal of that node from the DGP structure. Defaults to 3.

The argument is only used when pruning = TRUE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object is returned with a slot called design that contains:

See Note section below for further information.

Note

Examples

## Not run: 

# load packages and the Python env
library(lhs)
library(dgpsi)

# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
  sin(1/((0.7*x[,1,drop=F]+0.3)*(0.7*x[,2,drop=F]+0.3)))
}

# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)

# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)

# training a 2-layered DGP emulator with the initial design
m <- dgp(X, Y)

# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)

# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)

# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)

# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)

# draw the design created by the sequential design
draw(m,'design')

# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')

# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)

# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)

## End(Not run)

Deep Gaussian process emulator construction

Description

This function builds and trains a DGP emulator.

Usage

dgp(
  X,
  Y,
  depth = 2,
  node = ncol(X),
  name = "sexp",
  lengthscale = 1,
  bounds = NULL,
  prior = "ga",
  share = TRUE,
  nugget_est = FALSE,
  nugget = NULL,
  scale_est = TRUE,
  scale = 1,
  connect = TRUE,
  likelihood = NULL,
  training = TRUE,
  verb = TRUE,
  check_rep = TRUE,
  vecchia = FALSE,
  M = 25,
  ord = NULL,
  N = ifelse(vecchia, 200, 500),
  cores = 1,
  blocked_gibbs = TRUE,
  ess_burn = 10,
  burnin = NULL,
  B = 10,
  internal_input_idx = NULL,
  linked_idx = NULL,
  id = NULL
)

Arguments

X

a matrix where each row is an input training data point and each column represents an input dimension.

Y

a matrix containing observed training output data. The matrix has its rows being output data points and columns representing output dimensions. When likelihood (see below) is not NULL, Y must be a matrix with a single column.

depth

number of layers (including the likelihood layer) for a DGP structure. depth must be at least 2. Defaults to 2.

node

number of GP nodes in each layer (except for the final layer or the layer feeding the likelihood node) of the DGP. Defaults to ncol(X).

name

a character or a vector of characters that indicates the kernel functions (either "sexp" for squared exponential kernel or "matern2.5" for Matérn-2.5 kernel) used in the DGP emulator:

  1. if a single character is supplied, the corresponding kernel function will be used for all GP nodes in the DGP hierarchy.

  2. if a vector of characters is supplied, each character of the vector specifies the kernel function that will be applied to all GP nodes in the corresponding layer.

Defaults to "sexp".

lengthscale

initial lengthscales for GP nodes in the DGP emulator. It can be a single numeric value or a vector:

  1. if it is a single numeric value, the value will be applied as the initial lengthscales for all GP nodes in the DGP hierarchy.

  2. if it is a vector, each element of the vector specifies the initial lengthscales that will be applied to all GP nodes in the corresponding layer. The vector should have a length of depth if likelihood = NULL or a length of depth - 1 if likelihood is not NULL.

Defaults to a numeric value of 1.0.

bounds

the lower and upper bounds of lengthscales in GP nodes. It can be a vector or a matrix:

  1. if it is a vector, the lower bound (the first element of the vector) and upper bound (the second element of the vector) will be applied to lengthscales for all GP nodes in the DGP hierarchy.

  2. if it is a matrix, each row of the matrix specifies the lower and upper bounds of lengthscales for all GP nodes in the corresponding layer. The matrix should have its row number equal to depth if likelihood = NULL or to depth - 1 if likelihood is not NULL.

Defaults to NULL where no bounds are specified for the lengthscales.

prior

prior to be used for MAP estimation of lengthscales and nuggets of all GP nodes in the DGP hierarchy:

  • gamma prior ("ga"),

  • inverse gamma prior ("inv_ga"), or

  • jointly robust prior ("ref").

Defaults to "ga".

share

a bool indicating if all input dimensions of a GP node share a common lengthscale. Defaults to TRUE.

nugget_est

a bool or a bool vector that indicates if the nuggets of GP nodes (if any) in the final layer are to be estimated. If a single bool is provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of ncol(Y)) is provided, each bool element in the vector will be applied to the corresponding GP node (if any) in the final layer. The value of a bool has following effects:

  • FALSE: the nugget of the corresponding GP in the final layer is fixed to the corresponding value defined in nugget (see below).

  • TRUE: the nugget of the corresponding GP in the final layer will be estimated with the initial value given by the correspondence in nugget (see below).

Defaults to FALSE.

nugget

the initial nugget value(s) of GP nodes (if any) in each layer:

  1. if it is a single numeric value, the value will be applied as the initial nugget for all GP nodes in the DGP hierarchy.

  2. if it is a vector, each element of the vector specifies the initial nugget that will be applied to all GP nodes in the corresponding layer. The vector should have a length of depth if likelihood = NULL or a length of depth - 1 if likelihood is not NULL.

Set nugget to a small value and the bools in nugget_est to FALSE for deterministic emulation, where the emulator interpolates the training data points. Set nugget to a larger value and the bools in nugget_est to TRUE for stochastic emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to 1e-6 if nugget_est = FALSE and 0.01 if nugget_est = TRUE. If likelihood is not NULL and nugget_est = FALSE, the nuggets of GPs that feed into the likelihood layer default to 1e-4.

scale_est

a bool or a bool vector that indicates if the variance of GP nodes (if any) in the final layer are to be estimated. If a single bool is provided, it will be applied to all GP nodes (if any) in the final layer. If a bool vector (which must have a length of ncol(Y)) is provided, each bool element in the vector will be applied to the corresponding GP node (if any) in the final layer. The value of a bool has following effects:

  • FALSE: the variance of the corresponding GP in the final layer is fixed to the corresponding value defined in scale (see below).

  • TRUE: the variance of the corresponding GP in the final layer will be estimated with the initial value given by the correspondence in scale (see below).

Defaults to TRUE.

scale

the initial variance value(s) of GP nodes (if any) in the final layer. If it is a single numeric value, it will be applied to all GP nodes (if any) in the final layer. If it is a vector (which must have a length of ncol(Y)), each numeric in the vector will be applied to the corresponding GP node (if any) in the final layer. Defaults to 1.

connect

a bool indicating whether to implement global input connection to the DGP structure. Setting it to FALSE may produce a better emulator in some cases at the cost of slower training. Defaults to TRUE.

likelihood

the likelihood type of a DGP emulator:

  1. NULL: no likelihood layer is included in the emulator.

  2. "Hetero": a heteroskedastic Gaussian likelihood layer is added for stochastic emulation where the computer model outputs are assumed to follow a heteroskedastic Gaussian distribution (i.e., the computer model outputs have input-dependent noise).

  3. "Poisson": a Poisson likelihood layer is added for emulation where the computer model outputs are counts and a Poisson distribution is used to model them.

  4. "NegBin": a negative Binomial likelihood layer is added for emulation where the computer model outputs are counts and a negative Binomial distribution is used to capture dispersion variability in input space.

  5. [New] "Categorical": a categorical likelihood layer is added for emulation (classification), where the computer model output is categorical.

When likelihood is not NULL, the value of nugget_est is overridden by FALSE. Defaults to NULL.

training

a bool indicating if the initialized DGP emulator will be trained. When set to FALSE, dgp() returns an untrained DGP emulator, to which one can apply summary() to inspect its specifications or apply predict() to check its emulation performance before training. Defaults to TRUE.

verb

a bool indicating if the trace information on DGP emulator construction and training will be printed during the function execution. Defaults to TRUE.

check_rep

a bool indicating whether to check for repetitions in the dataset, i.e., if one input position has multiple outputs. Defaults to TRUE.

vecchia

[New] a bool indicating whether to use Vecchia approximation for large-scale DGP emulator construction and prediction. Defaults to FALSE.

M

[New] the size of the conditioning set for the Vecchia approximation in the DGP emulator training. Defaults to 25.

ord

[New] an R function that returns the ordering of the input to each GP node contained in the DGP emulator for the Vecchia approximation. The function must satisfy the following basic rules:

  • the first argument represents the input to a GP node scaled by its lengthscales.

  • the output of the function is a vector of indices that gives the ordering of the input to the GP node.

If ord = NULL, the default random ordering is used. Defaults to NULL.

N

number of iterations for the training. Defaults to 500 if vecchia = FALSE and 200 if vecchia = TRUE. This argument is only used when training = TRUE.

cores

the number of processes to be used to optimize GP components (in the same layer) at each M-step of the training. If set to NULL, the number of processes is set to ⁠(max physical cores available - 1)⁠ if vecchia = FALSE and ⁠max physical cores available %/% 2⁠ if vecchia = TRUE. Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to 1.

blocked_gibbs

a bool indicating if the latent variables are imputed layer-wise using ESS-within-Blocked-Gibbs. ESS-within-Blocked-Gibbs would be faster and more efficient than ESS-within-Gibbs that imputes latent variables node-wise because it reduces the number of components to be sampled during Gibbs steps, especially when there is a large number of GP nodes in layers due to higher input dimensions. Default to TRUE.

ess_burn

number of burnin steps for the ESS-within-Gibbs at each I-step of the training. Defaults to 10. This argument is only used when training = TRUE.

burnin

the number of training iterations to be discarded for point estimates of model parameters. Must be smaller than the training iterations N. If this is not specified, only the last 25% of iterations are used. Defaults to NULL. This argument is only used when training = TRUE.

B

the number of imputations used to produce predictions. Increase the value to refine the representation of imputation uncertainty. Defaults to 10.

internal_input_idx

[Deprecated] The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated lgp() function instead.

Column indices of X that are generated by the linked emulators in the preceding layers. Set internal_input_idx = NULL if the DGP emulator is in the first layer of a system or all columns in X are generated by the linked emulators in the preceding layers. Defaults to NULL.

linked_idx

[Deprecated] The argument will be removed in the next release. To set up connections of emulators for linked emulation, please use the updated lgp() function instead.

Either a vector or a list of vectors:

  • If linked_idx is a vector, it gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators in the feeding layer) that feed into the DGP emulator. The length of the vector shall equal to the length of internal_input_idx when internal_input_idx is not NULL. If the DGP emulator is in the first layer of a linked emulator system, the vector gives the column indices of the global input (formed by column-combining all input matrices of emulators in the first layer) that the DGP emulator will use. If the DGP emulator is to be used in both the first and subsequent layers, one should initially set linked_idx to the appropriate values for the situation where the emulator is not in the first layer. Then, use the function set_linked_idx() to reset the linking information when the emulator is in the first layer.

  • When the DGP emulator is not in the first layer of a linked emulator system, linked_idx can be a list that gives the information on connections between the DGP emulator and emulators in all preceding layers. The length of the list should equal to the number of layers before the DGP emulator. Each element of the list is a vector that gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators) in the corresponding layer that feed into the DGP emulator. If the DGP emulator has no connections to any emulator in a certain layer, set NULL in the corresponding position of the list. The order of input dimensions in X[,internal_input_idx] should be consistent with linked_idx. For example, a DGP emulator in the 4th-layer that is fed by the output dimension 2 and 4 of emulators in layer 2 and all output dimension 1 to 3 of emulators in layer 3 should have linked_idx = list( NULL, c(2,4), c(1,2,3) ). In addition, the first and second columns of X[,internal_input_idx] should correspond to the output dimensions 2 and 4 from layer 2, and the third to fifth columns of X[,internal_input_idx] should correspond to the output dimensions 1 to 3 from layer 3.

Set linked_idx = NULL if the DGP emulator will not be used for linked emulations. However, if this is no longer the case, one can use set_linked_idx() to add linking information to the DGP emulator. Defaults to NULL.

id

an ID to be assigned to the DGP emulator. If an ID is not provided (i.e., id = NULL), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Default to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An S3 class named dgp that contains five slots:

The returned dgp object can be used by

Note

Any R vector detected in X and Y will be treated as a column vector and automatically converted into a single-column R matrix. Thus, if X is a single data point with multiple dimensions, it must be given as a matrix.

Examples

## Not run: 

# load the package and the Python env
library(dgpsi)

# construct a step function
f <- function(x) {
  if (x < 0.5) return(-1)
  if (x >= 0.5) return(1)
  }

# generate training data
X <- seq(0, 1, length = 10)
Y <- sapply(X, f)

# set a random seed
set_seed(999)

# training a DGP emulator
m <- dgp(X, Y)

# continue for further training iterations
m <- continue(m)

# summarizing
summary(m)

# trace plot
trace_plot(m)

# trim the traces of model parameters
m <- window(m, 800)

# LOO cross validation
m <- validate(m)
plot(m)

# prediction
test_x <- seq(0, 1, length = 200)
m <- predict(m, x = test_x)

# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)

# write and read the constructed emulator
write(m, 'step_dgp')
m <- read('step_dgp')

## End(Not run)

Validation and diagnostic plots for a sequential design

Description

[Updated]

This function draws diagnostic and validation plots for a sequential design of a (D)GP emulator or a bundle of (D)GP emulators.

Usage

draw(object, ...)

## S3 method for class 'gp'
draw(object, type = "rmse", log = FALSE, ...)

## S3 method for class 'dgp'
draw(object, type = "rmse", log = FALSE, ...)

## S3 method for class 'bundle'
draw(object, type = "rmse", log = FALSE, emulator = NULL, ...)

Arguments

object

can be one of the following emulator classes:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class bundle.

...

N/A.

type

specifies the type of plot or visualization to generate:

  • "rmse": generates a trace plot of RMSEs, log-losses for DGP emulators with categorical likelihoods, or custom evaluation metrics specified via the "eval" argument in the ⁠[design()]⁠ function.

  • "design": shows visualizations of input designs created by the sequential design procedure.

Defaults to "rmse".

log

a bool indicating whether to plot RMSEs, log-losses (for DGP emulators with categorical likelihoods), or custom evaluation metrics on a log scale when type = "rmse". Defaults to FALSE.

emulator

[Updated] an index or vector of indices of emulators packed in object. This argument is only used if object is an instance of the bundle class. When set to NULL, all emulators in the bundle are drawn. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A patchwork object.

Examples

## Not run: 

# See design() for an example.

## End(Not run)

Get the number of threads

Description

[New]

This function gets the number of threads used for parallel computations involved in the package.

Usage

get_thread_num()

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

the number of threads.


Gaussian process emulator construction

Description

This function builds and trains a GP emulator.

Usage

gp(
  X,
  Y,
  name = "sexp",
  lengthscale = rep(0.1, ncol(X)),
  bounds = NULL,
  prior = "ref",
  nugget_est = FALSE,
  nugget = ifelse(nugget_est, 0.01, 1e-08),
  scale_est = TRUE,
  scale = 1,
  training = TRUE,
  verb = TRUE,
  vecchia = FALSE,
  M = 25,
  ord = NULL,
  internal_input_idx = NULL,
  linked_idx = NULL,
  id = NULL
)

Arguments

X

a matrix where each row is an input data point and each column is an input dimension.

Y

a matrix with only one column and each row being an output data point.

name

kernel function to be used. Either "sexp" for squared exponential kernel or "matern2.5" for Matérn-2.5 kernel. Defaults to "sexp".

lengthscale

initial values of lengthscales in the kernel function. It can be a single numeric value or a vector of length ncol(X):

  • if it is a single numeric value, it is assumed that kernel functions across input dimensions share the same lengthscale;

  • if it is a vector, it is assumed that kernel functions across input dimensions have different lengthscales.

Defaults to a vector of 0.1.

bounds

the lower and upper bounds of lengthscales in the kernel function. It is a vector of length two where the first element is the lower bound and the second element is the upper bound. The bounds will be applied to all lengthscales in the kernel function. Defaults to NULL where no bounds are specified for the lengthscales.

prior

prior to be used for Maximum a Posterior for lengthscales and nugget of the GP: gamma prior ("ga"), inverse gamma prior ("inv_ga"), or jointly robust prior ("ref"). Defaults to "ref". See the reference below for the jointly robust prior.

nugget_est

a bool indicating if the nugget term is to be estimated:

  1. FALSE: the nugget term is fixed to nugget.

  2. TRUE: the nugget term will be estimated.

Defaults to FALSE.

nugget

the initial nugget value. If nugget_est = FALSE, the assigned value is fixed during the training. Set nugget to a small value (e.g., 1e-8) and the corresponding bool in nugget_est to FALSE for deterministic computer models where the emulator should interpolate the training data points. Set nugget to a larger value and the corresponding bool in nugget_est to TRUE for stochastic emulation where the computer model outputs are assumed to follow a homogeneous Gaussian distribution. Defaults to 1e-8 if nugget_est = FALSE and 0.01 if nugget_est = TRUE.

scale_est

a bool indicating if the variance is to be estimated:

  1. FALSE: the variance is fixed to scale.

  2. TRUE: the variance term will be estimated.

Defaults to TRUE.

scale

the initial variance value. If scale_est = FALSE, the assigned value is fixed during the training. Defaults to 1.

training

a bool indicating if the initialized GP emulator will be trained. When set to FALSE, gp() returns an untrained GP emulator, to which one can apply summary() to inspect its specification or apply predict() to check its emulation performance before the training. Defaults to TRUE.

verb

a bool indicating if the trace information on GP emulator construction and training will be printed during function execution. Defaults to TRUE.

vecchia

[New] a bool indicating whether to use Vecchia approximation for large-scale GP emulator construction and prediction. Defaults to FALSE. The Vecchia approximation implemented for the GP emulation largely follows Katzfuss et al. (2022). See reference below.

M

[New] the size of the conditioning set for the Vecchia approximation in the GP emulator training. Defaults to 25.

ord

[New] an R function that returns the ordering of the input to the GP emulator for the Vecchia approximation. The function must satisfy the following basic rules:

  • the first argument represents the input scaled by the lengthscales.

  • the output of the function is a vector of indices that gives the ordering of the input to the GP emulator.

If ord = NULL, the default random ordering is used. Defaults to NULL.

internal_input_idx

[Deprecated] The column indices of X that are generated by the linked emulators in the preceding layers. Set internal_input_idx = NULL if the GP emulator is in the first layer of a system or all columns in X are generated by the linked emulators in the preceding layers. Defaults to NULL.

The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated lgp() function instead.

linked_idx

[Deprecated] Either a vector or a list of vectors:

  • If linked_idx is a vector, it gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators in the feeding layer) that feed into the GP emulator. The length of the vector shall equal to the length of internal_input_idx when internal_input_idx is not NULL. If the GP emulator is in the first layer of a linked emulator system, the vector gives the column indices of the global input (formed by column-combining all input matrices of emulators in the first layer) that the GP emulator will use. If the GP emulator is to be used in both the first and subsequent layers, one should initially set linked_idx to the appropriate values for the situation where the emulator is not in the first layer. Then, use the function set_linked_idx() to reset the linking information when the emulator is in the first layer.

  • When the GP emulator is not in the first layer of a linked emulator system, linked_idx can be a list that gives the information on connections between the GP emulator and emulators in all preceding layers. The length of the list should equal to the number of layers before the GP emulator. Each element of the list is a vector that gives indices of columns in the pooled output matrix (formed by column-combined outputs of all emulators) in the corresponding layer that feed into the GP emulator. If the GP emulator has no connections to any emulator in a certain layer, set NULL in the corresponding position of the list. The order of input dimensions in X[,internal_input_idx] should be consistent with linked_idx. For example, a GP emulator in the second layer that is fed by the output dimension 1 and 3 of emulators in layer 1 should have linked_idx = list( c(1,3) ). In addition, the first and second columns of X[,internal_input_idx] should correspond to the output dimensions 1 and 3 from layer 1.

Set linked_idx = NULL if the GP emulator will not be used for linked emulations. However, if this is no longer the case, one can use set_linked_idx() to add linking information to the GP emulator. Defaults to NULL.

The argument will be removed in the next release. To set up connections of emulators for linked emulations, please use the updated lgp() function instead.

id

an ID to be assigned to the GP emulator. If an ID is not provided (i.e., id = NULL), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Default to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An S3 class named gp that contains five slots:

The returned gp object can be used by

Note

Any R vector detected in X and Y will be treated as a column vector and automatically converted into a single-column R matrix. Thus, if X is a single data point with multiple dimensions, it must be given as a matrix.

References

Examples

## Not run: 
# load the package and the Python env
library(dgpsi)

# construct a step function
f <- function(x) {
   if (x < 0.5) return(-1)
   if (x >= 0.5) return(1)
  }

# generate training data
X <- seq(0, 1, length = 10)
Y <- sapply(X, f)

# training
m <- gp(X, Y)

# summarizing
summary(m)

# LOO cross validation
m <- validate(m)
plot(m)

# prediction
test_x <- seq(0, 1, length = 200)
m <- predict(m, x = test_x)

# OOS validation
validate_x <- sample(test_x, 10)
validate_y <- sapply(validate_x, f)
plot(m, validate_x, validate_y)

# write and read the constructed emulator
write(m, 'step_gp')
m <- read('step_gp')

## End(Not run)


'python' environment initialization

Description

This function initializes the 'python' environment for the package.

Usage

init_py(
  py_ver = NULL,
  dgpsi_ver = NULL,
  reinstall = FALSE,
  uninstall = FALSE,
  verb = TRUE
)

Arguments

py_ver

a string that gives the 'python' version to be installed. If py_ver = NULL, the default 'python' version '3.9.13' will be installed.

dgpsi_ver

a string that gives the 'python' version of 'dgpsi' to be used. If dgpsi_ver = NULL,

  • the latest 'python' version of 'dgpsi' will be used, if the package is installed from CRAN;

  • the development 'python' version of 'dgpsi' will be used, if the package is installed from GitHub.

reinstall

a bool that indicates whether to reinstall the 'python' version of 'dgpsi' specified in dgpsi_ver if it has already been installed. This argument is useful when the development version of the R package is installed and one may want to regularly update the development 'python' version of 'dgpsi'. Defaults to FALSE.

uninstall

a bool that indicates whether to uninstall the 'python' version of 'dgpsi' specified in dgpsi_ver if it has already been installed. This argument is useful when the 'python' environment is corrupted and one wants to completely uninstall and reinstall it. Defaults to FALSE.

verb

a bool indicating if trace information will be printed during function execution. Defaults to TRUE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

No return value, called to install required 'python' environment.

Examples

## Not run: 

# See gp(), dgp(), or lgp() for an example.

## End(Not run)


Linked (D)GP emulator construction

Description

[Updated]

This function constructs a linked (D)GP emulator for a model chain or network.

Usage

lgp(struc, emulators = NULL, B = 10, activate = TRUE, verb = TRUE, id = NULL)

Arguments

struc

the structure of the linked emulator, which can take one of two forms:

  • [Deprecated] a list contains L (the number of layers in a systems of computer models) sub-lists, each of which represents a layer and contains (D)GP emulators (represented by instances of S3 class gp or dgp) of computer models. The sub-lists are placed in the list in the same order of the specified computer model system's hierarchy. This option is deprecated and will be removed in the next release.

  • [New] a data frame that defines the connection structure between emulators in the linked system, with the following columns:

    • From_Emulator: the ID of the emulator providing the output. This ID must match the id slot in the corresponding emulator object (produced by gp() or dgp()) within emulators argument of lgp(), or it should be special value "Global", indicating the global inputs to the model chain or network. The id slot is either automatically generated by gp() or dgp(), or can be manually specified via the id argument in these functions or set with the set_id() function.

    • To_Emulator: the ID of the emulator receiving the input, also matching the id slot in the corresponding emulator object.

    • From_Output: a single integer specifying the output dimension of the From_Emulator that is being connected to the input dimension of the To_Emulator specified by To_Input. If From_Emulator is "Global", then From_Output indicates the dimension of the global input passed to the To_Emulator.

    • To_Input: a single integer specifying the input dimension of the To_Emulator that is receiving the From_Output dimension from the From_Emulator.

    Each row represents a single one-to-one connection between a specified output dimension of From_Emulator and a corresponding input dimension of To_Emulator. If multiple connections are required between two emulators, each connection should be specified in a separate row.

    Note: When using the data frame option for struc, the emulators argument must be provided.

emulators

[New] a list of emulator objects, each containing an id slot that uniquely identifies it within the linked system. The id slot in each emulator object must match the From_Emulator/To_Emulator columns in struc.

If the same emulator is used multiple times within the linked system, the list must contain distinct copies of that emulator, each with a unique ID stored in their id slot. Use the set_id() function to produce copies with different IDs to ensure each instance can be uniquely referenced.

B

the number of imputations used for prediction. Increase the value to refine representation of imputation uncertainty. If the system consists of only GP emulators, B is set to 1 automatically. Defaults to 10.

activate

[New] a bool indicating whether the initialized linked emulator should be activated:

  • If activate = FALSE, lgp() returns an inactive linked emulator, allowing inspection of its structure using summary().

  • If activate = TRUE, lgp() returns an active linked emulator, ready for prediction and validation using predict() and validate(), respectively.

Defaults to TRUE. This argument is only applicable when struc is specified as a data frame.

verb

[New] a bool indicating if the trace information on linked (D)GP emulator construction should be printed during the function call. Defaults to TRUE. This argument is only applicable when struc is specified as a data frame.

id

an ID to be assigned to the linked (D)GP emulator. If an ID is not provided (i.e., id = NULL), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An S3 class named lgp that contains three slots:

The returned lgp object can be used by

Examples

## Not run: 

# load the package and the Python env
library(dgpsi)

# model 1
f1 <- function(x) {
  (sin(7.5*x)+1)/2
}
# model 2
f2 <- function(x) {
  2/3*sin(2*(2*x - 1))+4/3*exp(-30*(2*(2*x-1))^2)-1/3
}
# linked model
f12 <- function(x) {
  f2(f1(x))
}

# training data for Model 1
X1 <- seq(0, 1, length = 9)
Y1 <- sapply(X1, f1)
# training data for Model 2
X2 <- seq(0, 1, length = 13)
Y2 <- sapply(X2, f2)

# emulation of model 1
m1 <- gp(X1, Y1, name = "matern2.5", id = "emulator1")
# emulation of model 2
m2 <- dgp(X2, Y2, depth = 2, name = "matern2.5", id = "emulator2")

struc <- data.frame(From_Emulator = c("Global", "emulator1"),
                    To_Emulator = c("emulator1", "emulator2"),
                    From_Output = c(1, 1),
                    To_Input = c(1, 1))
emulators <- list(m1, m2)

# construct the linked emulator for visual inspection
m_link <- lgp(struc, emulators, activate = FALSE)

# visual inspection
summary(m_link)

# build the linked emulator for prediction
m_link <- lgp(struc, emulators, activate = TRUE)
test_x <- seq(0, 1, length = 300)
m_link <- predict(m_link, x = test_x)

# OOS validation
validate_x <- sample(test_x, 20)
validate_y <- sapply(validate_x, f12)
plot(m_link, validate_x, validate_y, style = 2)

# write and read the constructed linked emulator
write(m_link, 'linked_emulator')
m_link <- read('linked_emulator')

## End(Not run)

Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using MICE

Description

[Updated]

This function searches from a candidate set to locate the next design point(s) to be added to a (D)GP emulator or a bundle of (D)GP emulators using the Mutual Information for Computer Experiments (MICE), see the reference below.

Usage

mice(object, ...)

## S3 method for class 'gp'
mice(
  object,
  x_cand = NULL,
  n_cand = 200,
  batch_size = 1,
  M = 50,
  nugget_s = 1e-06,
  workers = 1,
  limits = NULL,
  int = FALSE,
  ...
)

## S3 method for class 'dgp'
mice(
  object,
  x_cand = NULL,
  n_cand = 200,
  batch_size = 1,
  M = 50,
  nugget_s = 1e-06,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

## S3 method for class 'bundle'
mice(
  object,
  x_cand = NULL,
  n_cand = 200,
  batch_size = 1,
  M = 50,
  nugget_s = 1e-06,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class bundle.

...

any arguments (with names different from those of arguments used in mice()) that are used by aggregate can be passed here.

x_cand

a matrix (with each row being a design point and column being an input dimension) that gives a candidate set from which the next design point(s) are determined. If object is an instance of the bundle class and aggregate is not supplied, x_cand can also be a list. The list must have a length equal to the number of emulators in object, with each element being a matrix representing the candidate set for a corresponding emulator in the bundle. Defaults to NULL.

n_cand

an integer specifying the size of the candidate set to be generated for selecting the next design point(s). This argument is used only when x_cand is NULL. Defaults to 200.

batch_size

an integer that gives the number of design points to be chosen. Defaults to 1.

M

[New] the size of the conditioning set for the Vecchia approximation in the criterion calculation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.

nugget_s

the value of the smoothing nugget term used by MICE. Defaults to 1e-6.

workers

the number of processes to be used for the criterion calculation. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1.

limits

[New] a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. This argument is only used when x_cand = NULL. Defaults to NULL.

int

[New] a bool or a vector of bools that indicates if an input dimension is an integer type. If a single bool is given, it will be applied to all input dimensions. If a vector is provided, it should have a length equal to the input dimensions and will be applied to individual input dimensions. This argument is only used when x_cand = NULL. Defaults to FALSE.

aggregate

an R function that aggregates scores of the MICE across different output dimensions (if object is an instance of the dgp class) or across different emulators (if object is an instance of the bundle class). The function should be specified in the following basic form:

  • the first argument is a matrix representing scores. The rows of the matrix correspond to different design points. The number of columns of the matrix equals to:

    • the emulator output dimension if object is an instance of the dgp class; or

    • the number of emulators contained in object if object is an instance of the bundle class.

  • the output should be a vector that gives aggregate scores at different design points.

Set to NULL to disable aggregation. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

  1. If x_cand is not NULL:

    • When object is an instance of the gp class, a vector of length batch_size is returned, containing the positions (row numbers) of the next design points from x_cand.

    • When object is an instance of the dgp class, a vector of length batch_size * D is returned, containing the positions (row numbers) of the next design points from x_cand to be added to the DGP emulator.

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a matrix is returned with batch_size rows and a column for each emulator in the bundle, containing the positions (row numbers) of the next design points from x_cand for individual emulators.

  2. If x_cand is NULL:

    • When object is an instance of the gp class, a matrix with batch_size rows is returned, giving the next design points to be evaluated.

    • When object is an instance of the dgp class, a matrix with batch_size * D rows is returned, where:

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a list is returned with a length equal to the number of emulators in the bundle. Each element of the list is a matrix with batch_size rows, where each row represents a design point to be added to the corresponding emulator.

Note

The first column of the matrix supplied to the first argument of aggregate must correspond to the first output dimension of the DGP emulator if object is an instance of the dgp class, and so on for subsequent columns and dimensions. If object is an instance of the bundle class, the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.

References

Beck, J., & Guillas, S. (2016). Sequential design with mutual information for computer experiments (MICE): emulation of a tsunami model. SIAM/ASA Journal on Uncertainty Quantification, 4(1), 739-766.

Examples

## Not run: 

# load packages and the Python env
library(lhs)
library(dgpsi)

# construct a 1D non-stationary function
f <- function(x) {
 sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
}

# generate the initial design
X <- maximinLHS(10,1)
Y <- f(X)

# training a 2-layered DGP emulator with the global connection off
m <- dgp(X, Y, connect = F)

# generate a candidate set
x_cand <- maximinLHS(200,1)

# locate the next design point using MICE
next_point <- mice(m, x_cand = x_cand)
X_new <- x_cand[next_point,,drop = F]

# obtain the corresponding output at the located design point
Y_new <- f(X_new)

# combine the new input-output pair to the existing data
X <- rbind(X, X_new)
Y <- rbind(Y, Y_new)

# update the DGP emulator with the new input and output data and refit
m <- update(m, X, Y, refit = TRUE)

# plot the LOO validation
plot(m)

## End(Not run)

Calculate the predictive negative log-likelihood

Description

This function computes the predictive negative log-likelihood from a DGP emulator with a likelihood layer.

Usage

nllik(object, x, y)

Arguments

object

an instance of the dgp class and it should be produced by dgp() with likelihood not being NULL;

x

a matrix where each row is an input testing data point and each column is an input dimension.

y

a matrix with only one column where each row is a scalar-valued testing output data point.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object is returned with an additional slot named NLL that contains two elements. The first one, named meanNLL, is a scalar that gives the average negative predicted log-likelihood across all testing data points. The second one, named allNLL, is a vector that gives the negative predicted log-likelihood for each testing data point.


Pack GP and DGP emulators into a bundle

Description

This function packs GP emulators and DGP emulators into a bundle class for sequential designs if each emulator emulates one output dimension of the underlying simulator.

Usage

pack(..., id = NULL)

Arguments

...

a sequence or a list of emulators produced by gp() or dgp().

id

an ID to be assigned to the bundle emulator. If an ID is not provided (i.e., id = NULL), a UUID (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Default to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An S3 class named bundle to be used by design() for sequential designs. It has:

Examples

## Not run: 

# load packages
library(lhs)
library(dgpsi)

# construct a function with a two-dimensional output
f <- function(x) {
 y1 = sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
 y2 = 1/3*sin(2*(2*x - 1))+2/3*exp(-30*(2*(2*x-1))^2)+1/3
 return(cbind(y1,y2))
}

# generate the initial design
X <- maximinLHS(10,1)
Y <- f(X)

# generate the validation data
validate_x <- maximinLHS(30,1)
validate_y <- f(validate_x)

# training a 2-layered DGP emulator with respect to each output with the global connection off
m1 <- dgp(X, Y[,1], connect = F)
m2 <- dgp(X, Y[,2], connect = F)

# specify the range of the input dimension
lim <- c(0, 1)

# pack emulators to form an emulator bundle
m <- pack(m1, m2)

# 1st wave of the sequential design with 10 iterations and the target RMSE of 0.01
m <- design(m, N = 10, limits = lim, f = f, x_test = validate_x, y_test = validate_y, target = 0.01)

# 2rd wave of the sequential design with additional 10 iterations and the same target
m <- design(m, N = 10, limits = lim, f = f, x_test = validate_x, y_test = validate_y, target = 0.01)

# draw sequential designs of the two packed emulators
draw(m, type = 'design')

# inspect the traces of RMSEs of the two packed emulators
draw(m, type = 'rmse')

# write and read the constructed emulator bundle
write(m, 'bundle_dgp')
m <- read('bundle_dgp')

# unpack the bundle into individual emulators
m_unpacked <- unpack(m)

# plot OOS validations of individual emulators
plot(m_unpacked[[1]], x_test = validate_x, y_test = validate_y[,1])
plot(m_unpacked[[2]], x_test = validate_x, y_test = validate_y[,2])

## End(Not run)

Validation plots of a constructed GP, DGP, or linked (D)GP emulator

Description

[Updated]

This function draws validation plots of a GP, DGP, or linked (D)GP emulator.

Usage

## S3 method for class 'dgp'
plot(
  x,
  x_test = NULL,
  y_test = NULL,
  dim = NULL,
  method = NULL,
  sample_size = 50,
  style = 1,
  min_max = TRUE,
  normalize = TRUE,
  color = "turbo",
  type = "points",
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)

## S3 method for class 'lgp'
plot(
  x,
  x_test = NULL,
  y_test = NULL,
  dim = NULL,
  method = NULL,
  sample_size = 50,
  style = 1,
  min_max = TRUE,
  color = "turbo",
  type = "points",
  M = 50,
  verb = TRUE,
  force = FALSE,
  cores = 1,
  ...
)

## S3 method for class 'gp'
plot(
  x,
  x_test = NULL,
  y_test = NULL,
  dim = NULL,
  method = NULL,
  sample_size = 50,
  style = 1,
  min_max = TRUE,
  color = "turbo",
  type = "points",
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)

Arguments

x

can be one of the following emulator classes:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class lgp.

x_test

same as that of validate().

y_test

same as that of validate().

dim

[Updated] if dim = NULL, the index of an emulator's input within the design will be shown on the x-axis in validation plots. Otherwise, dim indicates which dimension of an emulator's input will be shown on the x-axis in validation plots:

  • If x is an instance of the gp of dgp class, dim is an integer.

  • [Deprecated] If x is an instance of the lgp class created by lgp() without specifying the struc argument in data frame form, dim can be:

    1. an integer referring to the dimension of the global input to emulators in the first layer of a linked emulator system; or

    2. a vector of three integers referring to the dimension (specified by the third integer) of the global input to an emulator (specified by the second integer) in a layer (specified by the first integer) that is not the first layer of a linked emulator system.

    This option for linked (D)GP emulators is deprecated and will be removed in the next release.

  • [New] If x is an instance of the lgp class created by lgp() with argument struc in data frame form, dim is an integer referring to the dimension of the global input to the linked emulator system.

This argument is only used when style = 1. Defaults to NULL.

method

same as that of validate().

sample_size

same as that of validate().

style

either 1 or 2, indicating two different plotting styles for validation.

min_max

a bool indicating if min-max normalization will be used to scale the testing output, RMSE, predictive mean and std from the emulator. Defaults to TRUE. This argument is not applicable to DGP emulators with categorical likelihoods.

normalize

[New] a bool indicating if normalization will be used to scale the counts in validation plots of DGP emulators with categorical likelihoods when style = 2. Defaults to TRUE.

color

a character string indicating the color map to use when style = 2:

  • 'magma' (or 'A')

  • 'inferno' (or 'B')

  • 'plasma' (or 'C')

  • 'viridis' (or 'D')

  • 'cividis' (or 'E')

  • 'rocket' (or 'F')

  • 'mako' (or 'G')

  • 'turbo' (or 'H')

Defaults to 'turbo' (or 'H').

type

either 'line' or ⁠'points⁠, indicating whether to draw testing data in the OOS validation plot as a line or individual points when the input of the emulator is one-dimensional and style = 1. This argument is not applicable to DGP emulators with categorical likelihoods. Defaults to 'points'

verb

a bool indicating if trace information on plotting will be printed during execution. Defaults to TRUE.

M

[New] same as that of validate().

force

same as that of validate().

cores

same as that of validate().

...

N/A.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A patchwork object.

Note

Examples

## Not run: 

# See gp(), dgp(), or lgp() for an example.

## End(Not run)

Prediction from GP, DGP, or linked (D)GP emulators

Description

[Updated]

This function implements prediction from GP, DGP, or linked (D)GP emulators.

Usage

## S3 method for class 'dgp'
predict(
  object,
  x,
  method = NULL,
  mode = "label",
  full_layer = FALSE,
  sample_size = 50,
  M = 50,
  cores = 1,
  chunks = NULL,
  ...
)

## S3 method for class 'lgp'
predict(
  object,
  x,
  method = NULL,
  full_layer = FALSE,
  sample_size = 50,
  M = 50,
  cores = 1,
  chunks = NULL,
  ...
)

## S3 method for class 'gp'
predict(
  object,
  x,
  method = NULL,
  sample_size = 50,
  M = 50,
  cores = 1,
  chunks = NULL,
  ...
)

Arguments

object

an instance of the gp, dgp, or lgp class.

x

the testing input data:

  • if object is an instance of the gp or dgp class, x is a matrix where each row is an input testing data point and each column is an input dimension.

  • [Deprecated] if object is an instance of the lgp class created by lgp() without specifying argument struc in data frame form, x can be either a matrix or a list:

    • if x is a matrix, its rows are treated as instances of the Global inputs. In this case, it is assumed that the only global input to the system is the input to the emulators in the first layer and there is no global input to emulators in other layers.

    • if x is a list, it should have L (the number of layers in an emulator system) elements. The first element is a matrix that represents the global testing input data that feed into the emulators in the first layer of the system. The remaining L-1 elements are L-1 sub-lists, each of which contains a number (the same number of emulators in the corresponding layer) of matrices (rows being testing input data points and columns being input dimensions) that represent the global testing input data to the emulators in the corresponding layer. The matrices must be placed in the sub-lists based on how their corresponding emulators are placed in struc argument of lgp(). If there is no global input data to a certain emulator, set NULL in the corresponding sub-list of x.

    This option for linked (D)GP emulators is deprecated and will be removed in the next release.

  • [New] If object is an instance of the lgp class created by lgp() with argument struc in data frame form, x must be a matrix representing the global input, where each row corresponds to a test data point and each column represents a global input dimension. The column indices in x must align with the indices specified in the From_Output column of the struc data frame (used in lgp()), corresponding to rows where the From_Emulator column is "Global".

method

[Updated] the prediction approach to use: either the mean-variance approach ("mean_var") or the sampling approach ("sampling"). The mean-variance approach returns the means and variances of the predictive distributions, while the sampling approach generates samples from predictive distributions using the derived means and variances. For DGP emulators with a categorical likelihood (likelihood = "Categorical" in dgp()), method is only applicable when full_layer = TRUE. In this case, the sampling approach generates samples from the GP nodes in all hidden layers using the derived means and variances, and subsequently propagates these samples through the categorical likelihood. By default, the method is set to "sampling" for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods, and to "mean_var" otherwise.

mode

[New] whether to predict the classes ("label") or probabilities ("proba") of different classes when object is a DGP emulator with a categorical likelihood. Defaults to "label".

full_layer

a bool indicating whether to output the predictions of all layers. Defaults to FALSE. Only used when object is a DGP or a linked (D)GP emulator.

sample_size

the number of samples to draw for each given imputation if method = "sampling". Defaults to 50.

M

[New] the size of the conditioning set for the Vecchia approximation in the emulator prediction. Defaults to 50. This argument is only used if the emulator object was constructed under the Vecchia approximation.

cores

the number of processes to be used for prediction. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1.

chunks

the number of chunks that the testing input matrix x will be divided into for multi-cores to work on. Only used when cores is not 1. If not specified (i.e., chunks = NULL), the number of chunks is set to the value of cores. Defaults to NULL.

...

N/A.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

The results slot will also include:

Examples

## Not run: 

# See gp(), dgp(), or lgp() for an example.

## End(Not run)

Static pruning of a DGP emulator

Description

This function implements static pruning for a DGP emulator.

Usage

prune(object, control = list(), verb = TRUE)

Arguments

object

an instance of the dgp class that is generated by dgp().

control

a list that can supply the following two components to control static pruning of the DGP emulator:

  • min_size, the minimum number of design points required to trigger pruning. Defaults to 10 times of the input dimensions.

  • threshold, the R^2 value above which a GP node is considered redundant and removable. Defaults to 0.97.

verb

a bool indicating if trace information will be printed during the function execution. Defaults to TRUE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object that could be an instance of gp, dgp, or bundle (of GP emulators) class.

Note

Examples

## Not run: 

# load the package and the Python env
library(dgpsi)

# construct the borehole function over a hypercube
f <- function(x){
  x[,1] <- (0.15 - 0.5) * x[,1] + 0.5
  x[,2] <- exp((log(50000) - log(100)) * x[,2] + log(100))
  x[,3] <- (115600 - 63070) *x[,3] + 63070
  x[,4] <- (1110 - 990) * x[,4] + 990
  x[,5] <- (116 - 63.1) * x[,5] + 63.1
  x[,6] <- (820 - 700) * x[,6] + 700
  x[,7] <- (1680 - 1120) * x[,7] + 1120
  x[,8] <- (12045 - 9855) * x[,8] + 9855
  y <- apply(x, 1, RobustGaSP::borehole)
}

# set a random seed
set_seed(999)

# generate training data
X <- maximinLHS(80, 8)
Y <- f(X)

# generate validation data
validate_x <- maximinLHS(500, 8)
validate_y <- f(validate_x)

# training a DGP emulator with anisotropic squared exponential kernels
m <- dgp(X, Y, share = F)

# OOS validation of the DGP emulator
plot(m, validate_x, validate_y)

# prune the emulator until no more GP nodes are removable
m <- prune(m)

# OOS validation of the resulting emulator
plot(m, validate_x, validate_y)

## End(Not run)

Load the stored emulator

Description

This function loads the .pkl file that stores the emulator.

Usage

read(pkl_file)

Arguments

pkl_file

the path to and the name of the .pkl file where the emulator is stored.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

The S3 class of a GP emulator, a DGP emulator, a linked (D)GP emulator, or a bundle of (D)GP emulators.

Examples

## Not run: 

# See gp(), dgp(), lgp(), or pack() for an example.

## End(Not run)

Serialize the constructed emulator

Description

[New]

This function serializes the constructed emulator.

Usage

serialize(object, light = TRUE)

Arguments

object

an instance of the S3 class gp, dgp, lgp, or bundle.

light

a bool indicating if a light version of the constructed emulator (that requires a small storage) will be serialized. Defaults to TRUE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A serialized version of object.

Note

Since the constructed emulators are 'python' objects, they cannot be directly exported to other R processes for parallel processing in multi-session workers created through spawning. This function provides a solution by converting the emulators into serialized objects, which can be restored using deserialize() for multi-session processing. Note that in forking, serialization is generally not required.

Examples

## Not run: 

library(future)
library(future.apply)
library(dgpsi)

# model
f <- function(x) {
 (sin(7.5*x)+1)/2
}

# training data
X <- seq(0, 1, length = 10)
Y <- sapply(X, f)

# train a DGP emulator
m <- dgp(X, Y, name = "matern2.5")

# testing input data
X_dgp <- seq(0, 1, length = 100)

# serialize the DGP emulator
m_serialized <- serialize(m)

# start a multi-session with three cores for parallel predictions
plan(multisession, workers = 3)

# perform parallel predictions
results <- future_lapply(1:length(X_dgp), function(i) {
  m_deserialized <- deserialize(m_serialized)
  mean_i <- predict(m_deserialized, X_dgp[i])$results$mean
}, future.seed = TRUE)

# reset the future plan to sequential
plan(sequential)

# combine mean predictions
pred_mean <- do.call(rbind, results)

## End(Not run)

Set Emulator ID

Description

[New]

This function assigns a unique identifier to an emulator.

Usage

set_id(object, id)

Arguments

object

an emulator object to which the ID will be assigned.

id

a unique identifier for the emulator as either a numeric or character string. Ensure this ID does not conflict with other emulator IDs, especially when used in linked emulations.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

The updated object, with the assigned ID stored in its id slot.

Examples

## Not run: 

# See lgp() for an example.

## End(Not run)

Reset number of imputations for a DGP emulator

Description

This function resets the number of imputations for prediction from a DGP emulator.

Usage

set_imp(object, B = 5)

Arguments

object

an instance of the S3 class dgp.

B

the number of imputations to produce predictions from object. Increase the value to improve imputation uncertainty quantification. Decrease the value to improve speed of prediction. Defaults to 5.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object with the information of B incorporated.

Note

Examples

## Not run: 

# See design() for an example.

## End(Not run)

Set linked indices

Description

[Deprecated]

This function is deprecated and will be removed in the next release. The updated lgp() function now offers a simpler, more efficient way to specify linked information for (D)GP emulators.

Usage

set_linked_idx(object, idx)

Arguments

object

an instance of the S3 class gp or dgp.

idx

same as the argument linked_idx of gp() and dgp().

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object with the information of idx incorporated.

Note

This function is useful when different models are emulated by different teams. Each team can create their (D)GP emulator even without knowing how different emulators are connected together. When this information is available and different emulators are collected, the connection information between emulators can then be assigned to individual emulators with this function.

Examples

## Not run: 

# See lgp() for an example.

## End(Not run)

Random seed generator

Description

This function initializes a random number generator that sets the random seed in both R and Python to ensure reproducible results from the package.

Usage

set_seed(seed)

Arguments

seed

a single integer value.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

No return value.

Examples

## Not run: 

# See dgp() for an example.

## End(Not run)

Set the number of threads

Description

[New]

This function sets the number of threads for parallel computations involved in the package.

Usage

set_thread_num(num)

Arguments

num

the number of threads. If it is greater than the maximum number of threads available, the number of threads will be set to the maximum value.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

No return value.


Add or remove the Vecchia approximation

Description

[New]

This function adds or removes the Vecchia approximation from a GP, DGP or linked (D)GP emulator constructed by gp(), dgp() or lgp().

Usage

set_vecchia(object, vecchia = TRUE, M = 25, ord = NULL)

Arguments

object

an instance of the S3 class gp, dgp, or lgp.

vecchia

a bool or a list of bools to indicate the addition or removal of the Vecchia approximation:

  • if object is an instance of the gp or dgp class, vecchia is a bool that indicates either addition (vecchia = TRUE) or removal (vecchia = FALSE) of the Vecchia approximation from object.

  • if object is an instance of the lgp class, x can be a bool or a list of bools:

    • if vecchia is a bool, it indicates either addition (vecchia = TRUE) or removal (vecchia = FALSE) of the Vecchia approximation from all individual (D)GP emulators contained in object.

    • if vecchia is a list of bools, it should have same shape as struc that was supplied to lgp(). Each bool in the list indicates if the corresponding (D)GP emulator contained in object shall have the Vecchia approximation added or removed.

M

the size of the conditioning set for the Vecchia approximation in the (D)GP emulator training. Defaults to 25.

ord

an R function that returns the ordering of the input to the (D)GP emulator for the Vecchia approximation. The function must satisfy the following basic rules:

  • the first argument represents the lengthscale-scaled input to the GP emulator or the lengthscale-scaled input to a GP node of the DGP emulator.

  • the output of the function is a vector of indices that gives the ordering of the input to the GP emulator or the input to the GP nodes of the DGP emulator.

If ord = NULL, the default random ordering is used. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object with the Vecchia approximation either added or removed.

Note

This function is useful for quickly switching between Vecchia and non-Vecchia approximations for an existing emulator without the need to reconstruct the emulator. If the emulator was built without the Vecchia approximation, the function can add it, and if the emulator was built with the Vecchia approximation, the function can remove it. If the current state already matches the requested state, the emulator remains unchanged.


Summary of a constructed GP, DGP, or linked (D)GP emulator

Description

[Updated]

This function provides a summary of key information for a GP, DGP, or linked (D)GP emulator by generating either a table or an interactive plot of the emulator’s structure.

Usage

## S3 method for class 'gp'
summary(object, type = "plot", ...)

## S3 method for class 'dgp'
summary(object, type = "plot", ...)

## S3 method for class 'lgp'
summary(object, type = "plot", group_size = 1, ...)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class lgp.

type

a character string, either "table" or "plot", indicating the format of the output. If set to "table", the function returns a summary in table. If set to "plot", the function returns an interactive visualization. Defaults to "plot". If the object was created with lgp() where struc is not a data frame, type will automatically default to "table".

...

Any arguments that can be passed to kableExtra::kbl() when type = "table".

group_size

an integer specifying the number of consecutive layers to be grouped together in the interactive visualization of linked emulators when type = "plot". This argument is only applicable if object is an instance of the lgp class. Defaults to 1.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

Either a summary table (returned as kableExtra object) or an interactive visualization (returned as a visNetwork object) of the emulator. The visualization is compatible with R Markdown documents and the RStudio Viewer. The summary table can be further customized by kableExtra::kableExtra package. The resulting visNetwork object can be saved as an HTML file using visNetwork::visSave() from the visNetwork::visNetwork package.

Examples

## Not run: 

# See gp(), dgp(), or lgp() for an example.

## End(Not run)

Trace plot for DGP hyperparameters

Description

This function draws trace plots for the hyperparameters of a chosen GP node in a DGP emulator.

Usage

trace_plot(object, layer = NULL, node = 1)

Arguments

object

an instance of the dgp class.

layer

the index of a layer. Defaults to NULL for the final layer.

node

the index of a GP node in the layer specified by layer. Defaults to 1 for the first GP node in the corresponding layer.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A ggplot object.

Examples

## Not run: 

# See dgp() for an example.

## End(Not run)

Unpack a bundle of (D)GP emulators

Description

This function unpacks a bundle of (D)GP emulators safely so that any further manipulations of unpacked individual emulators will not impact those in the bundle.

Usage

unpack(object)

Arguments

object

an instance of the class bundle.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

A named list that contains individual emulators (named ⁠emulator1,...,emulatorS⁠) packed in object, where S is the number of emulators in object.

Examples

## Not run: 

# See pack() for an example.

## End(Not run)

Update a GP or DGP emulator

Description

This function updates the training input and output of a GP or DGP emulator with an option to refit the emulator.

Usage

update(object, X, Y, refit, reset, verb, ...)

## S3 method for class 'dgp'
update(
  object,
  X,
  Y,
  refit = TRUE,
  reset = FALSE,
  verb = TRUE,
  N = NULL,
  cores = 1,
  ess_burn = 10,
  B = NULL,
  ...
)

## S3 method for class 'gp'
update(object, X, Y, refit = TRUE, reset = FALSE, verb = TRUE, ...)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

X

the new input data which is a matrix where each row is an input training data point and each column represents an input dimension.

Y

the new output data:

  • If object is an instance of the gp class, Y is a matrix with only one column and each row being an output data point.

  • If object is an instance of the dgp class, Y is a matrix with its rows being output data points and columns being output dimensions. When likelihood (see below) is not NULL, Y must be a matrix with only one column.

refit

a bool indicating whether to re-fit the emulator object after the training input and output are updated. Defaults to TRUE.

reset

a bool indicating whether to reset hyperparameters of the emulator object to the initial values first obtained when the emulator was constructed. Use if it is suspected that a local mode for the hyperparameters has been reached through successive updates. Defaults to FALSE.

verb

a bool indicating if trace information will be printed during the function execution. Defaults to TRUE.

...

N/A.

N

[Updated] number of training iterations used to re-fit the emulator object if it is an instance of the dgp class. If set to NULL, the number of iterations is set to 100 if the DGP emulator was constructed without the Vecchia approximation, and is set to 50 if Vecchia approximation was used. Defaults to NULL.

cores

the number of processes to be used to re-fit GP components (in the same layer) at each M-step during the re-fitting. If set to NULL, the number of processes is set to ⁠(max physical cores available - 1)⁠ if vecchia = FALSE and ⁠max physical cores available %/% 2⁠ if vecchia = TRUE. Only use multiple processes when there is a large number of GP components in different layers and optimization of GP components is computationally expensive. Defaults to 1.

ess_burn

number of burnin steps for the ESS-within-Gibbs sampler at each I-step of the training of the emulator object if it is an instance of the dgp class. Defaults to 10.

B

the number of imputations for predictions from the updated emulator object if it is an instance of the dgp class. This overrides the number of imputations set in object. Set to NULL to use the same number of imputations set in object. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object.

Note

Examples

## Not run: 

# See alm(), mice(), or vigf() for an example.

## End(Not run)

Validate a constructed GP, DGP, or linked (D)GP emulator

Description

This function calculates Leave-One-Out (LOO) cross validation or Out-Of-Sample (OOS) validation statistics for a constructed GP, DGP, or linked (D)GP emulator.

Usage

validate(
  object,
  x_test,
  y_test,
  method,
  sample_size,
  verb,
  M,
  force,
  cores,
  ...
)

## S3 method for class 'gp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = NULL,
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)

## S3 method for class 'dgp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = NULL,
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)

## S3 method for class 'lgp'
validate(
  object,
  x_test = NULL,
  y_test = NULL,
  method = NULL,
  sample_size = 50,
  verb = TRUE,
  M = 50,
  force = FALSE,
  cores = 1,
  ...
)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class lgp.

x_test

OOS testing input data:

  • if object is an instance of the gp or dgp class, x_test is a matrix where each row is a new input location to be used for validating the emulator and each column is an input dimension.

  • [Deprecated] if object is an instance of the lgp class, x_test can be a matrix or a list:

    • if x_test is a matrix, it is the global testing input data that feed into the emulators in the first layer of a system. The rows of x_test represent different input data points and the columns represent input dimensions across all emulators in the first layer of the system. In this case, it is assumed that the only global input to the system is the input to the emulators in the first layer and there is no global input to emulators in other layers.

    • if x_test is a list, it should have L (the number of layers in an emulator system) elements. The first element is a matrix that represents the global testing input data that feed into the emulators in the first layer of the system. The remaining L-1 elements are L-1 sub-lists, each of which contains a number (the same number of emulators in the corresponding layer) of matrices (rows being testing input data points and columns being input dimensions) that represent the global testing input data to the emulators in the corresponding layer. The matrices must be placed in the sub-lists based on how their corresponding emulators are placed in struc argument of lgp(). If there is no global input data to a certain emulator, set NULL in the corresponding sub-list of x_test.

    This option for linked (D)GP emulators is deprecated and will be removed in the next release.

  • [New] If object is an instance of the lgp class created by lgp() with argument struc in data frame form, x_test must be a matrix representing the global input, where each row corresponds to a test data point and each column represents a global input dimension. The column indices in x_test must align with the indices specified in the From_Output column of the struc data frame (used in lgp()), corresponding to rows where the From_Emulator column is "Global".

x_test must be provided if object is an instance of the lgp. x_test must also be provided if y_test is provided. Defaults to NULL, in which case LOO validation is performed.

y_test

the OOS output data corresponding to x_test:

  • if object is an instance of the gp class, y_test is a matrix with only one column where each row represents the output corresponding to the matching row of x_test.

  • if object is an instance of the dgp class, y_test is a matrix where each row represents the output corresponding to the matching row of x_test and with columns representing output dimensions.

  • if object is an instance of the lgp class, y_test can be a single matrix or a list of matrices:

    • if y_test is a single matrix, then there should be only one emulator in the final layer of the linked emulator system and y_test represents the emulator's output with rows being testing positions and columns being output dimensions.

    • if y_test is a list, then y_test should have L matrices, where L is the number of emulators in the final layer of the system. Each matrix has its rows corresponding to testing positions and columns corresponding to output dimensions of the associated emulator in the final layer.

y_test must be provided if object is an instance of the lgp. y_test must also be provided if x_test is provided. Defaults to NULL, in which case LOO validation is performed.

method

[Updated] the prediction approach to use for validation: either the mean-variance approach ("mean_var") or the sampling approach ("sampling"). For details see predict(). For DGP emulators with a categorical likelihood (likelihood = "Categorical" in dgp()), only the sampling approach is supported. By default, the method is set to "sampling" for DGP emulators with Poisson, Negative Binomial, and Categorical likelihoods and "mean_var" otherwise.

sample_size

the number of samples to draw for each given imputation if method = "sampling". Defaults to 50.

verb

a bool indicating if trace information for validation should be printed during function execution. Defaults to TRUE.

M

[New] the size of the conditioning set for the Vecchia approximation in emulator validation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.

force

a bool indicating whether to force LOO or OOS re-evaluation when the loo or oos slot already exists in object. When force = FALSE, validate() will only re-evaluate the emulators if the x_test and y_test are not identical to the values in the oos slot. If the existing loo or oos validation used a different M in a Vecchia approximation or a different method to the one prescribed in this call, the emulator will be re-evaluated. Set force to TRUE when LOO or OOS re-evaluation is required. Defaults to FALSE.

cores

the number of processes to be used for validation. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1.

...

N/A.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

Note

Examples

## Not run: 

# See gp(), dgp(), or lgp() for an example.

## End(Not run)

Locate the next design point for a (D)GP emulator or a bundle of (D)GP emulators using VIGF

Description

[Updated]

This function searches from a candidate set to locate the next design point(s) to be added to a (D)GP emulator or a bundle of (D)GP emulators using the Variance of Improvement for Global Fit (VIGF). For VIGF on GP emulators, see the reference below.

Usage

vigf(object, ...)

## S3 method for class 'gp'
vigf(
  object,
  x_cand = NULL,
  n_start = 10,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  ...
)

## S3 method for class 'dgp'
vigf(
  object,
  x_cand = NULL,
  n_start = 10,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

## S3 method for class 'bundle'
vigf(
  object,
  x_cand = NULL,
  n_start = 10,
  batch_size = 1,
  M = 50,
  workers = 1,
  limits = NULL,
  int = FALSE,
  aggregate = NULL,
  ...
)

Arguments

object

can be one of the following:

  • the S3 class gp.

  • the S3 class dgp.

  • the S3 class bundle.

...

any arguments (with names different from those of arguments used in vigf()) that are used by aggregate can be passed here.

x_cand

a matrix (with each row being a design point and column being an input dimension) that gives a candidate set from which the next design point(s) are determined. If object is an instance of the bundle class and aggregate is not supplied, x_cand can also be a list. The list must have a length equal to the number of emulators in object, with each element being a matrix representing the candidate set for a corresponding emulator in the bundle. Defaults to NULL.

n_start

[New] an integer that gives the number of initial design points to be used to determine next design point(s). This argument is only used when x_cand is NULL. Defaults to 10.

batch_size

an integer that gives the number of design points to be chosen. Defaults to 1.

M

[New] the size of the conditioning set for the Vecchia approximation in the criterion calculation. This argument is only used if the emulator object was constructed under the Vecchia approximation. Defaults to 50.

workers

the number of processes to be used for design point selection. If set to NULL, the number of processes is set to ⁠max physical cores available %/% 2⁠. Defaults to 1. The argument does not currently support Windows machines when the aggregate function is provided, due to the significant overhead caused by initializing the Python environment for each worker under spawning.

limits

[New] a two-column matrix that gives the ranges of each input dimension, or a vector of length two if there is only one input dimension. If a vector is provided, it will be converted to a two-column row matrix. The rows of the matrix correspond to input dimensions, and its first and second columns correspond to the minimum and maximum values of the input dimensions. This argument is only used when x_cand = NULL. Defaults to NULL.

int

[New] a bool or a vector of bools that indicates if an input dimension is an integer type. If a single bool is given, it will be applied to all input dimensions. If a vector is provided, it should have a length equal to the input dimensions and will be applied to individual input dimensions. This argument is only used when x_cand = NULL. Defaults to FALSE.

aggregate

an R function that aggregates scores of the VIGF across different output dimensions (if object is an instance of the dgp class) or across different emulators (if object is an instance of the bundle class). The function should be specified in the following basic form:

  • the first argument is a matrix representing scores. The rows of the matrix correspond to different design points. The number of columns of the matrix equals to:

    • the emulator output dimension if object is an instance of the dgp class; or

    • the number of emulators contained in object if object is an instance of the bundle class.

  • the output should be a vector that gives aggregate scores at different design points.

Set to NULL to disable aggregation. Defaults to NULL.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

  1. If x_cand is not NULL:

    • When object is an instance of the gp class, a vector of length batch_size is returned, containing the positions (row numbers) of the next design points from x_cand.

    • When object is an instance of the dgp class, a vector of length batch_size * D is returned, containing the positions (row numbers) of the next design points from x_cand to be added to the DGP emulator.

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a matrix is returned with batch_size rows and a column for each emulator in the bundle, containing the positions (row numbers) of the next design points from x_cand for individual emulators.

  2. If x_cand is NULL:

    • When object is an instance of the gp class, a matrix with batch_size rows is returned, giving the next design points to be evaluated.

    • When object is an instance of the dgp class, a matrix with batch_size * D rows is returned, where:

      • D is the number of output dimensions of the DGP emulator if no likelihood layer is included.

      • For a DGP emulator with a Hetero or NegBin likelihood layer, D = 2.

      • For a DGP emulator with a Categorical likelihood layer, D = 1 for binary output or D = K for multi-class output with K classes.

    • When object is an instance of the bundle class, a list is returned with a length equal to the number of emulators in the bundle. Each element of the list is a matrix with batch_size rows, where each row represents a design point to be added to the corresponding emulator.

Note

The first column of the matrix supplied to the first argument of aggregate must correspond to the first output dimension of the DGP emulator if object is an instance of the dgp class, and so on for subsequent columns and dimensions. If object is an instance of the bundle class, the first column must correspond to the first emulator in the bundle, and so on for subsequent columns and emulators.

References

Mohammadi, H., & Challenor, P. (2022). Sequential adaptive design for emulating costly computer codes. arXiv:2206.12113.

Examples

## Not run: 

# load packages and the Python env
library(lhs)
library(dgpsi)

# construct a 1D non-stationary function
f <- function(x) {
 sin(30*((2*x-1)/2-0.4)^5)*cos(20*((2*x-1)/2-0.4))
}

# generate the initial design
X <- maximinLHS(10,1)
Y <- f(X)

# training a 2-layered DGP emulator with the global connection off
m <- dgp(X, Y, connect = F)

# specify the input range
lim <- c(0,1)

# locate the next design point using VIGF
X_new <- vigf(m, limits = lim)

# obtain the corresponding output at the located design point
Y_new <- f(X_new)

# combine the new input-output pair to the existing data
X <- rbind(X, X_new)
Y <- rbind(Y, Y_new)

# update the DGP emulator with the new input and output data and refit
m <- update(m, X, Y, refit = TRUE)

# plot the LOO validation
plot(m)

## End(Not run)

Trim the sequence of hyperparameter estimates within a DGP emulator

Description

This function trims the sequence of hyperparameter estimates within a DGP emulator generated during training.

Usage

window(object, start, end = NULL, thin = 1)

Arguments

object

an instance of the S3 class dgp.

start

the first iteration before which all iterations are trimmed from the sequence.

end

the last iteration after which all iterations are trimmed from the sequence. Set to NULL to keep all iterations after (including) start. Defaults to NULL.

thin

the interval between the start and end iterations to thin out the sequence. Defaults to 1.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

An updated object with a trimmed sequence of hyperparameters.

Note

Examples

## Not run: 

# See dgp() for an example.

## End(Not run)

Save the constructed emulator

Description

This function saves the constructed emulator to a .pkl file.

Usage

write(object, pkl_file, light = TRUE)

Arguments

object

an instance of the S3 class gp, dgp, lgp, or bundle.

pkl_file

the path to and the name of the .pkl file to which the emulator object is saved.

light

a bool indicating if a light version of the constructed emulator (that requires less disk space to store) will be saved. Defaults to TRUE.

Details

See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.

Value

No return value. object will be saved to a local .pkl file specified by pkl_file.

Note

Since emulators built from the package are 'python' objects, save() from R will not work as it would for R objects. If object was processed by set_vecchia() to add or remove the Vecchia approximation, light should be set to FALSE to ensure reproducibility after the saved emulator is reloaded by read().

Examples

## Not run: 

# See gp(), dgp(), lgp(), or pack() for an example.

## End(Not run)