Title: | Model-Based Geostatistics |
Version: | 1.1.0 |
Description: | Modern model-based geostatistics for point-referenced data. This package provides a simple interface to run spatial machine learning models and geostatistical models that estimate a continuous (raster) surface from point-referenced outcomes and, optionally, a set of raster covariates. The package also includes functions to summarize raster outcomes by (polygon) region while preserving uncertainty. |
License: | MIT + file LICENSE |
Depends: | R (≥ 4.1.0) |
Imports: | assertthat, caret, data.table, glue, Matrix, matrixStats, purrr, R6, sf, terra, tictoc |
Suggests: | INLA, knitr, rmarkdown, ggplot2, scales |
Additional_repositories: | https://inla.r-inla-download.org/R/stable/ |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
URL: | https://henryspatialanalysis.github.io/mbg/ |
BugReports: | https://github.com/henryspatialanalysis/mbg/issues |
NeedsCompilation: | no |
Packaged: | 2025-04-25 23:29:47 UTC; nathenry |
Author: | Nathaniel Henry |
Maintainer: | Nathaniel Henry <nat@henryspatialanalysis.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-26 02:20:02 UTC |
Behavior when attaching the mbg package
Description
Behavior when attaching the mbg package
Usage
.onAttach(libname, pkgname)
Arguments
libname |
(character(1)) A character string giving the library directory where the package defining the namespace was found. |
pkgname |
(character(1)) A character string giving the name of the package. |
Details
Yields a message if the INLA package namespace is not available.
Value
(invisible) A message may be printed to the console.
MBG model runner class
Description
R6 class to run a full MBG model and make predictions.
Details
To see examples of this object, run vignette('mbg')
Public fields
input_data
(data.table::data.table)
Table containing at least the following fields:
x (
numeric
) location longitude in decimal degrees
y (
numeric
) location latitude in decimal degrees
indicator (
integer
) The number of events in the population
samplesize (
integer
) The total population, denominator forindicator
id_raster
(terra::SpatRaster)
raster showing the total area that will be predicted using this model.covariate_rasters
(
list()
)
A list containing all predictor covariates. Each covariate is a terra::SpatRaster object with the same extent and dimensions asid_raster
.aggregation_table
(data.table::data.table)
A table created by build_aggregation_table, used to link each grid cell to higher-level administrative units.aggregation_levels
(
list()
)
A named list: for each named item, the name is the label for that aggregation level, and the value is a character vector of all fields in the original polygons to be used for aggregation at that level.population_raster
(terra::SpatRaster)
A raster giving population for each grid cell, to be used for population-weighted aggregation from grid cells to polygon boundaries. Should have the same dimensions asid_raster
. If no population raster is passed and the results are aggregated, aggregation will be by simple mean rather than population-weighted meanadmin_bounds
(sf::sf)
Polygons showing the boundaries of administrative divisions within the study region. Only required ifuse_admin_effect
ORstacking_use_admin_bounds
isTRUE
.admin_bounds_id
(
character
)
Field containing unique identifiers foradmin_bounds
, if passed.use_covariates
(
logical(1)
)
Should covariate effects be included in the predictive model?use_gp
(
logical(1)
)
Should a smoothed spatial surface be included in the predictive model?use_admin_effect
(
logical(1)
)
Should IID administrative-level effects be included in the predictive model?use_nugget
(
logical(1)
)
Should an IID effect by pixel be included in the predictive model?use_stacking
(
logical(1)
)
Should machine learning submodels be trained to relate the covariate rasters with the outcome data? Only run ifuse_covariates
isTRUE
.stacking_model_settings
(
list()
)
A named list of submodels to be run. For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.stacking_cv_settings
(
list()
)
How should the stacking submodels be cross-validated? For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
isTRUE
.stacking_use_admin_bounds
(
logical(1)
)
Should admin boundaries be included as features in the stacking submodels? For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.stacking_prediction_range
(
logical(1)
)
Range of possible predictions for the stacking submodels. For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.mesh_max_edge
(
numeric(2)
or NULL)
Maximum size of the INLA SPDE mesh inside (1) and outside (2) of the modeled region. Only considered ifuse_gp
is TRUE.mesh_cutoff
(
numeric(1)
)
Minimum size of the INLA mesh, usually reached in data-dense areas. Only considered ifuse_gp
is TRUE.spde_integrate_to_zero
(
boolean(1)
)
Should the 'volume' under the SPDE mesh integrate to zero? Only considered ifuse_gp
is TRUE.prior_spde_range
(
list()
)
A named list specifying the penalized complexity prior for the SPDE range. The two named items are "threshold", the test threshold (set as a proportion of the overall mesh extent), and "prob_below", the prior probability that the value is BELOW that range threshold. The function automatically converts "threshold" from a proportion of the overall mesh extent into a distance. Only considered ifuse_gp
is TRUE.prior_spde_sigma
(
list()
)
A named list specifying the penalized complexity prior for sigma (standard deviation) of the SPDE object. The two named items are "threshold", the test threshold for the standard deviation, and "prob_above", the prior probability that sigma will EXCEED that threshold. Only considered ifuse_gp
is TRUEprior_nugget
(
list()
)
A named list specifying the penalized complexity prior for the nugget term. The two named items are "threshold", the test threshold for the nugget standard deviation, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered ifuse_nugget
is TRUE.prior_admin_effect
(
list()
)
A named list specifying the penalized complexity prior for the admin-level IID term. The two named items are "threshold", the test threshold for the standard deviation of admin-level effects, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered ifuse_admin_effect
is TRUE.prior_covariate_effect
(
list()
)
A named list specifying the penalized complexity prior for all covariate effects except for the intercept, if an intercept is included. The two named items are "threshold", the test threshold for the size of each fixed effect, and "prob_above", the prior probability that the beta for each covariate will EXCEED that threshold. Only considered ifuse_covariates
is TRUE anduse_stacking
is FALSE.inla_link
(
character(1)
)
Link function for fitting the INLA model, typically related to the GLMfamily
.inverse_link
(
character(1)
)
Inverse function ofinla_link
.inla_family
(character)
GLM family to use. For more information, see stats::family.nugget_in_predict
(
logical(1)
)
If the nugget is used in model fitting, should it also be included as an IID effect by pixel in the model prediction step?verbose
Should model progress be timed?
model_covariates
(
list()
)
A list of covariates to be included in the INLA model. Either equal tocovariate_rasters
, or ML model predictions for stacked generalization.inla_inputs_list
(
list()
)
List of model inputs yielded by prepare_inla_data_stackinla_fitted_model
(
list()
)
List of model outputs yielded by fit_inla_modelgrid_cell_predictions
List of predictive surfaces yielded by generate_cell_draws_and_summarize
aggregated_predictions
List of predictions by administrative unit. Only created if
aggregation_table
andaggregation_levels
are both defined.
Methods
Public methods
Method new()
Create a new MbgModelRunner object
Usage
MbgModelRunner$new( input_data, id_raster, covariate_rasters = NULL, aggregation_table = NULL, aggregation_levels = NULL, population_raster = NULL, admin_bounds = NULL, admin_bounds_id = NULL, use_covariates = TRUE, use_gp = TRUE, use_admin_effect = FALSE, use_nugget = TRUE, use_stacking = FALSE, stacking_cv_settings = list(method = "repeatedcv", number = 5, repeats = 5), stacking_model_settings = list(gbm = NULL, treebag = NULL, rf = NULL), stacking_use_admin_bounds = FALSE, stacking_prediction_range = NULL, mesh_max_edge = c(0.2, 5), mesh_cutoff = c(0.04), spde_integrate_to_zero = FALSE, prior_spde_range = list(threshold = 0.1, prob_below = 0.05), prior_spde_sigma = list(threshold = 3, prob_above = 0.05), prior_nugget = list(threshold = 3, prob_above = 0.05), prior_admin_effect = list(threshold = 3, prob_above = 0.05), prior_covariate_effect = list(threshold = 3, prob_above = 0.05), inla_link = "logit", inverse_link = "plogis", inla_family = "binomial", nugget_in_predict = TRUE, verbose = TRUE )
Arguments
input_data
(data.table::data.table) Table containing at least the following fields:
x (
numeric
) location x position, in the same projection as theid_raster
y (
numeric
) location y position, in the same projection as theid_raster
indicator (
integer
) The number of events in the population
samplesize (
integer
) The total population, denominator forindicator
id_raster
(terra::SpatRaster) raster showing the total area that will be predicted using this model
covariate_rasters
(
list()
, default NULL) A list containing all predictor covariates. Each covariate is a terra::SpatRaster object with the same extent and dimensions asid_raster
.aggregation_table
(data.table::data.table) A table created by build_aggregation_table, linking each grid cell to one or more polygons
aggregation_levels
(
list()
) A named list: for each named item, the name is the label for that aggregation level, and the value is a character vector of all fields in the original polygons to be used for aggregation at that level.population_raster
(terra::SpatRaster) A raster giving population for each grid cell, to be used for population-weighted aggregation from grid cells to polygon boundaries. Should have the same dimensions as
id_raster
. If no population raster is passed and the results are aggregated, aggregation will be by simple mean rather than population-weighted meanadmin_bounds
(sf::sf, default
NULL
) Polygons showing the boundaries of administrative divisions within the study region. Only required ifuse_admin_effect
ORstacking_use_admin_bounds
isTRUE
.admin_bounds_id
(
character
, defaultNULL
) Field containing unique identifiers foradmin_bounds
, if passed.use_covariates
(
logical(1)
, default TRUE) Should covariate effects be included in the predictive model?use_gp
(
logical(1)
, default TRUE) Should a smoothed spatial surface be included in the predictive model?use_admin_effect
(
logical(1)
default FALSE) Should IID administrative-level effects be included in the predictive model?use_nugget
(
logical(1)
, default TRUE) Should an IID effect by pixel be included in the predictive model?use_stacking
(
logical(1)
, default FALSE) Should machine learning submodels be trained to relate the covariate rasters with the outcome data? Only run ifuse_covariates
isTRUE
.stacking_cv_settings
(
list()
) How should the stacking submodels be cross-validated? For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
isTRUE
.stacking_model_settings
(
list()
) A named list of submodels to be run. For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.stacking_use_admin_bounds
(
logical(1)
, default FALSE) Should admin boundaries be included as features in the stacking submodels? For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.stacking_prediction_range
(
numeric(2)
, default NULL) Range of possible predictions for the stacking submodels. For more information about this term, see run_regression_submodels. Only considered ifuse_stacking
is TRUE.mesh_max_edge
(
numeric(2)
, default c(0.2, 5)) Maximum size of the INLA SPDE mesh inside (1) and outside (2) of the modeled region. Only considered ifuse_gp
is TRUE.mesh_cutoff
(
numeric(1)
, default 0.04) Minimum size of the INLA mesh, usually reached in data-dense areas. Only considered ifuse_gp
is TRUE.spde_integrate_to_zero
(
boolean(1)
, default FALSE) Should the 'volume' under the SPDE mesh integrate to zero? Only considered ifuse_gp
is TRUE.prior_spde_range
(
list()
) A named list specifying the penalized complexity prior for the SPDE range. The two named items are "threshold", the test threshold (set as a proportion of the overall mesh extent), and "prob_below", the prior probability that the value is BELOW that range threshold. The function automatically converts "threshold" from a proportion of the overall mesh extent into a distance. Only considered ifuse_gp
is TRUE.prior_spde_sigma
(
list()
) A named list specifying the penalized complexity prior for sigma (standard deviation) of the SPDE object. The two named items are "threshold", the test threshold for the standard deviation, and "prob_above", the prior probability that sigma will EXCEED that threshold. Only considered ifuse_gp
is TRUEprior_nugget
(
list()
) A named list specifying the penalized complexity prior for the nugget term. The two named items are "threshold", the test threshold for the nugget standard deviation, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered ifuse_nugget
is TRUE.prior_admin_effect
(
list()
) A named list specifying the penalized complexity prior for the admin-level IID term. The two named items are "threshold", the test threshold for the standard deviation of admin-level effects, and "prob_above", the prior probability that the standard deviation will EXCEED that threshold. Only considered ifuse_admin_effect
is TRUE.prior_covariate_effect
(
list()
) A named list specifying the penalized complexity prior for all covariate effects except for the intercept, if an intercept is included. The two named items are "threshold", the test threshold for the size of each fixed effect, and "prob_above", the prior probability that the beta for each covariate will exceed that threshold. Only considered ifuse_covariates
is TRUE anduse_stacking
is FALSE.inla_link
(
character(1)
, default 'logit') Link function for fitting the INLA model, typically related to the GLMfamily
.inverse_link
(
character(1)
, default 'plogis') Inverse function ofinla_link
.inla_family
(
character(1)
, default 'binomial') GLM family to use. For more information, seestats::family()
.nugget_in_predict
(
logical(1)
, default TRUE) If the nugget is used in model fitting, should it also be included as an IID effect by pixel in the model prediction step?verbose
(
logical(1)
, default TRUE) Should model progress be timed?
Method prepare_covariates()
Prepare covariates for MBG model fitting
Usage
MbgModelRunner$prepare_covariates()
Method fit_mbg_model()
Fit MBG model
Usage
MbgModelRunner$fit_mbg_model()
Method generate_predictions()
Generate predictions by grid cell
Usage
MbgModelRunner$generate_predictions(n_samples = 1000, ui_width = 0.95)
Arguments
n_samples
(
integer(1)
, default 1000) Number of posterior predictive samples to generate from the fitted model object.ui_width
(
numeric(1)
, default 0.95) Uncertainty interval width. This method will create summary rasters for quantiles ((1 - ui_width)/2) and (1 - (1 - ui_width)/2).
Method aggregate_predictions()
Aggregate grid cell predictions
Usage
MbgModelRunner$aggregate_predictions(ui_width = 0.95)
Arguments
ui_width
(
numeric(1)
, default 0.95) Uncertainty interval width. This method will create summary "upper" and "lower" fields rasters for quantiles ((1 - ui_width)/2) and (1 - (1 - ui_width)/2).
Returns
List with the same names as self$aggregation_levels
, aggregating by the
columns specified in self$aggregation_levels
Method run_mbg_pipeline()
Run a full MBG pipeline, including stacking, MBG model fitting, and prediction
Usage
MbgModelRunner$run_mbg_pipeline(n_samples = 1000, ui_width = 0.95)
Arguments
n_samples
(
integer(1)
, default 1000) Number of posterior predictive samples to generate from the fitted model object.ui_width
(
numeric(1)
, default 0.95)
Method get_predictive_validity()
Get predictive validity metrics for the fitted model
Usage
MbgModelRunner$get_predictive_validity( in_sample = TRUE, validation_data = NULL, na.rm = FALSE )
Arguments
in_sample
(
logical(1)
, default TRUE) Compare model predictions to the data used to generate the model? If FALSE, does not return the WAIC, which is only useful for in-sample predictive validity.validation_data
(data.table::data.table, default NULL) Observed data to compare against. Expected for out-of-sample model validation. Table containing at least the following fields:
x (
numeric
) location x position, in the same projection as theid_raster
y (
numeric
) location y position, in the same projection as theid_raster
indicator (
integer
) The number of events in the population
samplesize (
integer
) The total population, denominator forindicator
na.rm
(
logical(1)
, default FALSE) Should NA values be dropped from the RMSE and log predictive density calculations?
Details
Returns the point RMSE (compared against the mean estimates by pixel), log-posterior density (compared against the predictive draws), and the Watanabe-Aikake Information Criterion (WAIC, only returned for in-sample predictive validity).
Returns
data.table::data.table Containing the following fields:
'rmse': Root mean squared error when compared against the mean estimates by pixel. Lower RMSE is better.
'lpd': Log posterior predictive density when compared against pixel-level samples from the model. Higher LPD is better.
'waic' (in-sample only): Watanable-Aikake information criterion estimated by INLA. Lower WAIC is better.
For clarity, these fields will have the suffix "_is" for in-sample models, and "_oos" for out-of-sample models.
Method clone()
The objects of this class are cloneable with this method.
Usage
MbgModelRunner$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Aggregate grid cell draws to polygons
Description
Aggregate grid cell draws to polygons using an aggregation table
Usage
aggregate_draws_to_polygons(
draws_matrix,
aggregation_table,
aggregation_cols = "polygon_id",
method = "mean",
z_dimension = NULL,
z_dimension_name = "z",
weighting_raster = NULL,
na.rm = TRUE
)
Arguments
draws_matrix |
|
aggregation_table |
data.table::data.table Aggregation table linking pixels to
polygon identifiers, created using |
aggregation_cols |
(character vector, default 'polygon_id') Polygon identifiers
to use for aggregation. Must be field names within |
method |
(character, default 'mean') Aggregation method: one of 'mean', 'sum',
'weighted.mean', or 'weighted.sum'. The latter two methods require a
|
z_dimension |
(vector, default NULL) If passing a |
z_dimension_name |
(default 'z') The field name for the "z" dimension
corresponding to each layer of the |
weighting_raster |
(terra::SpatRaster, default NULL) The relative weighting of each whole pixel to the overall polygon value, for example, if calculating a population-weighted mean. Required for methods 'weighted.mean' and 'weighted.sum', ignored for the other methods. |
na.rm |
(bool, default TRUE) How to handle NA pixels in |
Details
This is a more efficient and feature-rich alternative to terra's zonal statistics functions. Features include:
Always fractionally aggregate, weighting by area of the pixel in a polygon
Optionally weight by both area fraction and a weighting raster (e.g. population)
Means or sums of raster values across polygons
Optionally aggregate multiple years of raster data at once
Value
data.table::data.table containing polygon identifiers, (optionally) layer
identifiers in the z_dimension_name
column, and data values aggregated by polygon.
See Also
build_aggregation_table
Examples
## Not run:
polygons <- sf::st_read(system.file('extdata/Benin_communes.gpkg', package = 'mbg'))
id_raster <- build_id_raster(polygons)
n_data_pixels <- sum(!is.na(terra::values(id_raster)))
# Example grid-level draws from e.g. mbg::generate_cell_draws_and_summarize()
draws_matrix <- matrix(rnorm(n_data_pixels * 5), nrow = n_data_pixels)
# Build aggregation table, which can be used across many aggregations
aggregation_table <- build_aggregation_table(
polygons, id_raster, polygon_id_field = 'commune_code'
)
# Aggregate the grid-level draws to polygon-level draws
aggregated <- aggregate_draws_to_polygons(
draws_matrix = draws_matrix,
aggregation_table = aggregation_table,
aggregation_cols = 'commune_code',
method = 'mean'
)
head(aggregated)
## End(Not run)
Aggregate grid cell draws to polygons: validation
Description
Data validation for aggregate_draws_to_polygons
Usage
aggregate_draws_to_polygons_validation(
draws_matrix,
aggregation_table,
aggregation_cols,
method,
z_dimension,
z_dimension_name,
weighting_raster,
na.rm
)
Arguments
draws_matrix |
|
aggregation_table |
data.table::data.table Aggregation table linking pixels to
polygon identifiers, created using |
aggregation_cols |
(character vector, default 'polygon_id') Polygon identifiers to use for aggregation. |
method |
(character, default 'mean') Aggregation method: one of 'mean', 'sum', 'weighted.mean', or 'weighted.sum'. |
z_dimension |
(vector, default NULL) If passing a |
z_dimension_name |
(default 'z') The field name for the "z" dimension
corresponding to each set of estimates contained in |
weighting_raster |
(terra::SpatRaster, default NULL) The relative weighting of each whole pixel to the overall polygon value, for example, if calculating a population-weighted mean. |
na.rm |
(bool, default TRUE) How to handle NA values in |
Value
Errors if checks fail; silently passes if checks pass
See Also
aggregate_draws_to_polygons
Aggregate a raster to polygons
Description
Aggregate raster values to polygons using an aggregation table
Usage
aggregate_raster_to_polygons(
data_raster,
aggregation_table,
aggregation_cols = "polygon_id",
method = "mean",
aggregated_field = "data",
z_dimension = NULL,
z_dimension_name = "z",
weighting_raster = NULL,
na.rm = TRUE
)
Arguments
data_raster |
terra::SpatRaster containing data to be aggregated to polygons. |
aggregation_table |
data.table::data.table Aggregation table linking pixels to
polygon identifiers, created using |
aggregation_cols |
(character vector, default 'polygon_id') Polygon identifiers
to use for aggregation. Must be field names within |
method |
(character, default 'mean') Aggregation method: one of 'mean', 'sum',
'weighted.mean', or 'weighted.sum'. The latter two methods require a
|
aggregated_field |
(character, default 'data') Name of the aggregated raster values in the output table. |
z_dimension |
(vector, default NULL) If passing a |
z_dimension_name |
(default 'z') The field name for the "z" dimension
corresponding to each layer of the |
weighting_raster |
(terra::SpatRaster, default NULL) The relative weighting of each whole pixel to the overall polygon value, for example, if calculating a population-weighted mean. Required for methods 'weighted.mean' and 'weighted.sum', ignored for the other methods. |
na.rm |
(bool, default TRUE) How to handle NA pixels in |
Details
This is a more efficient and feature-rich alternative to terra's zonal statistics functions. Features include:
Always fractionally aggregate, weighting by area of the pixel in a polygon
Optionally weight by both area fraction and a weighting raster (e.g. population)
Means or sums of raster values across polygons
Optionally aggregate multiple years of raster data at once
Value
data.table containing polygon identifiers, (optionally) layer identifiers in
the z_dimension_name
column, and data values aggregated by polygon.
See Also
build_aggregation_table
Examples
## Not run:
polygons <- sf::st_read(system.file('extdata/Benin_communes.gpkg', package = 'mbg'))
id_raster <- build_id_raster(polygons)
n_data_pixels <- sum(!is.na(terra::values(id_raster)))
# Example ID raster filled with data
# This is an example of pixel-level covariate data or model estimates
data_raster <- mbg::values_to_raster(stats::rnorm(n_data_pixels), id_raster)
# Build aggregation table, which can be used across many aggregations
aggregation_table <- build_aggregation_table(
polygons, id_raster, polygon_id_field = 'commune_code'
)
# Aggregate the raster to the polygons
aggregated <- aggregate_raster_to_polygons(
data_raster = data_raster,
aggregation_table = aggregation_table,
aggregation_cols = 'commune_code',
method = 'mean'
)
head(aggregated)
## End(Not run)
Aggregate a raster to polygons: validation
Description
Data validation for aggregate_raster_to_polygons
Usage
aggregate_raster_to_polygons_validation(
data_raster,
aggregation_table,
aggregation_cols,
method,
aggregated_field,
z_dimension,
z_dimension_name,
weighting_raster,
na.rm
)
Arguments
data_raster |
terra::SpatRaster containing data to be aggregated to polygons. |
aggregation_table |
Aggregation table linking pixels to polygon identifiers,
created using |
aggregation_cols |
(character vector, default 'polygon_id') Polygon identifiers to use for aggregation. |
method |
(character, default 'mean') Aggregation method: one of 'mean', 'sum', 'weighted.mean', or 'weighted.sum'. |
aggregated_field |
(character, default 'data') Name of the aggregated raster values in the output table. |
z_dimension |
(vector, default NULL) If passing a |
z_dimension_name |
(default 'z') The field name for the "z" dimension
corresponding to each layer of the |
weighting_raster |
(terra::SpatRaster, default NULL) The relative weighting of each whole pixel to the overall polygon value, for example, if calculating a population-weighted mean. |
na.rm |
(bool, default TRUE) How to handle NA pixels in |
Value
Errors if checks fail; silently passes if checks pass
See Also
aggregate_raster_to_polygons
Build aggregation table
Description
Build a table to quickly aggregate from pixels to polygons
Usage
build_aggregation_table(polygons, id_raster, polygon_id_field, verbose = FALSE)
Arguments
polygons |
terra::SpatVector object. Should contain a unique ID field. |
id_raster |
terra::SpatRaster object. ID raster created by |
polygon_id_field |
( |
verbose |
( |
Value
data.table with fields:
polygon_id: Unique polygon identifier
pixel_id: unique pixel ID from the ID raster
masked_pixel_id: Index counting only non-NA pixels from the ID raster
area_fraction: fraction of the pixel area falling within this polygon
Merged fields from the table of polygons
See Also
calculate_pixel_fractions_single_polygon()
Examples
## Not run:
polygons <- sf::st_read(system.file('extdata/Benin_communes.gpkg', package = 'mbg'))
id_raster <- build_id_raster(polygons)
aggregation_table <- build_aggregation_table(
polygons, id_raster, polygon_id_field = 'commune_code'
)
## End(Not run)
Validation: Build aggregation table
Description
Input data validation for build_aggregation_table()
Usage
build_aggregation_table_validation(
polygons,
id_raster,
polygon_id_field,
polygon_ids
)
Arguments
polygons |
terra::SpatVector object. Should contain a unique ID field. |
id_raster |
terra::SpatRaster object. ID raster created by |
polygon_id_field |
( |
polygon_ids |
(vector, any type) Polygon identifiers from |
Value
Errors if checks fail; silently passes if checks pass
See Also
build_aggregation_table
Build ID raster
Description
Build an ID raster matching the extent of a vector dataset
Usage
build_id_raster(polygons, template_raster = NULL)
Arguments
polygons |
terra::SpatVector object. The polygons to be aggregated to |
template_raster |
(optional) terra::SpatRaster object. The template raster should
contain and have the same CRS as the polygons. If template raster is |
Details
The ID raster will be used to build the aggregation table. Each pixel has a unique integer value from 1 to the number of pixels in the ID raster.
Value
ID raster. A terra::SpatRaster object that minimally encloses the polygons
Examples
## Not run:
polygons <- sf::st_read(system.file('extdata/Benin_communes.gpkg', package = 'mbg'))
build_id_raster(polygons)
## End(Not run)
Calculate fractional pixels in a polygon
Description
Calculate the fraction of each pixel's area that falls within a single polygon
Usage
calculate_pixel_fractions_single_polygon(polygon, id_raster, polygon_id = NULL)
Arguments
polygon |
terra::SpatVector object of length 1. The polygon to calculate fractional areas across. |
id_raster |
terra::SpatRaster object. ID raster created for the set of all
polygons to be considered, created by |
polygon_id |
(optional). ID for this polygon. Must have length 1. |
Details
This is a helper function called by build_aggregation_table()
.
Value
data.table containing two or three columns:
pixel_id: unique pixel ID from the ID raster
area_fraction: fraction of the pixel area falling within this polygon
polygon_id (optional): If
polygon_id
was defined, it is added to the table
See Also
build_aggregation_table
Examples
## Not run:
polygons <- sf::st_read(system.file('extdata/Benin_communes.gpkg', package = 'mbg'))
id_raster <- build_id_raster(polygons)
pixel_fractions <- calculate_pixel_fractions_single_polygon(
polygon = polygons[1, ], id_raster
)
head(pixel_fractions)
## End(Not run)
Dissolve sf object by attribute
Description
Dissolve an SF object by attribute
Usage
dissolve_sf_by_attribute(x, by = character(0))
Arguments
x |
(sf::sf object) SF object to dissolve |
by |
( |
Details
Inspired by spatialEco::sf_dissolve
Value
Dissolved sf::sf object
Examples
## Not run:
communes_sf <- sf::st_read(system.file("extdata/Benin_communes.gpkg", package = "mbg"))
departments_sf <- mbg::dissolve_sf_by_attribute(
x = communes_sf,
by = c('department', 'department_code')
)
## End(Not run)
Fit INLA model
Description
Fit an INLA model based on a constructed data stack and formula
Usage
fit_inla_model(
formula,
data_stack,
spde,
samplesize_vec = 1,
precision_vec = 1,
family = "binomial",
link = "logit",
fixed_effects_pc_prior = list(threshold = 3, prob_above = 0.05),
verbose = TRUE
)
Arguments
formula |
(character) INLA formula to fit. Generated in
|
data_stack |
Stacked data, covariates, and spatial index. Generated in
|
spde |
SPDE object generated by |
samplesize_vec |
( |
precision_vec |
( |
family |
(character, default 'binomial') GLM family to use. For more information,
see |
link |
(character, default 'logit') Link function to use, typically related to the
GLM |
fixed_effects_pc_prior |
A named list specifying the penalized complexity prior for all fixed effects except for the intercept. The two named items are "threshold", the test threshold for the size of each fixed effect, and "prob_above", the prior probability that the beta for each covariate will EXCEED that threshold. |
verbose |
( |
Details
Using INLA::inla()
with reasonable defaults and settings tuned to predict
across a grid.
Value
A fitted INLA model object created by INLA::inla()
See Also
Generate cell draws and summary rasters from INLA model
Description
Use INLA posteriors to predict out across a grid
Usage
generate_cell_draws_and_summarize(
inla_model,
inla_mesh,
n_samples,
id_raster,
covariates,
inverse_link_function,
nugget_in_predict = TRUE,
admin_boundaries = NULL,
ui_width = 0.95,
verbose = TRUE
)
Arguments
inla_model |
Output from |
inla_mesh |
An SPDE mesh used to define the spatial integration points of the INLA
geostatistical model. Typically created using |
n_samples |
(numeric) Number of posterior predictive samples to draw. |
id_raster |
(terra::SpatRaster) raster showing all cell locations where predictions should be taken. |
covariates |
(list) Named list of all covariate effects included in the model,
typically generated by |
inverse_link_function |
(character) If a link function was used in the INLA model, name of the R function to transform the predictive draws from link space to natural space. For example, in a logit-linked binomial model, pass 'plogis' (as a string is fine) to invert-logit the predictive draws. |
nugget_in_predict |
( |
admin_boundaries |
(sf object, default NULL) The same admin boundaries used to create the admin-level effect, if one was defined in the model. Only used if an admin-level effect was defined in the model. |
ui_width |
(numeric, default 0.95) Size of the uncertainty interval width when calculating the upper and lower summary rasters |
verbose |
( |
Details
Based on a fitted INLA model, the survey area defined in an ID raster, and a set of covariates, generate predictive grid cell draws and summary rasters across a study area.
Value
Named list containing at least the following items:
"parameter_draws": posterior samples generated from
INLA::inla.posterior.sample()
"cell_draws": A matrix of grid cell draws. Each row represents a non-NA pixel in the
id_raster
, in the same order that would be pulled byterra::values()
, and each column represents a different posterior draw."cell_pred_mean": Mean predictive estimate by grid cell, formatted as a terra SpatRaster
"cell_pred_lower": Lower bound of (X%) uncertainty interval, formatted as a terra SpatRaster
"cell_pred_upper": Upper bound of (X%) uncertainty interval, formatted as a terra SpatRaster
Helper function to get the current year
Description
Helper function to get the current year
Usage
get_current_year()
Load covariates
Description
Load covariates
Usage
load_covariates(
directory,
covariates_table,
id_raster,
year = NULL,
file_format = "tif",
add_intercept = FALSE,
check_previous_years = 10
)
Arguments
directory |
Directory containing all covariate sub-directories |
covariates_table |
|
id_raster |
terra::SpatRaster with non-NA pixels delineating the extent of the study area |
year |
( |
file_format |
( |
add_intercept |
( |
check_previous_years |
( |
Details
Load a list of covariates from a specified directory structure
Value
A named list of formatted covariates. Each list item is a terra::SpatRaster
with one layer and the same dimensions as the id_raster
Generate log posterior predictive density from a geostatistical surface onto point data
Description
Generate log posterior predictive density from a geostatistical surface onto point data
Usage
log_posterior_density(draws, validation_data, id_raster, na.rm = FALSE)
Arguments
draws |
( |
validation_data |
(
|
id_raster |
(terra::SpatRaster) Raster showing the sample study area, created using build_id_raster. |
na.rm |
( |
Details
Calculated across draws. Requires an ID raster to match each point observation to a set of draws. Assumes binomial data.
For examples, see vignette('model-comparison', package = 'mbg')
Value
(numeric(1)
) Log predictive density of the validation data given the draw
estimates.
Add global indent for logging
Description
Change a global option that controls indentation for MBG package logging
Usage
logging_add_indent()
Details
Increases the MbgLoggingIndent
option, with default of one if not set
Remove global indent for logging
Description
Change a global option that controls indentation for MBG package logging
Usage
logging_drop_indent()
Details
Decreases the MbgLoggingIndent
option, with default of zero if not set
Get timer log
Description
Return a log of all timed events as a data.table
Usage
logging_get_timer_log(clear_log = FALSE, deindent = TRUE)
Arguments
clear_log |
( |
deindent |
( |
Examples
mbg::logging_start_timer(msg = 'Test logging')
Sys.sleep(0.1)
mbg::logging_stop_timer()
log_results <- mbg::logging_get_timer_log()
print(log_results)
Start logging timer
Description
Start a nested timer with an optional message
Usage
logging_start_timer(msg, echo = TRUE, indentation_text = " ")
Arguments
msg |
( |
echo |
( |
indentation_text |
( |
Examples
mbg::logging_start_timer(msg = 'Test logging')
Sys.sleep(0.1)
mbg::logging_stop_timer()
log_results <- mbg::logging_get_timer_log()
print(log_results)
End logging timer
Description
End a nested timer
Usage
logging_stop_timer(echo = TRUE)
Arguments
echo |
( |
Examples
mbg::logging_start_timer(msg = 'Test logging')
Sys.sleep(0.1)
mbg::logging_stop_timer()
log_results <- mbg::logging_get_timer_log()
print(log_results)
Make time stamp
Description
Create a string time stamp based on current detailed date/time
Usage
make_time_stamp(suffix = NULL, milliseconds = TRUE)
Arguments
suffix |
( |
milliseconds |
( |
Value
A string formatted as 'YYYYMMDD_HH_MM_SS(_optional MS)(_optional suffix)'
Make world template raster
Description
Create a template raster for the world with approximately 5x5km resolution at the equator, matching many common raster covariates for health.
Usage
make_world_template_raster()
Details
The raster has the following specifications:
4320 rows, 8640 columns
Resolution: 0.04166667 decimal degrees, approx. 5km at the equator
CRS: WGS 1984 unprojected latitude/longitude,
terra::crs('EPSG:4326')
Values: All NA. Used exclusively for creating a shapefile-specific ID raster
Value
terra::SpatRaster object matching the specifications above
Prepare data stack for INLA
Description
Prepare data stack for INLA
Usage
prepare_inla_data_stack(
input_data,
id_raster,
covariates,
use_covariates = TRUE,
covariates_sum_to_one = FALSE,
family = "binomial",
use_spde = TRUE,
spde_range_pc_prior = list(threshold = 0.1, prob_below = 0.05),
spde_sigma_pc_prior = list(threshold = 3, prob_above = 0.05),
spde_integrate_to_zero = TRUE,
mesh_max_edge = c(0.2, 5),
mesh_cutoff = 0.04,
use_nugget = TRUE,
nugget_pc_prior = list(threshold = 3, prob_above = 0.05),
use_admin_effect = FALSE,
admin_boundaries = NULL,
admin_pc_prior = list(threshold = 3, prob_above = 0.05)
)
Arguments
input_data |
A data.frame with at least the following columns:
|
id_raster |
terra::SpatRaster with non-NA pixels delineating the extent of the study area |
covariates |
(list) Named list of all covariate effects included in the model,
typically generated by |
use_covariates |
( |
covariates_sum_to_one |
(logical, default FALSE) Should the input covariates be constrained to sum to one? Usually FALSE when raw covariates are passed to the model, and TRUE if running an ensemble (stacking) model. |
family |
( |
use_spde |
( |
spde_range_pc_prior |
(list) A named list specifying the penalized complexity prior for the SPDE range. The two named items are "threshold", the test threshold (set as a proportion of the overall mesh extent), and "prob_below", the prior probability that the value is BELOW that range threshold. The function automatically converts "threshold" from a proportion of the overall mesh extent into a distance. |
spde_sigma_pc_prior |
(list) A named list specifying the penalized complexity prior for sigma (standard deviation) of the SPDE object. The two named items are "threshold", the test threshold for the standard deviation, and "prob_above", the prior probability that sigma will EXCEED that threshold. |
spde_integrate_to_zero |
( |
mesh_max_edge |
( |
mesh_cutoff |
( |
use_nugget |
( |
nugget_pc_prior |
A named list specifying the penalized complexity prior for the
nugget term. The two named items are "threshold", the test threshold for the nugget
standard deviation, and "prob_above", the prior probability that the standard
deviation will EXCEED that threshold. Only used if |
use_admin_effect |
( |
admin_boundaries |
(sf object, default NULL) admin boundaries spatial
object. Only used if |
admin_pc_prior |
(list) A named list specifying the penalized complexity prior
for the admin-level IID term. The two named items are "threshold", the test threshold
for the standard deviation of admin-level effects, and "prob_above", the prior
probability that the standard deviation will EXCEED that threshold. Only used if
|
Details
Creates the formatted input data to be used by the INLA model. For more information about penalized complexity priors, see Daniel Simpson's paper on the subject: doi:10.1214/16-STS576
Value
List containing the following items:
"mesh": The mesh used to approximate the latent Gaussian process
"spde": The SPDE object that will be used to fit the INLA model
"inla_data_stack": The data stack to be passed to
INLA::inla()
"formula_string": The formula specification to be passed to
INLA::inla()
See Also
fit_inla_model()
MbgModelRunner
Generate RMSE from an estimated raster surface and point data
Description
Generate RMSE from an estimated raster surface and point data
Usage
rmse_raster_to_point(estimates, validation_data, outcome_field, na.rm = FALSE)
Arguments
estimates |
(terra::SpatRaster) Raster surface containing point estimates. This could also be the mean surface of a Bayesian geostatistical model |
validation_data |
(
|
outcome_field |
( |
na.rm |
( |
Details
For examples, see vignette('model-comparison', package = 'mbg')
Value
A single number giving RMSE between the point data and estimates raster.
Run regression sub-models
Description
Wrapper to run many regression sub-models using the caret package
Usage
run_regression_submodels(
input_data,
id_raster,
covariates,
cv_settings,
model_settings,
family = "binomial",
clamping = TRUE,
use_admin_bounds = FALSE,
admin_bounds = NULL,
admin_bounds_id = "polygon_id",
prediction_range = c(-Inf, Inf),
verbose = TRUE
)
Arguments
input_data |
A data.frame with at least the following columns:
|
id_raster |
terra::SpatRaster with non-NA pixels delineating the extent of the study area |
covariates |
(list) Named list of all covariate effects included in the model,
typically generated by |
cv_settings |
Named list of cross-validation settings, passed to caret::trainControl. |
model_settings |
Named list where the name of each header corresponds to a model run in caret::train, and the arguments correspond to the model-specific settings for that model type. |
family |
( |
clamping |
( |
use_admin_bounds |
( |
admin_bounds |
(sf, default NULL) Administrative boundaries to use.
Only considered if |
admin_bounds_id |
( |
prediction_range |
( |
verbose |
( |
Value
List with two items:
"models": A list containing summary objects for each regression model
"predictions": Model predictions covering the entire id_raster
Summarize draws
Description
Helper function to summarize a matrix or data.frame of predictive draws
Usage
summarize_draws(
draws,
id_fields = NULL,
draw_fields = NULL,
ui_width = 0.95,
na.rm = TRUE
)
Arguments
draws |
A |
id_fields |
(default NULL) Only considered for data.frame-like |
draw_fields |
(default NULL) Only considered for data.frame-like |
ui_width |
( |
na.rm |
( |
Value
A data.table::data.table containing at least the following fields:
The
id_fields
, if passed"mean": Mean across predictive draws
"lower": Lower bound of the (X%) uncertainty interval
"upper": Upper bound of the (X%) uncertainty interval
"ui_width": "upper" - "lower"
Examples
# Summarize a draws matrix
draws_matrix <- matrix(rnorm(200), nrow = 10)
summary_table_a <- summarize_draws(draws_matrix)
head(summary_table_a)
# Summarize a draws data.table with location IDs
draws_table <- matrix(c(1:10, rnorm(200)), nrow = 10) |>
data.table::as.data.table() |>
data.table::setnames(c('location_id', paste0('draw_', 1:20)))
summary_table_b <- summarize_draws(draws_table, id_fields = 'location_id')
head(summary_table_b)
Insert values into a raster
Description
Insert a vector or matrix of values into an ID spatRaster
Usage
values_to_raster(x, id_raster)
Arguments
x |
Vector, matrix, data.frame, or data.table of values that will be inserted into
the ID raster. The length of x must be exactly divisible by
|
id_raster |
ID raster showing the outline of the study area, created using
|
Details
The length of the vector or matrix must be a multiple of the number of non-NA pixels in the ID raster. Values from the vector/matrix are then inserted into the non-NA pixels of the spatRaster.
Value
SpatRaster with the same outline as the ID raster and (# values / # non-NA pixels in the ID raster) layers.
See Also
Examples
# Example ID raster with 10 rows and 10 columns, and 99 valid pixels
example_id_raster <- terra::rast(matrix(c(seq_len(99), NA), nrow = 10))
# Inserting 99 values yields a spatRaster with 1 layer
mbg::values_to_raster(stats::rnorm(99), example_id_raster)
# Inserting 99 * 3 values yields a spatRaster with 3 layers
mbg::values_to_raster(seq_len(99 * 3), example_id_raster)
# Trying to insert values with length not divisible by 99 yields an error
try(mbg::values_to_raster(seq_len(100), example_id_raster))
Run variance inflation factor (VIF) selection on input covariates
Description
Run variance inflation factor (VIF) selection on input covariates
Usage
vif_covariate_select(dataset, vif_cutoff = 5)
Arguments
dataset |
data.frame-like object with named columns containing all covariates to consider in the VIF analysis. |
vif_cutoff |
( |
Value
data.table listing each variable, VIF in most recent round, and whether the indicator should be included or not