BoundaryStats Analysis Workflow

Amy Luo

2025-07-22

This vignette is a walkthrough of the workflow for calculating boundary statistics and boundary overlap statistics for two ecological variables. Here, we use East African ecoregion data and genomic analyses from Barratt et al. 2018. The following code tests for (1) whether there are significant geographic boundaries between genetic groups of the frog species Leptopelis flavomaculatus and (2) whether those boundaries overlap significantly with ecoregion boundaries.

Load packages used in this vignette

library(BoundaryStats)
library(terra)
#> Warning: package 'terra' was built under R version 4.3.3

Input data

Ecoregion data

data(ecoregions)
ecoregions <- rast(ecoregions_matrix, crs = ecoregions_crs)
ext(ecoregions) <- ecoregions_ext
plot(ecoregions)

Genetic data

These data are based on the ADMIXTURE results from Barratt et al. 2018. The point data (assignment probabilities for each individual) have been interpolated using universal kriging to produce a raster surface.

data(L.flavomaculatus)
L.flavomaculatus <- rast(L.flavomaculatus_matrix, crs = L.flavomaculatus_crs)
ext(L.flavomaculatus) <- L.flavomaculatus_ext
plot(L.flavomaculatus)

Matching datasets

In order to test for significant overlap between the traits, the SpatRaster objects need to be aligned in extent, resolution, and projection. We are first matching the projection, then downsampling and cropping the ecoregion raster to match the genetic data. We are also masking the genetic raster with the ecoregion, since it originally includes space off the coastline.

crs(ecoregions) <- crs(L.flavomaculatus)
ecoregions <- resample(ecoregions, L.flavomaculatus) |>
  crop(L.flavomaculatus) |>
  mask(L.flavomaculatus)
L.flavomaculatus <- crop(L.flavomaculatus, ecoregions) |>
  mask(ecoregions)

Define trait boundaries and ecoregion boundaries

There are two functions to define geographical boundaries in BoundaryStats, which take different data. The function define_boundary takes either continuous trait data and boundary intensities. If inputting continuous trait data, use convert = TRUE to convert from trait data into boundaries. If inputting boundary intensity data (e.g., urbanization metrics if urban land uses are boundaries), use convert = FALSE to define boundaries based on an intensity threshold.

The two datasets in this vignette are categorical–ecoregion and genetic group identity–so we are using the other boundary definition function, categorical_boundary to identify spatial transitions from one category to another.

ecoregions_boundaries <- define_boundary(ecoregions, cat = T)
L.flavomaculatus_boundaries <- define_boundary(L.flavomaculatus, cat = T)

Plot boundary overlap

The overlap in boundaries between two variables can be plotted using the plot_boundary function, which is a wrapper for ggplot2. If output_raster = TRUE, the function will return a SpatRaster object with a single layer containing boundary elements for each trait and cells with overlapping boundary elements.

plot_boundary(L.flavomaculatus_boundaries, ecoregions_boundaries, trait_names = c('A. delicatus genetic group', 'Ecoregion'), output_raster = F)

Calculate boundary statistics

Create null distributions

The function boundary_null_distrib simulates neutral landscapes based on the input data. The default number of iterations is 10, but a value between 100 and 1000 is recommended. This step may take a while, depending on the selected neutral model and number of iterations, so we are maintaining the default 10 iterations.

Three neutral models are currently available: complete stochasticity (default), Gaussian random fields, and modified random clusters. Random cluster models are suited to categorical variables like group identity (cat = T), so we use it here.

L.flav_bound.null <- boundary_null_distrib(L.flavomaculatus, cat = T, n_iterations = 10, model = 'random_cluster', p = 0.5, progress = F)
#> DONE

Run statistical tests

n_boundaries is the number of boundaries (i.e., contiguous groups of cells representing boundaries), and longest_boundary is the length of the longest subgraph.

n_boundaries(L.flavomaculatus_boundaries, L.flav_bound.null)
#> n_boundary    p-value 
#>          2          0
longest_boundary(L.flavomaculatus_boundaries, L.flav_bound.null)
#> longest_boundary          p-value 
#>         284984.9              0.0

Calculate boundary overlap statistics

Create null distributions

Usage for overlap_null_distrib is similar to boundary_null_distrib, but takes raster surfaces for two traits (x and y), along with arguments for each trait. Since we are testing the effects of relatively static ecoregions on L. flavomaculatus population structure, we are not going to simulate randomized rasters for the ecoregions.

L.flav_overlap.null <- overlap_null_distrib(L.flavomaculatus, ecoregions, rand_both = F, x_cat = T, n_iterations = 10, x_model = 'random_cluster', px = 0.5, progress = F)
#> DONE

Run statistical tests

n_overlap_boundaries is Od, the number of directly overlapping boundary elements between the two variables.average_min_x_to_y is Ox, the average minimum distance for a a Trait x boundary element to the nearest Trait y boundary element. It assumes that boundaries for Trait x depend on the boundaries in Trait y. average_min_distance is Oxy, the average minimum distance between boundary elements in x and y (x and y affect each other reciprocally).

n_overlap_boundaries(L.flavomaculatus_boundaries, ecoregions_boundaries, L.flav_overlap.null)
#> n_overlapping       p-value 
#>             8             0
average_min_x_to_y(L.flavomaculatus_boundaries, ecoregions_boundaries, L.flav_overlap.null)
#> avg_min_x_to_y        p-value 
#>       48465.28           0.00
average_min_distance(L.flavomaculatus_boundaries, ecoregions_boundaries, L.flav_overlap.null)
#> avg_min_dist      p-value 
#>     161879.4          0.0

Citation

Barratt, C.D., Bwong, B.A., Jehle, R., Liedtke, C.H., Nagel, P., Onstein, R.E., Portik, D.M., Streicher, J.W. & Loader, S.P. (2018) Vanishing refuge? Testing the forest refuge hypothesis in coastal East Africa using genome‐wide sequence data for seven amphibians. Molecular Ecology, 27, 4289-4308.