Type: | Package |
Title: | Species Distribution Modeling and Ecological Niche Modeling |
Version: | 1.2.12 |
Date: | 2025-03-29 |
Description: | Implements species distribution modeling and ecological niche modeling, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), interpolating across a time series of rasters, and use of spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other modeling packages, no new classes are created. The package works with 'PROJ6' geodetic objects and coordinate reference systems. |
Depends: | R (≥ 4.0.0) |
Imports: | AICcmodavg, boot, cowplot, data.table, doParallel, DT, foreach, gbm, ggplot2, graphics, ks, maxnet, methods, mgcv, omnibus, parallel, predicts, ranger, rJava, scales, sf, shiny, sp, statisfactory, stats, terra, utils |
LazyData: | true |
LazyLoad: | yes |
URL: | https://github.com/adamlilith/enmSdmX |
BugReports: | https://github.com/adamlilith/enmSdmX/issues |
Encoding: | UTF-8 |
License: | MIT + file LICENSE |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-03-29 05:15:57 UTC; adame |
Author: | Adam B. Smith |
Maintainer: | Adam B. Smith <adam.smith@mobot.org> |
Repository: | CRAN |
Date/Publication: | 2025-03-29 05:30:08 UTC |
enmSdmX: Species distribution modeling and ecological niche modeling
Description
Tools for implementing species distribution models and ecological niche models, including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic "velocity" (speed and direction of movement of a "mass" represented by a raster), and tools for using spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines, generalized additive models, generalized linear models, natural splines, and random forests. To enhance interoperability with other packages, the package does not create any new classes. The package works with PROJ6 geodetic objects and coordinate reference systems.
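As a quick orientation, here is a minimal end-to-end sketch condensed from the bioticVelocity() example later in this documentation (settings chosen for speed, not rigor; it assumes the unprojected example rasters and occurrences share the same WGS84 coordinate reference system):
library(terra)
madClim <- rast(system.file('extdata/madClim.tif', package='enmSdmX'))
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=getCRS('WGS84'))
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
bgEnv <- spatSample(madClim, 1000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg = rep(1:0, c(nrow(occEnv), nrow(bgEnv))), env)
m <- trainGLM(env, resp = 'presBg', preds = c('bio1', 'bio12'))
pred <- predictEnmSdm(m, madClim)
plot(pred)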
Details
To report a bug or request a feature, create an issue on GitHub.
Using imprecisely-georeferenced occurrences
coordImprecision
: Coordinate imprecision
nearestEnvPoints
: Extract "most conservative" environments from points and/or polygons
nearestGeogPoints
: Create a minimum convex polygon from a set of spatial polygons and/or points
Data preparation
geoFold
: Assign geographically-distinct k-folds
geoFoldContrast
: Assign geographically-distinct k-folds to background or absence sites
elimCellDuplicates
: Eliminate duplicate points in each cell of a raster
Bias correction
geoThin
: Thin geographic points deterministically or randomly
weightByDist
: Proximity-based weighting for occurrences (points) to correct for spatial bias
Model calibration
trainByCrossValid and summaryByCrossValid
: Implement a trainXYZ function across calibration folds (which are distinct from evaluation folds)
trainBRT
: Boosted regression trees (BRTs)
trainESM
: Ensembles of small models (ESMs)
trainGAM
: Generalized additive models (GAMs)
trainGLM
: Generalized linear models (GLMs)
trainMaxEnt
: MaxEnt models
trainMaxNet
: MaxNet models
trainNS
: Natural spline (NS) models
trainRF
: Random forest (RF) models
Model prediction
predictEnmSdm
: Predict most model types using default settings
predictMaxEnt
: Predict MaxEnt model
predictMaxNet
: Predict MaxNet model
Model evaluation
evalAUC
: AUC (with/out site weights)
evalMultiAUC
: Multivariate version of AUC (with/out site weight)
evalContBoyce
: Continuous Boyce Index (with/out site weights)
evalThreshold
: Thresholds to convert continuous predictions to binary predictions (with/out site weights)
evalThresholdStats
: Model performance statistics based on thresholded predictions (with/out site weights)
evalTjursR2
: Tjur's R2 (with/out site weights)
evalTSS
: True Skill Statistic (TSS) (with/out site weights)
modelSize
: Number of response values in a model object
Response curves and niche overlap
compareResponse
: Compare model responses to a single variable
nicheOverlapMetrics
: Niche overlap metrics
responseCurves
: Create plots of response curves for one or more models
Functions for rasters
bioticVelocity
: Velocity of movement across a series of rasters
getValueByCell
: Get value(s) in raster cell(s) by cell number
globalx
: "Friendly" wrapper for terra::global() for calculating raster statistics
interpolateRasts
: Interpolate a stack of rasters
longLatRasts
: Generate rasters with values of longitude/latitude for cell values
sampleRast
: Sample raster with/out replacement
setValueByCell
: Set value(s) in raster cell(s) by cell number
squareCellRast
: Create a raster with square cells
Coordinate reference systems
getCRS
: Return a WKT2 string (coordinate reference system string) using a nickname
crss
: Table of coordinate reference systems and nicknames
customAlbers
: Create a custom Albers conic equal-area projection
customLambert
: Create a custom Lambert azimuthal equal-area projection
customVNS
: Create a custom "vertical near-side" projection
Geographic utility functions
countPoints
: Number of points in a "spatial points" object
decimalToDms
: Convert decimal coordinate to degrees-minutes-seconds
dmsToDecimal
: Convert degrees-minutes-seconds coordinate to decimal
extentToVect
: Convert extent to polygon
plotExtent
: Create a 'SpatialPolygon' the same size as a plot region
spatVectorToSpatial
: Convert a SpatVector object to a Spatial* object
Data
canada
: Outline of Canada
lemurs
: Lemur occurrences
mad0
: Madagascar spatial object
mad1
: Madagascar spatial object
madClim
: Madagascar climate rasters for the present
madClim2030
: Madagascar climate rasters for the 2030s
madClim2050
: Madagascar climate rasters for the 2050s
madClim2070
: Madagascar climate rasters for the 2070s
madClim2090
: Madagascar climate rasters for the 2090s
Author(s)
Adam B. Smith
See Also
Useful links:
https://github.com/adamlilith/enmSdmX
Report bugs at https://github.com/adamlilith/enmSdmX/issues
Calculate weights for a model
Description
Calculates weighting for a model. Each record receives a numeric weight.
Usage
.calcWeights(w, data, resp, family)
Arguments
w |
Either logical in which case |
data |
Data frame |
resp |
Name of response column |
family |
Name of family |
Value
A numeric vector.
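For intuition, a common weighting scheme (shown here as a hypothetical sketch, not necessarily the internal one) balances the total weight of presences against the total weight of background sites:
# Hypothetical sketch: equalize total presence and background weight.
presBg <- c(rep(1, 200), rep(0, 1000))
w <- ifelse(presBg == 1, 1, sum(presBg == 1) / sum(presBg == 0))
sum(w[presBg == 1]) == sum(w[presBg == 0]) # TRUE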
Movement of occupied cells in a given direction relative to a fixed point
Description
This function calculates the weighted distance moved by a mass represented by a set of cells which fall north, south, east, or west of a given location (i.e., typically the centroid of the starting population). Values >0 indicate movement in the given direction (north, south, east, or west) relative to this location.
Usage
.cardinalDistance(
direction,
longOrLat,
coordVect,
x1,
x2,
refCoord,
x1weightedLongs,
x1weightedLats,
x2weightedLongs,
x2weightedLats,
x1weightedElev = NULL,
x2weightedElev = NULL
)
Arguments
direction |
Any of: |
longOrLat |
Numeric matrix, latitude or longitudes. If |
coordVect |
Vector of latitude or longitude of cell centers, depending on value of |
x1 |
Matrix of weights in time 1 (i.e., population size). |
x2 |
Matrix of weights in time 2 (i.e., population size). |
refCoord |
Numeric, latitude or longitude (depending on |
x1weightedLongs |
Matrix of longitudes weighted (i.e., by population size, given by |
x1weightedLats |
Matrix of latitudes weighted (i.e., by population size, given by |
x2weightedLongs |
Matrix of longitudes weighted (i.e., by population size, given by |
x2weightedLats |
Matrix of latitudes weighted (i.e., by population size, given by |
x1weightedElev |
Matrix of elevations weighted by x1 or |
x2weightedElev |
Matrix of elevations weighted by x2 or |
Value
A list object with distance moved and abundance of all cells north/south/east/west of reference point.
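As a rough illustration of the underlying idea (hypothetical values; not the function's exact computation), the northward component can be thought of as the change in the abundance-weighted mean latitude of cells north of the reference coordinate:
# Hypothetical sketch for direction = 'n'.
lat <- c(2, 1, 0) # cell-center latitudes
x1 <- c(1, 2, 3) # abundances at time 1
x2 <- c(3, 2, 1) # abundances at time 2
refCoord <- 0.5 # reference latitude (e.g., starting centroid)
north <- lat > refCoord
weighted.mean(lat[north], x2[north]) - weighted.mean(lat[north], x1[north])
# positive ==> movement north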
Euclidean distance between a pair of points
Description
Euclidean distance between a pair of points. By definition, the output is unsigned (non-negative).
Usage
.euclid(a, b)
Arguments
a |
Numeric vector from 1 to 3 elements long |
b |
Numeric vector from 1 to 3 elements long |
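The computation is the standard Euclidean norm of the difference between the two vectors; a minimal sketch (the internal handling of unequal lengths may differ):
euclid <- function(a, b) sqrt(sum((a - b)^2))
euclid(c(0, 0), c(3, 4)) # 5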
Latitude of quantile(s) of the geographic abundance distribution
Description
This function returns the latitude or longitude of quantile(s) of the geographic abundance distribution. The input is derived from a rasterized map of the species abundance distribution. If a quantile would occur somewhere between two cells, the latitude/longitude is linearly interpolated between the cells bracketing its value.
Usage
.interpCoordFromQuantile(
latOrLong,
quants,
x,
coordVect,
weightedElev = NULL,
warn = TRUE
)
Arguments
latOrLong |
Either 'latitude' or 'longitude' |
quants |
Quantile value(s) (i.e., in the range [0, 1]) |
x |
Matrix of abundances. |
coordVect |
Vector of latitudes, one per row in |
weightedElev |
Raster of elevations weighted by x1 or x2 or |
warn |
Logical. Show warnings. |
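A toy sketch of the interpolation idea (hypothetical values): the latitude of a quantile is found by linearly interpolating the cumulative, south-to-north abundance across cell centers:
abund <- c(1, 2, 4, 2, 1) # abundance per row of x
lats <- c(-2, -1, 0, 1, 2) # cell-center latitudes, south to north
cumProp <- cumsum(abund) / sum(abund) # cumulative proportion of abundance
approx(cumProp, lats, xout = 0.5)$y # latitude of the 0.5 quantile (-0.5)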
Scales predictors
Description
Scales predictors.
Usage
.scalePredictors(scale, preds, data)
Arguments
scale |
Either |
preds |
A character vector with names of the predictors in |
data |
A data frame. |
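A minimal sketch of the operation (assuming centering and scaling via base R's scale(); the internal details may differ):
data <- data.frame(presBg = c(1, 1, 0, 0), bio1 = c(24, 21, 20, 19), bio12 = c(1200, 1100, 900, 800))
preds <- c('bio1', 'bio12')
data[ , preds] <- scale(data[ , preds]) # center and scale each predictor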
Velocity of shifts in densities across a series of rasters
Description
Calculates metrics of "movement" of cell densities across a time series of rasters. Rasters could represent, for example, the probability of presence of a species through time. In this case, velocities would indicate rates and directions of range shift. The simplest metric is movement of the density-weighted centroid (i.e., range "center"), but many more are available to provide a nuanced indicator of velocity. See Details
for the types of metrics that can be calculated.
Usage
bioticVelocity(
x,
times = NULL,
atTimes = NULL,
elevation = NULL,
metrics = c("centroid", "nsCentroid", "ewCentroid", "nCentroid", "sCentroid",
"eCentroid", "wCentroid", "nsQuants", "ewQuants", "similarity", "summary"),
quants = c(0.05, 0.1, 0.5, 0.9, 0.95),
onlyInSharedCells = FALSE,
cores = 1,
warn = TRUE,
longitude = NULL,
latitude = NULL,
paths = NULL,
...
)
Arguments
x |
Either a
|
times |
Numeric vector with the same number of layers in |
atTimes |
Numeric, values of |
elevation |
Either |
metrics |
Biotic velocity metrics to calculate (default is to calculate them all). All metrics ignore
|
quants |
Numeric vector indicating the quantiles at which biotic velocity is calculated for the " |
onlyInSharedCells |
If |
cores |
Positive integer. Number of processor cores to use. Note that if the number of time steps at which velocity is calculated is small, using more cores may not always be faster. If you have issues when |
warn |
Logical, if |
longitude |
Numeric matrix or
|
latitude |
Numeric matrix or
|
paths |
This is used internally and rarely (never?) needs to be defined by a user (i.e., leave it as |
... |
Other arguments (not used). |
Details
Attention:
This function may yield erroneous velocities if the region of interest is near or spans a pole or the international date line. Results using the "Quant" and "quant" metrics may be somewhat counterintuitive if just one cell is >0, or one row or column has the same values with all other values equal to 0 or NA
because defining quantiles in these situations is not intuitive. Results may also be counterintuitive if some cells have negative values because they can "push" a centroid away from what would seem to be the center of mass as assessed by visual examination of a map.
Note:
For the nsQuants
and ewQuants
metrics it is assumed that the latitude/longitude assigned to a cell is at its exact center. For calculating the position of a quantile, density is interpolated linearly from one cell center to the center of the adjacent cell. If a desired quantile does not fall exactly on the cell center, it is calculated from the interpolated values. For quantiles that fall south/westward of the first row/column of cells, the cell border is assumed to be at 0.5 * cell length south/west of the cell center.
Value
A data frame with biotic velocities and related values. Fields are as follows:
- timeFrom: Start time of interval
- timeTo: End time of interval
- timeMid: Time point between timeFrom and timeTo
- timeSpan: Duration of interval
Depending on metrics
that are specified, additional fields are as follows. All measurements of velocity are in distance units (typically meters) per time unit (which is the same as the units used for times
and atTimes
). For example, if the rasters are in an Albers equal-area projection and times
are in years, then the output will be meters per year.
If metrics has 'centroid': Columns named centroidVelocity, centroidLong, centroidLat – Speed of weighted centroid, plus its longitude and latitude (in the timeTo period of each time step). Values are always >= 0.
If metrics has 'nsCentroid': Columns named nsCentroid and nsCentroidLat – Velocity of weighted centroid in the north-south direction, plus its latitude (in the timeTo period of each time step). Positive values connote movement north, and negative values south.
If metrics has 'ewCentroid': Columns named ewCentroid and ewCentroidLong – Velocity of weighted centroid in the east-west direction, plus its longitude (in the timeTo period of each time step). Positive values connote movement east, and negative values west.
If metrics has 'nCentroid', 'sCentroid', 'eCentroid', and/or 'wCentroid': Columns named nCentroidVelocity and nCentroidAbund, sCentroid and sCentroidAbund, eCentroid and eCentroidAbund, and/or wCentroid and wCentroidAbund – Speed of the weighted centroid of all cells that fall north, south, east, or west of the landscape-wide centroid, plus a column indicating the total weight (abundance) of all such populations. Values are always >= 0.
If metrics contains any of nsQuants or ewQuants: Columns named nsQuantVelocity_quantQ and nsQuantLat_quantQ, or ewQuantVelocity_quantQ and ewQuantLat_quantQ – Velocity of the Qth quantile weight in the north-south or east-west directions, plus the latitude or longitude thereof (in the timeTo period of each time step). Quantiles are cumulated starting from the south or the west, so the 0.05th quantile, for example, is in the far south or west of the range and the 0.95th in the far north or east. Positive values connote movement north or east, and negative values movement south or west.
If metrics contains similarity, metrics of similarity are calculated for each pair of successive landscapes, defined below as x1 (raster in timeFrom) and x2 (raster in timeTo), with the number of shared non-NA cells between them being n:
- simpleMeanDiff: sum(x2 - x1, na.rm = TRUE) / n
- meanAbsDiff: sum(abs(x2 - x1), na.rm = TRUE) / n
- rmsd (root-mean square difference): sqrt(sum((x2 - x1)^2, na.rm = TRUE)) / n
- godsoeEsp: 1 - sum(2 * (x1 * x2), na.rm = TRUE) / sum(x1 + x2, na.rm = TRUE); values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
- schoenersD: 1 - (sum(abs(x1 - x2), na.rm = TRUE) / n); values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
- warrensI: 1 - sqrt(sum((sqrt(x1) - sqrt(x2))^2, na.rm = TRUE) / n); values of 1 ==> maximally similar, 0 ==> maximally dissimilar.
- cor: Pearson correlation between x1 and x2.
- rankCor: Spearman rank correlation between x1 and x2.
If metrics contains elevCentroid: Columns named elevCentroidVelocity and elevCentroidElev – Velocity of the centroid in elevation (up or down) and the elevation in the "to" time step. Positive values of velocity connote movement upward, and negative values downward.
If metrics contains elevQuants: Columns named elevQuantVelocity_quantQ and elevQuantVelocityElev_quantQ – Velocity of the Qth quantile of mass in elevation (up or down) and the elevation of this quantile in the "to" time step. Positive values of velocity connote movement upward, and negative values downward.
If metrics contains summary:
- propSharedCellsNotNA: Proportion of cells that are not NA in both the "from" and "to" time steps. The proportion is calculated using the total number of cells in a raster as the denominator (i.e., not the total number of cells across two rasters).
- timeFromPropNotNA and timeToPropNotNA: Proportion of cells in the "from" and "to" time steps that are not NA.
- mean: Mean weight in the timeTo time step. In the same units as the values of the cells.
- quantile_quantQ: The Qth quantile(s) of weight in the timeTo time step. In the same units as the values of the cells.
- prevalence: Proportion of non-NA cells with weight >0 in the timeTo time step relative to all non-NA cells. Unitless.
Examples
# NB These examples can take a few minutes to run.
# To illustrate calculation and interpretation of biotic velocity,
# we will calibrate a SDM for the Red-Bellied Lemur and project
# the model to the present and successive future climates. The time series
# of rasters is then used to calculate biotic velocity.
library(sf)
library(terra)
### process environmental rasters
#################################
# get rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
rastFile <- system.file('extdata/madClim2030.tif', package='enmSdmX')
madClim2030 <- rast(rastFile)
rastFile <- system.file('extdata/madClim2050.tif', package='enmSdmX')
madClim2050 <- rast(rastFile)
rastFile <- system.file('extdata/madClim2070.tif', package='enmSdmX')
madClim2070 <- rast(rastFile)
rastFile <- system.file('extdata/madClim2090.tif', package='enmSdmX')
madClim2090 <- rast(rastFile)
# The bioticVelocity() function needs rasters to be in equal-area
# projection, so we will project them here.
madAlbers <- getCRS('madAlbers') # Albers projection for Madagascar
madClim <- project(madClim, madAlbers)
madClim2030 <- project(madClim2030, madAlbers)
madClim2050 <- project(madClim2050, madAlbers)
madClim2070 <- project(madClim2070, madAlbers)
madClim2090 <- project(madClim2090, madAlbers)
# Coordinate reference systems:
wgs84 <- getCRS('WGS84') # WGS84
madAlbers <- getCRS('madAlbers') # Madagascar Albers
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- project(occs, madAlbers)
# eliminate cell duplicates
occs <- elimCellDuplicates(occs, madClim)
# extract environment at occurrences
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create background sites (using just 1000 to speed things up!)
bgEnv <- terra::spatSample(madClim, 3000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
### calibrate model
###################
predictors <- c('bio1', 'bio12')
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
cores = 2
)
### project to present and future climate
#########################################
predPresent <- predictEnmSdm(mx, madClim)
pred2030 <- predictEnmSdm(mx, madClim2030)
pred2050 <- predictEnmSdm(mx, madClim2050)
pred2070 <- predictEnmSdm(mx, madClim2070)
pred2090 <- predictEnmSdm(mx, madClim2090)
plot(predPresent, main = 'Present Suitability')
# plot change in suitability between present and 2090s
delta <- pred2090 - predPresent
plot(delta, main = 'Change in Suitability')
### calculate biotic velocity
#############################
series <- c(
predPresent,
pred2030,
pred2050,
pred2070,
pred2090
)
names(series) <- c('present', 't2030', 't2050', 't2070', 't2090')
plot(series)
times <- c(1985, 2030, 2050, 2070, 2090)
quants <- c(0.10, 0.90)
bv <- bioticVelocity(
x = series,
times = times,
quants = quants,
cores = 2
)
bv
### centroid velocities
# centroid (will always be >= 0)
# fastest centroid movement around 2060
plot(bv$timeMid, bv$centroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Speed (m / y)', main = 'Centroid Speed')
# velocity northward/southward through time
# shows northward shift because always positive, fastest around 2060
plot(bv$timeMid, bv$nsCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid N/S Velocity')
# velocity eastward (positive)/westward (negative) through time
# movement eastward (positive) first, then westward (negative)
plot(bv$timeMid, bv$ewCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)', main = 'Centroid E/W Velocity')
### map of centroid location through time
# shows centroid moves slightly northward through time
plot(delta, main = 'Centroid Location &\nChange in Suitability')
points(bv$centroidLong[1], bv$centroidLat[1], pch = 1)
points(bv$centroidLong[4], bv$centroidLat[4], pch = 16)
lines(bv$centroidLong, bv$centroidLat)
legend(
'bottomright',
legend = c(
'start (~1985)',
'stop (~2090)',
'trajectory'
),
pch = c(1, 16, NA),
lwd = c(NA, NA, 1)
)
### velocities of portions of range north/south/east/west of centroid
# positive ==> northward shift
# negative ==> southward shift
# portion of range north of centroid
# shows northward expansion because always positive
plot(bv$timeMid, bv$nCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Northern Part of Range')
# portion of range south of centroid
# shows northward contraction because always positive
plot(bv$timeMid, bv$sCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Southern Part of Range')
# portion of range east of centroid
# shows eastern portion moves farther east
plot(bv$timeMid, bv$eCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Eastern Part of Range')
# portion of range west of centroid
# shows western portion moves east
plot(bv$timeMid, bv$wCentroidVelocity, type = 'l',
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Western Part of Range')
### velocities of range margins
# from south to north, 10th and 90th quantiles of density
# positive ==> northward shift
# negative ==> southward shift
# shows both northern and southern range margins shift northward
# because always positive... northern margin shift usually slower
ylim <- range(bv$nsQuantVelocity_quant0p1, bv$nsQuantVelocity_quant0p9)
plot(bv$timeMid, bv$nsQuantVelocity_quant0p1, type = 'l', ylim = ylim,
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Northern/Southern Range Margins')
lines(bv$timeMid, bv$nsQuantVelocity_quant0p9, lty = 'dashed')
legend(
'bottomright',
legend = c('Southern Margin', 'Northern Margin'),
lty = c('solid', 'dashed')
)
# from east to west, 10th and 90th quantiles of density
# positive ==> eastward shift
# negative ==> westward shift
ylim <- range(bv$ewQuantVelocity_quant0p1, bv$ewQuantVelocity_quant0p9)
plot(bv$timeMid, bv$ewQuantVelocity_quant0p1, type = 'l', ylim = ylim,
xlab = 'Year', ylab = 'Velocity (m / y)',
main = 'Eastern/Western Range Margins')
lines(bv$timeMid, bv$ewQuantVelocity_quant0p9, lty = 'dashed')
legend(
'bottomright',
legend = c('Eastern Margin', 'Western Margin'),
lty = c('solid', 'dashed')
)
### summary statistics
# mean density across cells through time
plot(bv$timeMid, bv$mean, type = 'l',
xlab = 'Year', ylab = 'Mean Density',
main = 'Mean Density')
# sum of density across cells through time
plot(bv$timeMid, bv$sum, type = 'l',
xlab = 'Year', ylab = 'Sum of Density',
main = 'Sum of Density')
### change metrics
# average change in suitability from one time period to next
# shows average conditions getting worse
plot(bv$timeMid, bv$simpleMeanDiff, type = 'l',
xlab = 'Year', ylab = 'Mean Change in Suitability')
# average absolute change in suitability from one time period to next
# shows average absolute change declining
plot(bv$timeMid, bv$meanAbsDiff, type = 'l',
xlab = 'Year', ylab = 'Mean Absolute Change in Suitability')
# root-mean square difference from one time period to the next
# shows difference between successive rasters declines through time
plot(bv$timeMid, bv$rmsd, type = 'l',
xlab = 'Year', ylab = 'RMSD')
### raster similarity
# most indicate that successive rasters are similar through time
ylim <- range(bv$godsoeEsp, bv$schoenersD, bv$warrensI, bv$cor, bv$rankCor)
plot(bv$timeMid, bv$godsoeEsp, type = 'l', lty = 1, col = 1,
xlab = 'Year', ylab = 'Raster similarity', ylim = ylim)
lines(bv$timeMid, bv$schoenersD, lty = 2, col = 2)
lines(bv$timeMid, bv$warrensI, lty = 3, col = 3)
lines(bv$timeMid, bv$cor, lty = 4, col = 4)
lines(bv$timeMid, bv$rankCor, lty = 5, col = 5)
legend(
'right',
legend = c(
'Godsoe\'s ESP',
'Schoener\'s D',
'Warren\'s I',
'Correlation',
'Rank Correlation'
),
col = 1:5,
lty = 1:5
)
# values of 10th and 90th quantiles across cells through time
# shows most favorable cells becoming less favorable
# least favorable cells remain mainly unchanged
ylim <- range(bv$quantile_quant0p1, bv$quantile_quant0p9)
plot(bv$timeMid, bv$quantile_quant0p1, type = 'l', ylim = ylim,
xlab = 'Year', ylab = 'Quantile Value',
main = 'Quantiles across Cells')
lines(bv$timeMid, bv$quantile_quant0p9, lty = 'dashed')
legend(
'topright',
legend = c('10th quantile', '90th quantile'),
lty = c('solid', 'dashed')
)
### map of northern/southern range margins through time
# range of longitude shown in plot
madExtent <- ext(madClim)
xExtent <- as.vector(madExtent)[1:2]
plot(predPresent, main = 'North/South Range Margin Location')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p9[1], bv$nsQuantLat_quant0p9[1]))
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p9[2], bv$nsQuantLat_quant0p9[2]), lty = 'dashed')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p9[3], bv$nsQuantLat_quant0p9[3]), lty = 'dotdash')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p9[4], bv$nsQuantLat_quant0p9[4]), lty = 'dotted')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p1[1], bv$nsQuantLat_quant0p1[1]))
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p1[2], bv$nsQuantLat_quant0p1[2]), lty = 'dashed')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p1[3], bv$nsQuantLat_quant0p1[3]), lty = 'dotdash')
lines(c(xExtent[1], xExtent[2]),
c(bv$nsQuantLat_quant0p1[4], bv$nsQuantLat_quant0p1[4]), lty = 'dotted')
legend(
'bottomright',
legend = c(
'1980s',
'2030s',
'2050s',
'2070s',
'2090s'
),
lty = c('solid', 'dashed', 'dotdash', 'dotted')
)
### map of eastern/western range margins through time
# range of latitude shown in plot
madExtent <- ext(madClim)
yExtent <- as.vector(madExtent)[3:4]
plot(predPresent, main = 'East/West Range Margin Location')
lines(c(bv$ewQuantLong_quant0p9[1], bv$ewQuantLong_quant0p9[1]),
c(yExtent[1], yExtent[2]))
lines(c(bv$ewQuantLong_quant0p9[2], bv$ewQuantLong_quant0p9[2]),
c(yExtent[1], yExtent[2]), lty = 'dashed')
lines(c(bv$ewQuantLong_quant0p9[3], bv$ewQuantLong_quant0p9[3]),
c(yExtent[1], yExtent[2]), lty = 'dotdash')
lines(c(bv$ewQuantLong_quant0p9[4], bv$ewQuantLong_quant0p9[4]),
c(yExtent[1], yExtent[2]), lty = 'dotted')
lines(c(bv$ewQuantLong_quant0p1[1], bv$ewQuantLong_quant0p1[1]),
c(yExtent[1], yExtent[2]))
lines(c(bv$ewQuantLong_quant0p1[2], bv$ewQuantLong_quant0p1[2]),
c(yExtent[1], yExtent[2]), lty = 'dashed')
lines(c(bv$ewQuantLong_quant0p1[3], bv$ewQuantLong_quant0p1[3]),
c(yExtent[1], yExtent[2]), lty = 'dotdash')
lines(c(bv$ewQuantLong_quant0p1[4], bv$ewQuantLong_quant0p1[4]),
c(yExtent[1], yExtent[2]), lty = 'dotted')
legend(
'bottomright',
legend = c(
'1980s',
'2030s',
'2050s',
'2070s',
'2090s'
),
lty = c('solid', 'dashed', 'dotdash', 'dotted')
)
Vector outline of Canada
Description
This SpatVector
represents the outline of Canada in WGS84 (unprojected) coordinates. This is the "low resolution" (less accurate) version from GADM.
Format
An object of class 'SpatVector'.
Source
Database of Global Administrative Areas (GADM)
Examples
library(terra)
canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX')
canada <- vect(canFile)
plot(canada)
Compare two response curves along one or more predictors
Description
This function calculates a suite of metrics reflecting niche overlap for two response curves. Response curves are predicted responses of a uni- or multivariate model along a single variable. Depending on the user-specified settings the function calculates these values either at each pair of values of pred1
and pred2
or along a smoothed version of pred1
and pred2
.
Usage
compareResponse(
pred1,
pred2,
data,
predictor = names(data),
adjust = FALSE,
gap = Inf,
smooth = FALSE,
smoothN = 1000,
smoothRange = c(0, 1),
graph = FALSE,
...
)
Arguments
pred1 |
Numeric vector. Predictions from first model along |
pred2 |
Numeric vector. Predictions from second model along |
data |
Data frame or matrix corresponding to |
predictor |
Character vector. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in |
adjust |
Logical. If |
gap |
Numeric >0. Proportion of range of predictor variable across which to assume a gap exists. Calculation of |
smooth |
Logical. If |
smoothN |
|
smoothRange |
2-element numeric vector or |
graph |
Logical. If |
... |
Arguments to pass to functions like |
Value
Either a data frame (if smooth = FALSE) or a list object with the smooth model plus a data frame (if smooth = TRUE). The data frame represents metrics comparing response curves of pred1 and pred2:
- predictor: Predictor for which comparison was made
- n: Number of values of predictor at which comparison was calculated
- adjust: The adjust argument.
- smooth: The smooth argument.
- meanDiff: Mean difference between predictions of pred1 and pred2 (higher ==> more different).
- meanAbsDiff: Mean absolute value of difference between predictions of pred1 and pred2 (higher ==> more different).
- areaAbsDiff: Sum of the area between curves predicted by pred1 and pred2, standardized by the total potential area between the two curves (i.e., the area available between the minimum and maximum prediction along the minimum and maximum values of the predictor) (higher ==> more different).
- d: Schoener's D
- i: Hellinger's I (adjusted to have a range [0, 1])
- esp: Godsoe's ESP
- cor: Pearson correlation between predictions of pred1 and pred2.
- rankCor: Spearman rank correlation between predictions of pred1 and pred2.
References
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution 62:2868-2883.
Warren, D.L., Glor, R.E., and Turelli, M. 2011. Erratum. Evolution 65:1215.
Godsoe, W. 2014. Inferring the similarity of species distributions using Species Distribution Models. Ecography 37:130-136.
See Also
Examples
set.seed(123)
data <- data.frame(
x1=seq(-1, 1, length.out=100),
x2=seq(-1, 1, length.out=100) + rnorm(100, 0, 0.3)
)
pred1 <- 1 / (1 + exp(-(0.3 + 2 * (data$x1 - 0.2) -0.3 * data$x2)))
pred2 <- 1 / (1 + exp(-(-0 + 0.1 * data$x1 - 4 * data$x1^2 + 0.4 * data$x2)))
compareResponse(pred1, pred2, data, graph=TRUE)
compareResponse(pred1, pred2, data, smooth=TRUE, graph=TRUE)
compareResponse(pred1, pred2, data, adjust=TRUE, graph=TRUE)
Calculate the precision of a geographic coordinate
Description
This function calculates the imprecision of geographic coordinates due to rounded coordinate values. See Details for an explanation of how this is calculated.
Usage
coordImprecision(x, dms = FALSE, epsilon = 2)
Arguments
x |
Spatial points represented as a |
dms |
Logical, if |
epsilon |
Zero or positive integer, number of digits to which to round seconds value if |
Details
For coordinates originally reported in decimal notation, coordinate imprecision is half the distance between the two opposing corners on a bounding box whose size is based on the number of significant digits in the coordinates. This box is defined by 1) finding the maximum number of significant digits after the decimal in the longitude/latitude pair; 2) adding/subtracting 5 to the decimal place that falls just after this; and 3) calculating the distance between these points, then dividing by 2. For example, if longitude is 82.37 and latitude 45.8, then the number of significant digits after the decimal place is 2 and 1, respectively, so 2 is used on the assumption that latitude is measured to the nearest 100th of a degree. The imprecision is then half the distance between the points (82.37 - 0.005 = 82.365, 45.8 - 0.005 = 45.795) and (82.37 + 0.005 = 82.375, 45.8 + 0.005 = 45.805).
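The worked example can be checked against great-circle distances (a sketch using sf; the function's internal distance calculation may differ):
library(sf)
p1 <- st_sfc(st_point(c(82.365, 45.795)), crs = 4326)
p2 <- st_sfc(st_point(c(82.375, 45.805)), crs = 4326)
as.numeric(st_distance(p1, p2)) / 2 # imprecision in meters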
For coordinates originally reported in degrees-minutes-seconds (DMS) format, the bounding box is defined by adding/subtracting 0.5 units (degrees, minutes, or seconds, depending on the smallest non-zero unit reported) from the coordinate. For example, if longitude is 90deg 00min 00sec and latitude is 37deg 37min 37sec, then the bounding box will be defined by adding/subtracting 0.5 arcsec to the coordinates.
Value
Numeric values (by default in units of meters).
Examples
# coarse-precision cases
long <- c(45, 45.1, 45.1)
lat <- c(45, 45.1, 45)
ll <- cbind(long, lat)
precision_m <- coordImprecision(ll)
cbind(ll, precision_m)
# fine-precision cases
long <- rep(45, 8)
lat <- c(45, 45.1, 45.11, 45.111, 45.1111, 45.11111, 45.111111, 45.1111111)
ll <- cbind(long, lat)
precision_m <- coordImprecision(ll)
cbind(ll, precision_m)
# precision varies with latitude
long <- rep(45, 181)
lat <- seq(-90, 90)
ll <- cbind(long, lat)
precision_m <- coordImprecision(ll)
cbind(ll, precision_m)
plot(lat, precision_m / 1000, xlab='Latitude', ylab='Precision (km)')
# dateline/polar cases
long <- c(0, 180, 45, 45)
lat <- c(45, 45, 90, -90)
ll <- cbind(long, lat)
precision_m <- coordImprecision(ll)
cbind(ll, precision_m)
# original coordinates in degrees-minutes-seconds format
longDD <- c(90, 90, 90, 90, 90, 90)
longMM <- c(0, 0, 0, 11, 11, 0)
longSS <- c(0, 0, 0, 0, 52, 52)
latDD <- c(38, 38, 38, 38, 38, 38)
latMM <- c(0, 37, 37, 37, 37, 0)
latSS <- c(0, 0, 38, 38, 38, 0)
longHemis <- rep('W', 6)
latHemis <- rep('N', 6)
longDec <- dmsToDecimal(longDD, longMM, longSS, longHemis)
latDec <- dmsToDecimal(latDD, latMM, latSS, latHemis)
decimal <- cbind(longDec, latDec)
(decImp <- coordImprecision(decimal))
(dmsImp <- coordImprecision(decimal, dms=TRUE))
# What if we do not know if coordinates were originally reported in
# decimal or degrees-minutes-seconds format? Most conservative option
# is to use maximum:
pmax(decImp, dmsImp)
# when longitude is negative and latitude is -90
long <- -45
lat <- -90
ll <- cbind(long, lat)
coordImprecision(ll)
Number of points in a "spatial points" object
Description
Returns the number of points in a sf
or SpatVector
object. This is typically done using either length(x)
or nrow(x)
, depending on whether the object in question has rows or not. This function helps in ambiguous cases, so users need not care if nrow
or length
is needed.
Usage
countPoints(x, byFeature = FALSE)
Arguments
x |
Object of class |
byFeature |
If |
Value
Numeric.
Examples
library(sf)
# lemur occurrence data
data(lemurs)
wgs84 <- getCRS('WGS84')
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84)
countPoints(occs)
Coordinate reference systems (CRSs) and nicknames
Description
A table of commonly-used coordinate reference systems, their nicknames, and WKT2 (well-known text) strings
Usage
data(crss)
Format
An object of class data.frame. This is a table of "named" coordinate reference systems and their well-known-text (WKT2) representations. It can be used as-is, or with getCRS to quickly get the WKT string for a particular CRS. The fields are as follows:
- long: "Long" name of the CRS
- short1 and short2: "Short" names of the CRS
- region: Region for which the CRS is fit
- projected: Is the CRS projected or not?
- projectionGeometry: Type of projection (NA, 'cylindrical', 'conic', or 'planar')
- datum: Datum
- type: Either 'CRS' or 'data'. The former are proper CRSs, and the latter are those used by popular datasets.
- wkt2: WKT2 string.
- notes: Notes.
Examples
data(crss)
getCRS('North America Albers', nice = TRUE)
Custom coordinate reference system WKT2 string
Description
These functions take as input either a spatial object or a coordinate pair and return a custom WKT2 (well-known text) coordinate reference system string centered on the object or coordinate. Projections include:
Albers conic equal-area
Lambert azimuthal equal-area
Vertical near-side (i.e., as the world appears from geosynchronous orbit)
Please note that these are NOT standard projections, so they do not have an EPSG or similar code.
Usage
customAlbers(x)
customLambert(x)
customVNS(x, alt = 35800)
Arguments
x |
Either an object of class |
alt |
Altitude of the viewpoint in km. The default (35800 km) is approximately geosynchronous orbit. |
Value
A WKT2 (well-known text) string.
Functions
- customLambert(): Custom coordinate reference system WKT2 string
- customVNS(): Custom coordinate reference system WKT2 string
See Also
getCRS
, customAlbers
, customLambert
, customVNS
Examples
library(sf)
### Madagascar
data(mad0)
alb <- customAlbers(mad0)
lamb <- customLambert(mad0)
vert <- customVNS(mad0)
madAlb <- st_transform(mad0, alb)
madLamb <- st_transform(mad0, lamb)
madVert <- st_transform(mad0, vert)
oldPar <- par(mfrow=c(2, 2))
plot(st_geometry(mad0), main='Unprojected (WGS84)')
plot(st_geometry(madAlb), main='Albers')
plot(st_geometry(madLamb), main='Lambert')
plot(st_geometry(madVert), main='Vertical')
par(oldPar)
### Canada
# The effect is more noticeable when plotting large areas,
# especially if they lie near the poles.
# This example can take a few minutes to run and plot.
library(terra)
canFile <- system.file('extdata', 'canada_level0_gadm41.gpkg', package='enmSdmX')
can <- vect(canFile)
alb <- customAlbers(can)
lamb <- customLambert(can)
vert <- customVNS(can)
canAlb <- project(can, alb)
canLamb <- project(can, lamb)
canVert <- project(can, vert)
oldPar <- par(mfrow=c(2, 2))
plot(can, main = 'Unprojected (WGS84)')
plot(canAlb, main = 'Albers')
plot(canLamb, main = 'Lambert')
plot(canVert, main = 'Vertical')
par(oldPar)
Convert geographic coordinates in decimal format to degrees-minutes-seconds
Description
This function converts geographic coordinates in decimal format to degrees-minutes-seconds (DD-MM-SS) format.
Usage
decimalToDms(x)
Arguments
x |
Numeric or vector of numeric values, longitude or latitude in decimal format. |
Value
A numeric matrix with three columns: degrees, minutes, and seconds. Note that the hemisphere (i.e., indicated by the sign of x) is not returned since it could be either north/south or east/west.
Examples
decimalToDms(38.56123) # latitude of St. Louis, Missouri, USA
decimalToDms(90.06521) # longitude of St. Louis, Missouri, USA
Convert geographic coordinates in degrees-minutes-second to decimal format
Description
This function converts geographic coordinates in degrees-minutes-seconds (DD-MM-SS) format to decimal format.
Usage
dmsToDecimal(dd, mm, ss, hemis = NULL)
Arguments
dd |
Numeric. Degrees longitude or latitude. Can be a decimal value. |
mm |
Numeric. Minutes longitude or latitude. Can be a decimal value. |
ss |
Numeric. Seconds longitude or latitude. Can be a decimal value. |
hemis |
Character or |
Value
Numeric.
Examples
dmsToDecimal(38, 37, 38) # latitude of St. Louis, Missouri, USA
dmsToDecimal(38, 37, 38, 'N') # latitude of St. Louis, Missouri, USA
dmsToDecimal(90, 11, 52.1) # longitude of St. Louis, Missouri, USA
dmsToDecimal(90, 11, 52.1, 'W') # longitude of St. Louis, Missouri, USA
Thin spatial points so that there is but one per raster cell
Description
This function thins spatial points such that no more than one point falls within each cell of a reference raster. If more than one point falls in a cell, the first point in the input data is retained unless the user specifies a priority for keeping points.
Usage
elimCellDuplicates(x, rast, longLat = NULL, priority = NULL)
Arguments
x |
Points. This can be either a |
rast |
|
longLat |
Two-element character vector or two-element integer vector. If |
priority |
Either |
Value
Object of the same class as x.
Examples
# This example can take >10 seconds to run.
library(terra)
x <- data.frame(
long=c(-90.1, -90.1, -90.2, 20),
lat=c(38, 38, 38, 38), point=letters[1:4]
)
rast <- rast() # empty raster covering entire world with 1-degree resolution
elimCellDuplicates(x, rast, longLat=c(1, 2))
elimCellDuplicates(x, rast, longLat=c(1, 2), priority=c(3, 2, 1, 0))
Weighted AUC
Description
This function calculates the area under the receiver-operator characteristic curve (AUC) following Mason & Graham (2002). Each case (presence/non-presence) can be assigned a weight, if desired.
Usage
evalAUC(
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
na.rm = FALSE,
...
)
Arguments
pres |
Vector of predictions for "positive" cases (e.g., predictions at presence sites). |
contrast |
Vector of predictions for "negative" cases (e.g., predictions at absence/background sites). |
presWeight |
Weights of positive cases. The default is to assign each positive case a weight of 1. |
contrastWeight |
Weights of contrast cases. The default is to assign each case a weight of 1. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Value
Numeric value.
References
Mason, S.J. and N.E. Graham. 2002. Areas beneath the relative operating characteristics (ROC) and relative operating levels (ROL) curves: Statistical significance and interpretation. Quarterly Journal of the Royal Meteorological Society 128:2145-2166. doi:10.1256/003590002320603584
See Also
Examples
pres <- seq(0.5, 1, by=0.1)
contrast <- seq(0, 1, by=0.01)
# unweighted
evalAUC(pres, contrast)
# weighted (weight presences with low predictions more)
presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5)
evalAUC(pres, contrast, presWeight=presWeight)
# weighted (weight presences with high predictions more)
presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1)
evalAUC(pres, contrast, presWeight=presWeight)
# weight presences and absences
contrastWeight <- sqrt(contrast)
evalAUC(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
Continuous Boyce Index (CBI) with weighting
Description
This function calculates the continuous Boyce index (CBI), a measure of model accuracy for presence-only test data.
Usage
evalContBoyce(
pres,
contrast,
numBins = 101,
binWidth = 0.1,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
autoWindow = TRUE,
method = "spearman",
dropZeros = TRUE,
graph = FALSE,
table = FALSE,
na.rm = FALSE,
...
)
Arguments
pres |
Numeric vector. Predicted values at presence sites. |
contrast |
Numeric vector. Predicted values at background sites. |
numBins |
Positive integer. Number of (overlapping) bins into which to divide predictions. |
binWidth |
Positive numeric value < 1. Size of a bin. Each bin will be |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
autoWindow |
Logical. If |
method |
Character. Type of correlation to calculate. The default is |
dropZeros |
Logical. If |
graph |
Logical. If |
table |
Logical. If |
na.rm |
Logical. If |
... |
Other arguments (not used). |
Details
CBI is the Spearman rank correlation coefficient between the proportion of sites in each prediction class and the expected proportion of predictions in each prediction class based on the proportion of the landscape that is in that class. The index ranges from -1 to 1. Values >0 indicate the model's output is positively correlated with the true probability of presence. Values <0 indicate it is negatively correlated with the true probability of presence.
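A sketch of the quantities behind the index for a single bin (the function uses many overlapping bins and optional weights):
set.seed(1)
pres <- sqrt(runif(100)) # predictions at presences
contrast <- runif(1000) # predictions at background sites
binLo <- 0.4; binHi <- 0.5
P <- mean(pres >= binLo & pres < binHi) # proportion of presences in bin
E <- mean(contrast >= binLo & contrast < binHi) # expected proportion (background)
P / E # ratio correlated against bin rank to give CBI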
Value
Numeric value, or if table
is TRUE
, then a list object with CBI plus a data frame with P (proportion of presence weights per bin), E (expected proportion of presence weights per bin–from contrast sites), and the ratio of the two.
References
Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modeling 157:281-300. doi:10.1016/S0304-3800(02)00200-4
Hirzel, A.H., Le Lay, G., Helfer, V., Randon, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modeling 199:142-152. doi:10.1016/j.ecolmodel.2006.05.017
See Also
cor
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
, evalTSS
Examples
set.seed(123)
pres <- sqrt(runif(100))
contrast <- runif(1000)
evalContBoyce(pres, contrast)
presWeight <- c(rep(1, 10), rep(0.5, 90))
evalContBoyce(pres, contrast, presWeight=presWeight)
Calculate multivariate weighted AUC
Description
This function calculates a multivariate version of the area under the receiver-operator characteristic curve (AUC). The multivariate version is simply the mean AUC across all possible pairwise AUCs for all cases (Hand & Till 2001). For example, if we have predictions that can be classified into three groups of expectation, say A, B, and C, where we expect predictions assigned to group A are > those in B and C, and predictions in group B are expected to be > those in group C, the multivariate AUC for this situation is mean(wAB * auc_mean(A, B), wAC * auc_mean(A, C), wBC * auc_mean(B, C))
, where auc_mean(X, Y) is the AUC calculated between cases X and Y, and wXY is a weight for that case-comparison.
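A sketch of the unweighted case using evalAUC() from this package (three ordered groups, so three pairwise comparisons):
set.seed(123)
low <- runif(10)^2
middle <- runif(10)
high <- sqrt(runif(20))
pairwise <- c(
evalAUC(middle, low),
evalAUC(high, low),
evalAUC(high, middle)
)
mean(pairwise) # compare to evalMultiAUC(low, middle, high)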
Usage
evalMultiAUC(..., weightBySize = FALSE, na.rm = FALSE)
Arguments
... |
A set of two or more numeric vectors or two or more 2-column matrices or data frames. The objects must be listed in order of expected probability. For example, you might have a set of predictions for objects you expect to have a low predicted probability (e.g., long-term absences of an animal), a set that you expect to have middle levels of probability (e.g., sites that were recently vacated), and a set for which you expect a high level of predicted probability (e.g., sites that are currently occupied). In this case you should list the cases in order: low, middle, high. If a 2-column matrix or data frame is supplied, then the first column is assumed to represent predictions and the second assumed to represent site-level weights (see |
weightBySize |
Logical, if |
na.rm |
Logical. If |
Value
Named numeric vector. The names will appear as case2_over_case1 (which in this example means the AUC of the second item in ... when compared to the first item in ...), plus multivariate (which is the multivariate AUC).
References
Hand, DJ and Till, RJ. 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45:171-186 doi:10.1023/A:1010920819831.
See Also
pa_evaluate
, evalAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
, evalTSS
Examples
set.seed(123)
# no weights
low <- runif(10)^2
middle <- runif(10)
high <- sqrt(runif(20))
evalMultiAUC(low, middle, high)
# equal weights
low <- matrix(c(low, rep(1, length(low))), ncol=2)
middle <- matrix(c(middle, rep(1, length(middle))), ncol=2)
high <- matrix(c(high, rep(1, length(high))), ncol=2)
evalMultiAUC(low, middle, high)
# equal weights with weighting by number of comparisons
evalMultiAUC(low, middle, high, weightBySize=TRUE)
# unequal weights
middle[ , 2] <- ifelse(middle[ , 1] > 0.5, 0.1, 1)
evalMultiAUC(low, middle, high)
# unequal weights with weighting by number of comparisons
evalMultiAUC(low, middle, high, weightBySize=TRUE)
Weighted True Skill Statistic (TSS)
Description
This function calculates the True Skill Statistic (TSS).
Usage
evalTSS(
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
thresholds = seq(0, 1, by = 0.001),
na.rm = FALSE,
...
)
Arguments
pres |
Numeric vector. Predicted values at test presences |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
thresholds |
Numeric vector. Thresholds at which to calculate the sum of sensitivity and specificity. The default evaluates all values from 0 to 1 in steps of 0.001. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Details
This function calculates the maximum value of the True Skill Statistic (i.e., across all thresholds, the value that maximizes sensitivity plus specificity).
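At a single threshold, TSS is sensitivity plus specificity minus 1; a minimal unweighted sketch of the quantity that is maximized across thresholds:
tssAt <- function(th, pres, contrast) {
sens <- mean(pres >= th) # true positive rate
spec <- mean(contrast < th) # true negative rate
sens + spec - 1
}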
Value
Numeric value.
References
See Allouche, O., Tsoar, A., and Kadmon, R. 2006. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology 43:1223-1232. doi:10.1111/j.1365-2664.2006.01214.x
See Also
pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTjursR2
Examples
set.seed(123)
# set of bad and good predictions at presences
bad <- runif(30)^2
good <- runif(30)^0.1
hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences')
hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE)
pres <- c(bad, good)
contrast <- runif(1000)
evalTSS(pres, contrast)
# upweight bad predictions
presWeight <- c(rep(1, 30), rep(0.1, 30))
evalTSS(pres, contrast, presWeight=presWeight)
# upweight good predictions
presWeight <- c(rep(0.1, 30), rep(1, 30))
evalTSS(pres, contrast, presWeight=presWeight)
Weighted thresholds for predictions
Description
This function is similar to the threshold
function in the predicts package, which calculates thresholds to create binary predictions from continuous values. However, unlike that function, it allows the user to specify weights for presences and absence/background predictions. The output will thus be the threshold that best matches the specified criterion taking into account the relative weights of the input values.
Usage
evalThreshold(
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
at = c("msss", "mdss", "minPres", "prevalence", "sensitivity"),
sensitivity = 0.9,
thresholds = seq(0, 1, by = 0.001),
na.rm = FALSE,
...
)
Arguments
pres |
Numeric vector. Predicted values at test presences. |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
at |
Character or character vector, name(s) of threshold(s) to calculate. The default is to calculate them all.
|
sensitivity |
Value of specificity to match (used only if |
thresholds |
Numeric vector. Thresholds at which to calculate the sum of sensitivity and specificity. The default evaluates all values from 0 to 1 in steps of 0.001. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Value
Named numeric vector.
References
Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
See Also
threshold
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThresholdStats
, evalTjursR2
, evalTSS
Examples
set.seed(123)
# set of bad and good predictions at presences
bad <- runif(100)^2
good <- runif(100)^0.1
hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences')
hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE)
pres <- c(bad, good)
contrast <- runif(1000)
evalThreshold(pres, contrast)
# upweight bad predictions
presWeight <- c(rep(1, 100), rep(0.1, 100))
evalThreshold(pres, contrast, presWeight=presWeight)
# upweight good predictions
presWeight <- c(rep(0.1, 100), rep(1, 100))
evalThreshold(pres, contrast, presWeight=presWeight)
Thresholded evaluation statistics
Description
This function calculates a series of evaluation statistics based on a threshold or thresholds used to convert continuous predictions to binary predictions.
Usage
evalThresholdStats(
thresholds,
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
delta = 0.001,
na.rm = FALSE,
bg = NULL,
bgWeight = NULL,
...
)
Arguments
thresholds |
Numeric or numeric vector. Threshold(s) at which to calculate sensitivity and specificity. |
pres |
Numeric vector. Predicted values at test presences |
contrast |
Numeric vector. Predicted values at background/absence sites. |
presWeight |
Numeric vector same length as |
contrastWeight |
Numeric vector same length as |
delta |
Positive numeric >0 in the range [0, 1] and usually very small. This value is used only if calculating the SEDI threshold when any true positive rate or false negative rate is 0 or the false negative rate is 1. Since SEDI uses log(x) and log(1 - x), values of 0 and 1 will produce |
na.rm |
Logical. If |
bg |
Same as |
bgWeight |
Same as |
... |
Other arguments (unused). |
Value
8-column matrix with the following named columns. a = weight of presences >= threshold, b = weight of backgrounds >= threshold, c = weight of presences < threshold, d = weight of backgrounds < threshold, and N = sum of presence and background weights.
- 'threshold': Threshold
- 'sensitivity': Sensitivity (a / (a + c))
- 'specificity': Specificity (d / (d + b))
- 'ccr': Correct classification rate ((a + d) / N)
- 'ppp': Positive predictive power (a / (a + b))
- 'npp': Negative predictive power (d / (c + d))
- 'mr': Misclassification rate ((b + c) / N)
References
Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
See Also
threshold
, pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalTjursR2
, evalTSS
Examples
set.seed(123)
# set of bad and good predictions at presences
bad <- runif(100)^2
good <- runif(100)^0.1
hist(good, breaks=seq(0, 1, by=0.1), border='green', main='Presences')
hist(bad, breaks=seq(0, 1, by=0.1), border='red', add=TRUE)
pres <- c(bad, good)
contrast <- runif(1000)
thresholds <- c(0.1, 0.5, 0.9)
evalThresholdStats(thresholds, pres, contrast)
# upweight bad predictions
presWeight <- c(rep(1, 100), rep(0.1, 100))
evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight)
# upweight good predictions
presWeight <- c(rep(0.1, 100), rep(1, 100))
evalThresholdStats(thresholds, pres, contrast, presWeight=presWeight)
Weighted Tjur's R2
Description
This function calculates Tjur's R2 metric of model discrimination accuracy. Unweighted R2 is simply the difference between the mean predicted value at presence sites and the mean predicted value at absence/background sites. The weighted version allows for differing weights between presences and between absences/contrast values (i.e., the difference between the weighted mean of predictions at presences and weighted mean predictions at absences/contrast locations).
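In the unweighted case this reduces to a one-line difference of means (using the example values from below):
pres <- seq(0.5, 1, by=0.1)
contrast <- seq(0, 1, by=0.01)
mean(pres) - mean(contrast) # unweighted Tjur's R2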
Usage
evalTjursR2(
pres,
contrast,
presWeight = rep(1, length(pres)),
contrastWeight = rep(1, length(contrast)),
na.rm = FALSE,
...
)
Arguments
pres |
Predictions at presence sites. |
contrast |
Predictions at absence/background sites. |
presWeight |
Weights of presence cases. The default is to assign each presence case a weight of 1. |
contrastWeight |
Weights of absence/background cases. The default is to assign each case a weight of 1. |
na.rm |
Logical. If |
... |
Other arguments (unused). |
Value
Numeric value.
References
Tjur, T. 2009. Coefficients of determination in logistic regression models-A new proposal: The coefficient of discrimination. The American Statistician 63:366-372. doi:10.1198/tast.2009.08210
See Also
pa_evaluate
, evalAUC
, evalMultiAUC
, evalContBoyce
, evalThreshold
, evalThresholdStats
, evalTSS
Examples
pres <- seq(0.5, 1, by=0.1)
contrast <- seq(0, 1, by=0.01)
# unweighted
evalTjursR2(pres, contrast)
# weighted (weight presences with low predictions more)
presWeight <- c(1, 1, 1, 0.5, 0.5, 0.5)
evalTjursR2(pres, contrast, presWeight=presWeight)
# weighted (weight presences with high predictions more)
presWeight <- c(0.5, 0.5, 0.5, 1, 1, 1)
evalTjursR2(pres, contrast, presWeight=presWeight)
# weight presences and absences
contrastWeight <- sqrt(contrast)
evalTjursR2(pres, contrast, presWeight=presWeight, contrastWeight=contrastWeight)
Convert extent to a spatial polygon
Description
This function returns a SpatVector
or sf
polygon representing an extent. The input can be a SpatExtent
or sf
object, or an object from which a SpatExtent
(extent) can be obtained.
Usage
extentToVect(x, ...)
Arguments
x |
A |
... |
Arguments to supply to |
Value
A SpatVector
(usual) or, if the input is an sf
object, an sf
polygon object.
See Also
Examples
data(mad0)
madExtent <- extentToVect(mad0)
plot(madExtent, border='blue', lty='dotted')
plot(mad0[1], add=TRUE)
# NB This is the same as:
library(terra)
madExtent <- ext(mad0)
madExtent <- as.polygons(madExtent, crs = crs(mad0))
Assign geographically-distinct k-folds
Description
This function generates geographically-distinct cross-validation folds, or "geo-folds" ("g-folds" for short). Points are grouped by proximity to one another. Folds can be forced to have at least a minimum number of points in them. Results are deterministic (i.e., the same every time for the same data).
More specifically, g-folds are created using this process:
To start, all pairwise distances between points are calculated. These are used in a clustering algorithm to create a dendrogram of relationships by distance. The dendrogram is then "cut" so it has
k
groups (folds). If each fold has at least the minimum desired number of points (minIn
), then the process stops and fold assignments are returned. However, if at least one fold has fewer than the desired number of points, a series of steps is executed.
First, the fold with a centroid that is farthest from all others is selected. If it has sufficient points, then the next-most distant fold is selected, and so on.
Once a fold is identified that has fewer than the desired number of points, it is grown by adding to it the points closest to its centroid, one at a time. Each time a point is added, the fold centroid is recalculated. The fold is grown until it has the desired number of points. Call this "fold #1". Hereafter, these points are considered "assigned" and are not eligible for re-assignment.
The remaining "unassigned" points are then clustered again, but this time into
k - 1
folds. And again, the most-distant fold that has fewer than the desired number of points is identified. This fold is then grown as before, using only unassigned points. This fold then becomes "fold #2". The process repeats iteratively until there are
k
folds assigned, each with at least the desired number of points.
The potential downside of this approach is that the last fold is assigned the remainder of the points, so it will be the largest. One way to avoid gross imbalance is to select the value of minIn
such that it divides the points into nearly equally-sized groups.
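The initial clustering step can be sketched in base R (planar distances and hypothetical coordinates are used here for brevity; geoFold itself works on spatial objects and adds the minIn fold-growing logic described above):
set.seed(1)
xy <- cbind(x = runif(20), y = runif(20)) # hypothetical point coordinates
d <- dist(xy) # all pairwise distances
dendro <- hclust(d, method = 'complete') # dendrogram of relationships by distance
folds <- cutree(dendro, k = 3) # "cut" into k groups
table(folds) # fold sizes (not yet adjusted for minIn)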
Usage
geoFold(x, k, minIn = 1, longLat = 1:2, method = "complete", ...)
Arguments
x |
A "spatial points" object of class |
k |
Numeric: Number of folds to create. |
minIn |
Numeric: Minimum number of points required to be in a fold. |
longLat |
Character or integer vector: This is ignored if x is an sf or SpatVector object. Otherwise, the names or indices of the columns in x containing longitude and latitude (in that order; default: the first two columns). |
method |
Character: Method used by hclust to cluster points (default: 'complete'). |
... |
Additional arguments (unused) |
Details
Note that in general it is probably mathematically impossible to cluster points in 2-dimensional space into k
groups, each with at least minIn
points, in a manner that seems "reasonable" to the eye in all cases. In experimentation, "unreasonable" results often appear when the number of groups is high.
Value
A vector of integers the same length as the number of points in x
. Each integer indicates which fold a point in x
belongs to.
See Also
Examples
library(sf)
library(terra)
# lemur occurrence data
data(mad0)
data(lemurs)
crs <- getCRS('WGS84')
ll <- c('longitude', 'latitude')
# use occurrences of all species... easier to see on map
occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84'))
# create 100 background points
mad0 <- vect(mad0)
bg <- spatSample(mad0, 100)
### assign 3 folds to occurrences and to background sites
k <- 3
minIn <- floor(nrow(occs) / k) # maximally spread between folds
presFolds <- geoFold(occs, k = k, minIn = minIn)
bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds)
# number of sites per fold
table(presFolds)
table(bgFolds)
# map
plot(mad0, border = 'gray', main = paste(k, 'geo-folds'))
plot(bg, pch = 3, col = bgFolds + 1, add = TRUE)
plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE)
legend(
'bottomright',
legend = c(
'presence fold 1',
'presence fold 2',
'presence fold 3',
'background fold 1',
'background fold 2',
'background fold 3'
),
pch = c(21, 22, 23, 3, 3, 3),
col = c(rep('black', 3), 2, 3, 4),
pt.bg = c(2, 3, 4, NA, NA, NA)
)
Assign geographically-distinct k-folds to background/absence sites
Description
This function generates geographically-distinct cross-validation folds, or "geo-folds" of background or absence sites (i.e., "contrast" sites). Each contrast site is assigned to a fold based on the fold of the presence site that is closest. Typically, this function is run after geoFold
is run to assign presences to folds.
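The assignment rule itself is simple: each contrast point inherits the fold of its nearest presence. A minimal sketch of that rule with sf (hypothetical points; the function proper also handles coordinate columns, CRSs, and different input classes):
library(sf)
set.seed(1)
pres <- st_as_sf(data.frame(x = runif(10), y = runif(10)), coords = c('x', 'y'))
presFolds <- rep(1:2, each = 5) # folds already assigned (e.g., by geoFold)
bg <- st_as_sf(data.frame(x = runif(20), y = runif(20)), coords = c('x', 'y'))
nearest <- apply(st_distance(bg, pres), 1, which.min) # nearest presence per point
bgFolds <- presFolds[nearest] # inherit that presence's fold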
Usage
geoFoldContrast(
contrast,
pres,
presFolds,
contrastLongLat = 1:2,
presLongLat = 1:2,
...
)
Arguments
contrast |
A "spatial points" object representing contrast sites:
|
pres |
A "spatial points" object representing presence sites:
|
presFolds |
Numeric vector: These provide the folds to which |
contrastLongLat , presLongLat |
Character or integer vector: A character or integer vector specifying the columns in |
... |
Additional arguments (unused) |
Value
A vector of integers the same length as the number of points in contrast
. Each integer indicates which fold a point in contrast
belongs to.
See Also
Examples
library(sf)
library(terra)
# lemur occurrence data
data(mad0)
data(lemurs)
crs <- getCRS('WGS84')
ll <- c('longitude', 'latitude')
# use occurrences of all species... easier to see on map
occs <- st_as_sf(lemurs, coords = ll, crs = getCRS('WGS84'))
# create 100 background points
mad0 <- vect(mad0)
bg <- spatSample(mad0, 100)
### assign 3 folds to occurrences and to background sites
k <- 3
minIn <- floor(nrow(occs) / k) # maximally spread between folds
presFolds <- geoFold(occs, k = k, minIn = minIn)
bgFolds <- geoFoldContrast(bg, pres = occs, presFolds = presFolds)
# number of sites per fold
table(presFolds)
table(bgFolds)
# map
plot(mad0, border = 'gray', main = paste(k, 'geo-folds'))
plot(bg, pch = 3, col = bgFolds + 1, add = TRUE)
plot(st_geometry(occs), pch = 20 + presFolds, bg = presFolds + 1, add = TRUE)
legend(
'bottomright',
legend = c(
'presence fold 1',
'presence fold 2',
'presence fold 3',
'background fold 1',
'background fold 2',
'background fold 3'
),
pch = c(21, 22, 23, 3, 3, 3),
col = c(rep('black', 3), 2, 3, 4),
pt.bg = c(2, 3, 4, NA, NA, NA)
)
Thin geographic points deterministically or randomly
Description
This function thins geographic points such that none have nearest neighbors closer than some user-specified distance. For a given set of points that fall within this distance, thinning can be conducted in two ways. Both begin by first calculating all pairwise distances between points. Then, clusters of points are found based on proximity using the "single-linkage" method (i.e., based on minimum distance between groups). Then, either a deterministic or random method is used to select the retained points:
Deterministic: For each cluster, distances between each point in the cluster and all points outside of the cluster are calculated. The point retained in each cluster is the one with the greatest minimum pairwise distance to any point in any other cluster. This point will thus be maximally isolated from any other point.
Random: For each cluster, a random point is chosen.
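The shared clustering step can be sketched in base R (planar distances and hypothetical coordinates for brevity; the random retention rule is shown, whereas the deterministic rule instead keeps the most-isolated point per cluster):
set.seed(1)
xy <- cbind(x = runif(30), y = runif(30)) # hypothetical coordinates
cl <- hclust(dist(xy), method = 'single') # single-linkage clustering
groups <- cutree(cl, h = 0.1) # clusters of points closer than 0.1
keep <- unlist(tapply(seq_along(groups), groups,
   function(i) i[sample.int(length(i), 1)])) # one random point per cluster
thinned <- xy[keep, ]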
Usage
geoThin(x, minDist, random = FALSE, longLat = 1:2, method = "single", ...)
Arguments
x |
A "spatial points" object of class |
minDist |
Minimum distance (in meters) needed between points to retain them. Points falling closer than this distance will be candidates for being discarded. |
random |
Logical: If TRUE, points are thinned randomly; if FALSE (default), points are thinned deterministically. |
longLat |
Numeric or integer vector: This is ignored if x is an sf or SpatVector object. Otherwise, the names or indices of the columns in x containing longitude and latitude (in that order; default: the first two columns). |
method |
Character: Method used by hclust to cluster points (default: 'single'). |
... |
Additional arguments. Not used. |
Value
An object of the same class as x
.
Examples
library(sf)
# lemur occurrence data
data(mad0)
data(lemurs)
crs <- getCRS('WGS84')
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
ll <- c('longitude', 'latitude')
occs <- st_as_sf(occs, coords = ll, crs = getCRS('WGS84'))
# deterministically thin
det <- geoThin(x = occs, minDist = 30000)
# randomly thin
set.seed(123)
rand <- geoThin(x = occs, minDist = 30000, random = TRUE)
# map
oldPar <- par(mfrow = c(1, 2))
plot(st_geometry(occs), cex = 1.4, main = 'Deterministic')
plot(st_geometry(det), pch = 21, cex = 1.4, bg = 1:nrow(det), add = TRUE)
plot(st_geometry(mad0), add = TRUE)
plot(st_geometry(occs), cex = 1.4, main = 'Random')
plot(st_geometry(rand), pch = 21, cex = 1.4, bg = 1:nrow(rand), add = TRUE)
plot(st_geometry(mad0), add = TRUE)
par(oldPar)
WKT string for a named coordinate reference system or a spatial object
Description
Retrieve the Well-Known text string (WKT2) for a coordinate reference system (CRS) by name or from a spatial object. The most common usage of the function is to return the WKT2 string using an easy-to-remember name. For example, getCRS('wgs84')
returns the WKT2 string for the WGS84 datum. To get a table of strings, just use getCRS()
.
Usage
getCRS(x = NULL, nice = FALSE, warn = TRUE)
Arguments
x |
This can be any of:
|
nice |
If |
warn |
If |
Value
A string representing WKT2 (well-known text) object or a data.frame
.
Examples
# view table of available CRSs
getCRS()
# get specific WKT2 strings
getCRS('WGS84')
getCRS('Mollweide')
getCRS('WorldClim')
# WKT2 strings nice for your eyes
getCRS('WGS84', TRUE)
data(mad0)
getCRS(mad0)
Get or assign values to cells in a raster
Description
These functions get values from a raster at specific cells, or assign values to specific cells.
Usage
getValueByCell(x, cell, format = "raster")
setValueByCell(x, val, cell, format = "raster")
Arguments
x |
A |
cell |
Cell indices. There must be one per value in |
format |
The type of cell indexing used. This can be either "raster" for row indexing (default) or "matrix" for column indexing. Row indexing (the default for rasters) starts with cell "1" in the upper left; cell "2" is to its right, and so on, with numbering wrapping around to the next row. Column indexing (the default for matrices) has cell "1" in the upper left corner of the matrix and cell "2" below it, and so on, with numbering wrapping around to the top of the next column. |
val |
One or more values. If the number of cells specified is greater than the number of values in val, then the values in val are recycled. |
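The two numbering schemes differ only in whether cells are counted across rows first ("raster") or down columns first ("matrix"). For a raster with nr rows and nc columns, the conversion can be sketched in base R:
nr <- 10; nc <- 10
rasterCell <- 12 # row-order ("raster") index
row <- ((rasterCell - 1) %/% nc) + 1 # row containing the cell
col <- ((rasterCell - 1) %% nc) + 1 # column containing the cell
matrixCell <- (col - 1) * nr + row # same cell in column-order ("matrix") indexing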
Value
A data frame (getValueByCell
) with cell numbers (in row format), or a SpatRaster
(setValueByCell
).
See Also
Examples
library(terra)
x <- rast(nrow=10, ncol=10)
x[] <- round(10 * runif(100))
cell <- c(1, 20, 40, 80)
getValueByCell(x, cell = cell)
getValueByCell(x, cell = cell, format = 'matrix')
y <- setValueByCell(x, val = 20, cell = cell)
plot(y)
z <- setValueByCell(x, val = 30, cell = cell, format = 'matrix')
plot(c(x, y, z))
"Friendly" wrapper for terra::global() for calculating raster statistics
Description
Calculate "global" statistics across all the values in a raster. This function is a wrapper for global
. That function, by default, sets na.rm = FALSE
, so any cell that is NA
can cause the summary statistic to also be NA
(usually undesirable). The function also returns a data.frame
, so often needs a further line of code to get the actual value(s). This function sets na.rm = TRUE
by default, and returns a numeric vector (not a data.frame
).
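Ignoring the weights argument, a rough equivalent of what globalx does can be sketched directly with terra:
library(terra)
r <- rast(ncols = 10, nrows = 10)
values(r) <- c(NA, 2:ncell(r)) # note the deliberate NA
out <- terra::global(r, 'mean', na.rm = TRUE) # data.frame, NA ignored
unlist(out) # collapse to a named numeric vector, one value per layer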
Usage
globalx(x, fun, na.rm = TRUE, ..., weights = NULL)
Arguments
x |
A SpatRaster. |
fun |
A function or the name of a function (in quotes). See |
na.rm |
If TRUE (default), remove NA values before calculating the statistic. |
... |
Additional arguments to pass to |
weights |
Either |
Value
A numeric vector, one value per layer in x
.
Examples
library(terra)
r <- rast(ncols=10, nrows=10)
values(r) <- 1:ncell(r)
global(r, 'sum') # terra
globalx(r, 'sum') # enmSdmX
global(r, "mean", na.rm=TRUE)[1, 1] # terra... same as enmSdmX::globalx
Interpolate values from a series of rasters
Description
This function returns a series of rasters interpolated from another series of rasters. For example, the input might represent rasters of a process measured at times t, t + 1, and t + 4. The rasters at t + 2 and t + 3 could be interpolated based on the values in the other rasters. Note that this function can take a lot of time and memory, even for relatively small rasters.
Usage
interpolateRasts(
rasts,
interpFrom,
interpTo,
type = "linear",
onFail = NA,
useRasts = FALSE,
na.rm = TRUE,
verbose = TRUE,
...
)
Arguments
rasts |
A "stack" of |
interpFrom |
Numeric vector, one value per raster in |
interpTo |
Numeric vector, values of "distances" at which to interpolate the rasters. |
type |
Character. The type of model used to do the interpolation. Note that some of these (the first few) are guaranteed to go through every point being interpolated from. The second set, however, are effectively regressions, so are not guaranteed to go through any of the points. Note that some methods cannot handle cases where at least some series of cells have fewer than a given number of non-NA values.
|
onFail |
Either |
useRasts |
Logical. If |
na.rm |
Logical, if |
verbose |
Logical. If TRUE (default), display progress. |
... |
Other arguments passed to |
Details
This function can be very memory-intensive for large rasters. It may speed things up (and make them possible) to do interpolations piece by piece (e.g., instead of interpolating between times t0, t1, t2, t3, ..., interpolate between t0 and t1, then t1 and t2, and so on). This may give results that differ from using the entire set, however, depending on what type of interpolation is used. Note that linear interpolation and splines will often yield very similar results, except that in a small number of cases splines may produce very extreme interpolated values.
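Under linear interpolation, the operation is applied cell-by-cell along the series; what happens for a single cell can be sketched with stats::approx() (hypothetical values):
interpFrom <- c(1, 3, 4, 8) # "distances" at which the cell was observed
cellVals <- c(0.2, 0.5, 0.4, 0.9) # the cell's values at those distances
interpTo <- 1:8
approx(x = interpFrom, y = cellVals, xout = interpTo)$y # interpolated series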
Value
A SpatRaster
"stack" with one layer per element in interpTo
.
See Also
approximate
, approxfun
, splinefun
, trainNS
, glm
, bs
, smooth.spline
.
Examples
# NB: The example below can take a few minutes to run.
library(terra)
interpFrom <- c(1, 3, 4, 8, 10, 11, 15)
interpTo <- 1:15
rx <- rast(nrows=10, ncols=10)
r1 <- setValues(rx, rnorm(100, 1))
r3 <- setValues(rx, rnorm(100, 3))
r4 <- setValues(rx, rnorm(100, 5))
r8 <- setValues(rx, rnorm(100, 11))
r10 <- setValues(rx, rnorm(100, 3))
r11 <- setValues(rx, rnorm(100, 5))
r15 <- setValues(rx, rnorm(100, 13))
rasts <- c(r1, r3, r4, r8, r10, r11, r15)
names(rasts) <- paste0('rasts', interpFrom)
linear <- interpolateRasts(rasts, interpFrom, interpTo)
spline <- interpolateRasts(rasts, interpFrom, interpTo, type='spline')
gam <- interpolateRasts(rasts, interpFrom, interpTo, type='gam', onFail='linear')
ns <- interpolateRasts(rasts, interpFrom, interpTo, type='ns', onFail='linear', verbose=FALSE)
poly <- interpolateRasts(rasts, interpFrom, interpTo, type='poly', onFail='linear')
bs <- interpolateRasts(rasts, interpFrom, interpTo, type='bs', onFail='linear')
ss <- interpolateRasts(rasts, interpFrom, interpTo, type='smooth.spline', onFail='linear',
verbose=FALSE)
# examine trends for a particular point on the landscape
pts <- matrix(c(-9, 13), ncol = 2)
pts <- vect(pts)
linearExt <- unlist(terra::extract(linear, pts, ID=FALSE))
splineExt <- unlist(terra::extract(spline, pts, ID=FALSE))
gamExt <- unlist(terra::extract(gam, pts, ID=FALSE))
nsExt <- unlist(terra::extract(ns, pts, ID=FALSE))
polyExt <- unlist(terra::extract(poly, pts, ID=FALSE))
bsExt <- unlist(terra::extract(bs, pts, ID=FALSE))
ssExt <- unlist(terra::extract(ss, pts, ID=FALSE))
mins <- min(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt)
maxs <- max(linearExt, splineExt, gamExt, nsExt, polyExt, bsExt, ssExt)
plot(interpTo, linearExt, type='l', lwd=2, ylim=c(mins, maxs), ylab='Value')
lines(interpTo, splineExt, col='blue')
lines(interpTo, gamExt, col='green')
lines(interpTo, nsExt, col='orange')
lines(interpTo, polyExt, col='gray')
lines(interpTo, bsExt, col='magenta')
lines(interpTo, ssExt, col='cyan')
ext <- unlist(extract(rasts, pts, ID = FALSE))
points(interpFrom, ext)
legend('topleft', inset=0.01, lty=c(rep(1, 7), NA),
legend=c('linear', 'spline', 'GAM', 'NS', 'polynomial', 'B-spline',
'Smooth spline', 'Observed'), col=c('black', 'blue', 'green',
'orange', 'gray', 'magenta', 'cyan'), pch=c(rep(NA, 7), 1))
Lemur occurrences from GBIF
Description
Data frame of lemur occurrences
Usage
data(lemurs)
Format
An object of class 'data.frame'
.
Source
Examples
data(lemurs)
lemurs
Generate rasters with cell values equal to cell longitude or latitude
Description
This function generates a raster stack with two rasters, one with cell values equal to the cell's longitude and the other with cell values equal to the cell's latitude.
Usage
longLatRasts(x, m = TRUE, filePath = NULL, ...)
Arguments
x |
|
m |
Any of:
|
filePath |
String or |
... |
Arguments to pass to |
Value
Object of class SpatRaster
.
Examples
library(terra)
# generate long/lat rasters for the world
x <- rast() # raster with 1 deg resolution and extent equal to entire world
x[] <- 1:ncell(x)
longLat <- longLatRasts(x)
plot(longLat)
# demonstrate masking
# randomly force some cells to NA
v <- 1:ncell(x)
n <- 10000
v[sample(v, n)] <- NA
x[] <- v
longLatTRUE <- longLatRasts(x, m = TRUE)
longLatFALSE <- longLatRasts(x, m = FALSE)
rasts <- c(x, longLatTRUE, x, longLatFALSE)
names(rasts) <- c('x', 'long_m_TRUE', 'lat_m_TRUE',
'x', 'long_m_FALSE', 'lat_m_FALSE')
plot(rasts)
Madagascar spatial object
Description
Outline of Madagascar from GADM. The geometry has been simplified from the version available in GADM, so please do not use this for "official" analyses.
Usage
data(mad0, package='enmSdmX')
Format
An object of class sf
.
Source
Examples
library(sf)
data(mad0)
mad0
plot(st_geometry(mad0), main='Madagascar')
Madagascar spatial object
Description
Outlines of regions ("Faritra") of Madagascar from GADM. The geometry has been simplified from the version available in GADM, so please do not use this for "official" analyses.
Usage
data(mad1, package='enmSdmX')
Format
An object of class sf
.
Source
Examples
library(sf)
data(mad1)
mad1
plot(st_geometry(mad1), main='Madagascar')
Present-day climate rasters for Madagascar
Description
Rasters representing average climate across 1970-2000 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
Format
An object of class 'SpatRaster'
.
Source
Examples
library(terra)
rastFile <- system.file('extdata', 'madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
plot(madClim)
Future climate rasters for Madagascar
Description
Rasters representing average climate across 2021-2040 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
Format
An object of class 'SpatRaster'
.
Source
Examples
library(terra)
rastFile <- system.file('extdata', 'madClim2030.tif', package='enmSdmX')
madClimFut <- rast(rastFile)
plot(madClimFut)
Future climate rasters for Madagascar
Description
Rasters representing average climate across 2041-2060 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
Format
An object of class 'SpatRaster'
.
Source
Examples
library(terra)
rastFile <- system.file('extdata', 'madClim2050.tif', package='enmSdmX')
madClimFut <- rast(rastFile)
plot(madClimFut)
Future climate rasters for Madagascar
Description
Rasters representing average climate across 2061-2080 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
Format
An object of class 'SpatRaster'
.
Source
Examples
library(terra)
rastFile <- system.file('extdata', 'madClim2070.tif', package='enmSdmX')
madClimFut <- rast(rastFile)
plot(madClimFut)
Future climate rasters for Madagascar
Description
Rasters representing average climate across 2081-2100 modeled with CanESM5 for SSP 585 for Madagascar from WorldClim version 2.1. Values of these rasters have been rounded to one digit, so please do not use these for "official" work. Please also note that CanESM5 in CMIP6 is known to run "too hot", but is useful here to aid illustration.
Format
An object of class 'SpatRaster'
.
Source
Examples
library(terra)
rastFile <- system.file('extdata', 'madClim2090.tif', package='enmSdmX')
madClimFut <- rast(rastFile)
plot(madClimFut)
Number of response data in a model object
Description
This function returns the number of response data used in a model (i.e., the sample size). If the data are binary it can return the number of 1s and 0s.
Usage
modelSize(x, binary = TRUE, graceful = TRUE)
Arguments
x |
A model object. This can be of many classes, including "gbm", "glm", "gam", "MaxEnt", and so on. |
binary |
If TRUE (default), return the number of 1s and 0s in the response; if FALSE, return the total number of response data. |
graceful |
If TRUE (default), return NA when the sample size cannot be determined from the model object, rather than failing with an error. |
Value
One or two named integers.
Examples
set.seed(123)
y <- runif(101)^2
yBinary <- as.integer(y > 0.6)
x <- data.frame(x1=1:101, x2=rnorm(101))
model <- lm(y ~ x1 + x2, data=x)
modelBinary <- glm(yBinary ~ x1 + x2, data=x, family='binomial')
modelSize(model, FALSE)
modelSize(model, TRUE) # not binary input... notice warning
modelSize(modelBinary)
modelSize(modelBinary, FALSE)
Extract "most conservative" environments from points and/or polygons
Description
This function implements the "nearest environmental point" method (Smith et al. 2023) to enable the use of occurrence records geolocated only to a general place (e.g., a country or province), along with occurrences georeferenced with little error. The function returns environments from a set of precisely-geolocated points plus the environment associated with each imprecise record.
Usage
nearestEnvPoints(
rasts,
pts = NULL,
polys = NULL,
centerFrom = "pts",
pca = TRUE,
numPcs = 3,
center = TRUE,
scale = TRUE,
rule = "nearest",
na.rm = TRUE,
out = "both"
)
Arguments
rasts |
A |
pts |
A set of spatial points of class |
polys |
A set of spatial polygons of class |
centerFrom |
Indicates how to locate the "reference" centroid used to identify single points on each polygon. This is only relevant if both pts and polys are specified.
|
pca |
If |
numPcs |
The number of PC axes used to find environmental centroids. This is only used if |
center , scale |
Settings for |
rule |
Determines how to identify the single environmental point to associate with each polygon. Options include:
|
na.rm |
If |
out |
Determines what is returned. Only used if both pts and polys are specified.
|
Details
This function locates a set of points from the environments covered by each polygon using the following procedure, the details of which depend on what arguments are specified:
Only
pts
is specified: Environments are taken directly from the locations ofpts
in environmental space.Only
polys
is specified: For each polygon, the environment retained is the one, among all environments associated with that polygon, that is closest to the environmental centroid of the environmental centroids of the polygons (that may be confusing, but it is not a typo).-
pts
andpolys
are specified: Environments are taken from the locations ofpts
plus the environment from each polygon closest to the environmental centroid ofpts
. By default, the function uses the environmental centroid of the precise occurrences in step (1), but this can be changed to the environmental centroid of the centroids of the polygons or the environmental centroid of the points defined by the union of precise occurrence points plus the environmental centroids of the polygons.
The function can alternatively return the points on the vertices of the MCP, or points on the input polygons closest to the reference centroid.
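The core of case (3) can be sketched with terra and base R; the sketch below skips the PCA step the function applies by default and handles a single polygon only:
library(terra)
rasts <- rast(system.file('extdata', 'madClim.tif', package='enmSdmX'))
data(mad1); data(lemurs)
poly <- vect(mad1[1, ]) # one "imprecise" polygon
pts <- vect(lemurs[lemurs$species == 'Eulemur rubriventer', ],
   geom = c('longitude', 'latitude'), crs = getCRS('WGS84'))
ptEnv <- extract(rasts, pts, ID = FALSE) # environments at precise points
polyEnv <- extract(rasts, poly, ID = FALSE) # all environments in the polygon
centroid <- colMeans(ptEnv, na.rm = TRUE) # environmental centroid of points
d <- sqrt(colSums((t(as.matrix(polyEnv)) - centroid)^2)) # distance in env space
polyEnv[which.min(d), ] # the single environment kept for this polygon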
Value
A data frame.
References
Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography 32:342-355. doi:10.1111/geb.13628 Open access pre-print: doi:10.1101/2021.06.10.447988
See Also
nearestGeogPoints
for the "nearest geographic point" method, a related approach for geographic space.
Examples
# This is a contrived example based on red-bellied lemurs in Madagascar.
# Point locations (which are real data) will be assumed to be "precise"
# records. We will designate a set of Faritra ("counties") to represent
# "imprecise" occurrences that can only be georeferenced to a geopolitical
# unit.
library(sf)
library(terra)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur point data
data(lemurs)
precise <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
ll <- c('longitude', 'latitude')
precise <- sf::st_as_sf(precise[ , ll], coords=ll, crs=wgs84)
# *fake* lemur administrative unit-level data
faritras <- c('Vakinankaratra', 'Haute matsiatra', 'Ihorombe',
'Vatovavy Fitovinany', 'Alaotra-Mangoro', 'Analanjirofo', 'Atsinanana',
'Analamanga', 'Itasy')
data(mad1)
imprecise <- mad1[mad1$NAME_2 %in% faritras, ]
# climate predictors
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
rasts <- rast(rastFile)
### Plot environment of points and environments of each polygon closest to
### centroid of environments of points. In this example, we use the first two
### principal component axes to characterize the niche.
# apply Nearest Environmental Point method
envPtsPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise,
pca = TRUE, numPcs = 2)
envPolys <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2,
out = 'polys')
envPts <- nearestEnvPoints(rasts, pts = precise, polys = imprecise, numPcs = 2,
out = 'pts')
allPolyEnvs <- extract(rasts, imprecise)
# plot occurrences in environmental space
plot(envPtsPolys$PC1, envPtsPolys$PC2, pch=16, col='black',
xlab='PC1', ylab='PC2')
points(envPolys$PC1, envPolys$PC2, pch=21, bg='orange')
legend(
'bottomleft',
inset = 0.01,
legend = c('precise', 'imprecise (closest)'),
pch = c(16, 21),
col = c('black', 'black'),
pt.bg = c('orange', 'orange')
)
### compare identified environments to all environments across all polygons
###########################################################################
env <- as.data.frame(rasts)
pca <- stats::prcomp(env, center=TRUE, scale.=TRUE)
allPolyEnvs <- extract(rasts, imprecise, ID = FALSE)
allPolyEnvsPcs <- predict(pca, allPolyEnvs)
allPolyEnvs <- cbind(allPolyEnvs, allPolyEnvsPcs)
# plot in environmental space
plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange',
xlab='PC1', ylab='PC2')
points(envPts$PC1, envPts$PC2, pch=16)
points(envPolys$PC1, envPolys$PC2, pch=1)
legend(
'bottomleft',
inset = 0.01,
legend = c('precise', 'imprecise (closest)', 'imprecise (all)'),
pch = c(16, 21, 16),
col = c('black', 'black', 'orange'),
pt.bg = c(NA, 'orange', NA)
)
### display niches (minimum convex hulls) estimated
### using just precise or precise + imprecise records
#####################################################
pcs <- c('PC1', 'PC2')
preciseIndices <- chull(envPts[ , pcs])
preciseImpreciseIndices <- chull(envPtsPolys[ , pcs])
preciseIndices <- c(preciseIndices, preciseIndices[1])
preciseImpreciseIndices <- c(preciseImpreciseIndices,
preciseImpreciseIndices[1])
preciseOnlyNiche <- envPts[preciseIndices, pcs]
preciseImpreciseNiche <- envPtsPolys[preciseImpreciseIndices, pcs]
# plot in environmental space
plot(allPolyEnvs$PC1, allPolyEnvs$PC2, pch=16, col='orange',
xlab='PC1', ylab='PC2')
points(envPts$PC1, envPts$PC2, pch=16)
points(envPolys$PC1, envPolys$PC2, pch=1)
lines(preciseImpreciseNiche, col='coral4', lwd=2)
lines(preciseOnlyNiche, lty='dotted')
legend(
'bottomleft',
inset = 0.01,
legend = c('precise', 'imprecise (closest)', 'imprecise (all)',
'MCP imprecise-only', 'MCP precise + imprecise'),
pch = c(16, 21, 16, NA, NA),
col = c('black', 'black', 'orange', 'black', 'coral4'),
pt.bg = c(NA, 'orange', NA, NA, NA),
lwd = c(NA, NA, NA, 1, 2),
lty = c(NA, NA, NA, 'dotted', 'solid')
)
Minimum convex polygon from a set of spatial polygons and/or points
Description
This function implements the "nearest geographic point" method (Smith et al. 2023) to enable the use of occurrence records geolocated only to a general place (e.g., a country or province), along with occurrences georeferenced with little error. The function returns a minimum convex polygon (MCP) constructed from a set of spatial polygons and/or points.
Usage
nearestGeogPoints(
pts = NULL,
polys = NULL,
centerFrom = "pts",
return = "mcp",
terra = TRUE
)
Arguments
pts |
Either |
polys |
Either |
centerFrom |
Indicates how to locate the "reference" centroid used to identify points on each polygon. This is only relevant if both pts and polys are specified.
|
return |
Determines what is returned:
|
terra |
If |
Details
This function constructs a minimum convex polygon (MCP) from a set of spatial points and/or spatial polygons. The manner in which this is done depends on whether polys
and/or pts
are specified:
Only
pts
is supplied: The MCP is constructed directly from the points.Only
polys
is supplied: The MCP is constructed from the point on each polygon closest to the centroid of the centroids of the polygons.Both
pts
andpolys
are supplied: The MCP is constructed from the combined set ofpts
and from the point on each polygon closest to the centroid ofpts
. By default, the function uses the centroid of the precise occurrences in step (1), but this can be changed to the centroid of the centroids of the polygons or the centroid of the points defined by the union of precise occurrence points plus the centroids of the polygons.
The function can alternatively return the points on the vertices of the MCP, or points on the input polygons closest to the reference centroid.
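The geographic logic of case (3) can be sketched with sf (a condensed mirror of the procedure, not the exact implementation):
library(sf)
data(mad1); data(lemurs)
prj <- st_crs(getCRS('Madagascar Albers'))
polys <- st_transform(mad1[1:3, ], prj) # three "imprecise" polygons
pts <- st_as_sf(lemurs[lemurs$species == 'Eulemur rubriventer', ],
   coords = c('longitude', 'latitude'), crs = getCRS('WGS84'))
pts <- st_transform(pts, prj)
ctr <- st_centroid(st_union(pts)) # reference centroid of precise points
segs <- st_nearest_points(ctr, polys) # centroid-to-polygon segments
near <- st_cast(segs, 'POINT')[seq(2, 6, by = 2)] # polygon-side endpoints
mcp <- st_convex_hull(st_union(c(st_geometry(pts), near))) # the MCP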
Value
SpatVector
, or sf POLYGON
representing a minimum convex polygon.
References
Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography 32:342-355. doi:10.1111/geb.13628 Open access pre-print: doi:10.1101/2021.06.10.447988.
See Also
nearestEnvPoints
for the "nearest environmental point" method, a related application for estimating niche breadth in environmental space.
Examples
library(sf)
library(terra)
#######################################################
### example using SpatVector inputs (terra package) ###
#######################################################
### prepare data
################
# Get coordinate reference systems:
# * WGS84
# * Madagascar Albers
wgs84 <- getCRS('WGS84')
madProj <- getCRS('Madagascar Albers')
# outline of Madagascar faritras
data(mad1)
mad1 <- vect(mad1)
mad1 <- project(mad1, madProj)
# lemur point data
data(lemurs)
redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
ll <- c('longitude', 'latitude')
redBelly <- vect(redBelly, geom=ll, crs=wgs84)
redBelly <- project(redBelly, madProj)
# *fake* lemur faritra-level data
faritras <- c('Toamasina', 'Atsimo-Atsinana',
'Amoron\'i mania', 'Sava', 'Itasy')
polys <- mad1[mad1$NAME_2 %in% faritras, ]
### apply Nearest Geographic Point method
#########################################
# get three kinds of minimum convex polygons (MCPs):
# MCP using just polygons
mcpPolys <- nearestGeogPoints(polys = polys)
# MCP using just points
mcpPts <- nearestGeogPoints(pts = redBelly)
# MCP using points & polys
mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys)
# compare extent of occurrence (EOO) in m2
expanse(mcpPolys)
expanse(mcpPts)
expanse(mcpPolysPoints)
### plot minimum convex polygons
################################
# MCP from precise occurrences only
plot(mad1, border='gray', main='MCP points only')
plot(polys, col='gray80', add=TRUE)
plot(mcpPts, col=scales::alpha('red', 0.4), add=TRUE)
plot(redBelly, pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('red', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
# MCP from imprecise occurrences only
plot(mad1, border='gray', main='MCP polys only')
plot(polys, col='gray80', add=TRUE)
plot(mcpPolys, col=scales::alpha('orange', 0.4), add=TRUE)
plot(redBelly, pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('orange', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
# MCP from precise and imprecise occurrences
plot(mad1, border='gray', main='MCP polys + points')
plot(polys, col='gray80', add=TRUE)
plot(mcpPolysPoints, col=scales::alpha('green', 0.4), add=TRUE)
plot(redBelly, pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('green', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
############################################
### example using sf inputs (sf package) ###
############################################
### prepare data
################
# Get coordinate reference systems:
# * WGS84
# * Madagascar Albers
madProj <- sf::st_crs(getCRS('Madagascar Albers'))
wgs84 <- getCRS('WGS84')
# outline of Madagascar faritras
data(mad1)
mad1 <- sf::st_transform(mad1, madProj)
# lemur point occurrence data
data(lemurs)
redBelly <- lemurs[lemurs$species == 'Eulemur rubriventer', ]
ll <- c('longitude', 'latitude')
redBelly <- sf::st_as_sf(redBelly[ , ll], crs=wgs84, coords=ll)
redBelly <- sf::st_transform(redBelly, madProj)
# *fake* faritra-level occurrences
faritras <- c('Toamasina', 'Atsimo-Atsinana',
'Amoron\'i mania', 'Sava', 'Itasy')
polys <- mad1[mad1$NAME_2 %in% faritras, ]
### apply Nearest Geographic Point method
#########################################
# get three kinds of minimum convex polygons (MCPs):
# MCP using just polygons
mcpPolys <- nearestGeogPoints(polys = polys, terra = FALSE)
# MCP using just points
mcpPts <- nearestGeogPoints(pts = redBelly, terra = FALSE)
# MCP using points & polys
mcpPolysPoints <- nearestGeogPoints(pts = redBelly, polys = polys,
terra = FALSE)
# extent of occurrence (EOO) in m2
sf::st_area(mcpPolys)
sf::st_area(mcpPts)
sf::st_area(mcpPolysPoints)
### plot minimum convex polygons
################################
# MCP from precise occurrences only
plot(st_geometry(mad1), border='gray', main='MCP points only')
plot(st_geometry(polys), col='gray80', add=TRUE)
plot(st_geometry(mcpPts), col=scales::alpha('red', 0.4), add=TRUE)
plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('red', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
# MCP from imprecise occurrences only
plot(st_geometry(mad1), border='gray', main='MCP polys only')
plot(st_geometry(polys), col='gray80', add=TRUE)
plot(st_geometry(mcpPolys), col=scales::alpha('orange', 0.4), add=TRUE)
plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('orange', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
# MCP from precise and imprecise occurrences
plot(st_geometry(mad1), border='gray', main='MCP polys + points')
plot(st_geometry(polys), col='gray80', add=TRUE)
plot(st_geometry(mcpPolysPoints), col=scales::alpha('green', 0.4), add=TRUE)
plot(st_geometry(redBelly), pch=21, bg='red', add=TRUE)
legend('topleft',
legend=c('Precise occurrence', 'Imprecise occurrence', 'MCP'),
fill=c(NA, 'gray', scales::alpha('green', 0.4)),
pch=c(21, NA, NA),
pt.bg=c('red', NA, NA),
border=c(NA, 'black', 'black'))
### NOTE
# Using SpatVector input (terra package) yields EOOs that are slightly
# larger than using Spatial* (sp) or sf (sf) objects (by about 0.03-0.07%
# in this example). The difference arises because terra::expanse() yields a
# different value than sf::st_area.
Metrics of niche overlap
Description
This function calculates several metrics of niche overlap based on predictions for two species (or for the same species but different models) at the same sites.
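For example, Schoener's D (method 'd') compares the two prediction vectors after each is normalized to sum to 1; an unweighted sketch in base R:
x1 <- seq(0, 1, length.out = 100)
x2 <- x1^2
p1 <- x1 / sum(x1) # normalize each set of predictions to sum to 1
p2 <- x2 / sum(x2)
1 - 0.5 * sum(abs(p1 - p2)) # Schoener's D (Warren et al. 2008)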
Usage
nicheOverlapMetrics(
x1,
x2,
method = c("meanDiff", "meanAbsDiff", "rmsd", "d", "i", "esp", "cor", "rankCor"),
w = rep(1, length(x1)),
na.rm = FALSE,
...
)
Arguments
x1 |
Numeric. Vector of predictions from a model. |
x2 |
Numeric. Vector of predictions from another model. |
method |
Character vector, indicates type of metric to calculate:
|
w |
Numeric vector. Weights of predictions in |
na.rm |
Logical. If TRUE, remove NA values before calculating the metrics. |
... |
Other arguments (not used). |
Value
List object with one element per value specified by the argument in method
.
References
Warren, D.L., Glor, R.E., and Turelli, M. 2008. Environmental niche equivalency versus conservatism: Quantitative approaches to niche evolution. Evolution 62:2868-2883. doi:10.1111/j.1558-5646.2008.00482.x
Warren, D.L., Glor, R.E., and Turelli, M. 2010. Erratum. Evolution. doi:10.1111/j.1558-5646.2010.01204.x
Godsoe, W. 2014. Inferring the similarity of species distributions using Species' Distribution Models. Ecography 37:130-136. doi:10.1111/j.1600-0587.2013.00403.x
See Also
Examples
x1 <- seq(0, 1, length.out=100)
x2 <- x1^2
nicheOverlapMetrics(x1, x2)
Create spatial polygon same size as a plot
Description
This function creates a "rectangular" SpatVector
object with the same dimensions as a plot window. It is especially useful for cropping subsequent rasters or vector objects to the plot window. A plot must be made before calling this function.
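Conceptually, the function reads the limits of the active plot region and turns them into a rectangle; a rough terra-based equivalent is sketched below (plotExtent() also carries over the CRS of x when one is supplied):
library(terra)
plot(1:10, 1:10) # any open plot
usr <- par('usr') # xmin, xmax, ymin, ymax of the plot window
win <- as.polygons(ext(usr[1], usr[2], usr[3], usr[4])) # SpatVector rectangle
win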
Usage
plotExtent(x = NULL)
Arguments
x |
Either |
Value
SpatVector
See Also
Examples
if (FALSE) {
library(sf)
data(mad0)
plot(st_geometry(mad0))
outline <- plotExtent(mad0)
plot(outline, col='cornflowerblue', lty='dotted')
plot(st_geometry(mad0), add=TRUE)
}
Generic predict function for SDMs/ENMs
Description
This is a generic predict function that automatically applies the appropriate predict method, with common arguments, to models of the following types: linear models, generalized linear models (GLMs), generalized additive models (GAMs), random forests, boosted regression trees (BRTs)/gradient boosting machines (GBMs), conditional random forests, MaxEnt, and more.
Usage
predictEnmSdm(
model,
newdata,
maxentFun = "terra",
scale = TRUE,
cores = 1,
nrows = nrow(newdata),
paths = .libPaths(),
...
)
Arguments
model |
Object of class |
newdata |
Data frame or matrix, or |
maxentFun |
This argument is only used if the |
scale |
Logical. If the model is a GLM trained with |
cores |
Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. This is forced to 1 if |
nrows |
Number of rows of |
paths |
Locations where packages are stored. This is typically not useful to the general user, and is only supplied for when the function is called as a functional. |
... |
Arguments to pass to the algorithm-specific predict function. |
Value
Numeric or SpatRaster
.
See Also
predict
from the stats package, predict
from the terra package, predictMaxEnt
, predictMaxNet
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
lty = c(1, 1:6),
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Predict a MaxEnt model object (with optional feature-level permutation)
Description
Takes a MaxEnt lambda
object or a MaxEnt object and returns raw or logistic predictions. Its output matches that of the predict function from the terra package, though this function is slower. However, it allows custom manipulations that the terra function does not (e.g., permuting product features while leaving other features of the same variables intact). This function does not clamp predictions: beyond the range of the training data, it extends the prediction in the direction it was going (up/down/no change). The function is based on Peter D. Wilson's document "Guidelines for computing MaxEnt model output values from a lambdas file". A special feature is that it allows you to permute single variables or combinations of variables in specific features before making predictions. This is potentially useful, for example, for determining the importance of a quadratic feature for a particular variable relative to the other features in the model. You can also permute values of a variable regardless of which features they appear in. For product features, the permutation can be applied before or after the values are multiplied together (permuting before often makes for bigger differences in predictions).
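The permutation idea itself is model-agnostic: shuffle one predictor in the prediction data and compare predictions made with and without shuffling. A minimal sketch of that idea, using a GLM as a stand-in for a MaxEnt model (hypothetical data):
set.seed(1)
dat <- data.frame(y = rbinom(200, 1, 0.5), x1 = rnorm(200), x2 = rnorm(200))
fit <- glm(y ~ x1 + x2, data = dat, family = binomial)
before <- predict(fit, dat, type = 'response')
permuted <- dat
permuted$x1 <- sample(permuted$x1) # permute one variable
after <- predict(fit, permuted, type = 'response')
cor(before, after) # a large drop in agreement implies x1 matters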
Usage
predictMaxEnt(
x,
data,
type = "cloglog",
perm = NULL,
permLinear = NULL,
permQuad = NULL,
permHinge = NULL,
permThresh = NULL,
permProd = NULL,
permProdRule = NULL,
...
)
Arguments
x |
Either a MaxEnt lambda object or a MaxEnt model object |
data |
Data frame. Data to which to make predictions |
type |
Character. One of:
|
perm |
Character vector. Name(s) of variables to permute before calculating predictions. This permutes the variables for all features in which they occur. If a variable is named here, it overrides the permutation settings for each feature type. Note that for product features the variable is permuted before the product is taken. This permutation is performed before any subsequent permutations (i.e., if both variables in a product feature are included in perm, each is permuted before the product is calculated). |
permLinear |
Character vector. Names(s) of variables to permute in linear features before calculating predictions. Ignored if |
permQuad |
Names(s) of variables to permute in quadratic features before calculating predictions. Ignored if |
permHinge |
Character vector. Names(s) of variables to permute in forward/reverse hinge features before calculating predictions. Ignored if |
permThresh |
Character vector. Names(s) of variables to permute in threshold features before calculating predictions. Ignored if |
permProd |
Character list. A list object of |
permProdRule |
Character. Rule for how permutation of product features is applied: |
... |
Extra arguments (not used). |
Value
Numeric.
See Also
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
lty = c(1, 1:6),
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Predictions from a MaxNet model
Description
This function is the same as the predict
function in the maxnet package, except that:
If the input is a data frame, the output is a vector (not a single-column matrix);
If the input is a
SpatRaster
, the output is aSpatRaster
;The default output is on the cloglog scale;
The function can be explicitly called (versus doing, say,
maxnet:::predict.maxnet
, which does not work even when that would be really useful...).
Usage
predictMaxNet(model, newdata, clamp = TRUE, type = "cloglog", ...)
Arguments
model |
Object of class maxnet. |
newdata |
Object of class |
clamp |
If TRUE (default), predictions at environmental values beyond the range of the training data are clamped to the predictions at the limits of the training data. |
type |
One of:
|
... |
Other arguments (unused). |
Value
Numeric vector or SpatRaster
See Also
predict
from the terra package, and maxnet
(see the predict
function therein)
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
    lty = c(1, 1, 3, 2, 4, 5, 6), # match the line types used in lines() above
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Plot response curves for one or more models
Description
This function creates plots of response curves for one or more models. Response curves show how model predictions change as a particular predictor is varied while all other predictors are held constant. Response curves can be plotted for a single model or for one or more predictors across multiple models.
Usage
responseCurves(
models,
env,
ref = NULL,
vars = NULL,
modelNames = NULL,
constantFx = mean,
combine = TRUE,
ncol = NULL,
n = 200
)
Arguments
models: Either a single model object or a list of model objects (as in the examples below).
env: A data frame of environmental data defining the range over which each predictor is varied (e.g., background-site values).
ref: Either NULL (default), or a data frame of environmental values at reference sites (e.g., occurrences), as in the examples below.
vars: Either NULL (default, in which case all predictors are used), or a character vector with the names of the predictors for which to plot response curves.
modelNames: Either NULL (default), or a character vector with one name per model, used to label the plots.
constantFx: Function used to calculate the constant value for each predictor. The default is mean.
combine: Logical. If TRUE (default), combine the plots into a single graphic; if FALSE, return a list of individual plots.
ncol: Number of columns to use when combining plots. If NULL (default), the number of columns is chosen automatically.
n: Number of points to use for each variable in the response curve. Default is 200.
Value
Either a ggplot grob object or a list of ggplot grob objects.
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
    lty = c(1, 1, 3, 2, 4, 5, 6), # match the line types used in lines() above
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
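# A hedged variant: per the combine argument above, combine = FALSE is
# documented to return a list of individual plots rather than one combined
# graphic (assumes the mx and mn models fitted in this example).
respList <- responseCurves(
    models = list(mx, mn),
    env = bgEnv,
    ref = occEnv,
    vars = predictors,
    modelNames = c('MaxEnt', 'MaxNet'),
    combine = FALSE
)
print(respList[[1]])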
Sample random points from a raster with/out replacement
Description
This function returns coordinates randomly located on a raster where cells can be sampled with replacement (if desired) and where the probability of selection is proportionate to the cell value, cell area, or the product of cell value times cell area.
Usage
sampleRast(x, n, adjArea = TRUE, replace = TRUE, prob = TRUE)
Arguments
x: SpatRaster object.
n: Positive integer. Number of points to draw.
adjArea: If TRUE (default), adjust sampling probabilities so that cells with larger areas are more likely to be chosen.
replace: If TRUE (default), sample cells with replacement.
prob: If TRUE (default), sample cells with probability proportional to the cell value.
Value
2-column matrix with longitude and latitude of random points.
Examples
library(terra)
r <- rast()
nc <- ncell(r)
r[] <- 1:nc
rands1 <- sampleRast(r, 10000)
rands2 <- sampleRast(r, 10000, adjArea=FALSE)
rands3 <- sampleRast(r, 10000, prob=FALSE)
rands4 <- sampleRast(r, 10000, adjArea=FALSE, prob=FALSE)
oldPar <- par(mfrow=c(2, 2))
plot(r, main='adjArea = TRUE & prob = TRUE')
points(rands1, pch='.')
plot(r, main='adjArea = FALSE & prob = TRUE')
points(rands2, pch='.')
plot(r, main='adjArea = TRUE & prob = FALSE')
points(rands3, pch='.')
plot(r, main='adjArea = FALSE & prob = FALSE')
points(rands4, pch='.')
par(oldPar)
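# The "without replacement" case from this page's title: each cell can be
# drawn at most once, so n must not exceed the number of non-NA cells.
rands5 <- sampleRast(r, 1000, replace=FALSE)
nrow(unique(rands5)) == 1000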
Convert SpatVector to Spatial*
Description
This function converts a SpatVector object from the terra package to a Spatial object of the appropriate class (SpatialPoints, SpatialPointsDataFrame, SpatialPolygons, or SpatialPolygonsDataFrame) from the sp package. Note that sp was slated for retirement in 2023, so this function is useful only for legacy applications.
Usage
spatVectorToSpatial(x)
Arguments
x: SpatVector object.
Value
Object of class Spatial.
Examples
library(terra)
f <- system.file('ex/lux.shp', package='terra')
v <- vect(f)
spat <- spatVectorToSpatial(v)
class(spat)
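# A hedged round-trip sketch: the legacy Spatial object can be converted
# back to a SpatVector via sf (assumes the sf package is installed).
backToTerra <- vect(sf::st_as_sf(spat))
class(backToTerra)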
Create a raster with square cells
Description
This function creates a raster with square cells from an object with a spatial extent (i.e., another raster or similar spatial object). The user can specify either the cell resolution (linear dimension) or the approximate number of cells desired.
Usage
squareCellRast(x, numCells = NULL, res = NULL, vals = NULL)
Arguments
x: An object with a spatial extent property (e.g., a SpatRaster or SpatVector).
numCells: Positive integer, approximate number of cells desired. If this is specified, then res is ignored.
res: Positive numeric. Size of a cell in the units of the projection of x (e.g., meters for an equal-area projection). Ignored if numCells is specified.
vals: Numeric, value to assign to cells. Note that if this is shorter than the number of cells in the output, then values will be recycled. If longer, then values will be truncated. The default is to assign all 0s.
Value
SpatRaster object. The raster will have an extent of the same size or larger than the extent of x.
Examples
library(sf)
library(terra)
# project outline of Madagascar to equal-area:
data(mad0)
mad0Ea <- st_transform(mad0, getCRS('madAlbers'))
n <- 101
cellSize_meters <- 10E4
byNumCells <- squareCellRast(mad0Ea, numCells=n)
byCellSize <- squareCellRast(mad0Ea, res=cellSize_meters)
oldPar <- par(mfrow=c(1, 2))
main1 <- paste0('Cells: ', n, ' desired, ', ncell(byNumCells), ' actual')
plot(byNumCells, main = main1)
plot(mad0Ea, add = TRUE)
main2 <- paste0('Cells ', cellSize_meters, ' m on a side')
plot(byCellSize, main = main2)
plot(mad0Ea, add = TRUE)
par(oldPar)
# Note that in this example they look the same, but the one on the left
# has one fewer row than the one on the right.
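# The `vals` argument fills cells at creation; a vector shorter than the
# number of cells is recycled, as with vals = 0:1 here.
checker <- squareCellRast(mad0Ea, numCells=n, vals=0:1)
plot(checker, main='Recycled values 0 and 1')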
Summarize distribution/niche model cross-validation object
Description
This function summarizes models calibrated using the trainByCrossValid function. It returns aspects of the best models across k-folds (the particular aspects depend on the kind of model used).
Usage
summaryByCrossValid(
x,
metric = "cbiTest",
decreasing = TRUE,
interceptOnly = TRUE
)
Arguments
x: The output from the trainByCrossValid function.
metric: Metric by which to select the best model in each k-fold. This can be any of the columns that appear in the data frames in the tuning element of x. The default is 'cbiTest' (the Continuous Boyce Index calculated against the test data).
decreasing: Logical, if TRUE (default), the best model in each fold is the one with the highest value of metric; if FALSE, the lowest.
interceptOnly: Logical. If TRUE (default), intercept-only models are considered when summarizing; if FALSE, they are ignored.
Value
Data frame with statistics on the best set of models across k-folds. In each case, the "best" model for each fold is used. Depending on the model algorithm, this output is a data frame with these fields:
- BRTs (boosted regression trees): Learning rate, tree complexity, and bag fraction.
- GLMs (generalized linear models): Frequency of use of each term in the best models.
- MaxEnt and MaxNet: Frequency with which each combination of feature classes was used, and the mean master regularization multiplier.
- NSs (natural splines): One row per fold and one column per predictor, with values representing the maximum degrees of freedom used for each variable in the best model of each fold.
- RFs (random forests): One row per fold, with values representing the optimal values of numTrees and mtry (see ranger).
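For example, assuming a cross-validated object like the mxx object created in the examples below, the best models can be summarized by a different column of the tuning tables. The column name 'aucTest' here is an assumption following the 'Test' suffix of the default 'cbiTest'; verify the actual column names first:
# hypothetical column name; verify with colnames(mxx$tuning[[1]])
summaryByCrossValid(mxx, metric = 'aucTest', decreasing = TRUE)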
See Also
trainByCrossValid, trainBRT, trainGAM, trainGLM, trainMaxEnt, trainNS, trainRF
Examples
# The example below shows a very basic modeling workflow. It has been
# designed to work fast, not produce accurate, defensible models.
# The general idea is to calibrate a series of models and evaluate them
# against a withheld set of data. One can then use the settings of the
# top models to better select a "final" model.
## Not run:
# Running the entire set of commands can take a few minutes. This can
# be sped up by increasing the number of cores used. The examples below use
# one core, but you can change that argument according to your machine's
# capabilities.
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create background sites (using just 1000 to speed things up!)
bgEnv <- terra::spatSample(madClim, 3000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
# using "vector" form of "folds" argument
folds <- predicts::kfold(env, 3) # just 3 folds (for speed)
### calibrate models
####################
cores <- 1 # increase this to go faster, if your computer handles it
## MaxEnt
mxx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxEnt,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxEnt feature sets and regularization across folds
summaryByCrossValid(mxx)
## MaxNet
mnx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxNet,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxNet feature sets and regularization across folds
summaryByCrossValid(mnx)
## generalized linear models
glx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGLM,
verbose = 1,
cores = cores
)
# summarize GLM terms in best models
summaryByCrossValid(glx)
## generalized additive models
gax <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGAM,
verbose = 1,
cores = cores
)
# summarize GAM terms in best models
summaryByCrossValid(gax)
## natural splines
nsx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainNS,
df = 1:2,
verbose = 1,
cores = cores
)
# summarize NS terms in best models
summaryByCrossValid(nsx)
## boosted regression trees
brtx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainBRT,
learningRate = c(0.001, 0.0001), # too few values for reliable model(?)
treeComplexity = c(2, 4), # too few values for reliable model, but fast
minTrees = 1000,
maxTrees = 1500, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = 1,
cores = cores
)
# summarize BRT parameters across best models
summaryByCrossValid(brtx)
## random forests
rfx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainRF,
verbose = 1,
cores = cores
)
# summarize RF parameters in best models
summaryByCrossValid(rfx)
## End(Not run)
Calibrate a boosted regression tree (generalized boosting machine) model
Description
This function calibrates a boosted regression tree (or gradient boosting machine) model, and is a wrapper for gbm. The function uses a grid search to assess the best combination of learning rate, tree depth, and bag fraction based on cross-validated deviance. If a particular combination of parameters leads to an unconverged model, the function tries again using slightly different parameters. Its output is any or all of: a table with the deviance of each evaluated model; all evaluated models; and/or the single model with the lowest deviance.
Usage
trainBRT(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
learningRate = c(1e-04, 0.001, 0.01),
treeComplexity = c(5, 3, 1),
bagFraction = 0.6,
minTrees = 1000,
maxTrees = 8000,
tries = 5,
tryBy = c("learningRate", "treeComplexity", "maxTrees", "stepSize"),
w = TRUE,
anyway = FALSE,
family = "bernoulli",
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data or an integer indicating the column with the response variable. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
learningRate: Numeric. Learning rate at which model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1).
treeComplexity: Positive integer. Tree complexity: depth of branches in a single tree (1 to 16).
bagFraction: Numeric in the range [0, 1]. Bag fraction: proportion of data used for training in cross-validation (Elith et al. 2008 recommend 0.5 to 0.7).
minTrees: Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000.
maxTrees: Positive integer. Maximum number of trees in model set.
tries: Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function will stop training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step, so if non-convergence occurs the function automatically adjusts the parameters named in tryBy until the model converges or tries is reached.
tryBy: Character vector containing one or more of 'learningRate', 'treeComplexity', 'maxTrees', and/or 'stepSize', indicating which parameters to vary when attempting to train a non-converging model.
w: Weights. Any of: TRUE (default; weight presences and contrast sites so they contribute equally), FALSE (no weighting), or a numeric vector with one weight per row in data.
anyway: Logical. If TRUE, return a model even if it did not converge. Default is FALSE.
family: Character. Name of error family.
out: Character vector. One or more values: 'model' (default; the single best model), 'models' (all models evaluated), and/or 'tuning' (a data frame with tuning statistics for each model).
cores: Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when cores > 1, please see the troubleshooting_parallel_operations guide.
verbose: Logical. If TRUE, display progress.
...: Additional arguments (not used).
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these.
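For instance, a minimal sketch (assuming the env data frame built in the examples below) that requests both the best model and the tuning table in a single call; the element names of the returned list are assumed to follow the out values:
brtBoth <- trainBRT(
    data = env,
    resp = 'presBg',
    preds = c('bio1', 'bio12'),
    out = c('model', 'tuning')
)
str(brtBoth, 1) # expect a list with $tuning and $model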
References
Elith, J., J.R. Leathwick, & T. Hastie. 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77:802-813. doi:10.1111/j.1365-2656.2008.01390.x
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
    lty = c(1, 1, 3, 2, 4, 5, 6), # match the line types used in lines() above
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Calibrate a distribution/niche model using cross-validation
Description
This function is an extension of any of the trainXYZ functions for calibrating species distribution and ecological niche models. It uses a trainXYZ function to calibrate and evaluate a suite of models via cross-validation. The models are evaluated against withheld data to determine the optimal settings for a "final" model using all available data. The function returns a set of models and/or a table with statistics on each model. The statistics represent various measures of model accuracy, and are calculated against training and test sites (separately).
Usage
trainByCrossValid(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
folds = predicts::folds(data),
trainFx = enmSdmX::trainGLM,
...,
weightEvalTrain = TRUE,
weightEvalTest = TRUE,
na.rm = FALSE,
outputModels = TRUE,
verbose = 0
)
Arguments
data |
Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites. |
resp |
Character or integer. Name or column index of response variable. Default is to use the first column in |
preds |
Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in |
folds |
Either a numeric vector, or matrix or data frame which specify which rows in
|
trainFx |
Function, name of the |
... |
Arguments to pass to the "trainXYZ" function. |
weightEvalTrain |
Logical, if |
weightEvalTest |
Logical, if |
na.rm |
Logical, if |
outputModels |
If |
verbose |
Numeric. If 0 show no progress updates. If > 0 then show minimal progress updates for this function only. If > 1 show detailed progress for this function. If > 2 show detailed progress plus detailed progress for the |
Details
In some cases models do not converge (e.g., boosted regression trees and generalized additive models sometimes suffer from this issue). In this case the model will be skipped, but a data frame with the k-fold and model number in the fold will be returned in the $meta element in the output. If all models converged, this data frame will be empty.
Value
A list object with several named elements:
- meta: Meta-data on the model call.
- folds: The folds object.
- models (if outputModels is TRUE): A list of model objects, one per data fold.
- tuning: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of:
  - logLoss: Log loss. Higher (less negative) values imply better fit.
  - cbi: Continuous Boyce Index (CBI). Calculated with evalContBoyce.
  - auc: Area under the receiver-operator characteristic curve (AUC). Calculated with evalAUC.
  - tss: Maximum value of the True Skill Statistic. Calculated with evalTSS.
  - msss: Sensitivity and specificity calculated at the threshold that maximizes sensitivity (true presence prediction rate) plus specificity (true absence prediction rate).
  - mdss: Sensitivity (se) and specificity (sp) calculated at the threshold that minimizes the difference between sensitivity and specificity.
  - minTrainPres: Sensitivity (se) and specificity (sp) at the greatest threshold at which all training presences are classified as "present".
  - trainSe95 and/or trainSe90: Sensitivity (se) and specificity (sp) at the threshold that ensures either 95 or 90 percent of all training presences are classified as "present" (training sensitivity = 0.95 or 0.9).
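For example, a short sketch (assuming the mxx object created from trainMaxEnt in the examples below) that pulls the evaluation table for the first fold and ranks its candidate models by CBI calculated against the test data (the cbiTest column, as used by summaryByCrossValid):
tune1 <- mxx$tuning[[1]]
tune1 <- tune1[order(tune1$cbiTest, decreasing = TRUE), ]
head(tune1)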
References
Fielding, A.H. and J.F. Bell. 1997. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environmental Conservation 24:38-49. doi:10.1017/S0376892997000088
Le Rest, K., Pinaud, D., Monestiez, P., Chadoeuf, J., and Bretagnolle, V. 2014. Spatial leave-one-out cross-validation for variable selection in the presence of spatial autocorrelation. Global Ecology and Biogeography 23:811-820. doi:10.1111/geb.12161
Radosavljevic, A. and Anderson, R.P. 2014. Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography 41:629-643. doi:10.1111/jbi.12227
See Also
summaryByCrossValid, trainBRT, trainGAM, trainGLM, trainMaxEnt, trainMaxNet, trainNS, trainRF
Examples
# The example below shows a very basic modeling workflow. It has been
# designed to work fast, not produce accurate, defensible models.
# The general idea is to calibrate a series of models and evaluate them
# against a withheld set of data. One can then use the settings of the
# top models to better select a "final" model.
## Not run:
# Running the entire set of commands can take a few minutes. This can
# be sped up by increasing the number of cores used. The examples below use
# one core, but you can change that argument according to your machine's
# capabilities.
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create background sites (using just 1000 to speed things up!)
bgEnv <- terra::spatSample(madClim, 3000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[sample(nrow(bgEnv), 1000), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
# using "vector" form of "folds" argument
folds <- predicts::kfold(env, 3) # just 3 folds (for speed)
### calibrate models
####################
cores <- 1 # increase this to go faster, if your computer handles it
## MaxEnt
mxx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxEnt,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxEnt feature sets and regularization across folds
summaryByCrossValid(mxx)
## MaxNet
mnx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainMaxNet,
regMult = 1:2, # too few values for valid model, but fast!
verbose = 1,
cores = cores
)
# summarize MaxNet feature sets and regularization across folds
summaryByCrossValid(mnx)
## generalized linear models
glx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGLM,
verbose = 1,
cores = cores
)
# summarize GLM terms in best models
summaryByCrossValid(glx)
## generalized additive models
gax <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainGAM,
verbose = 1,
cores = cores
)
# summarize GAM terms in best models
summaryByCrossValid(gax)
## natural splines
nsx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainNS,
df = 1:2,
verbose = 1,
cores = cores
)
# summarize NS terms in best models
summaryByCrossValid(nsx)
## boosted regression trees
brtx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainBRT,
learningRate = c(0.001, 0.0001), # too few values for reliable model(?)
treeComplexity = c(2, 4), # too few values for reliable model, but fast
minTrees = 1000,
maxTrees = 1500, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = 1,
cores = cores
)
# summarize BRT parameters across best models
summaryByCrossValid(brtx)
## random forests
rfx <- trainByCrossValid(
data = env,
resp = 'presBg',
preds = c('bio1', 'bio12'),
folds = folds,
trainFx = trainRF,
verbose = 1,
cores = cores
)
# summarize RF parameters in best models
summaryByCrossValid(rfx)
## End(Not run)
Calibrate an ensemble of small models
Description
This function calibrates a set of "ensembles of small models" (ESM), which are designed for modeling species with few occurrence records. In the original formulation, each model has two covariates combined additively. Models are calibrated using all possible combinations of covariates. By default, this function does the same, but it can also include univariate models, models with two covariates plus their interaction term, and models with quadratic and corresponding linear terms. This function will only train generalized linear models. Extending the types of algorithms is planned!
Usage
trainESM(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
univariate = FALSE,
quadratic = FALSE,
interaction = FALSE,
interceptOnly = FALSE,
method = "glm.fit",
scale = NA,
w = TRUE,
family = stats::binomial(),
...,
verbose = FALSE
)
Arguments
data: Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites.
resp: Character or integer. Name or column index of response variable. Default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.
univariate, quadratic, interaction: Logical. If TRUE, also include univariate models, models with quadratic (and corresponding linear) terms, and/or models with two covariates plus their interaction term. All default to FALSE.
interceptOnly: If TRUE, also include an intercept-only model. Default is FALSE.
method: Character: Name of function used to solve the GLM. The default is 'glm.fit'.
scale: Either NA (default), TRUE, or FALSE. If TRUE, predictors are centered and scaled by their means and standard deviations, and these values are returned in an element named $scale.
w: Weights. Any of: TRUE (default; weight presences and contrast sites so they contribute equally), FALSE (no weighting), or a numeric vector with one weight per row in data.
family: Character or function. Name of family for data error structure (see family). Default is stats::binomial().
...: Arguments to pass to glm.
verbose: Logical. If TRUE, display progress. Default is FALSE.
Value
A list object with several named elements:
- models: A list with each ESM model.
- tuning: A data.frame with one row per model, in the same order as the models appear in $models.
References
Breiner, F.T., Guisan, A., Bergamini, A., and Nobis, M.P. 2015. Overcoming limitations of modeling rare species by using ensembles of small models. Methods in Ecology and Evolution 6:1210-1218. doi:10.1111/2041-210X.12403
Lomba, A., L. Pellissier, C. Randin, J. Vicente, J. Honrado, and A. Guisan. 2010. Overcoming the rare species modeling complex: A novel hierarchical framework applied to an Iberian endemic plant. Biological Conservation 143:2647-2657. doi:10.1016/j.biocon.2010.07.007
See Also
trainBRT, trainGAM, trainGLM, trainMaxEnt, trainMaxNet, trainNS, trainRF, trainByCrossValid
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# "traditional" ESMs with just 2 linear predictors
# just one model in this case because we have just 2 predictors
esm1 <- trainESM(
data = env,
resp = 'presBg',
preds = predictors,
family = stats::binomial(),
scale = TRUE,
w = TRUE
)
str(esm1, 1)
esm1$tuning
# extended ESM with other kinds of terms
esm2 <- trainESM(
data = env,
resp = 'presBg',
preds = predictors,
univariate = TRUE,
quadratic = TRUE,
interaction = TRUE,
interceptOnly = TRUE,
family = stats::binomial(),
scale = TRUE,
w = TRUE,
verbose = TRUE
)
str(esm2, 1)
esm2$tuning
### make a set of predictions to rasters
########################################
# center environmental rasters and divide by their SD
madClimScaled <- scale(madClim, center = esm2$scale$mean, scale = esm2$scale$sd)
# make one raster per model
predictions <- list()
for (i in 1:length(esm2$models)) {
predictions[[i]] <- predict(madClimScaled, esm2$models[[i]], type = 'response')
}
# combine into a "stack"
predictions <- do.call(c, predictions)
names(predictions) <- esm2$tuning$model
plot(predictions)
# calculate (unweighted) mean
prediction <- mean(predictions)
plot(prediction)
plot(occs, pch = 1, add = TRUE)
Calibrate a generalized additive model (GAM)
Description
This function constructs a generalized additive model. By default, the model is constructed in a two-stage process. First, the "construct" phase generates a series of simple models with univariate and bivariate interaction terms. These simple models are then ranked based on their AICc. Second, the "select" phase creates a "full" model from the simple models such that there are at least presPerTermInitial presences (if the response is binary) or data rows (if not) for each smooth term to be estimated (not counting the intercept). Finally, it selects the best model using AICc from all possible subsets of this "full" model. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
Usage
trainGAM(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
gamma = 1,
scale = 0,
smoothingBasis = "cs",
interaction = "te",
interceptOnly = TRUE,
construct = TRUE,
select = TRUE,
presPerTermInitial = 10,
presPerTermFinal = 10,
maxTerms = 8,
w = TRUE,
family = "binomial",
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data or an integer indicating the column with the response variable. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
gamma: Initial penalty to degrees of freedom to use (larger ==> smoother fits).
scale: A numeric value indicating the "scale" parameter (see argument scale in mgcv::gam). The default is 0.
smoothingBasis: Character. Indicates the type of smoothing basis. The default is 'cs' (cubic regression splines with shrinkage). See mgcv::smooth.terms for other options.
interaction: Character or NULL. Type of term used to model interactions between pairs of predictors. The default is 'te' (tensor product). Use NULL to omit interaction terms.
interceptOnly: If TRUE (default), include an intercept-only model in the candidate set.
construct: If TRUE (default), assess candidate terms in a "construct" phase before building the "full" model.
select: If TRUE (default), select the best model by AICc from all possible subsets of the "full" model.
presPerTermInitial: Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.
presPerTermFinal: Positive integer. Minimum number of presence sites per term in the initial starting model; used only if select is TRUE.
maxTerms: Maximum number of terms to be used in any model, not including the intercept (default is 8). Used only if construct is TRUE.
w: Weights. Any of: TRUE (default; weight presences and contrast sites so they contribute equally), FALSE (no weighting), or a numeric vector with one weight per row in data.
family: Name of family for data error structure (see family). The default is 'binomial'.
out: Character vector. One or more values: 'model' (default; the single best model), 'models' (all models evaluated), and/or 'tuning' (a data frame with AICc for each model).
cores: Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when cores > 1, please see the troubleshooting_parallel_operations guide.
verbose: Logical. If TRUE, display progress.
...: Extra arguments (not used).
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these.
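For instance, a hedged sketch (assuming the env data frame from the examples below) that skips the construct and select phases and fits a single full GAM with the default smoothers:
gaFull <- trainGAM(
    data = env,
    resp = 'presBg',
    preds = c('bio1', 'bio12'),
    construct = FALSE,
    select = FALSE
)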
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
    lty = c(1, 1, 3, 2, 4, 5, 6), # match the line types used in lines() above
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Calibrate a generalized linear model (GLM)
Description
This function constructs a generalized linear model. By default, the model is constructed in a two-stage process. First, the "construct" phase generates a series of simple models with univariate, quadratic, or 2-way-interaction terms. These simple models are then ranked based on their AICc. Second, the "select" phase creates a "full" model from the simple models such that there are at least presPerTermInitial presences (if the response is binary) or data rows (if not) for each coefficient to be estimated (not counting the intercept). Finally, it selects the best model using AICc from all possible subsets of this "full" model, while respecting marginality (i.e., all lower-order terms of higher-order terms appear in the model).
The function outputs any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc.
Usage
trainGLM(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
scale = NA,
construct = TRUE,
select = TRUE,
quadratic = TRUE,
interaction = TRUE,
interceptOnly = TRUE,
method = "glm.fit",
presPerTermInitial = 10,
presPerTermFinal = 10,
maxTerms = 8,
w = TRUE,
family = stats::binomial(),
removeInvalid = TRUE,
failIfNoValid = TRUE,
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data or an integer indicating the column with the response variable. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
scale: Either NA (default), TRUE, or FALSE. If TRUE, predictors are centered and scaled by their means and standard deviations; these values are returned in an element named $scale of the model object.
construct: Logical. If TRUE (default), generate and rank candidate model terms in a "construct" phase before selection.
select: Logical. If TRUE (default), select the best model by AICc from all possible subsets of the "full" model.
quadratic: Logical. Used only if construct is TRUE. If TRUE (default), include quadratic terms among the candidate terms.
interaction: Logical. Used only if construct is TRUE. If TRUE (default), include 2-way interaction terms among the candidate terms.
interceptOnly: If TRUE (default), include an intercept-only model in the candidate set.
method: Character: Name of function used to solve the GLM. For "normal" GLMs, this can be 'glm.fit' (default).
presPerTermInitial: Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only if construct is TRUE.
presPerTermFinal: Positive integer. Minimum number of presence sites per term in the initial starting model. Used only if select is TRUE.
maxTerms: Maximum number of terms to be used in any model, not including the intercept (default is 8). Used only if construct is TRUE.
w: Weights. Any of: TRUE (default; weight presences and contrast sites so they contribute equally), FALSE (no weighting), or a numeric vector with one weight per row in data.
family: Name of family for data error structure (see family). Default is stats::binomial().
removeInvalid: Logical. If TRUE (default), remove models that did not converge or whose parameters are near the boundary.
failIfNoValid: Logical. If TRUE (default), fail with an error if removeInvalid is TRUE and no valid models remain.
out: Character vector. One or more values: 'model' (default; the single best model), 'models' (all models evaluated), and/or 'tuning' (a data frame with AICc for each model).
cores: Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. If you have issues when cores > 1, please see the troubleshooting_parallel_operations guide.
verbose: Logical. If TRUE, display progress.
...: Arguments to pass to glm.
Details
This function is designed to find the most parsimonious model given the amount of calibration data that is available to it. trainGLM() can work with any data, but has been designed to work specifically as a species distribution model where the response is either binary (default) or abundance. Specifically, it 1) identifies the most parsimonious model (lowest AICc), 2) with optimal flexibility (optimal degrees of freedom in splines), and 3) allows for (but does not require) interaction terms between predictors. If the defaults are used, the following procedure is applied:
1. Construct a set of simple model terms, each with 1 to 4 degrees of freedom. Terms can be univariate or bivariate (two-way interactions). Predictors can be continuous or factors. If any simple model has convergence issues or boundary issues (coefficients that approach negative or positive infinity), it is removed.
2. Construct a series of models, each with one of the terms, then use the models to rank terms by AICc.
3. From the top set of terms, create a "full" model. The full model will ensure the maximum number of terms is <= maxTerms, and that for each term, there are at least presPerTermFinal data points.
4. Evaluate all possible submodels, plus the full model, and rank them by AICc. If a model has convergence or boundary issues, it is removed from the set. The most parsimonious model (lowest AICc) is returned.
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these. If scale is TRUE, any model object will also have an element named $scale, which contains the means and standard deviations for predictors that are not factors. The data frame reports the AICc for all of the models evaluated, sorted from best to worst. The converged column indicates whether the model converged (TRUE is good), and the boundary column whether the model parameters are near the boundary (usually, negative or positive infinity; FALSE is good).
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
    lty = c(1, 1, 3, 2, 4, 5, 6), # match the line types used in lines() above
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Calibrate a MaxEnt model using AICc
Description
This function calculates the "best" MaxEnt model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by the number of features (fewer is better), the regularization multiplier (higher is better), and finally the number of coefficients (fewer is better).
Usage
trainMaxEnt(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10),
classes = "default",
testClasses = TRUE,
dropOverparam = TRUE,
anyway = TRUE,
forceLinear = TRUE,
jackknife = FALSE,
arguments = NULL,
scratchDir = NULL,
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data with the response or an integer indicating which column it is. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
regMult: Numeric vector. Values of the master regularization parameter (called beta in Maxent) to test.
classes: Character vector. Names of feature classes to use: either 'default', or any combination of 'l' (linear), 'p' (product), 'q' (quadratic), 'h' (hinge), and 't' (threshold).
testClasses: Logical. If TRUE (default), test all possible combinations of the feature classes; if FALSE, use only the classes as given.
dropOverparam: Logical. If TRUE (default), drop models that have more coefficients than training occurrences.
anyway: Logical. Same as dropOverparam (retained for backwards compatibility).
forceLinear: Logical. If TRUE (default), require all candidate models to include at least linear features.
jackknife: Logical. If TRUE, conduct a jackknife test of variable importance for the returned model (default is FALSE).
arguments: NULL (default) or a character vector of additional arguments to pass to the underlying MaxEnt call.
scratchDir: Character. Directory to which to write temporary files. Leave as NULL to create a temporary folder in the current working directory.
out: Character vector. One or more of: 'model' (the single best model; the default), 'models' (a list of all models evaluated), and/or 'tuning' (a data frame with tuning statistics for each model).
cores: Number of cores to use. Default is 1. If you have issues when cores > 1, see the "Troubleshooting parallel operations" guide in this manual.
verbose: Logical. If TRUE, display progress.
...: Extra arguments. Not used.
Details
The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the maxent jar file.
Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data.
This function is a wrapper for MaxEnt(). The MaxEnt function creates a series of files on disk for each model. This function assumes you do not want those files, so it deletes most of them. However, one file cannot be deleted, and the normal ways of changing its permissions in R do not work, so the function simply overwrites that file (which is allowed) to make it smaller. Regardless, if you run many models, your temporary directory (argument scratchDir) can fill up and require manual deletion.
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these.
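For instance, a minimal sketch that requests both the best model and the tuning table, and directs temporary files to R's session directory. It assumes the env data frame and predictors built in the Examples below; the element names of the returned list are an assumption in this sketch:
mxBoth <- trainMaxEnt(
    data = env,
    resp = 'presBg',
    preds = predictors,
    regMult = c(0.5, 1, 2),
    out = c('model', 'tuning'), # return a list with both elements
    scratchDir = tempdir() # keep temporary MaxEnt files out of the working directory
)
mxBoth$tuning # AICc table, best model first (element name assumed)
mxBoth$model # the single best model (element name assumed)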
References
Warren, D.L. and S.N. Seifert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342. doi:10.1890/10-1171.1
See Also
Examples
# NB: The examples below show a very basic modeling workflow. They have been
# designed to work fast, not produce accurate, defensible models. They can
# take a few minutes to run.
library(mgcv)
library(sf)
library(terra)
set.seed(123)
### setup data
##############
# environmental rasters
rastFile <- system.file('extdata/madClim.tif', package='enmSdmX')
madClim <- rast(rastFile)
# coordinate reference system
wgs84 <- getCRS('WGS84')
# lemur occurrence data
data(lemurs)
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84)
occs <- elimCellDuplicates(occs, madClim)
occEnv <- extract(madClim, occs, ID = FALSE)
occEnv <- occEnv[complete.cases(occEnv), ]
# create 10000 background sites (or as many as raster can support)
bgEnv <- terra::spatSample(madClim, 20000)
bgEnv <- bgEnv[complete.cases(bgEnv), ]
bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ]
# collate occurrences and background sites
presBg <- data.frame(
presBg = c(
rep(1, nrow(occEnv)),
rep(0, nrow(bgEnv))
)
)
env <- rbind(occEnv, bgEnv)
env <- cbind(presBg, env)
predictors <- c('bio1', 'bio12')
### calibrate models
####################
# Note that all of the trainXYZ functions can be made to go faster using the
# "cores" argument (set to just 1, by default). The examples below will not
# go too much faster using more cores because they are simplified, but
# you can try!
cores <- 1
# MaxEnt
mx <- trainMaxEnt(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# MaxNet
mn <- trainMaxNet(
data = env,
resp = 'presBg',
preds = predictors,
regMult = 1, # too few values for reliable model, but fast
verbose = TRUE,
cores = cores
)
# generalized linear model (GLM)
gl <- trainGLM(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
verbose = TRUE,
cores = cores
)
# generalized additive model (GAM)
ga <- trainGAM(
data = env,
resp = 'presBg',
preds = predictors,
verbose = TRUE,
cores = cores
)
# natural splines
ns <- trainNS(
data = env,
resp = 'presBg',
preds = predictors,
scale = TRUE, # automatic scaling of predictors
df = 1:2, # too few values for reliable model(?)
verbose = TRUE,
cores = cores
)
# boosted regression trees
envSub <- env[1:1049, ] # subsetting data to run faster
brt <- trainBRT(
data = envSub,
resp = 'presBg',
preds = predictors,
learningRate = 0.001, # too few values for reliable model(?)
treeComplexity = c(2, 3), # too few values for reliable model, but fast
minTrees = 1200, # minimum trees for reliable model(?), but fast
maxTrees = 1200, # too small for reliable model(?), but fast
tryBy = 'treeComplexity',
anyway = TRUE, # return models that did not converge
verbose = TRUE,
cores = cores
)
# random forests
rf <- trainRF(
data = env,
resp = 'presBg',
preds = predictors,
numTrees = c(100, 500), # using at least 500 recommended, but fast!
verbose = TRUE,
cores = cores
)
### make maps of models
#######################
# NB We do not have to scale rasters before predicting GLMs and NSs because we
# used the `scale = TRUE` argument in trainGLM() and trainNS().
mxMap <- predictEnmSdm(mx, madClim)
mnMap <- predictEnmSdm(mn, madClim)
glMap <- predictEnmSdm(gl, madClim)
gaMap <- predictEnmSdm(ga, madClim)
nsMap <- predictEnmSdm(ns, madClim)
brtMap <- predictEnmSdm(brt, madClim)
rfMap <- predictEnmSdm(rf, madClim)
maps <- c(
mxMap,
mnMap,
glMap,
gaMap,
nsMap,
brtMap,
rfMap
)
names(maps) <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NSs', 'BRTs', 'RFs')
fun <- function() plot(occs, col='black', pch=3, add=TRUE)
plot(maps, fun = fun, nc = 4)
### compare model responses to BIO12 (mean annual precipitation)
################################################################
# make a data frame holding all other variables at mean across occurrences,
# varying only BIO12
occEnvMeans <- colMeans(occEnv, na.rm=TRUE)
occEnvMeans <- rbind(occEnvMeans)
occEnvMeans <- as.data.frame(occEnvMeans)
climFrame <- occEnvMeans[rep(1, 100), ]
rownames(climFrame) <- NULL
minBio12 <- min(env$bio12)
maxBio12 <- max(env$bio12)
climFrame$bio12 <- seq(minBio12, maxBio12, length.out=100)
predMx <- predictEnmSdm(mx, climFrame)
predMn <- predictEnmSdm(mn, climFrame)
predGl <- predictEnmSdm(gl, climFrame)
predGa <- predictEnmSdm(ga, climFrame)
predNat <- predictEnmSdm(ns, climFrame)
predBrt <- predictEnmSdm(brt, climFrame)
predRf <- predictEnmSdm(rf, climFrame)
plot(climFrame$bio12, predMx,
xlab='BIO12', ylab='Prediction', type='l', ylim=c(0, 1))
lines(climFrame$bio12, predMn, lty='solid', col='red')
lines(climFrame$bio12, predGl, lty='dotted', col='blue')
lines(climFrame$bio12, predGa, lty='dashed', col='green')
lines(climFrame$bio12, predNat, lty=4, col='purple')
lines(climFrame$bio12, predBrt, lty=5, col='orange')
lines(climFrame$bio12, predRf, lty=6, col='cyan')
legend(
'topleft',
inset = 0.01,
legend = c(
'MaxEnt',
'MaxNet',
'GLM',
'GAM',
'NS',
'BRT',
'RF'
),
lty = c(1, 1:6),
col = c(
'black',
'red',
'blue',
'green',
'purple',
'orange',
'cyan'
),
bg = 'white'
)
### compare response curves ###
###############################
modelNames <- c('MaxEnt', 'MaxNet', 'GLM', 'GAM', 'NS', 'BRT', 'RF')
responses <- responseCurves(
models = list(mx, mn, gl, ga, ns, brt, rf),
env = bgEnv,
ref = occEnv,
vars = predictors,
modelNames = modelNames
)
print(responses)
Calibrate a MaxNet model using AICc
Description
This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), then by regularization multiplier (higher is better), then by number of coefficients (fewer is better).
Usage
trainMaxNet(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
regMult = c(seq(0.5, 5, by = 0.5), 7.5, 10),
classes = "default",
testClasses = TRUE,
dropOverparam = TRUE,
forceLinear = TRUE,
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame or matrix. Contains a column indicating whether each row is a presence (1) or background (0) site, plus columns for environmental predictors.
resp: Character or integer. Name or column index of the response variable. Default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in data.
regMult: Numeric vector. Values of the master regularization parameter (called beta in Maxent) to test.
classes: Character vector. Names of feature classes to use: either 'default', or any combination of 'l' (linear), 'p' (product), 'q' (quadratic), 'h' (hinge), and 't' (threshold).
testClasses: Logical. If TRUE (default), test all possible combinations of the feature classes; if FALSE, use only the classes as given.
dropOverparam: Logical. If TRUE (default), drop models that have more coefficients than training occurrences.
forceLinear: Logical. If TRUE (default), require all candidate models to include at least linear features.
out: Character vector. One or more of: 'model' (the single best model; the default), 'models' (a list of all models evaluated), and/or 'tuning' (a data frame with tuning statistics for each model).
cores: Number of cores to use. Default is 1. If you have issues when cores > 1, see the "Troubleshooting parallel operations" guide in this manual.
verbose: Logical. If TRUE, display progress.
...: Extra arguments. Not used.
Details
The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst.
Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data.
Value
If out = 'model', this function returns an object of class maxnet. If out = 'tuning', it returns a data frame with tuning parameters, log likelihood, and AICc for each model tried. If out = c('model', 'tuning'), it returns a list with the maxnet object and the data frame.
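For instance, a small sketch that asks only for the tuning table, to see how AICc responds to the regularization multiplier (it assumes the env data frame and predictors built in the example workflow shown under trainMaxEnt() above):
mnTuning <- trainMaxNet(
    data = env,
    resp = 'presBg',
    preds = predictors,
    regMult = c(0.5, 1, 2, 5),
    out = 'tuning' # just the data frame of models tried, best first
)
head(mnTuning)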
References
Phillips, S.J., Anderson, R.P., Dudík, M., Schapire, R.E., and Blair, M.E. 2017. Opening the black box: An open-source release of Maxent. Ecography 40:887-893. doi:10.1111/ecog.03049
Warren, D.L. and S.N. Seifert. 2011. Ecological niche modeling in Maxent: The importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335-342. doi:10.1890/10-1171.1
See Also
Examples
# The examples for trainMaxNet() are identical to those shown above for
# trainMaxEnt(); see that section for the full workflow (data setup,
# calibration of all seven algorithms, mapping, and response-curve comparison).
Calibrate a natural splines model
Description
This function constructs a natural-spline model by evaluating all possible models given the available predictors and constraints. "Constraints" in this case include the degrees of freedom for a spline, whether or not interaction terms are included, minimum number of presence sites per model term, and maximum number of terms to include in the model. Its output is any or all of: the most parsimonious model (lowest AICc); all models evaluated; and/or a table with AICc for all evaluated models.
Usage
trainNS(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
scale = NA,
df = 1:4,
interaction = TRUE,
interceptOnly = TRUE,
method = "glm.fit",
presPerTermFinal = 10,
maxTerms = 8,
w = TRUE,
family = "binomial",
removeInvalid = TRUE,
failIfNoValid = TRUE,
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data with the response or an integer indicating which column it is. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
scale: Either NA (default), TRUE, or FALSE. If TRUE, predictors are centered and scaled before model fitting, and the centering and scaling values are stored with the model (see Value, and the sketch after this list).
df: A vector of integers > 0. Candidate degrees of freedom for each spline term (the default, 1:4, tries 1 to 4 degrees of freedom per term).
interaction: If TRUE (default), two-way interaction terms between predictors are included among the candidate terms.
interceptOnly: If TRUE (default), an intercept-only model is included among the candidate models.
method: Character, name of the function used to solve the model (default 'glm.fit'; see glm()).
presPerTermFinal: Minimum number of presence sites per term in the final model.
maxTerms: Maximum number of terms to be used in any model, not including the intercept (default is 8).
w: Weights. Any of: TRUE (default), in which case presences and background sites are weighted so that their summed weights are equal; FALSE, in which case each site receives the same weight; or a numeric vector of custom weights, one per row in data.
family: Name of family for data error structure (see family()). Default is 'binomial'.
removeInvalid: Logical. If TRUE (default), remove candidate models that did not converge or whose parameters are near a boundary (see Value).
failIfNoValid: Logical. If TRUE (default), fail with an error if no valid models remain after invalid ones are removed.
out: Character vector. One or more of: 'model' (the single best model; the default), 'models' (a list of all models evaluated), and/or 'tuning' (a data frame with tuning statistics for each model).
cores: Number of cores to use. Default is 1. If you have issues when cores > 1, see the "Troubleshooting parallel operations" guide in this manual.
verbose: Logical. If TRUE, display progress.
...: Arguments to pass to glm().
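As a sketch of how scale = TRUE travels with the model (the $scale element is documented in the Value section below; env and predictors are assumed to be built as in the shared example workflow shown under trainMaxEnt() above):
# Train with automatic scaling, then inspect the stored scaling values.
nsScaled <- trainNS(
    data = env,
    resp = 'presBg',
    preds = predictors,
    scale = TRUE
)
nsScaled$scale # means and standard deviations used to scale the predictors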
Details
This function is designed to find the most parsimonious model given the amount of calibration data available to it. trainNS() can work with any data, but has been designed to work specifically as a species distribution model where the response is either binary (default) or abundance. Specifically, it 1) identifies the most parsimonious model (lowest AICc) with 2) optimal flexibility (optimal degrees of freedom in splines) and 3) allows for (but does not require) interaction terms between predictors. If the defaults are used, the following procedure is applied (see the hand-rolled sketch after this list for the kind of candidate model involved):
1. Construct a set of simple model terms, each with 1 to 4 degrees of freedom. Terms can be univariate or bivariate (two-way interactions). Predictors can be continuous or factors. If any simple model has convergence issues or boundary issues (coefficients that approach negative or positive infinity), it is removed.
2. Construct a series of models, each with one of the terms, then use these models to rank terms by AICc.
3. From the top-ranked terms, create a "full" model. The full model ensures that the number of terms is <= maxTerms and that each term has at least presPerTermFinal presence sites.
4. Evaluate all possible submodels, plus the full model, and rank them by AICc. If a model has convergence or boundary issues, remove it from the set. The most parsimonious model (lowest AICc) is returned.
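To make that search space concrete, here is a minimal hand-rolled sketch of one candidate model of the kind trainNS() evaluates, fit directly with splines::ns(). It assumes the env data frame from the shared example workflow above and is illustrative only, not what trainNS() does internally:
library(splines)
# One candidate: natural splines with 2 degrees of freedom per predictor.
cand <- glm(
    presBg ~ ns(bio1, df = 2) + ns(bio12, df = 2),
    data = env,
    family = binomial
)
AIC(cand) # trainNS() ranks candidates like this one by AICc instead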
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these. If scale is TRUE, any model object will also have an element named $scale, which contains the means and standard deviations of predictors that are not factors. The data frame reports the AICc of all evaluated models, sorted from best to worst. The converged column indicates whether the model converged (TRUE is good), and the boundary column indicates whether the model parameters are near a boundary (usually negative or positive infinity; FALSE is good).
See Also
Examples
# The examples for trainNS() are identical to those shown above for
# trainMaxEnt(); see that section for the full workflow (data setup,
# calibration of all seven algorithms, mapping, and response-curve comparison).
Calibrate a random forest model
Description
This function trains a random forest model. It identifies the optimal number of trees and value of mtry (the number of variables sampled as candidates at each split) using out-of-bag (OOB) error. The number of trees in each candidate model is set by the user with the argument numTrees. The number of predictors to test per split, mtry, is found by exploring a range of values. If the response (y) is a factor, the starting value for mtry is max(1, floor(p / 3)), where p is the number of predictors. If the response is not a factor, the starting value is max(1, floor(sqrt(p))). Values of mtry are then increased by the mtryIncrement argument until the total number of predictors is used. See ranger for more details.
The output of the function is any or all of: a table with the out-of-bag (OOB) error of evaluated models; all evaluated models; and/or the single model with the lowest OOB error.
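A hand-rolled illustration of the mtry candidates this implies; this is a plausible reading of the description above, not trainRF() internals:
p <- 6 # number of predictors
start <- max(1, floor(sqrt(p))) # starting value for a non-factor response
mtryIncrement <- 2 # the function's default
mtries <- unique(c(seq(start, p, by = mtryIncrement), p))
mtries # 2 4 6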
Usage
trainRF(
data,
resp = names(data)[1],
preds = names(data)[2:ncol(data)],
numTrees = c(250, 500, 750, 1000),
mtryIncrement = 2,
w = TRUE,
binary = TRUE,
out = "model",
cores = 1,
verbose = FALSE,
...
)
Arguments
data: Data frame.
resp: Response variable. This is either the name of the column in data with the response or an integer indicating which column it is. The default is to use the first column in data.
preds: Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in data.
numTrees: Vector of the number of trees to grow. All possible combinations of numTrees and mtry are assessed.
mtryIncrement: Positive integer (default is 2). Number of predictors by which to increase mtry at each step of the search, until all predictors are included.
w: Weights. For random forests, weights are simply used as relative probabilities of selecting a row in data for use in a given tree.
binary: Logical. If TRUE (default), the response is treated as a binary (presence/background) outcome.
out: Character vector. One or more of: 'model' (the single best model; the default), 'models' (a list of all models evaluated), and/or 'tuning' (a data frame with OOB error for each model).
cores: Number of cores to use. Default is 1. If you have issues when cores > 1, see the "Troubleshooting parallel operations" guide in this manual.
verbose: Logical. If TRUE, display progress.
...: Arguments to pass to ranger().
Value
The object that is returned depends on the value of the out argument. It can be a model object, a data frame, a list of models, or a list of two or more of these.
See Also
Examples
# The examples for trainRF() are identical to those shown above for
# trainMaxEnt(); see that section for the full workflow (data setup,
# calibration of all seven algorithms, mapping, and response-curve comparison).
Troubleshooting parallel operations
Description
This is a guide to solving issues with running functions that can use more than one core. This includes the trainXYZ functions, bioticVelocity, and predictEnmSdm. Each of these functions has the argument cores. By default, the value of cores is 1, so the function will use only one core. By setting it higher, you can use more cores on your machine. However, occasionally you will run into this error:
Error in checkForRemoteErrors(lapply(cl, recvResult)) :
2 nodes produced errors; first error: object '.doSnowGlobals' not found
This means that the worker "nodes" (different instances of R
started by the function to run in parallel) cannot find the doParallel package, even if it is installed on your system.
There are several solutions to this issue. One of them may work for you, and none are inherent to enmSdmX, as far as I can tell.
Anti-virus is blocking R
Strangely enough, to anti-virus software, running R in parallel can sometimes look like you are accessing the internet, so it may block access to other instances of R. You will have to do some surgery on your anti-virus settings to find where to change this.
Your R packages are not stored in the "traditional" place
R has a default directory where packages are stored on any system. If your packages are stored in a different place, worker nodes may not be able to find them if you use setwd() to change the working directory. I do not know if you have to set the working directory back to the default for your system, or if you have to change it to the folder that contains the folder where your R packages reside (for me, they are the same directory). You can see your current working directory using getwd(). RStudio will often change this directory automatically.
So, if you get this error, try using setwd() to set your working directory to the default one for your system, or to the folder that contains the folder that contains your packages.
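A quick way to check both things from the R prompt (standard base-R calls, nothing enmSdmX-specific):
.libPaths() # package library folders the worker nodes must be able to see
getwd() # the session's current working directory
# If needed, point the session at a directory the workers can resolve
# (the path below is an example; substitute your own):
# setwd('C:/Users/me/Documents')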
Let me know
I'm always game to help you track down your problems (with this package, not necessarily in general). The best way is to create an issue on GitHub.
Exorcise your computer
Not responsible for damage to your computer.
Proximity-based weighting for occurrences to correct for spatial bias
Description
This function calculates weights for points based on proximity to other points and the distance of spatial autocorrelation.
Usage
weightByDist(x, maxDist, alpha = 1)
Arguments
x: A spatial points object of class sf or SpatVector.
maxDist: Maximum distance beyond which two neighboring points are assumed to have no effect on one another for the purpose of calculating weights.
alpha: Scaling parameter (see the equations below).
Details
Weights can be used, for example, to account for spatial bias in the manner in which the points were observed. Weighting is calculated on the assumption that if two points fell exactly on top of one another, they should each have a weight of 1/2. If three points had the exact same coordinates, then their weights should be 1/3, and so on. Increasing distance between points should increase their weight, up to the distance at which there is no "significant" spatial autocorrelation, beyond which a point should have a weight of 1. This distance needs to be supplied by the user, as it will depend on the intended use of the weights. The distance can be calculated from "standard" metrics of spatial autocorrelation (e.g., a variogram), or on the basis of knowledge of the system (e.g., maximum dispersal distance of an organism).
For a given point i, the weight is defined as
w_i = 1 / (1 + \epsilon)
where
\epsilon = \sum_{n=1}^{N} (1 - d_n / d_{sac})^\alpha
in which N is the total number of points closer than the maximum distance (d_{sac}) to point i, and d_n is the distance between focal point i and point n. \alpha is a weighting factor. By default, it is set to 1, but it can be changed by the user to augment or diminish the effect that neighboring points have on the weight of a focal point. When \alpha < 1, neighboring points reduce the weight of the focal point more than under the default, and when \alpha > 1, they have less effect than under the default. When all neighboring points are at or beyond the maximum distance of spatial autocorrelation, the focal point gets a weight w_i of 1. When at least one neighboring point is closer than this distance, the weight of the focal point will be > 0 but < 1.
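As a sanity check on the formula, a hand computation in base R, independent of weightByDist(): two coincident points, with a maximum distance (d_{sac}) of 30000 m and the default alpha of 1, should each receive a weight of 1/2.
dSac <- 30000 # maximum distance of spatial autocorrelation, in meters
dn <- 0 # one neighbor exactly on top of the focal point
alpha <- 1
eps <- sum((1 - dn / dSac)^alpha) # = 1
w <- 1 / (1 + eps)
w # 0.5, as required by the Details above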
Value
A numeric vector of weights.
Examples
library(sf)
# lemur occurrence data and Madagascar outline
data(lemurs)
data(mad0)
wgs84 <- getCRS('WGS84')
occs <- lemurs[lemurs$species == 'Eulemur fulvus', ]
occs <- sf::st_as_sf(occs, coords=c('longitude', 'latitude'), crs=wgs84)
# weights
maxDist <- 30000 # in meters, for this example
w <- weightByDist(occs, maxDist)
# plot
plot(st_geometry(occs), cex=5 * w, main='point size ~ weight')
plot(st_geometry(mad0), col='gainsboro', border='gray70', add=TRUE)
plot(st_geometry(occs), cex=5 * w, add=TRUE)