Type: Package
Title: Geographical Ecology and Conservation Knowledge Online
Version: 1.0.1
Depends: R (≥ 4.1.0)
Imports: terra, sp, grDevices, graphics, stats, utils, geosphere, methods, red, biomod2, kernlab
BugReports: https://github.com/VascoBranco/gecko/issues
URL: https://github.com/VascoBranco/gecko
Author: Vasco V. Branco
Maintainer: Vasco V. Branco <vasco.branco@helsinki.fi>
Description: Includes a collection of geographical analysis functions aimed primarily at ecology and conservation science studies, allowing processing of both point and raster data. Now integrates SPECTRE (https://biodiversityresearch.org/spectre/), a dataset of global geospatial threat data, developed by the authors.
License: GPL-2
Encoding: UTF-8
RoxygenNote: 7.3.2
Repository: CRAN
NeedsCompilation: no
Packaged: 2024-12-17 12:25:53 UTC; witch-king-of-angmar
Date/Publication: 2024-12-17 15:10:10 UTC
Uniformize raster layers.
Description
Crop raster layers to minimum size possible and uniformize NA
values across layers.
Usage
clean(layers)
Arguments
layers |
SpatRaster. As defined in package terra, see |
Details
Excludes all marginal rows and columns containing only NA values, and sets a cell to NA in every layer if it is NA in any of the layers.
Value
SpatRaster. Same class as layers.
Examples
region = gecko.data("layers")
terra::plot(clean(region))
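The two steps described in Details can be illustrated on plain matrices. This is a minimal base R sketch under the assumption that each layer is a matrix of identical dimensions; `uniformize_na()` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper: layers are plain matrices of identical dimensions.
uniformize_na <- function(mats) {
  # A cell becomes NA in every layer if it is NA in any layer.
  any_na <- Reduce("|", lapply(mats, is.na))
  mats <- lapply(mats, function(m) { m[any_na] <- NA; m })
  # Crop to the smallest block of rows and columns that still holds data.
  keep_rows <- which(rowSums(!any_na) > 0)
  keep_cols <- which(colSums(!any_na) > 0)
  rows <- min(keep_rows):max(keep_rows)
  cols <- min(keep_cols):max(keep_cols)
  lapply(mats, function(m) m[rows, cols, drop = FALSE])
}

a <- matrix(c(NA, NA, NA,
              NA,  1,  2,
              NA,  3,  4), nrow = 3, byrow = TRUE)
b <- matrix(c(NA, NA, NA,
              NA,  5, NA,
              NA,  7,  8), nrow = 3, byrow = TRUE)
cleaned <- uniformize_na(list(a, b))
cleaned
```

Both toy layers lose their empty first row and column, and the cell that is NA only in `b` becomes NA in `a` as well.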
Create a confusion matrix
Description
Create a confusion matrix for any multiclass set of predicted vs observed labels in a classification problem.
Usage
confusion.matrix(actual, predicted)
Arguments
actual |
dataframe. Original labels. |
predicted |
dataframe. Predicted labels. |
Value
data.frame. Predicted labels (rows) x Observed labels (cols).
Examples
x = c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
y = c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")
confusion.matrix(x, y)
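For comparison, a similar cross-tabulation can be sketched in base R with `table()`. This is only an illustration of the predicted (rows) by observed (columns) layout described under Value; it is not claimed to match the internals or the exact output of confusion.matrix().

```r
actual    <- c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
predicted <- c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")

# Use a shared set of levels so that classes never predicted still get a row.
classes <- sort(unique(c(actual, predicted)))
cm <- table(Predicted = factor(predicted, levels = classes),
            Observed  = factor(actual, levels = classes))
cm_df <- as.data.frame.matrix(cm)
cm_df
```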
Create eastness layer.
Description
Create a layer depicting eastness based on an elevation layer.
Usage
create.east(layers)
Arguments
layers |
SpatRaster. A layer of elevation (a digital elevation model - DEM).
As defined in package terra, see |
Details
Using elevation, aspect can be calculated. Yet, it is a circular variable (0 = 360) and has to be converted to northness and eastness to be useful for modelling.
Value
SpatRaster.
Examples
region = gecko.data("layers")
terra::plot(create.east(region[[3]]))
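The circularity issue mentioned in Details can be shown with a few aspect values. The sketch below assumes the common convention of aspect in degrees clockwise from north, with eastness as its sine and northness as its cosine; this manual does not confirm that create.east() and create.north() compute exactly this.

```r
aspect_deg <- c(0, 90, 180, 270, 360)
aspect_rad <- aspect_deg * pi / 180

eastness  <- sin(aspect_rad)   # 1 for east-facing, -1 for west-facing slopes
northness <- cos(aspect_rad)   # 1 for north-facing, -1 for south-facing slopes

round(cbind(aspect_deg, eastness, northness), 3)
```

Aspects of 0 and 360 degrees, which are far apart numerically but identical in direction, receive the same eastness and northness.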
Create latitude layer.
Description
Create a layer depicting latitude based on any other.
Usage
create.lat(layers)
Arguments
layers |
SpatRaster. As defined in package terra, see |
Details
Using latitude (and longitude) in models may help limit the extrapolation of the predicted area far beyond known areas.
Value
SpatRaster.
Examples
region = gecko.data("layers")
terra::plot(create.lat(region[[1]]))
Create longitude layer.
Description
Create a layer depicting longitude based on any other.
Usage
create.long(layers)
Arguments
layers |
SpatRaster. As defined in package terra, see |
Details
Using longitude (and latitude) in models may help limit the extrapolation of the predicted area far beyond known areas.
Value
SpatRaster.
Examples
region = gecko.data("layers")
terra::plot(create.long(region))
Create northness layer.
Description
Create a layer depicting northness based on an elevation layer.
Usage
create.north(layers)
Arguments
layers |
SpatRaster. A layer of elevation (a digital elevation model - DEM).
As defined in package terra, see |
Details
Using elevation, aspect can be calculated. Yet, it is a circular variable (0 = 360) and has to be converted to northness and eastness to be useful for modelling.
Value
SpatRaster.
Examples
region = gecko.data("layers")
terra::plot(create.north(region[[3]]))
Create distance layer.
Description
Creates a layer depicting distances to records using the minimum, average, distance to the minimum convex polygon or distance taking into account a cost surface.
Usage
distance(longlat, layers, type = "minimum")
Arguments
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
type |
character. text string indicating whether the output should be the "minimum", "average" or "mcp" distance to all records. "mcp" means the distance to the minimum convex polygon encompassing all records. |
Details
Using distance to records in models may help limit the extrapolation of the predicted area far beyond known areas.
Value
SpatRaster.
Examples
userpar <- par(no.readonly = TRUE)
region = gecko.data("layers")
alt = region[[3]]
localities = gecko.data("records")
par(mfrow=c(3,2))
terra::plot(alt)
points(localities)
terra::plot(distance(localities, alt))
terra::plot(distance(localities, alt, type = "average"))
par(userpar)
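The difference between the "minimum" and "average" types can be sketched on a toy grid. The code below assumes planar Euclidean distances between cell centres and records; the package itself may measure distance differently (for instance geodesically), so treat this as an illustration of the idea only.

```r
# Toy records (two columns: x, y) and a toy grid of cell centres.
records <- cbind(c(0, 3), c(0, 4))
cells   <- as.matrix(expand.grid(x = 0:3, y = 0:4))

# Euclidean distance from every cell centre (rows) to every record (columns).
d <- apply(records, 1, function(r) {
  sqrt((cells[, 1] - r[1])^2 + (cells[, 2] - r[2])^2)
})

min_dist <- apply(d, 1, min)   # analogous to type = "minimum"
avg_dist <- rowMeans(d)        # analogous to type = "average"

head(cbind(cells, min_dist, avg_dist))
```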
Example data packaged with gecko
Description
Load data included in the package. This includes records, a matrix of longitude and latitude (two columns) occurrence records for Hogna maderiana (Walckenaer, 1837); range, a SpatRaster object, as defined by package terra, of the geographic range of Hogna maderiana (Walckenaer, 1837); layers, a SpatRaster object with layers representing the average annual temperature, total annual precipitation, altitude and landcover for Madeira Island (Fick & Hijmans 2017, Tuanmu & Jetz 2014); threat, a layer of mean fire occurrence in Madeira between 2006 and 2016; and worldborders, a simplified version of the vector of world country borders created by Victor Cazalis.
Usage
gecko.data(data = NULL)
Arguments
data |
character. String of one of the data names mentioned in the description, e.g.: |
Source
This function is inspired by palmerpenguins::path_to_file(), which in turn is based on readxl::readxl_example().
Examples
## Not run:
gecko.data()
gecko.data("range")
## End(Not run)
Read GIS directory.
Description
Read directory where GIS files are stored.
Usage
gecko.getDir()
Details
Reads a txt file pointing to where the world GIS files are stored.
Setup GIS directory.
Description
Setup directory where GIS files are stored.
Usage
gecko.setDir(gisPath = NULL)
Arguments
gisPath |
Path to the directory where the GIS files are stored. |
Details
Writes a txt file in the red directory allowing the package to always access the world GIS files directory.
Download worldclim files.
Description
Download the latest version of worldclim to your gecko work directory.
If you have not yet set up a work directory, it will be set up as if running gecko::gecko.setDir() with gisPath = NULL.
This is a large dataset that is prone to fail by timeout if downloaded
through R. Instead of using this function you can run gecko.setDir() (if you
haven't yet) and download the files at
https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_30s_bio.zip or
https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_10m_bio.zip.
Unzip their contents into the corresponding folder, "./worldclim/1 km" or
"./worldclim/10 km", inside the folder returned by gecko.getDir().
Usage
gecko.worldclim(res)
Arguments
res |
character. Specifies the resolution of environmental data used. |
Details
Reads a txt file pointing to where the world GIS files are stored.
Examples
## Not run:
gecko.worldclim("10 km")
## End(Not run)
Move records to closest non-NA cell.
Description
Identifies and moves presence records to cells with environmental values.
Usage
move(longlat, layers, buffer = 0)
Arguments
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
buffer |
numeric. Maximum distance in map units that a record will move. If 0 all |
Details
Often records are in coastal or other areas for which no environmental data is available. This function moves such records to the closest cells with data so that no information is lost during modelling.
Value
A matrix with new coordinate values.
Examples
region <- terra::rast(matrix(c(rep(NA,100), rep(1,100), rep(NA,100)), ncol = 15))
presences <- cbind(runif(100, 0, 0.55), runif(100, 0, 1))
terra::plot(region)
points(presences)
presences <- move(presences, region)
terra::plot(region)
points(presences)
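The idea of snapping a record to the closest cell with data can be sketched without terra. In this base R illustration the grid is a data frame of cell centres, and `snap_to_data()` and its `max_move` argument are hypothetical names; they are not the package's move() or its buffer argument.

```r
# Toy grid: cell centres with a value column; NA marks cells without data.
grid <- expand.grid(x = 1:5, y = 1:3)
grid$value <- ifelse(grid$x >= 3, 1, NA)

# Hypothetical helper: snap a point to the nearest non-NA cell centre,
# unless that cell lies farther away than `max_move`.
snap_to_data <- function(pt, grid, max_move = Inf) {
  ok <- grid[!is.na(grid$value), ]
  d <- sqrt((ok$x - pt[1])^2 + (ok$y - pt[2])^2)
  nearest <- which.min(d)
  if (d[nearest] > max_move) return(pt)
  c(ok$x[nearest], ok$y[nearest])
}

snap_to_data(c(1, 2), grid)                # moved to the nearest cell with data
snap_to_data(c(1, 2), grid, max_move = 1)  # too far away, left in place
```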
Normalize raster.
Description
Normalize a raster file according to one of three methods: 'standard', 'range' or 'rank'.
Usage
normalize(layer, method = "standard", filepath = NULL)
Arguments
layer |
SpatRaster. Object with a single layer as defined by package terra. |
method |
character. Specifying |
filepath |
character. Optional, specifies a path to the output file. |
Details
The three options are: "standard", which standardizes data to mean = 0 and sd = 1; "range", which rescales data to a range of 0 to 1; and "rank", which also rescales to a range of 0 to 1 but does so after ranking all points.
Value
A raster layer.
Examples
## Not run:
region = gecko.data("layers")[[1]]
ranked_region = normalize(region, method = "rank")
## End(Not run)
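The three methods described in Details can be tried on a plain numeric vector. This is a sketch of the arithmetic only; how normalize() treats ties and NA cells is not stated in this manual, so the average-rank tie handling below is an assumption.

```r
v <- c(2, 4, 4, 10)

standard <- (v - mean(v)) / sd(v)               # mean 0, sd 1
ranged   <- (v - min(v)) / (max(v) - min(v))    # rescaled to 0..1
r        <- rank(v)                             # ties receive the average rank
ranked   <- (r - min(r)) / (max(r) - min(r))    # ranks rescaled to 0..1

rbind(standard, ranged, ranked)
```

The "rank" variant spreads values evenly between 0 and 1 regardless of how skewed the original values are, which is why the large value 10 no longer compresses the others.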
Detect outliers in a set of geographical coordinates
Description
This function generates pseudo-absences from an input data.frame containing latitude and longitude coordinates by using environmental data, and then uses both presences and pseudo-absences to train an SVM model used to flag possible outliers for a given species.
Usage
outliers.detect(
longlat,
training = NULL,
hi_res = TRUE,
crop = FALSE,
threshold = 0.05,
method = "all"
)
Arguments
longlat |
data.frame. With two columns containing latitude and longitude, describing the locations of a species, which may contain outliers. |
training |
data.frame. With the same formatting as |
hi_res |
logical. Specifies if 1 km resolution environmental data should be used.
If |
crop |
logical. Indicates whether environmental data should be cropped to
an extent similar to what is given in |
threshold |
numeric. Value indicating the threshold for classifying
outliers in methods |
method |
A string specifying the outlier detection method. |
Details
Environmental data used is WorldClim and requires a long download, see gecko::gecko.setDir().
This function is heavily based on the methods described in Liu et al. (2017). There the authors describe SVM_pdSDM, a pseudo-SDM method similar to a two-class presence-only SVM that is capable of using pseudo-absence points, implemented with the ksvm function in the R package kernlab. It is suggested that, for each set of "n" occurrence records, "2 * n" pseudo-absence points are generated. When using it, keep in mind works highlighting its limitations, such as Meynard et al. (2019). See the References section.
Value
list if method = "all", containing whether or not a given point was classified as TRUE or FALSE along with the confusion matrix for the training data. If method = "geo" or method = "env", a data.frame is returned.
References
Liu, C., White, M. and Newell, G. (2017) ‘Detecting outliers in species distribution data’, Journal of Biogeography, 45(1), pp. 164–176. doi:10.1111/jbi.13122.
Meynard, C.N., Kaplan, D.M. and Leroy, B. (2019) ‘Detecting outliers in species distribution data: Some caveats and clarifications on a virtual species study’, Journal of Biogeography, 46(9), pp. 2141–2144. doi:10.1111/jbi.13626.
Examples
## Not run:
new_occurences = gecko.data("records")
old_occurences = data.frame(X = runif(10, -17.1, -17.05), Y = runif(10, 32.73, 32.76))
outliers.detect(new_occurences, old_occurences)
## End(Not run)
Visual detection of outliers.
Description
Draws plots of sites in geographical (longlat) and environmental (2-axis PCA) space.
Usage
outliers.visualize(longlat, layers)
Arguments
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
layers |
SpatRaster. As defined in package terra, see |
Details
Erroneous data sources or errors in transcriptions may introduce outliers that can be easily detected by looking at simple graphs of geographical or environmental space.
Value
data.frame. Contains coordinate values and distance to the centroid in PCA space. Two plots are drawn for visual inspection. The environmental plot includes row numbers for easy identification of possible outliers.
Examples
localities = gecko.data("records")
region = gecko.data("layers")
outliers.visualize(localities, region[[1:3]])
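The "distance to centroid in PCA" idea can be sketched with `stats::prcomp()` on made-up environmental values. The variable names and numbers below are invented for illustration, and scaling the variables before the PCA is an assumption rather than documented behaviour of outliers.visualize().

```r
set.seed(3)
env <- data.frame(temp = rnorm(30, 18, 1),
                  prec = rnorm(30, 700, 50),
                  alt  = rnorm(30, 400, 80))
env[30, ] <- c(30, 1500, 1800)   # one deliberately extreme site

pca <- prcomp(env, scale. = TRUE)
scores <- pca$x[, 1:2]

# Scores are centred, so the centroid is the origin of the PCA plane.
dist_to_centroid <- sqrt(rowSums(scores^2))
which.max(dist_to_centroid)      # row number of the most isolated site
```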
Performance of model predictions
Description
Calculate the performance of a model through a comparison between predicted and observed labels. Available metrics are accuracy, F1 and TSS.
Usage
performance.metrics(actual, predicted, metric)
Arguments
actual |
dataframe. Same formatting as |
predicted |
dataframe. Same formatting as |
metric |
character. String specifying the metric used, one of |
Details
The F-score or F-measure (F1) is:
F1 = 2 \cdot \dfrac{Precision \cdot Recall}{Precision + Recall}
with
Precision = \dfrac{True Positives}{True Positives + False Positives}
Recall = \dfrac{True Positives}{True Positives + False Negatives}
Accuracy is:
Accuracy = \dfrac{100 \cdot (True Positives + True Negatives)}{True Positives + True Negatives + False Positives + False Negatives}
Peirce's skill score (PSS), also known as Bookmaker's Informedness (BM) or the True Skill Statistic (TSS), is:
TSS = TPR + TNR - 1
with TPR being the True Positive Rate (the rate of positives correctly labelled as such) and TNR the True Negative Rate (the rate of negatives correctly labelled), such that:
TPR = \dfrac{True Positives}{True Positives + False Negatives}
TNR = \dfrac{True Negatives}{True Negatives + False Positives}
Take into consideration that the F1 score is not a robust metric in datasets with class imbalances.
Value
numeric.
References
PSS: Peirce, C. S. (1884). The numerical measure of the success of predictions. Science, 4, 453–454.
Examples
observed = c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
predicted = c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")
performance.metrics(observed, predicted, "TSS")
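The formulas in Details can be worked through by hand on the example labels. The sketch below treats "TRUE" as the positive class, which is an assumption, and computes the counts directly rather than calling performance.metrics().

```r
observed  <- c("FALSE", "TRUE", "FALSE", "TRUE", "TRUE")
predicted <- c("TRUE", "TRUE", "TRUE", "TRUE", "TRUE")

tp <- sum(predicted == "TRUE"  & observed == "TRUE")
tn <- sum(predicted == "FALSE" & observed == "FALSE")
fp <- sum(predicted == "TRUE"  & observed == "FALSE")
fn <- sum(predicted == "FALSE" & observed == "TRUE")

accuracy  <- 100 * (tp + tn) / (tp + tn + fp + fn)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
tss       <- tp / (tp + fn) + tn / (tn + fp) - 1

c(accuracy = accuracy, F1 = f1, TSS = tss)
```

A model that labels everything as "TRUE" still reaches an F1 of 0.75 here while its TSS is 0, which illustrates the class-imbalance caveat noted in Details.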
Reduce dimensionality of raster layers.
Description
Reduce the number of layers by either performing a PCA on them or by eliminating highly correlated ones.
Usage
reduce(layers, method = "pca", n = NULL, thres = NULL)
Arguments
layers |
SpatRaster. As defined in package terra, see |
method |
character. Either Principal Components Analysis ("pca", default) or Pearson's correlation ("cor"). |
n |
numeric. Number of layers to reduce to. |
thres |
numeric. Value for pairwise Pearson's correlation above which one of the layers (randomly selected) is eliminated. |
Details
Using a large number of explanatory variables in models with few records may lead to overfitting. This function helps to avoid it as much as possible. If both n and thres are given, n has priority. If method is not recognized and layers come from the read function, only landcover is reduced, by using only the dominant land use of each cell.
Value
SpatRaster.
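The correlation-based option can be sketched on a data frame whose columns stand in for raster layers. `drop_correlated()` is a hypothetical helper, and comparing the absolute value of Pearson's r against the threshold is an assumption; reduce() may differ in both respects.

```r
set.seed(1)
a <- rnorm(200)
vars <- data.frame(a = a,
                   b = a + rnorm(200, sd = 0.05),  # nearly a copy of `a`
                   c = rnorm(200))                 # independent of `a`

# Hypothetical helper: repeatedly drop one variable from the most correlated
# pair until no pairwise |r| exceeds `thres`.
drop_correlated <- function(df, thres = 0.9) {
  repeat {
    r <- abs(cor(df))
    diag(r) <- 0
    if (ncol(df) < 2 || max(r) <= thres) return(df)
    pair <- which(r == max(r), arr.ind = TRUE)[1, ]
    df <- df[, -sample(pair, 1), drop = FALSE]     # random choice within the pair
  }
}

kept <- names(drop_correlated(vars))
kept
```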
Get SPECTRE raster segments.
Description
Downloads SPECTRE segments according to a bounding box selection.
Usage
spectre.area(
index,
ext = c(-180, 180, -60, 90),
normalize = FALSE,
filepath = NULL
)
Arguments
index |
numeric. A vector of integers specifying the layers. Refer to the list. |
ext |
numeric or SpatExtent. A vector of |
normalize |
character or logical. Either logical on whether data should be normalized
for the given interval or a character specifying a type of normalization. Type
default to "standard". Check |
filepath |
character. An optional user defined path for the final output. If |
Value
SpatRaster.
Examples
## Not run:
regional_threats = spectre.area(3, terra::ext(-17.3,-16.6,32.6,32.9), normalize = FALSE)
terra::plot(regional_threats[[1]], main = "Human Density")
## End(Not run)
Get in-text citations for SPECTRE layers
Description
Generate in-text citations for a selection of SPECTRE layers.
Usage
spectre.citations(index)
Arguments
index |
numeric. A vector of integers specifying the layers. Refer to the Details section. |
Details
The current layers in SPECTRE are:
- MINING_AREA. Mining density based on the number of known mining properties (pre-operational, operational, and closed) in a 50-cell radius (1x1 km cells).
- HAZARD_POTENTIAL. Number of significant hazards (earthquakes, volcanoes, landslides, floods, drought, cyclones) potentially affecting cells based on hazard frequency data.
- HUMAN_DENSITY. Continuous metric of population density.
- BUILT_AREA. Percentage metric indicating the built-up presence.
- ROAD_DENSITY. Continuous metric of road density.
- FOOTPRINT_PERC. Percentage metric indicating anthropogenic impacts on the environment.
- IMPACT_AREA. Classification of land into very low impact areas (1), low impact areas (2) and non-low impact areas (3).
- MODIF_AREA. Continuous 0-1 metric that reflects the proportion of a landscape that has been modified.
- HUMAN_BIOMES. Classification of land cover into different anthropogenic biomes of differing pressure such as dense settlements, villages and cropland.
- FIRE_OCCUR. Continuous metric of mean fire occurrence between 2006 and 2016.
- CROP_PERC_UNI. Percentage metric indicating the proportion of cropland in each cell.
- CROP_PERC_IIASA. Percentage metric indicating the proportion of cropland in each cell.
- LIVESTOCK_MASS. Estimated total amount of livestock wet biomass based on global livestock head counts.
- FOREST_LOSS_PERC. Continuous -100 to 100 metric of forest tree cover loss between 2007 and 2017.
- FOREST_TREND. Classification metric of 0 (no loss) or a discrete value from 1 to 17, representing loss (a stand-replacement disturbance or change from a forest to non-forest state) detected primarily in the years 2001-2019, respectively.
- NPPCARBON_GRAM. Quantity of carbon needed to derive food and fiber products (HANPP).
- NPPCARBON_PERC. HANPP as a percentage of local Net Primary Productivity.
- LIGHT_MCDM2. Continuous simulated zenith radiance data.
- FERTILIZER_LGHA. Continuous metric of kilograms of fertilizer used per hectare.
- TEMP_TRENDS. Continuous metric of temperature trends, based on the linear regression coefficients of mean monthly temperature for the years 1950 to 2019.
- TEMP_SIGNIF. Continuous metric of temperature trend significance, the temperature trend divided by its standard error.
- CLIM_EXTREME. Continuous metric calculated as the largest of the absolute values of the trend coefficients of the months with the lowest or highest mean temperatures.
- CLIM_VELOCITY. Continuous metric of the velocity of climate change, the ratio between TEMP_TRENDS and a local spatial gradient in mean temperature calculated as the slope of a plane fitted to the values of a 3x3 cell neighbourhood centered on each pixel.
- ARIDITY_TREND. Continuous metric of aridity trends, based on the linear regression coefficients of aridity, i.e. MPET/(MPRE+1), for the years 1990 to 2019.
Value
list. Contains two elements, both characters: the first a single character containing the in-text citations, the second a character of length x with the bibliographic citations.
Examples
sources = c(2,3)
out = spectre.citations(sources)
Get SPECTRE data from points.
Description
Downloads SPECTRE layer data according to a selection of points.
Usage
spectre.points(index, points)
Arguments
index |
numeric. A vector of integers specifying the layers. Refer to the documentation of
|
points |
data.frame or matrix. Containing point data coordinates, organized in longitude, latitude (longlat). |
Value
data.frame or matrix. Contains both the points given as well as their respective values for each layer specified.
Examples
## Not run:
localities = gecko.data("records")
local_threats = spectre.points(c(2,3), localities)
## End(Not run)
Download the SPECTRE template.
Description
Download the raster template for SPECTRE layers to your gecko work directory.
If you have not yet set up a work directory, it will be set up as if running gecko::gecko.setDir() with gisPath = NULL.
This is a large dataset that is prone to fail by timeout if downloaded
through R. Instead of using this function you can run gecko.setDir() (if you
haven't yet) and download the file at
https://github.com/VascoBranco/spectre.content/raw/main/spectre.template.zip.
Unzip its contents to a folder "./spectretemplate" inside the folder returned by gecko.getDir().
Usage
spectre.template()
Details
Reads a txt file pointing to where the world GIS files are stored.
Examples
## Not run:
spectre.template()
## End(Not run)
Make a raster layer SPECTRE compatible
Description
Transform a given raster object to the resolution, datum, projection and extent used in SPECTRE.
Usage
spectrify(layers, continuous = TRUE, filepath = NULL)
Arguments
layers |
SpatRaster. A raster object that you would like to be SPECTRE compatible. |
continuous |
logical. Whether the data present in |
filepath |
character. Optional file path to where the final raster layer
should be saved, in the format "folder/file.tif". If |
Value
SpatRaster.
Examples
## Not run:
# For the sake of demonstration we will transform our raster layer "range".
distribution = gecko.data("range")
standard_dist = spectrify(distribution)
terra::plot(standard_dist)
## End(Not run)
Split a dataset for model training
Description
Split a dataset for model training while keeping class representativity.
Usage
splitDataset(data, proportion)
Arguments
data |
dataframe. Containing some sort of classification data. The last column must contain the label data. |
proportion |
numeric. A value between 0 and 1 determining the proportion of the dataset split between training and testing. |
Value
list. First element is the train data, second element is the test data.
Examples
# Binary label case
my_data = data.frame(X = runif(20), Y = runif(20), Z = runif(20), Label =
c(rep("presence", 10), rep("outlier", 10)) )
splitDataset(my_data, 0.8)
# Multi label case
my_data = data.frame(X = runif(60), Y = runif(60), Z = runif(60), Label =
c(rep("A", 20), rep("B", 30), rep("C", 10)) )
splitDataset(my_data, 0.8)
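Keeping class representativity amounts to sampling the same proportion within each label. The base R sketch below shows one way this could be done; `stratified_split()` is a hypothetical helper and its rounding rule is an assumption, so its output may differ from splitDataset().

```r
set.seed(42)
my_data <- data.frame(X = runif(20), Y = runif(20),
                      Label = c(rep("presence", 10), rep("outlier", 10)))

# Hypothetical helper: sample the same proportion of rows within each class.
stratified_split <- function(data, proportion) {
  labels <- data[[ncol(data)]]            # the label is taken from the last column
  idx_by_class <- split(seq_len(nrow(data)), labels)
  train_idx <- unlist(lapply(idx_by_class, function(i) {
    i[sample.int(length(i), round(length(i) * proportion))]
  }))
  list(train = data[train_idx, ], test = data[-train_idx, ])
}

parts <- stratified_split(my_data, 0.8)
table(parts$train$Label)
table(parts$test$Label)
```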
Get a short summary of a given raster segment.
Description
Return a set of descriptive statistics of the given layer, either a specific one (minimum, q1, median, q3, maximum, median absolute deviation (mad), mean, standard deviation (sd)) or all of them.
Usage
stats(layer, plot = FALSE)
Arguments
layer |
SpatRaster. Raster object, as defined by package terra, with a single layer. |
plot |
logical. If TRUE, a histogram of raster values is drawn. |
Value
data.frame. If plot is TRUE, also outputs a histogram of the layer.
Examples
region = gecko.data("layers")
stats(region[[1]])
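The statistics listed in the Description have direct base R counterparts, shown here on a plain vector. Whether the package uses the same quantile type or the default scaling constant of `stats::mad()` is not stated in this manual, so the values below are only indicative.

```r
v <- c(1, 2, 3, 4, 100)
q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)

summary_df <- data.frame(minimum = min(v), q1 = q[1], median = q[2], q3 = q[3],
                         maximum = max(v), mad = mad(v),
                         mean = mean(v), sd = sd(v))
summary_df
```

The single extreme value pulls the mean and standard deviation far from the bulk of the data, while the median and the median absolute deviation barely react.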
Spatial thinning of occurrence records.
Description
Thinning of records with minimum distances either absolute or relative to the species range.
Usage
thin(longlat, distance = 0.01, relative = TRUE, runs = 100)
Arguments
longlat |
matrix. Matrix of longitude and latitude or eastness and northness (two columns in this order) of species occurrence records. |
distance |
numeric. Distance either in relative terms (proportion of maximum distance between any two records) or in raster units. |
relative |
logical. If |
runs |
numeric. Number of runs |
Details
Clumped distribution records due to ease of accessibility of sites, emphasis of sampling on certain areas in the past, etc. may bias species distribution models. The algorithm used here eliminates records closer than a given distance to any other record. The choice of records to eliminate is random, so a number of runs are made and the one keeping the most original records is chosen.
Value
A matrix of species occurrence records separated by at least the given distance.
Examples
userpar <- par(no.readonly = TRUE)
occ_points <- matrix(sample(100), ncol = 2)
par(mfrow=c(1,2))
graphics::plot(occ_points)
occ_points <- thin(occ_points, 0.1)
graphics::plot(occ_points)
par(userpar)
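The randomised procedure described in Details can be sketched in base R. `thin_once()` is a hypothetical helper written for this illustration, distances are planar Euclidean, and the removal order is one of several reasonable choices; none of this is claimed to reproduce thin() exactly.

```r
set.seed(7)
pts <- cbind(runif(50), runif(50))
min_dist <- 0.1 * max(dist(pts))   # relative: 10% of the largest pairwise distance

# Hypothetical helper: one randomised pass that removes records until no two
# remaining records are closer than `min_dist`.
thin_once <- function(pts, min_dist) {
  keep <- sample(nrow(pts))        # random processing order
  repeat {
    d <- as.matrix(dist(pts[keep, , drop = FALSE]))
    diag(d) <- Inf
    too_close <- which(apply(d, 1, min) < min_dist)
    if (length(too_close) == 0) return(pts[keep, , drop = FALSE])
    keep <- keep[-too_close[1]]
  }
}

# Repeat the random pass and keep the run that retains the most records.
runs <- lapply(1:20, function(i) thin_once(pts, min_dist))
best <- runs[[which.max(vapply(runs, nrow, integer(1)))]]
nrow(best)
```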