Title: | Spatial Methods and Indices |
Version: | 2.2.1 |
Author: | Gudrun Carl [cre, aut], Ingolf Kuehn [aut], Sam Levin [aut] |
Maintainer: | Sam Levin <levisc8@gmail.com> |
URL: | https://github.com/levisc8/spind |
BugReports: | https://github.com/levisc8/spind/issues |
Description: | Functions for spatial methods based on generalized estimating equations (GEE) and wavelet-revised methods (WRM), functions for scaling by wavelet multiresolution regression (WMRR), conducting multi-model inference, and stepwise model selection. Further, contains functions for spatially corrected model accuracy measures. |
Depends: | R (≥ 3.0.0) |
Imports: | gee (≥ 4.13.19), geepack (≥ 1.2.1), MASS (≥ 7.3.49), splancs (≥ 2.1.40), lattice (≥ 0.20.35), waveslim (≥ 1.7.5), rje (≥ 1.9), stringr (≥ 1.3.1), ggplot2 (≥ 3.0.0), RColorBrewer (≥ 1.1.2), rlang(≥ 0.2.1) |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
Suggests: | knitr (≥ 1.15.1), rmarkdown (≥ 1.9), gridExtra (≥ 2.3), testthat (≥ 2.0.0), PresenceAbsence (≥ 1.1.9), sp (≥ 1.2.7), covr (≥ 3.1.0) |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2020-10-30 14:12:33 UTC; sl13sise |
Repository: | CRAN |
Date/Publication: | 2020-10-30 16:50:02 UTC |
GEE (Generalized Estimating Equations)
Description
GEE
provides GEE-based methods from the packages gee and geepack
to account for spatial autocorrelation in multiple linear regressions
Usage
GEE(
formula,
family,
data,
coord,
corstr = "fixed",
cluster = 3,
moran.params = list(),
plot = FALSE,
scale.fix = FALSE,
customize_plot = NULL
)
## S3 method for class 'GEE'
plot(x, ...)
## S3 method for class 'GEE'
predict(object, newdata, ...)
## S3 method for class 'GEE'
summary(object, ..., printAutoCorPars = TRUE)
Arguments
formula |
Model formula. Variable names must match variables in |
family |
|
data |
A data frame with variable names that match the variables
specified in |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
corstr |
Expected autocorrelation structure:
|
cluster |
Cluster size for cluster models
|
moran.params |
A list of parameters for calculating Moran's I.
|
plot |
A logical value indicating whether autocorrelation of
residuals should be plotted. NOW DEPRECATED in favor of |
scale.fix |
A logical indicating whether or not the scale parameter should
be fixed. The default is |
customize_plot |
Additional plotting parameters passed to |
x |
An object of class |
... |
Not used. |
object |
An object of class |
newdata |
A data frame containing variables to base the predictions on. |
printAutoCorPars |
A logical indicating whether to print the working autocorrelation parameters |
Details
GEE can be used to fit linear models for response variables with
different distributions: gaussian
, binomial
, or poisson
.
As a spatial model, it is a generalized linear model in which the residuals
may be autocorrelated. It accounts for spatial (2-dimensional)
autocorrelation of the residuals in cases of regular gridded datasets
and returns corrected parameter estimates. The grid cells are assumed to be square.
Furthermore, this function requires that all predictor variables
be continuous.
Value
An object of class GEE
. This consists of a list with the
following elements:
call
Call
formula
Model formula
family
Family
coord
Coordinates used for the model
corstr
User-selected correlation structure
b
Estimate of regression parameters
s.e.
Standard errors of the estimates
z
Depending on the
family
, either a z or t valuep
p-values for each parameter estimate
scale
Scale parameter (dispersion parameter) of the distribution's variance
scale.fix
Logical indicating whether
scale
has fixed valuecluster
User-specified cluster size for clustered models
fitted
Fitted values from the model
resid
Normalized Pearson residuals
w.ac
Working autocorrelation parameters
Mat.ac
Working autocorrelation matrix
QIC
Quasi Information Criterion. See
qic.calc
for further detailsQLik
Quasi-likelihood. See
qic.calc
for further detailsplot
Logical value indicating whether autocorrelation should be plotted
moran.params
Parameters for calculating Moran's I
v2
Parameter variance of the
GEE
modelvar.naive
Parameter variance of the
independence
modelac.glm
Autocorrelation of GLM residuals
ac.gee
Autocorrelation of GEE residuals
plot
An object of class
ggplot
containing information on the autocorrelation of residuals from the fittedGEE
and aGLM
Elements can be viewed using the summary.GEE
methods included in
the package.
Note
When using corstr = "fixed"
on large data sets, the function
may return an error, as the resulting variance-covariance matrix is too
large for R to handle. If this happens, one will have to use one of the
cluster models (i.e quadratic, exchangeable
).
Author(s)
Gudrun Carl, Sam Levin
References
Carl G & Kuehn I, 2007. Analyzing Spatial Autocorrelation in Species Distributions using Gaussian and Logit Models, Ecol. Model. 207, 159 - 170
Carey, V. J., 2006. Ported to R by Thomas Lumley (versions 3.13, 4.4, version 4.13)., B. R. gee: Generalized Estimation Equation solver. R package version 4.13-11.
Yan, J., 2004. geepack: Generalized Estimating Equation Package. R package version 0.2.10.
See Also
Examples
data(musdata)
coords<- musdata[,4:5]
## Not run:
mgee <- GEE(musculus ~ pollution + exposure,
family = "poisson",
data = musdata,
coord = coords,
corstr = "fixed",
scale.fix = FALSE)
summary(mgee, printAutoCorPars = TRUE)
pred <- predict(mgee, newdata = musdata)
library(ggplot2)
plot(mgee)
my_gee_plot <- mgee$plot
# move the legend to a new position
print(my_gee_plot + ggplot2::theme(legend.position = 'top'))
## End(Not run)
Wavelet-revised models (WRMs)
Description
A wavelet-based method to remove spatial autocorrelation in multiple linear regressions. Wavelet transforms are implemented using waveslim (Whitcher, 2005).
Usage
WRM(
formula,
family,
data,
coord,
level = 1,
wavelet = "haar",
wtrafo = "dwt",
b.ini = NULL,
pad = list(),
control = list(),
moran.params = list(),
plot = FALSE,
customize_plot = NULL
)
## S3 method for class 'WRM'
plot(x, ...)
## S3 method for class 'WRM'
summary(object, ...)
## S3 method for class 'WRM'
predict(object, newdata, sm = FALSE, newcoord = NA, ...)
Arguments
formula |
Model formula. Variable names must match variables in |
family |
|
data |
A data frame with variable names that match the variables specified in |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
level |
An integer specifying the degree of wavelet decomposition
|
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
b.ini |
Initial parameter values. Default is NULL. |
pad |
A list of parameters for padding wavelet coefficients.
|
control |
a list of parameters for controlling the fitting process.
|
moran.params |
A list of parameters for calculating Moran's I.
|
plot |
A logical value indicating whether to plot autocorrelation of
residuals by distance bin. NOW DEPRECATED in favor of |
customize_plot |
Additional plotting parameters passed to |
x |
An object of class |
... |
Not used |
object |
An object of class |
newdata |
A data frame containing variables used to make predictions. |
sm |
Logical. Should part of smooth components be included? |
newcoord |
New coordinates corresponding to observations in |
Details
WRM can be used to fit linear models for response vectors of different
distributions: gaussian
, binomial
, or poisson
.
As a spatial model, it is a generalized linear model in which the residuals
may be autocorrelated. It corrects for 2-dimensional residual
autocorrelation for regular gridded data sets using the wavelet
decomposition technique. The grid cells are assumed to be square.
Furthermore, this function requires that all predictor variables
be continuous.
Value
An object of class WRM
. This consists of a list with the
following elements:
call
Call
formula
Model formula
family
Family
coord
Coordinates used in the model
b
Estimate of regression parameters
s.e.
Standard errors
z
Depending on the
family
, either a z or t valuep
p-values
fitted
Fitted values from the model
resid
Pearson residuals
b.sm
Parameter estimates of neglected smooth part
fitted.sm
Fitted values of neglected smooth part
level
Selected level of wavelet decomposition
wavelet
Selected wavelet
wtrafo
Selected wavelet transformation
padzone
Selected padding zone expansion factor
padform
Selected matrix padding type
n.eff
Effective number of observations
AIC
Akaike information criterion
AICc
AIC score corrected for small sample sizes
LogLik
Log likelihood of the model
ac.glm
Autocorrelation of GLM residuals
ac.wrm
Autocorrelation of WRM residuals
b.ini
Initial parameter values
control
Control parameters for the fitting process
moran.params
Parameters for calculating Moran's I
pad
List of parameters for padding wavelet coefficients
plot
An object of class
ggplot
containing information on the autocorrelation of residuals from the fittedWRM
and aGLM
Note
For those interested in multimodel inference approaches, WRM
with
level = 1
is identical to mmiWMRR
with scale = 1
.
Author(s)
Gudrun Carl, Sam Levin
References
Carl, G., Kuehn, I. (2010): A wavelet-based extension of generalized linear models to remove the effect of spatial autocorrelation. Geographical Analysis 42 (3), 323 - 337
Whitcher, B. (2005) Waveslim: basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.
See Also
mmiWMRR
, predict.WRM
, summary.WRM
,
aic.calc
Examples
data(musdata)
coords <- musdata[,4:5]
## Not run:
mwrm <- WRM(musculus ~ pollution + exposure,
family = "poisson",
data = musdata,
coord = coords,
level = 1)
pred <- predict(mwrm, newdata = musdata)
summary(mwrm)
plot(mwrm)
library(ggplot2)
my_wrm_plot <- mwrm$plot
# increase axis text size
print(my_wrm_plot + ggplot2::theme(axis.text = element_text(size = 15)))
## End(Not run)
Spatial autocorrelation diagnostics
Description
A function for calculating spatial autocorrelation using Moran's I.
Usage
acfft(coord, f, lim1 = 1, lim2 = 2, dmax = 10)
Arguments
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
f |
A vector which is the same length as |
lim1 |
Lower bound for first bin. Default is 1 |
lim2 |
Upper bound for first bin. Default is 2 |
dmax |
Number of distance bins to examine. Bins are formed by annuli of gradually increasing radii. Default is 10. |
Value
A vector of Moran's I values for each distance bin.
Author(s)
Gudrun Carl
Examples
data(musdata)
coords <- musdata[ ,4:5]
mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata)
ac <- acfft(coords, resid(mglm, type = "pearson"), lim1 = 0, lim2 = 1)
ac
Adjusted actual values
Description
Adjusts actual presence/absence data based on the autocorrelation in the predictions of a model. The function will optionally plot results of model predictions, un-modified actual presence/absence, and adjusted values.
Usage
adjusted.actuals(data, coord, plot.maps = FALSE, color.maps = FALSE)
Arguments
data |
a dataframe or matrix containing actual presence/absence (binary, 0 or 1) values in 1st column and predicted values (numeric between 0 and 1) in 2nd column. |
coord |
a matrix of two columns of the same length providing integer,
consecutively numbered coordinates for each occurrence and prediction in
|
plot.maps |
A logical indicating whether maps should be plotted. Default is FALSE. |
color.maps |
A logical value. If |
Value
A vector of adjusted actual values.
Author(s)
Gudrun Carl
Examples
data(hook)
data <- hook[ ,1:2]
coord <- hook[ ,3:4]
aa <- adjusted.actuals(data, coord, plot.maps = TRUE)
Akaike Information Criterion with correction for sample size
Description
Calculates AIC and AICc
Usage
aic.calc(formula, family, data, mu, n.eff = NULL)
Arguments
formula |
A model formula |
family |
Family used to fit the model. |
data |
A data frame |
mu |
Fitted values from a model |
n.eff |
Effective number of observations. Default is NULL |
Value
A list with the following components
loglik
Log likelihood of the model
df
Degrees of freedom
AIC
AIC score for the specified model
AICc
AIC score corrected for small sample sizes
Author(s)
Gudrun Carl, Sam Levin
Examples
data(musdata)
coords <- musdata[ ,4:5]
mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata)
aic <- aic.calc(musculus ~ pollution + exposure, "poisson", musdata,
mglm$fitted)
aic$AIC
Carlina data set
Description
A data frame containing simulated count data for the thistle, Carlina horrida.
Usage
carlinadata
Format
A data frame with 961 rows and 5 columns
- carlina.horrida
integer - Simulated count data
- aridity
numeric - Simulated aridity index values. This variable has high spatial autocorrelation values.
- land.use
numeric - Simulated land use intensity. This variable has no spatial autocorrelation.
- x
integer - x-coordinates for each grid cell
- y
integer - y-coordinates for each grid cell
Plot wavelet variance/covariance
Description
Plots the wavelet variance or covariance for the specified formula. The scale-dependent results are graphically displayed.
Usage
covar.plot(
formula,
data,
coord,
wavelet = "haar",
wtrafo = "dwt",
plot = "covar",
customize_plot = NULL
)
Arguments
formula |
With specified notation according to names in data frame. |
data |
Data frame. |
coord |
A matrix of 2 columns with corresponding x,y-coordinates which have to be integer. |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
plot |
Either |
customize_plot |
Additional plotting parameters passed to |
Details
Each variable or pair of variables in formula
is passed to wavevar
or
wavecovar
internally, and the result is plotted as a function of level
.
Value
A list containing
1. result
= a vector of results.
2. plot
= a ggplot
object
Author(s)
Gudrun Carl
See Also
Examples
data(carlinadata)
coords <- carlinadata[,4:5]
covariance <- covar.plot(carlina.horrida ~ aridity + land.use - 1,
data = carlinadata,
coord = coords,
wavelet = "d4",
wtrafo = 'modwt',
plot = 'covar')
covariance$plot
covariance$result
variance <- covar.plot(carlina.horrida ~ aridity + land.use - 1,
data = carlinadata,
coord = coords,
wavelet = "d4",
wtrafo = 'modwt',
plot = 'var')
variance$plot
variance$result
Hook data set
Description
A data frame containing actual presence absence data and predicted probability of occurrence values.
Usage
hook
Format
A data frame with 100 rows and 4 columns
- actuals
integer - Presence/absence records
- predictions
numeric - predicted probabilities of occurrence
- x
integer - x-coordinates for each grid cell
- y
integer - y-coordinates for each grid cell
Multi-model inference for GEE models
Description
mmiGEE is a multimodel inference approach evaluating the relative
importance of predictors used in GEE
.
@details It performs automatically generated model selection and creates a model selection table according to the approach of multi-model inference (Burnham & Anderson, 2002). QIC is used to obtain model selection weights and to rank the models. Moreover, mmiGEE calculates relative variable importance of a given model. Finally, this function requires that all predictor variables be continuous.
Usage
mmiGEE(object, data, trace = FALSE)
Arguments
object |
A model of class |
data |
A data frame or set of vectors of equal length. |
trace |
A logical indicating whether or not to print results to console. |
Details
Calculates the relative importance of each variable
using multi-model inference methods in a Generalized Estimating Equations
framework implemented in GEE
.
Value
mmiGEE
returns a list containing the following elements
result
A matrix containing slopes, degrees of freedom, quasilikelihood, QIC, delta, and weight values for the set of candidate models. The models are ranked by QIC.
rvi
A vector containing the relative importance of each variable in the regression.
Author(s)
Gudrun Carl, Sam Levin
References
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference. Springer, New York.
Carl G & Kuehn I, 2007. Analyzing Spatial Autocorrelation in Species Distributions using Gaussian and Logit Models, Ecol. Model. 207, 159 - 170
See Also
Examples
# data (for demonstration only)
library(MASS)
data(birthwt)
# impose an artificial (not fully appropriate) grid structure
x <- rep(1:14, 14)
y <- as.integer(gl(14, 14))
coords <- cbind(x[-(190:196)], y[-(190:196)])
## Not run:
formula <- formula(low ~ race + smoke + bwt)
mgee <- GEE(formula,
family = "gaussian",
data = birthwt,
coord = coords,
corstr = "fixed",
scale.fix = TRUE)
mmi <- mmiGEE(mgee, birthwt)
## End(Not run)
Multi-model inference for wavelet multiresolution regression
Description
mmiWMRR is a multimodel inference approach evaluating the relative
importance of predictors used in scaleWMRR
.
Usage
mmiWMRR(object, data, scale, detail = TRUE, trace = FALSE)
Arguments
object |
A model of class |
data |
Data frame. |
scale |
0 or higher integers possible (limit depends on sample size).
|
detail |
Remove smooth wavelets? If |
trace |
Logical value indicating whether to print results to console. |
Details
It performs automatically
generated model selection and creates a model
selection table according to the approach of multi-model inference
(Burnham & Anderson, 2002). The analysis is carried out for scale-specific
regressions (i.e. where scaleWMRR
can be used). AIC is
used to obtain model
selection weights and to rank the models.
Furthermore, this function requires that all predictor variables
be continuous.
Value
mmiWMRR
returns a list containing the following elements
result
A matrix containing slopes, degrees of freedom, likelihood, AIC, delta, and weight values for the set of candidate models. The models are ranked by AIC.
level
An integer corresponding to scale
Author(s)
Gudrun Carl
References
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference. Springer, New York.
Carl G, Doktor D, Schweiger O, Kuehn I (2016) Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43: 2502-2512.
See Also
aic.calc
, rvi.plot
,
MuMIn, WRM
Examples
data(carlinadata)
coords <- carlinadata[ ,4:5]
## Not run:
wrm <- WRM(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
level = 1,
wavelet = "d4")
mmi <- mmiWMRR(wrm,
data = carlinadata,
scale = 3,
detail = TRUE,
trace = FALSE)
## End(Not run)
Mus musculus data set
Description
A data frame containing simulated count data of a house mouse.
Usage
musdata
Format
A data frame with 400 rows and 5 columns
- musculus
integer - Simulated count data for Mus musculus
- pollution
numeric - Simulated variable that describes degree of pollution in corresponding grid cell
- exposure
numeric - Simulated variable that describes degree of exposure for each grid cell
- x
integer - x-coordinates for each grid cell
- y
integer - y-coordinates for each grid cell
Quasi-Information Criterion for Generalized Estimating Equations
Description
A function for calculating quasi-likelihood and Quasi-Information Criterion values based on the method of Hardin & Hilbe (2003).
Usage
qic.calc(formula, family, data, mu, var.robust, var.indep.naive)
Arguments
formula |
a model formula |
family |
|
data |
a data frame |
mu |
fitted values from a model |
var.robust |
variance of model parameters |
var.indep.naive |
naive variance of model parameters under the
|
Value
A list with the following components:
QIC
quasi-information criterion
loglik
quasi-likelihood
References
Hardin, J.W. & Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall, New York.
Barnett et al. Methods in Ecology & Evolution 2010, 1, 15-24.
Relative Variable Importance
Description
Creates model selection tables, calculates and plots relative variable importance based on the scale level of a given model.
Usage
rvi.plot(
formula,
family,
data,
coord,
maxlevel,
detail = TRUE,
wavelet = "haar",
wtrafo = "dwt",
n.eff = NULL,
trace = FALSE,
customize_plot = NULL
)
Arguments
formula |
A model formula |
family |
|
data |
A data frame or set of vectors of equal length. |
coord |
X,Y coordinates for each observation. Coordinates should be consecutive integers. |
maxlevel |
An integer for maximum scale level |
detail |
Remove smooth wavelets? If |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
n.eff |
A numeric value of effective sample size |
trace |
Should R print progress updates to the console? Default is FALSE |
customize_plot |
Additional plotting parameters passed to |
Details
Calculates the relative importance of each variable
using multi-model inference methods in a wavelet multi-resolution regression
framework implemented in mmiWMRR
. The scale level dependent
results are then graphically displayed.
Value
A list containing
1. A matrix containing the relative importance of each variable in the regression at each value of the scale level.
2. A ggplot
object containing a plot of the relative
variable importance
Examples
data(carlinadata)
coords<- carlinadata[,4:5]
## Not run:
wrm <- WRM(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
level = 1,
wavelet = "d4")
mmi <- mmiWMRR(wrm, data = carlinadata, scale = 3, detail = TRUE)
# Plot scale-dependent relative variable importance
rvi <- rvi.plot(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
maxlevel = 4,
detail = TRUE,
wavelet = "d4")
rvi$plot
rvi$rvi
## End(Not run)
Scaling by wavelet multiresolution regression (WMRR)
Description
scaleWMRR performs a scale-specific regression based on a wavelet multiresolution analysis.
Usage
scaleWMRR(
formula,
family,
data,
coord,
scale = 1,
detail = TRUE,
wavelet = "haar",
wtrafo = "dwt",
b.ini = NULL,
pad = list(),
control = list(),
moran.params = list(),
trace = FALSE
)
Arguments
formula |
With specified notation according to names in data frame. |
family |
|
data |
Data frame. |
coord |
Corresponding coordinates which have to be integer. |
scale |
0 (which is equivalent to GLM) or higher integers possible (limit depends on sample size). |
detail |
Remove smooth wavelets? If |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
b.ini |
Initial parameter values. Default is |
pad |
A list of parameters for padding wavelet coefficients.
|
control |
A list of parameters for controlling the fitting process.
|
moran.params |
A list of parameters for calculating Moran's I.
|
trace |
A logical value indicating whether to print parameter estimates to the console |
Details
This function fits generalized linear models while taking the
two-dimensional grid structure of
datasets into account. The following error distributions (in
conjunction with appropriate link functions) are allowed: gaussian
,
binomial
, or poisson
. The model provides scale-specific
results for data sampled on a contiguous geographical area. The
dataset is assumed to be regular gridded and the grid cells are
assumed to be square. A function from the package 'waveslim' is used
for the wavelet transformations (Whitcher, 2005).
Furthermore, this function requires that all predictor variables
be continuous.
Value
scaleWMRR returns a list containing the following elements
call
Model call
b
Estimates of regression parameters
s.e.
Standard errors of the parameter estimates
z
Z values (or corresponding values for statistics)
p
p-values for each parameter estimate
df
Degrees of freedom
fitted
Fitted values
resid
Pearson residuals
converged
Logical value whether the procedure converged
trace
Logical. If TRUE:
ac.glm
Autocorrelation of glm.residualsac
Autocorrelation of wavelet.residuals
Author(s)
Gudrun Carl
References
Carl G, Doktor D, Schweiger O, Kuehn I (2016) Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43: 2502-2512.
Whitcher, B. (2005) Waveslim: basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.
See Also
waveslim,mra.2d
Examples
data(carlinadata)
coords <- carlinadata[ ,4:5]
## Not run:
# scaleWMRR at scale = 0 is equivalent to GLM
ms0 <- scaleWMRR(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
scale = 0,
trace = TRUE)
# scale-specific regressions for detail components
ms1 <- scaleWMRR(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
scale = 1,
trace = TRUE)
ms2 <- scaleWMRR(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
scale = 2,
trace = TRUE)
ms3<- scaleWMRR(carlina.horrida ~ aridity + land.use,
family = "poisson",
data = carlinadata,
coord = coords,
scale = 3,
trace = TRUE)
## End(Not run)
spind: Spatial Methods and Indices
Description
The spind
package provides convenient implementation of Generalized
estimating equations (GEEs) and Wavelet-revised models (WRMs)
in the context of spatial models. It also provides tools for
multi-model inference, stepwise model selection, and spatially
corrected model diagnostics. This help section provides brief descriptions
of each function and is organized by the type of model they apply to
or the scenarios in which you might use them. Of course, these are
recommendations - feel free to use them as you see fit. For a more
detailed description of the package and its functions, please see
the vignette Intro to spind (browseVignettes('spind')
).
GEEs
The GEE
function fits spatial models using a generalized
estimating equation and a set of gridded data. The package
also includes S3 methods for summary
and predict
so you can interact with these models in the same way you might
interact with a glm
or lm
.
WRMs
The WRM
function fits spatial models using a wavelet-revised
model and a set of gridded data. The package
also includes S3 methods for summary
and predict
so you can interact with these models in the same way you might
interact with a glm
or lm
. There are also a number
of helper functions that help you fine tune the fitting process
that are specific to WRMs. Please see the documentation for
WRM
for more details on those.
WRM
also has a few other features specific to it. For example, if
you are interested in viewing the variance or covariance of your variables
as a function of level
, covar.plot
is useful. upscale
will plot your matrices as a function of level
so you can examine the
effect of cluster resolution on your results.
Multi-model inference and stepwise model selection
spind
includes a couple of functions to help you find the best fit
for your data. The first two are multimodel inference tools specific to GEEs
and WRMs and are called mmiGEE
and mmiWMRR
. These generate
outputs very similar to those from the MuMIn
package. If you would
like to see how variable importance changes as a function of the scale
of
the WMRR, you can call rvi.plot
. This will generate a model selection
table for each degree of level
(from 1 to maxlevel
) and then
plot the weight of each variable as a function of level
.
spind
also includes a function for stepwise model selection that is
loosely based on step
and stepAIC
. step.spind
differs
from these in that it is specific to classes WRM
and GEE
. It
performs model selection using AIC or AICc for WRMs and QIC for GEEs.
Spatial indices of goodness of fit
Finally, spind
has a number of functions that provide spatially
corrected goodness of fit diagnostics for any type of model (i.e. they
are not specific to classes WRM
or GEE
). These first appeared
in spind v1.0
and have not been updated in this release. The first two are
divided into whether or not they are threshold dependent or not. Threshold
dependent metrics can be calculated using th.dep
and threshold
independent metrics can be calculated using th.indep
.
acfft
calculates spatial autocorrelation of residuals from a model
using Moran's I. You can set the number of distance bins you'd
like to examine using dmax
argument and the size of those bins
using lim1
and lim2
.
Conclusion
The vignette titled Intro to spind provides more information on these functions and some example workflows that will demonstrate them in greater depth than this document. Of course, if you have suggestions on how to improve this document or any of the other ones in here, please don't hesitate to contact us.
Stepwise model selection for GEEs and WRMs
Description
Stepwise model selection by AIC or AICc for WRMS and QIC for GEEs
Usage
step.spind(object, data, steps = NULL, trace = TRUE, AICc = FALSE)
Arguments
object |
A model of class |
data |
The data used to fit that model. |
steps |
Number of iterations the procedure should go through before concluding. The default is to use the number of variables as the number of iterations. |
trace |
Should R print progress updates and the final, best model found
to the console? Default is |
AICc |
Logical. In the case of model selection with |
Details
This function performs stepwise variable elimination for model comparison. Each iteration will try to find the best combination of predictors for a given number of variables based on AIC, AICc, or QIC, and then use that as the base model for the next iteration until there are no more variables to eliminate. Alternatively, it will terminate when reducing the number of variables while respecting the model hierarchy no longer produces lower information criterion values.
Value
A list with components model
and table
.
model
is always formula for the best model found by the procedure.
table
is always a data frame, but the content varies for each type of
model.
For WRM
's, the columns
returned are
Deleted.Vars
Variables retained from the previous iteration which were tested in the current iteration.LogLik
Log-likelihood of the model.AIC
AIC score for the model.AICc
AICc score for the model.
For GEE
s:
Deleted.Vars
Variables retained from the previous iteration which were tested in the current iteration.QIC
Quasi-information criterion of the model.Quasi.Lik
Quasi-likelihood of the model.
Note
Currently, the function only supports backwards model selection (i.e. one must start with a full model and subtract variables). Forward and both directions options may be added later.
Author(s)
Sam Levin
References
Hardin, J.W. & Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall, New York.
See Also
qic.calc
, aic.calc
, add1
,
step
, stepAIC
Examples
# For demonstration only. We are artificially imposing a grid structure
# on data that is not actually spatial data
library(MASS)
data(birthwt)
x <- rep(1:14, 14)
y <- as.integer(gl(14, 14))
coords <- cbind(x[-(190:196)], y[-(190:196)])
## Not run:
formula <- formula(low ~ age + lwt + race + smoke + ftv + bwt)
mgee <- GEE(formula,
family = "gaussian",
data = birthwt,
coord = coords,
corstr = "fixed",
scale.fix = TRUE)
ss <- step.spind(mgee, birthwt)
best.mgee <- GEE(ss$model,
family = "gaussian",
data = birthwt,
coord = coords,
corstr = "fixed",
scale.fix = TRUE)
summary(best.mgee, printAutoCorPars = FALSE)
## End(Not run)
Spatial threshold-dependent accuracy measures
Description
Calculates spatially corrected, threshold-dependent metrics for an observational data set and model predictions (Kappa and confusion matrix)
Usage
th.dep(data, coord, thresh = 0.5, spatial = TRUE)
Arguments
data |
A data frame or matrix with two columns. The first column should contain actual presence/absence data (binary, 0 or 1) and the second column should contain model predictions of probability of occurrence (numeric, between 0 and 1). |
coord |
A data frame or matrix with two columns containing x,y coordinates for each actual and predicted value. Coordinates must be integer and consecutively numbered. |
thresh |
A cutoff value for classifying predictions as modeled presences or modeled absences. Default is 0.5. |
spatial |
A logical indicating whether spatially corrected indices (rather than classical indices) should be computed. |
Value
A list with the following components:
kappa
Kappa statistic
cm
Confusion matrix
sensitivity
Sensitivity
specificity
Specificity
actuals
Actual occurrence data or adjusted actual occurrence data
splitlevel.pred
Level splitting of predicted values
splitlevel.act
Level splitting of actuals or adjusted actuals
splitposition.pred
Position splitting of predicted values
splitposition.act
Position splitting of actuals or adjusted actuals
Author(s)
Gudrun Carl
References
Carl G, Kuehn I (2017) Spind: a package for computing spatially corrected accuracy measures. Ecography 40: 675-682. DOI: 10.1111/ecog.02593
See Also
Examples
data(hook)
data <- hook[ ,1:2]
coord <- hook[ ,3:4]
si1 <- th.dep(data, coord, spatial = TRUE)
si1$kappa
si1$cm
Spatial threshold-independent accuracy measures
Description
Calculates spatially corrected, threshold-independent metrics for an observational data set and model predictions (AUC, ROC, max-TSS)
Usage
th.indep(data, coord, spatial = TRUE, plot.ROC = FALSE, customize_plot = NULL)
Arguments
data |
A data frame or matrix with two columns. The first column should contain actual presence/absence data (binary, 0 or 1) and the second column should contain model predictions of probability of occurrence (numeric, between 0 and 1). |
coord |
A data frame or matrix with two columns containing x,y coordinates for each actual and predicted value. Coordinates must be integer and consecutively numbered. |
spatial |
A logical value indicating whether spatial corrected indices (rather than classical indices) should be computed. |
plot.ROC |
A logical indicating whether the ROC should be plotted. NOW DEPRECATED. |
customize_plot |
Additional plotting parameters passed to |
Value
A list with the following components:
AUC
Area under curve
opt.thresh
optimal threshold for maximum TSS value
TSS
Maximum TSS value
sensitivity
Sensitivity
Specificity
Specificity
AUC.plot
A
ggplot
object
Author(s)
Gudrun Carl
References
Carl G, Kuehn I (2017) Spind: a package for computing spatially corrected accuracy measures. Ecography 40: 675-682. DOI: 10.1111/ecog.02593
See Also
Examples
data(hook)
data <- hook[ ,1:2]
coord <- hook[ ,3:4]
si2 <- th.indep(data, coord, spatial = TRUE)
si2$AUC
si2$TSS
si2$opt.thresh
si2$plot
Upscaling of smooth components
Description
The analysis is based a wavelet multiresolution analysis using only smooth wavelet components. It is a 2D analysis taking the grid structure and provides scale-specific results for data sampled on a contiguous geographical area. The dataset is assumed to be regular gridded and the grid cells are assumed to be square. The scale-dependent results are graphically displayed.
Usage
upscale(
f,
coord,
wavelet = "haar",
wtrafo = "dwt",
pad = mean(f),
color.maps = FALSE
)
Arguments
f |
A vector. |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
pad |
A numeric value for padding the matrix into a bigger square. Default is set to mean(f). |
color.maps |
A logical value. If |
Value
A set of plots showing the matrix image at each value for
level
.
Author(s)
Gudrun Carl
Examples
data(carlinadata)
coords <- carlinadata[ ,4:5]
# Upscaling of smooth components
upscale(carlinadata$land.use, coord = coords)
Wavelet covariance analysis
Description
Calculates the wavelet covariance based on a wavelet multiresolution analysis.
Usage
wavecovar(f1, f2, coord, wavelet = "haar", wtrafo = "dwt")
Arguments
f1 |
A vector of length n. |
f2 |
A vector of length n. |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
Value
Wavelet covariance for f1
and f2
.
Author(s)
Gudrun Carl
See Also
waveslim, WRM
, covar.plot
,
scaleWMRR
Examples
data(carlinadata)
coords <- carlinadata[ ,4:5]
pc <- covar.plot(carlina.horrida ~ aridity + land.use,
data = carlinadata,
coord = coords,
wavelet = 'd4',
wtrafo = 'modwt',
plot = 'covar')
pc$plot
Wavelet variance analysis
Description
Calculates the wavelet variance based on a wavelet multiresolution analysis.
Usage
wavevar(f, coord, wavelet = "haar", wtrafo = "dwt")
Arguments
f |
A vector |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
Value
Wavelet variance for f
.
Author(s)
Gudrun Carl
See Also
waveslim, WRM
, covar.plot
,
scaleWMRR
Examples
data(carlinadata)
coords <- carlinadata[ ,4:5]
pv <- covar.plot(carlina.horrida ~ aridity + land.use,
data = carlinadata,
coord = coords,
wavelet = 'd4',
wtrafo = 'modwt',
plot = 'var')
pv$plot