Title: | Spatiotemporal Autoregression Analyses for Large Data Sets |
Version: | 1.0.4 |
Description: | These tools were created to test map-scale hypotheses about trends in large remotely sensed data sets but any data with spatial and temporal variation can be analyzed. Tests are conducted using the PARTS method for analyzing spatially autocorrelated time series (Ives et al., 2021: <doi:10.1016/j.rse.2021.112678>). The method's unique approach can handle extremely large data sets that other spatiotemporal models cannot, while still appropriately accounting for spatial and temporal autocorrelation. This is done by partitioning the data into smaller chunks, analyzing chunks separately and then combining the separate analyses into a single, correlated test of the map-scale hypotheses. |
URL: | https://github.com/morrowcj/remotePARTS |
BugReports: | https://github.com/morrowcj/remotePARTS/issues |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
LazyData: | TRUE |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 4.0) |
Imports: | stats, geosphere (≥ 1.5.10), Rcpp (≥ 1.0.5), CompQuadForm, foreach, parallel, iterators, doParallel |
Suggests: | dplyr (≥ 1.0.0), data.table, knitr, rmarkdown, markdown, sqldf, devtools, ggplot2, reshape2, sf |
LinkingTo: | Rcpp, RcppEigen |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2023-09-15 15:59:14 UTC; morrowcj |
Author: | Clay Morrow |
Maintainer: | Clay Morrow <morrowcj@outlook.com> |
Repository: | CRAN |
Date/Publication: | 2023-09-15 19:52:13 UTC |
remotePARTS: Spatiotemporal Autoregression Analyses for Large Data Sets
Description
These tools were created to test map-scale hypotheses about trends in large remotely sensed data sets but any data with spatial and temporal variation can be analyzed. Tests are conducted using the PARTS method for analyzing spatially autocorrelated time series (Ives et al., 2021: doi:10.1016/j.rse.2021.112678). The method's unique approach can handle extremely large data sets that other spatiotemporal models cannot, while still appropriately accounting for spatial and temporal autocorrelation. This is done by partitioning the data into smaller chunks, analyzing chunks separately and then combining the separate analyses into a single, correlated test of the map-scale hypotheses.
Author(s)
Maintainer: Clay Morrow morrowcj@outlook.com (ORCID)
Authors:
Anthony Ives arives@wisc.edu (ORCID)
See Also
Useful links:
Report bugs at https://github.com/morrowcj/remotePARTS/issues
fit a parallel partitioned GLS
Description
fit a GLS model to a large data set by partitioning the data into smaller pieces (partitions) and processing these pieces individually and summarizing output across partitions to conduct hypothesis tests.
Usage
MC_GLSpart(
formula,
partmat,
formula0 = NULL,
part_FUN = "part_data",
distm_FUN = "distm_scaled",
covar_FUN = "covar_exp",
covar.pars = c(range = 0.1),
nugget = NA,
ncross = 6,
save.GLS = FALSE,
ncores = parallel::detectCores(logical = FALSE) - 1,
debug = FALSE,
...
)
MCGLS_partsummary(
MCpartGLS,
covar.pars = c(range = 0.1),
save.GLS = FALSE,
partsize
)
multicore_fitGLS_partition(
formula,
partmat,
formula0 = NULL,
part_FUN = "part_data",
distm_FUN = "distm_scaled",
covar_FUN = "covar_exp",
covar.pars = c(range = 0.1),
nugget = NA,
ncross = 6,
save.GLS = FALSE,
ncores = parallel::detectCores(logical = FALSE) - 1,
do.t.test = TRUE,
do.chisqr.test = TRUE,
debug = FALSE,
...
)
fitGLS_partition(
formula,
partmat,
formula0 = NULL,
part_FUN = "part_data",
distm_FUN = "distm_scaled",
covar_FUN = "covar_exp",
covar.pars = c(range = 0.1),
nugget = NA,
ncross = 6,
save.GLS = FALSE,
do.t.test = TRUE,
do.chisqr.test = TRUE,
progressbar = TRUE,
debug = FALSE,
ncores = NA,
parallel = TRUE,
...
)
part_data(index, formula, data, formula0 = NULL, coord.names = c("lng", "lat"))
part_csv(index, formula, file, formula0 = NULL, coord.names = c("lng", "lat"))
Arguments
formula |
a formula for the GLS model |
partmat |
a numeric partition matrix, with values containing indices of locations. |
formula0 |
an optional formula for the null GLS model |
part_FUN |
a function to partition individual data. See details for more information about requirements for this function. |
distm_FUN |
a function to calculate distances from a coordinate matrix |
covar_FUN |
a function to calculate covariances from a distance matrix |
covar.pars |
a named list of parameters passed to |
nugget |
a numeric fixed nugget component: if NA, the nugget is estimated for each partition |
ncross |
an integer indicating the number of partitions used to calculate cross-partition statistics |
save.GLS |
logical: should full GLS output be saved for each partition? |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
debug |
logical debug mode |
... |
arguments passed to |
MCpartGLS |
object resulting from MC_partGLS() |
partsize |
number of locations per partition |
do.t.test |
logical: should a t-test of the GLS coefficients be conducted? |
do.chisqr.test |
logical: should a correlated chi-squared test of the model fit be conducted? |
progressbar |
logical: should progress be tracked with a progress bar? |
parallel |
logical: should all calculations be done in parallel? See details for more information |
index |
a vector of pixels with which to subset the data |
data |
a data frame |
coord.names |
a vector containing names of spatial coordinate variables (x and y, respectively) |
file |
a text string indicating the csv file from which to read data |
Details
The function specified by part_FUN
is called internally to obtain
properly formatted subsets of the full data (i.e., partitions). Two functions
are provided in the remotePARTs
package for this purpose: part_data
and part_csv
. Both of these functions have required arguments that
must be specified through the call to fitGLS_partition
(via ...
).
Check each function's argument list and see "part_FUN
details" below
for more information.
partmat
is used to partition the data. partmat
must be a complete
matrix, without any missing or non-finite values. Columns of partmat
are
passed as the first argument part_FUN
to obtain data, which is then
passed to fitGLS
. Users are encouraged to use sample_partitions()
to obtain a valid partmat
.
The specific dimensions of partmat
can have a substantial effect on the
efficiency of fitGLS_partition
. For most systems, we do not recommend
fitting with partitions exceeding 3000 locations or pixels
(i.e., partmat(partsize = 3000, ...)
). Any larger, and the covariance
matrix inversions may become quite slow (or impossible for some machines).
It may help performance to use smaller even partitions of around 1000-2000
locations.
ncross
determines how many partitions are used to estimate cross-partition
statistics. All partitions, up to ncross
are compared with all others
in a pairwise fashion. There is no hard rule for setting mincross
. More
crosses will ensure convergence, but we believe that the default of 6
(10 total comparisons) should be sufficient for most moderate-sized maps
if 1500-3000 pixel partitions are used. This may require testing with each
individual dataset to determine at what point convergence occurs.
Covariance matrices for each partition are calculated with covar_FUN
from distances among points within the partition. Parameter values for
covar_FUN
are given by covar.pars
.
The distances among points are calculated with distm_FUN
.
distm_FUN
can be any function, modeled after geosphere::distm()
,
that satisfies both: 1) returns a distance matrix among points when a single
coordinate matrix is given as first argument; and 2) returns a matrix
containing distances between two coordinate matrices if given as the first and
second arguments.
If nugget = NA
, a ML nugget is obtained for each partition. Otherwise,
a fixed nugget is used for all partitions.
It is not required to use all partitions for cross-partition calculations, nor is it recommended to do so for most large data sets.
If progressbar = TRUE
a text progress bar shows the current status
of the calculations in the console.
Value
a "MC_partGLS", which is a precursor to a "partGLS" object
a "partGLS" object
"partGLS" object
fitGLS_partition
returns a list object of class "partGLS" which
contains at least the following elements:
- call
the function call
- GLS
an optional list of "remoteGLS" objects, one for each partition
- part
statistics calculated from each partition: see below for further details
- cross
statistics calculated from each pair of crossed partitions, determined by
ncross
: see below for further details- overall
summary statistics of the overall model: see below for further details
part
is a sub-list containing the following elements
- coefficients
a numeric matrix of GLS coefficients for each partition
- SEs
a numeric matrix of coefficient standard errors
- tstats
a numeric matrix of coefficient t-statstitics
- pvals_t
a numeric matrix of t-test pvalues
- nuggets
a numeric vector of nuggets for each partition
- covar.pars
covar.pars
input vector- modstats
a numeric matrix with rows corresponding to partitions and columns corresponding to log-likelihoods (
logLik
), sum of square error (SSE
), mean-squared error (MSE
), regression mean-square (MSR
), F-statistics (Fstat
), and p-values from F-tests (pval_F
)
cross
is a sub-list containing the following elements, which are use
to calculate the combined (across partitions) standard errors of the coefficient
estimates and statistical tests. See Ives et al. (2022).
- rcoefs
a numeric matrix of cross-partition correlations in the estimates of the coefficients
- rSSRs
a numeric vector of cross-partition correlations in the regression sum of squares
- rSSEs
a numeric vector of cross-partition correlations in the sum of squared errors
and overall
is a sub-list containing the elements
- coefficients
a numeric vector of the average coefficient estimates across all partitions
- rcoefficients
a numeric vector of the average cross-partition coefficient from across all crosses
- rSSR
the average cross-partition correlation in the regression sum of squares
- rSSE
the average cross-partition correlation in the sum of squared errors
- Fstat
the average f-statistic across partitions
- dfs
degrees of freedom to be used with partitioned GLS f-test
- partdims
dimensions of
partmat
- pval.chisqr
if
chisqr.test = TRUE
, a p-value for the correlated chi-squared test- t.test
if
do.t.test = TRUE
, a table with t-test results, including the coefficient estimates, standard errors, t-statistics, and p-values
part_data
and part_csv
both return a list with two elements:
- data
a dataframe, containing the data subset
- coords
a coordinate matrix for the subset
parallel implementation
In order to be efficient and account for different user situations, parallel
processing is available natively in fitGLS_partition
. There are a few
different specifications that will result in different behavior:
When parallel = TRUE
and ncores > 1
, all calculations are done
completely in parallel (via multicore_fitGLS_partition()
).
In this case, parallelization is implemented with the
parallel
, doParallel
, and foreach
packages. In this version,
all matrix operations are serialized on each worker but multiple operations
can occur simultaneously..
When parallel = FALSE
and ncores > 1
, then most calculations
are done on a single core but matrix opperations use multiple cores. In this
case, ncores
is passed to fitGLS. In this option, it is suggested
to not exceed the number of physical cores (not threads).
When ncores <= 1
, then the calculations are completely serialized
When ncores = NA
(the default), only one core is used.
In the parallel implementation of this function, a progress bar is not possible,
so progressbar
is ignored.
part_FUN
details
part_FUN
can be any function that satisfies the following criteria
1. the first argument of part_FUN
must accept an index of pixels by which
to subset the data;
2. part_FUN
must also accept formula
and formula0
from
fitGLS_partition
; and
3. the output of part_FUN
must be a list with at least the
following elements, which are passed to fitGLS
;
- data
a data frame containing all variables given by
formula
. Rows should correspond to pixels specified by the first argument- coords
a coordinate matrix or data frame. Rows should correspond to pixels specified by the first argument
Two functions that satisfy these criteria are provided by remotePARTS
:
part_data
and part_csv
.
part_data
uses an in-memory data frame (data
)
as a data source. part_csv
, instead reads data from a
csv file (file
), one partition at a time, for efficient memory usage.
part_csv
internally calls sqldf::read.csv.sql()
for fast and
efficient row extraction.
Both functions use index
to subset rows of data and formula
and
formula0
(optional) to determine which variables to select.
Both functions also use coord.names
to indicate which variables contain
spatial coordinates. The name of the x-coordinate column should always preceed
the y-coordinate column: c("x", "y")
.
Users are encouraged to write their own part_FUN
functions to meet their
needs. For example, one might be interested in using data stored in a raster
stack or any other file type. In this case, a user-defined part_FUN
function allows access to fitGLS_partition
without saving reformatted
copies of data.
References
Ives, A. R., L. Zhu, F. Wang, J. Zhu, C. J. Morrow, and V. C. Radeloff. in review. Statistical tests for non-independent partitions of large autocorrelated datasets. MethodsX.
See Also
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
Other partitionedGLS:
crosspart_GLS()
,
sample_partitions()
Examples
## read data
data(ndvi_AK10000)
df = ndvi_AK10000[seq_len(1000), ] # first 1000 rows
## create partition matrix
pm = sample_partitions(nrow(df), npart = 3)
## fit GLS with fixed nugget
partGLS = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm,
data = df, nugget = 0, do.t.test = TRUE)
## hypothesis tests
chisqr(partGLS) # explanatory power of model
t.test(partGLS) # significance of predictors
## now with a numeric predictor
fitGLS_partition(formula = CLS_coef ~ lat, partmat = pm, data = df, nugget = 0)
## fit ML nugget for each partition (slow)
(partGLS.opt = fitGLS_partition(formula = CLS_coef ~ 0 + land, partmat = pm,
data = df, nugget = NA))
partGLS.opt$part$nuggets # ML nuggets
# Certain model structures may not be useful:
## 0 intercept with numeric predictor (produces NAs) and gives a warning in statistical tests
fitGLS_partition(formula = CLS_coef ~ 0 + lat, partmat = pm, data = df, nugget = 0)
## intercept-only, gives warning
fitGLS_partition(formula = CLS_coef ~ 1, partmat = pm, data = df, nugget = 0,
do.chisqr.test = FALSE)
## part_data examples
part_data(1:20, CLS_coef ~ 0 + land, data = ndvi_AK10000)
## part_csv examples - ## CAUTION: examples for part_csv() include manipulation side-effects:
# first, create a .csv file from ndviAK
data(ndvi_AK10000)
file.path = file.path(tempdir(), "ndviAK10000-remotePARTS.csv")
write.csv(ndvi_AK10000, file = file.path)
# build a partition from the first 30 pixels in the file
part_csv(1:20, formula = CLS_coef ~ 0 + land, file = file.path)
# now with a random 20 pixels
part_csv(sample(3000, 20), formula = CLS_coef ~ 0 + land, file = file.path)
# remove the example csv file from disk
file.remove(file.path)
calculate degrees of freedom for partitioned GLS
Description
calculate degrees of freedom for partitioned GLS
Usage
calc_dfpart(partsize, p, p0)
Arguments
partsize |
number of pixels in each partition |
p |
number of predictors in alternate model |
p0 |
number of parameters in null model |
Value
a named vector containing the first and second degrees of freedom ("df1" and "df2", respectively)
Check if a matrix is positive definite
Description
Check if a matrix is positive definite
Usage
check_posdef(M)
Arguments
M |
numeric matrix |
Details
check if a matrix is 1) square, 2) symmetric, and 3) positive definite
Value
returns a named logical vector with the following elements:
- sqr
logical: indicating whether
M
is square- sym
logical: indicating whether
M
is symmetric- posdef
logical: indicating whether
M
is positive-definitive
Examples
# distance matrix
M = distm_scaled(expand.grid(x = 1:3, y = 1:3))
# check if it is positive definitive
check_posdef(M)
# check if the covariance matrix is positive definitive
check_posdef(covar_exp(M, .1))
# non-symmetric matrix
check_posdef(matrix(1:9, 3, 3))
# non-square matrix
check_posdef(matrix(1:6, 3, 2))
Conduct a chi-squared test
Description
generic S3 method for a chi-squared test
Usage
chisqr(x, ...)
Arguments
x |
object on which to conduct the test |
... |
additional arguments |
Value
results of the chi-squared test (generic)
Conduct a chisqr test of "partGLS" object
Description
Conduct a correlated chi-square test on a partitioned GLS
Usage
## S3 method for class 'partGLS'
chisqr(x, ...)
Arguments
x |
"remoteGLS" object |
... |
additional arguments passed to print |
Value
a p-value for the correlated chisqr test
Tapered-spherical distance-based covariance function
Description
Tapered-spherical distance-based covariance function
Exponential distance-based covariance function
Exponential-power distance-based covariance function
Usage
covar_taper(d, theta, cor = NULL)
covar_exp(d, range)
covar_exppow(d, range, shape)
Arguments
d |
a numeric vector or matrix of distances |
theta |
distance beyond which covariances are forced to 0. |
cor |
optional correlation parameter. If included, the covariance is
subtracted from |
range |
range parameter |
shape |
shape parameter |
Details
covar_taper
calculates covariance v as follows:
if d <= theta
, then v = ((1 - (d/theta))^2) * (1 + (d/(2 * theta)))
if d > theta
, then v = 0
covar_exp
calculates covariance v as follows:
v = exp(-d/range)
covar_exppow
calculates covariance v as follows:
v = exp(-(d/range)^2)
Note that covar_exppow(..., shape = 1)
is equivalent to
covar_exp()
but is needed as a separate function for use with fitCor
.
Value
a tapered-spherical transformation of d is returned.
the exponential covariance (v)
exponential-power covariance (v)
Examples
# simulate dummy data
map.width = 5 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
# calculate distance
D = geosphere::distm(coords) # distance matrix
# visualize covariance matrix
image(covar_taper(D, theta = .5*max(D)))
# plot tapered covariance function
curve(covar_taper(x, theta = .5), from = 0, to = 1);abline(v = 0.5, lty = 2, col = "grey80")
# visualize covariance matrix
image(covar_exp(D, range = .2*max(D)))
# plot exponential function with different ranges
curve(covar_exp(x, range = .2), from = 0, to = 1)
curve(covar_exp(x, range = .1), from = 0, to = 1, col = "blue", add = TRUE)
legend("topright", legend = c("range = 0.2", "range = 0.1"), col = c("black", "blue"), lty = 1)
# visualize Exponential covariance matrix
image(covar_exppow(D, range = .2*max(D), shape = 1))
# visualize Exponential-power covariance matrix
image(covar_exppow(D, range = .2*max(D), shape = .5))
# plot exponential power function with different shapes
curve(covar_exppow(x, range = .2, shape = 1), from = 0, to = 1)
curve(covar_exppow(x, range = .2, shape = .5), from = 0, to = 1, col = "blue", add = TRUE)
legend("topright", legend = c("shape = 1.0", "shape = 0.5"), col = c("black", "blue"), lty = 1)
Calculate cross-partition statistics in a partitioned GLS
Description
Calculate cross-partition statistics between two GLS partitions
Usage
crosspart_GLS(
xxi,
xxj,
xxi0,
xxj0,
invChol_i,
invChol_j,
Vsub,
nug_i,
nug_j,
df1,
df2,
small = TRUE,
ncores = NA
)
Arguments
xxi |
numeric matrix xx from partition i |
xxj |
numeric matrix xx from partition j |
xxi0 |
numeric matrix xx0 from partition i |
xxj0 |
numeric matrix xx0 from partition j |
invChol_i |
numeric matrix invcholV from partition i |
invChol_j |
numeric matrix invcholV from partition j |
Vsub |
numeric variance matrix for Xij (upper block) |
nug_i |
nugget from partition i |
nug_j |
nugget from partition j |
df1 |
first degree of freedom |
df2 |
second degree of freedom |
small |
logical: if |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
Value
crosspart_GLS
returns a list of cross-partition statistics.
If small = FALSE
, the list contains the following elements
- Rij
- Hi
- Hj
- Hi0
- Hj0
- SiR
- SjR
- rcoefij
- rSSRij
- rSSEij
- Vcoefij
If small = FALSE
, the list only contains the necessary elements
rcoefij
, rSSRij
, and rSSEij
.
See Also
Other partitionedGLS:
MC_GLSpart()
,
sample_partitions()
Calculate a distance matrix from coordinates
Description
Calculate the distances among points from a single coordinate matrix or
Usage
distm_km(coords, coords2 = NULL)
distm_scaled(coords, coords2 = NULL, distm_FUN = "distm_km")
Arguments
coords |
a coordinate matrix with 2 columns and rows corresponding to each location. |
coords2 |
an optional coordinate matrix |
distm_FUN |
function used to calculate the distance matrix. This function dictates the units of "max.dist" |
Details
distm_km
is simply a wrapper for geosphere::distm()
Value
distm_km
returns a distance matrix in km
A distance matrix is returned.
If coords2 = NULL
, then distances among points in coords
are
calculated. Otherwise, distances are calculated between points in coords
and coords2
distm_km
returns a distance matrix in km and distm_scaled
returns
relative distances (between 0 and 1). The resulting matrix has the attribute
"max.dist" which stores the maximum distance of the map. "max.dist" is in
km for distm_km
and in the units of distm_FUN
for distm_scaled
.
See Also
?geosphere::distm()
Examples
map.width = 3 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
distm_scaled(coords) # calculate relative distance matrix
AR regressions by REML
Description
fitAR
is used to fit AR(1) time series regression
analysis using restricted maximum likelihood
Usage
fitAR(formula, data = NULL)
AR_fun(par, y, X, logLik.only = TRUE)
Arguments
formula |
a model formula, as used by |
data |
optional data environment to search for variables in |
par |
AR parameter value |
y |
vector of time series (response) |
X |
model matrix (predictors) |
logLik.only |
logical: should only the partial log-likelihood be computed |
Details
This function finds the restricted maximum likelihood (REML) to estimate parameters for the regression model with AR(1) random error terms
y(t) = X(t) \beta + \varepsilon(t)
\varepsilon(t) = \rho \varepsilon(t-1) + \delta(t)
where y(t)
is the response at time t
;
X(t)
is a model matrix containing covariates;
\beta
is a vector of effects of X(t)
;
\varepsilon(t)
is the autocorrelated random error;
\delta \sim N(0, \sigma)
is a temporally independent
Gaussian random variable with mean zero and standard deviation
\sigma
;
and \rho
is the AR(1) autoregression parameter
fitAR
estimates the parameter via mathematical optimization
of the restricted log-likelihood function.
AR_fun
is the work horse behind fitAR
that is called
by optim
to estimate the autoregression parameter \rho
.
Value
fitAR
returns a list object of class "remoteTS", which contains
the following elements.
- call
the function call
- coefficients
a named vector of coefficients
- SE
the standard errors of parameter estimates
- tstat
the t-statistics for coefficients
- pval
the p-values corresponding to t-tests of coefficients
- MSE
the model mean squared error
- logLik
the log-likelihood of the model fit
- residuals
the residuals: response minus fitted values
- fitted.values
the fitted mean values
- rho
The AR parameter, determined via REML
- rank
the numeric rank of the fitted model
- df.residual
the residual degrees of freedom
- terms
the
stats::terms
object used
Output is structured similarly to an "lm" object.
When logLik.only == F
, AR_fun
returns the output described in
?fitAR
. When logLik.only == T
, it returns a quantity that is
linearly and negatively related to the restricted log likelihood
(i.e., partial log-likelihood).
References
Ives, A. R., K. C. Abbott, and N. L. Ziebarth. 2010. Analysis of ecological
time series with ARMA(p,q) models. Ecology 91:858-871.
See Also
fitAR_map
to easily apply fit_AR
to many pixels;
fitCLS
and fitCLS_map
for conditional least squares
time series analyses.
Other remoteTS:
fitAR_map()
,
fitCLS_map()
,
fitCLS()
Other remoteTS:
fitAR_map()
,
fitCLS_map()
,
fitCLS()
Examples
# simulate dummy data
t = 1:30 # times series
Z = rnorm(30) # random independent variable
x = .2*Z + (.05*t) # generate dependent effects
x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation
# fit the AR model, using Z as a covariate
(AR = fitAR(x ~ Z))
# get specific components
AR$residuals
AR$coefficients
AR$pval
# now using time as a covariate
(AR.time <- fitAR(x ~ t))
# source variable from a dataframe
df = data.frame(y = x, t.scaled = t/30, Z = Z)
fitAR(y ~ t.scaled + Z, data = df)
## Methods
summary(AR)
residuals(AR)
coefficients(AR)
Map-level AR REML
Description
fitAR_map
is used to fit AR REML regression to each spatial
location (pixel) within spatiotemporal data.
Usage
fitAR_map(
Y,
coords,
formula = "y ~ t",
X.list = list(t = 1:ncol(Y)),
resids.only = FALSE
)
Arguments
Y |
a spatiotemporal response variable: a numeric matrix or data frame where columns correspond to time points and rows correspond to pixels. |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
formula |
a model formula, passed to |
X.list |
a named list of temporal or spatiotemporal predictor variables:
elements must be either numeric vectors with one element for each time point
or a matrix/data frame with rows corresponding to pixels and columns
corresponding to time point. These elements must be named and referred to
in |
resids.only |
logical: should output beyond coordinates and residuals be
withheld? Useful when passing output to |
Details
fitAR_map
is a wrapper function that applies fitAR
to
many pixels.
The function loops through the rows of Y
, matched with rows of
spatiotemporal predictor matrices. Purely temporal predictors, given by
vectors, are used for all pixels. These predictor variables, given by the
right side of formula
are sourced from named elements in X.list
.
Value
fitCLS_map
returns a list object of class "mapTS".
The output will always contain at least these elements:
- call
the function call
- coords
the coordinate matrix or dataframe
- residuals
time series residuals: rows correspond to pixels (
coords
)
When resids.only = FALSE
, the output will also contain the following
components. Matrices have rows that correspond to pixels and columns that
correspond to time points and vector elements correspond to pixels.
- coefficients
a numeric matrix of coefficeints
- SEs
a numeric matrix of coefficient standard errors
- tstats
a numeric matrix of t-statistics for coefficients
- pvals
a numeric matrix of p-values for coefficients t-tests
- rhos
a vector of rho values for each pixel
- MSEs
a numeric vector of MSEs
- logLiks
a numeric vector of log-likelihoods
- fitted.values
a numeric matrix of fitted values
An attribute called "resids.only" is also set to match the value of
resids.only
See Also
fitAR
for fitting AR REML to individual time series and fitCLS
& fitCLS_map
for time series analysis based on conditional least squares.
Other remoteTS:
fitAR()
,
fitCLS_map()
,
fitCLS()
Examples
# simulate dummy data
time.points = 9 # time series length
map.width = 5 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
## create empty spatiotemporal variables:
X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response
Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor
# setup first time point:
Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"]
X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t
## project through time:
for(t in 2:time.points){
Z[, t] <- Z[, t-1] + rnorm(map.width^2)
X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25)
}
# visualize dummy data (NOT RUN)
library(ggplot2);library(dplyr)
data.frame(coords, X) %>%
reshape2::melt(id.vars = c("x", "y")) %>%
ggplot(aes(x = x, y = y, fill = value)) +
geom_tile() +
facet_wrap(~variable)
# fit AR, showing all output
fitAR_map(X, coords, formula = y ~ t, resids.only = TRUE)
# fit AR with temporal and spatiotemporal predictors
(AR.map <- fitAR_map(X, coords, formula = y ~ t + Z, X.list = list(t = 1:ncol(X),
Z = Z), resids.only = FALSE))
## extract some values
AR.map$coefficients # coefficients
AR.map$logLik # log-likelihoods
## Methods
summary(AR.map)
residuals(AR.map)
coefficients(AR.map)
CLS for time series
Description
fitCLS
is used to fit conditional least squares regression
to time series data.
Usage
fitCLS(
formula,
data = NULL,
lag.y = 1,
lag.x = 1,
debug = FALSE,
model = FALSE,
y = FALSE
)
Arguments
formula |
a model formula, as used by |
data |
optional data environment to search for variables in |
lag.y |
an integer indicating the lag (in time steps) between y and y.0 |
lag.x |
an integer indicating the lag (in time steps) between y and the independent variables (except y.0). |
debug |
logical debug mode |
model |
logical, should the used model matrix be returned? As used by
|
y |
logical, should the used response variable be returned? As used by
|
Details
This function regresses the response variable (y) at time t, conditional on the
response at time t-lag.y
and the specified dependent variables (X) at
time t-lag.x
:
y(t) = y(t - lag.y) + X(t - lag.x) + \varepsilon
where y(t)
is the response at time t
;
X(t)
is a model matrix containing covariates;
\beta
is a vector of effects of X(t)
;
and \varepsilon(t)
is a temporally independent Gaussian random
variable with mean zero and standard deviation \sigma
stats::lm()
is then called, using the above equation.
Value
fitCLS
returns a list object of class "remoteTS", which
inherits from "lm". In addition to the default "lm" components, the output
contains these additional list elements:
- tstat
the t-statistics for coefficients
- pval
the p-values corresponding to t-tests of coefficients
- MSE
the model mean squared error
- logLik
the log-likelihood of the model fit
See Also
fitCLS_map
to easily apply fitCLS
to many pixels;
fitAR
and fitAR_map
for AR time series analyses.
Other remoteTS:
fitAR_map()
,
fitAR()
,
fitCLS_map()
Examples
# simulate dummy data
t = 1:30 # times series
Z = rnorm(30) # random independent variable
x = .2*Z + (.05*t) # generate dependent effects
x[2:30] = x[2:30] + .2*x[1:29] # add autocorrelation
x = x + rnorm(30, 0, .01)
df = data.frame(x, t, Z) # collect in data frame
# fit a CLS model with previous x, t, and Z as predictors
## note, this model does not follow the underlying process.
### See below for a better fit.
(CLS <- fitCLS(x ~ t + Z, data = df))
# extract other values
CLS$MSE #MSE
CLS$logLik #log-likelihood
# fit with no lag in independent variables (as simulated):
(CLS2 <- fitCLS(x ~ t + Z, df, lag.x = 0))
summary(CLS2)
# no lag in x
fitCLS(x ~ t + Z, df, lag.y = 0)
# visualize the lag
## large lag in x
fitCLS(x ~ t + Z, df, lag.y = 2, lag.x = 0, debug = TRUE)$lag
## large lag in Z
fitCLS(x ~ t + Z, df, lag.y = 0, lag.x = 2, debug = TRUE)$lag
# # throws errors (NOT RUN)
# fitCLS(x ~ t + Z, df, lag.y = 28) # longer lag than time
# fitCLS(cbind(x, rnorm(30)) ~ t + Z, df) # matrix response
## Methods
summary(CLS)
residuals(CLS)
Map-level CLS for time series
Description
fitCLS_map
is used to fit conditional least squares
regression to each spatial location (pixel) within spatiotemporal data.
Usage
fitCLS_map(
Y,
coords,
formula = "y ~ t",
X.list = list(t = 1:ncol(Y)),
lag.y = 1,
lag.x = 0,
resids.only = FALSE
)
Arguments
Y |
a spatiotemporal response variable: a numeric matrix or data frame where columns correspond to time points and rows correspond to pixels. |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
formula |
a model formula, passed to |
X.list |
a named list of temporal or spatiotemporal predictor variables:
elements must be either numeric vectors with one element for each time point
or a matrix/data frame with rows corresponding to pixels and columns
corresponding to time point. These elements must be named and referred to
in |
lag.y |
the lag between y and y.0, passed to |
lag.x |
the lag between y and predictor variables, passed to
|
resids.only |
logical: should output beyond coordinates and residuals be
withheld? Useful when passing output to |
Details
fitCLS_map
is a wrapper function that applies
fitCLS()
to many pixels.
The function loops through the rows of Y
, matched with rows of
spatiotemporal predictor matrices. Purely temporal predictors, given by
vectors, are used for all pixels. These predictor variables, given by the
right side of formula
are sourced from named elements in X.list
.
Value
fitCLS_map
returns a list object of class "mapTS".
The output will always contain at least these elements:
- call
the function call
- coords
the coordinate matrix or dataframe
- residuals
time series residuals: rows correspond to pixels (
coords
)
When resids.only = FALSE
, the output will also contain the following
components. Matrices have rows that correspond to pixels and columns that
correspond to time points and vector elements correspond to pixels.
- coefficients
a numeric matrix of coefficeints
- SEs
a numeric matrix of coefficient standard errors
- tstats
a numeric matrix of t-statistics for coefficients
- pvals
a numeric matrix of p-values for coefficients t-tests
- MSEs
a numeric vector of MSEs
- logLiks
a numeric vector of log-likelihoods
- fitted.values
a numeric matrix of fitted values
An attribute called "resids.only" is also set to match the value of
resids.only
See Also
fitCLS
for fitting CLS on individual time series and
fitAR
and fitAR_map
for AR REML time series analysis.
Other remoteTS:
fitAR_map()
,
fitAR()
,
fitCLS()
Examples
# simulate dummy data
time.points = 9 # time series length
map.width = 5 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
## create empty spatiotemporal variables:
X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response
Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor
# setup first time point:
Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"]
X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t
## project through time:
for(t in 2:time.points){
Z[, t] <- Z[, t-1] + rnorm(map.width^2)
X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25)
}
# # visualize dummy data (NOT RUN)
# library(ggplot2);library(dplyr)
# data.frame(coords, X) %>%
# reshape2::melt(id.vars = c("x", "y")) %>%
# ggplot(aes(x = x, y = y, fill = value)) +
# geom_tile() +
# facet_wrap(~variable)
# fit CLS, showing all output
fitCLS_map(X, coords, formula = y ~ t, resids.only = TRUE)
# fit CLS with temporal and spatiotemporal predictors
(CLS.map <- fitCLS_map(X, coords, formula = y ~ t + Z,
X.list = list(t = 1:ncol(X), Z = Z),
resids.only = FALSE))
## extract some values
CLS.map$coefficients # coefficients
CLS.map$logLik # log-likelihoods
## Methods
summary(CLS.map)
residuals(CLS.map)
coefficients(CLS.map)
Estimate spatial parameters from time series residuals
Description
fitCor()
estimates parameter values of a distance-based
variance function from the pixel-wise correlations among time series residuals.
Usage
fitCor(
resids,
coords,
distm_FUN = "distm_scaled",
covar_FUN = "covar_exp",
start = list(r = 0.1),
fit.n = 1000,
index,
save_mod = TRUE,
...
)
Arguments
resids |
a matrix of time series residuals, with rows corresponding to pixels and columns to time points |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
distm_FUN |
a function to calculate a distance matrix from |
covar_FUN |
a function to estimate distance-based covariances |
start |
a named list of starting parameter values for |
fit.n |
an integer indicating how many pixels should be used to estimate parameters. |
index |
an optional index of pixels to use for parameter estimation |
save_mod |
logical: should the nls model be saved in the output? |
... |
additional arguments passed to |
Details
For accurate results, resids
and coords
must be paired matrices.
Rows of both matrices should correspond to the same pixels.
Distances between sapmled pixels are calculated with the function specified by
distm_FUN
. This function can be any that takes a coordinate
matrix as input and returns a distance matrix between points. Some options
provided by remotePARTS
are distm_km()
, which returns distances
in kilometers and distm_scaled()
, which returns distances scaled between
0 and 1.
covar_FUN
can be any function that takes a vector of distances as its
first argument, and at least one parameter as additional arguments. remotePARTS
provides three suitable functions: covar_exp
, covar_exppow
, and
covar_taper
; but user-defined functions are also possible.
Parameters are estimated with stats::nls()
by regressing correlations
among time series residuals on a function of distances specified by covar_FUN
.
start
is used by nls
to determine how many parameters need
estimating, and starting values for those parameters. As such, it is important
that start
has named elements for each parameter in covar_FUN
.
The fit will be performed for all pixels specified in index
, if provided.
Otherwise, a random sample of length fit.n
is used. If fit.n
exceeds the number of pixels, all pixels are used. When random pixels are used,
parameter estimates will be different for each call of the function. For reproducible
results, we recommend taking a random sample of pixels manually and passing in
those values as index
.
Caution: Note that a distance matrix, of size n \times n
must be fit to the
sampled data where n
is either fit.n
or length(index)
.
Take your computer's memory and processing time into consideration when choosing
this size.
Parameter estimates are always returned in the same scale of distances
calculated by distm_FUN
. It is very important that these estimates
are re-scaled by users if output of distm_FUN
use units different from
the desired scale. For example,
if the function covar_FUN = function(d, r, a){-(d/r)^a}
is used
with distm_FUN = "distm_scaled"
, the estimated range parameter r
will be based on a unit-map. Users will likely want to re-scaled it to map
units by multiplying r
by the maximum distance among points on your map.
If the distm_FUN
is on the scale of your map (e.g., "distm_km"),
re-scaling is not needed but may be preferable, since it is scaled to the
maximum distance among the sampled data rather than the true maximum
distance. For example, dividing the range parameter by max.distance
and then multiplying it by the true max distance may provide a better range
estimate.
Value
fitCor
returns a list object of class "remoteCor", which contains
these elements:
- call
the function call
- mod
the
nls
fit object, ifsave_mod=TRUE
- spcor
a vector of the estimated spatial correlation parameters
- max.distance
the maximum distance among the sampled pixels, as calculated by
dist_FUN
.- logLik
the log-likelihood of the fit
Examples
# simulate dummy data
set.seed(19)
time.points = 30 # time series length
map.width = 8 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
## create empty spatiotemporal variables:
X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response
Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor
## setup first time point:
Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"]
X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t
## project through time:
for(t in 2:time.points){
Z[, t] <- Z[, t-1] + rnorm(map.width^2)
X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25)
}
AR.map = fitAR_map(X, coords, formula = y ~ Z, X.list = list(Z = Z), resids.only = FALSE)
# using pre-defined covariance function
## exponential covariance
fitCor(AR.map$residuals, coords, covar_FUN = "covar_exp", start = list(range = .1))
## exponential-power covariance
fitCor(AR.map$residuals, coords, covar_FUN = "covar_exppow", start = list(range = .1, shape = .2))
# user-specified covariance function
fitCor(AR.map$residuals, coords, covar_FUN = function(d, r){d^r}, start = list(r = .1))
# un-scaled distances:
fitCor(AR.map$residuals, coords, distm_FUN = "distm_km", start = list(r = 106))
# specify which pixels to use, for reproducibility
fitCor(AR.map$residuals, coords, index = 1:64)$spcor #all
fitCor(AR.map$residuals, coords, index = 1:20)$spcor #first 20
fitCor(AR.map$residuals, coords, index = 21:64)$spcor # last 43
# randomly select pixels
fitCor(AR.map$residuals, coords, fit.n = 20)$spcor #random 20
fitCor(AR.map$residuals, coords, fit.n = 20)$spcor # different random 20
Fit a PARTS GLS model.
Description
Fit a PARTS GLS model.
Usage
fitGLS(
formula,
data,
V,
nugget = 0,
formula0 = NULL,
save.xx = FALSE,
save.invchol = FALSE,
logLik.only = FALSE,
no.F = FALSE,
coords,
distm_FUN,
covar_FUN,
covar.pars,
invCholV,
ncores = NA,
suppress_compare_warning = FALSE,
...
)
Arguments
formula |
a model formula |
data |
an optional data frame environment in which to search for
variables given by |
V |
a covariance matrix, which must be positive definitive. This argument
is optional if |
nugget |
an optional numeric nugget, must be positive |
formula0 |
an optional formula for the null model to be compared with
|
save.xx |
logical: should information needed for cross-partition comparisons be returned? |
save.invchol |
logical: should the inverse of the Cholesky matrix be returned? |
logLik.only |
logical: should calculations stop after calculating parital log-likelihood? |
no.F |
logical: should F-test calculations be made? |
coords |
optional coordinate matrix for calculating |
distm_FUN |
optional function for calculating a distance matrix from
|
covar_FUN |
optional distance-based covariance function for calculating
|
covar.pars |
an optional named list of parameters passed to |
invCholV |
optional pre-calculated inverse cholesky matrix to use in place
of |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
suppress_compare_warning |
an optional variable to suppress warning that
arises from identical |
... |
additional arguments passed to |
Details
conduct generalized least-squares regression of spatiotemporal trends
fitGLS
fits a GLS model, using terms specified in formula
.
In the PARTS method, generally the left side of formula
should be
pixel-level trend estimates and the right side should be spatial predictors.
The errors of the GLS are correlated according to covariance matrix V
.
If nugget = NA
, an ML nugget is estimated from the data using the
optimize_nugget()
function. Arguments additional arguments (...
)
are passed to optimize_nugget
in this case. V
must be provided
for nugget optimization.
If formula0
is not specified, the default is to fit an intercept-only
null model.
save.xx
is included to allow for manually conducting a partitioned
GLS analyses. Because most users will not need this feature, opting instead
to use fitGLS_parition()
, save.xx = FALSE
by default.
Similarly, save.invchol
is included to allow for recycling of the
inverse cholesky matrix. Often, inverting the large cholesky matrix
(i.e., invert_chol(V)
) is the slowest part of GLS. This argument exists
to allow users to recycle this process, though no remotePARTS
function
currently exists that can use invert_chol(V)
to fit the GLS.
logLik.only = TRUE
will return only the partial log-likelihood, which can
minimized to obtain the maximum likelihood for a given set of data.
If no.F = TRUE
, then the model given by formula
is not compared
to the model given by formula0
.
If V
is not provided, it can be fit internally by specifying all of
coords
, distm_FUN
, covar_FUN
, and covar.pars
.
The function given by distm_FUN
will calculate a distance matrix from
coords
, which is then transformed into a distance-based covariance
matrix with covar_FUN
and parameters given by covar.pars
.
This function uses C++ code that uses the Eigen matrix library (RcppEigen package) to fit models as efficiently as possible. As such, all available CPU cores are used for matrix calculations on systems with OpenMP support.
ncores
is passed to the C++ code Eigen::setNpThreads() which sets
the number of cores used for compatible Eigen matrix operations (when OpenMP
is used).
Value
fitGLS
returns a list object of class "remoteGLS", if
logLik.only = FALSE
. The list contains at least the following elements:
- coefficients
coefficient estimates for predictor variables
- SSE
sum of squares error
- MSE
mean squared error
- SE
standard errors
- df_t
degrees of freedom for the t-test
- logDetV
log-determinant of V
- tstat
t-test statistic
- pval_t
p-value of the t-statistic
- logLik
the Log-likelihood of the model
- nugget
the nugget used in fitting
- covar_coef
the covariance matrix of the coefficients
If no.F = FALSE
, the following elements, corresponding to the null
model and F-test are also calculated:
- coefficients0
coefficient estimates for the null model
- SSE0
sum of squares error for the null model
- MSE0
mean squared error for the null model
- SE0
the standard errors for null coefficients
- MSR
the regression mean square
- df0
the null model F-test degrees of freedom
- LL0
the log-likelihood of the null model
- df_F
the F-test degrees of freedom, for the main model
- Fstat
the F-statistic
- pval_F
the F-test p-value
- formula
the alternate formula used
- formula0
the null formula used
An attribute called also set to "no.F"
is set to the value of
argument no.F
, which signals to generic methods how to handle the output.
If save.invchol = TRUE
, output also includes
- invcholV
the inverse of the Cholesky decomposition of the covariance matrix obtained with
invert_chol(V, nugget)
If save.xx = TRUE
, output also includes the following elements
- xx
the predictor variables
X
, from the right side offormula
, transformed by the inverse cholesky matrix: xx =invcholV %*% X
- xx0
the predictor variables
X0
, from the right side offormula0
, transformed by the inverse cholesky matrix: xx0 =invcholV %*% X0
The primary use of xx
and xx0
is for use with fitGLS_partition()
.
If logLik.only = TRUE
, a single numeric output containing the
log-likelihood is returned.
Examples
## read data
data(ndvi_AK10000)
df = ndvi_AK10000[seq_len(200), ] # first 200 rows
## fit covariance matrix
V = covar_exp(distm_scaled(cbind(df$lng, df$lat)), range = .01)
## run GLS
(GLS = fitGLS(CLS_coef ~ 0 + land, data = df, V = V))
## with F-test calculations to compare with the NULL model
(GLS.F = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE))
## find ML nugget
fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE, nugget = NA)
## calculate V internally
coords = cbind(df$lng, df$lat)
fitGLS(CLS_coef ~ 0 + land, data = df, logLik.only = FALSE, coords = coords,
distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = list(range = .01))
## use inverse cholesky
fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invert_chol(V))
## save inverse cholesky matrix
invchol = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, save.invchol = TRUE)$invcholV
## re-use inverse cholesky instead of V
fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invchol)
## Log-likelihood (fast)
fitGLS(CLS_coef ~ 0 + land, data = df, V = V, logLik.only = TRUE)
Fit a PARTS GLS model, with maximum likelihood spatial parameters
Description
Fit a PARTS GLS model, with maximum likelihood spatial parameters
Usage
fitGLS_opt(
formula,
data = NULL,
coords,
distm_FUN = "distm_scaled",
covar_FUN = "covar_exp",
start = c(range = 0.01, nugget = 0),
fixed = c(),
opt.only = FALSE,
formula0 = NULL,
save.xx = FALSE,
save.invchol = FALSE,
no.F = TRUE,
trans = list(),
backtrans = list(),
debug = TRUE,
ncores = NA,
...
)
Arguments
formula |
a model formula, passed to |
data |
an optional data frame environment in which to search for
variables given by |
coords |
a numeric coordinate matrix or data frame, with two columns and rows corresponding to each pixel |
distm_FUN |
a function to calculate a distance matrix from |
covar_FUN |
a function to estimate distance-based covariances |
start |
a named vector of starting values for each parameter to be estimated;
names must match the names of arguments in |
fixed |
an optional named vector of fixed parameter values; names
must match the names of arguments in |
opt.only |
logical: if TRUE, execution will halt after estimating the parameters; a final GLS will not be fit with the estimated parameters |
formula0 , save.xx , save.invchol , no.F |
arguments passed to |
trans |
optional list of functions for transforming the values in
|
backtrans |
optional list of functions for back-transforming parameters
to their correct scale (for use with |
debug |
logical: debug mode (for use with |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
... |
additional arguments passed to |
Details
Estimate spatial parameters, via maximum likelihood, from data rather than from time series residuals; Fit a GLS with these specifications.
fitGLS_opt
fits a GLS by estimating spatial parameters from
data. fitCor
, combined with fitGLS(nugget = NA)
,
gives better estimates of spatial parameters, but time-series residuals may
not be available in all cases. In these cases, spatial parameters can be
estimated from distances among points and a response vector. Mathematical
optimization of the log likelihood of different GLS models are computed by
calling optim()
on fitGLS
.
Distances are calculated with distm_FUN
and a covariance matrix is
calculated from these distances with covar_FUN
. Arguments to to
covar_FUN
, except distances, are given by start
and fixed
.
Parameters specified in start
will be be estimated while those given
by fixed
will remain constant throughout fitting. Parameter names in
start
and fixed
should exactly match the names of arguments in
covar_FUN
and should not overlap (though, fixed
takes precedence).
In addition to arguments of covar_FUN
a "nugget" component can
also be occur in start
or fixed
. If "nugget" does not occur
in either vector, the GLS are fit with nugget = 0
. A zero nugget also
allows much faster computation, through recycling the common
inverse cholesky matrix in each GLS computation. A non-zero nugget requires
inversion of a different matrix at each iteration, which can be
substantially slower.
If opt.only = FALSE
, the estimated parameters are used to fit the final
maximum likelihood GLS solution with fitGLS()
and arguments
formula0
, save.xx
, save.invchol
, and no.F
.
Some parameter combinations may not produce valid covariance matrices. During
the optimization step messages about non-positive definitive V may result on
some iterations. These warnings are produced by fitGLS
and NA
log-likelihoods are returned in those cases.
Note that fitGLS_opt
fits multiple GLS models, which requires
inverting a large matrix for each one (unless a fixed 0 nugget is used).
This process is very computationally intensive and may take a long time to
finish depending upon your machine and the size of the data.
Sometimes optim
can have a difficult time finding a reasonable solution
and without any constraits on parameter space (with certain algorithms), results
may even be nonsensical. To combat this, fitGLS_opt
has the arguments
trans
and backtrans
which allow you to transform
(and back-transform) parameters to a different scale. For example, you may
want to force the 'range' parameter between 0 and 1. The logit function can
do just that, as its limits are -Inf and Inf as x approaches 0 and 1,
respectively. So, we can set trans
to the logit function:
trans = list(range = function(x)log(x/(1-x)))
. Then we need to set
backtrans
to the inverse logit function to return a parameter value
between 0 and 1: backtrans = list(range = function(x)1/(1+exp(-x)))
.
This will force the optimizer to only search for the range parameter in the
space from 0 to 1. Any other constraint function can be used for trans
provided that there is a matching back-transformation.
Value
If opt.only = TRUE
, fitGLS_opt
returns the
output from stats::optim()
: see it's documentation for more details.
Otherwise, a list with two elements is returned:
- opt
output from
optim
, as above- GLS
a "remoteGLS" object. See
fitGLS
for more details.
See Also
fitCor
for estimating spatial parameters from time
series residuals; fitGLS
for fitting GLS and with the option
of estimating the maximum-likelihood nugget component only.
Examples
## read data
data(ndvi_AK10000)
df = ndvi_AK10000[seq_len(200), ] # first 200 rows
## estimate nugget and range (very slow)
fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df,
coords = df[, c("lng", "lat")], start = c(range = .1, nugget = 0),
opt.only = TRUE)
## estimate range only, fixed nugget at 0, and fit full GLS (slow)
fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df,
coords = df[, c("lng", "lat")],
start = c(range = .1), fixed = c("nugget" = 0),
method = "Brent", lower = 0, upper = 1)
## constrain nugget to 0 and 1
logit <- function(p) {log(p / (1 - p))}
inv_logit <- function(l) {1 / (1 + exp(-l))}
fitGLS_opt(formula = CLS_coef ~ 0 + land, data = df,
coords = df[, c("lng", "lat")],
start = c(range = .1, nugget = 1e-10),
trans = list(nugget = logit), backtrans = list(nugget = inv_logit),
opt.only = TRUE)
Function that fitGLS_opt optimizes over
Description
Function that fitGLS_opt optimizes over
Usage
fitGLS_opt_FUN(
op,
fp,
formula,
data = NULL,
coords,
covar_FUN = "covar_exp",
distm_FUN = "distm_scaled",
is.trans = FALSE,
backtrans = list(),
ncores = NA
)
Arguments
op |
a named vector of parameters to be optimized |
fp |
a named vector of fixed parameters |
formula |
GLS model formula |
data |
data source |
coords |
a coordinate matrix |
covar_FUN |
a covariance function |
distm_FUN |
a distm function |
is.trans |
logical: are any of the values in |
backtrans |
optional: a named list of functions used to backtransform any element
of |
ncores |
an optional integer indicating how many CPU threads to use for calculations. |
Value
fitGLS_opt_FUN
returns the negative log likelihood of a GLS,
given the parameters in op
and fp
Invert the cholesky decomposition of V
Description
Invert the cholesky decomposition of V
Usage
invert_chol(M, nugget = 0, ncores = NA)
Arguments
M |
numeric (double), positive definite matrix |
nugget |
numeric (double) nugget to add to M |
ncores |
optional integer indicating how many cores to use during the inversion calculation |
Details
Calculates the inverse of the Cholesky decomposition of M which should not be confused with the inverse of M *derived* from the Cholesky decomposition (i.e. 'chol2inv(M)').
Value
numeric matrix: inverse of the Cholesky decomposition (lower triangle)
Examples
M <- crossprod(matrix(1:6, 3))
# without a nugget:
invert_chol(M)
# with a nugget:
invert_chol(M, nugget = 0.2)
calculate maximum distance among a table of coordinates
Description
calculate maximum distance among a table of coordinates
Usage
max_dist(coords, dist_FUN = "distm_km")
Arguments
coords |
the coordinate matrix (or dataframe) from which a maximum distance is desired. |
dist_FUN |
the distance function used to calculate distances |
Details
First the outermost points are found by fitting a convex hull in
Euclidean space. Then, the distances between these outer points is calculated
with dist_FUN
, and the maximum of these distances is returned
This is a fast, simple way of determining the maximum distance.
Value
The maximum distance between two points (units determined by
dist_FUN
)
NDVI remote sensing data for 10,000 random pixels from Alaska, with rare land classes removed.
Description
NDVI remote sensing data for 10,000 random pixels from Alaska, with rare land classes removed.
Usage
ndvi_AK10000
Format
data frame with 10,000 rows corresponding to sites and 37 columns:
- lng
longitude of the pixel
- lat
latitude of the pixel
- AR_coef
pre-calculated AR REML coefficient standardized by mean ndvi values for each pixel
- CLS_coef
pre-calculated CLS coefficient standardized by mean ndvi values for each pixel
- land
dominant land class of the pixel
- land
logical: is this land class rare?
- ndvi<t>
ndvi value of the pixel during the year <t>
Find the maximum likelihood estimate of the nugget
Description
Find the maximum likelihood estimate of the nugget
Usage
optimize_nugget(
X,
y,
V,
lower = 0.001,
upper = 0.999,
tol = .Machine$double.eps^0.25,
debug = FALSE,
ncores = NA
)
Arguments
X |
numeric (double) nxp matrix |
y |
numeric (double) nx1 column vector |
V |
numeric (double) nxn matrix |
lower |
lower boundary for nugget search |
upper |
upper boundary for nugget search |
tol |
desired accuracy of nugget search |
debug |
logical: debug mode? |
ncores |
an optional integer indicating how many CPU threads to use for matrix calculations. |
Details
Finds the maximum likelihood nugget estimate via mathematical optimization.
To maximize efficiency, optimize_nugget()
is implemented entirely
in C++. Optimization takes place via a C++ version of the fmin
routine
(Forsythe et al 1977). Translated from http://www.netlib.org/fmm/fmin.f
The function LogLikGLS()
is optimized for nugget
. Once the
LogLikGLS()
functionality is absorbed by fitGLS()
, it will
be used instead.
Value
maximum likelihood nugget estimate
See Also
?stats::optimize()
partitioned GLS results
Description
Example output from fitGLS_partition() fit to the ndvi_AK
data set
Usage
partGLS_ndviAK
Format
an S3 class "partGLS" object. See ?fitGLS_partition() for further details
Chisqr test for partitioned GLS
Description
Chisqr test for partitioned GLS
Usage
part_chisqr(Fmean, rSSR, df1, npart)
Arguments
Fmean |
mean value of F-statistic from correlated F-tests |
rSSR |
correlation among partition regression sum of squares |
df1 |
first degree of freedom for F-tests |
npart |
number of partitions |
Value
a p-value for the correlated chisqr test
Correlated t-test for paritioned GLS
Description
Correlated t-test for paritioned GLS
Usage
part_ttest(coefs, part.covar_coef, rcoefficients, df2, npart)
Arguments
coefs |
vector average GLS coefficients |
part.covar_coef |
an array of covar_coef from each partition |
rcoefficients |
an rcoefficeints array, one for each partition |
df2 |
second degree of freedom from partitioned GLS |
npart |
number of partitions |
Value
a list whose first element is a coefficient table with estimates, standard errors, t-statistics, and p-values and whose second element is a matrix of correlations among coefficients.
S3 print method for "partGLS" objects
Description
S3 print method for "partGLS" objects
Usage
## S3 method for class 'partGLS'
print(x, ...)
Arguments
x |
"partGLS" object |
... |
additional arguments passed to print |
Value
a print-formatted version of key elements of the "partGLS" object.
S3 print method for "remoteCor" class
Description
S3 print method for "remoteCor" class
Usage
## S3 method for class 'remoteCor'
print(x, ...)
Arguments
x |
remoteCor object to print |
... |
additional arguments passed to print() |
Value
a print-formatted version of key elements of the "remoteCor" object.
print method for remoteGLS
Description
print method for remoteGLS
Usage
## S3 method for class 'remoteGLS'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
remoteGLS object |
digits |
digits to print |
... |
additional arguments |
Value
formatted output for remoteGLS object
S3 print method for remoteTS class
Description
S3 print method for remoteTS class
S3 summary method for remoteTS class
S3 print method for mapTS class
S3 summary method for mapTS class
helper summary function (matrix)
helper summary function (vector)
Usage
## S3 method for class 'remoteTS'
print(
x,
digits = max(3L, getOption("digits") - 3L),
signif.stars = getOption("show.signif.stars"),
...
)
## S3 method for class 'remoteTS'
summary(
object,
digits = max(3L, getOption("digits") - 3L),
signif.stars = getOption("show.signif.stars"),
...
)
## S3 method for class 'mapTS'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'mapTS'
summary(
object,
digits = max(3L, getOption("digits") - 3L),
CL = 0.95,
na.rm = TRUE,
...
)
smry_funM(x, CL = 0.95, na.rm = TRUE)
smry_funV(x, CL = 0.95, na.rm = TRUE)
Arguments
x |
numeric matrix |
digits |
significant digits to show |
signif.stars |
logical, passed to |
... |
additional parameters passed to further print methods |
object |
mapTS object |
CL |
confidence level (default = .95) |
na.rm |
logical, should observations with NA be removed? |
Value
returns formatted output
returns formatted output, including summary stats
returns formatted output
returns formatted summary stats
summary statistics for each column including quartiles, mean, and upper and lower confidence levels (given by CL)
summary statistics including quartiles, mean, and upper and lower confidence levels (given by CL)
Examples
# simulate dummy data
time.points = 9 # time series length
map.width = 5 # square map width
coords = expand.grid(x = 1:map.width, y = 1:map.width) # coordinate matrix
## create empty spatiotemporal variables:
X <- matrix(NA, nrow = nrow(coords), ncol = time.points) # response
Z <- matrix(NA, nrow = nrow(coords), ncol = time.points) # predictor
# setup first time point:
Z[, 1] <- .05*coords[,"x"] + .2*coords[,"y"]
X[, 1] <- .5*Z[, 1] + rnorm(nrow(coords), 0, .05) #x at time t
## project through time:
for(t in 2:time.points){
Z[, t] <- Z[, t-1] + rnorm(map.width^2)
X[, t] <- .2*X[, t-1] + .1*Z[, t] + .05*t + rnorm(nrow(coords), 0 , .25)
}
## Pixel CLS
tmp.df = data.frame(x = X[1, ], t = nrow(X), z = Z[1, ])
CLS <- fitCLS(x ~ z, data = tmp.df)
print(CLS)
summary(CLS)
residuals(CLS)
coef(CLS)
logLik(CLS)
fitted(CLS)
# plot(CLS) # doesn't work
## Pixel AR
AR <- fitAR(x ~ z, data = tmp.df)
print(AR)
summary(AR)
coef(AR)
residuals(AR)
logLik(AR)
fitted(AR)
# plot(AR) # doesn't work
## Map CLS
CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = TRUE)
print(CLS.map)
summary(CLS.map)
residuals(CLS.map)
# plot(CLS.map)# doesn't work
CLS.map <- fitCLS_map(X, coords, y ~ Z, X.list = list(Z = Z), lag.x = 0, resids.only = FALSE)
print(CLS.map)
summary(CLS.map)
coef(CLS.map)
residuals(CLS.map)
# logLik(CLS.map) # doesn't work
fitted(CLS.map)
# plot(CLS.map) # doesn't work
## Map AR
AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = TRUE)
print(AR.map)
summary(AR.map)
residuals(AR.map)
# plot(AR.map)# doesn't work
AR.map <- fitAR_map(X, coords, y ~ Z, X.list = list(Z = Z), resids.only = FALSE)
print(AR.map)
summary(AR.map)
coef(AR.map)
residuals(AR.map)
# logLik(AR.map) # doesn't work
fitted(AR.map)
# plot(AR.map) # doesn't work
remoteGLS constructor (S3)
Description
remoteGLS constructor (S3)
Usage
remoteGLS(formula, formula0, no.F = FALSE)
Arguments
formula |
optional argument specifying the GLS formula |
formula0 |
optional argument specifying the null GLS formula |
no.F |
optional argument specifying the no.F attribute |
Value
an empty S3 object of class "remoteGLS"
Randomly sample a partition matrix for partitioned GLS
Description
Create a matrix whose columns contain indices of non-overlapping random samples.
Usage
sample_partitions(
npix,
npart = 10,
partsize = NA,
pixels = NA,
verbose = FALSE
)
Arguments
npix |
number of pixels in full dataset |
npart |
number of partitions to create |
partsize |
size of each partition |
pixels |
vector of pixel indexes to sample from |
verbose |
logical: TRUE prints additional info |
Details
If both npart
and partsize
is specified, a partition matrix with
these dimensions is returned. If only npart
, is specified,
partsize
is selected as the largest integer possible that creates
equal sized partitions. Similarly, if only npart = NA
, then npart
is selected to obtain as many partitions as possible.
Value
sample_partitions
returns a matrix with partsize
rows and npart
columns. Columns contain random, non-overlapping samples
from 1:npix
See Also
Other partitionedGLS:
MC_GLSpart()
,
crosspart_GLS()
Examples
# dummy data with 100 pixels and 20 time points
dat.M <- matrix(rnorm(100*20), ncol = 20)
# 4 partitions (exhaustive)
sample_partitions(npix = nrow(dat.M), npart = 4)
# partitions with 10 pixels each (exhaustive)
sample_partitions(npix = nrow(dat.M), partsize = 10)
# 4 partitions each with 10 pixels (non-exhaustive, produces warning)
sample_partitions(npix = nrow(dat.M), npart = 4, partsize = 10)
# index of 50 pixels to use as subset
sub.indx <- c(1:10, 21:25, 30:62, 70:71)
# 5 partitions (exhaustive) from only the specified pixel subset
sample_partitions(npix = nrow(dat.M), npart = 5, pixels = sub.indx)
Conduct a t-test of "partGLS" object
Description
Conduct a correlated t-test of a partitioned GLS
Usage
## S3 method for class 'partGLS'
t.test(x, ...)
Arguments
x |
"partGLS" object |
... |
additional arguments passed to print |
Value
a list whose first element is a coefficient table with estimates, standard errors, t-statistics, and p-values and whose second element is a matrix of correlations among coefficients.
Test passing a covariance function and arguments
Description
Test passing a covariance function and arguments
Usage
test_covar_fun(d, covar_FUN = "covar_exppow", covar.pars = list(range = 0.5))
Arguments
d |
numeric vector or matrix of distances |
covar_FUN |
distance-based covariance function to use,
which must take |
covar.pars |
vector or list of parameters (other than d) passed to the covar function |