Type: Package
Title: Spatial Analysis on Network
Version: 0.4.4.6
Description: Perform spatial analysis on network. Implement several methods for spatial analysis on network: Network Kernel Density estimation, building of spatial matrices based on network distance ('listw' objects from 'spdep' package), K functions estimation for point pattern analysis on network, k nearest neighbours on network, reachable area calculation, and graph generation. References: Okabe et al (2019) <doi:10.1080/13658810802475491>; Okabe et al (2012, ISBN:978-0470770818); Baddeley et al (2015, ISBN:9781482210200).
License: GPL-2
Encoding: UTF-8
LazyData: true
Imports: spdep (≥ 1.1.2), igraph (≥ 1.2.6), cubature (≥ 2.0.4.1), future.apply (≥ 1.4.0), methods (≥ 1.7.1), ggplot2 (≥ 3.3.0), progressr (≥ 0.4.0), data.table (≥ 1.12.8), Rcpp (≥ 1.0.4.6), Rdpack (≥ 2.1.1), dbscan (≥ 1.1-8), sf (≥ 1.0-3), abind (≥ 1.4-5), sfheaders (≥ 0.4.4), cppRouting (≥ 3.1)
Depends: R (≥ 3.6)
Suggests: future (≥ 1.16.0), testthat (≥ 3.0.0), kableExtra (≥ 1.1.0), RColorBrewer (≥ 1.1-2), classInt (≥ 0.4-3), reshape2 (≥ 1.4.3), rlang (≥ 0.4.6), rgl (≥ 0.107.14), tmap (≥ 3.3-1), smoothr (≥ 0.2.2), concaveman (≥ 1.1.0), covr (≥ 3.5.1), knitr, rmarkdown
RoxygenNote: 7.3.2
VignetteBuilder: knitr
URL: https://jeremygelb.github.io/spNetwork/
BugReports: https://github.com/JeremyGelb/spNetwork/issues
LinkingTo: Rcpp, RcppProgress, RcppArmadillo, BH
RdMacros: Rdpack
Language: en-CA
SystemRequirements: C++17
NeedsCompilation: yes
Packaged: 2025-03-29 15:40:59 UTC; Gelb
Author: Jeremy Gelb ORCID iD [aut, cre], Philippe Apparicio ORCID iD [ctb]
Maintainer: Jeremy Gelb <jeremy.gelb@ucs.inrs.ca>
Repository: CRAN
Date/Publication: 2025-03-29 16:00:02 UTC

spNetwork: Spatial Analysis on Network

Description

Perform spatial analysis on network. Implement several methods for spatial analysis on network: Network Kernel Density estimation, building of spatial matrices based on network distance ('listw' objects from 'spdep' package), K functions estimation for point pattern analysis on network, k nearest neighbours on network, reachable area calculation, and graph generation. References: Okabe et al (2019) doi:10.1080/13658810802475491; Okabe et al (2012, ISBN:978-0470770818); Baddeley et al (2015, ISBN:9781482210200).

Author(s)

Maintainer: Jeremy Gelb jeremy.gelb@ucs.inrs.ca (ORCID)

Other contributors: Philippe Apparicio [contributor]

See Also

Useful links:

https://jeremygelb.github.io/spNetwork/

Report bugs at https://github.com/JeremyGelb/spNetwork/issues


Adaptive bandwidth

Description

Function to calculate Adaptive bandwidths according to Abramson’s smoothing regimen.

Usage

adaptive_bw(
  grid,
  events,
  lines,
  bw,
  trim_bw,
  method,
  kernel_name,
  max_depth,
  tol,
  digits,
  sparse,
  verbose
)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the events

lines

A feature collection of linestrings representing the network

bw

The fixed kernel bandwidth (can also be a vector, the value returned will be a matrix in that case)

trim_bw

The maximum size of local bandwidths (can also be a vector, must match bw)

method

The method to use when calculating the NKDE

kernel_name

The name of the kernel to use

max_depth

The maximum recursion depth

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

sparse

A Boolean indicating if sparse matrix should be used

verbose

A Boolean indicating if update messages should be printed

Value

A vector with the local bandwidths

Examples

#This is an internal function, no example provided
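
The sketch below illustrates the usual formulation of Abramson's smoothing regimen in base R. It is not the package's internal code: the helper abramson_bw and the vector pilot_density (fixed-bandwidth density estimates at the event locations) are hypothetical names introduced for illustration, and the trimming step assumes that trim_bw acts as a simple upper bound.

```r
# Sketch of Abramson's rule in base R (assumption: not the internal code).
# `pilot_density` is a hypothetical vector of fixed-bandwidth density
# estimates evaluated at each event location.
abramson_bw <- function(pilot_density, bw, trim_bw) {
  # inverse square root of the pilot density at each event
  inv_sqrt <- 1 / sqrt(pilot_density)
  # geometric mean of these values, used as a normalising constant
  gamma <- exp(mean(log(inv_sqrt)))
  local_bw <- bw * inv_sqrt / gamma
  # cap the local bandwidths at the trimming value
  pmin(local_bw, trim_bw)
}

pilot_density <- c(0.04, 0.01, 0.0025, 0.01)
local_bws <- abramson_bw(pilot_density, bw = 300, trim_bw = 500)
print(local_bws)
```

Events located in dense areas receive a bandwidth smaller than the global one, and events in sparse areas receive a larger one, up to the trimming value.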

Adaptive bandwidth (multicore)

Description

Function to calculate Adaptive bandwidths according to Abramson’s smoothing regimen with multicore support

Usage

adaptive_bw.mc(
  grid,
  events,
  lines,
  bw,
  trim_bw,
  method,
  kernel_name,
  max_depth,
  tol,
  digits,
  sparse,
  verbose
)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the events

lines

A feature collection of linestrings representing the network

bw

The fixed kernel bandwidth (can also be a vector, the value returned will be a matrix in that case)

trim_bw

The maximum size of local bandwidths (can also be a vector, must match bw)

method

The method to use when calculating the NKDE

kernel_name

The name of the kernel to use

max_depth

The maximum recursion depth

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

sparse

A Boolean indicating if sparse matrix should be used

verbose

A Boolean indicating if update messages should be printed

Value

A vector with the local bandwidths

Examples

#This is an internal function, no example provided

Adaptive bw in one dimension

Description

Calculate adaptive bandwidths in one dimension

Usage

adaptive_bw_1d(events, w, bw, kernel_name)

Arguments

events

A numeric vector representing the moments of occurrence of events

w

The weight of the events

bw

A float, the bandwidth to use

kernel_name

The name of the kernel to use


Adaptive bandwidth for TNKDE

Description

Function to calculate Adaptive bandwidths according to Abramson’s smoothing regimen for TNKDE with a space-time interaction.

Usage

adaptive_bw_tnkde(
  grid,
  events_loc,
  events,
  lines,
  bw_net,
  bw_time,
  trim_bw_net,
  trim_bw_time,
  method,
  kernel_name,
  max_depth,
  div,
  tol,
  digits,
  sparse,
  verbose
)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the events

lines

A feature collection of linestrings representing the network

bw_net

The fixed kernel bandwidth for the network dimension. Can also be a vector if several bandwidths must be used.

bw_time

The fixed kernel bandwidth for the time dimension. Can also be a vector if several bandwidths must be used.

trim_bw_net

The maximum size of local bandwidths for network dimension. Must be a vector if bw_net is a vector

trim_bw_time

The maximum size of local bandwidths for time dimension. Must be a vector if bw_net is a vector

method

The method to use when calculating the NKDE

kernel_name

The name of the kernel to use

max_depth

The maximum recursion depth

div

The divisor to use for kernels

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

sparse

A Boolean indicating if sparse matrix should be used

verbose

A Boolean indicating if update messages should be printed

Value

A vector with the local bandwidths, or an array if bw_time and bw_net are vectors. In that case, the array has the following dimensions : length(bw_net) X length(bw_time) X nrow(events)

Examples

#This is an internal function, no example provided

Adaptive bandwidth for TNKDE (multicore)

Description

Function to calculate Adaptive bandwidths according to Abramson’s smoothing regimen for TNKDE with a space-time interaction with multicore support.

Usage

adaptive_bw_tnkde.mc(
  grid,
  events_loc,
  events,
  lines,
  bw_net,
  bw_time,
  trim_bw_net,
  trim_bw_time,
  method,
  kernel_name,
  max_depth,
  div,
  tol,
  digits,
  sparse,
  verbose
)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the events

lines

A feature collection of linestrings representing the network

bw_net

The fixed kernel bandwidth for the network dimension

bw_time

The fixed kernel bandwidth for the time dimension

trim_bw_net

The maximum size of local bandwidths for the network dimension

trim_bw_time

The maximum size of local bandwidths for the time dimension

method

The method to use when calculating the NKDE

kernel_name

The name of the kernel to use

max_depth

The maximum recursion depth

div

The divisor to use for kernels

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

sparse

A Boolean indicating if sparse matrix should be used

verbose

A Boolean indicating if update messages should be printed

Value

A vector with the local bandwidths

Examples

#This is an internal function, no example provided

The exposed function to calculate adaptive bandwidth with space-time interaction for TNKDE (INTERNAL)

Description

The exposed function to calculate adaptive bandwidth with space-time interaction for TNKDE (INTERNAL)

Usage

adaptive_bw_tnkde_cpp(
  method,
  neighbour_list,
  sel_events,
  sel_events_wid,
  sel_events_time,
  events,
  events_wid,
  events_time,
  weights,
  bws_net,
  bws_time,
  kernel_name,
  line_list,
  max_depth,
  min_tol
)

Arguments

method

a string, one of "simple", "continuous", "discontinuous"

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

sel_events

a NumericVector indicating the selected events (id of nodes)

sel_events_wid

a NumericVector indicating the unique id of the selected events

sel_events_time

a NumericVector indicating the time of the selected events

events

a NumericVector indicating the nodes in the graph being events

events_wid

a NumericVector indicating the unique id of all the events

events_time

a NumericVector indicating the timestamp of each event

weights

a cube with the weights associated with each event for each bws_net and bws_time.

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

kernel_name

a string with the name of the kernel to use

line_list

a DataFrame describing the lines

max_depth

the maximum recursion depth

min_tol

a double giving the value used to replace density values equal to 0

Value

a vector with the estimated density at each event location

Examples

# no example provided, this is an internal function

The exposed function to calculate adaptive bandwidth with space-time interaction for TNKDE (INTERNAL)

Description

The exposed function to calculate adaptive bandwidth with space-time interaction for TNKDE (INTERNAL)

Usage

adaptive_bw_tnkde_cpp2(
  method,
  neighbour_list,
  sel_events,
  sel_events_wid,
  sel_events_time,
  events,
  events_wid,
  events_time,
  weights,
  bws_net,
  bws_time,
  kernel_name,
  line_list,
  max_depth,
  min_tol
)

Arguments

method

a string, one of "simple", "continuous", "discontinuous"

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

sel_events

a NumericVector indicating the selected events (id of nodes)

sel_events_wid

a NumericVector indicating the unique id of the selected events

sel_events_time

a NumericVector indicating the time of the selected events

events

a NumericVector indicating the nodes in the graph being events

events_wid

a NumericVector indicating the unique id of all the events

events_time

a NumericVector indicating the timestamp of each event

weights

a cube with the weights associated with each event for each bws_net and bws_time.

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

kernel_name

a string with the name of the kernel to use

line_list

a DataFrame describing the lines

max_depth

the maximum recursion depth

min_tol

a double giving the value used to replace density values equal to 0

Value

a vector with the estimated density at each event location

Examples

# no example provided, this is an internal function

Add center vertex to lines

Description

Add to each feature of a feature collection of lines an additional vertex at its center.

Usage

add_center_lines(lines)

Arguments

lines

The feature collection of linestrings to use

Value

A feature collection of points

Examples

#This is an internal function, no example provided

Add vertices to a feature collection of linestrings

Description

Add vertices (feature collection of points) to their nearest lines (feature collection of linestrings). This may fail if the line geometries are self-intersecting.

Usage

add_vertices_lines(lines, points, nearest_lines_idx, mindist)

Arguments

lines

The feature collection of linestrings to modify

points

The feature collection of points to add as vertices to the lines

nearest_lines_idx

For each point, the index of the nearest line

mindist

The minimum distance between one point and the extremity of the line to add the point as a vertex.

Value

A feature collection of linestrings

Examples

#This is an internal function, no example provided

Events aggregation

Description

Function to aggregate points within a radius.

Usage

aggregate_points(points, maxdist, weight = "weight", return_ids = FALSE)

Arguments

points

The feature collection of points to contract (must have a weight column)

maxdist

The distance to use

weight

The name of the column to use as weight (default is "weight"). The values of the aggregated points for this column will be summed. For all the other columns, only the max value is retained.

return_ids

A boolean (default is FALSE). If TRUE, an index indicating for each point the group it belongs to is returned. If FALSE, a feature collection of points is returned with the points already aggregated.

Details

This function can be used to aggregate points within a radius. This is done by using the dbscan algorithm. This process is repeated until no more modification is applied.

Value

A new feature collection of points

Examples

data(bike_accidents)
bike_accidents$weight <- 1
agg_points <- aggregate_points(bike_accidents, 5)
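
The aggregation rule described for the weight argument (weights summed, maximum retained for every other column) can be sketched in base R. The attribute table and the group index below are invented for illustration; group_id only stands in for the kind of index described for return_ids = TRUE and is not actual output of the function.

```r
# Hypothetical attribute table for five events, plus a hypothetical group
# index telling which aggregated point each event belongs to.
events <- data.frame(weight = c(1, 1, 2, 1, 1),
                     NB_VICTIME = c(1, 2, 1, 1, 3))
group_id <- c(1, 1, 2, 3, 3)

# weights are summed within each group
agg_weight <- tapply(events$weight, group_id, sum)
# for any other column, only the maximum value is kept
agg_victims <- tapply(events$NB_VICTIME, group_id, max)

aggregated <- data.frame(group = as.integer(names(agg_weight)),
                         weight = as.numeric(agg_weight),
                         NB_VICTIME = as.numeric(agg_victims))
print(aggregated)
```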

Road accidents including a bicycle in Montreal in 2016

Description

A feature collection (sf object) representing road accidents including a cyclist in Montreal in 2016. The EPSG is 3797, and the data comes from the Montreal OpenData website. It is only a small subset in central districts used to demonstrate the main functions of spNetwork.

Usage

bike_accidents

Format

An sf object with 347 rows and 4 variables

NB_VICTIME

the number of victims

AN

the year of the accident

Date

the date of the accident (yyyy/mm/dd)

geom

the geometry (points)

Source

https://donnees.montreal.ca/dataset/collisions-routieres


Network generation with igraph

Description

Generate an igraph object from a feature collection of linestrings

Usage

build_graph(lines, digits, line_weight, attrs = FALSE)

Arguments

lines

A feature collection of lines

digits

The number of digits to keep from the coordinates

line_weight

The name of the column giving the weight of the lines

attrs

A boolean indicating if the original lines' attributes should be stored in the final object

Details

This function can be used to generate an undirected graph object (igraph object). It uses the coordinates of the linestrings extremities to create the nodes of the graph. This is why the number of digits in the coordinates is important. Too high precision (high number of digits) might break some connections.

Value

A list containing the following elements:

Examples

data(mtl_network)
mtl_network$length <- as.numeric(sf::st_length(mtl_network))
graph_result <- build_graph(mtl_network, 2, "length", attrs = TRUE)
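
To see why the digits argument matters, the base R sketch below mimics the idea of deriving graph nodes from rounded extremity coordinates. The segment table and the labelling scheme (pasting rounded coordinates) are assumptions made for illustration and are not taken from the package's source.

```r
# Endpoints of two hypothetical segments that should meet at (10, 5);
# the second one starts 0.004 units away because of digitising noise.
seg <- data.frame(
  x_from = c(0, 10.004), y_from = c(0, 5),
  x_to   = c(10, 20),    y_to   = c(5, 5)
)

# Build node labels from rounded extremity coordinates (assumed scheme)
# and count the distinct nodes that result.
count_nodes <- function(seg, digits) {
  from <- paste(round(seg$x_from, digits), round(seg$y_from, digits))
  to   <- paste(round(seg$x_to, digits),   round(seg$y_to, digits))
  length(unique(c(from, to)))
}

n_coarse <- count_nodes(seg, digits = 2)  # the two extremities share a node
n_fine   <- count_nodes(seg, digits = 3)  # the two extremities stay separate
print(c(digits_2 = n_coarse, digits_3 = n_fine))
```

With two digits the segments share a node and are connected; with three digits the small gap is preserved and the connection is lost.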

Network generation with cppRouting

Description

Generate a cppRouting object from a feature collection of linestrings

Usage

build_graph_cppr(lines, digits, line_weight, attrs = FALSE, direction = NULL)

Arguments

lines

A feature collection of lines

digits

The number of digits to keep from the coordinates

line_weight

The name of the column giving the weight of the lines

attrs

A boolean indicating if the original lines' attributes should be stored in the final object

Details

This function can be used to generate an undirected graph object (cppRouting object). It uses the coordinates of the linestrings extremities to create the nodes of the graph. This is why the number of digits in the coordinates is important. Too high precision (high number of digits) might break some connections.

Value

A list containing the following elements:

Examples


data(mtl_network)
mtl_network$length <- as.numeric(sf::st_length(mtl_network))
graph_result <- build_graph_cppr(mtl_network, 2, "length", attrs = TRUE)


Directed network generation

Description

Generate a directed igraph object from a feature collection of linestrings

Usage

build_graph_directed(lines, digits, line_weight, direction, attrs = FALSE)

Arguments

lines

A feature collection of linestrings

digits

The number of digits to keep from the coordinates

line_weight

The name of the column giving the weight of the lines

direction

A column name indicating authorized travelling direction on lines. If NULL, then all lines can be used in both directions. Must be the name of a column otherwise. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

attrs

A boolean indicating if the original lines' attributes should be stored in the final object

Details

This function can be used to generate a directed graph object (igraph object). It uses the coordinates of the linestrings extremities to create the nodes of the graph. This is why the number of digits in the coordinates is important. Too high precision (high number of digits) might break some connections. The column used to indicate directions can only have the following values: "FT" (From-To), "TF" (To-From) and "Both".

Value

A list containing the following elements:

Examples


data(mtl_network)
mtl_network$length <- as.numeric(sf::st_length(mtl_network))
mtl_network$direction <- "Both"
mtl_network[6, "direction"] <- "TF"
mtl_network_directed <- lines_direction(mtl_network, "direction")
graph_result <- build_graph_directed(lines = mtl_network_directed,
        digits = 2,
        line_weight = "length",
        direction = "direction",
        attrs = TRUE)
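
The meaning of the three direction codes can be sketched with a plain edge table in base R. The table lines_tab and the helper directed_edges are hypothetical, and the sketch assumes that "FT" allows travel in the drawing order of the line while "TF" allows travel in the reverse order.

```r
# Hypothetical edge table: each line is drawn from node `from` to node `to`.
lines_tab <- data.frame(
  from = c("A", "B", "C"),
  to = c("B", "C", "D"),
  direction = c("Both", "FT", "TF"),
  stringsAsFactors = FALSE
)

# Expand the table into directed edges.
directed_edges <- function(tab) {
  # "FT" and "Both" allow travel in the drawing order
  forward <- tab[tab$direction %in% c("FT", "Both"), c("from", "to")]
  # "TF" and "Both" allow travel in the reverse order
  backward <- tab[tab$direction %in% c("TF", "Both"), c("to", "from")]
  names(backward) <- c("from", "to")
  edges <- rbind(forward, backward)
  rownames(edges) <- NULL
  edges
}

edges <- directed_edges(lines_tab)
print(edges)
```

A "Both" line contributes two directed edges, while "FT" and "TF" lines contribute one each.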


Spatial grid

Description

Generate a grid of a specified shape in the bbox of a Spatial object.

Usage

build_grid(grid_shape, spatial)

Arguments

grid_shape

A numeric vector of length 2 indicating the number of rows and the number of columns of the grid

spatial

A list of spatial feature collections (objects from the sf package)

Value

A feature collection of polygons representing the grid

Examples

#This is an internal function, no example provided
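
As an illustration of what a grid of a given shape inside a bounding box looks like, the base R sketch below computes the cell limits for a hypothetical bounding box. It only produces a table of cell bounds, not sf polygons, and it is not the package's implementation.

```r
# Hypothetical bounding box and grid shape (rows, columns).
bbox <- c(xmin = 0, ymin = 0, xmax = 100, ymax = 60)
grid_shape <- c(2, 4)

# Regularly spaced breaks along each axis.
x_breaks <- seq(bbox[["xmin"]], bbox[["xmax"]], length.out = grid_shape[2] + 1)
y_breaks <- seq(bbox[["ymin"]], bbox[["ymax"]], length.out = grid_shape[1] + 1)

# One row per cell, with its limits.
cells <- expand.grid(col = seq_len(grid_shape[2]), row = seq_len(grid_shape[1]))
cells$xmin <- x_breaks[cells$col]
cells$xmax <- x_breaks[cells$col + 1]
cells$ymin <- y_breaks[cells$row]
cells$ymax <- y_breaks[cells$row + 1]
print(cells)
```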

Check function for parameters in bandwidth selection methods

Description

A check function for bandwidth selection methods raising an error if a parameter is not valid

Usage

bw_checks(
  check,
  lines,
  samples,
  events,
  kernel_name,
  method,
  bws_net = NULL,
  bws_time = NULL,
  arr_bws_net = NULL,
  arr_bws_time = NULL,
  adaptive = FALSE,
  trim_net_bws = NULL,
  trim_time_bws = NULL,
  diggle_correction = FALSE,
  study_area = NULL
)

Arguments

check

A boolean indicating if the geometries must be checked

lines

A feature collection of linestrings representing the underlying network

samples

A feature collection of points representing the sample location

events

a feature collection of points representing the events

kernel_name

The name of the kernel to use

method

The name of the NKDE to use

bws_net

An ordered numeric vector with all the network bandwidths

bws_time

An ordered numeric vector with all the time bandwidths

arr_bws_net

An array with all the precalculated local network bandwidths (for each event, and for each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

arr_bws_time

An array with all the precalculated local time bandwidths (for each event, and for each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

adaptive

A boolean indicating if local bandwidths must be calculated

trim_net_bws

A numeric vector with the maximum local network bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

trim_time_bws

A numeric vector with the maximum local time bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

Examples

# no example provided, this is an internal function
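
The expected layout of the precalculated bandwidth arrays, and one way a trimming vector could be applied to them, can be sketched in base R. All values below are invented, and the use of pmin with slice.index is an assumption made for illustration rather than the package's own code.

```r
# Hypothetical global bandwidths and number of events.
bws_net <- c(200, 400)
bws_time <- c(10, 20, 30)
n_events <- 4

# Array of local network bandwidths with dimensions
# c(length(bws_net), length(bws_time), n_events); values are simulated.
set.seed(42)
n_values <- length(bws_net) * length(bws_time) * n_events
arr_bws_net <- array(
  rep(bws_net, length.out = n_values) * runif(n_values, 0.5, 2),
  dim = c(length(bws_net), length(bws_time), n_events)
)

# One maximum per global network bandwidth; larger local values are capped.
trim_net_bws <- c(300, 600)
cap <- trim_net_bws[slice.index(arr_bws_net, 1)]
arr_trimmed <- pmin(arr_bws_net, cap)
print(dim(arr_trimmed))
```

Here arr_trimmed[i, j, k] is the local network bandwidth of event k for the i-th network bandwidth and the j-th time bandwidth, capped at trim_net_bws[i].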

Bandwidth selection by likelihood cross validation

Description

Calculate the likelihood cross validation score for multiple bandwidths to select an appropriate bandwidth in a data-driven approach.

Usage

bw_cv_likelihood_calc(
  bws = NULL,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_bws = NULL,
  mat_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  sub_sample = 1,
  zero_strat = "min_double",
  verbose = TRUE,
  check = TRUE
)

Arguments

bws

An ordered numeric vector with the bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineStrings (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A boolean indicating if an adaptive bandwidth must be used. If adaptive = TRUE, the local bandwidth are derived from the global bandwidths (bws)

trim_bws

A vector indicating the maximum value an adaptive bandwidth can reach. Higher values will be trimmed. It must have the same length as bws.

mat_bws

A matrix giving the bandwidths for each observation and for each global bandwidth. This is useful when the user wants to use a method other than Abramson's smoothing regimen.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow very quickly if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum recursion depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slightly slower, but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

zero_strat

A string indicating what to do when the density is 0 when calculating the leave-one-out (LOO) density estimate for an isolated event. "min_double" (default) replaces the 0 value by the minimum double possible on the machine. "remove" removes these events from the final score. The first approach penalizes small bandwidths more strongly.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

The function calculates the likelihood cross validation score for several bandwidths in order to find the most appropriate one. The general idea is to find the bandwidth that would produce the most similar results if one event was removed from the dataset (leave one out cross validation). We use here the shortcut formula as described by the package spatstat (Baddeley et al. 2021).

LCV(h) = \sum_i \log\hat\lambda_{-i}(x_i)

Where the sum is taken for all events x_i and where \hat\lambda_{-i}(x_i) is the leave-one-out kernel estimate at x_i for a bandwidth h. A higher value indicates a better bandwidth.
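
The score and the effect of the zero_strat argument can be sketched in base R. The leave-one-out densities below are invented, lcv_score is a hypothetical helper, and the use of .Machine$double.xmin as "the minimum double possible on the machine" is an assumption.

```r
# Hypothetical leave-one-out densities at five events; the fourth event is
# isolated, so its leave-one-out density is exactly 0.
loo_density <- c(0.012, 0.020, 0.008, 0, 0.015)

lcv_score <- function(loo_density, zero_strat = "min_double") {
  if (zero_strat == "min_double") {
    # replace zeros by the smallest positive normalised double (assumption)
    loo_density[loo_density == 0] <- .Machine$double.xmin
  } else if (zero_strat == "remove") {
    # drop the events with a null density from the score
    loo_density <- loo_density[loo_density > 0]
  } else {
    stop("zero_strat must be 'min_double' or 'remove'")
  }
  sum(log(loo_density))
}

score_min_double <- lcv_score(loo_density, "min_double")
score_remove <- lcv_score(loo_density, "remove")
print(c(min_double = score_min_double, remove = score_remove))
```

A single isolated event lowers the "min_double" score by several hundred log units, which is why this strategy penalizes small bandwidths more strongly than "remove".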

Value

A dataframe with two columns, one for the bandwidths and the second for the cross validation score (the higher the better).

References

Baddeley A, Turner R, Rubak E (2021). spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests. R package version 2.1-0, https://CRAN.R-project.org/package=spatstat.

Examples


data(mtl_network)
data(bike_accidents)
cv_scores <- bw_cv_likelihood_calc(seq(200,800,50),
                               mtl_network, bike_accidents,
                               rep(1,nrow(bike_accidents)),
                               "quartic", "simple",
                               diggle_correction = FALSE, study_area = NULL,
                               max_depth = 8,
                               digits=2, tol=0.1, agg=5,
                               sparse=TRUE, grid_shape=c(1,1),
                               sub_sample = 1, verbose=TRUE, check=TRUE)


Bandwidth selection by likelihood cross validation (multicore)

Description

Calculate the likelihood cross validation score for multiple bandwidths to select an appropriate bandwidth in a data-driven approach.

Usage

bw_cv_likelihood_calc.mc(
  bws,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_bws = NULL,
  mat_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  sub_sample = 1,
  zero_strat = "min_double",
  verbose = TRUE,
  check = TRUE
)

Arguments

bws

An ordered numeric vector with the bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineStrings (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A boolean indicating if an adaptive bandwidth must be used. If adaptive = TRUE, the local bandwidth are derived from the global bandwidths (bws)

trim_bws

A vector indicating the maximum value an adaptive bandwidth can reach. Higher values will be trimmed. It must have the same length as bws.

mat_bws

A matrix giving the bandwidths for each observation and for each global bandwidth. This is useful when the user wants to use a method other than Abramson's smoothing regimen.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow very quickly if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum recursion depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slightly slower, but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

zero_strat

A string indicating what to do when the density is 0 when calculating the leave-one-out (LOO) density estimate for an isolated event. "min_double" (default) replaces the 0 value by the minimum double possible on the machine. "remove" removes these events from the final score. The first approach penalizes small bandwidths more strongly.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

See the function bw_cv_likelihood_calc for more details. The calculation is split according to the parameter grid_shape. If grid_shape = c(1,1), then parallel processing cannot be used.

Value

A dataframe with two columns, one for the bandwidths and the second for the cross validation score (the higher the better).

Examples


data(mtl_network)
data(bike_accidents)
future::plan(future::multisession(workers=1))
cv_scores <- bw_cv_likelihood_calc.mc(seq(200,800,50),
                               mtl_network, bike_accidents,
                               rep(1,nrow(bike_accidents)),
                               "quartic", "simple",
                               diggle_correction = FALSE, study_area = NULL,
                               max_depth = 8,
                               digits=2, tol=0.1, agg=5,
                               sparse=TRUE, grid_shape=c(1,1),
                               sub_sample = 1, verbose=TRUE, check=TRUE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


Bandwidth selection for Temporal Kernel density estimate by likelihood cross validation

Description

Calculate the likelihood cross validation score for several bandwidths for the temporal kernel density estimate.

Usage

bw_cv_likelihood_calc_tkde(events, w, bws, kernel_name)

Arguments

events

A numeric vector representing the moments of occurrence of events

w

The weight of the events

bws

A numeric vector, the bandwidths to use

kernel_name

The name of the kernel to use

Value

A vector with the cross validation scores (the higher the better).

Examples

data(bike_accidents)
bike_accidents$Date <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- min(bike_accidents$Date)
diff <- as.integer(difftime(bike_accidents$Date , start, units = "days"))
w <- rep(1,length(diff))
scores <- bw_cv_likelihood_calc_tkde(diff, w, seq(10,60,10), "quartic")

Bandwidth selection by Cronie and Van Lieshout's Criterion

Description

Calculate the Cronie and Van Lieshout's Criterion for multiple bandwidths to select an appropriate bandwidth in a data-driven approach.

Usage

bw_cvl_calc(
  bws = NULL,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_bws = NULL,
  mat_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  zero_strat = "min_double",
  grid_shape = c(1, 1),
  sub_sample = 1,
  verbose = TRUE,
  check = TRUE
)

Arguments

bws

An ordered numeric vector with the bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineString (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A Boolean indicating if an adaptive bandwidth must be used. If adaptive = TRUE, the local bandwidths are derived from the global bandwidths (bws).

trim_bws

A vector indicating the maximum value an adaptive bandwidth can reach. Higher values will be trimmed. It must have the same length as bws.

mat_bws

A matrix giving the bandwidths for each observation and for each global bandwidth. This is useful when the user wants to use a different method from Abramson's smoothing regimen.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow sharply if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

zero_strat

A string indicating what to do when the density is 0 when calculating the LOO (leave-one-out) density estimate for an isolated event. "min_double" (default) replaces the 0 value with the minimum double possible on the machine. "remove" removes these values from the final score. The first approach penalizes small bandwidths more strongly.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

The Cronie and Van Lieshout's Criterion (Cronie and Van Lieshout 2018) finds the optimal bandwidth by minimizing the difference between the size of the observation window and the sum of the reciprocals of the estimated kernel density at the event locations. In the network case, the size of the study area is the sum of the lengths of all lines in the network. Thus, it is important to only use the necessary parts of the network.
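
As a rough illustration of the criterion, the sketch below works on a one-dimensional toy "network" (a single line of known length) with a quartic kernel and no edge correction. The helpers quartic_kernel and cvl_score are hypothetical names written for this illustration and are not part of spNetwork, which works with network distances. The density at each event includes the event itself.

```r
# Quartic (biweight) kernel with bandwidth bw, evaluated at distances d
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) <= 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Criterion for one bandwidth: |sum_i 1 / lambda_hat(x_i) - window_size|
# (hypothetical helper, 1D distances instead of network distances)
cvl_score <- function(x, bw, window_size) {
  dists <- abs(outer(x, x, "-"))
  lambda_hat <- rowSums(quartic_kernel(dists, bw))
  abs(sum(1 / lambda_hat) - window_size)
}

set.seed(42)
x <- runif(50, min = 0, max = 1000)  # toy events on a line of length 1000
bws <- seq(20, 200, 20)
scores <- sapply(bws, function(bw) cvl_score(x, bw, window_size = 1000))

# the selected bandwidth is the one with the smallest criterion value
best_bw <- bws[which.min(scores)]
```

In this sketch the window size is simply the length of the line, which mirrors the remark above that the total network length plays the role of the observation window.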

Value

A dataframe with two columns, one for the bandwidths and the second for the Cronie and Van Lieshout's Criterion.

References

Cronie O, Van Lieshout MNM (2018). “A non-model-based approach to bandwidth selection for kernel estimators of spatial intensity functions.” Biometrika, 105(2), 455–462.

Examples


data(mtl_network)
data(bike_accidents)
cv_scores <- bw_cvl_calc(seq(200,400,50),
                               mtl_network, bike_accidents,
                               rep(1,nrow(bike_accidents)),
                               "quartic", "discontinuous",
                               diggle_correction = FALSE, study_area = NULL,
                               max_depth = 8,
                               digits=2, tol=0.1, agg=5,
                               sparse=TRUE, grid_shape=c(1,1),
                               sub_sample = 1, verbose=TRUE, check=TRUE)


Bandwidth selection by Cronie and Van Lieshout's Criterion (multicore version)

Description

Calculate the Cronie and Van Lieshout's Criterion for multiple bandwidths to select an appropriate bandwidth in a data-driven approach. A plan from the package future can be used to split the work across several cores. The cells generated according to the argument grid_shape are used for the parallelization, so if only one cell is generated (grid_shape = c(1,1)), the function will use only one core. The progress bar displays the progression across the cells.

Usage

bw_cvl_calc.mc(
  bws = NULL,
  lines,
  events,
  w,
  kernel_name,
  method,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_bws = NULL,
  mat_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  zero_strat = "min_double",
  grid_shape = c(1, 1),
  sub_sample = 1,
  verbose = TRUE,
  check = TRUE
)

Arguments

bws

An ordered numeric vector with the bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineString (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A Boolean indicating if an adaptive bandwidth must be used. If adaptive = TRUE, the local bandwidths are derived from the global bandwidths (bws).

trim_bws

A vector indicating the maximum value an adaptive bandwidth can reach. Higher values will be trimmed. It must have the same length as bws.

mat_bws

A matrix giving the bandwidths for each observation and for each global bandwidth. This is useful when the user wants to use a different method from Abramson's smoothing regimen.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow sharply if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

zero_strat

A string indicating what to do when the density is 0 when calculating the LOO (leave-one-out) density estimate for an isolated event. "min_double" (default) replaces the 0 value with the minimum double possible on the machine. "remove" removes these values from the final score. The first approach penalizes small bandwidths more strongly.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

For more details, see help(bw_cvl_calc)

Value

A dataframe with two columns, one for the bandwidths and the second for the Cronie and Van Lieshout's Criterion.

Examples


data(mtl_network)
data(bike_accidents)
future::plan(future::multisession(workers=1))
cv_scores <- bw_cvl_calc.mc(seq(200,400,50),
                               mtl_network, bike_accidents,
                               rep(1,nrow(bike_accidents)),
                               "quartic", "discontinuous",
                               diggle_correction = FALSE, study_area = NULL,
                               max_depth = 8,
                               digits=2, tol=0.1, agg=5,
                               sparse=TRUE, grid_shape=c(1,1),
                               sub_sample = 1, verbose=TRUE, check=TRUE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


Time and Network bandwidth correction calculation

Description

Calculating the border correction factor for both time and network bandwidths

Usage

bw_tnkde_corr_factor(
  net_bws,
  time_bws,
  diggle_correction,
  study_area,
  events,
  events_loc,
  lines,
  method,
  kernel_name,
  tol,
  digits,
  max_depth,
  sparse
)

Arguments

net_bws

A vector of network bandwidths

time_bws

A vector of time bandwidths

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

events

A feature collection of points representing the events

events_loc

A feature collection of points representing the unique location of events

lines

A feature collection of linestrings representing the underlying lines of the network

method

The name of the NKDE to use

kernel_name

The name of the kernel to use

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

digits

An integer, the number of digits to keep for the spatial coordinates

max_depth

The maximal depth for continuous or discontinuous NKDE

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

Value

A list of two elements, first the network correction factors, then the time correction factors.

Examples

# no example provided, this is an internal function

Time and Network bandwidth correction calculation for arrays

Description

Calculating the border correction factor for both time and network bandwidths when we have to deal with adaptive bandwidths and arrays

Usage

bw_tnkde_corr_factor_arr(
  net_bws,
  time_bws,
  diggle_correction,
  study_area,
  events,
  events_loc,
  lines,
  method,
  kernel_name,
  tol,
  digits,
  max_depth,
  sparse,
  time_limits = NULL
)

Arguments

net_bws

An array of network bandwidths

time_bws

An array of time bandwidths

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

events

A feature collection of points representing the events

events_loc

A feature collection of points representing the unique location of events

lines

A feature collection of linestrings representing the underlying lines of the network

method

The name of the NKDE to use

kernel_name

The name of the kernel to use

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

digits

An integer, the number of digits to keep for the spatial coordinates

max_depth

The maximal depth for continuous or discontinuous NKDE

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

time_limits

A vector with the upper and lower limit of the time period studied

Examples

# no example provided, this is an internal function

Bandwidth selection by likelihood cross validation for temporal NKDE

Description

Calculate for multiple network and time bandwidths the cross validation likelihood to select an appropriate bandwidth in a data-driven approach

Usage

bw_tnkde_cv_likelihood_calc(
  bws_net = NULL,
  bws_time = NULL,
  lines,
  events,
  time_field,
  w,
  kernel_name,
  method,
  arr_bws_net = NULL,
  arr_bws_time = NULL,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_net_bws = NULL,
  trim_time_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  zero_strat = "min_double",
  grid_shape = c(1, 1),
  sub_sample = 1,
  verbose = TRUE,
  check = TRUE
)

Arguments

bws_net

An ordered numeric vector with all the network bandwidths

bws_time

An ordered numeric vector with all the time bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineString (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

time_field

The name of the field in events indicating when the events occurred. It must be a numeric field

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

arr_bws_net

An array with all the local network bandwidths precalculated (for each event, and at each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

arr_bws_time

An array with all the local time bandwidths precalculated (for each event, and at each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A boolean indicating if local bandwidths must be calculated

trim_net_bws

A numeric vector with the maximum local network bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

trim_time_bws

A numeric vector with the maximum local time bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow sharply if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

zero_strat

A string indicating what to do when the density is 0 when calculating the LOO (leave-one-out) density estimate for an isolated event. "min_double" (default) replaces the 0 value with the minimum double possible on the machine. "remove" removes these values from the final score. The first approach penalizes small bandwidths more strongly.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

The function calculates the likelihood cross validation score for several time and network bandwidths in order to find the most appropriate one. The general idea is to find the pair of bandwidths that would produce the most similar results if one event is removed from the dataset (leave one out cross validation). We use here the shortcut formula as described by the package spatstat (Baddeley et al. 2021).

LCV(h) = \sum_i \log\hat\lambda_{-i}(x_i)

Where the sum is taken for all events x_i and where \hat\lambda_{-i}(x_i) is the leave-one-out kernel estimate at x_i for a bandwidth h. A higher value indicates a better bandwidth.
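
The formula can be sketched in plain R on a one-dimensional toy example (time only, quartic kernel, no edge correction). The helpers quartic_kernel and lcv_score are hypothetical and not part of spNetwork. Using .Machine$double.xmin for the "min_double" strategy is an assumption about what "minimum double possible on the machine" refers to.

```r
# Quartic (biweight) kernel with bandwidth bw, evaluated at distances d
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) <= 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# LCV(h) = sum_i log(lambda_hat_{-i}(x_i))  (hypothetical helper)
lcv_score <- function(x, bw, zero_strat = c("min_double", "remove")) {
  zero_strat <- match.arg(zero_strat)
  k <- quartic_kernel(abs(outer(x, x, "-")), bw)
  diag(k) <- 0                 # leave-one-out: drop each event's own contribution
  loo <- rowSums(k)
  if (zero_strat == "min_double") {
    loo[loo == 0] <- .Machine$double.xmin  # assumed meaning of "min_double"
  } else {
    loo <- loo[loo > 0]        # "remove": isolated events are ignored
  }
  sum(log(loo))
}

set.seed(1)
x <- c(runif(40, 0, 100), 500) # the last event is isolated from the others
bws <- seq(5, 50, 5)
scores_min <- sapply(bws, lcv_score, x = x, zero_strat = "min_double")
scores_rm <- sapply(bws, lcv_score, x = x, zero_strat = "remove")

# the highest score indicates the preferred bandwidth
best_bw <- bws[which.max(scores_min)]
```

With "min_double", every isolated event adds a very large negative term to the score, which is why that strategy penalizes bandwidths too small to reach any neighbour.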

Value

A matrix with the cross validation score. Each row corresponds to a network bandwidth and each column to a time bandwidth (the higher the better).
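
A minimal sketch of how the best pair of bandwidths could be read from such a matrix. The score matrix below is made up for illustration and only stands in for the function's output; the row and column ordering follows the description above.

```r
bws_net <- seq(100, 800, 100)
bws_time <- seq(10, 60, 5)

# toy score matrix (made-up values), rows = network bws, columns = time bws
cv_scores <- outer(
  bws_net, bws_time,
  function(bn, bt) -((bn - 400) / 100)^2 - ((bt - 30) / 5)^2
)

# position of the highest score in the matrix
best <- which(cv_scores == max(cv_scores), arr.ind = TRUE)[1, ]
best_net <- bws_net[best["row"]]
best_time <- bws_time[best["col"]]
```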

References

Baddeley A, Turner R, Rubak E (2021). spatstat: Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests. R package version 2.1-0, https://CRAN.R-project.org/package=spatstat.

Examples


# loading the data
data(mtl_network)
data(bike_accidents)

# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, min(bike_accidents$Time), units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)
bike_accidents <- subset(bike_accidents, bike_accidents$Time>=89)

# calculating the cross validation values
cv_scores <- bw_tnkde_cv_likelihood_calc(
  bws_net = seq(100,800,100),
  bws_time = seq(10,60,5),
  lines = mtl_network,
  events = bike_accidents,
  time_field = "Time",
  w = rep(1, nrow(bike_accidents)),
  kernel_name = "quartic",
  method = "discontinuous",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 10,
  digits = 2,
  tol = 0.1,
  agg = 15,
  sparse=TRUE,
  grid_shape=c(1,1),
  sub_sample=1,
  verbose = FALSE,
  check = TRUE)


Bandwidth selection by likelihood cross validation for temporal NKDE (multicore)

Description

Calculate for multiple network and time bandwidths the cross validation likelihood to select an appropriate bandwidth in a data-driven approach with multicore support

Usage

bw_tnkde_cv_likelihood_calc.mc(
  bws_net = NULL,
  bws_time = NULL,
  lines,
  events,
  time_field,
  w,
  kernel_name,
  method,
  arr_bws_net = NULL,
  arr_bws_time = NULL,
  diggle_correction = FALSE,
  study_area = NULL,
  adaptive = FALSE,
  trim_net_bws = NULL,
  trim_time_bws = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  zero_strat = "min_double",
  grid_shape = c(1, 1),
  sub_sample = 1,
  verbose = TRUE,
  check = TRUE
)

Arguments

bws_net

An ordered numeric vector with all the network bandwidths

bws_time

An ordered numeric vector with all the time bandwidths

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineString (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

time_field

The name of the field in events indicating when the events occurred. It must be a numeric field

w

A vector representing the weight of each event

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

arr_bws_net

An array with all the local network bandwidths precalculated (for each event, and at each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

arr_bws_time

An array with all the local time bandwidths precalculated (for each event, and at each possible combination of network and temporal bandwidths). The dimensions must be c(length(bws_net), length(bws_time), nrow(events)).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

adaptive

A boolean indicating if local bandwidths must be calculated

trim_net_bws

A numeric vector with the maximum local network bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

trim_time_bws

A numeric vector with the maximum local time bandwidth. If local bandwidths have higher values, they will be replaced by the corresponding value in this vector.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow sharply if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

zero_strat

A string indicating what to do when the density is 0 when calculating the LOO (leave-one-out) density estimate for an isolated event. "min_double" (default) replaces the 0 value with the minimum double possible on the machine. "remove" removes these values from the final score. The first approach penalizes small bandwidths more strongly.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

sub_sample

A float between 0 and 1 indicating the proportion of quadrats to keep in the calculation. For large datasets, it may be useful to limit the bandwidth evaluation and thus reduce calculation time.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

See the function bw_tnkde_cv_likelihood_calc for more details. Note that the calculation is split according to the grid_shape argument. If grid_shape is c(1,1), then only one process can be used.

Value

A matrix with the cross validation score. Each row corresponds to a network bandwidth and each column to a time bandwidth (the higher the better).

Examples


# loading the data
data(mtl_network)
data(bike_accidents)

# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, min(bike_accidents$Time), units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)
bike_accidents <- subset(bike_accidents, bike_accidents$Time>=89)

future::plan(future::multisession(workers=1))

# calculating the cross validation values
cv_scores <- bw_tnkde_cv_likelihood_calc.mc(
  bws_net = seq(100,800,100),
  bws_time = seq(10,60,5),
  lines = mtl_network,
  events = bike_accidents,
  time_field = "Time",
  w = rep(1, nrow(bike_accidents)),
  kernel_name = "quartic",
  method = "discontinuous",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 10,
  digits = 2,
  tol = 0.1,
  agg = 15,
  sparse=TRUE,
  grid_shape=c(1,1),
  sub_sample=1,
  verbose = FALSE,
  check = TRUE)

## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


Euclidean distance between rows of a matrix and a vector (arma mode)

Description

Euclidean distance between rows of a matrix and a vector (arma mode)

Usage

calcEuclideanDistance3(y, x)

Arguments

y

a matrix

x

a vector (same length as ncol(y))

Value

a vector (same length as nrow(y))
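
For reference, the operation described here can be written in plain R as below. This is only an equivalent sketch based on the description; euclid_rows is a hypothetical name, and the package's compiled implementation is not reproduced.

```r
# Distance between each row of y and the vector x (hypothetical helper)
euclid_rows <- function(y, x) {
  stopifnot(is.matrix(y), length(x) == ncol(y))
  # subtract x from every row, square, sum by row, take the square root
  sqrt(rowSums(sweep(y, 2, x, "-")^2))
}

y <- rbind(c(0, 0), c(3, 4), c(6, 8))
x <- c(0, 0)
d <- euclid_rows(y, x)  # one distance per row of y: 0, 5, 10
```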


Gamma parameter for Abramson’s adaptive bandwidth

Description

Function to calculate the gamma parameter in Abramson’s smoothing regimen.

Usage

calc_gamma(k)

Arguments

k

a vector of numeric values (the estimated kernel densities)

Value

the gamma parameter in Abramson’s smoothing regimen

Examples

#This is an internal function, no example provided
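
As an illustration only, the sketch below uses the usual formulation of Abramson's scheme, where gamma is the geometric mean of 1/sqrt(k). That this matches the internal function exactly is an assumption, and gamma_abramson is a hypothetical name. The final trimming step mirrors the role of the trim_bws arguments documented for the bandwidth selection functions.

```r
# Geometric mean of 1 / sqrt(k) (assumed definition of gamma)
gamma_abramson <- function(k) {
  exp(mean(log(1 / sqrt(k))))
}

k <- c(0.5, 1, 2, 4)   # toy pilot densities at four events
bw <- 300              # global bandwidth
gamma_val <- gamma_abramson(k)

# local bandwidths: wider where the pilot density is low
local_bws <- bw * (1 / sqrt(k)) / gamma_val

# optional trimming of the largest local bandwidths
local_bws_trimmed <- pmin(local_bws, 400)
```

With this normalisation, the geometric mean of the local bandwidths equals the global bandwidth.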

Isochrones calculation

Description

Calculate isochrones on a network

Usage

calc_isochrones(
  lines,
  dists,
  start_points,
  donught = FALSE,
  mindist = 1,
  weight = NULL,
  direction = NULL
)

Arguments

lines

A feature collection of lines representing the edges of the network

dists

A vector of the size of the desired isochrones. Can also be a list of vector when each start point must have its own distances. If so, the length of the list must be equal to the number of rows in start_points.

start_points

A feature collection of points representing the starting points of the isochrones

donught

A boolean indicating if the returned lines must overlap for each distance (FALSE, default) or if the lines must be cut between each distance step (TRUE).

mindist

The minimum distance between two points. When two points are too close, they might end up snapped at the same location on a line. Default is 1.

weight

The name of the column in lines to use as an edge weight. If NULL, the geographical length is used. Note that if lines are split during the network creation, the weight column is recalculated proportionally to the length of the new lines.

direction

The name of the column indicating the authorized travelling direction on lines. If NULL, then all lines can be used in both directions (undirected). The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

Details

An isochrone is the set of reachable lines around a node in a network within a specified distance (or time). This function performs dynamic segmentation to return the part of each edge that is reached, and not only the fully covered edges. Several start points and several distances can be given. The network can also be directed. The lines returned by the function are the most accurate representation of the isochrones. However, if polygons are required for mapping, the vignette "Calculating isochrones" shows how to create smooth polygons from the returned sets of lines.
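
The idea behind dynamic segmentation can be sketched in a few lines of base R: an edge entered at a given network distance from the start point is kept only up to the remaining budget. The helper reached_length and its arguments are hypothetical and only illustrate the principle, not the package's implementation.

```r
# length of an edge that lies inside an isochrone of size max_dist,
# given the network distance at which the edge is entered
reached_length <- function(edge_length, dist_at_entry, max_dist) {
  pmax(0, pmin(edge_length, max_dist - dist_at_entry))
}

# three edges of length 5 entered at distances 2.5, 7.5 and 12.5
parts <- reached_length(edge_length = c(5, 5, 5),
                        dist_at_entry = c(2.5, 7.5, 12.5),
                        max_dist = 10)
```

The first edge is fully covered, the second is cut in half, and the third is not reached.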

Value

A feature collection of lines representing the isochrones with the following columns

Examples

library(sf)
# creating a simple network
wkt_lines <- c(
  "LINESTRING (0.0 0.0, 5.0 0.0)",
  "LINESTRING (0.0 -5.0, 5.0 -5.0)",
  "LINESTRING (5.0 0.0, 5.0 5.0)",
  "LINESTRING (5.0 -5.0, 5.0 -10.0)",
  "LINESTRING (5.0 0.0, 5.0 -5.0)",
  "LINESTRING (5.0 0.0, 10.0 0.0)",
  "LINESTRING (5.0 -5.0, 10.0 -5.0)",
  "LINESTRING (10.0 0, 10.0 -5.0)",
  "LINESTRING (10.0 -10.0, 10.0 -5.0)",
  "LINESTRING (15.0 -5.0, 10.0 -5.0)",
  "LINESTRING (10.0 0.0, 15.0 0.0)",
  "LINESTRING (10.0 0.0, 10.0 5.0)")

linesdf <- data.frame(wkt = wkt_lines,
                      id = paste("l",1:length(wkt_lines),sep=""))

lines <- st_as_sf(linesdf, wkt = "wkt", crs = 32188)

# and the definition of the starting point
start_points <- data.frame(x=c(5),
                           y=c(-2.5))
start_points <- st_as_sf(start_points, coords = c("x","y"), crs = 32188)

# setting the directions

lines$direction <- "Both"
lines[6,"direction"] <- "TF"

isochrones <- calc_isochrones(lines,dists = c(10,12),
                              donught = TRUE,
                              start_points = start_points,
                              direction = "direction")

Geometry sanity check

Description

Function to check if the geometries given by the user are valid.

Usage

check_geometries(lines, samples, events, study_area)

Arguments

lines

A feature collection of lines

samples

A feature collection of points (the samples)

events

A feature collection of points (the events)

study_area

A feature collection of polygons (the study_area)

Value

TRUE if all the checks are passed

Examples

#This is an internal function, no example provided

Clean events geometries

Description

Function to avoid having events at the same location.

Usage

clean_events(events, digits = 5, agg = NULL)

Arguments

events

The feature collection of points to contract (must have a weight column)

digits

The number of digits to keep

agg

A double indicating if the points must be aggregated within a distance. If NULL, then the points are aggregated by rounding the coordinates.

Value

A new feature collection of points

Examples

#This is an internal function, no example provided
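
To make the rounding strategy concrete, here is a minimal base R sketch that merges events whose rounded coordinates coincide and sums their weights. It works on a plain data.frame with hypothetical x, y and weight columns rather than on an sf object, and it is not the package's implementation.

```r
events <- data.frame(x = c(1.001, 1.004, 5.3),
                     y = c(2.002, 2.001, 7.1),
                     weight = c(1, 1, 2))
digits <- 2

# round the coordinates, then sum the weights of coincident events
events$x <- round(events$x, digits)
events$y <- round(events$y, digits)
cleaned <- aggregate(weight ~ x + y, data = events, FUN = sum)
```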

Find closest points

Description

Solve the nearest neighbour problem for two feature collections of points. This is a simple wrapper around the dbscan::kNN function.

Usage

closest_points(origins, targets)

Arguments

origins

a feature collection of points

targets

a feature collection of points

Value

for each origin point, the index of the nearest target point

Examples

data(mtl_libraries)
data(mtl_theatres)
close_libs <- closest_points(mtl_theatres, mtl_libraries)
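
The returned value is a vector of indices into targets. A brute-force base R sketch of the same idea, working on coordinate matrices instead of feature collections, is given below. The helper nearest_index is hypothetical, and the package itself relies on dbscan::kNN rather than on this loop.

```r
# for each row of origins, index of the closest row of targets
nearest_index <- function(origins, targets) {
  apply(origins, 1, function(p) {
    # squared distances from p to every target
    which.min(colSums((t(targets) - p)^2))
  })
}

origins <- rbind(c(0, 0), c(10, 10))
targets <- rbind(c(9, 9), c(1, 0), c(5, 5))
idx <- nearest_index(origins, targets)
```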

The worker function to calculate continuous NKDE (with ARMADILLO and integer matrix)

Description

The worker function to calculate continuous NKDE (with ARMADILLO and integer matrix)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

samples_k

a numeric vector of the actual kernel values, updated at each recursion

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

edge_mat

matrix, to find the id of each edge given two neighbours.

v

the actual node to consider for the recursion (int)

bw

the kernel bandwidth

line_weights

a vector with the length of the edges

samples_edgeid

a vector associating each sample to an edge

samples_x

a vector with x coordinates of each sample

samples_y

a vector with y coordinates of each sample

nodes_x

a vector with x coordinates of each node

nodes_y

a vector with y coordinates of each node

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a vector with the kernel values calculated for each sample from the first node given


The worker function to calculate continuous NKDE (with ARMADILLO and sparse matrix)

Description

The worker function to calculate continuous NKDE (with ARMADILLO and sparse matrix)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

samples_k

a numeric vector of the actual kernel values, updated at each recursion

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

edge_mat

matrix, to find the id of each edge given two neighbours.

v

the actual node to consider for the recursion (int)

bw

the kernel bandwidth

line_weights

a vector with the length of the edges

samples_edgeid

a vector associating each sample to an edge

samples_coords

a matrix with the X and Y coordinates of the samples

nodes_coords

a matrix with the X and Y coordinates of the nodes

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a vector with the kernel values calculated for each sample from the first node given


The main function to calculate continuous NKDE (with ARMADILLO and integer matrix)

Description

The main function to calculate continuous NKDE (with ARMADILLO and integer matrix)

Usage

continuous_nkde_cpp_arma(
  neighbour_list,
  events,
  weights,
  samples,
  bws,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

weights

a numeric vector of the weight of each event

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

bws

the kernel bandwidths for each event

kernel_name

the name of the kernel to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

Value

a DataFrame with two columns: the kernel values (sum_k) and the number of events for each sample (n)


The main function to calculate continuous NKDE (with ARMADILLO and sparse matrix)

Description

The main function to calculate continuous NKDE (with ARMADILLO and sparse matrix)

Usage

continuous_nkde_cpp_arma_sparse(
  neighbour_list,
  events,
  weights,
  samples,
  bws,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

weights

a numeric vector of the weight of each event

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

bws

the kernel bandwidths for each event

kernel_name

the name of the kernel to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

Value

a DataFrame with two columns: the kernel values (sum_k) and the number of events for each sample (n)


Border correction for NKDE

Description

Function to calculate the border correction factor.

Usage

correction_factor(
  study_area,
  events,
  lines,
  method,
  bws,
  kernel_name,
  tol,
  digits,
  max_depth,
  sparse
)

Arguments

study_area

A feature collection of polygons or a polygon, the limit of the study area.

events

A feature collection of points representing the events on the network.

lines

The lines used to create the network

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see details for more information)

bws

The kernel bandwidth (in meters) for each event

kernel_name

The name of the kernel to use

tol

When adding the events and the sampling points to the network, the minimum distance between these points and the lines extremities. When points are closer, they are added at the extremity of the lines.

digits

The number of digits to keep in the spatial coordinates. It ensures that topology is good when building the network. Default is 3

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can go wild if the network has a lot of small edges (area with a lot of intersections and a lot of events). To avoid it, it is possible to set here a maximum depth. Considering that the kernel is divided at intersections, a value of 8 should yield good estimates. A larger value can be used without problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

sparse

A boolean indicating if sparse or regular matrix should be used by the Rcpp functions. Regular matrices are faster, but require more memory and could lead to error, in particular with multiprocessing. Sparse matrices are slower, but require much less memory.

Value

A numeric vector with the correction factor values for each event

Examples

#no example provided, this is an internal function
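
The max_depth argument above relies on the kernel being divided at intersections. A rough back-of-the-envelope sketch shows why a depth of 8 is usually enough. It assumes the equal-split rule of the discontinuous method (at a node with n edges, the mass is shared between the n - 1 outgoing edges) and a regular network where every intersection has four edges. The helper remaining_share is hypothetical.

```r
# share of the original kernel mass carried by one branch
# after crossing `depth` intersections of a given degree
remaining_share <- function(degree, depth) (1 / (degree - 1))^depth

shares <- remaining_share(degree = 4, depth = 1:8)
```

Under these assumptions, less than 0.1% of the original mass remains on a branch after eight intersections.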

Time extent correction for NKDE

Description

Function to calculate the time extent correction factor in tnkde.

Usage

correction_factor_time(
  events_time,
  samples_time,
  bws_time,
  kernel_name,
  time_limits = NULL
)

Arguments

events_time

A numeric vector representing when the events occurred

samples_time

A numeric vector representing when the densities will be sampled

bws_time

A numeric vector with the temporal bandwidths

kernel_name

The name of the kernel to use

time_limits

A vector with the upper and lower limit of the time period studied

Value

A numeric vector with the correction factor values for each event

Examples

#no example provided, this is an internal function
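
As a hedged illustration of what such a correction can look like, the sketch below computes, for each event, the share of its temporal kernel that falls inside time_limits and returns the inverse of that share. The quartic kernel is written out by hand, the helper names are hypothetical, and the package's own computation may differ.

```r
# quartic kernel scaled by the bandwidth (integrates to 1)
quartic_sketch <- function(d, bw) {
  ifelse(abs(d) > bw, 0, (15 / 16) * (1 - (d / bw)^2)^2 / bw)
}

# inverse of the kernel mass located inside the studied period
time_corr_sketch <- function(events_time, bws_time, time_limits) {
  mapply(function(t, bw) {
    low <- max(t - bw, time_limits[1])
    up <- min(t + bw, time_limits[2])
    mass <- integrate(function(x) quartic_sketch(x - t, bw), low, up)$value
    1 / mass
  }, events_time, bws_time)
}

# an event on the lower limit loses half of its kernel, one in the middle loses nothing
corr <- time_corr_sketch(events_time = c(0, 50),
                         bws_time = c(10, 10),
                         time_limits = c(0, 100))
```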

A function to calculate the necessary information to apply the Diggle correction factor with a continuous method

Description

A function to calculate the necessary information to apply the Diggle correction factor with a continuous method

Usage

corrfactor_continuous(neighbour_list, events, line_list, bws, max_depth)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

line_list

a DataFrame representing the lines of the graph

bws

the kernel bandwidth for each event

max_depth

the maximum recursion depth (after which recursion is stopped)

Value

a list of dataframes, used to calculate the Diggle correction factor


A function to calculate the necessary information to apply the Diggle correction factor with a continuous method (sparse)

Description

A function to calculate the necessary information to apply the Diggle correction factor with a continuous method (sparse)

Usage

corrfactor_continuous_sparse(neighbour_list, events, line_list, bws, max_depth)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

line_list

a DataFrame representing the lines of the graph

bws

the kernel bandwidth for each event

max_depth

the maximum recursion depth (after which recursion is stopped)

Value

a list of dataframes, used to calculate the Diggle correction factor


A function to calculate the necessary information to apply the Diggle correction factor with a discontinuous method

Description

A function to calculate the necessary information to apply the Diggle correction factor with a discontinuous method

Usage

corrfactor_discontinuous(neighbour_list, events, line_list, bws, max_depth)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

line_list

a DataFrame representing the lines of the graph

bws

the kernel bandwidth for each event

max_depth

the maximum recursion depth (after which recursion is stopped)

Value

a list of dataframes, used to calculate the Diggle correction factor


A function to calculate the necessary information to apply the Diggle correction factor with a discontinuous method (sparse)

Description

A function to calculate the necessary information to apply the Diggle correction factor with a discontinuous method (sparse)

Usage

corrfactor_discontinuous_sparse(
  neighbour_list,
  events,
  line_list,
  bws,
  max_depth
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

line_list

a DataFrame representing the lines of the graph

bws

the kernel bandwidth for each event

max_depth

the maximum recursion depth (after which recursion is stopped)

Value

a list of dataframes, used to calculate the Diggle correction factor


Cosine kernel

Description

Function implementing the cosine kernel.

Usage

cosine_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
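
For reference, the textbook cosine kernel is K(u) = (pi / 4) * cos(pi * u / 2) for |u| <= 1, with u = d / bw. The sketch below implements that standard form scaled by the bandwidth. The name cosine_sketch is hypothetical and the internal function may differ in detail.

```r
# standard cosine kernel, scaled so that it integrates to 1 over [-bw, bw]
cosine_sketch <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) > 1, 0, (pi / 4) * cos(pi / 2 * u) / bw)
}

dens <- cosine_sketch(c(0, 50, 100, 150), bw = 100)
total <- integrate(function(d) cosine_sketch(d, 100), -100, 100)$value
```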

c++ cosine kernel

Description

c++ cosine kernel

Usage

cosine_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ cosine kernel for one distance

Description

c++ cosine kernel for one distance

Usage

cosine_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ cross g function

Description

c++ cross g function (INTERNAL)

Usage

cross_gfunc_cpp(dist_mat, start, end, step, width, Lt, na, nb, wa, wb)

Arguments

dist_mat

A matrix with the distances between points

start

A float, the start value for evaluating the g-function

end

A float, the last value for evaluating the g-function

step

A float, the jump between two evaluations of the g-function

width

The width of each donut

Lt

The total length of the network

na

The number of points in set A

nb

The number of points in set B

wa

The weight of the points in set A (coincident points)

wb

The weight of the points in set B (coincident points)
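
The ring-based counting behind the g-function can be sketched in base R from a distance matrix. The normalisation by Lt / (na * nb) is an assumption made for illustration, the weights wa and wb are ignored, and cross_g_sketch is a hypothetical helper, not the C++ routine.

```r
# count the pairs whose distance falls in a ring of the given width
# centred on each evaluation distance
cross_g_sketch <- function(dist_mat, start, end, step, width, Lt, na, nb) {
  radii <- seq(start, end, by = step)
  g <- sapply(radii, function(r) {
    in_ring <- dist_mat >= r - width / 2 & dist_mat <= r + width / 2
    Lt / (na * nb) * sum(in_ring)
  })
  data.frame(distance = radii, g = g)
}

# toy distances between 2 points of B (rows) and 2 points of A (columns)
dist_mat <- matrix(c(1, 4, 6, 9), nrow = 2)
g_vals <- cross_g_sketch(dist_mat, start = 0, end = 10, step = 5,
                         width = 4, Lt = 100, na = 2, nb = 2)
```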


c++ cross k function

Description

c++ cross k function

Usage

cross_kfunc_cpp(dist_mat, start, end, step, Lt, na, nb, wa, wb)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the k-function

end

A float, the last value for evaluating the k-function

step

A float, the jump between two evaluations of the k-function

Lt

The total length of the network

na

The number of points in set A

nb

The number of points in set B

wa

The weight of the points in set A (coincident points)

wb

The weight of the points in set B (coincident points)
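
A cumulative counterpart can be sketched the same way: for each evaluation distance, count the pairs closer than that distance. As before, the Lt / (na * nb) normalisation is an assumption, the weights are ignored, and cross_k_sketch is a hypothetical helper rather than the C++ routine.

```r
# count the pairs whose distance is at most r, for each evaluation distance r
cross_k_sketch <- function(dist_mat, start, end, step, Lt, na, nb) {
  radii <- seq(start, end, by = step)
  k <- sapply(radii, function(r) Lt / (na * nb) * sum(dist_mat <= r))
  data.frame(distance = radii, k = k)
}

# toy distances between 2 points of B (rows) and 2 points of A (columns)
dist_mat <- matrix(c(1, 4, 6, 9), nrow = 2)
k_vals <- cross_k_sketch(dist_mat, start = 0, end = 10, step = 5,
                         Lt = 100, na = 2, nb = 2)
```

Unlike the ring-based version, these values can only grow with distance, which is the cumulative effect the g-function is meant to avoid.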


Network cross k and g functions (maturing)

Description

Calculate the cross k and g functions for a set of points on a network. (maturing)

Usage

cross_kfunctions(
  lines,
  pointsA,
  pointsB,
  start,
  end,
  step,
  width,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  resolution = NULL,
  agg = NULL,
  verbose = TRUE,
  return_sims = FALSE,
  calc_g_func = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple Linestrings (may crash if some geometries are invalid) without MultiLineString

pointsA

A feature collection of points representing the points to which the distances are calculated.

pointsB

A feature collection of points representing the points from which the distances are calculated.

start

A double, the lowest distance used to evaluate the k and g functions

end

A double, the highest distance used to evaluate the k and g functions

step

A double, the step between two evaluations of the k and g function. start, end and step are used to create a vector of distances with the function seq

width

The width of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

resolution

When simulating random points on the network, selecting a resolution will reduce greatly the calculation time. When resolution is null the random points can occur everywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices on the new network

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

return_sims

a boolean indicating if the simulated k and g values must also be returned.

calc_g_func

A Boolean indicating if the G function must also be calculated (TRUE by default). If FALSE, then only the K function is calculated

Details

The cross k-function characterizes the dispersion of a set of points (A) around a second set of points (B). For each point in B, the number of points in A within successive radii is counted. The resulting empirical cross k-function can then be compared with the cross k-function that would be obtained if the points in A were randomly located around the points in B. On a network, the network distance is used instead of the Euclidean distance. This function uses Monte Carlo simulations to assess whether the points are clustered or dispersed and gives the results as a line plot. If the line of the observed cross k-function is higher than the shaded area representing the values of the simulations, then the points in A are more clustered around the points in B than expected under randomness, and vice versa. The function also calculates the cross g-function, a modified version of the cross k-function using rings instead of disks. The width of the ring must be chosen. Its main interest is to avoid the cumulative effect of the classical k-function. Note that the cross k-function of points A around B is not necessarily the same as the cross k-function of points B around A. This function is maturing: it works as expected (unit tests) but will probably be modified in future releases (speed gains, advanced features, etc.).
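
The comparison between the observed curve and the simulation envelope can be sketched as follows. The simulated values are random numbers invented for the illustration, and reading conf_int as a two-sided level (quantiles at conf_int / 2 and 1 - conf_int / 2) is an assumption.

```r
set.seed(42)
distances <- seq(0, 2500, by = 500)
nsim <- 50
conf_int <- 0.05

# hypothetical simulated k values: one row per simulation, one column per distance
sims <- sapply(distances, function(d) d / 100 + rnorm(nsim))

# envelope of the simulations at each distance
lower <- apply(sims, 2, quantile, probs = conf_int / 2)
upper <- apply(sims, 2, quantile, probs = 1 - conf_int / 2)

# hypothetical observed curve, compared with the envelope
observed <- distances / 100 + 5
clustered <- observed > upper
```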

Value

A list with the following values:

plotk

A ggplot2 object representing the values of the cross k-function

plotg

A ggplot2 object representing the values of the cross g-function

values

A DataFrame with the values used to build the plots

Examples


data(main_network_mtl)
data(mtl_libraries)
data(mtl_theatres)
result <- cross_kfunctions(main_network_mtl, mtl_theatres, mtl_libraries,
                           start = 0, end = 2500, step = 10, width = 250,
                           nsim = 50, conf_int = 0.05, digits = 2,
                           tol = 0.1, agg = NULL, verbose = FALSE)


Network cross k and g functions (maturing, multicore)

Description

Calculate the cross k and g functions for a set of points on a network. For more details, see the documentation of the function cross_kfunctions.

Usage

cross_kfunctions.mc(
  lines,
  pointsA,
  pointsB,
  start,
  end,
  step,
  width,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  resolution = NULL,
  agg = NULL,
  verbose = TRUE,
  return_sims = FALSE,
  calc_g_func = TRUE,
  grid_shape = c(1, 1)
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple Linestrings (may crash if some geometries are invalid) without MultiLineString

pointsA

A feature collection of points representing the points to which the distances are calculated.

pointsB

A feature collection of points representing the points from which the distances are calculated.

start

A double, the lowest distance used to evaluate the k and g functions

end

A double, the highest distance used to evaluate the k and g functions

step

A double, the step between two evaluations of the k and g function. start, end and step are used to create a vector of distances with the function seq

width

The width of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

resolution

When simulating random points on the network, selecting a resolution will reduce greatly the calculation time. When resolution is null the random points can occur everywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices on the new network

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

return_sims

a boolean indicating if the simulated k and g values must also be returned.

calc_g_func

A Boolean indicating if the G function must also be calculated (TRUE by default). If FALSE, then only the K function is calculated

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

Value

A list with the following values:

plotk

A ggplot2 object representing the values of the cross k-function

plotg

A ggplot2 object representing the values of the cross g-function

values

A DataFrame with the values used to build the plots

Examples


data(main_network_mtl)
data(mtl_libraries)
data(mtl_theatres)
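future::plan(future::multisession(workers=1))
result <- cross_kfunctions.mc(main_network_mtl, mtl_theatres, mtl_libraries,
                           start = 0, end = 2500, step = 10, width = 250,
                           nsim = 50, conf_int = 0.05, digits = 2,
                           tol = 0.1, agg = NULL, verbose = FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)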


Cut lines at a specified distance

Description

Cut lines in a feature collection of linestrings at a specified distance from the beginning of the lines.

Usage

cut_lines_at_distance(lines, dists)

Arguments

lines

The feature collection of linestrings to cut

dists

A vector of distances, if only one value is given, each line will be cut at that distance.

Value

A feature collection of linestrings

Examples

# This is an internal function, no example provided
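
The geometric operation can be sketched in base R on a matrix of vertex coordinates. The helper cut_polyline is hypothetical; it handles a single line and a single distance strictly between 0 and the line length, whereas the internal function works on feature collections.

```r
# split a polyline (matrix of x, y vertices) at a distance from its first vertex
cut_polyline <- function(coords, dist) {
  seg_len <- sqrt(rowSums(diff(coords)^2))
  cum_len <- c(0, cumsum(seg_len))
  stopifnot(dist > 0, dist < sum(seg_len))
  # last vertex located strictly before the cut
  i <- max(which(cum_len < dist))
  # linear interpolation along the segment that contains the cut
  frac <- (dist - cum_len[i]) / seg_len[i]
  cut_pt <- coords[i, ] + frac * (coords[i + 1, ] - coords[i, ])
  list(first = rbind(coords[1:i, , drop = FALSE], cut_pt),
       second = rbind(cut_pt, coords[(i + 1):nrow(coords), , drop = FALSE]))
}

coords <- rbind(c(0, 0), c(10, 0), c(10, 10))
parts <- cut_polyline(coords, dist = 15)
```

If the cut falls exactly on an existing vertex, this simple version repeats that vertex at the start of the second part.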

Make a network directed

Description

Function to create complementary lines for a directed network.

Usage

direct_lines(lines, direction)

Arguments

lines

The original feature collection of linestrings

direction

A vector of integers. 0 indicates a bidirectional line and 1 a unidirectional line

Value

A feature collection of linestrings with some lines duplicated according to direction

Examples

#This is an internal function, no example provided
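
The duplication logic can be illustrated on a plain edge table: bidirectional edges (direction 0) get a reversed copy, while unidirectional edges (direction 1) are kept as they are. The column names from and to are hypothetical, and the internal function operates on linestring geometries rather than on such a table.

```r
edges <- data.frame(from = c("a", "b", "c"),
                    to = c("b", "c", "d"),
                    direction = c(0, 1, 0))

# reversed copies of the bidirectional edges
rev_edges <- edges[edges$direction == 0, ]
rev_edges[c("from", "to")] <- rev_edges[c("to", "from")]

directed <- rbind(edges, rev_edges)
```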

The worker function to calculate discontinuous NKDE (with ARMADILLO and Integer matrix)

Description

The worker function to calculate discontinuous NKDE (with ARMADILLO and Integer matrix)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider for the recursion (int)

bw

the kernel bandwidth

line_weights

a vector with the length of the edges

samples_edgeid

a vector associating each sample to an edge

samples_x

a vector with x coordinates of each sample

samples_y

a vector with y coordinates of each sample

nodes_x

a vector with x coordinates of each node

nodes_y

a vector with y coordinates of each node

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a vector with the kernel values calculated for each sample from the first node given


The worker function to calculate discontinuous NKDE (with ARMADILLO and sparse matrix)

Description

The worker function to calculate discontinuous NKDE (with ARMADILLO and sparse matrix)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider for the recursion (int)

bw

the kernel bandwidth

line_weights

a vector with the length of the edges

samples_edgeid

a vector associating each sample to an edge

samples_coords

a matrix with the X and Y coordinates of the samples

nodes_coords

a matrix with the X and Y coordinates of the nodes

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a vector with the kernel values calculated for each sample from the first node given


The main function to calculate discontinuous NKDE (ARMA and sparse matrix)

Description

The main function to calculate discontinuous NKDE (ARMA and sparse matrix)

The main function to calculate discontinuous NKDE (ARMA and Integer matrix)

Usage

discontinuous_nkde_cpp_arma_sparse(
  neighbour_list,
  events,
  weights,
  samples,
  bws,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

discontinuous_nkde_cpp_arma(
  neighbour_list,
  events,
  weights,
  samples,
  bws,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

weights

a numeric vector of the weight of each event

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

bws

the kernel bandwidth for each event

kernel_name

the name of the kernel function to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

Value

a DataFrame with two columns: the kernel values (sum_k) and the number of events for each sample (n)


Distance matrix with duplicated vertices

Description

Function to create a distance matrix when some vertices are duplicated.

Usage

dist_mat_dupl(graph, start, end, ...)

Arguments

graph

The Graph to use

start

The vertices to use as starting points

end

The vertices to use as ending points

...

parameters passed to the function igraph::distances

Value

A matrix with the distances between the vertices

Examples

#This is an internal function, no example provided
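
One way to handle duplicated vertices, sketched here as an assumption rather than as the package's method, is to compute the distances once for the unique vertices and then expand the matrix by indexing. The helper expand_duplicated and the toy distance function are hypothetical; in practice the distances would come from igraph::distances.

```r
# compute distances for unique vertices only, then repeat rows and columns
expand_duplicated <- function(start, end, dist_fun) {
  u_start <- unique(start)
  u_end <- unique(end)
  d <- dist_fun(u_start, u_end)
  d[match(start, u_start), match(end, u_end), drop = FALSE]
}

# toy distance: absolute difference between vertex ids
toy_dist <- function(a, b) outer(a, b, function(x, y) abs(x - y))

m <- expand_duplicated(start = c(1, 1, 4), end = c(2, 6, 6),
                       dist_fun = toy_dist)
```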

Epanechnikov kernel

Description

Function implementing the Epanechnikov kernel.

Usage

epanechnikov_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
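
For reference, the textbook Epanechnikov kernel is K(u) = (3 / 4) * (1 - u^2) for |u| <= 1, with u = d / bw. The sketch below implements that standard form scaled by the bandwidth. The name epanechnikov_sketch is hypothetical and the internal function may differ in detail.

```r
# standard Epanechnikov kernel, scaled so that it integrates to 1 over [-bw, bw]
epanechnikov_sketch <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) > 1, 0, (3 / 4) * (1 - u^2) / bw)
}

dens <- epanechnikov_sketch(c(0, 50, 100, 150), bw = 100)
total <- integrate(function(d) epanechnikov_sketch(d, 100), -100, 100)$value
```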

c++ epanechnikov kernel

Description

c++ epanechnikov kernel

Usage

epanechnikov_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ epanechnikov kernel for one distance

Description

c++ epanechnikov kernel for one distance

Usage

epanechnikov_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


The worker function to calculate continuous TNKDE likelihood cv

Description

The worker function to calculate continuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

bws_net

an arma::vec with the network bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


The worker function to calculate continuous TNKDE likelihood cv

Description

The worker function to calculate continuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


The worker function to calculate continuous TNKDE likelihood cv (adaptive case)

Description

The worker function to calculate continuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::mat with the network bandwidths to consider

bws_time

an arma::mat with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


The worker function to calculate discontinuous TNKDE likelihood cv

Description

The worker function to calculate discontinuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

bws_net

an arma::vec with the network bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


The worker function to calculate discontinuous TNKDE likelihood cv

Description

The worker function to calculate discontinuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


The worker function to calculate discontinuous TNKDE likelihood cv (adaptive case)

Description

The worker function to calculate discontinuous TNKDE likelihood cv (INTERNAL)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::mat with the network bandwidths to consider

bws_time

an arma::mat with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


Worker for simple NKDE algorithm

Description

The worker function to perform the simple nkde.

Usage

ess_kernel(graph, y, bw, kernel_func, ok_samples, nodes, ok_edges, N)

Arguments

graph

a graph object from igraph representing the network

y

the index of the actual event

bw

a float indicating the kernel bandwidth (in meters)

kernel_func

a function obtained with the function select_kernel

ok_samples

a feature collection of points representing the sampling points. The samples must be snapped on the network. A column edge_id must indicate for each sample on which edge it is snapped.

nodes

a feature collection of points representing the nodes of the network

ok_edges

a feature collection of linestrings representing the edges of the network

Examples

#This is an internal function, no example provided

The worker function to calculate simple NKDE likelihood cv

Description

The worker function to calculate simple NKDE likelihood cv

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

bws_net

an arma::vec with the network bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a matrix with the impact of the event v on each other event for each bandwidth (mat(event, bws_net))


The worker function to calculate simple TNKDE likelihood cv

Description

The worker function to calculate simple TNKDE likelihood cv

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))
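
To make the shape of the returned cube concrete, here is a minimal sketch in base R. It assumes that, in the simple case, the impact of v on another event is the product of a kernel evaluated on the network distance and a kernel evaluated on the time distance. The quartic kernel, the function name and the distances are hypothetical illustration values; the internal worker walks the graph recursively and is not reproduced here.

```r
# Hypothetical quartic kernel, used only for this illustration
quartic_kernel <- function(d, bw) {
  u <- d / bw
  ifelse(abs(u) <= 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical distances from the event v to three other events
d_net  <- c(150, 400, 900)   # network distances
d_time <- c(2, 10, 30)       # time distances
bws_net  <- c(500, 1000)     # candidate network bandwidths
bws_time <- c(7, 14)         # candidate time bandwidths

# One slice per event, one cell per pair of bandwidths
impact <- array(0, dim = c(length(bws_net), length(bws_time), length(d_net)))
for (i in seq_along(bws_net)) {
  for (j in seq_along(bws_time)) {
    impact[i, j, ] <- quartic_kernel(d_net, bws_net[i]) *
      quartic_kernel(d_time, bws_time[j])
  }
}
print(dim(impact))
```

An event contributes zero as soon as it lies beyond either of the two bandwidths of the pair.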


The worker function to calculate simple TNKDE likelihood cv (adaptive case)

Description

The worker function to calculate simple TNKDE likelihood cv (adaptive case)

Arguments

kernel_func

a cpp pointer function (selected with the kernel name)

edge_mat

matrix, to find the id of each edge given two neighbours.

events

a NumericVector indicating the nodes in the graph being events

time_events

a NumericVector indicating the timestamp of each event

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

v

the actual node to consider (int)

v_time

the time of v (double)

bws_net

an arma::mat with the network bandwidths to consider

bws_time

an arma::mat with the time bandwidths to consider

line_weights

a vector with the length of the edges

depth

the actual recursion depth

max_depth

the maximum recursion depth

Value

a cube with the impact of the event v on each other event for each pair of bandwidths (cube(bws_net, bws_time, events))


c++ g space-time function

Description

c++ g space-time function

Usage

g_nt_func_cpp(
  dist_mat_net,
  dist_mat_time,
  start_net,
  end_net,
  step_net,
  width_net,
  start_time,
  end_time,
  step_time,
  width_time,
  Lt,
  Tt,
  n,
  w
)

Arguments

dist_mat_net

A square matrix with the distances between points on the network

dist_mat_time

A square matrix with the distances between points in time

start_net

A float, the start value for evaluating the g-function on the network

end_net

A float, the last value for evaluating the g-function on the network

step_net

A float, the jump between two evaluations of the g-function on the network

width_net

The width of each donut on the network

start_time

A float, the start value for evaluating the g-function in time

end_time

A float, the last value for evaluating the g-function in time

step_time

A float, the jump between two evaluations of the g-function in time

width_time

The width of each donut in time

Lt

The total length of the network

Tt

The total duration of the study period

n

The number of points

w

The weight of the points (coincident points)


Gaussian kernel

Description

Function implementing the gaussian kernel.

Usage

gaussian_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
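
For readers who want to see the shape of such a kernel, the following is a minimal sketch in base R. The name gaussian_kernel_sketch is hypothetical and the formula is the textbook Gaussian density with standard deviation bw; the internal implementation may differ, for example by truncating the density beyond the bandwidth.

```r
# Hypothetical sketch, not the package's internal code:
# textbook Gaussian density with standard deviation bw, evaluated at distance d
gaussian_kernel_sketch <- function(d, bw) {
  u <- d / bw
  (1 / (bw * sqrt(2 * pi))) * exp(-0.5 * u^2)
}

d <- c(0, 100, 200, 300)
dens <- gaussian_kernel_sketch(d, bw = 200)
print(round(dens, 6))
```

Under these assumptions the result matches dnorm(d, mean = 0, sd = bw), and the density decreases as the distance from the event grows.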

c++ gaussian kernel

Description

c++ gaussian kernel

Usage

gaussian_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Scaled gaussian kernel

Description

Function implementing the scaled gaussian kernel.

Usage

gaussian_kernel_scaled(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
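
The idea of a scaled Gaussian kernel can be sketched as below. This is an illustration under an explicit assumption: the standard deviation is set to bw / 3 so that almost all of the density falls within the bandwidth. The function name is hypothetical and the scaling factor used internally is not stated in this page.

```r
# Hypothetical sketch: a Gaussian kernel rescaled so that nearly all of its
# mass lies within the bandwidth. Using bw / 3 as the standard deviation is an
# assumption made for this illustration.
gaussian_kernel_scaled_sketch <- function(d, bw) {
  sd_scaled <- bw / 3
  (1 / (sd_scaled * sqrt(2 * pi))) * exp(-d^2 / (2 * sd_scaled^2))
}

bw <- 300
# Share of the density located within [-bw, bw]
mass_inside <- integrate(gaussian_kernel_scaled_sketch, -bw, bw, bw = bw)$value
print(mass_inside)
```

With this choice the bandwidth behaves like the support of a bounded kernel, which makes it easier to compare with kernels such as the quartic.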

c++ scaled gaussian kernel

Description

c++ scaled gaussian kernel

Usage

gaussian_kernel_scaled_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ scaled gaussian kernel for one distance

Description

c++ scaled gaussian kernel for one distance

Usage

gaussian_kernel_scaledos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ gaussian kernel for one distance

Description

c++ gaussian kernel for one distance

Usage

gaussian_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ g function counting worker

Description

c++ g function counting (INTERNAL)

Usage

gfunc_counting(dist_mat, wc, wr, breaks, width)

Arguments

dist_mat

A matrix with the distances between points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

breaks

A numeric vector with the distances to consider

width

The width of each donut

Value

A numeric matrix with the counts of the g function evaluated at the required distances
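
The counting step can be sketched in base R as follows. This is an illustration only: the function name, the layout of the result (one row per origin, one column per break), the inclusive ring boundaries and the exclusion of self-pairs are assumptions, not a description of the compiled code.

```r
# Hypothetical sketch of ring ("donut") counting for the g function
g_counting_sketch <- function(dist_mat, wc, wr, breaks, width) {
  half <- width / 2
  counts <- matrix(0, nrow = nrow(dist_mat), ncol = length(breaks))
  for (j in seq_along(breaks)) {
    # 1 when the destination lies inside the ring centred on breaks[j]
    ring <- (dist_mat >= (breaks[j] - half) & dist_mat <= (breaks[j] + half)) * 1
    diag(ring) <- 0  # assumption: a point does not count itself
    # weighted number of destinations in the ring, for each origin
    counts[, j] <- wr * as.vector(ring %*% wc)
  }
  counts
}

# Three points located at 0, 100 and 250 along a single line
pos <- c(0, 100, 250)
dist_mat <- abs(outer(pos, pos, "-"))
counts <- g_counting_sketch(dist_mat, wc = rep(1, 3), wr = rep(1, 3),
                            breaks = c(100, 200), width = 100)
print(counts)
```

Each ring spans half of the width on both sides of the considered distance, so the first column counts pairs separated by 50 to 150 units.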


c++ g function

Description

c++ g function (INTERNAL)

Usage

gfunc_cpp(dist_mat, start, end, step, width, Lt, n, w)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the g-function

end

A float, the last value for evaluating the g-function

step

A float, the jump between two evaluations of the g-function

width

The width of each donut

Lt

The total length of the network

n

The number of points

w

The weight of the points (coincident points)

Value

A numeric vector with the values of the g function evaluated at the required distances


c++ g function

Description

c++ g function (INTERNAL)

Usage

gfunc_cpp2(dist_mat, start, end, step, width, Lt, n, wc, wr)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the g-function

end

A float, the last value for evaluating the g-function

step

A float, the jump between two evaluations of the g-function

width

The width of each donut

Lt

The total length of the network

n

The number of points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

Value

A numeric vector with the values of the g function evaluated at the required distances


Geometric mean

Description

Function to calculate the geometric mean.

Usage

gm_mean(x, na.rm = TRUE)

Arguments

x

A vector of numeric values

na.rm

A boolean indicating if we filter the NA values

Value

The geometric mean of x

Examples

#This is an internal function, no example provided
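
A geometric mean can be sketched in base R as the exponential of the mean of the logarithms. The function name below is hypothetical, and the sketch assumes strictly positive values; how the internal function treats zeros or negative values is not documented here.

```r
# Hypothetical sketch: geometric mean as exp(mean(log(x)))
gm_mean_sketch <- function(x, na.rm = TRUE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # drop missing values before taking logs
  }
  exp(mean(log(x)))
}

gm <- gm_mean_sketch(c(1, 4, NA))
print(gm)
```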

Topological error

Description

A utility function to find topological errors in a network.

Usage

graph_checking(lines, digits, max_search = 5, tol = 0.1)

Arguments

lines

A feature collection of linestrings representing the network

digits

An integer indicating the number of digits to retain for coordinates

max_search

The maximum number of nearest neighbours to search to find close_nodes

tol

The minimum distance expected between two nodes. If two nodes are closer, they are returned in the result of the function.

Details

This function can be used to check for three common problems in networks: disconnected components, dangle nodes and close nodes. A network has disconnected components when it is made up of several subgraphs that are not connected to one another. This can be caused by topological errors in the dataset. Dangle nodes are nodes connected to only one other node. This type of node can be normal at the border of a network, but can also be caused by topological errors. Close nodes are nodes that are not coincident, but are so close that they probably should be.

Value

A list with three elements. The first is a feature collection of points indicating for each node of the network to which component it belongs. The second is a feature collection of points with the nodes that are too close to each other. The third is a feature collection of points with the dangle nodes of the network.

Examples


data(mtl_network)
topo_errors <- graph_checking(mtl_network, 2)
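
To illustrate two of the problems described in Details without any spatial data, the sketch below detects dangle nodes and disconnected components on a toy edge list in base R. The data and the variable names are hypothetical, and the label propagation loop is a simple stand-in chosen for this illustration, not the method used by the function.

```r
# Toy network as an edge list between node ids (hypothetical data)
edges <- data.frame(from = c(1, 2, 3, 5), to = c(2, 3, 4, 6))
nodes <- sort(unique(c(edges$from, edges$to)))

# Dangle nodes: nodes that appear in exactly one edge
degree <- table(factor(c(edges$from, edges$to), levels = nodes))
dangle_nodes <- nodes[as.vector(degree) == 1]

# Connected components: both ends of an edge take the smallest label,
# and the pass over the edges is repeated until nothing changes
comp <- setNames(seq_along(nodes), nodes)
repeat {
  old <- comp
  for (i in seq_len(nrow(edges))) {
    ends <- as.character(c(edges$from[i], edges$to[i]))
    comp[ends] <- min(comp[ends])
  }
  if (identical(old, comp)) break
}
n_components <- length(unique(comp))

print(dangle_nodes)
print(n_components)
```

Here nodes 5 and 6 form a second component, which in a real street network would often point to a missing or badly snapped segment.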


Heal edges

Description

Merge lines if they form a longer linestring without external intersections (experimental)

Usage

heal_edges(lines, digits = 3, verbose = TRUE)

Arguments

lines

A feature collection of linestrings

digits

An integer indicating the number of digits to keep in coordinates

verbose

A boolean indicating if a progress bar should be displayed

Value

A feature collection of linestrings with the geometries merged where possible. Note that if lines are merged, only the attributes of the first line are preserved

Examples

#This is an internal function, no example provided

Projection test

Description

Check if a feature collection is in a projected CRS

Usage

is_projected(obj)

Arguments

obj

A feature collection

Value

A boolean

Examples

#This is an internal function, no example provided

c++ k space-time function

Description

c++ k space-time function

c++ k and g space-time function

c++ k space-time function

Usage

k_nt_func_cpp(
  dist_mat_net,
  dist_mat_time,
  start_net,
  end_net,
  step_net,
  start_time,
  end_time,
  step_time,
  Lt,
  Tt,
  n,
  w
)

k_g_nt_func_cpp2(
  dist_mat_net,
  dist_mat_time,
  start_net,
  end_net,
  step_net,
  start_time,
  end_time,
  step_time,
  width_net,
  width_time,
  Lt,
  Tt,
  n,
  wc,
  wr,
  cross = FALSE
)

k_nt_func_cpp2(
  dist_mat_net,
  dist_mat_time,
  start_net,
  end_net,
  step_net,
  start_time,
  end_time,
  step_time,
  Lt,
  Tt,
  n,
  wc,
  wr,
  cross = FALSE
)

Arguments

dist_mat_net

A square matrix with the distances between points (network)

dist_mat_time

A square matrix with the distances between points (time)

start_net

A float, the start value for evaluating the k-function (network)

end_net

A float, the last value for evaluating the k-function (network)

step_net

A float, the jump between two evaluations of the k-function (network)

start_time

A float, the start value for evaluating the k-function (time)

end_time

A float, the last value for evaluating the k-function (time)

step_time

A float, the jump between two evaluations of the k-function (time)

Lt

The total length of the network

Tt

The total duration of the study period

n

The number of points

w

The weight of the points (coincident points)

width_net

A float indicating the width of the donut of the g-function (network)

width_time

A float indicating the width of the donut of the g-function (time)

cross

a boolean indicating if we are calculating a cross k or g function


Network k and g functions for spatio-temporal data (experimental, NOT READY FOR USE)

Description

Calculate the k and g functions for a set of points on a network and in time (experimental, NOT READY FOR USE).

Usage

k_nt_functions(
  lines,
  points,
  points_time,
  start_net,
  end_net,
  step_net,
  width_net,
  start_time,
  end_time,
  step_time,
  width_time,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  resolution = NULL,
  agg = NULL,
  verbose = TRUE,
  calc_g_func = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid)

points

A feature collection of points representing the points on the network. These points will be snapped on their nearest line

points_time

A numeric vector indicating when each point occurred

start_net

A double, the lowest network distance used to evaluate the k and g functions

end_net

A double, the highest network distance used to evaluate the k and g functions

step_net

A double, the step between two evaluations of the k and g functions for the network distance. start_net, end_net and step_net are used to create a vector of distances with the function seq

width_net

The width (network distance) of each donut for the g-function. Half of the width is applied on both sides of the considered distance

start_time

A double, the lowest time distance used to evaluate the k and g functions

end_time

A double, the highest time distance used to evaluate the k and g functions

step_time

A double, the step between two evaluations of the k and g functions for the time distance. start_time, end_time and step_time are used to create a vector of distances with the function seq

width_time

The width (time distance) of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

resolution

When simulating random points on the network, selecting a resolution will greatly reduce the calculation time. When resolution is NULL, the random points can occur anywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices of the new network

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

calc_g_func

A boolean indicating if the G function must also be calculated

Details

The k-function is a method to characterize the dispersion of a set of points. For each point, the numbers of other points in subsequent radii are calculated in both space and time. This empirical k-function can be more or less clustered than a k-function obtained if the points were randomly located. In a network, the network distance is used instead of the Euclidean distance. This function uses Monte Carlo simulations to assess if the points are clustered or dispersed. The function also calculates the g-function, a modified version of the k-function using rings instead of disks. The width of the ring must be chosen. The main interest is to avoid the cumulative effect of the classical k-function. This function is maturing, it works as expected (unit tests) but will probably be modified in future releases (gain speed, advanced features, etc.).
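
The way the start, end, step and width arguments translate into evaluation distances and rings can be sketched in base R, using the values of the example on this page. The variable names are hypothetical, and clamping the lower bound of a ring at zero is an assumption made for the illustration.

```r
# Evaluation distances on the network and in time, built with seq
start_net <- 0; end_net <- 2000; step_net <- 10; width_net <- 200
start_time <- 0; end_time <- 360; step_time <- 7; width_time <- 14

dists_net <- seq(start_net, end_net, step_net)
dists_time <- seq(start_time, end_time, step_time)

# Rings of the g-function: half of the width on both sides of each distance
# (assumption: the lower bound is not allowed to go below zero)
ring_net <- data.frame(lower = pmax(dists_net - width_net / 2, 0),
                       upper = dists_net + width_net / 2)
ring_time <- data.frame(lower = pmax(dists_time - width_time / 2, 0),
                        upper = dists_time + width_time / 2)

# The functions are evaluated for every combination of the two vectors
n_evaluations <- length(dists_net) * length(dists_time)
print(n_evaluations)
```

Since the number of evaluations is the product of the two vector lengths, coarser steps are a simple way to reduce the calculation time.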

Value

A list with the following values :

Examples


data(mtl_network)
data(bike_accidents)

# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- as.POSIXct("2016/01/01", format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, start, units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)

values <- k_nt_functions(
      lines =  mtl_network,
      points = bike_accidents,
      points_time = bike_accidents$Time,
      start_net = 0 ,
      end_net = 2000,
      step_net = 10,
      width_net = 200,
      start_time = 0,
      end_time = 360,
      step_time = 7,
      width_time = 14,
      nsim = 50,
      conf_int = 0.05,
      digits = 2,
      tol = 0.1,
      resolution = NULL,
      agg = 15,
      verbose = TRUE)


Network k and g functions for spatio-temporal data (multicore, experimental, NOT READY FOR USE)

Description

Calculate the k and g functions for a set of points on a network and in time (multicore, experimental, NOT READY FOR USE).

Usage

k_nt_functions.mc(
  lines,
  points,
  points_time,
  start_net,
  end_net,
  step_net,
  width_net,
  start_time,
  end_time,
  step_time,
  width_time,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  resolution = NULL,
  agg = NULL,
  verbose = TRUE,
  calc_g_func = TRUE,
  grid_shape = c(1, 1)
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid)

points

A feature collection of points representing the points on the network. These points will be snapped on their nearest line

points_time

A numeric vector indicating when each point occurred

start_net

A double, the lowest network distance used to evaluate the k and g functions

end_net

A double, the highest network distance used to evaluate the k and g functions

step_net

A double, the step between two evaluations of the k and g functions for the network distance. start_net, end_net and step_net are used to create a vector of distances with the function seq

width_net

The width (network distance) of each donut for the g-function. Half of the width is applied on both sides of the considered distance

start_time

A double, the lowest time distance used to evaluate the k and g functions

end_time

A double, the highest time distance used to evaluate the k and g functions

step_time

A double, the step between two evaluations of the k and g functions for the time distance. start_time, end_time and step_time are used to create a vector of distances with the function seq

width_time

The width (time distance) of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

resolution

When simulating random points on the network, selecting a resolution will greatly reduce the calculation time. When resolution is NULL, the random points can occur anywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices of the new network

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

calc_g_func

A boolean indicating if the G function must also be calculated

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

Details

The k-function is a method to characterize the dispersion of a set of points. For each point, the numbers of other points in subsequent radii are calculated. This empirical k-function can be more or less clustered than a k-function obtained if the points were randomly located in space. In a network, the network distance is used instead of the Euclidean distance. This function uses Monte Carlo simulations to assess if the points are clustered or dispersed, and gives the results as a line plot. If the line of the observed k-function is higher than the shaded area representing the values of the simulations, then the points are more clustered than what we can expect from randomness and vice-versa. The function also calculates the g-function, a modified version of the k-function using rings instead of disks. The width of the ring must be chosen. The main interest is to avoid the cumulative effect of the classical k-function. This function is maturing, it works as expected (unit tests) but will probably be modified in the future releases (gain speed, advanced features, etc.).

Value

A list with the following values :


c++ k function counting worker

Description

c++ k function counting (INTERNAL)

Usage

kfunc_counting(dist_mat, wc, wr, breaks, cross = FALSE)

Arguments

dist_mat

A matrix with the distances between points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

breaks

A numeric vector with the distances to consider

cross

A boolean indicating if we are calculating a cross k function or not (default is FALSE)

Value

A numeric matrix with the counts of the k function evaluated at the required distances


c++ k function

Description

c++ k function (INTERNAL)

Usage

kfunc_cpp(dist_mat, start, end, step, Lt, n, w)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the k-function

end

A float, the last value for evaluating the k-function

step

A float, the jump between two evaluations of the k-function

Lt

The total length of the network

n

The number of points

w

The weight of the points (coincident points)

Value

A numeric vector with the values of the k function evaluated at the required distances
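
As a point of reference, the usual network k-function estimator, K(r) = Lt / (n(n - 1)) multiplied by the number of ordered pairs of distinct points separated by at most r, can be sketched in base R. The function name, the way weights multiply each pair and the exclusion of self-pairs are assumptions made for this illustration; the compiled function may handle these details differently.

```r
# Hypothetical sketch of a network k-function estimator
k_sketch <- function(dist_mat, start, end, step, Lt, n, w) {
  breaks <- seq(start, end, step)
  pair_w <- outer(w, w)   # weight of each ordered pair of points
  diag(pair_w) <- 0       # assumption: self-pairs are excluded
  scale <- Lt / (n * (n - 1))
  # cumulative weighted number of pairs within each distance
  sapply(breaks, function(r) scale * sum(pair_w[dist_mat <= r]))
}

# Three points located at 0, 100 and 250 on a network of total length 1000
pos <- c(0, 100, 250)
dist_mat <- abs(outer(pos, pos, "-"))
k_vals <- k_sketch(dist_mat, start = 0, end = 300, step = 100,
                   Lt = 1000, n = 3, w = rep(1, 3))
print(k_vals)
```

Unlike the g-function, the values are cumulative: every pair counted at a given distance is counted again at all larger distances.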


c++ k function 2

Description

c++ k function (INTERNAL)

Usage

kfunc_cpp2(dist_mat, start, end, step, Lt, n, wc, wr, cross = FALSE)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the k-function

end

A float, the last value for evaluating the k-function

step

A float, the jump between two evaluations of the k-function

Lt

The total length of the network

n

The number of points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

cross

A boolean indicating if we are calculating a cross k function or not (default is FALSE)

Value

A numeric vector with the values of the k function evaluated at the required distances


Network k and g functions (maturing)

Description

Calculate the k and g functions for a set of points on a network (maturing).

Usage

kfunctions(
  lines,
  points,
  start,
  end,
  step,
  width,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  agg = NULL,
  verbose = TRUE,
  return_sims = FALSE,
  calc_g_func = TRUE,
  resolution = NULL
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid)

points

A feature collection of points representing the points on the network. These points will be snapped on their nearest line

start

A double, the lowest distance used to evaluate the k and g functions

end

A double, the highest distance used to evaluate the k and g functions

step

A double, the step between two evaluations of the k and g functions. start, end and step are used to create a vector of distances with the function seq

width

The width of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

return_sims

a boolean indicating if the simulated k and g values must also be returned.

calc_g_func

A Boolean indicating if the G function must also be calculated (TRUE by default). If FALSE, then only the K function is calculated

resolution

When simulating random points on the network, selecting a resolution will greatly reduce the calculation time. When resolution is NULL, the random points can occur anywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices of the new network

Details

The k-function is a method to characterize the dispersion of a set of points. For each point, the numbers of other points in subsequent radii are calculated. This empirical k-function can be more or less clustered than a k-function obtained if the points were randomly located in space. In a network, the network distance is used instead of the Euclidean distance. This function uses Monte Carlo simulations to assess if the points are clustered or dispersed, and gives the results as a line plot. If the line of the observed k-function is higher than the shaded area representing the values of the simulations, then the points are more clustered than what we can expect from randomness and vice-versa. The function also calculates the g-function, a modified version of the k-function using rings instead of disks. The width of the ring must be chosen. The main interest is to avoid the cumulative effect of the classical k-function. This function is maturing, it works as expected (unit tests) but will probably be modified in the future releases (gain speed, advanced features, etc.).
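
The comparison between the observed function and the simulations can be sketched in base R. All values below are randomly generated placeholders, not results of the package, and the sketch assumes that conf_int is split equally between the two tails of the simulated distribution.

```r
set.seed(42)
distances <- seq(0, 2500, 100)
nsim <- 50
conf_int <- 0.05

# Placeholder simulated k values: one row per simulation, one column per distance
sim_k <- t(replicate(nsim, cumsum(runif(length(distances), 0, 10))))
# Placeholder observed k values
obs_k <- cumsum(rep(8, length(distances)))

# Envelope of the simulations at each distance
# (assumption: conf_int / 2 is left in each tail)
lower <- apply(sim_k, 2, quantile, probs = conf_int / 2)
upper <- apply(sim_k, 2, quantile, probs = 1 - conf_int / 2)

# Distances at which the observed pattern departs from randomness
more_clustered <- obs_k > upper
more_dispersed <- obs_k < lower
print(data.frame(distances, lower, obs_k, upper)[1:5, ])
```

With nsim = 50 the tails of the envelope rest on very few simulations, so a larger number of simulations gives more stable bounds.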

Value

A list with the following values :

Examples


data(main_network_mtl)
data(mtl_libraries)
result <- kfunctions(main_network_mtl, mtl_libraries,
     start = 0, end = 2500, step = 100,
     width = 200, nsim = 50,
     conf_int = 0.05, tol = 0.1, agg = NULL,
     calc_g_func = TRUE,
     verbose = FALSE)


Network k and g functions (multicore)

Description

Calculate the k and g functions for a set of points on a network with multicore support. For details, please see the function kfunctions. (maturing)

Usage

kfunctions.mc(
  lines,
  points,
  start,
  end,
  step,
  width,
  nsim,
  conf_int = 0.05,
  digits = 2,
  tol = 0.1,
  agg = NULL,
  verbose = TRUE,
  return_sims = FALSE,
  calc_g_func = TRUE,
  resolution = NULL,
  grid_shape = c(1, 1)
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid)

points

A feature collection of points representing the points on the network. These points will be snapped on their nearest line

start

A double, the lowest distance used to evaluate the k and g functions

end

A double, the highest distance used to evaluate the k and g functions

step

A double, the step between two evaluations of the k and g function. start, end and step are used to create a vector of distances with the function seq

width

The width of each donut for the g-function. Half of the width is applied on both sides of the considered distance

nsim

An integer indicating the number of Monte Carlo simulations to perform for inference

conf_int

A double indicating the width of the confidence interval (default = 0.05) calculated on the Monte Carlo simulations

digits

An integer indicating the number of digits to retain from the spatial coordinates

tol

When adding the points to the network, specify the minimum distance between these points and the lines' extremities. When points are closer, they are added at the extremity of the lines

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates

verbose

A Boolean indicating if progress messages should be displayed

return_sims

A Boolean indicating if the simulated k and g values must also be returned

calc_g_func

A Boolean indicating if the G function must also be calculated (TRUE by default). If FALSE, then only the K function is calculated

resolution

When simulating random points on the network, selecting a resolution will greatly reduce the calculation time. When resolution is NULL, the random points can occur anywhere on the graph. If a value is specified, the edges are split according to this value and the random points can only be vertices on the new network

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.
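
The way start, end, step and width interact can be illustrated with a short base-R sketch. The vector of distances follows the description of the step argument (a call to seq). The ring bounds follow the description of width (half of the width on each side); clamping the lower bound at 0 is an assumption made here for readability only.

```r
start <- 0
end <- 2500
step <- 100
width <- 200

# Distances at which the k and g functions are evaluated
distances <- seq(start, end, by = step)

# Ring (donut) used by the g-function around each distance
ring_lower <- pmax(distances - width / 2, 0)  # clamped at 0 (assumption)
ring_upper <- distances + width / 2

head(data.frame(distance = distances, ring_lower, ring_upper), 3)
```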

Details

For details, please look at the function kfunctions.

Value

A list with the following values :

Examples


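data(main_network_mtl)
data(mtl_libraries)
future::plan(future::multisession(workers=1))
result <- kfunctions.mc(main_network_mtl, mtl_libraries,
     start = 0, end = 2500, step = 10,
     width = 200, nsim = 50,
     conf_int = 0.05, tol = 0.1, agg = NULL,
     verbose = FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)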


c++ k and g function counting worker

Description

c++ k function counting (INTERNAL)

Usage

kgfunc_counting(dist_mat, wc, wr, breaks, width, cross = FALSE)

Arguments

dist_mat

A matrix with the distances between points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

breaks

A numeric vector with the distance to consider

width

The width of each donut

cross

A boolean indicating if we are calculating a cross k function or not (default is FALSE)

Value

A list of two numeric matrices with the values of the k and g function evaluated at the required distances


c++ k and g function

Description

c++ k and g function (INTERNAL)

Usage

kgfunc_cpp2(dist_mat, start, end, step, width, Lt, n, wc, wr, cross = FALSE)

Arguments

dist_mat

A square matrix with the distances between points

start

A float, the start value for evaluating the k and g functions

end

A float, the last value for evaluating the k and g functions

step

A float, the jump between two evaluations of the k and g functions

width

The width of each donut

Lt

The total length of the network

n

The number of points

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

cross

A boolean indicating if we are calculating a cross k function or not (default is FALSE)

Value

A numeric matrix with the values of the k (first col) and g (second col) function evaluated at the required distances
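
As a rough illustration of what such a matrix contains, the base-R sketch below counts pairs of points from a toy distance matrix. It is not the C++ implementation: the distances are hypothetical, the points are unweighted (wc and wr are ignored), the inclusive ring bounds are an assumption, and the scaling by Lt / (n * (n - 1)) is the usual network k-function normalisation, assumed here.

```r
# Toy symmetric matrix of network distances between 4 points (hypothetical)
dist_mat <- matrix(c(
    0, 100, 250, 400,
  100,   0, 150, 300,
  250, 150,   0, 150,
  400, 300, 150,   0), nrow = 4, byrow = TRUE)

Lt <- 1000                 # assumed total length of the network
n <- nrow(dist_mat)
breaks <- seq(0, 400, by = 100)
width <- 100

# Distances between distinct points (each pair appears twice)
off_diag <- dist_mat[row(dist_mat) != col(dist_mat)]

# k: pairs within a disk of radius r; g: pairs within a ring centred on r
k_counts <- sapply(breaks, function(r) sum(off_diag <= r))
g_counts <- sapply(breaks, function(r) {
  sum(off_diag >= r - width / 2 & off_diag <= r + width / 2)
})

scale <- Lt / (n * (n - 1))
res <- cbind(k = k_counts * scale, g = g_counts * scale)
```

The k column can only increase with distance, while the g column can go up and down, which is the cumulative effect the g-function is meant to avoid.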


c++ k and g function counting worker

Description

c++ k function counting (INTERNAL)

Usage

kgfunc_time_counting(
  dist_mat_net,
  dist_mat_time,
  wc,
  wr,
  breaks_net,
  breaks_time,
  width_net,
  width_time,
  cross = FALSE
)

kfunc_time_counting(
  dist_mat_net,
  dist_mat_time,
  wc,
  wr,
  breaks_net,
  breaks_time,
  cross = FALSE
)

Arguments

dist_mat_net

A matrix with the distances between points on the network

dist_mat_time

A matrix with the distances between points in time

wc

The weight of the points represented by the columns (destinations)

wr

The weight of the points represented by the rows (origins)

breaks_net

A numeric vector with the distance to consider on network

breaks_time

A numeric vector with the distance to consider in time

width_net

The width of each donut for the network dimension

width_time

The width of each donut for the time dimension

cross

A boolean indicating if we are calculating a cross k function or not (default is FALSE)

Value

A list of two numeric cubes with the values of the k and g function evaluated at the required distances


Centre points of lines

Description

Generate a feature collection of points at the centre of the lines of a feature collection of linestrings. The length of the lines is used to determine their centres.

Usage

lines_center(lines)

Arguments

lines

A feature collection of linestrings to use

Value

A feature collection of points

Examples


data(mtl_network)
centers <- lines_center(mtl_network)
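
To make the phrase "the length of the lines is used to determine their centres" concrete, here is a base-R sketch that finds the point located at half the length of a polyline given as a coordinate matrix. The helper line_midpoint is hypothetical and is not part of spNetwork; it assumes the line has no zero-length segment.

```r
# Hypothetical helper: point at half the length of a polyline
line_midpoint <- function(coords) {
  seg_len <- sqrt(rowSums(diff(coords)^2))   # length of each segment
  cum_len <- c(0, cumsum(seg_len))           # distance travelled at each vertex
  half <- sum(seg_len) / 2
  i <- findInterval(half, cum_len, rightmost.closed = TRUE)
  frac <- (half - cum_len[i]) / seg_len[i]   # position inside segment i
  coords[i, ] + frac * (coords[i + 1, ] - coords[i, ])
}

# An L-shaped line of total length 6: the centre is 3 units from the start
coords <- rbind(c(0, 0), c(4, 0), c(4, 2))
line_midpoint(coords)
```

Note that the result differs from the geometric centroid of the line, which would not necessarily lie on the line itself.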


Lines coordinates as list

Description

A function to get the coordinates of some lines as a list of matrices

Usage

lines_coordinates_as_list(lines)

Arguments

lines

A sf object with linestring type geometries

Value

A list of matrices

Examples

#This is an internal function, no example provided

Unify lines direction

Description

A function to deal with the directions of lines. It ensures that only From-To situations are present: for the lines labelled as To-From, the order of their vertices is reversed.

Usage

lines_direction(lines, field)

Arguments

lines

A sf object with linestring type geometries

field

The name of a field giving information about the authorized travelling direction on lines. If NULL, then all lines can be used in both directions; otherwise, it must be the name of a column. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

Value

A sf object with linestring type geometries

Examples

data(mtl_network)
mtl_network$length <- as.numeric(sf::st_length(mtl_network))
mtl_network$direction <- "Both"
mtl_network[6, "direction"] <- "TF"
mtl_network_directed <- lines_direction(mtl_network, "direction")
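
The reversal applied to To-From lines can be pictured with plain coordinate matrices. The sketch below is only an illustration of the idea in base R; the objects coords_list and direction are hypothetical, and relabelling the reversed lines as "FT" is an assumption about the result.

```r
# Two lines as coordinate matrices, with their travelling direction
coords_list <- list(
  rbind(c(0, 0), c(1, 0), c(2, 1)),
  rbind(c(5, 5), c(6, 5))
)
direction <- c("TF", "FT")

# Reverse the vertex order of the To-From lines only
unified <- Map(function(m, d) {
  if (d == "TF") m[nrow(m):1, , drop = FALSE] else m
}, coords_list, direction)

# After reversal, a former To-From line can be read as From-To (assumption)
new_direction <- ifelse(direction == "TF", "FT", direction)
```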

Get lines extremities

Description

Generate a feature collection of points with the first and last vertex of each line in a feature collection of linestrings.

Usage

lines_extremities(lines)

Arguments

lines

A feature collection of linestrings (simple Linestrings)

Value

A feature collection of points

Examples

wkt_lines <- c(
"LINESTRING (0 0, 1 0)",
"LINESTRING (1 0, 2 0)",
"LINESTRING (2 0, 3 0)",
"LINESTRING (0 1, 1 1)")

linesdf <- data.frame(wkt = wkt_lines,
                      id = paste("l",1:length(wkt_lines),sep=""))

all_lines <- sf::st_as_sf(linesdf, wkt = "wkt")
all_lines <- cbind(linesdf$wkt,all_lines)
points <- lines_extremities(all_lines)

Points along lines

Description

Generate a feature collection of points along the lines of a feature collection of linestrings.

Usage

lines_points_along(lines, dist)

Arguments

lines

A feature collection of linestrings to use

dist

The distance between the points along the lines

Value

A feature collection of points

Examples


data(mtl_network)
new_pts <- lines_points_along(mtl_network,50)
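
The principle of placing points at a regular distance along a line can be sketched in base R by interpolating on the cumulative length of the vertices. The helper points_along is hypothetical; whether the package includes the first vertex or handles the end of the line in the same way is not stated in this documentation, so treat those details as assumptions.

```r
# Hypothetical helper: points every `dist` units along a polyline
points_along <- function(coords, dist) {
  seg_len <- sqrt(rowSums(diff(coords)^2))
  cum_len <- c(0, cumsum(seg_len))
  targets <- seq(0, max(cum_len), by = dist)  # positions along the line
  cbind(
    x = approx(cum_len, coords[, 1], xout = targets)$y,
    y = approx(cum_len, coords[, 2], xout = targets)$y
  )
}

coords <- rbind(c(0, 0), c(4, 0), c(4, 2))
pts <- points_along(coords, 1.5)
pts
```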


List of coordinates as lines

Description

A function to convert a list of matrices to an sf object with linestring geometry type

Usage

list_coordinates_as_lines(coord_list, crs)

Arguments

coord_list

A list of matrices

crs

The CRS to use to create the lines

Value

A sf object with linestring type geometries

Examples

#This is an internal function, no example provided

Cut lines into lixels

Description

Cut the lines of a feature collection of linestrings into lixels with a specified minimal distance. This may fail if the line geometries are self-intersecting.

Usage

lixelize_lines(lines, lx_length, mindist = NULL)

Arguments

lines

The sf object with linestring geometry type to modify

lx_length

The length of a lixel

mindist

The minimum length of a lixel. After cut, if the length of the final lixel is shorter than the minimum distance, then it is added to the previous lixel. If NULL, then mindist = lx_length/10. Note that the segments that are already shorter than the minimum distance are not modified.

Value

An sf object with linestring geometry type

Examples


data(mtl_network)
lixels <- lixelize_lines(mtl_network,150,50)
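
The effect of lx_length and mindist on a single line can be sketched by computing only the lengths of the resulting lixels. The helper lixel_lengths is hypothetical, and its default of one tenth of the lixel length is an assumption about what the package does when mindist is NULL.

```r
# Hypothetical helper: lengths of the lixels obtained from one line
lixel_lengths <- function(line_length, lx_length, mindist = lx_length / 10) {
  # Lines not longer than one lixel are left untouched
  if (line_length <= lx_length) {
    return(line_length)
  }
  n_full <- floor(line_length / lx_length)
  remainder <- line_length - n_full * lx_length
  lengths <- rep(lx_length, n_full)
  if (remainder > 0) {
    if (remainder < mindist) {
      # Final piece too short: merge it with the previous lixel
      lengths[n_full] <- lengths[n_full] + remainder
    } else {
      lengths <- c(lengths, remainder)
    }
  }
  lengths
}

lixel_lengths(320, 150, 50)  # the last 20 units are merged: 150, 170
lixel_lengths(380, 150, 50)  # the last 80 units form their own lixel: 150, 150, 80
lixel_lengths(40, 150, 50)   # already short, not modified: 40
```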


Cut lines into lixels (multicore)

Description

Cut the lines of a feature collection of linestrings into lixels with a specified minimal distance, with multicore support. This may fail if the line geometries are self-intersecting.

Usage

lixelize_lines.mc(
  lines,
  lx_length,
  mindist = NULL,
  verbose = TRUE,
  chunk_size = 100
)

Arguments

lines

A feature collection of linestrings to convert to lixels

lx_length

The length of a lixel

mindist

The minimum length of a lixel. After cut, if the length of the final lixel is shorter than the minimum distance, then it is added to the previous lixel. If NULL, then mindist = lx_length/10

verbose

A Boolean indicating if a progress bar must be displayed

chunk_size

The size of a chunk used for multiprocessing. Default is 100.

Value

A feature collection of linestrings

Examples


data(mtl_network)
future::plan(future::multisession(workers=1))
lixels <- lixelize_lines.mc(mtl_network,150,50)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")){
future::plan(future::sequential)
}


Primary road network of Montreal

Description

A feature collection (sf object) representing the primary road network of Montreal. The EPSG is 3797, and the data comes from the Montreal OpenData website.

Usage

main_network_mtl

Format

A sf object with 2945 rows and 2 variables

TYPE

the type of road

geom

the geometry (linestrings)

Source

https://donnees.montreal.ca/dataset/geobase


Libraries of Montreal

Description

A feature collection (sf object) representing the libraries of Montreal. The EPSG is 3797 and the data comes from the Montreal OpenData website.

Usage

mtl_libraries

Format

A sf object with 55 rows and 3 variables.

CP

the postal code

NAME

the name of the library

geom

the geometry (points)

Source

https://donnees.montreal.ca/dataset/lieux-culturels


Road network of Montreal

Description

A feature collection (sf object) representing the road network of Montreal. The EPSG is 3797, and the data comes from the Montreal OpenData website. It is only a small subset in central districts used to demonstrate the main functions of spNetwork.

Usage

mtl_network

Format

A sf object with 2945 rows and 2 variables

ClsRte

the category of the road

geom

the geometry (linestrings)

Source

https://donnees.montreal.ca/dataset/geobase


Theatres of Montreal

Description

A feature collection (sf object) representing the theatres of Montreal. The EPSG is 3797 and the data comes from the Montreal OpenData website.

Usage

mtl_theatres

Format

A sf object with 54 rows and 3 variables.

CP

the postal code

NAME

the name of the theatre

geom

the geometry (points)

Source

https://donnees.montreal.ca/dataset/lieux-culturels


Nearest point on Line

Description

Find the nearest projected point on a LineString (from maptools)

Usage

nearestPointOnLine(coordsLine, coordsPoint)

Arguments

coordsLine

The coordinates of the line (matrix)

coordsPoint

The coordinates of the point

Value

A numeric vector with the coordinates of the projected point

Examples

#This is an internal function, no example provided

Nearest point on segment

Description

Find the nearest projected point on a segment (from maptools)

Usage

nearestPointOnSegment(s, p)

Arguments

s

The coordinates of the segment

p

The coordinates of the point

Value

A numeric vector with the coordinates of the projected point

Examples

#This is an internal function, no example provided
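
The projection of a point on a segment is a standard computation and can be sketched in base R as follows. The helper nearest_on_segment is hypothetical; representing the segment as a two-row matrix (one row per end point) is an assumption about the input format, not the documented one.

```r
# Hypothetical helper: nearest point on segment s (2 x 2 matrix) to point p
nearest_on_segment <- function(s, p) {
  a <- s[1, ]
  b <- s[2, ]
  ab <- b - a
  # Position of the orthogonal projection along the segment (0 = a, 1 = b)
  t <- sum((p - a) * ab) / sum(ab^2)
  # Clamp to the segment so the result never falls outside its end points
  t <- min(max(t, 0), 1)
  a + t * ab
}

s <- rbind(c(0, 0), c(10, 0))
nearest_on_segment(s, c(3, 4))   # projects inside the segment
nearest_on_segment(s, c(-5, 2))  # falls back to the first end point
```

Finding the nearest point on a whole LineString then amounts to applying this to every segment and keeping the closest result.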

Nearest line for points

Description

Find for each point its nearest LineString

Usage

nearest_lines(points, lines, snap_dist = 300, max_iter = 10)

Arguments

points

A feature collection of points

lines

A feature collection of linestrings

snap_dist

A distance (float) used to find, for each point, its nearest line in a spatial index. Too large a value will produce unnecessary distance calculations, and too small a value will lead to more iterations to find neighbours. In extreme cases, too small a value could lead to points not associated with any line (index = -1).

max_iter

An integer indicating how many iterations the search algorithm must perform in the spatial index to find lines close to a point. At each iteration, the snap_dist is doubled to find candidates.

Examples

# this is an internal function, no example provided
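
The interplay between snap_dist and max_iter can be pictured with a simplified search loop. The sketch below is an assumption-laden illustration in base R: it works on a plain vector of point-to-line distances rather than on a spatial index, and the helper find_nearest_line is hypothetical.

```r
# Hypothetical helper: index of the nearest line, or -1 if none is found
find_nearest_line <- function(dists, snap_dist = 300, max_iter = 10) {
  radius <- snap_dist
  for (i in seq_len(max_iter)) {
    candidates <- which(dists <= radius)
    if (length(candidates) > 0) {
      # Keep the closest candidate found within the current radius
      return(candidates[which.min(dists[candidates])])
    }
    radius <- radius * 2  # widen the search at each iteration
  }
  -1
}

dists <- c(900, 700, 1500)  # hypothetical distances from one point to three lines
find_nearest_line(dists, snap_dist = 300, max_iter = 10)  # found at radius 1200
find_nearest_line(dists, snap_dist = 300, max_iter = 1)   # not found: -1
```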

K-nearest points on network

Description

Calculate the K-nearest points for a set of points on a network.

Usage

network_knn(
  origins,
  lines,
  k,
  destinations = NULL,
  maxdistance = 0,
  snap_dist = Inf,
  line_weight = "length",
  direction = NULL,
  grid_shape = c(1, 1),
  verbose = FALSE,
  digits = 3,
  tol = 0.1
)

Arguments

origins

A feature collection of points, for each point, its k nearest neighbours will be found on the network.

lines

A feature collection of linestrings representing the underlying network

k

An integer indicating the number of neighbours to find.

destinations

A feature collection of points, to be used if the neighbours must be found in a separate set of points. NULL if the neighbours must be found in origins.

maxdistance

The maximum distance between two observations to consider them as neighbours. It is useful only if a grid is used: a lower value will reduce calculation time, but one must be sure that the k nearest neighbours are within this radius; otherwise, NAs will be present in the results.

snap_dist

The maximum distance to snap the start and end points on the network.

line_weight

The weighting to use for lines. Default is "length" (the geographical length), but can be the name of a column. The value is considered proportional to the geographical length of the lines.

direction

The name of a column indicating the authorized travelling direction on lines. If NULL, then all lines can be used in both directions. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

grid_shape

A vector of length 2 indicating the shape of the grid to use for splitting the dataset. Default is c(1,1), so all the calculation is done in one go. It might be necessary to split it if the dataset is large.

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain from the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the minimum distance between the points and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

Details

The k nearest neighbours of each point are found by using the network distance. The results might not be exact if some points share the exact same location. As an example, suppose that A and B are two points at the exact same location, and that C is a third point close to A and B. If the nearest neighbour (k = 1) is requested for C, the function could return either A or B, but not both. When such a situation occurs, the function raises a warning.

Value

A list with two matrices, one with the index of the neighbours and one with the distances.

Examples


    data(main_network_mtl)
    data(mtl_libraries)
    results <- network_knn(mtl_libraries, main_network_mtl,
        k = 3, maxdistance = 1000, line_weight = "length",
        grid_shape=c(1,1), verbose = FALSE)
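
The tie situation mentioned in the Details can be reproduced on a toy distance matrix. The sketch below uses base R only; the helper knn_from_matrix and the element names index and distance are hypothetical and do not describe the structure returned by network_knn.

```r
# A and B share the same location; C is 50 units away from both
dist_mat <- matrix(c( 0,  0, 50,
                      0,  0, 50,
                     50, 50,  0), nrow = 3, byrow = TRUE,
                   dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical helper: k nearest neighbours from a distance matrix
knn_from_matrix <- function(dist_mat, k) {
  diag(dist_mat) <- Inf  # a point is not its own neighbour
  idx <- apply(dist_mat, 1, function(d) order(d)[seq_len(k)])
  dst <- apply(dist_mat, 1, function(d) sort(d)[seq_len(k)])
  list(index = matrix(idx, ncol = k, byrow = TRUE),
       distance = matrix(dst, ncol = k, byrow = TRUE))
}

res <- knn_from_matrix(dist_mat, k = 1)

# The single neighbour returned for C is A, although B is equally close
colnames(dist_mat)[res$index[3, 1]]
tied <- sum(dist_mat["C", c("A", "B")] == res$distance[3, 1]) > 1
```

Requesting k = 2 in such a case would return both A and B for C.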


K-nearest points on network (multicore version)

Description

Calculate the K-nearest points for a set of points on a network with multicore support.

Usage

network_knn.mc(
  origins,
  lines,
  k,
  destinations = NULL,
  maxdistance = 0,
  snap_dist = Inf,
  line_weight = "length",
  direction = NULL,
  grid_shape = c(1, 1),
  verbose = FALSE,
  digits = 3,
  tol = 0.1
)

Arguments

origins

A feature collection of points, for each point, its k nearest neighbours will be found on the network.

lines

A feature collection of linestrings representing the underlying network

k

An integer indicating the number of neighbours to find.

destinations

A feature collection of points, to be used if the neighbours must be found in a separate set of points. NULL if the neighbours must be found in origins.

maxdistance

The maximum distance between two observations to consider them as neighbours. It is useful only if a grid is used: a lower value will reduce calculation time, but one must be sure that the k nearest neighbours are within this radius; otherwise, NAs will be present in the results.

snap_dist

The maximum distance to snap the start and end points on the network.

line_weight

The weighting to use for lines. Default is "length" (the geographical length), but can be the name of a column. The value is considered proportional to the geographical length of the lines.

direction

The name of a column indicating the authorized travelling direction on lines. If NULL, then all lines can be used in both directions. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

grid_shape

A vector of length 2 indicating the shape of the grid to use for splitting the dataset. Default is c(1,1), so all the calculation is done in one go. It might be necessary to split it if the dataset is large.

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain from the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the minimum distance between the points and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

Value

A list with two matrices, one with the index of the neighbours and one with the distances.

Examples


data(main_network_mtl)
data(mtl_libraries)
future::plan(future::multisession(workers=1))
results <- network_knn.mc(mtl_libraries, main_network_mtl,
    k = 3, maxdistance = 1000, line_weight = "length",
    grid_shape=c(1,1), verbose = FALSE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


worker function for K-nearest points on network

Description

The worker function calculating the K-nearest points for a set of points on a network.

Usage

network_knn_worker(
  points,
  lines,
  k,
  direction = NULL,
  use_dest = FALSE,
  verbose = verbose,
  digits = digits,
  tol = tol
)

Arguments

points

A feature collection of points, for each point, its k nearest neighbours will be found on the network.

lines

A feature collection of lines representing the network

k

An integer indicating the number of neighbours to find.

direction

Indicates a field providing information about the authorized travelling direction on lines. If NULL, then all lines can be used in both directions; otherwise, it must be the name of a column. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

use_dest

A Boolean indicating if the origins and destinations are separated (TRUE), or if only origins are used (FALSE).

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain in the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the spatial tolerance when points are added as vertices to lines.

Value

A list with two matrices, one with the index of the neighbours and one with the distances.

Examples

#no example provided, this is an internal function

Network distance listw

Description

Generate listw object (spdep like) based on network distances.

Usage

network_listw(
  origins,
  lines,
  maxdistance,
  method = "centroid",
  point_dist = NULL,
  snap_dist = Inf,
  line_weight = "length",
  mindist = 10,
  direction = NULL,
  dist_func = "inverse",
  matrice_type = "B",
  grid_shape = c(1, 1),
  verbose = FALSE,
  digits = 3,
  tol = 0.1
)

Arguments

origins

A feature collection of lines, points, or polygons for which the spatial neighbouring list will be built

lines

A feature collection of lines representing the network

maxdistance

The maximum distance between two observations to consider them as neighbours.

method

A string indicating how the starting points will be built. If 'centroid' is used, then the centre of lines or polygons is used. If 'pointsalong' is used, then points will be placed along polygons' borders or along lines as starting and end points. If 'ends' is used (only for lines) the first and last vertices of lines are used as starting and ending points.

point_dist

A float, defining the distance between points when the method 'pointsalong' is selected.

snap_dist

The maximum distance to snap the start and end points on the network.

line_weight

The weighting to use for lines. Default is "length" (the geographical length), but can be the name of a column. The value is considered proportional to the geographical length of the lines.

mindist

The minimum distance between two different observations. It is important for it to be different from 0 when a W style is used.

direction

Indicates a field providing information about the authorized travelling direction on lines. If NULL, then all lines can be used in both directions; otherwise, it must be the name of a column. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

dist_func

Indicates the function used to convert the distances between observations into spatial weights. Can be 'identity', 'inverse', 'squared inverse' or a function with one parameter x that will be vectorized internally

matrice_type

The type of the weighting scheme. Can be 'B' for Binary, 'W' for row weighted, or 'I' (identity), see the documentation of spdep::nb2listw for details

grid_shape

A vector of length 2 indicating the shape of the grid to use for splitting the dataset. Default is c(1,1), so all the calculation is done in one go. It might be necessary to split it if the dataset is large.

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain in the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the spatial tolerance when points are added as vertices to lines.

Value

A listw object (spdep like) if matrice_type is "B" or "W". If matrice_type is "I", then a list with a nblist object and a list of weights is returned.

Examples


data(mtl_network)
listw <- network_listw(mtl_network,
    mtl_network,
    maxdistance = 500,
    method = "centroid",
    line_weight = "length",
    dist_func = 'squared inverse',
    matrice_type='B',
    grid_shape = c(2,2))
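
The roles of dist_func, mindist and the 'W' style can be illustrated with a few lines of base R. This is a sketch under stated assumptions, not the package's code: the distances are hypothetical, raising distances below mindist up to mindist is an assumed reading of that parameter, and 'W' is taken to mean that the weights of each observation sum to 1.

```r
# Hypothetical network distances from one observation to its neighbours
d <- c(0, 50, 200, 400)
mindist <- 10

# Avoid zero distances before inverting (assumed role of mindist)
d_adj <- pmax(d, mindist)

# Possible distance functions, written as one-parameter functions of x
identity_w <- function(x) x
inverse <- function(x) 1 / x
squared_inverse <- function(x) 1 / x^2

w <- inverse(d_adj)

# 'W' style: row standardisation, the weights of one observation sum to 1
w_row <- w / sum(w)
```

Without the adjustment by mindist, the first neighbour would receive an infinite weight and the row standardisation would produce NaN values.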


Network distance listw (multicore)

Description

Generate listw object (spdep like) based on network distances with multicore support.

Usage

network_listw.mc(
  origins,
  lines,
  maxdistance,
  method = "centroid",
  point_dist = NULL,
  snap_dist = Inf,
  line_weight = "length",
  mindist = 10,
  direction = NULL,
  dist_func = "inverse",
  matrice_type = "B",
  grid_shape = c(1, 1),
  verbose = FALSE,
  digits = 3,
  tol = 0.1
)

Arguments

origins

A feature collection of linestrings, points or polygons for which the spatial neighbouring list will be built.

lines

A feature collection of linestrings representing the network

maxdistance

The maximum distance between two observations to consider them as neighbours.

method

A string indicating how the starting points will be built. If 'centroid' is used, then the centre of lines or polygons is used. If 'pointsalong' is used, then points will be placed along polygons' borders or along lines as starting and end points. If 'ends' is used (only for lines) the first and last vertices of lines are used as starting and ending points.

point_dist

A float, defining the distance between points when the method pointsalong is selected.

snap_dist

The maximum distance to snap the start and end points on the network.

line_weight

The weights to use for lines. Default is "length" (the geographical length), but can be the name of a column. The value is considered proportional to the geographical length of the lines.

mindist

The minimum distance between two different observations. It is important for it to be different from 0 when a W style is used.

direction

Indicates a field giving information about the authorized travelling direction on lines. If NULL, then all lines can be used in both directions; otherwise, it must be the name of a column. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

dist_func

Indicates the function used to convert the distances between observations into spatial weights. Can be 'identity', 'inverse', 'squared inverse' or a function with one parameter x that will be vectorized internally

matrice_type

The type of the weighting scheme. Can be 'B' for Binary, 'W' for row weighted, or 'I' (identity), see the documentation of spdep::nb2listw for details

grid_shape

A vector of length 2 indicating the shape of the grid to use for splitting the dataset. Default is c(1,1), so all the calculation is done in one go. It might be necessary to split it if the dataset is large.

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain in the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the spatial tolerance when points are added as vertices to lines.

Value

A listw object (spdep like) if matrice_type is "B" or "W". If matrice_type is "I", then a list with a nblist object and a list of weights is returned.

Examples


data(mtl_network)
future::plan(future::multisession(workers=1))
listw <- network_listw.mc(mtl_network,mtl_network,maxdistance=500,
        method = "centroid", line_weight = "length",
        dist_func = 'squared inverse', matrice_type='B', grid_shape = c(2,2))
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


network_listw worker

Description

The worker function of network_listw.

Usage

network_listw_worker(
  points,
  lines,
  maxdistance,
  dist_func,
  direction = NULL,
  mindist = 10,
  matrice_type = "B",
  verbose = FALSE,
  digits = 3,
  tol = 0.1
)

Arguments

points

A feature collection of points corresponding to start and end points. It must have a column fid, grouping the points if necessary.

lines

A feature collection of lines representing the network

maxdistance

The maximum distance between two observations to consider them as neighbours.

dist_func

A vectorized function converting spatial distances into weights.

direction

Indicates a field giving information about the authorized travelling direction on lines. If NULL, then all lines can be used in both directions; otherwise, it must be the name of a column. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

mindist

The minimum distance between two different observations. It is important for it to be different from 0 when a W style is used.

matrice_type

The type of the weighting scheme. Can be 'B' for Binary, 'W' for row weighted, or 'I' (identity), see the documentation of spdep::nb2listw for details

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to keep in the spatial coordinates (simplification used to reduce the risk of topological error)

tol

A float indicating the spatial tolerance when points are added as vertices to lines.

Value

A list of neighbours as weights.

Examples

#no example provided, this is an internal function

Network Kernel density estimate

Description

Calculate the Network Kernel Density Estimate based on a network of lines, sampling points, and events

Usage

nkde(
  lines,
  events,
  w,
  samples,
  kernel_name,
  bw,
  adaptive = FALSE,
  trim_bw = NULL,
  method,
  div = "bw",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  verbose = TRUE,
  check = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

samples

A feature collection of points representing the locations for which the densities will be estimated.

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

bw

The kernel bandwidth (using the scale of the lines), can be a single float or a numeric vector if a different bandwidth must be used for each event.

adaptive

A Boolean, indicating if an adaptive bandwidth must be used

trim_bw

A float, indicating the maximum value for the adaptive bandwidth

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow dramatically if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

The three NKDE methods
Estimating the density of a point process is commonly done by using an ordinary two-dimensional kernel density function. However, there are numerous cases for which the events do not occur in a two-dimensional space but on a network (like car crashes, outdoor crimes, leaks in pipelines, etc.). New methods were developed to adapt the methodology to networks; three of them are available in this package.

The three methods are available because, even though the simple method is less precise statistically speaking, it might be more intuitive. From a purely geographical view, it might be seen as a sort of distance decay function as used in Geographically Weighted Regression.


Adaptive bandwidth
It is possible to use an adaptive bandwidth instead of a fixed bandwidth. Adaptive bandwidths are calculated using Abramson's smoothing regimen (Abramson 1982). To do so, an original fixed bandwidth must be specified (bw parameter), and is used to estimate the prior density at event locations. These densities are then used to calculate local bandwidths. The maximum size of the local bandwidths can be limited with the parameter trim_bw. For more details, see the vignettes.
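As a rough illustration of Abramson's rule, the sketch below derives local bandwidths from pilot densities in base R. It assumes the common square-root formulation, where each local bandwidth is the fixed bandwidth multiplied by the inverse square root of the pilot density and rescaled by the geometric mean of these factors. The helper abramson_bw and the pilot density values are hypothetical, and the package's internal implementation may differ.

```r
# Sketch of Abramson's square-root rule (assumed formulation, not the package code).
# abramson_bw() is a hypothetical helper; pilot_dens stands for the densities
# estimated at the event locations with the fixed bandwidth bw.
abramson_bw <- function(pilot_dens, bw, trim_bw = NULL) {
  inv_sqrt <- 1 / sqrt(pilot_dens)
  # geometric mean of the inverse square roots, used as a scaling constant
  gamma <- exp(mean(log(inv_sqrt)))
  local_bw <- bw * inv_sqrt / gamma
  # optional cap on the local bandwidths, mirroring the role of trim_bw
  if (!is.null(trim_bw)) {
    local_bw <- pmin(local_bw, trim_bw)
  }
  local_bw
}

pilot_dens <- c(0.002, 0.010, 0.0005)  # made-up pilot densities
local_bws <- abramson_bw(pilot_dens, bw = 300, trim_bw = 600)
```

Events in dense areas receive a smaller bandwidth and isolated events a larger one, up to the cap.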

Optimization parameters
The grid_shape parameter allows the calculation of the NKDE to be split according to a grid dividing the study area. This might be necessary for big datasets to reduce the memory used. If the grid_shape is c(1,1), then a full network is built for the area. If the grid_shape is c(2,2), then the area is split into 4 rectangles. For each rectangle, the sample points falling in the rectangle are used, together with the events and the lines within a radius of the bandwidth length. The results are combined at the end and ordered to match the original order of the samples.
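The splitting logic can be pictured with plain coordinates. The sketch below is a simplification under stated assumptions: it uses made-up points in a 1000 x 1000 square, ignores the lines, and selects events with a rectangle enlarged by the bandwidth rather than a true buffer. It is not the package's code.

```r
# Sketch of the grid splitting idea with plain coordinates (illustrative only).
set.seed(1)
samples <- cbind(x = runif(20, 0, 1000), y = runif(20, 0, 1000))
events  <- cbind(x = runif(50, 0, 1000), y = runif(50, 0, 1000))
bw <- 100
grid_shape <- c(2, 2)

# limits of the rectangles along each axis
x_breaks <- seq(0, 1000, length.out = grid_shape[1] + 1)
y_breaks <- seq(0, 1000, length.out = grid_shape[2] + 1)

chunks <- list()
for (i in seq_len(grid_shape[1])) {
  for (j in seq_len(grid_shape[2])) {
    xr <- x_breaks[c(i, i + 1)]
    yr <- y_breaks[c(j, j + 1)]
    # samples falling inside the rectangle
    in_s <- samples[, "x"] >= xr[1] & samples[, "x"] < xr[2] &
      samples[, "y"] >= yr[1] & samples[, "y"] < yr[2]
    # events inside the rectangle enlarged by the bandwidth
    in_e <- events[, "x"] >= xr[1] - bw & events[, "x"] <= xr[2] + bw &
      events[, "y"] >= yr[1] - bw & events[, "y"] <= yr[2] + bw
    chunks[[length(chunks) + 1]] <- list(samples_id = which(in_s),
                                         events_id = which(in_e))
  }
}

# each sample belongs to exactly one rectangle, so the results computed per
# chunk can be put back in the original order of the samples
all_ids <- unlist(lapply(chunks, function(ch) ch$samples_id))
```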

The geographical coordinates of the start and end of lines are used to build the network. To avoid trouble with digits, we truncate the coordinates according to the digits parameter. A minimal loss of precision is expected, but it results in a fast construction of the network.

To calculate the distances on the network, all the events are added as vertices. To reduce the size of the network, it is possible to reduce the number of vertices by adding the events at the extremity of the lines if they are close to them. This is controlled by the parameter tol.

In the same way, it is possible to limit the number of vertices by aggregating the events that are close to each other. In that case, the weights of the aggregated events are summed. According to an aggregation distance, a buffer is drawn around the first event, and all events falling in that buffer are aggregated to the first event, forming a new event. The coordinates of this new event are the means of the original events' coordinates. This procedure is repeated until no events are aggregated. The aggregation distance can be fixed with the parameter agg.
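The aggregation procedure described above can be sketched in base R as follows. This is a minimal illustration, assuming Euclidean distances between plain coordinates; aggregate_events_sketch is a hypothetical name and the package's own routine may differ in its details.

```r
# Sketch of the iterative aggregation of close events (illustrative only).
# xy: two-column matrix of coordinates, w: event weights, agg: distance.
aggregate_events_sketch <- function(xy, w, agg) {
  repeat {
    n <- nrow(xy)
    used <- rep(FALSE, n)
    new_xy <- matrix(numeric(0), ncol = 2)
    new_w <- numeric(0)
    for (i in seq_len(n)) {
      if (used[i]) next
      # events not yet aggregated that fall in the buffer around event i
      d <- sqrt((xy[, 1] - xy[i, 1])^2 + (xy[, 2] - xy[i, 2])^2)
      grp <- which(!used & d <= agg)
      used[grp] <- TRUE
      # the new event sits at the mean coordinates and sums the weights
      new_xy <- rbind(new_xy, colMeans(xy[grp, , drop = FALSE]))
      new_w <- c(new_w, sum(w[grp]))
    }
    merged <- nrow(new_xy) < n
    xy <- new_xy
    w <- new_w
    # stop once a full pass aggregates nothing
    if (!merged) break
  }
  list(xy = xy, w = w)
}

xy <- rbind(c(0, 0), c(5, 0), c(8, 0), c(100, 100))
res <- aggregate_events_sketch(xy, w = c(1, 1, 2, 1), agg = 10)
```

The total weight is preserved: the three close events collapse into one event carrying their summed weight, and the distant event is left untouched.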

When using the continuous and discontinuous kernels, the density is reduced at each intersection crossed. In the discontinuous case, after 5 intersections with four directions each, the density value is divided by 243, leading to very small values. In the same situation but with the continuous NKDE, the density value is divided by approximately 7.6. The max_depth parameter allows the user to control the maximum depth of these two NKDE. The base value is 15, but a value of 10 would yield very close estimates. A lower value might have a critical impact on speed when the bandwidth is large.
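The divisor quoted for the discontinuous case can be checked with a line of arithmetic. The sketch assumes an equal split of the density among the outgoing edges at each intersection; the variable names and the starting kernel value are illustrative only.

```r
# Discontinuous NKDE: at an intersection with n directions, the density is
# assumed to be split equally among the (n - 1) outgoing edges.
n_directions <- 4
n_intersections <- 5
divisor <- (n_directions - 1)^n_intersections

# a made-up kernel value before crossing any intersection
k_start <- 0.001
k_after <- k_start / divisor
```

With four directions, each intersection divides the value by three, so five intersections give 3^5 = 243.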

When using the continuous and discontinuous kernel, the connections between graph nodes are stored in a matrix. This matrix is typically sparse, and so a sparse matrix object is used to limit memory use. If the network is small (typically when the grid used to split the data has small rectangles) then a classical matrix could be used instead of a sparse one. It significantly increases speed, but could lead to memory issues.

Value

A vector of values: the density estimates at the sampling points.

References

Abramson IS (1982). “On bandwidth variation in kernel estimates-a square root law.” The Annals of Statistics, 1217–1223.

Okabe A, Satoh T, Sugihara K (2009). “A kernel density estimation method for networks, its computational method and a GIS-based tool.” International Journal of Geographical Information Science, 23(1), 7–32.

Xie Z, Yan J (2008). “Kernel density estimation of traffic accidents in a network space.” Computers, Environment and Urban Systems, 32(5), 396–406.

Examples


data(mtl_network)
data(bike_accidents)
lixels <- lixelize_lines(mtl_network,200,mindist = 50)
samples <- lines_center(lixels)
densities <- nkde(mtl_network,
                  events = bike_accidents,
                  w = rep(1,nrow(bike_accidents)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, div= "bw",
                  adaptive = FALSE,
                  method = "discontinuous", digits = 1, tol = 1,
                  agg = 15,
                  grid_shape = c(1,1),
                  verbose=FALSE)


Network Kernel density estimate (multicore)

Description

Calculate the Network Kernel Density Estimate based on a network of lines, sampling points, and events with multicore support.

Usage

nkde.mc(
  lines,
  events,
  w,
  samples,
  kernel_name,
  bw,
  adaptive = FALSE,
  trim_bw = NULL,
  method,
  div = "bw",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  verbose = TRUE,
  check = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple LineStrings, without MultiLineString (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

w

A vector representing the weight of each event

samples

A feature collection of points representing the locations for which the densities will be estimated.

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

bw

The kernel bandwidth (using the scale of the lines), can be a single float or a numeric vector if a different bandwidth must be used for each event.

adaptive

A Boolean, indicating if an adaptive bandwidth must be used

trim_bw

A float, indicating the maximum value for the adaptive bandwidth

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth), or "none" (the simple sum).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can go wild if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (a high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

For more details, see help(nkde)

Value

A vector of values: the density estimates at the sampling points.

Examples


data(mtl_network)
data(bike_accidents)
future::plan(future::multisession(workers=1))
lixels <- lixelize_lines(mtl_network,200,mindist = 50)
samples <- lines_center(lixels)
densities <- nkde.mc(mtl_network,
                  events = bike_accidents,
                  w = rep(1,nrow(bike_accidents)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, div= "bw",
                  adaptive = FALSE, agg = 15,
                  method = "discontinuous", digits = 1, tol = 1,
                  grid_shape = c(3,3),
                  verbose=TRUE)
## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


The exposed function to calculate NKDE likelihood cv

Description

The exposed function to calculate NKDE likelihood cv (INTERNAL)

Usage

nkde_get_loo_values(
  method,
  neighbour_list,
  sel_events,
  sel_events_wid,
  events,
  events_wid,
  weights,
  bws_net,
  kernel_name,
  line_list,
  max_depth,
  cvl
)

Arguments

method

a string, one of "simple", "continuous", "discontinuous"

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

sel_events

a Numeric vector indicating the selected events (id of nodes)

sel_events_wid

a NumericVector indicating the unique id of the selected events

events

a NumericVector indicating the nodes in the graph being events

events_wid

a NumericVector indicating the unique id of all the events

weights

a matrix with the weights associated with each event (row) for each bws_net (cols).

bws_net

an arma::mat with the network bandwidths to consider for each event

kernel_name

a string with the name of the kernel to use

line_list

a DataFrame describing the lines

max_depth

the maximum recursion depth

cvl

a boolean indicating if the Cronie (TRUE) or CV likelihood (FALSE) must be used

Value

a vector with the CV score for each bandwidth and the densities if required

Examples

# no example provided, this is an internal function

NKDE worker

Description

The worker function for nkde and nkde.mc

Usage

nkde_worker(
  lines,
  events,
  samples,
  kernel_name,
  bw,
  bws,
  method,
  div,
  digits,
  tol,
  sparse,
  max_depth,
  verbose = FALSE
)

Arguments

lines

A feature collection of linestrings representing the network. The geometries must be simple lines (may crash if some geometries are invalid)

events

A feature collection of points representing the events on the network. The points will be snapped on the network.

samples

A feature collection of points representing the locations for which the densities will be estimated.

kernel_name

The name of the kernel to use

bw

The global kernel bandwidth

bws

The kernel bandwidth (in meters) for each event. Is usually a vector but could also be a matrix if several global bandwidths were used. In this case, the output value is also a matrix.

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth), or "none" (the simple sum).

digits

The number of digits to keep in the spatial coordinates. It ensures that topology is good when building the network. Default is 3

tol

The minimum distance between the events or the sampling points and the lines' extremities when adding these points to the network. When points are closer, they are added at the extremity of the lines.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. Regular matrices are faster, but require more memory and could lead to error, in particular with multiprocessing. Sparse matrices are slower, but require much less memory.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can go wild if the network has a lot of small edges (area with a lot of intersections and a lot of events). To avoid it, it is possible to set here a maximum depth. Considering that the kernel is divided at intersections, a value of 8 should yield good estimates. A larger value can be used without problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

verbose

A Boolean, indicating if the function should print messages about the process.

Value

A numeric vector with the nkde values

Examples

#This is an internal function, no example provided

Bandwidth selection by likelihood cross validation worker function

Description

Worker function calculating the cross-validation likelihood for multiple bandwidths, in order to select an appropriate bandwidth in a data-driven approach.

Usage

nkde_worker_bw_sel(
  lines,
  quad_events,
  events_loc,
  events,
  w,
  kernel_name,
  bws_net,
  method,
  div,
  digits,
  tol,
  sparse,
  max_depth,
  zero_strat = "min_double",
  verbose = FALSE,
  cvl = FALSE
)

Arguments

lines

A feature collection of linestrings representing the underlying network

quad_events

a feature collection of points indicating for which events the densities must be calculated

events_loc

A feature collection of points representing the location of the events

events

A feature collection of points representing the events. Multiple events can share the same location. They are linked by the goid column

w

A numeric matrix with the weight of the events for each bandwidth

kernel_name

The name of the kernel to use (string)

bws_net

A numeric matrix with the network bandwidths for each event

method

The type of NKDE to use (string)

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 3. Too high a precision (high number of digits) might break some connections

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

zero_strat

A string indicating what to do when the density is 0 when calculating the LOO density estimate for an isolated event. "min_double" (default) replaces the 0 value by the minimum double possible on the machine. "remove" will remove them from the final score. The first approach penalizes the small bandwidths more strongly.

verbose

A Boolean, indicating if the function should print messages about the process.

cvl

A boolean indicating if the cvl method (TRUE) or the loo (FALSE) method must be used

Examples

# no example provided, this is an internal function
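To make the effect of the zero_strat argument more concrete, the sketch below compares the two strategies on made-up leave-one-out densities. It assumes that the score is a sum of log densities and that the "minimum double" corresponds to .Machine$double.xmin in R; both points are assumptions for illustration, not a description of the internal code.

```r
# made-up leave-one-out densities; the third event is isolated
loo_dens <- c(0.004, 0.002, 0, 0.003)

# "min_double": replace zeros by the smallest positive normalized double
dens_min <- ifelse(loo_dens == 0, .Machine$double.xmin, loo_dens)
score_min_double <- sum(log(dens_min))

# "remove": drop the zeros from the score
score_remove <- sum(log(loo_dens[loo_dens > 0]))
```

Because the log of such a tiny value is strongly negative, a bandwidth that leaves events isolated gets a much lower score with "min_double" than with "remove".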

pairwise distance between two vectors

Description

pairwise distance between two vectors

Usage

pair_dists(x, y)

Arguments

x

a numeric vector

y

a numeric vector

Value

a matrix with dimensions length(x) * length(y)
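A base R equivalent gives an idea of the expected shape of the result. The sketch assumes the distance is the absolute difference between the values, which is not confirmed by this page.

```r
x <- c(0, 2, 5)
y <- c(1, 4)
# absolute pairwise differences: one row per element of x,
# one column per element of y
d <- abs(outer(x, y, "-"))
dim(d)
```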


Plot graph

Description

Function to plot a graph (useful to check connectivity).

Usage

plot_graph(graph)

Arguments

graph

A graph object (produced with build_graph)

Examples

#This is an internal function, no example provided

Preparing results for K functions

Description

Prepare the final results at the end of the execution of the main functions calculating K or G functions.

Usage

prep_kfuncs_results(
  k_vals,
  g_vals,
  all_values,
  conf_int,
  calc_g_func,
  cross,
  dist_seq,
  return_sims
)

Arguments

k_vals

a numeric vector with the real K values

g_vals

a numeric vector with the real g values

all_values

a list with the simulated K and G values that must be arranged.

conf_int

the confidence interval parameter.

calc_g_func

a boolean indicating if the G function has been calculated.

cross

a boolean indicating if we have calculated a simple (FALSE) or a cross function.

dist_seq

a numeric vector representing the distance used for calculation

return_sims

a boolean, indicating if the simulations must be returned

Value

A list with the following values :

Examples

# no example, this is an internal function

Prior data preparation

Description

A simple function to prepare data before the NKDE calculation.

Usage

prepare_data(samples, lines, events, w, digits, tol, agg)

Arguments

samples

A feature collection of points representing the samples points

lines

A feature collection of Linestrings representing the network

events

A feature collection of points representing the events points

w

A numeric vector representing the weight of the events

digits

The number of digits to keep

tol

A float indicating the spatial tolerance when snapping events on lines

agg

A double indicating if the points must be aggregated within a distance. if NULL, then the points are aggregated by rounding the coordinates.

Value

the data prepared for the rest of the operations

Examples

#This is an internal function, no example provided

Data preparation for network_listw

Description

Function to prepare selected points and selected lines during the process.

Usage

prepare_elements_netlistw(is, grid, snapped_points, lines, maxdistance)

Arguments

is

The indices of the quadrats to use in the grid

grid

A feature collection of polygons representing the quadrats used to split the calculation

snapped_points

The start and end points snapped to the lines

lines

The lines representing the network

maxdistance

The maximum distance between two observations to consider them as neighbours.

Value

A list of two elements: selected points and selected lines

Examples

#no example provided, this is an internal function

Quartic kernel

Description

Function implementing the quartic kernel.

Usage

quartic_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
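For intuition, the quartic (biweight) kernel is commonly written as K(u) = 15/16 * (1 - u^2)^2 for |u| <= 1, with u = d / bw. The sketch below assumes this standard form, rescaled by 1 / bw so that it integrates to one; quartic_sketch is a hypothetical name and is not the package function.

```r
# Sketch of a quartic kernel (standard textbook form, assumed here).
quartic_sketch <- function(d, bw) {
  u <- d / bw
  k <- (15 / 16) * (1 - u^2)^2 / bw
  # the kernel is zero beyond the bandwidth
  ifelse(abs(d) > bw, 0, k)
}

dens <- quartic_sketch(c(0, 150, 300, 400), bw = 300)
```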

C++ quartic kernel

Description

C++ quartic kernel

C++ quartic kernel integral

Usage

quartic_kernel_cpp(d, bw)

quartic_kernel_int_cpp(d_start, d_end, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth

d_start

a vector of start distances for which the density must be calculated

d_end

a vector of end distances for which the density must be calculated


C++ quartic kernel for one distance

Description

C++ quartic kernel for one distance

Usage

quartic_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Remove loops

Description

Remove from a sf object with linestring type geometries the lines that have the same starting and ending point.

Usage

remove_loop_lines(lines, digits)

Arguments

lines

A sf object with linestring type geometries

digits

An integer indicating the number of digits to keep for the spatial coordinates

Value

A sf object with linestring type geometries

Examples

#This is an internal function, no example provided

Remove mirror edges

Description

Keep unique edges based on start and end point

Usage

remove_mirror_edges(lines, keep_shortest = TRUE, digits = 3, verbose = TRUE)

Arguments

lines

A feature collection of linestrings

keep_shortest

A boolean, if TRUE, then the shortest line is kept if several lines have the same starting point and ending point. If FALSE, then the longest line is kept.

digits

An integer indicating the number of digits to keep in coordinates

Value

A feature collection of linestrings with the mirror edges removed

Examples

#This is an internal function, no example provided

Reverse the elements in a matrix

Description

Reverse the order of the elements in a matrix, both column-wise and row-wise

Usage

rev_matrix(mat)

Arguments

mat

The matrix to reverse

Value

A matrix
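In base R, the same kind of reversal can be obtained with plain indexing. This is only an illustration of the expected behaviour, assuming both the row order and the column order are flipped.

```r
mat <- matrix(1:6, nrow = 2)
# flip the rows and the columns at once
rev_mat <- mat[nrow(mat):1, ncol(mat):1, drop = FALSE]
```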


Reverse lines

Description

A function to reverse the order of the vertices of lines

Usage

reverse_lines(lines)

Arguments

lines

A sf object with linestring type geometries

Value

A sf object with linestring type geometries

Examples

#This is an internal function, no example provided

Sanity check for the knn functions

Description

Check if all the parameters are valid for the knn functions

Usage

sanity_check_knn(
  origins,
  destinations,
  lines,
  k,
  maxdistance,
  snap_dist,
  line_weight,
  direction,
  grid_shape,
  verbose,
  digits,
  tol
)

Arguments

origins

A feature collection of points; for each point, its k nearest neighbours will be found on the network.

destinations

A feature collection of points; might be used if the neighbours must be found in a separate dataset. NULL if the neighbours must be found in origins.

lines

A feature collection of linestrings representing the network

k

An integer indicating the number of neighbours to find.

maxdistance

The maximum distance between two observations to consider them as neighbours. It is useful only if a grid is used, a lower value will reduce calculating time, but one must be sure that the k nearest neighbours are within this radius. Otherwise NAs will be present in the final matrices.

snap_dist

The maximum distance to snap the start and end points on the network.

line_weight

The weighting to use for lines. Default is "length" (the geographical length), but can be the name of a column. The value is considered proportional to the geographical length of the lines.

direction

Indicates a field providing information about authorized travelling direction on lines. if NULL, then all lines can be used in both directions. Must be the name of a column otherwise. The values of the column must be "FT" (From - To), "TF" (To - From) or "Both".

grid_shape

A vector of length 2 indicating the shape of the grid to use for splitting the dataset. Default is c(1,1), so all the calculation is done in one go. It might be necessary to split it if the dataset is large.

verbose

A Boolean indicating if the function should print its progress

digits

The number of digits to retain in the spatial coordinates (a simplification used to reduce the risk of topological errors)

tol

A float indicating the spatial tolerance when points are added as vertices to lines.

Value

A list with two matrices, one with the index of the neighbours and one with the distances.

Examples

#no example provided, this is an internal function

Select the distance to weight function

Description

Select a function to convert distances to weights. If a function is provided, this function will be vectorized.

Usage

select_dist_function(dist_func = "inverse")

Arguments

dist_func

Could be a name in c('inverse', 'identity', 'squared inverse') or a function with only one parameter x

Value

A vectorized function used to convert distance into spatial weights

Examples

#This is an internal function, no example provided
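The selection logic can be sketched as follows. The formulas attached to each name (1/x, x and 1/x^2) are the natural reading of the names but remain assumptions, and select_dist_sketch is a hypothetical stand-in rather than the package function.

```r
# Sketch of a name-or-function selector (illustrative only).
select_dist_sketch <- function(dist_func = "inverse") {
  # a user-supplied function is vectorized and returned as is
  if (is.function(dist_func)) {
    return(Vectorize(dist_func))
  }
  switch(dist_func,
    "inverse" = function(x) 1 / x,
    "identity" = function(x) x,
    "squared inverse" = function(x) 1 / x^2,
    stop("unknown distance function name")
  )
}

f <- select_dist_sketch("squared inverse")
w1 <- f(c(1, 2, 4))

# a scalar-only function becomes usable on vectors once vectorized
g <- select_dist_sketch(function(x) if (x > 2) 0 else 1)
w2 <- g(c(1, 3))
```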

Select kernel function

Description

Select the kernel function by its name.

Usage

select_kernel(name)

Arguments

name

The name of the kernel to use

Value

A kernel function

Examples

#This is an internal function, no example provided

LineString to simple Line

Description

Split the polylines of a feature collection of linestrings in simple segments at each vertex. The values of the columns are duplicated for each segment.

Usage

simple_lines(lines)

Arguments

lines

The feature collection of linestrings to modify

Value

A feature collection of linestrings

Examples


data(mtl_network)
new_lines <- simple_lines(mtl_network)
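Splitting at each vertex can be pictured on a bare coordinate matrix. The sketch below is independent of the package and of sf, and uses made-up coordinates: a polyline with four vertices yields three two-point segments.

```r
# vertices of one polyline, in order
coords <- rbind(c(0, 0), c(1, 0), c(1, 2), c(3, 2))
n <- nrow(coords)
# each segment joins a vertex to the next one
segments <- cbind(coords[-n, , drop = FALSE], coords[-1, , drop = FALSE])
colnames(segments) <- c("x_start", "y_start", "x_end", "y_end")
```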


Simple NKDE algorithm

Description

Function to perform the simple nkde.

Usage

simple_nkde(graph, events, samples, bws, kernel_func, nodes, edges, div = "bw")

Arguments

graph

a graph object from igraph representing the network

events

a feature collection of points representing the events. It must be snapped on the network, and be nodes of the network. A column vertex_id must indicate for each event its corresponding node

samples

a feature collection of points representing the sampling points. The samples must be snapped on the network. A column edge_id must indicate for each sample on which edge it is snapped.

bws

a vector indicating the kernel bandwidth (in meters) for each event

kernel_func

a function obtained with the function select_kernel

nodes

a feature collection of points representing the nodes of the network

edges

a feature collection of linestrings representing the edges of the network

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth), or "none" (the simple sum).

Value

a dataframe with two columns. sum_k is the sum for each sample point of the kernel values. n is the number of events influencing each sample point

Examples

#This is an internal function, no example provided

Simple TNKDE algorithm

Description

Function to perform the simple tnkde.

Usage

simple_tnkde(
  graph,
  events,
  samples,
  samples_time,
  bws_net,
  bws_time,
  kernel_func,
  nodes,
  edges,
  div
)

Arguments

graph

a graph object from igraph representing the network

events

a feature collection of points representing the events. It must be snapped on the network, and be nodes of the network. A column vertex_id must indicate for each event its corresponding node

samples

a feature collection of points representing the sampling points. The samples must be snapped on the network. A column edge_id must indicate for each sample on which edge it is snapped.

samples_time

a numeric vector indicating when the densities must be sampled

bws_net

a vector indicating the network kernel bandwidth (in meters) for each event

bws_time

a vector indicating the time kernel bandwidth for each event

kernel_func

a function obtained with the function select_kernel

nodes

a feature collection of points representing the nodes of the network

edges

a feature collection of linestrings representing the edges of the network

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth), or "none" (the simple sum).

Value

a list of two matrices. The first one is the matrix of the densities, the rows are the samples and the columns the time. The second has the same dimensions and contains the number of events influencing each sample

Examples

#This is an internal function, no example provided

Simplify a network

Description

Simplify a network by applying two corrections: Healing edges and Removing mirror edges (experimental).

Usage

simplify_network(
  lines,
  digits = 3,
  heal = TRUE,
  mirror = TRUE,
  keep_shortest = TRUE,
  verbose = TRUE
)

Arguments

lines

A feature collection of linestrings

digits

An integer indicating the number of digits to keep in coordinates

heal

A boolean indicating if the healing operation must be performed

mirror

A boolean indicating if the mirror edges must be removed

keep_shortest

A boolean, if TRUE, then the shortest line is kept from mirror edges. if FALSE, then the longest line is kept.

verbose

A boolean indicating if messages and a progress bar should be displayed

Details

Healing is the operation of merging two connected linestrings if they intersect at one extremity and do not intersect any other linestring. It helps to reduce the complexity of the network and thus can reduce calculation time. Removing mirror edges is the operation of removing edges that have the same extremities. If two edges start at the same point and end at the same point, they do not add information to the network and one can be removed to simplify it. One can decide to keep the longest of the two edges or the shortest. NOTE: edge healing does not currently consider line directions!
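The removal of mirror edges can be illustrated on a small edge table. The sketch below uses a made-up data frame with hypothetical column names and only matches edges given with the same start and end; edges stored in the opposite direction are not handled here, and the package's own logic may differ.

```r
# made-up edge table: two edges share the same start and end points
edges <- data.frame(
  start  = c("A", "A", "B"),
  end    = c("B", "B", "C"),
  length = c(120, 95, 40)
)
keep_shortest <- TRUE

# one key per pair of extremities
key <- paste(edges$start, edges$end, sep = "-")
# sort so that the edge to keep comes first within each key
ord <- order(key, if (keep_shortest) edges$length else -edges$length)
unique_edges <- edges[ord, ][!duplicated(key[ord]), ]
```

With keep_shortest set to FALSE, the ordering is inverted and the longest edge of each pair is retained instead.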

Value

A feature collection of linestrings

Examples


data(mtl_network)
edited_lines <- simplify_network(mtl_network, digits = 3, verbose = FALSE)


Smaller subset road network of Montreal

Description

A feature collection (sf object) representing the road network of Montreal. The EPSG is 3797, and the data comes from the Montreal OpenData website. It is only a small extract in central districts used to demonstrate the main functions of spNetwork. It is mainly used internally for tests.

Usage

small_mtl_network

Format

A sf object with 1244 rows and 2 variables

TYPE

the type of road

geom

the geometry (linestrings)

Source

https://donnees.montreal.ca/dataset/geobase


Snap points to lines

Description

Snap points to their nearest lines (edited from maptools)

Usage

snapPointsToLines2(points, lines, idField = NA, ...)

Arguments

points

A feature collection of points

lines

A feature collection of linestrings

idField

The name of the column to use as index for the lines

...

unused

Value

A feature collection of points with the projected geometries

Examples

# reading the data
data(mtl_network)
data(bike_accidents)
mtl_network$LineID <- 1:nrow(mtl_network)
# snapping point to lines
snapped_points <- snapPointsToLines2(bike_accidents,
    mtl_network,
    "LineID"
)

Coordinates to unique character vector

Description

Generate a character vector based on a coordinates matrix and the maximum number of digits to keep.

Usage

sp_char_index(coords, digits)

Arguments

coords

A n * 2 matrix representing the coordinates

digits

The number of digits to keep from the coordinates

Value

A character vector of length n

Examples

#This is an internal function, no example provided
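The idea of turning coordinates into matching keys can be sketched in a few lines. The sketch assumes the coordinates are rounded before being pasted together; coords_to_key, the separator and the coordinates are hypothetical, and the internal function may format or truncate values differently.

```r
# Sketch: build one character key per coordinate pair (illustrative only).
coords_to_key <- function(coords, digits) {
  paste(round(coords[, 1], digits), round(coords[, 2], digits), sep = "_")
}

coords <- rbind(c(10.12341, 5.5), c(10.12349, 5.5), c(10.2, 5.5))
keys <- coords_to_key(coords, digits = 3)
```

The first two points differ only beyond the third digit, so they receive the same key and would be treated as the same network node.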

Split boundary of polygon

Description

A function to cut the boundary of the study area into chunks.

Usage

split_border(polygon, bw)

Arguments

polygon

The polygon representing the study area

bw

The maximum bandwidth

Value

A feature collection of linestrings


Split data with a grid

Description

Function to split the dataset according to a grid.

Usage

split_by_grid(grid, samples, events, lines, bw, tol, digits, split_all = TRUE)

Arguments

grid

A spatial grid to split the data within

samples

A feature collection of points representing the samples points

events

A feature collection of points representing the events points

lines

A feature collection of linestrings representing the network

bw

The kernel bandwidth (used to avoid edge effect)

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

split_all

A boolean indicating if we must split the lines at each vertex (TRUE) or only at event vertices (FALSE)

Value

A list with the split dataset

Examples

#This is an internal function, no example provided

Split data with a grid

Description

Function to split the dataset according to a grid.

Usage

split_by_grid.mc(
  grid,
  samples,
  events,
  lines,
  bw,
  tol,
  digits,
  split_all = TRUE
)

Arguments

grid

A spatial grid to split the data within

samples

A feature collection of points representing the sample points

events

A feature collection of points representing the event points

lines

A feature collection of linestrings representing the network

bw

The kernel bandwidth (used to avoid edge effect)

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

split_all

A boolean indicating if we must split the lines at each vertex (TRUE) or only at event vertices (FALSE)

Value

A list with the split dataset

Examples

#This is an internal function, no example provided

Split data with a grid for the adaptive bw function

Description

Function to split the dataset according to a grid for the adaptive bw function.

Usage

split_by_grid_abw(grid, events, lines, bw, tol, digits)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the events

lines

A feature collection of lines representing the network

bw

The kernel bandwidth (used to avoid edge effect)

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

Value

A list with the split dataset

Examples

#This is an internal function, no example provided

Split data with a grid for the adaptive bw function (multicore)

Description

Function to split the dataset according to a grid for the adaptive bw function with multicore support

Usage

split_by_grid_abw.mc(grid, events, lines, bw, tol, digits)

Arguments

grid

A spatial grid to split the data within

events

A feature collection of points representing the event points

lines

A feature collection of lines representing the network

bw

The kernel bandwidth (used to avoid edge effect)

tol

A float indicating the spatial tolerance when snapping events on lines

digits

The number of digits to keep

Value

A list with the split dataset

Examples

#This is an internal function, no example provided

Split graph components

Description

Function to split the results of build_graph and build_graph_directed into their sub components

Usage

split_graph_components(graph_result)

Arguments

graph_result

A list typically obtained from the function build_graph or build_graph_directed

Value

A list of lists, the graph_result split for each graph component

Examples

data(mtl_network)
mtl_network$length <- as.numeric(sf::st_length(mtl_network))
graph_result <- build_graph(mtl_network, 2, "length", attrs = TRUE)
sub_elements <- split_graph_components(graph_result)

Split lines at vertices in a feature collection of linestrings

Description

Split lines (feature collection of linestrings) at their nearest vertices (feature collection of points). This may fail if the line geometries are self-intersecting.

Usage

split_lines_at_vertex(lines, points, nearest_lines_idx, mindist)

Arguments

lines

The feature collection of linestrings to split

points

The feature collection of points to add as vertices to the lines

nearest_lines_idx

For each point, the index of the nearest line

mindist

The minimum distance between one point and the extremity of the line to add the point as a vertex.

Value

A feature collection of linestrings

Examples


# reading the data
data(mtl_network)
data(bike_accidents)
# aggregating points within a 5 metres radius
bike_accidents$weight <- 1
agg_points <- aggregate_points(bike_accidents, 5)
mtl_network$LineID <- 1:nrow(mtl_network)
# snapping point to lines
snapped_points <- snapPointsToLines2(agg_points,
    mtl_network,
    "LineID"
)
# splitting lines
new_lines <- split_lines_at_vertex(mtl_network, snapped_points,
    snapped_points$nearest_line_id, 1)


Obtain all the bounding boxes of a feature collection

Description

Obtain all the bounding boxes of a feature collection (INTERNAL).

Usage

st_bbox_by_feature(x)

Arguments

x

a feature collection

Value

a matrix (xmin, ymin, xmax, ymax)

Examples

#This is an internal function, no example provided
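
The shape of the returned matrix can be illustrated with plain coordinate matrices standing in for geometries. The helper bbox_by_feature below is hypothetical and only sketches the idea; it is not the package's implementation, which works on sf objects.

```r
# Hypothetical sketch: each feature is represented by a matrix of x/y
# coordinates; the result has one row per feature and four columns.
bbox_by_feature <- function(coord_list) {
  out <- t(vapply(coord_list, function(m) {
    c(min(m[, 1]), min(m[, 2]), max(m[, 1]), max(m[, 2]))
  }, numeric(4)))
  colnames(out) <- c("xmin", "ymin", "xmax", "ymax")
  out
}

lines_xy <- list(
  rbind(c(0, 0), c(2, 1)),
  rbind(c(5, 3), c(4, 7), c(6, 6))
)
boxes <- bbox_by_feature(lines_xy)
```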

sf geometry bbox

Description

Generate polygon as the bounding box of a feature collection

Usage

st_bbox_geom(x)

Arguments

x

A feature collection

Value

A feature collection of polygons

Examples

#This is an internal function, no example provided

Points along polygon boundary

Description

Generate a feature collection of points by placing points along the border of polygons of a feature collection.

Usage

surrounding_points(polygons, dist)

Arguments

polygons

A feature collection of polygons

dist

The distance between the points

Value

A feature collection of points representing the points around the polygons

Examples

#This is an internal function, no example provided

Temporal Kernel density estimate

Description

Calculate the Temporal kernel density estimate based on sampling points in time and events

Usage

tkde(events, w, samples, bw, kernel_name, adaptive = FALSE)

Arguments

events

A numeric vector representing the moments of occurrence of events

w

The weight of the events

samples

A numeric vector representing the moments to sample

bw

A float, the bandwidth to use

kernel_name

The name of the kernel to use

adaptive

A Boolean indicating if an adaptive bandwidth must be used

Value

A numeric vector with the density values at the requested timestamps

Examples

data(bike_accidents)
bike_accidents$Date <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- min(bike_accidents$Date)
diff <- as.integer(difftime(bike_accidents$Date , start, units = "days"))
density <- tkde(diff, rep(1,length(diff)), seq(0,max(diff),1), 2, "quartic")
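
To make the computation concrete, here is a minimal base R sketch of a fixed-bandwidth temporal kernel density. The names quartic_k and tkde_sketch are hypothetical, and the quartic kernel scaled by the bandwidth is an assumed parametrisation; the normalisation used by tkde itself may differ.

```r
# Quartic kernel scaled by the bandwidth (assumed parametrisation)
quartic_k <- function(d, bw) {
  u <- d / bw
  ifelse(u < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical fixed-bandwidth temporal KDE: for each sampled moment,
# sum the weighted kernel values of all events
tkde_sketch <- function(events, w, samples, bw) {
  vapply(samples, function(s) {
    sum(w * quartic_k(abs(events - s), bw))
  }, numeric(1))
}

events <- c(1, 2, 2, 10)
dens <- tkde_sketch(events, w = rep(1, 4), samples = 0:12, bw = 2)
```

Sampled moments farther than bw from every event receive a density of 0, which is why the bandwidth controls how far in time each event spreads its influence.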

Temporal Network Kernel density estimate

Description

Calculate the Temporal Network Kernel Density Estimate based on a network of lines, sampling points in space and times, and events in space and time.

Usage

tnkde(
  lines,
  events,
  time_field,
  w,
  samples_loc,
  samples_time,
  kernel_name,
  bw_net,
  bw_time,
  adaptive = FALSE,
  adaptive_separate = TRUE,
  trim_bw_net = NULL,
  trim_bw_time = NULL,
  method,
  div = "bw",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  verbose = TRUE,
  check = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple, valid LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

time_field

The name of the field in events indicating when the events occurred. It must be a numeric field.

w

A vector representing the weight of each event

samples_loc

A feature collection of points representing the locations for which the densities will be estimated.

samples_time

A numeric vector indicating when the densities will be sampled

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

bw_net

The network kernel bandwidth (using the scale of the lines), can be a single float or a numeric vector if a different bandwidth must be used for each event.

bw_time

The time kernel bandwidth, can be a single float or a numeric vector if a different bandwidth must be used for each event.

adaptive

A Boolean, indicating if an adaptive bandwidth must be used. Both spatial and temporal bandwidths are adapted, separately by default (see adaptive_separate).

adaptive_separate

A boolean indicating if the adaptive bandwidths for the time and the network dimensions must be calculated separately (TRUE) or in interaction (FALSE)

trim_bw_net

A float, indicating the maximum value for the adaptive network bandwidth

trim_bw_time

A float, indicating the maximum value for the adaptive time bandwidth

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow very quickly if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

Temporal Network Kernel Density Estimate
The TNKDE is an extension of the NKDE considering both the location of events on the network and in time. Thus, density estimation (density sampling) can be done along lines of the network and at different times. It can be used with the three NKDE methods (simple, discontinuous and continuous).

density in time and space
Two bandwidths must be provided, one for the network distance and one for the time distance. They are both used to calculate the contribution of each event to each sampling point. Let us consider one event E and a sample S. dnet(E,S) is the contribution to network density of E at S location and dtime(E,S) is the contribution to time density of E at S time. The total contribution is thus dnet(E,S) * dtime(E,S). If one of the two densities is 0, then the total density is 0 because the sampling point lies outside the area covered by the event in time or in the network space.
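
The product rule described above can be sketched in a few lines of base R. The function names and the scaled quartic kernel are assumptions made for illustration; the network distance is taken as given here, whereas the package computes it along the network.

```r
# Quartic kernel scaled by the bandwidth (assumed parametrisation)
quartic_k <- function(d, bw) {
  u <- d / bw
  ifelse(u < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical contribution of one event E to one sample S:
# dnet(E,S) * dtime(E,S)
contribution <- function(d_net, d_time, bw_net, bw_time) {
  quartic_k(d_net, bw_net) * quartic_k(d_time, bw_time)
}

# sample within both bandwidths: positive contribution
inside <- contribution(d_net = 200, d_time = 10, bw_net = 700, bw_time = 60)
# sample close on the network but 90 days away (bw_time = 60): no contribution
too_late <- contribution(d_net = 200, d_time = 90, bw_net = 700, bw_time = 60)
```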

adaptive bandwidth
It is possible to use an adaptive bandwidth both on the network and in time. Adaptive bandwidths are calculated using Abramson’s smoothing regimen (Abramson 1982). To do so, the original fixed bandwidths must be specified (bw_net and bw_time parameters). The maximum size of the two local bandwidths can be limited with the parameters trim_bw_net and trim_bw_time.
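
As a rough sketch of Abramson's rule in its usual textbook form, the local bandwidth of each event is the fixed bandwidth multiplied by the inverse square root of a pilot density, scaled by the geometric mean of the pilot densities. The helper abramson_bw below is hypothetical, assumes strictly positive pilot densities, and is not taken from the package source.

```r
# Hypothetical helper: Abramson-style local bandwidths from pilot densities
abramson_bw <- function(pilot_density, bw, trim = NULL) {
  # geometric mean of the pilot densities (requires values > 0)
  gamma <- exp(mean(log(pilot_density)))
  # low pilot density -> wider bandwidth, high pilot density -> narrower
  local_bw <- bw * (pilot_density / gamma)^(-0.5)
  # optional upper limit, in the spirit of trim_bw_net / trim_bw_time
  if (!is.null(trim)) local_bw <- pmin(local_bw, trim)
  local_bw
}

pilot <- c(0.001, 0.004, 0.016)
bws <- abramson_bw(pilot, bw = 700, trim = 900)
```

In this toy case the event in the sparsest area would get a bandwidth of 1400 without trimming, and the trim value caps it at 900.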

Diggle correction factor
A set of events can be limited in both space (limits of the study area) and time (beginning and end of the data collection period). These limits induce lower densities at the border of the set of events, because events are not sampled outside the limits. It is possible to apply the Diggle correction factor (Diggle 1985) in both the network and time spaces to minimize this effect.
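
The idea behind the correction can be illustrated in the time dimension only. In this sketch, which is an assumption-laden simplification and not the package's code, the weight of an event is multiplied by the inverse of the share of its kernel mass that falls inside the observation period.

```r
# Quartic kernel scaled by the bandwidth (assumed parametrisation)
quartic_k <- function(d, bw) {
  u <- d / bw
  ifelse(u < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical time-only correction factor: 1 / kernel mass inside the period
diggle_time_factor <- function(event_time, bw, t_start, t_end) {
  lower <- max(t_start, event_time - bw)
  upper <- min(t_end, event_time + bw)
  mass_inside <- integrate(
    function(t) quartic_k(abs(t - event_time), bw),
    lower, upper
  )$value
  1 / mass_inside
}

# event in the middle of the period: the whole kernel is inside
center <- diggle_time_factor(50, bw = 10, t_start = 0, t_end = 100)
# event on the first day: half of the kernel falls before the period
border <- diggle_time_factor(0, bw = 10, t_start = 0, t_end = 100)
```

An event far from the limits keeps a factor close to 1, while an event exactly on the limit gets a factor close to 2 because half of its kernel is lost.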

Separated or simultaneous adaptive bandwidth
When the parameter adaptive is TRUE, one can choose between calculating the network and temporal bandwidths separately or simultaneously. In the first case (default), the network bandwidths are determined for each event by considering only their locations and the time bandwidths are determined by considering only their time stamps. In the second case, for each event, the spatio-temporal density at its location on the network and in time is estimated and used to determine both the network and temporal bandwidths. This second approach should be preferred if the events are characterized by a high level of spatio-temporal autocorrelation.

Value

A matrix with the estimated density for each sample point (rows) at each timestamp (columns). If adaptive = TRUE, the function returns a list with two slots: k (the matrix with the density values) and events (a feature collection of points with the local bandwidths).
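
Because the return type depends on adaptive, downstream code may want to handle both shapes. The sketch below uses mock objects that mimic the structure described above; get_density_matrix is a hypothetical helper, not a package function.

```r
# Hypothetical helper: return the density matrix for both return shapes
get_density_matrix <- function(res) {
  if (is.list(res) && !is.null(res$k)) res$k else res
}

# mock results: 3 sample points (rows) at 2 timestamps (columns)
fixed_res <- matrix(c(0.1, 0.2, 0.0, 0.3, 0.1, 0.4), nrow = 3, ncol = 2)
adaptive_res <- list(
  k = fixed_res,
  events = data.frame(bw_net = c(650, 900)) # stand-in for the point features
)

# pick the column whose timestamp is closest to day 8
sample_time <- c(0, 10)
col <- which.min(abs(sample_time - 8))
slice <- get_density_matrix(adaptive_res)[, col]
```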

Examples


# loading the data
data(mtl_network)
data(bike_accidents)

# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- as.POSIXct("2016/01/01", format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, start, units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)

# creating sample points
lixels <- lixelize_lines(mtl_network, 50)
sample_points <- lines_center(lixels)

# choosing sample in times (every 10 days)
sample_time <- seq(0, max(bike_accidents$Time), 10)

# calculating the densities
tnkde_densities <- tnkde(lines = mtl_network,
    events = bike_accidents, time_field = "Time",
    w = rep(1, nrow(bike_accidents)),
    samples_loc = sample_points,
    samples_time = sample_time,
    kernel_name = "quartic",
    bw_net = 700, bw_time = 60, adaptive = TRUE,
    trim_bw_net = 900, trim_bw_time = 80,
    method = "discontinuous", div = "bw",
    max_depth = 10, digits = 2, tol = 0.01,
    agg = 15, grid_shape = c(1,1),
    verbose  = FALSE)


Temporal Network Kernel density estimate (multicore)

Description

Calculate the Temporal Network Kernel Density Estimate based on a network of lines, sampling points in space and times, and events in space and time with multicore support.

Usage

tnkde.mc(
  lines,
  events,
  time_field,
  w,
  samples_loc,
  samples_time,
  kernel_name,
  bw_net,
  bw_time,
  adaptive = FALSE,
  adaptive_separate = TRUE,
  trim_bw_net = NULL,
  trim_bw_time = NULL,
  method,
  div = "bw",
  diggle_correction = FALSE,
  study_area = NULL,
  max_depth = 15,
  digits = 5,
  tol = 0.1,
  agg = NULL,
  sparse = TRUE,
  grid_shape = c(1, 1),
  verbose = TRUE,
  check = TRUE
)

Arguments

lines

A feature collection of linestrings representing the underlying network. The geometries must be simple, valid LineStrings, not MultiLineStrings (the function may crash if some geometries are invalid).

events

A feature collection of points representing the events on the network. The points will be snapped on the network to their closest line.

time_field

The name of the field in events indicating when the events occurred. It must be a numeric field

w

A vector representing the weight of each event

samples_loc

A feature collection of points representing the locations for which the densities will be estimated.

samples_time

A numeric vector indicating when the densities will be sampled

kernel_name

The name of the kernel to use. Must be one of triangle, gaussian, tricube, cosine, triweight, quartic, epanechnikov or uniform.

bw_net

The network kernel bandwidth (using the scale of the lines), can be a single float or a numeric vector if a different bandwidth must be used for each event.

bw_time

The time kernel bandwidth, can be a single float or a numeric vector if a different bandwidth must be used for each event.

adaptive

A Boolean, indicating if an adaptive bandwidth must be used. Both spatial and temporal bandwidths are adapted, separately by default (see adaptive_separate).

adaptive_separate

A boolean indicating if the adaptive bandwidths for the time and the network dimensions must be calculated separately (TRUE) or in interaction (FALSE)

trim_bw_net

A float, indicating the maximum value for the adaptive network bandwidth

trim_bw_time

A float, indicating the maximum value for the adaptive time bandwidth

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see nkde details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

diggle_correction

A Boolean indicating if the correction factor for edge effect must be used.

study_area

A feature collection of polygons representing the limits of the study area.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can grow very quickly if the network has many small edges (areas with many intersections and many events). To avoid this, a maximum depth can be set here. Considering that the kernel is divided at intersections, a value of 10 should yield good estimates in most cases. A larger value can be used without a problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 5. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

agg

A double indicating if the events must be aggregated within a distance. If NULL, the events are aggregated only by rounding the coordinates.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

grid_shape

A vector of two values indicating how the study area must be split when performing the calculation. Default is c(1,1) (no split). A finer grid could reduce memory usage and increase speed when a large dataset is used. When using multiprocessing, the work in each grid cell is dispatched between the workers.

verbose

A Boolean, indicating if the function should print messages about the process.

check

A Boolean indicating if the geometry checks must be run before the operation. This might take some time, but it will ensure that the CRS of the provided objects are valid and identical, and that geometries are valid.

Details

For details, see help(tnkde) and help(nkde)

Value

A matrix with the estimated density for each sample point (rows) at each timestamp (columns). If adaptive = TRUE, the function returns a list with two slots: k (the matrix with the density values) and events (a feature collection of points with the local bandwidths).

Examples


# loading the data
data(mtl_network)
data(bike_accidents)

# converting the Date field to a numeric field (counting days)
bike_accidents$Time <- as.POSIXct(bike_accidents$Date, format = "%Y/%m/%d")
start <- as.POSIXct("2016/01/01", format = "%Y/%m/%d")
bike_accidents$Time <- difftime(bike_accidents$Time, start, units = "days")
bike_accidents$Time <- as.numeric(bike_accidents$Time)

# creating sample points
lixels <- lixelize_lines(mtl_network, 50)
sample_points <- lines_center(lixels)

# choosing sample in times (every 10 days)
sample_time <- seq(0, max(bike_accidents$Time), 10)

future::plan(future::multisession(workers=1))

# calculating the densities
tnkde_densities <- tnkde.mc(lines = mtl_network,
    events = bike_accidents, time_field = "Time",
    w = rep(1, nrow(bike_accidents)),
    samples_loc = sample_points,
    samples_time = sample_time,
    kernel_name = "quartic",
    bw_net = 700, bw_time = 60, adaptive = TRUE,
    trim_bw_net = 900, trim_bw_time = 80,
    method = "discontinuous", div = "bw",
    max_depth = 10, digits = 2, tol = 0.01,
    agg = 15, grid_shape = c(1,1),
    verbose  = FALSE)

## make sure any open connections are closed afterward
if (!inherits(future::plan(), "sequential")) future::plan(future::sequential)


The exposed function to calculate TNKDE likelihood cv

Description

The exposed function to calculate TNKDE likelihood cv (INTERNAL)

Usage

tnkde_get_loo_values(
  method,
  neighbour_list,
  sel_events,
  sel_events_wid,
  sel_events_time,
  events,
  events_wid,
  events_time,
  weights,
  bws_net,
  bws_time,
  kernel_name,
  line_list,
  max_depth,
  min_tol
)

Arguments

method

a string, one of "simple", "continuous", "discontinuous"

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

sel_events

a Numeric vector indicating the selected events (id of nodes)

sel_events_wid

a Numeric Vector indicating the unique id of the selected events

sel_events_time

a Numeric Vector indicating the time of the selected events

events

a NumericVector indicating the nodes in the graph being events

events_wid

a NumericVector indicating the unique id of all the events

events_time

a NumericVector indicating the timestamp of each event

weights

a cube with the weights associated with each event for each bws_net and bws_time.

bws_net

an arma::vec with the network bandwidths to consider

bws_time

an arma::vec with the time bandwidths to consider

kernel_name

a string with the name of the kernel to use

line_list

a DataFrame describing the lines

max_depth

the maximum recursion depth

min_tol

a double indicating the value that must replace zeros in density values

Value

a matrix with the CV score for each pair of bandwidths

Examples

# no example provided, this is an internal function
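
To give an intuition of the score being computed, the sketch below evaluates a leave-one-out log-likelihood for several candidate bandwidths in the time dimension only. It is a simplified illustration under stated assumptions (scaled quartic kernel, unit weights, no network component, hypothetical function names) and does not reproduce the internal function.

```r
# Quartic kernel scaled by the bandwidth (assumed parametrisation)
quartic_k <- function(d, bw) {
  u <- d / bw
  ifelse(u < 1, (15 / 16) * (1 - u^2)^2 / bw, 0)
}

# Hypothetical leave-one-out likelihood score, one value per bandwidth
loo_cv_score <- function(times, bws, min_tol = 1e-10) {
  vapply(bws, function(bw) {
    # density at each event, estimated from all the other events
    loo_dens <- vapply(seq_along(times), function(i) {
      sum(quartic_k(abs(times[-i] - times[i]), bw))
    }, numeric(1))
    # zeros are replaced by min_tol so that the logarithm is defined
    loo_dens[loo_dens <= 0] <- min_tol
    sum(log(loo_dens))
  }, numeric(1))
}

times <- c(1, 2, 3, 10, 11, 12, 30)
candidate_bws <- c(0.5, 5, 50)
scores <- loo_cv_score(times, candidate_bws)
best_bw <- candidate_bws[which.max(scores)]
```

A bandwidth that is too small leaves every event without neighbours, so each term falls back to log(min_tol) and the score is heavily penalised.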

The exposed function to calculate TNKDE likelihood cv

Description

The exposed function to calculate TNKDE likelihood cv (INTERNAL) when an adaptive bandwidth is used

Usage

tnkde_get_loo_values2(
  method,
  neighbour_list,
  sel_events,
  sel_events_wid,
  sel_events_time,
  events,
  events_wid,
  events_time,
  weights,
  bws_net,
  bws_time,
  kernel_name,
  line_list,
  max_depth,
  min_tol
)

Arguments

method

a string, one of "simple", "continuous", "discontinuous"

neighbour_list

a List, giving for each node an IntegerVector with its neighbours

sel_events

a Numeric vector indicating the selected events (id of nodes)

sel_events_wid

a Numeric Vector indicating the unique id of the selected events

sel_events_time

a Numeric Vector indicating the time of the selected events

events

a NumericVector indicating the nodes in the graph being events

events_wid

a NumericVector indicating the unique id of all the events

events_time

a NumericVector indicating the timestamp of each event

weights

a cube with the weights associated with each event for each bws_net and bws_time.

bws_net

an arma::cube of three dimensions with the network bandwidths calculated for each observation for each global time and network bandwidths

bws_time

an arma::cube of three dimensions with the time bandwidths calculated for each observation for each global time and network bandwidths

kernel_name

a string with the name of the kernel to use

line_list

a DataFrame describing the lines

max_depth

the maximum recursion depth

min_tol

a double indicating the value that must replace zeros in density values

Value

a matrix with the CV score for each pair of global bandwidths

Examples

# no example provided, this is an internal function

TNKDE worker

Description

The worker function for tnkde and tnkde.mc

Usage

tnkde_worker(
  lines,
  events_loc,
  events,
  samples_loc,
  samples_time,
  kernel_name,
  bw_net,
  bw_time,
  bws_net,
  bws_time,
  method,
  div,
  digits,
  tol,
  sparse,
  max_depth,
  verbose = FALSE
)

Arguments

lines

A feature collection of linestrings with the sampling points. The geometries must be simple Linestrings (may crash if some geometries are invalid)

events_loc

A feature collection of points representing the aggregated events on the network. The points will be snapped on the network.

events

A feature collection of points representing the base events on the network

samples_loc

A feature collection of points representing the locations for which the densities will be estimated.

samples_time

A numeric vector representing when each density will be estimated

kernel_name

The name of the kernel to use

bw_net

The global network kernel bandwidth

bw_time

The global time kernel bandwidth

bws_net

The network kernel bandwidth (in meters) for each event

bws_time

The time bandwidth for each event

method

The method to use when calculating the NKDE, must be one of simple / discontinuous / continuous (see details for more information)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth) or "none" (the simple sum).

digits

The number of digits to keep in the spatial coordinates. It ensures that topology is good when building the network. Default is 3

tol

The minimum distance between the events or sampling points and the lines' extremities when adding these points to the network. When points are closer, they are added at the extremity of the lines.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. Regular matrices are faster, but require more memory and could lead to error, in particular with multiprocessing. Sparse matrices are slower, but require much less memory.

max_depth

When using the continuous and discontinuous methods, the calculation time and memory use can go wild if the network has a lot of small edges (area with a lot of intersections and a lot of events). To avoid it, it is possible to set here a maximum depth. Considering that the kernel is divided at intersections, a value of 8 should yield good estimates. A larger value can be used without problem for the discontinuous method. For the continuous method, a larger value will strongly impact calculation speed.

verbose

A Boolean, indicating if the function should print messages about the process.

Value

A numeric matrix with the nkde values

Examples

#This is an internal function, no example provided

Worker function for bandwidth selection by likelihood cross validation for temporal NKDE

Description

Calculate for multiple network and time bandwidths the cross validation likelihood to select an appropriate bandwidth in a data-driven approach (INTERNAL)

Usage

tnkde_worker_bw_sel(
  lines,
  quad_events,
  events_loc,
  events,
  w,
  kernel_name,
  bws_net,
  bws_time,
  method,
  div,
  digits,
  tol,
  sparse,
  max_depth,
  verbose = FALSE,
  cvl = FALSE
)

Arguments

lines

A feature collection of linestrings representing the underlying network

quad_events

a feature collection of points indicating for which events the densities must be calculated

events_loc

A feature collection of points representing the location of the events

events

A feature collection of points representing the events. Multiple events can share the same location. They are linked by the goid column

w

A numeric array with the weight of the events for each pair of bandwidth

kernel_name

The name of the kernel to use (string)

bws_net

A numeric vector with the network bandwidths. Could also be an array if an adaptive bandwidth is calculated.

bws_time

A numeric vector with the time bandwidths. Could also be an array if an adaptive bandwidth is calculated.

method

The type of NKDE to use (string)

div

The type of divisor (not used currently)

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 3. Too high a precision (high number of digits) might break some connections

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

max_depth

The maximum depth of recursion

verbose

A boolean

cvl

A boolean indicating if the cvl method (TRUE) or the loo (FALSE) method must be used

Value

An array with the CV score for each pair of bandwidths (rows and columns) for each event (slices)

Examples

# no example provided, this is an internal function

The main function to calculate continuous TNKDE (with ARMADILLO and sparse matrix)

Description

The main function to calculate continuous TNKDE (with ARMADILLO and sparse matrix)

The main function to calculate continuous TNKDE (with ARMADILLO and integer matrix)

Usage

continuous_tnkde_cpp_arma_sparse(
  neighbour_list,
  events,
  events_time,
  weights,
  samples,
  samples_time,
  bws_net,
  bws_time,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div
)

continuous_tnkde_cpp_arma(
  neighbour_list,
  events,
  events_time,
  weights,
  samples,
  samples_time,
  bws_net,
  bws_time,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

events_time

a numeric vector with the time for the events

weights

a numeric vector of the weight of each event

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

samples_time

a NumericVector indicating when to do the samples

bws_net

the network kernel bandwidths for each event

bws_time

the time kernel bandwidths for each event

kernel_name

the name of the kernel to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

a string indicating how to standardize the kernel values

Value

a List with two matrices: the kernel values (sum_k) and the number of events for each sample (n)


The main function to calculate discontinuous TNKDE (ARMA and Integer matrix)

Description

The main function to calculate discontinuous TNKDE (ARMA and Integer matrix)

Usage

discontinuous_tnkde_cpp_arma(
  neighbour_list,
  events,
  weights,
  events_time,
  samples,
  samples_time,
  bws_net,
  bws_time,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

weights

a numeric vector of the weight of each event

events_time

a numeric vector with the time for the events

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

samples_time

a NumericVector indicating when to do the samples

bws_net

the network kernel bandwidths for each event

bws_time

the time kernel bandwidths for each event

kernel_name

the name of the kernel function to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

a string indicating how to standardize the kernel values

Value

a List with two matrices: the kernel values (sum_k) and the number of events for each sample (n)


The main function to calculate discontinuous TNKDE (ARMA and sparse matrix)

Description

The main function to calculate discontinuous TNKDE (ARMA and sparse matrix)

Usage

discontinuous_tnkde_cpp_arma_sparse(
  neighbour_list,
  events,
  weights,
  events_time,
  samples,
  samples_time,
  bws_net,
  bws_time,
  kernel_name,
  nodes,
  line_list,
  max_depth,
  verbose,
  div = "bw"
)

Arguments

neighbour_list

a list of the neighbours of each node

events

a numeric vector of the node id of each event

weights

a numeric vector of the weight of each event

events_time

a numeric vector with the time for the events

samples

a DataFrame of the samples (with spatial coordinates and belonging edge)

samples_time

a NumericVector indicating when to do the samples

bws_net

the network kernel bandwidths for each event

bws_time

the time kernel bandwidths for each event

kernel_name

the name of the kernel function to use

nodes

a DataFrame representing the nodes of the graph (with spatial coordinates)

line_list

a DataFrame representing the lines of the graph

max_depth

the maximum recursion depth (after which recursion is stopped)

verbose

a boolean indicating if the function must print its progress

div

a string indicating how to standardize the kernel values

Value

a List with two matrices: the kernel values (sum_k) and the number of events for each sample (n)


Triangle kernel

Description

Function implementing the triangle kernel.

Usage

triangle_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
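For readers who want to see the shape of this kernel, here is a minimal sketch of the textbook triangle kernel. It is not the package's internal code: the name triangle_sketch is hypothetical, and the 1/bw scaling is an assumption made so that the curve integrates to 1.

```r
# Hypothetical re-implementation for illustration only
triangle_sketch <- function(d, bw) {
  u <- abs(d) / bw
  # linear decay from the event, zero beyond the bandwidth
  ifelse(u > 1, 0, (1 - u) / bw)
}

tri_vals <- triangle_sketch(c(0, 50, 100, 150), bw = 100)
# 0.010 0.005 0.000 0.000
```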

c++ triangle kernel

Description

c++ triangle kernel

Usage

triangle_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ triangle kernel for one distance

Description

c++ triangle kernel for one distance

Usage

triangle_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Tricube kernel

Description

Function implementing the tricube kernel.

Usage

tricube_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
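A minimal sketch of the standard tricube kernel, given only to show its shape. The name tricube_sketch is hypothetical, and the normalising constant 70/81 together with the 1/bw scaling reflects the usual textbook definition, which is assumed rather than confirmed to match the internal implementation.

```r
# Hypothetical re-implementation for illustration only
tricube_sketch <- function(d, bw) {
  u <- abs(d) / bw
  # (70/81) * (1 - |u|^3)^3 inside the bandwidth, zero outside
  ifelse(u > 1, 0, (70 / 81) * (1 - u^3)^3 / bw)
}

tricube_vals <- tricube_sketch(c(0, 50, 150), bw = 100)
tricube_area <- integrate(tricube_sketch, -100, 100, bw = 100)$value
```

With these assumptions the area under the curve is 1, which is what makes the values comparable across bandwidths.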

c++ tricube kernel

Description

c++ tricube kernel

Usage

tricube_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ tricube kernel for one distance

Description

c++ tricube kernel for one distance

Usage

tricube_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Helper for isochrones lines cutting

Description

Last operation for isochrone calculation: cutting the lines at their beginning and end. This is a worker function for calc_isochrones.

Usage

trim_lines_at(df1, graph_result, d, dd, i, donught)

Arguments

df1

A features collection of linestrings with some specific fields.

graph_result

A list produced by the functions build_graph_directed or build_graph.

d

The end distance of this isochrone.

dd

The start distance of this isochrone.

i

The current iteration.

donught

A boolean indicating whether the returned isochrone will be full or a doughnut (ring).

Value

A feature collection of lines


Triweight kernel

Description

Function implementing the triweight kernel.

Usage

triweight_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
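The standard triweight kernel can be sketched as follows. This is an illustration under assumptions, not the package's code: triweight_sketch is a hypothetical name, and the constant 35/32 with 1/bw scaling is the usual textbook form.

```r
# Hypothetical re-implementation for illustration only
triweight_sketch <- function(d, bw) {
  u <- d / bw
  # (35/32) * (1 - u^2)^3 inside the bandwidth, zero outside
  ifelse(abs(u) > 1, 0, (35 / 32) * (1 - u^2)^3 / bw)
}

triweight_vals <- triweight_sketch(c(0, 100, 250), bw = 100)
triweight_area <- integrate(triweight_sketch, -100, 100, bw = 100)$value
```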

c++ triweight kernel

Description

c++ triweight kernel

Usage

triweight_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ triweight kernel for one distance

Description

c++ triweight kernel for one distance

Usage

triweight_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Uniform kernel

Description

Function implementing the uniform kernel.

Usage

uniform_kernel(d, bw)

Arguments

d

The distance from the event

bw

The bandwidth used for the kernel

Value

The estimated density

Examples

#This is an internal function, no example provided
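A minimal sketch of a uniform kernel, for illustration. The name uniform_sketch is hypothetical, and both the height 1/(2 * bw) and the choice to include distances exactly equal to the bandwidth are assumptions that may differ from the internal function.

```r
# Hypothetical re-implementation for illustration only
uniform_sketch <- function(d, bw) {
  # constant height inside the bandwidth, zero outside
  ifelse(abs(d) > bw, 0, 1 / (2 * bw))
}

unif_vals <- uniform_sketch(c(0, 99, 101), bw = 100)
# 0.005 0.005 0.000
```

Every event within the bandwidth contributes the same amount, regardless of its distance.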

c++ uniform kernel

Description

c++ uniform kernel

Usage

uniform_kernel_cpp(d, bw)

Arguments

d

a vector of distances for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


c++ uniform kernel for one distance

Description

c++ uniform kernel for one distance

Usage

uniform_kernelos(d, bw)

Arguments

d

a double, the distance for which the density must be calculated

bw

a double representing the size of the kernel bandwidth


Worker function for adaptive bandwidth for TNKDE

Description

The worker function to calculate Adaptive bandwidths according to Abramson’s smoothing regimen for TNKDE with a space-time interaction (INTERNAL).

Usage

worker_adaptive_bw_tnkde(
  lines,
  quad_events,
  events_loc,
  events,
  w,
  kernel_name,
  bw_net,
  bw_time,
  method,
  div,
  digits,
  tol,
  sparse,
  max_depth,
  verbose = FALSE
)

Arguments

lines

A feature collection of linestrings representing the underlying network

quad_events

a feature collection of points indicating for which events the densities must be calculated

events_loc

A feature collection of points representing the location of the events

events

A feature collection of points representing the events. Multiple events can share the same location. They are linked by the goid column.

w

A numeric vector with the weight of the events

kernel_name

The name of the kernel to use (string)

bw_net

The fixed kernel bandwidth for the network dimension. Can also be a vector if several bandwidths must be used.

bw_time

The fixed kernel bandwidth for the time dimension. Can also be a vector if several bandwidths must be used.

method

The type of NKDE to use (string)

div

The divisor to use for the kernel. Must be "n" (the number of events within the radius around each sampling point), "bw" (the bandwidth), or "none" (the simple sum).

digits

The number of digits to retain from the spatial coordinates. It ensures that topology is good when building the network. Default is 3. Too high a precision (high number of digits) might break some connections.

tol

A float indicating the minimum distance between the events and the lines' extremities when adding the point to the network. When points are closer, they are added at the extremity of the lines.

sparse

A Boolean indicating if sparse or regular matrices should be used by the Rcpp functions. These matrices are used to store edge indices between two nodes in a graph. Regular matrices are faster, but require more memory, in particular with multiprocessing. Sparse matrices are slower (a bit), but require much less memory.

max_depth

An integer, the maximum depth to reach for continuous and discontinuous NKDE

verbose

A Boolean, indicating if the function should print messages about the process.

Value

A vector with the local bandwidths or an array if bw_net and bw_time are vectors

Examples

#This is an internal function, no example provided
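Abramson's smoothing regimen, as it is usually stated, shrinks the bandwidth where a pilot density is high and widens it where the pilot density is low. The sketch below illustrates that rule for a single fixed bandwidth. It assumes the pilot densities at the events are already available and strictly positive, and it normalises by the geometric mean of the inverse square roots. The function name and arguments are hypothetical, and whether the worker applies exactly this normalisation is an assumption.

```r
# Hypothetical illustration of Abramson's rule, not the package's worker
abramson_bw_sketch <- function(pilot_density, bw) {
  inv_sqrt <- 1 / sqrt(pilot_density)
  # geometric mean of the inverse square roots, used as a scaling factor
  gamma <- exp(mean(log(inv_sqrt)))
  bw * inv_sqrt / gamma
}

local_bws <- abramson_bw_sketch(c(0.01, 0.04, 0.16), bw = 100)
# approximately 200, 100, 50
```

Dividing by the geometric mean keeps the local bandwidths centred on the fixed bandwidth: the event with the median pilot density in this toy example keeps a bandwidth of 100.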