Title: Cluster Origin-Destination Flow Data
Version: 0.1.0
Description: Provides functionality for clustering origin-destination (OD) pairs, representing desire lines (or flows). This includes creating distance matrices between OD pairs and passing distance matrices to a clustering algorithm. See the academic paper Tao and Thill (2016) <doi:10.1111/gean.12100> for more details on spatial clustering of flows.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
URL: https://hussein-mahfouz.github.io/flowcluster/
Depends: R (≥ 4.1.0)
Imports: sf, dbscan, dplyr, glue, lwgeom, tibble, units, tidyr, tidyselect
LazyData: true
Suggests: testthat (≥ 3.0.0)
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-07-01 13:35:02 UTC; hussein
Author: Hussein Mahfouz [aut, cre], Robin Lovelace [aut]
Maintainer: Hussein Mahfouz <husseinmahfouz93@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-05 18:50:11 UTC

Assign Unique IDs to Flows (internal)

Description

Internal helper for assigning unique IDs to flows based on spatial columns. Used by add_xyuv().

Usage

add_flow_ids(x)

Arguments

x

tibble with origin, destination, x, y, u, v columns

Value

tibble with flow_ID column


Add Length Column to Flow Data

Description

Adds a length_m column holding the length of each flow. Also checks that the 'origin' and 'destination' columns are present.

Usage

add_flow_length(x)

Arguments

x

sf object of flows (LINESTRING, projected CRS)

Value

sf object with an additional length_m column (OD length in meters)

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- add_flow_length(flows)
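
The idea behind the length column can be sketched with sf directly. This is a minimal illustration on a toy line, not the package's implementation; the object name toy_flows is hypothetical, and it assumes a projected CRS so that lengths come out in meters.

```r
library(sf)

# Toy desire line in a projected CRS (EPSG:3857), so coordinates are in meters.
line <- st_sfc(st_linestring(rbind(c(0, 0), c(3000, 4000))), crs = 3857)
toy_flows <- st_sf(origin = "A", destination = "B", geometry = line)

# Stop early if the expected ID columns are missing.
stopifnot(all(c("origin", "destination") %in% names(toy_flows)))

# st_length() returns a units vector; drop the units to store plain meters.
toy_flows$length_m <- as.numeric(st_length(toy_flows))
```

Storing a plain numeric column keeps later filtering on length_min and length_max simple.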

Add Start/End Coordinates & Flow IDs

Description

Add Start/End Coordinates & Flow IDs

Usage

add_xyuv(x)

Arguments

x

sf object of flows

Value

tibble with x, y, u, v, flow_ID columns

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- add_flow_length(flows)
flows <- add_xyuv(flows)
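
As a rough sketch of what start and end coordinates look like, the following pulls the first and last vertex of each line with sf::st_coordinates(). It assumes that x, y hold the start point and u, v the end point; the object names are hypothetical and the package may extract the points differently.

```r
library(sf)

lines <- st_sfc(
  st_linestring(rbind(c(0, 0), c(3000, 4000))),
  st_linestring(rbind(c(100, 200), c(500, 900))),
  crs = 3857
)

# st_coordinates() returns one row per vertex, with L1 indexing the line.
coords <- st_coordinates(lines)
first <- !duplicated(coords[, "L1"])                   # first vertex of each line
last  <- !duplicated(coords[, "L1"], fromLast = TRUE)  # last vertex of each line

xyuv <- data.frame(
  x = unname(coords[first, "X"]), y = unname(coords[first, "Y"]),  # start point
  u = unname(coords[last, "X"]),  v = unname(coords[last, "Y"])    # end point
)
```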

Cluster Flows using DBSCAN

Description

Clusters flows with DBSCAN, using a precomputed distance matrix and a weight vector. See dbscan::dbscan() for details on the DBSCAN algorithm.

Usage

cluster_flows_dbscan(dist_mat, w_vec, x, eps, minPts)

Arguments

dist_mat

distance matrix

w_vec

weight vector

x

flows tibble with flow_ID

eps

DBSCAN epsilon parameter

minPts

DBSCAN minPts parameter

Value

flows tibble with an additional cluster column

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- head(flows, 100) # for testing
# Add flow lengths and coordinates
flows <- add_flow_length(flows)
# filter by length
flows <- filter_by_length(flows, length_min = 5000, length_max = 12000)
flows <- add_xyuv(flows)
# Calculate distances
distances <- flow_distance(flows, alpha = 1.5, beta = 0.5)
dmat <- distance_matrix(distances)
wvec <- weight_vector(dmat, flows, weight_col = "count")
clustered <- cluster_flows_dbscan(dmat, wvec, flows, eps = 8, minPts = 70)
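
A minPts of 70 on at most 100 flows may look large. The base R sketch below shows why it can still make sense when weights are used. It assumes the weighted behaviour of the dbscan package, where the summed weights inside an eps-neighbourhood (rather than the number of points) are compared with minPts. All values are hypothetical, and only the core-point test is shown, not the cluster expansion.

```r
# Toy symmetric distance matrix for four flows (hypothetical values).
d <- matrix(
  c(0, 1, 2, 9,
    1, 0, 1, 9,
    2, 1, 0, 9,
    9, 9, 9, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(1:4, 1:4)
)
w <- c(30, 25, 20, 40)  # weight of each flow, e.g. its count
eps <- 1.5
min_pts <- 70

# Weighted neighbourhood mass: sum of the weights of all flows within eps
# (each flow counts itself because its self-distance is 0).
mass <- as.vector(((d <= eps) * 1) %*% w)
is_core <- mass >= min_pts
```

Here only the second flow has enough weighted mass (30 + 25 + 20 = 75) to act as a core point, even though it has just three neighbours including itself.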

Sensitivity Analysis of DBSCAN Parameters for Flow Clustering

Description

Tests different combinations of the epsilon and minPts parameters for clustering flows using DBSCAN. It can be used to determine which parameter values make sense for your data.

Usage

dbscan_sensitivity(
  dist_mat,
  flows,
  options_epsilon,
  options_minpts,
  w_vec = NULL
)

Arguments

dist_mat

a precalculated distance matrix between desire lines (output of distance_matrix())

flows

the original flows tibble (must contain flow_ID and 'count' column)

options_epsilon

a vector of options for the epsilon parameter

options_minpts

a vector of options for the minPts parameter

w_vec

Optional precomputed weight vector (otherwise computed internally from 'count' column)

Value

a tibble with columns: id (to identify eps and minpts), cluster, size (number of desire lines in cluster), count_sum (total count per cluster)

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- head(flows, 1000) # for testing
# Add flow lengths and coordinates
flows <- add_flow_length(flows)
# filter by length
flows <- filter_by_length(flows, length_min = 5000, length_max = 12000)
# Add x, y, u, v coordinates to flows
flows <- add_xyuv(flows)
# Calculate distance matrix
distances <- flow_distance(flows, alpha = 1.5, beta = 0.5)
dmat <- distance_matrix(distances)
# Generate weight vector
w_vec <- weight_vector(dmat, flows, weight_col = "count")

# Define the parameters for sensitivity analysis
options_epsilon <- seq(1, 10, by = 2)
options_minpts <- seq(10, 100, by = 10)
# Run the sensitivity analysis
results <- dbscan_sensitivity(
  dist_mat = dmat,
  flows = flows,
  options_epsilon = options_epsilon,
  options_minpts = options_minpts,
  w_vec = w_vec
)
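
To get a feel for how many clusterings the example above runs, the parameter grid can be enumerated in base R. This is a sketch only: the grid object and the id label format are hypothetical, and the package's own id format may differ.

```r
options_epsilon <- seq(1, 10, by = 2)     # 1, 3, 5, 7, 9
options_minpts  <- seq(10, 100, by = 10)  # 10, 20, ..., 100

# One row per (eps, minPts) combination to try.
grid <- expand.grid(eps = options_epsilon, minpts = options_minpts)

# Hypothetical label per combination; the package's own id format may differ.
grid$id <- paste0("eps_", grid$eps, "_minpts_", grid$minpts)

nrow(grid)  # 50 combinations
```

Each combination means one DBSCAN run, so coarse grids are a sensible starting point on larger data.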

Convert Long-Format Distance Tibble to Matrix

Description

Convert Long-Format Distance Tibble to Matrix

Usage

distance_matrix(distances, distance_col = "fds")

Arguments

distances

tibble with columns flow_ID_a, flow_ID_b, and the distance column named in distance_col

distance_col

column name for distance (default "fds")

Value

distance matrix (tibble with rownames). The matrix has flow_ID_a as rownames and flow_ID_b as column names. This function converts the output of flow_distance() into a format suitable for the dbscan clustering algorithm.

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- head(flows, 100) # for testing
# Add flow lengths and coordinates
flows <- add_flow_length(flows)
flows <- add_xyuv(flows)
# Calculate distances
distances <- flow_distance(flows, alpha = 1.5, beta = 0.5)
dmat <- distance_matrix(distances)
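
The long-to-wide conversion can be illustrated in base R. This is a sketch on hypothetical values, not the package's implementation; long, ids and dmat_toy are illustrative names.

```r
# Toy long-format distances between three flows (hypothetical values).
long <- data.frame(
  flow_ID_a = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  flow_ID_b = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  fds       = c(0, 0.4, 1.2, 0.4, 0, 0.9, 1.2, 0.9, 0)
)

ids <- sort(unique(c(long$flow_ID_a, long$flow_ID_b)))
dmat_toy <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
                   dimnames = list(ids, ids))

# Fill cell [a, b] with the distance between flow a and flow b.
dmat_toy[cbind(match(long$flow_ID_a, ids), match(long$flow_ID_b, ids))] <- long$fds
```

Row and column names carry the flow IDs, which is what lets a weight vector be aligned with the matrix later.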

Filter Flows by Length

Description

Filter Flows by Length

Usage

filter_by_length(x, length_min = 0, length_max = Inf)

Arguments

x

sf object with length_m

length_min

minimum length (default 0)

length_max

maximum length (default Inf)

Value

filtered sf object. Flows with length_m outside the specified range are removed.

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- add_flow_length(flows)
flows <- filter_by_length(flows, length_min = 5000, length_max = 12000)

Calculate Flow Distance and Dissimilarity

Description

This function calculates flow distance and dissimilarity measures between all pairs of flows based on the method described in Tao and Thill (2016).

Usage

flow_distance(x, alpha = 1, beta = 1)

Arguments

x

tibble with flow_ID, x, y, u, v, length_m

alpha

numeric, origin weight

beta

numeric, destination weight

Value

tibble of all pairs of flows with fd (flow distance) and fds (flow dissimilarity) columns

References

Tao, R., Thill, J.-C., 2016. Spatial cluster detection in spatial flow data. Geographical Analysis 48, 355–372. https://doi.org/10.1111/gean.12100

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- head(flows, 100) # for testing
# Add flow lengths and coordinates
flows <- add_flow_length(flows)
flows <- add_xyuv(flows)
# Calculate distances
distances <- flow_distance(flows, alpha = 1.5, beta = 0.5)
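
For a single pair of flows, the two measures can be sketched as below. This assumes the formulation in Tao and Thill (2016), where alpha weights the squared distance between origins, beta weights the squared distance between destinations, and the dissimilarity additionally scales by the product of the two flow lengths. The function name flow_pair_distance is hypothetical and the package's implementation may differ in detail.

```r
# Sketch of the pairwise measures, assuming the Tao and Thill (2016) form:
#   fd  = sqrt(alpha * dO^2 + beta * dD^2)
#   fds = sqrt((alpha * dO^2 + beta * dD^2) / (L_i * L_j))
flow_pair_distance <- function(a, b, alpha = 1, beta = 1) {
  d_o2 <- (a$x - b$x)^2 + (a$y - b$y)^2  # squared distance between origins
  d_d2 <- (a$u - b$u)^2 + (a$v - b$v)^2  # squared distance between destinations
  num <- alpha * d_o2 + beta * d_d2
  list(fd = sqrt(num), fds = sqrt(num / (a$length_m * b$length_m)))
}

# Two toy flows that share a destination; origins are 500 m apart.
flow_a <- list(x = 0,   y = 0,   u = 3000, v = 4000, length_m = 5000)
flow_b <- list(x = 300, y = 400, u = 3000, v = 4000, length_m = 4500)
res <- flow_pair_distance(flow_a, flow_b)
```

Because fds divides by the flow lengths, the same 500 m offset counts for less between long flows than between short ones.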

Example Flow Data for Leeds

Description

Example flow data for Leeds, taken from the 2021 census. It contains all origin-destination flows at the MSOA level. For more information on census flow data, see the ONS documentation. See data-raw/flows_leeds.R for how this data was created.

Usage

flows_leeds

Format

An object of class sf with LINESTRING geometry. It has the following columns:

origin

MSOA code of origin zone

destination

MSOA code of destination zone

count

number of people moving from origin to destination

geometry

desire line between origin and destination

Source

https://www.nomisweb.co.uk/sources/census_2021_od


Generate Weight Vector from Flows

Description

Generate Weight Vector from Flows

Usage

weight_vector(dist_mat, x, weight_col = "count")

Arguments

dist_mat

distance matrix

x

flows tibble with flow_ID and weight_col

weight_col

column to use as weights (default = "count")

Value

numeric weight vector. Each element corresponds to a flow in the distance matrix, and is used as a weight in the DBSCAN clustering algorithm.

Examples

flows <- sf::st_transform(flows_leeds, 3857)
flows <- head(flows, 100) # for testing
# Add flow lengths and coordinates
flows <- add_flow_length(flows)
flows <- add_xyuv(flows)
# Calculate distances
distances <- flow_distance(flows, alpha = 1.5, beta = 0.5)
dmat <- distance_matrix(distances)
wvec <- weight_vector(dmat, flows, weight_col = "count")
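
The key requirement is that element i of the weight vector belongs to row i of the distance matrix. A base R sketch of that alignment follows, using hypothetical toy objects. It assumes the matrix row names hold the flow IDs, and the package may align the weights differently.

```r
# Toy inputs (hypothetical): a distance matrix named by flow_ID, and a flows
# table stored in a different row order.
dmat_toy <- matrix(0, 3, 3, dimnames = list(c("1", "2", "3"), c("1", "2", "3")))
flows_toy <- data.frame(flow_ID = c(3, 1, 2), count = c(40, 10, 25))

# Look up each matrix row's flow and take its weight, so w[i] belongs to row i.
w <- flows_toy$count[match(rownames(dmat_toy), flows_toy$flow_ID)]
```

Matching on IDs rather than relying on row order guards against a flows table that has been filtered or sorted since the distances were computed.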