Title: | Get Spanish Origin-Destination Data |
Version: | 0.2.0 |
Description: | Gain seamless access to origin-destination (OD) data from the Spanish Ministry of Transport, hosted at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/opendata-movilidad. This package simplifies the management of these large datasets by providing tools to download zone boundaries, handle associated origin-destination data, and process it efficiently with the 'duckdb' database interface. Local caching minimizes repeated downloads, streamlining workflows for researchers and analysts. Extensive documentation is available at https://ropenspain.github.io/spanishoddata/index.html, offering guides on creating static and dynamic mobility flow visualizations and transforming large datasets into analysis-ready formats. |
License: | MIT + file LICENSE |
URL: | https://rOpenSpain.github.io/spanishoddata/, https://github.com/rOpenSpain/spanishoddata |
BugReports: | https://github.com/rOpenSpain/spanishoddata/issues |
Depends: | R (≥ 4.1.0) |
Imports: | checkmate, DBI, digest, dplyr, duckdb (≥ 0.5.0), fs, glue, here, httr2, jsonlite, lifecycle, lubridate, memoise, openssl, parallelly, paws.storage (≥ 0.4.0), purrr, readr, rlang, sf, stats, stringr, tibble, xml2 |
Suggests: | flowmapblue, flowmapper (≥ 0.1.2), furrr, future, future.mirai, hexSticker, mapSpain, quarto, remotes, scales, testthat (≥ 3.0.0), tidyverse |
VignetteBuilder: | quarto |
Config/Needs/website: | rmarkdown |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-06-15 20:00:52 UTC; ek |
Author: | Egor Kotov |
Maintainer: | Egor Kotov <kotov.egor@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-06-15 23:20:02 UTC |
spanishoddata: Get Spanish Origin-Destination Data
Description
Gain seamless access to origin-destination (OD) data from the Spanish Ministry of Transport, hosted at https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/opendata-movilidad. This package simplifies the management of these large datasets by providing tools to download zone boundaries, handle associated origin-destination data, and process it efficiently with the 'duckdb' database interface. Local caching minimizes repeated downloads, streamlining workflows for researchers and analysts. Extensive documentation is available at https://ropenspain.github.io/spanishoddata/index.html, offering guides on creating static and dynamic mobility flow visualizations and transforming large datasets into analysis-ready formats.
Author(s)
Maintainer: Egor Kotov kotov.egor@gmail.com (ORCID)
Authors:
Robin Lovelace rob00x@gmail.com (ORCID)
Other contributors:
Eugeni Vidal-Tortosa (ORCID) [contributor]
See Also
Useful links:
Report bugs at https://github.com/rOpenSpain/spanishoddata/issues
Global Quiet Parameter
Description
Documentation for the quiet parameter, used globally.
Usage
global_quiet_param(quiet = FALSE)
Arguments
quiet |
A |
Value
Nothing. This function is just a placeholder for the global quiet parameter.
Checks if a package is installed and informs the user if not
Description
This is a wrapper around rlang::check_installed(); instead of erroring out when the check fails, it returns FALSE.
However, unlike rlang::is_installed(), it emits a message to the user.
Usage
spod_assert_package(...)
Arguments
... |
Arguments passed on to
|
Get available data list
Description
Get a table with links to available data files for the specified data version. Optionally check (see arguments) the file size and availability of data files previously downloaded into the cache directory specified with the SPANISH_OD_DATA_DIR environment variable (set by spod_set_data_dir()) or a custom path specified with the data_dir argument. By default, the data is fetched from the Amazon S3 bucket where the data is stored. If that fails, the function falls back to downloading an XML file from the Spanish Ministry of Transport website. You can also control this behaviour with the use_s3 argument.
Usage
spod_available_data(
ver = 2,
check_local_files = FALSE,
quiet = FALSE,
data_dir = spod_get_data_dir(),
use_s3 = TRUE,
force = FALSE
)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
check_local_files |
Logical. Whether to check if the local files exist and get the file size. Defaults to |
quiet |
A |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
use_s3 |
|
force |
Logical. If |
Value
A tibble with links, release dates of files in the data, dates of data coverage, local paths to files, and the download status.
- target_url: character. The URL link to the data file.
- pub_ts: POSIXct. The timestamp of when the file was published.
- file_extension: character. The file extension of the data file (e.g., 'tar', 'gz').
- data_ym: Date. The year and month of the data coverage, if available.
- data_ymd: Date. The specific date of the data coverage, if available.
- study: factor. Study category derived from the URL (e.g., 'basic', 'complete', 'routes').
- type: factor. Data type category derived from the URL (e.g., 'number_of_trips', 'origin-destination', 'overnight_stays', 'data_quality', 'metadata').
- period: factor. Temporal granularity category derived from the URL (e.g., 'day', 'month').
- zones: factor. Geographic zone classification derived from the URL (e.g., 'districts', 'municipalities', 'large_urban_areas').
- local_path: character. The local file path where the data is (or will be) stored.
- downloaded: logical. Indicator of whether the data file has been downloaded locally. This is only available if check_local_files is TRUE.
Examples
# Set data dir for file downloads
spod_set_data_dir(tempdir())
# Get available data list for v1 (2020-2021) data
spod_available_data(ver = 1)
# Get available data list for v2 (2022 onwards) data
spod_available_data(ver = 2)
# Get available data list for v2 (2022 onwards) data
# while also checking for local files that are already downloaded
spod_available_data(ver = 2, check_local_files = TRUE)
Get available data list from Amazon S3 storage
Description
Get a table with links to available data files for the specified data version from Amazon S3 storage.
Usage
spod_available_data_s3(
ver = c(1, 2),
force = FALSE,
quiet = FALSE,
data_dir = spod_get_data_dir()
)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
force |
Logical. If |
quiet |
A |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A tibble with links, release dates of files in the data, dates of data coverage, local paths to files, and the download status.
Get the available v1 data list
Description
This function provides a table of the available MITMA v1 (2020-2021) data, both remote and local.
Usage
spod_available_data_v1(
data_dir = spod_get_data_dir(),
check_local_files = FALSE,
use_s3 = TRUE,
force = FALSE,
quiet = FALSE
)
Arguments
Value
A tibble with links, release dates of files in the data, dates of data coverage, local paths to files, and the download status.
- target_url: character. The URL link to the data file.
- pub_ts: POSIXct. The timestamp of when the file was published.
- file_extension: character. The file extension of the data file (e.g., 'tar', 'gz').
- data_ym: Date. The year and month of the data coverage, if available.
- data_ymd: Date. The specific date of the data coverage, if available.
- study: factor. Study category derived from the URL (e.g., 'basic', 'complete', 'routes').
- type: factor. Data type category derived from the URL (e.g., 'number_of_trips', 'origin-destination', 'overnight_stays', 'data_quality', 'metadata').
- period: factor. Temporal granularity category derived from the URL (e.g., 'day', 'month').
- zones: factor. Geographic zone classification derived from the URL (e.g., 'districts', 'municipalities', 'large_urban_areas').
- local_path: character. The local file path where the data is (or will be) stored.
- downloaded: logical. Indicator of whether the data file has been downloaded locally. This is only available if check_local_files is TRUE.
Get the data dictionary
Description
This function retrieves the data dictionary for the specified data directory.
Usage
spod_available_data_v2(
data_dir = spod_get_data_dir(),
check_local_files = FALSE,
use_s3 = TRUE,
force = FALSE,
quiet = FALSE
)
Arguments
Value
A tibble with links, release dates of files in the data, dates of data coverage, local paths to files, and the download status.
- target_url: character. The URL link to the data file.
- pub_ts: POSIXct. The timestamp of when the file was published.
- file_extension: character. The file extension of the data file (e.g., 'tar', 'gz').
- data_ym: Date. The year and month of the data coverage, if available.
- data_ymd: Date. The specific date of the data coverage, if available.
- study: factor. Study category derived from the URL (e.g., 'basic', 'complete', 'routes').
- type: factor. Data type category derived from the URL (e.g., 'number_of_trips', 'origin-destination', 'overnight_stays', 'data_quality', 'metadata').
- period: factor. Temporal granularity category derived from the URL (e.g., 'day', 'month').
- zones: factor. Geographic zone classification derived from the URL (e.g., 'districts', 'municipalities', 'large_urban_areas').
- local_path: character. The local file path where the data is (or will be) stored.
- downloaded: logical. Indicator of whether the data file has been downloaded locally. This is only available if check_local_files is TRUE.
Check cached files consistency against checksums from S3
Description
WARNING: The checks may fail for May 2022 data and for some 2025 data, as the remote checksums used for checking file consistency are incorrect. We are working on solving this in future updates; for now, kindly rely on the built-in file size checks of spod_download, spod_get, and spod_convert. This function checks whether downloaded data files are consistent with their checksums in Amazon S3 by computing an ETag for each file. This involves computing an MD5 checksum for each part of the file, concatenating the checksums, and computing an MD5 checksum again on the result. This may take a very long time if you check all files, so use with caution.
Usage
spod_check_files(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
dates = NULL,
data_dir = spod_get_data_dir(),
quiet = FALSE,
ignore_missing_dates = FALSE,
n_threads = 1
)
Arguments
type |
The type of data to download. Can be |
zones |
The zones for which to download the data. Can be |
dates |
A The possible values can be any of the following:
|
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
ignore_missing_dates |
Logical. If |
n_threads |
Numeric. Number of threads to use for file verification. Defaults to 1. When set to 2 or more threads, uses |
Value
A tibble similar to the output of spod_available_data, but with an extra column local_file_consistent, where TRUE indicates that the file checksum matches the expected checksum in Amazon S3. Note: some v1 (2020-2021) files were not stored correctly on S3 and their ETag checksums are incorrectly reported by Amazon S3, so their true file sizes and ETag checksums were cached inside the spanishoddata package.
Examples
spod_set_data_dir(tempdir())
spod_download(
type = "number_of_trips",
zones = "distr",
dates = "2020-03-14"
)
# now check the consistency
check_results <- spod_check_files(
type = "number_of_trips",
zones = "distr",
dates = "2020-03-14"
)
all(check_results$local_file_consistent)
Cite the package and the data
Description
Cite the package and the data
Usage
spod_cite(what = "all", format = "all")
Arguments
what |
Character vector specifying what to cite. Can include "package", "data", "methodology_v1", "methodology_v2", or "all". Default is "all". |
format |
Character vector specifying output format(s). Can include "text", "markdown", "bibtex", or "all". Default is "all". |
Value
Nothing. Prints citation in plain text, markdown, BibTeX, or all formats at once to console.
Examples
# Cite everything in all formats
## Not run:
spod_cite()
## End(Not run)
# Cite just the package in BibTeX format
## Not run:
spod_cite(what = "package", format = "bibtex")
## End(Not run)
# Cite both methodologies in plain text
## Not run:
spod_cite(what = c("methodology_v1", "methodology_v2"), format = "text")
## End(Not run)
Fixes common issues in the zones data and cleans up variable names
Description
This function fixes any invalid geometries in the zones data and renames the "ID" column to "id".
Usage
spod_clean_zones_v1(zones_path, zones)
Arguments
zones_path |
The path to the zones spatial data file. |
zones |
The zones for which to download the data. Can be |
Value
A spatial object containing the cleaned zones data.
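The two cleaning steps described above (renaming "ID" to "id" and repairing invalid geometries) can be sketched as follows. This is an illustrative reimplementation, not the package's internal code, and it assumes the sf package is available for the geometry repair:

```r
# Rename the "ID" column to "id" and repair invalid geometries.
clean_zones_sketch <- function(zones) {
  names(zones)[names(zones) == "ID"] <- "id"
  if (inherits(zones, "sf")) {
    # fix self-intersections and other invalid geometries
    zones <- sf::st_make_valid(zones)
  }
  zones
}
```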
Fixes common issues in the zones data and cleans up variable names
Description
This function fixes any invalid geometries in the zones data and renames the "ID" column to "id". It also attaches the population counts and zone names provided in the CSV files supplied by the original data provider.
Usage
spod_clean_zones_v2(zones_path)
Arguments
zones_path |
The path to the zones spatial data file. |
Value
A spatial sf object containing the cleaned zones data.
View codebooks for v1 and v2 open mobility data
Description
Opens the relevant vignette with a codebook for v1 (2020-2021) or v2 (2022 onwards) data, or provides a link to a webpage if the vignette is missing.
Usage
spod_codebook(ver = 1)
Arguments
ver |
An |
Value
Nothing; opens the vignette if it is installed. If the vignette is missing, prints a message with a link to a webpage with the codebook.
Examples
# View codebook for v1 (2020-2021) data
spod_codebook(ver = 1)
# View codebook for v2 (2022 onwards) data
spod_codebook(ver = 2)
Compute ETag for a file
Description
Compute ETag for a file
Usage
spod_compute_s3_etag(file_path, part_size = 8 * 1024^2)
Arguments
file_path |
Character. The path to the file. |
part_size |
Numeric. The size of each part in bytes. Do not change, as this is a default for S3 Etag. |
Value
Character. The ETag for the file.
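The multi-part ETag computation described above can be sketched in base R. This is an illustrative reimplementation (not the package's internal code; real S3 ETags also depend on how the file was uploaded): MD5 each 8 MiB part, then MD5 the concatenated binary digests and append the part count.

```r
# Sketch of an S3-style ETag using only base R (tools::md5sum).
s3_etag_sketch <- function(file_path, part_size = 8 * 1024^2) {
  md5_file <- function(p) unname(tools::md5sum(p))
  hex_to_raw <- function(hex) {
    as.raw(strtoi(substring(hex, seq(1, nchar(hex) - 1, 2), seq(2, nchar(hex), 2)), 16L))
  }
  size <- file.size(file_path)
  if (size <= part_size) {
    return(md5_file(file_path))  # single part: ETag is the plain MD5 hex
  }
  con <- file(file_path, "rb")
  on.exit(close(con))
  part_digests <- raw(0)
  repeat {
    part <- readBin(con, what = "raw", n = part_size)
    if (length(part) == 0) break
    # MD5 each part; tools::md5sum works on files, so use a temp file
    tmp <- tempfile()
    writeBin(part, tmp)
    part_digests <- c(part_digests, hex_to_raw(md5_file(tmp)))
    unlink(tmp)
  }
  # MD5 of the concatenated part digests, plus "-<number of parts>"
  tmp <- tempfile()
  writeBin(part_digests, tmp)
  etag <- paste0(md5_file(tmp), "-", ceiling(size / part_size))
  unlink(tmp)
  etag
}
```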
Connect to data converted to DuckDB or hive-style parquet files
Description
This function allows the user to quickly connect to the data converted to DuckDB with the spod_convert function, and simplifies the connection process. The user is free to use the DBI and duckdb packages to connect to the data manually, or to use the arrow package to connect to the folder of parquet files.
Usage
spod_connect(
data_path,
target_table_name = NULL,
quiet = FALSE,
max_mem_gb = NULL,
max_n_cpu = max(1, parallelly::availableCores() - 1),
temp_path = spod_get_temp_dir()
)
Arguments
data_path |
a path to the |
target_table_name |
Default is |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
Value
A DuckDB table connection object.
Examples
# Set data dir for file downloads
spod_set_data_dir(tempdir())
# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-18")
db_2 <- spod_convert(
type = "number_of_trips",
zones = "distr",
dates = dates_1,
overwrite = TRUE
)
# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)
# disconnect from the database
spod_disconnect(my_od_data_2)
Convert data from plain text to duckdb or parquet format
Description
Converts data for faster analysis into either a DuckDB file or parquet files in a hive-style directory structure. Running analysis on these files is sometimes 100 times faster than working with raw CSV files, especially when these are in gzip archives. To connect to converted data, please use mydata <- spod_connect(data_path = path_returned_by_spod_convert), passing the path to where the data was saved. The connected mydata can be analysed using dplyr functions such as select, filter, mutate, group_by, summarise, etc. At the end of any sequence of commands you will need to add collect to execute the whole chain of data manipulations and load the results into memory in an R data.frame/tibble. For more in-depth usage of such data, please refer to the DuckDB documentation and examples at https://duckdb.org/docs/api/r#dbplyr . Some more useful examples can be found at https://arrow-user2022.netlify.app/data-wrangling#combining-arrow-with-duckdb . You may also use the arrow package to work with parquet files: https://arrow.apache.org/docs/r/.
Usage
spod_convert(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios"),
dates = NULL,
save_format = "duckdb",
save_path = NULL,
overwrite = FALSE,
data_dir = spod_get_data_dir(),
quiet = FALSE,
max_mem_gb = NULL,
max_n_cpu = max(1, parallelly::availableCores() - 1),
max_download_size_gb = 1,
ignore_missing_dates = FALSE
)
Arguments
type |
The type of data to download. Can be |
zones |
The zones for which to download the data. Can be |
dates |
A The possible values can be any of the following:
|
save_format |
A You can also set |
save_path |
A
|
overwrite |
A |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
ignore_missing_dates |
Logical. If |
Value
Path to the saved DuckDB database file or to a folder with parquet files in a hive-style directory structure.
Examples
# Set data dir for file downloads
spod_set_data_dir(tempdir())
# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-18")
db_2 <- spod_convert(
type = "number_of_trips",
zones = "distr",
dates = dates_1,
overwrite = TRUE
)
# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)
# disconnect from the database
spod_disconnect(my_od_data_2)
Convert dates to ranges
Description
This internal helper function reduces a vector of dates to a vector of date ranges to shorten the warning and error messages that mention the valid date ranges.
Usage
spod_convert_dates_to_ranges(dates)
Arguments
dates |
A |
Value
A character vector of date ranges.
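The behaviour described above can be sketched with a small base R helper (a hypothetical reimplementation, not the package's internal function): consecutive days are grouped into ranges, and isolated dates are kept as-is.

```r
# Collapse a vector of dates into ranges of consecutive days,
# e.g. for compact warning/error messages.
dates_to_ranges_sketch <- function(dates) {
  dates <- sort(unique(as.Date(dates)))
  # a new group starts wherever the gap to the previous date exceeds one day
  group <- cumsum(c(1, diff(dates) > 1))
  vapply(
    split(dates, group),
    function(d) {
      if (length(d) == 1) format(d) else paste(format(min(d)), format(max(d)), sep = " to ")
    },
    character(1),
    USE.NAMES = FALSE
  )
}
```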
Convert multiple formats of date arguments to a sequence of dates
Description
This function processes the date arguments provided to various functions in the package. It can handle single dates and arbitrary sequences (vectors) of dates in ISO (YYYY-MM-DD) and YYYYMMDD formats. It can also handle date ranges in the format 'YYYY-MM-DD_YYYY-MM-DD' (or 'YYYYMMDD_YYYYMMDD'), date ranges in named vectors, and regular expressions to match dates in the format YYYYMMDD.
Usage
spod_dates_argument_to_dates_seq(dates)
Arguments
dates |
A The possible values can be any of the following:
|
Value
A character vector of dates in ISO format (YYYY-MM-DD).
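A minimal sketch of normalising the single-date and range formats listed above (illustrative only; the package's internal function also handles named vectors and regex patterns, and validates against actually available dates):

```r
# Turn single dates ('2020-03-01' or '20200301') and ranges
# ('2020-03-01_2020-03-03' or '20200301_20200303') into ISO date strings.
parse_dates_sketch <- function(dates) {
  fmts <- c("%Y-%m-%d", "%Y%m%d")
  parse_one <- function(x) {
    if (grepl("_", x, fixed = TRUE)) {
      # a range: expand to a daily sequence between the two ends
      ends <- as.Date(strsplit(x, "_", fixed = TRUE)[[1]], tryFormats = fmts)
      seq(ends[1], ends[2], by = "day")
    } else {
      as.Date(x, tryFormats = fmts)
    }
  }
  format(sort(unique(do.call(c, lapply(dates, parse_one)))))
}
```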
Safely disconnect from data and free memory
Description
This function ensures that DuckDB connections to CSV.gz files (created via spod_get()), as well as to DuckDB files or folders of parquet files (created via spod_convert()), are closed properly to prevent conflicting connections. Essentially this is just a wrapper around DBI::dbDisconnect() that reaches into the .$src$con object of the tbl_duckdb_connection connection object that is returned to the user via spod_get() and spod_connect(). After disconnecting from the database, it also frees up memory by running gc().
Usage
spod_disconnect(tbl_con, free_mem = TRUE)
Arguments
tbl_con |
A |
free_mem |
A |
Value
No return value, called for side effect of disconnecting from the database and freeing up memory.
Examples
# Set data dir for file downloads
spod_set_data_dir(tempdir())
# basic example
# create a connection to the v1 data without converting
# this creates a duckdb database connection to CSV files
od_distr <- spod_get(
"od",
zones = "distr",
dates = c("2020-03-01", "2020-03-02")
)
# disconnect from the database connection
spod_disconnect(od_distr)
# Advanced example
# download and convert data
dates_1 <- c(start = "2020-02-17", end = "2020-02-19")
db_2 <- spod_convert(
type = "od",
zones = "distr",
dates = dates_1,
overwrite = TRUE
)
# now connect to the converted data
my_od_data_2 <- spod_connect(db_2)
# disconnect from the database
spod_disconnect(my_od_data_2)
Download the data files of specified type, zones, and dates
Description
This function downloads the data files of the specified type, zones, dates and data version.
Usage
spod_download(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
dates = NULL,
max_download_size_gb = 1,
data_dir = spod_get_data_dir(),
quiet = FALSE,
return_local_file_paths = FALSE,
ignore_missing_dates = FALSE,
check_local_files = TRUE
)
Arguments
type |
The type of data to download. Can be |
zones |
The zones for which to download the data. Can be |
dates |
A The possible values can be any of the following:
|
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
return_local_file_paths |
Logical. If |
ignore_missing_dates |
Logical. If |
check_local_files |
Logical. Whether to check the file size of local files against known remote file sizes on the Amazon S3 storage. Defaults to |
Value
Nothing. If return_local_file_paths = TRUE, a character vector of the paths to the downloaded files.
Examples
# Set data dir for file downloads
spod_set_data_dir(tempdir())
# Download the number of trips on district level for the a date range in March 2020
spod_download(
type = "number_of_trips", zones = "districts",
dates = c(start = "2020-03-20", end = "2020-03-21")
)
# Download the number of trips on district level for select dates in 2020 and 2021
spod_download(
type = "number_of_trips", zones = "dist",
dates = c("2020-03-20", "2020-03-24", "2021-03-20", "2021-03-24")
)
# Download the number of trips on municipality level using regex for a date range in March 2020
# (the regex will capture the dates 2020-03-20 to 2020-03-24)
spod_download(
type = "number_of_trips", zones = "municip",
dates = "2020032[0-4]"
)
Download multiple files with progress bar in parallel
Description
Download multiple files with a progress bar. Retries failed downloads up to 3 times. Downloads run in parallel and in batches to show progress. The first 10 MB of each file are downloaded to check the speed.
Usage
spod_download_in_batches(
files_to_download,
batch_size = 5,
bar_width = 20,
chunk_size = 1024 * 1024,
test_size = 10 * 1024 * 1024,
max_retries = 3L,
timeout = 900,
show_progress = interactive() && !isTRUE(getOption("knitr.in.progress"))
)
Arguments
files_to_download |
A data frame with columns |
batch_size |
Numeric. Number of files to download at a time. |
bar_width |
Numeric. Width of the progress bar. |
chunk_size |
Numeric. Number of bytes to download at a time for speed test. |
max_retries |
Integer. Maximum number of retries for failed downloads. |
timeout |
Numeric. Timeout in seconds for each download. |
show_progress |
Logical. Whether to show the progress bar. |
Value
A data frame with columns target_url, local_path, file_size_bytes and local_file_size.
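The retry behaviour described above can be sketched with a generic helper (hypothetical; not the package's internal code): a function is attempted up to a maximum number of times, and its first successful result is returned.

```r
# Retry a function up to max_retries times, returning its first
# successful result or raising an error after the final attempt.
retry_sketch <- function(f, max_retries = 3L) {
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(f(attempt), error = function(e) NULL)
    if (!is.null(result)) return(result)
  }
  stop("all ", max_retries, " attempts failed")
}

# A single download could then be wrapped as, e.g.:
# retry_sketch(function(i) {
#   utils::download.file(url, destfile, mode = "wb", quiet = TRUE)
#   destfile
# })
```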
Downloads and extracts the raw v1 zones data
Description
This function ensures that the necessary raw v1 zones data files are downloaded and extracted into the specified data directory.
Usage
spod_download_zones_v1(
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios"),
data_dir = spod_get_data_dir(),
quiet = FALSE
)
Arguments
zones |
The zones for which to download the data. Can be |
data_dir |
The directory where the data is stored. |
quiet |
Boolean flag to control the display of messages. |
Value
A character string containing the path to the downloaded and extracted file.
Create province names ENUM in a duckdb connection
Description
Create province names ENUM in a duckdb connection
Usage
spod_duckdb_create_province_enum(con)
Arguments
con |
A |
Value
A duckdb connection with INE_PROV_NAME_ENUM and INE_PROV_CODE_ENUM created.
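Creating an ENUM type in DuckDB uses a CREATE TYPE ... AS ENUM statement; a sketch of how such a statement can be assembled (values shortened for illustration; the actual ENUM covers all Spanish provinces):

```r
# Assemble a CREATE TYPE ... AS ENUM statement from a vector of labels.
enum_sql_sketch <- function(type_name, values) {
  sprintf(
    "CREATE TYPE %s AS ENUM (%s);",
    type_name,
    paste(sprintf("'%s'", values), collapse = ", ")
  )
}
```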
Filter a duckdb connection by dates
Description
IMPORTANT: This function assumes that the table or view being filtered has separate year, month and day columns with integer values. This is done so that filtering is faster on CSV files stored in a hive-style folder structure such as /year=2020/month=2/day=14/.
Usage
spod_duckdb_filter_by_dates(con, source_view_name, new_view_name, dates)
Arguments
con |
A duckdb connection |
source_view_name |
The name of the source duckdb "view" (the virtual table, in the context of current package likely connected to a folder of CSV files) |
new_view_name |
The name of the new duckdb "view" (the virtual table, in the context of current package likely connected to a folder of CSV files). |
dates |
A The possible values can be any of the following:
|
Value
A duckdb connection with the original views and a new filtered view.
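As a sketch, the push-down filter on the integer year/month/day columns could be built like this (illustrative SQL generation, not the package's exact statements):

```r
# Build a WHERE clause matching the requested dates against the
# integer year/month/day columns of the source view.
dates_filter_sql_sketch <- function(dates) {
  d <- as.Date(dates)
  clauses <- sprintf(
    "(year = %d AND month = %d AND day = %d)",
    as.integer(format(d, "%Y")),
    as.integer(format(d, "%m")),
    as.integer(format(d, "%d"))
  )
  paste(clauses, collapse = " OR ")
}

# The new view would then be created with something like:
# CREATE VIEW <new_view_name> AS
#   SELECT * FROM <source_view_name> WHERE <clause>
```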
Set maximum memory and number of threads for a DuckDB connection
Description
Set maximum memory and number of threads for a DuckDB connection
Usage
spod_duckdb_limit_resources(
con,
max_mem_gb = NULL,
max_n_cpu = max(1, parallelly::availableCores() - 1)
)
Arguments
con |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
Value
A duckdb connection.
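The limits correspond to DuckDB's threads and memory_limit settings; a sketch of the generated statements (illustrative, the actual statements may differ):

```r
# Build the SET statements for limiting DuckDB resources.
limit_resources_sql_sketch <- function(max_mem_gb = NULL, max_n_cpu = 2L) {
  stmts <- sprintf("SET threads = %d;", as.integer(max_n_cpu))
  if (!is.null(max_mem_gb)) {
    stmts <- c(stmts, sprintf("SET memory_limit = '%dGB';", as.integer(max_mem_gb)))
  }
  stmts
}
```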
Create a duckdb number of trips table
Description
This function creates a duckdb connection to the number of trips data stored in a folder of CSV.gz files.
Usage
spod_duckdb_number_of_trips(
con = DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:", read_only = FALSE),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
ver = NULL,
data_dir = spod_get_data_dir()
)
Arguments
con |
A duckdb connection object. If not specified, a new in-memory connection will be created. |
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A duckdb connection object with 2 views:
- od_csv_raw: a raw table view of all cached CSV files with the origin-destination data that has been previously cached in $SPANISH_OD_DATA_DIR
- od_csv_clean: a cleaned-up table view of od_csv_raw with column names and values translated and mapped to English. This still includes all cached data.
Creates a duckdb connection to origin-destination data
Description
This function creates a duckdb connection to the origin-destination data stored in CSV.gz files.
Usage
spod_duckdb_od(
con = DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:", read_only = FALSE),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
ver = NULL,
data_dir = spod_get_data_dir()
)
Arguments
con |
A duckdb connection object. If not specified, a new in-memory connection will be created. |
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A duckdb connection object with 2 views:
- od_csv_raw: a raw table view of all cached CSV files with the origin-destination data that has been previously cached in $SPANISH_OD_DATA_DIR
- od_csv_clean: a cleaned-up table view of od_csv_raw with column names and values translated and mapped to English. This still includes all cached data.
The structure of the cleaned-up view od_csv_clean is as follows:
- date: Date. The full date of the trip, including year, month, and day.
- id_origin: factor. The identifier for the origin location of the trip, formatted as a code (e.g., '01001_AM').
- id_destination: factor. The identifier for the destination location of the trip, formatted as a code (e.g., '01001_AM').
- activity_origin: factor. The type of activity at the origin location (e.g., 'home', 'work'). Note: only available for district-level data.
- activity_destination: factor. The type of activity at the destination location (e.g., 'home', 'other'). Note: only available for district-level data.
- residence_province_ine_code: factor. The province of residence for the group of individuals making the trip, encoded according to the INE classification. Note: only available for district-level data.
- residence_province_name: factor. The province of residence for the group of individuals making the trip (e.g., 'Cuenca', 'Girona'). Note: only available for district-level data.
- hour: integer. The time slot (the hour of the day) during which the trip started, represented as an integer (e.g., 0, 1, 2).
- distance: factor. The distance category of the trip, represented as a code (e.g., '002-005' for 2-5 km).
- n_trips: double. The number of trips taken within the specified time slot and distance.
- trips_total_length_km: double. The total length of all trips in kilometers for the specified time slot and distance.
- year: double. The year of the trip.
- month: double. The month of the trip.
- day: double. The day of the trip.
The structure of the original data in od_csv_raw is as follows:
- fecha: Date. The date of the trip, including year, month, and day.
- origen: character. The identifier for the origin location of the trip, formatted as a character string (e.g., '01001_AM').
- destino: character. The identifier for the destination location of the trip, formatted as a character string (e.g., '01001_AM').
- actividad_origen: character. The type of activity at the origin location (e.g., 'casa', 'trabajo').
- actividad_destino: character. The type of activity at the destination location (e.g., 'otros', 'trabajo').
- residencia: character. The code representing the residence of the individual making the trip (e.g., '01') according to the official INE classification.
- edad: character. The age of the individual making the trip. This column is actually filled with 'NA' values, which is why it is removed in the cleaned-up and translated view described above.
- periodo: integer. The time period during which the trip started, represented as an integer (e.g., 0, 1, 2).
- distancia: character. The distance category of the trip, represented as a character string (e.g., '002-005' for 2-5 km).
- viajes: double. The number of trips taken within the specified time period and distance.
- viajes_km: double. The total length of all trips in kilometers for the specified time period and distance.
- day: double. The day of the trip.
- month: double. The month of the trip.
- year: double. The year of the trip.
Create a duckdb overnight stays table
Description
This function creates a duckdb connection to the overnight stays data stored in a folder of CSV.gz files.
Usage
spod_duckdb_overnight_stays(
con = DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:", read_only = FALSE),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
ver = NULL,
data_dir = spod_get_data_dir()
)
Arguments
con |
A duckdb connection object. If not specified, a new in-memory connection will be created. |
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A duckdb connection object with 2 views:
- od_csv_raw: a raw table view of all cached CSV files with the origin-destination data that has been previously cached in $SPANISH_OD_DATA_DIR
- od_csv_clean: a cleaned-up table view of od_csv_raw with column names and values translated and mapped to English. This still includes all cached data.
Set temp file for DuckDB connection
Description
Set temp file for DuckDB connection
Usage
spod_duckdb_set_temp(con, temp_path = spod_get_temp_dir())
Arguments
con |
A duckdb connection |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
Value
A duckdb
connection.
Function to expand dates from a regex
Description
This function generates a sequence of dates matching the provided regular expression pattern.
Usage
spod_expand_dates_from_regex(date_regex)
Arguments
date_regex |
A regular expression to match dates in the format 'yyyymmdd'. |
Value
A character
vector of dates matching the regex.
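For illustration, an expansion of a pattern covering the first week of March 2020 might look like the following sketch (this is an internal function, so it is accessed via `:::`; the exact pattern shown is only an assumption for the example):

```r
# expand a 'yyyymmdd' regex into individual date strings
# (internal function, accessed with ::: for illustration only)
dates <- spanishoddata:::spod_expand_dates_from_regex("2020030[1-7]")
# expected to cover 2020-03-01 through 2020-03-07
```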
Cache the municipalities geometries from the mapas-movilidad website
Description
Cache the municipalities geometries from the mapas-movilidad website
Usage
spod_fetch_municipalities_json_memoised()
Get file sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder.
Description
Get file sizes for remote files of v1 and v2 data and save them into a csv.gz file in the inst/extdata folder.
Usage
spod_files_sizes(ver = 2)
Arguments
ver |
The version of the data (1 or 2). Can be both. Defaults to 2, as v1 data has not been updated since 2021. |
Value
Nothing. Only saves a csv.gz file with up-to-date file sizes in the inst/extdata folder.
Get tabular mobility data
Description
This function creates a DuckDB lazy table connection object from the specified type and zones. It checks for missing data and downloads it if necessary. The connection is made to the raw CSV files in gzip archives, so analysing the data through this connection may be slow if you select more than a few days. You can manipulate this object using dplyr functions such as select, filter, mutate, group_by, summarise, etc. At the end of any sequence of commands you will need to add collect to execute the whole chain of data manipulations and load the results into memory in an R data.frame/tibble. See codebooks for v1 and v2 data in vignettes with spod_codebook(1) and spod_codebook(2).
If you want to analyse longer periods of time (especially several months or even the whole data over several years), consider using spod_convert and then spod_connect.
If you want to quickly get the origin-destination data with flows aggregated for a single day at municipal level and without any extra socio-economic variables, consider using the spod_quick_get_od function.
Usage
spod_get(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
dates = NULL,
data_dir = spod_get_data_dir(),
quiet = FALSE,
max_mem_gb = NULL,
max_n_cpu = max(1, parallelly::availableCores() - 1),
max_download_size_gb = 1,
duckdb_target = ":memory:",
temp_path = spod_get_temp_dir(),
ignore_missing_dates = FALSE
)
Arguments
type |
The type of data to download. Can be |
zones |
The zones for which to download the data. Can be |
dates |
A The possible values can be any of the following:
|
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
max_mem_gb |
|
max_n_cpu |
The maximum number of threads to use. Defaults to the number of available cores minus 1. |
max_download_size_gb |
The maximum download size in gigabytes. Defaults to 1. |
duckdb_target |
(Optional) The path to the duckdb file to save the data to, if a conversion from CSV is requested by the |
temp_path |
The path to the temp folder for DuckDB for intermediate spilling in case the set memory limit and/or physical memory of the computer is too low to perform the query. By default this is set to the |
ignore_missing_dates |
Logical. If |
Value
A DuckDB lazy table connection object of class tbl_duckdb_connection
.
Examples
# create a connection to the v1 data
spod_set_data_dir(tempdir())
dates <- c("2020-02-14", "2020-03-14", "2021-02-14", "2021-02-14", "2021-02-15")
nt_dist <- spod_get(type = "number_of_trips", zones = "distr", dates = dates)
# nt_dist is a table view filtered to the specified dates
# for advanced users only
# access the source connection with all dates
# list tables
DBI::dbListTables(nt_dist$src$con)
# disconnect
spod_disconnect(nt_dist)
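The lazy dplyr workflow described above can be sketched as follows (a minimal sketch: the column names date and n_trips are assumed to follow the cleaned number-of-trips schema, and the data is downloaded on first use):

```r
library(spanishoddata)
library(dplyr)

spod_set_data_dir(tempdir())

# lazy connection to the number of trips data for two days
nt <- spod_get(
  type = "number_of_trips",
  zones = "distr",
  dates = c("2020-02-14", "2020-02-15")
)

# chain dplyr verbs; nothing is computed until collect()
daily_trips <- nt |>
  group_by(date) |>
  summarise(total_trips = sum(n_trips, na.rm = TRUE)) |>
  collect()

# release the underlying DuckDB connection
spod_disconnect(nt)
```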
Get the data directory
Description
This function retrieves the data directory from the environment variable SPANISH_OD_DATA_DIR. If the environment variable is not set, it returns the temporary directory.
Usage
spod_get_data_dir(quiet = FALSE)
Arguments
quiet |
A |
Value
A character
vector of length 1 containing the path to the data directory where the package will download and convert the data.
Examples
spod_set_data_dir(tempdir())
spod_get_data_dir()
Get file size from URL
Description
Get file size from URL
Usage
spod_get_file_size_from_url(x_url)
Arguments
x_url |
URL |
Value
File size in MB
Get the HMAC secret from the mapas-movilidad website
Description
Get the HMAC secret from the mapas-movilidad website
Usage
spod_get_hmac_secret(base_url = "https://mapas-movilidad.transportes.gob.es")
Arguments
base_url |
The base URL of the mapas-movilidad website |
Value
Character vector with the HMAC secret.
Cache the HMAC secret to avoid repeated requests
Description
Cache the HMAC secret to avoid repeated requests
Usage
spod_get_hmac_secret_memoised(
base_url = "https://mapas-movilidad.transportes.gob.es"
)
Get latest file list from the XML for MITMA open mobility data v1 (2020-2021)
Description
Get latest file list from the XML for MITMA open mobility data v1 (2020-2021)
Usage
spod_get_latest_v1_file_list(
data_dir = spod_get_data_dir(),
xml_url = "https://opendata-movilidad.mitma.es/RSS.xml"
)
Arguments
data_dir |
The directory where the data is stored. Defaults to the value returned by |
xml_url |
The URL of the XML file to download. Defaults to "https://opendata-movilidad.mitma.es/RSS.xml". |
Value
The path to the downloaded XML file.
Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards)
Description
Get latest file list from the XML for MITMA open mobility data v2 (2022 onwards)
Usage
spod_get_latest_v2_file_list(
data_dir = spod_get_data_dir(),
xml_url = "https://movilidad-opendata.mitma.es/RSS.xml"
)
Arguments
data_dir |
The directory where the data is stored. Defaults to the value returned by |
xml_url |
The URL of the XML file to download. Defaults to "https://movilidad-opendata.mitma.es/RSS.xml". |
Value
The path to the downloaded XML file.
Get temporary directory for DuckDB intermediate spilling
Description
Get the path to the temp folder for DuckDB for intermediate spilling, in case the set memory limit and/or physical memory of the computer is too low to perform the query.
Usage
spod_get_temp_dir(data_dir = spod_get_data_dir())
Arguments
data_dir |
The directory where the data is stored. Defaults to the value returned by |
Value
A character
string with the path to the temp folder for DuckDB
for intermediate spilling.
Get valid dates for the specified data version
Description
Get all metadata for requested data version and identify all dates available for download.
Usage
spod_get_valid_dates(ver = NULL)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
Value
A vector of type Date
with all possible valid dates for the specified data version (v1 for 2020-2021 and v2 for 2022 onwards).
Examples
# Get all valid dates for v1 (2020-2021) data
spod_get_valid_dates(ver = 1)
# Get all valid dates for v2 (2022 onwards) data
spod_get_valid_dates(ver = 2)
Get zones
Description
Get spatial zones for the specified data version. Supports both v1 (2020-2021) and v2 (2022 onwards) data.
Usage
spod_get_zones(
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
ver = NULL,
data_dir = spod_get_data_dir(),
quiet = FALSE
)
Arguments
zones |
The zones for which to download the data. Can be |
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
data_dir |
The directory where the data is stored. Defaults to the value returned by |
quiet |
A |
Value
An sf
object (Simple Feature collection).
The columns for v1 (2020-2021) data include:
- id: A character vector containing the unique identifier for each district, assigned by the data provider. This id matches the id_origin, id_destination, and id in district-level origin-destination and number of trips data.
- census_districts: A string with semicolon-separated identifiers of census districts classified by the Spanish Statistical Office (INE) that are spatially bound within the polygons for each id.
- municipalities_mitma: A string with semicolon-separated municipality identifiers (as assigned by the data provider) corresponding to each district id.
- municipalities: A string with semicolon-separated municipality identifiers classified by the Spanish Statistical Office (INE) corresponding to each id.
- district_names_in_v2/municipality_names_in_v2: A string with semicolon-separated district names (from the v2 version of this data) corresponding to each district id in v1.
- district_ids_in_v2/municipality_ids_in_v2: A string with semicolon-separated district identifiers (from the v2 version of this data) corresponding to each district id in v1.
- geometry: A MULTIPOLYGON column containing the spatial geometry of each district, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
The columns for v2 (2022 onwards) data include:
- id: A character vector containing the unique identifier for each zone, assigned by the data provider.
- name: A character vector with the name of each district.
- population: A numeric vector representing the population of each district (as of 2022).
- census_sections: A string with semicolon-separated identifiers of census sections corresponding to each district.
- census_districts: A string with semicolon-separated identifiers of census districts as classified by the Spanish Statistical Office (INE) corresponding to each district.
- municipalities: A string with semicolon-separated identifiers of municipalities classified by the Spanish Statistical Office (INE) corresponding to each district.
- municipalities_mitma: A string with semicolon-separated identifiers of municipalities, as assigned by the data provider, that correspond to each district.
- luas_mitma: A string with semicolon-separated identifiers of LUAs (Large Urban Areas) from the provider, associated with each district.
- district_ids_in_v1/municipality_ids_in_v1: A string with semicolon-separated district identifiers from v1 data corresponding to each district in v2. If no match exists, it is marked as NA.
- geometry: A MULTIPOLYGON column containing the spatial geometry of each district, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
Examples
# get polygons for municipalities for the v2 data
municip_v2 <- spod_get_zones(zones = "municipalities", ver = 2)
# get polygons for the districts for the v1 data
distr_v1 <- spod_get_zones(zones = "districts", ver = 1)
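Since the return value is an sf object, it can be inspected and mapped directly with the sf package; a minimal sketch, assuming the population column documented above for v2 data:

```r
library(spanishoddata)
library(sf)

spod_set_data_dir(tempdir())

# municipalities for the v2 (2022 onwards) data
municip_v2 <- spod_get_zones(zones = "municipalities", ver = 2)

# quick choropleth of the 2022 population per municipality
plot(municip_v2["population"])
```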
Retrieves the zones for v1 data
Description
This function retrieves the zones data from the specified data directory. It can retrieve either "distritos" or "municipios" zones data.
Usage
spod_get_zones_v1(
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios"),
data_dir = spod_get_data_dir(),
quiet = FALSE
)
Arguments
zones |
The zones for which to download the data. Can be |
data_dir |
The directory where the data is stored. |
quiet |
A |
Value
An sf object (Simple Feature collection) with 2 fields:
- id: A character vector containing the unique identifier for each zone, to be matched with identifiers in the tabular data.
- geometry: A MULTIPOLYGON column containing the spatial geometry of each zone, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
Retrieves the zones v2 data
Description
This function retrieves the zones data from the specified data directory. It can retrieve either "distritos" or "municipios" zones data.
Usage
spod_get_zones_v2(
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas"),
data_dir = spod_get_data_dir(),
quiet = FALSE
)
Arguments
zones |
The zones for which to download the data. Can be |
data_dir |
The directory where the data is stored. |
quiet |
A |
Value
An sf object (Simple Feature collection) with 4 fields:
- id: A character vector containing the unique identifier for each zone, to be matched with identifiers in the tabular data.
- name: A character vector with the name of the zone.
- population: A numeric vector representing the population of each zone (as of 2022).
- geometry: A MULTIPOLYGON column containing the spatial geometry of each zone, stored as an sf object. The geometry is projected in the ETRS89 / UTM zone 30N coordinate reference system (CRS), with XY dimensions.
Get valid dates from the GraphQL API
Description
Get valid dates from the GraphQL API
Usage
spod_graphql_valid_dates()
Value
A Date
vector of dates that are valid to request data with spod_quick_get_od()
.
Infer data version from dates
Description
Infer data version from dates
Usage
spod_infer_data_v_from_dates(dates, ignore_missing_dates = FALSE)
Arguments
dates |
A The possible values can be any of the following:
|
ignore_missing_dates |
Logical. If |
Value
An integer
indicating the inferred data version.
Check if specified dates span both data versions
Description
This function checks if the specified dates or date ranges span both v1 and v2 data versions.
Usage
spod_is_data_version_overlaps(dates)
Arguments
dates |
A |
Value
TRUE
if the dates span both data versions, FALSE
otherwise.
Match data types for normalisation
Description
Match data types for normalisation
Usage
spod_match_data_type(
type = c("od", "origin-destination", "viajes", "os", "overnight_stays",
"pernoctaciones", "nt", "number_of_trips", "personas")
)
Arguments
type |
The type of data to match. Can be "od"/"origin-destination", "os"/"overnight_stays", or "nt"/"number_of_trips". |
Value
A character
string with the folder name for the specified data type. Or NULL
if the type is not recognized.
Match data types to folders
Description
Match data types to folders
Usage
spod_match_data_type_for_local_folders(
type = c("od", "origin-destination", "os", "overnight_stays", "nt", "number_of_trips"),
ver = c(1, 2)
)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
Value
A character
string with the folder name for the specified data type. Or NULL
if the data type is not recognized.
Download multiple files with progress bar sequentially
Description
Download multiple files with a progress bar. Retries failed downloads up to 3 times.
Usage
spod_multi_download_with_progress(
files_to_download,
chunk_size = 1024 * 1024,
bar_width = 20,
show_progress = interactive() && !isTRUE(getOption("knitr.in.progress"))
)
Arguments
files_to_download |
A data frame with columns |
chunk_size |
Number of bytes to download at a time. |
bar_width |
Width of the progress bar. |
show_progress |
Whether to show the progress bar. |
Value
A data frame with columns target_url
, local_path
, file_size_bytes
and local_file_size
.
Cache the spod_query_od_raw function to avoid repeated requests
Description
Cache the spod_query_od_raw function to avoid repeated requests
Usage
spod_query_od_memoised(
date_fmt,
graphql_distances,
id_origin,
id_destination,
min_trips,
graphql_query
)
Internal function to query the GraphQL API for origin-destination data
Description
Internal function to query the GraphQL API for origin-destination data
Usage
spod_query_od_raw(
date_fmt,
graphql_distances,
id_origin,
id_destination,
min_trips,
graphql_query
)
Arguments
id_origin |
A character vector specifying the origin municipalities to retrieve. If not provided, all origin municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
id_destination |
A character vector specifying the target municipalities to retrieve. If not provided, all target municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
min_trips |
A numeric value specifying the minimum number of journeys per origin-destination pair to retrieve. Defaults to 100 to reduce the amount of data returned. Can be set to 0 to retrieve all data. |
Value
A tibble
containing the flows for the specified date, minimum number of journeys, distances and origin-destination pairs if specified.
Get daily trip counts per origin-destination municipality from 2022 onward
Description
WARNING: this function may stop working at any time, as the API may change. This function provides a quick way to get daily aggregated (no hourly data) trip counts per origin-destination municipality from v2 data (2022 onward). Compared to spod_get()
, which downloads large CSV files, this function downloads the data directly from the GraphQL API. An interactive web map with this data is available at https://mapas-movilidad.transportes.gob.es/. No data aggregation is performed on your computer (unlike in spod_get()
), so you do not need to worry about memory usage and do not have to use a powerful computer with multiple CPU cores just to get this simple data. Only about 1 MB of data is downloaded for a single day. The limitation of this function is that it can only retrieve data for a single day at a time and only with total number of trips and total km travelled. So it is not possible to get any of the extra variables available in the full dataset via spod_get()
.
Usage
spod_quick_get_od(
date = NA,
min_trips = 100,
distances = c("500m-2km", "2-10km", "10-50km", "50+km"),
id_origin = NA,
id_destination = NA
)
Arguments
date |
A character or Date object specifying the date for which to retrieve the data. If date is a character, the date must be in "YYYY-MM-DD" or "YYYYMMDD" format. |
min_trips |
A numeric value specifying the minimum number of journeys per origin-destination pair to retrieve. Defaults to 100 to reduce the amount of data returned. Can be set to 0 to retrieve all data. |
distances |
A character vector specifying the distances to retrieve. Valid values are "500m-2km", "2-10km", "10-50km", and "50+km". Defaults to |
id_origin |
A character vector specifying the origin municipalities to retrieve. If not provided, all origin municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
id_destination |
A character vector specifying the target municipalities to retrieve. If not provided, all target municipalities will be included. Valid municipality IDs can be found in the dataset returned by |
Value
A tibble containing the flows for the specified date, minimum number of journeys, distances and origin-destination pairs if specified. The columns are:
- date: The date of the trips.
- id_origin: The origin municipality ID.
- id_destination: The target municipality ID.
- n_trips: The number of trips between the origin and target municipality.
- trips_total_length_km: The total length of trips in kilometers.
Examples
od_1000 <- spod_quick_get_od(
date = "2022-01-01",
min_trips = 1000
)
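Requests can also be narrowed to specific origins and distance bands; a minimal sketch (the municipality ID "28079" is used here only as a hypothetical illustration; valid IDs can be looked up with spod_quick_get_zones()):

```r
library(spanishoddata)

# all flows leaving a single origin, short and medium distances only
# "28079" is a hypothetical municipality ID for illustration
od_from_one_origin <- spod_quick_get_od(
  date = "2022-01-01",
  min_trips = 0,
  distances = c("500m-2km", "2-10km"),
  id_origin = "28079"
)
```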
Get the municipalities geometries
Description
This function fetches the municipalities (for now this is the only option) geometries from the mapas-movilidad website and returns an sf
object with the municipalities geometries. This is intended for use with the flows data retrieved by the spod_quick_get_od()
function. An interactive web map with this data is available at https://mapas-movilidad.transportes.gob.es/. These municipality geometries only include Spanish municipalities (and not the NUTS3 regions in Portugal and France) and do not contain extra columns that you can get with the spod_get_zones()
function. The function caches the retrieved geometries in memory of the current R session to reduce the number of requests to the mapas-movilidad website.
Usage
spod_quick_get_zones(zones = "municipalities")
Arguments
zones |
A character string specifying the zones to retrieve. Valid values are "municipalities", "muni", "municip", and "municipios". Defaults to "municipalities". |
Value
An sf
object with the municipalities geometries to match with the data retrieved with spod_quick_get_od()
.
Examples
municipalities_sf <- spod_quick_get_zones()
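A typical use is joining these geometries to the flows returned by spod_quick_get_od(); a minimal sketch, assuming the id column of the zones matches id_origin in the flows, as the description above suggests:

```r
library(spanishoddata)
library(dplyr)
library(sf)

od <- spod_quick_get_od(date = "2022-01-01", min_trips = 1000)
zones <- spod_quick_get_zones()

# sum outgoing trips per origin and attach the origin geometries
outgoing <- od |>
  group_by(id_origin) |>
  summarise(n_trips = sum(n_trips)) |>
  left_join(zones, by = c("id_origin" = "id")) |>
  st_as_sf()
```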
Load an SQL query, glue it, dplyr::sql it
Description
Load an SQL query from a specified file in package installation directory, glue::collapse it, glue::glue it in case of any variables that need to be replaced, and dplyr::sql it for additional safety.
Usage
spod_read_sql(sql_file_name)
Arguments
sql_file_name |
The name of the SQL file to load from the package installation directory. |
Value
Text of the SQL query of class sql
/character
.
Get the length of the request payload
Description
Get the length of the request payload
Usage
spod_request_length(graphql_query)
Arguments
graphql_query |
Character. The GraphQL query. |
Value
Numeric. The length of the request payload.
Set the data directory
Description
This function sets the data directory in the environment variable SPANISH_OD_DATA_DIR, so that all other functions in the package can access the data. It also creates the directory if it doesn't exist.
Usage
spod_set_data_dir(data_dir, quiet = FALSE)
Arguments
data_dir |
The data directory to set. |
quiet |
A |
Value
Nothing. If quiet is FALSE
, prints a message with the path and confirmation that the path exists.
Examples
spod_set_data_dir(tempdir())
Generate a WHERE part of an SQL query from a sequence of dates
Description
Generate a WHERE part of an SQL query from a sequence of dates
Usage
spod_sql_where_dates(dates)
Arguments
dates |
A Dates vector of dates to process. |
Value
A character vector of the SQL query.
Get ETags for locally saved v1 data files and save them into an RDS file in the inst/extdata folder.
Description
Get ETags for locally saved v1 data files and save them into an RDS file in the inst/extdata folder.
Usage
spod_store_etags()
Value
Returns a tibble with the local path, local ETag and remote ETag.
Get clean data subfolder name
Description
Change the subfolder name for the clean data cache in the code of this function to apply the change globally, as all functions in the package use this function to get the clean data cache path.
Usage
spod_subfolder_clean_data_cache(ver = 1)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
Value
A character
string with the subfolder name for the clean data cache.
Get metadata cache subfolder name
Description
Change the subfolder name for the metadata cache in the code of this function to apply the change globally, as all functions in the package use this function to get the metadata cache path.
Usage
spod_subfolder_metadata_cache()
Value
A character
string with the subfolder name for the metadata cache.
Get raw data cache subfolder name
Description
Change the subfolder name for the raw data cache in the code of this function to apply the change globally, as all functions in the package use this function to get the raw data cache path.
Usage
spod_subfolder_raw_data_cache(ver = 1)
Arguments
ver |
Integer. Can be 1 or 2. The version of the data to use. v1 spans 2020-2021, v2 covers 2022 and onwards. See more details in codebooks with |
Value
A character
string with the subfolder name for the raw data cache.
Remove duplicate values in a semicolon-separated string
Description
Remove duplicate IDs from a semicolon-separated string in a selected column of a data frame
Usage
spod_unique_separated_ids(column)
Arguments
column |
A |
Value
A character
vector with semicolon-separated unique IDs.
Translate zone names from English to Spanish
Description
Translate zone names from English to Spanish
Usage
spod_zone_names_en2es(
zones = c("districts", "dist", "distr", "distritos", "municipalities", "muni",
"municip", "municipios", "lua", "large_urban_areas", "gau", "grandes_areas_urbanas")
)
Arguments
zones |
The zones for which to download the data. Can be |
Value
A character
string with the translated zone name. Or NULL
if the zone name is not recognized.