Title: Read/Write Simple Feature Objects ('sf') with 'Apache' 'Arrow'
Version: 0.4.1
Date: 2021-10-25
Description: Support for reading/writing simple feature ('sf') spatial objects from/to 'Parquet' files. 'Parquet' files are an open-source, column-oriented data storage format from Apache (https://parquet.apache.org/), now popular across programming languages. This implementation converts simple feature list geometries into well-known binary format for use by 'arrow', and coordinate reference system information is maintained in a standard metadata format.
License: MIT + file LICENSE
URL: https://github.com/wcjochem/sfarrow, https://wcjochem.github.io/sfarrow/
BugReports: https://github.com/wcjochem/sfarrow/issues
Encoding: UTF-8
RoxygenNote: 7.1.1
Imports: sf, arrow, jsonlite, dplyr
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2021-10-27 16:15:34 UTC; jochem
Author: Chris Jochem
Maintainer: Chris Jochem <w.c.jochem@soton.ac.uk>
Repository: CRAN
Date/Publication: 2021-10-27 16:30:02 UTC
Helper function to convert 'data.frame' to sf
Description
Helper function to convert 'data.frame' to sf
Usage
arrow_to_sf(tbl, metadata)
Arguments
tbl: a table read by 'arrow' to convert to sf
metadata: list of validated geo metadata
Value
object of class sf with CRS and geometry columns
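A sketch of where this helper sits in the workflow (it is an internal function, so the `:::` access, the example file name, and the `geo` metadata key shown here are illustrative assumptions rather than a supported interface; in normal use `st_read_parquet()` calls it for you):

```r
# read the raw table with 'arrow', keeping it as an Arrow Table so the
# file-level metadata survive the read
tbl <- arrow::read_parquet("nc.parquet", as_data_frame = FALSE)
# the geo metadata are stored as JSON in the schema metadata
geo <- jsonlite::fromJSON(tbl$metadata$geo)
# rebuild the sf object: WKB columns become sfc geometry, CRS is restored
nc <- sfarrow:::arrow_to_sf(as.data.frame(tbl), geo)
```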
Create standardised geo metadata for Parquet files
Description
Create standardised geo metadata for Parquet files
Usage
create_metadata(df)
Arguments
df: object of class sf
Details
Reference for the metadata standard: https://github.com/geopandas/geo-arrow-spec. This format is compatible with GeoPandas Parquet files.
Value
JSON-formatted list of geo metadata
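A minimal sketch of what this produces (create_metadata is an internal helper used by the st_write_* functions, so the `:::` access is for illustration only):

```r
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# build the 'geo' metadata JSON that the st_write_* functions attach to
# the Arrow schema before writing
geo_json <- sfarrow:::create_metadata(nc)
cat(geo_json)  # JSON naming the primary geometry column and its CRS
```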
Convert sfc geometry columns into a WKB binary format
Description
Convert sfc geometry columns into a WKB binary format
Usage
encode_wkb(df)
Arguments
df: object of class sf with one or more sfc geometry columns
Details
Allows for more than one geometry column in sfc format
Value
data.frame with binary geometry column(s)
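For illustration (encode_wkb is internal; the `:::` access is an assumption, not part of the exported API):

```r
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# replace the sfc list-column with raw WKB vectors that 'arrow' can store
df <- sfarrow:::encode_wkb(nc)
class(df)                # plain data.frame, no longer sf
class(df$geometry[[1]])  # each geometry is now a raw WKB byte vector
```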
Read an Arrow multi-file dataset and create sf object
Description
Read an Arrow multi-file dataset and create sf object
Usage
read_sf_dataset(dataset, find_geom = FALSE)
Arguments
dataset: a Dataset object created by arrow::open_dataset, or a query on such a dataset built with dplyr verbs
find_geom: logical. Only needed when returning a subset of columns. Should all available geometry columns be selected and added to the dataset query without being named? Default is FALSE
Details
This function is primarily for use after opening a dataset with arrow::open_dataset. Users can then query the arrow Dataset using dplyr methods such as filter or select. Passing the resulting query to this function will parse the datasets and create an sf object. The function expects consistent geographic metadata to be stored with the dataset in order to create sf objects.
Value
object of class sf
See Also
open_dataset, st_read, st_read_parquet
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)
# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)
# write out to parquet datasets
tf <- tempfile() # create temporary location
on.exit(unlink(tf))
# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)
list.files(tf, recursive = TRUE)
# open parquet files from dataset
ds <- arrow::open_dataset(tf)
# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)
# read the dataset (piping syntax also works)
nc_d <- read_sf_dataset(dataset = q)
nc_d
plot(sf::st_geometry(nc_d))
sfarrow: An R package for reading/writing simple feature (sf) objects from/to Arrow Parquet/Feather files with arrow
Description
Simple features are a popular, standardised way to create spatial vector data with a list-type geometry column. Parquet files are standard column-oriented files from Apache (https://parquet.apache.org/) designed for fast read/writes. sfarrow is designed to support the reading and writing of simple features in sf objects from/to Parquet files (.parquet) and Feather files (.feather) within R. A key goal of sfarrow is to support interoperability of spatial data in files between R and Python through the use of standardised metadata.
Metadata
Coordinate reference and geometry field information for sf objects are stored in standard metadata tables within the files. The metadata are based on a standard representation (Version 0.1.0, reference: https://github.com/geopandas/geo-arrow-spec). This is compatible with the format used by the Python library GeoPandas for reading/writing Parquet/Feather files. Note to users: this metadata format is not yet stable for production uses and may change in the future.
Credits
This work was undertaken by Chris Jochem, a member of the WorldPop Research Group at the University of Southampton (https://www.worldpop.org/).
Read a Feather file to sf object
Description
Read a Feather file. Uses standard metadata information to identify geometry columns and coordinate reference system information.
Usage
st_read_feather(dsn, col_select = NULL, ...)
Arguments
dsn: character file path to a data source
col_select: a character vector of column names to keep. Default is NULL, which reads all columns
...: additional parameters to pass to the underlying 'arrow' Feather reader
Details
Reference for the metadata used: https://github.com/geopandas/geo-arrow-spec. These are standard with the Python GeoPandas library.
Value
object of class sf
Examples
# load Natural Earth low-res dataset.
# Created in Python with GeoPandas.to_feather()
path <- system.file("extdata", package = "sfarrow")
world <- st_read_feather(file.path(path, "world.feather"))
world
plot(sf::st_geometry(world))
Read a Parquet file to sf object
Description
Read a Parquet file. Uses standard metadata information to identify geometry columns and coordinate reference system information.
Usage
st_read_parquet(dsn, col_select = NULL, props = NULL, ...)
Arguments
dsn: character file path to a data source
col_select: a character vector of column names to keep. Default is NULL, which reads all columns
props: now deprecated in 'arrow'; retained for backwards compatibility
...: additional parameters to pass to the underlying 'arrow' Parquet reader
Details
Reference for the metadata used: https://github.com/geopandas/geo-arrow-spec. These are standard with the Python GeoPandas library.
Value
object of class sf
Examples
# load Natural Earth low-res dataset.
# Created in Python with GeoPandas.to_parquet()
path <- system.file("extdata", package = "sfarrow")
world <- st_read_parquet(file.path(path, "world.parquet"))
world
plot(sf::st_geometry(world))
Write sf object to Feather file
Description
Convert a simple features spatial object from sf and write to a Feather file using write_feather. Geometry columns (type sfc) are converted to well-known binary (WKB) format.
Usage
st_write_feather(obj, dsn, ...)
Arguments
obj: object of class sf
dsn: data source name. A path and file name with .feather extension
...: additional options to pass to write_feather
Value
obj invisibly
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create temp file
tf <- tempfile(fileext = '.feather')
on.exit(unlink(tf))
# write out object
st_write_feather(obj = nc, dsn = tf)
# In Python, read the new file with geopandas.read_feather(...)
# read back into R
nc_f <- st_read_feather(tf)
Write sf object to Parquet file
Description
Convert a simple features spatial object from sf and write to a Parquet file using write_parquet. Geometry columns (type sfc) are converted to well-known binary (WKB) format.
Usage
st_write_parquet(obj, dsn, ...)
Arguments
obj: object of class sf
dsn: data source name. A path and file name with .parquet extension
...: additional options to pass to write_parquet
Value
obj invisibly
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create temp file
tf <- tempfile(fileext = '.parquet')
on.exit(unlink(tf))
# write out object
st_write_parquet(obj = nc, dsn = tf)
# In Python, read the new file with geopandas.read_parquet(...)
# read back into R
nc_p <- st_read_parquet(tf)
Basic checking of key geo metadata columns
Description
Basic checking of key geo metadata columns
Usage
validate_metadata(metadata)
Arguments
metadata: list of geo metadata
Value
None. Throws an error and stops execution if the metadata are invalid.
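A sketch of the kind of list this checks, following the Version 0.1.0 geo-arrow-spec layout described above (the field names are taken from that spec, and the `:::` access is illustrative since validate_metadata is internal):

```r
md <- list(
  primary_column = "geometry",
  columns = list(
    geometry = list(crs = NULL, encoding = "WKB")
  )
)
# returns nothing when the required fields are present;
# stops with an error if, e.g., 'primary_column' is missing
sfarrow:::validate_metadata(md)
```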
Write sf object to an Arrow multi-file dataset
Description
Write sf object to an Arrow multi-file dataset
Usage
write_sf_dataset(
obj,
path,
format = "parquet",
partitioning = dplyr::group_vars(obj),
...
)
Arguments
obj: object of class sf
path: string path referencing a directory for the output
format: output file format ("parquet" or "feather")
partitioning: character vector of columns in obj used to define the dataset partitions. Defaults to the dplyr grouping variables of obj
...: additional arguments and options passed to arrow::write_dataset
Details
Translate an sf spatial object to data.frame with WKB geometry columns and then write to an arrow dataset with partitioning. Allows for dplyr grouped datasets (using group_by) and uses those variables to define partitions.
Value
obj invisibly
See Also
write_dataset, st_read_parquet
Examples
# read spatial object
nc <- sf::st_read(system.file("shape/nc.shp", package="sf"), quiet = TRUE)
# create random grouping
nc$group <- sample(1:3, nrow(nc), replace = TRUE)
# use dplyr to group the dataset. %>% also allowed
nc_g <- dplyr::group_by(nc, group)
# write out to parquet datasets
tf <- tempfile() # create temporary location
on.exit(unlink(tf))
# partitioning determined by dplyr 'group_vars'
write_sf_dataset(nc_g, path = tf)
list.files(tf, recursive = TRUE)
# open parquet files from dataset
ds <- arrow::open_dataset(tf)
# create a query. %>% also allowed
q <- dplyr::filter(ds, group == 1)
# read the dataset (piping syntax also works)
nc_d <- read_sf_dataset(dataset = q)
nc_d
plot(sf::st_geometry(nc_d))