Help for package finch

Title:

Parse Darwin Core Files

Description:

Parse and create Darwin Core (http://rs.tdwg.org/dwc/) Simple files and Archives. Functionality includes reading and parsing all the files in a Darwin Core Archive, including the datasets and metadata; reading and parsing simple Darwin Core files; and validating Darwin Core Archives.

Version:

0.4.0

License:

MIT + file LICENSE

LazyData:

true

URL:

https://docs.ropensci.org/finch/, https://github.com/ropensci/finch

BugReports:

https://github.com/ropensci/finch/issues

Encoding:

UTF-8

Imports:

xml2 (≥ 1.0.0), EML (≥ 2.0.0), data.table (≥ 1.10.0), digest, hoardr (≥ 0.2.0)

Suggests:

testthat, crul, jsonlite

RoxygenNote:

7.1.1

X-schema.org-applicationCategory:

Biology

X-schema.org-keywords:

biology, occurrences, specimen, biodiversity, collections, species

X-schema.org-isPartOf:

https://ropensci.org

NeedsCompilation:

Packaged:

2020-08-10 21:48:08 UTC; sckott

Author:

Scott Chamberlain [aut, cre]

Maintainer:

Scott Chamberlain <myrmecocystus@gmail.com>

Repository:

CRAN

Date/Publication:

2020-08-11 04:50:02 UTC

finch

Description

Parse Darwin Core Archive files

Author(s)

Scott Chamberlain myrmecocystus@gmail.com

Convert a path or URL to a location object

Description

Convert a path or URL to a location object

Usage

as.location(x, ...)

## S3 method for class 'character'
as.location(x, ...)

## S3 method for class 'location'
as.location(x, ...)

## S3 method for class 'location'
print(x, ...)

Arguments

x

Input, a path or URL

...

Ignored.

Examples

# A zip file
file <- system.file("examples/0000154-150116162929234.zip",
  package = "finch")
as.location(file)

# A directory
dir <- system.file("examples/0000154-150116162929234",
  package = "finch")
as.location(dir)

# A URL
as.location("https://httpbin.org/get")

Caching

Description

Manage cached finch files with package hoardr

Details

The default cache directory is paste0(rappdirs::user_cache_dir(), "/R/finch"), but you can set your own path using cache_path_set().

cache_delete accepts only one file name, while cache_delete_all accepts no names and deletes all files. To delete several specific files, call cache_delete inside an lapply()-style call.
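The lapply() pattern for deleting several files can be sketched as below. This is a minimal illustration, not part of finch: delete_each() is a hypothetical helper, and base R's file.remove() stands in for the cache's single-file delete function so the example runs without a populated cache.

```r
# Hypothetical helper (not in finch): call a single-file delete
# function once for each path.
delete_each <- function(paths, delete_fun) {
  invisible(lapply(paths, delete_fun))
}

# Stand-in demonstration using temporary files and file.remove()
tmp <- file.path(tempdir(), c("a.zip", "b.zip", "keep.zip"))
file.create(tmp)
delete_each(tmp[1:2], file.remove)
file.exists(tmp)

# With finch loaded, the same pattern would be (not run here,
# paths are placeholders):
# delete_each(c("path/one.zip", "path/two.zip"), dwca_cache$delete)
```

Passing the delete function as an argument keeps the loop independent of where the files live.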

Useful user functions

dwca_cache$cache_path_get(): get cache path
dwca_cache$cache_path_set(): set cache path
dwca_cache$list(): returns a character vector of full path file names
dwca_cache$files(): returns file objects with metadata
dwca_cache$details(): returns files with details
dwca_cache$delete(): delete specific files
dwca_cache$delete_all(): delete all files, returns nothing

Examples

## Not run: 
dwca_cache

# list files in cache
dwca_cache$list()

# delete certain database files
# dwca_cache$delete("file path")
# dwca_cache$list()

# delete all files in cache
# dwca_cache$delete_all()
# dwca_cache$list()

# set a different cache path from the default
dwca_cache$cache_path_set(path = "finch", type = "tempdir")

## End(Not run)

This function is defunct.

Description

This function is defunct.

Usage

dwca_cache_delete(...)

This function is defunct.

Description

This function is defunct.

Usage

dwca_cache_delete_all(...)

This function is defunct.

Description

This function is defunct.

Usage

dwca_cache_details(...)

This function is defunct.

Description

This function is defunct.

Usage

dwca_cache_list(...)

Parse Darwin Core Archive

Description

Parse Darwin Core Archive

Usage

dwca_read(input, read = FALSE, ...)

Arguments

input

(character) Path to a local zip file or directory, or a URL. A URL must point to a zip file.

read

(logical) Whether or not to read in data files. If FALSE, we give back paths to files only. Default: FALSE

...

Further args passed on to data.table::fread()

Details

File reads can occasionally fail. We use data.table::fread() internally, which is very fast but does not handle every file. If a read fails, call dwca_read() with read = FALSE and read the data files manually.
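One way to read a data file manually is with base R. The sketch below is an assumption-laden illustration rather than finch's own method: read_dwca_table() is a hypothetical helper, and it assumes a tab-delimited file with a header row, which is common for Darwin Core Archives but is actually defined by the archive's meta.xml.

```r
# Hypothetical helper (not in finch): read a delimited Darwin Core
# data file with base R. Assumes a header row and, by default, tabs.
read_dwca_table <- function(path, sep = "\t") {
  utils::read.delim(
    path,
    sep = sep,
    quote = "",              # treat quote characters as ordinary text
    stringsAsFactors = FALSE,
    check.names = FALSE      # keep Darwin Core term names unchanged
  )
}

# Stand-in demonstration on a small temporary file
tmp_file <- tempfile(fileext = ".txt")
writeLines(c("id\tscientificName", "1\tPuma concolor", "2\tQuercus alba"),
           tmp_file)
df <- read_dwca_table(tmp_file)
str(df)

# With finch, the path would come from a call made with read = FALSE,
# assuming x$data then holds the file paths (not run here):
# x <- dwca_read(dir, read = FALSE)
# df <- read_dwca_table(x$data[[1]])
```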

When you pass in a URL, we use rappdirs to determine the cache path. If you pass the same URL again and your cache has not been cleared, we pull from the cache. Passing a file or directory on your local system bypasses caching and reads directly from that file or directory.
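The general idea of URL-keyed caching can be sketched in base R as follows. This is not finch's implementation: cached_fetch(), its file-naming scheme, and the fetch_fun argument are all hypothetical and chosen only to show the "download once, reuse afterwards" behavior.

```r
# Hypothetical sketch (not finch's implementation): download a URL
# only if no cached copy exists, then return the cached file path.
cached_fetch <- function(url, cache_dir, fetch_fun) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # Derive a file name from the URL (assumed naming scheme)
  key <- gsub("[^A-Za-z0-9]+", "_", url)
  dest <- file.path(cache_dir, paste0(key, ".zip"))
  if (!file.exists(dest)) {
    fetch_fun(url, dest)  # only reached on a cache miss
  }
  dest
}

# A real fetch function could wrap utils::download.file (not run here):
# fetch <- function(u, d) utils::download.file(u, d, mode = "wb")
# cached_fetch("https://example.org/archive.zip", tempdir(), fetch)
```

Calling cached_fetch() twice with the same URL invokes fetch_fun only the first time, which mirrors the behavior described above until the cache is cleared.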

Examples

## Not run: 
# set up a temporary directory for the example
dwca_cache$cache_path_set(path = "finch", type = "tempdir")

dir <- system.file("examples", "0000154-150116162929234", package = "finch")

# Don't read data in
(x <- dwca_read(dir, read=FALSE))
x$files
x$highmeta
x$dataset_meta[[1]]
x$data

# Read data
(x <- dwca_read(dir, read=TRUE))
head(x$data[[1]])

# Can pass in a zip file
zip <- system.file("examples", "0000154-150116162929234.zip",
  package = "finch")
(out <- dwca_read(zip))
out$files
out$highmeta
out$emlmeta
out$dataset_meta

# Can pass in zip file as a url
url <-
"https://github.com/ropensci/finch/blob/master/inst/examples/0000154-150116162929234.zip?raw=true"
(out <- dwca_read(url))

# another url
url <- "http://ipt.jbrj.gov.br/jbrj/archive.do?r=redlist_2013_taxons&v=3.12"
(out <- dwca_read(url))

## End(Not run)

Validate a Darwin Core Archive

Description

Validate a Darwin Core Archive

Usage

dwca_validate(x, ifModifiedSince = NULL, browse = FALSE, ...)

Arguments

x

(character) A URL for a Darwin Core Archive. If your Darwin Core Archive is local, upload it somewhere reachable by URL first. Required.

ifModifiedSince

(character) An optional ISO date (yyyy-mm-dd) to enable conditional GET requests, validating archives only if they have been modified since the given date. This feature requires the archive URL to honor the If-Modified-Since HTTP header. Apache web servers, for example, do this out of the box for static files, but if you use dynamic scripts to generate the archive on the fly the header might not be recognised. Optional.

browse

(logical) Browse to generated report or not. Default: FALSE

...

Curl options passed to crul::HttpClient

Details

Uses the GBIF DCA validator (http://tools.gbif.org/dwca-validator/)
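Because ifModifiedSince expects a yyyy-mm-dd string, it can help to build that string from a Date. The helper below is a hypothetical convenience, not part of finch, and the 30-day window is an arbitrary choice for illustration.

```r
# Hypothetical helper (not in finch): ISO date string for a day a
# given number of days before `today`.
iso_date_days_ago <- function(days, today = Sys.Date()) {
  format(today - days, "%Y-%m-%d")
}

since <- iso_date_days_ago(30)
since

# The string could then be passed on (not run here; needs finch and
# network access):
# dwca_validate("http://rs.gbif.org/datasets/german_sl.zip",
#   ifModifiedSince = since)
```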

Examples

## Not run: 
x <- "http://rs.gbif.org/datasets/german_sl.zip"
dwca_validate(x)

## End(Not run)

Defunct functions in finch

Description

dwca_cache_delete: Defunct - see dwca_cache
dwca_cache_delete_all: Defunct - see dwca_cache
dwca_cache_details: Defunct - see dwca_cache
dwca_cache_list: Defunct - see dwca_cache

Parse DarwinRecordSet and SimpleDarwinRecordSet files

Description

Parse DarwinRecordSet and SimpleDarwinRecordSet files

Usage

simple_read(file)

Arguments

file

(character) A path to a single simple Darwin Core file in XML format. Required.

Details

When reading a DarwinRecordSet, access the chunks by position rather than by name, since duplicate names are allowed in chunks.
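The reason is how R lists handle duplicate names: name-based lookup returns only the first match. The list below is a made-up stand-in for res$chunks, with hypothetical element names and values.

```r
# Made-up stand-in for a chunks list with a duplicated name
chunks <- list(Occurrence = "first", Occurrence = "second")

# Lookup by name silently returns only the first matching element
by_name <- chunks[["Occurrence"]]

# Lookup by position reaches every element
by_position <- chunks[[2]]

by_name
by_position
```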

Value

An S3 object of class dwc_recordset when a DarwinRecordSet is given, or of class dwc_simplerecordset when a SimpleDarwinRecordSet is given. In each case the object is really just a list with a lightweight S3 class attached for easy downstream usage. A summary is printed to the screen by default.

Examples

## Not run: 
# SimpleDarwinRecordSet examples
file <- system.file("examples", "example_simple.xml", package = "finch")
simple_read(file)
file <- system.file("examples", "example_simple_fossil.xml",
  package = "finch")
simple_read(file)

# DarwinRecordSet examples
file <- system.file("examples", "example_classes_observation.xml",
  package = "finch")
simple_read(file)

file <- system.file("examples", "example_classes_specimen.xml",
  package = "finch")
simple_read(file)

# access elements of the object
file <- system.file("examples", "example_classes_specimen.xml",
  package = "finch")
res <- simple_read(file)
## namespaces
res$meta
## locations
res$locations
## chunks, the first one
res$chunks[[1]]

## End(Not run)