Help for package rDataPipeline

Title:

Functions to Interact with the 'FAIR Data Pipeline'

Version:

0.60.0

Description:

R implementation of the 'FAIR Data Pipeline API'. The 'FAIR Data Pipeline' is intended to enable tracking of provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling.

License:

GPL (≥ 3)

Imports:

assertthat, cli, configr, dplyr, git2r, httr, jsonlite, openssl, R6, rhdf5, semver, stats, usethis, utils, yaml

Suggests:

units, testthat

biocViews:

rhdf5

Encoding:

UTF-8

RoxygenNote:

7.3.2

URL:

https://www.fairdatapipeline.org/rDataPipeline/, https://github.com/FAIRDataPipeline/rDataPipeline

BugReports:

https://github.com/FAIRDataPipeline/rDataPipeline/issues

NeedsCompilation:

Packaged:

2024-10-08 08:59:43 UTC; ryanfield

Author:

Sonia Mitchell [aut], Ryan Field [cre, aut]

Maintainer:

Ryan Field <ryan.field@glasgow.ac.uk>

Repository:

CRAN

Date/Publication:

2024-10-08 09:20:02 UTC

rDataPipeline

Description

FAIR Data Pipeline API

Details

For more information see https://www.fairdatapipeline.org/

Author(s)

Maintainer: Ryan Field <ryan.field@glasgow.ac.uk> (ORCID)

Authors:

Sonia Mitchell (ORCID)

add_read

Description

Add data product to read block of user-written config file. Used in combination with create_config() for unit testing.

Usage

add_read(
  path,
  data_product,
  component,
  version,
  use_data_product,
  use_component,
  use_version,
  use_namespace
)

Arguments

path

config file path

data_product

data_product field

component

component field

version

(optional) version field

use_data_product

(optional) use_data_product field

use_component

(optional) use_component field

use_version

(optional) use_version field

use_namespace

(optional) use_namespace field

Examples

## Not run: 
path <- "test_config/config.yaml"

# Write run_metadata block
create_config(path = path,
              description = "test",
              input_namespace = "test_user",
              output_namespace = "test_user")

# Write read block
add_read(path = path,
         data_product = "test/array",
         component = "level/a/s/d/f/s",
         version = "0.2.0")

## End(Not run)

add_write

Description

Add data product to write block of user-written config file. Used in combination with create_config() for unit testing.

Usage

add_write(
  path,
  data_product,
  description,
  version,
  file_type,
  use_data_product,
  use_component,
  use_version,
  use_namespace
)

Arguments

path

config file path

data_product

data_product field

description

description field

version

(optional) version field

file_type

(optional) file type field

use_data_product

(optional) use_data_product field

use_component

(optional) use_component field

use_version

(optional) use_version field

use_namespace

(optional) use_namespace field

Examples

## Not run: 
path <- "test_config/config.yaml"

# Write run_metadata block
create_config(path = path,
              description = "test",
              input_namespace = "test_user",
              output_namespace = "test_user")

# Write write block
add_write(path = path,
          data_product = "test/array",
          description = "data product description",
          version = "0.2.0")

## End(Not run)

check_config

Description

check_config

Usage

check_config(handle, data_product, what)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

data_product

what

what

check_dataproduct_exists

Description

If a data product already exists with the same name, version, and namespace, throw an error

Usage

check_dataproduct_exists(
  write_dataproduct,
  write_version,
  write_namespace_id,
  endpoint
)

Arguments

write_dataproduct

write_dataproduct

write_version

write_version

write_namespace_id

write_namespace_id

endpoint

endpoint

check_datetime

Description

check_datetime

Usage

check_datetime(table, this_field, query_class, this_query)

Arguments

table

a string specifying the name of the table

this_field

a string specifying the name of the field

query_class

a string specifying the class of the field

this_query

a string specifying the contents of the field

Check if entry exists in the data registry

Description

Check whether an entry already exists in a table (in the data registry)

Usage

check_exists(table, query)

Arguments

table

a string specifying the name of the table

query

a list containing a valid query for the table, e.g. list(field = value)

Value

Returns TRUE if the entry exists and FALSE if it doesn't

check_field

Description

check_field

Usage

check_field(table, this_field, query_class, this_query, method, endpoint)

Arguments

table

a string specifying the name of the table

this_field

a string specifying the name of the field

query_class

a string specifying the class of the field

this_query

a string specifying the contents of the field

method

a string specifying the method, c("GET", "POST")

endpoint

endpoint

check_fields

Description

check_fields

Usage

check_fields(table, query, method, endpoint)

Arguments

table

a string specifying the name of the table

query

a list containing the query

method

a string specifying the method, c("GET", "POST")

endpoint

endpoint

check_handle

Description

check_handle

Usage

check_handle(handle, data_product, what, component)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

what

element in handle – one of c("inputs", "outputs")

component

a string specifying the name of the component

check_integer

Description

check_integer

Usage

check_integer(table, this_field, query_class, this_query)

Arguments

table

a string specifying the name of the table

this_field

a string specifying the name of the field

query_class

a string specifying the class of the field

this_query

a string specifying the contents of the field

check_local_repo

Description

check_local_repo

Usage

check_local_repo(path)

Arguments

path

Local repository file path

Value

a boolean: TRUE if the local repository is clean, FALSE otherwise

check_string

Description

check_string

Usage

check_string(table, this_field, this_query)

Arguments

table

a string specifying the name of the table

this_field

a string specifying the name of the field

this_query

a string specifying the contents of the field

Check if table exists

Description

Check if table exists in the data registry

Usage

check_table_exists(table)

Arguments

table

a string specifying the name of the table

Value

Returns TRUE if a table exists, FALSE if it doesn't

check_yaml_write

Description

check_yaml_write

Usage

check_yaml_write(handle, data_product)

Arguments

handle

fdp object

data_product

a string specifying the name of the data product

Clean query

Description

Function to clean a query and return it without an api prefix

Usage

clean_query(data, endpoint)

Arguments

data

a list containing a valid query for the table, e.g. list(field = value)

endpoint

endpoint

create_config

Description

Generates (user generated) config.yaml files for unit tests. Use add_read() and add_write() functions to add read and write blocks.

Usage

create_config(
  path,
  description,
  input_namespace,
  output_namespace,
  write_data_store = file.path(tempdir(), "datastore", ""),
  force = TRUE,
  local_repo = "local_repo"
)

Arguments

path

config file path

description

description field

input_namespace

input_namespace field

output_namespace

output_namespace field

write_data_store

write_data_store field

force

force

local_repo

local_repo

create_index

Description

create_index

Usage

create_index(self)

Arguments

self

self

Create version number

Description

Creates a version number from a date and a version, from a date with major and patch numbers, or from major, minor, and patch numbers. If no parameters are supplied, the default version 0.1.0 is returned. This function prioritizes download_date and version over all other parameters.

Usage

create_version_number(
  download_date = NULL,
  version = NULL,
  major = 0,
  minor = "1",
  patch = 0
)

Arguments

download_date

(optional) a date or datetime that includes the full year, e.g. 2020-01-01 from Sys.Date() or 2020-01-01 00:00:00 BST from Sys.time(). Also accepts dates delimited by /, e.g. 01/02/2020 or 2020/01/01, and dates without delimiters, which are assumed to be ddmmyyyy or yyyymmdd, e.g. 20200201.

version

version number using major.minor.patch numbering e.g. 0.1.0 or major.patch e.g. 0.0

major

major number if not using version

minor

minor number if not using date

patch

patch number if not using version

Value

returns a character vector in the format of major.minor.patch e.g. 0.20200101.0
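
The date-based format in the Value above can be illustrated with a short base R sketch. It does not call the package and is not its implementation; sketch_version() is a hypothetical helper that only mirrors the documented output shape, where a supplied date becomes the minor field in yyyymmdd form.

```r
# Hypothetical helper (not part of rDataPipeline): builds a
# major.minor.patch string, using the date as the minor field when given.
sketch_version <- function(download_date = NULL,
                           major = 0,
                           minor = "1",
                           patch = 0) {
  if (!is.null(download_date)) {
    # Collapse the date to yyyymmdd, e.g. 2020-01-01 becomes "20200101"
    minor <- format(as.Date(download_date), "%Y%m%d")
  }
  paste(major, minor, patch, sep = ".")
}

sketch_version()                       # "0.1.0"
sketch_version(as.Date("2020-01-01"))  # "0.20200101.0"
```

The sketch ignores the version argument and the alternative date formats that create_version_number() is documented to accept.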

Download source file from database

Description

Download source file from database

Usage

download_from_database(
  source_root,
  source_path,
  path,
  filename,
  overwrite = FALSE
)

Arguments

source_root

a string specifying the source root

source_path

a string specifying the source path

path

a string specifying where the file will be saved

filename

a string specifying the filename (the name given to the saved file)

overwrite

a boolean specifying whether or not the file should be overwritten if it already exists

Download source file from URL

Description

This function will download a file from a url

Usage

download_from_url(source_root, source_path, path, filename)

Arguments

source_root

a string specifying the source root

source_path

a string specifying the source path

path

a string specifying where the file will be saved

filename

a string specifying the filename (the name given to the saved file)

extract_id

Description

extract_id

Usage

extract_id(url, endpoint)

Arguments

url

url

endpoint

endpoint

fair_init

Description

fair_init

Usage

fair_init(name, identifier, endpoint = "http://127.0.0.1:8000/api/")

Arguments

name

a string specifying the full name or organisation name of the author; note that at least one of name or identifier must be specified

identifier

(optional) a string specifying the full URL identifier (e.g. ORCiD or ROR ID) of the author

endpoint

a string specifying the registry endpoint

fair_run

Description

fair_run

Usage

fair_run(
  path = "config.yaml",
  endpoint = "http://127.0.0.1:8000/api/",
  skip = FALSE
)

Arguments

path

a string specifying the path to the config file

endpoint

a string specifying the registry endpoint

skip

a boolean; if TRUE, skip the check for whether the repo is clean

fdp-class

Description

fdp-class

Details

Container for class fdp

Public fields

yaml: a list containing the contents of the working config.yaml
fdp_config_dir: a string specifying the directory passed from `fair run`
model_config: a string specifying the URL of an entry in the object table associated with the storage_location of the working config.yaml
submission_script: a string specifying the URL of an entry in the object table associated with the storage_location of the submission script
code_repo: a string specifying the URL of an entry in the object table associated with the GitHub repository
code_run: a string specifying the URL of an entry in the code_run table
inputs: a data.frame containing metadata associated with code_run inputs
outputs: a data.frame containing metadata associated with code_run outputs
issues: a data.frame containing metadata associated with code_run issues

Methods

Method `new()`

Create a new fdp object

Usage

fdp$new(
  yaml,
  fdp_config_dir,
  model_config,
  submission_script,
  code_repo,
  code_run
)

Arguments

yaml: a list containing the contents of the working config.yaml
fdp_config_dir: a string specifying the directory passed from `fair run`
model_config: a string specifying the URL of an entry in the object table associated with the storage_location of the working config.yaml
submission_script: a string specifying the URL of an entry in the object table associated with the storage_location of the submission script
code_repo: a string specifying the URL of an entry in the object table associated with the GitHub repository
code_run: a string specifying the URL of an entry in the code_run table

Returns

Returns a new fdp object

Method `print()`

Print method

Usage

fdp$print(...)

Arguments

...: additional parameters, currently none are used

Method `input()`

Record code_run inputs in fdp object

Usage

fdp$input(
  data_product,
  use_data_product,
  use_component,
  use_version,
  use_namespace,
  path,
  component_url
)

Arguments

data_product: a string specifying the name of the data product, used as a reference
use_data_product: a string specifying the name of the data product, used as input in the code_run
use_component: a string specifying the name of the data product component, used as input in the code_run
use_version: a string specifying the data product version, used as input in the code_run
use_namespace: a string specifying the namespace in which the data product resides, used as input in the code_run
path: a string specifying the location of the data product in the local data store
component_url: a string specifying the URL of an entry in the object_component table

Returns

Returns an updated fdp object

Method `output()`

Record code_run outputs in fdp object

Usage

fdp$output(
  data_product,
  use_data_product,
  use_component,
  use_version,
  use_namespace,
  path,
  data_product_description,
  component_description,
  public
)

Arguments

data_product: a string specifying the name of the data product, used as a reference
use_data_product: a string specifying the name of the data product, used as output in the code_run
use_component: a string specifying the name of the data product component, used as output in the code_run
use_version: a string specifying the version of the data product, used as output in the code_run
use_namespace: a string specifying the namespace in which the data product resides, used as output in the code_run
path: a string specifying the location of the data product in the local data store
data_product_description: a string containing a description of the data product
component_description: a string containing a description of the data product component
public

Returns

Returns an updated fdp object

Method `output_index()`

Return index of data product recorded in fdp object so that an issue may be attached

Usage

fdp$output_index(data_product, component, version, namespace)

Arguments

data_product: a string specifying the name of the data product, used as output in the code_run
component: a string specifying the name of the data product component, used as output in the code_run
version: a string specifying the name of the data product version, used as output in the code_run
namespace: a string specifying the namespace in which the data product resides, used as input in the code_run

Returns

Returns an index used to identify the data product

Method `raise_issue()`

Record issue in fdp object

Usage

fdp$raise_issue(
  index,
  type,
  use_data_product,
  use_component,
  use_version,
  use_namespace,
  issue,
  severity
)

Arguments

index: a numeric index, used to identify each input and output in the fdp object
type: a string specifying the type of issue (one of "data", "config", "script", "repo")
use_data_product: a string specifying the name of the data product, used as output in the code_run
use_component: a string specifying the name of the data product component, used as output in the code_run
use_version: a string specifying the name of the data product version, used as output in the code_run
use_namespace: a string specifying the namespace in which the data product resides, used as input in the code_run
issue: a string containing a free text description of the issue
severity: an integer specifying the severity of the issue

Returns

Returns an updated fdp object

Method `finalise_output_hash()`

Record file hash and update path name in fdp object

Usage

fdp$finalise_output_hash(
  use_data_product,
  use_data_product_runid,
  use_version,
  use_namespace,
  hash,
  new_path,
  data_product_url,
  delete_if_duplicate = FALSE
)

Arguments

use_data_product: a string specifying the name of the data product, used as output in the code_run
use_data_product_runid: a string specifying the name of the data product, the same as use_data_product excluding the RUN_ID variable
use_version: a string specifying the name of the data product version, used as output in the code_run
use_namespace: a string specifying the namespace in which the data product resides, used as input in the code_run
hash: a string specifying the hash of the file
new_path: a string specifying the updated location (filename is now the hash of the file) of the data product in the local data store
data_product_url: a string specifying the URL of an object associated with the data_product
delete_if_duplicate: (optional) default is FALSE

Returns

Returns an updated fdp object

Method `finalise_output_url()`

Record data_product and component URLs in fdp object

Usage

fdp$finalise_output_url(
  use_data_product,
  use_component,
  use_version,
  use_namespace,
  component_url
)

Arguments

use_data_product: a string specifying the name of the data product, used as output in the code_run
use_component: a string specifying the name of the data product component, used as output in the code_run
use_version: a string specifying the name of the data product version, used as output in the code_run
use_namespace: a string specifying the namespace in which the data product resides, used as input in the code_run
component_url: a string specifying the URL of an entry in the object_component table

Returns

Returns an updated fdp object

Method `clone()`

The objects of this class are cloneable with this method.

Usage

fdp$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

fdp_resolve_read

Description

fdp_resolve_read

Usage

fdp_resolve_read(this_read, yaml)

Arguments

this_read

this_read

yaml

user written config file

fdp_resolve_write

Description

fdp_resolve_write

Usage

fdp_resolve_write(this_write, yaml)

Arguments

this_write

this_write

yaml

user written config file

Finalise code run

Description

Finalise Code Run and push associated metadata to the local registry.

Usage

finalise(handle, delete_if_empty = FALSE, delete_if_duplicate = FALSE)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

delete_if_empty

(optional) default is FALSE; see Details

delete_if_duplicate

(optional) default is FALSE; see Details

Details

If a Code Run does not read an input, write an output, or attach an issue, then delete the Code Run entry when delete_if_empty is set to TRUE.

If a data product has the same hash as a previous version, remove it from the registry when delete_if_duplicate is set to TRUE.

Find matching read aliases in config file

Description

Find read aliases in working config that match wildcard string

Usage

find_read_match(handle, data_product)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the data product name

Find matching write aliases in config file

Description

Find write aliases in working config that match wildcard string

Usage

find_write_match(handle, data_product)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the data product name

findme

Description

Returns metadata associated with the calculated hash of a target file. When multiple entries exist in the data registry all are returned.

Usage

findme(file, endpoint)

Arguments

file

file path

endpoint

endpoint

get_author_url

Description

get_author_url

Usage

get_author_url(endpoint)

Arguments

endpoint

a string specifying the registry endpoint

Get H5 file components

Description

Returns the names of the items at the root of the file

Usage

get_components(filename)

Arguments

filename

a string specifying a filename

Value

Returns the names of the items at the root of the file

get_dataproduct

Description

get_dataproduct

Usage

get_dataproduct(
  data_product,
  version,
  namespace,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

data_product

data_product

version

version

namespace

namespace

endpoint

endpoint

Get entity from url

Description

Get entity from url

Usage

get_entity(url)

Arguments

url

a string specifying the url of an entry

Return all fields associated with a table entry in the data registry

Description

Return all fields associated with a table entry in the data registry

Usage

get_entry(table, query, endpoint = "http://127.0.0.1:8000/api/")

Arguments

table

a string specifying the name of the table

query

a list containing a valid query for the table, e.g. list(field = value)

endpoint

a string specifying the registry endpoint

Value

Returns a list of fields present in the specified entry
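
To make the list(field = value) query form more concrete, the base R sketch below shows one way such a list could map onto a request URL for a table. It is an illustration only: sketch_query_url() is a hypothetical helper, and the endpoint/table/?field=value layout is an assumption about the registry's REST interface rather than something stated in this manual.

```r
# Hypothetical helper (not part of rDataPipeline): turns a table name and a
# named query list into a URL of the assumed form endpoint/table/?field=value
sketch_query_url <- function(table,
                             query = list(),
                             endpoint = "http://127.0.0.1:8000/api/") {
  url <- paste0(endpoint, table, "/")
  if (length(query) == 0) {
    return(url)
  }
  # Percent-encode each value so spaces and reserved characters are safe
  values <- vapply(query, function(v) {
    utils::URLencode(as.character(v), reserved = TRUE)
  }, character(1))
  paste0(url, "?", paste0(names(query), "=", unname(values), collapse = "&"))
}

sketch_query_url("namespace", list(name = "test_user"))
```

With the package installed and a local registry running, the equivalent lookup would be get_entry("namespace", list(name = "test_user")).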

Return all entries posted to a table in the data registry

Description

Get entries (from the data registry) in a particular table

Usage

get_existing(
  table,
  limit_results = TRUE,
  detail = "all",
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

table

a string specifying the name of the table

limit_results

a boolean specifying whether or not to limit the results, default is TRUE

detail

a string specifying what level of detail to return; options are "all" for all details or "id" for just URL and IDs

endpoint

a string specifying the registry endpoint

Value

Returns a data.frame of entries in table, default is limited to 100 entries

Get fields from table

Description

Use API endpoint to produce a list of fields for a table. Requires API key.

Usage

get_fields(table, endpoint = "http://127.0.0.1:8000/api/")

Arguments

table

a string specifying the name of the table

endpoint

a string specifying the registry endpoint

Value

Returns a data.frame of fields and their attributes set to "none"

Calculate hash from file

Description

Returns the SHA1 hash of a given file

Usage

get_file_hash(filename)

Arguments

filename

a string specifying a filename
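
As a rough illustration of what a SHA1 file hash looks like, the sketch below computes one with the openssl package, which is listed under Imports. Whether get_file_hash() is implemented this way is an assumption; sketch_file_hash() is a hypothetical helper, and the usage lines are skipped when openssl is not installed.

```r
# Hypothetical helper (not part of rDataPipeline): SHA1 of a file's contents
# returned as a 40-character lowercase hex string.
sketch_file_hash <- function(filename) {
  # openssl::sha1() stream-hashes a connection and returns a raw digest
  digest <- openssl::sha1(file(filename))
  paste(as.character(unclass(digest)), collapse = "")
}

if (requireNamespace("openssl", quietly = TRUE)) {
  tmp <- tempfile()
  writeLines("example contents", tmp)
  print(sketch_file_hash(tmp))
}
```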

Get current GitHub hash

Description

Get the hash of the latest commit in the master branch of a particular repository. This function assumes git is installed and located in the System PATH.

Usage

get_github_hash(repo)

Arguments

repo

a string specifying the github username/repository

Get ID

Description

Retrieve IDs for particular entries or all entries in a table

Usage

get_id(table, query = list(), endpoint = "http://127.0.0.1:8000/api/")

Arguments

table

a string specifying the name of the table

query

a list containing a valid query for the table, e.g. list(field = value)

endpoint

a string specifying the registry endpoint

Value

Returns a string or list of strings specifying the URL or URLs of entries in a table

get_index

Description

get_index

Usage

get_index(write, data_product)

Arguments

write

write

data_product

data_product

get_max_version

Description

If entry doesn't exist in the registry, return version 0.0.0

Usage

get_max_version(data_product, namespace_id)

Arguments

data_product

data_product

namespace_id

namespace_id

Get storage location from url

Description

Get storage location entry

Usage

get_storage_location(location)

Arguments

location

the url of an entry in the storage_location table

Value

Returns a list of fields associated with the specified entry

Get optional fields

Description

Get optional fields

Usage

get_table_optional(table, endpoint)

Arguments

table

a string specifying the name of the table

endpoint

a string specifying the registry endpoint

Value

Returns a data.frame of optional fields and their properties

Get queryable fields

Description

Get queryable fields

Usage

get_table_queryable(table, endpoint)

Arguments

table

a string specifying the name of the table

endpoint

a string specifying the registry endpoint

Value

Returns a character vector of queryable fields

Get readable fields

Description

Get readable fields

Usage

get_table_readable(table, endpoint)

Arguments

table

name of table

endpoint

a string specifying the registry endpoint

Value

a dataframe of readable fields and their properties

Get required fields

Description

Get required fields

Usage

get_table_required(table)

Arguments

table

name of table

Value

a dataframe of required fields and their properties

Get writable fields

Description

Get writable fields

Usage

get_table_writable(table, endpoint)

Arguments

table

a string specifying the name of the table

endpoint

a string specifying the registry endpoint

Value

Returns a character vector of writable fields

Get tables from registry

Description

Use api endpoint to produce a list of tables

Usage

get_tables(endpoint = "http://127.0.0.1:8000/api/")

Arguments

endpoint

a string specifying the registry endpoint

Value

a character vector of tables

get_token

Description

get_token

Usage

get_token()

Get URL

Description

Retrieve URLs for particular entries or all entries in a table

Usage

get_url(table, query = list(), endpoint = "http://127.0.0.1:8000/api/")

Arguments

table

a string specifying the name of the table

query

a list containing a valid query for the table, e.g. list(field = value)

endpoint

a string specifying the registry endpoint

Value

Returns a string or list of strings specifying the URL or URLs of entries in a table

List files in GitHub repository

Description

List files in GitHub repository

Usage

github_files(repo)

Arguments

repo

a string specifying the github username/repository

increment_filename

Description

Searches directory for duplicate files and increments filename.

Usage

increment_filename(path)

Arguments

path

path

Initialise code run

Description

Reads in a working config file, generates new Code Run entry, and returns a handle containing various metadata.

Usage

initialise(config, script)

Arguments

config

a string specifying the location of the working config file in the data store

script

a string specifying the location of the submission script in the data store

Value

Returns an object of class fdp, R6 containing metadata required by the Data Pipeline API

Check whether fields are queryable

Description

Check whether fields are queryable

Usage

is_queryable(table, query, method, endpoint)

Arguments

table

a string specifying the name of the table

query

a list containing the query

method

a string specifying the method, c("GET", "POST")

endpoint

endpoint

Value

Returns TRUE if the entry is queryable and FALSE if it isn't

Link path to external format data

Description

Link path to external format data

Usage

link_read(handle, data_product)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string representing an external object in the config.yaml file

Value

Returns a string specifying the location of the data product to be read

Link path for external format data

Description

Link path for external format data

Usage

link_write(handle, data_product)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string representing an external object in the config.yaml file

Value

Returns a string specifying the location in which the data product should be written
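
The functions initialise(), link_read(), link_write(), and finalise() are documented separately, so a sketch of how they might be combined in one submission script may help. Everything specific here is an assumption for illustration: the FDP_CONFIG_DIR environment variable, the config.yaml and script.sh file names, the data product names, and the run_step() wrapper. The call is skipped unless the package is installed and the config file exists; it also needs a running local registry.

```r
# Sketch only: run_step() is a hypothetical wrapper, and the data product
# names "test/input" and "test/output" are placeholders.
run_step <- function(config, script) {
  # Read the working config and register a new Code Run
  handle <- rDataPipeline::initialise(config, script)

  # Resolve where the input lives and where the output should be written
  input_path <- rDataPipeline::link_read(handle, "test/input")
  output_path <- rDataPipeline::link_write(handle, "test/output")

  # Placeholder processing step: copy the input to the output location
  file.copy(input_path, output_path, overwrite = TRUE)

  # Push the Code Run metadata to the local registry
  rDataPipeline::finalise(handle)
}

# Assumed locations of the working config and submission script
config_dir <- Sys.getenv("FDP_CONFIG_DIR")
config <- file.path(config_dir, "config.yaml")
script <- file.path(config_dir, "script.sh")

if (requireNamespace("rDataPipeline", quietly = TRUE) && file.exists(config)) {
  run_step(config, script)
}
```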

Post entry to author table

Description

Upload information to the author table in the data registry

Usage

new_author(name, identifier, endpoint = "http://127.0.0.1:8000/api/")

Arguments

name

a string specifying the full name or organisation name of the author; note that at least one of name or identifier must be specified

identifier

(optional) a string specifying the full URL identifier (e.g. ORCiD or ROR ID) of the author

endpoint

a string specifying the registry endpoint

Post entry to code_repo_release table

Description

Upload information to the code_repo_release table in the data registry

Usage

new_code_repo_release(
  name,
  version,
  object_url,
  website,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

name

a string specifying the name of an official release of code

version

a string specifying the version release (conforming with semantic versioning syntax)

object_url

a string specifying the URL of an object

website

(optional) a string specifying the URL of the website for this code release

endpoint

a string specifying the registry endpoint

Post entry to code_run table

Description

Upload information to the code_run table in the data registry

Usage

new_code_run(
  run_date,
  description,
  code_repo_url,
  model_config_url,
  submission_script_url,
  inputs_urls = list(),
  outputs_urls = list(),
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

run_date

the date-time of the code_run e.g. Sys.time() or "2010-07-11 12:15:00 BST"

description

(optional) a string containing a free text description of the code_run

code_repo_url

(optional) a string specifying the URL of an object associated with the code_repo_release that was run

model_config_url

(optional) a string specifying the URL of an object associated with the working config file used for the code_run

submission_script_url

(optional) a string specifying the URL of an object associated with the submission script used for the code_run

inputs_urls

a list of object_component URLs referencing code_run inputs

outputs_urls

a list of object_component URLs referencing code_run outputs

endpoint

a string specifying the registry endpoint

Post entry to data_product table

Description

Upload information to the data_product table in the data registry

Usage

new_data_product(
  name,
  version,
  object_url,
  namespace_url,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

name

a string specifying the name of the data_product

version

a string specifying the version identifier of the data_product (must conform to semantic versioning syntax)

object_url

a string specifying the URL of the entry in the object table

namespace_url

a string specifying the URL of the entry in the namespace table

endpoint

a string specifying the registry endpoint
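
Because version must conform to semantic versioning syntax, a quick local check before posting can catch malformed strings. The base R sketch below tests only the core major.minor.patch form and ignores pre-release and build suffixes; is_core_semver() is a hypothetical helper, not a function of this package or of the semver package.

```r
# Hypothetical helper: TRUE when x is three dot-separated non-negative
# integers without leading zeros, e.g. "0.2.0"
is_core_semver <- function(x) {
  pattern <- "^(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)\\.(0|[1-9][0-9]*)$"
  grepl(pattern, x)
}

is_core_semver(c("0.2.0", "0.20200101.0", "0.2", "v1.0.0"))
# TRUE TRUE FALSE FALSE
```

Date-based versions such as 0.20200101.0, as produced by create_version_number(), pass this check.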

Post entry to external_object table

Description

Upload information to the external_object table in the data registry

Usage

new_external_object(
  doi_or_unique_name,
  primary_not_supplement = TRUE,
  release_date,
  title,
  description,
  data_product_url,
  original_store_url,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

doi_or_unique_name

a string specifying the DOI or name of the external_object

primary_not_supplement

(optional) a boolean flag to indicate whether the external object is the primary source (TRUE) or not (FALSE)

release_date

the date-time that the external_object was released e.g. Sys.time() or "2010-07-11 12:15:00 BST"

title

a string specifying the title of the external_object

description

(optional) a string containing a free text description of the external_object

data_product_url

a string specifying the URL of an entry in the data_product table

original_store_url

(optional) a string specifying the URL of an entry in the storage_location table that references the original location of an external_object

endpoint

a string specifying the registry endpoint

Post entry to file_type table

Description

Upload information to the file_type table in the data registry

Usage

new_file_type(name, extension, endpoint = "http://127.0.0.1:8000/api/")

Arguments

name

a string specifying the name of the file type

extension

a string specifying the filename extension

endpoint

a string specifying the registry endpoint

Post entry to issue table

Description

Upload information to the issue table in the data registry

Usage

new_issue(
  severity,
  description,
  component_issues,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

severity

an integer specifying the severity of the issue

description

a string containing a free text description of the issue

component_issues

a list of object_component URLs with which the issue is associated; this can be an empty list

endpoint

a string specifying the registry endpoint

Post entry to keyword table

Description

Upload information to the keyword table in the data registry

Usage

new_keyword(
  object_url,
  keyphrase,
  identifier,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

object_url

a string specifying the URL of an object

keyphrase

a string containing a free text key phrase

identifier

(optional) a string specifying the URL of ontology annotation to associate with this keyword

endpoint

a string specifying the registry endpoint

Post entry to licence table

Description

Upload information to the licence table in the data registry

Usage

new_licence(object_url, licence_info, endpoint = "http://127.0.0.1:8000/api/")

Arguments

object_url

a string specifying the URL of an object

licence_info

a free text string containing information about the licence

endpoint

a string specifying the registry endpoint

Post entry to namespace table

Description

Upload information to the namespace table in the data registry

Usage

new_namespace(
  name,
  full_name,
  website,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

name

a string specifying the name of the namespace

full_name

(optional) a string specifying the full name of the namespace

website

(optional) a string specifying the website URL associated with the namespace

endpoint

a string specifying the registry endpoint

Post entry to object table

Description

Upload information to the object table in the data registry

Usage

new_object(
  description,
  storage_location_url,
  authors_url,
  file_type_url,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

description

(optional) a string containing a free text description of the object

storage_location_url

(optional) a string specifying the URL of an entry in the storage_location table

authors_url

(optional) a list of author URLs associated with this object

file_type_url

(optional) a string specifying the URL of an entry in the file_type table

endpoint

a string specifying the registry endpoint

Post entry to object_component table

Description

Upload information to the object_component table in the data registry

Usage

new_object_component(
  object_url,
  name,
  description,
  whole_object = FALSE,
  issues_urls,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

object_url

a string specifying the URL of an existing object

name

a string specifying the name of the object_component, unique in the context of object_component and its object reference

description

(optional) a string containing a free text description of the object_component

whole_object

a boolean flag specifying whether this object_component refers to the whole object (default is FALSE)

issues_urls

(optional) a list of issue URLs to associate with this object_component

endpoint

a string specifying the registry endpoint

Note that the object_component table contains issues as an additional optional field. This is not included here. Instead use attach_issue() and associated functionality to attach issues to objects and object components.

Post entry to quality_controlled table

Description

Upload information to the quality_controlled table in the data registry

Usage

new_quality_controlled(object_url, endpoint = "http://127.0.0.1:8000/api/")

Arguments

object_url

a string specifying the URL of an object

endpoint

a string specifying the registry endpoint

Post entry to storage_location table

Description

Upload information to the storage_location table in the data registry

Usage

new_storage_location(
  path,
  hash,
  public,
  storage_root_url,
  endpoint = "http://127.0.0.1:8000/api/"
)

Arguments

path

a string specifying the path from the storage_root URI to the item location, which when appended to storage_root URI produces a complete URL

hash

a string specifying the SHA1 hash of a file stored in storage_location

public

a boolean indicating whether the storage_location is public or not (default is TRUE)

storage_root_url

a string specifying the URL of an entry in the storage_root table

endpoint

a string specifying the registry endpoint

Post entry to storage_root table

Description

Upload information to the storage_root table in the data registry

Usage

new_storage_root(root, local, endpoint = "http://127.0.0.1:8000/api/")

Arguments

root

a string specifying the URI of the storage root, which when prepended to a storage_location path produces a complete URI to a file

local

(optional) a boolean indicating whether the storage_root is local or not

endpoint

a string specifying the registry endpoint
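The root argument here and the path argument of new_storage_location() are described as two halves of one address. A minimal base R sketch of that relationship follows. Both values are placeholders, and the trailing slash on the root is an assumption of this sketch rather than a documented requirement.

```r
# Placeholder values; neither is taken from a real registry.
root <- "https://example.org/data/"
path <- "records/example/0.1.0.h5"

# Prepending the root to the path gives the full location of the file
full_uri <- paste0(root, path)
full_uri
```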

Post entry to user_author table

Description

Upload information to the user_author table in the data registry

Usage

new_user_author(user_url, author_url, endpoint = "http://127.0.0.1:8000/api/")

Arguments

user_url

a string specifying the URL of an existing user

author_url

a string specifying the URL of an existing author

endpoint

a string specifying the registry endpoint

Check whether paper exists

Description

Check whether paper is in the data registry

Usage

paper_exists(doi)

Arguments

doi

a string specifying the DOI of the paper

Patch entry in data registry

Description

Patch entry in data registry

Usage

patch_data(url, data)

Arguments

url

a string specifying the URL of the registry entry to patch

data

the data to patch into the entry

Post entry to data registry

Description

Post data to registry

Usage

post_data(table, data, endpoint)

Arguments

table

table name as a character

data

data as a named list

endpoint

a string specifying the registry endpoint
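As an illustration of the "named list" form of the data argument, the sketch below builds one using the field names documented for new_namespace(). The table name "namespace" and all values are assumptions made for illustration; the call is commented out because it needs rDataPipeline and a running local registry.

```r
# Field names follow the new_namespace() arguments; values are placeholders.
data <- list(
  name = "test_user",
  full_name = "Test User",
  website = "https://example.org"
)
table <- "namespace"  # assumed table name, given as a character string
endpoint <- "http://127.0.0.1:8000/api/"

## Not run:
# rDataPipeline::post_data(table = table, data = data, endpoint = endpoint)
```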

raise_issue

Description

raise_issue

Usage

raise_issue(
  index,
  handle,
  component = NA,
  data_product,
  issue,
  severity,
  whole_object = FALSE
)

Arguments

index

index returned from link_*(), read_*(), or write_*()

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

component

a string specifying the component name

data_product

a string specifying the data product name

issue

a string specifying the issue

severity

a numeric value specifying the severity

whole_object

a boolean flag specifying whether or not to reference the whole_object
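One way these arguments might be used is sketched below. The wrapper flag_component() is hypothetical. It assumes that handle is an existing fdp object and that index alone is enough to identify the component, so component and data_product are left unset. The issue text and severity value are placeholders.

```r
# Hypothetical wrapper: `handle` is assumed to be an existing fdp object and
# `index` a value returned by a write_*(), read_*(), or link_*() call.
flag_component <- function(handle, index) {
  rDataPipeline::raise_issue(
    index = index,
    handle = handle,
    issue = "Some values look implausible",
    severity = 7
  )
}
```

Defining the wrapper does not contact the registry; only calling it does.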

Raise issue with config file

Description

Raise issue with config file

Usage

raise_issue_config(handle, issue, severity)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

issue

a string specifying the issue

severity

a numeric value specifying the severity

Raise issue with remote repository

Description

Raise issue with remote repository

Usage

raise_issue_repo(handle, issue, severity)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

issue

a string specifying the issue

severity

a numeric value specifying the severity

Raise issue with submission script

Description

Raise issue with submission script

Usage

raise_issue_script(handle, issue, severity)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

issue

a string specifying the issue

severity

a numeric value specifying the severity

random_hash

Description

Generates a random hash

Usage

random_hash()

Read array component from HDF5 file

Description

Function to read array type data from hdf5 file.

Usage

read_array(handle, data_product, component)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying a data product

component

a string specifying a data product component

Value

Returns an array with attached Dimension_i_title, Dimension_i_units, Dimension_i_values, and units attributes, if available
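The attributes listed above can be read back with attr(). The sketch below uses a mock matrix in place of a real read_array() result, so it runs without a registry. The concrete attribute names substitute i = 1, 2 into the documented pattern, which is an assumption about how the index is written.

```r
# Mock object standing in for the return value of read_array()
x <- matrix(1:6, nrow = 2)
attr(x, "Dimension_1_title") <- "age group"
attr(x, "Dimension_2_title") <- "week"
attr(x, "units") <- "count"

# Collect the dimension titles, using NA where an attribute is absent
titles <- vapply(
  seq_along(dim(x)),
  function(i) {
    title <- attr(x, paste0("Dimension_", i, "_title"), exact = TRUE)
    if (is.null(title)) NA_character_ else title
  },
  character(1)
)
titles
```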

Read distribution component from TOML file

Description

Function to read distribution type data from toml file.

Usage

read_distribution(handle, data_product, component)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying a data product

component

a string specifying a data product component

Read estimate component from TOML file

Description

Function to read point-estimate type data from toml file.

Usage

read_estimate(handle, data_product, component)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying a data product

component

a string specifying a data product component

Read table component from HDF5 file

Description

Function to read table type data from hdf5 file.

Usage

read_table(handle, data_product, component)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying a data product

component

a string specifying a data product component

Value

Returns a data.frame with attached column_units attributes, if available

register_issue_dataproduct

Description

register_issue_dataproduct

Usage

register_issue_dataproduct(handle, this_issue)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

this_issue

this_issue

register_issue_script

Description

register_issue_script

Usage

register_issue_script(handle, this_issue, type)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

this_issue

this_issue

type

type

remove_empty_parents

Description

remove_empty_parents

Usage

remove_empty_parents(path, root)

Arguments

path

path

root

root

resolve_read

Description

resolve_read

Usage

resolve_read(handle, data_product, component = NA)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

component

a string specifying the name of data product component

resolve_version

Description

resolve_version

Usage

resolve_version(version, data_product, namespace_id)

Arguments

version

version number

data_product

data_product

namespace_id

namespace_id

resolve_write

Description

resolve_write

Usage

resolve_write(handle, data_product, file_type)

Arguments

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

file_type

(optional) a string specifying the file type; when missing, file_type will be read from the config file

Validate fields

Description

Function to validate fields in post data

Usage

validate_fields(table, data, endpoint)

Arguments

table

table name as character

data

data as a named list

endpoint

endpoint

Value

Returns

Write array component to HDF5 file

Description

Function to populate hdf5 file with array type data.

Usage

write_array(
  array,
  handle,
  data_product,
  component,
  description,
  dimension_names,
  dimension_values,
  dimension_units,
  units
)

Arguments

array

an array containing the data

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

component

a string specifying a location within the hdf5 file

description

a string describing the data product component

dimension_names

a list where each element is a vector containing the labels associated with a particular dimension (e.g. element 1 corresponds to dimension 1, which corresponds to row names) and the name of each element describes the contents of each dimension (e.g. age classes).

dimension_values

(optional) a list of values corresponding to each dimension (e.g. list element 2 corresponds to columns)

dimension_units

(optional) a list of units corresponding to each dimension (e.g. list element 2 corresponds to columns)

units

(optional) a string specifying the units of the data as a whole

Value

Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
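The structure of dimension_names can be easier to see with a concrete case. The sketch below builds a 2 x 3 matrix and a matching list in base R. The labels, data product name, and component name are placeholders, and the length check is this sketch's own rather than a documented requirement. The write_array() call is commented out because it needs an existing fdp handle.

```r
x <- matrix(1:6, nrow = 2, ncol = 3)

# One element per dimension: element 1 labels rows, element 2 labels columns.
# The element names describe what each dimension holds.
dimension_names <- list(
  `age classes` = c("0-17", "18+"),
  week = c("w1", "w2", "w3")
)

# Sanity check: one label per index along each dimension
stopifnot(
  length(dimension_names) == length(dim(x)),
  all(lengths(dimension_names) == dim(x))
)

## Not run:
# `handle` is assumed to be an existing fdp object
# index <- rDataPipeline::write_array(
#   array = x, handle = handle,
#   data_product = "test/array", component = "component1",
#   description = "Example array", dimension_names = dimension_names
# )
```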

Write distribution component to TOML file

Description

Write distribution component to TOML file

Usage

write_distribution(
  distribution,
  parameters,
  handle,
  data_product,
  component,
  description
)

Arguments

distribution

a string specifying the name of the distribution

parameters

a list specifying the distribution parameters

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

component

a string specifying a location within the toml file

description

a string describing the data product component

Value

Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
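A sketch of how the distribution and parameters arguments might look is given below. The distribution name, the parameter names, and their values are illustrative assumptions, as are the data product and component names. The call is commented out because it needs an existing fdp handle.

```r
# Placeholder distribution: name and parameter names are assumptions
distribution <- "Gaussian"
parameters <- list(mean = -16.08, SD = 30)

## Not run:
# `handle` is assumed to be an existing fdp object
# index <- rDataPipeline::write_distribution(
#   distribution = distribution, parameters = parameters,
#   handle = handle, data_product = "test/distribution",
#   component = "component1", description = "Example distribution"
# )
```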

Write estimate component to TOML file

Description

Function to populate toml file with point-estimate type data. If a file already exists at the specified location, an additional component will be added.

Usage

write_estimate(value, handle, data_product, component, description)

Arguments

value

an object of class numeric

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

component

a string specifying a location within the toml file

description

a string describing the data product component

Value

Returns a handle index associated with the just written component, which can be used to raise an issue if necessary

Write table component to HDF5 file

Description

Function to populate hdf5 file with table type data.

Usage

write_table(
  df,
  handle,
  data_product,
  component,
  description,
  row_names,
  column_units
)

Arguments

df

a data.frame containing the data

handle

an object of class fdp, R6 containing metadata required by the Data Pipeline API

data_product

a string specifying the name of the data product

component

a string specifying a location within the hdf5 file

description

a string describing the data product component

row_names

(optional) a vector of rownames

column_units

(optional) a vector comprising column units

Value

Returns a handle index associated with the just written component, which can be used to raise an issue if necessary
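The df and column_units arguments can be prepared together, as in the base R sketch below. The column names, units, and data product names are placeholders, and pairing units with columns by position is an assumption of this sketch. The write_table() call is commented out because it needs an existing fdp handle.

```r
df <- data.frame(age = c(20, 35), weight = c(70.5, 82.1))

# One unit per column, in column order (assumed pairing)
column_units <- c("years", "kg")
stopifnot(length(column_units) == ncol(df))

## Not run:
# `handle` is assumed to be an existing fdp object
# index <- rDataPipeline::write_table(
#   df = df, handle = handle,
#   data_product = "test/table", component = "component1",
#   description = "Example table", column_units = column_units
# )
```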
