Title: Declarative Feature Extraction from Tabular Data Records
Version: 1.0.0
Description: Extract features from tabular data in a declarative fashion, with a focus on processing medical records. Features are specified as JSON and are independently processed before being joined. Input data can be provided as CSV files or as data frames. This setup ensures that data is transformed in a modular and reproducible manner, and allows the same pipeline to be easily applied to new data.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.1
Imports: dplyr, lubridate, stringr, magrittr, jsonlite, logger, purrr, fs, tibble, rlang
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0), tidyr
Config/testthat/edition: 3
VignetteBuilder: knitr
URL: https://github.com/alan-turing-institute/eider
BugReports: https://github.com/alan-turing-institute/eider/issues
NeedsCompilation: no
Packaged: 2024-05-10 15:14:18 UTC; jyong
Author: Catalina Vallejos [ctb], Louis Aslett [ctb], Simon Rogers [ctb], Camila Rangel Smith [cre, ctb], Helen Duncan Little [aut], Jonathan Yong [aut], The Alan Turing Institute [cph, fnd]
Maintainer: Camila Rangel Smith <crangelsmith@turing.ac.uk>
Repository: CRAN
Date/Publication: 2024-05-13 11:13:19 UTC

eider: Declarative Feature Extraction from Tabular Data Records

Description

eider is a lightweight package for processing tabular data in a declarative fashion. To get started, see: vignette("eider")

Author(s)

Maintainer: Camila Rangel Smith crangelsmith@turing.ac.uk (ORCID) [contributor]

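Authors: Helen Duncan Little, Jonathan Yong

Other contributors: Catalina Vallejos [contributor], Louis Aslett [contributor], Simon Rogers [contributor], The Alan Turing Institute [copyright holder, funder]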

See Also

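Useful links: https://github.com/alan-turing-institute/eider (report bugs at https://github.com/alan-turing-institute/eider/issues)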


Obtain filepaths to example data and JSON features

Description

Return an absolute path to the example data and JSON features provided with the package. These files are contained in the package inst/extdata directory.

Usage

eider_example(file = NULL)

Arguments

file

The filename to return the full path for. Defaults to NULL, in which case it will return a vector of all valid filenames.

Value

A string containing the full path to the file, or a vector of all valid filenames if file is NULL.

Examples

eider_example()
eider_example("random_ae_data.csv")
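The returned path can be passed to any function that reads files. The sketch below assumes the eider package is installed and uses base R's read.csv to preview the example table. It only illustrates how the path might be used and is not part of the package's documented workflow.

```r
library(eider)

# List the example files shipped with the package
example_files <- eider_example()

# Resolve one filename to its full path on disk
ae_path <- eider_example("random_ae_data.csv")

# The path points at an ordinary CSV file, so base R can read it
ae_data <- read.csv(ae_path)
head(ae_data)
```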

Perform the entire feature transformation process

Description

Reads in data and feature specifications and performs the requisite transformations. Please see the package vignettes for more detailed information on the JSON specification of features.

Usage

run_pipeline(
  data_sources,
  feature_filenames = NULL,
  response_filenames = NULL,
  all_ids = NULL
)

Arguments

data_sources

A list whose names are the unique identifiers of the data sources, and whose values are either the data frames themselves or the file paths from which they should be read. Only CSV files are currently supported.

feature_filenames

A vector of file paths to the feature JSON specifications. Defaults to NULL.

response_filenames

A vector of file paths to the response JSON specifications. Defaults to NULL.

all_ids

A vector of all the unique numeric identifiers that should be in the final feature table. If not given, this will be determined by taking the union of all unique identifiers found in input tables used by at least one feature.
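The shape of these arguments can be sketched in base R. The example below is an illustration under stated assumptions, not eider's internal code: the tables, the source names ae and ltc, and the identifier column name id are all hypothetical. It builds a data_sources list that mixes a data frame with a CSV path, and approximates the default all_ids as the union of the identifiers found in the tables, assuming both tables are used by at least one feature.

```r
# Hypothetical input tables; the column name `id` is an assumption
ae <- data.frame(id = c(1, 1, 3), attendance = c("a", "b", "c"))
ltc <- data.frame(id = c(2, 3), condition = c("x", "y"))

# A data source may also be supplied as a path to a CSV file
ltc_path <- tempfile(fileext = ".csv")
write.csv(ltc, ltc_path, row.names = FALSE)

# Names identify the sources; values are a data frame or a file path
data_sources <- list(ae = ae, ltc = ltc_path)

# Approximation of the default all_ids: the union of unique identifiers
# across the tables (assuming each is used by at least one feature)
default_ids <- sort(union(unique(ae$id), unique(ltc$id)))

# An explicit all_ids can include identifiers absent from every table
explicit_ids <- 1:5
```

Supplying all_ids explicitly is useful when some identifiers have no rows in any input table but should still appear in the final feature table.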

Value

A list with the following elements:

Examples

run_pipeline(
  data_sources = list(ae = eider_example("random_ae_data.csv")),
  feature_filenames = eider_example("ae_total_attendances.json")
)
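Since data_sources also accepts data frames, the same feature can be computed from a table that has already been loaded into memory. The following is a minimal sketch assuming the eider package is installed. It reads the example CSV with base R's read.csv, which is a choice made here for illustration, and inspects the result with str rather than assuming particular element names.

```r
library(eider)

# Load the example table into a data frame first
ae_data <- read.csv(eider_example("random_ae_data.csv"))

# Pass the data frame itself under the source name "ae"
result <- run_pipeline(
  data_sources = list(ae = ae_data),
  feature_filenames = eider_example("ae_total_attendances.json")
)

# Inspect the structure of the returned list
str(result)
```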