eider is an R package for extracting machine learning features from tabular data, in particular health records, in a declarative manner.
Features are specified as JSON objects which contain all the information required to perform a given calculation. For example, the following calculates the total number of rows per patient id in the table labelled ae2 (details on how to specify this table are in the function documentation):
{
"source_table": "ae2",
"transformation_type": "COUNT",
"grouping_column": "id",
"absent_default_value": 0,
"output_feature_name": "total_ae_attendances"
}
The output of this is a column named total_ae_attendances, containing the number of rows per patient, with a value of 0 for any patient who does not appear in the ae2 table.
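To make the meaning of this specification concrete, here is a minimal base R sketch that computes the same column by hand. It illustrates the semantics only and is not how eider implements the COUNT transformation. The toy ae2 data frame and the all_ids vector (standing in for the full set of patients to report on) are hypothetical.

```r
# Hypothetical toy version of the ae2 table: one row per A&E attendance
ae2 <- data.frame(
  id = c(1, 1, 2, 1, 3),
  date = as.Date(c("2023-01-05", "2023-02-10", "2023-02-11",
                   "2023-03-01", "2023-03-15"))
)

# Hypothetical full set of patient ids; patient 4 never appears in ae2
all_ids <- c(1, 2, 3, 4)

# "transformation_type": "COUNT" grouped by "grouping_column": "id":
# count the rows for each id that is present in ae2
counts <- table(ae2$id)

# Look up each requested id; match() gives NA for ids absent from ae2
idx <- match(as.character(all_ids), names(counts))
total_ae_attendances <- as.integer(counts)[idx]

# "absent_default_value": 0 replaces those NAs with 0
total_ae_attendances[is.na(total_ae_attendances)] <- 0L

# "output_feature_name" becomes the name of the output column
features <- data.frame(id = all_ids,
                       total_ae_attendances = total_ae_attendances)
print(features)
```

Patients 1, 2 and 3 receive counts of 3, 1 and 1, while patient 4, who has no rows in ae2, receives the default value of 0.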
This declarative approach provides an alternative to traditional, imperative-style dplyr pipelines, which can be more difficult to reason about, especially when a series of features is being extracted and merged together. Because features are specified without reference to a specific programming language or paradigm, this approach also encourages code that is concise, easy to read, and maintainable.
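For comparison, the following is a minimal sketch of the imperative style, written in base R rather than dplyr to keep it dependency-free. It assumes a hypothetical second table, prescriptions, with a hypothetical n_items column, and is not code taken from eider or SPARRA. Each feature needs its own grouping, join against the full list of ids, and default-filling step, and every additional feature repeats that bookkeeping.

```r
# Hypothetical toy tables
ae2 <- data.frame(id = c(1, 1, 2, 1, 3))
prescriptions <- data.frame(id = c(2, 2, 4), n_items = c(3, 1, 5))

# Hypothetical full set of patient ids to report on
all_ids <- c(1, 2, 3, 4)
out <- data.frame(id = all_ids)

# Feature 1: number of ae2 rows per id, defaulting to 0 when absent
f1 <- aggregate(list(total_ae_attendances = ae2$id),
                by = list(id = ae2$id), FUN = length)
out <- merge(out, f1, by = "id", all.x = TRUE)
out$total_ae_attendances[is.na(out$total_ae_attendances)] <- 0L

# Feature 2: total prescribed items per id, defaulting to 0 when absent
f2 <- aggregate(list(total_items = prescriptions$n_items),
                by = list(id = prescriptions$id), FUN = sum)
out <- merge(out, f2, by = "id", all.x = TRUE)
out$total_items[is.na(out$total_items)] <- 0

print(out)
```

In eider, each of these features would instead be described by its own JSON object, like the one shown earlier.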
eider is a collaboration between The Alan Turing Institute, Public Health Scotland, and the Universities of Edinburgh and Durham. It grew out of a desire to generalise the feature extraction process for health data used in the SPARRA (Scottish Patients At Risk of Readmission and Admission) project (GitHub repo), and to allow similar analyses to be carried out in different contexts.
Install via CRAN:
install.packages("eider")
Alternatively, install eider from its source code on GitHub using:
install.packages("devtools")
devtools::install_github("alan-turing-institute/eider", build_vignettes = TRUE)
The package documentation is available online. In particular, the package articles contain a series of vignettes which provide detailed guidance on the package and its features.
If you are making changes to the library itself, first clone the repository:
git clone git@github.com:alan-turing-institute/eider.git
You will need to install the lintr, pkgdown, and devtools R packages to build documentation, run tests, and lint. Then, from the repository root, you can use the following commands:
make doc: generates all function documentation, and also generates the README.md file from README.rmd
make lint: lints the project directory
make test: runs all tests
You can also use pre-commit to run all of these before committing, to ensure that you do not commit incomplete code. First, install pre-commit according to the instructions on its website. Then run:
pre-commit install
What about vignettes? Well, building vignettes is slightly more complicated. You can perform a one-time build from the R console using pkgdown::build_site(), but running this every time you edit a file gets tiring quickly. To automate this, first install the package with make install, and install a working version of Python as well as entr (the latter is available on Homebrew via brew install entr). Then run make vig: this will monitor your vignette R Markdown files, rebuild the vignettes whenever they change, and launch an HTTP server on port 8000 to view the files. If you change any library code, you will have to run make install again before rerunning make vig.