Title: | Tagging and Validating Epidemiological Data |
Version: | 2.0.1 |
Description: | Provides tools to help store and handle case line list data. The 'linelist' class adds a tagging system to classical 'data.frame' objects to identify key epidemiological data such as dates of symptom onset, epidemiological case definition, age, gender or disease outcome. Once tagged, these variables can be seamlessly used in downstream analyses, making data pipelines more robust and reliable. |
License: | MIT + file LICENSE |
URL: | https://epiverse-trace.github.io/linelist/, https://github.com/epiverse-trace/linelist |
BugReports: | https://github.com/epiverse-trace/linelist/issues |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | true |
Config/Needs/readme: | incidence2 (>= 2.1.1), ggplot2 |
Depends: | R (≥ 4.1.0) |
Imports: | checkmate, rlang, tidyselect |
Suggests: | callr, dplyr, knitr, outbreaks, rmarkdown, spelling, testthat, tibble |
Config/Needs/website: | r-lib/pkgdown, epiverse-trace/epiversetheme |
VignetteBuilder: | knitr |
Language: | en-GB |
NeedsCompilation: | no |
Packaged: | 2025-06-25 14:39:53 UTC; hugo |
Author: | Hugo Gruson |
Maintainer: | Chris Hartgerink <chris@data.org> |
Repository: | CRAN |
Date/Publication: | 2025-06-25 17:20:02 UTC |
Base Tools for Storing and Handling Case Line Lists
Description
The linelist package provides tools to help store and handle case line list data. The linelist class adds a tagging system to classical data.frame or tibble objects to identify key epidemiological data such as dates of symptom onset, epidemiological case definition, age, gender or disease outcome. Once tagged, these variables can be seamlessly used in downstream analyses, making data pipelines more robust and reliable.
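The core idea, a data.frame carrying a mapping from tag names to column names, can be sketched in base R. This is a minimal illustration, not the package's implementation: add_tags(), get_tagged() and the tagged_df class are hypothetical names, the toy data are invented, and the only detail taken from this manual is that tags are stored in a "tags" attribute (see tags()).

```r
# Hypothetical helper (not part of linelist): attach a tag -> column mapping
# as a "tags" attribute and add a marker class in front of "data.frame".
add_tags <- function(df, ...) {
  tags <- list(...)
  # every tagged column must exist in the data
  unknown <- setdiff(unlist(tags), names(df))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  attr(df, "tags") <- tags
  class(df) <- c("tagged_df", class(df))
  df
}

# Hypothetical accessor: retrieve a variable by its tag, not its column name
get_tagged <- function(df, tag) {
  df[[attr(df, "tags")[[tag]]]]
}

# Invented toy data
cases <- data.frame(
  case_ID = 1:3,
  date_of_prodrome = as.Date(c("2024-01-02", "2024-01-05", "2024-01-09")),
  age = c(7, 6, 4)
)
x <- add_tags(cases, id = "case_ID", date_onset = "date_of_prodrome")
get_tagged(x, "date_onset")
```

Downstream code can then ask for "date_onset" without knowing that the column happens to be called date_of_prodrome in this dataset.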
Main functions
- make_linelist(): to create linelist objects from a data.frame or a tibble, with the possibility to tag key epi variables
- set_tags(): to change or add tagged variables in a linelist
- tags(): to get the list of tags of a linelist
- tags_df(): to get a data.frame of all tagged variables
- lost_tags_action(): to change the behaviour of actions where tagged variables are lost (e.g. removing columns storing tagged variables) to issue warnings, errors, or do nothing
- get_lost_tags_action(): to check the current behaviour of actions where tagged variables are lost
Dedicated methods
Specific methods commonly used to handle data.frame are provided for linelist objects, typically to help flag or prevent actions which could alter or lose tagged variables (and may thus break downstream data pipelines).
- names() <- (and related functions, such as dplyr::rename()) will rename tags as needed
- x[...] <- and x[[...]] <- (see sub_linelist): will adopt the desired behaviour when tagged variables are lost
- print(): prints info about the linelist in addition to the data.frame or tibble
Author(s)
Maintainer: Chris Hartgerink chris@data.org (ORCID) [reviewer]
Authors:
Hugo Gruson (ORCID)
Thibaut Jombart [conceptor]
Other contributors:
Tim Taylor [contributor]
See Also
Useful links:
Report bugs at https://github.com/epiverse-trace/linelist/issues
Examples
if (require(outbreaks)) {
  # using base R style
  ## dataset we'll create a linelist from, only using the first 50 entries
  measles_hagelloch_1861[1:50, ]
  ## create linelist
  x <- make_linelist(measles_hagelloch_1861[1:50, ],
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  x
  ## check tagged variables
  tags(x)
  ## robust renaming
  names(x)[1] <- "identifier"
  x
  ## example of dropping tags by mistake - default: warning
  x[, 2:5]
  ## to silence warnings when tags are dropped
  lost_tags_action("none")
  x[, 2:5]
  ## to trigger errors when tags are dropped
  # lost_tags_action("error")
  # x[, 2:5]
  ## reset default behaviour
  lost_tags_action()
  # using tidyverse style
  ## example of creating a linelist, adding a new variable, and adding a tag
  ## for it
  if (require(dplyr)) {
    x <- measles_hagelloch_1861 |>
      tibble() |>
      make_linelist(
        id = "case_ID",
        date_onset = "date_of_prodrome",
        age = "age",
        gender = "gender"
      ) |>
      mutate(result = if_else(is.na(date_of_death), "survived", "died")) |>
      set_tags(outcome = "result") |>
      rename(identifier = case_ID)
    head(x)
    ## extract tagged variables
    x |>
      select(has_tag(c("gender", "age")))
    x |>
      tags()
    x |>
      select(starts_with("date"))
  }
}
Subsetting of linelist objects
Description
The [] and [[]] operators for linelist objects behave as they do for a regular data.frame or tibble, but check that tagged variables are not lost and take the appropriate action if they are (warning, error, or ignore, depending on the general option set via lost_tags_action()).
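The check described above can be sketched in base R with a hypothetical subset_cols() helper: it compares the tagged columns before and after a column selection, then warns, errors, or stays silent. The real methods read the behaviour from the option set by lost_tags_action(); here it is passed as an argument to keep the sketch self-contained, and the message wording is invented.

```r
# Hypothetical sketch (not linelist's code): select columns, detect which
# tagged columns disappeared, and react according to `action`.
subset_cols <- function(df, j, action = c("warning", "error", "none")) {
  action <- match.arg(action)
  tags <- attr(df, "tags")
  out <- df[, j, drop = FALSE]
  lost <- setdiff(unlist(tags), names(out))
  if (length(lost) > 0 && action != "none") {
    msg <- paste("The following tagged variables are lost:",
                 paste(lost, collapse = ", "))
    if (action == "warning") warning(msg, call. = FALSE) else stop(msg, call. = FALSE)
  }
  # keep only the tags whose columns survived the subsetting
  keep <- vapply(tags, function(col) !is.null(col) && col %in% names(out),
                 logical(1))
  attr(out, "tags") <- tags[keep]
  out
}

# Invented toy data
cases <- data.frame(id = 1:2, onset = as.Date(c("2024-01-01", "2024-01-02")),
                    age = c(30, 40))
attr(cases, "tags") <- list(id = "id", date_onset = "onset")
kept <- subset_cols(cases, c("id", "age"), action = "none")
attr(kept, "tags")  # only the id tag survives
```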
Usage
## S3 method for class 'linelist'
x[i, j, drop = FALSE]
## S3 replacement method for class 'linelist'
x[i, j] <- value
## S3 replacement method for class 'linelist'
x[[i, j]] <- value
## S3 replacement method for class 'linelist'
x$name <- value
Arguments
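x | a linelist object |
i | a vector of integer or logical values to subset the rows of the linelist |
j | a vector of character, integer, or logical values to subset the columns of the linelist |
drop | a logical indicating whether a single selected column should be returned as a vector rather than as a linelist; defaults to FALSE |
value | the replacement to be used for the entries identified in x |
name | a literal character string or a name (possibly backtick quoted). For extraction, this is normally (see under ‘Environments’) partially matched to the names of the object |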
Value
If no drop happens, a linelist; otherwise an atomic vector.
See Also
- lost_tags_action() to set the behaviour to adopt when tags are lost through subsetting; default is to issue a warning
- get_lost_tags_action() to check the current behaviour
Examples
if (require(outbreaks) && require(dplyr)) {
  ## create a linelist
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      date_onset = "date_of_prodrome",
      age = "age",
      gender = "gender"
    ) |>
    mutate(result = if_else(is.na(date_of_death), "survived", "died")) |>
    set_tags(outcome = "result") |>
    rename(identifier = case_ID)
  x
  ## dangerous removal of a tagged column by setting it to NULL issues a warning
  x[, 1] <- NULL
  x
  x[[2]] <- NULL
  x
  x$age <- NULL
  x
}
A selector function to use in tidyverse functions
Description
A selector function to use in tidyverse functions
Usage
has_tag(tags)
Arguments
tags |
A character vector of tags listing the variables you want to operate on |
Value
A numeric vector containing the positions of the columns with the requested tags
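has_tag() is meant for tidyselect contexts such as dplyr::select(). Its documented return value, the positions of the columns carrying the requested tags, can be reproduced in base R with the hypothetical tag_positions() helper below; this illustrates the result only, not how has_tag() is implemented. The toy data are invented.

```r
# Hypothetical helper: positions of the columns tagged with any of
# `tags_wanted`, in the order the columns appear in the data.
tag_positions <- function(df, tags_wanted) {
  tags <- attr(df, "tags")
  cols <- unlist(tags[intersect(tags_wanted, names(tags))])
  which(names(df) %in% cols)
}

# Invented toy data
cases <- data.frame(case_ID = 1:2, age = c(30, 40), gender = c("f", "m"))
attr(cases, "tags") <- list(id = "case_ID", age = "age", gender = "gender")

tag_positions(cases, c("gender", "age"))  # 2 3
# base R counterpart of select(has_tag(c("gender", "age")))
cases[, tag_positions(cases, c("gender", "age")), drop = FALSE]
```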
Examples
if (require(outbreaks) && require(dplyr)) {
  ## dataset we'll create a linelist from
  measles_hagelloch_1861
  ## create linelist
  x <- make_linelist(measles_hagelloch_1861,
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  head(x)
  x |>
    select(has_tag(c("id", "age"))) |>
    head()
}
Check and set behaviour for lost tags
Description
This function determines the behaviour to adopt when tagged variables of a linelist are lost, e.g. through subsetting. This is achieved using options defined for the linelist package.
Usage
lost_tags_action(action = c("warning", "error", "none"), quiet = FALSE)
get_lost_tags_action()
Arguments
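action | a character indicating the behaviour to adopt when tagged variables of a linelist are lost: "warning" (default) issues a warning, "error" issues an error, and "none" does nothing |
quiet | a logical indicating whether to suppress the message reporting the newly set behaviour; defaults to FALSE |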
Details
The errors or warnings generated by linelist in case of tagged variable loss have the custom classes linelist_error and linelist_warning, respectively.
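Because these conditions carry dedicated classes, they can be caught selectively with tryCatch() or withCallingHandlers(). The sketch below signals a warning of class linelist_warning from a hypothetical warn_lost() function, using base R's condition objects; only the class name comes from the text above, and the exact class vector and message used by the package may differ.

```r
# Hypothetical signaller: a warning object whose class vector starts with
# "linelist_warning", followed by the standard condition classes.
warn_lost <- function(cols) {
  cond <- structure(
    list(
      message = paste("Lost tagged variable(s):", paste(cols, collapse = ", ")),
      call = NULL
    ),
    class = c("linelist_warning", "warning", "condition")
  )
  warning(cond)
}

# Catch only linelist warnings; other warnings would propagate normally.
res <- tryCatch(
  warn_lost(c("date_of_prodrome", "age")),
  linelist_warning = function(w) paste("caught:", conditionMessage(w))
)
res
```

Assuming the package signals lost-tag warnings with this class, as documented above, the same handler pattern should apply around calls such as x[, 2:5].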
Value
Returns NULL; the option itself is set in options("linelist").
Examples
# reset default - done automatically at package loading
lost_tags_action()
# check current value
get_lost_tags_action()
# change to issue errors when tags are lost
lost_tags_action("error")
get_lost_tags_action()
# change to ignore when tags are lost
lost_tags_action("none")
get_lost_tags_action()
# reset to default: warning
lost_tags_action()
Create a linelist from a data.frame
Description
This function converts a data.frame or a tibble into a linelist object, where different types of epidemiologically relevant data are tagged. This includes dates of different events (e.g. onset of symptoms, case reporting), information on the patient (e.g. age, gender, location) as well as other information such as the type of case (e.g. confirmed, probable) or the outcome of the disease. The output will look like the same data.frame, but linelist-aware packages will then be able to automatically use tagged fields for further data cleaning and analysis.
Usage
make_linelist(x, ..., allow_extra = FALSE)
Arguments
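x | a data.frame or a tibble containing case line list data |
... | <dynamic-dots> tags provided as tag_name = "column_name"; tags can also be spliced from a list with !!! |
allow_extra | a logical indicating whether tags not listed in tags_names() are allowed; defaults to FALSE |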
Details
Known variable types include:
- id: a unique case identifier as numeric or character
- date_onset: date of symptom onset (see below for date formats)
- date_reporting: date of case notification (see below for date formats)
- date_admission: date of hospital admission (see below for date formats)
- date_discharge: date of hospital discharge (see below for date formats)
- date_outcome: date of disease outcome (see below for date formats)
- date_death: date of death (see below for date formats)
- gender: a factor or character indicating the gender of the patient
- age: a numeric indicating the age of the patient, in years
- location: a factor or character indicating the location of the patient
- occupation: a factor or character indicating the professional activity of the patient
- hcw: a logical indicating if the patient is a health care worker
- outcome: a factor or character indicating the outcome of the disease (death or survival)
Dates can be provided in the following formats/types:
- Date objects (e.g. using as.Date on a character with a correct date format); this is the recommended format
- POSIXct/POSIXlt objects (when a finer scale than days is needed)
- numeric values, typically indicating the number of days since the first case
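A short base-R illustration of the three accepted representations follows; the dates are invented and only base functions are used.

```r
# Invented onset dates, for illustration only
onset_chr <- c("2024-03-01", "2024-03-04", "2024-03-10")

# 1. Date (recommended): ISO 8601 strings parse with as.Date()'s default format
onset <- as.Date(onset_chr)
# other layouts need an explicit format string
onset_dmy <- as.Date("04/03/2024", format = "%d/%m/%Y")

# 2. POSIXct, when a finer time scale than days is needed
onset_time <- as.POSIXct(
  c("2024-03-01 08:30", "2024-03-04 17:15", "2024-03-10 12:00"),
  tz = "UTC"
)

# 3. numeric: days since the first case
onset_days <- as.numeric(onset - min(onset))
onset_days  # 0 3 9
```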
Value
The function returns a linelist
object.
See Also
An overview of the linelist package
- tags_names(): for a list of known tag names
- tags_types(): for the associated accepted types/classes
- tags(): for a list of tagged variables in a linelist
- set_tags(): for modifying tags
- tags_df(): for selecting variables by tags
Examples
if (require(outbreaks)) {
  ## dataset we will convert to linelist
  head(measles_hagelloch_1861)
  ## create linelist
  x <- make_linelist(measles_hagelloch_1861,
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  ## print result - just first few entries
  head(x)
  ## check tags
  tags(x)
  ## Tags can also be passed as a list with the splice operator (!!!)
  my_tags <- list(
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  new_x <- make_linelist(measles_hagelloch_1861, !!!my_tags)
  ## The output is strictly equivalent to the previous one
  identical(x, new_x)
}
Rename columns of a linelist
Description
This function can be used to rename the columns of a linelist, adjusting tags as needed.
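The tag bookkeeping behind this method can be sketched with a hypothetical rename_tagged() helper: for each tag, find the position of its column among the old names and point the tag at the new name in that position. This is a base-R illustration on invented toy data, not the package's names<- method.

```r
# Hypothetical helper (not linelist's names<- method): rename columns and
# redirect every tag that pointed at an old name to the new name.
rename_tagged <- function(df, value) {
  old <- names(df)
  tags <- attr(df, "tags")
  for (tag in names(tags)) {
    if (is.null(tags[[tag]])) next        # untagged default, nothing to update
    pos <- match(tags[[tag]], old)        # position of the tagged column
    if (!is.na(pos)) tags[tag] <- list(value[pos])
  }
  names(df) <- value
  attr(df, "tags") <- tags
  df
}

# Invented toy data
cases <- data.frame(case_ID = 1:2, age = c(30, 40))
attr(cases, "tags") <- list(id = "case_ID", age = "age")
renamed <- rename_tagged(cases, c("identifier", "age"))
attr(renamed, "tags")$id  # "identifier"
```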
Usage
## S3 replacement method for class 'linelist'
names(x) <- value
Arguments
x | a linelist object |
value | a character vector to set the new names of the columns of x |
Value
a linelist
with new column names
Examples
if (require(outbreaks)) {
  ## dataset to create a linelist from
  measles_hagelloch_1861
  ## create linelist
  x <- make_linelist(measles_hagelloch_1861,
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  head(x)
  ## change names
  names(x)[1] <- "case_label"
  ## see results: tags have been updated
  head(x)
  tags(x)
  # This also works with `dplyr::rename()` because it uses `names<-()`
  # under the hood
  if (require(dplyr)) {
    x <- x |>
      rename(case_id = case_label)
    head(x)
    tags(x)
  }
}
Printing method for linelist objects
Description
This function prints linelist objects.
Usage
## S3 method for class 'linelist'
print(x, ...)
Arguments
x | a linelist object |
... | further arguments to be passed to print() |
Value
Invisibly returns the object.
Examples
if (require(outbreaks)) {
  ## dataset we'll create a linelist from
  measles_hagelloch_1861
  ## create linelist
  x <- make_linelist(measles_hagelloch_1861,
    id = "case_ID",
    date_onset = "date_of_prodrome",
    age = "age",
    gender = "gender"
  )
  ## print object - using only the first few entries
  head(x)
  # version with a tibble
  if (require(tibble)) {
    measles_hagelloch_1861 |>
      tibble() |>
      make_linelist(
        id = "case_ID",
        date_onset = "date_of_prodrome",
        age = "age",
        gender = "gender"
      )
  }
}
Changes tags of a linelist object
Description
This function changes the tags of a linelist object, using the same syntax as the constructor make_linelist(). If some of the default tags are missing, they will be added to the final object.
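One way to merge new tags into an existing list while keeping NULL entries (so that missing defaults stay listed, as described above) is sketched below with a hypothetical update_tags() helper and a deliberately short, illustrative set of default names; the full list of known tags is given by tags_names(). The key base-R detail: tags[[name]] <- NULL deletes an element, whereas tags[name] <- list(NULL) keeps it with a NULL value.

```r
# Hypothetical helper (not linelist's set_tags()): merge new tags into old
# ones while keeping NULL entries, so untagged defaults stay listed.
update_tags <- function(old, ...,
                        defaults = c("id", "date_onset", "age", "gender")) {
  new <- list(...)  # list(age = NULL) keeps the NULL element
  tags <- old
  # add any missing default tags as NULL entries
  for (nm in setdiff(defaults, names(tags))) tags[nm] <- list(NULL)
  # tags[nm] <- list(value) keeps NULLs; tags[[nm]] <- NULL would delete nm
  for (nm in names(new)) tags[nm] <- list(new[[nm]])
  tags
}

tags <- update_tags(list(date_onset = "date_of_rash"),
                    age = "age", date_onset = "date_of_prodrome")
names(tags)  # "date_onset" "id" "age" "gender"
tags2 <- update_tags(tags, age = NULL)
is.null(tags2$age)  # TRUE, and "age" is still listed in names(tags2)
```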
Usage
set_tags(x, ..., allow_extra = FALSE)
Arguments
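x | a linelist object |
... | <dynamic-dots> tags provided as tag_name = "column_name"; setting a tag to NULL removes it, and tags can be spliced from a list with !!! |
allow_extra | a logical indicating whether tags not listed in tags_names() are allowed; defaults to FALSE |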
Value
The function returns a linelist
object.
See Also
make_linelist() to create a linelist object
Examples
if (require(outbreaks)) {
  ## create a linelist
  x <- make_linelist(measles_hagelloch_1861, date_onset = "date_of_rash")
  tags(x)
  ## add new tags and fix an existing one
  x <- set_tags(x,
    age = "age",
    gender = "gender",
    date_onset = "date_of_prodrome"
  )
  tags(x)
  ## add non-default tags using allow_extra
  x <- set_tags(x, severe = "complications", allow_extra = TRUE)
  tags(x)
  ## remove tags by setting them to NULL
  old_tags <- tags(x)
  x <- set_tags(x, age = NULL, gender = NULL)
  tags(x)
  ## setting tags providing a list (used to restore old tags here)
  x <- set_tags(x, !!!old_tags)
  tags(x)
}
Get the list of tags in a linelist
Description
This function returns the list of tags identifying specific variable types in a linelist.
Usage
tags(x, show_null = FALSE)
Arguments
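x | a linelist object |
show_null | a logical indicating whether the complete list of tags, including NULL ones, should be returned; defaults to FALSE |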
Details
Tags are stored as the tags
attribute of the object.
Value
The function returns a named list
where names indicate generic
types of data, and values indicate which column they correspond to.
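The effect of show_null can be illustrated on a plain list: Filter(Negate(is.null), ...) keeps only the non-NULL entries. get_tags() below is a hypothetical accessor reading a "tags" attribute from invented toy data, not the package's tags() function.

```r
# Hypothetical accessor mimicking the documented show_null behaviour:
# drop NULL (untagged) entries unless show_null = TRUE.
get_tags <- function(x, show_null = FALSE) {
  tags <- attr(x, "tags")
  if (show_null) return(tags)
  Filter(Negate(is.null), tags)
}

# Invented toy data: one real tag and two defaults left as NULL
cases <- data.frame(case_ID = 1:2)
attr(cases, "tags") <- list(id = "case_ID", date_onset = NULL, age = NULL)

get_tags(cases)                    # only id
get_tags(cases, show_null = TRUE)  # id, date_onset, age
```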
Examples
if (require(outbreaks)) {
  ## make a linelist
  x <- make_linelist(measles_hagelloch_1861, date_onset = "date_of_prodrome")
  ## check non-null tags
  tags(x)
  ## get a list of all tags, including NULL ones
  tags(x, show_null = TRUE)
}
Generate default tags for a linelist
Description
This function returns a named list providing the default tags for a
linelist
object (all default to NULL).
Usage
tags_defaults()
Value
A named list.
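The structure described here, a named list whose values are all NULL, can be built in base R as follows. The tag names are copied from the Known variable types listed for make_linelist(); tags_defaults() itself may order or extend them differently.

```r
# Tag names as listed under "Known variable types" in make_linelist()
known_tags <- c(
  "id", "date_onset", "date_reporting", "date_admission", "date_discharge",
  "date_outcome", "date_death", "gender", "age", "location", "occupation",
  "hcw", "outcome"
)
# vector("list", n) creates n NULL elements; names() then labels them
defaults_sketch <- vector("list", length(known_tags))
names(defaults_sketch) <- known_tags
str(defaults_sketch[1:3])
```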
Examples
tags_defaults()
Extract a data.frame of all tagged variables
Description
This function returns a data.frame of all the tagged variables stored in a linelist. Note that the output is no longer a linelist, but a regular data.frame.
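In base R, a similar output can be approximated by subsetting the tagged columns and dropping the tags attribute, as in the hypothetical tagged_columns() helper below. The sketch keeps the original column names, which may differ from what tags_df() returns; the toy data are invented.

```r
# Hypothetical helper (not linelist's tags_df()): keep only tagged columns
# and return a plain data.frame without the tags attribute.
tagged_columns <- function(x) {
  cols <- unname(unlist(attr(x, "tags")))  # NULL tags vanish in unlist()
  out <- as.data.frame(x)[, cols, drop = FALSE]
  attr(out, "tags") <- NULL
  out
}

# Invented toy data with one untagged column and one NULL tag
cases <- data.frame(case_ID = 1:2, notes = c("a", "b"), age = c(30, 40))
attr(cases, "tags") <- list(id = "case_ID", age = "age", date_onset = NULL)
td <- tagged_columns(cases)
names(td)  # "case_ID" "age"
```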
Usage
tags_df(x)
Arguments
x | a linelist object |
Value
A data.frame
of tagged variables.
Examples
if (require(outbreaks)) {
  ## create a linelist
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      date_onset = "date_of_prodrome",
      age = "age",
      gender = "gender"
    )
  x
  ## get a data.frame of all tagged variables
  tags_df(x)
}
Get the list of tag names used in linelist
Description
This function returns a character vector of all tag names used to designate specific variable types in a linelist.
Usage
tags_names()
Value
The function returns a character
vector.
See Also
tags_defaults()
for a list
of default values of the tags
Examples
tags_names()
List acceptable variable types for tags
Description
This function returns a named list providing the acceptable data types for the default tags. If no argument is provided, it returns default values. Otherwise, provided values will be used to define the defaults.
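Overriding defaults with user-supplied values, as described above, maps naturally onto utils::modifyList(). In the sketch below, set_types() and the two-entry default_types list are hypothetical; the classes in default_types are taken from the variable types and date formats listed for make_linelist(), not from the package's actual defaults.

```r
# Illustrative defaults only, based on the types listed for make_linelist()
default_types <- list(
  date_onset = c("Date", "POSIXct", "POSIXlt", "numeric"),
  gender = c("character", "factor")
)

# Hypothetical helper: provided values replace matching defaults; unknown
# tags are rejected unless allow_extra = TRUE.
set_types <- function(defaults, ..., allow_extra = FALSE) {
  new <- list(...)
  extra <- setdiff(names(new), names(defaults))
  if (length(extra) > 0 && !allow_extra) {
    stop("Unknown tag(s): ", paste(extra, collapse = ", "))
  }
  utils::modifyList(defaults, new)
}

set_types(default_types, date_onset = "Date")$date_onset
set_types(default_types, sequence = "DNAbin", allow_extra = TRUE)$sequence
```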
Usage
tags_types(..., allow_extra = FALSE)
Arguments
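... | <dynamic-dots> tags provided as tag_name = classes, giving the accepted classes for each tag (e.g. date_onset = "Date") |
allow_extra | a logical indicating whether tags not listed in tags_names() are allowed; defaults to FALSE |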
Value
A named list.
See Also
- tags_defaults() for the default tags
- validate_types() uses tags_types() for validating tags
- validate_linelist() uses tags_types() for validating tags
Examples
# list default values
tags_types()
# change existing values
tags_types(date_onset = "Date") # impose a Date class
# add new types e.g. to allow genetic sequences using ape's format
tags_types(sequence = "DNAbin", allow_extra = TRUE)
Checks the content of a linelist object
Description
This function evaluates the validity of a linelist object by checking the object class, its tags, and the types of the tagged variables. It combines validation checks made by validate_types() and validate_tags(). See the 'Details' section for more information on the checks performed.
Usage
validate_linelist(x, allow_extra = FALSE, ref_types = tags_types())
Arguments
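x | a linelist object |
allow_extra | a logical indicating whether tags not listed in tags_names() are allowed; defaults to FALSE |
ref_types | a list providing the accepted classes for each tag, as returned by tags_types() |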
Details
The following checks are performed:
- x is a linelist object
- x has a well-formed tags attribute
- all default tags are present (even if NULL)
- all tagged variables correspond to existing columns
- all tagged variables have an acceptable class
- (optional) x has no extra tag beyond the default tags
Value
If checks pass, a linelist
object (invisibly); otherwise issues an
error.
See Also
- tags_types() to change allowed types
- validate_types() to check if tagged variables have the right classes
- validate_tags() to perform a series of checks on the tags
Examples
if (require(outbreaks)) {
  ## create a valid linelist
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      date_onset = "date_of_prodrome",
      age = "age",
      gender = "gender"
    )
  x
  ## validation
  validate_linelist(x)
  ## create an invalid linelist - onset date is a factor
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      date_onset = "gender",
      age = "age"
    )
  x
  ## the below issues an error
  ## note: tryCatch is only used to avoid a genuine error in the example
  tryCatch(validate_linelist(x), error = paste)
}
Checks the tags of a linelist object
Description
This function evaluates the validity of the tags of a linelist object by checking that: i) tags are present; ii) tags is a list of character; iii) all default tags are present; iv) tagged variables exist; v) no extra tag exists (if allow_extra is FALSE).
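Checks i) to v) can be expressed directly in base R. The hypothetical check_tags() below runs them in the same order and stops at the first failure; its defaults argument stands in for the package's default tag names, and the messages and toy data are invented.

```r
# Hypothetical validator covering checks i) to v); returns x invisibly.
check_tags <- function(x, defaults, allow_extra = FALSE) {
  tags <- attr(x, "tags")
  # i) tags are present
  if (is.null(tags)) stop("i) no tags attribute")
  # ii) tags is a list whose non-NULL entries are character
  if (!is.list(tags)) stop("ii) tags must be a list")
  ok <- vapply(tags, function(v) is.null(v) || is.character(v), logical(1))
  if (!all(ok)) stop("ii) tags must be a list of character")
  # iii) all default tags are present (possibly as NULL)
  missing_defaults <- setdiff(defaults, names(tags))
  if (length(missing_defaults) > 0) {
    stop("iii) missing default tags: ", paste(missing_defaults, collapse = ", "))
  }
  # iv) tagged variables exist as columns
  absent <- setdiff(unlist(tags), names(x))
  if (length(absent) > 0) {
    stop("iv) tagged variables not found: ", paste(absent, collapse = ", "))
  }
  # v) no extra tags unless allowed
  extra <- setdiff(names(tags), defaults)
  if (!allow_extra && length(extra) > 0) {
    stop("v) unknown tags: ", paste(extra, collapse = ", "))
  }
  invisible(x)
}

defaults <- c("id", "date_onset", "age")
cases <- data.frame(case_ID = 1:2, age = c(30, 40))
attr(cases, "tags") <- list(id = "case_ID", date_onset = NULL, age = "age")
check_tags(cases, defaults)  # passes silently

bad <- cases
attr(bad, "tags") <- list(id = "case_ID")  # defaults missing
tryCatch(check_tags(bad, defaults), error = conditionMessage)
```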
Usage
validate_tags(x, allow_extra = FALSE)
Arguments
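x | a linelist object |
allow_extra | a logical indicating whether tags not listed in tags_names() are allowed; defaults to FALSE |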
Value
If checks pass, a linelist
object; otherwise issues an error.
See Also
validate_types()
to check if tagged variables have
the right classes
Examples
if (require(outbreaks)) {
  ## create a valid linelist
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      date_onset = "date_of_prodrome",
      age = "age",
      gender = "gender"
    )
  x
  ## validation
  validate_tags(x)
  ## hack to create an invalid tags attribute (missing defaults)
  attr(x, "tags") <- list(id = "case_ID")
  ## the below issues an error
  ## note: tryCatch is only used to avoid a genuine error in the example
  tryCatch(validate_tags(x), error = paste)
}
Check tagged variables are the right class
Description
This function checks the class of each tagged variable in a linelist against pre-defined accepted classes in tags_types().
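The class check amounts to calling inherits() on each tagged column with the accepted classes for its tag, since inherits() returns TRUE when any of the given classes matches. The hypothetical check_types() below does this on invented toy data that mirrors the example further down (gender tagged to an integer column); ref_types plays the role of the list returned by tags_types().

```r
# Hypothetical type check (not linelist's validate_types()).
check_types <- function(x, ref_types) {
  tags <- Filter(Negate(is.null), attr(x, "tags"))  # ignore untagged defaults
  problems <- character(0)
  for (tag in intersect(names(tags), names(ref_types))) {
    values <- x[[tags[[tag]]]]
    # inherits() with several classes is TRUE if any of them matches
    if (!inherits(values, ref_types[[tag]])) {
      problems <- c(problems, sprintf(
        "%s (column '%s') has class %s",
        tag, tags[[tag]], paste(class(values), collapse = "/")
      ))
    }
  }
  if (length(problems) > 0) stop(paste(problems, collapse = "; "))
  invisible(x)
}

ref_types <- list(gender = c("character", "factor"), age = c("numeric", "integer"))
cases <- data.frame(infector = c(2L, 5L), age = c(7, 6))
attr(cases, "tags") <- list(gender = "infector", age = "age")
tryCatch(check_types(cases, ref_types), error = conditionMessage)
# widening the accepted classes makes the check pass
check_types(cases, list(gender = c("integer", "character", "factor"), age = "numeric"))
```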
Usage
validate_types(x, ref_types = tags_types())
Arguments
x | a linelist object |
ref_types | a list providing the accepted classes for each tag, as returned by tags_types() |
Value
A named list.
See Also
- tags_types() to change allowed types
- validate_tags() to perform a series of checks on the tags
- validate_linelist() to combine validate_tags() and validate_types()
Examples
if (require(outbreaks)) {
  ## create an invalid linelist - gender is a numeric
  x <- measles_hagelloch_1861 |>
    make_linelist(
      id = "case_ID",
      gender = "infector"
    )
  x
  ## the below would issue an error
  ## note: tryCatch is only used to avoid a genuine error in the example
  tryCatch(validate_types(x), error = paste)
  ## to allow other types, e.g. gender to be integer, character or factor
  validate_types(x, tags_types(gender = c("integer", "character", "factor")))
}