Type: | Package |
Title: | REST API Client for the 'PGS' Catalog |
Version: | 0.1.10 |
Description: | Programmatic access to the 'PGS' Catalog. This package provides easy access to 'PGS' Catalog data by accessing the REST API https://www.pgscatalog.org/rest/. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | testthat, knitr, rmarkdown, ggplot2 |
Imports: | stringr, vroom, purrr, glue, dplyr, tidyjson, tibble, lubridate, rlang, tidyr, httr, utils, rvest, progress, methods, writexl, memoise, readr |
Collate: | 'ancestry_categories.R' 'cc.R' 'class-cohorts.R' 'class-performance_metrics.R' 'class-publications.R' 'class-releases.R' 'class-sample_sets.R' 'class-scores.R' 'class-trait_categories.R' 'class-traits.R' 'clear_cache.R' 'contains_question_mark.R' 'count.R' 'delay.R' 'drop_metadata_cols.R' 'first_non_na.R' 'generics.R' 'get.R' 'get_ancestry_categories.R' 'get_cohorts.R' 'get_column.R' 'get_performance_metrics.R' 'get_publications.R' 'get_releases.R' 'get_sample_sets.R' 'get_scores.R' 'get_trait_categories.R' 'get_traits.R' 'id_mapping.R' 'is_json_empty.R' 'is_paginated.R' 'is_pgs_id.R' 'is_pubmed_id.R' 'messages.R' 'n_pages.R' 'nr_to_na.R' 'offsets.R' 'open_in_dbsnp.R' 'open_in_pgs_catalog.R' 'open_in_pubmed.R' 'parse-ancestry_categories.R' 'parse-cohorts.R' 'parse-performance_metrics.R' 'parse-publications.R' 'parse-releases.R' 'parse-sample_sets.R' 'parse-scores.R' 'parse-trait_categories.R' 'parse-traits.R' 'parse_estimate.R' 'read_file_column_names.R' 'read_pgs_scoring_file.R' 'relocate_metadata_cols.R' 'remap_id.R' 'request.R' 's4-utils.R' 'stages.R' 'sure.R' 'unwrap_cohort.R' 'unwrap_demographics.R' 'unwrap_efotrait.R' 'unwrap_interval.R' 'unwrap_publication.R' 'unwrap_sample.R' 'warnings.R' 'write_xlsx.R' |
Depends: | R (≥ 4.1.0) |
URL: | https://github.com/ramiromagno/quincunx, https://rmagno.eu/quincunx/ |
BugReports: | https://github.com/ramiromagno/quincunx/issues |
Config/Needs/website: | patterninstitute/chic |
NeedsCompilation: | no |
Packaged: | 2025-05-31 16:59:44 UTC; rmagno |
Author: | Ramiro Magno |
Maintainer: | Ramiro Magno <rmagno@pattern.institute> |
Repository: | CRAN |
Date/Publication: | 2025-05-31 17:10:02 UTC |
Ancestry categories and classes
Description
A dataset containing the ancestry categories defined in the NHGRI-EBI GWAS
Catalog framework (Table 1, doi:10.1186/s13059-018-1396-2). Ancestry
categories are assigned to samples with distinct and well-defined patterns of
genetic variation. You will find these categories in the variable
ancestry_category of the following objects: scores, performance_metrics and
sample_sets. Ancestry categories (ancestry_category) are further clustered
into ancestry classes (ancestry_class).
Usage
ancestry_categories
Format
A data frame with 19 ancestry categories (rows) and 6 columns:
- ancestry_category
Ancestry category.
- ancestry_class
To reduce the complexity associated with the many ancestry categories, some have been merged into higher-level groupings (ancestry_class). These groupings represent the current breadth of data in the PGS Catalog and are likely to change as more data is added.
- ancestry_class_symbol
3-letter code for the ancestry_class, e.g. "EUR" or "MAE".
- ancestry_class_colour
Hexadecimal colour code associated with ancestry groupings (ancestry_class). This can be useful when visually communicating about ancestries.
- definition
Description of the ancestry category.
- examples
Examples of detailed descriptions of sample ancestries included in the category.
Source
- Table 1 of Morales et al. (2018): doi:10.1186/s13059-018-1396-2
- PGS Catalog Ancestry Documentation:
Examples
ancestry_categories
Bind PGS Catalog objects
Description
Binds together PGS Catalog objects of the same class. Note that bind()
preserves duplicates whereas union() does not.
Usage
bind(x, ...)
Arguments
x |
An object of either class scores, publications, traits, performance_metrics, sample_sets, cohorts or trait_categories. |
... |
Objects of the same class as x. |
Value
An object of the same class as x.
Examples
# Get some `scores` objects:
my_scores_1 <- get_scores(c('PGS000012', 'PGS000013'))
my_scores_2 <- get_scores(c('PGS000013', 'PGS000014'))
# NB: with `bind()`, PGS000013 is repeated (as opposed to `union()`)
bind(my_scores_1, my_scores_2)@scores
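The difference between binding and taking a union can be pictured with plain data frames. The sketch below uses base R only, with rbind() and unique() standing in for bind() and union(); it does not call quincunx, and the two toy tables are assumptions made for illustration.

```r
# Toy stand-ins for the `scores` slot of two scores objects (illustrative only).
scores_1 <- data.frame(pgs_id = c("PGS000012", "PGS000013"))
scores_2 <- data.frame(pgs_id = c("PGS000013", "PGS000014"))

# Binding keeps every row, so PGS000013 appears twice.
bound <- rbind(scores_1, scores_2)

# A union drops the repeated row.
unioned <- unique(bound)

nrow(bound)
nrow(unioned)
```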
Clear quincunx cache of memoised functions
Description
quincunx uses memoised functions for the REST API calls. Use this function to reset the cache.
Usage
clear_cache()
Value
Returns a logical value, indicating whether the resetting of the cache was
successful (TRUE) or not (FALSE).
Examples
clear_cache()
Constructor for the S4 cohorts object.
Description
Constructor for the S4 cohorts object.
Usage
cohorts(cohorts = s4cohorts_cohorts_tbl(), pgs_ids = s4cohorts_pgs_ids_tbl())
Arguments
cohorts |
A |
pgs_ids |
A |
Value
An object of class cohorts.
An S4 class to represent a set of cohorts
Description
The cohorts object consists of two tables (slots) that combined form a
relational database of a subset of cohorts. Each cohort is an observation
(row) in the cohorts table (first table).
Slots
cohorts
A table of cohorts. Each cohort (row) is identified by its cohort_symbol. Columns:
- cohort_symbol
Cohort symbol. Example: "CECILE".
- cohort_name
Cohort full name. Example: "CECILE Breast Cancer Study".
pgs_ids
A table of cohorts and their associated polygenic scores identifiers. Columns:
- cohort_symbol
Cohort symbol. Example: "CECILE".
- pgs_id
Polygenic Score (PGS) identifier.
- stage
Sample stage: either "gwas/dev" or "eval".
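Because the two slots share the cohort_symbol column, they can be joined like tables in a relational database. A minimal base R sketch follows; the data frames are toy stand-ins for the slots, and the PGS identifiers and stages in them are made up for illustration.

```r
# Toy versions of the two slots; the PGS identifiers and stages are made up.
cohorts_tbl <- data.frame(
  cohort_symbol = "CECILE",
  cohort_name = "CECILE Breast Cancer Study"
)
pgs_ids_tbl <- data.frame(
  cohort_symbol = c("CECILE", "CECILE"),
  pgs_id = c("PGS000001", "PGS000002"),
  stage = c("gwas/dev", "eval")
)

# Join the tables on their shared key, cohort_symbol.
cohort_scores <- merge(cohorts_tbl, pgs_ids_tbl, by = "cohort_symbol")
cohort_scores
```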
Collect development and evaluation pgs_ids
Description
Collect development and evaluation pgs_ids.
Usage
collect_pgs_ids(tbl_json)
Collect samples_variants and samples_training
Description
Collect samples_variants and samples_training and reference them by a unique id (sample_id).
Usage
collect_samples(tbl_json)
Does a string contain a question mark?
Description
Find which strings contain a question mark. This function uses the following
regular expression: [\?].
Usage
contains_question_mark(str, convert_NA_to_FALSE = TRUE)
Arguments
str |
A character vector of strings. |
convert_NA_to_FALSE |
Whether to treat |
Value
A logical vector.
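As a rough illustration of the behaviour described above, the following base R sketch applies the same regular expression with grepl(). The function name contains_question_mark_sketch is hypothetical, and the handling of NA (FALSE by default, NA otherwise) is an assumption inferred from the argument name, not the package's actual source.

```r
# Hypothetical re-implementation for illustration; not the package's source.
contains_question_mark_sketch <- function(str, convert_NA_to_FALSE = TRUE) {
  stopifnot(is.character(str))
  # The character class matches a literal question mark anywhere in the string.
  found <- grepl("[\\?]", str)
  # grepl() yields FALSE for NA input; put NA back if the caller wants it kept.
  if (!convert_NA_to_FALSE) found[is.na(str)] <- NA
  found
}

contains_question_mark_sketch(c("what?", "no", NA))
contains_question_mark_sketch(c("what?", "no", NA), convert_NA_to_FALSE = FALSE)
```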
Extract the count field from a JSON response
Description
This function takes a string with a JSON response and returns the value of
the count field. If it fails to match the pattern then it returns NA_integer_.
Usage
count(json_string)
Arguments
json_string |
a string. |
Value
An integer value.
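One way to picture this is a regular expression that captures the digits following a "count" key. The sketch below is an assumption about the approach, written in base R; count_sketch and the sample JSON strings are hypothetical.

```r
# Hypothetical sketch: extract the integer after a "count" key, else NA_integer_.
count_sketch <- function(json_string) {
  pattern <- '"count"[[:space:]]*:[[:space:]]*([0-9]+)'
  m <- regmatches(json_string, regexec(pattern, json_string))[[1]]
  # m holds the full match and the captured digits; it is empty on no match.
  if (length(m) < 2L) return(NA_integer_)
  as.integer(m[2L])
}

count_sketch('{"count": 123, "next": null, "results": []}')
count_sketch('{"id": "PGS000001"}')
```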
Filter PGS Catalog objects by identifier.
Description
Use filter_by_id to filter PGS Catalog objects by their respective
identifier (id).
Usage
filter_by_id(x, id)
Arguments
x |
An object of class either scores, publications, traits, performance_metrics, or sample_sets. |
id |
Identifier. |
Value
Returns an object of class either scores, publications, traits, performance_metrics, or sample_sets.
Returns the position of the first non-NA value
Description
Returns the position of the first non-NA value
Usage
first_non_na(x)
Arguments
x |
An atomic vector. |
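A minimal base R sketch of the idea is shown below. The name first_non_na_sketch is hypothetical, and returning NA_integer_ when every value is NA is an assumption, since the behaviour for that case is not documented here.

```r
# Hypothetical sketch: position of the first element that is not NA.
first_non_na_sketch <- function(x) {
  idx <- which(!is.na(x))
  # Assumption: an all-NA (or empty) vector yields NA_integer_.
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

first_non_na_sketch(c(NA, NA, 7, NA, 9))
first_non_na_sketch(c(NA, NA))
```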
Get ancestry categories and classes
Description
Retrieves ancestry categories and classes. This function simply returns the
object ancestry_categories.
Usage
get_ancestry_categories()
Value
A tibble with ancestry categories, classes and associated information. See
ancestry_categories for details about each column.
Examples
get_ancestry_categories()
Get PGS Catalog Ancestry Symbol Mappings
Description
Retrieves the mappings between the ancestry class symbols and ancestry classes
via the PGS Catalog REST API. Note: this function is not exported and should
only be used for debugging purposes. Use get_ancestry_categories instead.
Usage
get_ancestry_symbol_mappings(
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Value
Returns a tibble of mappings between the ancestry symbols and their names, e.g. EUR and European, respectively.
Get PGS Catalog Cohorts
Description
Retrieves cohorts via the PGS Catalog REST API. Please note that
cohort_symbol is vectorised, thus allowing for batch mode search.
Usage
get_cohorts(
cohort_symbol = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
cohort_symbol |
A cohort symbol or |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Value
A cohorts object.
Examples
# Get information about specific cohorts by their symbols (acronyms)
get_cohorts(cohort_symbol = c('23andMe', 'IPOBCS'))
# Get info on all cohorts (may take a few minutes to download)
## Not run:
get_cohorts()
## End(Not run)
Get PGS Catalog Performance Metrics
Description
Retrieves performance metrics via the PGS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default all performance metrics that match the criteria supplied in the
arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
performance metrics that simultaneously match all criteria provided, then set
set_operation to 'intersection'.
Usage
get_performance_metrics(
ppm_id = NULL,
pgs_id = NULL,
set_operation = "union",
interactive = TRUE,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
ppm_id |
A character vector of PGS Catalog performance metrics accession identifiers. |
pgs_id |
A |
set_operation |
Either |
interactive |
A logical. If all performance metrics are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Details
Please note that all search criteria are vectorised, thus allowing for batch mode search.
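The two values of set_operation correspond to ordinary set union and intersection over the results returned for each criterion. The base R sketch below illustrates only these semantics; the identifier vectors are hypothetical, and this is not how quincunx combines results internally.

```r
# Hypothetical identifiers returned by two separate search criteria.
hits_by_ppm_id <- c("PPM000001", "PPM000002")
hits_by_pgs_id <- c("PPM000002", "PPM000003")
hits <- list(hits_by_ppm_id, hits_by_pgs_id)

# 'union': keep results matching at least one criterion.
union_hits <- Reduce(union, hits)

# 'intersection': keep only results matching every criterion.
intersection_hits <- Reduce(intersect, hits)

union_hits
intersection_hits
```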
Value
A performance_metrics object.
Examples
## Not run:
# Get performance metrics catalogued with identifier 'PPM000001'
get_performance_metrics(ppm_id = 'PPM000001')
# Get performance metrics associated with polygenic score id 'PGS000001'
get_performance_metrics(pgs_id = 'PGS000001')
# To retrieve all catalogued performance metrics in the PGS Catalog, simply
# leave the parameters `ppm_id` and `pgs_id` as `NULL`.
get_performance_metrics()
## End(Not run)
Get PGS Catalog Publications
Description
Retrieves PGS publications via the PGS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default all publications that match the criteria supplied in the
arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
publications that simultaneously match all criteria provided, then set
set_operation to 'intersection'.
Usage
get_publications(
pgp_id = NULL,
pgs_id = NULL,
pubmed_id = NULL,
author = NULL,
set_operation = "union",
interactive = TRUE,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgp_id |
A character vector of PGS Catalog publication accession identifiers. |
pgs_id |
A |
pubmed_id |
An |
author |
A character vector of author names, any author in the list of
authors in a publication, .e.g. |
set_operation |
Either |
interactive |
A logical. If all publications are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Details
Please note that all search criteria are vectorised, thus allowing for batch
mode search. For more details see the help vignette:
vignette("getting-pgs-publications", package = "quincunx").
Value
A publications object.
Examples
## Not run:
# Get PGS publications by their identifier
get_publications(pgp_id = c('PGP000001', 'PGP000002'))
# By polygenic score identifier
get_publications(pgs_id = 'PGS000003')
# By PubMed identifier
get_publications(pubmed_id = '30554720')
# By author's last name
get_publications(author = 'Natarajan')
## End(Not run)
Get PGS Catalog Releases
Description
This function retrieves PGS Catalog release information. Note that the
columns pgs_id, ppm_id and pgp_id contain a vector in each element. These
columns can be unnested using unnest_longer (see Examples).
Usage
get_releases(
date = "latest",
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
date |
One or more dates formatted as |
verbose |
Whether to print information about the underlying requests to the REST API server. |
warnings |
Whether to print warnings about the underlying requests to the REST API server. |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Value
A data frame where each row is a release. Columns are:
- date
Release date.
- n_pgs
Number of released Polygenic Score (PGS) identifiers (pgs_id).
- n_ppm
Number of released Performance Metric (PPM) identifiers (ppm_id).
- n_pgp
Number of released PGS Catalog Publication (PGP) identifiers (pgp_id).
- pgs_id
Released Polygenic Score (PGS) identifiers.
- ppm_id
Released Performance Metric (PPM) identifiers.
- pgp_id
Released PGS Catalog Publication (PGP) identifiers.
- notes
News about the release.
Examples
## Not run:
# Get the latest release
get_releases()
get_releases(date = 'latest')
# Get all releases
get_releases(date = 'all')
# Get a specific release by date
get_releases(date = '2020-08-19')
## End(Not run)
Get PGS Catalog Sample Sets
Description
Retrieves sample sets via the PGS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default all sample sets that match the criteria supplied in the
arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
sample sets that simultaneously match all criteria provided, then set
set_operation to 'intersection'.
Usage
get_sample_sets(
pss_id = NULL,
pgs_id = NULL,
set_operation = "union",
interactive = TRUE,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pss_id |
A character vector of PGS Catalog sample sets accession identifiers. |
pgs_id |
A |
set_operation |
Either |
interactive |
A logical. If all sample sets are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Details
Please note that all search criteria are vectorised, thus allowing for batch mode search.
Value
A sample_sets object.
Examples
## Not run:
# Search by PGS identifier
get_sample_sets(pgs_id = 'PGS000013')
# Search by the PSS identifier
get_sample_sets(pss_id = 'PSS000068')
## End(Not run)
Get PGS Catalog Scores
Description
Retrieves polygenic scores via the PGS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default all scores that match the criteria supplied in the
arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
scores that simultaneously match all criteria provided, then set
set_operation to 'intersection'.
Usage
get_scores(
pgs_id = NULL,
efo_id = NULL,
pubmed_id = NULL,
set_operation = "union",
interactive = TRUE,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgs_id |
A |
efo_id |
A character vector of EFO identifiers. |
pubmed_id |
An |
set_operation |
Either |
interactive |
A logical. If all scores are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Details
Please note that all search criteria are vectorised, thus allowing for batch mode search.
Value
A scores object.
Examples
## Not run:
# By `pgs_id`
get_scores(pgs_id = 'PGS000088')
# By `efo_id`
get_scores(efo_id = 'EFO_0007992')
# By `pubmed_id`
get_scores(pubmed_id = '25748612')
## End(Not run)
Get PGS Catalog Trait Categories
Description
Retrieves all trait categories via the PGS Catalog REST API.
Usage
get_trait_categories(verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Value
A trait_categories object.
Examples
get_trait_categories(progress_bar = FALSE)
Get PGS Catalog Traits
Description
Retrieves traits via the PGS Catalog REST API. The REST
API is queried multiple times with the criteria passed as arguments (see
below). By default all traits that match the criteria supplied in the
arguments are retrieved: this corresponds to the default option
set_operation set to 'union'. If you would rather have only the
traits that simultaneously match all criteria provided, then set
set_operation to 'intersection'.
Usage
get_traits(
efo_id = NULL,
trait_term = NULL,
exact_term = TRUE,
include_children = FALSE,
set_operation = "union",
interactive = TRUE,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
efo_id |
A character vector of EFO identifiers. |
trait_term |
A character vector of terms to be matched against trait
identifiers ( |
exact_term |
A logical value, indicating whether to match the
|
include_children |
A logical value, indicating whether to include child traits or not. |
set_operation |
Either |
interactive |
A logical. If all traits are requested, whether to ask interactively if we really want to proceed. |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar indicating download progress from the REST API server. |
Details
Please note that all search criteria are vectorised, thus allowing for batch mode search.
Value
A traits object.
Examples
## Not run:
# Get a trait by its EFO identifier
get_traits(efo_id = 'EFO_0004631')
# Get a trait by matching a term in EFO identifier (`efo_id`), label,
# description, synonyms, categories, or external mapped terms
get_traits(trait_term = 'stroke', exact_term = FALSE)
# Get a trait matching its name (`trait`) exactly (default)
get_traits(trait_term = 'stroke', exact_term = TRUE)
# Get traits, excluding their child traits (default)
get_traits(trait_term = 'breast cancer')
# Get traits, including their child traits (check column `is_child` for
# child traits)
get_traits(trait_term = 'breast cancer', include_children = TRUE)
## End(Not run)
Is the response paginated?
Description
Checks if the response is paginated by checking if a count element exists in the response.
Usage
is_paginated(json_string)
Arguments
json_string |
a string. |
Value
A logical value.
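A minimal base R sketch of such a check is given below. It assumes that looking for a "count" key in the raw JSON text is sufficient; is_paginated_sketch and the sample strings are hypothetical.

```r
# Hypothetical sketch: a response counts as paginated if it has a "count" key.
is_paginated_sketch <- function(json_string) {
  grepl('"count"[[:space:]]*:', json_string)
}

is_paginated_sketch('{"count": 2, "next": null, "results": []}')
is_paginated_sketch('{"id": "PGS000001", "name": "example"}')
```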
Is a string a PGS Catalog identifier?
Description
Find which strings are valid PGS Catalog identifiers (returns TRUE).
PGS identifiers are tested against the following regular expression:
^PGS\\d{6}$.
Usage
is_pgs_id(str, convert_NA_to_FALSE = TRUE)
Arguments
str |
A character vector of strings. |
convert_NA_to_FALSE |
Whether to treat |
Value
A logical vector.
Is a string a PubMed ID?
Description
Find which strings are valid PubMed IDs (returns TRUE). PubMed IDs are
tested against the following regular expression: ^\\d+$.
Usage
is_pubmed_id(str, convert_NA_to_FALSE = TRUE)
Arguments
str |
A character vector of strings. |
convert_NA_to_FALSE |
Whether to treat |
Value
A logical vector.
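Both identifier checks, is_pgs_id and is_pubmed_id, reduce to matching a vector of strings against a regular expression. The base R sketch below illustrates this with the two patterns quoted in the descriptions; matches_id_pattern is a hypothetical helper, and the NA handling is an assumption inferred from the argument name convert_NA_to_FALSE.

```r
# Hypothetical helper: test strings against an identifier pattern.
matches_id_pattern <- function(str, pattern, convert_NA_to_FALSE = TRUE) {
  ok <- grepl(pattern, str)
  # grepl() yields FALSE for NA input; put NA back if the caller wants it kept.
  if (!convert_NA_to_FALSE) ok[is.na(str)] <- NA
  ok
}

# PGS identifiers: the prefix "PGS" followed by exactly six digits.
pgs_check <- matches_id_pattern(
  c("PGS000001", "PGS0001", "pgs000001", NA),
  "^PGS\\d{6}$"
)

# PubMed identifiers: one or more digits and nothing else.
pubmed_check <- matches_id_pattern(
  c("30554720", "PMID30554720", NA),
  "^\\d+$",
  convert_NA_to_FALSE = FALSE
)

pgs_check
pubmed_check
```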
Convert a named list to an S4 object
Description
Convert a named list to an S4 object
Usage
list_to_s4(list, class)
Arguments
list |
list |
class |
character vector indicating the S4 class |
Value
S4 object of class class.
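One common way to perform such a conversion is to pass the list elements as slot values to methods::new(). The sketch below is an assumption about the approach, not the package's source; the toy_cohorts class and the list_to_s4_sketch function are hypothetical.

```r
library(methods)

# A toy S4 class used only for this illustration.
setClass(
  "toy_cohorts",
  representation(cohorts = "data.frame", pgs_ids = "data.frame")
)

# Hypothetical sketch: each named list element becomes the slot of that name.
list_to_s4_sketch <- function(lst, class_name) {
  do.call(methods::new, c(list(Class = class_name), lst))
}

obj <- list_to_s4_sketch(
  list(
    cohorts = data.frame(cohort_symbol = "CECILE"),
    pgs_ids = data.frame(pgs_id = "PGS000001")
  ),
  "toy_cohorts"
)
obj@cohorts
```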
Pretty printing of 400 error
Description
Pretty printing of 400 error.
Usage
msg_400(response)
Arguments
response |
A |
Value
This function is run for its side effect: printing.
Pretty printing of 404 error
Description
Pretty printing of 404 error.
Usage
msg_404(response)
Arguments
response |
A |
Value
This function is run for its side effect: printing.
Pretty printing of empty response
Description
Pretty printing of empty response
Usage
msg_empty(response)
Arguments
response |
A |
Value
This function is run for its side effect: printing.
Number of PGS Catalog entities
Description
This function returns the number of entities in a PGS Catalog object. To
avoid ambiguity with dplyr::n() use quincunx::n().
Usage
n(x, unique = FALSE)
## S4 method for signature 'scores'
n(x, unique = FALSE)
## S4 method for signature 'publications'
n(x, unique = FALSE)
## S4 method for signature 'traits'
n(x, unique = FALSE)
## S4 method for signature 'performance_metrics'
n(x, unique = FALSE)
## S4 method for signature 'sample_sets'
n(x, unique = FALSE)
## S4 method for signature 'cohorts'
n(x, unique = FALSE)
## S4 method for signature 'trait_categories'
n(x, unique = FALSE)
## S4 method for signature 'releases'
n(x, unique = FALSE)
Arguments
x |
A scores, publications, traits, performance_metrics, sample_sets, cohorts, trait_categories or releases object. |
unique |
Whether to count only unique entries ( |
Value
An integer scalar.
Examples
# Return the number of polygenic scores in a scores object:
my_scores <- get_scores(pgs_id = c('PGS000007', 'PGS000007', 'PGS000042'))
n(my_scores)
# If you want to count unique scores only, then use the `unique` parameter:
n(my_scores, unique = TRUE)
# Total number of curated publications in the PGS Catalog:
all_pub <- get_publications(interactive = FALSE, progress_bar = FALSE)
n(all_pub)
# Total number of curated traits in the PGS Catalog:
all_traits <- get_traits(interactive = FALSE, progress_bar = FALSE)
n(all_traits)
Number of pages
Description
Determine the number of pages to be requested from the total number of
results (count) and the number of results per page (limit). The wording used
here (count and limit) is borrowed from the PGS Catalog REST API documentation.
Usage
n_pages(count, limit = 50L)
Arguments
count |
total number of results. |
limit |
number of results per page. |
Value
The number of pages, an integer value.
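The calculation amounts to a ceiling division of count by limit. The following base R sketch shows the arithmetic; n_pages_sketch is a hypothetical name and not the package's implementation.

```r
# Hypothetical sketch: pages needed to cover `count` results, `limit` per page.
n_pages_sketch <- function(count, limit = 50L) {
  as.integer(ceiling(count / limit))
}

n_pages_sketch(120L)  # 120 results at 50 per page
n_pages_sketch(100L)  # an exact multiple of the page size
```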
Pretty printing of no error message
Description
Pretty printing of no error message
Usage
no_msg(response)
Arguments
response |
A |
Value
This function is run for its side effect: printing.
Convert NR (Not Recorded) to NA (Not Available)
Description
This function converts the 'NR' string to NA.
Usage
nr_to_na(x)
Arguments
x |
a character vector. |
Value
A character vector whose 'NR' values have been replaced with NA.
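A minimal base R sketch of this replacement follows. The name nr_to_na_sketch is hypothetical, and matching only the exact string 'NR' is an assumption.

```r
# Hypothetical sketch: replace exact "NR" entries with NA.
nr_to_na_sketch <- function(x) {
  # %in% never returns NA, so existing NA values are left untouched.
  x[x %in% "NR"] <- NA_character_
  x
}

nr_to_na_sketch(c("EUR", "NR", NA, "NR "))
```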
Generate offset values
Description
Generate offset values to be passed to the PGS Catalog REST API endpoints. The offset parameter, together with the limit parameter, allows access to a desired range of results. The offset parameter specifies the starting index of the range of results desired. It is a zero-based index.
Usage
offsets(count, limit)
Arguments
count |
total number of results. |
limit |
number of results per page. |
Value
Offset values, an integer vector.
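With a zero-based index, the offsets are the multiples of limit that are smaller than count. The base R sketch below illustrates this; offsets_sketch is a hypothetical name, and returning an empty vector when count is zero is an assumption.

```r
# Hypothetical sketch: zero-based starting index of each page of results.
offsets_sketch <- function(count, limit) {
  # Assumption: no results means no pages, hence no offsets.
  if (count <= 0L) return(integer(0))
  as.integer(seq(from = 0, to = count - 1, by = limit))
}

offsets_sketch(120L, 50L)  # three pages are needed for 120 results
```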
Browse dbSNP from SNP identifiers.
Description
This function launches the web browser at dbSNP and opens a tab for each SNP identifier.
Usage
open_in_dbsnp(variant_id)
Arguments
variant_id |
A variant identifier, a character vector. |
Value
Returns TRUE if successful. Note however that this
function is run for its side effect.
Examples
open_in_dbsnp('rs56261590')
Browse PGS Catalog entities from the PGS Catalog Web Graphical User Interface
Description
This function launches the web browser and opens a tab for each identifier on the PGS Catalog web graphical user interface: https://www.pgscatalog.org/.
Usage
open_in_pgs_catalog(
identifier = NULL,
pgs_catalog_entity = c("pgs", "pgp", "pss", "efo")
)
Arguments
identifier |
A vector of identifiers. The identifiers can be: PGS, PGP, PSS or EFO identifiers. |
pgs_catalog_entity |
Either |
Value
Returns TRUE if successful, or FALSE otherwise. But
note that this function is run for its side effect.
Examples
# Open polygenic scores in the PGS Catalog Web Graphical User Interface
open_in_pgs_catalog(c('PGS000001', 'PGS000002'))
# Open PGS Catalog Publications
open_in_pgs_catalog(c('PGP000001', 'PGP000002'),
                    pgs_catalog_entity = 'pgp')
# Open Sample Sets (PSS)
open_in_pgs_catalog(c('PSS000001', 'PSS000002'),
                    pgs_catalog_entity = 'pss')
# Open EFO traits (EFO)
open_in_pgs_catalog(c('EFO_0001645', 'MONDO_0007254'),
                    pgs_catalog_entity = 'efo')
Browse PubMed from PubMed identifiers.
Description
This function launches the web browser and opens a tab for each PubMed citation.
Usage
open_in_pubmed(pubmed_id)
Arguments
pubmed_id |
A PubMed identifier, either a character or an integer vector. |
Value
Returns TRUE if successful. Note however that this
function is run for its side effect.
Examples
open_in_pubmed(c('26301688', '30595370'))
Constructor for the S4 performance_metrics object.
Description
Constructor for the S4 performance_metrics object.
Usage
performance_metrics(
performance_metrics = s4ppm_performance_metrics_tbl(),
publications = s4ppm_publications_tbl(),
sample_sets = s4ppm_sample_sets_tbl(),
samples = s4ppm_samples_tbl(),
demographics = s4ppm_demographics_tbl(),
cohorts = s4ppm_pgs_cohorts_tbl(),
pgs_effect_sizes = s4ppm_pgs_effect_sizes_tbl(),
pgs_classification_metrics = s4ppm_pgs_classification_metrics_tbl(),
pgs_other_metrics = s4ppm_pgs_other_metrics_tbl()
)
Arguments
performance_metrics |
A |
publications |
A |
sample_sets |
A |
samples |
A |
demographics |
A |
cohorts |
A |
pgs_effect_sizes |
A |
pgs_classification_metrics |
A |
pgs_other_metrics |
A |
Value
An object of class performance_metrics.
An S4 class to represent a set of PGS Catalog Performance Metrics
Description
The performance_metrics object consists of nine tables (slots) that combined
form a relational database of a subset of performance metrics. Each
performance metric is an observation (row) in the performance_metrics table
(first table).
Slots
performance_metrics
A table of PGS Performance Metrics (PPM). Each PPM (row) is uniquely identified by the ppm_id column. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- pgs_id
Polygenic Score (PGS) identifier.
- reported_trait
The author-reported trait that the PGS has been developed to predict. Example: "Breast Cancer".
- covariates
Comma-separated list of covariates used in the prediction model to evaluate the PGS.
- comments
Any other information relevant to the understanding of the performance metrics.
publications
A table of publications. Each publication (row) is uniquely identified by the column pgp_id. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- pgp_id
PGS Publication identifier. Example: "PGP000001".
- pubmed_id
PubMed identifier. Example: "25855707".
- publication_date
Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.
- publication
Abbreviated name of the journal. Example: "Am J Hum Genet".
- title
Publication title.
- author_fullname
First author of the publication. Example: 'Mavaddat N'.
- doi
Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
sample_sets
A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, and sample_id. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- stage
Sample stage: should always be Evaluation ("eval").
- sample_size
Number of individuals included in the sample.
- sample_cases
Number of cases.
- sample_controls
Number of controls.
- sample_percent_male
Percentage of male participants.
- phenotype_description
Detailed phenotype description.
- ancestry_category
Author-reported ancestry is mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
- ancestry
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
- country
Author reported countries of recruitment (if available).
- ancestry_additional_description
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
- study_id
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
- pubmed_id
PubMed identifier.
- cohorts_additional_description
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics' variables. Each demographics' variable (row) is uniquely identified by the combination of values from the columns: ppm_id, pss_id, sample_id, and variable. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
- sample_id
Sample identifier. This is a surrogate identifier to identify each sample.
- variable
Demographics variable. Following columns report about the indicated variable.
- estimate_type
Type of statistical estimate for variable.
- estimate
The variable's statistical value.
- unit
Unit of the variable.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns: ppm_id, sample_id and cohort_symbol. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- cohort_symbol
Cohort symbol.
- cohort_name
Cohort full name.
pgs_effect_sizes
A table of effect sizes per standard deviation change in PGS. Examples include regression coefficients (betas) for continuous traits, odds ratios (OR) and/or hazard ratios (HR) for dichotomous traits depending on the availability of time-to-event data. Each effect size is uniquely identified by the combination of values from the columns: ppm_id and effect_size_id. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- effect_size_id
Effect size identifier. This is a surrogate identifier to identify each effect size.
- estimate_type_long
Long notation of the effect size (e.g. Odds Ratio).
- estimate_type
Short notation of the effect size (e.g. OR).
- estimate
The estimate's value.
- unit
Unit of the estimate.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
pgs_classification_metrics
A table of classification metrics. Examples include the Area under the Receiver Operating Characteristic (AUROC) or Harrell's C-index (Concordance statistic). Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- classification_metrics_id
Classification metric identifier. This is a surrogate identifier to identify each classification metric.
- estimate_type_long
Long notation of the classification metric (e.g. Concordance Statistic).
- estimate_type
Short notation classification metric (e.g. C-index).
- estimate
The estimate's value.
- unit
Unit of the estimate.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
pgs_other_metrics
A table of other metrics that are neither effect sizes nor classification metrics. Examples include: R² (proportion of the variance explained), or reclassification metrics. Columns:
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
- other_metrics_id
Other metric identifier. This is a surrogate identifier to identify each metric.
- estimate_type_long
Long notation of the metric. Example: "Proportion of the variance explained".
- estimate_type
Short notation of the metric. Example: "R²".
- estimate
The estimate's value.
- unit
Unit of the estimate.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
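The three metric tables share the same estimate and interval columns and differ only in the name of their surrogate identifier column. As a minimal sketch, assuming toy tables with made-up values (not Catalog data), they can be stacked into one long table. The helper `as_long_metrics()` and the columns `metric_id` and `metric_class` are hypothetical names introduced here for illustration; they are not part of the package.

```r
# Toy tables mimicking the three slots; all values are made up.
effect_sizes <- data.frame(
  ppm_id = "PPM000001", effect_size_id = 1L,
  estimate_type = "OR", estimate = 1.5,
  interval_type = "ci", interval_lower = 1.2, interval_upper = 1.9,
  stringsAsFactors = FALSE
)
classification <- data.frame(
  ppm_id = "PPM000001", classification_metrics_id = 1L,
  estimate_type = "C-index", estimate = 0.62,
  interval_type = "ci", interval_lower = 0.60, interval_upper = 0.64,
  stringsAsFactors = FALSE
)
other <- data.frame(
  ppm_id = "PPM000002", other_metrics_id = 1L,
  estimate_type = "R\u00b2", estimate = 0.08,
  interval_type = NA_character_, interval_lower = NA_real_,
  interval_upper = NA_real_,
  stringsAsFactors = FALSE
)

# Hypothetical helper: rename the table-specific surrogate id to `metric_id`
# and record which table each row came from.
as_long_metrics <- function(tbl, id_col, metric_class) {
  names(tbl)[names(tbl) == id_col] <- "metric_id"
  tbl$metric_class <- metric_class
  tbl
}

all_metrics <- rbind(
  as_long_metrics(effect_sizes, "effect_size_id", "effect_size"),
  as_long_metrics(classification, "classification_metrics_id", "classification"),
  as_long_metrics(other, "other_metrics_id", "other")
)
print(all_metrics)
```

Because `metric_id` is only unique within its original table, a row in the stacked table is identified by `ppm_id`, `metric_class` and `metric_id` together.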
Map PGP identifiers to PGS identifiers
Description
Map PGP identifiers to PGS identifiers.
Usage
pgp_to_pgs(
pgp_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgp_id and pgs_id.
Examples
## Not run:
pgp_to_pgs('PGP000001')
pgp_to_pgs(c('PGP000017', 'PGP000042'))
## End(Not run)
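The mapping functions return a plain two-column data frame, so one identifier may appear on several rows. The following sketch uses a toy data frame of the documented shape (the pairings are made up, not real Catalog links) to count and group the PGS identifiers per publication with base R only.

```r
# Toy mapping shaped like the documented return value; pairings are made up.
mapping <- data.frame(
  pgp_id = c("PGP000001", "PGP000001", "PGP000001", "PGP000017"),
  pgs_id = c("PGS000001", "PGS000002", "PGS000003", "PGS000027"),
  stringsAsFactors = FALSE
)

# Number of PGS identifiers associated with each publication.
n_scores <- table(mapping$pgp_id)

# PGS identifiers grouped by publication, as a named list.
scores_by_pub <- split(mapping$pgs_id, mapping$pgp_id)

print(n_scores)
print(scores_by_pub)
```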
Map PGP identifiers to PPM identifiers
Description
Map PGP identifiers to PPM identifiers.
Usage
pgp_to_ppm(
pgp_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgp_id and ppm_id.
Examples
## Not run:
pgp_to_ppm('PGP000001')
pgp_to_ppm(c('PGP000017', 'PGP000042'))
## End(Not run)
Map PGP identifiers to PSS identifiers
Description
Map PGP identifiers to PSS identifiers.
Usage
pgp_to_pss(
pgp_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgp_id |
A character vector of PGS Catalog Publication identifiers,
e.g., "PGP000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgp_id and pss_id.
Examples
## Not run:
pgp_to_pss('PGP000001')
pgp_to_pss(c('PGP000017', 'PGP000042'))
## End(Not run)
PGS REST API server
Description
PGS REST API server
Usage
pgs_server()
Value
A string containing PGS REST API server URL.
Map PGS identifiers to PGP identifiers
Description
Map PGS identifiers to PGP identifiers.
Usage
pgs_to_pgp(
pgs_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgs_id |
A character vector of PGS identifiers,
e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgs_id and pgp_id.
Examples
## Not run:
pgs_to_pgp('PGS000001')
pgs_to_pgp(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to PPM identifiers
Description
Map PGS identifiers to PPM identifiers.
Usage
pgs_to_ppm(pgs_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
pgs_id |
A character vector of PGS identifiers, e.g., "PGS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgs_id and ppm_id.
Examples
## Not run:
pgs_to_ppm('PGS000001')
pgs_to_ppm(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to PSS identifiers
Description
Map PGS identifiers to PSS identifiers.
Usage
pgs_to_pss(
pgs_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgs_id |
A character vector of PGS identifiers,
e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgs_id and pss_id.
Examples
## Not run:
pgs_to_pss('PGS000001')
pgs_to_pss(c('PGS000017', 'PGS000042'))
## End(Not run)
Map PGS identifiers to GWAS study identifiers
Description
Map PGS identifiers to GWAS study identifiers. Retrieves GWAS study identifiers associated with samples used in the discovery stage of queried PGS identifiers.
Usage
pgs_to_study(
pgs_id = NULL,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
pgs_id |
A character vector of PGS Catalog score accession identifiers,
e.g., "PGS000001". If |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pgs_id and study_id.
Examples
## Not run:
pgs_to_study('PGS000001')
# Unmappable pgs ids will be missing, e.g., PGS000023
pgs_to_study(c('PGS000013', 'PGS000023'))
## End(Not run)
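Since unmappable identifiers are simply absent from the result, it can be useful to list them explicitly. A minimal sketch, assuming a toy result of the documented shape (the study identifier below is a placeholder, not a real accession):

```r
queried <- c("PGS000013", "PGS000023")

# Toy result: only PGS000013 could be mapped. "GCST_PLACEHOLDER" is a
# made-up stand-in for a GWAS Catalog study accession.
result <- data.frame(
  pgs_id = "PGS000013",
  study_id = "GCST_PLACEHOLDER",
  stringsAsFactors = FALSE
)

# Identifiers that were queried but are missing from the mapping.
unmapped <- setdiff(queried, result$pgs_id)
print(unmapped)
```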
Map PPM identifiers to PGP identifiers
Description
Map PPM identifiers to PGP identifiers.
Usage
ppm_to_pgp(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: ppm_id and pgp_id.
Examples
## Not run:
ppm_to_pgp('PPM000001')
ppm_to_pgp(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PPM identifiers to PGS identifiers
Description
Map PPM identifiers to PGS identifiers.
Usage
ppm_to_pgs(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: ppm_id and pgs_id.
Examples
## Not run:
ppm_to_pgs('PPM000001')
ppm_to_pgs(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PPM identifiers to PSS identifiers
Description
Map PPM identifiers to PSS identifiers.
Usage
ppm_to_pss(ppm_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
ppm_id |
A character vector of PPM identifiers, e.g., "PPM000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: ppm_id and pss_id.
Examples
## Not run:
ppm_to_pss('PPM000001')
ppm_to_pss(c('PPM000017', 'PPM000042'))
## End(Not run)
Map PSS identifiers to PGP identifiers
Description
Map PSS identifiers to PGP identifiers. This function is slow because it first downloads all Performance Metrics, which provide the link between PSS and PGP identifiers.
Usage
pss_to_pgp(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pss_id and pgp_id.
Examples
## Not run:
pss_to_pgp('PSS000001')
pss_to_pgp(c('PSS000017', 'PSS000042'))
## End(Not run)
Map PSS identifiers to PGS identifiers
Description
Map PSS identifiers to PGS identifiers. This function is slow because it first downloads all Performance Metrics, which provide the link between PSS and PGS identifiers.
Usage
pss_to_pgs(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pss_id and pgs_id.
Examples
## Not run:
pss_to_pgs('PSS000001')
pss_to_pgs(c('PSS000017', 'PSS000042'))
## End(Not run)
Map PSS identifiers to PPM identifiers
Description
Map PSS identifiers to PPM identifiers. This function is slow because it first downloads all Performance Metrics.
Usage
pss_to_ppm(pss_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
pss_id |
A character vector of PSS identifiers, e.g., "PSS000001". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: pss_id and ppm_id.
Examples
## Not run:
pss_to_ppm('PSS000001')
pss_to_ppm(c('PSS000017', 'PSS000042'))
## End(Not run)
Constructor for the S4 publications object.
Description
Constructor for the S4 publications object.
Usage
publications(publications = publications_tbl(), pgs_ids = pgs_ids_tbl())
Arguments
publications |
A |
pgs_ids |
A |
Value
An object of class publications.
An S4 class to represent a set of PGS Catalog Publications
Description
The publications object consists of two tables (slots) that combined form a relational database of a subset of PGS Catalog Publications. Each publication is an observation (row) in the publications table (first table).
Slots
publications
A table of publications. Each publication (row) is uniquely identified by the pgp_id column. Columns:
- pgp_id
PGS Publication identifier. Example: "PGP000001".
- pubmed_id
PubMed identifier. Example: "25855707".
- publication_date
Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.
- publication
Abbreviated name of the journal. Example: "Am J Hum Genet".
- title
Publication title.
- author_fullname
First author of the publication. Example: 'Mavaddat N'.
- doi
Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
- authors
Concatenated list of all the publication authors.
pgs_ids
A table of publication and associated PGS identifiers. Columns:
- pgp_id
PGS Publication identifier. Example: "PGP000001".
- pgs_id
Polygenic Score (PGS) identifier.
- stage
PGS stage: either "gwas/dev" or "eval".
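The two slots are linked by the pgp_id column. As a sketch with toy tables (author names and pairings are made up), the PGS identifiers that a publication developed can be attached to its bibliographic details by filtering on stage and joining on pgp_id with base R's merge().

```r
# Toy versions of the two slots; contents are made up for illustration.
publications <- data.frame(
  pgp_id = c("PGP000001", "PGP000002"),
  author_fullname = c("Author A", "Author B"),
  stringsAsFactors = FALSE
)
pgs_ids <- data.frame(
  pgp_id = c("PGP000001", "PGP000001", "PGP000002"),
  pgs_id = c("PGS000001", "PGS000002", "PGS000001"),
  stage = c("gwas/dev", "gwas/dev", "eval"),
  stringsAsFactors = FALSE
)

# Keep only development-stage links, then join on the shared key.
developed <- merge(
  publications,
  pgs_ids[pgs_ids$stage == "gwas/dev", ],
  by = "pgp_id"
)
print(developed)
```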
Read a polygenic scoring file
Description
This function imports a PGS scoring file. For more information about the scoring file schema, see vignette("pgs-scoring-file", package = "quincunx").
Usage
read_scoring_file(
source,
harmonized = FALSE,
assembly = c("GRCh38", "GRCh37"),
protocol = "http",
metadata_only = FALSE
)
Arguments
source |
PGS scoring file. This can be specified in three forms: (i) a
PGS identifier, e.g. |
harmonized |
Whether to read an alternative, harmonized version of the
PGS scoring file. This version contains harmonized variant information.
This information is provided in extra columns whose names are prefixed with
|
assembly |
If |
protocol |
Network protocol for communication with the PGS Catalog FTP
server: either |
metadata_only |
Whether to read only the comment block (header) from the scoring file. |
Value
The returned value is a named list. The names are copied from the arguments passed in source. Each element of the list contains another list of two elements: "metadata" and "data". The "metadata" element contains data parsed from the header of the PGS scoring file. The "data" element contains a data frame with as many rows as variants that constitute the PGS score. The columns can vary. There are mandatory and optional columns. The mandatory columns are those that identify the variant, the effect allele (effect_allele), and its respective weight (effect_weight) in the score. The columns that identify the variant can either be the rsID or the combination of chr_name and chr_position. The "data" element will be NULL if argument metadata_only is TRUE. For more information about the scoring file schema check vignette("pgs-scoring-file", package = "quincunx").
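The rule for mandatory columns described above can be written as a small check. This is a sketch only: `has_mandatory_columns()` is a hypothetical helper, not a function of the package, and the toy data frames hold made-up variants.

```r
# Hypothetical helper: TRUE if a "data" table carries a variant identifier
# (rsID, or chr_name plus chr_position) together with effect_allele and
# effect_weight.
has_mandatory_columns <- function(tbl) {
  cols <- names(tbl)
  has_variant_id <- "rsID" %in% cols ||
    all(c("chr_name", "chr_position") %in% cols)
  has_variant_id && all(c("effect_allele", "effect_weight") %in% cols)
}

# Toy tables with made-up variants.
by_rsid <- data.frame(rsID = "rs0000001", effect_allele = "A",
                      effect_weight = 0.1)
by_position <- data.frame(chr_name = "1", chr_position = 12345L,
                          effect_allele = "G", effect_weight = -0.2)
no_weight <- data.frame(rsID = "rs0000001", effect_allele = "A")

has_mandatory_columns(by_rsid)
has_mandatory_columns(by_position)
has_mandatory_columns(no_weight)
```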
Examples
## Not run:
# Read a PGS scoring file by PGS ID
# (internally, it translates the PGS ID
# to the corresponding FTP URL)
try(read_scoring_file("PGS000655"))
# Equivalent to `read_scoring_file("PGS000655")`
url <- paste0(
"http://ftp.ebi.ac.uk/",
"pub/databases/spot/pgs/scores/",
"PGS000655/ScoringFiles/",
"PGS000655.txt.gz"
)
read_scoring_file(url)
# Reading from a local file
try(read_scoring_file("~/PGS000655.txt.gz"))
## End(Not run)
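The URL pattern used in the example above can be captured in a small vectorised helper. `scoring_file_url()` is a hypothetical name used for illustration; the translation that read_scoring_file() performs internally may differ.

```r
# Hypothetical helper: build the scoring file URL for one or more PGS IDs,
# following the URL pattern shown in the example above.
scoring_file_url <- function(pgs_id) {
  sprintf(
    "http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/%s/ScoringFiles/%s.txt.gz",
    pgs_id, pgs_id
  )
}

urls <- scoring_file_url(c("PGS000655", "PGS000001"))
print(urls)
```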
Constructor for the S4 releases object.
Description
Constructor for the S4 releases object.
Usage
releases(
releases = s4releases_releases_tbl(),
pgs_ids = s4releases_pgs_ids_tbl(),
ppm_ids = s4releases_ppm_ids_tbl(),
pgp_ids = s4releases_pgp_ids_tbl()
)
Arguments
releases |
A |
pgs_ids |
A |
ppm_ids |
A |
pgp_ids |
A |
Value
An object of class releases.
An S4 class to represent a set of PGS Catalog Releases
Description
The releases object consists of four tables (slots) that combined form a relational database of a subset of PGS Catalog releases. Each release is an observation (row) in the releases table (first table).
Slots
releases
A table of PGS Catalog releases. Each release (row) is uniquely identified by the release date (date). Columns:
- date
Release date.
- n_pgs
Number of newly released Polygenic Scores.
- n_ppm
Number of newly released PGS Performance Metrics.
- n_pgp
Number of newly released PGS Publications.
pgs_ids
A table of released Polygenic Scores (PGS) identifiers. Columns:
- date
Release date.
- pgs_id
Polygenic Score (PGS) identifier. Example: "PGS000001".
ppm_ids
A table of the released PGS Performance Metrics identifiers. Columns:
- date
Release date.
- ppm_id
A PGS Performance Metrics identifier. Example: "PPM000001".
pgp_ids
A table of the released PGS Publication identifiers. Columns:
- date
Release date.
- pgp_id
PGS Publication identifier. Example: "PGP000001".
Remap identifiers
Description
Remap identifiers
Usage
remap_id(lst_tbls, old, new)
Request a resource from the PGS REST API
Description
Performs a GET request on the endpoint as specified by resource_url.
Usage
request(
resource_url,
base_url = pgs_server(),
user_agent = user_agent_id(),
verbose = FALSE,
warnings = TRUE
)
Arguments
resource_url |
Endpoint URL. The endpoint is internally appended to the
|
base_url |
The PGS REST API base URL. |
user_agent |
User agent. |
verbose |
Whether to be verbose. |
warnings |
Whether to print warnings. |
Value
A named list of four elements:
- resource
The URL endpoint.
- code
- message
A string describing the status of the response obtained: 'OK' if successful or a description of the error.
- json
JSON response as string.
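A caller would typically inspect the message element before using the JSON. The sketch below works on toy lists shaped like the documented return value; `json_or_stop()` is a hypothetical helper, the "<endpoint>" strings are placeholders, and treating code as an HTTP status code is an assumption, since its description is not given above.

```r
# Hypothetical helper: return the JSON string of a successful response,
# or raise an error carrying the message of a failed one.
json_or_stop <- function(res) {
  if (!identical(res$message, "OK")) {
    stop("Request to ", res$resource, " failed: ", res$message, call. = FALSE)
  }
  res$json
}

# Toy responses; `code` values assume HTTP status codes.
ok <- list(resource = "<endpoint>", code = 200L, message = "OK",
           json = "{}")
failed <- list(resource = "<endpoint>", code = 404L, message = "Not Found",
               json = NA_character_)

json_or_stop(ok)
outcome <- try(json_or_stop(failed), silent = TRUE)
```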
Request a paginated resource from the PGS REST API
Description
Performs a GET request on the specified resource_url and all its pages.
Usage
request_all(
resource_url = "/",
base_url = pgs_server(),
limit = 20L,
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
resource_url |
Endpoint URL. The endpoint is internally appended to the
|
base_url |
The PGS REST API base URL (one should not need to change its default value). |
limit |
number of results per page. |
verbose |
whether to print information about each API request. |
warnings |
whether to print warnings related to API requests. |
progress_bar |
whether to show a progress bar as the paginated resources are retrieved. |
Value
A list of four named elements:
- resource
The URL endpoint.
- code
- message
A string describing the status of the response obtained. It is "OK" if everything went OK or some other string describing the problem.
- json
A list of JSON responses (each response is a string).
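The arithmetic behind paginated retrieval can be sketched as follows, assuming the API pages through results with a limit and a zero-based offset. The helpers `n_pages_sketch()` and `offsets_sketch()` are hypothetical and are not the package's internal functions.

```r
# Number of pages needed to retrieve `count` results, `limit` per page.
n_pages_sketch <- function(count, limit = 20L) {
  as.integer(ceiling(count / limit))
}

# Zero-based offset of the first result on each page.
offsets_sketch <- function(count, limit = 20L) {
  seq(from = 0L, by = limit, length.out = n_pages_sketch(count, limit))
}

n_pages_sketch(45)   # 45 results at 20 per page need 3 pages
offsets_sketch(45)   # offsets 0, 20 and 40
```

With the default limit of 20, each page corresponds to one request, so the number of pages is also the number of JSON strings expected in the json element.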
Pretty printing of http error
Description
Pretty printing of http error
Usage
request_warning(response, warnings = FALSE)
Arguments
response |
A |
Value
This function is run for its side effect: printing.
Convert an S4 object into a list
Description
Convert an S4 object into a list
Usage
s4_to_list(s4_obj)
Arguments
s4_obj |
an S4 object |
Value
A list version of the S4 object.
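One way to perform such a conversion with the methods package is to iterate over the slot names. This is a minimal sketch under that assumption, using a toy class; `s4_to_list_sketch()` and `toy_tables` are hypothetical names and the package's own implementation may differ.

```r
library(methods)

# Toy S4 class with two table slots.
setClass("toy_tables",
         representation(first = "data.frame", second = "data.frame"))

# Extract every slot into a list element named after the slot.
s4_to_list_sketch <- function(s4_obj) {
  slot_names <- slotNames(s4_obj)
  stats::setNames(
    lapply(slot_names, function(nm) slot(s4_obj, nm)),
    slot_names
  )
}

obj <- new("toy_tables",
           first = data.frame(x = 1:2),
           second = data.frame(y = "a"))
lst <- s4_to_list_sketch(obj)
str(lst)
```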
Constructor for the S4 sample_sets object.
Description
Constructor for the S4 sample_sets object.
Usage
sample_sets(
sample_sets = s4pss_sample_sets_tbl(),
samples = s4pss_samples_tbl(),
demographics = s4pss_demographics_tbl(),
cohorts = s4pss_pgs_cohorts_tbl()
)
Arguments
sample_sets |
A |
samples |
A |
demographics |
A |
cohorts |
A |
Value
An object of class sample_sets.
An S4 class to represent a set of PGS Catalog Sample Sets
Description
The sample_sets object consists of four tables (slots) that combined form a relational database of a subset of PGS Catalog sample sets. Each sample set is an observation (row) in the sample_sets table (first table).
Slots
sample_sets
A table of sample sets. Each sample set (row) is uniquely identified by the column pss_id. Columns:
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns pss_id and sample_id. Columns:
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- stage
Sample stage: should always be Evaluation ("eval").
- sample_size
Number of individuals included in the sample.
- sample_cases
Number of cases.
- sample_controls
Number of controls.
- sample_percent_male
Percentage of male participants.
- phenotype_description
Detailed phenotype description.
- ancestry_category
Author reported ancestry is mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
- ancestry
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
- country
Author reported countries of recruitment (if available).
- ancestry_additional_description
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
- study_id
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
- pubmed_id
PubMed identifier.
- cohorts_additional_description
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics variables. Each demographics variable (row) is uniquely identified by the combination of values from the columns pss_id, sample_id, and variable. Columns:
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
- sample_id
Sample identifier. This is a surrogate identifier to identify each sample.
- variable
Demographics variable. The following columns describe the indicated variable.
- estimate_type
Type of statistical estimate for variable.
- estimate
The variable's statistical value.
- unit
Unit of the variable.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns pss_id, sample_id and cohort_symbol. Columns:
- pss_id
A PGS Sample Set identifier. Example: "PSS000042".
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- cohort_symbol
Cohort symbol.
- cohort_name
Cohort full name.
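Because sample_id is a surrogate key that restarts within each sample set, tables in this object are joined on the pair pss_id and sample_id rather than on sample_id alone. A sketch with toy tables (sizes and cohort symbols are made up):

```r
# Toy versions of the samples and cohorts slots; values are made up.
samples <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000043"),
  sample_id = c(1L, 2L, 1L),
  sample_size = c(100L, 250L, 80L),
  stringsAsFactors = FALSE
)
cohorts <- data.frame(
  pss_id = c("PSS000042", "PSS000042", "PSS000043"),
  sample_id = c(1L, 1L, 1L),
  cohort_symbol = c("COH_A", "COH_B", "COH_A"),
  stringsAsFactors = FALSE
)

# sample_id alone repeats across sample sets; the pair of columns does not.
anyDuplicated(samples$sample_id) > 0
anyDuplicated(samples[c("pss_id", "sample_id")]) == 0

# Join on the composite key. Samples without cohorts are dropped here;
# pass all.x = TRUE to merge() to keep them.
samples_cohorts <- merge(samples, cohorts, by = c("pss_id", "sample_id"))
print(samples_cohorts)
```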
Constructor for the S4 scores object.
Description
Constructor for the S4 scores object.
Usage
scores(
scores = s4scores_scores_tbl(),
publications = s4scores_publications_tbl(),
samples = s4scores_samples_tbl(),
demographics = s4scores_demographics_tbl(),
cohorts = s4scores_cohorts_tbl(),
traits = s4scores_traits_tbl(),
stages_tally = s4scores_stages_tally_tbl(),
ancestry_frequencies = s4scores_ancestry_frequencies_tbl(),
multi_ancestry_composition = s4scores_multi_ancestry_composition_tbl()
)
Arguments
scores |
A |
publications |
A |
samples |
A |
demographics |
A |
cohorts |
A |
traits |
A |
Value
An object of class scores.
An S4 class to represent a set of PGS Catalog Polygenic Scores
Description
The scores object consists of nine tables (slots) that combined form a relational database of a subset of PGS Catalog polygenic scores. Each score is an observation (row) in the scores table (the first table).
Slots
scores
A table of polygenic scores. Each polygenic score (row) is uniquely identified by the pgs_id column. Columns:
- pgs_id
Polygenic Score (PGS) identifier. Example: "PGS000001".
- pgs_name
This may be the name that the authors describe the PGS with in the source publication, or a name that a curator of the PGS Catalog has assigned to identify the score during the curation process (before a PGS identifier has been given). Example: PRS77_BC.
- scoring_file
URL to the scoring file on the PGS FTP server. Example: "http://ftp.ebi.ac.uk/pub/databases/spot/pgs/scores/PGS000001/ScoringFiles/PGS000001.txt.gz".
- matches_publication
Indicates whether the PGS data matches the published polygenic score (TRUE). If not (FALSE), the authors have provided an alternative polygenic score for the Catalog and some other data, such as performance metrics, may differ from the publication.
- reported_trait
The author-reported trait that the PGS has been developed to predict. Example: "Breast Cancer".
- trait_additional_description
Any additional description not captured in the other columns. Example: "Femoral neck BMD (g/cm2)".
- pgs_method_name
The name or description of the method or computational algorithm used to develop the PGS.
- pgs_method_params
A description of the relevant inputs and parameters relevant to the PGS development method/process.
- n_variants
Number of variants used to calculate the PGS.
- n_variants_interactions
Number of higher-order variant interactions included in the PGS.
- assembly
The version of the genome assembly that the variants present in the PGS are associated with. Example: GRCh37.
- license
The PGS Catalog distributes its data according to EBI's standard Terms of Use. Some PGS have specific terms, licenses, or restrictions (e.g. non-commercial use) that we highlight in this field, if known.
publications
A table of publications. Each publication (row) is uniquely identified by the pgp_id column. Columns:
- pgs_id
Polygenic Score (PGS) identifier.
- pgp_id
PGS Publication identifier. Example: "PGP000001".
- pubmed_id
PubMed identifier. Example: "25855707".
- publication_date
Publication date. Example: "2020-09-28". Note that the class of publication_date is Date.
- publication
Abbreviated name of the journal. Example: "Am J Hum Genet".
- title
Publication title.
- author_fullname
First author of the publication. Example: 'Mavaddat N'.
- doi
Digital Object Identifier (DOI). This variable is also curated to allow unpublished work (e.g. preprints) to be added to the catalog. Example: "10.1093/jnci/djv036".
samples
A table of samples. Each sample (row) is uniquely identified by the combination of values from the columns pgs_id and sample_id. Columns:
- pgs_id
Polygenic score identifier. An identifier that starts with 'PGS' and is followed by six digits, e.g. 'PGS000001'.
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- stage
Sample stage: either "discovery" or "training".
- sample_size
Number of individuals included in the sample.
- sample_cases
Number of cases.
- sample_controls
Number of controls.
- sample_percent_male
Percentage of male participants.
- phenotype_description
Detailed phenotype description.
- ancestry_category
Author reported ancestry is mapped to the best matching ancestry category from the NHGRI-EBI GWAS Catalog framework (see ancestry_categories for possible values).
- ancestry
A more detailed description of sample ancestry that usually matches the most specific description described by the authors (e.g. French, Chinese).
- country
Author reported countries of recruitment (if available).
- ancestry_additional_description
Any additional description not captured in the other columns (e.g. founder or genetically isolated populations, or further description of admixed samples).
- study_id
Associated GWAS Catalog study accession identifier, e.g., "GCST002735".
- pubmed_id
PubMed identifier.
- cohorts_additional_description
Any additional description about the samples (e.g. sub-cohort information).
demographics
A table of sample demographics variables. Each demographics variable (row) is uniquely identified by the combination of values from the columns pgs_id, sample_id and variable. Columns:
- pgs_id
Polygenic Score (PGS) identifier.
- sample_id
Sample identifier. This is a surrogate identifier to identify each sample.
- variable
Demographics variable. The following columns describe the indicated variable.
- estimate_type
Type of statistical estimate for variable.
- estimate
The variable's statistical value.
- unit
Unit of the variable.
- variability_type
Measure of statistical dispersion for variable, e.g. standard error (se) or standard deviation (sd).
- variability
The value of the measure of dispersion.
- interval_type
Type of statistical interval for variable: range, iqr (interquartile), ci (confidence interval).
- interval_lower
Interval lower bound.
- interval_upper
Interval upper bound.
cohorts
A table of cohorts. Each cohort (row) is uniquely identified by the combination of values from the columns pgs_id, sample_id and cohort_symbol. Columns:
- pgs_id
Polygenic Score (PGS) identifier.
- sample_id
Sample identifier. This is a surrogate key to identify each sample.
- cohort_symbol
Cohort symbol.
- cohort_name
Cohort full name.
traits
A table of EFO traits. Each trait (row) is uniquely identified by the combination of the columns pgs_id and efo_id. Columns:
- pgs_id
Polygenic Score (PGS) identifier.
- efo_id
An EFO identifier.
- trait
Trait name.
- description
Detailed description of the trait from EFO.
- url
External link to the EFO entry.
stages_tally
A table of sample sizes and number of sample sets at each stage. Columns:
- pgs_id
Polygenic Score (PGS) identifier.
- stage
Sample stage: either "gwas", "dev" or "eval".
- sample_size
Sample size.
- n_sample_sets
Number of sample sets (only meaningful for the evaluation stage "eval").
ancestry_frequencies
This table describes the ancestry composition at each stage.
- pgs_id
Polygenic Score (PGS) identifier.
- stage
Sample stage: either "gwas", "dev" or "eval".
- ancestry_class_symbol
Ancestry class symbol.
- frequency
Ancestry fraction (percentage).
multi_ancestry_composition
A table of a breakdown of the ancestries included in multi-ancestries.
- pgs_id
Polygenic Score (PGS) identifier.
- stage
Sample stage: either "gwas", "dev" or "eval".
- multi_ancestry_class_symbol
Multi-ancestry class symbol.
- ancestry_class_symbol
Ancestry class symbol.
Set operations on PGS Catalog objects
Description
Performs set union, intersection, and (asymmetric!) difference on two objects of either class scores, publications, traits, performance_metrics, sample_sets, cohorts or trait_categories. Note that union() removes duplicated entities, whereas bind() does not.
Usage
union(x, y, ...)
intersect(x, y, ...)
setdiff(x, y, ...)
setequal(x, y, ...)
Arguments
x , y |
Objects of either class scores, publications, traits, performance_metrics, sample_sets, cohorts or trait_categories. |
... |
other arguments passed on to methods. |
Value
In the case of union(), intersect(), or setdiff(): an object of the same class as x and y. In the case of setequal(), a logical scalar.
Examples
# Get some `scores` objects:
my_scores_1 <- get_scores(c('PGS000012', 'PGS000013'))
my_scores_2 <- get_scores(c('PGS000013', 'PGS000014'))
#
# union()
#
# NB: with `union()`, PGS000013 is not repeated.
union(my_scores_1, my_scores_2)@scores
#
# intersect()
#
intersect(my_scores_1, my_scores_2)@scores
#
# setdiff()
#
setdiff(my_scores_1, my_scores_2)@scores
#
# setequal()
#
setequal(my_scores_1, my_scores_2)
setequal(my_scores_1, my_scores_1)
setequal(my_scores_2, my_scores_2)
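The semantics of these operations can be previewed offline on the identifiers alone. The sketch below applies the base R set functions to plain character vectors holding the same PGS identifiers as the example above; it illustrates the set logic only and does not call the methods defined for PGS Catalog objects.

```r
ids_1 <- c("PGS000012", "PGS000013")
ids_2 <- c("PGS000013", "PGS000014")

# Union: the shared identifier PGS000013 appears only once.
base::union(ids_1, ids_2)

# Plain concatenation keeps the duplicate, analogous to what is described
# for bind().
c(ids_1, ids_2)

# Intersection: identifiers present in both.
base::intersect(ids_1, ids_2)

# The difference is asymmetric: the order of the arguments matters.
base::setdiff(ids_1, ids_2)
base::setdiff(ids_2, ids_1)

# Set equality.
base::setequal(ids_1, ids_2)
base::setequal(ids_1, rev(ids_1))
```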
Study stages
Description
A dataset containing the various study stages assigned to samples in the PGS Catalog.
Usage
stages
Format
A data frame with 5 stages (rows) and 4 columns:
- stage
Study stage.
- symbol
One-letter symbol for the stage, or a comma separated combination thereof.
- name
Stage name.
- definition
Stage description.
Source
https://www.pgscatalog.org/docs/ancestry
Examples
stages
Map GWAS study identifiers to PGS identifiers
Description
Map GWAS study identifiers to PGS identifiers.
Usage
study_to_pgs(study_id, verbose = FALSE, warnings = TRUE, progress_bar = TRUE)
Arguments
study_id |
A character vector of GWAS Catalog study accession identifiers, e.g., "GCST001937". |
verbose |
A |
warnings |
A |
progress_bar |
Whether to show a progress bar as the queries are performed. |
Value
A data frame of two columns: study_id and pgs_id.
Examples
## Not run:
study_to_pgs('GCST001937')
study_to_pgs(c('GCST000998', 'GCST000338'))
## End(Not run)
Subset a cohorts object
Description
You can subset cohorts by identifier or by position using the `[` operator.
Usage
## S4 method for signature 'cohorts,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'cohorts,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'cohorts,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A cohorts object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A cohorts object.
Examples
# Get a few cohorts by their symbol:
my_cohorts <- get_cohorts(c('23andMe', 'BioImage', 'Rotterdam-SI', 'SGWAS'),
progress_bar = FALSE)
#
# Subsetting by position
#
my_cohorts[c(1, 3)]
#
# Subsetting by cohort symbol (character)
#
my_cohorts[c('23andMe', 'SGWAS')]
Subset a performance_metrics object
Description
You can subset performance_metrics by identifier or by position using the `[` operator.
Usage
## S4 method for signature 'performance_metrics,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'performance_metrics,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'performance_metrics,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A performance_metrics object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A performance_metrics object.
Examples
# Get a few performance metrics:
my_ppm <- get_performance_metrics(sprintf('PPM%06d', 38:42))
#
# Subsetting by position
#
my_ppm[c(1, 4)]
#
# Subsetting by performance metrics identifier (character)
#
my_ppm['PPM000042']
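The sprintf() call in the example above generates zero-padded identifiers. The sketch below shows what it produces and adds a simple format check; `looks_like_id()` is a hypothetical helper, and the assumption that PPM identifiers consist of the prefix followed by exactly six digits is inferred from the examples in this manual.

```r
# Zero-padded identifiers: "PPM" followed by a six-digit number.
ppm_ids <- sprintf("PPM%06d", 38:42)
print(ppm_ids)

# Positions 1 and 4 of the vector, mirroring my_ppm[c(1, 4)].
ppm_ids[c(1, 4)]

# Hypothetical format check: prefix followed by exactly six digits.
looks_like_id <- function(x, prefix) {
  grepl(paste0("^", prefix, "[0-9]{6}$"), x)
}

looks_like_id(ppm_ids, "PPM")
looks_like_id("PPPM000001", "PPM")
```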
Subset a publications object
Description
You can subset publications by identifier or by position using the `[` operator.
Usage
## S4 method for signature 'publications,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'publications,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'publications,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A publications object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A publications object.
Examples
# Get all publications in the PGS Catalog:
all_pub <- get_publications(interactive = FALSE, progress_bar = FALSE)
#
# Subsetting by position
#
all_pub[1:5]
#
# Subsetting by publication identifier (character)
#
all_pub['PGP000001']
Subset a releases object
Description
You can subset releases by identifier (release date) or by position using the `[` operator.
Usage
## S4 method for signature 'releases,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'releases,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'releases,character,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'releases,Date,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A releases object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A releases object.
Examples
# Get details about all PGS Catalog data releases thus far:
all_releases <- get_releases(date = 'all', progress_bar = FALSE)
#
# Subsetting by position
#
# Releases are, by default, sorted by date in descending order, thus the
# first PGS Catalog release is in the last position of the returned
# `all_releases` object. Here's how you can extract that first release (last
# position in `all_releases`):
all_releases[n(all_releases)]
#
# Subsetting by date (character)
#
date_of_interest <- '2021-06-11'
class(date_of_interest)
all_releases[date_of_interest]
#
# Subsetting by date (Date object)
#
date_of_interest <- as.Date('2021-06-11')
class(date_of_interest)
all_releases[date_of_interest]
Subset a sample_sets object
Description
You can subset sample_sets by identifier or by position using
the `[` operator.
Usage
## S4 method for signature 'sample_sets,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'sample_sets,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'sample_sets,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A sample_sets object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A sample_sets object.
Examples
# Get a few sample sets:
my_pss <- get_sample_sets(sprintf('PSS%06d', 42:48))
#
# Subsetting by position
#
my_pss[c(1, 3)]
#
# Subsetting by sample set identifier (character)
#
my_pss['PSS000042']
Subset a scores object
Description
You can subset scores by identifier or by position using the
`[` operator.
Usage
## S4 method for signature 'scores,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'scores,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'scores,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A scores object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A scores object.
Examples
# Get a few polygenic scores:
my_scores <- get_scores(sprintf('PGS%06d', 10:14), progress_bar = FALSE)
#
# Subsetting by position
#
my_scores[c(1, 3, 5)]@scores
#
# Subsetting by PGS identifier (character)
#
my_scores[c('PGS000011', 'PGS000014')]@scores
Subset a trait_categories object
Description
You can subset trait_categories by trait category (string) or
by position using the `[` operator.
Usage
## S4 method for signature 'trait_categories,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'trait_categories,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'trait_categories,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A trait_categories object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A trait_categories object.
Examples
# Get details about all trait categories:
all_trait_categories <- get_trait_categories(progress_bar = FALSE)
#
# Subsetting by position
#
all_trait_categories[1:5]
#
# Subsetting by trait category (character)
#
all_trait_categories['Liver enzyme measurement']
Subset a traits object
Description
You can subset traits by identifier or by position using the
`[` operator.
Usage
## S4 method for signature 'traits,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'traits,numeric,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'traits,character,missing,missing'
x[i, j, ..., drop = FALSE]
Arguments
x |
A traits object. |
i |
Position of the identifier or the name of the identifier itself. |
j |
Not used. |
... |
Additional arguments not used here. |
drop |
Not used. |
Value
A traits object.
Examples
# Get a few traits:
my_traits <- get_traits(trait_term = 'stroke', exact_term = FALSE,
                        progress_bar = FALSE)
#
# Subsetting by position
#
my_traits[1]
#
# Subsetting by EFO trait identifier (character)
#
my_traits['EFO_0000712']
Are you sure?
Description
This function asks you interactively for permission to continue. You can specify a custom message to print before the question, as well as separate messages for a positive and a negative answer.
Usage
sure(
before_question = NULL,
after_saying_no = NULL,
after_saying_yes = NULL,
default_answer = NULL
)
Arguments
before_question |
String with message to be printed before question. |
after_saying_no |
String with message to be printed after answering 'no'. |
after_saying_yes |
String with message to be printed after answering 'yes'. |
default_answer |
String with answer to question, if run in non-interactive mode. |
Details
If you run this function in non-interactive mode, you should pass an
automatic answer to default_answer: 'yes' or 'no'.
Value
A logical indicating whether the answer was 'yes'/'y' (TRUE) or otherwise (FALSE).
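Examples
The behaviour described above can be sketched with a small stand-in function. Note that `sure_sketch()` is a hypothetical re-implementation written only from this page's description (the prompt text and the error message are assumptions); it is not the package's own code.

```r
# Hypothetical stand-in mirroring the documented behaviour of sure().
# It is NOT the package's implementation.
sure_sketch <- function(before_question = NULL,
                        after_saying_no = NULL,
                        after_saying_yes = NULL,
                        default_answer = NULL) {
  # Optional message printed before the question.
  if (!is.null(before_question)) message(before_question)

  if (interactive()) {
    # Prompt wording is an assumption.
    answer <- readline("Are you sure? [y/n]: ")
  } else {
    # A non-interactive session cannot prompt, so an automatic answer
    # has to be supplied through `default_answer`.
    if (is.null(default_answer)) {
      stop("`default_answer` must be 'yes' or 'no' in non-interactive mode.")
    }
    answer <- default_answer
  }

  # TRUE for 'yes' or 'y', FALSE for anything else.
  said_yes <- answer %in% c("yes", "y")

  # Optional message printed after the answer.
  msg <- if (said_yes) after_saying_yes else after_saying_no
  if (!is.null(msg)) message(msg)

  said_yes
}

# In a script (non-interactive), answer automatically with 'yes'.
proceed <- sure_sketch(
  before_question = "This will download many records.",
  after_saying_yes = "Proceeding.",
  after_saying_no = "Aborted.",
  default_answer = "yes"
)
```

The logical return value is meant to be used as a guard, for example `if (proceed) { ... }`.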
Constructor for the S4 trait_categories object.
Description
Constructor for the S4 trait_categories object.
Usage
trait_categories(
trait_categories = s4trait_categories_trait_categories_tbl(),
traits = s4trait_categories_traits_tbl()
)
Arguments
trait_categories |
A |
traits |
A |
Value
An object of class trait_categories.
An S4 class to represent a set of PGS Catalog Trait Categories
Description
The trait_categories object consists of two tables (slots) that combined form
a relational database of a subset of PGS Catalog trait categories. Each trait
category is an observation (row) in the trait_categories table (first table).
Slots
trait_categories
A table of trait categories. Columns:
- trait_category: Trait category name.
traits
A table of associated traits. Columns:
- trait_category: Trait category name.
- efo_id: An EFO identifier.
- trait: Trait name.
- description: Detailed description of the trait from EFO.
- url: External link to the EFO entry.
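Examples
A minimal sketch of how the two slots relate, assuming the package is attached, the PGS Catalog REST API is reachable, and dplyr is installed. Slots are accessed with `@`, and the column names are those listed above.

```r
library(quincunx)

# Requires internet access to the PGS Catalog REST API.
all_trait_categories <- get_trait_categories(progress_bar = FALSE)

# First table: one row per trait category.
all_trait_categories@trait_categories

# Second table: traits, linked to the first table by `trait_category`.
# Count how many traits fall under each category.
dplyr::count(all_trait_categories@traits, trait_category)

# Traits that belong to a single category of interest.
liver_traits <- dplyr::filter(
  all_trait_categories@traits,
  trait_category == 'Liver enzyme measurement'
)
liver_traits
```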
Constructor for the S4 traits object.
Description
Constructor for the S4 traits object.
Usage
traits(
traits = s4traits_traits_tbl(),
pgs_ids = s4traits_pgs_ids_tbl(),
child_pgs_ids = s4traits_child_pgs_ids_tbl(),
trait_categories = s4traits_trait_categories_tbl(),
trait_synonyms = s4traits_trait_synonyms_tbl(),
trait_mapped_terms = s4traits_trait_mapped_terms_tbl()
)
Arguments
traits |
A |
pgs_ids |
A |
child_pgs_ids |
A |
trait_categories |
A |
trait_synonyms |
A |
trait_mapped_terms |
A |
Value
An object of class traits.
An S4 class to represent a set of PGS Catalog Traits
Description
The traits object consists of six slots, each a table (tibble), that
combined form a relational database of a subset of PGS Catalog traits.
Each trait is an observation (row) in the traits table, which is the
main table. All tables have the column efo_id as primary key.
Slots
traits
A table of traits. Columns:
pgs_ids
A table of associated polygenic score identifiers. Columns:
child_pgs_ids
A table of polygenic score identifiers associated with the child traits. Columns:
trait_categories
A table of associated trait categories. Columns:
trait_synonyms
A table of associated trait synonyms. Columns:
trait_mapped_terms
A table of associated external references, identifiers or other terms. Columns:
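Examples
Because every table carries the efo_id column, slots can be combined with an ordinary join. The following is a sketch, assuming the package is attached, the PGS Catalog REST API is reachable, and dplyr is installed. Only the efo_id column is relied upon here; any other column names are not assumed.

```r
library(quincunx)

# Requires internet access to the PGS Catalog REST API.
my_traits <- get_traits(trait_term = 'stroke', exact_term = FALSE,
                        progress_bar = FALSE)

# Slots are accessed with `@`; the main table has one row per trait.
my_traits@traits

# Attach the associated polygenic score identifiers to each trait by
# joining the main table with the pgs_ids table on the shared key.
traits_with_pgs <- dplyr::left_join(
  my_traits@traits,
  my_traits@pgs_ids,
  by = 'efo_id'
)
traits_with_pgs
```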
User agent identification
Description
Generates an S3 request object, as defined by the package httr, that is
used to identify this package as the user agent in requests to the
PGS REST API. The user agent identification string is: "quincunx: R
Client for the PGS REST API".
Usage
user_agent_id()
Value
An S3 request object as defined by the package httr.
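Examples
To illustrate what such an object looks like, the sketch below builds a comparable request object directly with `httr::user_agent()`, using the identification string quoted above. This is an assumption about how an equivalent object can be constructed, not a statement of how user_agent_id() is implemented.

```r
library(httr)

# Identification string as documented above.
agent_string <- "quincunx: R Client for the PGS REST API"

# httr::user_agent() returns an S3 object of class "request".
ua <- httr::user_agent(agent_string)
class(ua)

# The string is stored among the request options.
ua$options$useragent

# Such an object is passed as an extra argument to httr verbs, e.g.
# httr::GET("https://www.pgscatalog.org/rest/", ua)  # not run: needs network
```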
Export a PGS Catalog object to xlsx
Description
This function exports a PGS Catalog object to a Microsoft Excel xlsx file. Each table (slot) is saved in its own sheet.
Usage
write_xlsx(x, file = stop("`file` must be specified"))
Arguments
x |
A scores, publications, traits, performance_metrics, sample_sets, cohorts, trait_categories or releases object. |
file |
A file name to write to. |
Value
No return value, called for its side effect.
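Examples
A minimal usage sketch, assuming the package is attached and the PGS Catalog REST API is reachable. The output path is a temporary file chosen for illustration. The call is namespace-qualified because the writexl package also provides a function named write_xlsx.

```r
library(quincunx)

# Requires internet access to the PGS Catalog REST API.
my_scores <- get_scores(sprintf('PGS%06d', 10:14), progress_bar = FALSE)

# Write every table (slot) of the scores object to its own sheet.
out_file <- tempfile(fileext = '.xlsx')
quincunx::write_xlsx(my_scores, file = out_file)

# Check that the file was created.
file.exists(out_file)
```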