Title: A Collection of Proteome Panels and Meta-Data
Version: 0.5
Date: 2025-3-5
Description: It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details.
License: MIT + file LICENSE
URL: https://jinghuazhao.github.io/pQTLdata/
Depends: R (≥ 3.5.0)
Imports: knitr, Rdpack
RdMacros: Rdpack
Suggests: dplyr, grid, EnsDb.Hsapiens.v75, ensembldb, IRanges, org.Hs.eg.db, S4Vectors, VennDiagram
VignetteBuilder: knitr
LazyData: Yes
LazyLoad: Yes
LazyDataCompression: xz
NeedsCompilation: no
Encoding: UTF-8
RoxygenNote: 7.3.2
Packaged: 2025-03-05 16:14:38 UTC; jhz22
Author: Jing Hua Zhao [aut, cre] (ORCID: 0000-0003-4930-3582), Uwe Ligges [ctb], Benjamin Altmann [ctb]
Maintainer: Jing Hua Zhao <jinghuazhao@hotmail.com>
Repository: CRAN
Date/Publication: 2025-03-07 11:30:02 UTC

A summary of datasets

Description

It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details.

Details

Available data are listed in the following table.

Objects               Description
Datasets
caprion               Caprion panel
inf1                  Olink/INF panel
Olink_Explore_1536    Olink/NGS 1472 panels
Olink_Explore_3072    Olink/Explore 3072 panels
Olink_Explore_HT      Olink/Explore HT panels
Olink_Target_96       Olink/Target 96 panels
Olink_qPCR            Olink/qPCR panels
SomaScan160410        SomaScan panel
SomaScanV4.1          SomaScan v4.1 panel
SomaScan11k           SomaScan 11k panel
scallop_inf1          SCALLOP/INF meta-analysis results
seer1980              ST1 from Suhre et al. (2024) bioRxiv
swath_ms              SWATH-MS panel
Installations
EndNote/              Proteogenomics references
Olink/                Olink-COVID analysis by MGH

Generic descriptions of the datasets follow.

Usage

Vignettes on package usage:

Author(s)

Jing Hua Zhao in collaboration with other colleagues.

See Also

Useful links:

Examples


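# Olink-SomaScan panel overlap
# Requires the suggested packages 'VennDiagram' and 'grid'.
# P23560 (BDNF) is excluded from both sets before they are compared.
p <- list(setdiff(inf1$uniprot,"P23560"),
          setdiff(SomaScan160410$UniProt[!is.na(SomaScan160410$UniProt)],"P23560"))
cnames <- c("INF1","SomaScan")
# Build the Venn diagram as a grid object (filename=NULL) and draw it on a new page
os <- VennDiagram::venn.diagram(x = p, category.names=cnames, filename=NULL,
                                disable.logging = TRUE,height=8,width=8,units="in")
grid::grid.newpage()
grid::grid.draw(os)
# UniProt ids present on both panels
m <- merge(inf1,SomaScan160410,by.x="uniprot",by.y="UniProt")
u <- setdiff(with(m,unique(uniprot)),"P23560")
# Olink/INF rows for the shared proteins
o <- subset(inf1,uniprot %in% u)
dim(o)
# SomaScan rows for the shared proteins, restricted to annotation columns
vars <- c("UniProt","chr","start","end","extGene","Target","TargetFullName")
s <- subset(SomaScan160410[vars], UniProt %in% u)
dim(s)
# Drop rows that are identical across the selected columns
us <- s[!duplicated(s),]
dim(us)
us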
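
When 'VennDiagram' is not available, the overlap drawn above can also be summarised with base R set operations. The following is a minimal sketch on two hypothetical data frames (toy_inf1 and toy_soma) that only mimic the uniprot and UniProt columns; apart from P23560, the identifiers are placeholders rather than package data.

```r
# Hypothetical stand-ins for inf1 and SomaScan160410 (placeholder ids only)
toy_inf1 <- data.frame(uniprot = c("ID_A", "ID_B", "ID_C", "P23560"))
toy_soma <- data.frame(UniProt = c("ID_B", "ID_C", "ID_C", "ID_D", NA, "P23560"))

# Non-missing accessions per panel, excluding P23560 as in the example above
a <- setdiff(toy_inf1$uniprot, "P23560")
b <- setdiff(toy_soma$UniProt[!is.na(toy_soma$UniProt)], "P23560")

# Counts for the three regions of a two-set Venn diagram
overlap <- c(only_first  = length(setdiff(a, b)),
             both        = length(intersect(a, b)),
             only_second = length(setdiff(b, a)))
overlap
```

Replacing the toy data frames with inf1 and SomaScan160410 should give the counts shown in the diagram.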


Description

Information based on pilot studies

Usage

Olink_Explore_1536

Format

A data frame with 1,472 rows and 3 variables:

UniProt

UniProt id

Assay

Experimental assay

Panel

Olink panel

Details

Curated from R.
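
The three columns lend themselves to simple per-panel summaries. The following is a minimal sketch, assuming a data frame with the UniProt, Assay and Panel columns documented above; the rows and panel names are hypothetical and serve only to illustrate the calls.

```r
# Hypothetical rows with the documented column names (not package data)
toy <- data.frame(
  UniProt = c("ID_A", "ID_B", "ID_C", "ID_A"),
  Assay   = c("A1", "B1", "C1", "A1"),
  Panel   = c("PanelX", "PanelX", "PanelY", "PanelY")
)

# Number of assays on each panel
per_panel <- table(toy$Panel)

# Accessions that appear on more than one panel
n_panels <- tapply(toy$Panel, toy$UniProt, function(p) length(unique(p)))
shared <- names(n_panels)[n_panels > 1]

per_panel
shared
```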


Description

Information on all Explore 3072 panels

Usage

Olink_Explore_3072

Format

A data frame with 2,945 rows and 4 variables:

UniProt.ID

UniProt id

Protein.name

Protein name

Gene.name

Gene name

Explore.384.panel

Explore 384 panel

Details

Curated from Excel.


Description

Information on all Explore HT panels

Usage

Olink_Explore_HT

Format

A data frame with 5,416 rows and 4 variables:

Olink.ID

Olink id

UniProt.ID

UniProt id

Protein.name

Protein name

Gene.name

Gene name

Details

Curated from Excel.


Description

Information on all Target 96 panels. Individual panels are also available from the companion xlsx in the Olink/ directory.

Usage

Olink_Target_96

Format

A data frame with 1,116 rows and 3 variables:

UniProt

UniProt id

Protein

Protein

Panel

Panel

Details

Curated from Excel.


Description

Information on all qPCR panels

Usage

Olink_qPCR

Format

A data frame with 1,112 rows and 7 variables:

UniProt

UniProt id

Panel

Panels

Target

Protein

gene

HGNC symbol

chr

Chromosome

start

Start position

end

End position

Details

Curated from Excel.


SomaScan 11k

Description

This is the latest SomaScan panel.

Usage

SomaScan11k

Format

A data frame with 10,776 rows and 5 variables:

Sequence.ID

Sequence ID

Full.Name

Full name

Target.Name

Target name

UniProt.ID

UniProt ID

Entrez.Gene.Name

Entrez gene name

Details

Curated from the SomaLogic website.

Source

https://somalogic.com/somascan-11k-assay/


SomaScan panel

Description

This is based on the panel used in Sun et al. (2018).

Usage

SomaScan160410

Format

A data frame with 5,178 rows and 10 variables:

SOMAMER_ID

SOMAmer id

UniProt

UniProt id

Target

Protein target

TargetFullName

Protein target full name

chr

chromosome (1-22,X,Y)

start

Start position

end

End position

entGene

entrez gene

ensGene

ENSEMBL gene

extGene

external gene

Details

From the INTERVAL study.

References

Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, Burgess S, Jiang T, Paige E, Surendran P, Oliver-Williams C, Kamat MA, Prins BP, Wilcox SK, Zimmerman ES, Chi A, Bansal N, Spain SL, Wood AM, Morrell NW, Bradley JR, Janjic N, Roberts DJ, Ouwehand WH, Todd JA, Soranzo N, Suhre K, Paul DS, Fox CS, Plenge RM, Danesh J, Runz H, Butterworth AS (2018). “Genomic atlas of the human plasma proteome.” Nature, 558(7708), 73-79. doi:10.1038/s41586-018-0175-2.
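
A UniProt id may be targeted by more than one SOMAmer, so rows are not necessarily unique per protein. The following is a minimal sketch of how to count SOMAmers per protein, using a hypothetical four-row data frame with the SOMAMER_ID and UniProt columns; the values are placeholders.

```r
# Hypothetical rows (placeholder ids); the last SOMAmer has no UniProt mapping
toy <- data.frame(SOMAMER_ID = c("s1", "s2", "s3", "s4"),
                  UniProt    = c("ID_A", "ID_A", "ID_B", NA))

# SOMAmers per UniProt id; table() drops NA by default
n_per_protein <- table(toy$UniProt)

# Proteins measured by more than one SOMAmer
multi <- names(n_per_protein)[n_per_protein > 1]

n_per_protein
multi
```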


SomaScan v4.1

Description

This is the 7k panel.

Usage

SomaScanV4.1

Format

A data frame with 7,288 rows and 6 variables:

#

A serial number

SeqID

SeqID

Human.Target.or.Analyte

Human target/analyte

UniProt.ID

UniProt id

GeneID

HGNC symbol

Type

"Protein"

Details

Obtained directly from SomaLogic.
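
The first column is listed under the name #, which is not a syntactic R name, so it cannot be reached with a plain $ expression. The following is a small sketch of the usual workarounds, assuming the column is indeed stored under that name; the two-row data frame and the replacement name index are hypothetical.

```r
# Hypothetical two-row data frame with a non-syntactic column name
toy <- data.frame(1:2, c("seq-1", "seq-2"), stringsAsFactors = FALSE)
names(toy) <- c("#", "SeqID")

# Either backticks or [[ ]] access the column
by_backtick <- toy$`#`
by_bracket  <- toy[["#"]]

# Optionally rename it to something easier to type
names(toy)[names(toy) == "#"] <- "index"
names(toy)
```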


Caprion panel

Description

Information based on Caprion pilot studies

Usage

caprion

Format

A data frame with 987 rows and 12 variables:

Gene

HGNC symbols simplified in four instances

Gene.orig

HGNC symbol

Protein

Protein name as in UniProt

Accession

UniProt id

Protein.Description

Detailed information on protein

GO.Cellular.Component

GO cellular component

GO.Function

GO function

GO.Process

GO process

ensGenes

Ensembl genes

chrom

chromosome

chr

chromosome

starts

start positions

ends

end positions

start

minimum start

end

maximum end

Details

See the Caprion repository for examples of its use.
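
The start and end columns are described as the minimum of starts and the maximum of ends. The following is a minimal sketch of that reduction. It assumes, without confirmation from the documentation, that multiple positions are stored as a semicolon-separated string; range_from_list() is a hypothetical helper and the positions are invented.

```r
# Hypothetical helper: split each string on `sep` and reduce it with `fun`
range_from_list <- function(x, fun, sep = ";") {
  vapply(strsplit(as.character(x), sep, fixed = TRUE),
         function(v) fun(as.numeric(v)), numeric(1))
}

# Invented positions; the separator is an assumption
toy <- data.frame(starts = c("100;250", "4000"),
                  ends   = c("180;900", "4500"))
toy$start <- range_from_list(toy$starts, min)  # minimum start
toy$end   <- range_from_list(toy$ends, max)    # maximum end
toy
```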


Olink/INF1 panel

Description

The panel is based on SCALLOP-INF (Zhao et al., 2023).

Usage

inf1

Format

A data frame with 92 rows and 9 variables:

uniprot

UniProt id

prot

Protein

target

Protein target name

target.short

Protein target short name

gene

HGNC symbol

chr

chromosome (1-13,16-17,19-22)

start

Start position

end

End position

chromosome

updated chromosomes

start38

start position under build 38

end38

end position under build 38

ensGene

Ensembl gene name

ensembl_gene_id

ENSEMBL gene

alt_name

recent name from www.uniprot.org

Details

Assembled for SCALLOP-INF.

References

Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
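
The chr, start and end columns can be used to label a variant as cis or trans relative to the gene encoding the protein. The following is a minimal sketch, assuming a 1 Mb flanking window (a common convention, chosen here for illustration) and a hypothetical helper classify_cis_trans(); the gene coordinates and variant positions are invented.

```r
# Hypothetical helper: "cis" if the variant lies on the gene's chromosome and
# within `flank` base pairs of [gene_start, gene_end], otherwise "trans"
classify_cis_trans <- function(snp_chr, snp_pos, gene_chr, gene_start, gene_end,
                               flank = 1e6) {
  same_chr  <- as.character(snp_chr) == as.character(gene_chr)
  in_window <- snp_pos >= gene_start - flank & snp_pos <= gene_end + flank
  ifelse(same_chr & in_window, "cis", "trans")
}

# Invented gene and variants, for illustration only
gene <- data.frame(chr = "4", start = 5000000, end = 5050000)
snps <- data.frame(chr = c("4", "4", "7"), pos = c(4200000, 9000000, 5010000))
snps$cistrans <- classify_cis_trans(snps$chr, snps$pos,
                                    gene$chr, gene$start, gene$end)
snps
```

Variant positions and gene coordinates must refer to the same genome build; the panel provides start/end as well as start38/end38 for build 38.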


Supplementary table 3

Description

Supplementary information for Zhao et al. (2023).

Usage

scallop_inf1

Format

A data frame with 180 rows and 19 variables:

UniProt

UniProt ID

Protein

Protein name

Protein_gene_symbol

Gene symbol

Chromosome

Chromosome

Position

Position

cistrans

cis/trans

rsid

reference sequence ID

Effect_allele

Effect allele

Other_allele

Reference allele

EAF

Effect allele frequency

b

Effect size estimate

SE

Standard error

log10P

log10(P)

Direction

Direction field in METAL output

HetISq

I^2

HetChiSq

Heterogeneity chi-square

HetDf

degrees of freedom

logHetP

Heterogeneity log10(P)

N

Sample size

References

Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
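
The summary statistics can be converted to more familiar quantities. The following is a minimal sketch, assuming that log10P holds log10(P) as listed above (so it is negative for small P) and that HetISq follows the usual definition I^2 = 100 * max(0, (Q - df) / Q) with Q = HetChiSq and df = HetDf; the three rows are invented for illustration.

```r
# Invented rows with a subset of the documented columns
toy <- data.frame(b        = c(0.20, -0.05, 0.10),
                  SE       = c(0.02, 0.01, 0.05),
                  log10P   = c(-23, -6.2, -1.3),
                  HetChiSq = c(12, 3, 8),
                  HetDf    = c(6, 6, 4))

toy$z  <- toy$b / toy$SE          # Wald z statistic
toy$P  <- 10^toy$log10P           # back-transform, assuming log10(P) is stored
toy$I2 <- 100 * pmax(0, (toy$HetChiSq - toy$HetDf) / toy$HetChiSq)
toy
```

Keeping P on the log10 scale avoids numerical underflow for very strong associations, so the back-transformation is best reserved for display.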


Seer 1980 panel

Description

ST1 from Suhre et al. (2024).

Usage

seer1980

Format

A data frame with 1,980 rows:

PID.NP

PID.NP

protein_ids

protein_ids

protein_names

protein_names

mapped.UniProtID

mapped.UniProtID

mapped_gene_id

mapped_gene_id

gene_name

gene_name

description

description

chr

chr

start

start

end

end

Details

As above.

References

Suhre K, Chen Q, Halama A, Mendez K, Dahlin A, Stephan N, Thareja G, Sarwath H, Guturu H, Dwaraka VB, Batzoglou S, Schmidt F, Lasky-Su JA (2024). “A genome-wide association study of mass spectrometry proteomics using the Seer Proteograph platform.” bioRxiv. doi:10.1101/2024.05.27.596028.


SWATH-MS panel

Description

Curated during the INTERVAL pilot study.

Usage

swath_ms

Format

A data frame with 684 rows and 5 variables:

Accession

UniProt id

accList

List of UniProt ids

uniprotName

Protein

ensGene

ENSEMBL gene

geneName

HGNC symbol

Details

As above.