Title: | A Collection of Proteome Panels and Meta-Data |
Version: | 0.5 |
Date: | 2025-3-5 |
Description: | It aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details. |
License: | MIT + file LICENSE |
URL: | https://jinghuazhao.github.io/pQTLdata/ |
Depends: | R (≥ 3.5.0) |
Imports: | knitr, Rdpack |
RdMacros: | Rdpack |
Suggests: | dplyr, grid, EnsDb.Hsapiens.v75, ensembldb, IRanges, org.Hs.eg.db, S4Vectors, VennDiagram |
VignetteBuilder: | knitr |
LazyData: | Yes |
LazyLoad: | Yes |
LazyDataCompression: | xz |
NeedsCompilation: | no |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Packaged: | 2025-03-05 16:14:38 UTC; jhz22 |
Author: | Jing Hua Zhao |
Maintainer: | Jing Hua Zhao <jinghuazhao@hotmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-03-07 11:30:02 UTC |
A summary of datasets
Description
pQTLdata aggregates protein panel data and metadata for protein quantitative trait locus (pQTL) analysis using 'pQTLtools' (https://jinghuazhao.github.io/pQTLtools/). The package includes data from affinity-based panels such as 'Olink' (https://olink.com/) and 'SomaScan' (https://somalogic.com/), as well as mass spectrometry-based panels from 'CellCarta' (https://cellcarta.com/) and 'Seer' (https://seer.bio/). The metadata encompasses updated annotations and publication details.
Details
Available data are listed in the following table.
Objects | Description |
Datasets | |
caprion | Caprion panel |
inf1 | Olink/INF panel |
Olink_Explore_1536 | Olink/Explore 1536 (NGS) panels, 1,472 assays |
Olink_Explore_3072 | Olink/Explore 3072 panels |
Olink_Explore_HT | Olink/Explore HT panels |
Olink_Target_96 | Olink/Target 96 panels |
Olink_qPCR | Olink/qPCR panels |
SomaScan160410 | SomaScan panel |
SomaScanV4.1 | SomaScan v4.1 panel |
SomaScan11k | SomaScan 11k panel |
scallop_inf1 | SCALLOP/INF meta-analysis results |
seer1980 | Seer panel: supplementary table ST1 of Suhre et al. (2024), bioRxiv |
swath_ms | SWATH-MS panel |
Installed directories | |
EndNote/ | Proteogenomics references |
Olink/ | Olink-COVID analysis by MGH |
Columns shared by several datasets are described as follows.
chr Chromosome.
start Start position.
end End position.
gene Gene name.
UniProt UniProt ID.
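Because several datasets share the chr, start and end columns, a genomic window query can be written once and reused across panels. The sketch below uses a small toy data frame with made-up coordinates rather than a packaged dataset; the helper in_region() is hypothetical (not part of pQTLdata) and assumes closed intervals with start no greater than end.

```r
# Hypothetical helper (not part of pQTLdata): keep rows whose [start, end]
# interval overlaps the query window [from, to] on the given chromosome.
in_region <- function(df, chrom, from, to) {
  hit <- as.character(df$chr) == as.character(chrom) &
    df$start <= to & df$end >= from
  # which() drops NA comparisons, e.g. rows with a missing chromosome
  df[which(hit), , drop = FALSE]
}

# Toy rows shaped like the generic columns; all values are made up
toy <- data.frame(
  chr = c("1", "1", "2"),
  start = c(100, 5000, 100),
  end = c(900, 6000, 900),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  UniProt = c("ID_A", "ID_B", "ID_C"),
  stringsAsFactors = FALSE
)

in_region(toy, chrom = 1, from = 800, to = 5500)
```

With the package loaded, the same call could be applied to any dataset carrying these columns, for instance inf1 or SomaScan160410.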
Usage
Vignettes on package usage:
An Overview of pQTLdata: vignette("pQTLdata").
Author(s)
Jing Hua Zhao in collaboration with other colleagues.
See Also
Useful links: https://jinghuazhao.github.io/pQTLdata/
Examples
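# Olink-SomaScan panel overlap; the datasets are lazily loaded with the package
library(pQTLdata)
# Exclude BDNF (P23560) from both UniProt sets before comparing
p <- list(setdiff(inf1$uniprot, "P23560"),
          setdiff(SomaScan160410$UniProt[!is.na(SomaScan160410$UniProt)], "P23560"))
cnames <- c("INF1", "SomaScan")
# filename = NULL returns a grid object instead of writing a file
os <- VennDiagram::venn.diagram(x = p, category.names = cnames, filename = NULL,
                                disable.logging = TRUE, height = 8, width = 8, units = "in")
grid::grid.newpage()
grid::grid.draw(os)
# Proteins on both panels, matched by UniProt id
m <- merge(inf1, SomaScan160410, by.x = "uniprot", by.y = "UniProt")
u <- setdiff(with(m, unique(uniprot)), "P23560")
o <- subset(inf1, uniprot %in% u)
dim(o)
# SomaScan annotation for the shared proteins; a protein may map to several SOMAmers
vars <- c("UniProt", "chr", "start", "end", "extGene", "Target", "TargetFullName")
s <- subset(SomaScan160410[vars], UniProt %in% u)
dim(s)
# Drop rows that are identical across the selected columns
us <- s[!duplicated(s), ]
dim(us)
us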
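The example above compares two panels. A sketch generalising it to any number of panels intersects the UniProt sets with Reduce(); the three vectors below are hypothetical placeholders, and with the package loaded one could substitute, for instance, inf1$uniprot and SomaScan160410$UniProt.

```r
# Hypothetical UniProt sets standing in for three panels
panels <- list(
  panel1 = c("ID_A", "ID_B", "ID_C", NA),
  panel2 = c("ID_B", "ID_C", "ID_D"),
  panel3 = c("ID_C", "ID_B", "ID_E", "ID_C")
)

# Drop missing ids and duplicates before set operations
panels <- lapply(panels, function(x) unique(x[!is.na(x)]))

# Proteins present on every panel
shared <- Reduce(intersect, panels)

# Proteins found on one panel only (absent from all the others)
only <- lapply(seq_along(panels), function(i)
  setdiff(panels[[i]], unlist(panels[-i])))
names(only) <- names(panels)

shared
lengths(only)
```

Reduce() keeps the code independent of the number of panels, at the cost of returning only the common core; pairwise counts would need a separate loop.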
Olink/Explore 1536 panel
Description
Information based on pilot studies
Usage
Olink_Explore_1536
Format
A data frame with 1,472 rows and 3 variables:
UniProt
UniProt id
Assay
Experimental assay
Panel
Olink panel
Details
Curated from R.
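Since Olink_Explore_1536 records a Panel for each UniProt id, proteins assayed on more than one panel can be found by counting distinct panels per id. The rows below are hypothetical toy values shaped like the documented columns (UniProt, Assay, Panel), not extracts from the dataset.

```r
# Toy rows shaped like Olink_Explore_1536; all values are hypothetical
explore <- data.frame(
  UniProt = c("ID_A", "ID_A", "ID_B", "ID_C", "ID_C", "ID_C"),
  Assay = c("A", "A", "B", "C", "C", "C"),
  Panel = c("Panel_1", "Panel_2", "Panel_3", "Panel_1", "Panel_2", "Panel_4"),
  stringsAsFactors = FALSE
)

# Number of distinct panels per UniProt id
n_panels <- tapply(explore$Panel, explore$UniProt,
                   function(p) length(unique(p)))

# UniProt ids assayed on more than one panel
multi <- names(n_panels)[n_panels > 1]
multi
```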
Olink/Explore 3072 panels
Description
Information on all Olink Explore 3072 panels.
Usage
Olink_Explore_3072
Format
A data frame with 2,945 rows and 4 variables:
UniProt.ID
UniProt id
Protein.name
Protein name
Gene.name
Gene name
Explore.384.panel
Explore 384 panel
Details
Curated from Excel.
Olink/Explore HT panels
Description
Information on all Olink Explore HT panels.
Usage
Olink_Explore_HT
Format
A data frame with 5,416 rows and 4 variables:
Olink.ID
Olink id
UniProt.ID
UniProt id
Protein.name
Protein name
Gene.name
Gene name
Details
Curated from Excel.
Olink/Target 96 panels
Description
Information on all Target 96 panels. Individual panels are also available from the companion xlsx in the Olink/ directory.
Usage
Olink_Target_96
Format
A data frame with 1,116 rows and 3 variables:
UniProt
UniProt id
Protein
Protein name
Panel
Olink Target 96 panel
Details
Curated from Excel.
Olink/qPCR panels
Description
Information on all qPCR panels
Usage
Olink_qPCR
Format
A data frame with 1,112 rows and 7 variables:
UniProt
UniProt id
Panel
Panel
Target
Protein target
gene
HGNC symbol
chr
Chromosome
start
Start position
end
End position
Details
Curated from Excel.
SomaScan 11k
Description
This is the latest SomaScan panel.
Usage
SomaScan11k
Format
A data frame with 10,776 rows and 5 variables:
Sequence.ID
Sequence ID
Full.Name
Full name
Target.Name
Target name
UniProt.ID
UniProt ID
Entrez.Gene.Name
Entrez gene name
Details
Curated from the SomaLogic website.
Source
https://somalogic.com/somascan-11k-assay/
SomaScan panel
Description
This is based on the panel used in Sun et al. (2018).
Usage
SomaScan160410
Format
A data frame with 5,178 rows and 10 variables:
SOMAMER_ID
Somamer id
UniProt
UniProt id
Target
Protein target
TargetFullName
Protein target full name
chr
chromosome (1-22,X,Y)
start
Start position
end
End position
entGene
Entrez gene
ensGene
ENSEMBL gene
extGene
External gene name
Details
From the INTERVAL study.
References
Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, Burgess S, Jiang T, Paige E, Surendran P, Oliver-Williams C, Kamat MA, Prins BP, Wilcox SK, Zimmerman ES, Chi A, Bansal N, Spain SL, Wood AM, Morrell NW, Bradley JR, Janjic N, Roberts DJ, Ouwehand WH, Todd JA, Soranzo N, Suhre K, Paul DS, Fox CS, Plenge RM, Danesh J, Runz H, Butterworth AS (2018). “Genomic atlas of the human plasma proteome.” Nature, 558(7708), 73-79. ISSN 1476-4687 (Electronic) 0028-0836 (Linking), doi:10.1038/s41586-018-0175-2.
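With chr taking values 1-22, X and Y, sorting it as text places "10" before "2". A factor with karyotype-ordered levels gives the natural order. The toy rows below are hypothetical, shaped like SomaScan160410; chr is passed through as.character() first so the sketch does not depend on whether it is stored as character or numeric.

```r
# Toy rows shaped like SomaScan160410 (SOMAMER_ID, chr, start); values are made up
soma <- data.frame(
  SOMAMER_ID = c("s1", "s2", "s3", "s4", "s5"),
  chr = c("X", "10", "2", "2", NA),
  start = c(500, 100, 900, 300, 1),
  stringsAsFactors = FALSE
)

# Natural karyotype order: 1-22, then X and Y; other labels become NA
chr_levels <- c(as.character(1:22), "X", "Y")
soma$chr_f <- factor(as.character(soma$chr), levels = chr_levels)

# Sort by chromosome, then position; order() puts NA chromosomes last by default
sorted <- soma[order(soma$chr_f, soma$start), ]
sorted$SOMAMER_ID
```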
SomaScan v4.1
Description
This is the SomaScan 7k panel.
Usage
SomaScanV4.1
Format
A data frame with 7,288 rows and 6 variables:
#
A serial number
SeqID
SOMAmer sequence ID
Human.Target.or.Analyte
Human target/analyte
UniProt.ID
UniProt id
GeneID
HGNC symbol
Type
Analyte type, e.g. "Protein"
Details
Obtained directly from SomaLogic.
Caprion panel
Description
Information based on Caprion pilot studies
Usage
caprion
Format
A data frame with 987 rows and 12 variables:
Gene
HGNC symbols simplified in four instances
Gene.orig
HGNC symbol
Protein
Protein name as in UniProt
Accession
UniProt id
Protein.Description
Detailed information on protein
GO.Cellular.Component
GO cellular component
GO.Function
GO function
GO.Process
GO process
ensGenes
Ensembl genes
chrom
chromosome
chr
chromosome
starts
start positions
ends
end positions
start
Minimum of starts
end
Maximum of ends
Details
See the Caprion repository for its use.
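The start and end columns are described as the minimum of starts and the maximum of ends. A sketch of that reduction follows. It assumes, hypothetically, that starts and ends hold several positions as ";"-separated strings, which may not match how the packaged caprion object stores them; the helper collapse_pos() and the toy rows are illustrative only.

```r
# Toy rows; the ";" separator is an assumption, not taken from caprion
cap <- data.frame(
  Accession = c("ID_A", "ID_B"),
  starts = c("1200;1000;1500", "300"),
  ends = c("2000;1800;2600", "900"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each delimited string into numbers and
# reduce them with fun (min for starts, max for ends)
collapse_pos <- function(x, fun, sep = ";") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(v) fun(as.numeric(v)), numeric(1))
}

cap$start <- collapse_pos(cap$starts, min)
cap$end <- collapse_pos(cap$ends, max)
cap[, c("Accession", "start", "end")]
```

Taking the minimum start and maximum end yields one interval spanning all positions, which suits window queries but hides gaps between them.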
Olink/INF1 panel
Description
The panel is based on SCALLOP-INF (Zhao et al. 2023).
Usage
inf1
Format
A data frame with 92 rows and 9 variables:
uniprot
UniProt id
prot
Protein
target
Protein target name
target.short
Protein target short name
gene
HGNC symbol
chr
chromosome (1-13,16-17,19-22)
start
Start position
end
End position
chromosome
updated chromosomes
start38
start position under build 38
end38
end position under build 38
ensGene
Ensembl gene name
ensembl_gene_id
ENSEMBL gene
alt_name
Recent protein name from UniProt (www.uniprot.org)
Details
Assembled for SCALLOP-INF.
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
Supplementary table 3
Description
Supplementary information for Zhao et al. (2023).
Usage
scallop_inf1
Format
A data frame with 180 rows and 19 variables:
- UniProt
UniProt ID
- Protein
Protein name
- Protein_gene_symbol
Gene symbol
- Chromosome
Chromosome
- Position
Position
- cistrans
cis/trans
- rsid
dbSNP reference SNP ID (rsID)
- Effect_allele
Effect allele
- Other_allele
Other (non-effect) allele
- EAF
Effect allele frequency
- b
Effect size
- SE
Standard error of b
- log10P
log10(P)
- Direction
Direction field in METAL output
- HetISq
Heterogeneity I^2 statistic
- HetChiSq
Heterogeneity chi-square
- HetDf
Heterogeneity degrees of freedom
- logHetP
Heterogeneity log10(P)
- N
Sample size
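HetISq, HetChiSq, HetDf and logHetP are linked by standard meta-analysis formulas: I^2 = max(0, (Q - df) / Q) * 100, where Q is the heterogeneity chi-square and df its degrees of freedom, and the heterogeneity P value is the upper tail of a chi-square distribution with df degrees of freedom. The sketch assumes METAL-style conventions (I^2 in percent, logHetP as log10 of P); these are assumptions to check against the packaged values, and the helper names are hypothetical.

```r
# Hypothetical helpers; Q = heterogeneity chi-square (HetChiSq), df = HetDf
i_squared <- function(Q, df) {
  # Higgins-Thompson I^2 in percent, truncated at zero; NA when Q is not positive
  ifelse(Q > 0, pmax(0, (Q - df) / Q) * 100, NA_real_)
}

log10_het_p <- function(Q, df) {
  # log10 of the upper-tail chi-square probability, computed on the log scale
  # so that very small P values do not underflow to zero
  pchisq(Q, df, lower.tail = FALSE, log.p = TRUE) / log(10)
}

Q <- c(10, 2, 150)
df <- c(4, 4, 10)
round(i_squared(Q, df), 1)
round(log10_het_p(Q, df), 3)
```

When Q falls below df, the truncation sets I^2 to zero rather than reporting a negative value.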
References
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, Enroth S, Cozzetto D, Digby-Bell J, Marten J, Folkersen L, Herder C, Jonsson L, Bergen SE, Gieger C, Needham EJ, Surendran P, Team EBR, Paul DS, Polasek O, Thorand B, Grallert H, Roden M, Võsa U, Esko T, Hayward C, Johansson Å, Gyllensten U, Powell N, Hansson O, Mattsson-Carlgren N, Joshi PK, Danesh J, Padyukov L, Klareskog L, Landén M, Wilson JF, Siegbahn A, Wallentin L, Mälarstig A, Butterworth AS, Peters JE (2023). “Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets.” Nature Immunology, 24(9), 1540-1551. doi:10.1038/s41590-023-01588-w.
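Association P values for strong pQTLs can be far smaller than the smallest double-precision number, which is one reason to keep them on a log10 scale. The sketch below derives a two-sided log10 P from b and SE via a Wald z statistic, working on the log scale throughout. It assumes log10P holds log10(P) as its description suggests; the helper name is hypothetical, and published values may differ slightly if they came from a different test.

```r
# Hypothetical helper: two-sided log10 P value for z = b / SE,
# computed on the log scale so very large |z| does not underflow to 0
log10_p_from_z <- function(b, se) {
  z <- abs(b / se)
  (log(2) + pnorm(-z, log.p = TRUE)) / log(10)
}

b <- c(0.196, 4)
se <- c(0.1, 0.1)
log10_p_from_z(b, se)

# The naive route loses the second value: 2 * pnorm(-40) underflows to 0
2 * pnorm(-abs(b / se))
```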
Seer 1980 panel
Description
Supplementary table ST1 from Suhre et al. (2024).
Usage
seer1980
Format
A data frame with 1,980 rows:
PID.NP
PID.NP
protein_ids
Protein IDs
protein_names
Protein names
mapped.UniProtID
Mapped UniProt ID
mapped_gene_id
Mapped gene ID
gene_name
Gene name
description
Protein description
chr
Chromosome
start
Start position
end
End position
Details
As above.
References
Suhre K, Chen Q, Halama A, Mendez K, Dahlin A, Stephan N, Thareja G, Sarwath H, Guturu H, Dwaraka VB, Batzoglou S, Schmidt F, Lasky-Su JA (2024). “A genome-wide association study of mass spectrometry proteomics using the Seer Proteograph platform.” BioRxiv. doi:10.1101/2024.05.27.596028.
SWATH-MS panel
Description
Curated during the INTERVAL pilot study.
Usage
swath_ms
Format
A data frame with 684 rows and 5 variables:
Accession
UniProt id
accList
List of UniProt ids
uniprotName
UniProt protein name
ensGene
ENSEMBL gene
geneName
HGNC symbol
Details
As above.
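Because accList holds a list of UniProt ids per row, it is often convenient to expand it to one row per (Accession, member id) pair before merging with other panels. The sketch assumes, hypothetically, that the ids are ";"-separated; the toy rows are made up and only mimic the documented columns.

```r
# Toy rows shaped like swath_ms; the ";" separator is an assumption
swath <- data.frame(
  Accession = c("ID_A", "ID_B"),
  accList = c("ID_A;ID_A2; ID_A3", "ID_B"),
  stringsAsFactors = FALSE
)

# Split each list into its member ids
ids <- strsplit(swath$accList, ";", fixed = TRUE)

# One row per (Accession, member id) pair; trimws() removes stray spaces
long <- data.frame(
  Accession = rep(swath$Accession, lengths(ids)),
  member = trimws(unlist(ids)),
  stringsAsFactors = FALSE
)
long
```

The long table can then be merged on member with, for example, the UniProt column of another panel.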