Version: 1.1.0
Date: 2025-05-04
Title: Build a Minimalist Gene Ontology (GO) Database (GODB)
Maintainer: Barry Zeeberg <barryz2013@gmail.com>
Author: Barry Zeeberg [aut, cre]
Depends: R (≥ 4.2.0)
LazyData: true
LazyDataCompression: xz
Description: Building a GODB is normally fairly complicated: it involves downloading multiple database files and using them to build, e.g., a 'MySQL' database. Accessing such a database is also complicated, requiring an intimate knowledge of the database in order to construct reliable queries. Here we have a more modest goal: generating GOGOA3, a stripped-down version of the GODB that was originally restricted to human genes as designated by the HUGO Gene Nomenclature Committee (HGNC) (see https://geneontology.org/). I have now added about two dozen additional species, namely all species represented on the Gene Ontology download page https://current.geneontology.org/products/pages/downloads.html. This covers most of the model organisms that are commonly used in biomedical and basic research (assuming that anyone still has a grant to do such research). The database can be built in a matter of seconds from 2 easily downloaded files (see https://current.geneontology.org/products/pages/downloads.html and https://geneontology.org/docs/download-ontology/), and it can be queried by, e.g., w<-which(GOGOA3[,"HGNC"] %in% hgncList), where GOGOA3 is a matrix representing the minimalist GODB and hgncList is a list of gene identifiers. This database will be used in my upcoming package 'GoMiner', which is based on my previous publication (see Zeeberg, B.R., Feng, W., Wang, G. et al. (2003)<doi:10.1186/gb-2003-4-4-r28>). Relevant .RData files are available from GitHub (https://github.com/barryzee/GO/tree/main/databases).
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
VignetteBuilder: knitr
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
RoxygenNote: 7.3.2
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-05-04 13:35:56 UTC; barryzeeberg
Repository: CRAN
Date/Publication: 2025-05-04 14:00:02 UTC

minimalistGODB data set

Description

minimalistGODB data set generated by parseGOBASIC()

Usage

data(GO)

minimalistGODB data set

Description

minimalistGODB data set generated by parseGOA()

Usage

data(GOA)

minimalistGODB data set

Description

small version of minimalistGODB data set generated by buildGODatabase()

Usage

data(GOGOAsmall)

buildGODatabase

Description

driver to build GO database

Usage

buildGODatabase(goa, gobasic, dir = NULL, verbose = FALSE)

Arguments

goa

character string path name to downloaded goa_human.gaf

gobasic

character string path name to downloaded go-basic.obo

dir

character string path name to directory to hold subdirectory GODB_RDATA

verbose

Boolean; if TRUE, print out some diagnostic info

Details

Download goa_human.gaf from https://current.geneontology.org/products/pages/downloads.html

Download go-basic.obo from https://geneontology.org/docs/download-ontology/

The parameter dir should be omitted or NULL, except for the developer harvesting the updated .RData DBs.

The output GOGOA was saved as an .RData file. This was too large for CRAN. It is available from https://github.com/barryzee/GO/tree/main/databases

Value

returns no value but has side effect of saving GOGOA3 to a subdirectory

Examples

## Not run: 
# replace my path names for goa and gobasic with your own!!
# these were obtained from the download sites listed in 'details' section
goa<-"~/goa_human.gaf"
gobasic<-"~/go-basic.obo"
buildGODatabase(goa,gobasic,dir="~/",verbose=TRUE)
# > dim(GOGOA)
# [1] 720139      5
# > GOGOA[1:5,]
#      HGNC          GO           RELATION      NAME                    ONTOLOGY            
# [1,] "NUDT4B"      "GO:0003723" "enables"     "RNA binding"           "molecular_function"
# [2,] "NUDT4B"      "GO:0005515" "enables"     "protein binding"       "molecular_function"
# [3,] "NUDT4B"      "GO:0046872" "enables"     "metal ion binding"     "molecular_function"
# [4,] "NUDT4B"      "GO:0005829" "located_in"  "cytosol"               "cellular_component"
# [5,] "TRBV20OR9-2" "GO:0002376" "involved_in" "immune system process" "biological_process"

## End(Not run)

# here is a small example that you can run
f1<-system.file("extdata","goa_human.small.gaf",package="minimalistGODB")
f2<-system.file("extdata","go-basic.small.obo",package="minimalistGODB")
buildGODatabase(f1,f2,verbose=TRUE)
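
Because buildGODatabase() returns no value and writes its result to disk, the saved object has to be loaded before it can be queried. The following is a minimal sketch in base R, not taken from the package: it simulates a saved database in a temporary directory and loads it into its own environment. The file name GOGOA3.RData and the one-row toy matrix are assumptions made for illustration; check the GODB_RDATA subdirectory (or the files on GitHub) for the real names.

```r
# Simulate the side effect: a toy stand-in saved under <dir>/GODB_RDATA/.
# The file name "GOGOA3.RData" is hypothetical; inspect the directory for the real one.
outDir <- file.path(tempfile("godb_"), "GODB_RDATA")
dir.create(outDir, recursive = TRUE)
GOGOA3 <- matrix(
  c("NUDT4B", "GO:0003723", "enables", "RNA binding", "molecular_function"),
  nrow = 1,
  dimnames = list(NULL, c("HGNC", "GO", "RELATION", "NAME", "ONTOLOGY")))
save(GOGOA3, file = file.path(outDir, "GOGOA3.RData"))
rm(GOGOA3)

# Locate the saved .RData file and load it into a fresh environment,
# so that nothing in the global workspace is overwritten.
rdataFiles <- list.files(outDir, pattern = "\\.RData$", full.names = TRUE)
dbEnv <- new.env()
loadedNames <- load(rdataFiles[1], envir = dbEnv)  # load() returns the object names
db <- get(loadedNames[1], envir = dbEnv)
dim(db)
```

Loading into a separate environment is a design choice rather than a requirement; a plain load() into the workspace also works if name clashes are not a concern.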


buildGODatabaseDriver

Description

driver to build multiple GO databases for many species

Usage

buildGODatabaseDriver(goaDir, gobasic, dir = NULL, verbose = FALSE)

Arguments

goaDir

character string path name to directory containing downloaded goa .gaf files

gobasic

character string path name to downloaded go-basic.obo

dir

character string path name to directory to hold species database subdirectories

verbose

Boolean; if TRUE, print out some diagnostic info

Details

Download goa .gaf files from https://current.geneontology.org/products/pages/downloads.html

Download go-basic.obo from https://geneontology.org/docs/download-ontology/

The output GOGOA3 was saved as an .RData file. This was too large for CRAN. It is available from https://github.com/barryzee/GO/tree/main/databases

Value

returns GO database with columns c("HGNC","GO","RELATION","NAME","ONTOLOGY")

Examples

## Not run: 
# replace my path names for goa and gobasic with your own!!
# these were obtained from the download sites listed in 'details' section
goaDir<-"/Users/barryzeeberg/Downloads/gaf/"
gobasic<-"~/go-basic.obo"
buildGODatabaseDriver(goaDir,gobasic,dir="~/personal",verbose=TRUE)

## End(Not run)

# here is a small example that you can run
goaDir<-system.file("extdata",package="minimalistGODB")
gobasic<-system.file("extdata","go-basic.small.obo",package="minimalistGODB")
dir<-tempdir()
buildGODatabaseDriver(goaDir,gobasic,dir,verbose=TRUE)
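
The driver works on a directory of goa .gaf files rather than on a single file. As a rough sketch of that directory-driven pattern (an assumption about the general approach, not the package's actual code), the base R snippet below lists the .gaf files in a directory and derives one label per file, which could then name a per-species subdirectory. The files are empty placeholders created only for this illustration, and the label scheme is hypothetical.

```r
# Create a toy directory holding two placeholder .gaf files and one unrelated file.
goaDirToy <- tempfile("gaf_")
dir.create(goaDirToy)
invisible(file.create(file.path(goaDirToy,
                                c("goa_human.gaf", "tair.gaf", "notes.txt"))))

# Keep only the .gaf files; list.files() returns them in alphabetical order.
gafFiles <- list.files(goaDirToy, pattern = "\\.gaf$", full.names = TRUE)

# Derive a label per file (hypothetical naming scheme: file name without extension).
speciesLabels <- tools::file_path_sans_ext(basename(gafFiles))

for (i in seq_along(gafFiles)) {
  # A per-file build step would go here.
  message("would build a database labelled '", speciesLabels[i],
          "' from ", basename(gafFiles[i]))
}
```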


grepList

Description

determine the correct pattern to grep for depending on the species

Usage

grepList(gaf)

Arguments

gaf

character string containing the basename of the gaf file downloaded from https://current.geneontology.org/products/pages/downloads.html

Value

returns the correct pattern to grep for

Examples

pattern<-grepList("tair.gaf")


joinGO

Description

join the outputs of parseGOA and parseGOBASIC to add the GO category name and the ontology to GOA

Usage

joinGO(GOA, GO)

Arguments

GOA

output of parseGOA()

GO

output of parseGOBASIC()

Value

returns a matrix with columns c("HGNC","GO","RELATION","NAME","ONTOLOGY")

Examples

GOGOA<-joinGO(GOA,GO)
# GOGOA[1:5,]
#      HGNC          GO           RELATION      NAME                    ONTOLOGY
# [1,] "NUDT4B"      "GO:0003723" "enables"     "RNA binding"           "molecular_function"
# [2,] "NUDT4B"      "GO:0005515" "enables"     "protein binding"       "molecular_function"
# [3,] "NUDT4B"      "GO:0046872" "enables"     "metal ion binding"     "molecular_function"
# [4,] "NUDT4B"      "GO:0005829" "located_in"  "cytosol"               "cellular_component"
# [5,] "TRBV20OR9-2" "GO:0002376" "involved_in" "immune system process" "biological_process"
#      GO_NAME
# [1,] "GO_0003723__RNA_binding"
# [2,] "GO_0005515__protein_binding"
# [3,] "GO_0046872__metal_ion_binding"
# [4,] "GO_0005829__cytosol"
# [5,] "GO_0002376__immune_system_process"

# querying GOGOA to compute gene enrichment of some GO categories
hgncList<-GOGOA[1:1000,"HGNC"]
ontology<-"biological_process"
w<-which(GOGOA[,"ONTOLOGY"] == ontology)
GOGOA<-GOGOA[w,]
w<-which(GOGOA[,"HGNC"] %in% hgncList)
t<-sort(table(GOGOA[w,"NAME"]),decreasing=TRUE)[1:10]
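
To make the join concrete, here is a minimal self-contained sketch using toy matrices built from the rows shown above. It illustrates one common way to do such a join in base R, a match() lookup on the GO column, and is not necessarily how joinGO() is implemented. In particular, the toy GO table is a plain matrix, whereas parseGOBASIC() returns a list.

```r
# Toy annotation table (same columns as the output of parseGOA()).
GOAtoy <- matrix(c(
  "NUDT4B",      "GO:0003723", "enables",
  "NUDT4B",      "GO:0005829", "located_in",
  "TRBV20OR9-2", "GO:0002376", "involved_in"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("HGNC", "GO", "RELATION")))

# Toy ontology table: one row per GO category.
GOtoy <- matrix(c(
  "GO:0002376", "immune system process", "biological_process",
  "GO:0003723", "RNA binding",           "molecular_function",
  "GO:0005829", "cytosol",               "cellular_component"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("GO", "NAME", "ONTOLOGY")))

# For each annotation row, find the row of the ontology table with the same GO id,
# then append that row's NAME and ONTOLOGY columns.
idx <- match(GOAtoy[, "GO"], GOtoy[, "GO"])
GOGOAtoy <- cbind(GOAtoy, GOtoy[idx, c("NAME", "ONTOLOGY"), drop = FALSE])

# The same style of query as above: which categories annotate a given gene?
w <- which(GOGOAtoy[, "HGNC"] %in% c("NUDT4B"))
GOGOAtoy[w, "NAME"]
```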


parseGOA

Description

parse goa_human.gaf

Usage

parseGOA(goa)

Arguments

goa

character string path name to downloaded goa_human.gaf

Details

download goa_human.gaf from https://current.geneontology.org/products/pages/downloads.html

Value

returns matrix with columns c("HGNC","GO","RELATION")

Examples

## Not run: 
# replace my path name for goa with your own!!
# this was obtained from the download sites listed in 'details' section
GOA<-parseGOA("~/goa_human.gaf")
# GOA[1:5,]
#      HGNC          GO           RELATION     
# [1,] "NUDT4B"      "GO:0003723" "enables"    
# [2,] "NUDT4B"      "GO:0005515" "enables"    
# [3,] "NUDT4B"      "GO:0046872" "enables"    
# [4,] "NUDT4B"      "GO:0005829" "located_in" 
# [5,] "TRBV20OR9-2" "GO:0002376" "involved_in"

## End(Not run)
# here is a small example that you can run
f<-system.file("extdata","goa_human.small.gaf",package="minimalistGODB")
GOAsmall<-parseGOA(f)
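
For readers unfamiliar with the .gaf format: it is a tab-separated text file whose header lines start with "!". In GAF 2.2, column 3 holds the gene symbol, column 4 the relation (qualifier), and column 5 the GO identifier. The sketch below shows how those three columns could be pulled out with base R. It is an illustration under those format assumptions, not the code used by parseGOA(), and the accession values TOY0001 and TOY0002 are made-up placeholders.

```r
# Build a toy 17-column GAF line; only the fields needed here are filled in.
mkLine <- function(objectId, symbol, qualifier, goId) {
  fields <- rep("", 17)
  fields[1] <- "UniProtKB"
  fields[2] <- objectId   # placeholder accession, not a real identifier
  fields[3] <- symbol
  fields[4] <- qualifier
  fields[5] <- goId
  paste(fields, collapse = "\t")
}
gafText <- c("!gaf-version: 2.2",
             mkLine("TOY0001", "NUDT4B", "enables", "GO:0003723"),
             mkLine("TOY0002", "TRBV20OR9-2", "involved_in", "GO:0002376"))

# Read the tab-separated body, skipping "!" header lines and disabling quoting.
gaf <- read.delim(text = gafText, header = FALSE, sep = "\t", quote = "",
                  comment.char = "!", colClasses = "character")

# Keep symbol, GO id and relation, in the column order used by this package.
GOAtoy <- as.matrix(gaf[, c(3, 5, 4)])
colnames(GOAtoy) <- c("HGNC", "GO", "RELATION")
GOAtoy
```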



parseGOBASIC

Description

parse go-basic.obo

Usage

parseGOBASIC(gobasic, verbose = FALSE)

Arguments

gobasic

character string path name to downloaded go-basic.obo

verbose

Boolean; if TRUE, print out some diagnostic info

Details

download go-basic.obo from https://geneontology.org/docs/download-ontology/

Value

returns a list whose components are c("m", "bp", "mf", "cc")

Examples

## Not run: 
# replace my path name for gobasic with your own!!
# this was obtained from the download sites listed in 'details' section
GO<-parseGOBASIC("~/go-basic.obo",verbose=FALSE)
# GO$bp[1:5,]
#            GO           NAME                               ONTOLOGY            
# GO:0000001 "GO:0000001" "mitochondrion inheritance"        "biological_process"
# GO:0000002 "GO:0000002" "mitochondrial genome maintenance" "biological_process"
# GO:0000011 "GO:0000011" "vacuole inheritance"              "biological_process"
# GO:0000012 "GO:0000012" "single strand break repair"       "biological_process"
# GO:0000017 "GO:0000017" "alpha-glucoside transport"        "biological_process"

## End(Not run)

# here is a small example that you can run
f<-system.file("extdata","go-basic.small.obo",package="minimalistGODB")
GOsmall<-parseGOBASIC(f)
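
As background, an .obo file is organized into stanzas; each GO category is a [Term] stanza with id:, name: and namespace: tags. The following sketch shows one way to turn such text into a matrix like the one printed above. It is a simplified illustration and not the parser used by parseGOBASIC(): the helper getField() is hypothetical, and real files carry many more tags (for example, obsolete terms) that this sketch ignores.

```r
oboText <- c("format-version: 1.2", "",
             "[Term]", "id: GO:0000001", "name: mitochondrion inheritance",
             "namespace: biological_process", "",
             "[Term]", "id: GO:0003723", "name: RNA binding",
             "namespace: molecular_function", "",
             "[Typedef]", "id: part_of", "name: part of")

# Hypothetical helper: value of the first "tag: value" line in a stanza.
getField <- function(lines, tag) {
  pattern <- paste0("^", tag, ": ")
  sub(pattern, "", grep(pattern, lines, value = TRUE)[1])
}

# Number the stanzas: the counter increases at every "[...]" header line.
stanzaIndex <- cumsum(grepl("^\\[.*\\]$", oboText))
stanzas <- split(oboText, stanzaIndex)

# Keep only [Term] stanzas (drops the file header and [Typedef] stanzas).
terms <- Filter(function(s) s[1] == "[Term]", stanzas)

# One row per term, with the GO id doubling as the row name.
m <- t(vapply(terms,
              function(s) c(getField(s, "id"), getField(s, "name"),
                            getField(s, "namespace")),
              character(3)))
colnames(m) <- c("GO", "NAME", "ONTOLOGY")
rownames(m) <- m[, "GO"]

# Subset for a single ontology, analogous to GO$bp above.
bp <- m[m[, "ONTOLOGY"] == "biological_process", , drop = FALSE]
bp
```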


restrictGOA

Description

restrict GO categories in GOA to those in GO

Usage

restrictGOA(GOA, GO)

Arguments

GOA

output of parseGOA()

GO

output of parseGOBASIC()

Value

returns a restricted version of GOA

Examples

GOA<-restrictGOA(GOA,GO)
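
The idea of the restriction can be sketched with a toy example: annotation rows whose GO identifier does not appear in the ontology table are dropped. This is a minimal illustration using %in%, assuming a plain vector of known identifiers; it is not the package's implementation, and GO:9999999 is a made-up identifier used only to show a row being removed.

```r
GOAtoy <- matrix(c(
  "NUDT4B", "GO:0003723", "enables",
  "NUDT4B", "GO:9999999", "enables"),   # made-up id, absent from the ontology
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("HGNC", "GO", "RELATION")))

# Identifiers known to the (toy) ontology.
knownIds <- c("GO:0003723", "GO:0005829")

# Keep only the rows whose GO id is known; drop = FALSE preserves the matrix shape
# even when a single row remains.
keep <- GOAtoy[, "GO"] %in% knownIds
GOArestricted <- GOAtoy[keep, , drop = FALSE]
GOArestricted
```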


subsetGOGOA

Description

split GOGOA into 3 separate ontologies

Usage

subsetGOGOA(GOGOA)

Arguments

GOGOA

return value of minimalistGODB::joinGO()

Value

returns a list containing subsets of GOGOA for each ontology, unique gene and cat lists, and stats

Examples

#load("data/GOGOAsmall.RData")
GOGOA3small<-subsetGOGOA(GOGOAsmall)
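
As an illustration of what splitting by ontology involves, the sketch below partitions a toy GOGOA matrix into the three ontologies and collects unique gene and category lists plus row counts. The component names used here (bySubset, genes, cats, stats) are hypothetical and are not claimed to match the structure returned by subsetGOGOA().

```r
GOGOAtoy <- matrix(c(
  "NUDT4B",      "GO:0003723", "enables",     "RNA binding",           "molecular_function",
  "NUDT4B",      "GO:0005515", "enables",     "protein binding",       "molecular_function",
  "NUDT4B",      "GO:0046872", "enables",     "metal ion binding",     "molecular_function",
  "NUDT4B",      "GO:0005829", "located_in",  "cytosol",               "cellular_component",
  "TRBV20OR9-2", "GO:0002376", "involved_in", "immune system process", "biological_process"),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("HGNC", "GO", "RELATION", "NAME", "ONTOLOGY")))

ontologies <- c("biological_process", "molecular_function", "cellular_component")

# One sub-matrix per ontology; drop = FALSE keeps single-row results as matrices.
bySubset <- lapply(setNames(ontologies, ontologies), function(o) {
  GOGOAtoy[GOGOAtoy[, "ONTOLOGY"] == o, , drop = FALSE]
})

# Unique genes and categories across the whole table, and rows per ontology.
genes <- sort(unique(GOGOAtoy[, "HGNC"]))
cats  <- sort(unique(GOGOAtoy[, "GO"]))
stats <- vapply(bySubset, nrow, integer(1))
stats
```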