Type: Package
Title: Create and Query a Local Copy of 'GenBank' in R
Version: 2.1.5
Maintainer: Joel H. Nitta <joelnitta@gmail.com>
Description: Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.
URL: https://github.com/ropensci/restez, https://docs.ropensci.org/restez/
BugReports: https://github.com/ropensci/restez/issues
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends: R (≥ 3.3.0)
Imports: utils, rentrez, DBI (≥ 1.0.0), curl, cli, crayon, stringi, duckdb, fs, assertthat, ape
Suggests: sessioninfo, testthat, knitr, R.utils, rmarkdown, mockery
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-03-06 23:44:02 UTC; joelnitta
Author: Joel H. Nitta [aut, cre], Dom Bennett [aut]
Repository: CRAN
Date/Publication: 2025-03-07 00:00:02 UTC

restez: Create and Query a Local Copy of GenBank in R

Description

The restez package comes with five families of functions: setup, database, get, entrez and internal/private.

Setup functions

These functions allow a user to set the filepath for where the GenBank files should be stored, create connections and verify these settings.

Database functions

These functions download specific parts of GenBank and create the local SQL-like database.

GenBank functions

These functions allow a user to query the local SQL-like database. A user can use an NCBI accession ID to retrieve sequences or whole GenBank records.

Entrez functions

The entrez functions are wrappers to the entrez_* functions in the rentrez package. For example, restez's entrez_fetch() first searches the local database; if that fails, it calls rentrez::entrez_fetch() with the same arguments.

Private/internal functions

These functions work behind the scenes to make everything work. If you're curious you can read their documentation using the form ?restez:::functionname.

Author(s)

Maintainer: Joel H. Nitta joelnitta@gmail.com (ORCID)

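Authors:

Dom Bennett

See Also

Useful links:

https://github.com/ropensci/restez

https://docs.ropensci.org/restez/

Report bugs at https://github.com/ropensci/restez/issues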


Log files added to the SQL database in the restez path

Description

This function is called whenever sequence files have been successfully added to the nucleotide SQL database. Row entries are added to 'add_log.tsv' in the user's restez path containing the file name, the GenBank release number and the time of the successful addition. The log helps users keep track of when sequences were added.

Usage

add_rcrd_log(fl)

Arguments

fl

filename, character

See Also

Other private: cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Cat lines

Description

Helper function for printing lines to console. Automatically formats lines by adding newlines.

Usage

cat_line(...)

Arguments

...

Text to print, character

See Also

Other private: add_rcrd_log(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Print green

Description

Prints green text to the console to indicate a name, file path or other text.

Usage

char(x)

Arguments

x

Text to print, character

Value

coloured character encoding, character

See Also

Other private: add_rcrd_log(), cat_line(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Helper function to test if a stable internet connection can be established.

Description

All retrieval functions need a stable internet connection to work properly. This internal function pings the Google homepage and throws an error if it cannot be reached.

Usage

check_connection()

Author(s)

Hajk-Georg Drost

See Also

Other private: add_rcrd_log(), cat_line(), char(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Clean up test data

Description

Removes all temporary test data created.

Usage

cleanup()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Is restez connected?

Description

Returns TRUE if a restez SQL database has been connected.

Usage

connected()

Value

Logical

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Retrieve restez connection

Description

Safely acquire the restez connection. Raises an error if no connection is set.

Usage

connection_get()

Value

connection

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Return the number of ids

Description

Return the number of ids in a user's restez database.

Usage

count_db_ids(db = "nucleotide")

Arguments

db

character, database name

Details

Requires an open connection. If there is no connection or no database, 0 is returned.

Value

integer

See Also

Other database: db_create(), db_delete(), db_download(), demo_db_create(), is_in_db(), list_db_ids()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(count_db_ids())

# delete demo after example
db_delete(everything = TRUE)

Create new NCBI database

Description

Create a new local SQL database from downloaded files. Currently only GenBank/nucleotide/nuccore database is supported.

Usage

db_create(
  db_type = "nucleotide",
  min_length = 0,
  max_length = NULL,
  acc_filter = NULL,
  invert = FALSE,
  alt_restez_path = NULL,
  scan = FALSE
)

Arguments

db_type

character, database type

min_length

Minimum sequence length, default 0.

max_length

Maximum sequence length, default NULL.

acc_filter

Character vector; accessions to include or exclude from the database as specified by invert.

invert

Logical vector of length 1; if TRUE, accessions in acc_filter will be excluded from the database; if FALSE, only accessions in acc_filter will be included in the database. Default FALSE.

alt_restez_path

Alternative restez path if you would like to use the downloads from a different restez path.

scan

Logical vector of length 1; should the sequence file be scanned for accessions in acc_filter prior to processing? Requires zgrep to be installed (so does not work on Windows). Only used if acc_filter is not NULL and invert is FALSE. Default FALSE.

Details

All .seq.gz files are added to the database by default. A user can specify minimum/maximum sequence lengths or accession numbers to limit the sequences to be added to the database – smaller databases are faster to search. The final selection of sequences is the result of applying all filters (acc_filter, min_length, max_length) in combination.
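As a rough illustration of how the filters combine, the base R sketch below keeps a record only if it passes every filter at once. The data frame, the helper name select_records() and the inclusive length bounds are assumptions made for this example; they are not restez internals.

```r
# Hypothetical stand-in for parsed GenBank records (not restez's own format).
records <- data.frame(
  accession = c("AF000122", "AF000123", "AF000124", "AF000125"),
  length = c(80L, 450L, 1200L, 95L),
  stringsAsFactors = FALSE
)

# Illustrative helper: a record is kept only if it passes all filters.
# Inclusive bounds are an assumption of this sketch.
select_records <- function(df, min_length = 0, max_length = NULL,
                           acc_filter = NULL, invert = FALSE) {
  keep <- df$length >= min_length
  if (!is.null(max_length)) {
    keep <- keep & df$length <= max_length
  }
  if (!is.null(acc_filter)) {
    in_filter <- df$accession %in% acc_filter
    # invert = TRUE excludes the listed accessions instead of including them
    keep <- keep & (if (invert) !in_filter else in_filter)
  }
  df[keep, , drop = FALSE]
}

selected <- select_records(
  records,
  min_length = 90,
  acc_filter = c("AF000122", "AF000123", "AF000125")
)
print(selected$accession)
```

Here AF000122 is listed in acc_filter but is still dropped because it is shorter than min_length.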

The scan option can decrease the time needed to build a database if only a small number of sequences should be written to the database compared to the number of the sequences downloaded from GenBank; i.e., if many of the files downloaded from GenBank do not contain any sequences that should be written to the database. When set to TRUE, if a file does not contain any of the accessions in acc_filter, further processing of that file will be skipped and none of the sequences it contains will be added to the database.
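The idea behind scanning can be sketched in base R: check a compressed file for any wanted accession before doing the expensive parsing. This is only an analogue written for illustration; the text above says restez relies on zgrep, and the helper file_has_accession() and the toy file contents are hypothetical.

```r
# Illustrative helper: TRUE if any ACCESSION line in a gzipped flat file
# mentions one of the requested accessions.
file_has_accession <- function(gz_path, accessions) {
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  acc_lines <- grep("^ACCESSION", lines, value = TRUE)
  any(vapply(
    accessions,
    function(acc) any(grepl(acc, acc_lines, fixed = TRUE)),
    logical(1)
  ))
}

# Write a tiny mock file (made-up, heavily abbreviated records).
gz_path <- file.path(tempdir(), "scan_sketch.seq.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("LOCUS       AF000122", "ACCESSION   AF000122", "//",
             "LOCUS       AF000123", "ACCESSION   AF000123", "//"), con)
close(con)

hit <- file_has_accession(gz_path, c("AF000123", "ZZ999999"))
miss <- file_has_accession(gz_path, "ZZ999999")
unlink(gz_path)
```

A file for which the check returns FALSE could be skipped entirely, which is where the time saving would come from.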

Alternatively, a user can use the alt_restez_path to add the files from an alternative restez file path. For example, you may wish to have a database of all environmental sequences but then an additional smaller one of just the sequences with lengths below 100 bp. Instead of having to download all environmental sequences twice, you can generate multiple restez databases using the same downloaded files from a single restez path.

This function will not overwrite a pre-existing database. Old databases must be deleted before a new one can be created. Use db_delete() with everything=FALSE to delete an SQL database.

Connections/disconnections to the database are made automatically.

See Also

Other database: count_db_ids(), db_delete(), db_download(), demo_db_create(), is_in_db(), list_db_ids()

Examples

## Not run: 
# Example of general usage
library(restez)
restez_path_set(filepath = 'path/for/downloads/and/database')
db_download()
db_create()

# Example of using `acc_filter`
#
# Download files to temporary directory
temp_dir <- file.path(tempdir(), "restez")
dir.create(temp_dir)
restez_path_set(filepath = temp_dir)
# Choose GenBank domain 20 ('unannotated'), the smallest
db_download(preselection = 20)
# Only include three accessions in database
db_create(
  acc_filter = c("AF000122", "AF000123", "AF000124")
)
list_db_ids()
db_delete()
unlink(temp_dir, recursive = TRUE)

## End(Not run)

Delete database

Description

Delete the local SQL database and/or restez folder.

Usage

db_delete(everything = FALSE)

Arguments

everything

T/F, delete the whole restez folder as well?

Details

Any connected database will be automatically disconnected.

See Also

Other database: count_db_ids(), db_create(), db_download(), demo_db_create(), is_in_db(), list_db_ids()

Examples

library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 10)
db_delete(everything = FALSE)
# Will not run: gb_sequence_get(id = 'demo_1')
# only the SQL database is deleted
db_delete(everything = TRUE)
# Now returns NULL
(restez_path_get())

Download database

Description

Download .seq.gz files from the latest GenBank release.

Usage

db_download(
  db = "nucleotide",
  overwrite = FALSE,
  preselection = NULL,
  max_tries = 1
)

Arguments

db

Database type, only 'nucleotide' currently available.

overwrite

T/F, overwrite pre-existing downloaded files?

preselection

Character vector of length 1; GenBank domains to download. If not specified (default), a menu will be provided for selection. To specify, provide either a single number or a single character string of numbers separated by spaces, e.g. "19 20" for 'Phage' (19) and 'Unannotated' (20).

max_tries

Numeric vector of length 1; maximum number of times to attempt downloading database (default 1).

Details

In default mode, the user interactively selects the parts (i.e., "domains") of GenBank to download (e.g. primates, plants, bacteria ...). Alternatively, the selected domains can be provided as a character string to preselection.
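To make the preselection format concrete, the sketch below turns a value such as "19 20" into a vector of domain numbers. The helper parse_preselection() is hypothetical and only illustrates the input format; how restez parses the value internally is not described here.

```r
# Illustrative helper: split a space-separated selection into integers.
parse_preselection <- function(preselection) {
  parts <- strsplit(trimws(as.character(preselection)), "[[:space:]]+")[[1]]
  as.integer(parts)
}

two_domains <- parse_preselection("19 20")
one_domain <- parse_preselection(20)
print(two_domains)
print(one_domain)
```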

The max_tries argument is useful for large databases that may otherwise fail due to periodic lapses in internet connectivity. This value can be set to Inf to continuously try until the database download succeeds (not recommended if you do not have an internet connection!).
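A minimal sketch of the retry pattern, assuming a download step that returns TRUE on success. Both flaky_download() and retry_download() are hypothetical names used for illustration; they are not part of restez.

```r
# Hypothetical stand-in for a download that fails twice, then succeeds.
attempts_made <- 0
flaky_download <- function() {
  attempts_made <<- attempts_made + 1
  attempts_made >= 3
}

# Illustrative wrapper: call `download_fun` until it returns TRUE or
# `max_tries` attempts have been used. Errors count as failed attempts.
retry_download <- function(download_fun, max_tries = 1) {
  tries <- 0
  success <- FALSE
  while (!success && tries < max_tries) {
    tries <- tries + 1
    success <- isTRUE(tryCatch(download_fun(), error = function(e) FALSE))
  }
  success
}

result <- retry_download(flaky_download, max_tries = 5)
print(result)
print(attempts_made)
```

With max_tries = Inf the loop condition never runs out of attempts, so the loop ends only when a download succeeds.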

Value

T/F, if all files download correctly, TRUE else FALSE.

See Also

ncbi_acc_get()

Other database: count_db_ids(), db_create(), db_delete(), demo_db_create(), is_in_db(), list_db_ids()

Examples

## Not run: 
library(restez)
restez_path_set(filepath = 'path/for/downloads')
db_download()

## End(Not run)

Download database (internal version)

Description

Download .seq.gz files from the latest GenBank release. The user interactively selects the parts of GenBank to download (e.g. primates, plants, bacteria ...). This is an internal function so that the download can be wrapped in while() to enable persistent downloading.

Usage

db_download_intern(db = "nucleotide", overwrite = FALSE, preselection = NULL)

Arguments

db

Database type, only 'nucleotide' currently available.

overwrite

T/F, overwrite pre-existing downloaded files?

preselection

Character vector of length 1; GenBank domains to download. If not specified (default), a menu will be provided for selection. To specify, provide either a single number or a single character string of numbers separated by spaces, e.g. "19 20" for 'Phage' (19) and 'Unannotated' (20).

Details

The downloaded files will appear in the restez filepath under downloads.

Value

T/F, if all files download correctly, TRUE else FALSE.

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Return the minimum and maximum sequence lengths in db

Description

Returns the maximum and minimum sequence lengths as set by the user upon db creation.

Usage

db_sqlngths_get()

Details

If no file found, returns empty character vector.

Value

vector of integers

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Log the min and max sequence lengths

Description

Log the minimum and maximum sequence lengths used in the created db.

Usage

db_sqlngths_log(min_lngth, max_lngth)

Arguments

min_lngth

Minimum length

max_lngth

Maximum length

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Create demo database

Description

Creates a local mock SQL database from package test data for demonstration purposes. No internet connection required.

Usage

demo_db_create(db_type = "nucleotide", n = 100)

Arguments

db_type

character, database type

n

integer, number of mock sequences

See Also

Other database: count_db_ids(), db_create(), db_delete(), db_download(), is_in_db(), list_db_ids()

Examples

library(restez)
# set the restez path to a temporary dir
restez_path_set(filepath = tempdir())
# create demo database
demo_db_create(n = 5)
# in the demo, IDs are 'demo_1', 'demo_2' ...
(gb_sequence_get(id = 'demo_1'))

# Delete a demo database after an example
db_delete(everything = TRUE)

Calculate the size of a directory

Description

Returns the size of a directory in GB.

Usage

dir_size(fp)

Arguments

fp

File path, character

Value

numeric
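One way such a calculation could look in base R is sketched below. The helper dir_size_gb() is hypothetical, and treating 1 GB as 1e9 bytes is an assumption of the sketch rather than a statement about dir_size().

```r
# Illustrative helper: total size of all files under `fp`, in GB.
# Assumes 1 GB = 1e9 bytes.
dir_size_gb <- function(fp) {
  files <- list.files(fp, recursive = TRUE, full.names = TRUE, all.files = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1e9
}

# Build a small throwaway directory with two files of known size.
demo_dir <- file.path(tempdir(), "dir_size_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(2000), file.path(demo_dir, "a.bin"))
writeBin(raw(3000), file.path(demo_dir, "b.bin"))

size_gb <- dir_size_gb(demo_dir)
print(size_gb)
unlink(demo_dir, recursive = TRUE)
```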

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get dwnld path

Description

Return path to folder where raw .seq files are stored.

Usage

dwnld_path_get()

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Log a downloaded file in the restez path

Description

This function is called whenever a file is successfully downloaded. A row entry is added to 'download_log.tsv' in the user's restez path containing the file name, the GenBank release number and the time of the successful download. The log helps users keep track of when they downloaded files and determine whether the downloaded files are out of date.

Usage

dwnld_rcrd_log(fl)

Arguments

fl

file name, character
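The logging pattern described above can be sketched as appending one tab-separated row per file. The helper log_download(), the column names, the file names and the release number below are all made up for illustration and do not reflect the actual layout of 'download_log.tsv'.

```r
# Illustrative helper: append one row (file name, release, timestamp) to a
# TSV log, writing the header only when the log is first created.
log_download <- function(log_path, fl, release) {
  entry <- data.frame(
    filename = fl,
    gb_release = release,
    time = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    stringsAsFactors = FALSE
  )
  first_write <- !file.exists(log_path)
  write.table(entry, file = log_path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = first_write,
              append = !first_write)
  invisible(entry)
}

log_path <- file.path(tempdir(), "download_log_sketch.tsv")
unlink(log_path)
log_download(log_path, "gbpri1.seq.gz", 264)
log_download(log_path, "gbpri2.seq.gz", 264)

logged <- read.delim(log_path, stringsAsFactors = FALSE)
print(logged$filename)
unlink(log_path)
```

Comparing the logged release number with the latest available release is one way a user could decide whether the downloads are out of date.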

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get Entrez fasta

Description

Return fasta format as expected from an Entrez call. If not all IDs are returned, will run rentrez::entrez_fetch.

Usage

entrez_fasta_get(id, ...)

Arguments

id

vector, unique ID(s) for record(s)

...

arguments passed on to rentrez

Value

character string containing the file created

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Entrez fetch

Description

Wrapper for rentrez::entrez_fetch.

Usage

entrez_fetch(db, id = NULL, rettype, retmode = "", ...)

Arguments

db

character, name of the database

id

vector, unique ID(s) for record(s)

rettype

character, data format

retmode

character, data mode

...

Arguments to be passed on to rentrez

Details

The function first searches the local database with the user-specified parameters. If a record is missing from the database, it then calls rentrez::entrez_fetch to search GenBank remotely.

rettype='fasta' and rettype='gb' are respectively equivalent to gb_fasta_get() and gb_record_get().
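The local-first lookup can be sketched with stubs in base R. Everything here is hypothetical: local_db stands in for the restez database, fetch_remote() stands in for a remote call, and fetching only the missing IDs remotely is an assumption of the sketch.

```r
# Hypothetical local store: FASTA text keyed by ID.
local_db <- c(demo_1 = ">demo_1\nATGC\n", demo_2 = ">demo_2\nGGCC\n")

# Stub standing in for a remote request; returns placeholder sequences.
fetch_remote <- function(ids) {
  paste0(">", ids, "\nNNNN\n", collapse = "")
}

# Illustrative wrapper: answer from the local store where possible and
# fall back to the remote stub for IDs that are not found locally.
fetch_with_fallback <- function(ids) {
  found <- ids[ids %in% names(local_db)]
  missing <- setdiff(ids, found)
  local_part <- paste0(local_db[found], collapse = "")
  remote_part <- if (length(missing) > 0) fetch_remote(missing) else ""
  paste0(local_part, remote_part)
}

res <- fetch_with_fallback(c("demo_1", "S71333"))
cat(res)
```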

Value

character string containing the file created

Supported return types and modes

XML retmode is not supported. Rettypes 'seqid', 'ft', 'acc' and 'uilist' are also not supported.

Note

It is advisable to call restez and rentrez functions with '::' notation rather than library() calls to avoid namespace issues. e.g. restez::entrez_fetch().

See Also

rentrez::entrez_fetch()

Examples

library(restez)
restez_path_set(tempdir())
demo_db_create(n = 5)
# return fasta record
fasta_res <- entrez_fetch(db = 'nucleotide',
                          id = c('demo_1', 'demo_2'),
                          rettype = 'fasta')
cat(fasta_res)
# return whole GB record in text format
gb_res <- entrez_fetch(db = 'nucleotide',
                       id = c('demo_1', 'demo_2'),
                       rettype = 'gb')
cat(gb_res)
# NOT RUN
# whereas these requests would go through rentrez
# fasta_res <- entrez_fetch(db = 'nucleotide',
#                           id = c('S71333', 'S71334'),
#                           rettype = 'fasta')
# gb_res <- entrez_fetch(db = 'nucleotide',
#                        id = c('S71333', 'S71334'),
#                        rettype = 'gb')

# delete demo after example
db_delete(everything = TRUE)

Get Entrez GenBank record

Description

Return gb and gbwithparts format as expected from an Entrez call. If not all IDs are returned, will run rentrez::entrez_fetch.

Usage

entrez_gb_get(id, ...)

Arguments

id

vector, unique ID(s) for record(s)

...

arguments passed on to rentrez

Value

character string containing the file created

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract accession

Description

Return accession ID from GenBank record

Usage

extract_accession(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract by keyword

Description

Search through GenBank record for a keyword and return text up to the end_pattern.

Usage

extract_by_patterns(record, start_pattern, end_pattern = "\n")

Arguments

record

GenBank record in text format, character

start_pattern

REGEX pattern indicating the point to start extraction, character

end_pattern

REGEX pattern indicating the point to stop extraction, character

Details

The start_pattern should be any of the capitalized elements in a GenBank record (e.g. LOCUS, DEFINITION, ACCESSION). The end_pattern depends on how much of the selected element a user wants returned. By default, the extraction stops at the next newline. If the keyword or the end pattern is not found, NULL is returned.
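
A minimal base R sketch of this kind of pattern-bounded extraction is given here. It is an illustration only, not the package's code: extract_between() and toy_record are hypothetical, and trimming the surrounding whitespace is an assumption.

```r
# Hypothetical helper: return the text between the first match of
# start_pattern and the next match of end_pattern, or NULL if either is missing.
extract_between <- function(record, start_pattern, end_pattern = "\n") {
  pos <- regexpr(start_pattern, record)
  if (pos[1] == -1) return(NULL)
  # text that follows the matched keyword
  from <- as.integer(pos) + attr(pos, "match.length")
  rest <- substring(record, from)
  end <- regexpr(end_pattern, rest)
  if (end[1] == -1) return(NULL)
  # keep everything before the end pattern and trim surrounding whitespace
  trimws(substring(rest, 1, as.integer(end) - 1))
}

# toy record made up for this example
toy_record <- "LOCUS       AB000001\nACCESSION   AB000001\nVERSION     AB000001.1\n"
acc <- extract_between(toy_record, "ACCESSION")
missing_kw <- extract_between(toy_record, "KEYWORDS")
```

With the default end_pattern, only the remainder of the keyword's line is returned; a keyword that is absent from the record gives NULL.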

Value

character or NULL

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract clean sequence from sequence part

Description

Return clean sequence from seqrecpart of a GenBank record

Usage

extract_clean_sequence(seqrecpart, max_len = 1e+08)

Arguments

seqrecpart

Sequence part of a GenBank record, character

max_len

Number: maximum number of characters allowed in a single record before splitting the record into parts. Does not affect output, but only internal calculations, so generally should not be changed. Default = 1e8.

Details

If the element is not found, '' is returned.

Value

character
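
As a rough illustration of what "cleaning" a sequence part can involve, the base R sketch below strips the ORIGIN header, position numbers, whitespace and the record terminator from a toy sequence block. clean_sequence() and toy_seqrecpart are hypothetical, and the assumed input shape (a standard GenBank ORIGIN block ending in //) may differ from what the package passes internally.

```r
# Hypothetical helper, not the package's implementation.
clean_sequence <- function(seqrecpart) {
  x <- sub("^ORIGIN[^\n]*\n", "", seqrecpart)  # drop the ORIGIN header line
  x <- sub("//[[:space:]]*$", "", x)           # drop the end-of-record marker
  gsub("[0-9[:space:]]", "", x)                # drop position numbers and whitespace
}

# toy sequence part made up for this example
toy_seqrecpart <- "ORIGIN      \n        1 atgcatgcat gcatgcatgc\n       21 atgc\n//"
cleaned <- clean_sequence(toy_seqrecpart)
```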

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract definition

Description

Return definition from GenBank record.

Usage

extract_definition(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract features

Description

Return feature table as list from GenBank record

Usage

extract_features(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, an empty list is returned.

Value

list of lists

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract the information record part

Description

Return information part from GenBank record

Usage

extract_inforecpart(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract keywords

Description

Return keywords as list from GenBank record

Usage

extract_keywords(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character vector

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract locus

Description

Return locus information from GenBank record

Usage

extract_locus(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

named character vector

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract organism

Description

Return organism name from GenBank record

Usage

extract_organism(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract the sequence record part

Description

Return sequence part from GenBank record

Usage

extract_seqrecpart(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract sequence

Description

Return sequence from GenBank record

Usage

extract_sequence(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract version

Description

Return accession + version ID from GenBank record

Usage

extract_version(record)

Arguments

record

GenBank record in text format, character

Details

If the element is not found, '' is returned.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Download a file

Description

Download a GenBank .seq.tar file and check that it has downloaded properly; if not, FALSE is returned. If overwrite is TRUE, any previously downloaded file will be overwritten.

Usage

file_download(fl, overwrite = FALSE)

Arguments

fl

character, base filename (e.g. gbpri9.seq) to be downloaded

overwrite

Logical; if TRUE, any previously downloaded file is overwritten. Default FALSE.

Value

Logical; FALSE if the file did not download properly.

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Write filenames to log files

Description

Record a filename in a log file along with GB release and time.

Usage

filename_log(fl, fp)

Arguments

fl

file name, character

fp

filepath to log file, character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Read flatfile sequence records

Description

Read records from a .seq file.

Usage

flatfile_read(flpth)

Arguments

flpth

Path to .seq file

Value

list of GenBank records in text format
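
GenBank flat files terminate each record with a line containing only //. The base R sketch below splits a vector of lines into records on that marker. It is a simplified illustration, not the package's reader: split_records() and toy_lines are hypothetical, and the sketch ignores the header text that precedes the first record in real release files.

```r
# Hypothetical helper: split flat-file lines into records at "//" terminator lines.
split_records <- function(lines) {
  ends <- which(lines == "//")
  if (length(ends) == 0) return(list())
  # each record starts right after the previous terminator
  starts <- c(1L, head(ends, -1L) + 1L)
  Map(function(s, e) paste(lines[s:e], collapse = "\n"), starts, ends)
}

# toy lines made up for this example
toy_lines <- c("LOCUS       demo_1", "ORIGIN", "        1 atgc", "//",
               "LOCUS       demo_2", "ORIGIN", "        1 ggcc", "//")
records <- split_records(toy_lines)
```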

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Read and add .seq files to database

Description

Given a list of seq_files, read the files and add their contents to a SQL-like database. If any errors occur during the process, FALSE is returned.

Usage

gb_build(
  dpth,
  seq_files,
  max_length,
  min_length,
  acc_filter = NULL,
  invert = FALSE,
  scan = FALSE
)

Arguments

dpth

Download path (where seq_files are stored)

seq_files

Names of the .seq.tar sequence files

max_length

Maximum sequence length, default NULL.

min_length

Minimum sequence length, default 0.

acc_filter

Character vector; accessions to include or exclude from the database as specified by invert.

invert

Logical vector of length 1; if TRUE, accessions in acc_filter will be excluded from the database; if FALSE, only accessions in acc_filter will be included in the database. Default FALSE.

scan

Logical vector of length 1; should the sequence file be scanned for accessions in acc_filter prior to processing? Requires zgrep to be installed (so does not work on Windows). Only used if acc_filter is not NULL and invert is FALSE. Default FALSE.

Details

This function will automatically connect to the restez database.

Value

Logical

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get definition from GenBank

Description

Return the definition line for an accession ID.

Usage

gb_definition_get(id)

Arguments

id

character, sequence accession ID(s)

Value

named vector of definitions, if no results found NULL

See Also

ncbi_acc_get()

Other get: gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_sequence_get(), gb_version_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(def <- gb_definition_get(id = 'demo_1'))
(defs <- gb_definition_get(id = c('demo_1', 'demo_2')))


# delete demo after example
db_delete(everything = TRUE)

Create GenBank data.frame

Description

Make a data.frame from column vectors for nucleotide entries. Called as part of gb_df_generate().

Usage

gb_df_create(accessions, versions, organisms, definitions, sequences, records)

Arguments

accessions

character, vector of accessions

versions

character, vector of accessions + versions

organisms

character, vector of organism names

definitions

character, vector of sequence definitions

sequences

character, vector of sequences

records

character, vector of GenBank records in text format

Value

data.frame

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Generate GenBank records data.frame

Description

For a list of records, construct a data.frame for insertion into SQL database.

Usage

gb_df_generate(
  records,
  min_length = 0,
  max_length = NULL,
  acc_filter = NULL,
  invert = FALSE
)

Arguments

records

character, vector of GenBank records in text format

min_length

Minimum sequence length, default 0.

max_length

Maximum sequence length, default NULL.

acc_filter

Character vector; accessions to include or exclude from the database as specified by invert.

invert

Logical vector of length 1; if TRUE, accessions in acc_filter will be excluded from the database; if FALSE, only accessions in acc_filter will be included in the database. Default FALSE.

Details

The resulting data.frame has five columns: accession, organism, raw_definition, raw_sequence, raw_record. The prefix 'raw_' indicates the data has been converted to the raw format, see ?charToRaw, in order to save on RAM. The raw_record contains the entire GenBank record in text format.
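
For readers unfamiliar with the raw format, the short base R example below shows the round trip between a character string and a raw vector. The definition string is made up for illustration.

```r
# made-up definition line, used only for illustration
definition <- "Demo sequence, partial cds"

raw_definition <- charToRaw(definition)  # raw vector of bytes
restored <- rawToChar(raw_definition)    # back to the original string
```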

Use acc_filter together with min_length and max_length to minimize the size of the database. All sequences have to be at least as long as min_length and less than or equal in length to max_length, unless max_length is NULL, in which case there is no maximum length. The final selection of sequences is the result of applying all filters (acc_filter, min_length, max_length) in combination.
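
How the filters combine can be sketched in base R as follows. This is an illustration of the rules stated above, not the package's code: keep_record() and the toy accessions and lengths are hypothetical.

```r
# Hypothetical sketch of how the three filters combine; not restez code.
keep_record <- function(accession, seq_length, min_length = 0, max_length = NULL,
                        acc_filter = NULL, invert = FALSE) {
  keep <- seq_length >= min_length
  # NULL means no upper limit
  if (!is.null(max_length)) keep <- keep & seq_length <= max_length
  if (!is.null(acc_filter)) {
    listed <- accession %in% acc_filter
    # invert = TRUE excludes the listed accessions, FALSE keeps only them
    keep <- keep & (if (invert) !listed else listed)
  }
  keep
}

# toy accessions and sequence lengths
acc <- c("demo_1", "demo_2", "demo_3")
len <- c(50, 500, 5000)

by_length <- keep_record(acc, len, min_length = 100)
combined <- keep_record(acc, len, max_length = 1000,
                        acc_filter = "demo_1", invert = TRUE)
```

In the second call only demo_2 survives: demo_1 is excluded by the inverted acc_filter and demo_3 exceeds max_length.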

Value

data.frame, or NULL if no records pass filters

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Extract elements of a GenBank record

Description

Return elements of a GenBank record, e.g. sequence, definition, ...

Usage

gb_extract(
  record,
  what = c("accession", "version", "organism", "sequence", "definition", "locus",
    "features", "keywords")
)

Arguments

record

GenBank record in text format, character

what

Which element to extract

Details

This function uses a REGEX to extract particular elements of a GenBank record. All of the what options return a single character string, with the exception of 'locus' and 'keywords', which return character vectors, and 'features', which returns a list of lists covering all features.

The accuracy of these functions cannot be guaranteed due to the enormity of the GenBank database, but the function is regularly tested on a range of GenBank records.

Note: all non-latin1 characters are converted to '-'.

Value

character or list of lists (what='features') or named character vector (what='locus')

Examples

library(restez)
data('record')
(gb_extract(record = record, what = 'locus'))

Get fasta from GenBank

Description

Get sequence and definition data in FASTA format. Equivalent to rettype='fasta' in rentrez::entrez_fetch().

Usage

gb_fasta_get(id, width = 70)

Arguments

id

character, sequence accession ID(s)

width

integer, maximum number of characters in a line

Value

named vector of fasta sequences, if no results found NULL
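
The effect of width can be illustrated with a base R sketch that wraps a sequence into fixed-width lines under a header. to_fasta() and the toy inputs are hypothetical, and the header layout used here is an assumption; the package's actual output may differ.

```r
# Hypothetical helper, not the package's implementation.
to_fasta <- function(definition, sequence, width = 70) {
  stopifnot(nchar(sequence) > 0)
  # start position of each output line
  starts <- seq(1, nchar(sequence), by = width)
  chunks <- substring(sequence, starts, starts + width - 1)
  paste(c(paste0(">", definition), chunks), collapse = "\n")
}

toy_seq <- paste(rep("atgc", 5), collapse = "")  # 20 characters
fasta <- to_fasta("demo_1 toy sequence", toy_seq, width = 8)
cat(fasta, "\n")
```

With width = 8, the 20-character toy sequence is written as two full lines and one shorter final line.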

See Also

ncbi_acc_get()

Other get: gb_definition_get(), gb_organism_get(), gb_record_get(), gb_sequence_get(), gb_version_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(fasta <- gb_fasta_get(id = 'demo_1'))
(fastas <- gb_fasta_get(id = c('demo_1', 'demo_2')))


# delete demo after example
db_delete(everything = TRUE)

Get organism from GenBank

Description

Return the organism name for an accession ID.

Usage

gb_organism_get(id)

Arguments

id

character, sequence accession ID(s)

Value

named vector of organism names, if no results found NULL

See Also

ncbi_acc_get()

Other get: gb_definition_get(), gb_fasta_get(), gb_record_get(), gb_sequence_get(), gb_version_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(org <- gb_organism_get(id = 'demo_1'))
(orgs <- gb_organism_get(id = c('demo_1', 'demo_2')))


# delete demo after example
db_delete(everything = TRUE)

Get record from GenBank

Description

Return the entire GenBank record for an accession ID. Equivalent to rettype='gb' in rentrez::entrez_fetch().

Usage

gb_record_get(id)

Arguments

id

character, sequence accession ID(s)

Value

named vector of records, if no results found NULL

See Also

ncbi_acc_get()

Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_sequence_get(), gb_version_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(rec <- gb_record_get(id = 'demo_1'))
(recs <- gb_record_get(id = c('demo_1', 'demo_2')))


# delete demo after example
db_delete(everything = TRUE)

Get sequence from GenBank

Description

Return the sequence(s) of one or more records, given their accession ID(s).

Usage

gb_sequence_get(id, dnabin = FALSE)

Arguments

id

character, sequence accession ID(s)

dnabin

Logical vector of length 1; should the sequences be returned using the bit-level coding scheme of the ape package? Default FALSE.

Details

For more information about the dnabin format, see ape::DNAbin().

Value

named vector of sequences, if no results found NULL

See Also

ncbi_acc_get()

Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_version_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(seq <- gb_sequence_get(id = 'demo_1'))
(seqs <- gb_sequence_get(id = c('demo_1', 'demo_2')))
(seq_dnabin <- gb_sequence_get(id = 'demo_1', dnabin = TRUE))

# delete demo after example
db_delete(everything = TRUE)
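
To give a feel for what the bit-level coding looks like, the following sketch converts a plain character sequence with ape directly. It does not call restez and is not a description of how gb_sequence_get() performs the conversion internally; it only assumes that the ape package is installed.

```r
# Assumes the ape package is installed (it is imported by restez)
if (requireNamespace("ape", quietly = TRUE)) {
  # Split a lowercase sequence string into single bases
  bases <- strsplit("atgcatgcaa", "")[[1]]
  # Convert to ape's bit-level representation
  dna <- ape::as.DNAbin(bases)
  print(class(dna))
  # Round trip back to a character string
  print(paste(as.character(dna), collapse = ""))
}
```

Objects of class DNAbin can be passed on to other ape functions, which is the main reason to request them instead of plain character strings.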


Add to GenBank SQL database

Description

Add a records data.frame to the SQL-like database.

Usage

gb_sql_add(df)

Arguments

df

Records data.frame

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Query the GenBank SQL

Description

Generic query function for retrieving data from the SQL database for the get functions.

Usage

gb_sql_query(nm, id)

Arguments

nm

character, column name

id

character, sequence accession ID(s)

Value

data.frame
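
As a rough illustration of what a generic "one column, many IDs" query could look like, the sketch below builds a SELECT statement in base R. build_query() is hypothetical, and the table name "nucleotide" and key column "accession" are assumptions; the statement restez actually issues may differ.

```r
# Hypothetical sketch: build a SELECT statement for one column and a set of IDs.
# The table name "nucleotide" and key column "accession" are assumptions.
build_query <- function(nm, id, table = "nucleotide", key = "accession") {
  # Escape single quotes so IDs are safe to embed as SQL string literals
  quoted <- paste0("'", gsub("'", "''", id, fixed = TRUE), "'")
  sprintf(
    "SELECT %s, %s FROM %s WHERE %s IN (%s)",
    key, nm, table, key, paste(quoted, collapse = ", ")
  )
}

qry <- build_query(nm = "organism", id = c("demo_1", "demo_2"))
print(qry)
```

A string like this could be passed to DBI::dbGetQuery() on an open connection, which returns a data.frame.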

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get version from GenBank

Description

Return the accession version for an accession ID.

Usage

gb_version_get(id)

Arguments

id

character, sequence accession ID(s)

Value

named vector of versions, if no results found NULL

See Also

ncbi_acc_get()

Other get: gb_definition_get(), gb_fasta_get(), gb_organism_get(), gb_record_get(), gb_sequence_get()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
(ver <- gb_version_get(id = 'demo_1'))
(vers <- gb_version_get(id = c('demo_1', 'demo_2')))


# delete demo after example
db_delete(everything = TRUE)



Check if the last GenBank release number is the latest

Description

Returns TRUE if the GenBank release number logged in the restez path is the most recent GenBank release available.

Usage

gbrelease_check()

Value

logical

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get the GenBank release number in the restez path

Description

Returns the GenBank release number. Returns empty character if none found.

Usage

gbrelease_get()

Details

If no file found, returns empty character vector.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Log the GenBank release number in the restez path

Description

This function is called whenever db_download() is run. It logs the GenBank release number in 'gb_release.txt' in the user's restez path. The log helps users keep track of whether their database is out of date.

Usage

gbrelease_log(release)

Arguments

release

GenBank release number, character
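
The interplay between logging, reading and checking the release number can be sketched in base R. The three functions below are hypothetical stand-ins, not the restez internals: they write to a temporary directory instead of the real restez path, and the release numbers are made up for illustration.

```r
# Hypothetical stand-ins for the logging helpers; they use a temporary
# directory instead of the real restez path.
restez_dir <- file.path(tempdir(), "restez_sketch")
dir.create(restez_dir, showWarnings = FALSE)
release_file <- file.path(restez_dir, "gb_release.txt")

# Write the release number to the log file
log_release <- function(release) {
  writeLines(as.character(release), release_file)
}
# Read it back; empty character vector if no file is found
get_release <- function() {
  if (!file.exists(release_file)) {
    return(character(0))
  }
  readLines(release_file, n = 1)
}
# Compare the logged release with a given latest release
is_latest <- function(latest) {
  logged <- get_release()
  length(logged) == 1 && identical(logged, as.character(latest))
}

log_release("255")  # made-up release number
get_release()
is_latest("255")
is_latest("256")

# Clean up the temporary directory
unlink(restez_dir, recursive = TRUE)
```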

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Does the connected database have data?

Description

Returns TRUE if a restez SQL database has data.

Usage

has_data()

Value

Logical

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Identify downloadable files

Description

Searches through the release notes for a GenBank release to find all listed .seq files. Returns a data.frame for all .seq files and their description.

Usage

identify_downloadable_files()

Value

data.frame

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Is in db

Description

Determine whether one or more IDs are present in a database.

Usage

is_in_db(id, db = "nucleotide")

Arguments

id

character, sequence accession ID(s)

db

character, database name

Value

named logical vector

See Also

Other database: count_db_ids(), db_create(), db_delete(), db_download(), demo_db_create(), list_db_ids()

Examples

library(restez)
# set the restez path to a temporary dir
restez_path_set(filepath = tempdir())
# create demo database
demo_db_create(n = 5)
# in the demo, IDs are 'demo_1', 'demo_2' ...
ids <- c('thisisnotanid', 'demo_1', 'demo_2')
(is_in_db(id = ids))


# delete demo after example
db_delete(everything = TRUE)
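
The shape of the return value, a logical vector named by the queried IDs, can be reproduced without a database. In this base R sketch, db_ids is a hypothetical stand-in for the accessions held locally; it is not how restez looks IDs up.

```r
# Hypothetical set of accessions held in the local database
db_ids <- c("demo_1", "demo_2", "demo_3")
ids <- c("thisisnotanid", "demo_1", "demo_2")

# Membership test, named by the queried IDs
present <- stats::setNames(ids %in% db_ids, ids)
print(present)

# Keep only the IDs that can be retrieved locally
available <- names(present)[present]
print(available)
```

Filtering IDs this way before calling the gb_*_get() functions avoids requesting records that are not in the local database.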

Return date and time of the last added sequence

Description

Return the date and time of the last added sequence as determined using the 'add_log.tsv'.

Usage

last_add_get()

Details

If no file found, returns empty character vector.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Return date and time of the last download

Description

Return the date and time of the last download as determined using the 'download_log.tsv'.

Usage

last_dwnld_get()

Details

If no file found, returns empty character vector.

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Return the last entry

Description

Return the last entry from a tab-delimited log file.

Usage

last_entry_get(fp)

Arguments

fp

Filepath, character

Value

vector
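
A minimal base R sketch of reading the final row of a tab-delimited log follows. last_entry() is a hypothetical re-creation, and the file names and two-column layout are invented for illustration; the real restez logs may contain different columns.

```r
# Hypothetical log file; the column layout is an assumption for illustration
log_fp <- tempfile(fileext = ".tsv")
log_lines <- c(
  "gbpri1.seq.gz\t2025-01-01 10:00:00",
  "gbpri2.seq.gz\t2025-01-02 11:30:00"
)
writeLines(log_lines, log_fp)

last_entry <- function(fp) {
  entries <- utils::read.delim(fp, header = FALSE, stringsAsFactors = FALSE)
  # Return the final row as a plain character vector
  unlist(entries[nrow(entries), ], use.names = FALSE)
}

(last <- last_entry(log_fp))
unlink(log_fp)
```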

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Retrieve latest GenBank release number

Description

Downloads the latest GenBank release number and returns it.

Usage

latest_genbank_release()

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Download the latest GenBank Release Notes

Description

Downloads the latest GenBank release notes to a user's restez download path.

Usage

latest_genbank_release_notes()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


List database IDs

Description

Return a vector of all IDs in a database.

Usage

list_db_ids(db = "nucleotide", n = 100)

Arguments

db

character, database name

n

Maximum number of IDs to return, if NULL returns all

Details

Warning: can return very large vectors for large databases.

Value

character vector

See Also

Other database: count_db_ids(), db_create(), db_delete(), db_download(), demo_db_create(), is_in_db()

Examples

library(restez)
restez_path_set(filepath = tempdir())
demo_db_create(n = 5)
# Warning: not recommended for real databases
#  with potentially millions of IDs
all_ids <- list_db_ids()


# What shall we do with these IDs?
# ... how about make a mock fasta file
seqs <- gb_sequence_get(id = all_ids)
defs <- gb_definition_get(id = all_ids)
# paste together
fasta_seqs <- paste0('>', defs, '\n', seqs)
fasta_file <- paste0(fasta_seqs, collapse = '\n')
cat(fasta_file)


# delete after example
db_delete(everything = TRUE)

Produce message of missing IDs

Description

Sends a message to the console stating the number of missing IDs.

Usage

message_missing(n)

Arguments

n

Number of missing IDs

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Mock def

Description

Make a mock sequence definition. Designed to be part of a loop.

Usage

mock_def(i)

Arguments

i

integer, iterator

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Generate mock GenBank records data.frame

Description

Make a mock nucleotide data.frame for entry into a demonstration SQL database.

Usage

mock_gb_df_generate(n)

Arguments

n

integer, number of entries

Value

data.frame

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Mock org

Description

Make a mock sequence organism. Designed to be part of a loop.

Usage

mock_org(i)

Arguments

i

integer, iterator

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Mock rec

Description

Create a mock GenBank record for demonstration and testing purposes. Designed to be part of a loop. The definition, accession, version, organism and sequence are optional arguments.

Usage

mock_rec(
  i,
  definition = NULL,
  accession = NULL,
  version = NULL,
  organism = NULL,
  sequence = NULL
)

Arguments

i

integer, iterator

definition

character

accession

character

version

character

organism

character

sequence

character

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Mock seq

Description

Make a mock sequence. Designed to be part of a loop.

Usage

mock_seq(i, sqlngth = 10)

Arguments

i

integer, iterator

sqlngth

integer, sequence length

Value

character
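
"Designed to be part of a loop" can be illustrated with a small generator that is called once per iterator value. make_mock_seq() below is a hypothetical re-creation in base R; restez's own mock_seq() may build its sequences differently, and seeding by the iterator is an assumption made here for reproducibility.

```r
# Hypothetical re-creation of a mock sequence generator; restez's own
# mock_seq() may build its sequences differently.
make_mock_seq <- function(i, sqlngth = 10) {
  # Seed with the iterator so each record is reproducible
  set.seed(i)
  paste(sample(c("A", "C", "G", "T"), sqlngth, replace = TRUE), collapse = "")
}

# Call the generator once per iterator value
mock_seqs <- vapply(1:3, make_mock_seq, character(1), sqlngth = 12)
names(mock_seqs) <- paste0("demo_", 1:3)
print(mock_seqs)
```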

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get accession numbers by querying NCBI GenBank

Description

The query string can be formatted using GenBank advanced query terms to obtain accession numbers corresponding to a specific set of criteria.

Usage

ncbi_acc_get(query, strict = TRUE, drop_ver = TRUE)

Arguments

query

Character vector of length 1; query string to search GenBank.

strict

Logical vector of length 1; should an error be issued if the number of unique accessions retrieved does not match the number of hits from GenBank? Default TRUE.

drop_ver

Logical vector of length 1; should the version part of the accession number (e.g., '.1' in 'AB001538.1') be dropped? Default TRUE.

Details

Note this queries NCBI GenBank, not the local database generated with restez.

It can be used either to restrict the accessions used to construct the local database (acc_filter argument of db_create()) or to specify accessions to read from the local database (id argument of gb_fasta_get() and other gb_*_get() functions).

Value

Character vector; accession numbers resulting from query.

See Also

db_create(), gb_fasta_get()

Examples

## Not run: 
  # requires an internet connection
  cmin_accs <- ncbi_acc_get("Crepidomanes minutum")
  length(cmin_accs)
  head(cmin_accs)

## End(Not run)
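
What dropping the version part means can be shown offline with a regular expression. This is a sketch of the idea, not the code used by ncbi_acc_get(); the third accession string is a made-up second version added for illustration.

```r
# Accession strings as they might come back from a search;
# "AB001538.2" is a made-up second version for illustration
accs <- c("AB001538.1", "AY952423.1", "AB001538.2")

# Drop the trailing ".<version>" part
accs_no_ver <- sub("\\.[0-9]+$", "", accs)

# Versions of the same record collapse to one accession
unique_accs <- unique(accs_no_ver)
print(unique_accs)
```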

Print file size predictions to screen

Description

Predicts the file sizes of the downloads and the database from the GenBank filesize information. Conversion factors are based on previous restez downloads.

Usage

predict_datasizes(uncompressed_filesize)

Arguments

uncompressed_filesize

numeric, stated file size in gigabytes (GB)

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Print method for status class

Description

Prints to screen the three sections of the status class. Not meant to be used interactively.

Usage

## S3 method for class 'status'
print(x, ...)

Arguments

x

Status object

...

Other arguments (not used by this function)


Create README in restez_path

Description

Write notes for the curious sorts who peruse the restez_path.

Usage

readme_log()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Example GenBank record

Description

Example GenBank record in text format for demonstration purposes.

Usage

data("record")

Format

A large character object containing record information and DNA sequence.

Source

https://www.ncbi.nlm.nih.gov/nuccore/AY952423.1

References

GenBank

Examples

data(record)
cat(record)

Connect to the restez database

Description

Sets a connection to the local database.

Usage

restez_connect(read_only = FALSE)

Arguments

read_only

Logical; should the connection be made in read-only mode? Read-only mode is required for multiple R processes to access the database simultaneously. Default FALSE.
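
To make the read-only behaviour concrete, the sketch below opens a DuckDB file directly through DBI (duckdb is listed in the package Imports). It is only an illustration under assumptions: the file name demo.duckdb and the table nucleotide_demo are made up, and restez_connect() may set up its connection differently.

```r
library(DBI)

# Hypothetical database file in a temporary directory.
db_file <- file.path(tempdir(), "demo.duckdb")

# Create a small table with a normal (read-write) connection first.
drv_rw <- duckdb::duckdb(dbdir = db_file, read_only = FALSE)
con_rw <- dbConnect(drv_rw)
dbWriteTable(
  con_rw, "nucleotide_demo",
  data.frame(accession = "AY952423", version = 1L),
  overwrite = TRUE
)
dbDisconnect(con_rw, shutdown = TRUE)

# Re-open the same file in read-only mode and run a query.
drv_ro <- duckdb::duckdb(dbdir = db_file, read_only = TRUE)
con_ro <- dbConnect(drv_ro)
res <- dbGetQuery(con_ro, "SELECT accession FROM nucleotide_demo")
dbDisconnect(con_ro, shutdown = TRUE)
```

A read-write connection is still needed whenever the database is being built or modified, which is presumably why read_only defaults to FALSE.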

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Disconnect from restez database

Description

Safely disconnect from the restez database.

Usage

restez_disconnect()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Check restez filepath

Description

Raises an error if the restez path does not exist.

Usage

restez_path_check()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get restez path

Description

Return the filepath where the restez database is stored.

Usage

restez_path_get()

Value

character

See Also

Other setup: restez_path_set(), restez_path_unset(), restez_ready(), restez_status()

Examples

library(restez)
# set a restez path with a tempdir
restez_path_set(filepath = tempdir())
# check what the set path is
(restez_path_get())

Set restez path

Description

Specify the filepath for the local GenBank database.

Usage

restez_path_set(filepath)

Arguments

filepath

character, valid filepath to the folder where the database should be stored.

Details

Adds 'restez_path' to options(). In this path the folder 'restez' will be created and all downloaded and database files will be stored there.
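
The mechanism described above can be sketched in base R. The functions demo_path_set() and demo_path_get() and the option name demo_restez_path are hypothetical stand-ins chosen so as not to clash with restez; the package's own code may differ in its details.

```r
# Hypothetical stand-in for a path setter; not restez's actual code.
demo_path_set <- function(filepath) {
  if (!dir.exists(filepath)) {
    stop("Invalid filepath.", call. = FALSE)
  }
  pkg_path <- file.path(filepath, "restez")
  # Create the package subfolder on first use.
  if (!dir.exists(pkg_path)) {
    dir.create(pkg_path)
  }
  # Store the location under an illustrative option name.
  options(demo_restez_path = pkg_path)
  invisible(pkg_path)
}

# Hypothetical getter: reads the stored option (NULL if never set).
demo_path_get <- function() {
  getOption("demo_restez_path")
}

demo_path_set(tempdir())
demo_path_get()
```

Storing the location in options() means it lasts only for the current R session, so the path has to be set again after restarting R.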

See Also

Other setup: restez_path_get(), restez_path_unset(), restez_ready(), restez_status()

Examples

## Not run: 
library(restez)
restez_path_set(filepath = 'path/to/where/you/want/files/to/download')

## End(Not run)

Unset restez path

Description

Set the restez path to NULL.

Usage

restez_path_unset()

See Also

Other setup: restez_path_get(), restez_path_set(), restez_ready(), restez_status()


Is restez ready?

Description

Returns TRUE if a restez SQL database is available. Use restez_status() for more information.

Usage

restez_ready()

Value

Logical

See Also

Other setup: restez_path_get(), restez_path_set(), restez_path_unset(), restez_status()

Examples

library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 5)
(restez_ready())
db_delete(everything = TRUE)
(restez_ready())

Restez readline

Description

Wrapper for base readline.

Usage

restez_rl(prompt)

Arguments

prompt

character, display text

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Check restez status

Description

Report the current setup status of restez to the console.

Usage

restez_status(gb_check = FALSE)

Arguments

gb_check

Logical; check whether the last download was from the latest GenBank release? Default FALSE.

Details

Set gb_check=TRUE to see if your downloads are up-to-date.

Value

Status class

See Also

Other setup: restez_path_get(), restez_path_set(), restez_path_unset(), restez_ready()

Examples

library(restez)
fp <- tempdir()
restez_path_set(filepath = fp)
demo_db_create(n = 5)
restez_status()
db_delete(everything = TRUE)
# Errors:
# restez_status()

Scan a gzipped file for text

Description

Scans a gzipped file for text strings and returns TRUE if any are present.

Usage

search_gz(terms, path)

Arguments

terms

Character vector; search terms (most likely GenBank accession numbers)

path

Path to the gzipped file to scan

Value

Logical
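
For readers curious how such a scan can work, here is a minimal base R sketch. The helper demo_search_gz() is hypothetical and reads the whole file into memory for simplicity; restez's search_gz() may be implemented differently.

```r
# Hypothetical helper; not restez's actual implementation.
demo_search_gz <- function(terms, path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  lines <- readLines(con)
  # fixed = TRUE treats each term as a literal string, not a regex.
  hits <- vapply(
    terms,
    function(term) any(grepl(term, lines, fixed = TRUE)),
    logical(1)
  )
  any(hits)
}

# Write a tiny gzipped file to search.
gz_path <- tempfile(fileext = ".gz")
out_con <- gzfile(gz_path, open = "wt")
writeLines(c("LOCUS       AY952423", "ORIGIN"), out_con)
close(out_con)

found <- demo_search_gz(c("AY952423", "XX000000"), gz_path)
not_found <- demo_search_gz("XX000000", gz_path)
```

Here found is TRUE because at least one of the two terms occurs in the file, and not_found is FALSE because its only term does not.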

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Log the system session information in restez path

Description

Records the session and system information to file.

Usage

seshinfo_log()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Set up common test data

Description

Creates temporary test folders.

Usage

setup()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Retrieve GenBank selections made by user

Description

Returns the selections made by the user.

Usage

slctn_get()

Details

If no file found, returns empty character vector.

Value

character vector

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_log(), sql_path_get(), stat(), status_class(), testdatadir_get()


Log the GenBank selection made by a user

Description

This function is called whenever a user makes a selection with db_download(). It records the user's numbered GenBank selections.

Usage

slctn_log(selection)

Arguments

selection

selected GenBank sequences, named vector

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), sql_path_get(), stat(), status_class(), testdatadir_get()


Get SQL path

Description

Return the path where the SQL database is stored.

Usage

sql_path_get()

Value

character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), stat(), status_class(), testdatadir_get()


Print blue

Description

Print to console blue text to indicate a number/statistic.

Usage

stat(...)

Arguments

...

Any number of text arguments to print, character

Value

coloured character encoding, character

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), status_class(), testdatadir_get()


Generate a list class for storing status information

Description

Creates a three-part list for holding information on the status of the restez file path.

Usage

status_class()

Value

Status class

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), testdatadir_get()


Get test data directory

Description

Get the folder containing test data.

Usage

testdatadir_get()

See Also

Other private: add_rcrd_log(), cat_line(), char(), check_connection(), cleanup(), connected(), connection_get(), db_download_intern(), db_sqlngths_get(), db_sqlngths_log(), dir_size(), dwnld_path_get(), dwnld_rcrd_log(), entrez_fasta_get(), entrez_gb_get(), extract_accession(), extract_by_patterns(), extract_clean_sequence(), extract_definition(), extract_features(), extract_inforecpart(), extract_keywords(), extract_locus(), extract_organism(), extract_seqrecpart(), extract_sequence(), extract_version(), file_download(), filename_log(), flatfile_read(), gb_build(), gb_df_create(), gb_df_generate(), gb_sql_add(), gb_sql_query(), gbrelease_check(), gbrelease_get(), gbrelease_log(), has_data(), identify_downloadable_files(), last_add_get(), last_dwnld_get(), last_entry_get(), latest_genbank_release(), latest_genbank_release_notes(), message_missing(), mock_def(), mock_gb_df_generate(), mock_org(), mock_rec(), mock_seq(), predict_datasizes(), readme_log(), restez_connect(), restez_disconnect(), restez_path_check(), restez_rl(), search_gz(), seshinfo_log(), setup(), slctn_get(), slctn_log(), sql_path_get(), stat(), status_class()