Type: | Package |
Title: | Accessing Statistics Canada Data Table and Vectors |
Version: | 0.4.3 |
Maintainer: | Jens von Bergmann <jens@mountainmath.ca> |
Description: | Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
ByteCompile: | yes |
NeedsCompilation: | no |
LazyData: | true |
Depends: | R (≥ 4.1) |
Imports: | digest (≥ 0.6), dplyr (≥ 1.1), httr (≥ 1.0.0), tidyr (≥ 1.3), readr (≥ 2.1), rlang (≥ 1.1), stringr (≥ 1.5), purrr (≥ 1.0), tibble (≥ 3.2), arrow (≥ 18.1), DBI (≥ 1.2), RSQLite (≥ 2.3), utils (≥ 4.3), dbplyr (≥ 2.5) |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, ggplot2, scales, testthat (≥ 3.0.0) |
URL: | https://github.com/mountainMath/cansim, https://mountainmath.github.io/cansim/, https://www.statcan.gc.ca/ |
BugReports: | https://github.com/mountainMath/cansim/issues |
VignetteBuilder: | knitr |
Language: | en-CA |
Config/testthat/edition: | 3 |
Packaged: | 2025-05-30 19:47:25 UTC; jens |
Author: | Jens von Bergmann [aut, cre], Dmitry Shkolnik [aut] |
Repository: | CRAN |
Date/Publication: | 2025-05-30 20:00:02 UTC |
Retrieve series info for given table id and coordinates
Description
Retrieves vector information for a given table and coordinates. This can be used to query data by vector. It only returns vector information for coordinates that are present in the data table, so it provides an effective way to filter coordinates. Vector information is not available for census data tables.
Usage
add_cansim_vectors_to_template(template, refresh = FALSE)
Arguments
template |
A (possibly filtered) cansim table template as returned by 'get_cansim_table_template' |
refresh |
Refresh the data from the Statistics Canada API |
Value
a tibble containing the table template with added vector information
Examples
## Not run:
library(dplyr)
template <- get_cansim_table_template("34-10-0013")
template |>
  filter(Geography=="Canada") |>
  add_cansim_vectors_to_template()
## End(Not run)
Add provincial abbreviations as factor
Description
Add provincial abbreviations as factor
Usage
add_provincial_abbreviations(data)
Arguments
data |
A tibble as returned by |
Value
The input tibble with additional factor GEO.abb that contains language-specific provincial abbreviations
Examples
## Not run:
df <- get_cansim("17-10-0005")
df <- add_provincial_abbreviations(df)
## End(Not run)
Translate deprecated CANSIM table number into new NDM-format table catalogue number
Description
Returns NDM table catalogue equivalent given a standard old-format CANSIM table number
Usage
cansim_old_to_new(oldCansimTableNumber)
Arguments
oldCansimTableNumber |
deprecated style CANSIM table number (e.g. "427-0001") |
Value
A character string with the new-format NDM table number
Examples
## Not run:
cansim_old_to_new("026-0018")
## End(Not run)
Repartitions a cached cansim table to a new partitioning scheme
Description
Repartitions an already downloaded and cached parquet or feather dataset
Usage
cansim_repartition_cached_table(
cansimTableNumber,
new_partitioning = c(),
language = "english",
format = "parquet",
cache_path = getOption("cansim.cache_path")
)
Arguments
cansimTableNumber |
the NDM table number to load |
new_partitioning |
(Optional) Partition columns to use for parquet or feather formats. |
language |
|
format |
(Optional) The format of the data table to retrieve. Either |
cache_path |
(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'. |
Examples
## Not run:
cansim_repartition_cached_table("34-10-0013",new_partitioning=c("GeoUID"))
## End(Not run)
Use metadata to extract categories for column of specific level
Description
For tables with data with hierarchical categories, metadata containing hierarchy level descriptions is used to extract categories at a specified level of hierarchy only.
Usage
categories_for_level(
data,
column_name,
level = NA,
strict = FALSE,
remove_duplicates = TRUE
)
Arguments
data |
data table object as returned from |
column_name |
the quoted name of the column to extract categories from |
level |
the hierarchy level depth to which to extract categories, where 0 is top category |
strict |
(default |
remove_duplicates |
(default |
Value
A vector of categories
Examples
## Not run:
data <- get_cansim("16-10-0117")
categories_for_level(data,"North American Industry Classification System (NAICS)",level=2)
## End(Not run)
Collect data from a parquet, feather or sqlite query and normalize cansim table output
Description
Collect data from a parquet, feather or sqlite query and normalize cansim table output
Usage
collect_and_normalize(
connection,
replacement_value = "val_norm",
normalize_percent = TRUE,
default_month = "07",
default_day = "01",
factors = TRUE,
strip_classification_code = FALSE,
disconnect = FALSE
)
Arguments
connection |
A connection to a local arrow connection as returned by |
replacement_value |
(Optional) the name of the column the manipulated value should be returned in. Defaults to adding the 'val_norm' value field. |
normalize_percent |
(Optional) When |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "07") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to |
strip_classification_code |
(Optional) Logical value indicating if classification code should be stripped from names. (Default set to |
disconnect |
(Optional) Only used when format is sqlite. Logical value to indicate if the SQLite database connection should be disconnected. (Default is |
Value
A tibble with the collected and normalized data
Examples
## Not run:
library(dplyr)
con <- get_cansim_connection("34-10-0013")
data <- con %>%
filter(GEO=="Ontario") %>%
collect_and_normalize()
## End(Not run)
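The idea of filtering and summarizing at the database level before collecting can be sketched without any network access. The example below is only an illustration under stated assumptions: it uses an in-memory SQLite database and a made-up table named `demo` with invented values in place of a real cached StatCan table, and it calls plain `collect()` rather than `collect_and_normalize()`.

```r
library(dplyr)

# In-memory SQLite database standing in for a cached table (assumption).
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Made-up rows for illustration only; not real StatCan data.
DBI::dbWriteTable(con, "demo", data.frame(
  GEO = c("Ontario", "Ontario", "Quebec"),
  VALUE = c(10, 20, 5)
))

# The filter and the sum are translated to SQL and run inside the database;
# only the single summary row is brought into R by collect().
ontario_total <- tbl(con, "demo") %>%
  filter(GEO == "Ontario") %>%
  summarise(total = sum(VALUE, na.rm = TRUE)) %>%
  collect()

DBI::dbDisconnect(con)
ontario_total$total
```

With a real connection the same pattern keeps memory use low, because rows that are filtered out never leave the database.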
The correspondence file for old to new StatCan table numbers is included in the package
Description
The correspondence file for old to new StatCan table numbers is included in the package
Author(s)
Statistics Canada
References
https://www.statcan.gc.ca/eng/developers-developpeurs/cansim_id-product_id-concordance.csv
create database index
Description
create database index
Usage
create_index(connection, table_name, field)
Arguments
connection |
connection to database |
table_name |
sql table name |
field |
name of field to index |
Value
'NULL'
convert csv to arrow
Description
convert csv to arrow
Usage
csv2arrow(
csv_file,
arrow_file,
format = "parquet",
col_names,
value_column = "VALUE",
partitioning = c(),
na = c(NA, "..", "", "...", "F"),
text_encoding = "UTF-8",
delim = ","
)
Arguments
csv_file |
input csv path |
arrow_file |
output arrow database path |
format |
format of arrow file, "parquet" or "feather" (default parquet) |
col_names |
column names of the csv file |
value_column |
name of the value column with numeric data |
partitioning |
optional partition columns |
na |
na character strings |
text_encoding |
encoding of csv file (default UTF-8) |
delim |
(Optional) csv delimiter, default is "," |
Value
A database connection
convert csv to sqlite adapted from https://rdrr.io/github/coolbutuseless/csv2sqlite/src/R/csv2sqlite.R
Description
convert csv to sqlite adapted from https://rdrr.io/github/coolbutuseless/csv2sqlite/src/R/csv2sqlite.R
Usage
csv2sqlite(
csv_file,
sqlite_file,
table_name,
transform = NULL,
chunk_size = 5e+06,
append = FALSE,
col_types = NULL,
na = c(NA, "..", "", "...", "F"),
text_encoding = "UTF-8",
delim = ",",
...
)
Arguments
csv_file |
input csv path |
sqlite_file |
output sql database path |
table_name |
sql table name |
transform |
optional function that transforms each chunk |
chunk_size |
optional chunk size to read/write data, default=5,000,000 |
append |
optional parameter, append to database or overwrite, default='FALSE' |
col_types |
optional parameter for csv column types |
na |
na character strings |
text_encoding |
encoding of csv file (default UTF-8) |
delim |
(Optional) csv delimiter, default is "," |
... |
(Optional) additional parameters passed to 'readr::read_delim_chunked' |
Value
A database connection
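As a rough sketch of the chunked approach described by the arguments above, the following reads a csv file in chunks with 'readr::read_delim_chunked' and appends each chunk to an SQLite table. The helper name `csv_to_sqlite_sketch`, the small chunk size, and the demonstration rows are assumptions made for illustration; this is not the package's own implementation.

```r
library(DBI)
library(readr)

# Hypothetical helper, not the package's csv2sqlite implementation.
csv_to_sqlite_sketch <- function(csv_file, sqlite_file, table_name,
                                 chunk_size = 1000, delim = ",") {
  con <- dbConnect(RSQLite::SQLite(), sqlite_file)
  # Append each chunk to the table as soon as it has been read.
  write_chunk <- function(chunk, pos) {
    dbWriteTable(con, table_name, as.data.frame(chunk), append = TRUE)
  }
  read_delim_chunked(
    csv_file,
    callback = SideEffectChunkCallback$new(write_chunk),
    delim = delim,
    chunk_size = chunk_size,
    col_types = cols(.default = col_character())  # keep types stable across chunks
  )
  con
}

# Small demonstration on a temporary file with made-up rows.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("REF_DATE,VALUE", "2020,1", "2021,2", "2022,3", "2023,4", "2024,5"),
           csv_file)
con <- csv_to_sqlite_sketch(csv_file, tempfile(fileext = ".sqlite"), "demo",
                            chunk_size = 2)
n_rows <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM demo")$n
dbDisconnect(con)
```

Reading every column as character is a simplification that avoids type mismatches between chunks; a fuller version would pass explicit column types.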
Disconnect from a cansim database connection
Description
Disconnect from a cansim database connection
Usage
disconnect_cansim_sqlite(connection)
Arguments
connection |
connection to database |
Value
'NULL'
Examples
## Not run:
con <- get_cansim_sqlite("34-10-0013")
disconnect_cansim_sqlite(con)
## End(Not run)
Fold in metadata for selected columns
Description
Fold in metadata for selected columns
Usage
fold_in_metadata_for_columns(data, data_path, column_names)
Arguments
data |
A tibble with StatCan table data as e.g. returned by |
data_path |
base path to save parsed metadata |
column_names |
the names of the columns |
Value
A tibble including the metadata information
Retrieve a Statistics Canada data table using NDM catalogue number
Description
Retrieves a data table using an NDM catalogue number as a tidy data frame. Retrieved table data is cached for the duration of the current R session only by default.
Usage
get_cansim(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200,
factors = TRUE,
default_month = "07",
default_day = "01"
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. Set to higher values for large tables and slow network connections. (Default is 200) |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "07") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
Value
A tibble with StatCan Table data and added Date column with inferred date objects and added val_norm column with normalized value from the VALUE column.
Examples
## Not run:
get_cansim("34-10-0013")
## End(Not run)
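To illustrate how the default_month and default_day arguments could shape the inferred Date column, the following base R sketch pads annual and monthly reference periods before parsing them. The helper name `infer_date_sketch` and the assumed reference period formats ("YYYY", "YYYY-MM", "YYYY-MM-DD") are assumptions for illustration, not the package's internal logic.

```r
# Hypothetical helper: pad partial reference periods and parse them as Dates.
infer_date_sketch <- function(ref_date, default_month = "07", default_day = "01") {
  ref_date <- as.character(ref_date)
  padded <- ifelse(
    nchar(ref_date) == 4,                                    # annual, e.g. "2020"
    paste(ref_date, default_month, default_day, sep = "-"),
    ifelse(
      nchar(ref_date) == 7,                                  # monthly, e.g. "2020-03"
      paste(ref_date, default_day, sep = "-"),
      ref_date                                               # already a full date
    )
  )
  as.Date(padded, format = "%Y-%m-%d")
}

dates <- infer_date_sketch(c("2020", "2020-03", "2020-03-15"))
dates
```

Under these assumptions an annual period lands mid-year (July 1) and a monthly period on the first day of the month, which matches the defaults shown in the usage above.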
Retrieve a list of modified tables since a given date
Description
Retrieve a list of tables that have been modified or updated since the specified date.
Usage
get_cansim_changed_tables(start_date, end_date = NULL)
Arguments
start_date |
Starting date in |
end_date |
Optional end date in |
Value
A tibble with Statistics Canada data table product ids and their release times
Examples
## Not run:
get_cansim_changed_tables("2018-08-01")
## End(Not run)
Get NDM code sets
Description
Useful for getting a list of surveys or subjects; also used internally
Usage
get_cansim_code_set(
code_set = c("scalar", "frequency", "symbol", "status", "uom", "survey", "subject",
"wdsResponseStatus"),
refresh = FALSE
)
Arguments
code_set |
the code set to retrieve. |
refresh |
Default is |
Value
A tibble with English and French labels for the given code set
Examples
## Not run:
get_cansim_code_set("survey")
## End(Not run)
Retrieve Statistics Canada data table categories for a specific column
Description
Returns table column details given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_column_categories(
cansimTableNumber,
column,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
column |
the column for which to retrieve category information |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with detailed information on StatCan table categories for the specified field
Examples
## Not run:
get_cansim_column_categories("34-10-0013", "Geography")
## End(Not run)
Retrieve Statistics Canada data table column list
Description
Returns table column details given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_column_list(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble listing the column names of the StatCan table.
Examples
## Not run:
get_cansim_column_list("34-10-0013")
## End(Not run)
Retrieve a Statistics Canada data table using NDM catalogue number as parquet, feather, or sqlite database connection
Description
Retrieves a data table using an NDM catalogue number as parquet, feather, or SQLite database connection. Retrieved table data is cached permanently if a cache path is supplied, or otherwise for the duration of the current R session. If the table is cached, the function will check if a newer version is available and emit a warning message if the cached table is out of date.
Usage
get_cansim_connection(
cansimTableNumber,
language = "english",
format = "parquet",
partitioning = c(),
refresh = FALSE,
timeout = 1000,
cache_path = getOption("cansim.cache_path")
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
format |
(Optional) The format of the data table to retrieve. Either |
partitioning |
(Optional) Partition columns to use for parquet or feather formats. |
refresh |
(Optional) Valid options are |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
cache_path |
(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'. |
Value
A database connection to a local parquet, feather, or sqlite database with the StatCan Table data. The data frames after calling 'collect()' or 'collect_and_normalize()' are identical up to possibly different row order.
Examples
## Not run:
library(dplyr)
con <- get_cansim_connection("34-10-0013")
# Work with the data connection
glimpse(con)
## End(Not run)
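Because cache_path defaults to 'getOption("cansim.cache_path")', the cache location can be set once per session through R options. The sketch below assumes a directory under 'tempdir()' purely so that it runs anywhere; a real setup would point to a permanent directory, and the directory name `cansim_cache` is an arbitrary choice.

```r
# Assumed cache location for illustration; use a permanent directory in practice.
cache_dir <- file.path(tempdir(), "cansim_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Functions whose cache_path argument defaults to getOption("cansim.cache_path")
# will pick this value up for the rest of the session.
options(cansim.cache_path = cache_dir)

getOption("cansim.cache_path")
```

Placing the `options()` call in a user-level startup file is one way to make the setting persist across sessions.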
Retrieve table metadata from Statistics Canada API
Description
Retrieves table metadata given an input table number or vector of table numbers using either the new or old table number format. Patience is suggested as the Statistics Canada API can be very slow. The 'list_cansim_tables()' function can be used as an alternative to retrieve a (cached) list of CANSIM tables with (more limited) metadata.
Usage
get_cansim_cube_metadata(cansimTableNumber, type = "overview", refresh = FALSE)
Arguments
cansimTableNumber |
A new or old CANSIM/NDM table number or a vector of table numbers |
type |
Which type of metadata to get, options are "overview", "members", "notes", or "corrections". |
refresh |
Refresh the data from the Statistics Canada API |
Value
a tibble containing the table metadata
Examples
## Not run:
get_cansim_cube_metadata("34-10-0013")
## End(Not run)
Retrieve data for specified Statistics Canada data product for last N periods for specific coordinates
Description
Allows for the retrieval of data for a Statistics Canada data table at specific coordinates. This allows partial, targeted download of tables and can be effectively combined with the get_cansim_table_template function to help pinpoint data series of interest. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.
Usage
get_cansim_data_for_table_coord_periods(
tableCoordinates,
periods = NULL,
language = "english",
refresh = FALSE,
timeout = 200,
factors = TRUE,
default_month = "07",
default_day = "01"
)
Arguments
tableCoordinates |
Either a list with vectors of coordinates by table number, or a (filtered) data frame as returned by |
periods |
Optional numeric value for number of latest periods to retrieve data for, default is |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "07") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
Value
A tibble with data matching specified coordinate and period input arguments
Examples
## Not run:
get_cansim_data_for_table_coord_periods(list("35-10-0003"=c("1.1","1.12")),periods=3)
## End(Not run)
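The batching behaviour described above, where requests are split so that no more than 300 coordinates are sent at a time, can be sketched in base R. The helper name `batch_coordinates` and the placeholder coordinate strings are assumptions for illustration; the package's internal batching may differ.

```r
# Hypothetical helper: split a vector of coordinates into batches of at most `size`.
batch_coordinates <- function(coordinates, size = 300) {
  if (length(coordinates) == 0) return(list())
  batch_id <- ceiling(seq_along(coordinates) / size)
  unname(split(coordinates, batch_id))
}

coords <- paste0("1.", seq_len(650))   # made-up placeholder coordinates
batches <- batch_coordinates(coords)
lengths(batches)
```

Each element of `batches` would then correspond to one API request, and the results would be combined afterwards.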
Major economic indicator release schedule
Description
Returns every release date of major economic indicators since March 14, 2012. This also includes scheduled future releases.
Usage
get_cansim_key_release_schedule()
Value
a tibble with data, and details for major economic indicator release
Examples
## Not run:
get_cansim_key_release_schedule()
## End(Not run)
Retrieve series info for given table id and coordinates
Description
Retrieves series information by coordinates
Usage
get_cansim_series_info_cube_coord(
cansimTableNumber,
coordinates,
timeout = 1000,
refresh = FALSE
)
Arguments
cansimTableNumber |
A new or old CANSIM/NDM table number or a vector of table numbers |
coordinates |
A vector of coordinates |
timeout |
Timeout for the API call |
refresh |
Refresh the data from the Statistics Canada API |
Value
a tibble containing the series information for the given coordinates
Examples
## Not run:
get_cansim_table_template("34-10-0013")
## End(Not run)
Retrieve a Statistics Canada data table using NDM catalogue number as SQLite database connection
Description
Retrieves a data table using an NDM catalogue number as an SQLite table. Retrieved table data is cached permanently if a cache path is supplied, or otherwise for the duration of the current R session. The function will check against the latest release date for the table and emit a warning message if the cached table is out of date.
Usage
get_cansim_sqlite(
cansimTableNumber,
language = "english",
refresh = FALSE,
auto_refresh = FALSE,
timeout = 1000,
cache_path = getOption("cansim.cache_path")
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
auto_refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
cache_path |
(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'. |
Value
A database connection to a local SQLite database with the StatCan Table data.
Examples
## Not run:
library(dplyr)
con <- get_cansim_connection("34-10-0013", format="sqlite")
# Work with the data connection
glimpse(con)
disconnect_cansim_sqlite(con)
## End(Not run)
Retrieve Statistics Canada data table information
Description
Returns table information given an NDM table catalogue number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_table_info(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with the table overview information
Examples
## Not run:
get_cansim_table_info("34-10-0013")
## End(Not run)
Get the latest release date for a StatCan table, if available
Description
This can be used to check when a table has last been updated.
Usage
get_cansim_table_last_release_date(cansimTableNumber)
Arguments
cansimTableNumber |
the NDM table number |
Value
A datetime object if a release date is available, NULL otherwise.
Examples
## Not run:
get_cansim_table_last_release_date("34-10-0013")
## End(Not run)
Retrieve Statistics Canada data table notes and column categories
Description
Returns table notes given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_table_notes(
cansimTableNumber,
language = "en",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with table notes.
Examples
## Not run:
get_cansim_table_notes("34-10-0013")
## End(Not run)
Retrieve Statistics Canada data table overview text
Description
Prints table overview information as console output. The selected CANSIM table must be loaded entirely in order to display overview information. Overview information is printed to the console in English or French, as specified.
Usage
get_cansim_table_overview(
cansimTableNumber,
language = "english",
refresh = FALSE
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
Value
none
Examples
## Not run:
get_cansim_table_overview("34-10-0013")
## End(Not run)
Retrieve Statistics Canada data table short notes
Description
Returns table notes given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_table_short_notes(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with the StatCan Notes for the table
Examples
## Not run:
get_cansim_table_short_notes("34-10-0013")
## End(Not run)
Retrieve Statistics Canada data table subject detail
Description
Returns table subject detail given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_table_subject(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with the table subject code and name.
Examples
## Not run:
get_cansim_table_subject("34-10-0013")
## End(Not run)
Retrieve Statistics Canada data table survey detail
Description
Returns table survey detail given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.
Usage
get_cansim_table_survey(
cansimTableNumber,
language = "english",
refresh = FALSE,
timeout = 200
)
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
Value
A tibble with the table survey code and name
Examples
## Not run:
get_cansim_table_survey("34-10-0013")
## End(Not run)
Retrieve table template from Statistics Canada API
Description
A table template consists of the dimensions and members and coordinates of a table that can be used to explore and filter table data before downloading subsets of the table. To add vector Ids to (a possibly filtered) template the 'add_cansim_vectors_to_template' function can be used.
Usage
get_cansim_table_template(cansimTableNumber, language = "eng", refresh = FALSE)
Arguments
cansimTableNumber |
A new or old CANSIM/NDM table number or a vector of table numbers |
language |
Language for the dimension and member names, either "eng" or "fra" |
refresh |
Refresh the data from the Statistics Canada API |
Value
a tibble containing the table template
Examples
## Not run:
get_cansim_table_template("34-10-0013")
## End(Not run)
Retrieve a Statistics Canada data table URL given a table number
Description
Retrieve URL of a table from the API given a table number. Offers a more stable approach than manually guessing the URL of the table.
Usage
get_cansim_table_url(cansimTableNumber, language = "en")
Arguments
cansimTableNumber |
the NDM table number to load |
language |
|
Value
String object containing URL for specified table number
Examples
## Not run:
get_cansim_table_url("34-10-0013")
get_cansim_table_url("34-10-0013", language = "fr")
## End(Not run)
Retrieve data for a Statistics Canada data vector released within a given time frame
Description
Allows for the retrieval of data for specified vector series for a given time window. Accessing data by vector allows for targeted extraction of time series. Vectors of interest can be discovered using the StatCan table web interface, or by using the get_cansim_table_template function to pinpoint data series of interest and then chaining the add_cansim_vectors_to_template function to add cansim vector information to the template data. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.
Usage
get_cansim_vector(
vectors,
start_time = as.Date("1800-01-01"),
end_time = Sys.time(),
use_ref_date = TRUE,
language = "english",
refresh = FALSE,
timeout = 200,
factors = TRUE,
default_month = "07",
default_day = "01"
)
Arguments
vectors |
The list of vectors to retrieve |
start_time |
Starting date in |
end_time |
Set an optional end time filter in |
use_ref_date |
Optional, |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "07") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
Value
A tibble with data for vectors released between start and end time
Examples
## Not run:
get_cansim_vector("v41690973","2015-01-01")
## End(Not run)
Retrieve data for specified Statistics Canada data vector(s) for last N periods
Description
Allows for the retrieval of data for specified vector series for the N most-recently released periods. Accessing data by vector allows for targeted extraction of time series. Vectors of interest can be discovered using the StatCan table web interface, or by using the get_cansim_table_template function to pinpoint data series of interest and then chaining the add_cansim_vectors_to_template function to add cansim vector information to the template data. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.
Usage
get_cansim_vector_for_latest_periods(
vectors,
periods = NULL,
language = "english",
refresh = FALSE,
timeout = 200,
factors = TRUE,
default_month = "07",
default_day = "01"
)
Arguments
vectors |
The list of vectors to retrieve |
periods |
Numeric value for number of latest periods to retrieve data for; by default all data is retrieved. |
language |
|
refresh |
(Optional) When set to |
timeout |
(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "07") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
Value
A tibble with data for specified vector(s) for the last N periods
Examples
## Not run:
get_cansim_vector_for_latest_periods("v41690973",10)
## End(Not run)
Retrieve metadata for specified Statistics Canada data vectors
Description
Allows for the retrieval of metadata for Statistics Canada data vectors
Usage
get_cansim_vector_info(vectors)
Arguments
vectors |
a vector of cansim vectors |
Value
A tibble with metadata for selected vectors
Examples
## Not run:
get_cansim_vector_info("v41690973")
## End(Not run)
Get column names de-duplicated and in the correct order
Description
Get column names de-duplicated and in the correct order
Usage
get_deduped_column_level_data(cansimTableNumber, language, column)
Arguments
cansimTableNumber |
The table number |
language |
The language of the column names |
column |
The column name |
Value
A tibble with the column names
List cached cansim arrow and SQLite databases
Description
List cached cansim arrow and SQLite databases
Usage
list_cansim_cached_tables(
cache_path = getOption("cansim.cache_path"),
refresh = FALSE
)
Arguments
cache_path |
Optional, default value is 'getOption("cansim.cache_path")'. |
refresh |
Optional, refresh the last updated date of cached cansim tables |
Value
A tibble with the list of all tables that are currently cached at the given cache path.
Examples
## Not run:
list_cansim_cached_tables()
## End(Not run)
Get overview list for all Statistics Canada data cubes
Description
Generates an overview table containing metadata of available Statistics Canada data cubes.
Usage
list_cansim_cubes(lite = FALSE, refresh = FALSE, quiet = FALSE)
Arguments
lite |
Get the version without cube dimensions and comments for faster retrieval, default is |
refresh |
Default is |
quiet |
Optional, suppress messages
To refresh the code list during a running R session set to |
Value
A tibble with available Statistics Canada data cubes, including NDM table number, cube title, start and end dates, archive status, subject and survey codes, frequency codes and a list of cube dimensions.
Examples
## Not run:
list_cansim_cubes()
## End(Not run)
List cached cansim SQLite database
Description
List cached cansim SQLite database
Usage
list_cansim_sqlite_cached_tables(
cache_path = getOption("cansim.cache_path"),
refresh = FALSE
)
Arguments
cache_path |
Optional, default value is 'getOption("cansim.cache_path")'. |
refresh |
Optional, refresh the last updated date of cached cansim tables |
Value
A tibble with the list of all tables that are currently cached at the given cache path.
Examples
## Not run:
list_cansim_cached_tables()
## End(Not run)
Get overview list for all Statistics Canada data tables (deprecated)
Description
This method is deprecated, please use 'list_cansim_cubes' instead.
Generates an overview table containing metadata of available Statistics Canada data tables. A new and updated table will be generated if this table does not already exist in cached form or if the force refresh option is selected (set to FALSE by default). This can take some time as this process involves scraping through hundreds of Statistics Canada web pages to gather the required metadata. If option cansim.cache_path is set it will look for and store the overview table in that directory.
Usage
list_cansim_tables(refresh = FALSE)
Arguments
refresh |
Default is FALSE |
Value
A tibble with available Statistics Canada data tables, listing title, Statistics Canada data table catalogue number, deprecated CANSIM table number, description, and geography
Examples
## Not run:
list_cansim_tables()
## End(Not run)
Normalize retrieved data table values to appropriate scales
Description
Facilitates working with Statistics Canada data table values retrieved using the package by setting all units to counts/dollars instead of millions, etc. If "replacement_value" is not set, it will replace the VALUE field with normalized values and drop the scale column. Otherwise it will keep the scale columns and create a new column named replacement_value with the normalized value. It will attempt to parse the REF_DATE field and create an R date variable. This is currently experimental.
Usage
normalize_cansim_values(
data,
replacement_value = "val_norm",
normalize_percent = TRUE,
default_month = "01",
default_day = "01",
factors = TRUE,
strip_classification_code = FALSE,
cansimTableNumber = NULL,
internal = FALSE
)
Arguments
data |
A retrieved data table, for example as returned from get_cansim() (see Examples) |
replacement_value |
(Optional) the name of the column the manipulated value should be returned in. Defaults to "val_norm" |
normalize_percent |
(Optional) When TRUE (the default), normalizes percentages by changing them to rates |
default_month |
The default month that should be used when creating Date objects for annual data (default set to "01") |
default_day |
The default day of the month that should be used when creating Date objects for monthly data (default set to "01") |
factors |
(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE) |
strip_classification_code |
(Optional) Logical value indicating if classification code should be stripped from names. (Default set to FALSE) |
cansimTableNumber |
(Optional) Only needed when operating on results of SQLite connections. |
internal |
(Optional) Flag to indicate that this function is called internally. |
Value
Returns a tibble with adjusted values.
Examples
## Not run:
cansim_table <- get_cansim("34-10-0013")
normalize_cansim_values(cansim_table)
## End(Not run)
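The scaling and date handling described above can be illustrated without the package. The following is only a rough base R sketch of the idea, not the package's implementation: the toy data frame, the SCALAR_FACTOR column with its scale labels, and the Date column name are assumptions made for illustration.

```r
# Hypothetical stand-in for a retrieved table. Column names and scale labels
# are assumptions for illustration, not a guaranteed schema.
toy <- data.frame(
  REF_DATE = c("2020", "2021", "2021"),
  VALUE = c(1.5, 2, 250),
  SCALAR_FACTOR = c("millions", "thousands", "units"),
  stringsAsFactors = FALSE
)

# Multiplier for each (assumed) scale label.
scale_multiplier <- c(units = 1, thousands = 1e3, millions = 1e6)

# Keep the original columns and write the normalized value to a new column,
# mirroring the behaviour described when replacement_value is set.
toy$val_norm <- toy$VALUE * unname(scale_multiplier[toy$SCALAR_FACTOR])

# Annual reference periods get a default month and day to form a Date,
# analogous to default_month = "01" and default_day = "01".
toy$Date <- as.Date(paste(toy$REF_DATE, "01", "01", sep = "-"))

print(toy)
```

Writing to a separate column keeps the raw VALUE available for comparison, which is the trade-off the replacement_value argument exposes.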
Parse metadata
Description
Parse metadata
Usage
parse_metadata(meta, data_path)
Arguments
meta |
the raw metadata table |
data_path |
base path to save parsed metadata |
Remove cached cansim SQLite and parquet database
Description
Remove cached cansim SQLite and parquet database
Usage
remove_cansim_cached_tables(
cansimTableNumber,
format = c("parquet", "feather", "sqlite"),
language = NULL,
cache_path = getOption("cansim.cache_path")
)
Arguments
cansimTableNumber |
Vector of the table(s) to be removed, or a (filtered) table as returned by 'list_cansim_cached_tables' with the list of tables to be removed. |
format |
Format of cache to remove, possible values are '"parquet"', '"feather"' or '"sqlite"' or a subset of these (the default is all of these) |
language |
Language for which to remove the cached data. If unspecified ('NULL') tables for all languages will be removed. |
cache_path |
Optional, default value is 'getOption("cansim.cache_path")' |
Value
'NULL'
Examples
## Not run:
con <- get_cansim_connection("34-10-0013", format="parquet")
remove_cansim_cached_tables("34-10-0013", format="parquet")
## End(Not run)
Remove cached cansim SQLite database
Description
Remove cached cansim SQLite database
Usage
remove_cansim_sqlite_cached_table(
cansimTableNumber,
language = NULL,
cache_path = getOption("cansim.cache_path")
)
Arguments
cansimTableNumber |
Number of the table to be removed |
language |
Language for which to remove the cached data. If unspecified ('NULL') tables for all languages will be removed |
cache_path |
Optional, default value is 'getOption("cansim.cache_path")' |
Value
'NULL'
Examples
## Not run:
con <- get_cansim_connection("34-10-0013", format="sqlite")
disconnect_cansim_sqlite(con)
remove_cansim_cached_tables("34-10-0013", format="sqlite")
## End(Not run)
Search through Statistics Canada data cubes
Description
Searches through Statistics Canada data cubes using a search term.
Usage
search_cansim_cubes(search_term, refresh = FALSE)
Arguments
search_term |
User-supplied search term used to find Statistics Canada data cubes with matching titles, table numbers, subject and survey codes. |
refresh |
Default is FALSE |
Value
A tibble with available Statistics Canada data cubes, listing title, Statistics Canada data cube catalogue number, deprecated CANSIM table number, survey and subject.
Examples
## Not run:
search_cansim_cubes("Labour force")
## End(Not run)
Search through Statistics Canada data tables (deprecated)
Description
This method is deprecated, please use 'search_cansim_cubes' instead.
Searches through Statistics Canada data tables using a search term. A new table is generated if it does not already exist or if the refresh option is set to TRUE. Search terms are case insensitive, but will accept regular expressions for more advanced searching. The search function can search either through table titles or through table descriptions, depending on whether search_description is set to TRUE. If refresh = TRUE, the table will be updated and regenerated using Statistics Canada's latest data. This can take some time since this process involves scraping through several hundred web pages to gather the required metadata. If option cache_path is set it will look for and store the overview table in that directory.
Usage
search_cansim_tables(search_term, search_fields = "both", refresh = FALSE)
Arguments
search_term |
User-supplied search term used to find Statistics Canada data tables with matching titles |
search_fields |
By default, this function will search through table titles and keywords. Setting this parameter to "title" will only search through the title, setting it to "keyword" will only search through keywords |
refresh |
Default is FALSE |
Value
A tibble with available Statistics Canada data tables, listing title, Statistics Canada data table catalogue number, deprecated CANSIM table number, description and geography that match the search term.
Examples
## Not run:
search_cansim_tables("Labour force")
## End(Not run)
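To make the case-insensitive, regular-expression matching and the search_fields choices concrete, here is a small base R sketch. It is not the package's code: the overview data frame, its title and keywords columns, and the helper search_overview() are hypothetical and exist only for this illustration.

```r
# Hypothetical overview table standing in for the real, scraped table list.
overview <- data.frame(
  title = c("Labour force characteristics by province",
            "Consumer Price Index, monthly",
            "Employment and labour income"),
  keywords = c("employment; unemployment", "prices; inflation", "wages"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regex search over titles, keywords,
# or both, selected by search_fields.
search_overview <- function(tbl, search_term, search_fields = "both") {
  in_title <- grepl(search_term, tbl$title, ignore.case = TRUE)
  in_keyword <- grepl(search_term, tbl$keywords, ignore.case = TRUE)
  hit <- switch(search_fields,
    title = in_title,
    keyword = in_keyword,
    both = in_title | in_keyword,
    stop("search_fields must be 'title', 'keyword' or 'both'")
  )
  tbl[hit, , drop = FALSE]
}

# A regular expression matching either phrase, regardless of case.
labour <- search_overview(overview, "labour (force|income)")

# Restrict the search to keywords only.
prices <- search_overview(overview, "INFLATION", search_fields = "keyword")

print(labour$title)
print(prices$title)
```

With the real function, the same search term would be passed as search_term and the field selection as search_fields.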
View CANSIM table or vector information in browser
Description
Opens CANSIM table or vector on Statistics Canada's website using default browser. This may be useful for getting further info on CANSIM table and survey methods.
Usage
view_cansim_webpage(cansimTableNumber = NULL)
Arguments
cansimTableNumber |
CANSIM or NDM table number or cansim vectors with "v" prefix. If no number is provided, the vector search page on the Statistics Canada website will be opened. |
Value
none
Examples
## Not run:
view_cansim_webpage("34-10-0013")
## End(Not run)