Type: Package
Title: Accessing Statistics Canada Data Table and Vectors
Version: 0.4.3
Maintainer: Jens von Bergmann <jens@mountainmath.ca>
Description: Searches for, accesses, and retrieves Statistics Canada data tables, as well as individual vectors, as tidy data frames. This package enriches the tables with metadata, deals with encoding issues, allows for bilingual English or French language data retrieval, and bundles convenience functions to make it easier to work with retrieved table data. For more efficient data access the package allows for caching data in a local database and database level filtering, data manipulation and summarizing.
License: MIT + file LICENSE
Encoding: UTF-8
ByteCompile: yes
NeedsCompilation: no
LazyData: true
Depends: R (≥ 4.1)
Imports: digest (≥ 0.6), dplyr (≥ 1.1), httr (≥ 1.0.0), tidyr (≥ 1.3), readr (≥ 2.1), rlang (≥ 1.1), stringr (≥ 1.5), purrr (≥ 1.0), tibble (≥ 3.2), arrow (≥ 18.1), DBI (≥ 1.2), RSQLite (≥ 2.3), utils (≥ 4.3), dbplyr (≥ 2.5)
RoxygenNote: 7.3.2
Suggests: knitr, rmarkdown, ggplot2, scales, testthat (≥ 3.0.0)
URL: https://github.com/mountainMath/cansim, https://mountainmath.github.io/cansim/, https://www.statcan.gc.ca/
BugReports: https://github.com/mountainMath/cansim/issues
VignetteBuilder: knitr
Language: en-CA
Config/testthat/edition: 3
Packaged: 2025-05-30 19:47:25 UTC; jens
Author: Jens von Bergmann [aut, cre], Dmitry Shkolnik [aut]
Repository: CRAN
Date/Publication: 2025-05-30 20:00:02 UTC

Retrieve series info for given table id and coordinates

Description

Retrieves vector information for given table and coordinates. This can be used to query data by vectors. Vector information is only returned for coordinates that are present in the data table, which makes this an effective way to filter coordinates. Vector information is not available for census data tables.

Usage

add_cansim_vectors_to_template(template, refresh = FALSE)

Arguments

template

A (possibly filtered) cansim table template as returned by 'get_cansim_table_template'

refresh

Refresh the data from the Statistics Canada API

Value

a tibble containing the table template with added vector information

Examples

## Not run: 
library(dplyr)

template <- get_cansim_table_template("34-10-0013")
template |>
  filter(Geography=="Canada") |>
  add_cansim_vectors_to_template()

## End(Not run)

Add provincial abbreviations as factor

Description

Add provincial abbreviations as factor

Usage

add_provincial_abbreviations(data)

Arguments

data

A tibble as returned by get_cansim with provincial level data

Value

The input tibble with additional factor GEO.abb that contains language-specific provincial abbreviations

Examples

## Not run: 
df <- get_cansim("17-10-0005")
df <- add_provincial_abbreviations(df)

## End(Not run)


Translate deprecated CANSIM table number into new NDM-format table catalogue number

Description

Returns NDM table catalogue equivalent given a standard old-format CANSIM table number

Usage

cansim_old_to_new(oldCansimTableNumber)

Arguments

oldCansimTableNumber

deprecated style CANSIM table number (e.g. "427-0001")

Value

A character string with the new-format NDM table number

Examples

## Not run: 
cansim_old_to_new("026-0018")

## End(Not run)
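
The two table number styles can be told apart by their shape alone. The following is a minimal sketch, assuming the patterns seen in the examples of this manual (old style such as "026-0018", new style such as "34-10-0013"). The helper table_number_format is hypothetical and not part of the package, and it does not perform the translation itself, which relies on the bundled correspondence file.

```r
# Hypothetical helper for illustration only; not part of the cansim package.
# The patterns are inferred from the example table numbers in this manual.
table_number_format <- function(x) {
  ifelse(
    grepl("^[0-9]{3}-[0-9]{4}$", x), "old",
    ifelse(grepl("^[0-9]{2}-[0-9]{2}-[0-9]{4}$", x), "new", "unknown")
  )
}

table_number_format(c("026-0018", "34-10-0013", "abc"))
```

A check like this could be used to decide whether a table number should be passed through cansim_old_to_new before further use.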

Repartitions a cached cansim table to a new partitioning scheme

Description

Repartitions an already downloaded and cached parquet or feather dataset

Usage

cansim_repartition_cached_table(
  cansimTableNumber,
  new_partitioning = c(),
  language = "english",
  format = "parquet",
  cache_path = getOption("cansim.cache_path")
)

Arguments

cansimTableNumber

the NDM table number to load

new_partitioning

(Optional) Partition columns to use for parquet or feather formats.

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

format

(Optional) The format of the data table to retrieve. Either "parquet", "feather", or "sqlite" (default is "parquet").

cache_path

(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'.

Examples

## Not run: 
cansim_repartition_cached_table("34-10-0013",new_partitioning=c("GeoUID"))


## End(Not run)

Use metadata to extract categories for column of specific level

Description

For tables with data with hierarchical categories, metadata containing hierarchy level descriptions is used to extract categories at a specified level of hierarchy only.

Usage

categories_for_level(
  data,
  column_name,
  level = NA,
  strict = FALSE,
  remove_duplicates = TRUE
)

Arguments

data

data table object as returned from get_cansim()

column_name

the quoted name of the column to extract categories from

level

the hierarchy level depth to which to extract categories, where 0 is top category

strict

(default FALSE) when TRUE will only extract that specific hierarchy level

remove_duplicates

(default TRUE) When set to TRUE higher level grouping categories already captured by lower level hierarchy data will be removed

Value

A vector of categories

Examples

## Not run: 
data <- get_cansim("16-10-0117")
categories_for_level(data,"North American Industry Classification System (NAICS)",level=2)

## End(Not run)
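
The effect of the level and strict arguments can be illustrated without downloading a table. The sketch below is one interpretation of the behaviour described above, not the package's implementation. It assumes that each category carries a dotted hierarchy identifier (for example "1.2.3") whose number of dots gives its depth; the function categories_sketch and the sample data are hypothetical.

```r
# Hypothetical illustration; not the implementation used by categories_for_level.
# Assumption: hierarchy ids are dotted paths, and depth = number of dots.
categories_sketch <- function(hierarchy, labels, level, strict = FALSE) {
  depth <- lengths(strsplit(hierarchy, ".", fixed = TRUE)) - 1
  if (strict) {
    # only categories sitting exactly at the requested level
    return(labels[depth == level])
  }
  # a category has children if another id starts with its id followed by "."
  has_child <- vapply(
    hierarchy,
    function(h) any(startsWith(hierarchy, paste0(h, "."))),
    logical(1)
  )
  # requested level, plus shallower categories that are not broken down further
  labels[depth == level | (depth < level & !has_child)]
}

hierarchy <- c("1", "1.2", "1.2.3", "1.4", "5")
labels <- c("All industries", "Goods", "Mining", "Services", "Unclassified")

categories_sketch(hierarchy, labels, level = 1, strict = TRUE)
categories_sketch(hierarchy, labels, level = 1)
```

In this sketch the non-strict call also keeps "Unclassified", a top level category with no finer breakdown, so that the selected categories still cover the whole table.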

Collect data from a parquet, feather or sqlite query and normalize cansim table output

Description

Collect data from a parquet, feather or sqlite query and normalize cansim table output

Usage

collect_and_normalize(
  connection,
  replacement_value = "val_norm",
  normalize_percent = TRUE,
  default_month = "07",
  default_day = "01",
  factors = TRUE,
  strip_classification_code = FALSE,
  disconnect = FALSE
)

Arguments

connection

A local arrow connection as returned by get_cansim_connection, possibly with filters or other dplyr verbs applied

replacement_value

(Optional) the name of the column the manipulated value should be returned in. Defaults to adding the 'val_norm' value field.

normalize_percent

(Optional) When true (the default) normalizes percentages by changing them to rates

default_month

The default month that should be used when creating Date objects for annual data (default set to "07")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

strip_classification_code

(Optional) Logical value indicating if classification code should be stripped from names. (Default set to FALSE).

disconnect

(Optional) Only used when format is sqlite. Logical value to indicate if the SQLite database connection should be disconnected. (Default is FALSE)

Value

A tibble with the collected and normalized data

Examples

## Not run: 
library(dplyr)

con <- get_cansim_connection("34-10-0013")
data <- con %>%
  filter(GEO=="Ontario") %>%
  collect_and_normalize()


## End(Not run)
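
What normalize_percent, default_month and default_day mean in practice can be shown on toy data. This is a minimal sketch in base R under stated assumptions, not the package's code: the helper names are hypothetical, the unit of measure is assumed to be available as a text label, and reference dates are assumed to look like "2020", "2020-03" or "2020-03-15".

```r
# Hypothetical helpers for illustration; not part of the cansim package.

# Turn percentages into rates (50 percent becomes 0.5), leave other units alone.
normalize_percent_sketch <- function(value, uom, normalize_percent = TRUE) {
  is_percent <- grepl("percent", uom, ignore.case = TRUE)
  ifelse(normalize_percent & is_percent, value / 100, value)
}

# Complete partial reference dates with a default month and day.
infer_date_sketch <- function(ref_date, default_month = "07", default_day = "01") {
  parts <- strsplit(ref_date, "-", fixed = TRUE)
  full <- vapply(parts, function(p) {
    if (length(p) == 1) p <- c(p, default_month)  # annual data: add month
    if (length(p) == 2) p <- c(p, default_day)    # monthly data: add day
    paste(p, collapse = "-")
  }, character(1))
  as.Date(full)
}

normalize_percent_sketch(c(50, 12), c("Percent", "Dollars"))
infer_date_sketch(c("2020", "2020-03", "2020-03-15"))
```

With the defaults, annual observations are placed mid-year and monthly observations on the first day of the month.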

The correspondence file for old to new StatCan table numbers is included in the package

Description

The correspondence file for old to new StatCan table numbers is included in the package

Author(s)

Statistics Canada

References

https://www.statcan.gc.ca/eng/developers-developpeurs/cansim_id-product_id-concordance.csv


create database index

Description

create database index

Usage

create_index(connection, table_name, field)

Arguments

connection

connection to database

table_name

sql table name

field

name of field to index

Value

'NULL'


convert csv to arrow

Description

convert csv to arrow

Usage

csv2arrow(
  csv_file,
  arrow_file,
  format = "parquet",
  col_names,
  value_column = "VALUE",
  partitioning = c(),
  na = c(NA, "..", "", "...", "F"),
  text_encoding = "UTF-8",
  delim = ","
)

Arguments

csv_file

input csv path

arrow_file

output arrow database path

format

format of arrow file, "parquet" or "feather" (default parquet)

col_names

column names of the csv file

value_column

name of the value column with numeric data

partitioning

optional partition columns

na

na character strings

text_encoding

encoding of csv file (default UTF-8)

delim

(Optional) csv delimiter, default is ","

Value

A database connection


convert csv to sqlite adapted from https://rdrr.io/github/coolbutuseless/csv2sqlite/src/R/csv2sqlite.R

Description

convert csv to sqlite adapted from https://rdrr.io/github/coolbutuseless/csv2sqlite/src/R/csv2sqlite.R

Usage

csv2sqlite(
  csv_file,
  sqlite_file,
  table_name,
  transform = NULL,
  chunk_size = 5e+06,
  append = FALSE,
  col_types = NULL,
  na = c(NA, "..", "", "...", "F"),
  text_encoding = "UTF-8",
  delim = ",",
  ...
)

Arguments

csv_file

input csv path

sqlite_file

output sql database path

table_name

sql table name

transform

optional function that transforms each chunk

chunk_size

optional chunk size to read/write data, default=5,000,000

append

optional parameter, append to database or overwrite, default='FALSE'

col_types

optional parameter for csv column types

na

na character strings

text_encoding

encoding of csv file (default UTF-8)

delim

(Optional) csv delimiter, default is ","

...

(Optional) additional parameters passed to 'readr::read_delim_chunked'

Value

A database connection


Disconnect from a cansim database connection

Description

Disconnect from a cansim database connection

Usage

disconnect_cansim_sqlite(connection)

Arguments

connection

connection to database

Value

'NULL'

Examples

## Not run: 
con <- get_cansim_sqlite("34-10-0013")
disconnect_cansim_sqlite(con)

## End(Not run)

Fold in metadata for selected columns

Description

Fold in metadata for selected columns

Usage

fold_in_metadata_for_columns(data, data_path, column_names)

Arguments

data

A tibble with StatCan table data as e.g. returned by get_cansim.

data_path

base path to save parsed metadata

column_names

the names of the columns

Value

A tibble including the metadata information


Retrieve a Statistics Canada data table using NDM catalogue number

Description

Retrieves a data table using an NDM catalogue number as a tidy data frame. Retrieved table data is cached for the duration of the current R session only by default.

Usage

get_cansim(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200,
  factors = TRUE,
  default_month = "07",
  default_day = "01"
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection. Set to higher values for large tables and slow network connections. (Default is 200).

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

default_month

The default month that should be used when creating Date objects for annual data (default set to "07")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

Value

A tibble with StatCan Table data and added Date column with inferred date objects and added val_norm column with normalized value from the VALUE column.

Examples

## Not run: 
get_cansim("34-10-0013")

## End(Not run)

Retrieve a list of modified tables since a given date

Description

Retrieve a list of tables that have been modified or updated since the specified date.

Usage

get_cansim_changed_tables(start_date, end_date = NULL)

Arguments

start_date

Starting date in YYYY-MM-DD format; tables that changed on or after this date are returned

end_date

Optional end date in YYYY-MM-DD format; tables that changed on or before this date are returned. Defaults to the start date

Value

A tibble with Statistics Canada data table product ids and their release times

Examples

## Not run: 
get_cansim_changed_tables("2018-08-01")

## End(Not run)

Get NDM code sets

Description

Useful to get a list of surveys or subjects and used internally

Usage

get_cansim_code_set(
  code_set = c("scalar", "frequency", "symbol", "status", "uom", "survey", "subject",
    "wdsResponseStatus"),
  refresh = FALSE
)

Arguments

code_set

the code set to retrieve.

refresh

Default is FALSE, repeated calls during the same session will hit the cached data. To refresh the code list during a running R session set to TRUE

Value

A tibble with English and French labels for the given code set

Examples

## Not run: 
get_cansim_code_set("survey")

## End(Not run)

Retrieve Statistics Canada data table categories for a specific column

Description

Returns table column details given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_column_categories(
  cansimTableNumber,
  column,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

column

the column for which to retrieve category information

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with detailed information on StatCan table categories for the specified field

Examples

## Not run: 
get_cansim_column_categories("34-10-0013", "Geography")

## End(Not run)

Retrieve Statistics Canada data table column list

Description

Returns table column details given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_column_list(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble listing the column names of the StatCan table.

Examples

## Not run: 
get_cansim_column_list("34-10-0013")

## End(Not run)

Retrieve a Statistics Canada data table using NDM catalogue number as parquet, feather, or sqlite database connection

Description

Retrieves a data table using an NDM catalogue number as parquet, feather, or SQLite database connection. Retrieved table data is cached permanently if a cache path is supplied, or otherwise for the duration of the current R session. If the table is cached, the function will check if a newer version is available and emit a warning message if the cached table is out of date.

Usage

get_cansim_connection(
  cansimTableNumber,
  language = "english",
  format = "parquet",
  partitioning = c(),
  refresh = FALSE,
  timeout = 1000,
  cache_path = getOption("cansim.cache_path")
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

format

(Optional) The format of the data table to retrieve. Either "parquet", "feather", or "sqlite" (default is "parquet").

partitioning

(Optional) Partition columns to use for parquet or feather formats.

refresh

(Optional) Valid options are FALSE (the default), TRUE, and "auto". When set to TRUE, forces a reload of the data table. When set to "auto", the table is refreshed by downloading the newest version from StatCan if the cached table is out of date. When set to FALSE and the table is out of date, a warning is emitted to alert the user that the data is outdated.

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

cache_path

(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'.

Value

A database connection to a local parquet, feather, or sqlite database with the StatCan Table data. The data frames after calling 'collect()' or 'collect_and_normalize()' are identical up to possibly different row order.

Examples

## Not run: 
library(dplyr)

con <- get_cansim_connection("34-10-0013")

# Work with the data connection
glimpse(con)


## End(Not run)
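
To make the cache survive across R sessions, the "cansim.cache_path" option can be set before requesting a connection. A minimal sketch follows; the directory name is an arbitrary choice for illustration (a folder under tempdir() is used here only so the snippet is safe to run, a permanent location would be used in practice), and the call to get_cansim_connection is left commented out because it needs the package and network access.

```r
# Illustrative cache location; pick a permanent directory for real use.
cache_dir <- file.path(tempdir(), "cansim_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# get_cansim_connection() reads this option as the default for cache_path.
options(cansim.cache_path = cache_dir)
getOption("cansim.cache_path")

# With the cansim package installed and network access available:
# con <- cansim::get_cansim_connection("34-10-0013", refresh = "auto")
```

Using refresh = "auto" asks for a re-download only when the cached copy is out of date, as described for the refresh argument above.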

Retrieve table metadata from Statistics Canada API

Description

Retrieves table metadata given an input table number or vector of table numbers using either the new or old table number format. Patience is suggested as the Statistics Canada API can be very slow. The 'list_cansim_tables()' function can be used as an alternative to retrieve a (cached) list of CANSIM tables with (more limited) metadata.

Usage

get_cansim_cube_metadata(cansimTableNumber, type = "overview", refresh = FALSE)

Arguments

cansimTableNumber

A new or old CANSIM/NDM table number or a vector of table numbers

type

Which type of metadata to get, options are "overview", "members", "notes", or "corrections".

refresh

Refresh the data from the Statistics Canada API

Value

a tibble containing the table metadata

Examples

## Not run: 
get_cansim_cube_metadata("34-10-0013")

## End(Not run)

Retrieve data for specified Statistics Canada data product for last N periods for specific coordinates

Description

Allows for the retrieval of data for a Statistics Canada data table with specific table and coordinates. This allows partial targeted download of tables and can be effectively combined with the get_cansim_table_template function to help pinpoint data series of interest. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.

Usage

get_cansim_data_for_table_coord_periods(
  tableCoordinates,
  periods = NULL,
  language = "english",
  refresh = FALSE,
  timeout = 200,
  factors = TRUE,
  default_month = "07",
  default_day = "01"
)

Arguments

tableCoordinates

Either a list with vectors of coordinates by table number, or a (filtered) data frame as returned by get_cansim_table_template.

periods

Optional numeric value for number of latest periods to retrieve data for, default is NULL in which case data for all periods is downloaded. Alternatively, this can be specified by coordinate if tableCoordinates is a data frame; this argument will be ignored if that data frame has a "periods" column.

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

default_month

The default month that should be used when creating Date objects for annual data (default set to "07")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

Value

A tibble with data matching specified coordinate and period input arguments

Examples

## Not run: 
get_cansim_data_for_table_coord_periods(list("35-10-0003"=c("1.1","1.12")),periods=3)

## End(Not run)
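
The batching mentioned in the description can be pictured as splitting the requested coordinates into groups of at most 300. The following base R sketch shows the idea only; batch_sketch is a hypothetical helper and the coordinates are made up, so this is not how the package necessarily implements it.

```r
# Hypothetical helper for illustration; not part of the cansim package.
# Splits a vector into consecutive batches of at most batch_size elements.
batch_sketch <- function(x, batch_size = 300) {
  split(x, ceiling(seq_along(x) / batch_size))
}

coords <- paste0("1.", 1:650)   # made-up coordinates
batches <- batch_sketch(coords)

length(batches)    # number of API requests that would be needed
lengths(batches)   # size of each batch
```

Each batch would then be sent as a separate request and the results combined.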

Major economic indicator release schedule

Description

Returns every release date of major economic indicators since March 14, 2012. This also includes scheduled future releases.

Usage

get_cansim_key_release_schedule()

Value

a tibble with dates and details for major economic indicator releases

Examples

## Not run: 
get_cansim_key_release_schedule()

## End(Not run)


Retrieve series info for given table id and coordinates

Description

Retrieves series information by coordinates

Usage

get_cansim_series_info_cube_coord(
  cansimTableNumber,
  coordinates,
  timeout = 1000,
  refresh = FALSE
)

Arguments

cansimTableNumber

A new or old CANSIM/NDM table number or a vector of table numbers

coordinates

A vector of coordinates

timeout

Timeout for the API call

refresh

Refresh the data from the Statistics Canada API

Value

a tibble containing the series information

Examples

## Not run: 
get_cansim_table_template("34-10-0013")

## End(Not run)

Retrieve a Statistics Canada data table using NDM catalogue number as SQLite database connection

Description

Retrieves a data table using an NDM catalogue number as an SQLite table. Retrieved table data is cached permanently if a cache path is supplied, or otherwise for the duration of the current R session. The function will check against the latest release date for the table and emit a warning message if the cached table is out of date.

Usage

get_cansim_sqlite(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  auto_refresh = FALSE,
  timeout = 1000,
  cache_path = getOption("cansim.cache_path")
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

auto_refresh

(Optional) When set to TRUE, reloads the data table if a new version is available (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

cache_path

(Optional) Path to where to cache the table permanently. By default, the data is cached in the path specified by 'getOption("cansim.cache_path")', if this is set. Otherwise it will use 'tempdir()'.

Value

A database connection to a local SQLite database with the StatCan Table data.

Examples

## Not run: 
library(dplyr)

con <- get_cansim_connection("34-10-0013", format="sqlite")

# Work with the data connection
glimpse(con)

disconnect_cansim_sqlite(con)

## End(Not run)

Retrieve Statistics Canada data table information

Description

Returns table information given an NDM table catalogue number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_table_info(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with the table overview information

Examples

## Not run: 
get_cansim_table_info("34-10-0013")

## End(Not run)

Get the latest release date for a StatCan table, if available

Description

This can be used to check when a table has last been updated.

Usage

get_cansim_table_last_release_date(cansimTableNumber)

Arguments

cansimTableNumber

the NDM table number

Value

A datetime object if a release date is available, NULL otherwise.

Examples

## Not run: 
get_cansim_table_last_release_date("34-10-0013")

## End(Not run)

Retrieve Statistics Canada data table notes and column categories

Description

Returns table notes given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_table_notes(
  cansimTableNumber,
  language = "en",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with table notes.

Examples

## Not run: 
get_cansim_table_notes("34-10-0013")

## End(Not run)

Retrieve Statistics Canada data table overview text

Description

Prints table overview information as console output. The selected CANSIM table must be loaded entirely in order to display the overview information. Overview information is printed to the console in English or French, as specified.

Usage

get_cansim_table_overview(
  cansimTableNumber,
  language = "english",
  refresh = FALSE
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

Value

none

Examples

## Not run: 
get_cansim_table_overview("34-10-0013")

## End(Not run)

Retrieve Statistics Canada data table short notes

Description

Returns table notes given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_table_short_notes(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with the StatCan Notes for the table

Examples

## Not run: 
get_cansim_table_short_notes("34-10-0013")

## End(Not run)

Retrieve Statistics Canada data table subject detail

Description

Returns table subject detail given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_table_subject(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with the table subject code and name.

Examples

## Not run: 
get_cansim_table_subject("34-10-0013")

## End(Not run)

Retrieve Statistics Canada data table survey detail

Description

Returns table survey detail given an NDM table number in English or French. Retrieved table information data is cached for the duration of the R session only.

Usage

get_cansim_table_survey(
  cansimTableNumber,
  language = "english",
  refresh = FALSE,
  timeout = 200
)

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (default set to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

Value

A tibble with the table survey code and name

Examples

## Not run: 
get_cansim_table_survey("34-10-0013")

## End(Not run)

Retrieve table template from Statistics Canada API

Description

A table template consists of the dimensions and members and coordinates of a table that can be used to explore and filter table data before downloading subsets of the table. To add vector Ids to (a possibly filtered) template the 'add_cansim_vectors_to_template' function can be used.

Usage

get_cansim_table_template(cansimTableNumber, language = "eng", refresh = FALSE)

Arguments

cansimTableNumber

A new or old CANSIM/NDM table number or a vector of table numbers

language

Language for the dimension and member names, either "eng" or "fra"

refresh

Refresh the data from the Statistics Canada API

Value

a tibble containing the table template

Examples

## Not run: 
get_cansim_table_template("34-10-0013")

## End(Not run)

Retrieve a Statistics Canada data table URL given a table number

Description

Retrieve URL of a table from the API given a table number. Offers a more stable approach than manually guessing the URL of the table.

Usage

get_cansim_table_url(cansimTableNumber, language = "en")

Arguments

cansimTableNumber

the NDM table number to load

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

Value

String object containing URL for specified table number

Examples

## Not run: 
get_cansim_table_url("34-10-0013")
get_cansim_table_url("34-10-0013", language = "fr")

## End(Not run)

Retrieve data for a Statistics Canada data vector released within a given time frame

Description

Allows for the retrieval of data for specified vector series for a given time window. Accessing data by vector allows for targeted extraction of time series. Vectors of interest can be discovered using the StatCan table web interface, or by using the get_cansim_table_template function to pinpoint data series of interest and then chaining the add_cansim_vectors_to_template function to add cansim vector information to the template data. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.

Usage

get_cansim_vector(
  vectors,
  start_time = as.Date("1800-01-01"),
  end_time = Sys.time(),
  use_ref_date = TRUE,
  language = "english",
  refresh = FALSE,
  timeout = 200,
  factors = TRUE,
  default_month = "07",
  default_day = "01"
)

Arguments

vectors

The list of vectors to retrieve

start_time

Starting date in YYYY-MM-DD format, applies to REF_DATE or releaseTime, depending on use_ref_date parameter

end_time

Set an optional end time filter in YYYY-MM-DD format (defaults to current system time)

use_ref_date

Optional, TRUE by default. When set to TRUE, uses REF_DATE of vector data to filter, otherwise it uses Statistics Canada's releaseDate value for filtering the specified vectors.

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

default_month

The default month that should be used when creating Date objects for annual data (default set to "07")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

Value

A tibble with data for vectors released between start and end time

Examples

## Not run: 
get_cansim_vector("v41690973","2015-01-01")

## End(Not run)

Retrieve data for specified Statistics Canada data vector(s) for last N periods

Description

Allows for the retrieval of data for specified vector series for the N most-recently released periods. Accessing data by vector allows for targeted extraction of time series. Vectors of interest can be discovered using the StatCan table web interface, or by using the get_cansim_table_template function to pinpoint data series of interest and then chaining the add_cansim_vectors_to_template function to add cansim vector information to the template data. The StatCan API can only process 300 coordinates at a time; if more than 300 coordinates are specified, the function will batch the requests to the API.

Usage

get_cansim_vector_for_latest_periods(
  vectors,
  periods = NULL,
  language = "english",
  refresh = FALSE,
  timeout = 200,
  factors = TRUE,
  default_month = "07",
  default_day = "01"
)

Arguments

vectors

The list of vectors to retrieve

periods

Numeric value for the number of latest periods to retrieve data for. By default all data is retrieved.

language

"en" or "english" for English and "fr" or "french" for French language versions (defaults to English)

refresh

(Optional) When set to TRUE, forces a reload of data table (default is FALSE)

timeout

(Optional) Timeout in seconds for downloading cansim table to work around scenarios where StatCan servers drop the network connection.

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

default_month

The default month that should be used when creating Date objects for annual data (default set to "07")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

Value

A tibble with data for specified vector(s) for the last N periods

Examples

## Not run: 
get_cansim_vector_for_latest_periods("v41690973",10)

## End(Not run)
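The description above mentions that requests are batched when more than 300 items are specified. The batching itself can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: batch_vectors is a hypothetical helper and the vector identifiers are made up.

```r
# Hypothetical helper (not part of cansim): split identifiers into
# consecutive batches of at most `batch_size` elements.
batch_vectors <- function(vectors, batch_size = 300) {
  if (length(vectors) == 0) return(list())
  batch_id <- ceiling(seq_along(vectors) / batch_size)
  unname(split(vectors, batch_id))
}

vectors <- paste0("v", seq_len(650))  # made-up identifiers for illustration
batches <- batch_vectors(vectors)
print(lengths(batches))
```

Each element of batches could then be sent as a separate request and the results combined by row.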

Retrieve metadata for specified Statistics Canada data vectors

Description

Allows for the retrieval of metadata for Statistics Canada data vectors

Usage

get_cansim_vector_info(vectors)

Arguments

vectors

A character vector of cansim vector identifiers

Value

A tibble with metadata for selected vectors

Examples

## Not run: 
get_cansim_vector_info("v41690973")

## End(Not run)

Get column names de-duplicated and in the correct order

Description

Get column names de-duplicated and in the correct order

Usage

get_deduped_column_level_data(cansimTableNumber, language, column)

Arguments

cansimTableNumber

The table number

language

The language of the column names

column

The column name

Value

A tibble with the column names


List cached cansim arrow and SQLite databases

Description

List cached cansim arrow and SQLite databases

Usage

list_cansim_cached_tables(
  cache_path = getOption("cansim.cache_path"),
  refresh = FALSE
)

Arguments

cache_path

Optional, default value is 'getOption("cansim.cache_path")'.

refresh

Optional, refresh the last updated date of cached cansim tables

Value

A tibble with the list of all tables that are currently cached at the given cache path.

Examples

## Not run: 
list_cansim_cached_tables()

## End(Not run)

Get overview list for all Statistics Canada data cubes

Description

Generates an overview table containing metadata of available Statistics Canada data cubes.

Usage

list_cansim_cubes(lite = FALSE, refresh = FALSE, quiet = FALSE)

Arguments

lite

Get the version without cube dimensions and comments for faster retrieval, default is FALSE.

refresh

Default is FALSE; repeated calls during the same session will hit the cached data. To refresh the cube list during a running R session set to TRUE.

quiet

Optional, suppress messages

Value

A tibble with available Statistics Canada data cubes, including NDM table number, cube title, start and end dates, archive status, subject and survey codes, frequency codes and a list of cube dimensions.

Examples

## Not run: 
list_cansim_cubes()

## End(Not run)
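The refresh argument above describes session-level caching: repeated calls reuse the first result unless a reload is forced. A minimal sketch of that pattern in base R is shown below. It is not how the package stores its cube list; make_session_cache and fake_loader are hypothetical names, and the loader stands in for a slow network request.

```r
# Hypothetical sketch (not cansim's implementation): keep the result of an
# expensive download in a closure so repeated calls in one session reuse it.
make_session_cache <- function(loader) {
  cached <- NULL
  function(refresh = FALSE) {
    if (refresh || is.null(cached)) {
      cached <<- loader()  # runs on the first call or when refresh = TRUE
    }
    cached
  }
}

load_count <- 0
fake_loader <- function() {  # stand-in for a slow network request
  load_count <<- load_count + 1
  data.frame(id = 1:3)
}

list_cubes <- make_session_cache(fake_loader)
first <- list_cubes()
second <- list_cubes()                   # served from the cache
reloaded <- list_cubes(refresh = TRUE)   # forces a reload
print(load_count)
```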


List cached cansim SQLite database

Description

List cached cansim SQLite database

Usage

list_cansim_sqlite_cached_tables(
  cache_path = getOption("cansim.cache_path"),
  refresh = FALSE
)

Arguments

cache_path

Optional, default value is 'getOption("cansim.cache_path")'.

refresh

Optional, refresh the last updated date of cached cansim tables

Value

A tibble with the list of all tables that are currently cached at the given cache path.

Examples

## Not run: 
list_cansim_cached_tables()

## End(Not run)

Get overview list for all Statistics Canada data tables (deprecated)

Description

This method is deprecated, please use 'list_cansim_cubes' instead. Generates an overview table containing metadata of available Statistics Canada data tables. A new and updated table will be generated if this table does not already exist in cached form or if the force refresh option is selected (set to FALSE by default). This can take some time as this process involves scraping through hundreds of Statistics Canada web pages to gather the required metadata. If option cansim.cache_path is set it will look for and store the overview table in that directory.

Usage

list_cansim_tables(refresh = FALSE)

Arguments

refresh

Default is FALSE. Set to TRUE to regenerate the table.

Value

A tibble with available Statistics Canada data tables, listing title, Statistics Canada data table catalogue number, deprecated CANSIM table number, description, and geography

Examples

## Not run: 
list_cansim_tables()

## End(Not run)


Normalize retrieved data table values to appropriate scales

Description

Facilitates working with Statistics Canada data table values retrieved using the package by setting all units to counts/dollars instead of millions, etc. If "replacement_value" is not set, it will replace the VALUE field with normalized values and drop the scale column. Otherwise it will keep the scale columns and create a new column named replacement_value with the normalized value. It will attempt to parse the REF_DATE field and create an R date variable. This is currently experimental.

Usage

normalize_cansim_values(
  data,
  replacement_value = "val_norm",
  normalize_percent = TRUE,
  default_month = "01",
  default_day = "01",
  factors = TRUE,
  strip_classification_code = FALSE,
  cansimTableNumber = NULL,
  internal = FALSE
)

Arguments

data

A retrieved data table as returned from get_cansim() or get_cansim_ndm()

replacement_value

(Optional) the name of the column the manipulated value should be returned in. Defaults to "val_norm"

normalize_percent

(Optional) When TRUE (the default) normalizes percentages by changing them to rates

default_month

The default month that should be used when creating Date objects for annual data (default set to "01")

default_day

The default day of the month that should be used when creating Date objects for monthly data (default set to "01")

factors

(Optional) Logical value indicating if dimensions should be converted to factors. (Default set to TRUE).

strip_classification_code

(Optional) Logical value indicating if classification code should be stripped from names. (Default set to FALSE, if factors=TRUE this is overridden and set to TRUE).

cansimTableNumber

(Optional) Only needed when operating on results of SQLite connections.

internal

(Optional) Flag to indicate that this function is called internally.

Value

Returns a tibble with adjusted values.

Examples

## Not run: 
cansim_table <- get_cansim("34-10-0013")
normalize_cansim_values(cansim_table)

## End(Not run)
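To make the idea of normalization concrete, the sketch below rescales values to plain units and converts percentages to rates on a small made-up data frame. It is an illustration only: the column names scale_exponent and is_percent, the power-of-ten encoding of the scale, and the helper normalize_values are assumptions, not the package's schema or implementation.

```r
# Hypothetical data: column names and the power-of-ten encoding are
# assumptions for illustration, not cansim's actual schema.
tbl <- data.frame(
  VALUE = c(1.5, 250, 12.5),
  scale_exponent = c(6, 3, 0),        # millions, thousands, units
  is_percent = c(FALSE, FALSE, TRUE)
)

normalize_values <- function(data, normalize_percent = TRUE) {
  val <- data$VALUE * 10^data$scale_exponent   # rescale to plain units
  if (normalize_percent) {
    val <- ifelse(data$is_percent, val / 100, val)  # percent to rate
  }
  data$val_norm <- val   # keep the original VALUE and scale columns
  data
}

normalized <- normalize_values(tbl)
print(normalized$val_norm)
```

Writing the result to a separate val_norm column mirrors the default replacement_value shown in the usage above, so the original values stay available for comparison.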

Parse metadata

Description

Parse metadata

Usage

parse_metadata(meta, data_path)

Arguments

meta

the raw metadata table

data_path

base path to save parsed metadata


Remove cached cansim SQLite and parquet database

Description

Remove cached cansim SQLite and parquet database

Usage

remove_cansim_cached_tables(
  cansimTableNumber,
  format = c("parquet", "feather", "sqlite"),
  language = NULL,
  cache_path = getOption("cansim.cache_path")
)

Arguments

cansimTableNumber

Vector of the table(s) to be removed, or a (filtered) table as returned by 'list_cansim_cached_tables' with the list of tables to be removed.

format

Format of cache to remove, possible values are '"parquet"', '"feather"' or '"sqlite"' or a subset of these (the default is all of these)

language

Language for which to remove the cached data. If unspecified ('NULL') tables for all languages will be removed.

cache_path

Optional, default value is 'getOption("cansim.cache_path")'

Value

'NULL'

Examples

## Not run: 
con <- get_cansim_connection("34-10-0013", format="parquet")
remove_cansim_cached_tables("34-10-0013", format="parquet")

## End(Not run)

Remove cached cansim SQLite database

Description

Remove cached cansim SQLite database

Usage

remove_cansim_sqlite_cached_table(
  cansimTableNumber,
  language = NULL,
  cache_path = getOption("cansim.cache_path")
)

Arguments

cansimTableNumber

Number of the table to be removed

language

Language for which to remove the cached data. If unspecified ('NULL') tables for all languages will be removed

cache_path

Optional, default value is 'getOption("cansim.cache_path")'

Value

'NULL'

Examples

## Not run: 
con <- get_cansim_connection("34-10-0013", format="sqlite")
disconnect_cansim_sqlite(con)
remove_cansim_cached_tables("34-10-0013", format="sqlite")

## End(Not run)

Search through Statistics Canada data cubes

Description

Searches through Statistics Canada data cubes using a search term.

Usage

search_cansim_cubes(search_term, refresh = FALSE)

Arguments

search_term

User-supplied search term used to find Statistics Canada data cubes with matching titles, table numbers, subject and survey codes.

refresh

Default is FALSE. The underlying cube list is cached for the duration of the R session. Set to TRUE to regenerate the cube list.

Value

A tibble with available Statistics Canada data cubes, listing title, Statistics Canada data cube catalogue number, deprecated CANSIM table number, survey and subject.

Examples

## Not run: 
search_cansim_cubes("Labour force")

## End(Not run)


Search through Statistics Canada data tables (deprecated)

Description

This method is deprecated, please use 'search_cansim_cubes' instead. Searches through Statistics Canada data tables using a search term. The overview table is generated if it does not already exist or if the refresh option is set to TRUE. Search terms are case insensitive and accept regular expressions for more advanced searching. The search covers table titles, keywords, or both, depending on the search_fields parameter. If refresh = TRUE, the table will be updated and regenerated using Statistics Canada's latest data. This can take some time since this process involves scraping through several hundred web pages to gather the required metadata. If option cache_path is set it will look for and store the overview table in that directory.

Usage

search_cansim_tables(search_term, search_fields = "both", refresh = FALSE)

Arguments

search_term

User-supplied search term used to find Statistics Canada data tables with matching titles

search_fields

By default, this function will search through table titles and keywords. Setting this parameter to "title" will only search through the title, setting it to "keyword" will only search through keywords

refresh

Default is FALSE. Set to TRUE to regenerate the table.

Value

A tibble with available Statistics Canada data tables, listing title, Statistics Canada data table catalogue number, deprecated CANSIM table number, description and geography that match the search term.

Examples

## Not run: 
search_cansim_tables("Labour force")

## End(Not run)
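The matching behaviour described above (case-insensitive search terms that may be regular expressions, restricted by search_fields) can be sketched in base R as follows. This is a self-contained illustration rather than the package's code: the overview data frame, its column names, and the helper search_tables are all made up.

```r
# Hypothetical overview table; titles and keywords are made up for illustration.
tables <- data.frame(
  title = c("Labour force characteristics by province",
            "Consumer Price Index, monthly",
            "Building permits by type of structure"),
  keywords = c("employment, unemployment", "prices, inflation", "construction, labour"),
  stringsAsFactors = FALSE
)

search_tables <- function(tables, search_term, search_fields = "both") {
  # Case-insensitive regular expression match against each field
  in_title <- grepl(search_term, tables$title, ignore.case = TRUE)
  in_keyword <- grepl(search_term, tables$keywords, ignore.case = TRUE)
  keep <- switch(search_fields,
    title = in_title,
    keyword = in_keyword,
    both = in_title | in_keyword,
    stop("search_fields must be 'title', 'keyword' or 'both'")
  )
  tables[keep, , drop = FALSE]
}

hits_both <- search_tables(tables, "labour")
hits_title <- search_tables(tables, "^labour|price", search_fields = "title")
print(hits_both$title)
print(hits_title$title)
```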


View CANSIM table or vector information in browser

Description

Opens CANSIM table or vector on Statistics Canada's website using default browser. This may be useful for getting further info on CANSIM table and survey methods.

Usage

view_cansim_webpage(cansimTableNumber = NULL)

Arguments

cansimTableNumber

CANSIM or NDM table number or cansim vectors with "v" prefix. If no number is provided, the vector search page on the Statistics Canada website will be opened.

Value

none

Examples

## Not run: 
view_cansim_webpage("34-10-0013")

## End(Not run)