Title: | South Africa Macroeconomic Database API |
Version: | 0.3.0 |
Description: | An R API providing access to a relational database with macroeconomic time series data for South Africa, obtained from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a weekly basis via the EconData https://www.econdata.co.za/ platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University. |
BugReports: | https://github.com/Stellenbosch-Econometrics/SAMADB-Issues/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
Imports: | DBI, RMySQL, writexl, data.table, collapse (≥ 2.0.0) |
RoxygenNote: | 7.2.3 |
Depends: | R (≥ 3.3.0) |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-05-24 15:59:22 UTC; sebastiankrantz |
Author: | Sebastian Krantz [aut, cre] |
Maintainer: | Sebastian Krantz <sebastian.krantz@graduateinstitute.ch> |
Repository: | CRAN |
Date/Publication: | 2024-05-24 18:00:08 UTC |
South Africa Macroeconomic Database API
Description
An R API providing access to a relational database with public macroeconomic data for South Africa, obtained from from the South African Reserve Bank (SARB) and Statistics South Africa (STATSSA), and updated on a regular basis via the EconData (https://www.econdata.co.za/) platform and automated scraping of the SARB and STATSSA websites. The database is maintained at the Department of Economics at Stellenbosch University.
Functions
Functions and data providing information about the available data
sm_datasources()
sm_datasets()
sm_series()
Function to retrieve the data from the database
Functions to reshape data and add temporal identifiers
sm_pivot_wider()
sm_pivot_longer()
sm_expand_date()
Function to export wide format data to Excel
Helper functions to convert inputs to R dates and transpose the data
Global Macros with core ID variables in the database
Global Identifier Macros
Description
The macro .SAMADB_ID
contains the string c("dsid", "series")
denoting
the names of ID variables that identify the cross-sectional dimension in the database. All series codes are unique across datasets.
The macro .SAMADB_T
contains the string
c("date", "year", "quarter", "month", "day")
denoting temporal identifiers generated by sm_expand_date
.
The "date"
variable is sufficient to uniquely identify a point in time in the database.
Each value in the database is uniquely identified by dsid, series and date.
Usage
.SAMADB_ID
.SAMADB_T
See Also
Examples
.SAMADB_ID
.SAMADB_T
Coerce Vectors to Dates
Description
This function coerces date strings i.e. "YYYY-MM-DD"
or "YYYY-MM"
, years e.g. 2015
(numeric or character),
year-quarters e.g. "2015Q1"
or "2015-Q1"
, year-months e.g. "2015M01"
or "2015-M01"
or numeric values representing dates (e.g. previously imported Excel date) to a regular R date.
Usage
sm_as_date(x, end = FALSE, origin = "1899-12-30")
Arguments
x |
a character date string |
end |
logical. |
origin |
a date or date-string that can be used as reference for converting numeric values to dates. The default corresponds to dates generated in Excel for Windows. See |
Value
A Date
vector.
See Also
Examples
sm_as_date("2011-05")
sm_as_date(2011)
sm_as_date("2011Q1")
sm_as_date("2011Q1", end = TRUE)
sm_as_date("2011M2")
sm_as_date("2011M2", end = TRUE)
Retrieve Data from the Database
Description
This is the main function of the package to retrieve data from the database.
Usage
sm_data(
dsid = NULL,
series = NULL,
from = NULL,
to = NULL,
freq = NULL,
labels = TRUE,
wide = TRUE,
expand.date = FALSE,
ordered = TRUE,
return.query = FALSE,
...
)
Arguments
dsid |
character. (Optional) id's of datasets matching the 'dsid' column of the 'DATASET' table (retrieved using |
series |
character. (Optional) codes of series matching the 'series' column of the 'SERIES' table (retrieved using |
from |
set the start time of the data retrieved by either supplying a start date, a date-string of the form |
to |
same as |
freq |
character. Return only series at a certain frequency. Allowed are values |
labels |
logical. |
wide |
logical. |
expand.date |
logical. |
ordered |
logical. |
return.query |
logical. |
... |
further arguments passed to |
Details
Series from datasets at different frequencies can be queried, but, if wide = TRUE
, this will result in missing values in the lower frequency series.
Value
A data.table
with the result of the query.
See Also
sm_pivot_wider
, sm_expand_date
, samadb
Examples
# Return all electricity indicators from 2000
sm_data("ELECTRICITY", from = 2000)
Retrieve Datasets Table
Description
This function pulls and return a table called 'DATASET' from the database.
Usage
sm_datasets(ordered = TRUE)
Arguments
ordered |
logical. |
Details
The 'DATASET' table gives information about the different datasets fetched from different providers at regular intervals. It provides a unique id for each dataset, the frequency of data, the number of records (datapoints) in each dataset, the minimum and maximum time coverage, when the dataset was last updated, and information about the data source, provider, and method of data access.
Value
A data.table
with information about the available datasets in the database.
See Also
sm_datasources
, sm_series
, samadb
Examples
sm_datasets()
Retrieve Data Sources Table
Description
This function pulls and returns a table called 'DATASOURCE' from the database.
Usage
sm_datasources(ordered = TRUE)
Arguments
ordered |
logical. |
Details
The 'DATASOURCE' table gives information about the sources of data in this database, including the source website, and the number of datasets available from the source.
Value
A data.table
with information about the sources of data in the database.
See Also
sm_datasets
, sm_series
, samadb
Examples
sm_datasources()
Generate Temporal Identifiers from a Date Column
Description
This function expands a date column and generates additional temporal identifiers from it (year, month, quarter, day).
Usage
sm_expand_date(
x,
gen = c("year", "quarter", "month"),
origin = "1899-12-30",
keep.date = TRUE,
remove.missing.date = TRUE,
sort = TRUE,
as.factor = TRUE,
name = "date",
...
)
Arguments
x |
either a vector of class 'Date', or coercible to date using |
gen |
character. A vector of identifiers to generate from |
origin |
character / Date. Passed to |
keep.date |
logical. |
remove.missing.date |
logical. |
sort |
logical. |
as.factor |
|
name |
character. The name of the date variable to expand. |
... |
not used. |
Value
A data.table
containing the computed identifiers as columns. See Examples.
See Also
Examples
# First a basic example
x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month")
sm_expand_date(x)
sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE)
# Now using the API
sm_expand_date(sm_data("BUSINESS_CYCLES"))
# Same thing
sm_data("BUSINESS_CYCLES", expand.date = TRUE)
Reshape Column-Based Data to Long Format
Description
This function automatically reshapes wide (column-based) data into a long format akin to the format of the raw data coming from the database (sm_data(..., wide = FALSE)
).
Internally it uses melt
from data.table.
Usage
sm_pivot_longer(
data,
id_cols = intersect(.SAMADB_T, names(data)),
to_value = setdiff(names(data), id_cols),
variable_name = "series",
value_name = "value",
label_name = "label",
na.rm = TRUE,
variable.factor = TRUE,
label.factor = TRUE,
...
)
Arguments
data |
a wide format data frame where all series have their own column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
to_value |
character. The names of all series to be stacked into the long format data frame. |
variable_name |
character. The name of the variable to store the names of the series. |
value_name |
character. The name of the variable to store the data values. |
label_name |
character. The name of the variable to store the series labels. |
na.rm |
logical. |
variable.factor , label.factor |
logical. |
... |
further arguments passed to |
Value
A data.table
with the reshaped data.
See Also
Examples
# Return all electricity indicators from the year 2000 onwards
data <- sm_data("ELECTRICITY", from = 2000)
sm_pivot_longer(data)
Reshape Long API Data to Column-Based Format
Description
This function automatically reshapes long (stacked) raw data from the API (sm_data(..., wide = FALSE)
) to a wide format where each variable has its own column.
Internally it uses pivot
from collapse.
Usage
sm_pivot_wider(
data,
id_cols = intersect(.SAMADB_T, names(data)),
names_from = "series",
values_from = "value",
labels_from = if (any(names(data) == "label")) "label" else NULL,
expand.date = FALSE,
...
)
Arguments
data |
raw data from the API: A long format data frame where all values are stacked in a value column. |
id_cols |
character. Temporal identifiers of the data. By default all variables in |
names_from |
character. The column containing the series codes. These will become the names of the new columns in the wider data format. |
values_from |
character. The column containing the data values. |
labels_from |
character. The column containing the labels describing the series. |
expand.date |
logical. |
... |
further arguments passed to |
Value
A data.table
with the reshaped data.
See Also
Examples
# Return all electricity indicators from the year 2000 onwards
sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))
Retrieve Series Table
Description
This function pulls the 'SERIES' table from the database, providing information about the time series in the database. Each series is given a code which unique across datasets.
Usage
sm_series(
dsid = NULL,
series = NULL,
dataset.info = FALSE,
ordered = TRUE,
return.query = FALSE
)
Arguments
dsid |
character. (Optional) id's of datasources matching the 'dsid' column of the 'DATASET' table (retrieved using |
series |
character. (Optional) codes of series for which information in to be returned. If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'. |
dataset.info |
logical. |
ordered |
logical. |
return.query |
logical. |
Details
Each series is given a code which is unique across datasets. Each series also has a label describing the series. Further information recorded are the series frequency, unit, whether it was seasonally adjusted, number of observations, minimum and maximum date, and (optionally) topic, alternative code provided by the data source (data retrieved from EconData uses EconData codes as series codes, so the 'src_code' field gives the codes used by the SARB or STATSSA), or further comments on the series.
Value
A data.table
with information about the available series in the database.
See Also
sm_datasources
, sm_datasets
, sm_data
, samadb
Examples
# By default returns all series
sm_series()
# Adding information about the dataset and provider
sm_series(dataset.info = TRUE)
# Only series in the QB
sm_series("QB")
Transpose a Wide Dataset to a Row-Based Format
Description
This function is called by sm_write_excel
with option transpose = TRUE
to generate a row-based tabular data format from a wide data frame in R that is suitable for exporting to Excel.
Usage
sm_transpose(data, date.format = "%d/%m/%Y")
Arguments
data |
a wide format data frame where each column is a variable and the first variable uniquely identifies the data. |
date.format |
a format for date columns which is passed to |
Value
A transposed data frame or data.table
(the class of the input is preserved).
See Also
transpose
, sm_pivot_wider
, sm_write_excel
, samadb
Examples
sm_transpose(sm_data("ELECTRICITY"))
Export Wide Data to Excel
Description
This function exports a wide format dataset to a column- (default) or row-oriented Excel format.
Usage
sm_write_excel(
data,
...,
transpose = FALSE,
transpose.date.format = "%d/%m/%Y"
)
Arguments
data |
a wide dataset from |
... |
further arguments to |
transpose |
logical. If |
transpose.date.format |
argument passed to |
Value
Writes an Excel file to the specified path (no return value).
See Also
sm_transpose
, write_xlsx
, samadb
Examples
## Not run:
# Getting electricity indicators from 2000
data <- sm_data("ELECTRICITY", from = 2000)
# Saving to different Excel formats
sm_write_excel(data, "ELECTRICITY.xlsx")
sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE)
# Saving to alternative path
sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx")
## End(Not run)