Title: | Use Data from the Czech Public Finance Database |
Version: | 0.7.4 |
Description: | Get programmatic access to data from the Czech public budgeting and accounting database, Státní pokladna https://monitor.statnipokladna.cz/. |
License: | MIT + file LICENSE |
URL: | https://github.com/petrbouchal/statnipokladna, https://petrbouchal.xyz/statnipokladna/ |
BugReports: | https://github.com/petrbouchal/statnipokladna/issues |
Depends: | R (≥ 3.5.0) |
Imports: | cli, curl (≥ 4.3), dplyr (≥ 1.0.0), httr (≥ 1.4.1), jsonlite, lifecycle (≥ 1.0.1), lubridate (≥ 1.7.4), magrittr, purrr (≥ 0.3.2), readr (≥ 1.3.1), rlang (≥ 0.4.0), stringi (≥ 1.4.3), stringr (≥ 1.4.0), tibble (≥ 2.1.3), tidyr (≥ 1.0.0), utils (≥ 3.6.0), xml2 (≥ 1.2.2) |
Suggests: | knitr (≥ 1.30), markdown (≥ 1.1), ragg (≥ 0.4.0), rmarkdown, testthat (≥ 2.1.0), tidyselect (≥ 1.2.0) |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
Language: | en |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2024-08-21 09:22:02 UTC; petr |
Author: | Petr Bouchal |
Maintainer: | Petr Bouchal <pbouchal@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-08-21 11:00:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Deprecated: Add codelist data to downloaded data
Description
Deprecated, use sp_add_codelist() instead.
Usage
add_codelist(
data,
codelist = NULL,
period_column = .data$vykaz_date,
redownload = FALSE,
dest_dir = NULL
)
Arguments
data |
a data frame returned by |
codelist |
The codelist to add. Either a character vector of length one (see |
period_column |
Unquoted column name of column identifying the data period in |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
Details
Value
A tibble of the same length as data, with added columns from codelist. See Details.
See Also
Other Core workflow: get_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
Deprecated: Get codelist
Description
Deprecated: use sp_get_codelist()
Usage
get_codelist(codelist_id, n = NULL, dest_dir = NULL, redownload = FALSE)
Arguments
codelist_id |
A codelist ID. See |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Details
Value
A tibble
See Also
Other Core workflow: add_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
Deprecated: Retrieve and read dataset from statnipokladna
Description
Deprecated, use sp_get_dataset() instead.
Usage
get_dataset(dataset_id, year, month = 12, dest_dir = NULL, redownload = FALSE)
Arguments
dataset_id |
A dataset ID. See |
year |
year, numeric vector of length >= 1 (can take multiple values), 2015-2019 for some datasets, 2010-2020 for others. (See Details for how to work with data across time periods.) |
month |
month, numeric vector of length >= 1 (can take multiple values). Must be between 1 and 12. Defaults to 12. (See Details for how to work with data across time periods.) |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Details
Value
character vector with complete paths to the downloaded ZIP archives, as for sp_get_dataset().
Deprecated: Get dataset documentation
Description
Deprecated, use sp_get_dataset_doc() instead.
Usage
get_dataset_doc(dataset_id, dest_dir = ".", download = TRUE)
Arguments
dataset_id |
dataset ID. See |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
download |
Whether to download (the default) or open link in browser. |
Details
Value
a tibble
Deprecated: Get a statnipokladna table
Description
Deprecated, use sp_get_table() instead.
Usage
get_table(
table_id,
year,
month = 12,
ico = NULL,
redownload = FALSE,
dest_dir = NULL
)
Arguments
table_id |
A table ID. See |
year |
year, numeric, 2015-2019 for some datasets, 2010-2020 for others. Can be a vector of length > 1 (see Details for how to work with data across time periods). |
month |
month, numeric. Must be 3, 6, 9 or 12. Can be a vector of length > 1 (see Details). |
ico |
ID(s) of org to return, character of length one or more. If unset, returns all orgs. ID not checked for correctness/existence. See https://monitor.statnipokladna.cz/datovy-katalog/prohlizec-ciselniku/ucjed to look up ID of any org in the dataset. |
redownload |
Redownload even if recent file present? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
Value
a tibble
Add codelist data to downloaded data
Description
Joins a provided codelist, or downloads and processes one if necessary, and adds it to the data.
Usage
sp_add_codelist(
data,
codelist = NULL,
period_column = .data$vykaz_date,
by = NULL,
redownload = FALSE,
dest_dir = NULL
)
Arguments
data |
a data frame returned by |
codelist |
The codelist to add. Either a character vector of length one (see |
period_column |
Unquoted column name of column identifying the data period in |
by |
character. Columns by which to join the codelist. Same form as for |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
Details
The data argument should be a data frame produced by sp_get_table(). If it is, the period_column argument is not needed.
The codelist argument, if a data frame, should be a data frame produced by sp_get_codelist(). Specifically, it is assumed to contain the following columns:
start_date, a date
end_date, a date
a character column with the code, usually named the same as the codelist
You can usually tell which codelist you need from the name of the column whose code you are looking to expand, e.g. the codes in column paragraf can be expanded by codelist paragraf.
The function filters the codelist to the entries valid in the time period of data.
If data contains tables for multiple periods, each row is matched only with the codelist entries valid for its own period.
Codelist-originating columns in the resulting data frame are renamed so that they do not interfere with joining further codelists, e.g. in a single pipe chain.
Note that some codelists are "secondary" and can only be joined onto other codelists.
If a codelist does not join using sp_add_codelist(), store the output of sp_get_codelist() and join it manually using dplyr.
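For readers who need to join a codelist manually, here is a minimal base-R sketch of the period-aware join described above. The toy data, the nazev column, the helper add_codelist_sketch() and its prefix-renaming scheme are all hypothetical; the package's own implementation and output column names may differ. dplyr::left_join() followed by dplyr::filter() would express the same idea.

```r
# Toy stand-in for sp_get_table() output (hypothetical values)
data <- data.frame(
  vykaz_date = as.Date(c("2017-12-31", "2018-12-31")),
  paragraf   = c("6111", "6111"),
  value      = c(100, 120),
  stringsAsFactors = FALSE
)

# Toy stand-in for sp_get_codelist() output: one code, renamed on 2018-01-01
codelist <- data.frame(
  paragraf   = c("6111", "6111"),
  nazev      = c("Old name", "New name"),
  start_date = as.Date(c("2000-01-01", "2018-01-01")),
  end_date   = as.Date(c("2017-12-31", "9999-12-31")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-join on the code, keep only codelist entries valid
# at each row's period, then prefix codelist columns with the code column name
# so that further codelists can be joined without name clashes.
# Assumes each code has at most one entry valid per period.
add_codelist_sketch <- function(data, codelist, code_col, period_col = "vykaz_date") {
  joined <- merge(data, codelist, by = code_col, all.x = TRUE, sort = FALSE)
  period <- joined[[period_col]]
  # Rows whose code is missing from the codelist (NA dates) are kept as-is
  valid <- is.na(joined$start_date) |
    (period >= joined$start_date & period <= joined$end_date)
  joined <- joined[valid, , drop = FALSE]
  is_extra <- names(joined) %in% setdiff(names(codelist), code_col)
  names(joined)[is_extra] <- paste(code_col, names(joined)[is_extra], sep = "_")
  joined[order(joined[[period_col]]), , drop = FALSE]
}

result <- add_codelist_sketch(data, codelist, code_col = "paragraf")
print(result[, c("vykaz_date", "paragraf", "paragraf_nazev", "value")])
```

In the normal workflow sp_add_codelist() takes care of this step; the sketch only makes the date filtering visible.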
Value
A tibble of the same length as data, with added columns from codelist. See Details.
See Also
Other Core workflow: add_codelist(), get_codelist(), sp_get_codelist(), sp_get_dataset(), sp_get_table()
Examples
## Not run:
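# Add codelists by ID; each is downloaded and filtered to the data's periods
sp_get_table("budget-central", 2017) %>%
  sp_add_codelist("polozka") %>%
  sp_add_codelist("paragraf")
# Alternatively, download the codelists once and reuse them
pol <- sp_get_codelist("polozka")
par <- sp_get_codelist("paragraf")
sp_get_table("budget-central", 2017) %>%
  sp_add_codelist(pol) %>%
  sp_add_codelist(par)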
## End(Not run)
List of available codelists
Description
Contains IDs and names of most available codelists that can be retrieved by sp_get_codelist().
Usage
sp_codelists
Format
A data frame with 27 rows and 2 variables:
id
character. ID, used as the codelist_id argument in sp_get_codelist().
name
character. Short name, mostly corresponds to the title used on statnipokladna.cz.
Details
The id is to be used as the codelist_id parameter in sp_get_codelist().
See https://monitor.statnipokladna.cz/datovy-katalog/ciselniky for more detailed descriptions and a GUI for exploring the lists.
See Also
Other Lists of available entities: sp_datasets, sp_tables
List of available datasets
Description
Contains IDs and names of all available datasets that can be retrieved by sp_get_dataset().
Usage
sp_datasets
Format
A data frame with 9 rows and 3 variables:
id
character. Dataset ID, used as the dataset_id argument to sp_get_dataset().
name
character. Dataset name, mostly corresponds to the title in the statnipokladna GUI.
Details
See https://monitor.statnipokladna.cz/datovy-katalog/transakcni-data for more detailed descriptions of the datasets.
See Also
Other Lists of available entities: sp_codelists, sp_tables
Get codelist
Description
Downloads and processes the codelist identified by codelist_id. See sp_codelists for a list of available codelists with their IDs and names.
Usage
sp_get_codelist(codelist_id, n = NULL, dest_dir = NULL, redownload = FALSE)
Arguments
codelist_id |
A codelist ID. See |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Details
You can usually tell which codelist you need from the name of the column whose code you are looking to expand, e.g. the codes in column paragraf can be expanded by codelist paragraf.
The processing ensures that the resulting codelist can be correctly joined to the data, either automatically using sp_add_codelist() or manually.
The entire codelist is downloaded and not filtered for any particular date.
Codelist XML files are stored in a temporary directory as determined by tempdir() and persist per session to avoid redownloads.
Value
a tibble
See Also
Other Core workflow: add_codelist(), get_codelist(), sp_add_codelist(), sp_get_dataset(), sp_get_table()
Examples
## Not run:
sp_get_codelist("paragraf")
## End(Not run)
Download a codelist XML file
Description
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
Usage
sp_get_codelist_file(
codelist_id = NULL,
url = NULL,
dest_dir = NULL,
redownload = FALSE
)
Arguments
codelist_id |
A codelist ID. See |
url |
URL of the codelist XML file, e.g. as returned by sp_get_codelist_url(). Either this or |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Value
path to XML file; character vector of length one.
See Also
Other Detailed workflow: sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
Examples
## Not run:
sp_get_codelist_file("druhuj")
codelist_url <- sp_get_codelist_url("druhuj")
sp_get_codelist_file(url = codelist_url)
## End(Not run)
Get URL of a given codelist
Description
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
Usage
sp_get_codelist_url(codelist_id, check_if_exists = TRUE)
Arguments
codelist_id |
A codelist ID. See sp_codelists for a list of available codelists. |
check_if_exists |
Whether to check that the URL works (HTTP 200). |
Value
character vector of length one containing URL
See Also
Other Detailed workflow: sp_get_codelist_file(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
Examples
## Not run:
sp_get_codelist_url("ucjed", FALSE)
if(FALSE) sp_get_codelist_url("ucjed_wrong", TRUE) # fails, invalid codelist
## End(Not run)
Get/open URL of codelist viewer
Description
Returns a URL for the online codelist browser on monitor.statnipokladna.cz and opens it in the browser if open = TRUE.
Usage
sp_get_codelist_viewer(codelist_id, open = TRUE)
Arguments
codelist_id |
A codelist ID. See |
open |
Whether to open URL in browser. Defaults to TRUE. |
Value
a URL, character vector of length one.
See Also
Other Utilities:
sp_get_dataset_doc()
Retrieve dataset from statnipokladna
Description
Downloads ZIP archives for a given dataset. If year or month have length > 1, gets all combinations.
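Because all combinations are fetched, the number of archives is length(year) times length(month). A small base-R illustration of which periods a multi-value call covers (nothing is downloaded; the variable names are illustrative only):

```r
# Every (year, month) pair is requested, e.g. for a call such as
# sp_get_dataset("finm", 2017:2018, c(6, 12))
years  <- 2017:2018
months <- c(6, 12)
periods <- expand.grid(year = years, month = months)
periods <- periods[order(periods$year, periods$month), ]
print(periods)  # 4 combinations: 2017-06, 2017-12, 2018-06, 2018-12
```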
Usage
sp_get_dataset(
dataset_id,
year,
month = 12,
dest_dir = NULL,
redownload = FALSE
)
Arguments
dataset_id |
A dataset ID. See |
year |
year, numeric vector of length >= 1 (can take multiple values), 2015-2019 for some datasets, 2010-2020 for others. (See Details for how to work with data across time periods.) |
month |
month, numeric vector of length >= 1 (can take multiple values). Must be between 1 and 12. Defaults to 12. (See Details for how to work with data across time periods.) |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
redownload |
Redownload even if file has already been downloaded? Defaults to FALSE. |
Details
Files are stored in a temporary folder as determined by tempdir(), or in the folder given by the dest_dir parameter or the statnipokladna.dest_dir option, and are further sorted into subdirectories by dataset, year and month. If saved to tempdir() (the default), downloaded files persist per session to avoid redownloads.
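A rough sketch of how a download folder could be resolved from these rules. The precedence (the dest_dir argument first, then the statnipokladna.dest_dir option, then tempdir()) and the exact subdirectory naming are assumptions; dataset_dir_sketch() is a hypothetical helper, not a package function.

```r
# Hypothetical helper; precedence and folder layout are assumptions
dataset_dir_sketch <- function(dataset_id, year, month, dest_dir = NULL) {
  if (is.null(dest_dir)) {
    # Fall back to the session-wide option, then to the session temp dir
    dest_dir <- getOption("statnipokladna.dest_dir", default = tempdir())
  }
  file.path(dest_dir, dataset_id, year, sprintf("%02d", as.integer(month)))
}

# Nothing set: files would land under the per-session tempdir()
p_default <- dataset_dir_sketch("finm", 2018, 6)

# With the option set, e.g. to keep downloads between sessions
old <- options(statnipokladna.dest_dir = "sp_data")
p_option <- dataset_dir_sketch("finm", 2018, 6)
options(old)  # restore the previous option value

# An explicit dest_dir argument overrides both
p_arg <- dataset_dir_sketch("finm", 2018, 6, dest_dir = "my_dir")
```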
How data for different time periods is exported differs by dataset.
This has significant implications for how you get to usable full-year numbers or time series in different tables.
See vignette("statnipokladna") for details on this.
Value
character vector with complete paths to the downloaded ZIP archives.
See Also
Other Core workflow: add_codelist(), get_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_table()
Examples
## Not run:
budget_2018 <- sp_get_dataset("finm", 2018)
budget_mid2018 <- sp_get_dataset("finm", 2018, 6)
## End(Not run)
Get dataset documentation
Description
Downloads the XLS file with dataset documentation, or opens a link to this file in the browser.
Usage
sp_get_dataset_doc(dataset_id, dest_dir = NULL, download = TRUE)
Arguments
dataset_id |
dataset ID. See |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
download |
Whether to download (the default) or open link in browser. |
Value
(invisible) path to file if download = TRUE, URL otherwise
See Also
Other Utilities:
sp_get_codelist_viewer()
Examples
## Not run:
sp_get_dataset_doc("finm")
## End(Not run)
Get URL of dataset
Description
Useful for workflows where you want to keep track of URLs and intermediate files, rather than having all steps performed by one function.
Usage
sp_get_dataset_url(dataset_id, year, month = 12, check_if_exists = TRUE)
Arguments
dataset_id |
Dataset ID. See |
year |
year, numeric vector of length >= 1 (can take multiple values), 2015-2019 for some datasets, 2010-2020 for others. (See Details for how to work with data across time periods.) |
month |
month, numeric vector of length >= 1 (can take multiple values). Must be between 1 and 12. Defaults to 12. (See Details for how to work with data across time periods.) |
check_if_exists |
Whether to check that the URL works (HTTP 200). |
Value
a character vector of length one, containing a URL
See Also
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_table_file(), sp_load_codelist(), sp_load_table()
Examples
## Not run:
sp_get_dataset_url("finm", 2018, 6, FALSE)
sp_get_dataset_url("finm", 2029, 6, FALSE) # works but returns invalid URL
if(FALSE) sp_get_dataset_url("finm_wrong", 2018, 6, TRUE) # fails, invalid dataset ID
if(FALSE) sp_get_dataset_url("finm", 2022, 6, TRUE) # fails, invalid time period
## End(Not run)
Get a statnipokladna table
Description
Cleans and loads a table. If needed, a dataset containing the table is downloaded.
Usage
sp_get_table(
table_id,
year,
month = 12,
ico = NULL,
redownload = FALSE,
dest_dir = NULL
)
Arguments
table_id |
A table ID. See |
year |
year, numeric, 2015-2019 for some datasets, 2010-2020 for others. Can be a vector of length > 1 (see Details for how to work with data across time periods). |
month |
month, numeric. Must be 3, 6, 9 or 12. Can be a vector of length > 1 (see Details). |
ico |
ID(s) of org to return, character of length one or more. If unset, returns all orgs. ID not checked for correctness/existence. See https://monitor.statnipokladna.cz/datovy-katalog/prohlizec-ciselniku/ucjed to look up ID of any org in the dataset. |
redownload |
Redownload even if recent file present? Defaults to FALSE. |
dest_dir |
character. Directory in which downloaded files will be stored.
If left unset, will use the |
Details
The data is loaded from files downloaded automatically by sp_get_dataset(); files persist in a temporary directory per session.
How data for different time periods is exported differs by dataset.
This has significant implications for how you get to usable full-year numbers or time series in different tables.
See vignette("statnipokladna") for details on this.
Data is processed in the following way:
all columns are given human-readable names that make it easier to add codelists
ICO (org. IDs) are normalised, as in some datasets they are padded with leading zeros
vykaz_date, vykaz_year and vykaz_month columns are created to identify the time period
value columns are converted to numeric
other columns are left as character to avoid losing information
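The processing steps above can be illustrated in base R on a few made-up rows. The raw column names (ico, period, value), the YYYYMM period encoding, padding ICO to eight digits and dating each period to the first day of the month are all assumptions for illustration, not the package's actual code.

```r
# Made-up raw rows; real CSVs use the original column names listed below
raw <- data.frame(
  ico    = c("64581", "00064581", "47114983"),
  period = c("201806", "201812", "201812"),
  value  = c("1500.50", "2000", "-35"),
  stringsAsFactors = FALSE
)

processed <- raw

# Normalise ICO by left-padding with zeros to 8 characters, kept as character
processed$ico <- paste0(strrep("0", pmax(0, 8 - nchar(processed$ico))), processed$ico)

# Derive the period columns (first day of the month is an arbitrary choice here)
processed$vykaz_year  <- as.integer(substr(processed$period, 1, 4))
processed$vykaz_month <- as.integer(substr(processed$period, 5, 6))
processed$vykaz_date  <- as.Date(sprintf("%d-%02d-01", processed$vykaz_year, processed$vykaz_month))

# Convert value columns to numeric; other columns stay character
processed$value <- as.numeric(processed$value)

# Filtering by organisation (what the ico argument is for) is then a plain match
one_org <- processed[processed$ico %in% "00064581", ]
```

Normalising ICO before filtering is what lets "64581" and "00064581" refer to the same organisation.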
Correspondence between input and output columns
Shared/multiple tables
Original | Output | English | Czech | Note |
ZC_VTAB | vtab | table number | tabulka | - |
ZC_UCJED | ucjed | accounting unit | účetní jednotka | NB: ucjed != ico; the two codes are different; ICO is universal, ucjed is specific to SP, and both denote an organisation. |
ZC_VYKAZ | vykaz | report number | výkaz | - |
ZFUNDS_CT | finmisto | accounting centre | finanční místo | either kapitola or "organizační složka státu" (core state org) |
ZC_ICO | ico | org ID | IČO | see ucjed |
ZC_FUND | zdroj | funding source | zdroj | - |
ZC_KRAJ | kraj | region | kraj | - |
ZC_NUTS | nuts | NUTS code | NUTS kód | - |
ZC_LAU | okres | NUTS code of LAU1 unit | LAU1 kód | - |
Tables budget-*
Original | Output | English | Czech | Note |
ZCMMT_ITM | polozka | item/line | položka (druhové členění) | NB: polozka != polvyk |
0FM_AREA | kapitola | chapter | kapitola | - |
0CI_TYPE | polozka_type | item/line type | typ položky | - |
FUNC0AREA | paragraf | sector line | paragraf (odvětvové členění) | - |
ZC_PVS | pvs | programme code | programové výdaje státu | post-2014 |
ZC_EDS | eds | subsidy and property records | Evidenční dotační systém / správa majetku ve vlastnictví státu | post-2014, no codelist; perhaps external via https://www.edssmvs.cz/DocumentsList.aspx?Agenda=CEIS |
ZC_UCRIS | ucris | purpose | Účel | post-2014, no codelist found |
0FUNC_AREA | paragraf | sector line | paragraf (odvětvové členění) | - |
ZU_ROZSCH | budget_adopted | budget as originally adopted | schválený rozpočet | - |
ZU_ROZPZM | budget_amended | budget as amended throughout the year | rozpočet po změnách | - |
ZU_KROZP | budget_final | final budget | konečný rozpočet | - |
ZU_OBLIG | budget_oblig | ? | obligo | - |
ZU_ROZKZ | budget_spending | actual spending | skutečnost | - |
Table budget-indicators
Original | Output | English | Czech | Note |
ZC_PSUK | psuk | budgetary indicator | Závazný a průřezový indikátor | Use psuk codelist |
Table budget-central-old-subsidies
Original | Output | English | Czech | Note |
ZC_ZREUZ | ucelznak | Purpose identifier | Účelový znak | - |
Table budget-central-old-purpose-grants
Original | Output | English | Czech | Note |
0PU_MEASURE | rozprog | Budgetary programme | Rozpočtový program | - |
Tables balance-sheet*
Original | Output | English | Czech | Note |
ZC_POLVYK | polvyk | item/line | položka výkazu | - |
ZC_SYNUC | synuc | synthetic account | syntetický účet | - |
ZU_MONET | previous_net | net, previous period | netto minulé období | - |
ZU_AOBTTO | current_gross | gross, current period | brutto běžné období | - |
ZU_AONET | current_net | net, current period | netto běžné období | - |
ZU_AOKORR | current_correction | correction, current period | korekce běžné období | - |
Tables profit-and-loss-*
Original | Output | English | Czech | Note |
ZU_HLCIN | previous_core | core activity, previous period | hlavní činnost, minulé období | - |
ZU_HOSCIN | previous_economic | economic activity, previous period | hospodářská činnost, minulé období | - |
ZU_HLCIBO | current_core | core activity, current period | hlavní činnost, běžné období | - |
ZU_HCINBO | current_economic | economic activity, current period | hospodářská činnost, běžné období | - |
Table changes-in-equity
Original | Output | English | Czech | Note |
ZU_STAVP | before | previous period | stav minulé období | - |
ZU_STAVPO | after | current period | stav běžné období | - |
ZU_ZVYS | increase | increase | zvýšení stavu | - |
ZU_SNIZ | decrease | decrease | snížení stavu | - |
Table cash-flow
Original | Output | English | Czech | Note |
ZU_BEZUO | current | current period | běžné účetní období | - |
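The correspondence tables amount to a rename map. A minimal base-R sketch using a few rows from the tables above; col_key and rename_sketch() are hypothetical names, and here columns missing from the map keep their original names, which may differ from what the package does.

```r
# A few original -> output pairs copied from the tables above
col_key <- c(
  ZC_VTAB  = "vtab",
  ZC_UCJED = "ucjed",
  ZC_ICO   = "ico",
  ZC_KRAJ  = "kraj",
  ZU_ROZKZ = "budget_spending"
)

# Hypothetical helper: rename known columns, leave unknown ones untouched
rename_sketch <- function(df, key) {
  old <- names(df)
  new <- unname(key[old])  # NA where a column is not in the key
  names(df) <- ifelse(is.na(new), old, new)
  df
}

raw <- data.frame(ZC_ICO = "00064581", ZU_ROZKZ = "100", OTHER = "x")
renamed <- rename_sketch(raw, col_key)
print(names(renamed))
```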
Value
a tibble; see Details for key to the columns
See Also
Other Core workflow: add_codelist(), get_codelist(), sp_add_codelist(), sp_get_codelist(), sp_get_dataset()
Examples
## Not run:
allorgs_2018 <- sp_get_table("budget-central", 2018)
allorgs_mid2018 <- sp_get_table("budget-central", 2018, 6)
oneorg_multiyear <- sp_get_table("budget-central", 2017:2018, 12, ico = "00064581")
oneorg_multihalfyears <- sp_get_table("budget-central", 2017:2018, c(6, 12), ico = "00064581")
## End(Not run)
Get path to a CSV file containing a table.
Description
This is normally called inside sp_get_table() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
Usage
sp_get_table_file(table_id, dataset_path, reunzip = FALSE)
Arguments
table_id |
Table ID; see |
dataset_path |
Path to downloaded dataset, as output by |
reunzip |
Whether to overwrite existing CSV files by unzipping the archive downloaded by |
Value
Character vector of length one - a path.
See Also
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_load_codelist(), sp_load_table()
Examples
## Not run:
ds <- sp_get_dataset("rozv", 2018, 12)
sp_get_table_file("balance-sheet", ds)
## End(Not run)
Load codelist into a tibble from XML file
Description
This is normally called inside sp_get_codelist() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
Usage
sp_load_codelist(path, n = NULL)
Arguments
path |
Path to a file as returned by |
n |
Number of rows to return. Default (NULL) means all. Useful for quickly inspecting a codelist. |
Value
a tibble
See Also
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_table()
Examples
## Not run:
cf <- sp_get_codelist_file("druhuj")
sp_load_codelist(cf)
## End(Not run)
Load a statnipokladna table from a CSV file
Description
This is normally called inside sp_get_table() but can be used separately if finer-grained control of intermediate outputs is needed, e.g. in a {targets} workflow.
Usage
sp_load_table(path, ico = NULL)
Arguments
path |
path to a CSV file, as output by |
ico |
Organisation ID to filter by, if supplied. |
Value
a tibble. See help for sp_get_table() for a key to the columns.
See Also
Other Detailed workflow: sp_get_codelist_file(), sp_get_codelist_url(), sp_get_dataset_url(), sp_get_table_file(), sp_load_codelist()
Examples
## Not run:
ds <- sp_get_dataset("rozv", 2018, 12)
tf <- sp_get_table_file("balance-sheet", ds)
sp_load_table(tf)
## End(Not run)
List of available tables (PARTIAL)
Description
Contains IDs and names of available tables that can be retrieved by sp_get_table(); the list is partial. Look inside the XLS documentation for each dataset at https://monitor.statnipokladna.cz/datovy-katalog/transakcni-data to see more detailed descriptions. Note that these tables do not correspond to the tabulka/vtab column in the data; they represent files inside datasets.
Usage
sp_tables
Format
A data frame with 2 rows and 4 variables:
id
character. Table ID, used as the table_id argument to sp_get_table().
dataset_id
integer. Table number.
czech_name
character. Czech name of the table.
note
character. Note.
See Also
Other Lists of available entities: sp_codelists, sp_datasets