Title: | Processing Regional Statistics |
Version: | 0.1.8 |
Date: | 2021-06-19 |
Description: | Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series. |
License: | GPL-3 |
Encoding: | UTF-8 |
Language: | en-US |
URL: | https://regions.dataobservatory.eu/ |
BugReports: | https://github.com/rOpenGov/regions |
LazyData: | true |
RoxygenNote: | 7.1.1 |
Depends: | R (≥ 2.10) |
Imports: | dplyr, magrittr, countrycode, tidyselect, utils, purrr, rlang, glue, stats, tidyr, readxl, stringr, assertthat, tibble, here |
Suggests: | knitr, testthat, rmarkdown, covr, spelling, devtools, eurostat, ggplot2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2021-06-21 10:13:45 UTC; Daniel Antal |
Author: | Daniel Antal |
Maintainer: | Daniel Antal <daniel.antal@ceemid.eu> |
Repository: | CRAN |
Date/Publication: | 2021-06-21 11:20:01 UTC |
Pipe operator
Description
See magrittr::%>%
for details.
Usage
lhs %>% rhs
European Union: All Valid NUTS Codes
Description
A dataset containing all recognised geo codes in the EU NUTS correspondence tables. This is re-arranged from nuts_changes.
Usage
all_valid_nuts_codes
Format
A data frame with 3 variables:
- geo
NUTS geo identifier
- typology
country, NUTS1, NUTS2 or NUTS3
- nuts
The NUTS definition where the geo code can be found.
Source
https://ec.europa.eu/eurostat/web/nuts/history/
See Also
nuts_recoded, nuts_changes, nuts_exceptions
Australia: States And Territories
Description
A dataset containing the states and territories of Australia.
Usage
australia_states
Format
A data frame with 8 rows and 3 variables:
- country_code
ISO 3166-1 country codes
- geo_code
subdivision codes within Australia (states and territories)
- geo_name
subdivision names within Australia (states and territories)
Source
The Online Browsing Platform of the International Organization for Standardization https://www.iso.org/obp/ui/#iso:code:3166:AU
Create the nuts_lau_2019 correspondence table. May be used to create similar historical correspondence tables.
Description
Create the nuts_lau_2019 correspondence table. May be used to create similar historical correspondence tables.
Usage
create_nuts_lau_2019()
Value
A data.frame which is also saved and can be retrieved with data(nuts_lau_2019). Use this function as a template to obtain historical correspondence tables.
Daily Internet Users
Description
A dataset containing the percentage of individuals who used the Internet on a daily basis in the European countries and regions.
Usage
daily_internet_users
Format
A data frame with 3 variables:
- geo
National and sub-national geographical codes from Eurostat
- time
Time, coded as a numeric variable of the year, 2006-2019
- values
The numeric statistical values
Details
The fresh version of this statistic can be obtained by
eurostat::get_eurostat("isoc_r_iuse_i", time_format = "num")
and filtered for the indic_is = "I_IDAY"
indicator and the
unit="PC_IND"
unit.
Source
The eventual source of the data is the Eurostat table isoc_r_iuse_i
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=isoc_r_iuse_i&lang=en
Get Country Code Of Regions
Description
The function identifies the sub-national geographical identifiers from known typologies and returns the ISO 3166-1 alpha-2 country codes.
Usage
get_country_code(geo, typology = "NUTS")
Arguments
geo |
A character variable with geo codes. |
typology |
Currently the following typologies are supported:
|
Value
The ISO 3166-1 alpha-2 codes of the countries as a character vector.
See Also
Other recode functions:
recode_nuts()
Examples
{
get_country_code (c("EL", "GR", "DED", "HU102"))
}
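The idea behind the lookup can be sketched in base R, assuming that every NUTS code starts with a two-letter country prefix and that the Eurostat prefixes EL and UK are mapped to the ISO codes GR and GB. The helper country_prefix_sketch() is hypothetical and is not part of the package; get_country_code() may handle more cases and other typologies.

```r
# Hypothetical helper for illustration; not the package's implementation.
country_prefix_sketch <- function(geo) {
  prefix <- substr(geo, 1, 2)  # first two characters of each code
  # Assumption: Eurostat's EL and UK are translated to ISO 3166-1 alpha-2.
  eurostat_to_iso <- c(EL = "GR", UK = "GB")
  needs_recoding <- prefix %in% names(eurostat_to_iso)
  prefix[needs_recoding] <- unname(eurostat_to_iso[prefix[needs_recoding]])
  prefix
}

country_prefix_sketch(c("EL", "GR", "DED", "HU102"))
```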
Google Mobility Report European Correspondence Table
Description
A dataset containing the correspondence table between the EU NUTS 2016 typology and the typology used by Google in the Google Mobility Reports.
Usage
google_nuts_matchtable
Format
A data frame with 817 rows and 6 variables:
- country_code
ISO 3166-1 alpha2 code
- google_region_level
Hierarchical level in the Google Mobility Reports
- google_region_name
The name used by Google.
- code_2016
NUTS code in the 2016 definition
- typology
country, NUTS1, NUTS2 or NUTS3, nuts_level_3_lau, nuts_level_3_iso-3166-2
- valid_2016
Logical variable, if the coding is valid in NUTS2016
Details
In some cases a full correspondence is not possible. In these cases we created pseudo-NUTS codes, which have a FALSE valid_2016 value. These pseudo-NUTS codes can help to approximate the underlying regions.
Pseudo-NUTS codes were used in Estonia, Italy, Portugal, Slovenia and in parts of Latvia.
In Latvia and Slovenia, the pseudo-NUTS code is a combination of the containing NUTS3 code and the municipality's LAU code.
In Estonia, they are a combination of the NUTS3 code and the ISO-3166-2 LAU code (county level). This is the case in most of Portugal and the United Kingdom, too. In these cases the pseudo-codes refer to quasi-NUTS4 units, which are smaller than the containing NUTS3 region and should therefore be aggregated.
A special case is ITD_IT-32, which is a combination of two NUTS2 statistical regions that form a single unit under ISO-3166-2, the autonomous region of Trentino and South Tyrol. In this case, the data should be disaggregated.
A similar solution is required for the United Kingdom.
Author(s)
Istvan Zsoldos, Daniel Antal
Source
https://ec.europa.eu/eurostat/web/nuts/history/
Imputing Data From Larger To Smaller Units
Description
Imputing Data From Larger To Smaller Units
Usage
impute_down(
upstream_data = NULL,
downstream_data = NULL,
country_var = "country_code",
regional_code = "geo_code",
values_var = "values",
time_var = NULL,
upstream_method_var = NULL,
downstream_method_var = NULL
)
Arguments
upstream_data |
An upstream data frame to project from, containing the larger geographical units, for example, country-level data. |
downstream_data |
A downstream data frame containing the smaller-level observations with missing data. It must contain all the necessary structural information for imputation. |
country_var |
The geographical ID of the upstream data, defaults to "country_code". |
regional_code |
The geographical ID of the downstream data, defaults to "geo_code". |
values_var |
The variable that contains the upstream data to be imputed to the downstream data, defaults to "values". |
time_var |
The time component, if present, defaults to NULL. |
upstream_method_var |
The name of the variable that contains the potentially applied imputation methods. Defaults to NULL. |
downstream_method_var |
The name of the variable that will contain the metadata of the potentially applied imputation methods. Defaults to NULL. |
Value
The upstream data frame (containing data of a larger unit) and the downstream data (containing data of smaller sub-divisional units) are joined; whenever data is missing in the downstream sub-divisional column, it is imputed with the corresponding values from the upstream data frame.
The 'method' metadata column shows whether the downstream value column holds the actual downstream data or imputed data.
See Also
Other impute functions:
impute_down_nuts()
Examples
{
upstream <- data.frame ( country_code = rep( "AU", 3),
year = c(2018:2020),
my_var = c(10,12,11),
description = c("note1", NA_character_,
"note3")
)
downstream <- australia_states
impute_down ( upstream_data = upstream,
downstream_data = downstream,
country_var = "country_code",
regional_code = "geo_code",
values_var = "my_var",
time_var = "year" )
}
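The join-and-fill logic described under Value can be sketched in base R. This is only an illustration with made-up numbers: the "values_upstream" helper column and the labels written to the method column are assumptions, and impute_down() itself may label and order its output differently.

```r
# Toy inputs; the numbers are made up for illustration.
upstream <- data.frame(
  country_code = "AU",
  values = 10,
  stringsAsFactors = FALSE
)
downstream <- data.frame(
  country_code = c("AU", "AU", "AU"),
  geo_code = c("AU-NSW", "AU-VIC", "AU-QLD"),
  values = c(NA, 7, NA),
  stringsAsFactors = FALSE
)

# Join the larger unit's value onto every sub-division.
joined <- merge(
  downstream, upstream,
  by = "country_code",
  suffixes = c("", "_upstream")
)

# Keep actual observations, fill the gaps from the upstream value and
# record which rows were imputed in a 'method' column.
missing_value <- is.na(joined$values)
joined$method <- ifelse(missing_value, "imputed from upstream", "actual")
joined$values[missing_value] <- joined$values_upstream[missing_value]
joined$values_upstream <- NULL
joined
```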
Imputing Data From Larger To Smaller Units in the EU NUTS
Description
This is a special case of impute_down
for the EU NUTS
hierarchical typologies. All valid actual rows will be projected down
to all smaller constituent typologies where data is missing.
Usage
impute_down_nuts(
dat,
geo_var = "geo",
values_var = "values",
method_var = NULL,
nuts_year = 2016
)
Arguments
dat |
A data frame with exactly two or three columns: |
geo_var |
The variable that contains the geographical codes in the NUTS typologies, defaults to "geo". |
values_var |
The variable that contains the upstream data to be imputed to the downstream data, defaults to "values". |
method_var |
The variable that contains the metadata on various processing information, defaults to NULL. |
nuts_year |
The year of the NUTS typology to use, it defaults to the currently valid 2016. |
Details
The more general function requires typology information from the higher and lower level typologies. This is not needed when the EU vocabulary is used, and the hierarchy can be established from the EU vocabularies.
Be mindful that while all possible imputations are made, imputations beyond one hierarchical level will result in very crude estimates.
The imputed dataset dat must refer to a single time unit, i.e. panel data is not supported.
Value
An augmented version of the dat imputed data frame with all possible projections to valid smaller units, i.e. NUTS0 = country values imputed to all missing NUTS1 units, NUTS1 values imputed to all missing NUTS2 units, NUTS2 values imputed to all missing NUTS3 units.
See Also
Other impute functions:
impute_down()
Examples
data(mixed_nuts_example)
impute_down_nuts(mixed_nuts_example, nuts_year = 2016)
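How a value travels down the hierarchy can be sketched in base R, assuming that the parent of a NUTS code is the same code without its last character. The toy values and the wording of the method labels are assumptions; impute_down_nuts() relies on the package's own vocabularies instead of this string rule.

```r
# Toy data: one country value, one NUTS1 value, the rest missing.
dat <- data.frame(
  geo = c("HU", "HU1", "HU2", "HU11", "HU21"),
  values = c(100, 40, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: the parent of a NUTS code is the code without its last character.
parent_of <- function(geo) substr(geo, 1, nchar(geo) - 1)

# Walk from the larger to the smaller units, so that a value imputed at
# one level can be passed further down.
dat <- dat[order(nchar(dat$geo)), ]
dat$method <- ifelse(is.na(dat$values), NA_character_, "actual")
for (i in seq_len(nrow(dat))) {
  if (is.na(dat$values[i])) {
    parent_row <- match(parent_of(dat$geo[i]), dat$geo)
    if (!is.na(parent_row)) {
      dat$values[i] <- dat$values[parent_row]
      dat$method[i] <- paste("imputed from", dat$geo[parent_row])
    }
  }
}
dat
```

In this toy run HU21 receives the country value through HU2, that is, across two levels, which is the kind of crude estimate the Details warn about.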
Example Data Frame: Mixed EU Typologies.
Description
This data frame is a fictitious example that packs many potential typological problems into a small, easy-to-review dataset. It is used to test imputation functions and to create examples with them.
Usage
mixed_nuts_example
Format
A data frame with 22 rows and 3 variables:
- geo
NUTS geo identifier, mixed from 4 typology levels.
- values
Random numbers.
- method
Descriptive metadata.
Source
https://ec.europa.eu/eurostat/web/nuts/history/
See Also
nuts_changes, all_valid_nuts_codes, impute_down_nuts
European Union: Recoded NUTS units 1995-2021.
Description
A dataset containing the joined correspondence tables of the EU NUTS typologies.
Usage
nuts_changes
Format
A data frame with 3097 rows and 22 variables:
- typology
country, NUTS1, NUTS2 or NUTS3
- start_year
The year when the code was first used
- end_year
The year when the code was last used
- code_1999
NUTS code in the 1999 definition
- code_2003
NUTS code in the 2003 definition
- code_2006
NUTS code in the 2006 definition
- code_2010
NUTS code in the 2010 definition
- code_2013
NUTS code in the 2013 definition
- code_2016
NUTS code in the 2016 definition
- code_2021
NUTS code in the 2021 definition
- geo_name_2003
NUTS territorial name in the 2003 definition
- geo_name_2006
NUTS territorial name in the 2006 definition
- geo_name_2010
NUTS territorial name in the 2010 definition
- geo_name_2013
NUTS territorial name in the 2013 definition
- geo_name_2016
NUTS territorial name in the 2016 definition
- geo_name_2021
NUTS territorial name in the 2021 definition
- change_2003
Change described in the 2003 correspondence table
- change_2006
Change described in the 2006 correspondence table
- change_2010
Change described in the 2010 correspondence table
- change_2013
Change described in the 2013 correspondence table
- change_2016
Change described in the 2016 correspondence table
- change_2021
Change described in the 2021 correspondence table
Source
https://ec.europa.eu/eurostat/web/nuts/history/
See Also
nuts_recoded, all_valid_nuts_codes
NUTS Coding Exceptions
Description
A dataset containing exceptions to the NUTS geographical codes.
Usage
nuts_exceptions
Format
A data frame with 2 variables:
- geo
National and sub-national geographical codes from Eurostat
- typology
Short description of exception
Details
They contain non-EU regions that are consistent with NUTS, but not defined within the NUTS.
They also contain European country codes that do not conform with NUTS.
Source
Eurostat NUTS history: https://ec.europa.eu/eurostat/web/nuts/history/
See Also
nuts_recoded, nuts_changes, all_valid_nuts_codes
European Union: NUTS And LAU Correspondence
Description
A dataset containing the joined correspondence tables of the EU NUTS and local administration units (LAU) typologies.
Usage
nuts_lau_2019
Format
A data frame with 99140 rows and 22 variables:
- code_2016
NUTS3 code of the local administrative unit, 2016 definition
- lau_code
Local Administrative Unit code
- lau_name_national
LAU name, official in national language(s)
- lau_name_latin
LAU name, official Latin alphabet version
- name_change_last_year
Change in name in the year before?
- population
Population
- total_area_m2
Area in square meters
- degurba
Degree of urbanization
- degurba_change_last_year
Change in degree of urbanization?
- coastal_area
Part of coastal area classification?
- coastal_change_last_year
Change in coastal area classification
- city_id
City ID
- city_id_change_last_year
Change in city ID since last year
- city_name
Name of the city
- greater_city_id
Containing metro area ID, if applicable
- greater_city_id_change_last_year
Change in metro area ID
- greater_city_name
Name of containing greater city (metropolitan) area, if applicable
- fua_id
FUA ID
- fua_id_change_last_year
Change of FUA ID since last year
- fua_name
Name in FUA database
- country
NUTS country code with exceptions: EL for Greece, UK for United Kingdom
- gisco_id
GISCO ID
Details
This is also the authoritative vocabulary for local administration names, including city and metropolitan area names.
Source
https://ec.europa.eu/eurostat/web/nuts/local-administrative-units
See Also
nuts_recoded, all_valid_nuts_codes
European Union: Recoded NUTS units 1995-2021.
Description
Containing all recoded NUTS units from the European Union. This is re-arranged from nuts_changes.
Usage
nuts_recoded
Format
A data frame with 8 rows and 5 variables:
- geo
NUTS geo identifier
- typology
country, NUTS1, NUTS2 or NUTS3
- nuts_year
year of the NUTS definition or version
- change_year
when the geo code changed
- iso2c
Two character ISO standard country codes.
Source
https://ec.europa.eu/eurostat/web/nuts/history/
See Also
nuts_changes, all_valid_nuts_codes
Recode Region Codes From Source To Target NUTS Typology
Description
Validate your geo codes, pair them with the appropriate standard typology, look up potential causes of invalidity in the EU correspondence tables, and look up the appropriate geographical codes in the other (target) typology. For example, validate geo codes in the 'NUTS2016' typology and translate them to the now obsolete 'NUTS2010' typology to join current data with historical data sets.
Usage
recode_nuts(dat, geo_var = "geo", nuts_year = 2016)
Arguments
dat |
A data frame with a 3-5 character |
geo_var |
Defaults to "geo". |
nuts_year |
The year of the NUTS typology to use.
You can select any valid
NUTS definition, i.e. |
Value
The original data frame with a 'geo_var' column is extended with a 'typology' column that states in which typology the 'geo_var' is a valid code. For invalid codes, the function looks up potential reasons of invalidity and adds them to the 'typology_change' column. Finally, it adds a character column containing the desired codes in the target typology, for example, in the NUTS2013 typology.
See Also
Other recode functions:
get_country_code()
Examples
{
foo <- data.frame (
geo = c("FR", "DEE32", "UKI3" ,
"HU12", "DED",
"FRK"),
values = runif(6, 0, 100 ),
stringsAsFactors = FALSE )
recode_nuts(foo, nuts_year = 2013)
}
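The lookup step can be sketched in base R as a match against a correspondence table. The table below is a toy with made-up codes: only its column names mirror nuts_changes, and the "valid_2016" column name is an assumption for this sketch. recode_nuts() performs more checks than this.

```r
# Toy correspondence table with made-up codes; the column names mirror
# nuts_changes, the rows do not come from it.
correspondence <- data.frame(
  code_2013 = c("XX1", "XX21", NA),
  code_2016 = c("XX1", "XX2A", "XX3"),
  stringsAsFactors = FALSE
)
my_data <- data.frame(
  geo = c("XX1", "XX2A", "XX3", "XX9"),
  stringsAsFactors = FALSE
)

# Look up each 2016 code and fetch its 2013 equivalent, if there is one.
hit <- match(my_data$geo, correspondence$code_2016)
my_data$valid_2016 <- !is.na(hit)
my_data$code_2013 <- correspondence$code_2013[hit]
my_data
```

A code that exists in the source typology but has no counterpart in the target one (XX3 here) stays valid and gets a missing target code, while an unknown code (XX9) is flagged as invalid.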
R&D Personnel by NUTS 2 Regions
Description
A subset of the Eurostat dataset "R&D personnel and researchers by sector of performance, sex and NUTS 2 regions".
Usage
regional_rd_personnel
Format
A data frame with 956 observations of 7 variables:
- geo
National and sub-national geographical codes from Eurostat
- time
Time, coded as a numeric variable of the year, 2006-2019
- values
The numeric statistical values
- unit
Unit of measurement, contains only FTE
- sex
Sex of researchers, contains only both sexes as T
- prof_pos
Professional position, contains all R&D employees not only researchers
- sectperf
Sector of performance, filtered for all sectors as TOTAL
Details
Mapping Regional Data, Mapping Metadata Problem
The fresh version of this statistic can be obtained by
eurostat::get_eurostat_json (id = "rd_p_persreg",
filters = list (sex = "T", prof_pos = "TOTAL",sectperf = "TOTAL", unit = "FTE" ))
Source
https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=rd_p_persreg&lang=en
See Also
recode_nuts
regions: A package for working with regional statistics.
Description
The regions package provides four categories of functions: validate, recode, impute and aggregate.
validate functions
The validate functions validate the conformity of a typological (geographical) label with a certain typology. Currently the EU statistical NUTS typologies and countries are implemented.
recode functions
These functions correct the geo coding of sub-national statistics, or bring them to a consistent format.
impute functions
The impute functions impute data from one regional unit to a different level of regional unit, such as from country-level data to province- or state-level data.
impute_down provides imputation from higher aggregation hierarchy levels to lower ones, for example from ISO-3166-1 to ISO-3166-2.
impute_down_nuts provides the same functionality with the EU typologies, but with far less work, because it relies on the internal hierarchical structure of these metadata, for example, from NUTS1 to NUTS2.
aggregate functions
Aggregation functions from lower hierarchy levels to higher ones, for example from NUTS3 to NUTS1 or from ISO-3166-2 to ISO-3166-1.
Disaggregation functions from higher hierarchy levels to lower ones, for example from NUTS1 to NUTS2 or from ISO-3166-1 to ISO-3166-2.
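As an illustration of aggregating from NUTS3 to NUTS1, the following base R sketch sums toy values by the first three characters of each code. It assumes an additive indicator (a count, not a rate or a share) and made-up numbers; it does not call any function of the package.

```r
# Toy NUTS3-level counts; the values are made up.
nuts3 <- data.frame(
  geo = c("HU110", "HU120", "HU211", "HU212"),
  values = c(5, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Assumption: the NUTS1 parent is the first three characters of the code,
# and the indicator is additive.
nuts3$nuts1 <- substr(nuts3$geo, 1, 3)
nuts1 <- aggregate(values ~ nuts1, data = nuts3, FUN = sum)
nuts1
```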
Validate Parameter 'dat'
Description
Validate Parameter 'dat'
Usage
validate_data_frame(
dat,
geo_var = NULL,
nuts_year = NULL,
values_var = NULL,
method_var = NULL
)
Arguments
dat |
A data frame input to be validated. |
geo_var |
The variable that contains the geographical codes in the NUTS typologies, defaults to NULL. |
nuts_year |
The year of the NUTS typology to use. |
values_var |
The variable that contains the upstream data to be imputed to the downstream data, defaults to NULL. |
method_var |
The variable that contains the metadata on various processing information, defaults to NULL. |
Value
A logical variable showing if all assertions were met.
Validate Conformity with NUTS Geo Codes (vector)
Description
Validate that geo is conforming with the NUTS1, NUTS2, or NUTS3 typologies.
While country codes are technically not part of the NUTS typologies, Eurostat de facto uses a NUTS0 typology to identify countries. This de facto typology has three exceptions, which are handled by the validate_nuts_countries function.
Usage
validate_geo_code(geo, nuts_year = 2016)
Arguments
geo |
A vector of geographical code to validate. |
nuts_year |
A valid NUTS edition year. |
Details
NUTS typologies have different versions, therefore the conformity is validated with one specific version, which can be any of these: 1999, 2003, 2006, 2010, 2013, the currently used 2016 and the already announced and defined 2021.
The NUTS typology was codified with the NUTS2003, and the pre-1999 NUTS typologies may confuse programmatic data processing, given that some NUTS1 regions were identified with country codes in smaller countries that had no NUTS1 divisions.
Currently the 2016 version is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.
Value
A character list with the valid typology, or 'invalid' in the cases when the geo coding is not valid.
Examples
my_reg_data <- data.frame (
geo = c("BE1", "HU102", "FR1",
"DED", "FR7", "TR", "DED2",
"EL", "XK", "GB"),
values = runif(10))
validate_geo_code(my_reg_data$geo)
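The relation between code length and typology level can be sketched in base R: two characters identify a country, three a NUTS1, four a NUTS2 and five a NUTS3 unit. The vector valid_codes below is a short hypothetical stand-in for the codes of one NUTS edition, and the returned labels are assumptions; validate_geo_code() checks against the package's own tables.

```r
# Hypothetical stand-in for the vector of codes valid in one NUTS edition.
valid_codes <- c("HU", "HU1", "HU11", "HU110", "DE", "DED", "DED2")

typology_sketch <- function(geo, valid = valid_codes) {
  # 2 characters = country, 3 = NUTS1, 4 = NUTS2, 5 = NUTS3.
  level <- c("country", "NUTS1", "NUTS2", "NUTS3")
  typology <- level[match(nchar(geo), 2:5)]
  # Anything not found in the edition's code list is flagged.
  typology[!geo %in% valid] <- "invalid"
  typology
}

typology_sketch(c("HU", "DED", "DED2", "HU110", "HU102", "X"))
```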
Validate Conformity with NUTS Country Codes
Description
This function is mainly a wrapper around the well-known countrycode function, with three exceptions that are particular to the European Union statistical nomenclature.
- EL
Treated as valid, because NUTS uses EL instead of GR for Greece since 2010.
- UK
Treated as valid, because NUTS uses UK instead of GB for the United Kingdom.
- XK
XK is used for Kosovo, because Eurostat uses this code, too.
All ISO-3166-1 country codes are validated, and also the three exceptions.
Usage
validate_nuts_countries(dat, geo_var = "geo")
Arguments
dat |
A data frame with a 2-character geo variable to be validated |
geo_var |
Defaults to "geo". |
Value
The original data frame extended with the column 'typology'. This column states 'country' for valid country typology coding, or an appropriate label for invalid ISO-3166-alpha-2 and ISO-3166-alpha-3 codes.
See Also
Other validate functions:
validate_nuts_regions()
Examples
{
my_dat <- data.frame (
geo = c("AL", "GR", "XK", "EL", "UK", "GB", "NLD", "ZZ" ),
values = runif(8)
)
## NLD is an ISO 3-character code and is not validated.
validate_nuts_countries(my_dat)
}
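The handling of the three exceptions can be sketched in base R as a membership test against the ISO list plus EL, UK and XK. The vector iso2c_sample is a short hypothetical stand-in for the full ISO 3166-1 alpha-2 list, and the helper name is made up; the actual function returns a data frame with a 'typology' column rather than a logical vector.

```r
# Assumption: a short stand-in for the full ISO 3166-1 alpha-2 list.
iso2c_sample <- c("AL", "GR", "GB", "NL")
# The three Eurostat exceptions named in the Description.
eurostat_exceptions <- c("EL", "UK", "XK")

# Hypothetical helper: TRUE for ISO codes and for the three exceptions.
is_valid_country_sketch <- function(geo) {
  geo %in% c(iso2c_sample, eurostat_exceptions)
}

is_valid_country_sketch(c("AL", "GR", "XK", "EL", "UK", "GB", "NLD", "ZZ"))
```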
Validate Conformity With NUTS Geo Codes
Description
Validate that geo_var is conforming with the NUTS1, NUTS2, or NUTS3 typologies.
While country codes are technically not part of the NUTS typologies, Eurostat de facto uses a NUTS0 typology to identify countries. This de facto typology has three exceptions, which are handled by the validate_nuts_countries function.
Usage
validate_nuts_regions(dat, geo_var = "geo", nuts_year = 2016)
Arguments
dat |
A data frame with a 3-5 character |
geo_var |
Defaults to "geo". |
nuts_year |
The year of the NUTS typology to use. Defaults to 2016. |
Details
NUTS typologies have different versions, therefore the conformity is validated with one specific version, which can be any of these: 1999, 2003, 2006, 2010, 2013, the currently used 2016 and the already announced and defined 2021.
The NUTS typology was codified with the NUTS2003, and the pre-1999 NUTS typologies may confuse programmatic data processing, given that some NUTS1 regions were identified with country codes in smaller countries that had no NUTS1 divisions.
Currently the 2016 version is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.
Value
Returns the original dat data frame with a column that specifies the conformity with the NUTS definition of the year nuts_year.
See Also
Other validate functions:
validate_nuts_countries()
Examples
my_reg_data <- data.frame (
geo = c("BE1", "HU102", "FR1",
"DED", "FR7", "TR", "DED2",
"EL", "XK", "GB"),
values = runif(10))
validate_nuts_regions (my_reg_data)
validate_nuts_regions (my_reg_data, nuts_year = 2013)
validate_nuts_regions (my_reg_data, nuts_year = 2003)
Validate Mandatory Parameters
Description
These parameters must not be NULL. The param_name is needed for a meaningful error message.
Usage
validate_param(param, param_name)
Arguments
param |
A parameter value that must not be NULL. |
param_name |
The name of the parameter that must not have a value of NULL. |
Value
A logical value indicating whether the mandatory parameter is present.
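A minimal sketch of such a check in base R, assuming that a missing parameter should stop execution with a message that names it. The function require_param_sketch() and its message text are hypothetical; validate_param() may word its errors differently.

```r
# Hypothetical re-implementation for illustration only.
require_param_sketch <- function(param, param_name) {
  if (is.null(param)) {
    stop("The parameter '", param_name, "' must not be NULL.", call. = FALSE)
  }
  TRUE
}

require_param_sketch("geo", "geo_var")

# The error message names the offending parameter.
msg <- tryCatch(
  require_param_sketch(NULL, "geo_var"),
  error = function(e) conditionMessage(e)
)
msg
```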
Assertion for Correct Function Calls
Description
Assertions are made to give early and precise error messages for wrong API call parameters.
Usage
validate_parameters(typology = NULL, param = NULL, param_name = NULL)
Arguments
typology |
Currently the following typologies are supported:
|
param |
A parameter value that must not be NULL. |
param_name |
The name of the parameter that must not have a value of NULL. |
Details
These assertions are called from various wrapper functions. However, you can also call this function directly to make sure that you are adding (programmatically) the correct parameters to a call.
All validate_parameters parameters default to NULL.
Asserts the correct parameter values for any values that are not NULL.
Value
A logical value indicating whether the parameters of the call are valid.
Validate typology Parameter
Description
Validate typology Parameter
Usage
validate_typology(typology)
Arguments
typology |
Currently the following typologies are supported:
|
Value
A logical value indicating whether the typology in question exists, i.e. whether the typology parameter is valid.