Type: | Package |
Title: | Tools to Easily Download Data from INSEE BDM Database |
Version: | 1.1.7 |
Description: | Using embedded sdmx queries, get the data of more than 150 000 insee series from 'bdm' macroeconomic database. |
URL: | https://pyr-opendatafr.github.io/R-Insee-Data/ |
Encoding: | UTF-8 |
License: | MIT + file LICENSE |
VignetteBuilder: | knitr |
BugReports: | https://github.com/pyr-opendatafr/R-Insee-Data/issues |
Imports: | httr, xml2, tibble, dplyr, stringr, tidyselect (≥ 1.2.0), rlang, purrr, crayon, openssl, rappdirs |
Suggests: | lubridate, ggplot2, prettydoc, htmltools, kableExtra, knitr, rmarkdown, markdown, magrittr, testthat, covr, png |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 2.10) |
NeedsCompilation: | no |
Packaged: | 2024-08-24 07:50:21 UTC; onyxia |
Author: | Hadrien Leclerc [aut, cre], INSEE [cph] |
Maintainer: | Hadrien Leclerc <leclerc.hadrien@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-08-26 07:30:01 UTC |
Add metadata to the raw data
Description
Add metadata to the raw data
Usage
add_insee_metadata(df)
Arguments
df |
a dataframe containing data obtained from get_insee_idbank or get_insee_dataset |
Details
Add metadata to the raw data obtained from get_insee_idbank or get_insee_dataset
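Conceptually, adding metadata amounts to a left join between the observations and a table describing each series. The base R sketch below illustrates the idea on invented data. The metadata table, its column FREQ_label_en and all values are assumptions made for illustration, and this is not the package's actual implementation.

```r
# Toy observations, loosely shaped like downloaded data
# (series keys and values are invented for illustration)
obs <- data.frame(
  IDBANK = c("A1", "A1", "B2"),
  OBS_VALUE = c(1.5, 1.7, 42),
  stringsAsFactors = FALSE
)

# Hypothetical metadata table: one row per series key
meta <- data.frame(
  IDBANK = c("A1", "B2"),
  FREQ_label_en = c("Monthly", "Quarterly"),
  stringsAsFactors = FALSE
)

# Left join: keep every observation, attach the metadata when it exists
obs_with_meta <- merge(obs, meta, by = "IDBANK", all.x = TRUE)
print(obs_with_meta)
```

The result has one row per observation, with the descriptive column repeated for every observation of the same series.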
Value
a tibble with the data given as parameter plus the corresponding metadata
Examples
library(magrittr)
data =
  get_insee_idbank("001694061") %>%
  add_insee_metadata()
Add a title column to the idbank list dataset
Description
Add a title column to the idbank list dataset
Usage
add_insee_title(df, n_split, lang = "en", split = TRUE, clean = TRUE)
Arguments
df |
a dataframe containing an idbank column called "idbank" or "IDBANK" |
n_split |
number of new columns, by default the maximum is chosen |
lang |
language of the title: "en" (the default) returns an English title, any other value returns a French title |
split |
split the title column into several columns, TRUE by default |
clean |
remove the columns filled only with NA (missing values), TRUE by default |
Details
This function relies extensively on get_insee_title. It should therefore be used on an already filtered dataset, not on the full idbank dataset (cf. get_insee_title). The number of separators in the official INSEE title can vary and is not normalized. Beware: the title columns created may not all be clean dimension labels.
Value
the same dataframe but with one or several title columns
Examples
library(magrittr)
library(dplyr)
idbank_empl =
  get_idbank_list("EMPLOI-SALARIE-TRIM-NATIONAL") %>% # employment
  slice(1:15) %>%
  add_insee_title()
Get the title of dataset's columns
Description
Get the title of dataset's columns
Usage
get_column_title(dataset = NULL)
Arguments
dataset |
an INSEE dataset name; if NULL, the column titles of all datasets are returned |
Value
a dataframe
Examples
column_titles_all_dataset = get_column_title()
column_titles = get_column_title("CNA-2014-CONSO-MEN")
Download a full INSEE's dataset list
Description
Download a full INSEE's dataset list
Usage
get_dataset_list()
Details
the datasets returned are the ones available through a SDMX query
Value
a tibble with 5 columns : id, Name.fr, Name.en, url, n_series
Examples
insee_dataset = get_dataset_list()
Download a full INSEE's series key list
Description
Download a full INSEE's series key list
Usage
get_idbank_list(..., dataset = NULL, update = FALSE)
Arguments
... |
one or several dataset names |
dataset |
if a dataset name is provided, only a subset of the data is delivered, otherwise all the data is returned, and column names refer directly to data dimensions |
update |
FALSE by default. If set to TRUE, it triggers the metadata update. This update is also triggered automatically once every 6 months. |
Details
Download a mapping dataset between INSEE series keys (idbank) and SDMX series names. Under the hood, get_idbank_list uses the download.file function from utils. The user can change its mode argument with the following command:
Sys.setenv(INSEE_download_option_idbank_list = "wb")
If INSEE makes an update, the user can also change the zip file downloaded, the data file contained in the zip and the data separator:
Sys.setenv(INSEE_idbank_dataset_path = "new_zip_file_link")
Sys.setenv(INSEE_idbank_sep = ",")
Sys.setenv(INSEE_idbank_dataset_file = "new_data_file_name")
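The settings above are ordinary environment variables, so they can be set, inspected and removed with base R. A minimal sketch, assuming the variable names given in this section; the separator value is a placeholder chosen for illustration:

```r
# Set the download mode (variable name taken from this section)
Sys.setenv(INSEE_download_option_idbank_list = "wb")

# Placeholder separator value, used only for illustration
Sys.setenv(INSEE_idbank_sep = ",")

# Check what is currently set; Sys.getenv() only reads, it never sets
mode_set <- Sys.getenv("INSEE_download_option_idbank_list")
sep_set <- Sys.getenv("INSEE_idbank_sep")
print(c(mode = mode_set, sep = sep_set))

# Remove the overrides once they are no longer needed
Sys.unsetenv(c("INSEE_download_option_idbank_list", "INSEE_idbank_sep"))
```

Values set this way last only for the current R session unless they are also written to a startup file such as .Renviron.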
Value
a tibble containing the idbank dataset
Examples
# download datasets list
dt = get_dataset_list()
# use a dataset name to retrieve the series key list related to the dataset
idbank_list = get_idbank_list('CNT-2014-PIB-EQB-RF')
Get data from INSEE BDM database with a SDMX query link
Description
Get data from INSEE BDM database with a SDMX query link
Usage
get_insee(link, step = "1/1")
Arguments
link |
SDMX query link |
step |
argument used only for internal package purposes to tweak download display |
Details
Get data from INSEE BDM database with a SDMX query link. This function is mainly intended for internal package use. It is used by the functions get_insee_dataset, get_insee_idbank and get_dataset_list. The data is cached, hence each query is run only once per R session. The user can disable the download display in the console with the following command: Sys.setenv(INSEE_download_verbose = "FALSE"). The use of cached data can be disabled with: Sys.setenv(INSEE_no_cache_use = "TRUE"). All queries are printed in the console with this command: Sys.setenv(INSEE_print_query = "TRUE").
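These switches are environment variables, so once set they stay in effect for the rest of the session. The sketch below shows one possible way to apply them around a single call only. The helper with_env_vars is hypothetical (it is not part of the package) and relies on base R alone; the Sys.getenv call stands in for a real download such as get_insee(...).

```r
# Hypothetical helper (not part of the package): evaluate an expression with
# some environment variables set, then restore the previous state
with_env_vars <- function(vars, expr) {
  # Remember the current values; unset variables are recorded as NA
  old <- Sys.getenv(names(vars), unset = NA, names = TRUE)
  on.exit({
    to_restore <- old[!is.na(old)]
    to_unset <- names(old)[is.na(old)]
    if (length(to_restore) > 0) do.call(Sys.setenv, as.list(to_restore))
    if (length(to_unset) > 0) Sys.unsetenv(to_unset)
  })
  do.call(Sys.setenv, as.list(vars))
  expr  # evaluated lazily, i.e. after the variables have been set
}

res <- with_env_vars(
  c(INSEE_download_verbose = "FALSE", INSEE_print_query = "TRUE"),
  Sys.getenv("INSEE_print_query")  # placeholder for an actual query
)
print(res)
```

Because R evaluates function arguments lazily, the expression passed as expr runs only after the variables have been set, and on.exit restores the earlier values even if the expression fails.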
Value
a tibble containing the data
Examples
insee_link = "http://www.bdm.insee.fr/series/sdmx/data/SERIES_BDM"
insee_query = file.path(insee_link, paste0("010539365","?", "firstNObservations=1"))
data = get_insee(insee_query)
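The query link in the example above is plain string concatenation, which can be wrapped in a small helper when several parameters are involved. A minimal sketch in base R: build_insee_link is a hypothetical helper (not part of the package), joining several idbanks with "+" is an assumption about the service, and parameter values are pasted as-is without URL encoding.

```r
# Hypothetical helper (not part of the package): assemble an SDMX query link
build_insee_link <- function(idbank, params = list(),
                             base = "http://www.bdm.insee.fr/series/sdmx/data/SERIES_BDM") {
  # Assumption: several series keys are joined with "+"
  link <- file.path(base, paste(idbank, collapse = "+"))
  if (length(params) > 0) {
    # name=value pairs separated by "&"
    query <- paste0(names(params), "=", unlist(params), collapse = "&")
    link <- paste0(link, "?", query)
  }
  link
}

insee_query <- build_insee_link("010539365", list(firstNObservations = 1))
print(insee_query)
```

For a single idbank and the firstNObservations parameter, this produces the same string as the file.path and paste0 calls in the example.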
Get dataset from INSEE BDM database
Description
Get dataset from INSEE BDM database
Usage
get_insee_dataset(
dataset,
startPeriod = NULL,
endPeriod = NULL,
firstNObservations = NULL,
lastNObservations = NULL,
includeHistory = NULL,
updatedAfter = NULL,
filter = NULL
)
Arguments
dataset |
dataset name to be downloaded |
startPeriod |
start date of data |
endPeriod |
end date of data |
firstNObservations |
get the first N observations for each key series (idbank) |
lastNObservations |
get the last N observations for each key series (idbank) |
includeHistory |
boolean to access the previous releases (not available on all series) |
updatedAfter |
starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss) |
filter |
Use the filter to choose only some values in a dimension. It is recommended to use it for big datasets. A dimension left empty means all values are selected. To select multiple values in one dimension put a "+" between those values (see example) |
Details
Get dataset from INSEE BDM database
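The filter string described in the arguments has one slot per dimension, separated by ".", with "+" between multiple values and an empty slot meaning all values. The sketch below builds such a string from a list. The helper build_insee_filter is hypothetical (it is not part of the package), and the caller is assumed to know the number and order of the dataset's dimensions.

```r
# Hypothetical helper (not part of the package): one list element per
# dimension, in the dataset's dimension order; character(0) means "all values"
build_insee_filter <- function(dims) {
  slots <- vapply(dims, function(v) paste(v, collapse = "+"), character(1))
  paste(slots, collapse = ".")
}

all_values <- character(0)
dims <- c(
  list(c("M", "A")),          # first dimension: two values
  rep(list(all_values), 8),   # next eight dimensions left empty
  list("CVS"),                # tenth dimension: one value
  rep(list(all_values), 2)    # last two dimensions left empty
)

filter_string <- build_insee_filter(dims)
print(filter_string)
```

With twelve slots arranged this way, the result is the same string as the filter used in example 1 of this help page.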
Value
a tibble with the data
Examples
insee_dataset = get_dataset_list()
idbank_ipc = get_idbank_list("IPC-2015")
#example 1
data = get_insee_dataset("IPC-2015", filter = "M+A.........CVS..", startPeriod = "2015-03")
#example 2
data = get_insee_dataset("IPC-2015", filter = "A..SO...VARIATIONS_A....BRUT..SO",
                         includeHistory = TRUE, updatedAfter = "2017-07-11T08:45:00")
Get data from INSEE series idbank
Description
Get data from INSEE series idbank
Usage
get_insee_idbank(
...,
limit = TRUE,
startPeriod = NULL,
endPeriod = NULL,
firstNObservations = NULL,
lastNObservations = NULL,
includeHistory = NULL,
updatedAfter = NULL
)
Arguments
... |
one or several series key (idbank) |
limit |
by default, the function get_insee_idbank has a 1200-idbank limit. Set limit argument to FALSE to ignore the limit or modify the limit with the following command : Sys.setenv(INSEE_idbank_limit = 1200) |
startPeriod |
start date of data |
endPeriod |
end date of data |
firstNObservations |
get the first N observations for each key series (idbank) |
lastNObservations |
get the last N observations for each key series (idbank) |
includeHistory |
boolean to access the previous releases (not available on all series) |
updatedAfter |
starting point for querying the previous releases (format yyyy-mm-ddThh:mm:ss) |
Details
Get data from INSEE series idbanks. The user can disable the download display in the console with the following command : Sys.setenv(INSEE_download_verbose = "FALSE")
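The updatedAfter argument expects a timestamp written as yyyy-mm-ddThh:mm:ss, and the examples in this manual pass startPeriod as a year-month string such as "2015-03". A minimal base R sketch for producing both from date objects; the dates are the ones used in this manual's examples, and how the INSEE service interprets time zones is not covered here.

```r
# Use a fixed time zone so the formatted string does not depend on the local one
release_time <- as.POSIXct("2017-07-11 08:45:00", tz = "UTC")
updated_after <- format(release_time, "%Y-%m-%dT%H:%M:%S", tz = "UTC")
print(updated_after)

# Year-month string of the kind passed to startPeriod in the examples
start_period <- format(as.Date("2015-03-01"), "%Y-%m")
print(start_period)
```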
Value
a tibble with the data
Examples
#example 1 : import price index of industrial products and turnover index : manufacture of wood
data = get_insee_idbank("001558315", "010540726")
#example 2 : unemployment data
library(magrittr)
library(dplyr)
library(ggplot2)
df_idbank_list_selected =
  get_idbank_list("CHOMAGE-TRIM-NATIONAL") %>% # unemployment dataset
  filter(SEXE == 0) %>% # men and women
  add_insee_title()
idbank_list_selected = df_idbank_list_selected %>% pull(idbank)
unem = get_insee_idbank(idbank_list_selected)
#example 3 : French GDP growth rate
df_idbank_list_selected =
  get_idbank_list("CNT-2014-PIB-EQB-RF") %>% # Gross domestic product balance
  filter(FREQ == "T") %>% # quarter
  filter(OPERATION == "PIB") %>% # GDP
  filter(NATURE == "TAUX") %>% # rate
  filter(CORRECTION == "CVS-CJO") # SA-WDA, seasonally adjusted, working day adjusted
idbank = df_idbank_list_selected %>% pull(idbank)
data = get_insee_idbank(idbank) %>%
  add_insee_metadata()
#plot
ggplot(data, aes(x = DATE, y = OBS_VALUE)) +
  geom_col() +
  ggtitle("French GDP growth rate, quarter-on-quarter, sa-wda") +
  labs(subtitle = sprintf("Last updated : %s", data$TIME_PERIOD[1]))
Get title from INSEE series idbank
Description
Get title from INSEE series idbank
Usage
get_insee_title(..., lang = "en")
Arguments
... |
list of series key (idbank) |
lang |
language of the title, English by default; if lang is different from "en", the title is returned in French |
Details
Query the INSEE website to get series titles from series keys (idbank). A single query to the INSEE database can handle around 400 idbanks at most; if necessary, the idbank list is split into several lists of 400 idbanks each. Consequently, it is not advised to use this function on the whole idbank dataset: the user should filter the idbank dataset first.
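The batching described above can be pictured with a few lines of base R. This is only an illustration of cutting a vector into groups of at most 400 elements, not the package's internal code; the helper chunk_idbanks and the series keys are made up for the example.

```r
# Hypothetical helper: split a vector of idbanks into chunks of at most `size`
chunk_idbanks <- function(idbanks, size = 400) {
  groups <- ceiling(seq_along(idbanks) / size)
  unname(split(idbanks, groups))
}

# Made-up nine-digit keys, for illustration only
fake_idbanks <- sprintf("%09d", 1:1000)

chunks <- chunk_idbanks(fake_idbanks)
print(lengths(chunks))
```

A list of 1000 keys therefore ends up in three batches, which is one reason to filter the idbank dataset before asking for titles.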
Value
a character vector with the titles
Examples
#example 1 : industrial production index on manufacturing and industrial activities
title = get_insee_title("010537900")
#example 2 : automotive industry and overall industrial production
library(magrittr)
library(dplyr)
library(stringr)
idbank_list_selected =
  get_idbank_list("IPI-2015") %>% # industrial production index dataset
  filter(FREQ == "M") %>% # monthly
  filter(NATURE == "INDICE") %>% # index
  filter(CORRECTION == "CVS-CJO") %>% # Working day and seasonally adjusted SA-WDA
  filter(str_detect(NAF2,"^29$|A10-BE")) %>% # automotive industry and overall industrial production
  mutate(title = get_insee_title(idbank))
Search a pattern among insee datasets and idbanks
Description
Search a pattern among insee datasets and idbanks
Usage
search_insee(pattern = ".*")
Arguments
pattern |
string used to filter the dataset and idbank list |
Details
The data related to idbanks is stored internally in the package and might not be the most up to date. The function ignores accents and case.
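Accent- and case-insensitive matching can be approximated by normalizing both the pattern and the text before comparing them. The base R sketch below shows the idea. The helper normalize_text is hypothetical, it only covers a handful of lower-case French accented letters, and it is not claimed to be how search_insee works internally.

```r
# Hypothetical helper: lower-case a string, then replace a few accented
# French letters (written as Unicode escapes) with their plain equivalents
normalize_text <- function(x) {
  accented <- "\u00e0\u00e2\u00e7\u00e9\u00e8\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc"
  plain <- "aaceeeeiiouuu"
  chartr(accented, plain, tolower(x))
}

# Invented labels, for illustration only
labels <- c("Enqu\u00eate de conjoncture", "Produit int\u00e9rieur brut")

# Normalize both sides, then use an ordinary regular expression match
matches <- grepl(normalize_text("ENQUETE"), normalize_text(labels))
print(matches)
```

Only the first label matches, even though the pattern is upper case and carries no accent.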
Value
the dataset and idbank table filtered with the pattern
Examples
# example 1 : search one pattern, the accents do not matter
writeLines("the word 'enqu\U00EAte' (meaning survey in French) will match with 'enquete'")
dataset_enquete = search_insee("enquete")
# example 2 : search multiple patterns
dataset_survey_gdp = search_insee("Survey|gdp")
# example 3 : data about paris
data_paris = search_insee('paris')
# example 4 : all data
data_all = search_insee()
Split the title column in several columns
Description
Split the title column in several columns
Usage
split_title(df, title_col_name, pattern, n_split = "max", lang = NULL)
Arguments
df |
a dataframe containing a title column |
title_col_name |
the name of the column to be split, if missing it will be either TITLE_EN |
pattern |
the default value is stored in the package and it is advised to use it, but in some cases it is useful to supply one's own pattern |
n_split |
number of new columns, by default the maximum is chosen |
lang |
by default it returns both the French and the English title provided by INSEE |
Details
The number of separators in the official INSEE title can vary and is not normalized. Beware: the title columns created may not all be clean dimension labels.
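Because the number of separators varies from one title to another, splitting produces rows of unequal length that have to be padded with missing values. A minimal base R sketch of that behaviour: the helper split_titles, the " - " separator, the output column names and the titles themselves are assumptions made for illustration, not the pattern stored in the package.

```r
# Hypothetical helper: split titles on a fixed separator and pad with NA
split_titles <- function(titles, sep = " - ") {
  parts <- strsplit(titles, sep, fixed = TRUE)
  n_max <- max(lengths(parts))
  # Pad shorter titles so that every row has n_max pieces
  padded <- lapply(parts, function(p) c(p, rep(NA_character_, n_max - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("TITLE_EN", seq_len(n_max))
  out
}

# Invented titles with two and three parts
titles <- c("Employment - Construction - Quarterly", "Employment - Total")
title_columns <- split_titles(titles)
print(title_columns)
```

The second title has fewer parts, so its last column is NA, and the same column can hold different kinds of labels from one row to the next.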
Value
the same dataframe with the title column split
Examples
library(magrittr)
# quarterly payroll enrollment in the construction sector
data_raw = get_insee_idbank("001577236")
data = data_raw %>%
  split_title()