---
title: "Tutorial"
output:
  prettydoc::html_pretty:
    toc: true
    theme: architect
    highlight: github
    includes:
    #  in_header: header.html
vignette: >
  %\VignetteIndexEntry{Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{kableExtra,magrittr,htmltools}
---

```{r setup,echo=FALSE, include=FALSE}
# setup chunk
# Sys.setenv("NOT_CRAN" = "TRUE")
NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")),"true")
knitr::opts_chunk$set(purl = NOT_CRAN)
library(insee)
library(dplyr)
library(magrittr)
library(stringr)

embed_png <- function(path, dpi = NULL) {
  meta <- attr(png::readPNG(path, native = TRUE, info = TRUE), "info")
  if (!is.null(dpi)) meta$dpi <- rep(dpi, 2)
  knitr::asis_output(paste0(
    ""
  ))}
```

```{r message=FALSE, warning=FALSE, include=FALSE}
library(kableExtra)
library(htmltools)
library(prettydoc)
```

# Introduction

The [insee](https://CRAN.R-project.org/package=insee) package gathers tools to easily download data and metadata from the INSEE BDM database. It uses SDMX queries under the hood. Have a look at the detailed SDMX webservice page on insee.fr. The first version of the package was published on CRAN on 2020-07-29.

## Proxy issues

**Requirement for INSEE employees**

To use [insee](https://CRAN.R-project.org/package=insee) from behind a proxy server, the system variables must be modified as follows.

```{r, message = FALSE, warning = FALSE, eval = FALSE}
Sys.setenv(http_proxy = "my_proxy_server")
Sys.setenv(https_proxy = "my_proxy_server")
```

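
A minimal sketch, using base R only, of how one might set these variables only when they are still empty and then check what the R session sees. The value `my_proxy_server` is the same placeholder as above, not a real address.

```r
# "my_proxy_server" is a placeholder; replace it with your own proxy address
proxy <- "my_proxy_server"

# Only set the variables that are currently empty, so that an
# existing configuration is not overwritten
for (var in c("http_proxy", "https_proxy")) {
  if (!nzchar(Sys.getenv(var))) {
    do.call(Sys.setenv, setNames(list(proxy), var))
  }
}

# Inspect the values seen by the current R session
Sys.getenv(c("http_proxy", "https_proxy"))
```
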
**Requirement for DG TRESOR employees**

To use [insee](https://CRAN.R-project.org/package=insee) from behind a proxy server, the system variables must be modified as follows.

```{r, message = FALSE, warning = FALSE, eval = FALSE}
Sys.setenv(INSEE_download_option_method = "mymethod")
Sys.setenv(INSEE_download_option_port = "1234")
Sys.setenv(INSEE_download_option_extra = "-U : --proxy-myprotocol --proxy myproxy:1234")
Sys.setenv(INSEE_download_option_proxy = "myproxy")
Sys.setenv(INSEE_download_option_auth = "myprotocol")
```

## Installation & Loading

You can easily install [insee](https://CRAN.R-project.org/package=insee) with the following code:

```{r, message = FALSE, warning = FALSE, eval = FALSE}
# Get the development version from GitHub
# install.packages("devtools")
devtools::install_github("InseeFr/R-Insee-Data")

# CRAN version
# install.packages("insee")

# load the library
library(insee)
```

# Functionalities

This section gives an overview of what you can do with [insee](https://CRAN.R-project.org/package=insee). Each series has two identifiers: the SDMX identifier and the so-called idbank. Both can be used to download data.

## Datasets List

The INSEE BDM database offers more than 200 datasets. The get_dataset_list() function returns the dataset catalogue:

```{r, message = FALSE, warning = FALSE, eval = FALSE}
insee_dataset = get_dataset_list()
```

```{r echo = FALSE, message = FALSE, warning = FALSE, eval = FALSE}
rownames(insee_dataset) <- NULL

insee_dataset %>%
  select(id, Name.en, Name.fr, url, n_series) %>%
  slice(1:10) %>%
  kable(row.names=NA) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
```

## Series Keys List

The INSEE BDM database currently offers more than 150,000 series. The get_idbank_list function returns the series catalogue for a given dataset name.

```{r, message=FALSE,warning=FALSE,eval=FALSE}
idbank_list = get_idbank_list('BALANCE-PAIEMENTS')
```

```{r echo=FALSE, message=FALSE, warning=FALSE,eval=FALSE}
idbank_list = get_idbank_list()
rownames(idbank_list) <- NULL

idbank_list %>%
  select(nomflow, idbank, cleFlow) %>%
  group_by(nomflow) %>%
  slice(1) %>%
  ungroup() %>%
  head(10) %>%
  kable(row.names=NA) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
```

## Find a series key

The best way to download data is to find the right series key (idbank), but how? In some cases it is not easy to tell series apart, especially for non-French speakers. To make the search easier, call the get_idbank_list function with a dataset name, then filter on columns such as FREQ, NATURE and UNIT.

Moreover, the [insee](https://CRAN.R-project.org/package=insee) package provides the function add_insee_title to get titles from idbanks, either in English or in French. It is not advisable to use this function on the whole idbank dataset: each SDMX query is limited to 400 idbanks, so add_insee_title splits the list into several lists of 400 idbanks each. The user should therefore filter the idbank dataset before calling the function, to limit this bottleneck as much as possible, as the following example shows.

After the data retrieval, the split_title function can be applied to the data frame to get more readable titles that are easy to use in plots, and add_insee_metadata adds the metadata to the data.

```{r message=FALSE, warning=FALSE,eval=FALSE}
idbank_list_selected =
  get_idbank_list("IPI-2015") %>% # industrial production index dataset
  filter(FREQ == "M") %>% # monthly
  filter(NATURE == "INDICE") %>% # index
  filter(CORRECTION == "CVS-CJO") %>% # working day and seasonally adjusted (SA-WDA)
  # automotive industry and overall industrial production
  filter(str_detect(NAF2,"^29$|A10-BE")) %>%
  add_insee_title()
```

Another way to find a series key is to perform a keyword-based search with the function search_insee. Beware that this function relies on data shipped inside the package, which might not be up to date. See the following examples:

```{r message=FALSE, warning=FALSE,eval=FALSE}
# search multiple patterns
dataset_survey_gdp = search_insee("Survey|gdp")

# data about paris
data_paris = search_insee('paris')

# all data
data_all = search_insee()
```

## Download data

### Download using a list of idbanks

The get_insee_idbank function handles up to 1200 idbanks by default. It is therefore advisable to narrow down the list of idbanks passed to the function. Otherwise, set the limit argument to FALSE to bypass this limit.

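
As a rough illustration of these limits, the base R sketch below uses made-up identifiers (not real idbanks) and does not call the insee package. It only assumes the 400-idbank query size and the 1200-idbank limit mentioned in this tutorial, and shows how a list of identifiers breaks down into groups of at most 400.

```r
# Synthetic 9-digit identifiers, for illustration only (not real idbanks)
idbanks <- sprintf("%09d", 1:1201)

# Size of one underlying SDMX query, as stated in this tutorial
chunk_size <- 400

# Split the vector into consecutive groups of at most 400 identifiers
chunks <- split(idbanks, ceiling(seq_along(idbanks) / chunk_size))

# Number of groups, which is also the number of queries that would be needed
n_queries <- length(chunks)

# Does the list exceed the 1200-idbank limit described above?
over_limit <- length(idbanks) > 1200

n_queries
over_limit
```

The same check can be applied to a real vector of idbanks before passing it to get_insee_idbank, to decide whether the list should be filtered further.
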
```{r message=FALSE, warning=FALSE,eval=FALSE}
library(insee)
library(dplyr) # provides %>%, filter, pull and slice

# the user can make a manual list of idbanks to get the data
# example 1
data = get_insee_idbank("001558315", "010540726") %>%
  add_insee_metadata()

# using a list of idbanks extracted from the insee idbank dataset
# example 2 : households' confidence survey
df_idbank =
  get_idbank_list("ENQ-CONJ-MENAGES") %>% # monthly households' confidence survey
  add_insee_title() %>%
  filter(CORRECTION == "CVS") # seasonally adjusted

list_idbank = df_idbank %>% pull(idbank)

data = get_insee_idbank(list_idbank) %>%
  split_title() %>%
  add_insee_metadata()

# example 3 : get more than 1200 idbanks
idbank_dataset = get_idbank_list()

df_idbank = idbank_dataset %>% slice(1:1201)

list_idbank = df_idbank %>% pull(idbank)

data = get_insee_idbank(list_idbank, firstNObservations = 1, limit = FALSE)
```

### Download using a dataset name

For some datasets, such as IPC-2015 (inflation), a filter is necessary.

```{r message=FALSE, warning=FALSE,eval=FALSE}
insee_dataset = get_dataset_list()

# example 1 : full dataset
data = get_insee_dataset("CLIMAT-AFFAIRES")

# example 2 : filtered dataset
# the user can filter the data
data = get_insee_dataset("IPC-2015", filter = "M+A.........CVS.", startPeriod = "2015-03")

# in the filter, the + selects several values in one dimension
# (here the series with value M and those with value A are both returned)
# an empty position means that all available values are kept

# example 3 : only one series
# by filtering with the full SDMX series key, the user will get only one series
data = get_insee_dataset("CNA-2014-CPEB",
                         filter = "A.CNA_CPEB.A38-CB.VAL.D39.VALEUR_ABSOLUE.FE.EUROS_COURANTS.BRUT",
                         lastNObservations = 10)
```

# Examples

* [GDP growth rate](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v2_gdp-vignettes.html)
* [Inflation](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v3_inflation-vignettes.html)
* [Unemployment rate](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v4_unem-vignettes.html)
* [Population by age](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v5_pop-vignettes.html)
* [Population map](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v6_pop_map-vignettes.html)
* [Deaths and Births](https://pyr-opendatafr.github.io/R-Insee-Data/articles/v7_death_birth-vignettes.html)

# Support

Feel free to open an issue on the GitHub repository with any question about this package.