class: center, middle, inverse, title-slide

.title[
# The rOpenSpain project
]
.subtitle[
## R and open data
]
.author[
### Iñaki Úcar | Postdoctoral Fellow @ uc3m-Santander Big Data Institute
]
.date[
### June 9, 2022
]

---
class: base28

# Introduction

.left-column[
## When
]

.right-column[
It was **February 2018** when four people came together around open data:

- **Carlos J. Gil Bellosta**, statistical consultant, CEO @ circiteR
- **Luz Frías**, data developer, CTO @ circiteR
- **José Manuel Vera**, senior data scientist
- **Iñaki Úcar**, postdoctoral fellow @ IBiDat

### Motto

> rOpenSci is our form; Spanish public data, our matter
]

---
class: base28

# Introduction

.left-column[
## When
## What
]

.right-column[
The **github.com/rOpenSpain** organization

- Website (ropenspain.es)
- Onboarding info
- Templates (shout-outs to Diego Hernangómez)
- **Packages**

The **ropenspain.slack.com** channel

- Do not hesitate to contact us for an invitation!
]

---
class: base28

# Introduction

.left-column[
## When
## What
## How
]

.right-column[
Do you have a package about Spanish data?
<br>**Bring it to rOpenSpain!**

- The author transfers the repo to our GH organization
- The author retains full admin rights
- The package is added to the webpage and is available for installation through our [r-universe](https://ropenspain.r-universe.dev) organization
- We try to encourage R packaging best practices
- We try to help each other out

Do you need help? Get an invite to our Slack!
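
Installation from the r-universe can be sketched as follows. This is a minimal example, not an official recipe: `ropenspain_repos` and `install_from_ropenspain()` are hypothetical names introduced here for illustration, and the CRAN mirror URL is an assumption used to resolve dependencies.

```r
# Repositories to search: the rOpenSpain r-universe first,
# then a CRAN mirror (assumed) for dependencies hosted elsewhere.
ropenspain_repos <- c(
  ropenspain = "https://ropenspain.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
)

# Hypothetical helper (not part of any package): install `pkg`
# only when it is not already available locally.
install_from_ropenspain <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = ropenspain_repos)
  }
  invisible(pkg)
}
```

For instance, `install_from_ropenspain("mapSpain")` would fetch the package from the r-universe if it is missing.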
]

---

# Introduction

.left-column[
## When
## What
## How
## Overview
]

.right-column[
.pull-left[
### Statistical data

- istacbaser
- MicroDatosEs
- MorbiditySpainR
- Siane

### Maps

- mapSpain
- LAU2boundaries4spain
- CatastRo, CatastRoNav
- caRtociudad
]

.pull-right[
### Government

- BOE
- infoelectoral
- senadoRES
- opendataes

### Economy

- tidyBdE

### Climate

- climaemet
- airqualityES
]
]

---
class: inverse, center, middle

# Statistical data

---

# istacbaser <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- José Manuel Cazorla-Artiles
- **Christian González-Martel**

### Key features

- Retrieves all the data available in the **Canary Islands Statistics Institute** API.
- Supports searching and downloading data.
- Supports **grep-style search**.
- Supports **Most Recent Value** queries.
]

.pull-right[
### Data sources

- [Instituto Canario de Estadística](http://www.gobiernodecanarias.org/istac/) (ISTAC).

### Output formats

- **data.frame**, with optional conversion to **POSIXct** for dates.
]

---

# MicroDatosEs <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- **Carlos J. Gil Bellosta**
- Carlos Neira
- Diego Paniagua Sánchez
- Fiorella Mori Peláez
- Jorge López Pérez
- José Luis Cañadas Reche

### Key features

- Retrieves and processes microdata provided by Spanish statistical agencies.
]

.pull-right[
### Data sources

- [Instituto Nacional de Estadística](https://ine.es/) (INE).

### Output formats

- **data.frame**.

### Notes

- Currently covers the following datasets: **EPA** (Encuesta de Población Activa), **census** (2011), **EES** (Encuesta de Estructura Salarial), **mortality** (annual deaths in Spain; unfortunately, the public microdata does not include the cause of death), **EPF** (Encuesta de Presupuestos Familiares), **padrón**.
]

---

# MorbiditySpainR <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Rafael Menéndez**

### Key features

- Retrieves and processes morbidity microdata provided by Spanish statistical agencies.
- Provides functions for basic manipulation (filtering, extracting diagnoses, reducing data and computing prevalences).
]

.pull-right[
### Data sources

- [Encuesta de morbilidad hospitalaria](https://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736176778&menu=resultados&secc=1254736195291&idp=1254735573175) from the [Instituto Nacional de Estadística](https://ine.es/) (INE).

### Output formats

- **data.frame**.
]

---

# Siane <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- Carlos J. Gil Bellosta
- **Nuno Carvalho**

### Key features

- Find maps in the (pre-downloaded) Siane repository, i.e. searching by map year or administrative level using `siane_map()`.
- Bind numerical data to polygons using `siane_merge()`.
- Compatible with IGN maps and INE data.
]

.pull-right[
### Data sources

- [Instituto Geográfico Nacional](https://www.ign.es/) (IGN).
- [Instituto Nacional de Estadística](https://ine.es/) (INE).

### Output formats

- **raster** objects.
- **data.frame**.
]

---
class: inverse, center, middle

# Maps

---

# mapSpain <img src="https://ropenspain.github.io/mapSpain/logo.png" alt="mapSpain-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- **Diego Hernangómez**

### Key features

- **Easy mapping of boundaries of Spain** (nation-wide, autonomous communities, provinces, municipalities).
- Use of **WMS/WMTS image tiles** (Google Maps-like), provided by Spanish public bodies, on both static and interactive maps (with {`leaflet`}).
- Translates names of autonomous communities and provinces across languages (Spanish, English, Catalan, ...) and into standardized codes (ISO, NUTS, INE...).
]

.pull-right[
### Data sources

- [GISCO](https://ec.europa.eu/eurostat/web/gisco) (Eurostat).
- [Instituto Geográfico Nacional](https://www.ign.es/) (IGN).
- For tiles: public bodies (<https://www.idee.es/web/idee/segun-tipo-de-servicio>).

### Output formats

- **sf** for vectors (such as boundaries, roads, etc.).
- **SpatRaster** ({`terra`}) for static tiles.
]

---

# mapSpain <img src="https://ropenspain.github.io/mapSpain/logo.png" alt="mapSpain-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Quick demo

```r
library(mapSpain)
library(ggplot2)

*galicia <- esp_get_munic_siane(region = "Galicia") |>
  transform(
*   Provincia = esp_dict_translate(
      ine.prov.name, "es"
    )
  )

ggplot(galicia) +
  geom_sf(aes(fill = Provincia), color = "grey70") +
  labs(title = "Provincias de Galicia") +
  scale_fill_discrete(
    type = hcl.colors(4, "Blues")
  ) +
  theme_bw()
```
]

.pull-right[
<img src="index_files/figure-html/mapspain-plot-1.png" width="100%" />
]

---

# LAU2boundaries4spain <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- Francisco Goerlich
- **Pedro J. Pérez**

### Key features

- Datasets of historical municipality boundaries from 2002 to 2021.
]

.pull-right[
### Data sources

- [Instituto Geográfico Nacional](https://www.ign.es/) (IGN).

### Output formats

- **sf** data frames.
]

---

# CatastRo <img src="https://ropenspain.github.io/CatastRo/logo.png" alt="catastro-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- Ángel Delgado Panadero
- Iñaki Úcar
- **Diego Hernangómez**

### Key features

- Takes advantage of the **INSPIRE Directive**.
- Retrieves cadastral **spatial data** of **buildings**, **parcels** and specific **cadastral references**.
- Retrieves data **by bounding box** (WFS service) or **by municipality** (ATOM service).
- Retrieves imagery via the tiles available from the Cadastre.
]

.pull-right[
### Data sources

- [Cadastre of Spain](https://www.catastro.minhap.es/webinspire/index.html).
- **Includes neither Navarre nor the Basque Country**, as they have their own cadastral offices (see **CatastRoNav** for Navarre).

### Output formats

- **sf** for vectors (buildings, parcels, etc.).
- **SpatRaster** (`terra`) for static tiles.
]

---

# CatastRo <img src="https://ropenspain.github.io/CatastRo/logo.png" alt="catastro-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Quick demo

```r
library(CatastRo)
library(ggplot2)

*burgo_osma <- catr_atom_get_buildings(
  "Burgo de Osma"
) |>
  sf::st_transform(4326)

ggplot(burgo_osma) +
  geom_sf(aes(fill = currentUse), col = NA) +
  scale_fill_viridis_d(na.translate = FALSE) +
  theme_minimal() +
  coord_sf(
    xlim = c(-3.0752, -3.0679),
    ylim = c(41.5831, 41.5884)
  ) +
  labs(
    title = "El Burgo de Osma, Soria",
    fill = "Use of the building"
  )
```
]

.pull-right[
<img src="index_files/figure-html/catastro-plot-1.png" width="100%" />
]

---

# CatastRoNav <img src="https://ropenspain.github.io/CatastRoNav/logo.png" alt="catastronav-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Diego Hernangómez**

### Key features

- Takes advantage of the **INSPIRE Directive**.
- Retrieves cadastral **spatial data** of **buildings**, **parcels** and specific **cadastral references**.
- Retrieves data **by bounding box** (WFS service).
]

.pull-right[
### Data sources

- [Cadastre of Navarre](https://idena.navarra.es/portal/servicios?lang=en).

### Output formats

- **sf** objects.

### Notes

- The service provided by the **Cadastre of Navarre is more limited** than the one provided by the Spanish Cadastre (see the **CatastRo** package).
]

---

# CatastRoNav <img src="https://ropenspain.github.io/CatastRoNav/logo.png" alt="catastronav-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Quick demo

```r
library(CatastRoNav)
library(ggplot2)

olite <- c(-1.646812, 42.814528, -1.638036, 42.820320)
*olite_bu <- catrnav_wfs_get_buildings_bbox(
  olite, srs = 4326
)

ggplot(olite_bu) +
  geom_sf(aes(fill = value), color = NA) +
  scale_fill_viridis_b(
    show.limits = TRUE,
    breaks = seq(0, 30, 5)
  ) +
  theme_minimal() +
  labs(
    title = "Olite, Navarre",
    subtitle = "Height of buildings",
    fill = "meters"
  )
```
]

.pull-right[
<img src="index_files/figure-html/catastronav-plot-1.png" width="100%" />
]

---

# caRtociudad <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Carlos J. Gil Bellosta**
- Luz Frías

### Key features

- Access to the **CartoCiudad API**, which provides mapping and other related services for Spain.
- Services: geocoding, reverse geocoding, routes, maps...
- **Unlimited and free** (no quota limits, no registration procedures).
]

.pull-right[
### Data sources

- [CartoCiudad](https://www.cartociudad.es/web/portal).

### Output formats

- **data.frame**.
- **ggmap**-compatible **raster** objects.
]

---

# caRtociudad <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Quick demo

```r
library(caRtociudad)

*soria <- cartociudad_geocode("ayuntamiento soria")
*soria_map <- cartociudad_get_map(
  c(soria$lat, soria$lng), 0.3)

ggmap::ggmap(soria_map)
```
]

.pull-right[
<img src="index_files/figure-html/caRtociudad-plot-1.png" width="100%" />
]

---
class: inverse, center, middle

# Government

---

# BOE <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Lluís Revilla Sancho**

### Key features

- Retrieve data from **Boletín Oficial del Estado** (BOE).
- Retrieve data from **Boletín Oficial del Registro Mercantil del Estado** (BORME).
- Main function `retrieve_sumario()` to retrieve summaries by date.
- Additional functions to obtain URLs and download publications.
]

.pull-right[
### Data sources

- [Agencia Estatal Boletín Oficial del Estado](https://boe.es/).

### Output formats

- **data.frame**.
- XML document.

### Notes

- See <https://llrs.github.io/BOE_historico> for a detailed analysis.
]

---

# BOE <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Quick demo

```r
library(BOE)
library(ggplot2)

*sumario <- retrieve_sumario(as.Date("2022-05-06")) |>
  transform(dpt = sub("MINISTERIO", "M.", departament)) |>
  transform(dpt = stringr::str_trunc(dpt, 20))

ggplot(sumario) +
  aes(forcats::fct_infreq(dpt)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Publicaciones por departamento",
    subtitle = "BOE del 6 de mayo de 2022",
    caption = "Fuente: BOE",
    x = NULL
  )
```
]

.pull-right[
<img src="index_files/figure-html/boe-plot-1.png" width="100%" />
]

---

# infoelectoral <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Héctor Meleiro**

### Key features

- Retrieve election data at municipality level.
- Retrieve election data at polling station level.
- Retrieve election candidates data.
- Provides datasets with administrative codes for autonomous communities, provinces, and municipalities, as well as median income data for census tracts.
]

.pull-right[
### Data sources

- [Ministerio del Interior](https://infoelectoral.interior.gob.es/opencms/es/inicio/).

### Output formats

- **data.frame**.
]

---

# infoelectoral <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Quick demo

```r
library(infoelectoral)
library(dplyr)
library(ggplot2)

*df <- municipios("congreso", anno="1982", mes="10") |>
  group_by(siglas) |>
  summarise(votos = sum(votos)) |>
  mutate(seats = round(votos / sum(votos) * 350)) |>
  filter(seats >= 10)

df <- ggparliament::parliament_data(
  df, type="semicircle", 7, df$seats)

ggplot(df) +
  aes(x, y, colour = siglas) +
  ggparliament::geom_parliament_seats() +
  ggparliament::theme_ggparliament() +
  scale_color_manual(values=c(
    "#3399FF", "#3399FF", "#009900", "#0000EB",
    "#F10000", "#F10000", "#F10000", "#FFA500"
  )) +
  theme(legend.position = 'bottom')
```
]

.pull-right[
<img src="index_files/figure-html/infoelectoral-plot-1.png" width="100%" />
]

---

# senadoRES <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Lluís Revilla Sancho**

### Key features

- Retrieve senators data since 1977.
- Retrieve summaries, commissions, documents, initiatives...
]

.pull-right[
### Data sources

- [Senado](https://www.senado.es/web/relacionesciudadanos/datosabiertos/catalogodatos/index.html).

### Output formats

- **data.frame**.
]

---

# senadoRES <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Quick demo

```r
library(senadoRES)
library(dplyr)
library(ggplot2)

*df <- senadores() |>
  group_by(legislatura) |>
  count(sex) |>
  filter(!is.na(sex)) |>
  mutate(ratio = n/sum(n)) |>
  filter(sex != "male")

ggplot(df) +
  aes(legislatura, ratio) +
  geom_line() +
  geom_hline(yintercept=0.5, linetype=2, col="red") +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1)) +
  theme_bw() +
  labs(
    title = "Ratio of women",
    x = "Legislatura", y = "% of women"
  )
```
]

.pull-right[
<img src="index_files/figure-html/senadoRES-plot-1.png" width="100%" />
]

---

# opendataes <span style="font-size: 50%;">(on GitHub)</span>

.pull-left[
### Authors

- **Jorge Cimentada**
- Jorge López

### Key features

- Retrieve data from **datos.gob.es**, the open-data initiative from the Spanish Government.
- Currently supports the CSV format and 11 publishers (see `publishers_available`).
- The identifier of a web-based search can be directly provided to `openes_load()`.
- R-based search via `openes_keywords()`.
]

.pull-right[
### Data sources

- [datos.gob.es](https://datos.gob.es/).

### Output formats

- An object with `metadata` and `data`, both as **tibble**.
]

---
class: inverse, center, middle

# Economy

---

# tidyBdE <img src="https://ropenspain.github.io/tidyBdE/logo.png" alt="tidyBdE-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- **Diego Hernangómez**

### Key features

- API package that helps to retrieve data from Banco de España. **~14,000** time series available. Specific series can be searched by keyword.
- Includes **macroeconomic data** from the Statistical Bulletin, **key summary indicators, exchange rates and interest rates**.
- Helper functions to retrieve some of the most relevant indicators via `bde_ind_*` functions.
- Specific color palettes and theme for {`ggplot2`}.
]

.pull-right[
### Data sources

- [Bank of Spain](https://www.bde.es/webbde/en/estadis/infoest/descarga_series_temporales.html) time-series bulk data download. This also includes data from the ECB, INE, Eurostat, etc.

### Output formats

- **tibble**, with dates and numbers formatted to base **R** specification (i.e. `2.000,32` with decimal comma is converted to `2000.32`).
]

---

# tidyBdE <img src="https://ropenspain.github.io/tidyBdE/logo.png" alt="tidyBdE-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Quick demo

```r
library(tidyBdE)
library(ggplot2)

*euribor_month <- bde_ind_euribor_12m_monthly() |>
  subset(Date > "2010-01-01")

ggplot(euribor_month) +
  aes(Date, Euribor_12M_Monthly) +
* geom_line(colour = bde_vivid_pal()(1)) +
  scale_y_continuous(
    labels = scales::number_format(suffix = "%")
  ) +
* theme_bde() +
  labs(
    title = "Euribor 12 months",
    subtitle = "Monthly data",
    caption = "Source: BdE"
  )
```
]

.pull-right[
<img src="index_files/figure-html/tidybde-plot-1.png" width="100%" />
]

---
class: inverse, center, middle

# Climate

---

# climaemet <img src="https://ropenspain.github.io/climaemet/logo.png" alt="catastro-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- Manuel Pizarro
- **Diego Hernangómez**
- Gema Fernández-Avilés

### Key features

- Retrieve climatic information recorded by AEMET stations (wind speed, temperature, air pressure...).
- Optional spatial information in **sf** format.
- Create scientific graphs (climate charts, trend analysis of climate time series, temperature and precipitation anomalies maps, “warming stripes” graphics, climatograms, etc.).
]

.pull-right[
### Data sources

- [Agencia Estatal de Meteorología](https://opendata.aemet.es/centrodedescargas/inicio).

### Output formats

- Formatted **tibble** for compatibility with **tidyverse**.
- Dates and numbers are formatted properly.
- Geo-tagged points (**sf**) using the option `return_sf = TRUE`.

### Notes

- (Free) API key required ([go get it](https://opendata.aemet.es/centrodedescargas/obtencionAPIKey)). It can be stored as an environment variable in R using `aemet_api_key(..., install=TRUE)`.
]

---

# climaemet <img src="https://ropenspain.github.io/climaemet/logo.png" alt="catastro-logo" height="70" style="margin-top: -10px;vertical-align: middle;"> <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Quick demo

```r
library(climaemet)
library(ggplot2)

*temp2020 <- aemet_daily_period(
  "8416", start = 2020, end = 2020
)

ggplot(temp2020) +
  geom_col(aes(fecha, tmed, fill = tmed)) +
  scale_fill_gradientn(
    colours = hcl.colors(20, "RdBu", rev = TRUE),
    labels = scales::label_number(suffix = "º")
  ) +
  guides(fill = guide_colorsteps()) +
  theme_minimal() +
  labs(
    title = "Valencia, Spain",
    subtitle = "AEMET Station Id: 8416",
    fill = "Avg. daily temp.",
    y = "Celsius degrees",
    x = "date"
  )
```
]

.pull-right[
<img src="index_files/figure-html/climaemet-plot-1.png" width="100%" />
]

---

# airqualityES <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Authors

- **Jose V. Die**
- Jose R. Caro

### Key features

- Dataset of **air quality measurements** in Spain from 2001 to 2018.
- Measurements of several pollutants: As, B(a)P, Cd, Ni, Pb, PM10, PM2.5.
- Dataset of measurement stations.
]

.pull-right[
### Data sources

- [Ministerio para la Transición Ecológica y el Reto Demográfico](https://www.miteco.gob.es/es/calidad-y-evaluacion-ambiental/temas/atmosfera-y-calidad-del-aire/calidad-del-aire/evaluacion-datos/datos/Default.aspx).

### Output formats

- **tibble**.
]

---

# airqualityES <span style="font-size: 50%;">(on CRAN)</span>

.pull-left[
### Quick demo

```r
library(dplyr)
library(ggplot2)

*selection <- airqualityES::stations |>
  filter(station_name == "Barcelona (Sants)")

*df <- airqualityES::airqES |>
  filter(grepl(selection$id, station_id)) |>
  filter(pollutant == "PM10") |>
  mutate(pm10 = rowMeans(
    across(starts_with("D")), na.rm=TRUE)) |>
  mutate(date = as.Date(paste(year, month, 1, sep="-")))

ggplot(df) +
  aes(date, pm10) +
  geom_line() + geom_smooth() +
  theme_bw() +
  labs(
    title = "Monthly average of PM10",
    subtitle = "Measurement at Barcelona (Sants)"
  )
```
]

.pull-right[
<img src="index_files/figure-html/airqualityES-plot-1.png" width="100%" />
]

---
class: base24

# Join rOpenSpain!

- Much more could be done about the retrieval of statistical data
  - In particular, we still lack an **"inebaser"** similar to istacbaser
- Mapping needs are pretty much covered, but improvements can always be made
  - We still lack cadastral access to the Basque Country
  - Others would require a transition to the new spatial stack (based on **sf**)
- **opendataes** needs your help!
  - It does a great job setting the framework required to work with datos.gob.es
  - There are many publishers and formats that could be added
- Do you have other ideas? We'd love to hear you out!

---
class: center, middle, end-slide

# Thanks!

Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan) with the [**rosxaringan**](https://github.com/rOpenSpain/rosxaringan) template.

The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](https://yihui.org/knitr/), and [R Markdown](https://rmarkdown.rstudio.com).