
The rOpenSpain project

R and open data

Iñaki Úcar | Postdoctoral Fellow @ uc3m-Santander Big Data Institute

June 9, 2022

1

Introduction

When

It was February 2018 when four people came together around open data:

  • Carlos J. Gil Bellosta, statistical consultant, CEO @ circiteR
  • Luz Frías, data developer, CTO @ circiteR
  • José Manuel Vera, senior data scientist
  • Iñaki Úcar, postdoctoral fellow @ IBiDat

Motto

rOpenSci is our form; Spanish public data, our matter

2

What

The github.com/rOpenSpain organization

  • Website (ropenspain.es)
  • Onboarding info
  • Templates (shout-outs to Diego Hernangómez)
  • Packages

The ropenspain.slack.com channel

  • Do not hesitate to contact us for an invitation!
3

How

Do you have a package about Spanish data?
Bring it to rOpenSpain!

  • The author transfers the repo to our GH organization
  • The author retains full admin rights
  • The package is added to the webpage and is available for installation through our r-universe organization
  • We try to encourage R packaging best practices
  • We try to help each other out
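
Installing from the r-universe repository mentioned above could look roughly like this. The repository URL `https://ropenspain.r-universe.dev` and the helper name `install_from_ropenspain()` are assumptions made for illustration and do not come from the slides; CRAN is kept as a fallback for dependencies.

```r
# Repositories to search: the rOpenSpain r-universe first, then CRAN.
# The r-universe URL is an assumption based on the organization name.
ropenspain_repos <- c(
  rOpenSpain = "https://ropenspain.r-universe.dev",
  CRAN = "https://cloud.r-project.org"
)

# Hypothetical helper: install one or more packages from those repositories.
install_from_ropenspain <- function(pkgs) {
  install.packages(pkgs, repos = ropenspain_repos)
}

# Usage (requires network access):
# install_from_ropenspain("mapSpain")
```

Listing the r-universe repository first means development builds are found there, while dependencies that are missing from it are still resolved from CRAN.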

Do you need help? Get an invite to our Slack!

4

Overview

Statistical data

  • istacbaser
  • MicroDatosEs
  • MorbiditySpainR
  • Siane

Maps

  • mapSpain
  • LAU2boundaries4spain
  • CatastRo, CatastRoNav
  • caRtociudad

Government

  • BOE
  • infoelectoral
  • senadoRES
  • opendataes

Economy

  • tidyBdE

Climate

  • climaemet
  • airqualityES
5

Statistical data

6

istacbaser (on GitHub)

Authors

  • José Manuel Cazorla-Artiles
  • Christian González-Martel

Key features

  • Retrieve all the data available in the Canary Islands Statistics Institute API.
  • Supports searching and downloading data.
  • Supports grep-style search.
  • Supports Most Recent Value queries.

Data sources

Output formats

  • data.frame, with optional conversion to POSIXct for dates.
7

MicroDatosEs (on CRAN)

Authors

  • Carlos J. Gil Bellosta
  • Carlos Neira
  • Diego Paniagua Sánchez
  • Fiorella Mori Peláez
  • Jorge López Pérez
  • José Luis Cañadas Reche

Key features

  • Retrieve and process microdata provided by Spanish statistical agencies.

Data sources

Output formats

  • data.frame.

Notes

  • Currently covers the following datasets: EPA (Encuesta de Población Activa, the Economically Active Population Survey), census (2011), EES (Encuesta de Estructura Salarial, the Wage Structure Survey), mortality (annual deaths in Spain; unfortunately, the public microdata do not include the cause of death), EPF (Encuesta de Presupuestos Familiares, the Household Budget Survey), and padrón (the municipal register).
8

MorbiditySpainR (on GitHub)

Authors

  • Rafael Menéndez

Key features

  • Retrieve and process morbidity microdata provided by Spanish statistical agencies.
  • Provides functions for basic manipulation (filtering, extracting diagnoses, reducing data, and computing prevalences).

Data sources

Output formats

  • data.frame.
9

Siane (on GitHub)

Authors

  • Carlos J. Gil Bellosta
  • Nuno Carvalho

Key features

  • Find maps in the (pre-downloaded) Siane repository, e.g. by map year or administrative level, using siane_map().
  • Bind numerical data to polygons using siane_merge().
  • Compatible with IGN maps and INE data.

Data sources

Output formats

  • raster objects.
  • data.frame.
10

mapSpain (on CRAN)

Authors

  • Diego Hernangómez

Key features

  • Easy mapping of boundaries of Spain (nation-wide, autonomous communities, provinces, municipalities).
  • Use of WMS/WMTS image tiles (Google Maps-like) provided by Spanish public bodies, on both static and interactive maps (the latter with {leaflet}).
  • Translates names of autonomous communities and provinces across languages (Spanish, English, Catalan, ...) and into standardized codes (ISO, NUTS, INE...).
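
The name and code translation described in the last bullet can be sketched as below. This is a minimal illustration that assumes the installed version of {mapSpain} exports `esp_dict_translate()` and `esp_dict_region_code()` with the arguments shown; the input names are arbitrary examples.

```r
library(mapSpain)

# Region names in mixed spellings
regions <- c("Galicia", "Cataluna", "Pais Vasco")

# Translate the names into English and Catalan
names_en <- esp_dict_translate(regions, lang = "en")
names_ca <- esp_dict_translate(regions, lang = "ca")

# Convert the same names into NUTS and ISO 3166-2 codes
codes_nuts <- esp_dict_region_code(regions, origin = "text", destination = "nuts")
codes_iso <- esp_dict_region_code(regions, origin = "text", destination = "iso2")

# Side-by-side view of names and codes
data.frame(regions, names_en, names_ca, codes_nuts, codes_iso)
```

Standardized codes of this kind are what makes it practical to join map geometries with tables from other sources that spell region names differently.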

Data sources

Output formats

  • sf for vectors (as boundaries, roads, etc.).
  • SpatRaster ({terra}) for static tiles.
12

mapSpain (on CRAN)

Quick demo

library(mapSpain)
library(ggplot2)

# Municipalities of Galicia, with province names translated into Spanish
galicia <- esp_get_munic_siane(region = "Galicia") |>
  transform(
    Provincia = esp_dict_translate(
      ine.prov.name, "es"
    )
  )

# Municipalities filled by province
ggplot(galicia) +
  geom_sf(aes(fill = Provincia), color = "grey70") +
  labs(title = "Provincias de Galicia") +
  scale_fill_discrete(
    type = hcl.colors(4, "Blues")
  ) +
  theme_bw()

13

LAU2boundaries4spain (on GitHub)

Authors

  • Francisco Goerlich
  • Pedro J. Pérez

Key features

  • Datasets of historical municipality boundaries from 2002 to 2021.

Data sources

Output formats

  • sf data frames.
14

CatastRo (on CRAN)

Authors

  • Ángel Delgado Panadero
  • Iñaki Úcar
  • Diego Hernangómez

Key features

  • Takes advantage of the INSPIRE Directive.
  • Retrieves cadastral spatial data of buildings, parcels and specific cadastral references.
  • Retrieves data by bounding box (WFS service) or by municipality (ATOM service).
  • Retrieves imagery via the tiles available from the Cadastre.

Data sources

  • Cadastre of Spain.
  • Includes neither Navarre nor the Basque Country, as they have their own cadastral offices (see CatastRoNav for Navarre).

Output formats

  • sf for vectors (buildings, parcels, etc.).
  • SpatRaster (terra) for static tiles.
15

CatastRo (on CRAN)

Quick demo

library(CatastRo)
library(ggplot2)

# Buildings of the municipality (ATOM service), reprojected to EPSG:4326
burgo_osma <- catr_atom_get_buildings(
  "Burgo de Osma"
) |> sf::st_transform(4326)

# Buildings filled by current use, cropped to a small bounding box
ggplot(burgo_osma) +
  geom_sf(aes(fill = currentUse), col = NA) +
  scale_fill_viridis_d(na.translate = FALSE) +
  theme_minimal() +
  coord_sf(
    xlim = c(-3.0752, -3.0679),
    ylim = c(41.5831, 41.5884)
  ) +
  labs(
    title = "El Burgo de Osma, Soria",
    fill = "Use of the building"
  )

16

CatastRoNav (on GitHub)

Authors

  • Diego Hernangómez

Key features

  • Takes advantage of INSPIRE Directive.
  • Retrieves cadastral spatial data of buildings, parcels and specific cadastral references.
  • Get data by bounding box (WFS service).

Data sources

Output formats

  • sf objects.

Notes

  • The service provided by the Cadastre of Navarre is more limited than that provided by the Spanish Cadastre (see the CatastRo package).
17

CatastRoNav (on GitHub)

Quick demo

library(CatastRoNav)
library(ggplot2)

# Bounding box around Olite (xmin, ymin, xmax, ymax) in EPSG:4326
olite <- c(-1.646812, 42.814528, -1.638036, 42.820320)
olite_bu <- catrnav_wfs_get_buildings_bbox(
  olite,
  srs = 4326
)

# Buildings filled by height, binned in 5-unit steps
ggplot(olite_bu) +
  geom_sf(aes(fill = value), color = NA) +
  scale_fill_viridis_b(
    show.limits = TRUE,
    breaks = seq(0, 30, 5)
  ) +
  theme_minimal() +
  labs(
    title = "Olite, Navarre",
    subtitle = "Height of buildings",
    fill = "meters"
  )

18

caRtociudad (on GitHub)

Authors

  • Carlos J. Gil Bellosta
  • Luz Frías

Key features

  • Access to CartoCiudad API, which provides mapping and other related services for Spain.
  • Services: geocoding, reverse geocoding, routes, maps...
  • Unlimited and free (no quota limits, no registration procedures).

Data sources

Output formats

  • data.frame.
  • ggmap-compatible raster objects.
19

caRtociudad (on GitHub)

Quick demo

library(caRtociudad)

# Geocode the town hall of Soria, then fetch a map around that point
soria <- cartociudad_geocode("ayuntamiento soria")
soria_map <- cartociudad_get_map(
  c(soria$lat, soria$lng), 0.3
)
ggmap::ggmap(soria_map)

20

Government

21

BOE (on GitHub)

Authors

  • Lluís Revilla Sancho

Key features

  • Retrieve data from Boletín Oficial del Estado (BOE).
  • Retrieve data from Boletín Oficial del Registro Mercantil del Estado (BORME).
  • Main function retrieve_sumario() to retrieve summaries by date.
  • Additional functions to obtain URLs and download publications.

Data sources

Output formats

  • data.frame.
  • XML document.

Notes

22

BOE (on GitHub)

Quick demo

library(BOE)
library(ggplot2)

# Summary of the BOE issue of 6 May 2022, with shortened department names
sumario <- retrieve_sumario(as.Date("2022-05-06")) |>
  transform(dpt = sub("MINISTERIO", "M.", departament)) |>
  transform(dpt = stringr::str_trunc(dpt, 20))

# Number of publications per department, ordered by frequency
ggplot(sumario) +
  aes(forcats::fct_infreq(dpt)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  labs(
    title = "Publicaciones por departamento",
    subtitle = "BOE del 6 de mayo de 2022",
    caption = "Fuente: BOE",
    x = NULL
  )

23

infoelectoral (on GitHub)

Authors

  • Héctor Meleiro

Key features

  • Retrieve election data at municipality level.
  • Retrieve election data at polling station level.
  • Retrieve election candidates data.
  • Provides datasets with administrative codes for autonomous communities, provinces, and municipalities, as well as median income data for census tracts.

Data sources

Output formats

  • data.frame.
24

infoelectoral (on GitHub)

Quick demo

library(infoelectoral)
library(dplyr)
library(ggplot2)

# Votes per party in the October 1982 general election, with a rough
# proportional split of 350 seats (parties with at least 10 seats only)
df <- municipios("congreso", anno = "1982", mes = "10") |>
  group_by(siglas) |>
  summarise(votos = sum(votos)) |>
  mutate(seats = round(votos / sum(votos) * 350)) |>
  filter(seats >= 10)

# One row per seat, with coordinates on a 7-row semicircle
df <- ggparliament::parliament_data(
  df, type = "semicircle", parl_rows = 7, party_seats = df$seats
)

ggplot(df) +
  aes(x, y, colour = siglas) +
  ggparliament::geom_parliament_seats() +
  ggparliament::theme_ggparliament() +
  scale_color_manual(values = c(
    "#3399FF", "#3399FF", "#009900", "#0000EB",
    "#F10000", "#F10000", "#F10000", "#FFA500"
  )) +
  theme(legend.position = "bottom")

25

senadoRES (on GitHub)

Authors

  • Lluís Revilla Sancho

Key features

  • Retrieve senators data since 1977.
  • Retrieve summaries, commissions, documents, initiatives...

Data sources

Output formats

  • data.frame.
26

senadoRES (on GitHub)

Quick demo

library(senadoRES)
library(dplyr)
library(ggplot2)

# Share of women among senators in each legislature
df <- senadores() |>
  group_by(legislatura) |>
  count(sex) |>
  filter(!is.na(sex)) |>
  mutate(ratio = n / sum(n)) |>
  filter(sex != "male")

# Evolution of the share, with a dashed reference line at 50%
ggplot(df) +
  aes(legislatura, ratio) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = 2, col = "red") +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1)
  ) +
  theme_bw() +
  labs(
    title = "Ratio of women",
    x = "Legislatura",
    y = "% of women"
  )

27

opendataes (on GitHub)

Authors

  • Jorge Cimentada
  • Jorge López

Key features

  • Retrieve data from datos.gob.es, the open-data initiative from the Spanish Government.
  • Currently supports the CSV format and 11 publishers (see publishers_available).
  • The identifier of a web-based search can be directly provided to openes_load().
  • R-based search via openes_keywords().

Data sources

Output formats

  • An object with metadata and data, both as tibble.
28

Economy

29

tidyBdE (on CRAN)

Authors

  • Diego Hernangómez

Key features

  • API package that helps to retrieve data from Banco de España. About 14,000 time series are available. Specific series can be searched by keyword.
  • Includes macroeconomic data from the Statistical Bulletin, key summary indicators, exchange rates and interest rates.
  • Helper functions to retrieve some of the most relevant indicators via bde_ind_* functions.
  • Specific color palettes and theme for {ggplot2}.

Data sources

  • Bank of Spain time-series bulk data download. This also includes data from the ECB, INE, Eurostat, etc.

Output formats

  • tibble, with dates and numbers formatted to base R specification (i.e. 2.000,32, with a decimal comma, is converted to 2000.32).
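
To make the number conversion concrete, here is a minimal base R sketch of the same idea. The helper `parse_es_number()` is hypothetical and written only for illustration; it is not part of {tidyBdE}, which performs the conversion internally.

```r
# Parse numbers written with "." as thousands separator and "," as decimal mark.
# Hypothetical helper for illustration; not part of tidyBdE.
parse_es_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)  # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

parse_es_number(c("2.000,32", "-0,5", "14.000"))
```

The call returns the numeric values 2000.32, -0.5 and 14000, which base R can then use directly in arithmetic and plots.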
30

tidyBdE (on CRAN)

Quick demo

library(tidyBdE)
library(ggplot2)

# Monthly 12-month Euribor, observations after 1 January 2010
euribor_month <- bde_ind_euribor_12m_monthly() |>
  subset(Date > "2010-01-01")

# Line chart using the BdE palette and theme shipped with the package
ggplot(euribor_month) +
  aes(Date, Euribor_12M_Monthly) +
  geom_line(colour = bde_vivid_pal()(1)) +
  scale_y_continuous(
    labels = scales::number_format(suffix = "%")
  ) +
  theme_bde() +
  labs(
    title = "Euribor 12 months",
    subtitle = "Monthly data",
    caption = "Source: BdE"
  )

31

Climate

32

climaemet (on CRAN)

Authors

  • Manuel Pizarro
  • Diego Hernangómez
  • Gema Fernández-Avilés

Key features

  • Retrieve climatic information registered by the AEMET stations (wind speed, temperature, air pressure...).
  • Optional spatial information in sf format.
  • Create scientific graphs (climate charts, trend analysis of climate time series, temperature and precipitation anomalies maps, “warming stripes” graphics, climatograms, etc.).

Data sources

Output formats

  • Formatted tibble for compatibility with tidyverse.
  • Dates and numbers are formatted properly.
  • Geo-tagged points (sf) using the option return_sf = TRUE.

Notes

  • A (free) API key is required. It can be stored as an environment variable in R using aemet_api_key(..., install=TRUE).
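
Keeping a key in an environment variable, rather than in a script, can be illustrated in base R as follows. This is a generic sketch of the mechanism, not the internals of {climaemet}: the helper `get_api_key()` and the variable name `DEMO_API_KEY` are hypothetical.

```r
# Hypothetical helper: read an API key from an environment variable and
# fail with a clear message when it is not set.
get_api_key <- function(var) {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable ", var, call. = FALSE)
  }
  key
}

# Set a placeholder for the current session only (never hard-code a real key)
Sys.setenv(DEMO_API_KEY = "my-placeholder-key")
get_api_key("DEMO_API_KEY")
```

A value set with `Sys.setenv()` lasts only for the session; persisting it across sessions is typically done through a `.Renviron` file, which is the kind of convenience an `install = TRUE` option provides.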
33

climaemet (on CRAN)

Quick demo

library(climaemet)
library(ggplot2)

# Daily observations for 2020 at AEMET station 8416 (Valencia)
temp2020 <- aemet_daily_period(
  "8416",
  start = 2020, end = 2020
)

# Average daily temperature as bars colored by value
ggplot(temp2020) +
  geom_col(aes(fecha, tmed, fill = tmed)) +
  scale_fill_gradientn(
    colours = hcl.colors(20, "RdBu", rev = TRUE),
    labels = scales::label_number(suffix = "º")
  ) +
  guides(fill = guide_colorsteps()) +
  theme_minimal() +
  labs(
    title = "Valencia, Spain",
    subtitle = "AEMET Station Id: 8416",
    fill = "Avg. daily temp.",
    y = "Celsius degrees",
    x = "date"
  )

34

airqualityES (on CRAN)

Authors

  • Jose V. Die
  • Jose R. Caro

Key features

  • Dataset of air quality measurements in Spain from 2001 to 2018.
  • Measurements of several pollutants: As, B(a)P, Cd, Ni, Pb, PM10, PM2.5.
  • Dataset of measurement stations.
35

airqualityES (on CRAN)

Quick demo

library(dplyr)
library(ggplot2)

# Pick the station, then average the daily columns (names starting with "D")
# of its PM10 records into one value per month
selection <- airqualityES::stations |>
  filter(station_name == "Barcelona (Sants)")
df <- airqualityES::airqES |>
  filter(grepl(selection$id, station_id)) |>
  filter(pollutant == "PM10") |>
  mutate(pm10 = rowMeans(
    across(starts_with("D")), na.rm = TRUE
  )) |>
  mutate(date = as.Date(paste(year, month, 1, sep = "-")))

# Monthly series with a smoothed trend
ggplot(df) +
  aes(date, pm10) +
  geom_line() +
  geom_smooth() +
  theme_bw() +
  labs(
    title = "Monthly average of PM10",
    subtitle = "Measurement at Barcelona (Sants)"
  )

36

Join rOpenSpain!

  • Much more could be done about the retrieval of statistical data

    • In particular, we still lack an "inebaser" similar to istacbaser
  • Mapping needs are pretty much covered, but improvements can always be made

    • We still lack cadastral access to the Basque Country
    • Others would require a transition to the new spatial stack (based on sf)
  • opendataes needs your help!

    • It does a great job setting the framework required to work with datos.gob.es
    • There are many publishers and formats that could be added
  • Do you have other ideas? We'd love to hear you out!

37

Thanks!

Slides created via the R package xaringan with the rosxaringan template.

The chakra comes from remark.js, knitr, and R Markdown.

38
