Extract data and metadata from a given data set of https://datos.gob.es/

openes_load(x, encoding = "UTF-8", guess_encoding = TRUE, ...)

Arguments

x

A tibble given by openes_keywords containing only one dataset (1 row), or the end path of a dataset such as 'l01280148-seguridad-ciudadana-actuaciones-de-seccion-del-menor-en-educacion-vial-20141' from https://datos.gob.es/es/catalogo/l01280148-seguridad-ciudadana-actuaciones-de-seccion-del-menor-en-educacion-vial-20141.

encoding

The encoding passed to read (all) the files. Most cases should be resolved with either 'UTF-8', 'latin1' or 'ASCII'.

guess_encoding

A logical stating whether to guess the encoding. This is set to TRUE by default. Whenever guess_encoding is set to TRUE, the 'encoding' argument is used only as a fallback: if guess_encoding fails to guess the encoding, openes_load falls back to the encoding argument.

...

Arguments passed to read_csv and the other related read_* functions from readr. Internally, openes_load determines the delimiter of the file being read, but the read_* functions share practically the same arguments, so any argument passed here works regardless of which delimiter is detected.

Value

If x is a valid dataset path, a list with two slots: metadata and data. Each slot contains tibbles holding either the metadata or the data itself. If x is not a valid dataset path, it returns an empty list. See the details section for some caveats.

Details

openes_load can return two possible outcomes: either an empty list or a list with a slot called metadata and another slot called data. Whenever the x argument is an invalid dataset path, it will return an empty list. When x is a valid dataset path, openes_load will return a list with the two slots described above.
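
The two outcomes can be told apart before touching the result. The following is a minimal sketch in base R that uses mock objects instead of real calls to openes_load; the helper is_loaded is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): TRUE when `res` looks like
# a successful openes_load() result, FALSE for the empty list.
is_loaded <- function(res) {
  is.list(res) && length(res) > 0 &&
    all(c("metadata", "data") %in% names(res))
}

# Mock results standing in for real calls to openes_load()
empty_res <- list()
ok_res <- list(
  metadata = data.frame(description = "example"),
  data = list(data.frame(a = 1:3))
)

is_loaded(empty_res)  # FALSE
is_loaded(ok_res)     # TRUE
```

A check like this is useful in scripts that loop over many dataset paths, where some of them may be invalid.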

For the metadata slot, openes_load returns a tibble with most available metadata of the dataset. The columns are:

  • keywords: the available keywords from the dataset in the homepage of the dataset.

  • language: the available languages of the dataset's metadata. Note that this does not mean that the dataset itself is available in different languages, only its metadata.

  • description: a short description of the data being read.

  • url: the complete url of the dataset in https://datos.gob.es/. Note that this URL is not the access URL to the dataset but to the dataset's homepage in https://datos.gob.es/.

  • date_issued: the date at which the dataset was uploaded.

  • date_modified: the date at which the dataset was last uploaded. If the dataset has only been uploaded once, this will return 'No modification date available'.

  • publisher: the entity that publishes the dataset. See openes_load_publishers for all available publishers.

  • publisher_data_url: the homepage of the dataset in the website of the publisher. This is helpful to look at the definitions of the columns in the dataset.

The metadata of the API can sometimes be returned in an incorrect order. For example, when several languages are available, the descriptions are sometimes not in the same order as the languages. If you find any of these errors, try raising the issue directly with https://datos.gob.es/, as the package extracts all metadata in the order in which it is returned.

Whenever the metadata is available in several languages, the resulting tibble will have one row per language. Each row contains the texts in that language, and information that is the same across languages (such as the dates, which are language agnostic) is repeated in every row.
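
As an illustration of the shape described above, here is a small sketch with a mock metadata table. The language codes 'es' and 'en', the date format and all values are assumptions made for the example; only the column names come from the list above.

```r
# Mock metadata for a dataset described in two languages.
# Values are invented; base data.frame is used instead of a tibble.
metadata <- data.frame(
  language = c("es", "en"),
  description = c("Datos de ejemplo", "Example data"),
  date_issued = c("2016-01-01", "2016-01-01"),
  stringsAsFactors = FALSE
)

# One row per language; the date is repeated because it is language agnostic
nrow(metadata)

# Keep only the row for one language
english <- metadata[metadata$language == "en", ]
english$description
```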

In case the API returns empty requests, both data and metadata will be empty tibbles with the same column names.

For the data slot, openes_load returns a list containing at least one tibble. If the dataset being requested has file formats that openes_load can read (see permitted_formats), it will read those files. If the dataset has several files, it will return a list with one slot per file, where each slot is a tibble with the data. If for some reason any of the files cannot be read, openes_load has a fallback mechanism that returns the format it attempted to read together with the URL, so that the user can try to read the file directly. In any case, the result will always be a list of tibbles, where each one is either the requested dataset (success) or a tibble with the format and URL that openes_load attempted to read but could not (failure).
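
Because successes and failures share the same list, it can help to separate them after loading. The sketch below works on a mock data slot. The fallback column names format and URL are an assumption (the text above only says the format and the URL are returned), and is_fallback is a hypothetical helper, not a package function.

```r
# Mock data slot: one file read successfully, one that fell back.
# The fallback column names `format` and `URL` are assumed.
data_slot <- list(
  first_file = data.frame(year = c(2014, 2015), total = c(10, 12)),
  second_file = data.frame(
    format = "xls",
    URL = "https://example.org/file.xls",
    stringsAsFactors = FALSE
  )
)

# Hypothetical check: a fallback table has exactly the two assumed columns
is_fallback <- function(df) setequal(names(df), c("format", "URL"))

failed <- vapply(data_slot, is_fallback, logical(1))

names(data_slot)[failed]   # files that could not be read
names(data_slot)[!failed]  # files that were read
```

If the real fallback columns are named differently, only the vector inside is_fallback needs to change.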

Inside the data slot, each tibble in the list will be named according to the dataset that was read. When there is more than one dataset, the user can visit the website in the url column of the metadata slot to see the names of all the datasets. This is handy, for example, when the same dataset is repeated across time and we want to figure out which data is which from the slot names.

The API of https://datos.gob.es/ is not completely homogeneous because it is an aggregator of many different APIs from different cities and provinces of Spain. openes_load can only read a limited number of file formats, but that number will keep increasing as the package evolves. You can check the available file formats in permitted_formats. If none of the file formats of the requested dataset is readable, openes_load will return, inside the data slot, a list with only one tibble containing all available formats with their respective data URLs, so that users can read them manually.
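
For the manual route, one possible approach is to pick a URL from that tibble by order of preferred format. This is only a sketch on mock data: the column names format and URL, the formats listed and the example.org addresses are all assumptions.

```r
# Mock of the single table returned when no format is readable
available <- data.frame(
  format = c("pdf", "xlsx", "json"),
  URL = c(
    "https://example.org/d.pdf",
    "https://example.org/d.xlsx",
    "https://example.org/d.json"
  ),
  stringsAsFactors = FALSE
)

# Formats the user knows how to read, in order of preference
preferred <- c("json", "xlsx")

# Row of the first preferred format that is actually available
pick <- match(preferred, available$format)
pick <- pick[!is.na(pick)][1]

chosen_url <- available$URL[pick]
chosen_url
```

The chosen URL can then be passed to whichever reader suits that format.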

Similarly, to provide the safest behavior, openes_load is very conservative about which publishers it reads from https://datos.gob.es/. Because some publishers do not have standardized datasets, reading from many different publishers can become very messy. openes_load currently reads files only from selected publishers that offer standardized datasets, which makes them safer to read. As the package evolves and data quality improves across publishers, the package will include more publishers. See the publishers that the package can read in publishers_available.

Examples