R/openes_load.R
openes_load.Rd
Extract data and metadata from a given data set of https://datos.gob.es/
openes_load(x, encoding = "UTF-8", guess_encoding = TRUE, ...)
x | A |
---|---|
encoding | The encoding passed to read (all) the files. Most cases should be resolved with either 'UTF-8', 'latin1' or 'ASCII'. |
guess_encoding | A logical stating whether to guess the encoding. This is set to TRUE by default. Whenever guess_encoding is set to TRUE, the 'encoding' argument is ignored. If |
... | Arguments passed to |
If path_id is a valid dataset path, a list with two slots: metadata and data. Each slot contains tibbles holding either the metadata or the data itself. If path_id is not a valid dataset path, it returns an empty list. See the details section for some caveats.
openes_load can return two possible outcomes: either an empty list or a list with a slot called metadata and another slot called data. Whenever the path_id argument is an invalid dataset path, it will return an empty list. When path_id is a valid dataset path, openes_load will return a list with the two slots described above.
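The two outcomes can be told apart by checking which slots the returned list has. The sketch below uses hand-built stand-ins instead of a live call, since a real request needs network access. The helper name is_valid_result and the placeholder contents are assumptions made for illustration, and base data frames stand in for tibbles.

```r
# Hypothetical stand-ins for what openes_load() is described as returning.
# A real result would hold tibbles; plain data frames are used here so the
# sketch runs without extra packages.
invalid_result <- list()
valid_result <- list(
  metadata = data.frame(description = "placeholder description"),
  data = list(data.frame(value = 1:3))
)

# Hypothetical helper: TRUE only when both documented slots are present.
is_valid_result <- function(res) {
  is.list(res) && all(c("metadata", "data") %in% names(res))
}

is_valid_result(invalid_result)
is_valid_result(valid_result)
```

Checking for the slot names rather than only the length keeps the test explicit about what downstream code expects to find.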
For the metadata slot, openes_load returns a tibble with most available metadata of the dataset.
The columns are:
keywords: the keywords listed for the dataset on its homepage.
language: the available languages of the dataset's metadata. Note that this does not mean that the dataset itself is available in different languages, only its metadata.
description: a short description of the data being read.
url: the complete url of the dataset in https://datos.gob.es/. Note that this URL is not the access URL to the dataset but to the dataset's homepage in https://datos.gob.es/.
date_issued: the date at which the dataset was uploaded.
date_modified: the date at which the dataset was last uploaded. If the dataset has only been uploaded once, this will return 'No modification date available'.
publisher: the entity that publishes the dataset. See openes_load_publishers for all available publishers.
publisher_data_url: the homepage of the dataset in the website of the publisher. This is helpful to look at the definitions of the columns in the dataset.
The metadata of the API can sometimes be returned in an incorrect order. For example, when several languages are available, the descriptions are sometimes not in the same order as the languages. If you find any of these errors, try raising the issue directly with https://datos.gob.es/, since the package extracts all metadata in the order in which the API returns it.
Whenever the metadata is in different languages, the resulting tibble will have as many rows as there are languages. Each row contains the texts in one language, and information that is the same across languages (such as the dates, which are language agnostic) is repeated in every row.
In case the API returns empty requests, both data and metadata will be empty tibbles with the same column names.
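Because multilingual metadata comes back as one row per language, picking a single language is a row filter on the language column. The sketch below works on a hand-built table, not real API output. The language codes "es" and "en" and every value shown are placeholders assumed for illustration, and a base data frame stands in for the tibble.

```r
# Hypothetical metadata with one row per language, mimicking the layout
# described above. All values are placeholders, not real API output.
metadata <- data.frame(
  language = c("es", "en"),
  description = c("texto de ejemplo", "example description"),
  date_issued = c("2018-01-01", "2018-01-01"),
  stringsAsFactors = FALSE
)

# Keep only the English row; language-agnostic columns such as the dates
# carry the same value in every row, so nothing is lost by filtering.
metadata_en <- metadata[metadata$language == "en", , drop = FALSE]
metadata_en$description
```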
For the data slot, openes_load returns a list containing at least one tibble.
If the dataset being requested has file formats that openes_load can read (see permitted_formats), it will read those files. If that dataset has several files, it will return a list with as many slots as there are files, where each slot is a tibble with the data. If for some reason any of the files cannot be read, openes_load has a fallback mechanism that returns the format it attempted to read together with the URL, so that the user can try to read the file directly. In any case, the result will always be a list of tibbles, where each one is either the requested dataset (success) or a table with the format and URL of the file that could not be read (failure).
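Since successful and failed reads sit side by side in the data slot, it can help to separate them before further processing. The following is a minimal sketch on a hand-built list, not real output. It assumes that a failed read yields a table whose only columns are named format and url; the exact column names are not stated in this page, so that assumption, the helper looks_like_failure, and all values are hypothetical.

```r
# Hypothetical data slot: one successful read and one failed read.
# ASSUMPTION: the fallback table has exactly the columns `format` and `url`.
data_slot <- list(
  file_a = data.frame(year = c(2017, 2018), total = c(10, 12)),
  file_b = data.frame(
    format = "pdf",
    url = "https://example.org/file_b.pdf",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: flags tables that look like the fallback output.
looks_like_failure <- function(tbl) {
  setequal(names(tbl), c("format", "url"))
}

# Named logical vector, one entry per slot.
failed <- vapply(data_slot, looks_like_failure, logical(1))

# Tables that were read, and the URLs left to fetch by hand.
successes <- data_slot[!failed]
retry_urls <- unlist(lapply(data_slot[failed], function(tbl) tbl$url))
```

A real dataset whose own columns happen to be exactly format and url would be misclassified by this check, so treat it as a heuristic.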
Inside the data slot, each list slot containing a tibble will be named according to the dataset that was read. When there is more than one dataset, the user can visit the website given in the url column of the metadata slot to see the names of all the datasets. This is handy, for example, when the same dataset is repeated across time and we want to figure out which data is which from the slot.
The API of https://datos.gob.es/ is not completely homogeneous because it is an aggregator of many different APIs from different cities and provinces of Spain. openes_load can only read a limited number of file formats, but this number will keep increasing as the package evolves. You can check the available file formats in permitted_formats. If the file format of the requested path_id is not readable, openes_load will return, inside the data slot, a list with a single tibble containing all available formats with their respective data URLs, so that users can read them manually.
Similarly, to provide the safest behavior, openes_load is very conservative about which publishers it reads from https://datos.gob.es/. Because some publishers do not have standardized datasets, reading from many different publishers can become very messy. openes_load currently reads files only from selected publishers that offer standardized datasets, which makes them safer to read. As the package evolves and data quality improves across publishers, the package will include more publishers. See the publishers that the package can read in publishers_available.