spanishoddata 0.2.0 (2025-06-15)
CRAN release: 2025-06-15
New features
- spod_quick_get_zones() is a new function to quickly get municipality geometries that match the data retrieved with spod_quick_get_od() (PR #163). Like spod_quick_get_od(), this function is experimental, as the API of the Spanish Ministry of Transport may change in the future. It is intended only for quick analysis for educational or other demonstration purposes, as it downloads very little data compared to the regular spod_get_od(), spod_download() and spod_convert() functions. The requests are cached in the memory of the current R session with the memoise package, so repeated calls to spod_quick_get_zones() do not send repeated requests to the API and return the data faster.
- The experimental spod_check_files() function checks the consistency of downloaded files against Amazon S3 checksums (PR #165). ETags for v1 data are stored with the package, and for v2 data they are fetched from Amazon S3. The checks may fail for May 2022 data and for some 2025 data, as the remote checksums used for checking file consistency are incorrect. We are working on solving this in future updates. For now, please rely on the built-in file size checks of spod_download(), spod_get(), and spod_convert().
Improvements
- spod_get() and spod_convert() are now up to 100,000 times faster when you have all (or a lot of) the data downloaded but request only several days in the call to spod_get() or spod_convert(). This is thanks to a new, smarter filtering strategy (issue #159, PR #166).
- Metadata is now fetched from the Amazon S3 storage of the original data files, which allows validation of downloaded files (issue #126) by both size and checksum. PR #165.
- Metadata fetched by spod_available_data() has extra columns such as data type, zones and period, see the help at ?spod_available_data() for details.
- Memory allocation is now delegated to the DuckDB engine, which by default uses 80% of available RAM. Beware that in some HPC environments DuckDB may detect more memory than is actually available to your job, so set the limit manually to 80% of the RAM available to your job with the max_mem_gb argument of the spod_get(), spod_convert() and spod_connect() functions. Delegating memory allocation will also improve performance in some cases, as DuckDB is more efficient than R at memory allocation (PR #167).
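As a minimal sketch of setting the limit manually, the helper below derives a value for max_mem_gb from the RAM granted to a job and passes it to spod_get(). The helper names, the 64 GB job size and the type and zones values are illustrative assumptions. Only the max_mem_gb argument itself comes from the notes above.

```r
# Hypothetical helper (not part of spanishoddata):
# 80% of the RAM granted to the job, rounded down to whole GB.
job_mem_limit_gb <- function(job_ram_gb, share = 0.8) {
  stopifnot(is.numeric(job_ram_gb), job_ram_gb > 0)
  floor(job_ram_gb * share)
}

# Assumption: the scheduler granted this job 64 GB of RAM.
mem_limit <- job_mem_limit_gb(64)

# Hypothetical wrapper: nothing is downloaded until it is called.
get_od_with_limit <- function(dates, max_mem_gb = mem_limit) {
  spanishoddata::spod_get(
    type = "od",
    zones = "districts",
    dates = dates,
    max_mem_gb = max_mem_gb
  )
}
```

Calling get_od_with_limit(c("2022-01-01", "2022-01-02")) after setting the data directory with spod_set_data_dir() would then request those two days with the memory limit set from the job size rather than from whatever the node reports.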
Bug fixes
- Data file downloads are now more reliable, but still multi-threaded. They use base R utils::download.file() instead of curl::multi_download(), which failed on some connections (issue #127), so the curl dependency is no longer required. PR #165.
- spod_quick_get_od() is working again. We fixed it to work with the updated API of the Spanish Ministry of Transport (PR #163, issue #162). Like spod_quick_get_zones(), this function will remain experimental, as the API of the Spanish Ministry of Transport may change in the future. It is intended only for quick analysis for educational or other demonstration purposes, as it downloads very little data compared to the regular spod_get_od(), spod_download() and spod_convert() functions. The requests are cached in the memory of the current R session with the memoise package, so repeated calls to spod_quick_get_od() do not send repeated requests to the API and return the data faster.
- spod_convert() now accepts overwrite = 'update' with save_format = 'parquet' (#161). Previously it failed because of an incorrect check that accepted only TRUE or FALSE (#160).
spanishoddata 0.1.1 (2025-04-09)
CRAN release: 2025-04-09
New features
- spod_cite() function to easily cite the package and the data (#134)
Breaking changes
- The time_slot column is superseded by the hour column in the output of spod_get() and spod_convert(). time_slot is deprecated: it is still present in the tables, but will be removed at the end of 2025, so going forward please use the new hour column. Otherwise the column is exactly the same as before, this is just a name change. (#132)
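For code that still refers to time_slot, the change is a one-word rename. The sketch below uses a small made-up data frame standing in for the output of spod_get(). The values and the reduced set of columns are invented for illustration.

```r
# Toy stand-in for a table that carries both the old and the new column.
trips <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01")),
  hour = c(7L, 8L, 8L),
  time_slot = c(7L, 8L, 8L), # deprecated duplicate of hour
  n_trips = c(10.5, 3.2, 7.0)
)

# Before: filtering on the deprecated column.
morning_old <- trips[trips$time_slot == 8L, ]

# After: the same filter on the new column selects the same rows.
morning_new <- trips[trips$hour == 8L, ]

# Check that both filters agree.
same_rows <- identical(morning_old, morning_new)
```

Because both columns hold the same values during the deprecation period, switching a filter or grouping from time_slot to hour should not change results.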
Other changes
- spod_quick_get() does not rely on metadata download anymore and can be used without setting the data directory with spod_set_data_dir() (and therefore does not cause a warning if the data directory is not set).
- hour (ex-time_slot) column is now right next to the date column in the output of spod_get() and spod_convert() (#)
- maximum available CPU cores check is now turned off to improve compatibility when running the package from within a container in high performance computing environments (see #130 and #140 for details)
- minor documentation improvements and updates
- minor bug fixes