class: center, middle, inverse, title-slide

.title[
# Reading the ‘BOE’
]
.author[
### Lluís Revilla Sancho
]
.date[
### 2022-05-13
]

---

# BOE

Retrieve data from the official Spanish Gazette:

```r
library("BOE")
sumario_hoy <- retrieve_sumario(as.Date("2022/05/06"))
# Or retrieve_sumario("BOE-S-2022-1")
colnames(sumario_hoy)
##  [1] "date"            "sumario_nbo"     "sumario_code"    "section"
##  [5] "section_number"  "departament"     "departament_etq" "epigraph"
##  [9] "text"            "publication"     "pages"
```

.center[
<img src="boe_publicaciones_diarias.png" title="Plot of daily publications over the years. There is a seasonal effect and a big peak of publications around 2017." alt="Plot of daily publications over the years. There is a seasonal effect and a big peak of publications around 2017." width="645" height="375" />
]

???

Daily summaries can be retrieved by date or CVE.
Then it is easier to extract information for a publication.

---

## Examples

.pull-left[

It also works for individual documents, which makes it possible to search their text:

```r
(CVE <- sumario_hoy$publication[1])
## [1] "BOE-A-2022-7418"
cat(colnames(retrieve_document(CVE)))
## identificador titulo diario diario_numero seccion subseccion departamento rango numero_oficial fecha_disposicion fecha_publicacion fecha_vigencia fecha_derogacion letra_imagen pagina_inicial pagina_final suplemento_letra_imagen suplemento_pagina_inicial suplemento_pagina_final estatus_legislativo origen_legislativo estado_consolidacion judicialmente_anulada vigencia_agotada estatus_derogacion url_epub url_pdf url_pdf_catalan url_pdf_euskera url_pdf_gallego url_pdf_valenciano url_eli departamento_codigo fecha_actualizacion analysis text text_xml
cat(colnames(retrieve_document("BORME-S-2022-1")))
## date sumario_nbo sumario_code section section_number emisor emisor_etq text publication
```

A lot of data is available to users, which makes it possible to analyze it:

- Date of approval, date of publication
- Department
- Type of publication
- Full text
- Legal status
- ...
]

.pull-right[

For example, looking at publications from the universities:

<img src="https://llrs.github.io/BOE_historico/universidades_files/figure-html/anuncios2-3.png" height="300" />

Almost half of the publications are due to people missing their degree certificates.

]

.bottom[.center[More examples at: https://llrs.github.io/BOE_historico/]]

???

For each document, all the fields reported by the XML file can be retrieved in a tidy format, which allows for nice analyses, graphs and statistics.
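---

## Example: counting publications

A minimal sketch of the kind of summary the tidy format allows, assuming a daily summary is a data frame with the `departament` and `publication` columns shown in the `colnames()` output. The rows are made up for illustration and do not come from the BOE; `toy_sumario` and `count_by()` are hypothetical names, not part of the package.

```r
# Toy stand-in for a daily summary: every value below is invented.
toy_sumario <- data.frame(
  departament = c("Department A", "Department B", "Department B",
                  "Department A", "Department B"),
  publication = c("toy-1", "toy-2", "toy-3", "toy-4", "toy-5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count rows per value of a column, most frequent first.
count_by <- function(sumario, column) {
  counts <- table(sumario[[column]])
  sort(counts, decreasing = TRUE)
}

by_department <- count_by(toy_sumario, "departament")
by_department

# Share of publications per department, as a percentage.
round(100 * prop.table(by_department), 1)
```

With a real summary, the same call could be pointed at any other column, such as `section`, since it only relies on base R's `table()`.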