This function allows the user to quickly connect to the data converted to DuckDB with the spod_convert_to_duckdb()
function. This function is a simplification of the connection process. It uses
Usage
spod_connect(
  data_path,
  target_table_name = NULL,
  quiet = FALSE,
  max_mem_gb = max(4, spod_available_ram() - 4),
  max_n_cpu = parallelly::availableCores() - 1,
  temp_path = spod_get_temp_dir()
)
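A minimal sketch of a typical call, relying on the defaults from the signature above. The path in db_path is a hypothetical placeholder, not a file the package creates under that name. spod_disconnect() is assumed here to be the package's companion function for closing the connection, and the returned object is assumed to be printable as a lazy table. The guard lets the snippet run without error when the package or the file is missing.

```r
# Hypothetical path: replace it with the '.duckdb' file (or parquet folder)
# produced by your own conversion step.
db_path <- file.path("path", "to", "od_data.duckdb")

# Only attempt the connection if the package is installed and the file exists.
if (requireNamespace("spanishoddata", quietly = TRUE) && file.exists(db_path)) {
  # All other arguments keep the defaults shown in Usage.
  od <- spanishoddata::spod_connect(data_path = db_path)

  # Inspect the connected table.
  print(od)

  # Assumption: spod_disconnect() closes the underlying DuckDB connection.
  spanishoddata::spod_disconnect(od)
}
```

When data_path points to a database file, target_table_name can usually be left as NULL, as described under Arguments.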
Arguments
- data_path
A path to the DuckDB database file with the '.duckdb' extension, or a path to the folder with parquet files. Either one should have been created with the spod_convert() function.
- target_table_name
Default is NULL. When connecting to a folder of parquet files, this argument is ignored. When connecting to a DuckDB database, a character vector of length 1 with the table name to open from the database file. If not specified, it will be guessed from the data_path argument and from the table names that are available in the database. If you have not manually interfered with the database, this should be guessed automatically and you do not need to specify it.
- quiet
A logical value indicating whether to suppress messages. Default is FALSE.
- max_mem_gb
The maximum memory to use in GB. The default shown in Usage is max(4, spod_available_ram() - 4), that is, 4 GB less than the value returned by spod_available_ram(), but never below 4 GB. A conservative limit of 3 GB should be enough for resaving the data to DuckDB from a folder of CSV.gz files, while being small enough to fit in the memory of most computers, even old ones. For data analysis using the already converted data (in DuckDB or Parquet format) or the raw CSV.gz data, it is recommended to increase the limit according to available resources.
- max_n_cpu
The maximum number of threads to use. Defaults to the number of available cores minus 1.
- temp_path
The path to the temp folder that DuckDB uses for intermediate spilling when the set memory limit and/or the physical memory of the computer is too low to perform the query. By default this is set to the temp directory in the data folder defined by the SPANISH_OD_DATA_DIR environment variable. Otherwise, for queries on folders of CSV files or parquet files, the temporary path would be set to the current R working directory, which is probably undesirable, as the current working directory can be on slow storage, or on storage with limited space compared to the data folder.
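The resource arguments above can be illustrated with a short sketch. The helper default_max_mem_gb() is hypothetical and only mirrors the expression from Usage; its available_ram_gb input stands in for the value returned by spod_available_ram(), which is assumed to be in GB. The path, the thread count, and the use of tempdir() are illustrative choices, and spod_disconnect() is assumed to close the connection.

```r
# Hypothetical helper mirroring the default in Usage:
# available RAM minus 4 GB, but never below 4 GB.
default_max_mem_gb <- function(available_ram_gb) {
  max(4, available_ram_gb - 4)
}

default_max_mem_gb(16) # 12: plenty of RAM, keep 4 GB free for the system
default_max_mem_gb(6)  # 4: the floor applies on small machines

# Hypothetical path: replace it with your own converted data.
db_path <- file.path("path", "to", "od_data.duckdb")

# Only attempt the connection if the package is installed and the file exists.
if (requireNamespace("spanishoddata", quietly = TRUE) && file.exists(db_path)) {
  od <- spanishoddata::spod_connect(
    data_path = db_path,
    max_mem_gb = default_max_mem_gb(16), # explicit memory limit in GB
    max_n_cpu = 2,                       # cap the number of threads
    temp_path = tempdir()                # spill directory for this session
  )

  # Assumption: spod_disconnect() closes the underlying DuckDB connection.
  spanishoddata::spod_disconnect(od)
}
```

Pointing temp_path at a fast disk with enough free space matters mostly for queries that exceed the memory limit and have to spill.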