hdf5matrix_options

HDF5MATRIX_CORE

1 Description

Configure global settings for parallelization, block processing and compression in HDF5Matrix operations. These settings affect all HDF5Matrix computations unless explicitly overridden in individual method calls.

2 Usage

hdf5matrix_options(...)

3 Arguments

Parameter     Description
paral         Logical or NULL. Enable OpenMP parallelization?
block_size    Integer or NULL. Number of elements per block for block-wise processing.
threads       Integer or NULL. Number of OpenMP threads to use.
compression   Integer (0-9) or NULL. gzip compression level for created datasets.

4 Value

A list of all current options. The list is returned invisibly when the function is called with arguments and visibly when it is called without arguments.

5 Details

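BigDataStatMeth achieves high performance through two key mechanisms: block-wise processing and OpenMP parallelization. Compression and option priority, which these settings also govern, are described after them.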

Block-wise processing: Large matrices are processed in chunks that fit in memory. The block_size parameter controls chunk size. Smaller blocks use less memory but require more I/O operations. Larger blocks are faster but require more RAM.
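The block-size trade-off can be illustrated with a minimal base-R sketch. It is not the package's implementation: blockwise_col_sums is a hypothetical helper, and for simplicity a "block" here is a group of rows of an in-memory matrix rather than a number of elements read from an HDF5 file.

```r
# Hypothetical helper: column sums computed one block of rows at a time.
# Only one block has to be held in working memory per iteration.
blockwise_col_sums <- function(x, block_size) {
  stopifnot(block_size >= 1)
  totals <- numeric(ncol(x))
  if (nrow(x) == 0) return(totals)
  starts <- seq(1, nrow(x), by = block_size)
  for (start in starts) {
    end <- min(start + block_size - 1, nrow(x))
    # Smaller block_size means more iterations (more reads) but smaller slices.
    totals <- totals + colSums(x[start:end, , drop = FALSE])
  }
  totals
}

x <- matrix(as.numeric(1:20), nrow = 5)
small_blocks <- blockwise_col_sums(x, block_size = 2)  # three blocks
one_block    <- blockwise_col_sums(x, block_size = 5)  # a single block
stopifnot(isTRUE(all.equal(small_blocks, colSums(x))))
stopifnot(isTRUE(all.equal(one_block, colSums(x))))
```

Both calls give the same result; only the number of slices and the size of each slice differ, which is the memory versus I/O trade-off described above.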

OpenMP parallelization: Operations are distributed across CPU cores. The paral and threads parameters control this. Parallelization can provide near-linear speedup for compute-intensive operations.
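One way to pick a value for threads is to base it on the number of available cores. The sketch below is an illustration only: choose_threads is a hypothetical helper, leaving one core free is an assumed rule of thumb rather than a package recommendation, and the final call runs only if hdf5matrix_options is available in the session.

```r
# Hypothetical helper: use all detected cores except `reserve`, never fewer than 1.
choose_threads <- function(reserve = 1L) {
  cores <- parallel::detectCores(logical = FALSE)
  if (is.na(cores)) cores <- 1L  # detectCores() can return NA on some systems
  max(1L, as.integer(cores) - as.integer(reserve))
}

n_threads <- choose_threads()

# Apply the setting only when the package function is loaded.
if (exists("hdf5matrix_options")) {
  hdf5matrix_options(paral = TRUE, threads = n_threads)
}
```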

Compression: Datasets are created with gzip compression (level 6 by default). This typically reduces disk usage by 60-80%, depending on the data, at the cost of additional CPU time for compression and decompression. For benchmarks or workflows where speed is critical, set compression = 0. For long-term storage or large datasets, keep the default.
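The effect of the gzip level can be explored with base R alone. The sketch below uses gzfile() on a temporary file as a stand-in for the HDF5 gzip filter, so the numbers it prints are not the package's; gz_size is a hypothetical helper, and the ratios depend heavily on how repetitive the data are.

```r
# Deliberately repetitive data, so it compresses well.
x <- as.numeric(rep(seq_len(100), times = 1000))
raw_bytes <- length(x) * 8  # doubles are stored in 8 bytes each

# Hypothetical helper: bytes on disk after writing `values` at a given gzip level.
gz_size <- function(values, level) {
  path <- tempfile(fileext = ".gz")
  on.exit(unlink(path))
  con <- gzfile(path, open = "wb", compression = level)
  writeBin(values, con)
  close(con)
  file.size(path)
}

levels <- c(1L, 6L, 9L)
sizes <- vapply(levels, function(l) gz_size(x, l), numeric(1))
names(sizes) <- paste0("level_", levels)

# Fraction of the uncompressed size that remains at each level.
print(round(sizes / raw_bytes, 4))
```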

Priority: Options set here serve as defaults. Individual method calls can override them for that call only, for example: A$multiply(B, paral = TRUE, threads = 4, block_size = 2000)
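The priority rule can be pictured as "a per-call argument wins if supplied, otherwise the global default applies". The sketch below models that rule in base R under stated assumptions: global_opts and resolve_opts are hypothetical names, the default values are illustrative, and the package's internal mechanism may differ.

```r
# Hypothetical stand-in for the global defaults (values are illustrative only).
global_opts <- list(paral = FALSE, threads = 2L, block_size = 1000L)

# Arguments left as NULL fall back to the corresponding global default.
resolve_opts <- function(paral = NULL, threads = NULL, block_size = NULL) {
  call_args <- list(paral = paral, threads = threads, block_size = block_size)
  supplied <- Filter(Negate(is.null), call_args)
  utils::modifyList(global_opts, supplied)
}

resolved <- resolve_opts(threads = 4L)
str(resolved)  # threads comes from the call; paral and block_size from the defaults
```

In this model the global list is never modified by a call, which matches the idea that overrides apply to a single method call.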

Recommendations:

6 Examples

# View current options
hdf5matrix_options()

# Enable parallelization with 8 threads
hdf5matrix_options(paral = TRUE, threads = 8)

# Set block size to 1000 elements
hdf5matrix_options(block_size = 1000)

# Disable compression for benchmarking
hdf5matrix_options(compression = 0)

# Reset to defaults
hdf5matrix_options(paral = NULL, threads = NULL, block_size = NULL, compression = NULL)