bdblockmult_sparse_hdf5

BLOCKWISE_OPS

1 Usage

bdblockmult_sparse_hdf5(filename, group, A, B, groupB = NULL, block_size = NULL, mixblock_size = NULL, paral = NULL, threads = NULL, outgroup = NULL, outdataset = NULL, overwrite = NULL)

2 Arguments

| Parameter | Description |
|---|---|
| filename | String indicating the HDF5 file path. |
| group | String indicating the group path for matrix A. |
| A | String specifying the dataset name for matrix A. |
| B | String specifying the dataset name for matrix B. |
| groupB | Optional string indicating the group path for matrix B. If NULL, the same group as A is used. |
| block_size | Optional integer specifying the block size for processing. If NULL, it is determined automatically from the matrix dimensions. |
| mixblock_size | Optional integer for the memory block size in parallel processing. |
| paral | Optional boolean indicating whether to use parallel processing. Default is FALSE. |
| threads | Optional integer specifying the number of threads for parallel processing. If NULL, the maximum number of available threads is used. |
| outgroup | Optional string specifying the output group. Default is "OUTPUT". |
| outdataset | Optional string specifying the output dataset name. Default is "A_x_B". |
| overwrite | Optional boolean indicating whether to overwrite existing datasets. Default is FALSE. |

3 Value

Modifies the HDF5 file in place: the multiplication result is written to the dataset given by outdataset inside the group given by outgroup.

4 Details

The function implements optimized sparse matrix multiplication through:

- Block-wise processing to manage memory usage
- Automatic block size optimization
- Parallel processing support
- Efficient sparse matrix storage

Block size optimization considers:

- Available system memory
- Matrix dimensions and sparsity
- Parallel processing requirements

Memory efficiency is achieved through:

- Sparse matrix storage format
- Block-wise processing
- Minimal temporary storage
- Proper resource cleanup
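The block-wise idea described above can be sketched in plain R with the Matrix package. This is an illustration only, not the package's internal implementation: `block_multiply` is a hypothetical helper, it splits only the rows of A, and it works on in-memory sparse matrices rather than on HDF5 datasets.

```r
library(Matrix)

# Hypothetical helper (not part of BigDataStatMeth): computes A %*% B
# one row block of A at a time, so each partial product stays small.
block_multiply <- function(A, B, block_size = 256L) {
    stopifnot(nrow(A) >= 1L, ncol(A) == nrow(B), block_size >= 1L)
    starts <- seq(1L, nrow(A), by = block_size)
    pieces <- lapply(starts, function(s) {
        rows <- s:min(s + block_size - 1L, nrow(A))
        # Multiply one row block of A by the full matrix B
        A[rows, , drop = FALSE] %*% B
    })
    # Stack the partial results back into the full product
    do.call(rbind, pieces)
}

set.seed(42)
A <- rsparsematrix(600, 500, density = 0.01)
B <- rsparsematrix(500, 400, density = 0.01)

C_blocked <- block_multiply(A, B, block_size = 128L)
C_direct  <- A %*% B

# The blocked result should match the direct product
stopifnot(isTRUE(all.equal(as.matrix(C_blocked), as.matrix(C_direct))))
```

A smaller block_size lowers the peak size of each partial product at the cost of more iterations. An on-disk implementation would read and write each block from storage instead of keeping all pieces in memory.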

5 Examples

library(Matrix)
library(BigDataStatMeth)

# Create sparse test matrices
k <- 1e3
set.seed(1)
x_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)

set.seed(2)
y_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", as.matrix(x_sparse), "SPARSE", "x_sparse")
bdCreate_hdf5_matrix("test.hdf5", as.matrix(y_sparse), "SPARSE", "y_sparse")

# Perform multiplication
bdblockmult_sparse_hdf5("test.hdf5", "SPARSE", "x_sparse", "y_sparse",
                        block_size = 1024,
                        paral = TRUE,
                        threads = 4)
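One way to sanity-check the stored result is to compute a reference product in memory and then look at what the file contains. The sketch below rebuilds the example inputs with the same seeds and lists the file contents. It assumes the separate rhdf5 package is installed (an assumption, it is not required by the example above) and makes no claim about the exact output dataset name or the storage orientation, so the listing should be inspected before any element-wise comparison.

```r
library(Matrix)

# Rebuild the inputs from the example above (same seeds, same calls)
k <- 1e3
set.seed(1)
x_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)
set.seed(2)
y_sparse <- sparseMatrix(
    i = sample(x = k, size = k),
    j = sample(x = k, size = k),
    x = rnorm(n = k)
)

# In-memory reference product to compare against the stored result
ref <- x_sparse %*% y_sparse

# List groups and datasets, but only if rhdf5 is available and the
# file from the example has actually been created
if (requireNamespace("rhdf5", quietly = TRUE) && file.exists("test.hdf5")) {
    print(rhdf5::h5ls("test.hdf5"))
}
```

Once the output dataset path is known from the listing, it can be read with rhdf5::h5read and compared to as.matrix(ref), transposing first if the stored orientation differs.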