BLOCKWISE_OPS
Usage
bdblockmult_sparse_hdf5(filename, group, A, B, groupB = NULL, block_size = NULL, mixblock_size = NULL, paral = NULL, threads = NULL, outgroup = NULL, outdataset = NULL, overwrite = NULL)
Arguments
filename | String indicating the HDF5 file path
group | String indicating the group path for matrix A
A | String specifying the dataset name for matrix A
B | String specifying the dataset name for matrix B
groupB | Optional string indicating the group path for matrix B. If NULL, the same group as A is used
block_size | Optional integer specifying the block size for processing. If NULL, it is determined automatically from the matrix dimensions
mixblock_size | Optional integer specifying the memory block size used in parallel processing
paral | Optional boolean indicating whether to use parallel processing. Default is FALSE
threads | Optional integer specifying the number of threads for parallel processing. If NULL, the maximum number of available threads is used
outgroup | Optional string specifying the output group. Default is "OUTPUT"
outdataset | Optional string specifying the output dataset name. Default is "A_x_B"
overwrite | Optional boolean indicating whether to overwrite existing datasets. Default is FALSE
Value
Modifies the HDF5 file in place, writing the multiplication result to the dataset outdataset in the group outgroup.
Details
The function implements optimized sparse matrix multiplication through:
- Block-wise processing to manage memory usage
- Automatic block size optimization
- Parallel processing support
- Efficient sparse matrix storage
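The block-wise idea can be illustrated in plain R with the Matrix package. This is a minimal in-memory sketch, not the package's HDF5 implementation: `block_mult_sketch` is a hypothetical helper introduced only for illustration, and it ignores parallelism and on-disk I/O. Each output block is accumulated from products of small sub-blocks, so only block-sized pieces are handled at any one time.

```r
library(Matrix)

# Hypothetical helper (not part of BigDataStatMeth): multiplies two sparse
# matrices one block at a time and stitches the blocks back together.
block_mult_sketch <- function(A, B, block_size = 256L) {
  stopifnot(ncol(A) == nrow(B))

  # Index ranges that cut a dimension of length n into chunks of at most
  # block_size elements.
  chunks <- function(n) {
    starts <- seq(1L, n, by = block_size)
    lapply(starts, function(s) s:min(s + block_size - 1L, n))
  }
  row_chunks   <- chunks(nrow(A))
  inner_chunks <- chunks(ncol(A))
  col_chunks   <- chunks(ncol(B))

  block_rows <- lapply(row_chunks, function(rows) {
    blocks <- lapply(col_chunks, function(cols) {
      # Empty sparse accumulator for the output block C[rows, cols].
      acc <- sparseMatrix(i = integer(0), j = integer(0), x = numeric(0),
                          dims = c(length(rows), length(cols)))
      # Sum the partial products over the shared (inner) dimension.
      for (inner in inner_chunks) {
        acc <- acc +
          A[rows, inner, drop = FALSE] %*% B[inner, cols, drop = FALSE]
      }
      acc
    })
    do.call(cbind, blocks)   # glue the blocks of one block-row side by side
  })
  do.call(rbind, block_rows) # stack the block-rows
}

# Usage: compare against the ordinary sparse product.
set.seed(1)
A <- rsparsematrix(300, 200, density = 0.02)
B <- rsparsematrix(200, 250, density = 0.02)
C <- block_mult_sketch(A, B, block_size = 64L)
all.equal(as.matrix(C), as.matrix(A %*% B))
```

In an out-of-core setting the sub-block reads `A[rows, inner]` and `B[inner, cols]` would come from the HDF5 datasets rather than from memory, which is what keeps peak memory bounded by the block size rather than by the matrix size.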
Block size optimization considers:
- Available system memory
- Matrix dimensions and sparsity
- Parallel processing requirements
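To make the memory trade-off concrete, the following is a purely hypothetical heuristic and not the rule the package applies. It assumes each thread must hold three dense, double-precision, square blocks (two inputs and one output) inside a caller-supplied memory budget. It ignores sparsity, so it behaves like a dense worst case. The function name `suggest_block_size` and all of its parameters are illustrative assumptions.

```r
# Hypothetical heuristic (not BigDataStatMeth's algorithm): largest square
# block size b such that 3 * b^2 * bytes_per_value fits in each thread's
# share of the memory budget, capped by the matrix dimensions.
suggest_block_size <- function(n_rows, n_cols, mem_budget_bytes,
                               threads = 1L, bytes_per_value = 8) {
  per_thread <- mem_budget_bytes / threads
  b <- floor(sqrt(per_thread / (3 * bytes_per_value)))
  # Never exceed the matrix dimensions; always return at least 1.
  max(1, min(b, n_rows, n_cols))
}

# 64 MiB budget shared by 4 threads for 1000 x 1000 matrices
suggest_block_size(1000, 1000, mem_budget_bytes = 64 * 1024^2, threads = 4)
# → 836
```

Under these assumptions, doubling the thread count shrinks the block size by a factor of about sqrt(2), which is one way the parallel processing requirement interacts with available memory.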
Memory efficiency is achieved through:
- Sparse matrix storage format
- Block-wise processing
- Minimal temporary storage
- Proper resource cleanup
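The benefit of a sparse storage format can be seen with a small in-memory comparison. This sketch only measures R objects from the Matrix package. It makes no claim about how the package lays out data inside the HDF5 file.

```r
library(Matrix)

# A 1000 x 1000 matrix with exactly 1000 non-zero entries
set.seed(1)
k <- 1000
x_sparse <- sparseMatrix(
  i = sample(k, k),
  j = sample(k, k),
  x = rnorm(k)
)

sparse_bytes <- as.numeric(object.size(x_sparse))
dense_bytes  <- as.numeric(object.size(as.matrix(x_sparse)))

# The dense copy stores all k * k doubles (about 8 MB here); the sparse
# object stores only the k non-zero values plus their index vectors.
c(sparse = sparse_bytes, dense = dense_bytes)
```

The same reasoning applies per block: a block that is mostly zeros costs far less to hold in sparse form than its dense equivalent.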
Examples
library(Matrix)
library(BigDataStatMeth)
# Create sparse test matrices
k <- 1e3
set.seed(1)
x_sparse <- sparseMatrix(
  i = sample(x = k, size = k),
  j = sample(x = k, size = k),
  x = rnorm(n = k)
)
set.seed(2)
y_sparse <- sparseMatrix(
  i = sample(x = k, size = k),
  j = sample(x = k, size = k),
  x = rnorm(n = k)
)
# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", as.matrix(x_sparse), "SPARSE", "x_sparse")
bdCreate_hdf5_matrix("test.hdf5", as.matrix(y_sparse), "SPARSE", "y_sparse")
# Perform multiplication
bdblockmult_sparse_hdf5("test.hdf5", "SPARSE", "x_sparse", "y_sparse",
                        block_size = 1024,
                        paral = TRUE,
                        threads = 4)