bdcomputeMatrixVector_hdf5

BLOCKWISE_OPS

1 Description

Performs element-wise operations between a matrix and a vector stored in HDF5 format. The function supports addition, subtraction, multiplication, division and power operations, with options for row-wise or column-wise application and parallel processing.

2 Usage

bdcomputeMatrixVector_hdf5(filename, group, dataset, vectorgroup, vectordataset, outdataset, func, outgroup = NULL, byrows = NULL, paral = NULL, threads = NULL, overwrite = FALSE)

3 Arguments

Parameter Description
filename String. Path to the HDF5 file containing the datasets.
group String. Path to the group containing the matrix dataset.
dataset String. Name of the matrix dataset.
vectorgroup String. Path to the group containing the vector dataset.
vectordataset String. Name of the vector dataset.
outdataset String. Name for the output dataset.
func String. Operation to perform: "+", "-", "*", "/", or "pow".
outgroup Optional string. Output group path. If not provided, results are stored in the same group as the input matrix.
byrows Logical. If TRUE, applies operation by rows. If FALSE (default), applies operation by columns.
paral Logical. If TRUE, enables parallel processing.
threads Integer. Number of threads for parallel processing. Ignored if paral is FALSE.
overwrite Logical. If TRUE, allows overwriting existing datasets.

4 Value

List with components:

  • fn: Character string with the HDF5 filename
  • gr: Character string with the HDF5 group
  • ds: Character string with the full dataset path (group/dataset)
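
As a small illustration of how these components can be used downstream, the sketch below splits the `ds` path back into its group and dataset name with base R. The list `res` is a hand-built stand-in whose values are illustrative only; it is not actual output of the function.

```r
# Stand-in for the list returned by bdcomputeMatrixVector_hdf5();
# the values are illustrative, not actual output.
res <- list(fn = "test.hdf5", gr = "data", ds = "data/ProdComputed")

# ds is documented as "group/dataset", so base R path helpers can split it.
out_group   <- dirname(res$ds)
out_dataset <- basename(res$ds)
```

The recovered group and dataset name can then be passed on as `group` and `dataset` to a later call on the same file.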

5 Details

This function provides a flexible interface for element-wise operations between a matrix and a vector stored in HDF5 format. It supports:

  • Five operations:
      • Addition (+): adds vector elements to matrix rows/columns
      • Subtraction (-): subtracts vector elements from matrix rows/columns
      • Multiplication (*): multiplies matrix rows/columns by vector elements
      • Division (/): divides matrix rows/columns by vector elements
      • Power (pow): raises matrix rows/columns to the power given by the vector elements
  • Processing options:
      • Row-wise or column-wise operations
      • Parallel processing for improved performance
      • Configurable thread count for parallel execution
      • Memory-efficient processing for large datasets

The function performs extensive validation:

  • Checks matrix and vector dimensions for compatibility
  • Validates operation type
  • Verifies HDF5 file and dataset accessibility
  • Ensures proper data structures (matrix vs. vector)
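
The operations and checks described above can be pictured with an in-memory analogue in base R. This is only a sketch: `matrix_vector_op()` is a hypothetical helper, not part of BigDataStatMeth, and it works on ordinary R matrices rather than HDF5 datasets. In particular, the mapping of `byrows = TRUE` to "pair element i of the vector with row i" is an assumption made for illustration and should be checked against the package's actual output.

```r
# In-memory analogue of a matrix-vector element-wise operation (base R only).
# matrix_vector_op() is a hypothetical helper, not part of BigDataStatMeth.
matrix_vector_op <- function(Y, x, func = c("+", "-", "*", "/", "pow"),
                             byrows = FALSE) {
  func <- match.arg(func)              # validate the operation type
  stopifnot(is.matrix(Y), is.numeric(Y), is.numeric(x))
  x <- as.vector(x)                    # accept an n x 1 matrix as a vector

  # ASSUMPTION: byrows = TRUE pairs x[i] with row i (margin 1),
  # byrows = FALSE pairs x[j] with column j (margin 2).
  margin <- if (byrows) 1L else 2L
  if (length(x) != dim(Y)[margin]) {
    stop("vector length does not match the matrix dimension")
  }

  op <- if (func == "pow") "^" else func
  sweep(Y, margin, x, FUN = op)        # apply the operation along the margin
}

Y <- matrix(1:6, nrow = 2)             # 2 x 3 matrix
by_row <- matrix_vector_op(Y, c(10, 100), "*", byrows = TRUE)
by_col <- matrix_vector_op(Y, c(1, 2, 3), "-", byrows = FALSE)
```

Here `sweep()` stands in for the blockwise HDF5 computation; it makes explicit that the vector length must equal the number of rows in one orientation and the number of columns in the other.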

6 Examples

library(BigDataStatMeth)
    
# Create test data
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
X <- matrix(rnorm(10), 10, 1)
        
# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", Y, "data", "Y",
                     overwriteFile = TRUE,
                     overwriteDataset = FALSE,
                     unlimited = FALSE)
bdCreate_hdf5_matrix("test.hdf5", X, "data", "X",
                     overwriteFile = FALSE,
                     overwriteDataset = FALSE,
                     unlimited = FALSE)
            
# Multiply matrix rows by vector
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "ProdComputed",
                           func = "*",
                           byrows = TRUE,
                           overwrite = TRUE)
    
# Subtract vector from matrix rows
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "SubsComputed",
                           func = "-",
                           byrows = TRUE,
                           overwrite = TRUE)
    
# Subtract vector from matrix columns
# (same outdataset name with overwrite = TRUE, so this replaces the previous "SubsComputed")
bdcomputeMatrixVector_hdf5("test.hdf5",
                           group = "data",
                           dataset = "Y",
                           vectorgroup = "data",
                           vectordataset = "X",
                           outdataset = "SubsComputed",
                           func = "-",
                           byrows = FALSE,
                           overwrite = TRUE)
                           
# Cleanup
if (file.exists("test.hdf5")) {
  file.remove("test.hdf5")
}

7 See Also