bdCrossprod_hdf5

HDF5_ALGEBRA

1 Usage

bdCrossprod_hdf5(filename, group, A, B = NULL, groupB = NULL, block_size = NULL, mixblock_size = NULL, paral = NULL, threads = NULL, outgroup = NULL, outdataset = NULL, overwrite = NULL)

2 Arguments

Parameter Description
filename String indicating the HDF5 file path
group String indicating the input group containing matrix A
A String specifying the dataset name for matrix A
B Optional string specifying the dataset name for matrix B. If NULL, computes t(A) %*% A
groupB Optional string indicating group containing matrix B. If NULL, uses same group as A
block_size Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions
mixblock_size Optional integer for memory block size in parallel processing
paral Optional boolean indicating whether to use parallel processing. Default is FALSE
threads Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads
outgroup Optional string specifying output group. Default is “OUTPUT”
outdataset Optional string specifying output dataset name. Default is “CrossProd_A_x_B”
overwrite Optional boolean indicating whether to overwrite existing datasets. Default is FALSE
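
To make the role of B concrete, the following base-R sketch shows the two products the arguments select between, computed in memory. It does not call bdCrossprod_hdf5 or touch an HDF5 file; the matrices a and b are illustrative and not part of the package.

```r
set.seed(1)
a <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)  # 6 x 3
b <- matrix(rnorm(6 * 2), nrow = 6, ncol = 2)  # 6 x 2, same number of rows as a

# B = NULL case: t(A) %*% A, a symmetric 3 x 3 matrix
aa <- crossprod(a)

# B supplied: t(A) %*% B, a 3 x 2 matrix
ab <- crossprod(a, b)

dim(aa)
dim(ab)
```

Both matrices must have the same number of rows, since the product sums over that shared dimension.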

3 Value

A list containing the location of the crossproduct result:

  • fn: Character string. Path to the HDF5 file containing the result
  • ds: Character string. Full dataset path to the crossproduct result (t(A) %*% A or t(A) %*% B) within the HDF5 file

4 Details

The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:

  • Available memory
  • Matrix dimensions
  • Whether parallel processing is enabled

For parallel processing, the function:

  • Uses OpenMP for thread management
  • Implements cache-friendly block operations
  • Provides automatic thread count optimization

Memory efficiency is achieved through:

  • Block-wise reading and writing
  • Minimal temporary storage
  • Proper resource cleanup
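
The block-wise idea can be illustrated with a minimal in-memory sketch in base R. This is not the package's implementation (which works on HDF5 datasets and may use OpenMP); blockwise_crossprod is a hypothetical helper written only to show that summing the cross products of row blocks gives the same result as the full cross product.

```r
# Hypothetical helper (not part of BigDataStatMeth):
# accumulate t(A) %*% B one block of rows at a time.
blockwise_crossprod <- function(A, B = A, block_size = 256L) {
  stopifnot(nrow(A) >= 1L, nrow(A) == nrow(B), block_size >= 1L)
  out <- matrix(0, nrow = ncol(A), ncol = ncol(B))
  starts <- seq(1L, nrow(A), by = block_size)
  for (s in starts) {
    rows <- s:min(s + block_size - 1L, nrow(A))
    # Only this slice of rows is needed for the current partial product
    out <- out + crossprod(A[rows, , drop = FALSE], B[rows, , drop = FALSE])
  }
  out
}

set.seed(555)
a <- matrix(rnorm(1000 * 50), 1000, 50)
blocked <- blockwise_crossprod(a, block_size = 128L)

# Compare against the cross product computed in one step
all.equal(blocked, crossprod(a))
```

In an on-disk setting, each slice would be read from the file instead of indexed from a matrix in memory, so the peak memory use depends on the block size rather than on the full matrix dimensions.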

5 Examples

library(BigDataStatMeth)
library(rhdf5)

# Create test matrix
N <- 1000
M <- 1000
set.seed(555)
a <- matrix(rnorm(N * M), N, M)

# Save to HDF5 as dataset "A" in group "INPUT"
bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A", overwriteFile = TRUE)

# Compute cross product t(A) %*% A and write it to OUTPUT/result
bdCrossprod_hdf5("test.hdf5", "INPUT", "A",
                 outgroup = "OUTPUT",
                 outdataset = "result",
                 block_size = 1024,
                 paral = TRUE,
                 threads = 4)