bdtCrossprod_hdf5

HDF5_ALGEBRA

1 Usage

bdtCrossprod_hdf5(filename, group, A, B = NULL, groupB = NULL, block_size = NULL, mixblock_size = NULL, paral = NULL, threads = NULL, outgroup = NULL, outdataset = NULL, overwrite = NULL)

2 Arguments

Parameter Description
filename String indicating the HDF5 file path
group String indicating the input group containing matrix A
A String specifying the dataset name for matrix A
B Optional string specifying the dataset name for matrix B. If NULL, computes A %*% t(A)
groupB Optional string indicating group containing matrix B. If NULL, uses same group as A
block_size Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions
mixblock_size Optional integer for memory block size in parallel processing
paral Optional boolean indicating whether to use parallel processing. Default is FALSE
threads Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads
outgroup Optional string specifying output group. Default is “OUTPUT”
outdataset Optional string specifying output dataset name. Default is “tCrossProd_A_x_B”
overwrite Optional boolean indicating whether to overwrite existing datasets. Default is FALSE

3 Value

A list containing the location of the transposed crossproduct result:

  • fn: Character string. Path to the HDF5 file containing the result
  • ds: Character string. Full dataset path to the transposed crossproduct result (A %*% t(A) or A %*% t(B)) within the HDF5 file

4 Details

The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:

  • Available memory
  • Matrix dimensions
  • Whether parallel processing is enabled
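The block-wise idea can be illustrated in plain R on in-memory matrices. This is only a minimal sketch: the helper `blockwise_tcrossprod` and its fixed `block_size` are hypothetical names chosen for illustration, and it does not reproduce the package's HDF5 I/O or its automatic block-size selection.

```r
# Hypothetical helper: computes A %*% t(B) one tile at a time, so that only
# a block of rows from A and a block of rows from B are multiplied per step.
blockwise_tcrossprod <- function(A, B = A, block_size = 256L) {
  stopifnot(ncol(A) == ncol(B), block_size >= 1L)
  n <- nrow(A)
  m <- nrow(B)
  out <- matrix(0, n, m)
  for (i in seq(1L, n, by = block_size)) {
    rows <- i:min(i + block_size - 1L, n)
    for (j in seq(1L, m, by = block_size)) {
      cols <- j:min(j + block_size - 1L, m)
      # Each output tile depends only on one row block of A and one of B
      out[rows, cols] <- tcrossprod(A[rows, , drop = FALSE],
                                    B[cols, , drop = FALSE])
    }
  }
  out
}

set.seed(555)
A <- matrix(rnorm(300 * 40), 300, 40)
B <- matrix(rnorm(120 * 40), 120, 40)
res_block <- blockwise_tcrossprod(A, B, block_size = 64L)

# Compare against the direct in-memory product
all.equal(res_block, A %*% t(B))
```

Because every output tile is computed independently, the same loop structure applies when the row blocks are read from disk instead of from memory.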

For parallel processing:

  • Uses OpenMP for thread management
  • Implements cache-friendly block operations
  • Provides automatic thread count optimization
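The function's OpenMP threading happens in compiled code and is not visible from R. As a rough analogy only, the sketch below distributes independent row blocks over worker processes with base R's `parallel` package. This is a swapped-in technique rather than the package's implementation, and the block size and worker count are arbitrary illustrative values.

```r
library(parallel)

set.seed(555)
A <- matrix(rnorm(200 * 30), 200, 30)
block_size <- 50L
starts <- seq(1L, nrow(A), by = block_size)

# Two worker processes stand in for threads (illustrative choice)
cl <- makeCluster(2L)
blocks <- parLapply(cl, starts, function(i, A, block_size) {
  rows <- i:min(i + block_size - 1L, nrow(A))
  # Each worker computes one horizontal strip of A %*% t(A)
  tcrossprod(A[rows, , drop = FALSE], A)
}, A = A, block_size = block_size)
stopCluster(cl)

# Stack the strips in order to rebuild the full result
res_par <- do.call(rbind, blocks)
all.equal(res_par, A %*% t(A))
```

Row strips are a convenient unit of work because no two workers write to the same part of the output.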

Memory efficiency is achieved through:

  • Block-wise reading and writing
  • Minimal temporary storage
  • Proper resource cleanup

Mathematical operations:

  • For a single matrix A: computes A %*% t(A)
  • For two matrices A and B: computes A %*% t(B)
  • Optimized for numerical stability
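A small in-memory example makes the two operations and their output shapes concrete. It uses only base R; the matrices are arbitrary illustrative values, and A and B must have the same number of columns for A %*% t(B) to be defined.

```r
A <- matrix(1:6, nrow = 2)    # 2 x 3
B <- matrix(1:12, nrow = 4)   # 4 x 3

AAt <- A %*% t(A)             # single-matrix case: 2 x 2 and symmetric
ABt <- A %*% t(B)             # two-matrix case: nrow(A) x nrow(B) = 2 x 4

AAt
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56

dim(ABt)
isSymmetric(AAt)

# Base R's tcrossprod() computes the same products
all.equal(AAt, tcrossprod(A))
all.equal(ABt, tcrossprod(A, B))
```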

5 Examples

library(BigDataStatMeth)
library(rhdf5)

# Create test matrix
N <- 1000
M <- 1000
set.seed(555)
a <- matrix(rnorm(N*M), N, M)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A",
                     overwriteFile = TRUE)

# Compute transposed cross product
bdtCrossprod_hdf5("test.hdf5", "INPUT", "A",
                  outgroup = "OUTPUT",
                  outdataset = "result",
                  block_size = 1024,
                  paral = TRUE,
                  threads = 4)
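One way to check a result like the one above is to read it back and compare it with the in-memory product. The sketch below assumes that the returned `fn` and `ds` elements can be passed directly to `rhdf5::h5read()`, which is an assumption based on the Value section rather than something verified here. The temporary file name and the smaller matrix are illustrative choices.

```r
library(BigDataStatMeth)
library(rhdf5)

set.seed(555)
a <- matrix(rnorm(200 * 50), 200, 50)

# Illustrative temporary file, so no existing data is overwritten
h5file <- tempfile(fileext = ".hdf5")
bdCreate_hdf5_matrix(h5file, a, "INPUT", "A",
                     overwriteFile = TRUE)

# Keep the returned list to locate the result
res <- bdtCrossprod_hdf5(h5file, "INPUT", "A",
                         outgroup = "OUTPUT",
                         outdataset = "result")

# Assumption: res$ds is a dataset path that h5read() accepts as-is
stored <- h5read(res$fn, res$ds)
h5closeAll()

# A %*% t(A) is symmetric, so storage orientation does not affect the check
all.equal(stored, a %*% t(a))
```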