bdCrossprod_hdf5

HDF5_ALGEBRA

1 Usage

bdCrossprod_hdf5(filename, group, A, B = NULL, groupB = NULL, block_size = NULL, mixblock_size = NULL, paral = NULL, threads = NULL, outgroup = NULL, outdataset = NULL, overwrite = NULL)

2 Arguments

Parameter	Description
`filename`	String indicating the HDF5 file path
`group`	String indicating the input group containing matrix A
`A`	String specifying the dataset name for matrix A
`B`	Optional string specifying dataset name for matrix B. If NULL, performs A^t * A
`groupB`	Optional string indicating group containing matrix B. If NULL, uses same group as A
`block_size`	Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions
`mixblock_size`	Optional integer for memory block size in parallel processing
`paral`	Optional boolean indicating whether to use parallel processing. Default is false
`threads`	Optional integer specifying number of threads for parallel processing. If NULL, uses maximum available threads
`outgroup`	Optional string specifying output group. Default is “OUTPUT”
`outdataset`	Optional string specifying output dataset name. Default is “CrossProd_A_x_B”
`overwrite`	Optional boolean indicating whether to overwrite existing datasets. Default is false
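
To make the role of `B` concrete, here is a small in-memory sketch using base R's `crossprod()` rather than the HDF5 function itself. The matrices and variable names are illustrative assumptions. With only `A`, the result is t(A) %*% A; with `B`, it is t(A) %*% B, which requires `A` and `B` to have the same number of rows.

```r
# In-memory analogue of the two modes (base R only, no HDF5 involved)
set.seed(1)
A <- matrix(rnorm(6 * 3), nrow = 6, ncol = 3)  # 6 x 3
B <- matrix(rnorm(6 * 2), nrow = 6, ncol = 2)  # 6 x 2, same row count as A

xtx <- crossprod(A)      # analogue of B = NULL: t(A) %*% A, a 3 x 3 matrix
xty <- crossprod(A, B)   # analogue of B given:  t(A) %*% B, a 3 x 2 matrix

dim(xtx)
dim(xty)
```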

3 Value

A list containing the location of the crossproduct result:

fn: Character string. Path to the HDF5 file containing the result
ds: Character string. Full dataset path to the crossproduct result (t(A) %*% A or t(A) %*% B) within the HDF5 file
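
The returned `fn` and `ds` fields can be passed to an HDF5 reader to load the result back into memory. The following is a minimal sketch, assuming both BigDataStatMeth and rhdf5 are installed and that the return value has the structure described above. The temporary file and the matrix size are illustrative choices. Comparing the loaded matrix with `crossprod(a)` is a natural sanity check, subject to how the package lays the matrix out on disk.

```r
# Sketch only: the body runs when both packages are available.
if (requireNamespace("BigDataStatMeth", quietly = TRUE) &&
    requireNamespace("rhdf5", quietly = TRUE)) {
  library(BigDataStatMeth)

  h5file <- tempfile(fileext = ".hdf5")  # hypothetical scratch file
  set.seed(555)
  a <- matrix(rnorm(100 * 100), 100, 100)

  # Write the input matrix, then compute t(A) %*% A on disk
  bdCreate_hdf5_matrix(h5file, a, "INPUT", "A", overwriteFile = TRUE)
  res <- bdCrossprod_hdf5(h5file, "INPUT", "A",
                          outgroup = "OUTPUT",
                          outdataset = "result")

  # res$fn is the file path, res$ds the dataset path inside that file
  xtx <- rhdf5::h5read(res$fn, res$ds)
  print(dim(xtx))
}
```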

4 Details

The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:

- Available memory
- Matrix dimensions
- Whether parallel processing is enabled

For parallel processing, the function:

- Uses OpenMP for thread management
- Implements cache-friendly block operations
- Provides automatic thread count optimization

Memory efficiency is achieved through:

- Block-wise reading and writing
- Minimal temporary storage
- Proper resource cleanup
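
The block-wise idea can be illustrated in plain R. This is a simplified sketch, not the package's implementation: the function `blockwise_crossprod` and its `block_rows` argument are hypothetical names, and the matrix is held in memory instead of being read from HDF5. It relies on the identity that t(A) %*% A equals the sum of t(A_k) %*% A_k over consecutive row blocks A_k, so only one block needs to be held at a time.

```r
# Hypothetical helper: accumulate t(A) %*% A one row block at a time
blockwise_crossprod <- function(A, block_rows = 256L) {
  p <- ncol(A)
  result <- matrix(0, p, p)
  starts <- seq(1L, nrow(A), by = block_rows)
  for (s in starts) {
    e <- min(s + block_rows - 1L, nrow(A))
    block <- A[s:e, , drop = FALSE]      # in the HDF5 setting, one block read
    result <- result + crossprod(block)  # add this block's contribution
  }
  result
}

set.seed(555)
A <- matrix(rnorm(1000 * 50), 1000, 50)
blocked <- blockwise_crossprod(A, block_rows = 128L)

# The blocked result should match the direct computation up to rounding
stopifnot(isTRUE(all.equal(blocked, crossprod(A))))
```

Smaller blocks lower peak memory at the cost of more read operations, which is presumably the trade-off the automatic block size selection balances.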

5 Examples


library(BigDataStatMeth)
library(rhdf5)

# Create a 1000 x 1000 test matrix
N <- 1000
M <- 1000
set.seed(555)
a <- matrix(rnorm(N * M), N, M)

# Save it to HDF5 as dataset "A" in group "INPUT"
bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A", overwriteFile = TRUE)

# Compute the cross product t(A) %*% A and store it as OUTPUT/result
bdCrossprod_hdf5("test.hdf5", "INPUT", "A",
                 outgroup = "OUTPUT",
                 outdataset = "result",
                 block_size = 1024,
                 paral = TRUE,
                 threads = 4)
