String indicating the input group containing matrix A
A
String specifying the dataset name for matrix A
B
Optional string specifying the dataset name for matrix B. If NULL, computes t(A) %*% A
groupB
Optional string indicating the group containing matrix B. If NULL, uses the same group as A
block_size
Optional integer specifying the block size for processing. Default is automatically determined based on matrix dimensions
mixblock_size
Optional integer specifying the memory block size used in parallel processing
paral
Optional boolean indicating whether to use parallel processing. Default is FALSE
threads
Optional integer specifying the number of threads for parallel processing. If NULL, uses the maximum number of available threads
outgroup
Optional string specifying the output group. Default is "OUTPUT"
outdataset
Optional string specifying the output dataset name. Default is "CrossProd_A_x_B"
overwrite
Optional boolean indicating whether to overwrite existing datasets. Default is FALSE
3 Value
A list containing the location of the cross product result:
fn: Character string. Path to the HDF5 file containing the result
ds: Character string. Full dataset path to the cross product result (t(A) %*% A or t(A) %*% B) within the HDF5 file
4 Details
The function implements block-wise matrix multiplication to handle large matrices efficiently. Block size is automatically optimized based on:
- Available memory
- Matrix dimensions
- Whether parallel processing is enabled
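The following minimal base-R sketch illustrates the block-wise idea. It assumes the blocks are taken along the shared row dimension of A and B, although the package's own C++ tiling may also split other dimensions. It relies on the identity that t(A) %*% B equals the sum over k of t(A_k) %*% B_k, where A_k and B_k are matching row blocks. `blockwise_crossprod()` is a hypothetical helper and is not part of BigDataStatMeth.

```r
# Hypothetical illustration (not BigDataStatMeth's implementation):
# accumulate t(A_k) %*% B_k over row blocks to obtain t(A) %*% B.
blockwise_crossprod <- function(A, B = NULL, block_size = 256L) {
  if (is.null(B)) B <- A                    # B = NULL means t(A) %*% A
  stopifnot(nrow(A) == nrow(B))
  result <- matrix(0, ncol(A), ncol(B))     # running sum of partial products
  for (s in seq(1L, nrow(A), by = block_size)) {
    rows <- s:min(s + block_size - 1L, nrow(A))   # current row block
    # crossprod(x, y) computes t(x) %*% y for this block only
    result <- result + crossprod(A[rows, , drop = FALSE],
                                 B[rows, , drop = FALSE])
  }
  result
}

set.seed(555)
A <- matrix(rnorm(1000 * 50), 1000, 50)
B <- matrix(rnorm(1000 * 30), 1000, 30)
all.equal(blockwise_crossprod(A, B, block_size = 128L), crossprod(A, B))
# [1] TRUE
```

Each `crossprod()` call sees only one row block of each matrix. In the HDF5 setting, those blocks would be read from disk one at a time rather than sliced from matrices already held in memory.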
For parallel processing:
- Uses OpenMP for thread management
- Implements cache-friendly block operations
- Provides automatic thread count optimization
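The block-wise scheme parallelizes naturally because the per-block partial products are independent and only need to be summed at the end. The sketch below is an R analogue of that reduction, built on `parallel::mclapply()`, which forks R processes. BigDataStatMeth itself uses OpenMP threads in C++. `parallel_crossprod()` is a hypothetical name, and the sketch falls back to one core on Windows, where forking is unavailable.

```r
library(parallel)

# Hypothetical R analogue of the parallel scheme (the package uses OpenMP
# threads in C++): workers compute t(A_k) %*% B_k for different row blocks,
# and the partial results are summed afterwards.
parallel_crossprod <- function(A, B = NULL, block_size = 256L, threads = 2L) {
  if (is.null(B)) B <- A                        # B = NULL means t(A) %*% A
  stopifnot(nrow(A) == nrow(B))
  if (.Platform$OS.type == "windows") threads <- 1L   # mclapply cannot fork
  starts <- seq(1L, nrow(A), by = block_size)
  partials <- mclapply(starts, function(s) {
    rows <- s:min(s + block_size - 1L, nrow(A))
    crossprod(A[rows, , drop = FALSE], B[rows, , drop = FALSE])
  }, mc.cores = threads)
  Reduce(`+`, partials)                         # sum of per-block products
}

set.seed(555)
A <- matrix(rnorm(2000 * 40), 2000, 40)
all.equal(parallel_crossprod(A, threads = 2L), crossprod(A))
```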
Memory efficiency is achieved through:
- Block-wise reading and writing
- Minimal temporary storage
- Proper resource cleanup
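A little arithmetic makes the memory bound concrete. A double occupies 8 bytes, so a block of r rows from A and B costs r * (ncol(A) + ncol(B)) * 8 bytes. The accumulated result adds a further ncol(A) * ncol(B) * 8 bytes. The hypothetical `choose_block_rows()` below picks the largest r that fits a given budget. It only illustrates the trade-off and is not the heuristic BigDataStatMeth applies.

```r
# Hypothetical helper (not part of BigDataStatMeth): largest number of rows per
# block such that one block of A, one block of B and the running result fit in
# `budget_bytes`. It conservatively counts a block of B even when B is A.
choose_block_rows <- function(n_rows, ncol_a, ncol_b, budget_bytes) {
  bytes_per_double <- 8
  result_bytes  <- ncol_a * ncol_b * bytes_per_double    # accumulated output
  per_row_bytes <- (ncol_a + ncol_b) * bytes_per_double  # one row of A and B
  available <- budget_bytes - result_bytes
  if (available < per_row_bytes) stop("budget too small for a single row")
  as.integer(min(n_rows, floor(available / per_row_bytes)))
}

# A 1e6 x 1000 matrix A (B = A) with a 256 MiB budget
choose_block_rows(1e6, 1000, 1000, 256 * 1024^2)
# [1] 16277
```

With this budget, each block holds about 16,000 rows. By comparison, the full 1e6 x 1000 matrix alone would need 8 GB.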
5 Examples
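```r
library(BigDataStatMeth)
library(rhdf5)

# Create a 1000 x 1000 test matrix
N <- 1000
M <- 1000
set.seed(555)
a <- matrix(rnorm(N * M), N, M)

# Save it to HDF5 as INPUT/A
bdCreate_hdf5_matrix("test.hdf5", a, "INPUT", "A", overwriteFile = TRUE)

# Compute t(a) %*% a block-wise in parallel and store it in OUTPUT/result
res <- bdCrossprod_hdf5("test.hdf5", "INPUT", "A",
                        outgroup = "OUTPUT",
                        outdataset = "result",
                        block_size = 1024,
                        paral = TRUE,
                        threads = 4)

# Read the result back with rhdf5 and compare it with base R's crossprod();
# the product is symmetric, so the on-disk storage orientation does not matter
result <- h5read(res$fn, res$ds)
all.equal(result, crossprod(a))
```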