bdBind_hdf5_datasets

HDF5_IO_MANAGEMENT

1 Usage

bdBind_hdf5_datasets(filename, group, datasets, outgroup, outdataset, func, overwrite = FALSE)

2 Arguments

Parameter    Description
filename     Character string. Path to the HDF5 file that contains the input datasets; the merged dataset is written to the same file.
group        Character string. Input group containing the datasets to bind.
datasets     Character vector. Names of the input datasets to bind.
outgroup     Character string. Output group for the merged dataset. If NULL, the result is stored in the input group.
outdataset   Character string. Name of the new merged dataset.
func         Character string. Binding operation to apply:
               • "bindRows": merge datasets by rows (vertical stacking)
               • "bindCols": merge datasets by columns (horizontal joining)
               • "bindRowsbyIndex": merge datasets by rows using an index
overwrite    Logical. Whether to overwrite existing datasets. Defaults to FALSE.

3 Value

A list containing the location of the combined dataset:

  • fn: Character string. Path to the HDF5 file containing the result
  • ds: Character string. Full dataset path to the bound/combined dataset within the HDF5 file

4 Details

The function validates dimensions before binding:

  • Row binding: all datasets must have the same number of columns.
  • Column binding: all datasets must have the same number of rows.
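The dimension rule can be mimicked on in-memory R matrices using base rbind() and cbind(). This is an illustrative analogue only: bind_in_memory() is a hypothetical helper, not part of BigDataStatMeth, it does not cover "bindRowsbyIndex", and the error messages are invented for the sketch.

```r
# Hypothetical in-memory analogue of the dimension check (not the package code).
bind_in_memory <- function(mats, func = c("bindRows", "bindCols")) {
  func <- match.arg(func)
  if (func == "bindRows") {
    # Row binding: every matrix must have the same number of columns
    ncols <- vapply(mats, ncol, integer(1))
    if (length(unique(ncols)) != 1L)
      stop("bindRows: all datasets must have the same number of columns")
    do.call(rbind, mats)
  } else {
    # Column binding: every matrix must have the same number of rows
    nrows <- vapply(mats, nrow, integer(1))
    if (length(unique(nrows)) != 1L)
      stop("bindCols: all datasets must have the same number of rows")
    do.call(cbind, mats)
  }
}

a <- matrix(1:12, 4, 3)
b <- matrix(13:24, 4, 3)
dim(bind_in_memory(list(a, b), "bindRows"))  # 8 3
dim(bind_in_memory(list(a, b), "bindCols"))  # 4 6

# A 4 x 2 matrix cannot be row-bound to a 4 x 3 one, so this raises an error
try(bind_in_memory(list(a, matrix(1:8, 4, 2)), "bindRows"))
```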

Memory efficiency comes from:

  • Block-wise reading and writing
  • Minimal data copying
  • Proper resource cleanup
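The block-wise idea can be sketched by copying rows from each input into a preallocated result in fixed-size chunks, rather than concatenating whole matrices in one step. This sketch works on in-memory R matrices; bind_rows_blockwise() and block_size are hypothetical names, and the block size the package actually uses for HDF5 datasets is not documented here.

```r
# Illustrative block-wise row binding (hypothetical helper, not package code).
bind_rows_blockwise <- function(mats, block_size = 2L) {
  ncols <- vapply(mats, ncol, integer(1))
  stopifnot(length(unique(ncols)) == 1L)  # same number of columns required
  total_rows <- sum(vapply(mats, nrow, integer(1)))
  # Preallocate the output once; it takes the type of the first values assigned
  out <- matrix(NA, nrow = total_rows, ncol = ncols[1])
  offset <- 0L
  for (m in mats) {
    n <- nrow(m)
    start <- 1L
    while (start <= n) {
      end <- min(start + block_size - 1L, n)
      # Read one block from the input and write it at the current offset
      out[offset + start:end, ] <- m[start:end, , drop = FALSE]
      start <- end + 1L
    }
    offset <- offset + n
  }
  out
}

a <- matrix(1:12, 4, 3)
b <- matrix(13:24, 4, 3)
r <- bind_rows_blockwise(list(a, b), block_size = 3L)
```

Each iteration touches at most block_size rows of one input. In an on-disk setting, where the output also lives in the file, that bound is what keeps memory use independent of dataset size.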

5 Examples

library(BigDataStatMeth)

# Create test matrices
a <- matrix(1:12, 4, 3)
b <- matrix(13:24, 4, 3)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", a, "data", "A")
bdCreate_hdf5_matrix("test.hdf5", b, "data", "B")

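# Bind A and B by rows: the rows of B are appended below the rows of A
# in a new dataset "combined" inside group "results"
res <- bdBind_hdf5_datasets(filename = "test.hdf5", group = "data",
                            datasets = c("A", "B"),
                            outgroup = "results", outdataset = "combined",
                            func = "bindRows")

# Location of the merged dataset (see Value)
res$fn
res$ds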