bdNormalize_hdf5

HDF5_STATISTICS

1 Usage

bdNormalize_hdf5(filename, group, dataset, bcenter = NULL, bscale = NULL, byrows = NULL, wsize = NULL, overwrite = FALSE)

2 Arguments

Parameter Description
filename String indicating the HDF5 file path
group String specifying the group containing the dataset
dataset String specifying the dataset name to normalize
bcenter Optional boolean indicating whether to center the data. If TRUE (default), subtracts the mean from each column (or from each row when byrows = TRUE)
bscale Optional boolean indicating whether to scale the data. If TRUE (default), divides by the standard deviation
byrows Optional boolean indicating whether to operate by rows. If TRUE, processes row-wise; if FALSE (default), column-wise
wsize Optional integer specifying the block size for processing. Default is 1000
overwrite Optional boolean indicating whether to overwrite existing datasets. Default is FALSE

3 Value

List with the following components. If an error occurs, all string values are returned as empty strings (""):

  • fn: Character string. Path to the HDF5 file containing the results
  • ds: Character string. Full dataset path to the normalized data, stored under "NORMALIZED/[group]/[dataset]"
  • mean: Character string. Dataset path to the column means used for centering, stored under "NORMALIZED/[group]/mean.[dataset]"
  • sd: Character string. Dataset path to the standard deviations used for scaling, stored under "NORMALIZED/[group]/sd.[dataset]"
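
The path layout and the empty-string error convention described above can be checked with two small helpers. This is only a sketch in base R: expected_normalized_paths() and normalization_failed() are hypothetical names introduced here for illustration and are not part of BigDataStatMeth.

```r
# Hypothetical helper (not part of BigDataStatMeth): build the dataset paths
# described in the Value section for a given group and dataset name.
expected_normalized_paths <- function(group, dataset) {
  base <- paste("NORMALIZED", group, sep = "/")
  list(
    ds   = paste(base, dataset, sep = "/"),
    mean = paste(base, paste0("mean.", dataset), sep = "/"),
    sd   = paste(base, paste0("sd.", dataset), sep = "/")
  )
}

# Hypothetical helper: TRUE when any of the string components of a result
# list is empty, which the Value section describes as the error signal.
normalization_failed <- function(res) {
  any(!nzchar(unlist(res[c("fn", "ds", "mean", "sd")])))
}

paths <- expected_normalized_paths("data", "matrix")
paths$ds    # "NORMALIZED/data/matrix"
paths$mean  # "NORMALIZED/data/mean.matrix"
```

In practice, the list returned by bdNormalize_hdf5() could be passed to normalization_failed() before its ds, mean, and sd entries are used to locate the results in the file.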

4 Details

The function implements block-wise normalization through:

Statistical computations:

  • Mean calculation (for centering)
  • Standard deviation calculation (for scaling)
  • Efficient block-wise updates

Memory efficiency:

  • Block-wise data processing
  • Minimal temporary storage
  • Proper resource cleanup

Processing options:

  • Row-wise or column-wise operations
  • Flexible block size selection
  • Optional centering and scaling

Error handling:

  • Input validation
  • Resource management
  • Exception handling
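
To make the block-wise idea concrete, the following base R sketch standardizes the columns of an in-memory matrix by visiting wsize rows at a time. It is an illustration under stated assumptions, not the package's implementation: normalize_blockwise() is a hypothetical name, the data live in memory rather than in an HDF5 file, and the standard deviation is assumed to be the sample standard deviation (denominator n - 1).

```r
# Hypothetical helper: block-wise column standardization in base R.
# Not the BigDataStatMeth implementation; in-memory illustration only.
normalize_blockwise <- function(x, bcenter = TRUE, bscale = TRUE, wsize = 1000L) {
  n <- nrow(x)
  p <- ncol(x)
  # First row index of each block of at most wsize rows
  starts <- seq(1L, n, by = wsize)

  # Pass 1: accumulate per-column sums to obtain the column means
  col_sum <- numeric(p)
  for (s in starts) {
    rows <- s:min(s + wsize - 1L, n)
    col_sum <- col_sum + colSums(x[rows, , drop = FALSE])
  }
  col_mean <- col_sum / n

  # Pass 2: accumulate squared deviations to obtain the sample SD
  sq_dev <- numeric(p)
  for (s in starts) {
    rows <- s:min(s + wsize - 1L, n)
    dev <- sweep(x[rows, , drop = FALSE], 2, col_mean, "-")
    sq_dev <- sq_dev + colSums(dev^2)
  }
  col_sd <- sqrt(sq_dev / (n - 1))

  # Pass 3: center and/or scale each block and store the result
  out <- matrix(NA_real_, n, p)
  for (s in starts) {
    rows <- s:min(s + wsize - 1L, n)
    block <- x[rows, , drop = FALSE]
    if (bcenter) block <- sweep(block, 2, col_mean, "-")
    if (bscale)  block <- sweep(block, 2, col_sd, "/")
    out[rows, ] <- block
  }
  list(data = out, mean = col_mean, sd = col_sd)
}

set.seed(1)
x <- matrix(rnorm(1000 * 100), 1000, 100)
res <- normalize_blockwise(x, wsize = 128L)

# With centering and scaling both enabled, the result can be compared
# against base R's scale()
all.equal(res$data, scale(x), check.attributes = FALSE)
```

Only one block of rows is transformed at a time, so the temporary storage per step is proportional to wsize times the number of columns. A file-backed version would read and write each block from disk instead of indexing an in-memory matrix.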

5 Examples

library(BigDataStatMeth)

# Create test data
data <- matrix(rnorm(1000*100), 1000, 100)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", data, "data", "matrix",
                     overwriteFile = TRUE)

# Normalize data
bdNormalize_hdf5("test.hdf5", "data", "matrix",
                 bcenter = TRUE,
                 bscale = TRUE,
                 wsize = 1000)