---
title: "bdNormalize_hdf5"
subtitle: "bdNormalize_hdf5"
---

<span class="category-badge hdf5_statistics">HDF5_STATISTICS</span>

## Usage

```r
bdNormalize_hdf5(filename, group, dataset, bcenter = NULL, bscale = NULL,
                 byrows = NULL, wsize = NULL, overwrite = FALSE)
```

## Arguments

::: {.param-table}
| Parameter | Description |
|-----------|-------------|
| `filename` | String indicating the HDF5 file path |
| `group` | String specifying the group containing the dataset |
| `dataset` | String specifying the name of the dataset to normalize |
| `bcenter` | Optional boolean indicating whether to center the data. If TRUE (default), subtracts the mean from each column (or each row, if `byrows = TRUE`) |
| `bscale` | Optional boolean indicating whether to scale the data. If TRUE (default), divides by the standard deviation |
| `byrows` | Optional boolean indicating whether to operate by rows. If TRUE, processes row-wise; if FALSE (default), column-wise |
| `wsize` | Optional integer specifying the block size for processing. Default is 1000 |
| `overwrite` | Optional boolean indicating whether to overwrite existing datasets. Default is FALSE |
:::

## Value

::: {.return-value}
A list with the following components. If an error occurs, all string values are returned as empty strings (`""`):

- **`fn`**: Character string. Path to the HDF5 file containing the results
- **`ds`**: Character string. Full dataset path to the normalized data, stored under `"NORMALIZED/[group]/[dataset]"`
- **`mean`**: Character string. Dataset path to the column means used for centering, stored under `"NORMALIZED/[group]/mean.[dataset]"`
- **`sd`**: Character string. Dataset path to the standard deviations used for scaling, stored under `"NORMALIZED/[group]/sd.[dataset]"`
:::

## Details

The function normalizes the dataset block by block, in blocks of size `wsize`, rather than processing the whole matrix at once:

- **Statistical computations**: means (for centering) and standard deviations (for scaling) are computed and applied block by block.
- **Memory efficiency**: block-wise data processing keeps temporary storage small, and resources are released when processing ends.
- **Processing options**: row-wise or column-wise operation (`byrows`), a configurable block size (`wsize`), and optional centering (`bcenter`) and scaling (`bscale`).
- **Error handling**: input validation, resource management, and exception handling; on error, the returned strings are empty.

## Examples

```{r}
#| eval: false
#| code-fold: show
library(BigDataStatMeth)

# Create test data: a 1000 x 100 matrix of standard normal values
data <- matrix(rnorm(1000 * 100), 1000, 100)

# Save to HDF5 as dataset "matrix" in group "data"
bdCreate_hdf5_matrix("test.hdf5", data, "data", "matrix",
                     overwriteFile = TRUE)

# Center and scale the data, processing in blocks of 1000
bdNormalize_hdf5("test.hdf5", "data", "matrix",
                 bcenter = TRUE, bscale = TRUE, wsize = 1000)
```
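
The arithmetic behind block-wise centering and scaling can be illustrated in memory with base R. `normalize_blockwise()` below is a hypothetical helper, not part of BigDataStatMeth, and it does no HDF5 I/O. It assumes blocks of `wsize` whole columns and the sample standard deviation (`n - 1` denominator, as in `stats::sd()`); the package's actual block layout and denominator may differ.

```r
# Hypothetical helper, NOT part of BigDataStatMeth: an in-memory sketch of
# column-wise normalization processed one block of `wsize` columns at a time.
normalize_blockwise <- function(x, wsize = 1000, bcenter = TRUE, bscale = TRUE) {
  stopifnot(is.matrix(x), is.numeric(x), wsize >= 1)
  p <- ncol(x)
  out <- x
  means <- numeric(p)
  sds <- numeric(p)
  for (start in seq(1, p, by = wsize)) {
    # Columns belonging to the current block
    cols <- start:min(start + wsize - 1, p)
    blk <- x[, cols, drop = FALSE]
    # Per-column statistics; sd() uses the n - 1 denominator (an assumption)
    mu <- colMeans(blk)
    s <- apply(blk, 2, sd)
    # Center and/or scale every column of the block
    if (bcenter) blk <- sweep(blk, 2, mu, "-")
    if (bscale) blk <- sweep(blk, 2, s, "/")
    out[, cols] <- blk
    means[cols] <- mu
    sds[cols] <- s
  }
  list(normalized = out, mean = means, sd = sds)
}

set.seed(1)
x <- matrix(rnorm(200 * 30), 200, 30)

# Column-wise: blocks of 7 columns reproduce base::scale(); this prints TRUE
res <- normalize_blockwise(x, wsize = 7)
all.equal(res$normalized, scale(x), check.attributes = FALSE)

# Row-wise (the analogue of byrows = TRUE): normalize the transpose
res_rows <- t(normalize_blockwise(t(x), wsize = 50)$normalized)
```

Splitting by whole columns lets each block's means and standard deviations be computed independently, so no statistics have to be carried from one block to the next; a variant that splits rows instead would need to accumulate sums and sums of squares across all blocks before normalizing.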