bdgetSDandMean_hdf5

HDF5_STATISTICS

1 Description

Computes standard deviation and/or mean statistics for a matrix stored in HDF5 format, with support for row-wise or column-wise computations.

2 Usage

bdgetSDandMean_hdf5(filename, group, dataset, outgroup = NULL, outdataset = NULL, sd = NULL, mean = NULL, byrows = NULL, onmemory = NULL, wsize = NULL, overwrite = FALSE)

3 Arguments

Parameter   Description
filename    Character string. Path to the HDF5 file.
group       Character string. Path to the group containing the dataset.
dataset     Character string. Name of the dataset to analyze.
outgroup    Character string (optional). Name of the output group. Default is mean_sd.
outdataset  Character string (optional). Name of the output dataset. Defaults are mean.dataset_original_name and sd.dataset_original_name.
sd          Logical (optional). Whether to compute the standard deviation. Default is TRUE.
mean        Logical (optional). Whether to compute the mean. Default is TRUE.
byrows      Logical (optional). Whether to compute by rows (TRUE) or by columns (FALSE). Default is FALSE.
onmemory    Logical (optional). If TRUE, results are kept in memory and returned as a matrix; nothing is written to disk. If FALSE, results are written to disk. Default is FALSE.
wsize       Integer (optional). Block size for processing. Default is 1000.
overwrite   Logical (optional). Whether to overwrite existing results. Default is FALSE.
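
To make the byrows orientation concrete, the following is a small in-memory reference in base R. It does not call bdgetSDandMean_hdf5; the variable names row_stats and col_stats are illustrative only, and it assumes the sample standard deviation as computed by base R's sd().

```r
# In-memory reference for the two orientations (base R only, no HDF5 involved).
set.seed(123)
Y <- matrix(rnorm(40), nrow = 10, ncol = 4)

# Row-wise: one mean and one SD per row (10 values each)
row_stats <- list(mean = rowMeans(Y), sd = apply(Y, 1, sd))

# Column-wise: one mean and one SD per column (4 values each)
col_stats <- list(mean = colMeans(Y), sd = apply(Y, 2, sd))

lengths(row_stats)
lengths(col_stats)
```

Values like these can serve as a sanity check against the function's output on a small matrix, bearing in mind that the layout of the matrix inside the HDF5 file determines which orientation corresponds to which.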

4 Value

Depending on the onmemory parameter:

  • If onmemory = TRUE: List with components: mean, …
  • If onmemory = FALSE: List with components: fn, …

5 Details

This function provides efficient statistical computation capabilities with:

  • Computation options:
      - Standard deviation computation
      - Mean computation
      - Row-wise or column-wise processing
  • Processing features:
      - Block-based computation
      - Memory-efficient processing
      - Configurable block size
  • Implementation features:
      - Safe HDF5 file operations
      - Memory-efficient implementation
      - Comprehensive error handling
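
The idea behind block-based computation can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: block_mean_sd is a hypothetical helper, it works on an in-memory matrix rather than an HDF5 dataset, and it assumes the sample standard deviation (n - 1 denominator) as in base R's sd(). Each block of wsize rows is summarized on its own and then merged into running totals, so only one block needs to be held at a time.

```r
# Sketch: column-wise mean and SD accumulated block by block (base R only).
# block_mean_sd is a hypothetical name used for illustration.
block_mean_sd <- function(M, wsize = 1000L) {
  n <- nrow(M)
  p <- ncol(M)
  count <- 0          # rows processed so far
  mu <- numeric(p)    # running column means
  m2 <- numeric(p)    # running sums of squared deviations from the mean

  for (start in seq(1L, n, by = wsize)) {
    end <- min(start + wsize - 1L, n)
    block <- M[start:end, , drop = FALSE]

    # Summarize the current block on its own
    nb <- nrow(block)
    mu_b <- colMeans(block)
    m2_b <- colSums(sweep(block, 2, mu_b)^2)

    # Merge the block summary into the running totals
    delta <- mu_b - mu
    total <- count + nb
    m2 <- m2 + m2_b + delta^2 * count * nb / total
    mu <- mu + delta * nb / total
    count <- total
  }

  list(mean = mu, sd = sqrt(m2 / (count - 1)))
}

set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
res <- block_mean_sd(Y, wsize = 3L)

# Compare against the all-at-once computation
all.equal(res$mean, colMeans(Y))
all.equal(res$sd, apply(Y, 2, sd))
```

A smaller block size lowers peak memory use at the cost of more iterations, which is the trade-off the wsize argument exposes.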

When results are written to disk (onmemory = FALSE), they are stored within the HDF5 file in the group given by outgroup, which defaults to ‘mean_sd’.

6 Examples

library(BigDataStatMeth)

# Create test matrices
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
X <- matrix(rnorm(10), 10, 1)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", Y, "data", "matrix1",
                     overwriteFile = TRUE)
bdCreate_hdf5_matrix("test.hdf5", X, "data", "vector1",
                     overwriteFile = FALSE)

# Compute statistics
bdgetSDandMean_hdf5(
  filename = "test.hdf5",
  group = "data",
  dataset = "matrix1",
  sd = TRUE,
  mean = TRUE,
  byrows = TRUE,
  wsize = 500
)

# Cleanup
if (file.exists("test.hdf5")) {
  file.remove("test.hdf5")
}

7 See Also