bdgetSDandMean_hdf5

HDF5_STATISTICS

1 Description

Computes standard deviation and/or mean statistics for a matrix stored in HDF5 format, with support for row-wise or column-wise computations.

2 Usage

bdgetSDandMean_hdf5(filename, group, dataset, outgroup = NULL, outdataset = NULL, sd = NULL, mean = NULL, byrows = NULL, onmemory = NULL, wsize = NULL, overwrite = FALSE)

3 Arguments

Parameter	Description
`filename`	Character string. Path to the HDF5 file.
`group`	Character string. Path to the group containing the dataset.
`dataset`	Character string. Name of the dataset to analyze.
`sd`	Logical (optional). Whether to compute sd. Default is TRUE.
`outgroup`	Character string (optional). Custom output group name. Default is mean_sd.
`outdataset`	Character string (optional). Custom output dataset name. Defaults are mean.dataset_original_name and sd.dataset_original_name.
`mean`	Logical (optional). Whether to compute mean. Default is TRUE.
`byrows`	Logical (optional). Whether to compute by rows (TRUE) or columns (FALSE). Default is FALSE.
`wsize`	Integer (optional). Block size for processing. Default is 1000.
`onmemory`	Logical (optional). If TRUE, results are kept in memory and returned as a matrix; nothing is written to disk. If FALSE, results are written to disk. Default is FALSE.
`overwrite`	Logical (optional). Whether to overwrite existing results. Default is FALSE.

4 Value

Depending on the `onmemory` parameter:

If onmemory = TRUE: List with components: `mean`, …
If onmemory = FALSE: List with components: `fn`, …

5 Details

This function provides efficient statistical computation capabilities with:

- Computation options:
  - Standard deviation computation
  - Mean computation
  - Row-wise or column-wise processing
- Processing features:
  - Block-based computation
  - Memory-efficient processing
  - Configurable block size
- Implementation features:
  - Safe HDF5 file operations
  - Memory-efficient implementation
  - Comprehensive error handling

Unless onmemory = TRUE, results are written to the HDF5 file, by default in the group ‘mean_sd’; set outgroup to use a different group.
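
To make the idea of block-based computation concrete, here is a minimal base R sketch of column-wise mean and standard deviation accumulated one block of rows at a time. It is an illustration only, not the package's implementation: `blockwise_col_stats` and `read_block` are hypothetical names, an in-memory matrix stands in for the HDF5 dataset, and the sample standard deviation (n - 1 denominator, as in base R `sd()`) is an assumption.

```r
# Hypothetical helper (not part of BigDataStatMeth): column-wise mean and
# sample SD computed from running sums, reading `wsize` rows at a time.
blockwise_col_stats <- function(read_block, nrows, ncols, wsize = 1000) {
  n  <- 0
  s1 <- numeric(ncols)  # running sum per column
  s2 <- numeric(ncols)  # running sum of squares per column
  for (start in seq(1, nrows, by = wsize)) {
    end   <- min(start + wsize - 1, nrows)
    block <- read_block(start, end)  # only this block is held in memory
    n  <- n + nrow(block)
    s1 <- s1 + colSums(block)
    s2 <- s2 + colSums(block^2)
  }
  mu    <- s1 / n
  sigma <- sqrt((s2 - n * mu^2) / (n - 1))  # assumes the n - 1 denominator
  list(mean = mu, sd = sigma)
}

# An in-memory matrix stands in for the on-disk dataset.
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
read_block <- function(start, end) Y[start:end, , drop = FALSE]

stats <- blockwise_col_stats(read_block, nrow(Y), ncol(Y), wsize = 3)

# Compare with the all-at-once base R results.
all.equal(stats$mean, colMeans(Y))
all.equal(stats$sd, apply(Y, 2, sd))
```

The sum-of-squares formula keeps the sketch short but can lose precision when the mean is large relative to the spread; a production implementation would more likely use a numerically stable update.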

6 Examples


library(BigDataStatMeth)

# Create test matrices
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)
X <- matrix(rnorm(10), 10, 1)

# Save to HDF5
bdCreate_hdf5_matrix("test.hdf5", Y, "data", "matrix1",
                     overwriteFile = TRUE)
bdCreate_hdf5_matrix("test.hdf5", X, "data", "vector1",
                     overwriteFile = FALSE)

# Compute statistics
bdgetSDandMean_hdf5(
  filename = "test.hdf5",
  group = "data",
  dataset = "matrix1",
  sd = TRUE,
  mean = TRUE,
  byrows = TRUE,
  wsize = 500
)

# Cleanup
if (file.exists("test.hdf5")) {
  file.remove("test.hdf5")
}
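
For a small matrix such as the one above, reference values to compare against can be computed in memory with base R. This sketch assumes the function reports the sample standard deviation (n - 1 denominator), which this page does not state, and that rows and columns of the stored dataset correspond to those of `Y`; the variable names are illustrative.

```r
# Same test matrix as in the example above.
set.seed(123)
Y <- matrix(rnorm(100), 10, 10)

# Reference values for byrows = TRUE: one mean and one SD per row.
row_mean <- rowMeans(Y)
row_sd   <- apply(Y, 1, sd)

# Reference values for byrows = FALSE: one mean and one SD per column.
col_mean <- colMeans(Y)
col_sd   <- apply(Y, 2, sd)

# Each result has one entry per row (or per column) of Y.
length(row_mean)
length(col_sd)
```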

7 See Also

bdCreate_hdf5_matrix for creating HDF5 matrices
