bdSVD_hdf5

HDF5_ALGEBRA

1 Description

Computes the Singular Value Decomposition (SVD) of a large matrix stored in an HDF5 file. The SVD decomposes a matrix A into a product A = UDV’ where U and V are orthogonal matrices and D is a diagonal matrix containing the singular values.

2 Usage

bdSVD_hdf5(filename, group = NULL, dataset = NULL, k = 2L, q = 1L, bcenter = TRUE, bscale = TRUE, rankthreshold = 0.0, overwrite = NULL, method = NULL, threads = NULL)

3 Arguments

Parameter	Description
`filename`	Character string. Path to the HDF5 file containing the input matrix.
`group`	Character string. Path to the group containing the input dataset.
`dataset`	Character string. Name of the input dataset to decompose.
`k`	Integer. Number of local SVDs to concatenate at each level (default = 2). Controls the trade-off between memory usage and computation speed.
`q`	Integer. Number of levels for SVD computation (default = 1). Higher values can improve accuracy but increase computation time.
`bcenter`	Logical. If TRUE (default), centers the data by subtracting column means.
`bscale`	Logical. If TRUE (default), scales the centered columns by their standard deviations or root mean square.
`rankthreshold`	Numeric. Threshold for determining matrix rank (default = 0). Must be between 0 and 0.1. Used to approximate rank for nearly singular matrices.
`overwrite`	Logical. If TRUE, allows overwriting existing results.
`method`	Character string. Computation method: * “auto”: Automatically selects between “full” and “blocks” based on matrix size * “blocks”: Uses block-based computation (recommended for large matrices) * “full”: Performs direct computation without partitioning
`threads`	Integer. Number of threads for parallel computation.

4 Value

A list with the following elements:

fn: Path to the HDF5 file
ds_d: Path to the dataset containing singular values
ds_u: Path to the dataset containing left singular vectors
ds_v: Path to the dataset containing right singular vectors

5 Details

This function implements a block-based SVD algorithm suitable for large matrices that may not fit in memory. Key features include: - Automatic method selection based on matrix size - Block-based computation for large matrices - Data centering and scaling options - Parallel processing support - Rank approximation through threshold - Memory-efficient incremental algorithm

The implementation uses an incremental algorithm with two key parameters: - k: number of local SVDs to concatenate at each level - q: number of levels in the computation

6 Examples

Code

# Create a sample large matrix in HDF5

library(BigDataStatMeth)
library(rhdf5)

# Create a sample large matrix in HDF5
A <- matrix(rnorm(10000), 1000, 10)

fn <- "test_temp.hdf5"
bdCreate_hdf5_matrix(filename = fn, object = A, group = "data", dataset = "matrix")

# Compute SVD with default parameters
res <- bdSVD_hdf5(fn, "data", "matrix")

# Compute SVD with custom parameters
res <- bdSVD_hdf5(fn, "data", "matrix",
           k = 4, q = 2,
           bcenter = TRUE, bscale = TRUE,
           method = "blocks",
           threads = 4)

# list contents
h5ls(res$fn)

# Extract the result from HDF5 (d)
result_d_hdf5 <- h5read(res$fn, res$ds_d)
result_d_hdf5

# Compute the same SVD in R
result_d_r <- svd(A)$d
result_d_r

# Compare both results (should be TRUE)
all.equal(result_d_hdf5, result_d_r)

# Remove file
if (file.exists(fn)) {
  file.remove(fn)
}

7 See Also

bdPCA_hdf5 for Principal Component Analysis
bdQR_hdf5 for QR decomposition

--- title: "bdSVD_hdf5" subtitle: "bdSVD_hdf5" --- <span class="category-badge hdf5_algebra">HDF5_ALGEBRA</span> ## Description Computes the Singular Value Decomposition (SVD) of a large matrix stored in an HDF5 file. The SVD decomposes a matrix A into a product A = UDV' where U and V are orthogonal matrices and D is a diagonal matrix containing the singular values. ## Usage ```r bdSVD_hdf5(filename, group = NULL, dataset = NULL, k = 2L, q = 1L, bcenter = TRUE, bscale = TRUE, rankthreshold = 0.0, overwrite = NULL, method = NULL, threads = NULL) ``` ## Arguments ::: {.param-table} | Parameter | Description | |-----------|-------------| | `filename` | Character string. Path to the HDF5 file containing the input matrix. | | `group` | Character string. Path to the group containing the input dataset. | | `dataset` | Character string. Name of the input dataset to decompose. | | `k` | Integer. Number of local SVDs to concatenate at each level (default = 2). Controls the trade-off between memory usage and computation speed. | | `q` | Integer. Number of levels for SVD computation (default = 1). Higher values can improve accuracy but increase computation time. | | `bcenter` | Logical. If TRUE (default), centers the data by subtracting column means. | | `bscale` | Logical. If TRUE (default), scales the centered columns by their standard deviations or root mean square. | | `rankthreshold` | Numeric. Threshold for determining matrix rank (default = 0). Must be between 0 and 0.1. Used to approximate rank for nearly singular matrices. | | `overwrite` | Logical. If TRUE, allows overwriting existing results. | | `method` | Character string. Computation method: * "auto": Automatically selects between "full" and "blocks" based on matrix size * "blocks": Uses block-based computation (recommended for large matrices) * "full": Performs direct computation without partitioning | | `threads` | Integer. Number of threads for parallel computation. | ::: ## Value ::: {.return-value} A list with the following elements: - **`fn`**: Path to the HDF5 file - **`ds_d`**: Path to the dataset containing singular values - **`ds_u`**: Path to the dataset containing left singular vectors - **`ds_v`**: Path to the dataset containing right singular vectors ::: ## Details This function implements a block-based SVD algorithm suitable for large matrices that may not fit in memory. Key features include: - Automatic method selection based on matrix size - Block-based computation for large matrices - Data centering and scaling options - Parallel processing support - Rank approximation through threshold - Memory-efficient incremental algorithm The implementation uses an incremental algorithm with two key parameters: - k: number of local SVDs to concatenate at each level - q: number of levels in the computation ## Examples ```{r} #| eval: false #| code-fold: show # Create a sample large matrix in HDF5 library(BigDataStatMeth) library(rhdf5) # Create a sample large matrix in HDF5 A <- matrix(rnorm(10000), 1000, 10) fn <- "test_temp.hdf5" bdCreate_hdf5_matrix(filename = fn, object = A, group = "data", dataset = "matrix") # Compute SVD with default parameters res <- bdSVD_hdf5(fn, "data", "matrix") # Compute SVD with custom parameters res <- bdSVD_hdf5(fn, "data", "matrix", k = 4, q = 2, bcenter = TRUE, bscale = TRUE, method = "blocks", threads = 4) # list contents h5ls(res$fn) # Extract the result from HDF5 (d) result_d_hdf5 <- h5read(res$fn, res$ds_d) result_d_hdf5 # Compute the same SVD in R result_d_r <- svd(A)$d result_d_r # Compare both results (should be TRUE) all.equal(result_d_hdf5, result_d_r) # Remove file if (file.exists(fn)) { file.remove(fn) } ``` ## See Also ::: {.see-also} - [bdPCA_hdf5](bdPCA_hdf5.html) for Principal Component Analysis - [bdQR_hdf5](bdQR_hdf5.html) for QR decomposition :::