Computes the Singular Value Decomposition (SVD) of a large matrix stored in an HDF5 file. The SVD decomposes a matrix A into a product A = UDV’ where U and V are orthogonal matrices and D is a diagonal matrix containing the singular values.
Character string. Path to the HDF5 file containing the input matrix.
group
Character string. Path to the group containing the input dataset.
dataset
Character string. Name of the input dataset to decompose.
k
Integer. Number of local SVDs to concatenate at each level (default = 2). Controls the trade-off between memory usage and computation speed.
q
Integer. Number of levels for SVD computation (default = 1). Higher values can improve accuracy but increase computation time.
bcenter
Logical. If TRUE (default), centers the data by subtracting column means.
bscale
Logical. If TRUE (default), scales the centered columns by their standard deviations or root mean square.
rankthreshold
Numeric. Threshold for determining matrix rank (default = 0). Must be between 0 and 0.1. Used to approximate rank for nearly singular matrices.
overwrite
Logical. If TRUE, allows overwriting existing results.
method
Character string. Computation method: * “auto”: Automatically selects between “full” and “blocks” based on matrix size * “blocks”: Uses block-based computation (recommended for large matrices) * “full”: Performs direct computation without partitioning
threads
Integer. Number of threads for parallel computation.
4 Value
A list with the following elements:
fn: Path to the HDF5 file
ds_d: Path to the dataset containing singular values
ds_u: Path to the dataset containing left singular vectors
ds_v: Path to the dataset containing right singular vectors
5 Details
This function implements a block-based SVD algorithm suitable for large matrices that may not fit in memory. Key features include: - Automatic method selection based on matrix size - Block-based computation for large matrices - Data centering and scaling options - Parallel processing support - Rank approximation through threshold - Memory-efficient incremental algorithm
The implementation uses an incremental algorithm with two key parameters: - k: number of local SVDs to concatenate at each level - q: number of levels in the computation
6 Examples
Code
# Create a sample large matrix in HDF5library(BigDataStatMeth)library(rhdf5)# Create a sample large matrix in HDF5A <-matrix(rnorm(10000), 1000, 10)fn <-"test_temp.hdf5"bdCreate_hdf5_matrix(filename = fn, object = A, group ="data", dataset ="matrix")# Compute SVD with default parametersres <-bdSVD_hdf5(fn, "data", "matrix")# Compute SVD with custom parametersres <-bdSVD_hdf5(fn, "data", "matrix",k =4, q =2,bcenter =TRUE, bscale =TRUE,method ="blocks",threads =4)# list contentsh5ls(res$fn)# Extract the result from HDF5 (d)result_d_hdf5 <-h5read(res$fn, res$ds_d)result_d_hdf5# Compute the same SVD in Rresult_d_r <-svd(A)$dresult_d_r# Compare both results (should be TRUE)all.equal(result_d_hdf5, result_d_r)# Remove fileif (file.exists(fn)) {file.remove(fn)}
---title: "bdSVD_hdf5"subtitle: "bdSVD_hdf5"---<span class="category-badge hdf5_algebra">HDF5_ALGEBRA</span>## DescriptionComputes the Singular Value Decomposition (SVD) of a large matrix stored in an HDF5 file.The SVD decomposes a matrix A into a product A = UDV' where U and V are orthogonalmatrices and D is a diagonal matrix containing the singular values.## Usage```rbdSVD_hdf5(filename, group =NULL, dataset =NULL, k =2L, q =1L, bcenter =TRUE, bscale =TRUE, rankthreshold =0.0, overwrite =NULL, method =NULL, threads =NULL)```## Arguments::: {.param-table}| Parameter | Description ||-----------|-------------||`filename`| Character string. Path to the HDF5 file containing the input matrix. ||`group`| Character string. Path to the group containing the input dataset. ||`dataset`| Character string. Name of the input dataset to decompose. ||`k`| Integer. Number of local SVDs to concatenate at each level (default = 2). Controls the trade-off between memory usage and computation speed. ||`q`| Integer. Number of levels for SVD computation (default = 1). Higher values can improve accuracy but increase computation time. ||`bcenter`| Logical. If TRUE (default), centers the data by subtracting column means. ||`bscale`| Logical. If TRUE (default), scales the centered columns by their standard deviations or root mean square. ||`rankthreshold`| Numeric. Threshold for determining matrix rank (default = 0). Must be between 0 and 0.1. Used to approximate rank for nearly singular matrices. ||`overwrite`| Logical. If TRUE, allows overwriting existing results. ||`method`| Character string. Computation method: * "auto": Automatically selects between "full" and "blocks" based on matrix size * "blocks": Uses block-based computation (recommended for large matrices) * "full": Performs direct computation without partitioning ||`threads`| Integer. Number of threads for parallel computation. |:::## Value::: {.return-value}A list with the following elements:- **`fn`**: Path to the HDF5 file- **`ds_d`**: Path to the dataset containing singular values- **`ds_u`**: Path to the dataset containing left singular vectors- **`ds_v`**: Path to the dataset containing right singular vectors:::## DetailsThis function implements a block-based SVD algorithm suitable for large matricesthat may not fit in memory. Key features include:- Automatic method selection based on matrix size- Block-based computation for large matrices- Data centering and scaling options- Parallel processing support- Rank approximation through threshold- Memory-efficient incremental algorithmThe implementation uses an incremental algorithm with two key parameters:- k: number of local SVDs to concatenate at each level- q: number of levels in the computation## Examples```{r}#| eval: false#| code-fold: show# Create a sample large matrix in HDF5library(BigDataStatMeth)library(rhdf5)# Create a sample large matrix in HDF5A <-matrix(rnorm(10000), 1000, 10)fn <-"test_temp.hdf5"bdCreate_hdf5_matrix(filename = fn, object = A, group ="data", dataset ="matrix")# Compute SVD with default parametersres <-bdSVD_hdf5(fn, "data", "matrix")# Compute SVD with custom parametersres <-bdSVD_hdf5(fn, "data", "matrix",k =4, q =2,bcenter =TRUE, bscale =TRUE,method ="blocks",threads =4)# list contentsh5ls(res$fn)# Extract the result from HDF5 (d)result_d_hdf5 <-h5read(res$fn, res$ds_d)result_d_hdf5# Compute the same SVD in Rresult_d_r <-svd(A)$dresult_d_r# Compare both results (should be TRUE)all.equal(result_d_hdf5, result_d_r)# Remove fileif (file.exists(fn)) {file.remove(fn)}```## See Also::: {.see-also}- [bdPCA_hdf5](bdPCA_hdf5.html) for Principal Component Analysis- [bdQR_hdf5](bdQR_hdf5.html) for QR decomposition:::