---
title: "bdSplit_matrix_hdf5"
subtitle: "bdSplit_matrix_hdf5"
---

<span class="category-badge hdf5_io_management">HDF5_IO_MANAGEMENT</span>

## Description

Splits a large dataset in an HDF5 file into smaller submatrices, with support for both row-wise and column-wise splitting.

## Usage

```r
bdSplit_matrix_hdf5(filename, group, dataset, outgroup = NULL,
                    outdataset = NULL, nblocks = NULL, blocksize = NULL,
                    bycols = TRUE, overwrite = FALSE)
```

## Arguments

::: {.param-table}
| Parameter | Description |
|-----------|-------------|
| `filename` | Character string. Path to the HDF5 file. |
| `group` | Character string. Path to the group containing the input dataset. |
| `dataset` | Character string. Name of the dataset to split. |
| `outgroup` | Character string (optional). Output group path. If NULL, the input group is used. |
| `outdataset` | Character string (optional). Base name for the output datasets. If NULL, the input dataset name is used with a block number suffix. |
| `nblocks` | Integer (optional). Number of blocks to split into. Mutually exclusive with `blocksize`. |
| `blocksize` | Integer (optional). Size of each block. Mutually exclusive with `nblocks`. |
| `bycols` | Logical (optional). Whether to split by columns (TRUE) or by rows (FALSE). Default is TRUE. |
| `overwrite` | Logical (optional). Whether to overwrite existing datasets. Default is FALSE. |
:::

## Value

::: {.return-value}
List with components. If an error occurs, all string values are returned as empty strings (""):

- **`fn`**: Character string with the HDF5 filename.
- **`ds`**: Character string with the output group path where the split datasets are stored. Multiple datasets are created in this location, named \<outdataset\>.1, \<outdataset\>.2, etc.
:::

## Details

This function provides efficient dataset splitting capabilities with:

- Splitting options:
  - Row-wise or column-wise splitting
  - Fixed block size splitting
  - Fixed block count splitting
- Implementation features:
  - Memory-efficient processing
  - Block-based operations
  - Safe file operations
  - Progress reporting

The function supports two splitting strategies:

1. By number of blocks: splits the dataset into a specified number of roughly equal-sized blocks.
2. By block size: splits the dataset into blocks of a specified size.

## Examples

```{r}
#| eval: false
#| code-fold: show
library(BigDataStatMeth)

# Create test data
data <- matrix(rnorm(1000), 100, 10)

# Save to HDF5
fn <- "test.hdf5"
bdCreate_hdf5_matrix(fn, data, "data", "matrix1",
                     overwriteFile = TRUE)

# Split by number of blocks
bdSplit_matrix_hdf5(filename = fn, group = "data", dataset = "matrix1",
                    outgroup = "data_split", outdataset = "block",
                    nblocks = 4, bycols = TRUE)

# Split by block size
bdSplit_matrix_hdf5(filename = fn, group = "data", dataset = "matrix1",
                    outgroup = "data_split2", outdataset = "block",
                    blocksize = 25, bycols = TRUE)

# Cleanup
if (file.exists(fn)) {
  file.remove(fn)
}
```

## See Also

::: {.see-also}
- [bdCreate_hdf5_matrix](bdCreate_hdf5_matrix.html) for creating HDF5 matrices
:::
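As a rough illustration of the two splitting strategies listed under Details, the base R sketch below partitions an in-memory matrix instead of an HDF5 dataset. `block_bounds()` and `split_matrix()` are hypothetical helpers, not BigDataStatMeth functions. How leftover rows or columns are distributed is an assumption, so `bdSplit_matrix_hdf5` may size its blocks differently. In this sketch `bycols` refers to the columns of the R matrix; how the package maps it onto the stored HDF5 dataset is not covered here. Block names follow the \<outdataset\>.1, \<outdataset\>.2 pattern described under Value.

```r
# Hypothetical helper (not part of BigDataStatMeth): start and end index of
# each block along a dimension of length n.
# Assumption: with nblocks, leftover elements go one each to the first blocks;
# with blocksize, the last block holds whatever remains.
block_bounds <- function(n, nblocks = NULL, blocksize = NULL) {
  if (is.null(nblocks) == is.null(blocksize)) {
    stop("Specify exactly one of 'nblocks' or 'blocksize'")
  }
  if (!is.null(nblocks)) {
    nblocks <- min(nblocks, n)                 # avoid empty blocks
    sizes <- rep(n %/% nblocks, nblocks)
    extra <- n %% nblocks
    if (extra > 0) sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  } else {
    sizes <- rep(blocksize, n %/% blocksize)
    if (n %% blocksize > 0) sizes <- c(sizes, n %% blocksize)
  }
  ends <- cumsum(sizes)
  data.frame(start = ends - sizes + 1, end = ends)
}

# Hypothetical in-memory analogue of the split: a named list of submatrices
# "<outdataset>.1", "<outdataset>.2", ...
split_matrix <- function(m, outdataset = "block", nblocks = NULL,
                         blocksize = NULL, bycols = TRUE) {
  n <- if (bycols) ncol(m) else nrow(m)
  b <- block_bounds(n, nblocks, blocksize)
  blocks <- lapply(seq_len(nrow(b)), function(i) {
    idx <- b$start[i]:b$end[i]
    if (bycols) m[, idx, drop = FALSE] else m[idx, , drop = FALSE]
  })
  names(blocks) <- paste0(outdataset, ".", seq_along(blocks))
  blocks
}

data <- matrix(rnorm(1000), 100, 10)

# Fixed block count: 10 columns into 4 blocks of 3, 3, 2 and 2 columns
by_count <- split_matrix(data, nblocks = 4, bycols = TRUE)
sapply(by_count, ncol)

# Fixed block size: 100 rows into 4 blocks of 25 rows
by_size <- split_matrix(data, blocksize = 25, bycols = FALSE)
sapply(by_size, nrow)
```

Binding the blocks back together with `cbind()` or `rbind()` recovers the original matrix, which is a quick way to check that no rows or columns were dropped or duplicated.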