bdImputeSNPs_hdf5

HDF5_STATISTICS

1 Description

Performs imputation of missing values in SNP (Single Nucleotide Polymorphism) data stored in HDF5 format.

2 Usage

bdImputeSNPs_hdf5(filename, group, dataset, outgroup = NULL, outdataset = NULL, bycols = TRUE, paral = NULL, threads = NULL, overwrite = NULL)

3 Arguments

Parameter    Description
filename     Character string. Path to the HDF5 file.
group        Character string. Path to the group containing the input dataset.
dataset      Character string. Name of the dataset to impute.
outgroup     Character string (optional). Output group path. If NULL, uses the input group.
outdataset   Character string (optional). Output dataset name. If NULL, overwrites the input dataset.
bycols       Logical (optional). Whether to impute by columns (TRUE) or by rows (FALSE). Default is TRUE.
paral        Logical (optional). Whether to use parallel processing.
threads      Integer (optional). Number of threads for parallel processing.
overwrite    Logical (optional). Whether to overwrite an existing dataset.

4 Value

List with components:

  • fn: Character string with the HDF5 filename
  • ds: Character string with the full dataset path to the imputed data (group/dataset)
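
As a rough illustration of how the returned list might be consumed, the sketch below splits the ds path back into its group and dataset parts using base R only. The `res` list is a hand-written stand-in for a return value (its contents mirror the names used in the Examples section), not the output of a real call.

```r
# Hypothetical stand-in for the list returned by bdImputeSNPs_hdf5()
res <- list(fn = "snp_data.hdf5", ds = "genotype_imputed/snps_complete")

# ds has the form group/dataset, so base R path helpers can split it
out_group   <- dirname(res$ds)   # group part of the path
out_dataset <- basename(res$ds)  # dataset name

cat("File:   ", res$fn, "\n")
cat("Group:  ", out_group, "\n")
cat("Dataset:", out_dataset, "\n")
```

The two pieces can then be passed as the group and dataset arguments of whichever HDF5 reader is in use.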

5 Details

This function provides efficient imputation capabilities for genomic data, with support for:

  • Imputation options:
      - Row-wise or column-wise imputation
      - Parallel processing
      - Configurable thread count
  • Output options:
      - Custom output location
      - In-place modification
      - Overwrite protection
  • Implementation features:
      - Memory-efficient processing
      - Safe file operations
      - Error handling

The function supports both in-place modification and creation of new datasets.
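
To make the row-wise versus column-wise distinction concrete, here is a minimal in-memory sketch in base R. It assumes a simple strategy, replacing each missing value with the most frequent observed genotype in its column or row. This page does not state which imputation method bdImputeSNPs_hdf5 actually uses, so `impute_mode` is a hypothetical helper for illustration only and is not part of BigDataStatMeth.

```r
# Illustrative only: replace each NA with the most frequent observed value
# in its column (bycols = TRUE) or in its row (bycols = FALSE).
impute_mode <- function(x, bycols = TRUE) {
  fill_one <- function(v) {
    obs <- v[!is.na(v)]
    if (length(obs) == 0) return(v)  # no observed values to impute from
    counts <- table(obs)
    mode_val <- as.numeric(names(counts)[which.max(counts)])
    v[is.na(v)] <- mode_val
    v
  }
  margin <- if (bycols) 2 else 1
  out <- apply(x, margin, fill_one)
  # apply() over rows returns each result as a column, so transpose back
  if (!bycols) out <- t(out)
  out
}

# Small genotype matrix (0/1/2 coding) with two missing entries
geno <- matrix(c(0,  0, NA,
                 0,  2,  2,
                 NA, 2,  2), nrow = 3, byrow = TRUE)

by_col <- impute_mode(geno, bycols = TRUE)
by_row <- impute_mode(geno, bycols = FALSE)
```

In this toy matrix the same missing cell receives a different value depending on the direction, which is why the choice of bycols should match the orientation of the data (SNPs in columns versus SNPs in rows).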

6 Examples

library(BigDataStatMeth)

# Create test data with missing values
data <- matrix(sample(c(0, 1, 2, NA), 100, replace = TRUE), 10, 10)

# Save to HDF5
fn <- "snp_data.hdf5"
bdCreate_hdf5_matrix(fn, data, "genotype", "snps",
                     overwriteFile = TRUE)

# Impute missing values
bdImputeSNPs_hdf5(
  filename = fn,
  group = "genotype",
  dataset = "snps",
  outgroup = "genotype_imputed",
  outdataset = "snps_complete",
  bycols = TRUE,
  paral = TRUE
)

# Cleanup
if (file.exists(fn)) {
  file.remove(fn)
}

7 See Also