bdImportTextFile_hdf5

HDF5_IO_MANAGEMENT

1 Description

Converts a delimited text file (e.g., CSV or TSV) into a dataset inside an HDF5 file for efficient storage and access.

2 Usage

bdImportTextFile_hdf5(filename, outputfile, outGroup, outDataset, sep = NULL, header = FALSE, rownames = FALSE, overwrite = FALSE, paral = NULL, threads = NULL, overwriteFile = NULL)

3 Arguments

Parameter      Description
filename       Character string. Path to the input text file.
outputfile     Character string. Path to the output HDF5 file.
outGroup       Character string. Name of the group to create in the HDF5 file.
outDataset     Character string. Name of the dataset to create.
sep            Character string (optional). Field separator. Defaults to "\t" (tab) when left as NULL.
header         Logical (optional). Whether the first row contains column names. Default is FALSE.
rownames       Logical (optional). Whether the first column contains row names. Default is FALSE.
overwrite      Logical (optional). Whether to overwrite an existing dataset. Default is FALSE.
paral          Logical (optional). Whether to use parallel processing.
threads        Integer (optional). Number of threads for parallel processing.
overwriteFile  Logical (optional). Whether to overwrite an existing HDF5 file.

4 Value

List with components:

  • fn: Character string with the HDF5 filename
  • ds: Character string with the full dataset path to the imported data (group/dataset)
  • ds_rows: Character string with the full dataset path to the row names
  • ds_cols: Character string with the full dataset path to the column names
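
The returned list can be passed straight to an HDF5 reader. The following is a minimal sketch, not part of the package documentation: the temporary file paths are hypothetical, and reading the data back with `rhdf5::h5read` assumes the separate rhdf5 package is installed. Any other HDF5 reader could be used instead.

```r
library(BigDataStatMeth)

# Hypothetical temporary paths, used only for this sketch
csv_file  <- tempfile(fileext = ".csv")
hdf5_file <- tempfile(fileext = ".hdf5")

# Small numeric matrix written as CSV with a header row
write.csv(matrix(rnorm(20), 5, 4), csv_file, row.names = FALSE)

res <- bdImportTextFile_hdf5(
  filename = csv_file,
  outputfile = hdf5_file,
  outGroup = "data",
  outDataset = "matrix1",
  sep = ",",
  header = TRUE,
  overwriteFile = TRUE
)

# Components documented in the Value section
print(res$fn)  # HDF5 filename
print(res$ds)  # dataset path (group/dataset)

# Assumption: rhdf5 is available; it is not required by this function
if (requireNamespace("rhdf5", quietly = TRUE)) {
  imported <- rhdf5::h5read(res$fn, res$ds)
  print(dim(imported))  # orientation depends on how the dataset is stored
}

# Cleanup
unlink(c(csv_file, hdf5_file))
```

Using `res$fn` and `res$ds` rather than retyping the file and dataset names keeps later code in step with whatever the import actually wrote.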

5 Details

This function provides flexible text file import capabilities with support for:

  • Input format options:
      • Custom field separators
      • Header row handling
      • Row names handling
  • Processing options:
      • Parallel processing
      • Memory-efficient import
      • Configurable thread count
  • File handling:
      • Safe file operations
      • Overwrite protection
      • Comprehensive error handling

The function supports parallel processing for large files and provides memory-efficient import capabilities.
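
As an illustration of the processing options, the sketch below imports a tab-separated file with parallel processing enabled. It uses only the arguments listed in the Usage section. The matrix size, the temporary paths, and the thread count of 2 are arbitrary choices for this example, not recommendations from the package.

```r
library(BigDataStatMeth)

# Hypothetical temporary paths, used only for this sketch
tsv_file  <- tempfile(fileext = ".tsv")
hdf5_file <- tempfile(fileext = ".hdf5")

# Write a tab-separated file with a header row and no row names
x <- matrix(rnorm(1000 * 20), nrow = 1000, ncol = 20)
write.table(x, tsv_file, sep = "\t", row.names = FALSE,
            col.names = TRUE, quote = FALSE)

# Import with parallel processing; threads = 2L is an arbitrary value
res_par <- bdImportTextFile_hdf5(
  filename = tsv_file,
  outputfile = hdf5_file,
  outGroup = "data",
  outDataset = "matrix_tsv",
  sep = "\t",
  header = TRUE,
  rownames = FALSE,
  paral = TRUE,
  threads = 2L,
  overwriteFile = TRUE
)

# Cleanup
unlink(c(tsv_file, hdf5_file))
```

Whether parallel import pays off probably depends on file size and the available cores, so it may be worth timing both settings on representative data.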

6 Examples

library(BigDataStatMeth)

# Create a test CSV file
data <- matrix(rnorm(100), 10, 10)
write.csv(data, "test.csv", row.names = FALSE)

# Import to HDF5
bdImportTextFile_hdf5(
  filename = "test.csv",
  outputfile = "output.hdf5",
  outGroup = "data",
  outDataset = "matrix1",
  sep = ",",
  header = TRUE,
  overwriteFile = TRUE
)

# Cleanup
unlink(c("test.csv", "output.hdf5"))

7 See Also