hdf5_import

HDF5MATRIX_CORE

1 Description

Modern wrapper for importing CSV, TSV, or other delimited text files into HDF5 format. Returns an HDF5Matrix object ready for use.

2 Usage

hdf5_import(...)

3 Arguments

Parameter Description
source Character. Path to a local file, or a URL, to import. Supports compressed files (.gz, .tar.gz, .zip, .bz2).
filename Character. Path to the HDF5 output file (created if it does not exist).
dataset Character. Full dataset path (e.g., “data/imported” or “group/dataset”).
sep Character. Field separator. By default, auto-detected from the file extension: “,” for .csv, “\t” for .tsv, and “\t” otherwise.
header Logical or character vector. If TRUE, the first row contains column names. If a character vector, its values are used as column names. Default .
rownames Logical or character vector. If TRUE, the first column contains row names. If a character vector, its values are used as row names. Default .
overwrite Logical. If TRUE, overwrite the dataset if it already exists. Default .
parallel Logical. Use parallel processing for import. Default .
threads Integer. Number of threads for parallel processing. By default, all available cores are used.
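
The default separator rule described for sep can be pictured with a few lines of base R. This is only a sketch of the documented rule: guess_sep is a hypothetical helper, not a function of the package, and it looks only at the final file extension, so how the real function treats names such as data.csv.gz is not covered here.

```r
# Hypothetical helper that mirrors the documented default for `sep`.
# It is NOT part of the package; it only restates the rule in code.
guess_sep <- function(path) {
  # tools::file_ext() returns the last extension without the dot ("" if none)
  ext <- tolower(tools::file_ext(path))
  # "," for .csv; tab for .tsv and for anything else
  if (ext == "csv") "," else "\t"
}

sep_csv <- guess_sep("counts.csv")
sep_tsv <- guess_sep("counts.tsv")
sep_txt <- guess_sep("counts.txt")
```

Passing sep explicitly, as the example in the Examples section does, avoids relying on the extension at all.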

4 Value

An HDF5Matrix object pointing to the imported data.

5 Details

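This function is a modern, user-friendly wrapper around bdImportData_hdf5 and bdImportTextFile_hdf5.

Supported formats: CSV, TSV, and other delimited text files, given as a local path or a URL, optionally compressed (.gz, .tar.gz, .zip, .bz2).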

Memory efficiency: The import is streamed, so very large files can be imported without being loaded entirely into memory.
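
The idea behind streaming can be illustrated in base R by reading a delimited file a fixed number of rows at a time and keeping only running totals. This is a sketch of the general technique, not the package's implementation: stream_col_sums and its chunk_rows argument are hypothetical names, and the file is assumed to be purely numeric with a header row.

```r
# Illustration only: process a delimited file in fixed-size chunks of rows.
# `stream_col_sums` is a hypothetical helper, not a package function.
stream_col_sums <- function(path, sep = ",", chunk_rows = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line is assumed to hold the column names
  col_names <- strsplit(readLines(con, n = 1L), sep, fixed = TRUE)[[1]]
  totals <- setNames(numeric(length(col_names)), col_names)
  n_rows <- 0L

  repeat {
    lines <- readLines(con, n = chunk_rows)
    if (length(lines) == 0L) break  # end of file reached
    # Parse only the current chunk; earlier chunks are no longer held
    chunk <- as.matrix(read.table(text = lines, sep = sep, header = FALSE))
    totals <- totals + colSums(chunk)
    n_rows <- n_rows + nrow(chunk)
  }
  list(n_rows = n_rows, col_sums = totals)
}

# Small usage example: 4 rows read in chunks of 3
tmp <- tempfile(fileext = ".csv")
write.table(matrix(1:12, nrow = 4), tmp, sep = ",",
            row.names = FALSE, col.names = TRUE, quote = FALSE)
res <- stream_col_sums(tmp, chunk_rows = 3L)
unlink(tmp)
```

At any moment only chunk_rows parsed rows are in memory, so peak memory use depends on the chunk size and not on the size of the file.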

6 Examples

\donttest{
csv_file  <- tempfile(fileext = ".csv")
hdf5_file <- tempfile(fileext = ".h5")

# Write sample numeric data
write.table(matrix(rnorm(50), nrow = 10, ncol = 5),
            csv_file, sep = ",", row.names = FALSE, col.names = TRUE)

# Import CSV to HDF5
mat <- hdf5_import(
  source   = csv_file,
  filename = hdf5_file,
  dataset  = "raw/data",
  sep      = ","
)
dim(mat)

hdf5_close_all()
unlink(c(csv_file, hdf5_file))
}

7 See Also

bdImportData_hdf5 for the underlying implementation; hdf5_create_matrix for creating matrices from R objects.