bdImportTextFile_hdf5

HDF5_IO_MANAGEMENT

1 Description

Converts a text file (e.g., CSV, TSV) to HDF5 format, providing efficient storage and access capabilities.

2 Usage

bdImportTextFile_hdf5(filename, outputfile, outGroup, outDataset, sep = NULL, header = FALSE, rownames = FALSE, overwrite = FALSE, paral = NULL, threads = NULL, overwriteFile = NULL)

3 Arguments

Parameter	Description
`filename`	Character string. Path to the input text file.
`outputfile`	Character string. Path to the output HDF5 file.
`outGroup`	Character string. Name of the group to create in HDF5 file.
`outDataset`	Character string. Name of the dataset to create.
`sep`	Character string (optional). Field separator; defaults to "\t" (tab) when `NULL`.
`header`	Logical (optional). Whether first row contains column names.
`rownames`	Logical (optional). Whether first column contains row names.
`overwrite`	Logical (optional). Whether to overwrite existing dataset.
`paral`	Logical (optional). Whether to use parallel processing.
`threads`	Integer (optional). Number of threads for parallel processing.
`overwriteFile`	Logical (optional). Whether to overwrite existing HDF5 file.

4 Value

List with components:

fn: Character string with the HDF5 filename
ds: Character string with the full dataset path to the imported data (group/dataset)
ds_rows: Character string with the full dataset path to the row names
ds_cols: Character string with the full dataset path to the column names
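The returned list can be captured and inspected to find where the data landed in the HDF5 file. The following is a minimal sketch, assuming the package is installed; the temporary file paths, the `data` group, and the `m` dataset name are illustrative choices, not values required by the function.

```r
library(BigDataStatMeth)

# Temporary paths (illustrative) so the sketch leaves the working directory untouched
txt <- tempfile(fileext = ".tsv")
h5  <- tempfile(fileext = ".hdf5")

# Small tab-separated file with a header row
m <- matrix(rnorm(6), nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
write.table(m, txt, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Keep the return value instead of discarding it
res <- bdImportTextFile_hdf5(
  filename = txt,
  outputfile = h5,
  outGroup = "data",
  outDataset = "m",
  sep = "\t",
  header = TRUE,
  overwriteFile = TRUE
)

# Components documented in the Value section
res$fn       # HDF5 file name
res$ds       # dataset path of the imported data
res$ds_rows  # dataset path of the row names
res$ds_cols  # dataset path of the column names

# Cleanup
unlink(c(txt, h5))
```

The paths in `res` can then be passed to whatever HDF5 reader is in use, which avoids rebuilding the group/dataset string by hand.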

5 Details

This function provides flexible text file import capabilities with support for:

- Input format options:
  - Custom field separators
  - Header row handling
  - Row names handling
- Processing options:
  - Parallel processing
  - Memory-efficient import
  - Configurable thread count
- File handling:
  - Safe file operations
  - Overwrite protection
  - Comprehensive error handling

The function supports parallel processing for large files and provides memory-efficient import capabilities.
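The parallel options described above might be used as in the sketch below. It assumes the package is installed; the matrix size, the thread count of 2, and the file and dataset names are arbitrary illustrative values, and whether parallel import is actually faster will depend on file size and hardware.

```r
library(BigDataStatMeth)

# Temporary paths (illustrative)
txt <- tempfile(fileext = ".tsv")
h5  <- tempfile(fileext = ".hdf5")

# A larger tab-separated file without header or row names
set.seed(1)
big <- matrix(rnorm(2000 * 50), nrow = 2000, ncol = 50)
write.table(big, txt, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Request parallel processing with an explicit thread count
res <- bdImportTextFile_hdf5(
  filename = txt,
  outputfile = h5,
  outGroup = "data",
  outDataset = "big",
  sep = "\t",
  header = FALSE,
  paral = TRUE,
  threads = 2L,
  overwriteFile = TRUE
)

# Cleanup
unlink(c(txt, h5))
```

Leaving `paral` and `threads` at their `NULL` defaults keeps the function's default behaviour; setting them explicitly is only worthwhile for files large enough that import time matters.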

6 Examples

library(BigDataStatMeth)

# Create a test CSV file
data <- matrix(rnorm(100), 10, 10)
write.csv(data, "test.csv", row.names = FALSE)

# Import to HDF5
bdImportTextFile_hdf5(
  filename = "test.csv",
  outputfile = "output.hdf5",
  outGroup = "data",
  outDataset = "matrix1",
  sep = ",",
  header = TRUE,
  overwriteFile = TRUE
)

# Cleanup
unlink(c("test.csv", "output.hdf5"))

7 See Also

bdCreate_hdf5_matrix for creating HDF5 matrices directly
