---
title: "bdCorr_hdf5"
subtitle: "Compute correlation matrix for matrices stored in HDF5 format"
---
<span class="category-badge hdf5_statistics">HDF5_STATISTICS</span>
## Description
This function computes a Pearson or Spearman correlation matrix for matrices stored in HDF5 format. The type of correlation is inferred from the arguments: a single-matrix correlation, cor(X), when only `dataset_x` is provided, and a cross-matrix correlation, cor(X, Y), when both `dataset_x` and `dataset_y` are provided.
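
As a rough in-memory analogue, the two modes match base R's `cor()` applied to ordinary matrices. The sketch below uses small simulated matrices (hypothetical data, not read from HDF5) and only illustrates the shapes of the two results.

```r
# In-memory analogue of the two modes, using base R's cor().
# X and Y are small simulated matrices with observations in rows.
set.seed(1)
X <- matrix(rnorm(20 * 4), nrow = 20, ncol = 4)  # 20 observations, 4 variables
Y <- matrix(rnorm(20 * 3), nrow = 20, ncol = 3)  # same 20 observations, 3 variables

single <- cor(X)     # cor(X): 4 x 4, symmetric, ones on the diagonal
cross  <- cor(X, Y)  # cor(X, Y): 4 x 3, one entry per (column of X, column of Y)

dim(single)
dim(cross)
```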

Depending on matrix size, it either computes the correlation directly (small matrices) or processes the matrices block by block (large matrices) to limit memory usage and improve performance; the block size is set with `block_size`.
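
The block-wise strategy can be sketched in base R: standardize the columns once, then assemble the p x p correlation matrix from cross-products of column blocks. This is a minimal sketch assuming complete data, Pearson correlation and everything held in memory; it illustrates the general technique and is not the package's actual implementation, which works on data stored in HDF5. `block_cor()` is a hypothetical helper.

```r
# Sketch of block-wise Pearson correlation. Assumption: this only illustrates
# the general idea; it is NOT bdCorr_hdf5's actual implementation.
block_cor <- function(X, block_size) {
  n <- nrow(X)
  p <- ncol(X)
  # Standardize each column, so that cor(X) = t(Z) %*% Z / (n - 1)
  Z <- scale(X)
  R <- matrix(NA_real_, p, p)
  starts <- seq(1, p, by = block_size)
  for (i in starts) {
    bi <- i:min(i + block_size - 1, p)
    for (j in starts) {
      if (j < i) next  # lower blocks are filled by symmetry when (j, i) is processed
      bj <- j:min(j + block_size - 1, p)
      blk <- crossprod(Z[, bi, drop = FALSE], Z[, bj, drop = FALSE]) / (n - 1)
      R[bi, bj] <- blk
      R[bj, bi] <- t(blk)
    }
  }
  R
}

set.seed(42)
X <- matrix(rnorm(50 * 7), nrow = 50)
all.equal(block_cor(X, block_size = 3), cor(X))
```

Only blocks on or above the diagonal are computed and the rest are mirrored, which roughly halves the work; in an out-of-core setting each column block would be read from disk instead of being sliced from `Z`.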

Supported correlation types:

- Single matrix: cor(X), when only `dataset_x` is provided
- Single matrix, transposed: cor(t(X)), when `trans_x = TRUE`
- Cross-correlation: cor(X, Y), when both datasets are provided
- Cross-correlation with transposition: cor(t(X), Y), cor(X, t(Y)) or cor(t(X), t(Y)), selected with `trans_x` and `trans_y`
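
Each variant correlates the columns of the (possibly transposed) operands, so with base R's `cor()` the two operands must have the same number of rows. The hypothetical helper `cor_dims()` below (not part of the package) predicts the result shape for each combination of `trans_x` and `trans_y`, and the `cor()` calls check it on simulated data.

```r
# Hypothetical helper (not part of the package): predicts the shape of
# cor(op(X)) or cor(op(X), op(Y)), where op() transposes when its flag is TRUE.
# Base R's cor() correlates columns, so both operands need the same row count.
cor_dims <- function(dim_x, dim_y = NULL, trans_x = FALSE, trans_y = FALSE) {
  dx <- if (trans_x) rev(dim_x) else dim_x
  if (is.null(dim_y)) return(c(dx[2], dx[2]))   # single matrix: trans_y ignored
  dy <- if (trans_y) rev(dim_y) else dim_y
  if (dx[1] != dy[1]) stop("op(X) and op(Y) must have the same number of rows")
  c(dx[2], dy[2])
}

set.seed(7)
X  <- matrix(rnorm(10 * 4), nrow = 10)  # 10 x 4
Y  <- matrix(rnorm(10 * 3), nrow = 10)  # 10 x 3, shares rows with X
Y2 <- matrix(rnorm(4 * 6),  nrow = 4)   # 4 x 6, shares rows with t(X)

dim(cor(t(X)))      # 10 x 10, as predicted by cor_dims(dim(X), trans_x = TRUE)
dim(cor(X, Y))      # 4 x 3
dim(cor(t(X), Y2))  # 10 x 6, as predicted by cor_dims(dim(X), dim(Y2), trans_x = TRUE)
```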

For omics data analysis:

- `trans_x = FALSE`, `trans_y = FALSE`: variables vs variables (genes vs genes, CpGs vs CpGs)
- `trans_x = TRUE`, `trans_y = FALSE`: samples vs variables (individuals vs genes)
- `trans_x = FALSE`, `trans_y = TRUE`: variables vs samples (genes vs individuals)
- `trans_x = TRUE`, `trans_y = TRUE`: samples vs samples (individuals vs individuals), optimized to cor(X, Y)
## Usage
```r
bdCorr_hdf5(filename_x, group_x, dataset_x, filename_y = "", group_y = "", dataset_y = "", trans_x = FALSE, trans_y = FALSE, method = "pearson", use_complete_obs = TRUE, compute_pvalues = TRUE, block_size = 1000L, overwrite = FALSE, output_filename = "", output_group = "", output_dataset_corr = "", output_dataset_pval = "", threads = -1L)
```
## Arguments
::: {.param-table}
| Parameter | Description |
|-----------|-------------|
| `filename_x` | Character string with the path to the HDF5 file containing matrix X |
| `group_x` | Character string indicating the group containing matrix X |
| `dataset_x` | Character string indicating the dataset name of matrix X |
| `filename_y` | Character string with the path to the HDF5 file containing matrix Y (optional, default: "") |
| `group_y` | Character string indicating the group containing matrix Y (optional, default: "") |
| `dataset_y` | Character string indicating the dataset name of matrix Y (optional, default: "") |
| `trans_x` | Logical, whether to transpose matrix X (default: FALSE) |
| `trans_y` | Logical, whether to transpose matrix Y (default: FALSE; ignored for single-matrix correlation) |
| `method` | Character string indicating correlation method ("pearson" or "spearman", default: "pearson") |
| `use_complete_obs` | Logical, whether to use only complete observations (default: TRUE) |
| `compute_pvalues` | Logical, whether to compute p-values for correlations (default: TRUE) |
| `block_size` | Integer, block size for large matrix processing (default: 1000) |
| `overwrite` | Logical, whether to overwrite existing results (default: FALSE) |
| `output_filename` | Character string, output HDF5 file (default: same as filename_x) |
| `output_group` | Character string, custom output group name (default: auto-generated) |
| `output_dataset_corr` | Character string, custom correlation dataset name (default: "correlation") |
| `output_dataset_pval` | Character string, custom p-values dataset name (default: "pvalues") |
| `threads` | Integer, number of threads for parallel computation (optional, default: -1, automatic selection) |
:::
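
For reference, base R analogues of `method`, `use_complete_obs` and `compute_pvalues` are sketched below on simulated data. Assumptions: Spearman correlation is taken as Pearson correlation of column-wise ranks, `use_complete_obs = TRUE` is taken to behave like base R's `use = "complete.obs"` (dropping every row that contains an `NA`), and p-values are taken to come from the usual two-sided t-test with n - 2 degrees of freedom. The package's internals may differ; `cor_pvalue()` is a hypothetical helper.

```r
# Base R analogues (assumptions, not the package's internals) on simulated data.
set.seed(3)
X <- matrix(rnorm(30 * 3), nrow = 30)
X[c(2, 11), 1] <- NA                 # two missing values in column 1

# use_complete_obs = TRUE, assumed to behave like use = "complete.obs":
# every row containing an NA is dropped before correlating.
Xc       <- X[complete.cases(X), ]   # 28 complete rows remain
r_cc     <- cor(X, use = "complete.obs")
r_manual <- cor(Xc)

# method = "spearman": Pearson correlation of the column-wise ranks.
spearman <- cor(Xc, method = "spearman")
ranked   <- cor(apply(Xc, 2, rank))

# compute_pvalues = TRUE, assumed to use the two-sided t-test for Pearson r:
# t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom.
cor_pvalue <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}
n       <- nrow(Xc)
p12     <- cor_pvalue(r_manual[1, 2], n)
p12_ref <- cor.test(Xc[, 1], Xc[, 2])$p.value  # base R reference value
```

Because `cor_pvalue()` is vectorized, it can also be applied to a whole correlation matrix at once.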
## Value
::: {.return-value}
A list with components:
- **`fn`**: Character string with the HDF5 filename
- **`ds`**: Character string with the full dataset path to the correlation matrix (group/dataset)
:::
## Examples
```{r}
#| eval: false
#| code-fold: show
# Single-matrix correlation, cor(X), with default settings
result <- bdCorr_hdf5("data.h5", "expression", "genes")
# Choosing the orientation with trans_x
# Gene-gene correlations (variables)
gene_corr <- bdCorr_hdf5("omics.h5", "expression", "genes", trans_x = FALSE)
# Sample-sample correlations (individuals)
sample_corr <- bdCorr_hdf5("omics.h5", "expression", "genes", trans_x = TRUE)
# Cross-correlation: genes vs methylation sites (variables vs variables)
cross_vars <- bdCorr_hdf5("omics.h5", "expression", "genes",
"omics.h5", "methylation", "cpg_sites",
trans_x = FALSE, trans_y = FALSE)
# Cross-correlation: samples vs methylation sites (samples vs variables)
samples_vs_cpg <- bdCorr_hdf5("omics.h5", "expression", "genes",
"omics.h5", "methylation", "cpg_sites",
trans_x = TRUE, trans_y = FALSE)
```