BigDataStatMeth

Scalable Statistical Computing with R, C++, and HDF5

Welcome

This website provides comprehensive educational material for the BigDataStatMeth package. Here you’ll find in-depth explanations, tutorials, and practical examples to help you understand the fundamental concepts and develop new statistical methods for large-scale data analysis.

1 What is BigDataStatMeth?

BigDataStatMeth is an R package that enables scalable statistical computing on datasets that exceed available memory. By combining:

  • HDF5-based storage for disk-backed matrices
  • Block-wise algorithms for memory-efficient computation
  • High-performance C++ backend with parallel processing
  • Dual R/C++ APIs for flexibility and integration

BigDataStatMeth allows you to perform complex statistical analyses on large datasets using standard hardware.
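To make the block-wise idea concrete, here is a minimal conceptual sketch, written with the rhdf5 package rather than BigDataStatMeth itself, of how a column-wise statistic can be computed one block at a time so the full matrix never has to fit in memory. The file and dataset names are assumed to match the Quick Start example further down; the package itself performs this kind of blocking internally, in C++.

# Conceptual illustration only -- not BigDataStatMeth's internal code.
# Column means of an HDF5-backed matrix, read one block of columns at a time.
library(rhdf5)

blockwise_colmeans <- function(filename, dataset, ncols, block_size = 100) {
  means <- numeric(0)
  for (start in seq(1, ncols, by = block_size)) {
    cols  <- start:min(start + block_size - 1, ncols)
    block <- h5read(filename, dataset, index = list(NULL, cols))  # all rows, one block of columns
    means <- c(means, colMeans(block))
  }
  means
}

# Assuming the matrix is stored as "data/matrix1" with 500 columns, as in the
# Quick Start example below (the stored orientation may differ depending on
# how the dataset was written):
# blockwise_colmeans("my_analysis.hdf5", "data/matrix1", ncols = 500)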

2 What You’ll Learn Here

This documentation goes beyond the API reference to teach you the foundations you need.

2.1 Learning Objectives

  • Understand why traditional in-memory approaches fail with large datasets (see the quick calculation after this list)
  • Master HDF5 file format and its role in big data computing
  • Grasp block-wise algorithm design and implementation
  • Apply BigDataStatMeth to real-world statistical problems
  • Develop your own scalable statistical methods
  • Integrate BigDataStatMeth into complex analytical workflows
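The first objective is easy to motivate with arithmetic: a dense double-precision matrix costs 8 bytes per entry, so a single large matrix can dwarf the RAM of a typical workstation. The dimensions below are chosen purely for illustration.

# Memory needed to hold one dense matrix of doubles in RAM
n_rows <- 1e5                    # e.g., 100,000 rows
n_cols <- 5e5                    # e.g., 500,000 columns
bytes  <- n_rows * n_cols * 8    # 8 bytes per double
bytes / 1024^3                   # ~372 GiB, before any copies made during analysis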

3 Documentation Structure

The documentation is organized as a progressive learning journey:

3.1 Fundamentals

Learn the core concepts that underpin BigDataStatMeth:

3.2 Tutorials

Step-by-step guides to get you started:

3.3 Workflows

Complete examples of implementing statistical methods:

3.4 API Reference

Technical documentation for all functions:

3.5 Technical Details

Advanced topics and optimization:

4 Quick Start

# Option 1: install the stable version from CRAN
install.packages("BigDataStatMeth")

# Option 2: install the development version from GitHub
# (requires the devtools package)
install.packages("devtools")
devtools::install_github("isglobal-brge/BigDataStatMeth")

# Load the package
library(BigDataStatMeth)

4.1 Your First HDF5 Matrix

# Simulate a 1000 x 500 matrix of random values
set.seed(123)
data <- matrix(rnorm(1000 * 500), nrow = 1000, ncol = 500)

# Write the matrix to the HDF5 file "my_analysis.hdf5",
# as dataset "matrix1" inside group "data"
bdCreate_hdf5_matrix(
  filename = "my_analysis.hdf5",
  object = data,
  group = "data",
  dataset = "matrix1"
)

# Perform SVD on the HDF5 dataset (without loading the full matrix into memory)
result <- bdSVD_hdf5(
  filename = "my_analysis.hdf5",
  group = "data",
  dataset = "matrix1",
  k = 10
)
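Results of the HDF5-based functions are typically written back into the HDF5 file rather than returned as large in-memory matrices, and the exact group and dataset names can vary by package version. An easy way to see what was produced is to list the file's contents; the snippet below uses the rhdf5 package, which is just one of several ways to browse an HDF5 file.

# List everything stored in the HDF5 file, including any result
# datasets written by bdSVD_hdf5 (names and locations may vary).
library(rhdf5)
h5ls("my_analysis.hdf5")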

5 Learning Path

We recommend following this sequence:

  1. Start with Fundamentals if you’re new to HDF5 or block-wise computing
  2. Follow the Tutorials for hands-on practice with BigDataStatMeth
  3. Study the Workflows to see complete method implementations
  4. Refer to API Reference when developing your own methods
  5. Explore Technical Details for optimization and advanced usage

6 Getting Help

7 Citation

If you use BigDataStatMeth in your research, please cite:

citation("BigDataStatMeth")

Or use this BibTeX entry:

@Manual{BigDataStatMeth,
  title = {BigDataStatMeth: Scalable Statistical Methods for Big Data},
  author = {Dolors Pelegrí-Sisó and Juan R. González},
  year = {2025},
  note = {R package version 1.0.2},
  url = {https://CRAN.R-project.org/package=BigDataStatMeth},
}

Ready to start? Head to Understanding HDF5 to begin your learning journey!