---
title: "bdRemoveMAF_hdf5"
subtitle: "bdRemoveMAF_hdf5"
---

<span class="category-badge hdf5_statistics">HDF5_STATISTICS</span>

## Description

Filters SNPs (Single Nucleotide Polymorphisms) based on Minor Allele Frequency (MAF) in genomic data stored in HDF5 format.

## Usage

```r
bdRemoveMAF_hdf5(filename, group, dataset, outgroup, outdataset, maf, bycols, blocksize, overwrite = NULL)
```

## Arguments

::: {.param-table}

| Parameter | Description |
|-----------|-------------|
| `filename` | Character string. Path to the HDF5 file. |
| `group` | Character string. Path to the group containing input dataset. |
| `dataset` | Character string. Name of the dataset to filter. |
| `outgroup` | Character string. Output group path for filtered data. |
| `outdataset` | Character string. Output dataset name for filtered data. |
| `maf` | Numeric (optional). MAF threshold for filtering (0-1). Default is 0.05. SNPs with MAF above this threshold are removed. |
| `bycols` | Logical (optional). Whether to process by columns (TRUE) or rows (FALSE). Default is FALSE. |
| `blocksize` | Integer (optional). Block size for processing. Default is 100. Larger values use more memory but may be faster. |
| `overwrite` | Logical (optional). Whether to overwrite existing dataset. Default is FALSE. |

:::

## Value

::: {.return-value}

List with components. If an error occurs, all string values are returned as empty strings (""):

- **`fn`**: Character string with the HDF5 filename
- **`ds`**: Character string with the full dataset path to the filtered dataset (group/dataset)
- **`nremoved`**: Integer with the number of SNPs removed due to low Minor Allele Frequency (MAF)

:::

## Details

This function provides efficient MAF-based filtering capabilities with:

- Filtering options:
  - MAF threshold-based filtering
  - Row-wise or column-wise processing
  - Block-based processing
- Implementation features:
  - Memory-efficient processing
  - Block-based operations
  - Safe file operations
  - Progress reporting

The function supports both in-place modification and creation of new datasets.

## Examples

```{r}
#| eval: false
#| code-fold: show
library(BigDataStatMeth)

# Create test SNP data
snps <- matrix(sample(c(0, 1, 2), 1000, replace = TRUE,
                      prob = c(0.7, 0.2, 0.1)), 100, 10)

# Save to HDF5
fn <- "snp_data.hdf5"
bdCreate_hdf5_matrix(fn, snps, "genotype", "raw_snps",
                     overwriteFile = TRUE)

# Remove SNPs with high MAF
bdRemoveMAF_hdf5(
  filename = fn,
  group = "genotype",
  dataset = "raw_snps",
  outgroup = "genotype_filtered",
  outdataset = "filtered_snps",
  maf = 0.1,
  bycols = TRUE,
  blocksize = 50
)

# Cleanup
if (file.exists(fn)) {
  file.remove(fn)
}
```

## See Also

::: {.see-also}

- [bdRemovelowdata_hdf5](bdRemovelowdata_hdf5.html) for removing low-representation SNPs
- [bdImputeSNPs_hdf5](bdImputeSNPs_hdf5.html) for imputing missing SNP values

:::
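
To make the quantity behind the `maf` threshold concrete, the following is a minimal in-memory sketch in base R. It is not the package's implementation: `maf_per_snp` is a hypothetical helper, and it assumes genotypes coded 0/1/2 (copies of one allele) with one SNP per column. It only shows how a per-SNP MAF could be computed and which SNPs fall on each side of a threshold.

```r
# Hypothetical helper for illustration; not part of BigDataStatMeth.
# Assumes genotypes are coded 0/1/2 with SNPs in columns.
maf_per_snp <- function(genotypes) {
  # Frequency of the counted allele: mean allele count divided by 2 (diploid)
  allele_freq <- colMeans(genotypes, na.rm = TRUE) / 2
  # The minor allele is whichever of the two alleles is rarer
  pmin(allele_freq, 1 - allele_freq)
}

# Small deterministic example: 4 samples (rows) x 3 SNPs (columns).
# matrix() fills column by column, so each row of values below is one SNP.
geno <- matrix(c(0, 0, 0, 1,   # SNP 1: allele frequency 1/8
                 2, 2, 1, 2,   # SNP 2: allele frequency 7/8, so MAF is 1/8
                 0, 1, 2, 1),  # SNP 3: allele frequency 4/8
               nrow = 4)

maf <- maf_per_snp(geno)
maf                                  # 0.125 0.125 0.500

threshold <- 0.2                     # illustrative value only
above <- which(maf > threshold)      # SNP indices with MAF above the threshold
below <- which(maf <= threshold)     # SNP indices at or below the threshold
```

Running the same calculation on a small subset of a dataset before calling `bdRemoveMAF_hdf5` is one way to sanity-check a chosen `maf` value against the reported `nremoved`.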