\donttest{
# Create a temporary HDF5 file holding a 20 x 10 genotype matrix
# in which roughly 25% of the entries are missing
fn <- tempfile(fileext = ".h5")
snps <- matrix(sample(c(0, 1, 2, NA), 200, replace = TRUE,
                      prob = c(.25, .25, .25, .25)), 20, 10)
X <- hdf5_create_matrix(fn, "geno/raw", data = snps)

# Filter with auto output path (adds "_filtered" suffix)
out <- filter_low_coverage(X, pcent = 0.1)

# Filter with explicit output
out2 <- filter_low_coverage(X, out_group = "geno",
                            out_dataset = "filtered", overwrite = TRUE)

# Close open HDF5 handles and delete the temporary file
hdf5_close_all()
unlink(fn)
}
filter_low_coverage
OMICS
1 Description
Removes columns (SNPs) or rows (samples) whose proportion of missing values (NAs) exceeds pcent. The result is written to a new dataset; the input dataset is not modified in place.
When out_group/out_dataset are NULL (default) the result is written alongside the input dataset with the suffix "_filtered".
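The filtering rule described above can be illustrated on an ordinary in-memory matrix. This is a minimal base R sketch, not the package's implementation: filter_na_sketch is a hypothetical helper name, and keeping features whose NA proportion is exactly equal to pcent is an assumption read from the word "exceeds".

```r
# Hypothetical in-memory analogue of the filtering rule (base R only).
# It does not touch HDF5 and is not part of the package.
filter_na_sketch <- function(m, pcent, by_cols = TRUE) {
  # Proportion of NAs per column (by_cols = TRUE) or per row
  na_prop <- if (by_cols) colMeans(is.na(m)) else rowMeans(is.na(m))
  # Assumption: a feature is dropped only when its NA proportion
  # strictly exceeds pcent
  keep <- na_prop <= pcent
  if (by_cols) m[, keep, drop = FALSE] else m[keep, , drop = FALSE]
}

# 4 samples (rows) x 3 SNPs (columns)
m <- matrix(c(0,  NA, 2,
              1,  NA, 2,
              NA, NA, 0,
              2,  1,  1),
            nrow = 4, byrow = TRUE)

# Column NA proportions are 0.25, 0.75 and 0, so with pcent = 0.5
# the second column is removed
res <- filter_na_sketch(m, pcent = 0.5)
```

Calling the same helper with by_cols = FALSE applies the rule to rows instead; here the third row has two of three values missing and would be dropped.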
2 Usage
filter_low_coverage(x, ...)
3 Arguments
| Parameter | Description |
|---|---|
| x | An object containing SNP data, for example one created with hdf5_create_matrix(). |
| out_group | Output group. NULL (default) = same group as input. |
| out_dataset | Output dataset name. NULL (default) = input name + "_filtered". |
| pcent | Numeric in [0,1]. Maximum allowed NA proportion. Features above this are removed. |
| by_cols | Logical. Filter columns (TRUE, default) or rows (FALSE). |
| overwrite | Logical. Overwrite existing output. |
4 Value
An object pointing to the filtered dataset.