Performs PCA on vertically partitioned data using the privacy-preserving correlation matrix computed via Homomorphic Encryption.
Usage
ds.vertPCA(
data_name = NULL,
variables = NULL,
n_components = NULL,
cor_result = NULL,
log_n = 12,
log_scale = 40,
datasources = NULL
)Arguments
- data_name
Character string. Name of the (aligned) data frame on each server. Ignored if
cor_resultis provided.- variables
A named list where each name corresponds to a server name and each element is a character vector of variable names from that server. Ignored if
cor_resultis provided.- n_components
Integer. Number of principal components to return. Default is NULL (returns all).
- cor_result
An existing
ds.corobject fromds.vertCor. If provided, the MHE protocol is NOT re-run; the PCA is computed directly from this correlation matrix. This avoids running the expensive MHE protocol twice when you already have the correlation.- log_n
Integer. CKKS ring dimension parameter for MHE. Default is 12. Ignored if
cor_resultis provided.- log_scale
Integer. CKKS precision parameter for MHE. Default is 40. Ignored if
cor_resultis provided.- datasources
DataSHIELD connection object or list of connections. If NULL, uses all available connections. Ignored if
cor_resultis provided.
Value
A list with class "ds.pca" containing:
loadings: Matrix of variable loadings (n_vars x n_components)eigenvalues: Eigenvalues for each componentvariance_pct: Percentage of variance explainedcumulative_pct: Cumulative percentage explainedvar_names: Variable namesn_obs: Number of observationscorrelation: The correlation matrix used for PCA
Details
This function performs PCA using the correlation matrix obtained via Multiparty Homomorphic Encryption (MHE). The approach is:
Compute the privacy-preserving correlation matrix using
ds.vertCor(or reuse an existing one via thecor_resultparameter)Perform eigen decomposition on the correlation matrix
Extract loadings (eigenvectors) and eigenvalues
Since PCA on standardized data is equivalent to eigen decomposition of the correlation matrix, this gives correct loadings and variance explained.
Note on scores: This function does NOT return principal component scores because computing scores would require access to the raw data. If you need scores, you would need to compute them on each server using the loadings and aggregate the results (which is a separate operation).
Security
This function inherits all security properties from ds.vertCor:
Individual observations are never exposed
The client cannot decrypt without all servers cooperating
Only aggregate statistics (correlation matrix) are revealed
See also
ds.vertCor for correlation analysis
Examples
if (FALSE) { # \dontrun{
vars <- list(
server1 = c("age", "weight"),
server2 = c("height", "bmi")
)
pca_result <- ds.vertPCA("D_aligned", vars, n_components = 3)
# Or reuse an existing correlation result (avoids running MHE again):
cor_result <- ds.vertCor("D_aligned", vars)
pca_result <- ds.vertPCA(cor_result = cor_result, n_components = 3)
# View variance explained
print(pca_result)
# Biplot of loadings for first two PCs
plot(pca_result$loadings[, 1], pca_result$loadings[, 2],
xlim = c(-1, 1), ylim = c(-1, 1),
xlab = paste0("PC1 (", round(pca_result$variance_pct[1], 1), "%)"),
ylab = paste0("PC2 (", round(pca_result$variance_pct[2], 1), "%)"))
text(pca_result$loadings[, 1], pca_result$loadings[, 2],
labels = pca_result$var_names, pos = 3)
abline(h = 0, v = 0, lty = 2)
} # }