Privacy-preserving record alignment using Elliptic Curve Diffie-Hellman Private Set Intersection (ECDH-PSI) with blind-relay transport encryption. Aligns data frames across vertically partitioned DataSHIELD servers so that rows correspond to the same individuals. The client never sees raw EC points — only opaque encrypted blobs.
Arguments
- data_name
Character string. Name of the data frame on each server.
- id_col
Character string. Name of the identifier column.
- newobj
Character string. Name for the aligned data frame on servers. Default is "D_aligned".
- ref_server
Character string or NULL. Name of the reference server. If NULL (default), the first connection is used.
- datasources
DataSHIELD connection object or list of connections. If NULL, uses all available connections.
Value
Invisibly returns a list with alignment statistics for each server:
- n_matched: Number of records matched
- n_total: Number of records on that server
Details
This function performs privacy-preserving record alignment in a single call, using ECDH-PSI with blind-relay transport encryption.
Protocol overview
ECDH-PSI exploits the commutativity of elliptic curve scalar multiplication: \(\alpha \cdot (\beta \cdot H(id)) = \beta \cdot (\alpha \cdot H(id))\).
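The commutativity property can be checked numerically. The sketch below is a minimal illustration, not the package's implementation: it assumes a toy multiplicative group modulo the prime 2^255 - 19 in place of a real elliptic curve, so modular exponentiation stands in for EC scalar multiplication. The helper names `hash_to_group` and `mask` and the identifier "patient-001" are hypothetical.

```python
import hashlib
import secrets

# ASSUMPTION: toy group (integers modulo a prime under multiplication).
# Exponentiation here plays the role of EC scalar multiplication.
P = 2**255 - 19


def hash_to_group(identifier: str) -> int:
    """Map an identifier to a nonzero group element (illustrative H(id))."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def mask(element: int, scalar: int) -> int:
    """Apply a secret scalar to a group element."""
    return pow(element, scalar, P)


# Secret scalars held by the reference (alpha) and the target (beta).
alpha = secrets.randbelow(P - 3) + 2
beta = secrets.randbelow(P - 3) + 2

h = hash_to_group("patient-001")
ref_then_target = mask(mask(h, alpha), beta)
target_then_ref = mask(mask(h, beta), alpha)

# Both masking orders give the same value, so equal IDs can be matched
# without either side revealing its scalar or the unmasked hash.
commutes = ref_then_target == target_then_ref
print(commutes)
```

Because both orders yield the same double-masked value, two servers can compare double-masked points for equality without exchanging identifiers in the clear.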
All EC point exchanges are encrypted server-to-server (X25519 + AES-256-GCM ECIES). The client acts as a blind relay, seeing only opaque blobs.
Phase 0: Each server generates an X25519 transport keypair. Public keys are exchanged via the client.
Phase 1: The reference server masks IDs with scalar \(\alpha\). Points are stored server-side (not returned to client).
For each target server:
The reference encrypts its masked points under the target's public key.
The target decrypts them, generates scalar \(\beta\), and double-masks the reference points, storing the result locally. It then masks its own IDs with \(\beta\) and encrypts them under the reference's public key.
The reference decrypts, double-masks the target's points with \(\alpha\), and encrypts the result under the target's public key.
The target decrypts, matches the two double-masked sets, and aligns its data.
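The masking and matching logic of one reference/target round can be simulated in a single process. This is a hedged sketch under stated assumptions: a toy group modulo 2^255 - 19 replaces the elliptic curve, the X25519 + AES-256-GCM transport layer is omitted entirely, and ordering the matched rows by their position on the reference is an assumption about how alignment might be done. All identifiers and helper names are hypothetical.

```python
import hashlib
import secrets

P = 2**255 - 19  # ASSUMPTION: toy group standing in for the elliptic curve


def hash_to_group(identifier):
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def mask(element, scalar):
    return pow(element, scalar, P)


def random_scalar():
    return secrets.randbelow(P - 3) + 2


# Hypothetical identifier columns on the two servers.
ref_ids = ["id03", "id01", "id07", "id05"]
target_ids = ["id05", "id02", "id01", "id09"]

# Reference: mask own IDs with alpha (these would be sent encrypted).
alpha = random_scalar()
ref_masked = [mask(hash_to_group(i), alpha) for i in ref_ids]

# Target: generate beta, double-mask the reference points, mask own IDs.
beta = random_scalar()
ref_double = [mask(pt, beta) for pt in ref_masked]
target_masked = [mask(hash_to_group(i), beta) for i in target_ids]

# Reference: double-mask the target's points with alpha and send them back.
target_double = [mask(pt, alpha) for pt in target_masked]

# Target: match the two double-masked sets and order rows like the reference.
position_in_ref = {pt: k for k, pt in enumerate(ref_double)}
matches = sorted(
    (position_in_ref[pt], row)
    for row, pt in enumerate(target_double)
    if pt in position_in_ref
)
matched_target_ids = [target_ids[row] for _, row in matches]
print(matched_target_ids)
```

In this simulation the two shared identifiers are recovered in reference order, while neither side ever handles the other's unmasked hashes.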
A multi-server intersection ensures only records present on ALL servers are retained.
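The multi-server step amounts to intersecting the per-target match results. A minimal sketch, assuming each target's matches are expressed as row positions on the reference server (the source does not specify the representation, and the server names and values below are hypothetical):

```python
from functools import reduce

# ASSUMPTION: reference row positions matched by each target server.
matched_ref_rows = {
    "server_b": {0, 1, 3, 4},
    "server_c": {1, 2, 3},
    "server_d": {1, 3, 4, 5},
}

# Keep only the reference rows that every target server matched.
common_rows = sorted(reduce(set.intersection, matched_ref_rows.values()))
print(common_rows)  # [1, 3]
```

Rows matched by only some of the servers (for example row 4 here) are dropped, so every aligned data frame ends up with the same individuals.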
See also
ds.vertCor, ds.vertGLM for analysis functions that operate on aligned data.