Privacy-preserving record linkage (PPRL) lets two parties find shared entities without exchanging plaintext identifiers. GoldenMatch encodes fields into Bloom filters and compares the encoded filters instead of the raw values, reaching an F1 score of 0.924 on the FEBRL4 benchmark.
import goldenmatch as gm
result = gm.pprl_link("hospital_a.csv", "hospital_b.csv")
print(f"Found {result['match_count']} matches")
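To make the Bloom-filter idea concrete, here is a minimal, standard-library sketch of one common PPRL encoding: character bigrams hashed with a keyed HMAC into a fixed-size bit set, then compared with the Dice coefficient. It is not GoldenMatch's implementation. The function names (bigrams, bloom_encode, dice), the shared secret, and the size and num_hashes defaults are all hypothetical choices made for illustration.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a normalized, padded string into character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value: str, secret: bytes, size: int = 1024, num_hashes: int = 4) -> set[int]:
    """Return the set of bit positions that `value` switches on.

    Illustrative only: `secret`, `size`, and `num_hashes` are assumed
    parameters, not GoldenMatch settings.
    """
    bits: set[int] = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            # Keyed hash: without the shared secret, a third party cannot
            # recompute which positions a given bigram maps to.
            digest = hmac.new(secret, f"{k}:{gram}".encode("utf-8"), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


secret = b"shared-secret-agreed-by-both-parties"  # hypothetical key
smith = bloom_encode("Smith", secret)
smyth = bloom_encode("Smyth", secret)
jones = bloom_encode("Jones", secret)

print(f"smith vs smyth: {dice(smith, smyth):.2f}")
print(f"smith vs jones: {dice(smith, jones):.2f}")
```

Because similar strings share most of their bigrams, their encodings overlap heavily, so a similarity threshold can separate likely matches from non-matches without either party seeing the other's plaintext.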
Manual config
import goldenmatch as gm
result = gm.pprl_link(
    "party_a.csv", "party_b.csv",
    fields=["first_name", "last_name", "dob", "zip"],
    threshold=0.85,
    security_level="high",
)
CLI
goldenmatch pprl link file_a.csv file_b.csv
PPRL reduces but does not eliminate disclosure risk. Choose security_level and field sets with your privacy and compliance requirements in mind.