Quickstart
Dedupe, match, and write golden records.
Auto-config
How zero-config converges on a defensible config.
Backends and scale
Polars, DuckDB, chunked, and Ray.
CLI reference
Every command and the key flags.
Install
Key features
- 12+ scoring methods: exact, Jaro-Winkler, Levenshtein, token-sort, soundex, ensemble, embedding, record-embedding, dice, jaccard, surname IDF-weighted, and alias-aware.
- Probabilistic matching: Fellegi-Sunter EM-trained m/u probabilities with automatic threshold estimation.
- LLM scorer: scores borderline pairs with budget caps and graceful degradation.
- 8+ blocking strategies: static, adaptive, sorted-neighborhood, multi-pass, ANN, canopy, and learned.
- Golden records: five merge strategies with field-level provenance and quality-weighted survivorship.
- PPRL: privacy-preserving record linkage across organizations (F1 0.924 on Febrl4).
- Learning Memory: persists steward corrections, unmerges, and LLM votes across runs.
- Interactive TUI, REST API, MCP server (54 tools), A2A agent (31 skills), and a localhost web workbench.
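
To make the probabilistic matching bullet concrete, here is a minimal standard-library sketch of Fellegi-Sunter scoring. It does not use this package's API. The field names and the m/u values in `M_U` are hypothetical and hard-coded; in a real run they would come from EM training rather than being set by hand.

```python
import math

# Hypothetical per-field probabilities (assumed for illustration, not EM-trained):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.97, 0.005),
    "zip": (0.90, 0.05),
}


def match_weight(agreements):
    """Sum the log2 likelihood ratios over all compared fields.

    `agreements` maps a field name to True (values agree) or False (disagree).
    Agreement adds log2(m / u); disagreement adds log2((1 - m) / (1 - u)).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = M_U[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


all_agree = match_weight({"first_name": True, "last_name": True, "zip": True})
zip_differs = match_weight({"first_name": True, "last_name": True, "zip": False})
none_agree = match_weight({"first_name": False, "last_name": False, "zip": False})

# Higher weight means stronger evidence that the pair is a match.
print(round(all_agree, 2), round(zip_differs, 2), round(none_agree, 2))
```

A pair is then classified by comparing its weight against a threshold. The feature list says that threshold is estimated automatically, and this sketch does not attempt that step.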
Benchmarks
Zero-config accuracy, quoted from the package README:

| Dataset | F1 | Note |
|---|---|---|
| DBLP-ACM (bibliographic) | 0.964 | Hand-tuned ceiling 0.918 |
| Febrl3 (PII) | 0.944 | Zero-config |
| NCVR (voter records) | 0.972 | Zero-config |
| Febrl4 (PII, PPRL) | 0.924 | Bloom-filter PPRL |
| DQBench ER composite | 91.04 | No LLM (v1.12.0) |
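
For readers unfamiliar with the metric, the sketch below shows one common way to compute F1 for record linkage: over unordered record pairs. It assumes pairwise F1, which may differ from how the benchmark harness behind these numbers scores results. The function name `pairwise_f1` and the record IDs are made up for illustration.

```python
def pairwise_f1(predicted, truth):
    """F1 over unordered record pairs.

    Both arguments are iterables of 2-tuples of record IDs. Pairs are
    treated as unordered, so ("a1", "b1") equals ("b1", "a1").
    """
    pred = {frozenset(pair) for pair in predicted}
    gold = {frozenset(pair) for pair in truth}
    true_positives = len(pred & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions and ground truth: two of three predicted pairs are
# correct, and two of three true pairs are found.
predicted = [("a1", "b1"), ("a2", "b2"), ("a3", "b9")]
truth = [("b1", "a1"), ("a2", "b2"), ("a4", "b4")]

score = pairwise_f1(predicted, truth)
print(round(score, 3))
```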