Quick Start
Two Entrypoints
The package ships with two separate entry points so the core stays edge-safe and dependency-free:
- goldenmatch: the edge-safe core. Works in browsers, Cloudflare Workers, Vercel Edge Runtime, Deno, Bun, and Node.
- goldenmatch/node: adds Node-only features, namely file I/O (CSV, JSON), HTTP servers, and DB connectors.
Core API
dedupe(rows, options)
Deduplicate an array of rows.
match(target, reference, options)
Match target records against a reference dataset. Returns matched pairs with confidence scores.
scoreStrings(a, b, scorer?)
Score the similarity between two strings. Available scorers: exact, jaro_winkler, levenshtein, token_sort, soundex_match, dice, jaccard, ensemble.
applyTransforms(value, transforms)
Apply a chain of normalization transforms to a value.
Scorers
All scorers implement the same interface as Python's goldenmatch.core.scorer:
| Scorer | Use case |
|---|---|
| jaro_winkler | Short strings (names). MARTHA/MARHTA -> 0.9611 |
| levenshtein | Normalized edit distance |
| token_sort | Word reordering tolerant (rapidfuzz-compatible) |
| soundex_match | Phonetic matching (1.0 if same code) |
| ensemble | Weighted combination of jaro_winkler + levenshtein + token_sort + dice |
| dice, jaccard | Set-based similarity for hex-encoded bloom filters (PPRL) |
| embedding | Cosine similarity of embeddings |
| record_embedding | Cosine similarity across whole records |
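The MARTHA/MARHTA value quoted for jaro_winkler follows from the textbook Jaro-Winkler definition. The sketch below is a standalone illustration of that definition, not the library's implementation: the function names `jaro` and `jaroWinkler` are hypothetical, and the prefix scale of 0.1 with a 4-character prefix cap are the conventional defaults, assumed here.

```typescript
// Jaro similarity: counts characters that match within a sliding window,
// then penalizes matched characters that appear in a different order.
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const window = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aMatched = new Array<boolean>(a.length).fill(false);
  const bMatched = new Array<boolean>(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(b.length - 1, i + window);
    for (let j = lo; j <= hi; j++) {
      if (!bMatched[j] && a.charAt(i) === b.charAt(j)) {
        aMatched[i] = true;
        bMatched[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  // Walk both strings' matched characters in order and count mismatches.
  let k = 0;
  let halfTranspositions = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aMatched[i]) continue;
    while (!bMatched[k]) k++;
    if (a.charAt(i) !== b.charAt(k)) halfTranspositions++;
    k++;
  }
  const t = halfTranspositions / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

// Winkler adjustment: boost the score for a shared prefix of up to 4 characters.
function jaroWinkler(a: string, b: string, prefixScale = 0.1): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a.charAt(prefix) === b.charAt(prefix)) prefix++;
  return j + prefix * prefixScale * (1 - j);
}

console.log(jaroWinkler("MARTHA", "MARHTA").toFixed(4)); // prints "0.9611"
```

Here the plain Jaro score is 0.9444 (six matches, one transposition), and the shared prefix "MAR" lifts it to 0.9611.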
Blocking Strategies
- static: single blocking key with transforms
- multi_pass: multiple blocking keys, union of blocks
- sorted_neighborhood: sliding window over sorted data
- adaptive: static + auto-split of oversized blocks
- ann: approximate nearest neighbor (requires the hnswlib-node peer dep)
- canopy: TF-IDF canopy clustering
- learned: data-driven predicate selection
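To make the sliding-window idea behind sorted_neighborhood concrete, here is a minimal standalone sketch. It is not the library's code: `sortedNeighborhoodPairs`, its parameters, and the `Row` type are hypothetical names chosen for illustration, and the window semantics (each record is compared with the next `windowSize - 1` records in sort order) are an assumption.

```typescript
type Row = Record<string, string>;

// Hypothetical helper: returns candidate pairs of row indices [i, j] with i < j.
function sortedNeighborhoodPairs(
  rows: Row[],
  blockingKey: (row: Row) => string,
  windowSize: number,
): Array<[number, number]> {
  // Sort by blocking key, breaking ties by original position for determinism.
  const sorted = rows
    .map((row, index) => ({ index, key: blockingKey(row) }))
    .sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : x.index - y.index));

  const pairs: Array<[number, number]> = [];
  sorted.forEach((x, pos) => {
    // Pair each record with the next (windowSize - 1) records in sort order.
    for (const y of sorted.slice(pos + 1, pos + windowSize)) {
      pairs.push(x.index < y.index ? [x.index, y.index] : [y.index, x.index]);
    }
  });
  return pairs;
}

const rows: Row[] = [
  { name: "smith" },
  { name: "adams" },
  { name: "smyth" },
  { name: "adamson" },
];
const candidatePairs = sortedNeighborhoodPairs(rows, (r) => r["name"] ?? "", 2);
console.log(JSON.stringify(candidatePairs)); // prints "[[1,3],[0,3],[0,2]]"
```

With a window of 2, only neighbors in sort order are compared, so four rows yield three candidate pairs instead of the six an all-pairs comparison would produce.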
Golden Record Strategies
- most_complete: pick the longest string
- majority_vote: pick the most frequent value
- source_priority: pick the first non-null value from a priority list
- most_recent: pick the value with the most recent date
- first_non_null: pick the first non-null value
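A rough sketch of how three of these strategies could pick a field value from a cluster of duplicates. The function names, the `FieldValue` type, and the tie-breaking rule (the earliest value wins) are assumptions for illustration, not the library's documented behavior.

```typescript
type FieldValue = string | null;

// most_complete: the longest non-null string (earliest wins on ties; assumed).
function mostComplete(values: FieldValue[]): FieldValue {
  let best: FieldValue = null;
  for (const v of values) {
    if (v !== null && (best === null || v.length > best.length)) best = v;
  }
  return best;
}

// majority_vote: the most frequent non-null value (first seen wins on ties; assumed).
function majorityVote(values: FieldValue[]): FieldValue {
  const counts = new Map<string, number>();
  for (const v of values) {
    if (v !== null) counts.set(v, (counts.get(v) ?? 0) + 1);
  }
  let best: FieldValue = null;
  let bestCount = 0;
  // Map iteration follows insertion order, so ties keep the first-seen value.
  for (const [value, count] of counts) {
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  }
  return best;
}

// first_non_null: the first value that is not null.
function firstNonNull(values: FieldValue[]): FieldValue {
  return values.find((v) => v !== null) ?? null;
}

// One field ("name") as it appears across four records in the same cluster.
const nameValues: FieldValue[] = [null, "J. Smith", "John Smith", "J. Smith"];
console.log(mostComplete(nameValues)); // prints "John Smith"
console.log(majorityVote(nameValues)); // prints "J. Smith"
console.log(firstNonNull(nameValues)); // prints "J. Smith"
```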
Transforms
Applied at matchkey time. Same names as the Python toolkit: lowercase, uppercase, strip, strip_all, soundex, metaphone, digits_only, alpha_only, normalize_whitespace, token_sort, first_token, last_token, substring:start:end, qgram:n.
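As an illustration of how a transform chain composes, the sketch below applies a list of transform names left to right. It is a standalone approximation, not the library's applyTransforms: `applyTransformsSketch` and `simpleTransforms` are hypothetical names, only a subset of transforms is covered, and the semantics of each one (for example, that substring:start:end behaves like `String.prototype.slice`) are assumptions inferred from the names.

```typescript
type Transform = (value: string) => string;

// Illustrative subset; behavior is assumed from the transform names.
const simpleTransforms = new Map<string, Transform>([
  ["lowercase", (v) => v.toLowerCase()],
  ["uppercase", (v) => v.toUpperCase()],
  ["strip", (v) => v.trim()],
  ["digits_only", (v) => v.replace(/\D/g, "")],
  ["alpha_only", (v) => v.replace(/[^A-Za-z]/g, "")],
  ["normalize_whitespace", (v) => v.trim().replace(/\s+/g, " ")],
  ["first_token", (v) => v.trim().split(/\s+/)[0] ?? ""],
]);

function applyTransformsSketch(value: string, transforms: string[]): string {
  return transforms.reduce((current, spec) => {
    // Parameterized transforms carry their arguments after colons.
    const [transformName = "", ...args] = spec.split(":");
    if (transformName === "substring") {
      const start = Number(args[0] ?? 0);
      const end = args[1] === undefined ? undefined : Number(args[1]);
      return current.slice(start, end);
    }
    const fn = simpleTransforms.get(transformName);
    if (fn === undefined) throw new Error(`Unknown transform: ${spec}`);
    return fn(current);
  }, value);
}

console.log(applyTransformsSketch("  John   SMITH ", ["strip", "lowercase", "normalize_whitespace"])); // prints "john smith"
console.log(applyTransformsSketch("(555) 123-4567", ["digits_only"])); // prints "5551234567"
console.log(applyTransformsSketch("Smith", ["lowercase", "substring:0:3"])); // prints "smi"
```

Order matters in a chain: each transform receives the output of the previous one, so placing lowercase before substring:0:3 yields a lowercase three-letter key.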
CLI
The npm package ships a goldenmatch-js binary.
Servers
MCP server (Claude Desktop / Claude Code)
REST API server
Endpoints: /health, /dedupe, /match, /score, /explain, /profile, /clusters, /reviews.
A2A agent server
/.well-known/agent.json advertises 10 skills.
Interactive TUI
Optional Peer Dependencies
All peer deps are optional. Install only what you need:
| Peer dep | Unlocks |
|---|---|
| yaml | YAML config file loading |
| hnswlib-node | Sub-linear ANN blocking (vs brute-force) |
| @huggingface/transformers | ONNX cross-encoder reranking (MiniLM) |
| piscina | Worker-thread parallel block scoring |
| ink, react, ink-table, ink-select-input, ink-text-input, ink-spinner, ink-gradient | Interactive TUI |
| pg | Postgres connector + sync |
| @duckdb/node-api | DuckDB connector |
| snowflake-sdk | Snowflake connector |
| @google-cloud/bigquery | BigQuery connector |
| @databricks/sql | Databricks connector |
Advanced Features
- Probabilistic matching — Fellegi-Sunter with Splink-style EM
- PPRL — Privacy-preserving record linkage with SHA-256 bloom filters (3 security levels: standard, high, paranoid)
- Graph ER — Multi-table entity resolution with evidence propagation
- Streaming — Incremental single-record matching
- Memory — Persistent corrections + threshold learning
- Sensitivity analysis — Parameter sweep with CCMS / TWI cluster comparison
- Lineage tracking — Full provenance per field per golden record
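For readers unfamiliar with Fellegi-Sunter scoring, the sketch below shows the standard match-weight calculation in isolation. It assumes the per-field m and u probabilities are already known (in practice they would be estimated, for example by EM, which is not shown), and every name in it (`FieldComparison`, `fieldWeight`, `matchWeight`) and every probability is hypothetical rather than taken from the library.

```typescript
interface FieldComparison {
  field: string;
  agrees: boolean; // did the two records agree on this field?
  m: number;       // P(agree | records are a true match)
  u: number;       // P(agree | records are not a match)
}

// Log-likelihood ratio (base 2) contributed by one field comparison.
function fieldWeight(c: FieldComparison): number {
  return c.agrees
    ? Math.log2(c.m / c.u)
    : Math.log2((1 - c.m) / (1 - c.u));
}

// Total match weight: sum of field weights, assuming conditional independence.
function matchWeight(comparisons: FieldComparison[]): number {
  return comparisons.reduce((sum, c) => sum + fieldWeight(c), 0);
}

// Made-up probabilities for illustration only.
const comparisons: FieldComparison[] = [
  { field: "last_name", agrees: true, m: 0.9, u: 0.1 },
  { field: "zip", agrees: false, m: 0.8, u: 0.2 },
];
const totalWeight = matchWeight(comparisons);
console.log(totalWeight.toFixed(2)); // prints "1.17"
```

Agreement on last_name contributes log2(0.9 / 0.1), about +3.17, while disagreement on zip contributes log2(0.2 / 0.8), about -2, so the pair ends up with a mildly positive weight.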
Examples
See packages/goldenmatch-js/examples/ for 11 full end-to-end TypeScript examples covering dedupe, match, PPRL, streaming, graph ER, Fellegi-Sunter, and more.
Source
- npm: https://www.npmjs.com/package/goldenmatch
- GitHub: https://github.com/benseverndev-oss/goldenmatch/tree/main/packages/goldenmatch-js
Comparison With Python
| Feature | Python | TypeScript |
|---|---|---|
| Core matching | Polars + rapidfuzz | Pure TS |
| Fellegi-Sunter | Yes | Yes |
| PPRL | SHA-256 | SHA-256 (interop verified byte-for-byte) |
| Graph ER | Yes | Yes |
| LLM scorer | Yes | Yes (via fetch, edge-safe) |
| Cross-encoder | sentence-transformers | @huggingface/transformers (ONNX) |
| ANN blocking | FAISS | hnswlib-node |
| Parallel scoring | Threads + Ray | piscina worker threads |
| Interactive UI | Textual TUI | Ink TUI |
| MCP server | 30 tools | 19 tools |
| REST API | Yes | Yes |
| A2A server | Yes | Yes |
| YAML configs | Yes | Yes (round-trippable) |
| Edge-safe core | No | Yes |