Quickstart
Deduplicate a CSV in 30 seconds.
Architecture
How the five tools compose into one pipeline.
GoldenMatch
The headline package: zero-config entity resolution.
Scale envelope
Pick the right backend for your row count.
The pipeline
Raw, messy records enter on the left and leave as clean golden records on the right. You can run the whole chain through GoldenPipe or use any single tool on its own.

| Tool | Role |
|---|---|
| InferMap | Schema mapping. Auto-aligns columns across heterogeneous sources. |
| GoldenCheck | Profile and validate. Encoding, format, anomaly detection. |
| GoldenFlow | Standardize and transform. Phone, date, address, categorical normalization. |
| GoldenMatch | Dedupe, cluster, and survivorship. Fuzzy, exact, probabilistic, and LLM scoring. |
| GoldenPipe | Orchestrator. Wires the tools into one adaptive pipeline. |
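
To make the composition in the table concrete, here is a minimal, library-free sketch of stages chained in the same order. It does not use the Golden Suite API: `map_schema`, `check`, `standardize`, `match`, and `run_pipeline` are hypothetical stand-ins written for illustration, and each does far less than the real tool it mirrors.

```python
from typing import Callable

Record = dict[str, str]
Stage = Callable[[list[Record]], list[Record]]


def map_schema(records: list[Record]) -> list[Record]:
    # Stand-in for schema mapping: rename known column aliases to canonical names.
    aliases = {"e-mail": "email", "full_name": "name"}
    return [{aliases.get(k, k): v for k, v in r.items()} for r in records]


def check(records: list[Record]) -> list[Record]:
    # Stand-in for validation: drop records whose email is missing or blank.
    return [r for r in records if r.get("email", "").strip()]


def standardize(records: list[Record]) -> list[Record]:
    # Stand-in for standardization: trim whitespace and lowercase every value.
    return [{k: v.strip().lower() for k, v in r.items()} for r in records]


def match(records: list[Record]) -> list[Record]:
    # Stand-in for dedupe and survivorship: exact match on email, first record wins.
    golden: dict[str, Record] = {}
    for r in records:
        golden.setdefault(r["email"], r)
    return list(golden.values())


def run_pipeline(records: list[Record], stages: list[Stage]) -> list[Record]:
    # Stand-in for the orchestrator: apply each stage to the previous stage's output.
    for stage in stages:
        records = stage(records)
    return records


raw = [
    {"full_name": "Ada Lovelace", "e-mail": " ADA@example.com "},
    {"name": "ada lovelace", "email": "ada@example.com"},
    {"name": "No Email", "email": ""},
]
golden = run_pipeline(raw, [map_schema, check, standardize, match])
print(golden)
```

Because every stage takes and returns a list of records, any single stage can also be called on its own, which mirrors the "whole chain or any single tool" choice described above.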
Packages
GoldenMatch
Zero-config entity resolution for Python and TypeScript.
GoldenCheck
Data-quality scanning that discovers rules automatically.
GoldenFlow
76 transforms across 11 categories for cleaning messy data.
GoldenPipe
One call to chain Check, Flow, and Match.
InferMap
Inference-driven schema mapping with confidence scores.
SQL extensions
Native Postgres and DuckDB fuzzy matching in SQL.
Why Golden Suite
- Zero-config that beats hand-tuned. GoldenMatch’s introspective auto-config controller reaches F1 0.964 on DBLP-ACM out of the box, above the hand-tuned ceiling of 0.918.
- Polyglot. Python is the headline runtime; TypeScript runs the same scorers on edge runtimes (Vercel Edge, Cloudflare Workers, Deno); Rust powers the Postgres and DuckDB extensions.
- AI-native. Every package ships an MCP server (35+ tools across the suite), a REST API, and agent skills.
- MIT-licensed. Every package in the suite.
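
For readers unfamiliar with the F1 figure quoted in the list above, the sketch below computes F1 as the harmonic mean of precision and recall over matched record pairs. It assumes the usual pairwise definition; the exact evaluation protocol behind the DBLP-ACM number is not stated here, and `pairwise_f1` and the sample IDs are hypothetical.

```python
def pairwise_f1(predicted: set[frozenset[str]], truth: set[frozenset[str]]) -> float:
    # True positives are predicted pairs that also appear in the ground truth.
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Illustrative pairs only; these are not benchmark data.
truth = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]}
predicted = {frozenset(p) for p in [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a5", "b9")]}

score = pairwise_f1(predicted, truth)
print(f"F1 = {score:.3f}")
```

Pairs are stored as `frozenset`s so that the order of the two record IDs does not affect the comparison.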
Benchmark and scale numbers throughout these docs are quoted from the package READMEs and
docs/ in the repository. Re-measure for your own hardware and data before relying on exact figures.