The Golden Suite is a polyglot data-quality and entity-resolution toolkit. Each tool stands alone, but together they form a single pipeline: profile your data, standardize it, deduplicate it, and emit golden records. Every package ships zero-config defaults, a CLI, a Python library, and an AI-native surface (MCP server, REST API, agent skills); most packages also ship a TypeScript library.

Quickstart

Deduplicate a CSV in 30 seconds.

Architecture

How the five tools compose into one pipeline.

GoldenMatch

The headline package: zero-config entity resolution.

Scale envelope

Pick the right backend for your row count.

The pipeline

Raw, messy records enter on the left and leave as clean golden records on the right. You can run the whole chain through GoldenPipe or use any single tool on its own.
| Tool | Role |
| --- | --- |
| InferMap | Schema mapping. Auto-aligns columns across heterogeneous sources. |
| GoldenCheck | Profile and validate. Encoding, format, anomaly detection. |
| GoldenFlow | Standardize and transform. Phone, date, address, categorical normalization. |
| GoldenMatch | Dedupe, cluster, and survivorship. Fuzzy, exact, probabilistic, and LLM scoring. |
| GoldenPipe | Orchestrator. Wires the tools into one adaptive pipeline. |
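
To make the stages concrete, here is a minimal, standard-library-only sketch of the same flow: profile, standardize, match, and survivorship. It does not use the Golden Suite API. Every function name (`profile`, `standardize`, `is_match`, `cluster`, `golden`), the 0.85 threshold, and the sample records are hypothetical and exist only to illustrate what each stage contributes.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-ins for the pipeline stages; NOT the Golden Suite API.

def profile(records):
    """Check stage: count empty or whitespace-only values per field."""
    missing = {}
    for rec in records:
        for key, value in rec.items():
            if not value or not value.strip():
                missing[key] = missing.get(key, 0) + 1
    return missing

def standardize(rec):
    """Flow stage: collapse whitespace, title-case names, keep phone digits only."""
    return {
        "name": " ".join(rec["name"].split()).title(),
        "phone": re.sub(r"\D", "", rec["phone"]),
    }

def is_match(a, b, threshold=0.85):
    """Match stage: exact match on a non-empty phone, else fuzzy name similarity."""
    if a["phone"] and a["phone"] == b["phone"]:
        return True
    return SequenceMatcher(None, a["name"], b["name"]).ratio() >= threshold

def cluster(records):
    """Greedy clustering: join the first cluster that contains a matching member."""
    clusters = []
    for rec in records:
        for group in clusters:
            if any(is_match(rec, other) for other in group):
                group.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def golden(group):
    """Survivorship: for each field, keep the longest value in the cluster."""
    return {key: max((r[key] for r in group), key=len) for key in group[0]}

raw = [
    {"name": "  jane   DOE ", "phone": "(555) 010-1234"},
    {"name": "Jane Doe", "phone": "555.010.1234"},
    {"name": "John Smith", "phone": ""},
]

report = profile(raw)                       # quality report on the raw input
clean = [standardize(r) for r in raw]       # normalized records
golden_records = [golden(g) for g in cluster(clean)]
print(report)
print(golden_records)
```

The two Jane Doe rows only collapse into one record after standardization makes their phone numbers comparable, which is why the stages are ordered the way they are. A real matcher would also use blocking and better scoring than this greedy loop.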

Packages

GoldenMatch

Zero-config entity resolution for Python and TypeScript.

GoldenCheck

Data-quality scanning that discovers rules automatically.

GoldenFlow

76 transforms across 11 categories for cleaning messy data.

GoldenPipe

One call to chain Check, Flow, and Match.

InferMap

Inference-driven schema mapping with confidence scores.

SQL extensions

Native Postgres and DuckDB fuzzy matching in SQL.

Why Golden Suite

  • Zero-config that beats hand-tuned. GoldenMatch’s introspective auto-config controller reaches F1 0.964 on DBLP-ACM out of the box, above the hand-tuned ceiling of 0.918.
  • Polyglot. Python is the headline runtime; TypeScript runs the same scorers on edge runtimes (Vercel Edge, Cloudflare Workers, Deno); Rust powers the Postgres and DuckDB extensions.
  • AI-native. Every package ships an MCP server (35+ tools across the suite), a REST API, and agent skills.
  • MIT-licensed. Every package in the suite.
Benchmark and scale numbers throughout these docs are quoted from the package READMEs and docs/ in the repository. Re-measure for your own hardware and data before relying on exact figures.
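
If you do re-measure, one common way to score a deduplication run is pairwise precision, recall, and F1 against labeled clusters. The sketch below assumes that definition; these docs do not state which F1 variant the quoted benchmarks use, and the helper names (`pairs`, `pairwise_f1`) and the toy clusters are hypothetical.

```python
from itertools import combinations

def pairs(clusters):
    """All unordered record-ID pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(sorted(c), 2)}

def pairwise_f1(predicted, truth):
    """Return (precision, recall, F1) over co-clustered record pairs."""
    pred, gold = pairs(predicted), pairs(truth)
    tp = len(pred & gold)  # pairs linked in both the prediction and the truth
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy example: the prediction fails to link record 3 to records 1 and 2.
truth = [{1, 2, 3}, {4, 5}]
predicted = [{1, 2}, {3}, {4, 5}]

precision, recall, f1 = pairwise_f1(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here the prediction makes no false links (precision 1.0) but recovers only two of the four true pairs (recall 0.5), so F1 comes out at about 0.667.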