Golden Suite

The Golden Suite is a polyglot data-quality and entity-resolution toolkit. Each tool stands alone, but together they form a single pipeline: profile your data, standardize it, deduplicate it, and emit golden records. Every package ships zero-config defaults, a CLI, Python and TypeScript libraries, and an AI-native surface: an MCP server, plus a REST API and agent skills for the service-shaped packages.

Where do I even begin?

Six tools is a lot. Find the sentence that sounds like your problem and start there — each tool works on its own, so you only pick up the ones you need.

Your situation	Start with	Go to
“I have duplicate rows and want one clean record per entity”	GoldenMatch — dedupe	Quickstart
“I need to match records across two sources (CRM ↔ billing)”	GoldenMatch — match	GoldenMatch
“I don’t know what’s wrong with my data yet”	GoldenCheck — profile + scan	GoldenCheck
“My formats are a mess — phones, dates, addresses, casing”	GoldenFlow — transform	GoldenFlow
“My source columns don’t line up with my target schema”	InferMap — schema mapping	InferMap
“I want the whole clean → standardize → dedupe flow in one call”	GoldenPipe — orchestrate	GoldenPipe
“I can’t share raw PII but need to link across parties”	GoldenMatch — PPRL	Privacy-preserving linkage
“I want to match inside my database, in SQL”	SQL extensions — Postgres / DuckDB	SQL extensions
“I need to track match quality and catch regressions across runs”	GoldenAnalysis — reporting	GoldenAnalysis

Still not sure? Two safe defaults: run GoldenCheck first — it profiles your data and tells you what needs fixing, then points you at the right tool. Or run GoldenPipe, which chains the whole flow and adaptively skips the steps your data doesn’t need.

Start where you fit

Developers

Install and dedupe a CSV in 30 seconds from Python, TypeScript, or the CLI.

No code

Point-and-click in the browser workbench — edit rules, review matches, label pairs.

Researchers

Reproduce the benchmarks, read the methodology + honest framing, and cite the work.

Quickstart

Deduplicate a CSV in 30 seconds.

Architecture

How the six tools compose into one pipeline.

GoldenMatch

The headline package: zero-config entity resolution.

Scale envelope

Pick the right backend for your row count.

API surface

Every entry point in one place — Python, TypeScript, CLI, MCP, REST, and agent skills across all six packages.

The pipeline

Raw, messy records enter on the left and leave as clean golden records on the right. You can run the whole chain through GoldenPipe or use any single tool on its own.

Tool	Role
InferMap	Schema mapping. Auto-aligns columns across heterogeneous sources.
GoldenCheck	Profile and validate. Encoding, format, anomaly detection.
GoldenFlow	Standardize and transform. Phone, date, address, categorical normalization.
GoldenMatch	Dedupe, cluster, and survivorship. Fuzzy, exact, probabilistic, and LLM scoring.
GoldenPipe	Orchestrator. Wires the tools into one adaptive pipeline.
GoldenAnalysis	Cross-cutting reporting. Read-only metrics, trend, and regression detection over any stage’s outputs.
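To make the "adaptive pipeline" idea concrete, here is a minimal sketch in plain Python of stages that run only when the data needs them. This is not the GoldenPipe API: `Stage`, `run_pipeline`, and the helper functions are hypothetical names invented for illustration, and the two toy stages (lowercasing and exact-duplicate removal) stand in for the far richer work GoldenFlow and GoldenMatch do.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict[str, str]


@dataclass
class Stage:
    """Hypothetical pipeline step: a name, a 'is this needed?' check, and the work."""
    name: str
    needed: Callable[[list[Record]], bool]
    run: Callable[[list[Record]], list[Record]]


def has_messy_case(records: list[Record]) -> bool:
    # True if any value has stray whitespace or uppercase characters.
    return any(v != v.strip().lower() for r in records for v in r.values())


def normalize_case(records: list[Record]) -> list[Record]:
    # Toy "standardize" step: trim whitespace and lowercase every value.
    return [{k: v.strip().lower() for k, v in r.items()} for r in records]


def has_duplicates(records: list[Record]) -> bool:
    keys = [tuple(sorted(r.items())) for r in records]
    return len(set(keys)) < len(keys)


def drop_exact_duplicates(records: list[Record]) -> list[Record]:
    # Toy "dedupe" step: keep the first occurrence of each identical record.
    seen = set()
    out = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def run_pipeline(records: list[Record], stages: list[Stage]):
    """Run each stage in order, skipping any stage whose check says it is not needed."""
    executed = []
    for stage in stages:
        if stage.needed(records):
            records = stage.run(records)
            executed.append(stage.name)
    return records, executed


STAGES = [
    Stage("standardize", has_messy_case, normalize_case),
    Stage("dedupe", has_duplicates, drop_exact_duplicates),
]

raw = [
    {"name": "Ada Lovelace ", "email": "ADA@example.com"},
    {"name": "ada lovelace", "email": "ada@example.com"},
]
golden, executed = run_pipeline(raw, STAGES)
print(executed)  # ['standardize', 'dedupe']
print(golden)    # [{'name': 'ada lovelace', 'email': 'ada@example.com'}]
```

Feeding `golden` back through `run_pipeline` executes no stages at all, which is the skipping behaviour in miniature: each stage decides from the data whether it has anything to do.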

Packages

GoldenMatch

Zero-config entity resolution for Python and TypeScript.

GoldenCheck

Data-quality scanning that discovers rules automatically.

GoldenFlow

92 transforms across 11 categories for cleaning messy data.

GoldenPipe

One call to chain Check, Flow, and Match.

GoldenAnalysis

Read-only metrics, trend, and regression reporting over any run.

InferMap

Inference-driven schema mapping with confidence scores.

SQL extensions

Native Postgres and DuckDB fuzzy matching in SQL.

Why Golden Suite

Zero-config that beats hand-tuned. GoldenMatch’s introspective auto-config controller reaches F1 0.964 on DBLP-ACM out of the box, above the hand-tuned ceiling of 0.918.
Polyglot. Python is the headline runtime; TypeScript runs the same scorers on edge runtimes (Vercel Edge, Cloudflare Workers, Deno); Rust powers the Postgres and DuckDB extensions.
AI-native. Every package ships an MCP server (~110 tools across the suite), and the service-shaped packages add a REST API and agent skills.
MIT-licensed. Every package in the suite.

Benchmark and scale numbers throughout these docs are quoted from the package READMEs and docs/ in the repository. Re-measure for your own hardware and data before relying on exact figures.
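If you do re-measure, the sketch below shows one common way to score a matcher: pairwise precision, recall, and F1 over sets of matched record pairs. It assumes you have a labeled set of true matches; the function name `pairwise_f1` and the sample IDs are hypothetical, and this page does not state that the quoted benchmark figures were computed exactly this way.

```python
def pairwise_f1(predicted, truth):
    """Return (precision, recall, f1) for predicted vs. true match pairs.

    Pairs are treated as unordered, so ("a1", "b1") equals ("b1", "a1").
    """
    pred = {frozenset(p) for p in predicted}
    true = {frozenset(p) for p in truth}
    true_positives = len(pred & true)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative labels: four true matches, three predictions, two of them correct.
truth = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
predicted = [("b1", "a1"), ("a2", "b2"), ("a3", "b9")]

precision, recall, f1 = pairwise_f1(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.667 recall=0.500 f1=0.571
```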
