GoldenPipe overview

GoldenPipe chains GoldenCheck, GoldenFlow, and GoldenMatch into one pipeline. It profiles your data, conditionally routes it through transformation, deduplicates it (or routes sensitive fields to privacy-preserving matching), and emits golden records. Its logic is adaptive: it validates stage wiring (each stage’s inputs must be produced upstream), skips transformation when no issues are found, and explains the reasoning behind every decision. It runs in Python and TypeScript, on the CLI, and behind MCP, REST, and A2A servers.
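The adaptive routing described above can be pictured with a small standalone sketch. It does not use GoldenPipe itself: the stage labels, the `SENSITIVE_HINTS` list, and the `plan_pipeline` function are hypothetical names invented for illustration, and the real planner's rules are likely richer than this.

```python
from dataclasses import dataclass, field

# Hypothetical name fragments; how GoldenPipe actually detects sensitive
# fields is not documented on this page.
SENSITIVE_HINTS = ("ssn", "dob", "birth", "passport")


@dataclass
class Plan:
    stages: list = field(default_factory=list)     # stages that will run, in order
    skipped: list = field(default_factory=list)    # stages that were skipped
    reasoning: dict = field(default_factory=dict)  # stage -> why it ran or was skipped


def plan_pipeline(issue_count: int, columns: list) -> Plan:
    """Decide which stages run, recording a reason for every decision."""
    plan = Plan()
    plan.stages.append("check")
    plan.reasoning["check"] = "profiling always runs first"

    # Transformation is conditional: skip it when profiling found no issues.
    if issue_count > 0:
        plan.stages.append("flow")
        plan.reasoning["flow"] = f"{issue_count} quality issue(s) found"
    else:
        plan.skipped.append("flow")
        plan.reasoning["flow"] = "skipped: no quality issues found"

    # Sensitive columns route to privacy-preserving matching instead of plain dedup.
    sensitive = [c for c in columns if any(h in c.lower() for h in SENSITIVE_HINTS)]
    if sensitive:
        plan.stages.append("match.pprl")
        plan.reasoning["match"] = f"sensitive fields {sensitive}: privacy-preserving matching"
    else:
        plan.stages.append("match.dedupe")
        plan.reasoning["match"] = "no sensitive fields: standard deduplication"
    return plan


clean_plan = plan_pipeline(issue_count=0, columns=["name", "email"])
print(clean_plan.stages, clean_plan.skipped)
```

The point of the sketch is that every branch writes into `reasoning`, so a skipped stage is never silent.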

Install

pip install goldenpipe
pip install "goldenpipe[mcp]" # MCP server mode (quoted so shells like zsh don't expand the brackets)

npm install goldenpipe        # edge-safe; enableWasm() opts into the Rust planner

Quickstart

In Python:

import goldenpipe as gp

result = gp.run("customers.csv")

print(result.status)     # PipeStatus.SUCCESS
print(result.stages)     # dict[str, StageResult] — per-stage status + output
print(result.artifacts)  # dict, e.g. {"golden": ..., "manifest": ...}
print(result.reasoning)  # why each decision was made
print(result.skipped)    # stages that were skipped, and why

In TypeScript:

import { runDf } from "goldenpipe";

const result = await runDf(rows);   // zero-config: scan -> transform -> dedupe
console.log(result.status);         // "success"
console.log(result.artifacts.golden);

On the CLI:

goldenpipe run customers.csv                 # full pipeline
goldenpipe run customers.csv --verbose       # show reasoning
goldenpipe run customers.csv -c pipeline.yml # custom stage config (e.g. skip a stage)
goldenpipe run customers.csv -o golden.csv   # save golden records
goldenpipe stages                            # list registered stages
goldenpipe validate -c pipeline.yml          # dry-run wiring validation
goldenpipe serve                             # REST server
goldenpipe mcp-serve                         # MCP server
goldenpipe agent-serve                       # A2A server

Key features

Orchestrates the full pipeline (Check → Flow → Match) in one call.
Wiring validation that auto-prepends the load stage, checks each stage’s declared consumes against the artifacts produced by earlier stages (in declared order), and raises a typed WiringError when an input isn’t available. An unknown stage use is also an error.
Adaptive logic that skips transformation when there are no quality issues.
Privacy-preserving routing that detects sensitive fields and routes to PPRL.
Reasoning transparency that reports why each stage ran or was skipped.
Column-context enrichment that builds targeted dedupe config from GoldenCheck profiles and column-name heuristics.
Polyglot planner, one source of truth. The planner (stage ordering, decision routing, auto-config, skip_if) is a pyo3-free goldenpipe-core Rust kernel; Python and edge-TS compute it identically, locked byte-for-byte by a CI parity gate. TypeScript opts into the WASM kernel with enableWasm(); pure-TS is the default. Only the planner is in Rust; stage execution and IO stay with each language’s host.
Three server surfaces: MCP (goldenpipe mcp-serve, 4 tools), REST (goldenpipe serve), and A2A (goldenpipe agent-serve), plus the local CLI.
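As a rough illustration of the wiring validation feature, the following standalone sketch walks stages in declared order and checks each stage's consumes against what earlier stages produce. It is a minimal model under stated assumptions, not GoldenPipe's implementation: the `StageDef` class, the `REGISTRY` contents, the artifact keys (`df`, `profile`, `golden`), and this local `WiringError` class are all hypothetical stand-ins.

```python
from dataclasses import dataclass


class WiringError(Exception):
    """Local stand-in for GoldenPipe's typed wiring error."""


@dataclass(frozen=True)
class StageDef:
    name: str
    consumes: tuple = ()
    produces: tuple = ()


# Hypothetical registry: real artifact keys and stage contracts may differ.
REGISTRY = {
    "load": StageDef("load", produces=("df",)),
    "goldencheck.scan": StageDef("goldencheck.scan", consumes=("df",), produces=("profile",)),
    "goldenflow.transform": StageDef("goldenflow.transform", consumes=("df", "profile"), produces=("df",)),
    "goldenmatch.dedupe": StageDef("goldenmatch.dedupe", consumes=("df",), produces=("golden",)),
}


def validate_wiring(uses: list) -> list:
    """Return the effective stage order, or raise WiringError on a bad wiring."""
    # Auto-prepend the load stage when the caller did not declare it.
    if not uses or uses[0] != "load":
        uses = ["load", *uses]

    available = set()
    for use in uses:
        stage = REGISTRY.get(use)
        if stage is None:
            raise WiringError(f"unknown stage: {use!r}")
        # Every declared input must have been produced by an earlier stage.
        missing = [a for a in stage.consumes if a not in available]
        if missing:
            raise WiringError(f"{use} needs {missing}, which no upstream stage produces")
        available.update(stage.produces)
    return uses


print(validate_wiring(["goldencheck.scan", "goldenflow.transform"]))
```

In this model, declaring only `goldenflow.transform` fails because nothing upstream produces `profile`, which is the kind of mistake a dry run with `goldenpipe validate` is meant to surface before any data is read.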

GoldenPipe scores 88.07 on the DQBench Pipeline category. See the API surface for every entry point in one place.

Selective stages

Run only part of the pipeline by specifying stages:

from goldenpipe import Pipeline, PipelineConfig, StageSpec

config = PipelineConfig(
    pipeline="check-and-flow-only",
    stages=[
        StageSpec(use="goldencheck.scan"),
        StageSpec(use="goldenflow.transform"),
        # omit goldenmatch.dedupe to skip dedup
    ],
)
result = Pipeline(config=config).run(source="data.csv")

The PipeResult

Pipeline.run() returns a PipeResult, not the output DataFrame:

result.status      # PipeStatus enum: SUCCESS, PARTIAL, FAILED
result.input_rows  # int
result.stages      # dict[str, StageResult]
result.artifacts   # dict[str, Any], e.g. {"manifest": Manifest}
result.errors      # list[str]
result.reasoning   # dict[str, str], why each stage ran or was skipped
result.timing      # dict[str, float]
result.skipped     # list[str]
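A small helper can turn these fields into a readable run report. The sketch below relies only on the attribute names listed above; it assumes that reasoning and timing are keyed by stage name and that timing values are seconds, neither of which this page states. The `summarize` function and the stand-in object with its invented values are hypothetical.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Format a PipeResult-like object as a short, human-readable report."""
    lines = [f"status: {result.status}", f"input rows: {result.input_rows}"]
    for stage, why in result.reasoning.items():
        marker = "skipped" if stage in result.skipped else "ran"
        seconds = result.timing.get(stage)  # assumed to be seconds per stage
        timing = f" ({seconds:.2f}s)" if seconds is not None else ""
        lines.append(f"  {stage}: {marker}{timing}: {why}")
    for err in result.errors:
        lines.append(f"  error: {err}")
    return "\n".join(lines)


# Stand-in object with invented values, used only to exercise the helper.
fake = SimpleNamespace(
    status="SUCCESS",
    input_rows=100,
    stages={},
    artifacts={},
    errors=[],
    reasoning={"check": "always runs", "flow": "no issues found"},
    timing={"check": 0.5},
    skipped=["flow"],
)
print(summarize(fake))
```

With a real run, the same helper would be called as `summarize(gp.run("customers.csv"))`, provided the assumptions about key names hold.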

A few sharp edges from the package docs: PipeResult does not expose the output DataFrame directly. Use gp.run(path) (file-based) rather than gp.run_df(df), since GoldenCheck needs a file extension. And cast mixed-type columns (for example a birth_year that is sometimes an int and sometimes a string) to a single type before dedup, or GoldenMatch will raise.
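One way to work within both constraints, when the data starts out in memory, is to cast the problem column to a single type and write the rows to a temporary file with a .csv extension before calling the file-based entry point. The sketch below uses only the standard library; `cast_column` and `rows_to_csv` are hypothetical helpers, and whether this is sufficient for a given dataset is an assumption to verify.

```python
import csv
import os
import tempfile


def cast_column(rows: list, column: str, cast) -> list:
    """Return copies of rows with `column` cast to one type (None is kept)."""
    return [
        {**row, column: cast(row[column]) if row.get(column) is not None else None}
        for row in rows
    ]


def rows_to_csv(rows: list) -> str:
    """Write rows (a list of dicts) to a temp file ending in .csv; return its path."""
    if not rows:
        raise ValueError("rows must not be empty")
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# birth_year is sometimes an int and sometimes a string, as in the text above.
rows = [
    {"name": "Ada", "birth_year": 1815},
    {"name": "Alan", "birth_year": "1912"},
]
path = rows_to_csv(cast_column(rows, "birth_year", int))
print(path)

# The file-based call from the Quickstart would then be:
# import goldenpipe as gp
# result = gp.run(path)
```

The temporary file is not deleted automatically; remove it with `os.remove(path)` once the run has finished.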

Servers

GoldenPipe ships three server surfaces. MCP and A2A expose all four operations — list_stages, validate_pipeline, run_pipeline, explain_pipeline; REST exposes list / validate / run (no explain):

MCP — remote https://goldenpipe-mcp-production.up.railway.app/mcp/ or local goldenpipe mcp-serve --transport http --port 8250 (4 tools).
REST — goldenpipe serve (GET /stages, POST /validate, POST /run).
A2A — goldenpipe agent-serve (agent card at /.well-known/agent.json, 4 skills).
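The REST routes above can be called from any HTTP client. The following is a minimal standard-library sketch, not an official client: the base URL and port are assumptions (check what `goldenpipe serve` reports on startup), JSON responses are assumed, and the request body shapes for POST /validate and POST /run are not documented on this page, so the helper passes any payload through unchanged.

```python
import json
import urllib.request
from typing import Optional

# Assumed address; the actual host and port come from `goldenpipe serve`.
BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str, payload: Optional[dict] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request for one of the documented routes without sending it."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


def call(request: urllib.request.Request):
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# GET /stages takes no body, so it is the safest route to try first.
list_stages = build_request("GET", "/stages")
print(list_stages.get_method(), list_stages.full_url)

# With a server running, the request would be sent like this:
# stages = call(list_stages)
```

Building and sending are kept separate so the requests can be inspected or logged before anything goes over the network.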
