pip install goldenmatch
Requires Python 3.11 or later. Core dependencies: Polars, RapidFuzz, Typer, Pydantic, Textual.
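As a quick preflight before installing, a plain standard-library check (not part of goldenmatch) can confirm that the interpreter meets the Python 3.11 minimum:

```python
import sys

# Minimum interpreter version stated in the install requirements.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=None) -> bool:
    """Return True if the interpreter meets the Python 3.11+ requirement."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    found = sys.version.split()[0]
    print("Python", found, "OK" if python_is_supported() else "is too old; install Python 3.11+")
```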

Native acceleration (default on common platforms)

pip install goldenmatch now pulls in the compiled goldenmatch-native kernel automatically on macOS (x86_64 and arm64), Linux (x86_64 and aarch64), and Windows (amd64). No [native] extra is needed; [native] still works as a backward-compatible alias.

The kernel accelerates the hot paths (block scoring, record fingerprints, cluster build, and dedup-pairs), and its presence lets the v3 planner suggest the bucket+native backend for inputs of up to 750k rows on typical hardware. The pure-Python + Polars pipeline remains the byte-for-byte reference and is the automatic fallback on any platform without a prebuilt wheel. Alpine/musl users may not get a prebuilt wheel yet (a musllinux wheel is a separate follow-up); they fall back gracefully to pure Python.

Opt out with environment variables:
export GOLDENMATCH_NATIVE=0            # disable the native kernel entirely
export GOLDENMATCH_PLANNER_BUCKET=0    # force the polars-direct backend
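To see which path an environment is likely to take, a minimal sketch can read the documented GOLDENMATCH_NATIVE opt-out and ask the import system whether the kernel module can be found. The import name goldenmatch_native is an assumption (only the distribution name goldenmatch-native appears on this page), and the check does not ask goldenmatch which backend the planner actually selected.

```python
import importlib.util
import os

# ASSUMPTION: the goldenmatch-native wheel is importable as "goldenmatch_native".
# This module name is a guess for illustration; check your installed package.
NATIVE_MODULE = "goldenmatch_native"


def native_kernel_status(env=None, module_name: str = NATIVE_MODULE) -> str:
    """Describe whether the native kernel is expected to be available."""
    env = os.environ if env is None else env
    # Documented opt-out: GOLDENMATCH_NATIVE=0 disables the kernel entirely.
    if env.get("GOLDENMATCH_NATIVE") == "0":
        return "disabled (GOLDENMATCH_NATIVE=0)"
    # find_spec returns None for a missing top-level module without importing it.
    if importlib.util.find_spec(module_name) is None:
        return "not installed (pure-Python + Polars fallback)"
    return "installed"


if __name__ == "__main__":
    print("native kernel:", native_kernel_status())
```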

Optional extras

pip install goldenmatch[embeddings]     # sentence-transformers + FAISS
pip install goldenmatch[llm]            # Claude/OpenAI for LLM scoring
pip install goldenmatch[postgres]       # PostgreSQL database sync
pip install goldenmatch[snowflake]      # Snowflake connector
pip install goldenmatch[bigquery]       # BigQuery connector
pip install goldenmatch[databricks]     # Databricks connector
pip install goldenmatch[salesforce]     # Salesforce connector
pip install goldenmatch[duckdb]         # DuckDB out-of-core backend
pip install goldenmatch[quality]        # GoldenCheck data quality scanning
pip install goldenmatch[ray]            # Ray distributed backend
Install multiple extras at once (in zsh, quote the requirement, because square brackets are glob characters there):
pip install "goldenmatch[embeddings,llm,postgres]"
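To see which optional dependencies are already importable, a small standard-library sketch can probe for the modules each extra is expected to bring in. The extra-to-module mapping below is an assumption inferred from the comments in the extras list (for example, faiss for FAISS); adjust it to match the packages your environment actually installs.

```python
import importlib.util

# ASSUMPTION: illustrative mapping from extra name to importable modules,
# inferred from the extras list; it is not read from goldenmatch's metadata.
EXTRA_MODULES = {
    "embeddings": ["sentence_transformers", "faiss"],
    "duckdb": ["duckdb"],
    "ray": ["ray"],
}


def module_available(name: str) -> bool:
    """True if a module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names whose parent package is missing.
        return False


def missing_extras(extras=EXTRA_MODULES) -> dict:
    """Map each extra to the modules that could not be found (empty list = present)."""
    return {extra: [m for m in mods if not module_available(m)] for extra, mods in extras.items()}


if __name__ == "__main__":
    for extra, missing in missing_extras().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"goldenmatch[{extra}]: {status}")
```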

Docker

docker pull ghcr.io/benseverndev-oss/goldenmatch:latest

# Run a dedupe
docker run --rm -v "$(pwd)":/data ghcr.io/benseverndev-oss/goldenmatch:latest \
    dedupe /data/customers.csv --output-dir /data/results

# Start the REST API
docker run --rm -p 8080:8080 -v "$(pwd)":/data ghcr.io/benseverndev-oss/goldenmatch:latest \
    serve --file /data/customers.csv --port 8080
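If you start the API container in the background (for example by adding -d to the docker run command above), a script may need to wait until port 8080 accepts connections before sending requests. This sketch only polls the TCP port with the standard library; it assumes nothing about goldenmatch's REST endpoints.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    This only checks that something is listening; it does not call any
    goldenmatch endpoint.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 8080, timeout=60) returns True once the container is listening, or False after 60 seconds.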

PostgreSQL Extension

Pre-built packages for the SQL extension (separate from the Python package):
# Debian/Ubuntu
sudo dpkg -i goldenmatch-pg-0.7.0-pg16-amd64.deb
sudo systemctl restart postgresql

# RHEL/Fedora
sudo rpm -i goldenmatch-pg-0.7.0-pg16.x86_64.rpm
sudo systemctl restart postgresql
Download .deb and .rpm from the goldenmatch releases page (look for the goldenmatch-pg-v* tags).
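The pg16 in the example filenames appears to name the PostgreSQL major version the package targets, so the package you download should match your server. The helper below is a sketch that derives the expected asset name from pg_config --version output; the naming patterns are inferred from the two example filenames and are assumptions, so confirm them against the releases page.

```python
import re

# ASSUMPTION: patterns inferred from the example filenames
# goldenmatch-pg-0.7.0-pg16-amd64.deb and goldenmatch-pg-0.7.0-pg16.x86_64.rpm.
DEB_PATTERN = "goldenmatch-pg-{version}-pg{major}-amd64.deb"
RPM_PATTERN = "goldenmatch-pg-{version}-pg{major}.x86_64.rpm"


def pg_major(version_string: str) -> int:
    """Extract the major version from pg_config --version output, e.g. 'PostgreSQL 16.3'."""
    match = re.search(r"PostgreSQL (\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return int(match.group(1))


def package_filename(version: str, pg_version_string: str, fmt: str) -> str:
    """Build the expected .deb or .rpm asset name for an x86_64/amd64 host."""
    patterns = {"deb": DEB_PATTERN, "rpm": RPM_PATTERN}
    if fmt not in patterns:
        raise ValueError("fmt must be 'deb' or 'rpm'")
    return patterns[fmt].format(version=version, major=pg_major(pg_version_string))
```

For example, package_filename("0.7.0", "PostgreSQL 16.3", "deb") gives goldenmatch-pg-0.7.0-pg16-amd64.deb.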

Verifying release artifacts

Releases since goldenmatch-pg v0.6.0 ship a .sigstore bundle alongside each tarball. Releases published after 2026-06-05 also ship an .intoto.jsonl build-provenance attestation next to each asset. Verify a tarball with cosign (keyless, GitHub Actions OIDC):
cosign verify-blob \
  --bundle goldenmatch_pg-0.7.0-pg17-linux-x86_64.tar.gz.sigstore \
  goldenmatch_pg-0.7.0-pg17-linux-x86_64.tar.gz \
  --certificate-identity-regexp 'github.com/benseverndev-oss/goldenmatch' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com
A successful run prints "Verified OK". The .sigstore bundle contains the certificate, transparency-log entry, and signature in a single file, so no separate key management is required.
Install cosign with brew install cosign (macOS) or go install github.com/sigstore/cosign/v2/cmd/cosign@latest (any platform with a Go toolchain).
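For scripted verification, for example in CI, the same cosign invocation can be wrapped in Python. This is a sketch that assumes each bundle is named <tarball>.sigstore, as in the example above, and treats a zero exit status from cosign as success.

```python
import shutil
import subprocess

# Identity and issuer values copied from the cosign command above.
IDENTITY_REGEXP = "github.com/benseverndev-oss/goldenmatch"
OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def cosign_verify_args(tarball: str) -> list[str]:
    """Build the cosign verify-blob argv for a tarball and its .sigstore bundle."""
    return [
        "cosign", "verify-blob",
        "--bundle", f"{tarball}.sigstore",
        tarball,
        "--certificate-identity-regexp", IDENTITY_REGEXP,
        "--certificate-oidc-issuer", OIDC_ISSUER,
    ]


def verify_tarball(tarball: str) -> bool:
    """Run cosign and return True if it exits with status 0."""
    if shutil.which("cosign") is None:
        raise FileNotFoundError("cosign is not on PATH")
    result = subprocess.run(cosign_verify_args(tarball), capture_output=True, text=True)
    return result.returncode == 0
```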

DuckDB UDFs

pip install goldenmatch-duckdb
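import duckdb
import goldenmatch_duckdb

con = duckdb.connect()            # in-memory DuckDB database
goldenmatch_duckdb.register(con)  # register the goldenmatch UDFs on this connection
score = con.sql("SELECT goldenmatch_score('John', 'Jon', 'jaro_winkler')").fetchone()[0]
print(score)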
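'jaro_winkler' names a standard string-similarity measure. As a rough reference for what such a score means, the textbook Jaro-Winkler definition (prefix weight 0.1, prefix capped at 4 characters, with the commonly used 0.7 boost threshold) can be written in plain Python. This is not goldenmatch's implementation; goldenmatch_score may normalise or scale its result differently, so treat these numbers as illustrative.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1, boost_threshold: float = 0.7) -> float:
    """Jaro score boosted by the length of the common prefix (at most 4 characters)."""
    score = jaro(s1, s2)
    if score <= boost_threshold:
        return score
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1.0 - score)


print(round(jaro_winkler("John", "Jon"), 4))  # 0.9333
```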

dbt Integration

pip install dbt-goldensuite
The dbt-goldensuite package provides macros for running entity resolution inside dbt pipelines using DuckDB.

Verify installation

goldenmatch --version
# goldenmatch 1.1.1

goldenmatch demo
# Runs a built-in demo with sample data
import goldenmatch as gm
print(gm.__version__)   # 1.1.1
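To check what pip actually installed, including whether the goldenmatch-native kernel wheel came along, the standard library's importlib.metadata can read package versions without importing anything. The distribution names are the ones used in the pip commands on this page; the versions printed will depend on your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent.

    Reads package metadata only, so nothing is imported.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in the pip install commands on this page.
    for dist in ("goldenmatch", "goldenmatch-native", "goldenmatch-duckdb"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```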

Environment variables

Variable                          Purpose
OPENAI_API_KEY                    LLM scorer and LLM boost (OpenAI)
ANTHROPIC_API_KEY                 LLM scorer (Claude)
DATABASE_URL                      PostgreSQL connection string for sync / watch
GOOGLE_APPLICATION_CREDENTIALS    Vertex AI embeddings (GCP service account)
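Before running an LLM or sync workflow, it can help to confirm which of these variables are set without echoing secrets to the terminal. A minimal standard-library sketch; it only reports presence and does not validate the keys or the connection string:

```python
import os

# Variable names and purposes from the table above.
GOLDENMATCH_ENV_VARS = {
    "OPENAI_API_KEY": "LLM scorer and LLM boost (OpenAI)",
    "ANTHROPIC_API_KEY": "LLM scorer (Claude)",
    "DATABASE_URL": "PostgreSQL connection string for sync / watch",
    "GOOGLE_APPLICATION_CREDENTIALS": "Vertex AI embeddings (GCP service account)",
}


def env_report(env=None) -> dict:
    """Map each variable to True if it is set and non-empty; values are never exposed."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name)) for name in GOLDENMATCH_ENV_VARS}


if __name__ == "__main__":
    for name, is_set in env_report().items():
        print(f"{name:32} {'set' if is_set else 'unset':6} {GOLDENMATCH_ENV_VARS[name]}")
```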

Setup wizard

Run the interactive wizard to configure GPU mode, API keys, and database connections:
goldenmatch setup
The wizard guides you through:
  • GPU mode selection (CPU, CUDA, MPS, Vertex AI, Colab)
  • LLM API key configuration
  • PostgreSQL connection setup
Your choices are saved as preferences in ~/.goldenmatch/settings.yaml.