pip (recommended)
Requires Python 3.11 or later. Core dependencies: Polars, RapidFuzz, Typer, Pydantic, Textual.
Running pip install goldenmatch now pulls the compiled goldenmatch-native kernel automatically on macOS (x86_64 + arm64), Linux (x86_64 + aarch64), and Windows (amd64). No [native] extra is needed, although it still works as a back-compat alias. The kernel accelerates the hot paths (block scoring, record fingerprints, cluster build, dedup-pairs) and lets the v3 planner pick the bucket+native backend as the suggested path for inputs up to 750k rows on typical hardware.
The pure-Python + Polars pipeline is the byte-for-byte reference and the automatic fallback on any platform without a prebuilt wheel. On Alpine/musl a prebuilt wheel may not be available yet (a musllinux wheel is a separate follow-up), so installs there degrade gracefully to pure-Python.
Opt out with environment variables:
GOLDENMATCH_NATIVE=0 # disable the native kernel entirely
GOLDENMATCH_PLANNER_BUCKET=0 # force the polars-direct backend
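When GoldenMatch is launched from a Python script rather than a shell, one way to apply these opt-outs is to set them in the environment of the child process. The sketch below is a minimal illustration: the helper names goldenmatch_env and run_goldenmatch are hypothetical, and only the two variable names and the value 0 come from this page. The page does not say when the variables are read, so the sketch sets them before the process starts.

```python
import os
import subprocess


def goldenmatch_env(native=True, planner_bucket=True, base=None):
    """Return a copy of the environment with the opt-out variables applied.

    Hypothetical helper: only the variable names and the value "0" are taken
    from the documentation; everything else is illustrative.
    """
    env = dict(os.environ if base is None else base)
    if not native:
        env["GOLDENMATCH_NATIVE"] = "0"  # disable the native kernel entirely
    if not planner_bucket:
        env["GOLDENMATCH_PLANNER_BUCKET"] = "0"  # force the polars-direct backend
    return env


def run_goldenmatch(args, native=True, planner_bucket=True):
    """Run the goldenmatch CLI with the chosen opt-outs and return its exit code."""
    env = goldenmatch_env(native=native, planner_bucket=planner_bucket)
    return subprocess.run(["goldenmatch", *args], env=env, check=False).returncode
```

For example, run_goldenmatch(["--version"], native=False) would run the version check with the native kernel disabled, assuming the goldenmatch command is on PATH.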
pip install goldenmatch[embeddings] # sentence-transformers + FAISS
pip install goldenmatch[llm] # Claude/OpenAI for LLM scoring
pip install goldenmatch[postgres] # PostgreSQL database sync
pip install goldenmatch[snowflake] # Snowflake connector
pip install goldenmatch[bigquery] # BigQuery connector
pip install goldenmatch[databricks] # Databricks connector
pip install goldenmatch[salesforce] # Salesforce connector
pip install goldenmatch[duckdb] # DuckDB out-of-core backend
pip install goldenmatch[quality] # GoldenCheck data quality scanning
pip install goldenmatch[ray] # Ray distributed backend
Install multiple extras at once:
pip install goldenmatch[embeddings,llm,postgres]
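To see which extras an installed copy actually declares, the standard library's importlib.metadata can read the package metadata. The following is a sketch, not part of GoldenMatch: the function names group_by_extra and declared_extras are hypothetical, and the result depends on whatever the installed distribution publishes.

```python
import re
from importlib import metadata

# Matches the `extra == "name"` clause inside a requirement marker.
_EXTRA_RE = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def group_by_extra(requirements):
    """Group requirement strings by the extra named in their marker.

    Requirements without an extra marker are collected under "core".
    """
    groups = {}
    for req in requirements:
        name, _, marker = req.partition(";")
        match = _EXTRA_RE.search(marker)
        key = match.group(1) if match else "core"
        groups.setdefault(key, []).append(name.strip())
    return groups


def declared_extras(dist_name="goldenmatch"):
    """Return {extra: [requirements]} for an installed distribution, or {}."""
    try:
        reqs = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return {}
    return group_by_extra(reqs)
```

Calling declared_extras() after installation returns a dictionary keyed by extra name, or an empty dictionary if the package is not installed.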
Docker
docker pull ghcr.io/benseverndev-oss/goldenmatch:latest
# Run a dedupe
docker run --rm -v "$(pwd)":/data ghcr.io/benseverndev-oss/goldenmatch:latest \
dedupe /data/customers.csv --output-dir /data/results
# Start the REST API
docker run --rm -p 8080:8080 -v "$(pwd)":/data ghcr.io/benseverndev-oss/goldenmatch:latest \
serve --file /data/customers.csv --port 8080
PostgreSQL Extension
Pre-built packages for the SQL extension (separate from the Python package):
# Debian/Ubuntu
sudo dpkg -i goldenmatch-pg-0.7.0-pg16-amd64.deb
sudo systemctl restart postgresql
# RHEL/Fedora
sudo rpm -i goldenmatch-pg-0.7.0-pg16.x86_64.rpm
sudo systemctl restart postgresql
Download the .deb and .rpm packages from the goldenmatch releases page (look for the goldenmatch-pg-v* tags).
Verifying release artifacts
Releases since goldenmatch-pg v0.6.0 ship a .sigstore bundle alongside
each tarball. Releases published after 2026-06-05 also ship an
.intoto.jsonl build-provenance attestation next to each asset.
Verify a tarball with cosign (keyless, GitHub Actions OIDC):
cosign verify-blob \
--bundle goldenmatch_pg-0.7.0-pg17-linux-x86_64.tar.gz.sigstore \
goldenmatch_pg-0.7.0-pg17-linux-x86_64.tar.gz \
--certificate-identity-regexp 'github.com/benseverndev-oss/goldenmatch' \
--certificate-oidc-issuer https://token.actions.githubusercontent.com
A successful run prints Verified OK. The .sigstore bundle contains the
certificate, transparency-log entry, and signature in a single file, so
no separate key management is required.
Install cosign with brew install cosign (macOS) or
go install github.com/sigstore/cosign/v2/cmd/cosign@latest (any platform).
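To verify several downloaded tarballs from a script, the same command can be assembled in Python. This is a minimal sketch under two assumptions: each bundle sits next to its tarball and is named by appending .sigstore (as in the example above), and cosign is on PATH. The function names are hypothetical; the flags are the ones shown above.

```python
import shutil
import subprocess


def cosign_verify_command(tarball, identity="github.com/benseverndev-oss/goldenmatch"):
    """Build the cosign verify-blob argument list for one tarball."""
    return [
        "cosign", "verify-blob",
        "--bundle", f"{tarball}.sigstore",  # assumed naming: <tarball>.sigstore
        "--certificate-identity-regexp", identity,
        "--certificate-oidc-issuer", "https://token.actions.githubusercontent.com",
        tarball,
    ]


def verify_tarball(tarball):
    """Return True if cosign exits with status 0 for the tarball."""
    if shutil.which("cosign") is None:
        raise RuntimeError("cosign is not on PATH")
    result = subprocess.run(cosign_verify_command(tarball), check=False)
    return result.returncode == 0
```

Passing the arguments as a list avoids shell quoting issues with the identity pattern.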
DuckDB UDFs
pip install goldenmatch-duckdb
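import duckdb
import goldenmatch_duckdb

con = duckdb.connect()  # in-memory database
goldenmatch_duckdb.register(con)  # register the goldenmatch UDFs on this connection
# Score one pair of strings and print the result rows
print(con.sql("SELECT goldenmatch_score('John', 'Jon', 'jaro_winkler')").fetchall())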
dbt Integration
pip install dbt-goldensuite
The dbt-goldensuite package provides macros for running entity resolution inside dbt pipelines using DuckDB.
Verify installation
goldenmatch --version
# goldenmatch 1.1.1
goldenmatch demo
# Runs a built-in demo with sample data
import goldenmatch as gm
print(gm.__version__) # "1.1.1"
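The same check can run without importing the package, which helps in CI or setup scripts where a failed import should not abort the run. The sketch below uses only the standard library; the function names are hypothetical, and the 3.11 floor is the requirement stated at the top of this page.

```python
import sys
from importlib import metadata


def installed_version(dist_name="goldenmatch"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_supported(minimum=(3, 11)):
    """Return True if the running interpreter meets the documented minimum."""
    return sys.version_info[:2] >= minimum


def install_report():
    """Summarize the interpreter check and the installed goldenmatch version."""
    return {
        "python_ok": python_supported(),
        "goldenmatch": installed_version(),
    }
```

A None value for goldenmatch in the report means the distribution was not found in the current environment.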
Environment variables
| Variable | Purpose |
|---|---|
| OPENAI_API_KEY | LLM scorer and LLM boost (OpenAI) |
| ANTHROPIC_API_KEY | LLM scorer (Claude) |
| DATABASE_URL | PostgreSQL connection string for sync / watch |
| GOOGLE_APPLICATION_CREDENTIALS | Vertex AI embeddings (GCP service account) |
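Before starting a job that depends on one of these variables, a quick pre-flight check can report what is missing. This is an illustrative sketch, not a GoldenMatch API: the helper name missing_variables is hypothetical, and the variable names and purposes are copied from the table above.

```python
import os

# Variable names and purposes as listed in the table above.
KNOWN_VARIABLES = {
    "OPENAI_API_KEY": "LLM scorer and LLM boost (OpenAI)",
    "ANTHROPIC_API_KEY": "LLM scorer (Claude)",
    "DATABASE_URL": "PostgreSQL connection string for sync / watch",
    "GOOGLE_APPLICATION_CREDENTIALS": "Vertex AI embeddings (GCP service account)",
}


def missing_variables(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Report every known variable that is not set in the current environment.
    for name in missing_variables(KNOWN_VARIABLES):
        print(f"{name} is not set: {KNOWN_VARIABLES[name]}")
```

Only the variables for the features in use need to be set, so a real check would pass a subset such as ["DATABASE_URL"] rather than the full table.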
Setup wizard
Run the interactive wizard to configure GPU mode, API keys, and database connections:
The wizard guides you through:
- GPU mode selection (CPU, CUDA, MPS, Vertex AI, Colab)
- LLM API key configuration
- PostgreSQL connection setup
- Saved preferences at ~/.goldenmatch/settings.yaml