The goldenmatch-extensions package runs GoldenMatch directly from SQL, without leaving the database. It ships a pgrx-based PostgreSQL extension and a DuckDB UDF package.
PostgreSQL
Install
The fastest path is the prebuilt Docker image with the extension preinstalled.
Functions
DuckDB
| Function | Purpose |
|---|---|
| goldenmatch_score(a, b, scorer) | Score two strings. |
| goldenmatch_score_pair(rec_a, rec_b, config) | Score two JSON records. |
| goldenmatch_explain(rec_a, rec_b, config) | Explain a match. |
| goldenmatch_dedupe_table(table, config) | Deduplicate a table. |
| goldenmatch_match_tables(target, ref, config) | Match two tables. |
| goldenmatch_dedupe(json, config) | Deduplicate JSON records directly. |
| goldenmatch_match(target_json, ref_json, config) | Match two JSON record sets. |
| goldenmatch_connected_components(...) | Group a candidate-pair graph into entities. |
| goldenmatch_pair_dedup(...) | Keep the best score per canonical pair. |
| goldenmatch_embed_local(text, model_path) | Embed text with a local in-house model. |
| gm_embed(text) (PostgreSQL) | Embed text with the in-house model, dir from GOLDENEMBED_MODEL_DIR. |
Graph and embedding kernels
These run native-direct in pure Rust, with no CPython round-trip. They expose GoldenMatch’s clustering primitives and the local embedder directly in SQL, on both backends (and as DataFusion FFI UDFs). One shared kernel backs all surfaces, so results are identical across them.
Connected components and pair dedupe
goldenmatch_connected_components groups a candidate-pair graph into entities, one component per entity, with singletons included. goldenmatch_pair_dedup canonicalizes a candidate-pair set and keeps the best score per pair. Both take the edge columns as lists. Pass integer record ids to the bare name, or string ids to the _str sibling.
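To make the two operations concrete, here is a minimal Python sketch of the ideas described above. It is not the Rust kernel and does not reproduce the SQL signatures. The function names, the all_ids parameter, and the reading of "best score" as "highest score" are assumptions made for illustration. Edges are passed as parallel lists, mirroring the "edge columns as lists" convention.

```python
from collections import defaultdict


def connected_components(src, dst, all_ids=()):
    """Group an edge list into components using union-find.

    src and dst are parallel lists of record ids, one edge per position.
    all_ids is a hypothetical parameter listing ids that may have no edges,
    so that singletons still get a component of their own.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for node in all_ids:
        find(node)
    for a, b in zip(src, dst):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller id as root so the grouping is deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(members) for members in groups.values())


def pair_dedup(src, dst, score):
    """Canonicalize each pair (smaller id first) and keep its highest score."""
    best = {}
    for a, b, s in zip(src, dst, score):
        key = (a, b) if a <= b else (b, a)
        if key not in best or s > best[key]:
            best[key] = s
    return [(a, b, s) for (a, b), s in sorted(best.items())]


src = [1, 2, 5, 2]
dst = [2, 3, 6, 1]
score = [0.9, 0.8, 0.7, 0.95]

print(connected_components(src, dst, all_ids=[1, 2, 3, 4, 5, 6]))
# [[1, 2, 3], [4], [5, 6]]
print(pair_dedup(src, dst, score))
# [(1, 2, 0.95), (2, 3, 0.8), (5, 6, 0.7)]
```

Record 4 has no edges and still appears as its own component, and the pairs (1, 2) and (2, 1) collapse into one canonical pair. The same sketch works unchanged with string ids, since it only relies on equality and ordering.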
Local embedding
goldenmatch_embed_local embeds text with a saved in-house model through the goldenembed ONNX runtime. No network and no API key. model_path is a directory holding config.json and model.onnx.
gm_embed(text) is a one-argument convenience that reads the model directory from the GOLDENEMBED_MODEL_DIR environment variable instead of taking it per call, and returns real[] (float4) to match the DataFusion goldenmatch_embed UDF. The model loads once per backend process and is cached. A NULL input embeds the empty string rather than returning NULL.
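The behaviour described above (model directory taken from the environment, one load per process, NULL treated as the empty string) can be sketched in Python as follows. This is an illustration only: StubModel, load_model, and the toy four-dimensional vector are hypothetical stand-ins for the real ONNX-backed embedder, and raising an error when the variable is unset is an assumption, not documented behaviour.

```python
import os
from functools import lru_cache
from pathlib import Path


class StubModel:
    """Hypothetical stand-in for the real ONNX-backed embedder."""

    def __init__(self, model_dir):
        self.model_dir = model_dir

    def embed(self, text):
        # Toy 4-dimensional vector derived from the bytes; not a real embedding.
        data = text.encode("utf-8")
        return [float(sum(data[i::4]) % 97) for i in range(4)]


@lru_cache(maxsize=None)
def load_model(model_dir):
    """Load once per process per directory; later calls reuse the cached object."""
    root = Path(model_dir)
    required = ("config.json", "model.onnx")
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing {', '.join(missing)}")
    return StubModel(model_dir)


def gm_embed(text):
    """Read the model directory from the environment and embed one value."""
    model_dir = os.environ.get("GOLDENEMBED_MODEL_DIR")
    if not model_dir:
        # Assumption: the sketch fails loudly when the variable is unset.
        raise RuntimeError("GOLDENEMBED_MODEL_DIR is not set")
    model = load_model(model_dir)
    # A NULL (None) input embeds the empty string instead of returning NULL.
    return model.embed("" if text is None else text)
```

With GOLDENEMBED_MODEL_DIR pointing at a directory that contains config.json and model.onnx, gm_embed(None) and gm_embed("") return the same vector, and repeated calls hit the cache rather than reloading the model.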
PostgreSQL
The DuckDB embedding UDF needs the optional embed runtime: pip install goldenmatch-duckdb[embed].
Requirements
- Python 3.11+
- goldenmatch >= 1.1.0
- DuckDB 1.0+ (DuckDB extension)
- PostgreSQL 15, 16, or 17 (Postgres extension)
The scoring and table operations embed CPython through pyo3 and call the GoldenMatch Python API, so they match the Python package exactly. The graph and embedding kernels run native-direct in pure Rust with no CPython, sharing one kernel across DuckDB, PostgreSQL, and DataFusion.