GoldenFlow is Polars-native, so most transforms already run in vectorized Rust. The two transforms that historically dominated run time, date and phone normalization, used to call a Python library (dateutil and phonenumbers, respectively) once per row. Both now use a vectorized fast path with a per-row fallback, and phone normalization can also use an optional compiled kernel for the rows the fast path leaves unresolved.

The three-tier resolver

Date and phone transforms resolve each value with the cheapest tier that can handle it, falling through to the next tier only for the rows the current one can't resolve:
  1. Vectorized Polars fast path — resolves the well-formed common case in Rust (multi-format str.to_date for dates; a NANP-shape regex for phones), leaving anything it isn’t certain about unresolved.
  2. Native kernel (optional, phone only) — the goldenflow-native Rust kernel runs on just the residual rows.
  3. Per-row reference — the original dateutil / phonenumbers path settles whatever the first two tiers left.
On clean data the residual is empty, so tiers 2 and 3 never run. The output is byte-identical to applying the per-row reference to every row, because each tier claims only the rows it resolves exactly as the reference would.
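The tier ordering can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GoldenFlow's implementation: the real tier 1 is a vectorized Polars expression rather than a loop, and reference_e164 is a hypothetical stand-in for the phonenumbers call, simplified to NANP-only rules. The names UNRESOLVED, NANP_SHAPE, fast_path and resolve are likewise illustrative.

```python
import re
from typing import Callable, List, Optional, Sequence

# Marker meaning "this tier is not certain; fall through to the next one".
UNRESOLVED = object()

# Tier 1 stand-in: only plain NANP shapes such as "(415) 555-0123",
# "415-555-0123", "415.555.0123" or "4155550123". Alpha numbers, extensions
# and "+1"/"1-" prefixes do not match and are left for later tiers.
NANP_SHAPE = re.compile(
    r"^(?:\(([2-9]\d{2})\) ?|([2-9]\d{2})[-. ]?)([2-9]\d{2})[-. ]?(\d{4})$"
)


def reference_e164(value: str) -> Optional[str]:
    """Hypothetical stand-in for the per-row phonenumbers reference (tier 3)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the NANP country code
    if len(digits) == 10 and digits[0] in "23456789" and digits[3] in "23456789":
        return "+1" + digits
    return None  # invalid input: the reference's final answer, not "unresolved"


def fast_path(value: str) -> object:
    """Tier 1: claim the well-formed NANP shape, defer everything else."""
    m = NANP_SHAPE.match(value)
    if m is None:
        return UNRESOLVED
    area = m.group(1) or m.group(2)
    return "+1" + area + m.group(3) + m.group(4)


def resolve(
    values: Sequence[str],
    native: Optional[Callable[[str], object]] = None,
) -> List[object]:
    """Run the tiers in order; each later tier sees only the residual rows."""
    out: List[object] = [fast_path(v) for v in values]  # tier 1 (vectorized in Polars)
    residual = [i for i, r in enumerate(out) if r is UNRESOLVED]
    if native is not None:  # tier 2: optional kernel, which may also defer
        for i in residual:
            out[i] = native(values[i])
        residual = [i for i in residual if out[i] is UNRESOLVED]
    for i in residual:  # tier 3: the per-row reference settles the rest
        out[i] = reference_e164(values[i])
    return out


rows = ["(415) 555-0123", "415.555.0123", "1-415-555-0123", "not a phone"]
assert resolve(rows) == [reference_e164(v) for v in rows]
```

The design choice that makes the parity claim hold is the sentinel: a tier that is not certain returns UNRESOLVED instead of a guess, so every row it does claim matches what the reference would produce, and the final assertion holds by construction.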
Measured on a realistic, messy 1M-row frame: date_iso8601 runs 76× faster, phone_e164 19× and phone_digits 4.9×, for roughly 14× end-to-end on a mixed date/phone/text run, with no change to the cleaned values.

Optional native kernel

goldenflow-native is a separate compiled runtime (Rust/PyO3, abi3), following the same split as polars / polars-runtime. The pure-Python goldenflow wheel works on its own; the native kernel is opt-in and, via an Arrow zero-copy path, accelerates the phone residual that the Polars fast path can't reach (alpha numbers like 1-800-FLOWERS, extensions, +1-prefixed forms).
pip install "goldenflow[native]"
It is parity-safe and on by default once installed, but gated to the cases where its output has been verified byte-identical to the phonenumbers library: it resolves North American (NANP) numbers and defers everything international or ambiguous to Python. Enabling the kernel never changes a cleaned value.
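To illustrate the NANP gate, here is a hypothetical tier-2 resolver in pure Python. It is not the goldenflow-native kernel, which is written in Rust and whose exact rules are not documented here; it only shows the shape of the decision: translate keypad letters, drop an extension suffix, accept +1 forms, and return UNRESOLVED for anything international or ambiguous so the per-row reference can settle it. The extension pattern and the "at least three digits before any letter" ambiguity rule are assumptions made for the sketch.

```python
import re

UNRESOLVED = object()  # "defer this row to the per-row reference"

# Standard telephone keypad letters -> digits (2=ABC ... 9=WXYZ).
KEYPAD = {
    letter: digit
    for digit, letters in {
        "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
        "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
    }.items()
    for letter in letters
}

# Assumed extension syntax at the end of the string: "ext 12", "ext. 12", "x12", "#12".
EXTENSION = re.compile(r"\s*(?:ext\.?|x|#)\s*\d+\s*$", re.IGNORECASE)


def native_nanp(value: str) -> object:
    """Hypothetical tier 2: resolve NANP numbers to E.164, defer the rest."""
    text = EXTENSION.sub("", value.strip())  # E.164 has no extension field
    if text.startswith("+") and not text.startswith("+1"):
        return UNRESOLVED  # international: leave it for phonenumbers
    first_letter = next((i for i, c in enumerate(text) if c.isalpha()), len(text))
    if sum(c.isdigit() for c in text[:first_letter]) < 3:
        return UNRESOLVED  # letters appear too early to be a vanity number
    digits = re.sub(r"\D", "", "".join(KEYPAD.get(c, c) for c in text.upper()))
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]  # drop the NANP country code
    if len(digits) != 10 or digits[0] in "01" or digits[3] in "01":
        return UNRESOLVED  # not a clean 10-digit NANP number
    return "+1" + digits


assert native_nanp("1-800-FLOWERS") == "+18003569377"
assert native_nanp("+44 20 7946 0958") is UNRESOLVED
```

Every early return is a deferral rather than a failure, which is what lets a kernel like this stay narrow: anything outside the shapes it is sure about goes back to Python unchanged.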

Controlling it

The GOLDENFLOW_NATIVE environment variable selects the path:
Value          Behavior
unset / auto   Use the native kernel where it is gated (phone, NANP-only). This is the default.
0              Force the pure-Python path everywhere.
1              Use native for every component with no NANP restriction. This is a benchmarking/parity lane whose output can differ from Python on international numbers.
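A sketch of how the three GOLDENFLOW_NATIVE values could map onto a per-value decision. NativeMode, native_mode and use_native_for_phone are hypothetical helpers, goldenflow_native is an assumed import name for the compiled package, and both treating an empty value as unset and rejecting unknown values with ValueError are assumptions about behavior the table does not specify.

```python
import os
from enum import Enum
from importlib.util import find_spec
from typing import Mapping, Optional


class NativeMode(Enum):
    AUTO = "auto"      # native kernel only where gated (phone, NANP-only)
    OFF = "0"          # pure-Python path everywhere
    EVERYWHERE = "1"   # native for every component, no NANP gate


def native_mode(env: Optional[Mapping[str, str]] = None) -> NativeMode:
    """Map GOLDENFLOW_NATIVE onto a mode; unset (or empty) behaves like auto."""
    source = os.environ if env is None else env
    raw = source.get("GOLDENFLOW_NATIVE", "auto").strip().lower()
    if raw in ("", "auto"):
        return NativeMode.AUTO
    if raw == "0":
        return NativeMode.OFF
    if raw == "1":
        return NativeMode.EVERYWHERE
    # Assumption: the table lists no other values, so reject them loudly.
    raise ValueError(f"unrecognized GOLDENFLOW_NATIVE value: {raw!r}")


def use_native_for_phone(
    is_nanp: bool,
    env: Optional[Mapping[str, str]] = None,
    module: str = "goldenflow_native",  # hypothetical import name
) -> bool:
    """Decide whether one phone value should be sent to the native kernel."""
    mode = native_mode(env)
    if mode is NativeMode.OFF or find_spec(module) is None:
        return False  # forced off, or the kernel is not installed
    return mode is NativeMode.EVERYWHERE or is_nanp
```

From a shell, the switch is just an environment variable on the command, for example GOLDENFLOW_NATIVE=0 python clean.py to force the pure-Python path for one run (clean.py is a placeholder). Whether GoldenFlow reads the variable once at import or on every call is not stated here, so setting it before importing goldenflow is the safer habit.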
Dates intentionally have no native kernel: the Polars fast path already resolves them in vectorized Rust, so a per-row compiled parser would be slower. phone_national and phone_validate also stay pure Python.
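Because dates stop at two tiers, the date side reduces to "try a few strict formats, then hand the rest to the reference." The sketch below uses datetime.strptime in place of Polars; in Polars, one plausible shape for the multi-format step is pl.coalesce over several str.to_date(fmt, strict=False) expressions, though GoldenFlow's actual format list is not given here. FAST_FORMATS, fast_date and to_iso8601 are hypothetical, and the reference callable stands in for the dateutil path.

```python
from datetime import date, datetime
from typing import Callable, List, Optional, Sequence

# Hypothetical fast-path formats: year-first layouts with only one reading.
# Day/month-ambiguous strings such as "01/02/2024" are deliberately absent,
# so the fast path never claims a row the reference might read differently.
FAST_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y.%m.%d")


def fast_date(value: str) -> Optional[date]:
    """Tier 1 stand-in: the first strict format that parses wins, else None."""
    for fmt in FAST_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # wrong layout, or an impossible date such as Feb 30
    return None


def to_iso8601(
    values: Sequence[str], reference: Callable[[str], Optional[str]]
) -> List[Optional[str]]:
    """Fast path for every row; the reference runs only on the residual."""
    parsed = [fast_date(v) for v in values]
    out = [d.isoformat() if d is not None else None for d in parsed]
    for i, d in enumerate(parsed):
        if d is None:
            out[i] = reference(values[i])  # stands in for the dateutil path
    return out
```

Excluding ambiguous layouts from the fast path is the date equivalent of the phone gate: a string like "01/02/2024" costs one reference call, but it can never be resolved differently from the reference.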

Why it’s safe

Tests enforce the parity contract by comparing the full output against the pure dateutil / phonenumbers reference over a large random corpus (clean, alpha, extension, ambiguous, and international inputs). A CI lane builds the native kernel and runs that suite with the kernel active. Turning the kernel on or off changes only speed, never the cleaned data.
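A parity check of this kind can be sketched as a reusable helper plus a seeded corpus generator that mixes the five input families listed above. Both random_phone_corpus and assert_parity are hypothetical; in a real suite, tiered would be the transform with the kernel active and reference the pure dateutil / phonenumbers path. The generator's specific shapes (UK-style +44 numbers for the international family, bare digit runs for the ambiguous one) are assumptions.

```python
import random
import string
from typing import Callable, List, Optional, Sequence


def random_phone_corpus(n: int, seed: int = 0) -> List[str]:
    """Seeded mix of the input families the parity suite covers."""
    rng = random.Random(seed)  # seeded so a failing corpus can be reproduced

    def digits(k: int) -> str:
        return "".join(rng.choice(string.digits) for _ in range(k))

    def letters(k: int) -> str:
        return "".join(rng.choice(string.ascii_uppercase) for _ in range(k))

    families = [
        lambda: f"({rng.randint(200, 999)}) {rng.randint(200, 999)}-{digits(4)}",  # clean
        lambda: f"1-800-{letters(7)}",  # alpha
        lambda: f"{rng.randint(200, 999)}-555-{digits(4)} ext. {digits(3)}",  # extension
        lambda: digits(rng.randint(5, 12)),  # ambiguous: bare digit runs
        lambda: f"+44 20 {digits(4)} {digits(4)}",  # international (assumed UK shape)
    ]
    return [rng.choice(families)() for _ in range(n)]


def assert_parity(
    tiered: Callable[[Sequence[str]], List[Optional[str]]],
    reference: Callable[[str], Optional[str]],
    corpus: Sequence[str],
) -> None:
    """Fail unless the tiered output equals the per-row reference on every row."""
    actual = tiered(corpus)
    if len(actual) != len(corpus):
        raise AssertionError("tiered path dropped or added rows")
    expected = [reference(v) for v in corpus]
    mismatches = [(v, e, a) for v, e, a in zip(corpus, expected, actual) if e != a]
    if mismatches:  # explicit raise, so the check also runs under python -O
        raise AssertionError(f"{len(mismatches)} mismatches, first: {mismatches[0]}")
```

Comparing whole outputs with == on strings is what "byte-identical" amounts to here; reporting the first mismatching input alongside both values makes a CI failure directly actionable.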