Execution-time policy gate

The headline, honestly.

A policy gate decides whether an AI agent's action — a shell command, an API call — is allowed to run. Here is how the Tharven production engine scores, including the number we are not proud of.

Production policy engine

Measured 2026-06-04 · corpus: 72 malicious (48 clear + 24 obfuscated) + 100 benign · SHA-256 pinned · CPU, no GPU · production engine, not bundled in the public repo (to reproduce the method, plug in a comparable engine as described below).

Metric             Value            Meaning
clear_block_rate   100.0%           non-obfuscated malicious actions blocked
fp_rate            0.0%             benign actions wrongly blocked (100-sample control)
obf_bypass_rate    91.7%            obfuscated malicious that slipped through
latency p50 / p95  0.41 / 0.67 ms   per-decision wall time

Read the third row again. A deterministic gate blocks clear attacks perfectly and never cries wolf — but it is near-blind to obfuscation (base64, $IFS-splitting, homoglyphs, zero-width characters). We publish that 91.7% because it is the honest evidence that one layer is not enough — and exactly what justifies a second, semantic layer. A benchmark that shows only its trophies is marketing; one that shows its failures is engineering.
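
To make the blind spot concrete, here is a minimal sketch of a deterministic pattern gate and three obfuscated variants of one command. The two-pattern ruleset is hypothetical (the production rules are not public), and the convention that audit returns True to block is an assumption. The variants only illustrate the string mismatch; they are not samples from the corpus.

```python
import base64
import re

# Hypothetical two-pattern ruleset, for illustration only.
PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),
    re.compile(r"curl\s+\S+\s*\|\s*sh"),
]


def audit(text: str) -> bool:
    """Return True if the action should be blocked (assumed convention)."""
    return any(p.search(text) for p in PATTERNS)


clear = "rm -rf /"
encoded = base64.b64encode(clear.encode()).decode()

# The same intent, rewritten so the literal pattern no longer appears.
obfuscated = {
    "base64": f"echo {encoded} | base64 -d | sh",
    "ifs_splitting": "rm${IFS}-rf${IFS}/",
    "zero_width": "r\u200bm -rf /",  # U+200B between 'r' and 'm'
}

print("clear blocked:", audit(clear))
for name, command in obfuscated.items():
    print(f"{name} blocked:", audit(command))
```

The clear form matches the first pattern; none of the three variants contains the literal text the patterns look for, so a purely deterministic matcher lets each of them through.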

Reproduce it (offline, one command)

The corpus and harness are public. Bring your own engine: any module exposing audit(text) -> bool.

git clone https://github.com/Tharven-Security/policy-gate-bench
cd policy-gate-bench
python policy_bench.py                    # bundled illustrative engine
python policy_bench.py --engine my_engine # plug your own gate
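
As a rough sketch of what a pluggable engine might look like, the module below exposes the audit(text) -> bool entry point and nothing else. The file name my_engine.py mirrors the --engine argument above. The patterns, the normalization pass, and the True-means-block convention are all assumptions, so check the harness for the actual contract before relying on them.

```python
"""my_engine.py: hypothetical gate for `python policy_bench.py --engine my_engine`."""
import re
import unicodedata

# Zero-width code points deleted before matching (maps each to None).
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Illustrative patterns only; not the bundled or the production ruleset.
_BLOCKED = [
    re.compile(r"\brm\s+-\w*r\w*\s+/"),
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"),
]


def _normalize(text: str) -> str:
    """Undo a few trivially reversible tricks before pattern matching."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_ZERO_WIDTH)
    return text.replace("${IFS}", " ").replace("$IFS", " ")


def audit(text: str) -> bool:
    """Return True to block the action (assumed convention)."""
    normalized = _normalize(text)
    return any(p.search(normalized) for p in _BLOCKED)
```

The normalization step is there only to show where such logic would live in a custom gate; it handles $IFS-splitting and zero-width characters but does nothing for base64 or homoglyphs.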

The bundled illustrative engine (a tiny 10-pattern ruleset, included only to demonstrate the harness) scores 27.1% clear_block_rate and 0% fp_rate. A naive list catches roughly a quarter of clear attacks; the production engine catches all of them. That gap is the value of measuring instead of guessing.
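
For readers who want to check the arithmetic, the sketch below recomputes the published percentages from raw counts. The counts are not stated on this page; they are inferred from the percentages and the corpus sizes (22 of 24 is the only count that rounds to 91.7%, and 13 of 48 the only one that rounds to 27.1%). The Counts class and function names are hypothetical, not part of the harness.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    clear_total: int      # non-obfuscated malicious samples
    clear_blocked: int
    obf_total: int        # obfuscated malicious samples
    obf_bypassed: int
    benign_total: int     # benign control samples
    benign_blocked: int


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def rates(c: Counts) -> dict:
    return {
        "clear_block_rate": pct(c.clear_blocked, c.clear_total),
        "fp_rate": pct(c.benign_blocked, c.benign_total),
        "obf_bypass_rate": pct(c.obf_bypassed, c.obf_total),
    }


# Counts inferred from the published rates (assumption, see lead-in).
production = Counts(clear_total=48, clear_blocked=48,
                    obf_total=24, obf_bypassed=22,
                    benign_total=100, benign_blocked=0)

print(rates(production))
print("bundled clear_block_rate:", pct(13, 48))
```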

Honesty rules: the public engine's numbers reproduce in one offline command here · the production 100% is measured on this same SHA-256-pinned public corpus; the engine itself is not bundled, so plug in a comparable gate via --engine to re-run the method · results are append-only (a worse number is never silently overwritten) · every number carries its corpus SHA-256 and date.
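
One way the pinning and append-only rules could be enforced is sketched below: hash the corpus file, refuse to record anything if the digest does not match the pinned value, and open the results log in append mode only. The function names, the JSON-lines log format, and the record fields are assumptions for illustration; the harness may store results differently.

```python
import hashlib
import json


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def append_result(log_path: str, corpus_path: str, pinned_sha256: str,
                  measured_on: str, metrics: dict) -> dict:
    """Append one result record; never rewrite earlier records."""
    actual = sha256_of(corpus_path)
    if actual != pinned_sha256:
        raise ValueError(f"corpus hash mismatch: {actual} != {pinned_sha256}")
    record = {"date": measured_on, "corpus_sha256": actual, **metrics}
    # Mode "a" only ever adds lines, so a worse number cannot replace a better one.
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Because every record carries the digest and the date, two runs can be compared only when their corpus_sha256 values match.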