Execution-time policy gate
A policy gate decides whether an AI agent's action — a shell command, an API call — is allowed to run. Here is how the Tharven production engine scores, including the number we are not proud of.
Measured 2026-06-04 · corpus: 72 malicious (48 clear + 24 obfuscated) + 100 benign · SHA-256 pinned · CPU, no GPU · production engine, not bundled in the public repo (to reproduce the method, plug in a comparable engine as described below).
| Metric | Value | Meaning |
|---|---|---|
| clear_block_rate | 100.0% | non-obfuscated malicious actions blocked |
| fp_rate | 0.0% | benign actions wrongly blocked (100-sample control) |
| obf_bypass_rate | 91.7% | obfuscated malicious actions that slipped through |
| latency p50 / p95 | 0.41 / 0.67 ms | per-decision wall time |
Read the third row again. A deterministic gate blocks clear attacks perfectly and never
cries wolf — but it is near-blind to obfuscation (base64, $IFS-splitting, homoglyphs,
zero-width characters). We publish that 91.7% because it is the honest evidence that one layer is
not enough — and exactly what justifies a second, semantic layer. A benchmark that shows only its
trophies is marketing; one that shows its failures is engineering.
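To make the blind spot concrete, here is a minimal sketch of a single-rule deterministic gate run against a clear command and two obfuscated variants of it. The rule, the function name `naive_audit`, and the sample strings are all hypothetical illustrations. They are not taken from the production engine or from the corpus.

```python
import base64
import re

# Hypothetical single rule: flag a recursive force-delete written literally.
RULE = re.compile(r"\brm\s+-rf\b")

def naive_audit(text: str) -> bool:
    """Return True when the literal pattern appears in the text."""
    return RULE.search(text) is not None

clear = "rm -rf /var/data"
# The shell expands ${IFS} to whitespace at run time, so this is the same command.
ifs_split = "rm${IFS}-rf${IFS}/var/data"
# The payload is only visible after base64 decoding.
encoded = "echo " + base64.b64encode(clear.encode()).decode() + " | base64 -d | sh"

for sample in (clear, ifs_split, encoded):
    print(naive_audit(sample), sample)
```

The rule fires on the clear sample and misses both obfuscated ones, because the text it inspects never contains the literal pattern, even though the shell would end up executing the same command.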
The corpus + harness are public. Bring your own engine — any module exposing audit(text)->bool.
git clone https://github.com/Tharven-Security/policy-gate-bench
cd policy-gate-bench
python policy_bench.py # bundled illustrative engine
python policy_bench.py --engine my_engine # plug your own gate
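A pluggable engine could look like the sketch below, saved as my_engine.py. The only part of the contract stated here is a module exposing `audit(text)->bool`. Two things are assumptions: that True means "block", and that the harness imports the module by the name passed to --engine. Check both against the harness source. The three patterns are illustrative only.

```python
# my_engine.py: hypothetical example module, not the bundled or production engine.
import re

# Illustrative patterns only; a real ruleset would be far larger.
_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),                 # recursive delete from an absolute path
    re.compile(r"\bcurl\b[^|]*\|\s*(sh|bash)\b"),  # download piped straight into a shell
    re.compile(r"\bchmod\s+777\b"),                # world-writable permissions
]

def audit(text: str) -> bool:
    """Assumed convention: True = block the action, False = allow it."""
    return any(p.search(text) for p in _PATTERNS)
```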
The bundled illustrative engine (a tiny 10-pattern ruleset, included only to demonstrate the harness) scores 27.1% clear_block_rate / 0% fp_rate: a naive list catches roughly a quarter of clear attacks, while the production engine catches all of them. That gap is the value of measuring instead of guessing.
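As a rough sketch of how the table's metrics can be derived from any `audit` callable, the helpers below compute the three rates and the latency percentiles. The function names, the True-means-block convention, and the use of `time.perf_counter` with `statistics.quantiles` are assumptions for illustration. This is not the actual code of policy_bench.py.

```python
import statistics
import time
from typing import Callable, Sequence

Audit = Callable[[str], bool]  # assumed convention: True = block

def rate(hits: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * hits / total, 1) if total else 0.0

def score(audit: Audit, clear: Sequence[str],
          obfuscated: Sequence[str], benign: Sequence[str]) -> dict:
    return {
        # malicious, non-obfuscated actions that were blocked
        "clear_block_rate": rate(sum(audit(t) for t in clear), len(clear)),
        # malicious, obfuscated actions that were NOT blocked
        "obf_bypass_rate": rate(sum(not audit(t) for t in obfuscated), len(obfuscated)),
        # benign actions that were blocked
        "fp_rate": rate(sum(audit(t) for t in benign), len(benign)),
    }

def latency_ms(audit: Audit, samples: Sequence[str]) -> dict:
    """Per-decision wall time in milliseconds (needs at least two samples)."""
    timings = []
    for text in samples:
        start = time.perf_counter()
        audit(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(timings, n=100)  # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94]}
```

With these helpers, `rate(13, 48)` gives 27.1 and `rate(22, 24)` gives 91.7. The published percentages are therefore consistent with 13 of 48 clear attacks blocked by the bundled engine and 22 of 24 obfuscated attacks bypassing the production engine, although those raw counts are inferred here rather than stated.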
Honesty rules: the public engine's numbers can be reproduced offline with a single
command · the production 100% is measured on this same SHA-256-pinned public corpus, but the
engine itself is not bundled, so plug in a comparable gate via --engine to re-run the
method · results are append-only (a worse number is never silently overwritten) · every
number carries its corpus SHA-256 and date.
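One way to enforce the pinning and append-only rules is sketched below. The file layout, the JSON Lines log format, and the function names are assumptions made for illustration. The repository may store its results differently.

```python
import hashlib
import json
from datetime import date
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def append_result(log: Path, corpus: Path, pinned_sha: str, metrics: dict) -> dict:
    """Refuse to record a run on an unpinned corpus; never rewrite old records."""
    actual = sha256_of(corpus)
    if actual != pinned_sha:
        raise ValueError(f"corpus hash {actual} does not match pinned {pinned_sha}")
    record = {"date": date.today().isoformat(), "corpus_sha256": actual, **metrics}
    # Mode "a" only ever adds a line, so earlier (possibly better) results stay visible.
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Each call adds one line to the log, so a later, worse run sits next to the earlier one instead of replacing it.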