Blog

Honest, reproducible AI-security write-ups.

Methodology over marketing. Every post pins its numbers to a corpus SHA and a one-command reproduction, including the numbers I'm not proud of.

Does my self-improving AI actually improve? Measuring capability, not activity

Most "self-improving AI" tracks activity, not capability. I built a like-for-like capability ledger to show whether the loop compounds, and on its first run it caught a measurement confound that was faking a regression in my own numbers.

2026-06 · capability vs activity · provable non-regression · EU AI Act Art. 15 →

I built an AI-security benchmark that caught three bugs in my own code

A sovereign two-layer defense (input-time + execution-time), benchmarked on a fanless Celeron with no GPU, and the three real bugs the numbers caught: a data-egress leak, an overfit accuracy claim, and a prompt injection against my own tooling.

2026-06 · execution-time gate 100% clear / 0% FP / 0.67 ms · OWASP LLM01/LLM06 · EU AI Act Art. 15 →