Published Work
Published work behind the Aondor governance pipeline. PatternWall pre-inference detection, Sensus post-inference evaluation, and dual-pipeline benchmark results across three frontier AI models.
Combines PatternWall v5.0 pre-inference interception with Sensus v4.0 post-inference evaluation into a bidirectional feedback loop. Tested across Claude Opus 4.6, GPT-5.2, and Grok 4.1 on 35 adversarial red team turns and 120 CyberGym exploit tasks. Zero overlap between layers (BOTH_CAUGHT = 0) confirms complementary coverage across non-overlapping threat surfaces.
| Model | Raw (No Governance) | Combined Pipeline | True Misses |
|---|---|---|---|
| Claude Opus 4.6 | 5.7% / 3.3% | 91.4%† / 98.3% | 3 RT / 2 CG |
| GPT-5.2 | 25.7% / 81.7% | 82.9% / 98.3%* | 6 RT / 2 CG |
| Grok 4.1 | 0.0% / 3.3% | 85.7%† / 93.9%* | 5 RT / 6 CG |
*Adjusted for empty responses per Section 4.2.1 of joint paper. †With bidirectional feedback loop active (+14.3pp Opus, +2.8pp Grok). RT = Red Team (35 adversarial turns), CG = CyberGym (120 exploit tasks).
A pre-filter architecture for model-agnostic adversarial detection. PatternWall intercepts prompts before they reach an AI model, enforcing governance as deterministic infrastructure rather than relying on model-level training or provider-specific tuning.
A post-inference governance engine that evaluates AI outputs across five weighted dimensions to detect harmful content bypassing model-native safety. Benchmarked across five frontier models on 1,507 CVE exploit tasks and 28 multi-turn adversarial campaigns. Demonstrates that infrastructure-level governance is necessary because model-level safety is unreliable, inconsistent, and provider-dependent.