Anti-Evasion

How KRAIT prevents AI agents from circumventing security rules through obfuscation, indirection, and multi-step evasion strategies.

Why Evasion Is the Real Threat

A sufficiently capable AI agent will not attempt to break security rules directly. It will try to work around them. String obfuscation, indirect function references, multi-step data laundering, and creative use of metaprogramming are all viable evasion strategies against naive security systems. KRAIT is built with evasion as the primary threat model.

AST-Level Enforcement

The first line of defense is structural. Narsil does not scan source code as text; it parses the code into an abstract syntax tree (AST) and analyzes the tree. Renaming a function, aliasing a module, or splitting a forbidden call across multiple expressions therefore does not evade detection, because the check runs against the parsed structure rather than the spelling of the source. The AST captures what the code does, not what it looks like.

For example, an agent might try to call System.cmd by first binding the module to a variable: mod = System; mod.cmd("rm", ["-rf", "/"]). A string matcher searching for the literal text System.cmd would miss this, because that text never appears in the source. Narsil's AST walker resolves the binding and flags the violation.
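The aliasing case above can be sketched in a few lines. This is a minimal illustration, not Narsil's implementation: the Expr type, the FORBIDDEN pair, and the resolve and find_violations functions are hypothetical names chosen for this example, and the tree is far simpler than a real Elixir AST.

```rust
use std::collections::HashMap;

/// A deliberately tiny expression tree (hypothetical; real ASTs are much richer).
#[derive(Debug, Clone)]
enum Expr {
    /// A module literal such as `System`.
    Module(String),
    /// A variable reference such as `mod`.
    Var(String),
    /// `name = value`
    Assign(String, Box<Expr>),
    /// `target.function(...)`; arguments are omitted for brevity.
    Call(Box<Expr>, String),
}

/// Assumed deny-list entry for this sketch: the module/function pair to flag.
const FORBIDDEN: (&str, &str) = ("System", "cmd");

/// Resolve an expression to a module name, following recorded variable bindings.
fn resolve(expr: &Expr, env: &HashMap<String, String>) -> Option<String> {
    match expr {
        Expr::Module(name) => Some(name.clone()),
        Expr::Var(name) => env.get(name).cloned(),
        _ => None,
    }
}

/// Walk statements in order, recording bindings and collecting forbidden calls.
fn find_violations(program: &[Expr]) -> Vec<String> {
    let mut env: HashMap<String, String> = HashMap::new();
    let mut violations = Vec::new();
    for stmt in program {
        match stmt {
            Expr::Assign(name, value) => match resolve(value, &env) {
                // The variable now stands for a known module.
                Some(module) => {
                    env.insert(name.clone(), module);
                }
                // Rebinding to something unknown drops the old alias.
                None => {
                    env.remove(name);
                }
            },
            Expr::Call(target, function) => {
                if let Some(module) = resolve(target, &env) {
                    if module.as_str() == FORBIDDEN.0 && function.as_str() == FORBIDDEN.1 {
                        violations.push(format!("{}.{}", module, function));
                    }
                }
            }
            _ => {}
        }
    }
    violations
}
```

Given the two statements that model mod = System followed by mod.cmd(...), find_violations reports System.cmd even though that text never appears literally. The check compares the resolved module, so a chain of aliases (a = System; b = a; b.cmd(...)) is flagged the same way.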

Taint Tracking Follows Data Flow

KRAIT-003 and KRAIT-004 rely on taint analysis rather than simple call-site detection. When a value originates from a sensitive source (a credential path, a raw socket, an environment variable), it is marked as tainted, and that taint propagates through assignments, function calls, and data transformations. The agent cannot launder tainted data by passing it through intermediate variables or helper functions.
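A minimal sketch of taint propagation, assuming a straight-line program with no branches. The Stmt variants and the tainted_sinks function are hypothetical and say nothing about how KRAIT-003 or KRAIT-004 are actually specified; they only show why an intermediate variable does not clear the taint.

```rust
use std::collections::HashSet;

/// One step of a hypothetical straight-line program.
enum Stmt {
    /// `dst` is read from a sensitive source (e.g. an environment variable).
    Source { dst: String },
    /// `dst = src` or `dst = helper(src)`: the value flows from `src` to `dst`.
    Flow { dst: String, src: String },
    /// `dst` receives a fresh constant, which clears any earlier taint on it.
    Const { dst: String },
    /// `var` is passed to a restricted sink (e.g. a network send).
    Sink { var: String },
}

/// Return the variables that reach a sink while still tainted.
fn tainted_sinks(program: &[Stmt]) -> Vec<String> {
    let mut tainted: HashSet<String> = HashSet::new();
    let mut hits = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Source { dst } => {
                tainted.insert(dst.clone());
            }
            Stmt::Flow { dst, src } => {
                // Taint follows the data: dst is tainted exactly when src is.
                if tainted.contains(src) {
                    tainted.insert(dst.clone());
                } else {
                    tainted.remove(dst);
                }
            }
            Stmt::Const { dst } => {
                tainted.remove(dst);
            }
            Stmt::Sink { var } => {
                if tainted.contains(var) {
                    hits.push(var.clone());
                }
            }
        }
    }
    hits
}
```

If a value flows from secret to tmp to payload and payload is then sent to a sink, the sketch reports payload. The rule is attached to where the data came from, not to the name of the variable at the call site.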

Multi-Pass Analysis

Some evasion strategies only become visible when the code is analyzed more than once. A first pass might reveal that a variable is assigned a benign value, but a second pass, after resolving all bindings and following all control-flow paths, might show that the variable is reassigned to a forbidden target under certain conditions. Narsil performs multiple analysis passes to catch these conditional and deferred violations.
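One common way to structure such repeated passes is a fixed-point iteration: keep re-running the pass until no new information appears. The sketch below assumes that approach; the source does not say how Narsil schedules its passes, and Fact, Env, resolve_bindings, and may_hold are hypothetical names. Each variable maps to the set of modules it may hold on any path.

```rust
use std::collections::{BTreeSet, HashMap};

/// Flow-insensitive facts extracted from the code (hypothetical representation).
enum Fact {
    /// `var` may be bound to the module `module`.
    Literal { var: String, module: String },
    /// `dst` may take any value `src` can hold (e.g. a conditional reassignment).
    Copy { dst: String, src: String },
}

/// For each variable, the set of modules it may hold on some path.
type Env = HashMap<String, BTreeSet<String>>;

/// Repeat passes until no may-hold set grows; return the sets and the pass count.
fn resolve_bindings(facts: &[Fact]) -> (Env, usize) {
    let mut env: Env = HashMap::new();
    let mut passes = 0;
    loop {
        passes += 1;
        let mut changed = false;
        for fact in facts {
            match fact {
                Fact::Literal { var, module } => {
                    changed |= env.entry(var.clone()).or_default().insert(module.clone());
                }
                Fact::Copy { dst, src } => {
                    // Snapshot what src may hold so far, then merge it into dst.
                    let incoming = env.get(src).cloned().unwrap_or_default();
                    let slot = env.entry(dst.clone()).or_default();
                    for module in incoming {
                        changed |= slot.insert(module);
                    }
                }
            }
        }
        // Sets only grow and are bounded, so this loop always terminates.
        if !changed {
            return (env, passes);
        }
    }
}

/// True if `var` may resolve to `module` on any path.
fn may_hold(env: &Env, var: &str, module: &str) -> bool {
    env.get(var).map_or(false, |set| set.contains(module))
}
```

Suppose target is bound to Logger, may be reassigned from handler, and handler is bound to System only in a fact that appears later. The first pass sees target as Logger alone; the second pass propagates System into target; a third pass confirms nothing else changes. A check for a forbidden call on target would then consult may_hold rather than the first value seen.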

No Runtime Negotiation

Critically, none of these checks happen at runtime in a way the agent can influence. The analysis is performed by compiled Rust code inside the NIF. The agent has no channel to argue, reinterpret, or override the result. The code either passes or it does not.