The Token Guard
Cost and Blast-Radius Enforcement
The Token Guard enforces cost envelopes and blast-radius limits for every invocation. It is structurally simple but operationally critical.
Before Invocation
- Reads the agent's declared cost envelope (Surface 5).
- Estimates the cost of the invocation based on input size, historical performance, and prompt structure.
- Refuses the invocation if the estimated cost exceeds max_tokens_in, max_latency_ms, or max_cost_usd.
- Escalates to HITL via the Butler if the estimated cost approaches any max_* value (typically more than 80% of the limit).
- Inspects declared tools for lethal trifecta status; forces HITL if detected.
- Requires kernel maintainer approval for organizational blast radius in HOTL mode.
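The pre-invocation checks above can be sketched as a single decision function. This is a minimal illustration, not the Token Guard's actual implementation: only the three max_* field names come from the text, while `Decision`, `Estimate`, `pre_invocation_gate`, the capability flag names, and the default 0.8 ratio as a parameter are assumptions made for the example. The trifecta check assumes each tool declares a set of capability flags and treats the trifecta as the combination of private-data access, untrusted-content exposure, and external communication.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ESCALATE_HITL = "escalate_hitl"
    REFUSE = "refuse"


@dataclass(frozen=True)
class CostEnvelope:
    # Field names taken from the text; all limits are assumed to be positive.
    max_tokens_in: int
    max_latency_ms: int
    max_cost_usd: float


@dataclass(frozen=True)
class Estimate:
    # Hypothetical shape of the Guard's cost estimate.
    tokens_in: int
    latency_ms: int
    cost_usd: float


# Hypothetical capability flags; the real declaration format is not specified.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}


def has_lethal_trifecta(tool_capabilities: list[set[str]]) -> bool:
    """True if the declared tools together cover all three capabilities."""
    combined = set().union(*tool_capabilities)
    return TRIFECTA <= combined


def pre_invocation_gate(
    envelope: CostEnvelope,
    estimate: Estimate,
    tool_capabilities: list[set[str]],
    escalation_ratio: float = 0.8,  # "typically more than 80%"
) -> Decision:
    # Compare each estimated dimension with its declared limit.
    worst = max(
        estimate.tokens_in / envelope.max_tokens_in,
        estimate.latency_ms / envelope.max_latency_ms,
        estimate.cost_usd / envelope.max_cost_usd,
    )
    if worst > 1.0:
        return Decision.REFUSE
    if worst > escalation_ratio or has_lethal_trifecta(tool_capabilities):
        return Decision.ESCALATE_HITL
    return Decision.ALLOW
```

Taking the worst ratio across the three dimensions means a single tight limit is enough to trigger escalation or refusal, which matches the "exceeds any of" wording of the list.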
During and After Invocation
- Monitors actual resource consumption; cancels invocations that breach hard ceilings mid-flight.
- Updates the agent's running cost profile (used for future estimates).
- Flags agents whose actual costs systematically exceed their declared envelopes.
- Emits telemetry to the Inspector Pipeline; updates organizational cost dashboards.
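One way to picture the mid-flight check and the running cost profile is sketched below. The text does not say how the profile is maintained or what counts as "systematically" exceeding an envelope, so the exponential moving average, the `min_samples` and `overrun_rate` thresholds, and every name here (`CostProfile`, `breaches_ceiling`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    declared_max_cost_usd: float
    ema_cost_usd: float = 0.0  # running estimate, usable for future estimates
    samples: int = 0
    overruns: int = 0          # invocations that cost more than declared

    def record(self, actual_cost_usd: float, alpha: float = 0.2) -> None:
        """Fold one finished invocation into the profile."""
        if self.samples == 0:
            self.ema_cost_usd = actual_cost_usd
        else:
            self.ema_cost_usd = (
                alpha * actual_cost_usd + (1 - alpha) * self.ema_cost_usd
            )
        self.samples += 1
        if actual_cost_usd > self.declared_max_cost_usd:
            self.overruns += 1

    def systematically_over(
        self, min_samples: int = 5, overrun_rate: float = 0.5
    ) -> bool:
        """Assumed rule: flag once at least half of enough samples overran."""
        if self.samples < min_samples:
            return False
        return self.overruns / self.samples >= overrun_rate


def breaches_ceiling(
    tokens_used: int,
    elapsed_ms: int,
    cost_so_far_usd: float,
    max_tokens: int,
    max_latency_ms: int,
    max_cost_usd: float,
) -> bool:
    """Mid-flight check: True means the invocation should be cancelled."""
    return (
        tokens_used > max_tokens
        or elapsed_ms > max_latency_ms
        or cost_so_far_usd > max_cost_usd
    )
```

A monitor would call `breaches_ceiling` periodically while the invocation runs and call `record` once when it finishes; an agent for which `systematically_over()` returns true is a candidate for the flag described in the list.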
Figure: The Token Guard's gate, before and during an invocation.
Blast Radius Enforcement
| Level | Definition | HOTL Allowed? |
|---|---|---|
| read | Read data, cannot modify state | Yes |
| local | Modify state within a single resource | Yes |
| domain | Modify state within a single service/system | Yes, with declaration |
| organizational | Affects multiple systems | Only with kernel maintainer approval |
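The table can be read as a small policy function. The sketch below encodes it directly; the four level names come from the table, while `BlastRadius`, `hotl_allowed`, and the two boolean flags are hypothetical, and "with declaration" is interpreted here as the agent having explicitly declared the domain-level radius.

```python
from enum import Enum


class BlastRadius(Enum):
    READ = "read"
    LOCAL = "local"
    DOMAIN = "domain"
    ORGANIZATIONAL = "organizational"


def hotl_allowed(
    radius: BlastRadius,
    declared: bool = False,             # assumed meaning of "with declaration"
    maintainer_approved: bool = False,  # kernel maintainer approval on record
) -> bool:
    """Return whether an invocation at this blast radius may run in HOTL mode."""
    if radius in (BlastRadius.READ, BlastRadius.LOCAL):
        return True
    if radius is BlastRadius.DOMAIN:
        return declared
    # ORGANIZATIONAL: only with kernel maintainer approval.
    return maintainer_approved
```

For example, `hotl_allowed(BlastRadius.DOMAIN)` is false until the declaration flag is set, and an organizational invocation stays blocked from HOTL regardless of declaration unless approval is present.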
Cost Transparency
The Token Guard's telemetry feeds a cost dashboard visible to the entire organization: every cell sees its own consumption, kernel maintainers see aggregate consumption, and costs are attributed to capabilities and cells so poorly designed or runaway usage can be flagged. Cost transparency is not surveillance — it is the same visibility that financial planning provides for any organizational resource.
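As a rough illustration of the attribution described above, the following sketch rolls telemetry events up by cell and by capability. The event shape (a dict with `cell`, `capability`, and `cost_usd` keys) and the function name `attribute_costs` are assumptions; the actual telemetry schema is not given in the text.

```python
from collections import defaultdict


def attribute_costs(events: list[dict]) -> tuple[dict, dict, float]:
    """Aggregate cost per cell and per capability, plus the overall total."""
    by_cell: dict[str, float] = defaultdict(float)
    by_capability: dict[str, float] = defaultdict(float)
    for event in events:
        by_cell[event["cell"]] += event["cost_usd"]
        by_capability[event["capability"]] += event["cost_usd"]
    total = sum(by_cell.values())
    return dict(by_cell), dict(by_capability), total
```

A cell's dashboard view would read its own entry from the first mapping, while kernel maintainers would look at the total and the per-capability breakdown to spot runaway usage.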
Why It Matters
Most AI adoption failures in 2026 are cost failures, not capability failures: a team enables agents, agents start running in loops, costs spike, leadership panics, the program is paused. The Token Guard prevents this by making cost a first-class declarative property of each agent, enforced at runtime, with visibility into who is consuming what.
Cost is governed the same way memory and latency are governed in distributed systems: declared, monitored, enforced.