Bypass and Human Attention
This section addresses two concerns where trust is scarce: engineers routing around the sanctioned platform, and reviewer attention, which must be treated as a finite resource so that human-in-the-loop (HITL) gates stay meaningful.
10.3 — Bypass Detection
Some engineers use unauthorized AI tools (personal API keys, unsanctioned plugins, pasting code into ChatGPT) to do the same work while appearing to use the sanctioned platform.
Bypass is a trust signal, not a violation to punish. When bypass rates are high, the platform is not meeting the engineers' needs.
Surface 7 (Observability) extends into the IDE and CLI. The Inspector Pipeline correlates expected usage (based on cell activity) with observed usage (based on telemetry); gaps indicate bypass. The response is investigation, not punishment: what does the bypassed work look like, why is the sanctioned platform not serving it, and what changes would close the gap?
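The correlation described above can be sketched in a few lines. This is a minimal illustration, not the Inspector Pipeline's actual implementation: the `UsageSample` type, the field names, the team-level aggregation, and the 0.4 threshold are all hypothetical, chosen only to show how a gap between expected and observed usage might be turned into a list of things to investigate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSample:
    """One aggregation window for one team (hypothetical shape)."""
    team: str
    expected_invocations: float  # assumed: estimated from cell activity
    observed_invocations: float  # assumed: counted from IDE/CLI telemetry


def bypass_gap(sample: UsageSample) -> float:
    """Fraction of expected usage that was not observed, clamped to [0, 1]."""
    if sample.expected_invocations <= 0:
        return 0.0  # nothing was expected, so nothing can be missing
    missing = sample.expected_invocations - sample.observed_invocations
    return max(0.0, min(1.0, missing / sample.expected_invocations))


def flag_for_investigation(samples, threshold=0.4):
    """Return (team, gap) pairs at or above the threshold, largest gap first.

    The result is a prompt for investigation, not a list of offenders.
    The threshold value is an illustrative assumption.
    """
    flagged = [
        (s.team, round(bypass_gap(s), 2))
        for s in samples
        if bypass_gap(s) >= threshold
    ]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


samples = [
    UsageSample("payments", expected_invocations=100, observed_invocations=30),
    UsageSample("search", expected_invocations=80, observed_invocations=76),
    UsageSample("infra", expected_invocations=0, observed_invocations=5),
]
flagged = flag_for_investigation(samples)
```

Aggregating by team rather than by individual is a deliberate choice in this sketch: it keeps the signal pointed at where the platform is failing rather than at who is bypassing it.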
Punitive responses fail for three reasons: they confirm engineers' view that the platform is inadequate, they drive bypass underground (losing visibility and increasing security risk), and they damage trust. Engineers are the customers of internal platforms; treating customers as suspects produces adversarial relationships.
Bypass is a vote of no confidence in the platform. Read the vote. Improve the platform.
10.4 — Human Attention as a Constraint
A reviewer can perform only a finite number, N, of high-quality reviews per day. Beyond N, additional reviews degrade to rubber stamps, the HITL gate becomes theater, and the decision trace documents approvals that no longer represent evaluation.
Most KCC adoption failures in mature organizations are not failures of the AI; they are failures of human attention.
The Butler treats per-reviewer cognitive load as a first-class scoring input. When a reviewer's load exceeds its threshold, the Butler does one of three things: it routes the review to a different reviewer (load balancing), refuses the invocation until capacity is available (backpressure), or notifies kernel maintainers that the gating policy is producing unsustainable load. The Token Guard treats compute as finite; the Butler treats reviewer attention as finite. Both are first-class in the model.
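The three responses can be sketched as a single routing decision. This is an illustration under stated assumptions, not the Butler's real interface: `Reviewer`, `Decision`, `route_review`, and the use of a simple daily review count as the load measure are all hypothetical. The sketch also assumes that maintainers are notified whenever backpressure is applied, which the text does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    ASSIGN = "assign"              # preferred reviewer has capacity
    REROUTE = "reroute"            # load balancing to another reviewer
    BACKPRESSURE = "backpressure"  # refuse until capacity is available


@dataclass
class Reviewer:
    name: str
    reviews_today: int
    daily_capacity: int  # the "N" for this reviewer (assumed load measure)

    def has_capacity(self) -> bool:
        return self.reviews_today < self.daily_capacity


@dataclass
class Decision:
    action: Action
    reviewer: Optional[str]
    notify_maintainers: bool


def route_review(preferred: Reviewer, pool: List[Reviewer]) -> Decision:
    """Pick a reviewer without pushing anyone past their capacity."""
    if preferred.has_capacity():
        preferred.reviews_today += 1
        return Decision(Action.ASSIGN, preferred.name, False)

    # Load balancing: choose the least-loaded reviewer who still has room.
    candidates = [r for r in pool if r is not preferred and r.has_capacity()]
    if candidates:
        chosen = min(candidates, key=lambda r: r.reviews_today / r.daily_capacity)
        chosen.reviews_today += 1
        return Decision(Action.REROUTE, chosen.name, False)

    # Nobody has capacity: refuse the invocation and, in this sketch,
    # tell maintainers the gating policy is producing unsustainable load.
    return Decision(Action.BACKPRESSURE, None, True)


alice = Reviewer("alice", reviews_today=5, daily_capacity=5)
bob = Reviewer("bob", reviews_today=2, daily_capacity=5)
carol = Reviewer("carol", reviews_today=4, daily_capacity=5)
first = route_review(alice, [alice, bob, carol])
```

The point of the sketch is that refusing work is a legitimate outcome: when every reviewer is at capacity, the function returns backpressure instead of assigning a review that would become a rubber stamp.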
Treating attention as a constraint surfaces a fact organizations often deny: agentic systems do not free humans from work — they shift it. Engineers spend less time writing code and more time reviewing AI-generated code. A healthy KCC deployment makes that trade-off visible and tunable; an unhealthy one hides it until something breaks.
Compute cost is governed because it shows up on a bill. Human attention is more important and more easily ignored. KCC treats them as siblings.