The Butler
The Butler makes per-invocation gating decisions — HITL or HOTL — by scoring seven signals and defaulting to safety.
Per-Invocation Gating
For each invocation, the Butler decides whether it runs in HITL (human-in-the-loop) or HOTL (human-on-the-loop) mode. The Butler is structurally smaller than the Token Guard but operationally more nuanced. It makes each decision by scoring seven signals.
| Signal | What it measures |
|---|---|
| Value | How impactful is a correct output? Higher value raises the bar for autonomy. |
| Risk | How impactful is an incorrect output? Higher risk forces HITL. |
| Complexity | How novel or unbounded is the input relative to training? Higher complexity forces HITL. |
| Confidence | What does Surface 8 say? Below threshold forces HITL. |
| Trust History | How has this reviewer performed on similar work? Calibrated trust permits HOTL. |
| Cognitive Load | How many reviews has the reviewer already done? High load forces escalation or backpressure. |
| Lethal Trifecta | Untrusted content + private data + external communication? If yes — HITL always. |
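The Lethal Trifecta row is the one signal that acts as a hard override rather than a weighted input. A minimal sketch of that check follows, assuming each invocation can be described by three boolean flags. The `InvocationContext` type and its field names are hypothetical and are not part of the Butler's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvocationContext:
    """Hypothetical description of one invocation (field names are assumptions)."""
    reads_untrusted_content: bool
    accesses_private_data: bool
    can_communicate_externally: bool


def trifecta_detected(ctx: InvocationContext) -> bool:
    # The trifecta is present only when all three conditions hold together;
    # any two on their own do not trigger the override.
    return (
        ctx.reads_untrusted_content
        and ctx.accesses_private_data
        and ctx.can_communicate_externally
    )
```

In this sketch, an invocation that reads untrusted content and touches private data but cannot communicate externally does not trip the check, so it falls through to ordinary scoring.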
A junior engineer who consistently catches errors earns autonomy faster than a staff engineer who approves everything. Behavior is the signal. Role is not.
KCC explicitly rejects role-based gating (judging review competence by title or tenure): it produces discriminatory outcomes when roles correlate with demographics, it creates resentment, and it is less predictive than behavioral signals.
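One way to picture a behavioral trust signal is a catch rate over past reviews. The sketch below is an illustration only: the `ReviewRecord` type, its fields, and the catch-rate metric are assumptions, not the trust-history calculation KCC actually uses. Note that the function takes no role or title argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical record of one past review (fields are assumptions)."""
    had_known_defect: bool   # the reviewed output was later confirmed to be wrong
    reviewer_rejected: bool  # the reviewer flagged or rejected the output


def catch_rate(history: list[ReviewRecord]) -> float:
    """Fraction of defective outputs the reviewer caught; 0.0 when there is no evidence."""
    defective = [r for r in history if r.had_known_defect]
    if not defective:
        return 0.0
    caught = sum(1 for r in defective if r.reviewer_rejected)
    return caught / len(defective)


# A reviewer who catches nine of ten defects versus one who approves everything.
careful = [ReviewRecord(True, True)] * 9 + [ReviewRecord(True, False)]
rubber_stamp = [ReviewRecord(True, False)] * 10
```

Under these assumptions the careful reviewer scores well above the one who approves everything, regardless of seniority.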
Butler Decision Logic
    score = (
        value_weight * normalized_value
        - risk_weight * normalized_risk
        - complexity_weight * normalized_complexity
        + trust_history_weight * normalized_trust
        + confidence_weight * normalized_confidence
        - cognitive_load_weight * normalized_load
    )

    if trifecta_detected:
        decision = HITL
    elif score < hitl_threshold:
        decision = HITL
    elif score >= hotl_threshold and hotl_eligibility_met:
        decision = HOTL
    else:
        decision = HITL  # Default to safety
The actual weights are kernel-level configuration. Cells may override within kernel-permitted ranges. See 13.2 — Butler Scoring Example for a worked example.
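The decision logic can be turned into a small runnable sketch. The version below follows the pseudocode term for term, but the `Signals`, `Weights`, and `Decision` types, the unit weights, and the thresholds in the usage lines are all placeholders chosen for illustration. Real weights and thresholds come from kernel configuration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    HITL = "HITL"
    HOTL = "HOTL"


@dataclass(frozen=True)
class Signals:
    # The six scored signals are assumed to be normalized to [0, 1].
    value: float
    risk: float
    complexity: float
    trust: float
    confidence: float
    load: float
    trifecta_detected: bool
    hotl_eligibility_met: bool


@dataclass(frozen=True)
class Weights:
    # Placeholder weights; the real ones are kernel-level configuration.
    value: float = 1.0
    risk: float = 1.0
    complexity: float = 1.0
    trust: float = 1.0
    confidence: float = 1.0
    load: float = 1.0


def butler_score(s: Signals, w: Weights) -> float:
    # Same signs as the pseudocode: value, trust, and confidence add;
    # risk, complexity, and cognitive load subtract.
    return (
        w.value * s.value
        - w.risk * s.risk
        - w.complexity * s.complexity
        + w.trust * s.trust
        + w.confidence * s.confidence
        - w.load * s.load
    )


def decide(s: Signals, w: Weights, hitl_threshold: float, hotl_threshold: float) -> Decision:
    if s.trifecta_detected:
        return Decision.HITL  # hard override, score is never consulted
    score = butler_score(s, w)
    if score < hitl_threshold:
        return Decision.HITL
    if score >= hotl_threshold and s.hotl_eligibility_met:
        return Decision.HOTL
    return Decision.HITL  # default to safety


routine = Signals(value=0.5, risk=0.25, complexity=0.25, trust=1.0,
                  confidence=0.75, load=0.25,
                  trifecta_detected=False, hotl_eligibility_met=True)
print(decide(routine, Weights(), hitl_threshold=0.0, hotl_threshold=1.0))  # Decision.HOTL
```

Every path that is not an explicit HOTL grant returns HITL, including scores that land between the two thresholds and invocations that clear the HOTL threshold without meeting eligibility.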
[Figure: the Butler's per-invocation decision, defaulting to safety]
Cognitive Load and Backpressure
The cognitive-load signal is particularly important. A reviewer who has already approved 15 items today tends to review the 16th less carefully than the first. When load is high, the Butler takes one of three actions: it escalates to a different reviewer (load balancing), refuses the invocation until capacity is available (backpressure), or notifies kernel maintainers that the gating policy is producing unsustainable load. This treats human attention as a finite resource (see Human Attention).
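The load-balancing and backpressure paths can be sketched as a routing function. This is an illustration under stated assumptions: the `Reviewer` type, the `max_daily_reviews` cap, and the least-loaded selection rule are hypothetical, and the third action (notifying kernel maintainers) is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoadAction(Enum):
    ASSIGN = "assign"              # preferred reviewer still has capacity
    ESCALATE = "escalate"          # load balancing to another reviewer
    BACKPRESSURE = "backpressure"  # refuse until capacity is available


@dataclass
class Reviewer:
    name: str
    reviews_today: int


def route_review(
    preferred: Reviewer,
    pool: list[Reviewer],
    max_daily_reviews: int,  # hypothetical cap, not a documented limit
) -> tuple[LoadAction, Optional[Reviewer]]:
    if preferred.reviews_today < max_daily_reviews:
        return LoadAction.ASSIGN, preferred
    # Load balancing: choose the least-loaded alternative that still has capacity.
    candidates = [
        r for r in pool
        if r is not preferred and r.reviews_today < max_daily_reviews
    ]
    if candidates:
        return LoadAction.ESCALATE, min(candidates, key=lambda r: r.reviews_today)
    # Nobody has capacity, so the invocation waits instead of being rubber-stamped.
    return LoadAction.BACKPRESSURE, None
```

The point of the final branch is that an overloaded pool yields a refusal rather than a degraded review.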
Why It Matters
Without the Butler, gating decisions become ad hoc: each cell sets its own policy, policies drift, and HITL gates degrade into rubber-stamps when reviewers are overloaded. The Butler centralizes the policy (declared in the kernel, adjustable through governance) while distributing the execution (each invocation gets its own gate decision).
Trust is behavior, not title.