KCC 06.2

The Butler

The Butler makes per-invocation gating decisions — HITL or HOTL — by scoring seven signals and defaulting to safety.

Meta Agents · HITL/HOTL · Reviewer calibration · Cognitive load · Trust history

Created 2026-06-08 · v0.4.0

Per-Invocation Gating

The Butler decides, for each invocation, whether it runs in HITL (human-in-the-loop) or HOTL (human-on-the-loop) mode. It is structurally smaller than the Token Guard but operationally more nuanced. For each invocation it evaluates seven signals: six feed a weighted score, and the seventh, the Lethal Trifecta, is a hard override that always forces HITL.

Signal	What it measures
Value	How impactful is a correct output? Higher value raises the bar for autonomy.
Risk	How impactful is an incorrect output? Higher risk forces HITL.
Complexity	How novel or unbounded is the input relative to training? Higher complexity forces HITL.
Confidence	What does Surface 8 say? Below threshold forces HITL.
Trust History	How has this reviewer performed on similar work? Calibrated trust permits HOTL.
Cognitive Load	How many reviews has the reviewer already done? High load forces escalation or backpressure.
Lethal Trifecta	Untrusted content + private data + external communication? If yes — HITL always.

A junior engineer who consistently catches errors earns autonomy faster than a staff engineer who approves everything. Behavior is the signal. Role is not.

KCC explicitly rejects role-based gating (judging review competence by title or tenure): it produces discriminatory outcomes when roles correlate with demographics, it creates resentment, and it is less predictive than behavioral signals.
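
One way to make "behavior is the signal" concrete is to derive the trust signal from a reviewer's track record alone. The sketch below is an assumption, not the KCC definition of Trust History: it assumes each past review records whether the item actually contained a defect and whether the reviewer approved it, and it scores trust as the share of defective items the reviewer rejected. `ReviewRecord` and `trust_from_history` are hypothetical names. Role and tenure are deliberately absent from the inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRecord:
    had_defect: bool  # ground truth learned later (assumed to be available)
    approved: bool    # what the reviewer decided at the time


def trust_from_history(history: list[ReviewRecord]) -> float:
    """Share of defective items the reviewer caught, in [0, 1].

    Returns 0.0 when there is no defective item to judge by, so an
    unproven reviewer starts without earned autonomy (an assumption).
    """
    defective = [r for r in history if r.had_defect]
    if not defective:
        return 0.0
    caught = sum(1 for r in defective if not r.approved)
    return caught / len(defective)


# A junior who rejects every defective item vs. a staff engineer who approves everything.
junior = [ReviewRecord(True, False), ReviewRecord(False, True), ReviewRecord(True, False)]
staff = [ReviewRecord(True, True), ReviewRecord(False, True), ReviewRecord(True, True)]
```

Under this sketch the junior reviewer scores higher than the staff engineer because the function never sees a title, only outcomes.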

Butler Decision Logic

score = (
  value_weight * normalized_value -
  risk_weight * normalized_risk -
  complexity_weight * normalized_complexity +
  trust_history_weight * normalized_trust +
  confidence_weight * normalized_confidence -
  cognitive_load_weight * normalized_load
)

if trifecta_detected:
  decision = HITL
elif score < hitl_threshold:
  decision = HITL
elif score >= hotl_threshold and hotl_eligibility_met:
  decision = HOTL
else:
  decision = HITL  # Default to safety

The actual weights are kernel-level configuration. Cells may override within kernel-permitted ranges. See 13.2 — Butler Scoring Example for a worked example.
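
The decision logic above can be turned into a minimal runnable sketch. The names `Signals`, `Weights`, `butler_score`, and `decide`, the default weights of 1.0, and the default thresholds are illustrative assumptions, not kernel configuration. Signals are assumed to be already normalized to the range [0, 1].

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    HITL = "HITL"
    HOTL = "HOTL"


@dataclass(frozen=True)
class Signals:
    """The six scored signals, each assumed pre-normalized to [0, 1]."""
    value: float
    risk: float
    complexity: float
    trust: float
    confidence: float
    load: float


@dataclass(frozen=True)
class Weights:
    """Placeholder weights; real values are kernel-level configuration."""
    value: float = 1.0
    risk: float = 1.0
    complexity: float = 1.0
    trust: float = 1.0
    confidence: float = 1.0
    load: float = 1.0


def butler_score(s: Signals, w: Weights) -> float:
    # Signs are copied term by term from the score formula above.
    return (
        w.value * s.value
        - w.risk * s.risk
        - w.complexity * s.complexity
        + w.trust * s.trust
        + w.confidence * s.confidence
        - w.load * s.load
    )


def decide(
    s: Signals,
    w: Weights,
    *,
    trifecta_detected: bool,
    hotl_eligibility_met: bool,
    hitl_threshold: float = 0.0,  # assumed value
    hotl_threshold: float = 1.0,  # assumed value
) -> Decision:
    if trifecta_detected:
        return Decision.HITL  # hard override, checked before any scoring
    score = butler_score(s, w)
    if score < hitl_threshold:
        return Decision.HITL
    if score >= hotl_threshold and hotl_eligibility_met:
        return Decision.HOTL
    return Decision.HITL  # default to safety
```

For example, `decide(Signals(0.9, 0.1, 0.1, 0.9, 0.9, 0.1), Weights(), trifecta_detected=False, hotl_eligibility_met=True)` clears the assumed HOTL threshold, while the same signals with `trifecta_detected=True` always return HITL.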

[Diagram: the Butler's per-invocation decision, defaulting to safety]

Cognitive Load and Backpressure

The cognitive-load signal is particularly important because review quality degrades with volume: a reviewer who has already approved 15 items today will review the 16th less carefully than the first. When load is high, the Butler has three options: escalate to a different reviewer (load balancing), refuse the invocation until capacity is available (backpressure), or notify kernel maintainers that the gating policy is producing unsustainable load. This treats human attention as a finite resource; see Human Attention.
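
The three responses to high load can be sketched as a small routing function. Everything here is an illustrative assumption: the per-reviewer daily cap `max_reviews_per_day` (set to 15 only to echo the example above), the names `LoadAction`, `route_review`, and `should_notify_maintainers`, and the rule that maintainers are notified once the share of refused invocations exceeds `notify_ratio`. The source does not specify how the Butler chooses among the three responses.

```python
from enum import Enum
from typing import Optional


class LoadAction(Enum):
    ASSIGN = "assign"              # preferred reviewer still has capacity
    ESCALATE = "escalate"          # load-balance to another reviewer
    BACKPRESSURE = "backpressure"  # refuse until capacity is available


def route_review(
    preferred: str,
    reviews_today: dict[str, int],
    max_reviews_per_day: int = 15,  # assumed cap
) -> tuple[LoadAction, Optional[str]]:
    """Pick a reviewer, or refuse when every reviewer is at the cap."""
    if reviews_today.get(preferred, 0) < max_reviews_per_day:
        return LoadAction.ASSIGN, preferred
    # Other reviewers who are still under the cap.
    candidates = {
        name: count
        for name, count in reviews_today.items()
        if name != preferred and count < max_reviews_per_day
    }
    if candidates:
        least_loaded = min(candidates, key=candidates.get)
        return LoadAction.ESCALATE, least_loaded
    return LoadAction.BACKPRESSURE, None


def should_notify_maintainers(refused: int, total: int, notify_ratio: float = 0.2) -> bool:
    """Flag the gating policy as unsustainable when too many invocations are refused."""
    return total > 0 and refused / total > notify_ratio
```

Escalating to the least-loaded reviewer is one simple load-balancing choice; a real deployment might also weigh that reviewer's trust history for the kind of work being reviewed.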

Why It Matters

Without the Butler, gating decisions become ad hoc: each cell sets its own policy, policies drift, and HITL gates degrade into rubber-stamps when reviewers are overloaded. The Butler centralizes the policy (declared in the kernel, adjustable through governance) while distributing the execution (each invocation gets its own gate decision).

Trust is behavior, not title.