HITL and Observability
Human-in-the-loop gates and observability keep autonomous work visible at the moments where judgement matters most.
Surface 6 - HITL/HOTL Declaration: Purpose
Declare the agent's default trust mode and the conditions for trust transitions. The Butler uses this together with per-invocation signals to make gating decisions.
| Mode | Definition | Example |
|---|---|---|
| HITL (Human-in-the-loop) | Agent invocation requires human review before action | A code review agent that posts comments only after a human approves the proposed comment |
| HOTL (Human-on-the-loop) | Agent acts autonomously; humans observe and intervene only on signal | A spec-writer agent that produces specs which humans review afterward |
The distinction is temporal: HITL means human approval is a prerequisite for action; HOTL means human review happens after the fact, with intervention possible.
hitl_hotl:
  default_mode: HITL|HOTL
  hotl_eligibility:
    requires_maturity: L1|L2|L3          # Minimum maturity for HOTL
    requires_trust_history: boolean      # Reviewer must have history with this agent
    minimum_trust_score: float           # 0.0-1.0
    excluded_invocation_types: [string]  # Always HITL for these types
  hotl_revoke_triggers:
    - consecutive_failures: int
    - confidence_below: float
    - novel_domain_detected: boolean
    - cost_overrun_above: percentage
    - trifecta_detected: boolean         # If true: forces HITL always

An agent flagged as lethal-trifecta in Surface 4 cannot have default_mode: HOTL. Agents with blast_radius: organizational require explicit kernel maintainer approval for HOTL. The Butler may escalate an HOTL-eligible agent to HITL based on per-invocation factors even when the declaration permits HOTL; the Butler may not promote an HITL-only declaration to HOTL.
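The one-way rule above (escalation allowed, promotion never) can be illustrated with a minimal Python sketch. The `Declaration` and `Signals` classes, the `resolve_mode` function, the default threshold values, and the choice of `>=` and `<` for comparisons are all assumptions made for illustration; the specification does not define how the Butler implements this.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    """Hypothetical in-memory form of a Surface 6 declaration."""
    default_mode: str                                  # "HITL" or "HOTL"
    excluded_invocation_types: list = field(default_factory=list)
    max_consecutive_failures: int = 3                  # assumed threshold
    confidence_below: float = 0.7                      # assumed threshold


@dataclass
class Signals:
    """Hypothetical per-invocation signals seen by the Butler."""
    invocation_type: str
    consecutive_failures: int = 0
    confidence: float = 1.0
    novel_domain_detected: bool = False
    trifecta_detected: bool = False


def resolve_mode(decl: Declaration, sig: Signals) -> str:
    """Return the effective mode for one invocation."""
    # An HITL-only declaration is never promoted to HOTL.
    if decl.default_mode == "HITL":
        return "HITL"
    # From here the declaration permits HOTL; any signal may escalate it.
    if sig.invocation_type in decl.excluded_invocation_types:
        return "HITL"
    if sig.trifecta_detected or sig.novel_domain_detected:
        return "HITL"
    if sig.consecutive_failures >= decl.max_consecutive_failures:
        return "HITL"
    if sig.confidence < decl.confidence_below:
        return "HITL"
    return "HOTL"
```

With this shape, every branch can only return the declared mode or something stricter, so the "no promotion" rule holds by construction rather than by a separate check.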
Reference Pattern - The Escalation Gate
Non-normative. Surface 6 defines the modes but not the ritual for escalating from HOTL back to HITL. A single universal gate makes that ritual the same regardless of which agent or signal triggered it: (1) Pause lifecycle progression; (2) Present a structured four-way choice - approve, revise, escalate, abort; (3) Record the choice in the Decision Trace as a gate event; (4) Resume, revise, escalate, or terminate per the choice.
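The four steps of the gate can be sketched as follows. This is an illustrative Python outline only: `Lifecycle`, `run_gate`, the state names, and the shape of the gate event are hypothetical, and the `ask_human` callback stands in for whatever review interface a cell actually uses.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class GateChoice(Enum):
    APPROVE = "approve"
    REVISE = "revise"
    ESCALATE = "escalate"
    ABORT = "abort"


# Assumed mapping from the human's choice to the next lifecycle state.
NEXT_STATE = {
    GateChoice.APPROVE: "resumed",
    GateChoice.REVISE: "revising",
    GateChoice.ESCALATE: "escalated",
    GateChoice.ABORT: "terminated",
}


@dataclass
class Lifecycle:
    agent_id: str
    state: str = "running"
    decision_trace: list = field(default_factory=list)


def run_gate(
    lc: Lifecycle,
    trigger: str,
    ask_human: Callable[[List[GateChoice]], GateChoice],
) -> GateChoice:
    lc.state = "paused"                      # (1) pause lifecycle progression
    choice = ask_human(list(GateChoice))     # (2) present the four-way choice
    lc.decision_trace.append({               # (3) record a gate event
        "event": "gate",
        "agent": lc.agent_id,
        "trigger": trigger,
        "choice": choice.value,
    })
    lc.state = NEXT_STATE[choice]            # (4) act on the choice
    return choice
```

Because the function takes the trigger as plain data, the same gate runs unchanged whether the escalation came from a confidence drop, a cost overrun, or any other signal.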
The HITL -> HOTL transition is a trust ladder, not a setting.
Surface 7 - Observability: Purpose
Declare what telemetry the agent emits and where it goes. This enables the Inspector Pipeline to function. Without observability, the system can run agents but cannot learn from them.
observability:
  events:
    - name: string
      schema: <schema reference>
      destination: kernel_telemetry|cell_telemetry|both
  metrics:
    - name: string
      type: counter|gauge|histogram
      labels: [string]
  traces:
    - decision_trace: required     # Surface 9 trace must always be emitted
      sampling_rate: percentage    # Optional; default 100%

Every agent must emit at minimum three events: invocation_start (fired when the agent begins processing), invocation_complete (fired on finish, success or failure), and decision_trace_recorded (fired when Surface 9 trace is durably written). Agents that cannot emit conformant telemetry cannot be promoted past L1. Decision trace emission is non-negotiable; sampling is not permitted below L2.
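One way to make the three mandatory events hard to forget is to wrap each invocation so that the start and completion events fire even when the agent raises. The sketch below is an assumption-laden illustration: the `invocation` context manager, the `emit` callback, the event dictionary shape, and `missing_required_events` are hypothetical names, not part of the specification.

```python
from contextlib import contextmanager

REQUIRED_EVENTS = {
    "invocation_start",
    "invocation_complete",
    "decision_trace_recorded",
}


def missing_required_events(emitted_names):
    """Return the required event names absent from an invocation's events."""
    return REQUIRED_EVENTS - set(emitted_names)


@contextmanager
def invocation(emit, agent_id):
    """Emit invocation_start on entry and invocation_complete on exit."""
    emit({"name": "invocation_start", "agent": agent_id})
    status = "success"
    try:
        yield
    except Exception:
        status = "failure"
        raise
    finally:
        # Fires on success and on failure alike.
        emit({"name": "invocation_complete", "agent": agent_id, "status": status})


# Usage: the agent body is responsible for decision_trace_recorded,
# emitted only after the Surface 9 trace has been durably written.
events = []
with invocation(events.append, "spec-writer"):
    events.append({"name": "decision_trace_recorded", "agent": "spec-writer"})
assert missing_required_events(e["name"] for e in events) == set()
```

A conformance check for promotion past L1 could run `missing_required_events` over recorded invocations, although how the kernel actually verifies conformance is not stated here.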
Privacy and Telemetry
The Inspector Pipeline reads trace metadata, not agent outputs directly. The trace records which agent ran, what tools it invoked, how confident it was, and whether it succeeded - but not the full input or output content. Inputs and outputs are referenced via opaque IDs (blob URIs, hashes). An auditor with appropriate permissions can dereference the IDs; the Inspector Pipeline operating at scale only reads metadata. This permits pattern detection without privacy violation.
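The metadata-only trace can be sketched as below, using content hashes as the opaque IDs. The field names and the `build_trace_metadata` helper are hypothetical; the document says only that inputs and outputs are referenced by blob URIs or hashes.

```python
import hashlib


def opaque_id(content: bytes) -> str:
    """Derive an opaque reference from content without exposing it."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def build_trace_metadata(agent_id, tools, confidence, succeeded,
                         input_bytes, output_bytes):
    """Build the metadata record; raw input and output are never stored."""
    return {
        "agent": agent_id,
        "tools_invoked": list(tools),
        "confidence": confidence,
        "succeeded": succeeded,
        "input_ref": opaque_id(input_bytes),
        "output_ref": opaque_id(output_bytes),
    }
```

A plain hash of short or predictable content can be guessed by hashing candidate values, so a deployment handling such data may prefer blob URIs behind access control, or a keyed hash, over a bare digest.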
Anti-Patterns
- Declaring HOTL as default for an L1 agent - insufficient evidence; rejected
- Declaring HOTL without any revoke triggers - no safety net; flagged
- Trifecta agents declared as HOTL - automatically reverted to HITL by kernel
- Sampling decision traces - defeats audit; not permitted at L2+
- Telemetry emitted only in development - production behavior is what matters
- Telemetry that includes sensitive content - should reference content via opaque IDs, not include raw content
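The declaration-level anti-patterns above lend themselves to a static check at registration time. The following is a minimal sketch, assuming the declaration has been parsed into a dictionary; the top-level `maturity` and `lethal_trifecta` keys are hypothetical stand-ins for values that other surfaces would supply.

```python
def lint_declaration(decl: dict) -> list:
    """Return findings for the HOTL-related anti-patterns."""
    findings = []
    hitl = decl.get("hitl_hotl", {})
    is_hotl = hitl.get("default_mode") == "HOTL"

    if is_hotl and decl.get("maturity") == "L1":
        findings.append("rejected: HOTL default for an L1 agent")
    if is_hotl and not hitl.get("hotl_revoke_triggers"):
        findings.append("flagged: HOTL without revoke triggers")
    if is_hotl and decl.get("lethal_trifecta"):
        findings.append("reverted: trifecta agent declared HOTL")
    return findings
```

The telemetry anti-patterns (development-only emission, raw content in events) depend on runtime behavior and would need checks against emitted data rather than against the declaration.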
Why It Matters
Without the HITL/HOTL boundary, all invocations either require human review (workflow becomes a bottleneck) or run autonomously (governance fails), and gating decisions drift ad hoc per cell. Declared at the agent level with the policy enforced per-invocation by the Butler, the boundary becomes predictable, adjustable, and audited. Without observability, cost is invisible, pattern detection is impossible, audit fails, and capability evolution is blind. Telemetry is not an extra; it is the substrate that makes governance and learning possible.
Boring engineering: the telemetry always emits. The traces are always durable. The patterns are always observable. Predictability is the feature.