Verification and Environments
Two honest boundaries: functional verification is an unsolved problem that KCC does not pretend to close, and host-mutating actions pass through a gate that cannot be silenced.
10.6 — Functional Verification
Decision Trace (Surface 9) and Confidence (Surface 8) make agent behavior auditable and replayable. They do not, by themselves, prove that the output is functionally correct. A green test suite is not strong evidence of correctness when the AI also wrote the tests.
Functional verification of agent output remains an unsolved problem.
Honest framing: KCC enables verification infrastructure (replay-by-input-hash, confidence calibration, trace audit) but does not specify the verification methodology. Cells are responsible for their own functional verification.
| Method | What It Provides |
|---|---|
| Golden datasets | Curated examples with known-correct outputs; regression detection |
| Property tests | Assertions that must hold across many inputs; behavioral invariants |
| Differential testing | Comparing new agent versions against previous on the same inputs |
| Sampled human-in-loop checks | Random verification at defined rates |
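The four methods in the table can be sketched in a few lines each. This is a minimal illustration, not a KCC interface: the `Agent` type, the function names, and the hash-based sampling scheme are all assumptions made for the example, and a real cell would substitute its own domain-specific comparisons.

```python
import hashlib
from typing import Callable, Iterable

# Hypothetical: an agent version is modeled as a function from input text to output text.
Agent = Callable[[str], str]


def golden_failures(agent: Agent, golden: dict[str, str]) -> list[str]:
    """Golden dataset: return inputs whose output differs from the curated answer."""
    return [x for x, expected in golden.items() if agent(x) != expected]


def property_violations(
    agent: Agent, inputs: Iterable[str], invariant: Callable[[str, str], bool]
) -> list[str]:
    """Property test: return inputs for which invariant(input, output) does not hold."""
    return [x for x in inputs if not invariant(x, agent(x))]


def differential(old: Agent, new: Agent, inputs: Iterable[str]) -> list[str]:
    """Differential testing: return inputs on which two agent versions disagree."""
    return [x for x in inputs if old(x) != new(x)]


def selected_for_review(input_text: str, rate: float) -> bool:
    """Sampled human check: deterministic selection at a fixed rate (assumed scheme)."""
    digest = hashlib.sha256(input_text.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


# Toy usage: the "new" version strips whitespace, which the checks surface as a change.
old_agent: Agent = str.upper
new_agent: Agent = lambda s: s.upper().strip()
golden = {"abc": "ABC", " hi ": " HI "}

print(golden_failures(new_agent, golden))
print(differential(old_agent, new_agent, golden))
print(property_violations(new_agent, golden, lambda x, y: len(x) == len(y)))
```

None of these checks proves correctness. Each one only narrows where an error can hide, which is why the table lists them as complementary methods rather than alternatives.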
Why functional verification is hard: the test itself is suspect when the AI wrote both code and tests; correctness is domain-specific; verification at scale requires verification of verification; and some things can only be discovered, not verified. One practical lever is the Architecture Critic: a producer/critic separation that applies an adversarial second opinion at design time, when the cost of a wrong decision is lowest.
Auditability is not correctness. KCC provides the first. The second is your job, in your domain, with methods appropriate to your risks.
10.7 — The Environment Mutation Boundary
Reading the environment is safe and can be unsupervised. Changing the environment (installing a compiler, package manager, or runtime) is a system-changing act that must not ride along silently inside an autonomous run. An agent may freely inspect its environment and suggest install commands, but it may not install anything on its own authority. A proposed install passes through a gate with three possible outcomes:
| Outcome | Meaning |
|---|---|
| install | The human authorizes the install to run now |
| human-install | The human will install out of band; the agent waits |
| defer | The step is postponed; the build/verify step is marked deferred, not silently re-stacked |
Observe freely; mutate only through a gate that never goes silent.
Two rules make the boundary hold: the gate is never silent (it fires even under the most permissive autonomy envelope), and every decision carries provenance in the decision trace. Detection is reversible; installation is not. At scale, silent environment mutation is how an 'autonomous coding' program becomes an 'unexplained machine state' program.
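The two rules can be sketched as a small gate object. This is an illustrative sketch under stated assumptions, not the KCC implementation: `MutationGate`, `GateDecision`, `ask_human`, `next_step`, and the `autonomy_level` parameter are hypothetical names, and the in-memory `trace` list merely stands in for the decision trace. The sketch never runs an install itself; it only records what the human decided.

```python
import shutil
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Outcome(Enum):
    INSTALL = "install"              # the human authorizes the install to run now
    HUMAN_INSTALL = "human-install"  # the human installs out of band; the agent waits
    DEFER = "defer"                  # the step is postponed and marked deferred


@dataclass
class GateDecision:
    command: str      # the install command the agent suggested
    outcome: Outcome  # what the human chose
    decided_by: str   # provenance: who made the decision


@dataclass
class MutationGate:
    # Hypothetical callback that asks a human and returns (outcome, who decided).
    ask_human: Callable[[str], tuple[Outcome, str]]
    trace: list[GateDecision] = field(default_factory=list)

    def request_install(self, command: str, autonomy_level: str) -> GateDecision:
        # Rule 1: autonomy_level is deliberately ignored, so the gate fires
        # even under the most permissive envelope.
        outcome, who = self.ask_human(command)
        decision = GateDecision(command, outcome, who)
        # Rule 2: every decision is recorded with provenance.
        self.trace.append(decision)
        return decision


def tool_present(name: str) -> bool:
    """Detection is read-only: look the tool up on PATH without changing anything."""
    return shutil.which(name) is not None


def next_step(decision: GateDecision) -> str:
    """Map a gate outcome to what the agent does next (labels are illustrative)."""
    return {
        Outcome.INSTALL: "run-install",
        Outcome.HUMAN_INSTALL: "wait-for-human",
        Outcome.DEFER: "mark-deferred",
    }[decision.outcome]
```

The design point is that `request_install` has no code path that skips `ask_human`. Making the gate impossible to bypass in the type of the API, rather than by configuration, is one way to keep it from going silent.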
An agent may read its world freely. Changing it is a decision a human signs.