KCC 11

Metrics and Measurements

KCC is meaningful only insofar as it is measurable: what to measure, at what layer and phase, with what cadence, and how to interpret the result.

Metrics · Leading indicators · Lagging indicators · Calibration · Cadence

Created 2026-06-08 · v0.4.0

This section specifies what to measure, at which layer and phase, at what cadence, and how to interpret the results. It distinguishes leading indicators (signals that predict future outcomes) from lagging indicators (signals that confirm what has already happened).

11.1 Metrics by Layer

Layer	Representative metrics
Per-agent	success_rate, latency percentiles, cost_per_invocation, confidence_calibration, hitl_escalation_rate, downstream_rejection_rate, workslop_signature_count
Per-capability	cells_using_capability, maturity level, promotion_readiness, breaking_change_rate, issue resolution times
Per-cell	total_cost, cost_per_feature, success_rate, reviewer_load_p95, kernel_contract_violations, bypass_signature_count
Per-organization	capability_reuse_ratio, cross_cell_learning_lag, inspector promotions per month, roi_on_kcc_investment, knowledge_survey_score
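
As an illustration of how the per-agent row might be computed, the sketch below derives three of the listed metrics from a list of invocation records. The record fields (`succeeded`, `cost_usd`, `escalated_to_human`) and the metric definitions (simple ratios over all invocations) are assumptions made for this example; this section names the metrics but does not define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One agent invocation. All field names are hypothetical."""
    succeeded: bool
    cost_usd: float
    escalated_to_human: bool


def per_agent_metrics(invocations: list[Invocation]) -> dict[str, float]:
    """Summarize invocations as simple per-invocation ratios (assumed definitions)."""
    n = len(invocations)
    if n == 0:
        raise ValueError("no invocations to summarize")
    return {
        "success_rate": sum(i.succeeded for i in invocations) / n,
        "cost_per_invocation": sum(i.cost_usd for i in invocations) / n,
        "hitl_escalation_rate": sum(i.escalated_to_human for i in invocations) / n,
    }


# Illustrative data only, not measurements from any real system.
sample = [
    Invocation(True, 0.02, False),
    Invocation(True, 0.04, False),
    Invocation(False, 0.10, True),
    Invocation(True, 0.04, False),
]
metrics = per_agent_metrics(sample)
```

With this sample, three of four invocations succeed and one escalates, so `success_rate` is 0.75 and `hitl_escalation_rate` is 0.25.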

11.2 Metrics by Phase

Different metrics matter at different phases; using the wrong metric for the current phase produces misleading signals. Phase 1 tracks lead-time delta and per-individual productivity. These metrics will plateau, and that plateau is the signal to invest in Phase 2. Phase 2 tracks specs per sprint and token burn rate. Phase 3 tracks cross-cell learning lag, capability reuse ratio, ROI, and knowledge-survey scores: organizational intelligence over time, not per-person productivity. See Phase Model.
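
One way to guard against the phase mismatch described above is to encode the phase-to-metric mapping and flag dashboard metrics that belong to a different phase. The following is a minimal sketch; the snake_case identifiers for Phase 1 and Phase 2 metrics and the `mismatched_metrics` helper are hypothetical names chosen for this example.

```python
# Phase-to-metric mapping taken from the paragraph above.
# The identifier spellings are assumptions for illustration.
PHASE_METRICS: dict[int, frozenset[str]] = {
    1: frozenset({"lead_time_delta", "per_individual_productivity"}),
    2: frozenset({"specs_per_sprint", "token_burn_rate"}),
    3: frozenset({
        "cross_cell_learning_lag",
        "capability_reuse_ratio",
        "roi_on_kcc_investment",
        "knowledge_survey_score",
    }),
}


def mismatched_metrics(phase: int, dashboard: list[str]) -> list[str]:
    """Return dashboard metrics that belong to another phase, in input order.

    Metrics not tied to any phase (for example success_rate) are not flagged.
    """
    current = PHASE_METRICS[phase]
    other_phases = set().union(
        *(names for p, names in PHASE_METRICS.items() if p != phase)
    )
    return [m for m in dashboard if m in other_phases and m not in current]


flagged = mismatched_metrics(
    3, ["capability_reuse_ratio", "per_individual_productivity", "success_rate"]
)
```

Here `flagged` contains only `per_individual_productivity`, the Phase 1 metric that appears on a Phase 3 dashboard.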

11.3 Leading vs. Lagging Indicators

Leading indicator	What it predicts
Reviewer load trends	Review quality degradation
Confidence calibration drift	Trust failures
Bypass signature count	Platform abandonment
Inspector proposal queue length	Pipeline backlog
Context efficiency degradation	Agent quality decline

Leading indicators allow intervention before harm and should be reviewed weekly or monthly. Lagging indicators (success rates, cost per feature, ROI, knowledge-survey scores) confirm whether the system is working and are reviewed monthly or quarterly.
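
To make one leading indicator concrete, the sketch below estimates confidence calibration with expected calibration error (ECE), a common calibration measure, and flags drift when the error rises past a tolerance. Using ECE for `confidence_calibration`, the ten-bin layout, and the 0.05 tolerance are all assumptions for this example, not definitions from this section.

```python
def expected_calibration_error(
    confidences: list[float], outcomes: list[bool], n_bins: int = 10
) -> float:
    """Weighted mean gap between stated confidence and observed accuracy per bin."""
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equally sized, non-empty inputs")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, outcomes):
        # Confidence 1.0 falls into the last bin rather than past it.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(ok for _, ok in bucket) / len(bucket)
        ece += len(bucket) / total * abs(mean_conf - accuracy)
    return ece


def calibration_drifted(
    baseline_ece: float, current_ece: float, tolerance: float = 0.05
) -> bool:
    """True when calibration error grew by more than the (assumed) tolerance."""
    return current_ece - baseline_ece > tolerance


# Illustrative data: the agent always reports 0.9 confidence.
baseline = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
current = expected_calibration_error([0.9] * 10, [True] * 6 + [False] * 4)
drifted = calibration_drifted(baseline, current)
```

In the baseline period the agent is right 9 times out of 10 at 0.9 confidence, so the error is approximately zero. In the current period it is right only 6 times out of 10, the error is about 0.3, and the drift flag is raised before success-rate dashboards would necessarily show a problem.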

11.5 Measurement Cadences

Cadence	What's measured
Real-time	Token Guard signals, Butler decisions, decision traces
Daily	Per-cell cost summaries, reviewer load, failure-rate snapshots
Weekly	Inspector detection report, adoption changes, bypass and workslop review
Monthly	Per-capability health, pipeline velocity, confidence calibration
Quarterly	Phase progression, kernel evolution, L3 review, ROI and compounding metrics

11.6 Metric Anti-Patterns

Measuring activity instead of outcome — invocations per day says nothing about value.
Using individual productivity in Phase 3 — it hides cross-cell learning effects.
Reporting averages without distributions — averages hide tail problems.
Optimizing for cost without quality — a cheap workslop generator is worse than an expensive value generator.
Mistaking acceptance for verification — humans approving things does not mean they are correct.
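
The point about averages hiding tail problems can be shown with a short calculation. The sketch below compares the mean with the median and 95th percentile of per-review durations; the data is invented for illustration, and the nearest-rank percentile method is one of several reasonable choices rather than a method this section prescribes.

```python
import math
from statistics import mean


def percentile_nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical review durations in minutes: most are quick, two are very slow.
review_minutes = [10] * 18 + [120, 180]

avg = mean(review_minutes)
p50 = percentile_nearest_rank(review_minutes, 50)
p95 = percentile_nearest_rank(review_minutes, 95)
```

The mean is 24 minutes and the median is 10, which both look healthy, while the 95th percentile is 120 minutes. A report that carried only the average would miss the reviews that consume most of the reviewer's time.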

Numbers without judgment are noise.

Section 11 specifies the measurements; Intelligence Economics addresses why they matter.