Metrics and Measurements
KCC is meaningful only insofar as it is measurable. This section specifies what to measure, at which layer and phase, at what cadence, and how to interpret the results. It distinguishes leading indicators (signals that predict future outcomes) from lagging indicators (signals that confirm what has already happened).
11.1 Metrics by Layer
| Layer | Representative metrics |
|---|---|
| Per-agent | success_rate, latency percentiles, cost_per_invocation, confidence_calibration, hitl_escalation_rate, downstream_rejection_rate, workslop_signature_count |
| Per-capability | cells_using_capability, maturity level, promotion_readiness, breaking_change_rate, issue resolution times |
| Per-cell | total_cost, cost_per_feature, success_rate, reviewer_load_p95, kernel_contract_violations, bypass_signature_count |
| Per-organization | capability_reuse_ratio, cross_cell_learning_lag, inspector promotions per month, roi_on_kcc_investment, knowledge_survey_score |
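
The table names success_rate and confidence_calibration without defining how they are computed. As a minimal sketch, assuming each agent invocation is logged with a self-reported confidence and a pass/fail outcome, calibration could be quantified with expected calibration error (ECE), a common general-purpose measure. The `Invocation` record, its fields, and the choice of ECE are assumptions for illustration, not definitions from this specification.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical log record for one agent invocation."""
    confidence: float  # agent's self-reported confidence, assumed in [0, 1]
    succeeded: bool    # whether the output was ultimately judged correct


def success_rate(invocations):
    """Fraction of invocations that succeeded (0.0 for an empty log)."""
    if not invocations:
        return 0.0
    return sum(i.succeeded for i in invocations) / len(invocations)


def expected_calibration_error(invocations, n_bins=5):
    """Weighted average gap between stated confidence and observed accuracy.

    Invocations are grouped into equal-width confidence bins; each bin
    contributes |mean confidence - accuracy|, weighted by its share of the log.
    """
    if not invocations:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for inv in invocations:
        # Clamp so that confidence == 1.0 lands in the last bin.
        idx = min(int(inv.confidence * n_bins), n_bins - 1)
        bins[idx].append(inv)
    total = len(invocations)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(i.confidence for i in bucket) / len(bucket)
        accuracy = sum(i.succeeded for i in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

Under these assumptions, a value near zero means the agent's stated confidence tracks its actual success rate, and a rising value over time would correspond to the calibration drift discussed later in this section.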
11.2 Metrics by Phase
Different metrics matter at different phases; using the wrong metric for the current phase produces misleading signals. Phase 1 tracks lead-time delta and per-individual productivity. These will plateau, and that plateau is the signal to invest in Phase 2. Phase 2 tracks specs per sprint and token burn rate. Phase 3 tracks cross-cell learning lag, capability reuse ratio, ROI, and knowledge-survey scores: it measures organizational intelligence over time, not per-person productivity. See Phase Model.
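
The phase-to-metric pairing above can be made checkable. The following is a sketch only: the snake_case identifiers for Phase 1 and Phase 2 metrics are hypothetical names derived from the prose, and `mismatched_metrics` is an illustrative helper, not part of any KCC tooling.

```python
# Metrics appropriate to each phase, as described in the paragraph above.
# Identifier spellings are assumptions for illustration.
PHASE_METRICS = {
    1: {"lead_time_delta", "per_individual_productivity"},
    2: {"specs_per_sprint", "token_burn_rate"},
    3: {
        "cross_cell_learning_lag",
        "capability_reuse_ratio",
        "roi_on_kcc_investment",
        "knowledge_survey_score",
    },
}


def mismatched_metrics(current_phase, reported):
    """Return reported metrics that do not belong to the current phase."""
    expected = PHASE_METRICS[current_phase]
    return sorted(set(reported) - expected)
```

For example, a Phase 3 dashboard that still reports `per_individual_productivity` would have that metric flagged, which matches the anti-pattern of using individual productivity in Phase 3.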
11.3 Leading vs. Lagging Indicators
| Leading indicator | What it predicts |
|---|---|
| Reviewer load trends | Review quality degradation |
| Confidence calibration drift | Trust failures |
| Bypass signature count | Platform abandonment |
| Inspector proposal queue length | Pipeline backlog |
| Context efficiency degradation | Agent quality decline |
Leading indicators allow intervention before harm and should be reviewed weekly or monthly. Lagging indicators (success rates, cost per feature, ROI, knowledge-survey scores) confirm whether the system is working and are reviewed monthly or quarterly.
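
Because a leading indicator is useful for its trend rather than its current value, one simple way to review it is to fit a line through recent observations and look at the slope. The sketch below uses an ordinary least-squares slope over equally spaced samples; the function names and the idea of a fixed slope threshold are assumptions for illustration, and the threshold itself would need to be chosen per indicator.

```python
def weekly_trend(values):
    """Least-squares slope of values sampled at equal intervals (e.g. weekly).

    A positive slope means the indicator is rising over the window.
    Returns 0.0 when there are fewer than two observations.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2  # mean of the sample indices 0, 1, ..., n-1
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def needs_attention(values, slope_threshold):
    """Flag an indicator whose upward trend exceeds the chosen threshold."""
    return weekly_trend(values) > slope_threshold
```

Applied to four weeks of reviewer load such as `[10, 12, 14, 16]`, the slope is 2 units per week. Whether that warrants intervention is a judgment call that the threshold only approximates.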
11.5 Measurement Cadences
| Cadence | What's measured |
|---|---|
| Real-time | Token Guard signals, Butler decisions, decision traces |
| Daily | Per-cell cost summaries, reviewer load, failure-rate snapshots |
| Weekly | Inspector detection report, adoption changes, bypass and workslop review |
| Monthly | Per-capability health, pipeline velocity, confidence calibration |
| Quarterly | Phase progression, kernel evolution, L3 review, ROI and compounding metrics |
11.6 Metric Anti-Patterns
- Measuring activity instead of outcome — invocations per day says nothing about value.
- Using individual productivity in Phase 3 — it hides cross-cell learning effects.
- Reporting averages without distributions — averages hide tail problems.
- Optimizing for cost without quality — a cheap workslop generator is worse than an expensive value generator.
- Mistaking acceptance for verification — humans approving things does not mean they are correct.
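
The point about averages hiding tail problems can be shown with a small worked example. The data below is invented for illustration, and `p95` uses the nearest-rank percentile definition, which is one of several conventions in use.

```python
import statistics


def p95(values):
    """95th percentile by the nearest-rank method (values must be non-empty)."""
    ordered = sorted(values)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical daily review counts for 20 reviewers: most are lightly loaded,
# two are overloaded.
reviewer_load = [2] * 18 + [40] * 2

mean_load = statistics.mean(reviewer_load)  # 5.8
tail_load = p95(reviewer_load)              # 40
```

A report showing only the mean of 5.8 suggests a comfortable workload, while the 95th percentile of 40 reveals the overloaded reviewers. This is why per-cell metrics such as reviewer_load_p95 are expressed as percentiles.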
Numbers without judgment are noise.
Section 11 specifies the measurements; Intelligence Economics addresses why they matter.