Aggregate metrics focused on risk signals, uncertainty, and divergence
Total runs
Runs with judgements
02/04/2026 20:24
All runs
18 of 18 runs classified as ambiguous
Indicates uncertainty or insufficient data to determine alignment
0 of 18 runs diverged from expectations
Agent behavior deviating from requirements or expectations
0 of 18 runs flagged with critical issues
Runs with at least one critical flag indicating unsafe advice
Total: 0 missing signals
Average missing expected signals per judged run
0 of 18 runs showed overconfidence
Runs where uncertainty was expected but not demonstrated
0 of 18 runs aligned with expectations
Runs classified as aligned (for reference only)
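
As a rough illustration of how counts like these could be tallied from per-run judgements, here is a minimal sketch. The `Run` record, its field names, and the label strings ("ambiguous", "diverged", "aligned") are hypothetical and chosen only for this example; the dashboard's actual data model and computation are not shown in the source.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical per-run judgement record; field names are assumptions.
    classification: str          # e.g. "ambiguous", "diverged", "aligned"
    critical_flags: int = 0      # number of critical flags raised on the run
    missing_signals: int = 0     # expected signals that were not observed
    overconfident: bool = False  # uncertainty expected but not demonstrated


def summarize(runs):
    """Aggregate per-run judgements into dashboard-style counts."""
    n = len(runs)
    by_class = Counter(r.classification for r in runs)
    total_missing = sum(r.missing_signals for r in runs)
    return {
        "total_runs": n,
        "ambiguous": by_class["ambiguous"],
        "diverged": by_class["diverged"],
        "aligned": by_class["aligned"],
        "critical": sum(1 for r in runs if r.critical_flags > 0),
        "overconfident": sum(1 for r in runs if r.overconfident),
        "total_missing_signals": total_missing,
        # Average per judged run; guard against an empty run set.
        "avg_missing_signals": total_missing / n if n else 0.0,
    }


# Reproduce the shape of the summary above: 18 runs, all ambiguous.
runs = [Run("ambiguous") for _ in range(18)]
summary = summarize(runs)
```

With this input, every run falls into the ambiguous bucket and all other counts are zero, matching the figures reported above.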