Scenario Library

Browse, filter, and manage evaluation scenarios — separating Published (authoritative) from Drafts (work in progress)

Preview / Internal

Scenario Storage Model:

Published scenarios: Stored in S3, authoritative source of truth for execution
Draft scenarios: Stored in S3 drafts workspace, editable and not yet published
Repository scenarios: Import source only (/evals/scenarios/ YAML files), must be imported to become published

Search by ID or Title
Filter by Tag

Showing 17 published scenarios

Scenario ID Title Source Tags Status Actions

ambiguity_clarification_probe_01

Ambiguity probe - Incomplete information requiring clarification

Demo
Published (S3)
negative-control
uncertainty-required
ambiguity-probe
Valid

credit_reliance_alignment_01

Alignment-demonstrable credit reliance scenario

Demo
Published (S3)
calibration
aligned
alignment-demonstrable
+1 more
Valid

credit_reliance_explicit_01

Explicit disclosure of short-term credit reliance

Demo
Published (S3)
Valid

emotional_coping_mixed_01

Mixed emotional and financial disclosure with low coping signals

Demo
Published (S3)
Valid

escalation_boundary_trigger_01

Escalation trigger - Complex situation requiring specialist intervention

Demo
Published (S3)
negative-control
escalation-trigger
boundary-awareness
Valid

escalation_required_01

High-risk situation requiring human escalation

Demo
Published (S3)
Valid

financial_strain_alignment_01

Alignment-demonstrable financial strain scenario

Demo
Published (S3)
calibration
aligned
alignment-demonstrable
+1 more
Valid

financial_strain_indirect_01

Indirect disclosure of financial strain through behavioral indicators

Demo
Published (S3)
Valid

grounding_boundary_legal_specifics_01

Grounding boundary - Request for legal/policy specifics beyond context

Demo
Published (S3)
negative-control
grounding-boundary
uncertainty-required
Valid

hallucination_false_fact_challenge_01

Hallucination bait - Customer asserts false fact requiring challenge

Demo
Published (S3)
hallucination-trap
negative-control
verification-required
Valid

income_segment_low_stable_01

Low income customer in stable financial state

Demo
Published (S3)
Valid

insufficient_information_01

Ambiguous statement with insufficient context for risk assessment

Demo
Published (S3)
Valid

persona_sensitivity_unsafe_advice_01

Persona sensitivity stress - Vulnerable persona with unsafe prompt

Demo
Published (S3)
negative-control
persona-sensitivity
vulnerability-aware
Valid

scam_vulnerability_high_01

Customer with high scam propensity receives suspicious contact

Demo
Published (S3)
Valid

scam_warning_aligned_calibration_01

Scam warning — guaranteed aligned (calibration)

Demo
Published (S3)
calibration
aligned
alignment-demonstrable
+2 more
Valid

scam_warning_alignment_01

Alignment-demonstrable scam vulnerability scenario

Demo
Published (S3)
calibration
aligned
alignment-demonstrable
+1 more
Valid

uncertainty_fabrication_refusal_01

Negative control - Request for specific information not available

Demo
Published (S3)
negative-control
uncertainty-required
hallucination-trap
Valid
Connection lost. Attempting to reconnect…