Title
Background
#345 (comment)
#411 (comment)
https://github.com/CodeForPhilly/balancer-main/tree/develop/evaluation
Current State
Acceptance Criteria
Approach
Start with error analysis, not infrastructure. Spend 30 minutes manually reviewing 20-50 LLM outputs whenever you make significant changes. Use one domain expert who understands your users as your quality decision maker (a “benevolent dictator”).
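As a rough sketch of the manual review step above, the snippet below draws a reproducible random batch of LLM outputs for the domain expert to read and then tallies the failure labels the reviewer assigns. The function names (`sample_for_review`, `tally_failures`), the default batch size of 30, and the free-text label format are all hypothetical illustrations, not part of the existing evaluation code.

```python
import random
from collections import Counter


def sample_for_review(outputs, k=30, seed=0):
    """Pick up to k outputs at random for manual review.

    A fixed seed keeps the batch reproducible, so two reviewers
    can look at the same items. (k and seed are assumed defaults.)
    """
    rng = random.Random(seed)
    k = min(k, len(outputs))  # never ask for more items than exist
    return rng.sample(outputs, k)


def tally_failures(labels):
    """Count reviewer-assigned failure labels, most common first.

    Empty labels (outputs judged acceptable) are skipped.
    """
    return Counter(label for label in labels if label).most_common()
```

A possible use: call `sample_for_review` on the outputs collected after a significant change, have the reviewer write one short failure label per item (or leave it empty), and pass those labels to `tally_failures` to see which failure modes dominate before building any further tooling.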
References
Risks and Rollback
Screenshots / Recordings
Related PR