Hi,
I’m currently testing Qwen2.5-Coder-32B-Instruct and trying to compute pass@10, but the evaluation stage appears to run almost indefinitely, particularly during the test execution phase.
In addition, I’m hitting out-of-memory (OOM) errors even on fairly large CPU clusters. From the behavior, it looks as though worker processes may be recursively spawning more workers, which would explain both the extremely long runtime and the memory blow-up.
I’m wondering:
- Is this a known issue with the evaluator?
- Are there recommended settings for pass@10 evaluation to avoid runaway execution?
- Should the number of workers be restricted manually, or is there a safer evaluation mode for large-scale runs?
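To make the last question concrete, this is roughly what I mean by restricting workers manually. It is only a minimal sketch and not the evaluator's actual code: `MAX_WORKERS`, `TIMEOUT_SECONDS`, `run_sample`, and `evaluate` are hypothetical names, and it assumes each sample is a self-contained Python script (solution plus tests) that exits non-zero on failure.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8        # hypothetical cap; tune to the machine
TIMEOUT_SECONDS = 10   # hypothetical per-sample time limit


def run_sample(code: str) -> bool:
    """Run one generated solution plus its tests in a fresh interpreter."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=TIMEOUT_SECONDS,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False  # treat a hang as a failed sample
    return result.returncode == 0


def evaluate(samples: list[str]) -> list[bool]:
    # The threads only wait on subprocesses, so a fixed-size thread pool
    # bounds how many child interpreters run at the same time.
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        return list(pool.map(run_sample, samples))
```

With something like this, at most `MAX_WORKERS` test processes exist at once and a hanging sample is cut off after `TIMEOUT_SECONDS`. It does not limit per-process memory, and it would not stop a sample that itself spawns further processes.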
Any advice would be appreciated. Thanks.