Codex/update readme experiments#94
Code Review
This pull request expands the Agent-R1 framework with additional algorithms and benchmarks. It adds support for StepPO, RLOO, REINFORCE++ Baseline, and GiGPO, and introduces complete task recipes, data preparation scripts, and text-based environments for ALFWorld, HotpotQA, Paper Search, and WebShop. The core PPO trainer and the advantage estimators have also been refactored to support these multi-step agent tasks. The review raises one issue: the _to_hashable helper in core_algos.py may raise a TypeError when it encounters a None value, and the suggested fix is to extend the scalar type check so that None is treated as a hashable scalar.
def _to_hashable(value):
    """Convert common observation objects to hashable keys for GiGPO grouping."""
    if isinstance(value, (int, float, str, bool)):
The _to_hashable function does not handle None values, which will cause a TypeError if any observation field is None. It is safer to allow None as a hashable scalar value.
Suggested change:
-    if isinstance(value, (int, float, str, bool)):
+    if value is None or isinstance(value, (int, float, str, bool)):
References
- Be careful not to confuse variables with similar names or purposes. Verify the type and origin of a variable before assuming its structure (e.g., list vs. scalar).
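
To make the suggestion concrete, here is a minimal sketch of how the guarded scalar check might behave. Only the first branch mirrors the suggested change; the remaining branches (dict, list/tuple, and the final TypeError) and the name `to_hashable_sketch` are assumptions made for illustration, since the rest of `_to_hashable` in core_algos.py is not shown in this review.

```python
def to_hashable_sketch(value):
    """Hypothetical stand-in for _to_hashable; not the actual Agent-R1 code."""
    # Suggested change: treat None as a hashable scalar alongside the others.
    if value is None or isinstance(value, (int, float, str, bool)):
        return value
    # Assumed behavior: dicts become key-sorted tuples of (key, value) pairs.
    if isinstance(value, dict):
        items = ((str(k), to_hashable_sketch(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: kv[0]))
    # Assumed behavior: sequences become tuples of hashable elements.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable_sketch(v) for v in value)
    # Assumed fallback: reject anything the helper does not know how to convert.
    raise TypeError(f"Unsupported observation type: {type(value).__name__}")


# Two observations with a None field now map to the same grouping key.
obs_a = {"room": "kitchen", "held_item": None}
obs_b = {"held_item": None, "room": "kitchen"}
key_a = to_hashable_sketch(obs_a)
key_b = to_hashable_sketch(obs_b)
groups = {key_a: ["step-0"]}
groups.setdefault(key_b, []).append("step-1")
```

Under these assumptions, both observations land in one group because their keys compare equal and hash identically. Without the `value is None` guard, the same inputs would fall through to the final branch of this sketch and raise a TypeError.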