Codex/update readme experiments#94

Open
fishsure wants to merge 2 commits into main from codex/update-readme-experiments

Conversation

@fishsure
Collaborator

No description provided.

@gemini-code-assist (bot) left a comment

Code Review

This pull request significantly expands the Agent-R1 framework by adding support for more algorithms and benchmarks, including StepPO, RLOO, REINFORCE++ Baseline, and GiGPO. It introduces complete task recipes, data preparation scripts, and text-based environments for ALFWorld, HotpotQA, Paper Search, and WebShop. The core PPO trainer and advantage estimators have also been refactored to support these multi-step agent tasks. The review flags a potential TypeError in the _to_hashable helper function in core_algos.py when it encounters None values, and suggests a safer type check that treats None as a hashable scalar.


def _to_hashable(value):
    """Convert common observation objects to hashable keys for GiGPO grouping."""
    if isinstance(value, (int, float, str, bool)):

medium

The _to_hashable function does not handle None values, which will cause a TypeError if any observation field is None. It is safer to allow None as a hashable scalar value.

Suggested change
-    if isinstance(value, (int, float, str, bool)):
+    if value is None or isinstance(value, (int, float, str, bool)):
References
  1. Be careful not to confuse variables with similar names or purposes. Verify the type and origin of a variable before assuming its structure (e.g., list vs. scalar).
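
To illustrate the suggested check, here is a minimal sketch of how a None-tolerant helper might be used to group steps by observation. The body of _to_hashable is not shown in this review beyond its first branch, so everything after the scalar check (the dict, sequence, and repr fallbacks) is an assumption, and the names to_hashable_sketch and group_by_observation are hypothetical rather than part of core_algos.py.

```python
from collections import defaultdict


def to_hashable_sketch(value):
    """Hypothetical stand-in for _to_hashable; only the first branch mirrors the PR."""
    # None and plain scalars are already hashable, so return them unchanged.
    if value is None or isinstance(value, (int, float, str, bool)):
        return value
    # Assumption: mappings become a tuple of (key, converted value) pairs,
    # sorted by key so that insertion order does not change the result.
    if isinstance(value, dict):
        items = ((str(k), to_hashable_sketch(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: kv[0]))
    # Assumption: sequences become tuples of converted items.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable_sketch(v) for v in value)
    # Assumption: any other object falls back to its repr string.
    return repr(value)


def group_by_observation(observations):
    """Map each distinct observation key to the indices where it occurs."""
    groups = defaultdict(list)
    for idx, obs in enumerate(observations):
        groups[to_hashable_sketch(obs)].append(idx)
    return dict(groups)


observations = [
    {"room": "kitchen", "held": None},
    {"held": None, "room": "kitchen"},
    {"room": "hall", "held": ["key"]},
]
groups = group_by_observation(observations)
```

In this sketch the first two observations share a key even though one field is None and the dict ordering differs, so they land in the same group. Whether the real helper should treat None this way depends on how the environments encode missing fields.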
