Codex/update readme experiments by fishsure · Pull Request #94 · AgentR1/Agent-R1

fishsure · 2026-05-26T08:17:13Z

No description provided.

gemini-code-assist

Code Review

This pull request significantly expands the Agent-R1 framework by adding support for broader algorithms and benchmarks, including StepPO, RLOO, REINFORCE++ Baseline, and GiGPO. It introduces complete task recipes, data preparation scripts, and text-based environments for ALFWorld, HotpotQA, Paper Search, and WebShop. Additionally, the core PPO trainer and advantage estimators have been refactored to support these multi-step agent tasks. Feedback on the changes highlights a potential TypeError in the _to_hashable helper function within core_algos.py when encountering None values, suggesting a safer type check to handle them as hashable scalars.

gemini-code-assist · 2026-05-26T08:20:05Z


+def _to_hashable(value):
+    """Convert common observation objects to hashable keys for GiGPO grouping."""
+    if isinstance(value, (int, float, str, bool)):


The _to_hashable function does not handle None values, which will cause a TypeError if any observation field is None. It is safer to allow None as a hashable scalar value.

Suggested change

-    if isinstance(value, (int, float, str, bool)):
+    if value is None or isinstance(value, (int, float, str, bool)):

References

Be careful not to confuse variables with similar names or purposes. Verify the type and origin of a variable before assuming its structure (e.g., list vs. scalar).
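
To make the suggestion above concrete, here is a minimal sketch of how a None-aware hashing helper could behave when used for grouping. The diff shows only the first lines of _to_hashable, so everything past the scalar check is an assumption: the name to_hashable_sketch, the container handling, the TypeError fallback, and the sample observations are all hypothetical and are not taken from core_algos.py.

```python
def to_hashable_sketch(value):
    """Hypothetical stand-in for _to_hashable; the real body is not shown in the diff."""
    # None and plain scalars are already hashable, so return them unchanged.
    if value is None or isinstance(value, (int, float, str, bool)):
        return value
    # Assumed container handling: convert sequences to tuples recursively.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable_sketch(v) for v in value)
    if isinstance(value, dict):
        # Sort on the string form of each key so insertion order does not matter.
        items = ((str(k), to_hashable_sketch(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: kv[0]))
    # Assumed fallback: reject unsupported types explicitly.
    raise TypeError(f"Unsupported observation type: {type(value).__name__}")


# Two observations with the same content, a None field, and different key order.
obs_a = {"room": "kitchen", "holding": None, "visited": [1, 2]}
obs_b = {"visited": [1, 2], "holding": None, "room": "kitchen"}

key_a = to_hashable_sketch(obs_a)
key_b = to_hashable_sketch(obs_b)

# Group hypothetical trajectory ids by observation key.
groups = {key_a: ["traj-0"]}
groups.setdefault(key_b, []).append("traj-1")
```

In this sketch both observations map to the same key, so the two trajectories land in one group, and the None field no longer reaches the fallback branch. Note that bool is a subclass of int in Python, so True and 1 hash and compare equal and would share a group key under either version of the check.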

fishsure added 2 commits May 26, 2026 11:49

Add StepPO recipes and dataset docs

6ede22b

docs: update Agent-R1 README and experiments

0e3cd5e

gemini-code-assist Bot reviewed May 26, 2026


0russwest0 force-pushed the main branch from 6be3ae1 to 898c36d May 28, 2026 09:42

fishsure wants to merge 2 commits into main from codex/update-readme-experiments

1 participant
