Add custom-forward-backward and forward endpoints for RL custom losses #276
dvmazur wants to merge 1 commit into
Adds four new RL training operation endpoints:
- POST /rl/training-sessions/{session_id}/operations/custom-forward-backward
- GET /rl/training-sessions/{session_id}/operations/custom-forward-backward/{operation_id}
- POST /rl/training-sessions/{session_id}/operations/forward
- GET /rl/training-sessions/{session_id}/operations/forward/{operation_id}
Also adds the corresponding schema definitions:
RL.CustomForwardBackwardBody, RL.CustomForwardBackwardOperation,
RL.CustomForwardBackwardResult, RL.ForwardBody, RL.ForwardOperation,
RL.ForwardResult, RL.TargetLogprobs, RL.TargetLogprobGradients.
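The four routes above share one path pattern, which a small helper can make explicit. This is only a sketch: the function name `operation_path` and the `OPERATION_KINDS` constant are hypothetical and not part of any SDK; only the path templates themselves come from this PR.

```python
from typing import Optional
from urllib.parse import quote

# Operation kinds named in this PR; nothing else is assumed to exist.
OPERATION_KINDS = ("custom-forward-backward", "forward")


def operation_path(session_id: str, kind: str, operation_id: Optional[str] = None) -> str:
    """Build the route used to submit (POST) or poll (GET) an operation.

    Hypothetical helper: only the path templates are taken from the PR.
    """
    if kind not in OPERATION_KINDS:
        raise ValueError(f"unknown operation kind: {kind!r}")
    # Percent-encode the ids so they cannot inject extra path segments.
    path = f"/rl/training-sessions/{quote(session_id, safe='')}/operations/{kind}"
    if operation_id is not None:
        # With an operation id, the path addresses the GET (poll) route.
        path += f"/{quote(operation_id, safe='')}"
    return path
```

Without `operation_id` the helper returns the POST route; with it, the matching GET route.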
Stainless preview builds for togetherai
This PR will update the go, openapi, python, terraform, and typescript SDKs. Edit this comment to update them. They will appear in their respective SDKs' changelogs.
✅ togetherai-openapi
✅ togetherai-go
✅ togetherai-python
✅ togetherai-typescript
✅ togetherai-terraform
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
Summary
- POST /rl/training-sessions/{session_id}/operations/custom-forward-backward: submit a forward-backward pass driven by externally computed log-prob gradients
- GET /rl/training-sessions/{session_id}/operations/custom-forward-backward/{operation_id}: poll status/result
- POST /rl/training-sessions/{session_id}/operations/forward: submit a no-grad forward pass to retrieve per-token log-probabilities
- GET /rl/training-sessions/{session_id}/operations/forward/{operation_id}: poll status/result
- New schemas: RL.CustomForwardBackwardBody, RL.CustomForwardBackwardOperation, RL.CustomForwardBackwardResult, RL.ForwardBody, RL.ForwardOperation, RL.ForwardResult, RL.TargetLogprobs, RL.TargetLogprobGradients
- Follows the RL.* naming convention and OpenAPI 3.1 style of the file

Companion PR in together-shaping: https://github.com/togethercomputer/together-shaping/pull/3333
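The submit-then-poll pattern described in the summary could be driven by a loop like the one below. This is a minimal sketch under stated assumptions: the `status` field name and the terminal values `completed` and `failed` are placeholders, since the operation schema is not spelled out here, and `fetch` stands in for whatever client code performs the GET on the poll route.

```python
import time
from typing import Any, Callable, Dict, Iterable

# ASSUMPTION: the status field name and its terminal values are placeholders;
# the PR description does not spell out the operation schema.
TERMINAL_STATUSES = frozenset({"completed", "failed"})


def poll_operation(
    fetch: Callable[[], Dict[str, Any]],
    interval_s: float = 1.0,
    max_attempts: int = 60,
    status_key: str = "status",
    terminal: Iterable[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the operation reports a terminal status.

    fetch is any callable that performs the GET on the poll route and
    returns the decoded JSON body as a dict.
    """
    terminal_set = set(terminal)
    for attempt in range(max_attempts):
        operation = fetch()
        if operation.get(status_key) in terminal_set:
            return operation
        if attempt < max_attempts - 1:
            # Wait between polls, but not after the final attempt.
            sleep(interval_s)
    raise TimeoutError(f"operation not finished after {max_attempts} polls")
```

Passing `fetch` and `sleep` as callables keeps the loop independent of any particular HTTP client and makes it easy to exercise without a network.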
Test plan
- $ref targets resolve (no dangling references)
- RL.DType (referenced by RL.TargetLogprobGradients) already exists in the file: it does, at the RL.DType schema definition

🤖 Generated with Claude Code
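The dangling-reference check in the test plan can be automated roughly as follows. This is a sketch, not the check actually used for this PR: it assumes the spec has already been parsed into a plain dict, it only follows local references of the form `#/...` through nested objects, and it reports external references as unresolved. The function names are hypothetical.

```python
from typing import Any, Iterator, List


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every string-valued "$ref" found anywhere in a parsed spec."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolves(spec: dict, ref: str) -> bool:
    """Return True if a local reference ("#/...") points at an existing node."""
    if not ref.startswith("#/"):
        return False  # external refs are out of scope for this sketch
    node: Any = spec
    for token in ref[2:].split("/"):
        # JSON Pointer (RFC 6901) unescaping: "~1" is "/", "~0" is "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict) and token in node:
            node = node[token]
        else:
            return False  # array indices are not handled here
    return True


def dangling_refs(spec: dict) -> List[str]:
    """Return the sorted, de-duplicated list of references that do not resolve."""
    return sorted({ref for ref in iter_refs(spec) if not resolves(spec, ref)})
```

An empty list from `dangling_refs` means every local `$ref`, including the one from RL.TargetLogprobGradients to RL.DType, points at an existing definition.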