Fix MessageWindowChatMemory to drop orphaned ToolResponseMessages after window eviction#6029
Open
suryateja-g13 wants to merge 3 commits into
Open
Conversation
…ow eviction MessageWindowChatMemory used naïve count-based head trimming that could cut between an AssistantMessage containing tool_use blocks and its matching ToolResponseMessage. The surviving ToolResponseMessage at the head of the kept window caused providers (e.g. Anthropic) to reject the next request with a 400 because the tool_result had no matching tool_use. Fix: after the standard trim, scan the head of the non-system portion and remove any leading ToolResponseMessage instances whose paired tool_use was evicted. The same principle applies to OpenAI tool messages. Closes spring-projectsgh-5940 Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>
0f76f78 to
34e4169
Compare
…ageWindowChatMemory - Fix hasNewSystemMessage detection to compare by text only, not full AbstractMessage.equals() which includes metadata. Persistence stores (Cassandra, JDBC, MongoDB) enrich messages with timestamps/IDs on save; the old metadata-based comparison falsely detected these reloaded messages as "new" system messages, silently wiping all system context on every resumed conversation. - Document FIFO-only assumption on dropOrphanedToolResponses: the head scan is only correct under FIFO eviction; future non-FIFO strategies would require a full-list scan. - Add test: loneOrphanedToolResponseWithMaxMessagesOneResultsInEmptyHistory - Add test: systemMessageMetadataDifferenceDoesNotTriggerFalseNewSystemMessageDetection Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>
Contributor
Author
|
Flagging for maintainer attention — the original issue (#5940) has had active discussion recently that reinforces why this fix is needed. Key findings from the thread:
This fix is the correct place to handle it — in |
…rphan test - Revert SystemMessage text-based equality change — that is a separate bug unrelated to spring-projectsGH-5940 and will be raised as an independent PR to keep this fix focused - Add test multipleConsecutiveOrphanedToolResponsesAreAllDropped: verifies that when a multi-tool-call turn is evicted, all consecutive leading ToolResponseMessages are dropped, not just the first Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
`MessageWindowChatMemory` truncates the message list from the head when the window size is exceeded. However, it does not account for the structural constraint that every `ToolResponseMessage` must be preceded by an `AssistantMessage` containing the matching tool-use call.
When the window evicts exactly enough messages to remove an `AssistantMessage` that contained a tool-use block, the corresponding `ToolResponseMessage` is left at the start of the retained list — as an orphan with no preceding tool call. Sending this to an LLM causes an API error (e.g. Anthropic returns `400: tool_use block must be followed by tool_result`).
Reproduction scenario
```
Window limit = 3, per-add saves (as done by MessageChatMemoryAdvisor):
add([UserMsg1]) → [UserMsg1]
add([AssistantMsg(tool)]) → [UserMsg1, AssistantMsg(tool)]
add([ToolResponseMsg]) → [UserMsg1, AssistantMsg(tool), ToolResponseMsg] ← at limit
add([AssistantMsg_final]) → trim 1 → [AssistantMsg(tool), ToolResponseMsg, AssistantMsg_final]
Next turn:
add([UserMsg2]) → trim 1 → [ToolResponseMsg, AssistantMsg_final, UserMsg2]
^^^ orphaned — no preceding tool_use → Anthropic 400
```
Solution
After the standard head-truncation, scan the retained list for any `ToolResponseMessage` entries that appear before the first non-system, non-tool-response message. Drop all of them — they are orphans whose `AssistantMessage(tool_use)` partner was evicted.
```java
private static List dropOrphanedToolResponses(List messages) {
// skip leading SystemMessages (always preserved)
// drop all consecutive ToolResponseMessages at the head
// (their AssistantMessage partner was evicted by the window trim)
}
```
Changes
Tests
All existing 18 tests continue to pass. Four new tests added:
Known limitation
`dropOrphanedToolResponses` only catches orphans created by FIFO head-first eviction — the only eviction strategy currently implemented. If a future strategy evicts an `AssistantMessage(tool_call)` from a non-leading position, its `ToolResponseMessage` would not be detected.
After orphan drop, the conversation may start with an `AssistantMessage` (clean text, no tool calls) if the user message that preceded it was also evicted in the same cycle. This is a separate concern — the specific error reported in #5940 (`tool_result` with no matching `tool_use`) is fully addressed here.
Closes #5940