Fix MessageWindowChatMemory to drop orphaned ToolResponseMessages after window eviction #6029

Open
suryateja-g13 wants to merge 3 commits into spring-projects:main from suryateja-g13:gh-5940-message-window-tool-pair-awareness

suryateja-g13 commented May 13, 2026

Problem

`MessageWindowChatMemory` truncates the message list from the head when the window size is exceeded. However, it does not account for the structural constraint that every `ToolResponseMessage` must be preceded by an `AssistantMessage` containing the matching tool-use call.

When the trim removes an `AssistantMessage` that contained a tool-use block but stops short of its response, the corresponding `ToolResponseMessage` is left at the start of the retained list as an orphan with no preceding tool call. Sending this history to an LLM causes an API error (e.g. Anthropic returns a 400 because the `tool_result` has no matching `tool_use`).

Reproduction scenario

```
Window limit = 3, per-add saves (as done by MessageChatMemoryAdvisor):
add([UserMsg1])           → [UserMsg1]
add([AssistantMsg(tool)]) → [UserMsg1, AssistantMsg(tool)]
add([ToolResponseMsg])    → [UserMsg1, AssistantMsg(tool), ToolResponseMsg]   ← at limit
add([AssistantMsg_final]) → trim 1 → [AssistantMsg(tool), ToolResponseMsg, AssistantMsg_final]

Next turn:
add([UserMsg2])           → trim 1 → [ToolResponseMsg, AssistantMsg_final, UserMsg2]
                                      ^^^ orphaned: no preceding tool_use → Anthropic 400
```
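The scenario above can be replayed with a small standalone simulation. This is only a sketch: `Role` and `NaiveWindow` are hypothetical stand-ins, not Spring AI classes, and the window here trims purely by count without the system-message handling that `MessageWindowChatMemory` performs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Spring AI's message types; only the role matters here.
enum Role { SYSTEM, USER, ASSISTANT_TOOL_CALL, ASSISTANT, TOOL_RESPONSE }

// Hypothetical count-based window, used only to illustrate the failure mode.
final class NaiveWindow {
    private final int maxMessages;
    private final List<Role> messages = new ArrayList<>();

    NaiveWindow(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Append, then trim from the head by count only, with no tool-pair awareness.
    List<Role> add(Role message) {
        messages.add(message);
        while (messages.size() > maxMessages) {
            messages.remove(0);
        }
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        NaiveWindow window = new NaiveWindow(3);
        window.add(Role.USER);
        window.add(Role.ASSISTANT_TOOL_CALL);
        window.add(Role.TOOL_RESPONSE);
        window.add(Role.ASSISTANT);
        List<Role> next = window.add(Role.USER);
        // The head is a TOOL_RESPONSE whose tool call was evicted.
        System.out.println(next); // prints [TOOL_RESPONSE, ASSISTANT, USER]
    }
}
```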

Solution

After the standard head-truncation, scan the retained list for any `ToolResponseMessage` entries that appear before the first non-system, non-tool-response message. Drop all of them — they are orphans whose `AssistantMessage(tool_use)` partner was evicted.

```java
private static List<Message> dropOrphanedToolResponses(List<Message> messages) {
    // skip leading SystemMessages (always preserved)
    // drop all consecutive ToolResponseMessages at the head
    // (their AssistantMessage partner was evicted by the window trim)
}
```
```
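For readers who want to see the head scan spelled out, here is one way it could look. This is an illustrative sketch and not the code in this PR: `Msg`, `SystemMsg`, `UserMsg`, `AssistantMsg`, `ToolResponseMsg`, and `OrphanFilter` are hypothetical stand-ins for the Spring AI message classes, and the sketch assumes system messages sit at the front of the retained list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Spring AI's message classes (not the real API).
interface Msg {}
record SystemMsg(String text) implements Msg {}
record UserMsg(String text) implements Msg {}
record AssistantMsg(String text) implements Msg {}
record ToolResponseMsg(String toolCallId) implements Msg {}

final class OrphanFilter {
    // Assumes FIFO head-first eviction, so orphans can only sit at the head
    // of the non-system portion of the list.
    static List<Msg> dropOrphanedToolResponses(List<Msg> messages) {
        List<Msg> result = new ArrayList<>();
        int i = 0;
        // Keep leading system messages.
        while (i < messages.size() && messages.get(i) instanceof SystemMsg) {
            result.add(messages.get(i++));
        }
        // Skip the consecutive tool responses that directly follow them.
        while (i < messages.size() && messages.get(i) instanceof ToolResponseMsg) {
            i++;
        }
        // Everything after the first non-tool-response message is kept as-is.
        result.addAll(messages.subList(i, messages.size()));
        return result;
    }
}
```

Under these assumptions, `[SystemMsg, ToolResponseMsg, ToolResponseMsg, AssistantMsg, UserMsg]` becomes `[SystemMsg, AssistantMsg, UserMsg]`, and a list containing only a `ToolResponseMsg` becomes empty.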

Changes

  • `MessageWindowChatMemory.java`: call `dropOrphanedToolResponses()` after head-truncation
  • `MessageWindowChatMemoryTests.java`: five new tests

Tests

All 18 existing tests continue to pass. Five new tests added:

  • `orphanedToolResponseIsDroppedWhenToolUseEvicted` — single orphan
  • `multipleConsecutiveOrphanedToolResponsesAreAllDropped` — two back-to-back orphans from a multi-tool-call turn
  • `toolPairPreservedWhenBothFitInWindow` — complete pair survives eviction
  • `systemMessagePreservedWhenOrphanedToolResponseDropped` — system message stays at position 0
  • `loneOrphanedToolResponseWithMaxMessagesOneResultsInEmptyHistory` — edge case: single orphan leaves empty history

Known limitation

`dropOrphanedToolResponses` only catches orphans created by FIFO head-first eviction — the only eviction strategy currently implemented. If a future strategy evicts an `AssistantMessage(tool_call)` from a non-leading position, its `ToolResponseMessage` would not be detected.
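If a non-FIFO strategy were ever added, a full-list scan keyed on tool-call ids would be one option. The following is a hypothetical sketch and not part of this PR: `Turn`, `AssistantTurn`, `ToolResponseTurn`, `TextTurn`, and `FullScanOrphanFilter` are invented stand-ins, and it assumes each tool response carries the id of the call it answers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins, not Spring AI's classes: a turn is an assistant message
// carrying tool-call ids, a tool response answering one id, or plain text.
interface Turn {}
record AssistantTurn(List<String> toolCallIds) implements Turn {}
record ToolResponseTurn(String toolCallId) implements Turn {}
record TextTurn(String text) implements Turn {}

final class FullScanOrphanFilter {
    // Keeps a tool response only if an earlier retained assistant turn issued its id.
    static List<Turn> dropOrphans(List<Turn> turns) {
        Set<String> seenCallIds = new HashSet<>();
        List<Turn> kept = new ArrayList<>();
        for (Turn turn : turns) {
            if (turn instanceof AssistantTurn assistant) {
                seenCallIds.addAll(assistant.toolCallIds());
            }
            if (turn instanceof ToolResponseTurn response
                    && !seenCallIds.contains(response.toolCallId())) {
                continue; // orphan: its tool call is not in the retained list
            }
            kept.add(turn);
        }
        return kept;
    }
}
```

This sketch only removes responses without a call. It does not handle the opposite case of a retained tool call whose response was evicted, which a real implementation would also need to consider.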

After the orphan drop, the conversation may start with an `AssistantMessage` (clean text, no tool calls) if the user message that preceded it was also evicted in the same cycle. This is a separate concern; the specific error reported in #5940 (`tool_result` with no matching `tool_use`) is fully addressed here.

Closes #5940

…ow eviction

MessageWindowChatMemory used naïve count-based head trimming that could
cut between an AssistantMessage containing tool_use blocks and its
matching ToolResponseMessage. The surviving ToolResponseMessage at the
head of the kept window caused providers (e.g. Anthropic) to reject the
next request with a 400 because the tool_result had no matching tool_use.

Fix: after the standard trim, scan the head of the non-system portion and
remove any leading ToolResponseMessage instances whose paired tool_use was
evicted. The same principle applies to OpenAI tool messages.

Closes gh-5940

Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>
…ageWindowChatMemory

- Fix hasNewSystemMessage detection to compare by text only, not full
  AbstractMessage.equals() which includes metadata. Persistence stores
  (Cassandra, JDBC, MongoDB) enrich messages with timestamps/IDs on save;
  the old metadata-based comparison falsely detected these reloaded messages
  as "new" system messages, silently wiping all system context on every
  resumed conversation.
- Document FIFO-only assumption on dropOrphanedToolResponses: the head scan
  is only correct under FIFO eviction; future non-FIFO strategies would
  require a full-list scan.
- Add test: loneOrphanedToolResponseWithMaxMessagesOneResultsInEmptyHistory
- Add test: systemMessageMetadataDifferenceDoesNotTriggerFalseNewSystemMessageDetection

Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>
@suryateja-g13

Flagging for maintainer attention — the original issue (#5940) has had active discussion recently that reinforces why this fix is needed.

Key findings from the thread:

  • The recommended Spring AI pattern (`disableInternalConversationHistory()` + `MessageChatMemoryAdvisor` at default order) does store tool pairs in memory on every tool iteration; this is by design, confirmed by `AbstractToolCallAdvisorIT`
  • There is no workaround available to users short of placing `MessageChatMemoryAdvisor` outside the tool loop (a non-obvious deviation from the recommended pattern)
  • The orphan issue is reproducible in production with any tool-heavy conversation once the window starts sliding; the original author hits it every few hours with `maxMessages=100`

This fix is the correct place to handle it — in MessageWindowChatMemory itself, transparently, without requiring users to change their advisor configuration.

…rphan test

- Revert SystemMessage text-based equality change: that is a separate
  bug unrelated to GH-5940 and will be raised as an independent PR to
  keep this fix focused
- Add test multipleConsecutiveOrphanedToolResponsesAreAllDropped: verifies
  that when a multi-tool-call turn is evicted, all consecutive leading
  ToolResponseMessages are dropped, not just the first

Signed-off-by: Gorre Surya <suryateja.g13@gmail.com>

Development

Successfully merging this pull request may close these issues.

MessageWindowChatMemory truncation breaks tool_use/tool_result pairs (Anthropic 400)
