Skip to content

feat: Add reason to InvalidMessage#540

Merged
lcian merged 1 commit into
mainfrom
lcian/feat/ignored-message
Jun 11, 2026
Merged

feat: Add reason to InvalidMessage#540
lcian merged 1 commit into
mainfrom
lcian/feat/ignored-message

Conversation

@lcian

@lcian lcian commented Jun 10, 2026

Copy link
Copy Markdown
Member

As of getsentry/snuba#7945, in some cases Snuba can return RunTaskError::InvalidMessage to avoid processing some messages and intentionally have them end up in DLQ.
From the POV of arroyo these are proper failures, which causes a ton of unnecessary logs to be emitted.
This introduces an IgnoredMessage.reason field that can hold Invalid for legit errors and Ignored for intentional drops.

@lcian lcian changed the title feat: Introduce Submit/StrategyError::IgnoredMessage feat: Introduce RunTask/Submit/StrategyError::IgnoredMessage Jun 10, 2026
@lcian lcian marked this pull request as ready for review June 10, 2026 15:41
@lcian lcian requested review from a team as code owners June 10, 2026 15:41

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1a5c5c9. Configure here.

Comment thread rust-arroyo/src/processing/strategies/reduce.rs Outdated
Comment thread rust-arroyo/src/processing/strategies/reduce.rs Outdated
@untitaker

Copy link
Copy Markdown
Member

I think we should add severity field to InvalidMessage instead of having another enum variant. This will make it easier to find all uses that need updating.

@lcian lcian force-pushed the lcian/feat/ignored-message branch from 1a5c5c9 to 2336431 Compare June 10, 2026 19:18
@lcian lcian changed the title feat: Introduce RunTask/Submit/StrategyError::IgnoredMessage feat: Add reason to InvalidMessage Jun 10, 2026
Comment on lines +643 to +645
partition,
offset: i,
payload: i,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Messages with InvalidMessageReason::Ignored are counted towards the DLQ limit. A high volume of ignored messages can trigger a panic when the limit is reached.
Severity: HIGH

Suggested Fix

In DlqPolicyWrapper::produce, check for InvalidMessageReason::Ignored and bypass the call to dlq_limit_state.record_invalid_message for such messages. This will prevent intentionally ignored messages from contributing to the invalid message limit that can cause a panic.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: rust-arroyo/src/processing/dlq.rs#L643-L645

Potential issue: The new `InvalidMessageReason::Ignored` variant is intended for
messages that are intentionally skipped. However, in `DlqPolicyWrapper::produce`, these
"ignored" messages are still counted towards the DLQ limit by `record_invalid_message`.
If a high volume of ignored messages is received, this limit can be exceeded, causing
the consumer to `panic!`. This behavior is inconsistent with the semantics of "ignoring"
a message, as it can lead to a crash for a non-error condition.

@lcian lcian changed the title feat: Add reason to InvalidMessage feat: Add severity to InvalidMessage Jun 10, 2026
@lcian lcian marked this pull request as draft June 10, 2026 19:27
@lcian lcian changed the title feat: Add severity to InvalidMessage feat: Add reason to InvalidMessage Jun 10, 2026
@lcian lcian marked this pull request as ready for review June 10, 2026 19:28
@lcian

lcian commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

How about this way @untitaker?
I can also rename it to like Severity::High/Low or Severity::Error/Info (but that would be a bit misleading because the logging is actually error+info when legit error, or debug+debug when ignored), just let me know what you prefer.

@untitaker

Copy link
Copy Markdown
Member

that works for me. the sentry-bot issue seems real but since we didn't hit it with snuba yet it's not important

@untitaker

Copy link
Copy Markdown
Member

feel free to merge and cut a new bugfix release

@lcian

lcian commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Thanks @untitaker.

I just realized that if someone is constructing InvalidMessage this will be a breaking change.
With a GH code search, I can only see Snuba and https://github.com/getsentry/streams doing that.
For Snuba it should be fine because it builds the container with --locked.
streams doesn't use --locked but it's a Python lib that ships the rust part as a binary as far as I understand, so that should also be fine.

So is this okay to release as a patch?

@untitaker

untitaker commented Jun 11, 2026

Copy link
Copy Markdown
Member

@lcian i don't have a problem with breaking changes, but if you want to do them i think it would be best if you drive the entire migration:

  1. merge pr
  2. make arroyo release
  3. bump in all repos and fix the compilation issues on the go

So is this okay to release as a patch?

ok let's do minor.

but the point is not about patch vs minor. the point is that, yes, breaking changes are fine, but you have to roll them out asap. pushing out a breaking change and then either:

  1. letting these changes be unreleased within arroyo main
  2. or even just letting applications be outdated

..puts us in a bad position when we want to make a new bugfix release in a hurry during inc response, because then we'll have to deal with the breaking change. and it almost doesn't matter if the breaking change was released as minor or patch release.

this could be solved by backporting bugfixes to old major/minor releases but i don't think we want to go through that effort.

@lcian

lcian commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Basically I just wanted to avoid a situation where some image is built on the fly without --locked, which would cause it to pick up the new version and fail to deploy or something like that. I guess that doesn't make any sense as images are always built from main etc.
But I figured best to double check, given that I didn't completely know where and how this library is used.
I'll release a minor and resolve the breakages in all users.

@untitaker

untitaker commented Jun 11, 2026

Copy link
Copy Markdown
Member

where some image is built on the fly without --locke

we should not have this situation at all, and if it is, then we need to fix the app. streams is just a library, yeah.

@lcian lcian merged commit e65306b into main Jun 11, 2026
20 checks passed
@lcian lcian deleted the lcian/feat/ignored-message branch June 11, 2026 15:23
untitaker pushed a commit to getsentry/streams that referenced this pull request Jun 11, 2026
Bumps `sentry_arroyo` to 2.40.0 and adapts to the breaking change
introduced in getsentry/arroyo#540.
Looks like this also unifies the Sentry SDK dependencies, and also
includes some spurious changes to the `uv.lock` which the README says
are fine.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants