feat: Add reason to InvalidMessage#540
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1a5c5c9. Configure here.
|
I think we should add severity field to InvalidMessage instead of having another enum variant. This will make it easier to find all uses that need updating. |
1a5c5c9 to
2336431
Compare
| partition, | ||
| offset: i, | ||
| payload: i, |
There was a problem hiding this comment.
Bug: Messages with InvalidMessageReason::Ignored are counted towards the DLQ limit. A high volume of ignored messages can trigger a panic when the limit is reached.
Severity: HIGH
Suggested Fix
In DlqPolicyWrapper::produce, check for InvalidMessageReason::Ignored and bypass the call to dlq_limit_state.record_invalid_message for such messages. This will prevent intentionally ignored messages from contributing to the invalid message limit that can cause a panic.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: rust-arroyo/src/processing/dlq.rs#L643-L645
Potential issue: The new `InvalidMessageReason::Ignored` variant is intended for
messages that are intentionally skipped. However, in `DlqPolicyWrapper::produce`, these
"ignored" messages are still counted towards the DLQ limit by `record_invalid_message`.
If a high volume of ignored messages is received, this limit can be exceeded, causing
the consumer to `panic!`. This behavior is inconsistent with the semantics of "ignoring"
a message, as it can lead to a crash for a non-error condition.
|
How about this way @untitaker? |
|
that works for me. the sentry-bot issue seems real but since we didn't hit it with snuba yet it's not important |
|
feel free to merge and cut a new bugfix release |
|
Thanks @untitaker. I just realized that if someone is constructing So is this okay to release as a patch? |
|
@lcian i don't have a problem with breaking changes, but if you want to do them i think it would be best if you drive the entire migration:
ok let's do minor. but the point is not about patch vs minor. the point is that, yes, breaking changes are fine, but you have to roll them out asap. pushing out a breaking change and then either:
..puts us in a bad position when we want to make a new bugfix release in a hurry during inc response, because then we'll have to deal with the breaking change. and it almost doesn't matter if the breaking change was released as minor or patch release. this could be solved by backporting bugfixes to old major/minor releases but i don't think we want to go through that effort. |
|
Basically I just wanted to avoid a situation where some image is built on the fly without |
we should not have this situation at all, and if it is, then we need to fix the app. |
Bumps `sentry_arroyo` to 2.40.0 and adapts to the breaking change introduced in getsentry/arroyo#540. Looks like this also unifies the Sentry SDK dependencies, and also includes some spurious changes to the `uv.lock` which the README says are fine.

As of getsentry/snuba#7945, in some cases Snuba can return
RunTaskError::InvalidMessageto avoid processing some messages and intentionally have them end up in DLQ.From the POV of arroyo these are proper failures, which causes a ton of unnecessary logs to be emitted.
This introduces an
IgnoredMessage.reasonfield that can holdInvalidfor legit errors andIgnoredfor intentional drops.