Add troubleshooting guide for missed Schedule Actions #4471

Open
dustin-temporal wants to merge 1 commit into main from docs/schedule-missed-actions


dustin-temporal commented Apr 22, 2026

Summary

Adds docs/troubleshooting/schedule-missed-actions.mdx documenting the workflow for diagnosing why a Schedule did not fire:

  1. Alert on the missed catchup window counter (temporal_cloud_v1_schedule_missed_catchup_window_count for Cloud, schedule_missed_catchup_window for self-hosted) grouped by Namespace.
  2. Enumerate Schedules with temporal schedule list to produce candidate Schedule Ids.
  3. Run temporal schedule describe for each Schedule and look for a non-zero info.missedCatchupWindow to identify the affected Schedule.
  4. Interpret impact, cross-check rate-limit and buffer-overrun metrics for root cause, and remediate (widen Catchup Window, revisit Overlap Policy, raise throughput, Backfill).

The metric is Namespace-scoped with no per-Schedule label, so the list + describe fan-out is currently the only path from alert to affected Schedule.
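
The list + describe fan-out in steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not part of the PR: it assumes the temporal CLI is on PATH, that --output json makes temporal schedule list emit a JSON array whose entries carry a scheduleId key, and that describe exposes info.missedCatchupWindow as named above. The helper names (run_cli, missed_count, affected_schedules) are hypothetical.

```python
import json
import subprocess
from typing import Any, Callable


def run_cli(args: list[str]) -> Any:
    """Run `temporal <args> --output json` and return the parsed JSON.

    Assumption: the CLI is installed and already configured for the
    target cluster or Temporal Cloud Namespace.
    """
    proc = subprocess.run(
        ["temporal", *args, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def missed_count(describe_doc: dict[str, Any]) -> int:
    """Return info.missedCatchupWindow as an int, or 0 when it is absent.

    64-bit counters are often serialized as JSON strings, so the value
    is passed through int() to accept both "3" and 3.
    """
    info = describe_doc.get("info") or {}
    return int(info.get("missedCatchupWindow") or 0)


def affected_schedules(
    namespace: str,
    run: Callable[[list[str]], Any] = run_cli,
) -> dict[str, int]:
    """Map Schedule Id -> missed catchup window count for non-zero Schedules."""
    hits: dict[str, int] = {}
    # Step 2: enumerate candidate Schedule Ids in the Namespace.
    # Assumption: each list entry has a "scheduleId" key.
    for entry in run(["schedule", "list", "--namespace", namespace]):
        schedule_id = entry["scheduleId"]
        # Step 3: describe each Schedule and keep the ones that missed Actions.
        doc = run(
            ["schedule", "describe",
             "--schedule-id", schedule_id,
             "--namespace", namespace]
        )
        count = missed_count(doc)
        if count > 0:
            hits[schedule_id] = count
    return hits
```

Calling affected_schedules("my-namespace") (a placeholder Namespace name) would issue one describe call per Schedule, so in a Namespace with many Schedules it may be worth pacing the calls to stay clear of rate limits. The run parameter is injectable so the filtering logic can be exercised without a live cluster.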

Why

The existing docs describe the Catchup Window as a Schedule Spec option and document the metrics individually, but there is no page that ties them together into an investigation workflow. Users receiving an alert on ...missed_catchup_window_count have no guided path to find out which Schedule was affected.

Changes

  • New: docs/troubleshooting/schedule-missed-actions.mdx
  • Edited: docs/troubleshooting/index.mdx (added link to the new page)
  • Edited: sidebars.js (added the new page under Troubleshooting)

Checklist

  • Follows STYLE.md guidelines (sentence-case infinitive-verb headings, capitalized Temporal core terms, Id not ID)
  • Frontmatter matches existing troubleshooting pages
  • Links to related documentation (Schedule concept, Catchup Window, CLI reference, Cloud OpenMetrics reference, cluster metrics reference)
  • sidebars.js updated
  • yarn build passes (not yet run - draft)

🤖 Generated with Claude Code

┆Attachments: EDU-6249 Add troubleshooting guide for missed Schedule Actions

Documents the workflow for diagnosing why a Schedule did not fire: alert
on the missed catchup window metric
(temporal_cloud_v1_schedule_missed_catchup_window_count for Cloud,
schedule_missed_catchup_window for self-hosted), enumerate Schedules with
ListSchedules, then inspect DescribeSchedule.info.missedCatchupWindow per
Schedule to identify the affected one. Includes root-cause cross-checks
against rate-limit and buffer-overrun metrics, plus remediation guidance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel Bot commented Apr 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: temporal-documentation
Deployment: Ready
Actions: Preview, Comment
Updated (UTC): Apr 22, 2026 2:52pm


github-actions Bot commented Apr 22, 2026

📖 Docs PR preview links

dustin-temporal marked this pull request as ready for review April 22, 2026 16:35
dustin-temporal requested a review from a team as a code owner April 22, 2026 16:35