docs: dev note for workflow chaining #775

Draft
andreatgretel wants to merge 1 commit into main from andreatgretel/docs/hitl-document-extraction

@andreatgretel

Contributor

📋 Summary

Adds a dev note and runnable recipe for workflow chaining as a pause/inspect/resume control surface. The post focuses on what named stage boundaries enable, including human review, reusable intermediate artifacts, downstream resume, and future workflow shapes.

🔗 Related Issue

N/A

🔄 Changes

  • Adds the "Pause, Inspect, Resume" dev note with hero/diagram/result assets for the workflow chaining story.
  • Adds a headless document review gate recipe that runs to review_candidates, writes a reviewed artifact, and resumes downstream from that artifact.
  • Adds the recipe page, recipe card, dev note card, and Fern navigation entry.
  • Adds tests covering page generation, review artifact creation, and the end-to-end recipe output.
  • Compresses large PNG assets below the pre-commit 600 KB limit; the workflow chaining diagram now uses a clean no-grid background.
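For reviewers skimming the recipe, the run/review/resume flow in the second bullet can be pictured roughly as below. This is a plain-Python sketch, not the Data Designer API: `run_upstream`, `review`, `run_downstream`, the row fields, and the JSON file layout are hypothetical stand-ins. Only the stage name `review_candidates` comes from the recipe.

```python
import json
import tempfile
from pathlib import Path

# All functions below are hypothetical stand-ins, not Data Designer calls.

def run_upstream():
    """Pretend upstream stages: produce rows and stop at the review boundary."""
    return [
        {"id": 0, "box": [10, 10, 50, 20], "needs_review": False},
        {"id": 1, "box": [12, 40, 48, 55], "needs_review": True},
    ]

def review(rows):
    """Pretend reviewer: correct only the rows flagged for review."""
    reviewed = []
    for row in rows:
        row = dict(row)  # copy so the original artifact stays untouched
        if row["needs_review"]:
            row["box"] = [12, 40, 60, 55]  # corrected box
            row["needs_review"] = False
        reviewed.append(row)
    return reviewed

def run_downstream(rows):
    """Pretend downstream stage: turn boxes into structured rows."""
    return [{"id": r["id"], "width": r["box"][2] - r["box"][0]} for r in rows]

workdir = Path(tempfile.mkdtemp())

# 1. Run up to the named boundary and persist its output.
candidates_path = workdir / "review_candidates.json"
candidates_path.write_text(json.dumps(run_upstream()))

# 2. Inspect the boundary output and write a reviewed artifact beside it.
reviewed_path = workdir / "review_candidates.reviewed.json"
reviewed_rows = review(json.loads(candidates_path.read_text()))
reviewed_path.write_text(json.dumps(reviewed_rows))

# 3. Resume downstream from the reviewed artifact; upstream is not rerun.
result = run_downstream(json.loads(reviewed_path.read_text()))
print(result)
```

The point of the sketch is the shape: the boundary output is a durable file, so the review step and the downstream step can run later and in a different process than the upstream work.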

🧪 Testing

  • .venv/bin/ruff check .
  • .venv/bin/ruff format --check .
  • .venv/bin/pytest packages/data-designer/tests/docs/test_document_review_gate_recipe.py
  • env npm_config_cache=/private/tmp/npm-cache-datadesigner make check-fern-docs (0 errors, 1 existing Fern warning)

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A - docs/recipe only)

@github-actions

Contributor

Fern preview: https://nvidia-preview-pr-775.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@andreatgretel changed the title from "docs: add workflow chaining review gate" to "docs: dev note for workflow chaining" on Jun 26, 2026
The new part is the boundary. Because the boundary has a name, you can stop there, inspect the selected output, replace it with an approved artifact, and resume downstream without rerunning the trusted upstream work.

```python
from data_designer.interface import ResumeMode
```

@nabinchha Jun 26, 2026

Contributor

this makes me wonder if ResumeMode should be moved to config so we can just do dd.ResumeMode ....


In that shape, `quality_gate` is both a normal stage and a contract. Upstream work promises a schema. Downstream work consumes that schema. A reviewer, evaluator, dashboard, or cleanup script only has to preserve the contract.
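One way to picture "preserve the contract" is a small check that any replacement artifact has to pass before downstream work consumes it. The sketch below is illustrative only: the column names, the types, and the `contract_problems` helper are assumptions made for the example, not part of Data Designer.

```python
# Hypothetical contract for the quality_gate boundary: column name -> type.
QUALITY_GATE_SCHEMA = {"doc_id": str, "score": float, "passed": bool}

def contract_problems(rows, schema=QUALITY_GATE_SCHEMA):
    """Return human-readable problems; an empty list means the contract holds."""
    problems = []
    for i, row in enumerate(rows):
        # The set of columns must match the promised schema exactly.
        if set(row) != set(schema):
            problems.append(f"row {i}: columns {sorted(row)} != {sorted(schema)}")
            continue
        # Each value must have the promised type.
        for name, expected in schema.items():
            if not isinstance(row[name], expected):
                problems.append(f"row {i}: {name!r} is not {expected.__name__}")
    return problems

good = [{"doc_id": "a", "score": 0.9, "passed": True}]
bad = [{"doc_id": "a", "score": "high", "passed": True}]

print(contract_problems(good))
print(contract_problems(bad))
```

A reviewer tool, an evaluator, or a cleanup script can change values freely as long as a check like this still returns no problems.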

![Linear workflow chain showing source rows, named stages, a durable stage boundary, and downstream resume](/assets/document-hitl/workflow-chaining-linear-chain.png)

Contributor

Something seems off here....
Image

Comment on lines +83 to +85
The demo task is document field extraction: generate synthetic invoices and forms, propose boxes around the fields to extract, and turn those boxes into structured rows. A weak detector proposes boxes on each page and assigns uncertainty. The workflow pauses at `review_candidates`, where all rows are still present but only the uncertain rows are marked for review.
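As a rough illustration of "all rows are still present but only the uncertain rows are marked", the flagging step could look like the sketch below. The `uncertainty` and `needs_review` field names and the 0.5 threshold are assumptions chosen for the example, not values taken from the recipe.

```python
REVIEW_THRESHOLD = 0.5  # hypothetical cutoff, not the recipe's value

def mark_for_review(rows, threshold=REVIEW_THRESHOLD):
    """Return a copy of every row with a needs_review flag; no rows are dropped."""
    return [{**row, "needs_review": row["uncertainty"] >= threshold} for row in rows]

candidates = mark_for_review([
    {"page": 1, "field": "total", "uncertainty": 0.12},
    {"page": 1, "field": "date", "uncertainty": 0.81},
    {"page": 2, "field": "vendor", "uncertainty": 0.50},
])

for row in candidates:
    print(row["field"], row["needs_review"])
```

Flagging instead of filtering keeps the row count stable across the boundary, which is what lets a reviewed artifact slot back in later.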

The reviewer corrects the proposed boxes for that uncertain slice. They do not relabel the whole dataset, and they do not change the workflow shape. They write a replacement artifact with the same row count and schema, then the downstream stages use human-corrected boxes where they exist and calibrated detector boxes everywhere else.
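The fallback rule in that last sentence can be sketched in a few lines. This is a minimal illustration under assumed field names (`detector_box`, `human_box`, `final_box`) and a hypothetical `apply_review` helper; the recipe's actual columns may differ.

```python
def apply_review(candidates, reviewed):
    """Choose the box downstream stages should use for each row."""
    if len(reviewed) != len(candidates):
        raise ValueError("reviewed artifact must keep the same row count")
    merged = []
    for original, fixed in zip(candidates, reviewed):
        if set(fixed) != set(original):
            raise ValueError("reviewed artifact must keep the same schema")
        # Prefer the human correction; fall back to the detector's box.
        human_box = fixed["human_box"]
        final_box = human_box if human_box is not None else fixed["detector_box"]
        merged.append({**fixed, "final_box": final_box})
    return merged

candidates = [
    {"row": 0, "detector_box": [10, 10, 50, 20], "human_box": None},
    {"row": 1, "detector_box": [12, 40, 48, 55], "human_box": None},
]
# The reviewer only touched row 1, the uncertain one.
reviewed = [
    {"row": 0, "detector_box": [10, 10, 50, 20], "human_box": None},
    {"row": 1, "detector_box": [12, 40, 48, 55], "human_box": [12, 40, 60, 55]},
]

final = apply_review(candidates, reviewed)
print([row["final_box"] for row in final])
```

Rejecting artifacts with a different row count or column set up front keeps the downstream stages unaware of whether a human was involved at all.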

Contributor

I believe conditional generation could achieve the same goal within the same DD stage? Just calling it out. Might be interesting to offer as an alternative?
