29 changes: 29 additions & 0 deletions fern/versions/latest/pages/concepts/workflow-chaining.mdx
@@ -138,6 +138,35 @@ workflow.add_stage("enriched", enriched)

`on_success_version` is part of the stage resume identity. Change it when the callback's output semantics change. If a callback returns zero rows, the workflow raises by default; set `allow_empty=True` to mark that stage as completed empty and skip downstream stages.

## Repeating until a filtered count

Use `repeat_until` when a stage should keep generating candidates until its selected output reaches a target row count. This is useful for bounded rejection sampling, such as generating many candidates and keeping only rows that pass a judge or quality gate.

```python
from data_designer.interface import RepeatUntil

workflow = data_designer.compose_workflow(name="judge-disagreements")
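workflow.add_stage(
    "judged",
    judges,
    num_records=1_000,  # per-attempt size
    on_success=keep_disagreements,  # selects the rows to keep
    on_success_version="disagreements-v1",
    repeat_until=RepeatUntil(
        output_records=5_000,  # target number of selected rows
        max_iterations=10,  # bound on the number of attempts
        max_generated_records=20_000,  # bound on generated records
    ),
)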
workflow.add_stage("enriched", enriched)
```

`num_records` is the per-attempt size. In the default `mode="append"`, each iteration requests the cumulative stage size (`num_records`, then `2 * num_records`, and so on), reruns `on_success` over the accumulated stage output, and feeds exactly `output_records` selected rows downstream.
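The append-mode schedule can be made concrete with a small standalone sketch. It does not call `data_designer`; `append_mode_requests` is a hypothetical helper, and it assumes that `max_generated_records` is an inclusive cap and that the loop stops at the first request that would exceed it. The numbers are the ones from the example above.

```python
def append_mode_requests(
    num_records: int, max_iterations: int, max_generated_records: int
) -> list[int]:
    """Return the cumulative stage size requested at each append-mode iteration.

    Hypothetical model of the documented behavior, not the library's code.
    """
    requests = []
    for iteration in range(1, max_iterations + 1):
        requested = iteration * num_records  # num_records, 2 * num_records, ...
        if requested > max_generated_records:  # assumed: inclusive cap
            break
        requests.append(requested)
    return requests


schedule = append_mode_requests(
    num_records=1_000, max_iterations=10, max_generated_records=20_000
)
largest_request = schedule[-1]
# Smallest fraction of rows that must pass on_success to reach 5_000 selected rows.
min_acceptance_rate = 5_000 / largest_request

print(schedule)
print(min_acceptance_rate)
```

Under these assumptions, `max_iterations` is reached before `max_generated_records`, so the largest request is 10,000 rows and the 5,000-row target is reachable only if at least half of the generated rows pass the filter.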

Set `on_exhausted="return_partial"` to keep the best partial output when the bounds are reached; otherwise the workflow raises. If no rows pass, the stage completes empty and downstream stages are skipped, matching `allow_empty=True` behavior.

Use `mode="discard"` when each attempt should replace the previous selected output instead of accumulating it. Discard mode restarts the stage on resume because previous attempts are intentionally replaced. Keep bounded limits in place: a low acceptance rate is usually a signal to inspect the recipe rather than to keep generating. In append mode, `max_generated_records` caps the cumulative requested stage size; in discard mode, it caps the total records produced across attempts.
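The difference between the two modes, and the effect of `on_exhausted`, can be illustrated with a toy model. This is a sketch under stated assumptions, not the library's implementation: `simulate_repeat_until` and the `accept` predicate are hypothetical, generated rows are stand-in integers, caps are treated as inclusive, and "best partial output" is taken to mean the largest selection seen so far.

```python
from typing import Callable


def simulate_repeat_until(
    mode: str,
    num_records: int,
    output_records: int,
    max_iterations: int,
    max_generated_records: int,
    accept: Callable[[int], bool],
    on_exhausted: str = "raise",
) -> tuple[list[int], int, str]:
    """Toy model of the repeat loop; returns (selected rows, budget used, status)."""
    best: list[int] = []
    budget_used = 0
    for iteration in range(1, max_iterations + 1):
        if mode == "append":
            requested = iteration * num_records  # cumulative stage size
            if requested > max_generated_records:
                break
            budget_used = requested
            rows = range(requested)  # filter runs over the accumulated output
        else:  # "discard": each attempt stands alone
            if budget_used + num_records > max_generated_records:
                break
            start = budget_used
            budget_used += num_records  # counts records produced across attempts
            rows = range(start, start + num_records)
        selected = [row for row in rows if accept(row)]
        if len(selected) > len(best):
            best = selected
        if len(selected) >= output_records:
            return selected[:output_records], budget_used, "completed"
    if on_exhausted == "return_partial":
        return best[:output_records], budget_used, "partial"
    raise RuntimeError("repeat_until bounds reached before the target row count")


def keep_every_fourth(row: int) -> bool:
    # Stand-in quality gate with a 25% acceptance rate.
    return row % 4 == 0


append_rows, append_budget, append_status = simulate_repeat_until(
    "append", 100, 60, 10, 2_000, keep_every_fourth
)
discard_rows, discard_budget, discard_status = simulate_repeat_until(
    "discard", 100, 60, 10, 2_000, keep_every_fourth, on_exhausted="return_partial"
)
print(len(append_rows), append_budget, append_status)
print(len(discard_rows), discard_budget, discard_status)
```

In this model, append mode reaches the target on the third iteration because selections accumulate, while discard mode can only succeed if a single attempt yields enough rows, so `num_records` needs to be sized with the expected acceptance rate in mind.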

## Changing row counts between stages

Each stage has a fixed requested row count while it runs. To resize a workflow, change the selected output at a stage boundary and let the next stage seed from that output.
@@ -11,6 +11,9 @@
from data_designer.interface.composite_workflow import ( # noqa: F401
    CompositeWorkflow,
    CompositeWorkflowResults,
    RepeatUntil,
    RepeatUntilExhaustion,
    RepeatUntilMode,
    SkippedStageResult,
    SkippedStageStatus,
)
@@ -33,6 +36,9 @@
    "DataDesignerWorkflowError": ("data_designer.interface.errors", "DataDesignerWorkflowError"),
    "DatasetCreationResults": ("data_designer.interface.results", "DatasetCreationResults"),
    "ResumeMode": ("data_designer.engine.storage.artifact_storage", "ResumeMode"),
    "RepeatUntil": ("data_designer.interface.composite_workflow", "RepeatUntil"),
    "RepeatUntilExhaustion": ("data_designer.interface.composite_workflow", "RepeatUntilExhaustion"),
    "RepeatUntilMode": ("data_designer.interface.composite_workflow", "RepeatUntilMode"),
    "SkippedStageResult": ("data_designer.interface.composite_workflow", "SkippedStageResult"),
    "SkippedStageStatus": ("data_designer.interface.composite_workflow", "SkippedStageStatus"),
}