Skip to content

LLM Based Task#2283

Draft
KyleZheng1284 wants to merge 1 commit into
NVIDIA:mainfrom
KyleZheng1284:feature/llm-task-operator
Draft

LLM Based Task#2283
KyleZheng1284 wants to merge 1 commit into
NVIDIA:mainfrom
KyleZheng1284:feature/llm-task-operator

Conversation

@KyleZheng1284

Copy link
Copy Markdown
Contributor

Description

  • Adds a reusable TextGenerationOperator built on the existing operator framework.
  • Introduces typed tasks for QA, summarization, and generic prompt generation.
  • Preserves existing QA APIs, schemas, prompts, and client compatibility.
  • Improves secure graph serialization without persisting API credentials.
  • Preserves sampling overrides and DataFrame value types across execution.
  • Keeps embedding and captioning as separate specialized operator families.
  • Adds tests for generation, persistence, validation, and backward compatibility

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: Kyle Zheng <126034466+KyleZheng1284@users.noreply.github.com>
@KyleZheng1284 KyleZheng1284 requested review from a team as code owners June 30, 2026 05:30
@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR introduces a reusable TextGenerationOperator framework with typed task strategies (RagAnswerTask, SummarizeTask, GenericPromptTask), upgrades graph serialization to a versioned v2 format with full secret redaction and Pydantic model round-tripping, and refactors QAGenerationOperator and LiteLLMClient onto the new task layer while preserving backward compatibility.

  • New task layer: GenerationTask ABC with concrete RagAnswerTask, SummarizeTask, and GenericPromptTask implementations; prompt logic extracted from LiteLLMClient into reusable, testable strategies.
  • New operator layer: TextGenerationOperator base with bounded threaded execution, positional result ordering, and duplicate-index safety; SummarizationOperator and GenericGenerationOperator as concrete subclasses.
  • Graph serialization v2: Recursive, type-safe encoding that explicitly handles Pydantic models, secrets, tuples, frozensets, callables, and type references; API keys in typed models are encoded as environment-lookup markers and never written to disk; v1 registries are loaded without breaking changes.

Confidence Score: 3/5

The task and operator layers are well-structured and extensively tested, but two error-handling gaps in the new code mean that real failures during row generation are absorbed without any log trace, making production diagnosis difficult.

The graph serialization upgrade and the new task layer are both solid and well-covered by the test suite. The three findings are genuine defects introduced by this PR: exceptions in GenerationTask.execute() are silently converted to error results with no log entry anywhere in the call stack; _failure_model() swallows client errors without logging; and the legacy LLMClient adapter in QAGenerationOperator._execute_task drops the reasoning_enabled override, so callers who set reasoning_enabled=False and inject a legacy client will not get the intended behavior.

tasks/base.py (silent exception swallow), operators/generation/base.py (_failure_model bare except), and tools/evaluation/generation.py (legacy adapter drops reasoning_enabled)

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/models/llm/tasks/base.py New abstract base class for generation tasks; execute() swallows all exceptions silently into an error result without any logging, violating the no-bare-except rule.
nemo_retriever/src/nemo_retriever/operators/generation/base.py Core DataFrame operator base; _failure_model() has a bare except Exception without logging, and the overall design for bounded threaded execution and positional result ordering is sound.
nemo_retriever/src/nemo_retriever/tools/evaluation/generation.py Refactored QAGenerationOperator to extend TextGenerationOperator; legacy LLMClient adapter path drops reasoning_enabled when forwarding to generate(), silently ignoring the operator-level override.
nemo_retriever/src/nemo_retriever/common/params/models.py Adds LLMSamplingOverrides and TextGenerationParams; the custom model_serializer preserves omit-vs-null distinction correctly; _no_api_key_fields PrivateAttr mechanism is well-designed for no-auth graph persistence.
nemo_retriever/src/nemo_retriever/graph/graph_pipeline_registry.py Major expansion of graph serialization to v2 format with full type-safe encoding, secret redaction, and backward-compatible v1 loading; logic is thorough and the test suite validates edge cases well.
nemo_retriever/src/nemo_retriever/models/llm/tasks/rag_answer.py New RagAnswerTask correctly moves RAG prompt construction and reasoning controls out of LiteLLMClient; backward-compatible re-export of _build_rag_prompt is properly handled.
nemo_retriever/src/nemo_retriever/models/llm/tasks/generic.py GenericPromptTask with strict template validation; correctly rejects attribute access, indexing, format specs, and undeclared/missing placeholders at construction time.
nemo_retriever/src/nemo_retriever/models/llm/tasks/summarize.py SummarizeTask with empty-input short-circuit and prompt validation; clean and well-tested.
nemo_retriever/src/nemo_retriever/models/llm/clients/litellm.py Substantial cleanup; RAG prompt logic correctly extracted to RagAnswerTask; backward-compatible _build_rag_prompt re-export retained; generate() now delegates to RagAnswerTask.execute().
nemo_retriever/tests/test_generation_tasks.py Comprehensive 925-line test suite covering tasks, operators, sampling round-trips, graph persistence, secret redaction, and legacy compatibility; good coverage of error paths and edge cases.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class GenerationTask {
        <<abstract>>
        +required_inputs tuple
        +_default_sampling dict
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
        +execute(client, **inputs) GeneratedTextResult
    }
    class RagAnswerTask {
        +prompt Optional[str]
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class SummarizeTask {
        +prompt Optional[str]
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class GenericPromptTask {
        +prompt str
        +input_names tuple
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class TextGenerationOperator {
        <<abstract>>
        +params TextGenerationParams
        +_client CompletionClient
        +_create_task() GenerationTask
        +_get_generation_constructor_kwargs() dict
        +process(data) DataFrame
    }
    class SummarizationOperator {
        +input_column str
        +output_column str
        +_create_task() GenerationTask
    }
    class GenericGenerationOperator {
        +input_columns Mapping
        +output_column str
        +_create_task() GenerationTask
    }
    class QAGenerationOperator {
        +model str
        +_execute_task(inputs) GeneratedTextResult
        +_create_task() GenerationTask
    }
    class TextGenerationParams {
        +transport LLMRemoteClientParams
        +sampling LLMSamplingOverrides
        +prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +resolve_sampling(defaults) LLMInferenceParams
        +from_kwargs(...) TextGenerationParams
    }

    GenerationTask <|-- RagAnswerTask
    GenerationTask <|-- SummarizeTask
    GenerationTask <|-- GenericPromptTask
    TextGenerationOperator <|-- SummarizationOperator
    TextGenerationOperator <|-- GenericGenerationOperator
    TextGenerationOperator <|-- QAGenerationOperator
    TextGenerationOperator --> GenerationTask : creates via _create_task
    TextGenerationOperator --> TextGenerationParams
    SummarizationOperator --> SummarizeTask
    GenericGenerationOperator --> GenericPromptTask
    QAGenerationOperator --> RagAnswerTask
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
classDiagram
    class GenerationTask {
        <<abstract>>
        +required_inputs tuple
        +_default_sampling dict
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
        +execute(client, **inputs) GeneratedTextResult
    }
    class RagAnswerTask {
        +prompt Optional[str]
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class SummarizeTask {
        +prompt Optional[str]
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class GenericPromptTask {
        +prompt str
        +input_names tuple
        +system_prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +build_request(**inputs) GenerationRequest
        +parse(raw_text) str
    }
    class TextGenerationOperator {
        <<abstract>>
        +params TextGenerationParams
        +_client CompletionClient
        +_create_task() GenerationTask
        +_get_generation_constructor_kwargs() dict
        +process(data) DataFrame
    }
    class SummarizationOperator {
        +input_column str
        +output_column str
        +_create_task() GenerationTask
    }
    class GenericGenerationOperator {
        +input_columns Mapping
        +output_column str
        +_create_task() GenerationTask
    }
    class QAGenerationOperator {
        +model str
        +_execute_task(inputs) GeneratedTextResult
        +_create_task() GenerationTask
    }
    class TextGenerationParams {
        +transport LLMRemoteClientParams
        +sampling LLMSamplingOverrides
        +prompt Optional[str]
        +reasoning_enabled Optional[bool]
        +resolve_sampling(defaults) LLMInferenceParams
        +from_kwargs(...) TextGenerationParams
    }

    GenerationTask <|-- RagAnswerTask
    GenerationTask <|-- SummarizeTask
    GenerationTask <|-- GenericPromptTask
    TextGenerationOperator <|-- SummarizationOperator
    TextGenerationOperator <|-- GenericGenerationOperator
    TextGenerationOperator <|-- QAGenerationOperator
    TextGenerationOperator --> GenerationTask : creates via _create_task
    TextGenerationOperator --> TextGenerationParams
    SummarizationOperator --> SummarizeTask
    GenericGenerationOperator --> GenericPromptTask
    QAGenerationOperator --> RagAnswerTask
Loading

Comments Outside Diff (1)

  1. nemo_retriever/src/nemo_retriever/models/llm/tasks/base.py, line 1448-1454 (link)

    P1 Exception swallowed silently — no logging

    execute is the pipeline row-level error boundary, and the custom no-bare-except rule requires that every except Exception boundary logs with full context (exc_info=True) before converting to a failure state. As written, transport failures (network timeouts, model errors, etc.) are silently absorbed into the error field with no trace in the log. Because task.execute() returns a GeneratedTextResult rather than raising, the operator-level logger.warning("Row %d generation failed …") in TextGenerationOperator.process never fires for these failures, so the exception disappears entirely.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/src/nemo_retriever/models/llm/tasks/base.py
    Line: 1448-1454
    
    Comment:
    **Exception swallowed silently — no logging**
    
    `execute` is the pipeline row-level error boundary, and the custom `no-bare-except` rule requires that every `except Exception` boundary logs with full context (`exc_info=True`) before converting to a failure state. As written, transport failures (network timeouts, model errors, etc.) are silently absorbed into the `error` field with no trace in the log. Because `task.execute()` returns a `GeneratedTextResult` rather than raising, the operator-level `logger.warning("Row %d generation failed …")` in `TextGenerationOperator.process` never fires for these failures, so the exception disappears entirely.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
Fix the following 4 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 4
nemo_retriever/src/nemo_retriever/models/llm/tasks/base.py:1448-1454
**Exception swallowed silently — no logging**

`execute` is the pipeline row-level error boundary, and the custom `no-bare-except` rule requires that every `except Exception` boundary logs with full context (`exc_info=True`) before converting to a failure state. As written, transport failures (network timeouts, model errors, etc.) are silently absorbed into the `error` field with no trace in the log. Because `task.execute()` returns a `GeneratedTextResult` rather than raising, the operator-level `logger.warning("Row %d generation failed …")` in `TextGenerationOperator.process` never fires for these failures, so the exception disappears entirely.

### Issue 2 of 4
nemo_retriever/src/nemo_retriever/operators/generation/base.py:238-243
**Bare `except Exception` without logging**

`_failure_model` is not a defined boundary (not a FastAPI handler, CLI entry point, Ray actor loop, or top-level pipeline handler), so the `no-bare-except` rule applies strictly. Any exception that occurs while reading `self._client.model` is silently discarded, which hides misconfigured clients and makes failures hard to diagnose.

```suggestion
    def _failure_model(self) -> str:
        try:
            model = self._client.model
        except Exception:
            logger.debug("Could not read client model name, falling back to configured model", exc_info=True)
            return self._configured_model
        return model if isinstance(model, str) and model else self._configured_model
```

### Issue 3 of 4
nemo_retriever/src/nemo_retriever/tools/evaluation/generation.py:95-97
**`reasoning_enabled` is not forwarded to legacy `generate()` callers**

When a legacy `LLMClient` (one that only exposes `generate`, not `complete`) is injected, the adapter silently drops the operator-level `reasoning_enabled`. A caller who constructs `QAGenerationOperator(reasoning_enabled=False)` and injects a legacy client will find that reasoning is *not* disabled — the legacy client receives `reasoning_enabled=None` and falls back to its own transport default (`True`). The `LLMClient.generate` protocol already accepts `reasoning_enabled` as a keyword argument, so it can be forwarded without breaking anything.

```suggestion
        if isinstance(client, LLMClient):
            task_reasoning = getattr(self._task, "reasoning_enabled", None)
            result = client.generate(inputs["query"], inputs["chunks"], reasoning_enabled=task_reasoning)
            return GeneratedTextResult(
```

### Issue 4 of 4
nemo_retriever/src/nemo_retriever/models/llm/tasks/__init__.py:1-3
**SPDX copyright year format inconsistency across new files**

The `tasks/` package and operator files use three different year formats: `2024-26` (non-standard shorthand), `2024-25`, and `2026` (standalone). The `spdx-license-header` rule requires the current year. New files in this PR use `2024-26` (`tasks/__init__.py`, `tasks/base.py`, `tasks/generic.py`, `tasks/rag_answer.py`, `tasks/summarize.py`) while `test_generation_tasks.py` uses `2026`. The standard form in the rest of the repo is the four-digit year; the abbreviated `2024-26` is not conventional. All new files should use the same format as the existing codebase.

Reviews (1): Last reviewed commit: "Add abstraction for LLM-based tasks" | Re-trigger Greptile

Comment on lines +238 to +243
def _failure_model(self) -> str:
try:
model = self._client.model
except Exception:
return self._configured_model
return model if isinstance(model, str) and model else self._configured_model

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Bare except Exception without logging

_failure_model is not a defined boundary (not a FastAPI handler, CLI entry point, Ray actor loop, or top-level pipeline handler), so the no-bare-except rule applies strictly. Any exception that occurs while reading self._client.model is silently discarded, which hides misconfigured clients and makes failures hard to diagnose.

Suggested change
def _failure_model(self) -> str:
try:
model = self._client.model
except Exception:
return self._configured_model
return model if isinstance(model, str) and model else self._configured_model
def _failure_model(self) -> str:
try:
model = self._client.model
except Exception:
logger.debug("Could not read client model name, falling back to configured model", exc_info=True)
return self._configured_model
return model if isinstance(model, str) and model else self._configured_model
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/operators/generation/base.py
Line: 238-243

Comment:
**Bare `except Exception` without logging**

`_failure_model` is not a defined boundary (not a FastAPI handler, CLI entry point, Ray actor loop, or top-level pipeline handler), so the `no-bare-except` rule applies strictly. Any exception that occurs while reading `self._client.model` is silently discarded, which hides misconfigured clients and makes failures hard to diagnose.

```suggestion
    def _failure_model(self) -> str:
        try:
            model = self._client.model
        except Exception:
            logger.debug("Could not read client model name, falling back to configured model", exc_info=True)
            return self._configured_model
        return model if isinstance(model, str) and model else self._configured_model
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +95 to +97
if isinstance(client, LLMClient):
result = client.generate(inputs["query"], inputs["chunks"])
return GeneratedTextResult(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 reasoning_enabled is not forwarded to legacy generate() callers

When a legacy LLMClient (one that only exposes generate, not complete) is injected, the adapter silently drops the operator-level reasoning_enabled. A caller who constructs QAGenerationOperator(reasoning_enabled=False) and injects a legacy client will find that reasoning is not disabled — the legacy client receives reasoning_enabled=None and falls back to its own transport default (True). The LLMClient.generate protocol already accepts reasoning_enabled as a keyword argument, so it can be forwarded without breaking anything.

Suggested change
if isinstance(client, LLMClient):
result = client.generate(inputs["query"], inputs["chunks"])
return GeneratedTextResult(
if isinstance(client, LLMClient):
task_reasoning = getattr(self._task, "reasoning_enabled", None)
result = client.generate(inputs["query"], inputs["chunks"], reasoning_enabled=task_reasoning)
return GeneratedTextResult(
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/tools/evaluation/generation.py
Line: 95-97

Comment:
**`reasoning_enabled` is not forwarded to legacy `generate()` callers**

When a legacy `LLMClient` (one that only exposes `generate`, not `complete`) is injected, the adapter silently drops the operator-level `reasoning_enabled`. A caller who constructs `QAGenerationOperator(reasoning_enabled=False)` and injects a legacy client will find that reasoning is *not* disabled — the legacy client receives `reasoning_enabled=None` and falls back to its own transport default (`True`). The `LLMClient.generate` protocol already accepts `reasoning_enabled` as a keyword argument, so it can be forwarded without breaking anything.

```suggestion
        if isinstance(client, LLMClient):
            task_reasoning = getattr(self._task, "reasoning_enabled", None)
            result = client.generate(inputs["query"], inputs["chunks"], reasoning_enabled=task_reasoning)
            return GeneratedTextResult(
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1 to +3
# SPDX-FileCopyrightText: Copyright (c) 2024-26, NVIDIA CORPORATION & AFFILIATES.
# All rights reserved.
# SPDX-License-Identifier: Apache-2.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 SPDX copyright year format inconsistency across new files

The tasks/ package and operator files use three different year formats: 2024-26 (non-standard shorthand), 2024-25, and 2026 (standalone). The spdx-license-header rule requires the current year. New files in this PR use 2024-26 (tasks/__init__.py, tasks/base.py, tasks/generic.py, tasks/rag_answer.py, tasks/summarize.py) while test_generation_tasks.py uses 2026. The standard form in the rest of the repo is the four-digit year; the abbreviated 2024-26 is not conventional. All new files should use the same format as the existing codebase.

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/models/llm/tasks/__init__.py
Line: 1-3

Comment:
**SPDX copyright year format inconsistency across new files**

The `tasks/` package and operator files use three different year formats: `2024-26` (non-standard shorthand), `2024-25`, and `2026` (standalone). The `spdx-license-header` rule requires the current year. New files in this PR use `2024-26` (`tasks/__init__.py`, `tasks/base.py`, `tasks/generic.py`, `tasks/rag_answer.py`, `tasks/summarize.py`) while `test_generation_tasks.py` uses `2026`. The standard form in the rest of the repo is the four-digit year; the abbreviated `2024-26` is not conventional. All new files should use the same format as the existing codebase.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@KyleZheng1284 KyleZheng1284 marked this pull request as draft June 30, 2026 17:26
@KyleZheng1284 KyleZheng1284 changed the title LLM Based Task Operator LLM Based Task Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant