(skill) table-structure ingest, answer-grounding, relaxed query budget by edknv · Pull Request #2285 · NVIDIA/NeMo-Retriever

edknv · 2026-06-30T17:14:55Z

Description

This skill lifts answer correctness while keeping cost more than 30% below the baseline.

Table-structure at ingest: In some documents, especially finance documents with structured tables, cells were being flattened into space-separated text with no row/column association, so the agent retrieved the right page but read the wrong cell. Enabling table-structure extraction emits real markdown tables with headers, so a figure stays bound to its row and column.
Answer-grounding: Even with the right value present, the agent sometimes emitted a number that appeared nowhere in the evidence. The skill now requires every figure to be quoted from a specific source line, prefers a prose statement over a table cell when both exist, and forbids inventing or computing values, which eliminates fabricated answers on extractive questions.
Relaxed query budget: The previous skill's efficiency caps (2-3 queries) under-queried and often never surfaced the consolidated table among many near-duplicates. Lifting the cap and encouraging re-querying to disambiguate raises answer correctness significantly. Cost rises somewhat, but it is still lower than both the baseline (no retriever, no skill) and the bare retriever (no skill).
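
The answer-grounding rule can be illustrated with a small check that flags any figure in an answer that does not appear verbatim in the evidence. This is only a sketch: the function name, the regular expression, and the sample evidence lines are hypothetical and are not taken from the skill itself.

```python
import re

# Matches simple figures such as 3, 1,204, $1.2 or 45%. An illustrative pattern only.
NUMBER = re.compile(r"\$?\d[\d,]*(?:\.\d+)?%?")

def ungrounded_figures(answer: str, evidence_lines: list[str]) -> list[str]:
    """Return figures in `answer` that appear verbatim in no evidence line."""
    return [
        fig for fig in NUMBER.findall(answer)
        if not any(fig in line for line in evidence_lines)
    ]

# Hypothetical evidence: one prose statement and one table row.
evidence = [
    "Total Level 3 assets were $1,204 million at year end.",
    "| Level 3 assets | 1,204 | 1,150 |",
]

print(ungrounded_figures("Level 3 assets were $1,204 million.", evidence))  # []
print(ungrounded_figures("Level 3 assets were $1.2 billion.", evidence))    # ['$1.2']
```

The second call shows why rounding counts as inventing a value: "$1.2" is a reasonable paraphrase, but it cannot be quoted from any source line.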

Before -> After (n=3, vidore finance 100 sample dataset)

Metric	Previous	Current	Δ
answer_correctness	0.569	0.668	+0.099
page recall@5	0.426	0.527	+0.101
page recall@10	0.457	0.601	+0.144
doc recall@5	0.975	0.965	−0.010
retriever queries	1.58	4.19	+2.60
$/pass	$14.47	$22.07	+$7.60

baseline vs retriever (no skill) vs retriever + skill

Mode	AC	page@5	page@10	queries	$/pass
baseline	0.677	0.448	0.462	0.00	$33.05
retriever (no skill)	0.656	0.503	0.550	4.04	$29.14
skill	0.668	0.527	0.601	4.19	$22.07
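
The relative cost figures follow directly from the $/pass columns in the two tables above. A short worked calculation, with the helper name `pct_saving` chosen here purely for illustration:

```python
# $/pass values copied from the tables above.
costs = {
    "baseline": 33.05,
    "retriever (no skill)": 29.14,
    "previous skill": 14.47,
    "skill": 22.07,
}

def pct_saving(reference: float, new: float) -> float:
    """Percentage by which `new` is cheaper than `reference` (negative means dearer)."""
    return round(100 * (reference - new) / reference, 1)

print(pct_saving(costs["baseline"], costs["skill"]))              # 33.2
print(pct_saving(costs["retriever (no skill)"], costs["skill"]))  # 24.3
print(pct_saving(costs["previous skill"], costs["skill"]))        # -52.5
```

So the current skill costs about a third less than the baseline and about a quarter less than the bare retriever, while costing roughly half again as much as the previous skill.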

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

greptile-apps · 2026-06-30T17:19:06Z

Greptile Summary

This PR improves the nemo-retriever skill's answer correctness (+0.099 AC) by introducing table-structure ingest flags, answer-grounding rules, and a relaxed query budget — while keeping cost ~33% below the no-retriever baseline.

Table-structure ingest: Adds --extract-tables --table-output-format markdown to the default PDF ingest command so financial tables are indexed with full row/column headers instead of flattened pseudo_markdown, preventing the agent from reading the wrong cell.
Answer-grounding: Extends the query-turn instructions to require every stated figure to be quoted from a specific evidence line, prefer prose over table cells when both are present, and forbid rounding or computing values not literally in the evidence.
Relaxed query budget: Replaces the previous "at most 2 Bash calls" hard cap with a "query until sure" model (typically 4–8 calls), with a lexical sparse pass per named term run upfront alongside the semantic pass.
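
To see why markdown tables help, compare a flattened row of text with the same data kept as a table. The sketch below uses made-up figures and a hypothetical helper, `parse_markdown_table`, that is not part of the retriever.

```python
def parse_markdown_table(md: str) -> dict[str, dict[str, str]]:
    """Map row label -> {column header -> cell} for a simple markdown table."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in md.strip().splitlines()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator line
    return {row[0]: dict(zip(header[1:], row[1:])) for row in body}

# Flattened form: nothing says which number belongs to which row or year.
flattened = "Item 2023 2022 Revenue 120 100 Net income 30 25"

# Structured form of the same (made-up) data.
table = """
| Item | 2023 | 2022 |
|---|---|---|
| Revenue | 120 | 100 |
| Net income | 30 | 25 |
"""

cells = parse_markdown_table(table)
print(cells["Net income"]["2022"])  # 25
```

With the structured form, a lookup by row and column is unambiguous, whereas the flattened string requires guessing where each row starts and ends.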

Confidence Score: 4/5

Safe to merge with one targeted fix: the "When the answer isn't in the first result" section in query.md still carries the old conservative re-query constraint and will silently override the new proactive lexical-pass strategy for any agent that reads the fallback file.

The core changes in SKILL.md and setup.md are clear and well-motivated. The one concern is that query.md was retitled to "fallback detail" but its "When the answer isn't in the first result" section was not updated — it still tells the agent to re-query "only when coverage.thin_spots flags a miss," directly contradicting the new "one lexical pass per named term, upfront" strategy. An agent consulting the fallback file on a thin-results turn would receive a behavioral prohibition that reverts it to the under-querying pattern this PR is fixing.

skills/nemo-retriever/references/cli/query.md — the "When the answer isn't in the first result" section needs to be updated to align with the new lexical-pass-first approach.

Important Files Changed

Filename	Overview
skills/nemo-retriever/SKILL.md	Adds two-pass query workflow (semantic hybrid + lexical sparse per named term), answer-grounding rules, and relaxed call budget (4–8 calls typical, hard-stop only at 15+). Core logic is clear and well-motivated.
skills/nemo-retriever/references/cli/query.md	Retitled to "fallback detail" and fixes the evidence[0] guard, but the "When the answer isn't in the first result" section still carries the old "re-query ONLY when thin_spots flags a miss" constraint, which directly contradicts the new proactive lexical-pass strategy in SKILL.md. Also retains a stale ./output.json reference.
skills/nemo-retriever/references/setup.md	Adds --extract-tables --table-output-format markdown to the default PDF ingest command with clear explanation of why it matters; setup section is clean.
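
The `evidence[0]` guard mentioned for query.md can be sketched as follows. The payload shape (a JSON object with an `evidence` list) is an assumption based on the wording in this review, and `top_evidence` is a hypothetical helper, not code from the PR.

```python
import json

def top_evidence(raw_json: str):
    """Return the first evidence record, or None when the pass returned nothing."""
    payload = json.loads(raw_json)
    evidence = payload.get("evidence") or []  # assumed key; tolerate missing or null
    return evidence[0] if evidence else None

print(top_evidence('{"evidence": []}'))             # None
print(top_evidence('{"evidence": [{"page": 3}]}'))  # {'page': 3}
```

Returning None instead of indexing blindly avoids an IndexError when a lexical pass comes back empty.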

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent
    participant Retriever

    Note over Agent,Retriever: Query Turn (new two-pass flow)
    Agent->>Retriever: semantic pass — full question, --hybrid --top-k 10
    Retriever-->>Agent: evidence JSON (dense + BM25 fusion)
    loop Per named term / figure
        Agent->>Retriever: lexical pass — exact term, --retrieval-mode sparse --top-k 10
        Retriever-->>Agent: evidence JSON (BM25 only)
    end
    opt Ambiguous candidates (e.g. multiple "Level 3" tables)
        Agent->>Retriever: disambiguating query — "consolidated total Level 3 assets"
        Retriever-->>Agent: evidence JSON
    end
    Note over Agent: Ground every figure to a source line
    Note over Agent: Copy verbatim, prefer prose over table cell
    Agent->>Agent: Compose and emit answer (STOP)
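
The flow in the diagram can be sketched in a few lines. Here `search` stands in for whatever invokes the retriever; its signature, the `gather_evidence` name, and the reading of "hard-stop at 15+" as a cap of 15 calls are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical signature: (query, mode, top_k) -> list of evidence records.
Search = Callable[[str, str, int], list[dict]]

def gather_evidence(question: str, named_terms: list[str], search: Search,
                    top_k: int = 10, hard_stop: int = 15):
    """Semantic pass on the full question, then one lexical pass per named term."""
    plan = [("hybrid", question)]
    plan += [("sparse", term) for term in named_terms]
    evidence, log = [], []
    for mode, query in plan[:hard_stop]:  # never exceed the hard stop
        evidence.extend(search(query, mode, top_k))
        log.append((mode, query))
    return evidence, log

def fake_search(query: str, mode: str, top_k: int) -> list[dict]:
    """Stand-in for the real retriever call; echoes its arguments."""
    return [{"query": query, "mode": mode}]

evidence, log = gather_evidence(
    "What were consolidated total Level 3 assets?",
    ["Level 3 assets"],
    fake_search,
)
print(log)
```

A disambiguating query, as in the optional step of the diagram, would simply be one more entry appended to the plan once near-duplicate candidates are seen.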

Comments Outside Diff (1)

skills/nemo-retriever/references/cli/query.md, line 27-30

Stale re-query guidance contradicts SKILL.md's new "query until sure" approach

query.md still instructs the agent to re-query "ONLY when the top evidence doesn't yet answer … or when coverage.thin_spots flags a miss." That is the old conservative behavior this PR explicitly replaces. SKILL.md now says to always run a lexical sparse pass per named term upfront, regardless of thin_spots, and to re-query freely to disambiguate similar tables. Any agent that reads this file for fallback detail — e.g. because its first results looked thin — will receive an affirmative prohibition ("only when…") that causes it to revert to under-querying, directly undoing the correctness gain the PR delivers.

Reviews (4): Last reviewed commit: "address comments"

mahikaw

lgtm, added one minor suggestion to further disambiguate the query flow.

(skill) table-structure ingest, answer-grounding, relaxed query budget

a0b887d

edknv requested review from a team as code owners June 30, 2026 17:14

edknv requested review from jdye64 and randerzander June 30, 2026 17:14

greptile-apps Bot reviewed Jun 30, 2026

Comment thread skills/nemo-retriever/SKILL.md Outdated

Comment thread skills/nemo-retriever/SKILL.md Outdated

Comment thread skills/nemo-retriever/SKILL.md Outdated

Comment thread skills/nemo-retriever/SKILL.md Outdated

Comment thread skills/nemo-retriever/references/setup.md

edknv added 2 commits June 30, 2026 10:42

remove artifacts

539920c

IndexError when lexical pass returns no results

e1bc997

edknv requested a review from mahikaw June 30, 2026 18:01

mahikaw approved these changes Jun 30, 2026

Comment thread skills/nemo-retriever/SKILL.md

address comments

d31877d

edknv merged commit a12105b into NVIDIA:main Jul 1, 2026
10 checks passed

edknv merged 4 commits into NVIDIA:main from edknv:edwardk/skill-0630
