
(skill) table-structure ingest, answer-grounding, relaxed query budget#2285

Merged
edknv merged 4 commits into NVIDIA:main from edknv:edwardk/skill-0630 on Jul 1, 2026


@edknv edknv commented Jun 30, 2026

Collaborator

Description

This skill lifts answer correctness while keeping cost more than 30% below the baseline.

  • Table-structure at ingest: In some documents, especially finance documents with structured tables, cells were being flattened into space-separated text with no row/column association, so the agent retrieved the right page but read the wrong cell. Enabling table-structure extraction emits real markdown tables with headers, so each figure stays bound to its row and column.
  • Answer-grounding: Even when the right value was present, the agent sometimes emitted a number that appeared nowhere in the evidence. The skill now requires every figure to be quoted from a specific source line, prefers a prose statement over a table cell when both exist, and forbids inventing or computing values, which eliminates fabricated answers on extractive questions.
  • Relaxed query budget: The previous skill capped queries at 2-3 for efficiency, so the agent under-queried and often never surfaced the consolidated table among many near-duplicates. Lifting the cap and encouraging re-querying to disambiguate improves answer correctness substantially. It raises cost somewhat, but cost stays below both the baseline (no retriever, no skill) and the bare retriever (no skill).
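The answer-grounding rule is enforced through the skill's instructions, but the same idea can be sketched as a post-hoc check. This is a minimal illustration under stated assumptions: evidence arrives as a list of text lines, markdown table rows start with `|`, and figures are plain digit strings. `find_source_line` and `ungrounded_figures` are hypothetical names, and the evidence below is toy data.

```python
import re
from typing import Optional

# Hypothetical helper, not part of nemo-retriever: matches figures such as
# "1,204", "12.5" or "2024". Currency symbols, signs and "%" are not captured.
FIGURE_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def find_source_line(figure: str, evidence_lines: list[str]) -> Optional[str]:
    """Return an evidence line that states `figure` verbatim, or None.

    Prose lines are preferred over markdown table rows (lines starting with "|"),
    mirroring the "prefer prose over a table cell" rule.
    """
    # Compare whole figure tokens so "12" does not match inside "123".
    matches = [ln for ln in evidence_lines if figure in FIGURE_RE.findall(ln)]
    prose = [ln for ln in matches if not ln.lstrip().startswith("|")]
    if prose:
        return prose[0]
    return matches[0] if matches else None


def ungrounded_figures(answer: str, evidence_lines: list[str]) -> list[str]:
    """Figures in the answer that no evidence line states verbatim."""
    return [
        fig
        for fig in FIGURE_RE.findall(answer)
        if find_source_line(fig, evidence_lines) is None
    ]


evidence = [
    "| Level 3 assets | 1,204 | 1,187 |",
    "Total Level 3 assets were $1,204 million at year end.",
]
answer = "Level 3 assets totaled $1,204 million, up from 1,190."
print(ungrounded_figures(answer, evidence))  # ['1,190']
print(find_source_line("1,204", evidence))   # the prose line, not the table row
```

Here "1,190" is flagged because it appears nowhere in the evidence, which is exactly the kind of figure the skill now forbids the agent to emit.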

Before -> After (n=3, ViDoRe finance 100-sample dataset)

| Metric | Previous | Current | Δ |
|---|---|---|---|
| answer_correctness | 0.569 | 0.668 | +0.099 |
| page recall@5 | 0.426 | 0.527 | +0.101 |
| page recall@10 | 0.457 | 0.601 | +0.144 |
| doc recall@5 | 0.975 | 0.965 | −0.010 |
| retriever queries | 1.58 | 4.19 | +2.60 |
| $/pass | $14.47 | $22.07 | +$7.60 |

baseline vs retriever (no skill) vs retriever + skill

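| Mode | answer_correctness | page recall@5 | page recall@10 | retriever queries | $/pass |
|---|---|---|---|---|---|
| baseline | 0.677 | 0.448 | 0.462 | 0.00 | $33.05 |
| retriever (no skill) | 0.656 | 0.503 | 0.550 | 4.04 | $29.14 |
| skill | 0.668 | 0.527 | 0.601 | 4.19 | $22.07 |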
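The relative cost savings follow directly from the $/pass column. A quick calculation using only the numbers in the comparison table:

```python
# $/pass figures copied from the comparison table above.
COST_PER_PASS = {
    "baseline": 33.05,
    "retriever (no skill)": 29.14,
    "skill": 22.07,
}


def savings_vs(mode: str, reference: str) -> float:
    """Fractional cost reduction of `mode` relative to `reference`."""
    return 1.0 - COST_PER_PASS[mode] / COST_PER_PASS[reference]


for ref in ("baseline", "retriever (no skill)"):
    print(f"skill vs {ref}: {savings_vs('skill', ref):.1%} cheaper")
# skill vs baseline: 33.2% cheaper
# skill vs retriever (no skill): 24.3% cheaper
```

So the skill runs roughly a third cheaper than the baseline and roughly a quarter cheaper than the bare retriever, while its answer_correctness (0.668) sits slightly below the baseline's (0.677).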

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@edknv edknv requested review from a team as code owners June 30, 2026 17:14
@edknv edknv requested review from jdye64 and randerzander June 30, 2026 17:14
greptile-apps Bot commented Jun 30, 2026

Contributor

Greptile Summary

This PR improves the nemo-retriever skill's answer correctness (+0.099 AC) by introducing table-structure ingest flags, answer-grounding rules, and a relaxed query budget — while keeping cost ~33% below the no-retriever baseline.

  • Table-structure ingest: Adds --extract-tables --table-output-format markdown to the default PDF ingest command so financial tables are indexed with full row/column headers instead of flattened pseudo_markdown, preventing the agent from reading the wrong cell.
  • Answer-grounding: Extends the query-turn instructions to require every stated figure to be quoted from a specific evidence line, prefer prose over table cells when both are present, and forbid rounding or computing values not literally in the evidence.
  • Relaxed query budget: Replaces the previous "at most 2 Bash calls" hard cap with a "query until sure" model (typically 4–8 calls), with a lexical sparse pass per named term run upfront alongside the semantic pass.
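To illustrate why markdown table output matters, here is a toy example with made-up figures (not from the dataset). Flattened text leaves nothing binding "702" to "Net income" or "2024", while a markdown table can be read back cell by cell. The helpers are hypothetical, assume simple tables without escaped `|` characters, and do not reflect nemo-retriever's actual extraction code.

```python
# Hypothetical helpers illustrating the idea; not nemo-retriever's extraction code.


def to_markdown_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render rows as a markdown table so every cell keeps its column header."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def _cells(line: str) -> list[str]:
    """Split one markdown table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Read a markdown table back into one {header: cell} dict per row."""
    lines = text.strip().splitlines()
    header_cells = _cells(lines[0])
    # lines[1] is the |---| separator row, so data rows start at index 2.
    return [dict(zip(header_cells, _cells(line))) for line in lines[2:]]


def lookup(records: list[dict[str, str]], row_label: str, column: str) -> str:
    """Return the cell at (row, column); the first column holds the row label."""
    for record in records:
        if next(iter(record.values())) == row_label:
            return record[column]
    raise KeyError(row_label)


headers = ["Item", "2023", "2024"]
rows = [["Revenue", "4,120", "4,530"], ["Net income", "610", "702"]]

# Flattened text: nothing ties "702" to "Net income" or to "2024".
flattened = " ".join(" ".join(r) for r in [headers] + rows)
print(flattened)  # Item 2023 2024 Revenue 4,120 4,530 Net income 610 702

records = parse_markdown_table(to_markdown_table(headers, rows))
print(lookup(records, "Net income", "2024"))  # 702
```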

Confidence Score: 4/5

Safe to merge with one targeted fix: the "When the answer isn't in the first result" section in query.md still carries the old conservative re-query constraint and will silently override the new proactive lexical-pass strategy for any agent that reads the fallback file.

The core changes in SKILL.md and setup.md are clear and well-motivated. The one concern is that query.md was retitled to "fallback detail" but its "When the answer isn't in the first result" section was not updated — it still tells the agent to re-query "only when coverage.thin_spots flags a miss," directly contradicting the new "one lexical pass per named term, upfront" strategy. An agent consulting the fallback file on a thin-results turn would receive a behavioral prohibition that reverts it to the under-querying pattern this PR is fixing.

skills/nemo-retriever/references/cli/query.md — the "When the answer isn't in the first result" section needs to be updated to align with the new lexical-pass-first approach.

Important Files Changed

| Filename | Overview |
|---|---|
| skills/nemo-retriever/SKILL.md | Adds two-pass query workflow (semantic hybrid + lexical sparse per named term), answer-grounding rules, and relaxed call budget (4–8 calls typical, hard-stop only at 15+). Core logic is clear and well-motivated. |
| skills/nemo-retriever/references/cli/query.md | Retitled to "fallback detail" and fixes the evidence[0] guard, but the "When the answer isn't in the first result" section still carries the old "re-query ONLY when thin_spots flags a miss" constraint, which directly contradicts the new proactive lexical-pass strategy in SKILL.md. Also retains a stale ./output.json reference. |
| skills/nemo-retriever/references/setup.md | Adds --extract-tables --table-output-format markdown to the default PDF ingest command with clear explanation of why it matters; setup section is clean. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Agent
    participant Retriever

    Note over Agent,Retriever: Query Turn (new two-pass flow)
    Agent->>Retriever: semantic pass — full question, --hybrid --top-k 10
    Retriever-->>Agent: evidence JSON (dense + BM25 fusion)
    loop Per named term / figure
        Agent->>Retriever: lexical pass — exact term, --retrieval-mode sparse --top-k 10
        Retriever-->>Agent: evidence JSON (BM25 only)
    end
    opt Ambiguous candidates (e.g. multiple "Level 3" tables)
        Agent->>Retriever: disambiguating query — "consolidated total Level 3 assets"
        Retriever-->>Agent: evidence JSON
    end
    Note over Agent: Ground every figure to a source line
    Note over Agent: Copy verbatim, prefer prose over table cell
    Agent->>Agent: Compose and emit answer (STOP)
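The two-pass flow in the diagram can be sketched as a plain query planner. This is an illustration under assumptions: `QueryPlan` is a hypothetical name, the "semantic" and "lexical" labels stand for the `--hybrid` and `--retrieval-mode sparse` invocations in the diagram, running disambiguating queries in semantic mode is a guess, and the cap of 15 comes from the review summary's "hard-stop only at 15+".

```python
from dataclasses import dataclass, field

# From the review summary: 4-8 calls is typical; hard stop only at 15+.
HARD_STOP = 15


@dataclass
class QueryPlan:
    """Hypothetical planner for one query turn; not a nemo-retriever API.

    "semantic" stands for the --hybrid call and "lexical" for the
    --retrieval-mode sparse call shown in the diagram.
    """

    question: str
    named_terms: list[str]
    disambiguators: list[str] = field(default_factory=list)

    def calls(self) -> list[tuple[str, str]]:
        # 1. One semantic (dense + BM25) pass on the full question.
        planned = [("semantic", self.question)]
        # 2. One lexical (BM25-only) pass per named term or figure, run upfront.
        planned += [("lexical", term) for term in self.named_terms]
        # 3. Optional disambiguating queries; semantic mode here is an assumption.
        planned += [("semantic", q) for q in self.disambiguators]
        # Never exceed the hard stop, however many terms were named.
        return planned[:HARD_STOP]


plan = QueryPlan(
    question="What were total Level 3 assets at year end?",
    named_terms=["Level 3"],
    disambiguators=["consolidated total Level 3 assets"],
)
for mode, query in plan.calls():
    print(f"{mode:8} {query}")
```

Running the lexical pass for every named term upfront, rather than only after a miss, is the behavior the review below asks query.md to match.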

Comments Outside Diff (1)

  1. skills/nemo-retriever/references/cli/query.md, lines 27-30

    P1 Stale re-query guidance contradicts SKILL.md's new "query until sure" approach

    query.md still instructs the agent to re-query "ONLY when the top evidence doesn't yet answer … or when coverage.thin_spots flags a miss." That is the old conservative behavior this PR explicitly replaces. SKILL.md now says to always run a lexical sparse pass per named term upfront, regardless of thin_spots, and to re-query freely to disambiguate similar tables. Any agent that reads this file for fallback detail — e.g. because its first results looked thin — will receive an affirmative prohibition ("only when…") that causes it to revert to under-querying, directly undoing the correctness gain the PR delivers.

Reviews (4): Last reviewed commit: "address comments"

Comment thread skills/nemo-retriever/SKILL.md Outdated
Comment thread skills/nemo-retriever/SKILL.md Outdated
Comment thread skills/nemo-retriever/SKILL.md Outdated
Comment thread skills/nemo-retriever/SKILL.md Outdated
Comment thread skills/nemo-retriever/references/setup.md
@edknv edknv requested a review from mahikaw June 30, 2026 18:01

@mahikaw mahikaw left a comment

Collaborator

lgtm, added one minor suggestion to further disambiguate the query flow.

Comment thread skills/nemo-retriever/SKILL.md
@edknv edknv merged commit a12105b into NVIDIA:main Jul 1, 2026
10 checks passed
