(skill) table-structure ingest, answer-grounding, relaxed query budget #2285
Greptile Summary
This PR improves the nemo-retriever skill's answer correctness (+0.099 AC) by introducing table-structure ingest flags, answer-grounding rules, and a relaxed query budget, while keeping cost ~33% below the no-retriever baseline.
| Filename | Overview |
|---|---|
| skills/nemo-retriever/SKILL.md | Adds two-pass query workflow (semantic hybrid + lexical sparse per named term), answer-grounding rules, and relaxed call budget (4–8 calls typical, hard-stop only at 15+). Core logic is clear and well-motivated. |
| skills/nemo-retriever/references/cli/query.md | Retitled to "fallback detail" and fixes the evidence[0] guard, but the "When the answer isn't in the first result" section still carries the old "re-query ONLY when thin_spots flags a miss" constraint, which directly contradicts the new proactive lexical-pass strategy in SKILL.md. Also retains a stale ./output.json reference. |
| skills/nemo-retriever/references/setup.md | Adds --extract-tables --table-output-format markdown to the default PDF ingest command with clear explanation of why it matters; setup section is clean. |
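
The two-pass workflow and call budget described for SKILL.md can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `PlannedQuery`, `plan_queries`, and `HARD_STOP_CALLS` are hypothetical names, the choice of hybrid mode for disambiguating queries is an assumption, and reading "hard-stop only at 15+" as a cap of 15 planned calls is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumption: "hard-stop only at 15+" is read here as "never plan more than 15 calls".
HARD_STOP_CALLS = 15


@dataclass(frozen=True)
class PlannedQuery:
    """One retriever call: the query text, the retrieval mode, and top-k."""
    text: str
    mode: str  # "hybrid" for the semantic pass, "sparse" for the lexical pass
    top_k: int = 10


def plan_queries(
    question: str,
    named_terms: Sequence[str],
    disambiguators: Sequence[str] = (),
) -> List[PlannedQuery]:
    """Build the call plan: one semantic pass, one lexical pass per named term,
    then any optional disambiguating queries."""
    plan = [PlannedQuery(question, "hybrid")]

    # Lexical pass: one sparse query per distinct named term or figure.
    seen = set()
    for term in named_terms:
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            plan.append(PlannedQuery(cleaned, "sparse"))

    # Assumption: disambiguating queries reuse hybrid mode; the source does not say.
    plan.extend(PlannedQuery(text, "hybrid") for text in disambiguators)

    # Enforce the hard stop by truncating the plan.
    return plan[:HARD_STOP_CALLS]
```

For a question with two distinct named terms and one disambiguating query, this yields four planned calls, which sits at the low end of the 4–8 range the summary calls typical.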
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Agent
participant Retriever
Note over Agent,Retriever: Query Turn (new two-pass flow)
Agent->>Retriever: semantic pass — full question, --hybrid --top-k 10
Retriever-->>Agent: evidence JSON (dense + BM25 fusion)
loop Per named term / figure
Agent->>Retriever: lexical pass — exact term, --retrieval-mode sparse --top-k 10
Retriever-->>Agent: evidence JSON (BM25 only)
end
opt Ambiguous candidates (e.g. multiple "Level 3" tables)
Agent->>Retriever: disambiguating query — "consolidated total Level 3 assets"
Retriever-->>Agent: evidence JSON
end
Note over Agent: Ground every figure to a source line
Note over Agent: Copy verbatim, prefer prose over table cell
Agent->>Agent: Compose and emit answer (STOP)
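
The grounding step at the end of the diagram ("ground every figure to a source line", "copy verbatim") could be checked mechanically along these lines. This is a hedged sketch only: `ungrounded_figures` is a hypothetical helper, the evidence is assumed to be available as a list of plain-text lines, and the figure pattern covers simple numbers, thousands separators, decimals, `$` and `%` rather than every numeric format.

```python
import re
from typing import List

# Matches figures such as 3, 2023, 1,234.5, $1,234.5 and 12%.
_FIGURE = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def ungrounded_figures(answer: str, evidence_lines: List[str]) -> List[str]:
    """Return figures from the answer that appear verbatim in no evidence line."""
    missing = []
    for figure in _FIGURE.findall(answer):
        # Require the figure to stand alone in the evidence, so that "$1,234"
        # is not counted as grounded by "$1,234.5".
        pattern = re.compile(
            r"(?<!\d)(?<!\d[.,])" + re.escape(figure) + r"(?![.,]?\d)"
        )
        if not any(pattern.search(line) for line in evidence_lines):
            missing.append(figure)
    return missing
```

An empty result means every figure in the draft answer was copied verbatim from some evidence line; a non-empty result lists the figures that would need another query or a correction before the answer is emitted.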
Comments Outside Diff (1)
skills/nemo-retriever/references/cli/query.md, line 27-30
Stale re-query guidance contradicts SKILL.md's new "query until sure" approach
`query.md` still instructs the agent to re-query "ONLY when the top evidence doesn't yet answer … or when `coverage.thin_spots` flags a miss." That is the old conservative behavior this PR explicitly replaces. SKILL.md now says to always run a lexical sparse pass per named term upfront, regardless of `thin_spots`, and to re-query freely to disambiguate similar tables. Any agent that reads this file for fallback detail (e.g. because its first results looked thin) will receive an affirmative prohibition ("only when…") that causes it to revert to under-querying, directly undoing the correctness gain the PR delivers.
Reviews (4): Last reviewed commit: "address comments"
mahikaw left a comment
lgtm, added one minor suggestion to further disambiguate the query flow.
Description
This skill lifts answer correctness while keeping cost 30% lower than baseline.
Before -> After (n=3, vidore finance 100 sample dataset)
baseline vs retriever (no skill) vs retriever + skill
Checklist