Adds Agentic Retrieval into harness #2267
Greptile Summary
This PR adds opt-in agentic (ReAct) retrieval evaluation to the harness, routing BEIR and audio-recall benchmarks through a graph-backed LLM agent when
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/query/agentic_options.py | New shared validation helper module for agentic options; SPDX copyright year shows 2024-25 but the file is new in 2026. |
| nemo_retriever/src/nemo_retriever/harness/beir_runner.py | Splits dense and agentic retrieval paths; introduces an assert guard for the query_plan is not None invariant that would be silently dropped in optimized builds. |
| nemo_retriever/src/nemo_retriever/query/agentic.py | Adds run_agentic_beir_evaluation, run_agentic_audio_recall_evaluation, and agentic_beir_retrieve; validation consolidated into AgenticRetrievalConfig.__post_init__ via shared helpers. |
| nemo_retriever/src/nemo_retriever/query/workflow.py | Extracts build_agentic_config so both the single-query CLI and batch harness BEIR path share one config-derivation point; clean refactor. |
| nemo_retriever/src/nemo_retriever/cli/pipeline/main.py | Adds nine agentic Typer options, _run_agentic_evaluation helper, and validation block; service-mode guard is correctly registered. |
| nemo_retriever/tests/test_harness_agentic_eval.py | New test file covering all three harness integration points; patches at the correct seams and verifies both happy-path wiring and structured failure on invalid config. |
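The `assert` concern noted for beir_runner.py can be illustrated with a minimal sketch. `QueryPlan` and both function names below are hypothetical stand-ins, not the PR's code; the point is only that `python -O` strips `assert` statements, while an explicit `raise` survives.

```python
from typing import Optional


class QueryPlan:
    """Hypothetical stand-in for the harness query plan."""

    def create_retriever(self) -> str:
        return "dense-retriever"


def dense_retrieve_with_assert(query_plan: Optional[QueryPlan]) -> str:
    # Under `python -O` this assert is compiled out, so a None plan would
    # only surface later as an AttributeError on `create_retriever`.
    assert query_plan is not None, "query_plan is required for dense retrieval"
    return query_plan.create_retriever()


def dense_retrieve_with_check(query_plan: Optional[QueryPlan]) -> str:
    # An explicit raise is kept regardless of optimization flags.
    if query_plan is None:
        raise ValueError("query_plan is required for dense retrieval")
    return query_plan.create_retriever()
```

If the invariant must hold in production runs, the explicit check is the safer of the two forms.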
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[harness run_benchmark] --> B[resolve_query_plan]
A --> C[build_query_request]
B --> D[run_beir_queries]
C --> D
D --> E{query_request.agentic.enabled?}
E -->|Yes| F[_agentic_retrieve]
E -->|No| G[_dense_retrieve]
F --> F1[build_agentic_config]
F1 --> F2[agentic_beir_retrieve]
F2 --> F3[AgenticRetriever.retrieve]
F3 --> F4[_agentic_result_to_ranked_doc_ids]
F4 --> H
G --> G1[query_plan.create_retriever]
G1 --> G2[per-query retriever loop]
G2 --> G3[build_beir_run_from_hits]
G3 --> H
H[build_beir_run_from_ranked_doc_ids] --> I[compute_beir_metrics]
I --> J[write artifacts: beir_metrics.json / beir_run.trec / query_results.jsonl]
Reviews (8): Last reviewed commit: "Add agentic (ReAct) retrieval and agenti..."
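The fork shown in the flowchart can be sketched roughly as follows. This is an illustration under assumptions, not the harness implementation: the function names, the placeholder latencies, and the `Run` shape (query id mapped to per-document scores) are all hypothetical. It only shows two retrieval strategies returning the same `(latencies, run)` pair so that scoring and artifact writing can stay shared.

```python
from typing import Dict, List, Tuple

# Assumed shape: query_id -> {doc_id: score}.
Run = Dict[str, Dict[str, float]]
RetrieveResult = Tuple[List[float], Run]


def dense_retrieve(queries: Dict[str, str]) -> RetrieveResult:
    # Placeholder for the per-query retriever loop.
    latencies = [0.01 for _ in queries]
    run = {qid: {"doc-dense": 1.0} for qid in queries}
    return latencies, run


def agentic_retrieve(queries: Dict[str, str]) -> RetrieveResult:
    # Placeholder for one concurrent agentic batch.
    latencies = [0.5 for _ in queries]
    run = {qid: {"doc-agentic": 1.0} for qid in queries}
    return latencies, run


def run_queries(queries: Dict[str, str], agentic_enabled: bool) -> RetrieveResult:
    # The only branch point: pick the retrieval strategy.
    retrieve = agentic_retrieve if agentic_enabled else dense_retrieve
    latencies, run = retrieve(queries)
    # Shared metric computation and artifact writing would follow here.
    return latencies, run
```

Keeping the return type uniform means the downstream code does not need to know which path produced the run.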
PR #2270 (harness revamp) replaced the subprocess-based harness/run.py and deleted its tests, while this branch (#2267) had added agentic (ReAct) BEIR evaluation to that old harness. Re-port it onto the new in-process harness:

- Agentic options now flow via the benchmark spec query section (query.agentic*, registered in QUERY_OVERRIDE_PATHS) into QueryRequest.agentic, replacing the deleted CLI-flag/HarnessConfig path.
- run_beir_queries forks only at retrieval: shared dataset load + scoring + artifact writing; _dense_retrieve (per-query loop) vs _agentic_retrieve (one concurrent ReAct batch), each returning a uniform (latencies, run).
- Factor build_agentic_config/build_agentic_retriever (workflow.py) and agentic_beir_retrieve (agentic.py), shared by the CLI single-query path and the harness batch path.
- Strip the now-dead HarnessConfig.agentic_* fields (the new run path no longer uses HarnessConfig).
- Add tests/test_harness_agentic_eval.py; fix docs/cli/benchmarking.md override examples to the query.agentic* namespace.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Mahika Wason <mwason@nvidia.com>
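A rough sketch of how dotted override paths such as query.agentic* might be applied to a benchmark spec, assuming an allow-list of registered paths. The allow-list contents, the `query.agentic_max_steps` key, and `apply_override` are hypothetical; the actual QUERY_OVERRIDE_PATHS structure is not shown on this page.

```python
from typing import Any, Dict

# Hypothetical allow-list; the real QUERY_OVERRIDE_PATHS may look different.
ALLOWED_OVERRIDE_PATHS = {"query.agentic", "query.agentic_max_steps"}


def apply_override(spec: Dict[str, Any], path: str, value: Any) -> None:
    """Set a dotted-path key in a nested dict, rejecting unregistered paths."""
    if path not in ALLOWED_OVERRIDE_PATHS:
        raise KeyError(f"override path not registered: {path}")
    *parents, leaf = path.split(".")
    node = spec
    for key in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(key, {})
    node[leaf] = value


spec: Dict[str, Any] = {"query": {"top_k": 10}}
apply_override(spec, "query.agentic", True)
```

Rejecting unregistered paths up front turns a mistyped override into an immediate error instead of a silently ignored key.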
9b5e999 to 31b7f7d
Adds an opt-in agentic retrieval mode (ReAct loop -> RRF fusion -> source-priority selection, reusing the existing Retriever/LanceDB; the standard dense path is unchanged) and wires it into the revamped, in-process harness as a BEIR evaluation strategy.

- Agentic query options flow via the benchmark spec query section (query.agentic*, registered in QUERY_OVERRIDE_PATHS) into QueryRequest.agentic.
- run_beir_queries forks only at retrieval: shared dataset load + scoring + artifact writing; dense per-query loop vs one concurrent agentic ReAct batch, each returning a uniform (latencies, run).
- build_agentic_config/build_agentic_retriever (query/workflow.py) and agentic_beir_retrieve (query/agentic.py) centralize agentic config derivation, shared by the CLI single-query path and the harness batch path.
- Tests in tests/test_harness_agentic_eval.py; docs in docs/cli/benchmarking.md (query.agentic* override examples).

Signed-off-by: Mahika Wason <mwason@nvidia.com>
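For readers unfamiliar with the RRF step named above, here is a minimal sketch of standard Reciprocal Rank Fusion. It is the textbook formulation (each document scores the sum of 1 / (k + rank) across rankings, with the commonly used k = 60), not necessarily how this PR implements it; `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked doc-id lists with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in many lists accumulate the most score.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


fused = rrf_fuse([["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]])
print(fused)  # ['b', 'a', 'c', 'd']
```

In an agentic loop, each sub-query issued by the agent would contribute one ranking to the fusion.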
ed2c597 to 0f5011c
Minimal BEIR override example:

```bash
retriever harness run --dataset jp20 --preset single_gpu \
```
we might not have presets still
this is the entry point to: "retriever pipeline run /files"
we are deprecating it this release so right now it has to be backwards compatible, but we are cutting off the oxygen to it slowly.
you can see in the actual code how a lot of it is just invoking the new CLI tools `retriever ingest` and `retriever query` in a single file
entry point for harness related things is all in the harness folder nemo_retriever/harness
Summary
Adds opt-in agentic retrieval evaluation to harness runs.
- Adds `agentic: true` harness config support and forwards it through the harness execution path.
- … requested metric depth, runtime bounds, and endpoint-aware temperature limits.
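The kind of validation mentioned here (and described in the file overview as consolidated into `AgenticRetrievalConfig.__post_init__` via shared helpers) could look roughly like the sketch below. Every field name, default, and limit is a hypothetical placeholder chosen for illustration; the PR's real fields and bounds are not shown on this page.

```python
from dataclasses import dataclass


def require_positive(name: str, value: int) -> None:
    # Shared helper: reusable by any config that has positive-int options.
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")


def require_in_range(name: str, value: float, low: float, high: float) -> None:
    if not low <= value <= high:
        raise ValueError(f"{name} must be in [{low}, {high}], got {value}")


@dataclass(frozen=True)
class AgenticConfigSketch:
    """Hypothetical config; not the PR's AgenticRetrievalConfig."""

    top_k: int = 10
    metric_depth: int = 10        # deepest k requested by the metrics
    max_steps: int = 8            # runtime bound on agent iterations
    temperature: float = 0.0
    max_temperature: float = 1.0  # assumed to vary by endpoint

    def __post_init__(self) -> None:
        require_positive("max_steps", self.max_steps)
        require_in_range("temperature", self.temperature, 0.0, self.max_temperature)
        if self.top_k < self.metric_depth:
            raise ValueError("top_k must cover the requested metric depth")
```

Validating in `__post_init__` means an invalid configuration fails at construction time, before any queries run.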
Validation
… `132 passed` output.
Follow-ups
terminology.