Add raw search fast path with code domain support#174

Open
michael-schvarcz wants to merge 3 commits into XortexAI:main from michael-schvarcz:codex/raw-search-fast-path

Conversation

@michael-schvarcz

Addresses #163.

This adds a low-latency raw search path that returns ranked memory hits without the agentic LLM planning/synthesis round, while keeping answer generation optional through answer=true.

What changed:

  • Adds RetrievalPipeline.raw_search for direct profile, temporal, summary, and snippet retrieval without tool selection.
  • Extends /v1/memory/search to optionally include code hits by querying the existing code retrieval pipeline's native symbol/file search tools.
  • Adds answer=true synthesis after raw hits are collected, without a second planning step.
  • Caches profile catalogs and agentic retrieval plans.
  • Tracks bounded p50/p95/p99 latency snapshots by retrieval mode.
  • Adds tests proving raw search bypasses LLM tool planning, answer mode only synthesizes after hits, and cached plans are reused.
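
To illustrate the bounded p50/p95/p99 latency snapshots described above, a tracker along these lines keeps a fixed number of recent samples per retrieval mode and computes nearest-rank percentiles. This is a minimal sketch with hypothetical names, not the PR's actual implementation:

```python
from collections import defaultdict, deque


class LatencyTracker:
    """Bounded per-mode latency samples with percentile snapshots."""

    def __init__(self, max_samples: int = 1000) -> None:
        # deque(maxlen=...) keeps memory bounded by evicting the oldest samples
        self._samples: dict[str, deque[float]] = defaultdict(
            lambda: deque(maxlen=max_samples)
        )

    def record(self, mode: str, latency_ms: float) -> None:
        self._samples[mode].append(latency_ms)

    def snapshot(self, mode: str) -> dict[str, float]:
        data = sorted(self._samples[mode])
        if not data:
            return {}

        def pct(p: float) -> float:
            # nearest-rank percentile over the sorted sample window
            idx = min(len(data) - 1, int(p / 100 * len(data)))
            return data[idx]

        return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}
```

Because the deque is bounded, the snapshot always reflects only the most recent `max_samples` requests per mode, which is what makes the tracker safe in a long-running API process.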

Compared with the currently open #173, this implementation also wires the requested code domain into the raw search endpoint.

Validation:

.venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -q
9 passed

git diff --check
passed


@gemini-code-assist (Bot) left a comment


Code Review

This PR expands the memory search capabilities by adding a "code" domain for repository-level searches and introducing a fast raw_search path that avoids LLM planning. Key architectural additions include a latency tracking system for performance monitoring and caching mechanisms for retrieval plans and profile catalogs. The review feedback highlights several critical improvements: addressing a potential TypeError when rounding null scores, implementing bounded caches to prevent memory leaks, maintaining proper encapsulation of private methods, and enhancing the robustness of concurrent tool execution through better error handling.

Comment thread src/api/routes/memory.py Outdated
SourceRecord(
    domain=s.domain,
    content=s.content,
    score=round(s.score, 3),

Severity: high

The score attribute of a SourceRecord can be None (as handled in the sorting logic on line 728). Attempting to call round(None, 3) will raise a TypeError. It is safer to provide a default value of 0.0 if the score is missing.

Suggested change
score=round(s.score, 3),
score=round(s.score, 3) if s.score is not None else 0.0,

Comment thread src/pipelines/retrieval.py Outdated
Comment on lines +172 to +175
self._profile_catalog_cache: Dict[
    str, Tuple[float, List[Dict[str, str]], List[Any]]
] = {}
self._retrieval_plan_cache: Dict[str, List[Dict[str, Any]]] = {}

Severity: high

The _profile_catalog_cache and _retrieval_plan_cache are implemented as unbounded dictionaries. In a long-running API process, these will grow indefinitely as different users and queries are processed, potentially leading to memory exhaustion. It is recommended to use a cache with a maximum size or an eviction policy (e.g., cachetools.TTLCache or a simple LRU implementation).
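
A simple bounded LRU cache of the kind suggested here can be sketched with `OrderedDict` (a hypothetical stand-in for `cachetools.TTLCache` or whatever LRU-style cache the PR head actually uses):

```python
from collections import OrderedDict
from typing import Any


class LRUCache:
    """Bounded mapping that evicts the least-recently-used entry on overflow."""

    def __init__(self, maxsize: int = 256) -> None:
        self._maxsize = maxsize
        self._data: OrderedDict[str, Any] = OrderedDict()

    def get(self, key: str, default: Any = None) -> Any:
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # evict the oldest entry
```

With a fixed `maxsize`, the per-user plan and catalog caches stay bounded no matter how many distinct queries a long-running API process serves.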

Comment thread src/api/routes/memory.py Outdated
data = SearchResponse(results=all_results, total=len(all_results))
answer = ""
if req.answer:
    answer = await pipeline._answer_from_sources(query=req.query, sources=records)

Severity: medium

This line accesses _answer_from_sources, which is a private method of the RetrievalPipeline class. To maintain proper encapsulation, this method should be made public (e.g., renamed to answer_from_sources) if it is intended to be used outside the class.

Comment thread src/pipelines/retrieval.py Outdated
if include_answer:
    answer = await self._answer_from_sources(query=query, sources=sources)

confidence = min(1.0, len(sources) * 0.15) if sources else 0.1

Severity: medium

The confidence calculation logic here (len(sources) * 0.15) is inconsistent with the logic used in the agentic run method on line 308 (len(sources) * 0.2). Consider unifying this logic or defining the multiplier as a constant to ensure consistent confidence scoring across different retrieval modes.
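
Unifying the multiplier behind a shared constant and helper, as the reviewer suggests, might look like this sketch (constant and function names are assumed, not taken from the PR):

```python
from typing import Any, List

# Hypothetical shared constants so raw and agentic modes score identically.
SOURCE_CONFIDENCE_STEP = 0.15
MIN_CONFIDENCE = 0.1


def confidence_from_sources(sources: List[Any]) -> float:
    """One confidence formula for every retrieval mode."""
    if not sources:
        return MIN_CONFIDENCE
    return min(1.0, len(sources) * SOURCE_CONFIDENCE_STEP)
```

Both the raw path and the agentic `run` method would then call the same helper, so a future tuning of the multiplier changes every mode at once.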

Comment thread src/api/routes/memory.py
Comment on lines +709 to +724
code_results = await asyncio.gather(
    code_pipeline._execute_tool(
        tool_name="search_symbols",
        tool_args={"query": req.query, "repo": req.repo},
        repo=req.repo,
        top_k=req.top_k,
        user_id=user_id,
    ),
    code_pipeline._execute_tool(
        tool_name="search_files",
        tool_args={"query": req.query, "repo": req.repo},
        repo=req.repo,
        top_k=req.top_k,
        user_id=user_id,
    ),
)

Severity: medium

If any of the code retrieval tool calls fail, the entire search request will return a 500 error. For a more robust search experience, consider using asyncio.gather(..., return_exceptions=True) and filtering out failed results, allowing the user to see memory hits even if the code domain search is temporarily unavailable.

@michael-schvarcz force-pushed the codex/raw-search-fast-path branch from 8aa3426 to bb08f07 on May 11, 2026 at 18:43
@michael-schvarcz
Author

Updated this PR to address the review points:

  • Normalizes missing source scores through a shared helper before API serialization.
  • Keeps raw memory search resilient when one requested backend fails by using asyncio.gather(..., return_exceptions=True) and returning healthy domain results.
  • The current head already uses bounded LRU-style caches, public answer_from_sources / confidence_from_sources helpers, and the shared confidence multiplier.

Validation: .venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -> 9 passed.

Comment thread src/pipelines/retrieval.py Outdated
Contributor


I rechecked the Gemini items and they look addressed. One remaining edge case I noticed is in src/pipelines/retrieval.py::_format_tool_results: API serialization now normalizes None scores, but answer=true can still fail before serialization because answer_from_sources() formats sources via _format_tool_results(), which does rec.score > 0 / {rec.score:.2f}.

If any backend returns SourceRecord(score=None), raw search without answer works, but raw search with answer=true can raise there. Can we normalize the score inside _format_tool_results() too, e.g. score = rec.score or 0.0, before formatting?
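
The proposed normalization inside the formatting path could look like this sketch (`format_source_line` is a hypothetical stand-in for the relevant part of `_format_tool_results`):

```python
def format_source_line(rec_score, content: str) -> str:
    """Normalize a missing score before the comparison and formatting steps."""
    score = rec_score or 0.0  # SourceRecord(score=None) -> 0.0
    marker = "*" if score > 0 else "-"
    return f"{marker} [{score:.2f}] {content}"
```

Both the `rec.score > 0` comparison and the `{rec.score:.2f}` format would then see a float even when a backend omits the score.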

After that, this looks good to me.

Contributor

@Ankit-Kotnala left a comment


Overall this looks good and the Gemini review items appear addressed.

I left one blocking edge-case comment around _format_tool_results(): answer=true can still fail if any backend returns SourceRecord(score=None), because the answer formatting path uses rec.score > 0 / {rec.score:.2f} before API serialization normalizes scores.

Once that score normalization is handled in _format_tool_results(), I’m happy with this PR.

@michael-schvarcz
Author

Fixed the remaining missing-score edge case in _format_tool_results() by normalizing None to 0.0 before score comparison/formatting, so answer=true no longer crashes before API serialization.

Added regression coverage for answer_from_sources() with SourceRecord(score=None).

Validation:

.venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -q
# 10 passed
git diff --check
