Add raw search fast path with code domain support#174
Conversation
There was a problem hiding this comment.
Code Review
This PR expands the memory search capabilities by adding a "code" domain for repository-level searches and introducing a fast raw_search path that avoids LLM planning. Key architectural additions include a latency tracking system for performance monitoring and caching mechanisms for retrieval plans and profile catalogs. The review feedback highlights several critical improvements: addressing a potential TypeError when rounding null scores, implementing bounded caches to prevent memory leaks, maintaining proper encapsulation of private methods, and enhancing the robustness of concurrent tool execution through better error handling.
| SourceRecord( | ||
| domain=s.domain, | ||
| content=s.content, | ||
| score=round(s.score, 3), |
There was a problem hiding this comment.
The score attribute of a SourceRecord can be None (as handled in the sorting logic on line 728). Attempting to call round(None, 3) will raise a TypeError. It is safer to provide a default value of 0.0 if the score is missing.
| score=round(s.score, 3), | |
| score=round(s.score, 3) if s.score is not None else 0.0, |
| self._profile_catalog_cache: Dict[ | ||
| str, Tuple[float, List[Dict[str, str]], List[Any]] | ||
| ] = {} | ||
| self._retrieval_plan_cache: Dict[str, List[Dict[str, Any]]] = {} |
There was a problem hiding this comment.
The _profile_catalog_cache and _retrieval_plan_cache are implemented as unbounded dictionaries. In a long-running API process, these will grow indefinitely as different users and queries are processed, potentially leading to memory exhaustion. It is recommended to use a cache with a maximum size or an eviction policy (e.g., cachetools.TTLCache or a simple LRU implementation).
| data = SearchResponse(results=all_results, total=len(all_results)) | ||
| answer = "" | ||
| if req.answer: | ||
| answer = await pipeline._answer_from_sources(query=req.query, sources=records) |
| if include_answer: | ||
| answer = await self._answer_from_sources(query=query, sources=sources) | ||
|
|
||
| confidence = min(1.0, len(sources) * 0.15) if sources else 0.1 |
There was a problem hiding this comment.
| code_results = await asyncio.gather( | ||
| code_pipeline._execute_tool( | ||
| tool_name="search_symbols", | ||
| tool_args={"query": req.query, "repo": req.repo}, | ||
| repo=req.repo, | ||
| top_k=req.top_k, | ||
| user_id=user_id, | ||
| ), | ||
| code_pipeline._execute_tool( | ||
| tool_name="search_files", | ||
| tool_args={"query": req.query, "repo": req.repo}, | ||
| repo=req.repo, | ||
| top_k=req.top_k, | ||
| user_id=user_id, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
If any of the code retrieval tool calls fail, the entire search request will return a 500 error. For a more robust search experience, consider using asyncio.gather(..., return_exceptions=True) and filtering out failed results, allowing the user to see memory hits even if the code domain search is temporarily unavailable.
8aa3426 to
bb08f07
Compare
|
Updated this PR to address the review points:
Validation: |
There was a problem hiding this comment.
I rechecked the Gemini items and they look addressed. One remaining edge case I noticed is in src/pipelines/retrieval.py::_format_tool_results: API serialization now normalizes None scores, but answer=true can still fail before serialization because answer_from_sources() formats sources via _format_tool_results(), which does rec.score > 0 / {rec.score:.2f}.
If any backend returns SourceRecord(score=None), raw search without answer works, but raw search with answer=true can raise there. Can we normalize the score inside _format_tool_results() too, e.g. score = rec.score or 0.0, before formatting?
After that, this looks good to me.
Ankit-Kotnala
left a comment
There was a problem hiding this comment.
Overall this looks good and the Gemini review items appear addressed.
I left one blocking edge-case comment around _format_tool_results(): answer=true can still fail if any backend returns SourceRecord(score=None), because the answer formatting path uses rec.score > 0 / {rec.score:.2f} before API serialization normalizes scores.
Once that score normalization is handled in _format_tool_results(), I’m happy with this PR.
|
Fixed the remaining missing-score edge case in Added regression coverage for Validation: .venv/bin/python -m pytest tests/integration/test_retrieval_pipeline.py tests/api/test_dependencies_and_routes.py -q
# 10 passed
git diff --check |
Addresses #163.
This adds a low-latency raw search path that returns ranked memory hits without the agentic LLM planning/synthesis round, while keeping answer generation optional through
answer=true.What changed:
RetrievalPipeline.raw_searchfor direct profile, temporal, summary, and snippet retrieval without tool selection./v1/memory/searchto optionally include code hits by querying the existing code retrieval pipeline's native symbol/file search tools.answer=truesynthesis after raw hits are collected, without a second planning step.Compared with the currently open #173, this implementation also wires the requested
codedomain into the raw search endpoint.Validation: