Add low-latency raw memory search by strongkeep-debug · Pull Request #173 · XortexAI/XMem

strongkeep-debug · 2026-05-11T18:21:08Z

Addresses #163.

This PR turns memory search into a true low-latency path. Raw search now goes through RetrievalPipeline.search_raw and returns ranked profile, temporal, summary, snippet, and code annotation hits without retrieval-plan tool selection. answer=true synthesizes from those already-fetched hits when a caller wants a generated answer, and the root /search alias is wired for clients that need the shorter path.

Area	Change	Evidence
Raw retrieval	Selected domains are searched directly and ranked by score, including the code annotation domain requested by the issue.	`test_raw_search_returns_ranked_hits_without_tool_selection` confirms no tool-selection call is made and verifies a code hit keeps file and symbol metadata.
API contract	`/v1/memory/search` accepts `code` in the domain list and includes it in the default raw search domain set.	`test_memory_search_route_accepts_code_domain` covers the request validator and serialized response shape.
Optional answer	`answer=true` synthesizes from collected raw hits without doing agentic retrieval planning first.	`test_root_search_alias_can_synthesize_answer` covers the alias and answer mode.
Caching and latency	Profile catalogs and retrieval plans are cached for the agentic path, and bounded p50/p95/p99 latency snapshots are recorded for raw, answer, and agentic modes.	`test_retrieval_pipeline_caches_catalog_and_retrieval_plan` covers cache reuse.
Robustness	Raw search now normalizes missing/non-finite backend scores and keeps healthy domain results when another requested domain fails.	`test_raw_search_skips_failed_domains_and_normalizes_scores` and the API route regression cover both pipeline and serialization behavior.

Validation was run locally:

.\.venv\Scripts\python.exe -m pytest -q
50 passed

.\.venv\Scripts\python.exe -m pytest tests\integration\test_retrieval_pipeline.py tests\api\test_memory_search_routes.py -q
12 passed

.\.venv\Scripts\python.exe -m ruff check src\pipelines\retrieval.py src\api\routes\memory.py tests\integration\test_retrieval_pipeline.py tests\api\test_memory_search_routes.py
All checks passed!

git diff --check
passed

gemini-code-assist

Code Review

This pull request introduces a low-latency raw search endpoint and enhances the existing search functionality with optional answer synthesis and latency tracking. Key changes include the implementation of search_raw and answer_from_sources in the retrieval pipeline, the addition of TTL-based caching for profile catalogs and retrieval plans, and the inclusion of detailed latency metrics in search responses. Feedback focuses on optimizing performance by parallelizing domain searches, ensuring the profile catalog retrieval is asynchronous to avoid blocking the event loop, and managing cache memory usage through bounded collections and hashed keys.

gemini-code-assist · 2026-05-11T18:26:41Z

+        if "profile" in domain_set:
+            results.extend(await self._search_profile_raw(query, user_id, top_k))
+        if "temporal" in domain_set:
+            results.extend(await self._search_temporal(query, user_id, top_k))
+        if "summary" in domain_set:
+            results.extend(await self._search_summary(query, user_id, top_k))
+        if "snippet" in domain_set:
+            results.extend(await self._search_snippet(query, user_id, top_k))


The current implementation of search_raw executes searches across different domains sequentially. To achieve true low-latency as intended by this PR, these searches should be executed in parallel using asyncio.gather.

Suggested change

if "profile" in domain_set:

results.extend(await self._search_profile_raw(query, user_id, top_k))

if "temporal" in domain_set:

results.extend(await self._search_temporal(query, user_id, top_k))

if "summary" in domain_set:

results.extend(await self._search_summary(query, user_id, top_k))

if "snippet" in domain_set:

results.extend(await self._search_snippet(query, user_id, top_k))

domain_set = set(domains)

tasks = []

if "profile" in domain_set:

tasks.append(self._search_profile_raw(query, user_id, top_k))

if "temporal" in domain_set:

tasks.append(self._search_temporal(query, user_id, top_k))

if "summary" in domain_set:

tasks.append(self._search_summary(query, user_id, top_k))

if "snippet" in domain_set:

tasks.append(self._search_snippet(query, user_id, top_k))

task_results = await asyncio.gather(*tasks)

results: List[SourceRecord] = [item for sublist in task_results for item in sublist]

gemini-code-assist · 2026-05-11T18:26:41Z

+    def _get_profile_catalog(self, user_id: str):
+        cached = self._profile_catalog_cache.get(user_id)
+        now = time.monotonic()
+        if cached and cached[0] > now:
+            return cached[1], cached[2]
+
+        catalog, results = self._fetch_profile_catalog(user_id)
+        self._profile_catalog_cache[user_id] = (
+            now + _CACHE_TTL_SECONDS,
+            catalog,
+            results,
+        )
+        return catalog, results


The _get_profile_catalog method is synchronous and performs network I/O via _fetch_profile_catalog (which calls vector_store.search_by_metadata). Calling this from an async context like run or search_raw will block the entire event loop, significantly impacting performance and defeating the purpose of a low-latency path. This should be made asynchronous.

Suggested change

def _get_profile_catalog(self, user_id: str):

cached = self._profile_catalog_cache.get(user_id)

now = time.monotonic()

if cached and cached[0] > now:

return cached[1], cached[2]

catalog, results = self._fetch_profile_catalog(user_id)

self._profile_catalog_cache[user_id] = (

now + _CACHE_TTL_SECONDS,

catalog,

results,

)

return catalog, results

async def _get_profile_catalog(self, user_id: str):

cached = self._profile_catalog_cache.get(user_id)

now = time.monotonic()

if cached and cached[0] > now:

return cached[1], cached[2]

# Assuming search_by_metadata is made async or wrapped in an executor

catalog, results = await self._fetch_profile_catalog(user_id)

self._profile_catalog_cache[user_id] = (

now + _CACHE_TTL_SECONDS,

catalog,

results,

)

return catalog, results

gemini-code-assist · 2026-05-11T18:26:41Z

+        self._profile_catalog_cache: Dict[str, tuple[float, List[Dict[str, str]], List[Any]]] = {}
+        self._retrieval_plan_cache: Dict[tuple[str, str, int, str], tuple[float, AIMessage]] = {}
+        self._latency_samples: Dict[str, List[float]] = {}


The caches _profile_catalog_cache and _retrieval_plan_cache are unbounded dictionaries that only expire entries on access. This can lead to a memory leak as entries for users who do not return will persist indefinitely. Consider using a cache with a maximum size and an eviction policy (e.g., cachetools.TTLCache).

gemini-code-assist · 2026-05-11T18:26:41Z

        ]

-        ai_response: AIMessage = await self.model_with_tools.ainvoke(messages)
+        plan_key = (user_id, query.strip(), top_k, catalog_text)


Using the entire catalog_text as part of the cache key for _retrieval_plan_cache can be memory-intensive if the catalog is large. Consider using a hash of the catalog_text instead.

import hashlib catalog_hash = hashlib.sha256(catalog_text.encode()).hexdigest() plan_key = (user_id, query.strip(), top_k, catalog_hash)

strongkeep-debug · 2026-05-11T20:27:37Z

Follow-up after the latest push: 5548b63 now includes the requested code domain in the raw search path, preserves repository/file/symbol/type/severity metadata, and hardens raw search against missing or non-finite backend scores. It also keeps healthy domain results if another requested domain fails.

The earlier performance review items are addressed in the same branch: raw domain searches run concurrently, profile catalog lookup is async, and both caches are bounded with hashed plan keys. Current local verification is 50 passed for the full suite, 12 passed for the targeted retrieval/API tests, touched-file Ruff clean, and git diff --check clean. The current PR label check is green.

Update: GitHub Actions Test Suite is now green as well: Unit, API, and Integration Tests and End-to-End Tests both passed on the current head, and the PR remains mergeable with a clean merge state.

ved015 · 2026-05-13T10:55:55Z

@strongkeep-debug thank you for your contribution pls review the gemini suggestions and resolve them pls make sure to also add comment on the suggestions :)

Add low-latency memory search path

e0f8b9f

strongkeep-debug requested review from ishaanxgupta and ved015 as code owners May 11, 2026 18:21

github-actions Bot added tests api pipelines labels May 11, 2026

gemini-code-assist Bot reviewed May 11, 2026

View reviewed changes

michael-schvarcz mentioned this pull request May 11, 2026

Add raw search fast path with code domain support #174

Open

strongkeep-debug added 2 commits May 11, 2026 11:51

Tighten low-latency retrieval path

5dfca95

Include code annotations in raw search

c06c5fe

strongkeep-debug mentioned this pull request May 12, 2026

Add low-latency raw search path separate from agentic answer synthesis #163

Open

Harden raw search score handling

5548b63

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add low-latency raw memory search#173

Add low-latency raw memory search#173
strongkeep-debug wants to merge 4 commits into
XortexAI:mainfrom
strongkeep-debug:codex/163-search-fast-path

strongkeep-debug commented May 11, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

gemini-code-assist Bot May 11, 2026

Uh oh!

strongkeep-debug commented May 11, 2026 •

edited

Loading

Uh oh!

ved015 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

strongkeep-debug commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

strongkeep-debug commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ved015 commented May 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

strongkeep-debug commented May 11, 2026 •

edited

Loading

strongkeep-debug commented May 11, 2026 •

edited

Loading