Retrieval in retrieval.py chains three stages: LLM tool selection, storage calls, and LLM answer synthesis. This favors answer quality, but every request pays for at least two LLM calls, so latency is high.
Acceptance criteria:
- Add a /search fast path that returns ranked profile/summary/temporal/snippet/code hits without LLM synthesis.
- Make LLM answer generation opt-in via answer=true.
- Cache profile catalogs and retrieval plans.
- Track p50/p95/p99 latency per retrieval mode.
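A minimal sketch of how these criteria could fit together, assuming a Python service. `search`, `rank_hits`, `synthesize`, `LatencyTracker`, `TTLCache`, and the mode labels `"search"`/`"answer"` are all hypothetical names, not taken from retrieval.py. Ranking and synthesis are injected as callables so the fast path never touches the LLM unless `answer` is true.

```python
import math
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple


class LatencyTracker:
    """Hypothetical: records per-mode latencies (ms) and reports nearest-rank percentiles."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    def record(self, mode: str, millis: float) -> None:
        self._samples[mode].append(millis)

    def percentile(self, mode: str, p: float) -> float:
        data = sorted(self._samples[mode])
        if not data:
            raise ValueError(f"no samples for mode {mode!r}")
        # Nearest-rank: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p * len(data) / 100))
        return data[rank - 1]

    def summary(self, mode: str) -> Dict[str, float]:
        return {f"p{p}": self.percentile(mode, p) for p in (50, 95, 99)}


class TTLCache:
    """Hypothetical time-based cache, e.g. for profile catalogs and retrieval plans."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip recomputation
        value = compute()
        self._entries[key] = (now, value)
        return value


def search(
    query: str,
    *,
    answer: bool = False,
    rank_hits: Callable[[str], List[Any]],
    synthesize: Callable[[str, List[Any]], str],
    tracker: LatencyTracker,
) -> Dict[str, Any]:
    """Hypothetical handler: ranked hits always, LLM synthesis only when answer=True."""
    mode = "answer" if answer else "search"
    start = time.perf_counter()
    result: Dict[str, Any] = {"hits": rank_hits(query)}
    if answer:
        result["answer"] = synthesize(query, result["hits"])
    tracker.record(mode, (time.perf_counter() - start) * 1000)
    return result
```

Keeping raw samples in memory is the simplest way to get exact percentiles but grows without bound; a long-running service would more likely use a windowed buffer or a histogram from its metrics library.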