Overview
agent_mode adds an orchestration layer on top of episodic long-term-memory
search. Instead of relying on one direct vector search, it can:
- Route a query to the best retrieval strategy.
- Rewrite or split queries when one pass is not enough.
- Aggregate and rerank evidence before returning episodes.
Enable it by setting agent_mode=true in memory search APIs.

Why Retrieval-Agent Mode Helps
A single direct retrieval works best when one query maps cleanly to one relevant cluster of episodes. Many real user questions are harder than that. Retrieval-agent mode helps when queries require:
- Multi-hop dependency chains where later facts depend on earlier facts.
- Relationship traversal across entities (person -> organization -> role).
- Mixed constraints (time, location, role) that are easier to retrieve in steps.
- Evidence completeness checks before stopping.
Where direct retrieval struggles on multi-hop relationship queries
One similarity search query tries to retrieve all required evidence in one shot. For relationship chains, this often fails because the intermediate entity is unknown at query start.
Example query: What is the current company of the spouse of the CEO of Acme?
Required hops:
- Find CEO of Acme.
- Find spouse of that CEO.
- Find current company of the spouse.
How retrieval-agent mode fixes this pattern
- ToolSelectAgent detects query shape and routes to chain-based retrieval.
- ChainOfQueryAgent iteratively rewrites to the next missing hop.
- Each iteration retrieves new evidence, then performs sufficiency checking.
- Evidence from all hops is accumulated and reranked before returning.
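The hop-by-hop pattern can be illustrated with a toy fact store. This is a minimal sketch and not MemMachine code: the FACTS table, the lookup and resolve_chain helpers, and every name in the data (Dana Reyes, Sam Reyes, Globex) are hypothetical and made up for illustration. It shows why each hop needs the answer from the previous one.

```python
from typing import List, Optional

# Toy illustration only: FACTS and lookup() are hypothetical stand-ins for
# episodic memory and a retrieval call; they are not part of MemMachine.
FACTS = {
    ("Acme", "ceo"): "Dana Reyes",
    ("Dana Reyes", "spouse"): "Sam Reyes",
    ("Sam Reyes", "current_company"): "Globex",
}


def lookup(entity: str, relation: str) -> Optional[str]:
    """Return the stored fact for (entity, relation), or None if absent."""
    return FACTS.get((entity, relation))


def resolve_chain(start: str, relations: List[str]) -> List[str]:
    """Follow relations one hop at a time, feeding each answer into the next hop."""
    evidence = []
    entity = start
    for relation in relations:
        answer = lookup(entity, relation)
        if answer is None:
            break  # the chain cannot continue without the intermediate entity
        evidence.append(f"{entity} --{relation}--> {answer}")
        entity = answer
    return evidence


hops = resolve_chain("Acme", ["ceo", "spouse", "current_company"])
```

A one-shot lookup such as lookup("Acme", "current_company") finds nothing here, because the final fact is stored under an entity that only becomes known after the first two hops.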
Where It Fits
The retrieval agent is created in packages/server/src/memmachine_server/retrieval_agent/service_locator.py and cached in MemMachine.
It is used only for long-term episodic retrieval.
- Request flow:
  - POST /api/v2/memories/search (or SDK memory.search(..., agent_mode=True))
  - MemMachine.query_search(..., agent_mode=True)
  - MemMachine._search_episodic_memory(...)
  - MemMachine._query_episodic_with_retrieval_agent(...)
  - Retrieval agent do_query(...) with QueryParam(memory=<EpisodicMemory>)
  - EpisodicMemory.query_memory(..., mode=LONG_TERM_ONLY) for long-term search
  - EpisodicMemory.query_memory(..., mode=SHORT_TERM_ONLY) for short-term merge
- agent_mode=false keeps the default EpisodicMemory.query_memory(...) path.
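A request body that takes the agent path might be built as in the sketch below. Only the endpoint path and the agent_mode flag come from this page. The "query" field name and the build_search_body helper are assumptions for illustration, so check the actual request schema before relying on them.

```python
import json

# Only the endpoint path and the agent_mode flag come from this page.
# The "query" field name is an assumption about the request schema.
SEARCH_PATH = "/api/v2/memories/search"


def build_search_body(query: str, agent_mode: bool = False) -> str:
    """Serialize a search request body (hypothetical helper).

    Per this page, agent_mode=True routes the search through the
    retrieval agent; False keeps the default direct path.
    """
    return json.dumps({"query": query, "agent_mode": agent_mode})


body = build_search_body(
    "What is the current company of the spouse of the CEO of Acme?",
    agent_mode=True,
)
```

Note that Python's True serializes to the JSON literal true, which matches the agent_mode=true spelling used for the REST API.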
Core Components
1) Shared Agent API
packages/server/src/memmachine_server/retrieval_agent/common/agent_api.py defines:
- QueryParam: query text, limit, context expansion, optional property filter.
- QueryPolicy: budget/quality policy inputs (currently mostly advisory).
- AgentToolBase: shared orchestration behavior:
  - child tool fan-out (do_query)
  - perf-metric merge (_update_perf_metrics)
  - optional rerank (_do_rerank)
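The shape of this shared API can be sketched as follows. The class and method names (QueryParam, AgentToolBase, do_query, _update_perf_metrics, _do_rerank) come from this page. The field names, signatures, return types, synchronous style, and the StaticTool leaf are assumptions made for illustration and may differ from agent_api.py.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryParam:
    query: str
    limit: int = 5
    expand_context: int = 0  # assumed name for "context expansion"
    property_filter: Optional[dict] = None


class AgentToolBase:
    """Simplified, synchronous stand-in for the shared orchestration base."""

    def __init__(self, children=None,
                 reranker: Optional[Callable[[str, str], float]] = None):
        self.children = children or []
        self.reranker = reranker

    def do_query(self, param: QueryParam):
        # Fan out to every child tool and merge episodes and metrics.
        episodes, metrics = [], {}
        for child in self.children:
            child_episodes, child_metrics = child.do_query(param)
            episodes.extend(child_episodes)
            self._update_perf_metrics(metrics, child_metrics)
        return self._do_rerank(param, episodes), metrics

    def _update_perf_metrics(self, total: dict, part: dict) -> None:
        # Assumes numeric metrics that can be summed across children.
        for key, value in part.items():
            total[key] = total.get(key, 0) + value

    def _do_rerank(self, param: QueryParam, episodes: list) -> list:
        # Reranking is optional: without a reranker, keep the original order.
        if self.reranker is None:
            return episodes[: param.limit]
        ranked = sorted(episodes,
                        key=lambda ep: self.reranker(param.query, ep),
                        reverse=True)
        return ranked[: param.limit]


class StaticTool(AgentToolBase):
    """Hypothetical leaf that returns fixed episodes."""

    def __init__(self, episodes):
        super().__init__()
        self._episodes = episodes

    def do_query(self, param: QueryParam):
        return list(self._episodes), {"memory_search_called": 1}


parent = AgentToolBase(
    children=[StaticTool(["a"]), StaticTool(["bb", "ccc"])],
    reranker=lambda query, episode: len(episode),  # toy score: longer is better
)
episodes, metrics = parent.do_query(QueryParam(query="x", limit=2))
```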
2) MemMachineAgent (Direct Retrieval)
packages/server/src/memmachine_server/retrieval_agent/agents/memmachine_retriever.py
- Calls EpisodicMemory.query_memory(..., mode=LONG_TERM_ONLY) directly.
- Returns episodes plus basic metrics (memory_search_called, memory_retrieval_time).
- No query rewrite/splitting.
3) SplitQueryAgent (Independent Sub-Queries)
packages/server/src/memmachine_server/retrieval_agent/agents/split_query_agent.py
- Uses an LLM prompt to split a complex single-hop query into sub-queries.
- Runs child retrieval for each sub-query.
- Aggregates and reranks the combined result.
- Tracks split queries and token/time metrics.
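The split-then-aggregate flow can be sketched with stubs. This is an illustration under stated assumptions, not the code in split_query_agent.py: split_query stands in for the LLM split prompt, retrieve stands in for child retrieval, and the EPISODES data and keyword-overlap scoring are invented for the example.

```python
EPISODES = [
    "Alice moved to Berlin in 2021",
    "Bob joined Initech as CTO",
    "Alice adopted a cat",
]


def split_query(query: str) -> list:
    """Stand-in for the LLM split prompt: naive split on ' and '."""
    parts = [part.strip(" ?") for part in query.split(" and ")]
    return [part for part in parts if part]


def overlap(query: str, episode: str) -> int:
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(episode.lower().split()))


def retrieve(sub_query: str, limit: int = 2) -> list:
    """Stand-in for child retrieval over the toy EPISODES list."""
    hits = [ep for ep in EPISODES if overlap(sub_query, ep) > 0]
    hits.sort(key=lambda ep: overlap(sub_query, ep), reverse=True)
    return hits[:limit]


def split_and_retrieve(query: str):
    sub_queries = split_query(query)
    combined = []
    for sub_query in sub_queries:
        for episode in retrieve(sub_query):
            if episode not in combined:  # de-duplicate across sub-queries
                combined.append(episode)
    # Rerank the combined set against the original query.
    combined.sort(key=lambda ep: overlap(query, ep), reverse=True)
    metrics = {"queries": sub_queries, "memory_search_called": len(sub_queries)}
    return combined, metrics


episodes, metrics = split_and_retrieve(
    "Where does Alice live and where does Bob work?"
)
```

Each sub-query surfaces episodes the other would miss: the Alice sub-query never matches the Bob episode, and the Bob sub-query never matches the Alice episodes.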
4) ChainOfQueryAgent (Iterative Multi-Hop)
packages/server/src/memmachine_server/retrieval_agent/agents/coq_agent.py
- Iterates up to max_attempts.
- Each iteration:
  - retrieves with current query
  - asks LLM to judge sufficiency + propose next query rewrite
- Stops when:
  - evidence is sufficient, and
  - confidence exceeds configured threshold.
- Aggregates evidence across iterations, then reranks final episodes.
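The iteration and stop condition can be sketched as a small loop. This is a hedged illustration, not the code in coq_agent.py: the retrieve and judge functions are scripted stubs standing in for memory search and the LLM sufficiency prompt, the MEMORY and REWRITES data are invented, the default values for max_attempts and confidence_threshold are arbitrary, and the final rerank step is omitted for brevity.

```python
QUESTION = "What is the current company of the spouse of the CEO of Acme?"

# Hypothetical memory: maps a query string to the episodes it retrieves.
MEMORY = {
    QUESTION: ["Dana Reyes is the CEO of Acme"],
    "Who is the spouse of Dana Reyes?": ["Dana Reyes is married to Sam Reyes"],
    "Where does Sam Reyes work?": ["Sam Reyes works at Globex"],
}
REWRITES = ["Who is the spouse of Dana Reyes?", "Where does Sam Reyes work?"]


def retrieve(query: str) -> list:
    """Stand-in for a long-term memory search."""
    return MEMORY.get(query, [])


def judge(original: str, evidence: list):
    """Scripted stand-in for the LLM: (sufficient, confidence, next_query)."""
    hops_found = len(evidence)
    if hops_found >= 3:
        return True, 0.9, original
    if hops_found == 0:
        return False, 0.0, original
    return False, 0.3, REWRITES[hops_found - 1]


def chain_of_query(query, retrieve, judge,
                   max_attempts=3, confidence_threshold=0.8):
    evidence, queries = [], []
    current = query
    for _ in range(max_attempts):
        queries.append(current)
        for episode in retrieve(current):
            if episode not in evidence:  # accumulate evidence across hops
                evidence.append(episode)
        sufficient, confidence, next_query = judge(query, evidence)
        # Stop only when both conditions hold.
        if sufficient and confidence > confidence_threshold:
            break
        current = next_query
    return evidence, queries


evidence, queries = chain_of_query(QUESTION, retrieve, judge)
```

With max_attempts=1 the same call returns only the first hop's evidence, which shows how the attempt limit bounds cost at the expense of completeness.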
5) ToolSelectAgent (Router)
packages/server/src/memmachine_server/retrieval_agent/agents/tool_select_agent.py
- Uses an LLM prompt to choose exactly one strategy tool:
  - ChainOfQueryAgent
  - SplitQueryAgent
  - MemMachineAgent
- Executes only the selected tool.
- Falls back to a configured default tool if selection is invalid.
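Routing with a fallback can be sketched as below. This is a minimal illustration, not the code in tool_select_agent.py: the choose callable stands in for the LLM selection prompt, the tools are hypothetical callables keyed by agent name, and the route and make_tool helpers are invented for the example. The ChainOfQueryAgent default matches the fallback described on this page.

```python
DEFAULT_TOOL = "ChainOfQueryAgent"

calls = []  # records which tools actually ran


def make_tool(name: str):
    """Build a hypothetical tool that records its invocation."""
    def run(query: str) -> list:
        calls.append(name)
        return [f"{name} result for: {query}"]
    return run


TOOLS = {
    name: make_tool(name)
    for name in ("ChainOfQueryAgent", "SplitQueryAgent", "MemMachineAgent")
}


def route(query: str, choose, tools: dict, default_tool: str = DEFAULT_TOOL):
    """Pick exactly one tool; fall back to the default on an invalid choice."""
    name = choose(query).strip()
    if name not in tools:
        name = default_tool
    episodes = tools[name](query)  # only the selected tool is executed
    return episodes, {"selected_tool": name}


episodes, metrics = route(
    "Who is the CEO of Acme?", lambda q: "MemMachineAgent", TOOLS
)
fallback_episodes, fallback_metrics = route(
    "???", lambda q: "NotATool", TOOLS
)
```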
Assembly in Service Locator
packages/server/src/memmachine_server/retrieval_agent/service_locator.py creates a layered tool tree:
- MemMachineAgent (leaf; direct memory search)
- ChainOfQueryAgent and SplitQueryAgent (both wrap MemMachineAgent)
- ToolSelectAgent (wraps all three as router)
ToolSelectAgent is the top-level agent by default.
Its fallback default tool is ChainOfQueryAgent.
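The layering can be pictured with a small structural sketch. The Tool dataclass and the build_tree and render helpers are hypothetical and only model the parent/child relationships described above. They do not reflect the constructor signatures used in service_locator.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tool:
    """Hypothetical node: an agent name plus the tools it wraps."""
    name: str
    children: List["Tool"] = field(default_factory=list)


def build_tree() -> Tool:
    leaf = Tool("MemMachineAgent")            # direct memory search
    chain = Tool("ChainOfQueryAgent", [leaf])  # wraps the leaf
    split = Tool("SplitQueryAgent", [leaf])    # wraps the leaf
    return Tool("ToolSelectAgent", [chain, split, leaf])  # router over all three


def render(tool: Tool, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, two spaces per level."""
    lines = ["  " * depth + tool.name]
    for child in tool.children:
        lines.extend(render(child, depth + 1))
    return lines


tree_lines = render(build_tree())
```

The same leaf object is shared by all three parents, so every strategy ultimately reaches memory through one direct-search tool.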
Workflow by Query Type
A) Direct single-hop
- ToolSelectAgent picks MemMachineAgent.
- MemMachineAgent runs one long-term-only episodic query.
- Episodes are returned.
B) Single-hop with multiple independent entities/constraints
- ToolSelectAgent picks SplitQueryAgent.
- SplitQueryAgent generates N sub-queries.
- Child retrieval runs for each sub-query.
- Combined episodes are reranked and returned.
C) Multi-hop / dependency chain
- ToolSelectAgent picks ChainOfQueryAgent.
- Initial retrieval runs.
- LLM checks sufficiency and rewrites next query if needed.
- Retrieval and rewriting repeat until the evidence is sufficient or the attempt limit is reached.
- Aggregated episodes are reranked and returned.
Key Features
- Strategy routing: direct vs split vs iterative multi-hop.
- Query decomposition: split and rewrite prompts are customizable.
- Evidence accumulation: especially in chain-of-query mode.
- Unified reranking: final ranking uses configured reranker when applicable.
- Telemetry: each tool contributes performance and token metrics.
Metrics You Get Back
Depending on selected strategy, metrics may include:
- selected_tool
- queries
- memory_search_called
- memory_retrieval_time
- llm_time
- input_token / output_token
- confidence_scores
- evidence
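Because the set of keys depends on the selected strategy, consumers should read metrics defensively. The sketch below assumes the metrics arrive as a plain dict and that the token and call counters are numeric. Both are assumptions, and summarize_metrics is a hypothetical helper, not part of MemMachine.

```python
def summarize_metrics(metrics: dict) -> dict:
    """Summarize a metrics dict without assuming any key is present."""
    return {
        "selected_tool": metrics.get("selected_tool", "unknown"),
        "searches": metrics.get("memory_search_called", 0),
        # Token counters are only expected when an LLM-backed strategy ran.
        "total_tokens": metrics.get("input_token", 0)
        + metrics.get("output_token", 0),
        "llm_used": "llm_time" in metrics,
    }


# Hypothetical payloads, invented for illustration.
direct_summary = summarize_metrics(
    {"memory_search_called": 1, "memory_retrieval_time": 0.02}
)
chain_summary = summarize_metrics(
    {
        "selected_tool": "ChainOfQueryAgent",
        "memory_search_called": 3,
        "llm_time": 1.4,
        "input_token": 900,
        "output_token": 120,
    }
)
```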
Operational Notes and Constraints
- Retrieval-agent mode applies to episodic long-term memory retrieval only.
- Semantic-memory search is unaffected by retrieval-agent routing.
- In agent mode, returned episodic long-term scores are normalized to 1.0 in the final response payload.
- Reranker and LLM quality strongly affect final result quality.
- Child tool errors can propagate unless handled by retry logic.
Configuration and Extension Points
- Enable at query-time with agent_mode=true.
- Tune prompts via extra_params:
  - tool_select_prompt
  - split_prompt
  - combined_prompt (chain-of-query)
- Tune chain behavior:
  - max_attempts
  - confidence_score threshold
- Change top-level strategy by setting retrieval config agent_name (consumed by create_retrieval_agent(...)) for deterministic behavior.

