Introducing Mem0 V3
Single-pass hierarchical distillation. Multi-signal retrieval. Benchmarked across LoCoMo, LongMemEval, and BEAM.
Executive Summary
Mem0's new algorithm is benchmarked at production scale across LoCoMo, LongMemEval, and BEAM (1M and 10M tokens).
Biggest wins:
+53.6 on assistant memory recall (LongMemEval)
+29.6 on temporal queries (LoCoMo)
64.1 / 48.2 on BEAM at 1M / 10M
Results are reported at two cutoffs: top200 (~7K tokens, ~1s) and top50 (~1.7K tokens, <0.9s). We report latency and token budgets alongside accuracy, and we call on the industry to adopt this as standard practice for judging production viability. The eval framework is open source.
LongMemEval
LongMemEval evaluates memory across single-session and multi-session contexts, including knowledge updates and temporal reasoning.
BEAM
BEAM evaluates memory systems at 1M and 10M token scales across ten task categories including preference following, temporal reasoning, and contradiction resolution. It is the only public benchmark that operates at context volumes production AI agents actually encounter.
LoCoMo
LoCoMo tests single-hop, multi-hop, open-domain, and temporal memory recall across conversational sessions.
All results compare the previous Mem0 against Memory-1 using single-pass retrieval: one retrieval call, one answer, no agentic loops. The full evaluation framework is open-sourced on GitHub.
Multi-signal retrieval
The retrieval stack now runs three scoring passes in parallel and fuses the results: semantic similarity, keyword matching, and entity matching. The combined score outperformed every individual signal in every category we tested.
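To make the fusion step concrete, here is a minimal sketch of one common way to combine several ranked result lists: reciprocal-rank fusion. The actual scoring and fusion inside Mem0 may differ; the function name, memory IDs, and the `k` constant below are illustrative assumptions, not Mem0's API.

```python
from collections import defaultdict

def fuse_scores(semantic, keyword, entity, k=60):
    """Combine three ranked result lists with reciprocal-rank fusion (RRF).

    Each input is a list of memory IDs ordered best-first. An ID's fused
    score is the sum of 1 / (k + rank + 1) over the lists it appears in,
    so items ranked highly by multiple signals rise to the top.
    """
    fused = defaultdict(float)
    for ranking in (semantic, keyword, entity):
        for rank, mem_id in enumerate(ranking):
            fused[mem_id] += 1.0 / (k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical per-signal rankings for one query:
semantic = ["m3", "m1", "m7"]
keyword  = ["m1", "m3", "m9"]
entity   = ["m1", "m7", "m2"]
print(fuse_scores(semantic, keyword, entity))  # "m1" ranks first: it appears in all three lists
```

Rank-based fusion like this sidesteps the problem that the three signals produce scores on incomparable scales, which is one reason a fused ranking can beat any single signal.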

Agent-generated facts are now first-class
Mem0 now retains facts generated by agents, not just users. When an agent confirms an action or provides a recommendation, that information is stored with equal weight.
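A minimal sketch of what "equal weight" means in storage terms: facts carry a source tag but the same default weight regardless of whether a user stated them or an agent produced them. The `Fact` and `MemoryStore` classes below are illustrative assumptions, not Mem0's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    text: str
    source: str          # "user" or "agent"
    weight: float = 1.0  # same default weight for both sources

class MemoryStore:
    def __init__(self):
        self.facts = []

    def add(self, text, source):
        # Agent-confirmed actions and recommendations are stored
        # with the same weight as user-stated facts.
        self.facts.append(Fact(text=text, source=source))

store = MemoryStore()
store.add("User prefers aisle seats", source="user")
store.add("Booked the requested flight for March 3", source="agent")
assert all(f.weight == 1.0 for f in store.facts)
```

Keeping the source tag while equalizing the weight lets retrieval treat both kinds of facts uniformly while still letting downstream consumers distinguish who asserted what.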
