Introducing Mem0 V3

Single-pass hierarchical distillation. Multi-signal retrieval. Benchmarked across LoCoMo, LongMemEval, and BEAM.

Executive Summary

Mem0's new algorithm is benchmarked at production scale across LoCoMo, LongMemEval, and BEAM (1M and 10M tokens).

Biggest wins:

  • +53.6 on assistant memory recall (LongMemEval)

  • +29.6 on temporal queries (LoCoMo)

  • 64.1 / 48.2 on BEAM at 1M / 10M

Results are reported at two retrieval cutoffs: top200 (~7K tokens, ~1s) and top50 (~1.7K tokens, <0.9s). We report latency and token budgets alongside accuracy, and we call on the industry to adopt this as standard practice for assessing production viability. The evaluation framework is open source.

LongMemEval

LongMemEval evaluates memory across single-session and multi-session contexts, including knowledge updates and temporal reasoning.

LongMemEval scores at the top200 cutoff:

Category                      Mem0 v2   Mem0 v3
Single-session (user)            94.3      97.1
Single-session (assistant)       46.4     100.0
Single-session (preference)      76.7      96.7
Knowledge update                 79.5      96.2
Temporal reasoning               51.1      93.2
Multi-session                    70.7      86.5

BEAM

BEAM evaluates memory systems at 1M and 10M token scales across ten task categories including preference following, temporal reasoning, and contradiction resolution. It is the only public benchmark that operates at context volumes production AI agents actually encounter.

BEAM scores by task category:

Category                   Mem0 v3 (1M)   Mem0 v3 (10M)
Preference Following               88.3            90.4
Instruction Following              85.2            82.5
Information Extraction             70.0            56.3
Knowledge Update                   65.0            75.0
Multi Session Reasoning            65.2            26.1
Summarization                      63.5            46.9
Temporal Reasoning                 61.8            16.3
Event Ordering                     53.6            20.2
Abstention                         52.5            40.0
Contradiction Resolution           35.7            32.5

LoCoMo

LoCoMo tests single-hop, multi-hop, open-domain, and temporal memory recall across conversational sessions.

LoCoMo scores by question type:

Category      Mem0 v2   Mem0 v3
Single-hop       76.6      92.3
Multi-hop        70.2      93.3
Open-domain      57.3      76.0
Temporal         63.2      92.8

All results compare the previous Mem0 against Memory-1 under single-pass retrieval: one retrieval call, one answer, no agentic loops. The full evaluation framework is open-sourced on GitHub.
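To make the single-pass protocol concrete, here is a minimal sketch of an evaluation loop that records accuracy together with token budget and latency. It is not the open-sourced framework itself. The `retrieve`, `answer`, and `judge` callables are hypothetical stand-ins, the whitespace token count is only a rough proxy for a real tokenizer, and timing only the retrieval call is an assumption.

```python
import statistics
import time


def evaluate(questions, retrieve, answer, judge, top_k):
    """Single-pass evaluation: one retrieval call and one answer call per question."""
    latencies, token_counts, correct = [], [], 0
    for q in questions:
        start = time.perf_counter()
        context = retrieve(q["question"], top_k)  # exactly one retrieval call
        latencies.append(time.perf_counter() - start)  # assumption: time retrieval only

        # Rough token proxy: whitespace-separated words in the retrieved memories.
        token_counts.append(sum(len(chunk.split()) for chunk in context))

        prediction = answer(q["question"], context)  # exactly one answer call, no loops
        correct += bool(judge(prediction, q["expected"]))

    return {
        "accuracy": 100.0 * correct / len(questions),
        "mean_tokens": statistics.mean(token_counts),
        "median_latency_s": statistics.median(latencies),
    }


# Hypothetical stubs so the sketch runs end to end.
def stub_retrieve(question, top_k):
    return ["Alice lives in Berlin"][:top_k]


def stub_answer(question, context):
    return "Berlin" if any("Berlin" in chunk for chunk in context) else "unknown"


def stub_judge(prediction, expected):
    return prediction == expected


report = evaluate(
    [{"question": "Where does Alice live?", "expected": "Berlin"}],
    stub_retrieve, stub_answer, stub_judge, top_k=50,
)
```

Running the same loop once with `top_k=200` and once with `top_k=50` would yield the two cutoffs side by side, each with its own token and latency figures.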

WHAT’S NEW

Multi-signal retrieval

The retrieval stack now runs three scoring passes in parallel and fuses the results: semantic similarity, keyword matching, and entity matching. The combined score outperformed every individual signal in every category we tested.
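The idea of running three scorers in parallel and fusing them can be sketched as follows. This is an illustration under stated assumptions, not Mem0's implementation: the source does not describe the fusion rule, so the weighted sum and its weights are invented for the example, the bag-of-words cosine stands in for embedding similarity, and the capitalized-word heuristic stands in for a real entity extractor.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def semantic_score(query, doc):
    # Stand-in for embedding similarity: cosine over bag-of-words counts.
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def keyword_score(query, doc):
    # Fraction of distinct query terms that also appear in the document.
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q) if q else 0.0


def entities(text):
    # Naive entity guess: capitalized words (hypothetical heuristic).
    return set(re.findall(r"\b[A-Z][a-z]+\b", text))


def entity_score(query, doc):
    q, d = entities(query), entities(doc)
    return len(q & d) / len(q) if q else 0.0


SIGNALS = {"semantic": semantic_score, "keyword": keyword_score, "entity": entity_score}
WEIGHTS = {"semantic": 0.5, "keyword": 0.3, "entity": 0.2}  # assumed, not from the source


def retrieve(query, memories, top_k=2):
    def run(name):
        # Score every memory with one signal.
        return name, [SIGNALS[name](query, m) for m in memories]

    # Run the three scoring passes in parallel, one worker per signal.
    with ThreadPoolExecutor(max_workers=len(SIGNALS)) as pool:
        per_signal = dict(pool.map(run, SIGNALS))

    # Fuse with a weighted sum; every signal already lies in [0, 1].
    fused = [
        sum(WEIGHTS[s] * per_signal[s][i] for s in SIGNALS)
        for i in range(len(memories))
    ]
    ranked = sorted(zip(fused, memories), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


memories = [
    "Alice moved to Berlin in March",
    "The user prefers dark mode",
    "Bob booked a flight to Paris",
]
top = retrieve("Where did Alice move?", memories, top_k=1)
```

In this toy query the keyword and entity signals both match on "Alice", while the semantic stand-in contributes a smaller score. That shows how a fused score can rank a memory higher than any single signal would on its own.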

Agent-generated facts are now first-class

Mem0 now retains facts generated by agents, not just users. When an agent confirms an action or provides a recommendation, that information is stored with equal weight.
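As a rough illustration of what storing agent facts "with equal weight" could look like, the sketch below keeps facts from both sides of a conversation in one store. The `MemoryFact` and `MemoryStore` types, the `source` field, and the numeric `weight` are all hypothetical. The source does not describe the storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryFact:
    text: str
    source: str          # "user" or "agent" (hypothetical field)
    weight: float = 1.0  # same default weight regardless of source


@dataclass
class MemoryStore:
    facts: list = field(default_factory=list)

    def ingest(self, turns):
        # Keep facts from every turn, not only the user's.
        for role, text in turns:
            self.facts.append(MemoryFact(text=text, source=role))

    def by_source(self, source):
        return [f for f in self.facts if f.source == source]


store = MemoryStore()
store.ingest([
    ("user", "I need a table for four on Friday."),
    ("agent", "Booked a table for four at 7pm on Friday."),
])
```

A later question such as "What time is my reservation?" can only be answered from the agent's turn, which is the kind of recall the single-session (assistant) category measures.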

What we're building next

Temporal Abstraction

Representing not just what happened, but how events relate over time. BEAM 10M scores define the current frontier.

Background Memory

Distillation and retrieval running asynchronously as infrastructure, so agents don't spend cycles managing their own context.

Higher-Order Patterns

Behavioral patterns, preference shifts, and evolving relationships that can only be inferred from many data points over time.
