Benchmarking Mem0's
token-efficient memory algorithm

Benchmarked across LoCoMo, LongMemEval, and BEAM
Powered by single-pass hierarchical extraction and multi-signal retrieval

Summary

Summary

Mem0's new token-efficient memory algorithm hits 91.6 on LoCoMo, 93.4 on LongMemEval, and 64.1/48.6 on BEAM (1M/10M) while averaging under 7,000 tokens per retrieval call. Full-context approaches on the same benchmarks use 25,000+. High accuracy at 3-4x lower token cost.

Mem0's new token-efficient memory algorithm hits 91.6 on LoCoMo, 93.4 on LongMemEval, and 64.1/48.6 on BEAM (1M/10M) while averaging under 7,000 tokens per retrieval call. Full-context approaches on the same benchmarks use 25,000+. High accuracy at 3-4x lower token cost.

BENCHMARKS

BENCHMARKS

LoCoMo

1,540 questions • 5 categories

85.0

85.0

OVERALL

6950

6950

Mean Tokens

76.6
92.3
70.2
93.3
57.3
76.0
63.2
92.8
Single-hop
Multi-hop
Open-domain
Temporal
Old
New
LOCOMO

LongMemEval

500 questions • 6 categories

92.0

92.0

OVERALL

6780

6780

Mean Tokens

94.3
97.1
46.4
100.0
76.7
96.7
79.5
96.2
51.1
93.2
70.7
86.5
Single-session (user)
Single-session (assistant)
Single-session (preference)
Knowledge update
Temporal reasoning
Multi-session
Old
New
LONGMEMEVAL

BEAM

BEAM 1M: 700 questions • 35 conversations

BEAM 10M: 200 questions • 10 conversations

62.0

62.0

OVERALL (1M)

45.0

45.0

OVERALL (10M)

6710

6710

Mean Tokens (1m)

6910

6910

Mean Tokens (10m)

88.3
90.4
85.2
82.5
70.0
56.3
65.0
75.0
65.2
26.1
63.5
46.9
61.8
16.3
53.6
20.2
52.5
40.0
35.7
32.5
Preference Following
Instruction Following
Information Extraction
Knowledge Update
Multi Session Reasoning
Summarization
Temporal Reasoning
Event Ordering
Abstention
Contradiction Resolution
1M
10M
BEAM

All results are Old Algorithm vs. New Algorithm.

Full evaluation framework is open-sourced on GitHub.

WHAT’S NEW

WHAT’S NEW

Single pass ADD-only extraction

Mem0 now treats agent-generated facts as first-class, closing a significant gap in memory coverage. When an agent confirms an action or provides a recommendation, that information is stored with equal weight.

Multi-signal retrieval

Retrieval stack now runs three scoring passes in parallel and fuses the results: Semantic similarity, Keyword matching, and Entity matching. The combined score outperforms individual signal scores.

What we're building next

Temporal abstraction

Representing how events relate over time, not just what happened. BEAM 10M scores define the current frontier.

Temporal abstraction

Representing how events relate over time, not just what happened. BEAM 10M scores define the current frontier.

Cross-session structure

Modeling how information evolves across sessions. Requires connecting scattered interactions into coherent timelines.

Cross-session structure

Modeling how information evolves across sessions. Requires connecting scattered interactions into coherent timelines.

Agent-native memory

Extraction and retrieval running asynchronously as infrastructure, so agents don’t spend cycles managing their own context.

Agent-native memory

Extraction and retrieval running asynchronously as infrastructure, so agents don’t spend cycles managing their own context.