Introducing Mem0 V3

Single-pass hierarchical distillation. Multi-signal retrieval. Benchmarked across LoCoMo, LongMemEval, and BEAM.

Executive Summary

Mem0's new algorithm is benchmarked at production scale across LoCoMo, LongMemEval, and BEAM (1M and 10M tokens).

Biggest wins:

+53.6 on assistant memory recall (LongMemEval)

+29.6 on temporal queries (LoCoMo)

64.1 / 48.2 on BEAM at 1M / 10M

Results are reported at two cutoffs: top200 (~7K tokens, ~1s) and top50 (~1.7K tokens, <0.9s). We report latency and token budgets alongside accuracy, and call on the industry to adopt this as standard practice for demonstrating production viability. The eval framework is open-source.
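To make the reporting practice concrete, here is a minimal sketch of summarizing a benchmark run with accuracy, mean context tokens, and median retrieval latency side by side. The `QueryResult` type, the `summarize` helper, and the sample numbers are all hypothetical and are not taken from the open-source eval framework.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class QueryResult:
    correct: bool        # whether the answer was judged correct
    context_tokens: int  # tokens of retrieved memory passed to the model
    latency_s: float     # retrieval latency in seconds


def summarize(results: list[QueryResult]) -> dict[str, float]:
    """Report accuracy together with its token and latency cost."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        "accuracy_pct": 100.0 * sum(r.correct for r in results) / len(results),
        "mean_tokens": mean(r.context_tokens for r in results),
        "median_latency_s": median(r.latency_s for r in results),
    }


# Made-up numbers for illustration only; these are not benchmark data.
toy_run = [
    QueryResult(True, 6800, 0.95),
    QueryResult(False, 7200, 1.10),
    QueryResult(True, 7000, 1.00),
    QueryResult(True, 7000, 0.90),
]
report = summarize(toy_run)
print(report)
```

Running the same summary once per cutoff (for example top200 and top50) would put the accuracy cost of a smaller token budget in a single table.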

LongMemEval

LongMemEval evaluates memory across single-session and multi-session contexts, including knowledge updates and temporal reasoning.

LongMemEval scores by category (top 200):

| Category | Mem0 v2 | Mem0 v3 |
|---|---|---|
| Single-session (user) | 94.3 | 97.1 |
| Single-session (assistant) | 46.4 | 100.0 |
| Single-session (preference) | 76.7 | 96.7 |
| Knowledge update | 79.5 | 96.2 |
| Temporal reasoning | 51.1 | 93.2 |
| Multi-session | 70.7 | 86.5 |

BEAM

BEAM evaluates memory systems at 1M and 10M token scales across ten task categories including preference following, temporal reasoning, and contradiction resolution. It is the only public benchmark that operates at context volumes production AI agents actually encounter.

BEAM scores by category:

| Category | Mem0 v3 (1M) | Mem0 v3 (10M) |
|---|---|---|
| Preference Following | 88.3 | 90.4 |
| Instruction Following | 85.2 | 82.5 |
| Information Extraction | 70.0 | 56.3 |
| Knowledge Update | 65.0 | 75.0 |
| Multi Session Reasoning | 65.2 | 26.1 |
| Summarization | 63.5 | 46.9 |
| Temporal Reasoning | 61.8 | 16.3 |
| Event Ordering | 53.6 | 20.2 |
| Abstention | 52.5 | 40.0 |
| Contradiction Resolution | 35.7 | 32.5 |

LoCoMo

LoCoMo tests single-hop, multi-hop, open-domain, and temporal memory recall across conversational sessions.

LoCoMo scores by category:

| Category | Mem0 v2 | Mem0 v3 |
|---|---|---|
| Single-hop | 76.6 | 92.3 |
| Multi-hop | 70.2 | 93.3 |
| Open-domain | 57.3 | 76.0 |
| Temporal | 63.2 | 92.8 |

All results compare the previous Mem0 against Memory-1 under single-pass retrieval: one retrieval call, one answer, no agentic loops. The full evaluation framework is open-sourced on GitHub.

WHAT’S NEW

Multi-signal retrieval

The retrieval stack now runs three scoring passes in parallel and fuses the results: semantic similarity, keyword matching, and entity matching. The combined score outperformed every individual signal across every category we tested.
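As a rough illustration of the idea, the sketch below runs three toy scorers in parallel and fuses their rankings with reciprocal rank fusion (RRF). The page does not say how Mem0 fuses signals, so RRF is a swapped-in technique chosen for simplicity. The scorers are crude stand-ins (character trigrams instead of embeddings, capitalized words instead of an entity extractor), and every name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical memory store used only for this example.
MEMORIES = {
    "m1": "Alice moved to Berlin in March",
    "m2": "Alice prefers vegetarian restaurants",
    "m3": "Bob booked a flight to Berlin",
}


def _trigrams(text: str) -> set[str]:
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def _entities(text: str) -> set[str]:
    # Crude assumption: any capitalized word counts as an entity.
    return {w for w in text.split() if w[:1].isupper()}


def semantic_score(query: str, text: str) -> float:
    # Stand-in for embedding similarity: Jaccard overlap of character trigrams.
    a, b = _trigrams(query), _trigrams(text)
    return len(a & b) / len(a | b) if a | b else 0.0


def keyword_score(query: str, text: str) -> float:
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q) if q else 0.0


def entity_score(query: str, text: str) -> float:
    q = _entities(query)
    return len(q & _entities(text)) / len(q) if q else 0.0


def rank(scorer, query: str) -> list[str]:
    scored = {mid: scorer(query, text) for mid, text in MEMORIES.items()}
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scored, key=lambda mid: (-scored[mid], mid))


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per item.
    fused: dict[str, float] = {}
    for ranking in rankings:
        for position, mid in enumerate(ranking, start=1):
            fused[mid] = fused.get(mid, 0.0) + 1.0 / (k + position)
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def retrieve(query: str) -> list[tuple[str, float]]:
    scorers = [semantic_score, keyword_score, entity_score]
    # Run the three scoring passes concurrently, then fuse their rankings.
    with ThreadPoolExecutor(max_workers=len(scorers)) as pool:
        rankings = list(pool.map(lambda s: rank(s, query), scorers))
    return rrf_fuse(rankings)


fused = retrieve("When did Alice move to Berlin")
for memory_id, score in fused:
    print(memory_id, round(score, 4), MEMORIES[memory_id])
```

Rank-based fusion sidesteps the problem that the three signals produce scores on different scales. A weighted sum of normalized scores would be another reasonable choice.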

Agent-generated facts are now first-class

Mem0 now retains facts generated by agents, not just users. When an agent confirms an action or provides a recommendation, that information is stored with equal weight.
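One way to picture this is a store that records the source of each fact while giving both sources the same default weight. This is a minimal sketch under that assumption. `Fact` and `FactStore` are hypothetical names for illustration and are not Mem0's API or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    text: str
    source: str          # "user" or "agent"
    weight: float = 1.0  # same default weight for both sources


class FactStore:
    """Toy in-memory store that keeps user and agent facts side by side."""

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def add(self, text: str, source: str) -> Fact:
        if source not in ("user", "agent"):
            raise ValueError(f"unknown source: {source!r}")
        fact = Fact(text=text, source=source)
        self._facts.append(fact)
        return fact

    def by_source(self, source: str) -> list[Fact]:
        return [f for f in self._facts if f.source == source]

    def all(self) -> list[Fact]:
        return list(self._facts)


store = FactStore()
store.add("Wants a table for two on Friday", source="user")
store.add("Confirmed booking at 7pm on Friday", source="agent")
store.add("Recommended the tasting menu", source="agent")

for fact in store.all():
    print(fact.source, fact.weight, fact.text)
```

Keeping the source as metadata rather than as a filter means a later query such as "what time is my booking?" can be answered from the agent's confirmation.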

What we're building next

Temporal Abstraction

Representing not just what happened, but how events relate over time. BEAM 10M scores define the current frontier.

Background Memory

Distillation and retrieval running asynchronously as infrastructure, so agents don't spend cycles managing their own context.

Higher-Order Patterns

Behavioral patterns, preference shifts, and evolving relationships that can only be inferred from many data points over time.
