Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

A scalable memory-centric algorithm that dynamically extracts and retrieves key conversational facts—delivering 26% relative accuracy gains over OpenAI on the LOCOMO benchmark, with 91% lower p95 latency and 90% fewer tokens.

Executive Summary

Benchmarking Mem0

AI systems today forget key facts over extended interactions, breaking context and eroding trust. Simply enlarging LLM context windows only delays the problem—models get slower, costlier, and still overlook critical details.

Mem0 addresses this problem head-on with a scalable memory architecture that dynamically extracts, consolidates, and retrieves important information from conversations. An enhanced variant, Mem0ᵍ, layers in a graph-based store to capture richer, multi-session relationships.

On the LOCOMO benchmark, Mem0 consistently outperforms six leading memory approaches, achieving:

  • 26% higher response accuracy compared to OpenAI’s memory

  • 91% lower p95 latency compared to the full-context method

  • 90% savings in token usage, making memory practical and affordable at scale

By making persistent, structured memory practical at scale, Mem0 paves the way for AI agents that don’t just react, but truly remember, adapt, and collaborate over time.

Approach

Under the hood

A two-phase memory pipeline that extracts, consolidates, and retrieves only the most salient conversational facts—enabling scalable, long-term reasoning.

Mem0’s pipeline consists of two phases—Extraction and Update—ensuring only the most relevant facts are stored and retrieved, minimizing tokens and latency.

In the Extraction Phase, the system ingests three context sources—the latest exchange, a rolling summary, and the m most recent messages—and uses an LLM to extract a concise set of candidate memories. A background module refreshes the long-term summary asynchronously, so inference never stalls.
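
To make the flow concrete, here is a minimal sketch of the extraction step in Python; the prompt wording, the call_llm helper, and the default window size m are illustrative assumptions, not Mem0's actual implementation.

```python
# Minimal sketch of the Extraction Phase (illustrative only: the prompt wording,
# the call_llm() helper, and the window size m are assumptions, not Mem0's API).
from typing import Callable, List

def extract_candidate_memories(
    latest_exchange: str,
    rolling_summary: str,
    recent_messages: List[str],
    call_llm: Callable[[str], str],  # any function mapping a prompt to a completion
    m: int = 10,                     # how many recent messages to include as context
) -> List[str]:
    context = "\n".join(recent_messages[-m:])
    prompt = (
        "Conversation summary:\n" + rolling_summary + "\n\n"
        "Recent messages:\n" + context + "\n\n"
        "Latest exchange:\n" + latest_exchange + "\n\n"
        "List the salient, self-contained facts worth remembering, one per line."
    )
    completion = call_llm(prompt)
    # Treat each non-empty line of the completion as one candidate memory.
    return [line.lstrip("-• ").strip() for line in completion.splitlines() if line.strip()]
```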

In the Update Phase, each new fact is compared to the top s similar entries in the vector database. The LLM then chooses one of four operations:

  • ADD new memories

  • UPDATE existing entries

  • DELETE contradictions

  • NOOP if no change is needed

These steps keep the memory store coherent, non-redundant, and instantly ready for the next query.
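
The update logic can be sketched in a few lines; the vector-store methods (search, add, update, delete) and the call_llm helper below are assumed interfaces for illustration, not Mem0's actual API.

```python
# Illustrative sketch of the Update Phase; the vector-store methods and the
# call_llm() helper are assumed interfaces, not Mem0's actual API.
def apply_memory_update(fact: str, store, call_llm, s: int = 5) -> None:
    # Compare the new fact against the top-s most similar stored memories.
    similar = store.search(fact, top_k=s)          # assumed to return [(memory_id, text), ...]
    listing = "\n".join(f"{mem_id}: {text}" for mem_id, text in similar) or "(none)"
    decision = call_llm(
        "New fact:\n" + fact + "\n\n"
        "Existing memories:\n" + listing + "\n\n"
        "Reply with exactly one of: ADD | UPDATE <id> <new text> | DELETE <id> | NOOP"
    ).strip()

    if decision.startswith("ADD"):
        store.add(fact)                            # genuinely new information
    elif decision.startswith("UPDATE"):
        _, mem_id, new_text = decision.split(maxsplit=2)
        store.update(mem_id, new_text)             # refine an existing entry
    elif decision.startswith("DELETE"):
        _, mem_id = decision.split(maxsplit=1)
        store.delete(mem_id)                       # drop a contradicted memory
    # NOOP (or anything unrecognized): leave the store unchanged
```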

Mem0ᵍ enhances Mem0 by storing memories as a directed, labeled graph. In the Extraction Phase, incoming messages feed into an Entity Extractor to identify entities as nodes and a Relations Generator to infer labeled edges—transforming text into a structured graph.

During the Update Phase, a Conflict Detector flags overlapping or contradictory nodes/edges, and an LLM-powered Update Resolver decides whether to add, merge, invalidate, or skip graph elements. The resulting knowledge graph enables efficient subgraph retrieval and semantic triplet matching for complex multi-hop, temporal, and open-domain reasoning.
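
As a rough illustration of the underlying data model, the sketch below stores memories as labeled (source, relation, target) edges with a toy conflict rule; in Mem0ᵍ itself, entity extraction, relation generation, and conflict resolution are LLM-driven rather than rule-based.

```python
# Toy data model for the graph memory idea behind Mem0ᵍ (illustrative only; the real
# entity extractor, relations generator, and conflict detector are LLM-driven components).
from dataclasses import dataclass, field
from typing import List, Set

@dataclass(frozen=True)
class Edge:
    source: str      # entity node, e.g. "Alice"
    relation: str    # labeled edge, e.g. "works_at"
    target: str      # entity node, e.g. "Acme Corp"

@dataclass
class GraphMemory:
    edges: Set[Edge] = field(default_factory=set)

    def add_triplet(self, source: str, relation: str, target: str) -> None:
        # Toy conflict rule: an existing edge with the same (source, relation) but a
        # different target is treated as superseded and invalidated before adding.
        self.edges = {e for e in self.edges
                      if not (e.source == source and e.relation == relation and e.target != target)}
        self.edges.add(Edge(source, relation, target))

    def neighbors(self, entity: str) -> List[Edge]:
        # Simple subgraph retrieval: every edge touching the entity.
        return [e for e in self.edges if entity in (e.source, e.target)]

graph = GraphMemory()
graph.add_triplet("Alice", "works_at", "Acme Corp")
graph.add_triplet("Alice", "works_at", "Globex")   # supersedes the earlier fact
print(graph.neighbors("Alice"))                     # -> [Edge(source='Alice', relation='works_at', target='Globex')]
```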

Results

Performance Highlights

Rigorous LOCOMO benchmarking shows Mem0 delivers across accuracy, speed, and efficiency.

+26%

More accurate vs. OpenAI Memory

91%

Lower p95 latency vs. full-context

90%

Token cost savings vs. full-context

On the LOCOMO benchmark, Mem0 delivers a 26% relative uplift in overall LLM-as-a-Judge score over OpenAI’s memory feature—66.9% versus 52.9%—underscoring its superior factual accuracy and coherence. Beyond quality, Mem0’s selective retrieval pipeline slashes p95 latency by 91% (1.44 s vs. 17.12 s) by operating over concise memory facts instead of reprocessing entire chat histories. This focused approach also drives a 90% reduction in token consumption, requiring only ~1.8K tokens per conversation compared to 26K for full-context methods. Together, these results demonstrate how Mem0 balances state-of-the-art reasoning, real-time responsiveness, and cost efficiency—making long-term conversational memory practical at scale.
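
For readers who want to verify the headline percentages against the raw figures quoted above, they fall out of simple relative differences (a back-of-the-envelope check, not additional benchmark data):

```python
# Back-of-the-envelope check of the headline figures from the numbers quoted above.
accuracy_gain = (66.9 - 52.9) / 52.9          # ≈ 0.265 → ~26% relative accuracy uplift
latency_drop  = (17.12 - 1.44) / 17.12        # ≈ 0.916 → ~91% lower p95 latency
token_saving  = (26_000 - 1_800) / 26_000     # ≈ 0.93  → >90% fewer tokens per conversation
print(f"{accuracy_gain:.1%}  {latency_drop:.1%}  {token_saving:.1%}")
```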

This chart compares each method’s search latency (median p50 in pink, tail p95 in green) against its reasoning accuracy (blue bars). Mem0 achieves 66.9% accuracy with a median search latency of 0.15 s and a p95 latency of 0.20 s, keeping memory retrieval firmly in real-time territory. By contrast, a standard RAG setup manages only 61.0% accuracy at 0.26 s median and 0.70 s p95 search times. The graph-enhanced variant Mem0ᵍ further lifts accuracy to 68.4% with 0.48 s median and 0.66 s p95 search latencies. By extracting and indexing only the most salient facts, Mem0 delivers near–state-of-the-art long-term reasoning while minimizing search overhead.

End-to-end measurements (memory retrieval + answer generation) showcase Mem0’s production readiness. A full-context approach may reach 72.9% accuracy, but suffers from a 9.87 s median and 17.12 s p95 latency. In contrast, Mem0 achieves 66.9% accuracy with just a 0.71 s median and 1.44 s p95 end-to-end response time. Its graph-enhanced variant Mem0ᵍ nudges accuracy to 68.4% while maintaining a 1.09 s median and 2.59 s p95 latency. By extracting and indexing only the most relevant facts, Mem0 delivers near–state-of-the-art long-term reasoning at true production speed.

Conclusion

By delivering a 26% accuracy boost, 91% lower p95 latency, and 90% token savings, Mem0 demonstrates that persistent, structured memory can be both powerful and practical at scale. These results unlock a future where AI agents don’t just react—but truly remember: preserving user preferences over weeks, adapting to evolving contexts, and maintaining coherent, personalized interactions in domains from healthcare and education to enterprise support. Building on this foundation, the next generation of memory systems can explore hierarchical and multimodal representations, on-device memory, and dynamic consolidation mechanisms—paving the way for AI that genuinely grows and evolves alongside its users.
