You build an agent on your local machine using the Google Agent Development Kit (ADK). It works perfectly. It remembers your name, it recalls that you prefer Python over Java, and it handles complex multi-turn conversations.
Then you deploy it.
You push it to Cloud Run or a Google Kubernetes Engine (GKE) cluster. You scale it to three replicas to handle traffic. The moment a pod restarts or traffic routes to another replica, your agent loses all context.
This is a fundamental architectural constraint of ADK's default configuration. The built-in InMemoryMemoryService stores context in RAM. When the process dies or the request routes to a different instance, the memory is gone. For production agents that need to survive infrastructure updates and scale horizontally, you need external persistent storage.
This guide shows you how to wire Mem0 into Google ADK agents to give them persistent, semantic memory that survives restarts and follows users across sessions, regardless of which instance handles the request.
TL;DR
ADK’s default InMemoryMemoryService stores data in RAM and loses it on restart.
Agents can store and retrieve memory via Python tools (search_memory, save_memory).
Shared memory enables multi-agent coordination without cross-user leakage.
ADK Memory options vs Mem0
So what memory options are available inside ADK today?
Let’s compare the default in-memory implementation, Vertex AI Memory Bank, and Mem0 across persistence, semantic capabilities, multi-agent sharing, and operational complexity.
| Feature | InMemoryMemoryService | Vertex AI Memory Bank | Mem0 |
|---|---|---|---|
| Persistence | No (RAM only) | Yes (managed by Vertex AI) | Yes |
| Survives restarts | No | Yes | Yes |
| Semantic search | No (keyword only) | Yes | Yes |
| Multi-agent sharing | No | Yes (same Agent Engine) | Yes (any agent) |
| Setup complexity | Low | Medium to high | Medium |
| Vendor lock-in | None | Google Cloud | None |
Prerequisites and Setup
Before you wire Mem0 into your ADK agents, you need a Mem0 API key, a Gemini API key for the model, and the `mem0ai` and `google-adk` Python packages (plus `python-dotenv`, since the examples below load the keys from a `.env` file).
Google ADK separates conversation management into three layers: session, state, and memory.
Session stores conversation history for a single thread.
State is temporary key-value data tied to a session.
Memory is meant to persist information across sessions.
The problem is ADK’s default implementation.
InMemoryMemoryService stores everything in a Python dictionary in RAM and relies on keyword matching. According to the ADK documentation on memory, all data is lost when the application restarts.
In distributed environments like GKE with multiple replicas, each instance maintains its own in-memory store. When traffic shifts between replicas or a pod restarts, previously stored memory disappears. This makes the default implementation unsuitable for production deployments.
To fix this, you need a memory layer that runs outside the agent process and is accessible to every instance.
How does Mem0 solve the memory problem for ADK agents?
Mem0 provides a persistent semantic memory layer outside your ADK runtime.
Instead of relying on keyword matching, Mem0 converts memories into vector embeddings. This allows your agent to retrieve information based on meaning rather than exact text.
For example, a user says, “I don’t eat meat” in one session. Later, they ask, “What protein sources work for me?”
Keyword matching fails. The words "meat" and "protein sources" don't overlap.
Semantic search succeeds. It understands that "doesn't eat meat" relates to "protein sources" even though the exact words are different. The embeddings capture meaning, not just exact text matches.
In practice, ADK’s in-memory service checks for exact string matches, while Mem0 retrieves information based on semantic relevance. This allows your agent to recall related concepts even when the user phrases things differently across sessions.
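To make the contrast concrete, here is a deliberately simplified sketch (not Mem0's actual pipeline) showing how a pure keyword-overlap check comes up empty for this exact pair of sentences:

```python
# Simplified illustration: keyword overlap between a stored memory
# and a later query. This is all in-process keyword matching sees.
stored_memory = "I don't eat meat"
query = "What protein sources work for me?"

stored_words = set(stored_memory.lower().split())
query_words = set(query.lower().rstrip("?").split())

# No shared words at all, so keyword matching finds nothing,
# even though the two sentences are clearly related in meaning.
overlap = stored_words & query_words
print("Keyword overlap:", overlap)  # prints an empty set
```

Semantic retrieval sidesteps this entirely because it never compares the raw strings in the first place.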
The integration itself is simple. You access Mem0 through two tool functions and register them with your agent.
The tool function pattern
Mem0 integrates with ADK agents via two Python tool functions. You define search_memory and save_memory, initialize the MemoryClient, and register these functions with the agent’s tools parameter. During conversations, the agent’s LLM decides when to call each tool. Here is the core setup:
```python
from mem0 import MemoryClient
from google.adk.agents import Agent
from dotenv import load_dotenv

# Load environment variables (API keys, etc.)
load_dotenv()

# Initialize Mem0 client
mem0 = MemoryClient()

# --- Memory tool functions ---
def search_memory(query: str, user_id: str) -> str:
    """Retrieve memories matching the query for a user."""
    results = mem0.search(query=query, filters={"user_id": user_id}).get("results", [])
    if results:
        # Return only the top memory, or join more results if you want
        return "\n".join([m["memory"] for m in results[:1]])
    return ""

def save_memory(text: str, user_id: str) -> dict:
    """Save a memory for a user."""
    try:
        mem0.add(text, user_id=user_id)
        return {"status": "success"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# --- Create agent ---
assistant = Agent(
    name="assistant",
    model="gemini-2.5-flash",
    instruction="Use memory tools to personalize responses.",
    tools=[search_memory, save_memory],
)

# --- Example usage ---
if __name__ == "__main__":
    # Save a few memories
    save_memory("I am allergic to peanuts and love spicy food.", user_id="abhay")
    save_memory("I like to travel to Paris.", user_id="abhay")
    save_memory("My favorite color is blue.", user_id="abhay")

    # Retrieve relevant memories
    question = "What food should I avoid?"
    print(f"\nQuestion: {question}")
    print("Found in memory:", search_memory(question, user_id="abhay"))
```
When you call search_memory with a query about dietary preferences, Mem0 can return relevant memories even if the query does not contain the exact words. For example, asking “What food should I avoid?” can surface stored information about allergies or dietary restrictions, such as avoiding peanuts.
This works because Mem0 compares vector embeddings rather than raw strings. The query and stored memories are matched by meaning, allowing related preferences and constraints to surface automatically even when phrasing differs.
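The intuition can be sketched with toy vectors. The numbers below are made up for illustration (real embedding models produce vectors with hundreds of dimensions), but they show how cosine similarity ranks a related memory above an unrelated one:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for three pieces of text
allergy_memory = [0.90, 0.10, 0.20]   # "I am allergic to peanuts"
food_query     = [0.85, 0.15, 0.25]   # "What food should I avoid?"
color_memory   = [0.05, 0.90, 0.10]   # "My favorite color is blue"

# The allergy memory scores far higher against the food query
print(cosine_similarity(food_query, allergy_memory))  # close to 1.0
print(cosine_similarity(food_query, color_memory))    # much lower
```

A vector store simply returns the memories whose embeddings score highest against the query embedding, which is why the allergy memory surfaces for a question that never mentions peanuts.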
These functions are invoked through ADK’s tool system. Because Mem0 stores memory outside the agent process, all instances read and write to the same backend. Memories persist across restarts, and semantic retrieval ensures agents receive contextually relevant information rather than simple keyword matches.
Automatic conversation storage
The pattern above relies on the agent deciding when to call save_memory. For more reliable memory generation, you can automatically store conversations after each agent response without depending on tool calls.
ADK's Runner yields events during execution. When the runner emits an is_final_response event, you can extract the user input and agent response, then store the conversation pair directly in Mem0:
```python
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google import genai
from mem0 import MemoryClient

# Initialize Mem0 client
mem0 = MemoryClient()

async def chat_with_agent(user_input: str, user_id: str) -> str:
    """
    Handle user input with automatic memory storage.

    Args:
        user_input: The user's message
        user_id: Unique identifier for the user

    Returns:
        The agent's response
    """
    # Create session
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="assistant",
        user_id=user_id,
        session_id=f"session_{user_id}",
    )

    # Initialize runner
    runner = Runner(
        agent=assistant,
        app_name="assistant",
        session_service=session_service,
    )

    # Process user message
    content = genai.types.Content(
        role="user",
        parts=[genai.types.Part(text=user_input)],
    )

    # Extract the final response and store the conversation.
    # run_async returns an async generator, so iterate with async for
    # rather than awaiting it directly.
    async for event in runner.run_async(
        user_id=user_id, session_id=session.id, new_message=content
    ):
        if event.is_final_response():
            response = event.content.parts[0].text

            # Store the conversation pair in Mem0
            conversation = [
                {"role": "user", "content": user_input},
                {"role": "assistant", "content": response},
            ]
            mem0.add(conversation, user_id=user_id)
            return response
```
This ensures every interaction is stored. The agent builds a searchable memory corpus without requiring explicit "remember this" commands from users. Each turn becomes context for future sessions.
Later, when the same user asks about upcoming trips, semantic search retrieves the Paris travel plan even if the new query doesn’t mention flights or bookings. The embeddings connect related concepts like “cities I’m visiting” and “trip to Paris” without relying on exact keywords.
How do you implement Mem0 in a multi-agent ADK system?
ADK is designed for multi-agent orchestration. A coordinator delegates tasks to specialist agents. With Mem0, all of these agents share the same external memory layer.
The pattern is simple. The coordinator receives the user query, searches Mem0 for context, then routes the request to the right specialist. Each specialist also has access to search_memory and save_memory, so they can read and write user-specific context.
Shared memory across agent hierarchies
Here's a coordinator setup with specialist agents using AgentTool:
```python
from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool
from mem0 import MemoryClient

# Initialize Mem0 client
mem0 = MemoryClient()

# Define memory tool functions
def search_memory(query: str, user_id: str) -> str:
    """Retrieve memories matching the query for a user."""
    results = mem0.search(query=query, filters={"user_id": user_id}).get("results", [])
    if results:
        return "\n".join([m["memory"] for m in results[:3]])
    return ""

def save_memory(text: str, user_id: str) -> dict:
    """Save a memory for a user."""
    try:
        mem0.add(text, user_id=user_id)
        return {"status": "success"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# Specialist agents with memory tools
travel_agent = Agent(
    name="travel_specialist",
    model="gemini-2.5-flash",
    instruction=(
        "You are a travel planning specialist. "
        "Use search_memory to understand user travel preferences before making recommendations. "
        "Save important preferences using save_memory."
    ),
    tools=[search_memory, save_memory],
)

fitness_agent = Agent(
    name="fitness_advisor",
    model="gemini-2.5-flash",
    instruction=(
        "You are a fitness advisor. "
        "Use search_memory to understand dietary restrictions and fitness goals. "
        "Save workout preferences and health constraints."
    ),
    tools=[search_memory, save_memory],
)

# Coordinator delegates to specialists
coordinator = Agent(
    name="coordinator",
    model="gemini-2.5-flash",
    instruction=(
        "Delegate travel questions to travel_specialist and fitness questions to fitness_advisor. "
        "Use search_memory to understand user context before delegation."
    ),
    tools=[AgentTool(agent=travel_agent), AgentTool(agent=fitness_agent), search_memory],
)
```
If the travel agent saves a memory like “User prefers window seats on long flights,” the fitness agent can later retrieve that information when planning in-flight routines. The coordinator can also access it for future travel queries.
This works because Mem0 runs outside ADK’s session service. All agents query the same backend using user_id. Whether a request hits Pod 1 or Pod 2, they read from the same memory store.
The key idea is simple: memory is infrastructure, not application state. Once it lives outside the agent process, every instance and every agent can access it.
What are the production considerations?
When you move from localhost to Cloud Run or GKE with Mem0, a few operational concerns become relevant. For production deployment patterns on GKE, see the GKE AI Labs tutorial on ADK memory.
Memory scoping and user isolation
Every Mem0 operation requires a user_id parameter. This scopes reads and writes to prevent cross-user data leakage:
```python
# User A's memories
mem0.search(query="preferences", filters={"user_id": "user_a"})

# User B's memories (completely isolated)
mem0.search(query="preferences", filters={"user_id": "user_b"})
```
Mem0 enforces this at the API level. Even if multiple agent instances run simultaneously, each request only retrieves memories tied to its `user_id`.
Always validate and sanitize the `user_id` before sending it to Mem0. Use authenticated identifiers, not raw user input. Otherwise, one user could guess another user’s ID and access their memory. For more on securing AI agent memory, see best practices for memory isolation and access control.
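One way to enforce that is a small validation gate in front of every Mem0 call. The helper below is a sketch, not part of Mem0 or ADK; the `ALLOWED_USER_ID` pattern assumes your auth system issues IDs as short alphanumeric strings, so adjust it to match whatever format you actually use:

```python
import re

# Assumed ID format -- change to whatever your auth system actually issues
ALLOWED_USER_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_user_id(user_id: str) -> str:
    """Reject anything that doesn't look like a server-issued user ID."""
    if not ALLOWED_USER_ID.fullmatch(user_id):
        raise ValueError(f"invalid user_id: {user_id!r}")
    return user_id

def scoped_search(client, query: str, user_id: str):
    """Search Mem0 only after validating the caller-supplied user_id.

    `user_id` should come from your auth layer (for example, a verified
    token subject), never from a request field the client controls.
    """
    return client.search(query=query, filters={"user_id": validate_user_id(user_id)})
```

Routing every read and write through a gate like this means a malformed or guessed ID fails loudly before it ever reaches the memory backend.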
For high-volume systems, consider memory retention policies:
Delete or archive memories older than 90 days for inactive users
Summarize older memories instead of storing every conversation turn
Keep recent interactions and prune outdated context
If latency matters, don’t search memory on every request. Let the agent call search_memory only when needed. That keeps API calls and response time low.
When to use Mem0 vs. ADK's built-in options
With ADK, you can choose between the built-in InMemoryMemoryService, Google's Vertex AI Memory Bank, or an external memory layer like Mem0. InMemoryMemoryService is fine for local development and testing, where losing state on restart is acceptable. Vertex AI Memory Bank fits teams already committed to Vertex AI Agent Engine that want native Google Cloud integration. Mem0 fits when you need vendor-independent persistent memory, deploy across multiple clouds, or already use it with other frameworks.
Frequently asked questions
Why do Google ADK agents lose memory between sessions?
ADK's default InMemoryMemoryService stores all context in a Python dictionary in RAM. When the process restarts, a pod is replaced, or traffic routes to a different replica, that dictionary is gone. It's in-process storage, not a shared external service.
What is the difference between ADK's InMemoryMemoryService and Mem0?
InMemoryMemoryService lives inside the agent process and uses keyword matching. Mem0 runs as an external service, stores memories as vector embeddings, and supports semantic search — meaning it can surface relevant context even when the exact words don't match.
Does Mem0 work across multiple ADK agent replicas on GKE or Cloud Run?
Yes. Because Mem0 runs outside the agent process, every replica reads from and writes to the same memory backend. Whether a request hits Pod 1 or Pod 3, the agent has access to the same user history.
How do ADK agents call Mem0?
ADK agents call Mem0 through two Python tool functions — search_memory and save_memory — registered in the agent's tools parameter. The agent's LLM decides when to invoke them based on context.
Can multiple ADK agents in a multi-agent system share Mem0 memory?
Yes. Because memory is scoped by user_id and stored externally, any agent in the hierarchy — coordinator or specialist — can read and write to the same store. A preference saved by the travel agent is accessible to the fitness agent in the same session.
How does Mem0 prevent one user's memories from leaking to another?
Every Mem0 read and write requires a user_id parameter. Mem0 enforces this scoping at the API level, ensuring each query only returns memories tied to that specific user.
When should I use Mem0 instead of Vertex AI Memory Bank?
Use Mem0 when you want vendor-independent persistent memory, deploy across multiple clouds, or already use Mem0 across other frameworks like LangChain or LlamaIndex. Use Vertex AI Memory Bank when you're already committed to Vertex AI Agent Engine and prefer native Google Cloud integration.
Can I automatically store every conversation without relying on tool calls?
Yes. ADK's Runner emits events during execution. When a final response event fires, you can extract the user input and agent response and store the conversation pair in Mem0 directly — no explicit save command needed from the user or the agent.