You build an agent on your local machine using the Google Agent Development Kit (ADK). It works perfectly. It remembers your name, it recalls that you prefer Python over Java, and it handles complex multi-turn conversations.
Then you deploy it.
You push it to Cloud Run or a Google Kubernetes Engine (GKE) cluster. You scale it to three replicas to handle traffic. The moment a pod restarts or traffic routes to another replica, your agent loses all context.
This is a fundamental architectural constraint of ADK's default configuration. The built-in InMemoryMemoryService stores context in RAM. When the process dies or the request routes to a different instance, the memory is gone. For production agents that need to survive infrastructure updates and scale horizontally, you need external persistent storage.
This guide shows you how to wire Mem0 into Google ADK agents to give them persistent, semantic memory that survives restarts and follows users across sessions, regardless of which instance handles the request.
TL;DR
ADK’s default InMemoryMemoryService stores data in RAM and loses it on restart.
Agents can store and retrieve memory via Python tools (search_memory, save_memory).
Shared memory enables multi-agent coordination without cross-user leakage.
ADK Memory options vs Mem0
So what memory options are available inside ADK today?
Let’s compare the default in-memory implementation, Vertex AI Memory Bank, and Mem0 across persistence, semantic capabilities, multi-agent sharing, and operational complexity.
| Feature | InMemoryMemoryService | Vertex AI Memory Bank | Mem0 |
|---|---|---|---|
| Persistence | No (RAM only) | Yes (managed by Vertex AI) | Yes |
| Survives restarts | No | Yes | Yes |
| Semantic search | No (keyword only) | Yes | Yes |
| Multi-agent sharing | No | Yes (same Agent Engine) | Yes (any agent) |
| Setup complexity | Low | Medium to high | Medium |
| Vendor lock-in | None | Google Cloud | None |
Prerequisites and Setup
Before you wire Mem0 into your ADK agents, you need a Mem0 API key, a Gemini API key for the model, and the `mem0ai` and `google-adk` Python packages (plus `python-dotenv`, since the examples below load the keys from a `.env` file).
Google ADK separates conversation management into three layers: session, state, and memory.
Session stores conversation history for a single thread.
State is temporary key-value data tied to a session.
Memory is meant to persist information across sessions.
The problem is ADK’s default implementation.
InMemoryMemoryService stores everything in a Python dictionary in RAM and relies on keyword matching. According to the ADK documentation on memory, all data is lost when the application restarts.
In distributed environments like GKE with multiple replicas, each instance maintains its own in-memory store. When traffic shifts between replicas or a pod restarts, previously stored memory disappears. This makes the default implementation unsuitable for production deployments.
To fix this, you need a memory layer that runs outside the agent process and is accessible to every instance.
How does Mem0 solve the memory problem for ADK agents?
Mem0 provides a persistent semantic memory layer outside your ADK runtime.
Instead of relying on keyword matching, Mem0 converts memories into vector embeddings. This allows your agent to retrieve information based on meaning rather than exact text.
For example, a user says, “I don’t eat meat” in one session. Later, they ask, “What protein sources work for me?”
Keyword matching fails. The words "meat" and "protein sources" don't overlap.
Semantic search succeeds. It understands that "doesn't eat meat" relates to "protein sources" even though the exact words are different. The embeddings capture meaning, not just exact text matches.
In practice, ADK’s in-memory service checks for exact string matches, while Mem0 retrieves information based on semantic relevance. This allows your agent to recall related concepts even when the user phrases things differently across sessions.
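To make the contrast concrete, here is a deliberately simplified sketch (not Mem0's actual pipeline) showing how a pure keyword-overlap check comes up empty for this exact pair of sentences:

```python
# Simplified illustration: keyword overlap between a stored memory
# and a later query. This is all in-process keyword matching sees.
stored_memory = "I don't eat meat"
query = "What protein sources work for me?"

stored_words = set(stored_memory.lower().split())
query_words = set(query.lower().rstrip("?").split())

# No shared words at all, so keyword matching finds nothing,
# even though the two sentences are clearly related in meaning.
overlap = stored_words & query_words
print("Keyword overlap:", overlap)  # prints an empty set
```

Semantic retrieval sidesteps this entirely because it never compares the raw strings in the first place.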
The integration itself is simple. You access Mem0 through two tool functions and register them with your agent.
The tool function pattern
Mem0 integrates with ADK agents via two Python tool functions. You define search_memory and save_memory, initialize the MemoryClient, and register these functions with the agent’s tools parameter. During conversations, the agent’s LLM decides when to call each tool. Here is the core setup:
```python
from mem0 import MemoryClient
from google.adk.agents import Agent
from dotenv import load_dotenv

# Load environment variables (API keys, etc.)
load_dotenv()

# Initialize Mem0 client
mem0 = MemoryClient()

# --- Memory tool functions ---
def search_memory(query: str, user_id: str) -> str:
    """Retrieve memories matching the query for a user."""
    results = mem0.search(query=query, filters={"user_id": user_id}).get("results", [])
    if results:
        # Return only the top memory, or join more results if you want
        return "\n".join([m["memory"] for m in results[:1]])
    return ""

def save_memory(text: str, user_id: str) -> dict:
    """Save a memory for a user."""
    try:
        mem0.add(text, user_id=user_id)
        return {"status": "success"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# --- Create agent ---
assistant = Agent(
    name="assistant",
    model="gemini-2.5-flash",
    instruction="Use memory tools to personalize responses.",
    tools=[search_memory, save_memory],
)

# --- Example usage ---
if __name__ == "__main__":
    # Save a few memories
    save_memory("I am allergic to peanuts and love spicy food.", user_id="abhay")
    save_memory("I like to travel to Paris.", user_id="abhay")
    save_memory("My favorite color is blue.", user_id="abhay")

    # Retrieve relevant memories
    question = "What food should I avoid?"
    print(f"\nQuestion: {question}")
    print("Found in memory:", search_memory(question, user_id="abhay"))
```
When you call search_memory with a query about dietary preferences, Mem0 can return relevant memories even if the query does not contain the exact words. For example, asking “What food should I avoid?” can surface stored information about allergies or dietary restrictions, such as avoiding peanuts.
This works because Mem0 compares vector embeddings rather than raw strings. The query and stored memories are matched by meaning, allowing related preferences and constraints to surface automatically even when phrasing differs.
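The intuition can be sketched with toy vectors. The numbers below are made up for illustration (real embedding models produce vectors with hundreds of dimensions), but they show how cosine similarity ranks a related memory above an unrelated one:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for three pieces of text
allergy_memory = [0.90, 0.10, 0.20]   # "I am allergic to peanuts"
food_query     = [0.85, 0.15, 0.25]   # "What food should I avoid?"
color_memory   = [0.05, 0.90, 0.10]   # "My favorite color is blue"

# The allergy memory scores far higher against the food query
print(cosine_similarity(food_query, allergy_memory))  # close to 1.0
print(cosine_similarity(food_query, color_memory))    # much lower
```

A vector store simply returns the memories whose embeddings score highest against the query embedding, which is why the allergy memory surfaces for a question that never mentions peanuts.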
These functions are invoked through ADK’s tool system. Because Mem0 stores memory outside the agent process, all instances read and write to the same backend. Memories persist across restarts, and semantic retrieval ensures agents receive contextually relevant information rather than simple keyword matches.
Automatic conversation storage
The pattern above relies on the agent deciding when to call save_memory. For more reliable memory generation, you can automatically store conversations after each agent response without depending on tool calls.
ADK's Runner yields events during execution. When the runner emits an is_final_response event, you can extract the user input and agent response, then store the conversation pair directly in Mem0:
```python
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google import genai
from mem0 import MemoryClient

# Initialize Mem0 client
mem0 = MemoryClient()

async def chat_with_agent(user_input: str, user_id: str) -> str:
    """
    Handle user input with automatic memory storage.

    Args:
        user_input: The user's message
        user_id: Unique identifier for the user

    Returns:
        The agent's response
    """
    # Create session
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name="assistant",
        user_id=user_id,
        session_id=f"session_{user_id}",
    )

    # Initialize runner
    runner = Runner(
        agent=assistant,
        app_name="assistant",
        session_service=session_service,
    )

    # Process user message
    content = genai.types.Content(
        role="user",
        parts=[genai.types.Part(text=user_input)],
    )

    # Extract the final response and store the conversation.
    # run_async returns an async generator, so iterate with async for
    # rather than awaiting it directly.
    async for event in runner.run_async(
        user_id=user_id, session_id=session.id, new_message=content
    ):
        if event.is_final_response():
            response = event.content.parts[0].text

            # Store the conversation pair in Mem0
            conversation = [
                {"role": "user", "content": user_input},
                {"role": "assistant", "content": response},
            ]
            mem0.add(conversation, user_id=user_id)
            return response
```
This ensures every interaction is stored. The agent builds a searchable memory corpus without requiring explicit "remember this" commands from users. Each turn becomes context for future sessions.
Later, when the same user asks about upcoming trips, semantic search retrieves the Paris travel plan even if the new query doesn’t mention flights or bookings. The embeddings connect related concepts like “cities I’m visiting” and “trip to Paris” without relying on exact keywords.
How do you implement Mem0 in a multi-agent ADK system?
ADK is designed for multi-agent orchestration. A coordinator delegates tasks to specialist agents. With Mem0, all of these agents share the same external memory layer.
The pattern is simple. The coordinator receives the user query, searches Mem0 for context, then routes the request to the right specialist. Each specialist also has access to search_memory and save_memory, so they can read and write user-specific context.
Shared memory across agent hierarchies
Here's a coordinator setup with specialist agents using AgentTool:
```python
from google.adk.agents import Agent
from google.adk.tools.agent_tool import AgentTool
from mem0 import MemoryClient

# Initialize Mem0 client
mem0 = MemoryClient()

# Define memory tool functions
def search_memory(query: str, user_id: str) -> str:
    """Retrieve memories matching the query for a user."""
    results = mem0.search(query=query, filters={"user_id": user_id}).get("results", [])
    if results:
        return "\n".join([m["memory"] for m in results[:3]])
    return ""

def save_memory(text: str, user_id: str) -> dict:
    """Save a memory for a user."""
    try:
        mem0.add(text, user_id=user_id)
        return {"status": "success"}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# Specialist agents with memory tools
travel_agent = Agent(
    name="travel_specialist",
    model="gemini-2.5-flash",
    instruction=(
        "You are a travel planning specialist. "
        "Use search_memory to understand user travel preferences before making recommendations. "
        "Save important preferences using save_memory."
    ),
    tools=[search_memory, save_memory],
)

fitness_agent = Agent(
    name="fitness_advisor",
    model="gemini-2.5-flash",
    instruction=(
        "You are a fitness advisor. "
        "Use search_memory to understand dietary restrictions and fitness goals. "
        "Save workout preferences and health constraints."
    ),
    tools=[search_memory, save_memory],
)

# Coordinator delegates to specialists
coordinator = Agent(
    name="coordinator",
    model="gemini-2.5-flash",
    instruction=(
        "Delegate travel questions to travel_specialist and fitness questions to fitness_advisor. "
        "Use search_memory to understand user context before delegation."
    ),
    tools=[AgentTool(agent=travel_agent), AgentTool(agent=fitness_agent), search_memory],
)
```
If the travel agent saves a memory like “User prefers window seats on long flights,” the fitness agent can later retrieve that information when planning in-flight routines. The coordinator can also access it for future travel queries.
This works because Mem0 runs outside ADK’s session service. All agents query the same backend using user_id. Whether a request hits Pod 1 or Pod 2, they read from the same memory store.
The key idea is simple: memory is infrastructure, not application state. Once it lives outside the agent process, every instance and every agent can access it.
What are the production considerations?
When you move from localhost to Cloud Run or GKE with Mem0, a few operational concerns become relevant. For production deployment patterns on GKE, see the GKE AI Labs tutorial on ADK memory.
Memory scoping and user isolation
Every Mem0 operation requires a user_id parameter. This scopes reads and writes to prevent cross-user data leakage:
```python
# User A's memories
mem0.search(query="preferences", filters={"user_id": "user_a"})

# User B's memories (completely isolated)
mem0.search(query="preferences", filters={"user_id": "user_b"})
```
Mem0 enforces this at the API level. Even if multiple agent instances run simultaneously, each request only retrieves memories tied to its `user_id`.
Always validate and sanitize the `user_id` before sending it to Mem0. Use authenticated identifiers, not raw user input. Otherwise, one user could guess another user’s ID and access their memory. For more on securing AI agent memory, see best practices for memory isolation and access control.
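One way to enforce that is a small validation gate in front of every Mem0 call. The helper below is a sketch, not part of Mem0 or ADK; the `ALLOWED_USER_ID` pattern assumes your auth system issues IDs as short alphanumeric strings, so adjust it to match whatever format you actually use:

```python
import re

# Assumed ID format -- change to whatever your auth system actually issues
ALLOWED_USER_ID = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_user_id(user_id: str) -> str:
    """Reject anything that doesn't look like a server-issued user ID."""
    if not ALLOWED_USER_ID.fullmatch(user_id):
        raise ValueError(f"invalid user_id: {user_id!r}")
    return user_id

def scoped_search(client, query: str, user_id: str):
    """Search Mem0 only after validating the caller-supplied user_id.

    `user_id` should come from your auth layer (for example, a verified
    token subject), never from a request field the client controls.
    """
    return client.search(query=query, filters={"user_id": validate_user_id(user_id)})
```

Routing every read and write through a gate like this means a malformed or guessed ID fails loudly before it ever reaches the memory backend.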
For high-volume systems, consider memory retention policies:
Delete or archive memories older than 90 days for inactive users
Summarize older memories instead of storing every conversation turn
Keep recent interactions and prune outdated context
If latency matters, don’t search memory on every request. Let the agent call search_memory only when needed. That keeps API calls and response time low.
When to use Mem0 vs. ADK's built-in options
With ADK, you can choose between the built-in InMemoryMemoryService, Google's Vertex AI Memory Bank, or an external memory layer like Mem0. InMemoryMemoryService is fine for local development and testing, where losing state on restart is acceptable. Vertex AI Memory Bank fits teams already committed to Vertex AI Agent Engine that want native Google Cloud integration. Mem0 fits when you need vendor-independent persistent memory, deploy across multiple clouds, or already use it with other frameworks.
Frequently asked questions
Why do Google ADK agents lose memory between sessions?
ADK's default InMemoryMemoryService stores all context in a Python dictionary in RAM. When the process restarts, a pod is replaced, or traffic routes to a different replica, that dictionary is gone. It's in-process storage, not a shared external service.
What is the difference between ADK's InMemoryMemoryService and Mem0?
InMemoryMemoryService lives inside the agent process and uses keyword matching. Mem0 runs as an external service, stores memories as vector embeddings, and supports semantic search — meaning it can surface relevant context even when the exact words don't match.
Does Mem0 work across multiple ADK agent replicas on GKE or Cloud Run?
Yes. Because Mem0 runs outside the agent process, every replica reads from and writes to the same memory backend. Whether a request hits Pod 1 or Pod 3, the agent has access to the same user history.
How do ADK agents call Mem0?
ADK agents call Mem0 through two Python tool functions — search_memory and save_memory — registered in the agent's tools parameter. The agent's LLM decides when to invoke them based on context.
Can multiple ADK agents in a multi-agent system share Mem0 memory?
Yes. Because memory is scoped by user_id and stored externally, any agent in the hierarchy — coordinator or specialist — can read and write to the same store. A preference saved by the travel agent is accessible to the fitness agent in the same session.
How does Mem0 prevent one user's memories from leaking to another?
Every Mem0 read and write requires a user_id parameter. Mem0 enforces this scoping at the API level, ensuring each query only returns memories tied to that specific user.
When should I use Mem0 instead of Vertex AI Memory Bank?
Use Mem0 when you want vendor-independent persistent memory, deploy across multiple clouds, or already use Mem0 across other frameworks like LangChain or LlamaIndex. Use Vertex AI Memory Bank when you're already committed to Vertex AI Agent Engine and prefer native Google Cloud integration.
Can I automatically store every conversation without relying on tool calls?
Yes. ADK's Runner emits events during execution. When a final response event fires, you can extract the user input and agent response and store the conversation pair in Mem0 directly — no explicit save command needed from the user or the agent.