How to Add Memory to OpenAI Responses API Agents

Aashi Dutt • June 23, 2026

Production agents built on the OpenAI Responses API need more than a single request and response. They need to remember users, decisions, and facts across many turns and sessions. The core challenge is that the Responses API is stateless, while real agents must behave statefully.

This article explains how to add long-term memory to OpenAI Responses API agents using Mem0. It covers what "memory" means in this context, how to architect a memory layer, and how Mem0 fits cleanly into an existing Responses API workflow with concrete Python code.

What memory means for Responses API agents

The OpenAI Responses API is designed around a single call that can handle tools, reasoning, and response generation. It works best when given all the context needed for the current task. That context is typically short-lived: the last few messages in a chat, the current tool outputs, or a task-specific prompt.

Memory, by contrast, is long-lived state tied to identities and tasks. For Responses API agents, useful memory usually falls into four categories:

User profile memory: Preferences, constraints, and traits that should influence future responses. Example: "User is vegetarian", "Prefers terse answers".
Interaction history memory: Summaries of past conversations or decisions that should persist across sessions. Example: "Yesterday we debugged a FastAPI authentication issue".
Task and project memory: Long-running task details that span multiple sessions. Example: "This agent is helping draft a 3-part technical report; we completed sections 1 and 2".
Operational memory: Internal notes that help the agent maintain consistency or reduce re-computation. Example: "We already checked this user's billing status; they are active".
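The four categories above can be made explicit in code so that every stored item carries its type. The following is a minimal sketch, not part of any library: the names MemoryKind, MemoryRecord, and to_metadata are hypothetical, and the sample texts are adapted from the examples in the list.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class MemoryKind(str, Enum):
    """Hypothetical labels for the four memory categories."""
    USER_PROFILE = "user_profile"
    INTERACTION_HISTORY = "interaction_history"
    TASK_PROJECT = "task_project"
    OPERATIONAL = "operational"


@dataclass
class MemoryRecord:
    """One remembered item, scoped to a user and tagged with its category."""
    user_id: str
    text: str
    kind: MemoryKind
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_metadata(self) -> Dict[str, Any]:
        """Merge the category into the metadata dict sent to a memory store."""
        return {**self.metadata, "type": self.kind.value}


# Sample records; "user_123" and "abc123" are placeholder identifiers.
examples: List[MemoryRecord] = [
    MemoryRecord("user_123", "User is vegetarian", MemoryKind.USER_PROFILE),
    MemoryRecord("user_123", "Yesterday we debugged a FastAPI authentication issue",
                 MemoryKind.INTERACTION_HISTORY),
    MemoryRecord("user_123", "Report sections 1 and 2 are complete",
                 MemoryKind.TASK_PROJECT, {"project_id": "abc123"}),
    MemoryRecord("user_123", "Billing status already checked: active",
                 MemoryKind.OPERATIONAL),
]
```

Tagging each record with a category at write time makes it possible to filter by type at read time, for example to load only profile memories when building a system prompt.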

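The Responses API does not manage this kind of state for you. It can chain turns within a conversation (for example, through previous_response_id), but deciding which facts to keep across sessions and retrieving them later is the developer's responsibility. Whatever context you send must also fit within the model's context window. That is where a dedicated memory layer becomes essential.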

Stateless core, stateful behavior

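From a long-term memory standpoint, each Responses API call is effectively stateless. You send an input (a string or a list of messages) plus optional instructions and tool definitions, and the API returns a response object whose output items contain the generated text and any tool calls.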

Behaving as a stateful agent means the application must:

Identify the entity: Usually a user_id, session_id, or some composite key.
Load relevant memory: Given the identity and current query, fetch only the most relevant past information.
Merge memory into the prompt: Insert memory into system messages or additional user messages in a structured way.
Generate response: Call the Responses API with the enriched context.
Write back new memory: After receiving the response, extract what should be remembered and store it.

The main difficulties are scaling beyond simple "store the last N messages" buffers, deciding what to store and what to ignore, efficiently searching large memory collections, and maintaining privacy and isolation between users. Mem0 specifically addresses these memory concerns so the agent logic can remain focused on tools, reasoning, and workflow.

Core memory operations for OpenAI agents

For an OpenAI Responses API agent to use memory effectively, it needs a few core operations exposed as simple functions:

add(messages, user_id, metadata) stores a piece of information, often a snippet of conversation, a summary, or a structured fact.
search(query, filters, limit) retrieves the most relevant past memories for the current user query or task.
update(memory_id, data) refines or corrects existing memories when the user changes preferences or corrects facts.
delete(memory_id) or delete_all(user_id) removes data for compliance reasons or at the user's request.

Everything else builds on these primitives. The memory layer handles embeddings, search, ranking, and persistence. The Responses API integration only needs to call these primitives at the right points in the request lifecycle. Mem0 exposes these operations through both SDKs and HTTP APIs, and aligns them with how agents already structure sessions and users.
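To make the shape of these primitives concrete, here is a toy in-process store that implements all of them. It is a sketch for local experimentation, not Mem0's implementation: ToyMemoryStore is a hypothetical class, its signatures are simplified, and it ranks by plain word overlap where a real memory layer would use embeddings.

```python
import itertools
import re
from typing import Any, Dict, List, Optional, Set


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words, used as a crude relevance signal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToyMemoryStore:
    """In-process stand-in for a memory service (illustrative only)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def add(self, text: str, user_id: str,
            metadata: Optional[Dict[str, Any]] = None) -> str:
        memory_id = f"mem_{next(self._ids)}"
        self._items[memory_id] = {
            "id": memory_id, "user_id": user_id,
            "memory": text, "metadata": metadata or {},
        }
        return memory_id

    def search(self, query: str, user_id: str, limit: int = 5) -> List[Dict[str, Any]]:
        query_tokens = _tokens(query)
        scored = []
        for item in self._items.values():
            if item["user_id"] != user_id:
                continue  # never leak memories across users
            overlap = len(query_tokens & _tokens(item["memory"]))
            if overlap:
                scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:limit]]

    def update(self, memory_id: str, text: str) -> None:
        self._items[memory_id]["memory"] = text

    def delete(self, memory_id: str) -> None:
        self._items.pop(memory_id, None)

    def delete_all(self, user_id: str) -> None:
        self._items = {k: v for k, v in self._items.items()
                       if v["user_id"] != user_id}


store = ToyMemoryStore()
diet_id = store.add("User is vegetarian", "u1", {"type": "preference"})
store.add("User prefers terse answers", "u1", {"type": "preference"})
store.add("User is vegan", "u2")
hits = store.search("Is the user vegetarian?", "u1")
```

Because the agent code only depends on these five operations, a toy store like this can be swapped for a hosted memory service later without changing the request lifecycle.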

How Mem0 structures memory for Responses API agents

Mem0 treats memory as a first-class resource tied to identities and metadata. Typical objects include an id, a user_id, the memory text content, a metadata JSON blob for tags or types, created_at and updated_at timestamps, and an embedding handled internally by Mem0 for similarity search.

For a Responses API agent, metadata can encode agent-specific context, for example {"type": "preference", "source": "chat"}, {"type": "project_note", "project_id": "abc123"}, or {"type": "system_state", "key": "billing_status"}.

This structure lets you store arbitrary unstructured text, filter by type or project, run semantic search over all prior content, and delete cleanly per user or per project. The integration with the Responses API then becomes about when to call Mem0, what text to send, and how to embed memory into prompts.
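As an illustration of filtering by type or project, the sketch below applies exact-match metadata filters to records shaped like the objects described above. The helpers matches and filter_memories are hypothetical and run client-side; a hosted memory service would normally apply such filters on the server.

```python
from typing import Any, Dict, List


def matches(metadata: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """True when every filter key is present in metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in filters.items())


def filter_memories(memories: List[Dict[str, Any]], **filters: Any) -> List[Dict[str, Any]]:
    """Keep only the memories whose metadata satisfies all filters."""
    return [m for m in memories if matches(m.get("metadata", {}), filters)]


# Sample records using the metadata shapes mentioned in the text.
records = [
    {"id": "1", "user_id": "user_123", "memory": "Prefers terse answers",
     "metadata": {"type": "preference", "source": "chat"}},
    {"id": "2", "user_id": "user_123", "memory": "Sections 1 and 2 are done",
     "metadata": {"type": "project_note", "project_id": "abc123"}},
    {"id": "3", "user_id": "user_123", "memory": "Billing status is active",
     "metadata": {"type": "system_state", "key": "billing_status"}},
]

preferences = filter_memories(records, type="preference")
project_notes = filter_memories(records, type="project_note", project_id="abc123")
```

Keeping metadata keys flat and consistent (type, project_id, key) is what makes this kind of filtering, and later per-project deletion, straightforward.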

Reference architecture for a memory-augmented Responses API agent

A typical architecture for an OpenAI Responses API agent with Mem0 looks like this:

Request received: Includes user_id and current input text.
Retrieve relevant memories: Call Mem0 search() with the user scope and query text, optionally filtering by memory type or project.
Build prompt: A system message describes the agent and memory usage, an additional message summarizes retrieved memories, and a user message contains the current request.
Call OpenAI Responses API: Pass messages and any tools or tool configurations.
Process response: Extract any memory-worthy information from the user input and response, then store it with Mem0 add().
Return response to client: Include the Responses API result.

This pattern can run synchronously in a web server or asynchronously in an event-driven setup. The key is that memory calls sit before and after each Responses API invocation.

Python example integrating Mem0 with the OpenAI Responses API

The following example demonstrates a minimal integration in Python. It assumes OPENAI_API_KEY is set for the OpenAI client, MEM0_API_KEY is set for Mem0, and the agent should remember user preferences and prior topics.

import os
from typing import List, Dict, Any

from openai import OpenAI
from mem0 import MemoryClient  # pip install mem0ai

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
mem0_client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

def fetch_relevant_memories(user_id: str, query: str, limit: int = 5) -> List[Dict[str, Any]]:
    """Retrieve the most relevant memories for this user and query.

    search() takes the query first; scope goes in filters=. Results come
    back under the "results" key.
    """
    result = mem0_client.search(
        query,
        filters={"user_id": user_id},
        limit=limit,
    )
    return result.get("results", [])

def format_memories_for_prompt(memories: List[Dict[str, Any]]) -> str:
    """Convert retrieved memories into a compact text block."""
    if not memories:
        return "No prior memories for this user."

    lines = []
    for m in memories:
        content = m.get("memory", "")  # the text field is "memory"
        mtype = m.get("metadata", {}).get("type", "general")
        lines.append(f"- ({mtype}) {content}")
    return "Relevant past information:\n" + "\n".join(lines)

def store_new_memory(user_id: str, text: str, mtype: str = "interaction") -> None:
    """Store a new memory snippet for the given user.

    add() takes a messages list first, then user_id as a keyword.
    """
    if not text.strip():
        return
    mem0_client.add(
        [{"role": "user", "content": text}],
        user_id=user_id,
        metadata={"type": mtype},
    )

def call_responses_api_with_memory(user_id: str, user_input: str) -> str:
    """Wrap an OpenAI Responses API call with memory retrieval and storage."""
    # 1. Retrieve relevant prior memories
    memories = fetch_relevant_memories(user_id=user_id, query=user_input)
    memory_block = format_memories_for_prompt(memories)

    # 2. Build messages for the Responses API
    system_prompt = (
        "You are a helpful assistant that uses the provided memory about the user. "
        "Always respect user preferences found in memory. "
        "If memory conflicts with new explicit instructions, follow the new instructions."
    )

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"MEMORY CONTEXT:\n{memory_block}"},
        {"role": "user", "content": user_input},
    ]

    # 3. Call the Responses API. The SDK's output_text helper aggregates the
    #    text output for you, so there is no need to walk the output array by hand.
    response = openai_client.responses.create(
        model="gpt-5.5",
        input=messages,
    )
    output_text = response.output_text

    # 4. Store this interaction. In production, prefer the selective
    #    extraction pattern shown below over storing raw turns.
    store_new_memory(user_id=user_id, text=user_input, mtype="user_input")
    store_new_memory(user_id=user_id, text=output_text, mtype="assistant_reply")

    return output_text

if __name__ == "__main__":
    user_id = "user_123"
    while True:
        user_msg = input("You: ").strip()
        if user_msg.lower() in {"exit", "quit"}:
            break
        reply = call_responses_api_with_memory(user_id, user_msg)
        print("Agent:", reply)

This example focuses on the basic pattern: retrieve memory with mem0_client.search before the Responses API call, pass that memory as a system message, then store the interaction with mem0_client.add. In production, the extraction and storage steps can be more selective, but the integration shape stays similar.

Extracting memory-worthy facts with the Responses API

Storing every input and output unfiltered is simple yet inefficient at scale. It produces noisy memory, raises storage costs, and can increase retrieval time. A more precise pattern uses the model itself to decide what should become memory.

One practical approach is to call the Responses API again after the main response with a small prompt that asks the model to extract durable facts or preferences, then store only that structured output. The Responses API supports Structured Outputs, and the Python SDK's responses.parse helper accepts a Pydantic model through text_format and returns the typed result on output_parsed.

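# Continues the previous example: reuses openai_client and store_new_memory.
from typing import List

from pydantic import BaseModel  # pip install pydantic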

class MemoryFacts(BaseModel):
    facts: List[str]

def extract_memory_facts(conversation_snippet: str) -> List[str]:
    """Use the Responses API to extract durable facts or preferences worth storing."""
    response = openai_client.responses.parse(
        model="gpt-5.5",
        input=[
            {
                "role": "system",
                "content": (
                    "From the user-assistant exchange, extract only long-term "
                    "preferences or stable facts about the user. Return each as a "
                    "short sentence. If there is nothing worth storing, return an "
                    "empty list."
                ),
            },
            {"role": "user", "content": conversation_snippet},
        ],
        text_format=MemoryFacts,
    )
    parsed = response.output_parsed
    return parsed.facts if parsed else []


def store_memory_from_exchange(user_id: str, user_text: str, assistant_text: str) -> None:
    snippet = f"User: {user_text}\nAssistant: {assistant_text}"
    facts = extract_memory_facts(snippet)
    for fact in facts:
        store_new_memory(user_id, fact, mtype="long_term_fact")

This pattern keeps memory focused and compact, and lets Mem0 store higher quality information that benefits retrieval across many sessions.
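Extraction can still return the same fact on many turns, so it may help to drop duplicates before writing. The sketch below is one simple way to do that, assuming the facts already stored for the user are available as a list of strings; normalize and new_facts_only are hypothetical helpers, and the comparison is purely textual, so paraphrases are not detected.

```python
import re
from typing import Iterable, List


def normalize(fact: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing periods for comparison."""
    return re.sub(r"\s+", " ", fact.strip().lower()).rstrip(" .")


def new_facts_only(candidates: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return candidates that are not textual duplicates of stored facts."""
    seen = {normalize(e) for e in existing}
    fresh: List[str] = []
    for fact in candidates:
        key = normalize(fact)
        if key and key not in seen:
            seen.add(key)  # also removes duplicates within the same batch
            fresh.append(fact.strip())
    return fresh


already_stored = ["Prefers terse answers."]
extracted = ["User is vegetarian.", "  user is  vegetarian", "Prefers terse answers", ""]
to_store = new_facts_only(extracted, already_stored)
```

A memory service may already merge or update overlapping memories on its side; a cheap client-side check like this mainly saves unnecessary write calls.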

Comparing memory strategies for Responses API agents

Several memory strategies can be used with the OpenAI Responses API. Each has tradeoffs in complexity, quality, and scalability.

Strategy	Description	Pros	Cons	Fit for production agents
In-memory chat buffer	Keep last N messages in process memory	Simple to implement, no extra infra	Lost on restart, no cross-session memory	Poor for real users
Database transcript logging	Store full transcripts in SQL / NoSQL	Durable, easy auditing	Retrieval quality depends on manual queries	Medium, needs custom retrieval
Manual vector store integration	Application manages embeddings and search	Fine-grained control over storage and search	Higher complexity, more code and infra	High, but high engineering cost
Mem0 as a dedicated memory layer	Mem0 manages embeddings, retrieval, metadata	Purpose-built memory abstraction, simple API	Requires separate service and API integration	High, fast to integrate and evolve

Mem0 sits in the last category. It focuses on memory semantics, retrieval quality, and identity handling, and exposes a simple API so the Responses API integration can remain small and readable.

Where Mem0 fits in the Responses API workflow

Mem0 is not a replacement for the Responses API. It augments it by handling persistent storage of user and project memories, semantic retrieval given the current query or context, identity and metadata management, and tools for updating, deleting, and inspecting memories.

In a typical Responses API pipeline, Mem0 sits in two places. Before the OpenAI call, search() provides memory context that becomes part of the prompt. After the OpenAI call, add() stores new information extracted from the conversation, and optionally update() corrects existing facts.

Because Mem0 is model-agnostic, it can serve as a shared memory layer across multiple models and agents that all use the OpenAI Responses API. This is useful when several microservices or tools need to share user state.

Limitations of memory for Responses API agents

Memory is powerful but not a universal solution. Some important limitations apply when designing memory-augmented Responses API agents:

Hallucinated or incorrect memory: If the model produces incorrect statements and those are stored as memory, the agent can reinforce errors. Guardrails or validation are needed for sensitive domains.
Context window constraints: Memory retrieval still must fit into the model's context window. Even with Mem0, very large memory collections need summarization and careful prompting.
Overfitting to old information: Agents can over-trust old memories and ignore new instructions. Prompts must clearly state how to resolve conflicts between fresh input and past memory.
Privacy and compliance: Persistent memory ties information to user identities. Systems must implement deletion, access controls, and consent flows where required.
Latency: Additional network calls to a memory service and larger prompts add latency. For real-time applications, time budgets need careful tuning.
Domain adaptation requirements: Generic memory extraction prompts might not capture the right facts for specific domains, such as finance or healthcare. Domain-specific extraction and schema design are often necessary.

These limitations come from the nature of LLMs and long-term context handling, not from Mem0 or the Responses API in particular. Addressing them requires thoughtful application architecture and policies.
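The context window constraint, for instance, can be handled with a budget applied to retrieved memories before they are placed in the prompt. The following sketch assumes memories arrive already ranked by relevance and uses character count as a rough proxy for tokens; fit_to_budget is a hypothetical helper, and a real tokenizer would give a more accurate limit.

```python
from typing import List


def fit_to_budget(memories: List[str], max_chars: int) -> List[str]:
    """Keep memories in ranked order until the character budget is used up."""
    kept: List[str] = []
    used = 0
    for text in memories:
        cost = len(text) + 1  # +1 for the newline that joins lines in the prompt
        if used + cost > max_chars:
            break  # stop at the first memory that does not fit
        kept.append(text)
        used += cost
    return kept


ranked = ["aaaa", "bbbb", "cc"]
trimmed = fit_to_budget(ranked, max_chars=10)
memory_block = "\n".join(trimmed)
```

Stopping at the first memory that does not fit preserves the relevance ordering; an alternative design would skip oversized items and keep scanning for smaller ones.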

Frequently asked questions

Q. What does Mem0 add on top of the OpenAI Responses API for agents?

Mem0 provides a structured, persistent memory layer that stores and retrieves user-specific and task-specific information across sessions. The Responses API focuses on single-request reasoning and tool orchestration, while Mem0 handles long-term state and semantic retrieval.

Q. How should identities be modeled when adding memory to a Responses API agent?

Typically, each end user receives a stable user_id, which Mem0 uses as the primary key for memory. Additional metadata such as project_id or conversation_id can be added to support multiple parallel threads or long-running tasks.
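One way to keep identity handling consistent is to build the scope in a single place. The sketch below is an assumption about how an application might organize this, not a Mem0 API: build_scope is a hypothetical helper that returns a user-level filter for reads and a metadata dict to attach on writes.

```python
from typing import Any, Dict, Optional, Tuple


def build_scope(
    user_id: str,
    project_id: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (filters, metadata) for one user, project, and conversation."""
    if not user_id or not user_id.strip():
        raise ValueError("user_id is required to scope memory")
    filters = {"user_id": user_id}       # primary key for every read
    metadata: Dict[str, Any] = {}        # optional thread or task tags
    if project_id:
        metadata["project_id"] = project_id
    if conversation_id:
        metadata["conversation_id"] = conversation_id
    return filters, metadata


filters, metadata = build_scope("user_123", project_id="abc123")
```

Rejecting an empty user_id early is a small guard against the worst failure mode of a memory layer, which is reading or writing memories under the wrong identity.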

Q. When should a Responses API agent write to memory versus ignoring an interaction?

Agents should write to memory when an interaction reveals stable preferences, important decisions, or reusable facts. Routine questions or ephemeral details are usually better left out to avoid clutter and reduce retrieval noise.

Q. How does Mem0 handle retrieval for large memory collections?

Mem0 performs semantic search over vector representations of stored content and ranks results by relevance to the current query. Filters and metadata allow narrowing the search to specific types of memory, for example, preferences or project notes.

Q. Why use Mem0 instead of storing everything directly in a database?

While a database can store transcripts, it does not handle embeddings, semantic search, or memory-specific retrieval strategies by default. Mem0 focuses specifically on these concerns, so the application can call higher-level operations like search and add without building a custom memory stack.

Q. Can a single Mem0 instance serve multiple Responses API agents or services?

Yes. Mem0 is designed to support many agents and services by using identities and metadata to isolate and organize memories. Different agents can share memory when appropriate, or use separate memory namespaces based on metadata policies.