Mem0 + Vercel AI SDK: Memory for Your Chat Agents

Quick Takeaways

  • Vercel AI SDK orchestrates LLM calls, tools, and streaming, but it gives agents no durable memory. Once a message scrolls out of the window, the agent forgets the user's preferences, corrections, and goals.

  • The symptom in production: users repeat themselves every session, agents ignore long-term constraints, and the model hallucinates past decisions it can no longer see.

  • The fix is a dedicated memory layer beside the SDK. Mem0 stores extracted facts per user_id, retrieves only the relevant ones per request, and injects them into the prompt.

  • You can wire the whole loop in one backend route and prove it works in five minutes with a free API key.

Why Vercel AI SDK agents need real user memory

Vercel AI SDK gives frontend and edge developers a clean way to orchestrate LLM calls, tools, and streaming responses. It is ideal for building chat interfaces, assistants, and small agents that run close to the user.

What the SDK does not provide by default is durable, semantically searchable user memory across sessions. As soon as context drops out of the prompt window, the agent forgets preferences, past tasks, and corrections.

Mem0 fits here as a dedicated memory layer that sits beside Vercel AI SDK. It stores user interactions, retrieves relevant memories, and injects them into prompts so agents behave as if they remember everything important.

This article explains what the memory problem looks like inside Vercel AI SDK agents, how Mem0 works, and how to wire the two together.

What "user memory" means

In the context of Vercel AI SDK, user memory is the set of facts, preferences, and historical interactions that should shape the agent's behavior beyond the current request.

Typical categories include profile facts (role, expertise level, company, time zone), long-term preferences (tone of voice, tools to avoid, default formats), ongoing projects (active tasks, previous decisions, constraints), and corrections ("Do not use React, use Svelte instead").
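To make those categories concrete, here is a minimal sketch of how they could be modeled before they reach any memory service. The MemoryKind and UserMemory names, the render_for_prompt helper, and the sample profile fact are all illustrative assumptions, not part of Mem0 or the Vercel AI SDK.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryKind(Enum):
    PROFILE = "profile"        # role, expertise level, company, time zone
    PREFERENCE = "preference"  # tone of voice, tools to avoid, default formats
    PROJECT = "project"        # active tasks, previous decisions, constraints
    CORRECTION = "correction"  # "Do not use React, use Svelte instead"


@dataclass
class UserMemory:
    user_id: str
    kind: MemoryKind
    text: str
    metadata: dict = field(default_factory=dict)


def render_for_prompt(memories: list) -> str:
    """Group memories by kind so the model can tell a correction from a profile fact."""
    lines = []
    for kind in MemoryKind:
        items = [m.text for m in memories if m.kind is kind]
        if items:
            lines.append(f"{kind.value}:")
            lines.extend(f"- {text}" for text in items)
    return "\n".join(lines)


# Illustrative data only.
memories = [
    UserMemory("dev_42", MemoryKind.PROFILE, "Senior frontend engineer, UTC+1"),
    UserMemory("dev_42", MemoryKind.CORRECTION, "Do not use React, use Svelte instead"),
]
print(render_for_prompt(memories))
```

Tagging each memory with a kind is one possible design choice; it lets the prompt separate hard corrections from softer profile facts.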

A simple chat handler in the SDK often looks like this:

import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';

export const POST = async (req: Request) => {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o-mini'),
    messages,
  });

  return result.toDataStreamResponse();
};

Here, the model only sees messages. Anything that happened in earlier sessions is gone unless you reattach it manually. That works for a single short session, but it breaks down once conversations span days or weeks, chats are multi-surface (web, mobile, Slack), or agents specialize per user over time.

Real memory must survive restarts, be queryable, and remain separate from the raw chat buffer. That is the gap Mem0 fills.

The core memory problem

Vercel AI SDK encourages simple, functional handlers. Developers can store history in the browser, in a database, or in edge storage, but three hard problems remain.

  • Token constraints: LLMs cannot see unbounded history. At some point old messages must be dropped, and deciding which ones to keep is non-trivial.

  • Semantic relevance: Recent messages are not always the most important. A preference expressed weeks ago can matter more than the last two chat turns.

  • Multi-agent and multi-surface context: Different agents or UIs might share the same user. Each needs access to the same long-term memory without duplicating storage or logic.
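The first two problems are easy to reproduce without any LLM. The sketch below uses a deliberately crude token estimate (one token per word) and a recency-only truncation helper, both of which are assumptions for illustration, to show how a preference stated early falls out of the window.

```python
def approx_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word."""
    return len(text.split())


def truncate_to_budget(messages: list, budget: int) -> list:
    """Keep the most recent messages that fit in the budget, dropping the oldest."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = approx_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))


history = [
    {"role": "user", "content": "Do not use React, use Svelte instead"},
    {"role": "assistant", "content": "Understood, I will use Svelte"},
    {"role": "user", "content": "Thanks, now help me set up linting for the project"},
    {"role": "assistant", "content": "Sure, here is a basic linting configuration to start from"},
    {"role": "user", "content": "Great, which framework should I scaffold the dashboard with"},
]

window = truncate_to_budget(history, budget=30)
preference_survived = any("Svelte" in m["content"] for m in window)
print(len(window), preference_survived)  # -> 3 False
```

The latest question is about frameworks, yet the only message that answers it is the one truncation dropped. Recency alone cannot decide what to keep.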

Without a structured memory layer, teams usually persist the full chat transcript and rely on simple truncation, hand-roll embedding pipelines and vector search for past messages, or patch memories into prompts ad hoc, which becomes hard to manage. Mem0 abstracts these concerns into a consistent interface that works alongside Vercel AI SDK, regardless of where the SDK code runs.

Mem0 as a memory layer

Mem0 is a memory system that sits between your agents and your storage. Instead of treating memory as raw chat history, it treats it as structured pieces of information tied to users and contexts.

Memories are small, extracted facts or summaries stored with metadata. User identifiers link memories across sessions, devices, and channels. Scopes like agent_id and app_id separate memories for different agents or apps. Retrieval returns only the most relevant memories for a new request.

From an integration perspective, the loop is four steps. On every interaction, you send user messages and metadata to Mem0 to update memory. When handling a new request, you ask Mem0 for relevant memories for that user and task. You feed those memories into the prompt of your Vercel AI SDK agent. The agent response may create new memories, which you persist again via Mem0.
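The loop can be sketched without a network call. The InMemoryStore below is a hypothetical stand-in for a memory service, not Mem0's API: it ranks memories by keyword overlap where a real system would use semantic search, and the llm argument is any callable that maps a prompt to a reply.

```python
from typing import Callable


class InMemoryStore:
    """Hypothetical stand-in for a memory service; keyword overlap replaces semantic search."""

    def __init__(self) -> None:
        self._items = {}

    def add(self, user_id: str, text: str) -> None:
        self._items.setdefault(user_id, []).append(text)

    def search(self, user_id: str, query: str, top_k: int = 3) -> list:
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(text.lower().split())), text)
            for text in self._items.get(user_id, [])
        ]
        # Keep only memories that share at least one word with the query.
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [text for _, text in ranked[:top_k]]


def handle_turn(store: InMemoryStore, user_id: str, user_message: str,
                llm: Callable[[str], str]) -> str:
    memories = store.search(user_id, user_message)               # retrieve
    context = "\n".join(f"- {m}" for m in memories) or "None."
    prompt = f"User memory:\n{context}\n\nUser: {user_message}"  # inject
    reply = llm(prompt)                                          # generate
    store.add(user_id, user_message)                             # persist
    return reply


store = InMemoryStore()
echo = lambda prompt: prompt  # fake "LLM" that returns the prompt it was given

handle_turn(store, "dev_42", "I prefer the Svelte framework", echo)
second = handle_turn(store, "dev_42", "Which framework should I use", echo)
print(second)
```

The second turn carries no chat history, but the prompt it builds still contains the stored preference, and a different user_id would see none of it.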

The system runs as a hosted API or a self-hosted service. In both cases, integration is via HTTP or language clients. This article focuses on Python integration on the backend that serves a Vercel front end.

Architecture pattern with Vercel AI SDK and Mem0

A clean way to structure a production agent with Vercel AI SDK and Mem0 splits responsibilities across three layers.

  • The frontend is built with Next.js and Vercel AI SDK. It handles UI, streaming, and local chat state, and sends messages to a backend route that encapsulates memory logic.

  • The backend (Python service) receives user_id, messages, and optional metadata. It queries Mem0 for relevant memories, constructs a prompt that includes those memories, calls the LLM provider, sends the response back to the frontend, and optionally writes new memories based on the latest turn.

  • Mem0 stores extracted memories for each user_id, performs semantic retrieval per request, and optionally consolidates memories in the background.

The important boundary is that Vercel AI SDK is not responsible for memory. It orchestrates conversation and streaming. Mem0 handles what should persist and be recalled.
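One way to keep that boundary clean is to validate the payload at the edge of the backend before any memory or LLM call runs. The sketch below assumes the request shape described above (user_id, messages, optional metadata); the ChatRequest and parse_chat_payload names are hypothetical.

```python
from dataclasses import dataclass, field

ALLOWED_ROLES = {"system", "user", "assistant"}


@dataclass
class ChatRequest:
    user_id: str
    messages: list
    metadata: dict = field(default_factory=dict)


def parse_chat_payload(data: dict) -> ChatRequest:
    """Validate the JSON body the frontend sends to the memory backend."""
    user_id = data.get("user_id")
    if not isinstance(user_id, str) or not user_id.strip():
        raise ValueError("user_id must be a non-empty string")

    messages = data.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("messages must be a non-empty list")
    for msg in messages:
        if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
            raise ValueError("each message needs a known role")
        if not isinstance(msg.get("content"), str):
            raise ValueError("each message needs string content")

    # The latest user input drives memory retrieval, so it must be last.
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must come from the user")

    metadata = data.get("metadata") or {}
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be an object")
    return ChatRequest(user_id=user_id, messages=messages, metadata=metadata)


req = parse_chat_payload({
    "user_id": "dev_42",
    "messages": [{"role": "user", "content": "Which framework should I use?"}],
})
print(req.user_id, len(req.messages))
```

Rejecting a missing user_id early matters more here than in a stateless handler, because every memory read and write is keyed on it.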

Integrating Mem0 with Vercel AI SDK

The example below shows a minimal Flask backend that integrates Mem0. The same pattern works with FastAPI or any other framework.

Install dependencies (the Mem0 Python client is published on PyPI as mem0ai):

pip install flask mem0ai openai

Then grab a free API key at app.mem0.ai and export both keys:

export MEM0_API_KEY="your_mem0_api_key"
export OPENAI_API_KEY="your_openai_api_key"

Here is the full /chat endpoint. The frontend on Vercel calls this with the user's messages.

from flask import Flask, request, jsonify
from mem0 import MemoryClient
from openai import OpenAI
import os

app = Flask(__name__)

mem0_client = MemoryClient(api_key=os.environ["MEM0_API_KEY"])
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

SYSTEM_PROMPT = """You are a helpful assistant. Use the provided user memory to
respect long term preferences, previously given instructions, and ongoing tasks.
If memory conflicts with new instructions, ask the user to clarify."""

def get_user_memories(user_id: str, query: str) -> str:
    # search() scopes through filters=, not bare kwargs.
    results = mem0_client.search(
        query,
        filters={"user_id": user_id},
        top_k=10,  # tune for your use case
    )
    # search() returns {"results": [...]} - extract the list.
    memories = [m["memory"] for m in results.get("results", [])]
    return "\n".join(f"- {m}" for m in memories) if memories else "None."

def store_user_memory(user_id: str, messages: list, metadata: dict | None = None):
    # add() takes user_id= directly and a list of message dicts, not text=.
    mem0_client.add(messages, user_id=user_id, metadata=metadata or {})

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    user_id = data["user_id"]
    messages = data["messages"]  # array of { role, content }
    user_query = messages[-1]["content"]

    # 1. Retrieve relevant memories
    memory_context = get_user_memories(user_id, user_query)

    # 2. Build prompt for the LLM
    system_message = {
        "role": "system",
        "content": SYSTEM_PROMPT + "\n\nUser memory:\n" + memory_context,
    }
    full_messages = [system_message] + messages

    # 3. Call the LLM for the actual response
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=full_messages,
        temperature=0.3,
    )
    reply = completion.choices[0].message.content

    # 4. Store this turn as memory. Let Mem0 extract what matters from the
    #    full exchange rather than dumping the raw user string.
    store_user_memory(
        user_id=user_id,
        messages=[
            {"role": "user", "content": user_query},
            {"role": "assistant", "content": reply},
        ],
        metadata={"source": "chat"},
    )

    return jsonify({"reply": reply})

Two SDK details trip people up, and the code above handles both. search() takes its scope through filters={"user_id": ...} and returns a {"results": [...]} envelope, so you extract the list before iterating. add() takes user_id= directly but expects a list of message dicts, not a text= string. Passing the full user-and-assistant exchange lets Mem0's extraction model pull the durable facts instead of storing every raw turn.
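If you want the envelope handling to be more defensive, a small pure function can own it. This sketch assumes only the response shape described here (a dict with a "results" list whose items carry a "memory" string); the format_memories name and the max_items cap are illustrative choices.

```python
def format_memories(response: dict, max_items: int = 5) -> str:
    """Turn a {"results": [...]} search response into a bulleted prompt section."""
    seen, lines = set(), []
    for item in response.get("results", []):
        text = (item.get("memory") or "").strip()
        key = text.lower()
        # Skip empty entries and case-insensitive duplicates.
        if not text or key in seen:
            continue
        seen.add(key)
        lines.append(f"- {text}")
        if len(lines) == max_items:
            break
    return "\n".join(lines) if lines else "None."


sample = {
    "results": [
        {"memory": "Uses Svelte"},
        {"memory": "uses svelte"},
        {"memory": ""},
        {"memory": "Does not want React suggestions"},
    ]
}
print(format_memories(sample))
```

Because it takes a plain dict, the helper can be unit tested without an API key and swapped into get_user_memories without changing the route.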

Run the loop yourself in five minutes

Before wiring up the frontend, prove the memory loop works in isolation. Get a free API key at app.mem0.ai, set MEM0_API_KEY, and run this:

import os
from mem0 import MemoryClient

mem0 = MemoryClient(api_key=os.environ["MEM0_API_KEY"])

# Session 1: the user states a hard preference
mem0.add(
    [{"role": "user", "content": "I use Svelte. Never suggest React."}],
    user_id="dev_42",
)

# Session 2, days later: a fresh request, no chat history attached
hits = mem0.search("which frontend framework should I scaffold with?",
                   filters={"user_id": "dev_42"})
print([m["memory"] for m in hits.get("results", [])])
# -> ['Uses Svelte', 'Does not want React suggestions']

The second call has no conversation history, yet the preference comes back. That is the entire value proposition in eight lines. Once you see it return the Svelte preference, the backend route above is just this loop wrapped around an LLM call.

Wiring Vercel AI SDK to the Python memory backend

The typical pattern inside a Next.js route with Vercel AI SDK is to proxy to the Python backend.

// app/api/chat/route.ts
import { NextRequest } from 'next/server';

export const runtime = 'edge';

export async function POST(req: NextRequest) {
  const body = await req.json();
  const { userId, messages } = body;

  const res = await fetch(process.env.PY_BACKEND_URL + '/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      user_id: userId,
      messages,
    }),
  });

  const data = await res.json();

  // In a real app, convert this to a streaming response if needed
  return new Response(JSON.stringify({ reply: data.reply }), {
    headers: { 'Content-Type': 'application/json' },
  });
}

The frontend uses useChat or streamText against /api/chat. The Vercel AI SDK manages user-side streaming. The Python backend manages memory with Mem0 and calls the LLM. This split keeps memory logic in one place, independent of UI frameworks or hosting platforms.

Designing memory schemas for Vercel-powered agents

Mem0 supports metadata and scope fields that help tailor memory by agent and surface. For Vercel AI SDK agents, reasonable patterns include a stable user_id from your auth system, an agent_id per product area (such as support_assistant, coding_mentor, ops_bot), and metadata fields like channel (web, mobile), language, importance, or topic.

Here is how to add memory with richer scoping and metadata:

def store_preference(user_id: str, preference: str, topic: str):
    mem0_client.add(
        [{"role": "user", "content": preference}],
        user_id=user_id,
        agent_id="personal_assistant",   # scope to this agent
        metadata={
            "type": "preference",
            "topic": topic,
            "channel": "web",
        },
    )

When retrieving, use the same agent_id and metadata filters to scope results to the current agent, which is useful when multiple Vercel AI SDK agents share the same user identity but solve different problems. Note that scoping is done through agent_id, app_id, and run_id, plus metadata filters, not a separate collection argument.
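Keeping writes and reads on the same scope is easier when the scope lives in one object. The MemoryScope class below is a hypothetical helper, not part of the Mem0 client. Its add_kwargs() output mirrors the keyword arguments used in store_preference above; the flat dict returned by search_filters() is an assumption about the filter shape, so check the Mem0 filter documentation before relying on it.

```python
from dataclasses import dataclass

ALLOWED_CHANNELS = {"web", "mobile"}


@dataclass(frozen=True)
class MemoryScope:
    user_id: str
    agent_id: str
    channel: str = "web"

    def __post_init__(self) -> None:
        if not self.user_id or not self.agent_id:
            raise ValueError("user_id and agent_id are required")
        if self.channel not in ALLOWED_CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")

    def add_kwargs(self, memory_type: str, topic: str) -> dict:
        """Keyword arguments in the shape the article passes to add()."""
        return {
            "user_id": self.user_id,
            "agent_id": self.agent_id,
            "metadata": {
                "type": memory_type,
                "topic": topic,
                "channel": self.channel,
            },
        }

    def search_filters(self) -> dict:
        """ASSUMPTION: a flat filter dict; verify the real filter grammar in the docs."""
        return {"user_id": self.user_id, "agent_id": self.agent_id}


scope = MemoryScope(user_id="dev_42", agent_id="coding_mentor")
print(scope.add_kwargs("preference", "frontend"))
print(scope.search_filters())
```

With a client in hand, usage would look like mem0_client.add(messages, **scope.add_kwargs("preference", "frontend")), so a write and its later read cannot drift onto different agent_id values.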

Comparing ad hoc memory vs Mem0 with Vercel AI SDK

| Aspect | Ad hoc memory with Vercel AI SDK | Mem0 as memory layer |
| --- | --- | --- |
| Storage model | Raw transcripts, custom tables | Structured memories with embeddings |
| Retrieval | Manual SQL or vector search | Built-in semantic search by user and context |
| Cross-agent sharing | Custom joins and schemas | Shared user_id and scope fields |
| Token budget handling | Manual truncation or summarization | Retrieve only top relevant memories |
| Maintenance | Custom code for extraction and cleanup | Managed logic with focused API |
| Multi-surface consistency | Hard to keep in sync | Centralized store per user |

Vercel AI SDK remains responsible for orchestration, UI, and request handling. Mem0 addresses storage, retrieval, and evolution of long-term memory, so the agent behaves consistently across sessions and surfaces.

If you are weighing this against rolling your own, the honest test is maintenance cost over time. The embedding pipeline, the relevance tuning, the cross-surface sync, and the pruning logic are what cost you by month three. That is the work Mem0 absorbs.

Get an API key and run the comparison against your own agent before committing either way.

Limitations of this pattern

This integration solves long-term memory for many use cases, but it is not universal.

  • Latency impact: Each call to Mem0 adds network latency. For strict low-latency environments, co-locate the Mem0 deployment with your Python backend and tune retrieval limits.

  • Memory control and pruning: Not every user message should become permanent memory. Production systems should implement heuristics or rules to decide what to store, and schedule periodic pruning or consolidation.

  • Prompt complexity: Injecting too many memories into the system prompt can confuse the model. Retrieval must be tuned, and prompts should guide the model on how to use memory and when to disregard it.

  • Multi-tenant complexity: In SaaS scenarios with many tenants, careful management of user_id and scope fields is required to avoid cross-tenant leakage and to support data deletion requirements.

  • Migration from existing storage: Teams that already store chat history in databases will need a migration or synchronization plan to populate Mem0 with relevant historical memories.
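Two of these limits lend themselves to small guards in the backend. The sketch below shows a keyword-based pre-filter for deciding whether a message is worth storing and a tenant-prefixed user identifier to reduce the risk of cross-tenant leakage. Both helpers, the hint list, and the "tenant:user" format are assumptions for illustration; a real system would tune the rules to its own traffic.

```python
import re

# Phrases that tend to signal a durable fact or preference (illustrative list).
DURABLE_HINTS = re.compile(
    r"\b(always|never|prefer|do not|don't|from now on|my name is|i use|i work)\b",
    re.IGNORECASE,
)


def should_store(user_message: str, min_words: int = 3) -> bool:
    """Cheap pre-filter: skip very short messages and ones with no durable signal."""
    text = user_message.strip()
    if len(text.split()) < min_words:
        return False
    return bool(DURABLE_HINTS.search(text))


def scoped_user_id(tenant_id: str, user_id: str) -> str:
    """Prefix the user with its tenant so two tenants can never share an identifier."""
    for part in (tenant_id, user_id):
        if not part or ":" in part:
            raise ValueError("tenant_id and user_id must be non-empty and contain no ':'")
    return f"{tenant_id}:{user_id}"


print(should_store("I use Svelte. Never suggest React."))  # -> True
print(should_store("ok thanks"))                           # -> False
print(scoped_user_id("acme", "dev_42"))                    # -> acme:dev_42
```

A pre-filter like this trades recall for cost: it skips some messages that carry durable facts in unusual wording, so treat it as a starting point rather than a replacement for extraction.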

Despite these limits, the pattern offers a clear abstraction: Vercel AI SDK for interaction orchestration and Mem0 for durable memory, which scales more cleanly than ad hoc implementations.

👉Start here

The fastest path to a memory-backed Vercel agent is the eight-line loop above.

  • Get a free API key, run the Svelte-preference test, then wrap the loop around your existing LLM call in one backend route.

  • If you self-host by policy, the open source repo is the starting point.

  • For deeper integration patterns, the Mem0 docs cover scoping, filters, and consolidation in full.

Frequently Asked Questions

Q. How does Mem0 integrate with Vercel AI SDK in practice?

Mem0 does not plug into Vercel AI SDK directly. Instead, a backend service, such as a Python API, calls Mem0 to read and write memory, while the Vercel AI SDK front end talks to that backend. The SDK manages streaming and UI, and Mem0 manages what the agent remembers.

Q. What should be stored as memory for a Vercel-based agent?

Useful memories include user preferences, factual profile data, ongoing tasks, and important corrections that should affect future behavior. Short-lived context, such as small clarifications within a single turn, does not always need to be stored and can remain in the local message buffer.

Q. When should memory retrieval happen during an agent interaction?

Retrieval should occur before each call to the LLM so the model can consider relevant memories when generating the response. The pattern is: get messages from the client, query Mem0 for user memories based on the latest user input, construct a prompt that includes those memories, then send it to the model.

Q. Why use Mem0 instead of storing chat history directly in a database?

Raw transcripts give no prioritization and do not scale well with token limits. Mem0 extracts and stores information in a retrieval-friendly format, then returns only what is relevant for the current query. This reduces token usage, improves personalization, and centralizes memory logic across multiple agents and surfaces.

Q. How does Mem0 handle multiple agents or applications for the same user?

Mem0 associates memories with user_id and supports agent_id and app_id scopes plus metadata. Different agents can use their own scope while still sharing core profile memories, which allows consistent user identity with tailored memory per agent.

Q. What changes are needed in an existing Vercel AI SDK app to adopt Mem0?

The main changes are routing chat requests through a backend that uses Mem0 and adjusting the prompt construction to include retrieved memories. The frontend logic using useChat or streamText can usually stay the same, with only the API endpoint URL and payload shape updated.
