How to make your clients more context-aware with OpenMemory MCP
In the rapidly evolving landscape of AI, Large Language Models (LLMs) have transformed how we interact with technology. Yet, they face a fundamental limitation: they forget everything between sessions.
What if there were a way to have a personal, portable LLM “memory layer” that lives locally on your system and gives you complete control over your data?
Today, we're excited to present OpenMemory MCP - a private, local-first memory layer powered by Mem0 that enables persistent, context-aware AI across MCP-compatible clients such as Cursor, Claude Desktop, Windsurf, Cline, and more. The memory service runs entirely on your machine, keeping your data under your control, and ships with a built-in UI for memory visibility, auditing, and control, enabling truly personalized experiences across any MCP-compatible tool.
This guide explains how to install, configure, and operate the OpenMemory MCP Server. It also covers the internal architecture, available features, and real-world applications.
What You'll Learn
- What OpenMemory MCP Server is and why it matters
- Step-by-step setup guide
- Dashboard features and UI breakdown
- Security, Access Control, and Architecture overview
- Practical use cases with examples
1. What is OpenMemory MCP Server?
OpenMemory MCP is a private, local memory layer for MCP clients. It provides the infrastructure to store, manage and utilize your AI memories across different platforms, all while keeping your data local to your system.
In simple terms, it's a vector-backed memory layer for any LLM client that speaks the standard MCP protocol, and it works out of the box with tools like Mem0.
Key Capabilities:
- Add, retrieve, list and delete memory objects via MCP server tools (`add_memories`, `search_memory`, `list_memories`, `delete_all_memories`).
- Store semantically indexed data using `Qdrant` as a vector store under the hood.
- Runs fully on your local infrastructure (Docker + Postgres + Qdrant) with no data sent outside.
- Pause or revoke any client's access at the app or memory level, with audit logs for every read or write.
- Built-in UI for observability and manual control.
OpenMemory MCP Server serves as the backbone of your memory-aware AI stack, enabling clients to operate with shared, persistent context.
🔁 How it works (the basic flow)
Here’s the core flow in action:
- You spin up OpenMemory (API, Qdrant, Postgres) with a single `docker-compose` command.
- The API process itself hosts an MCP server (using Mem0 under the hood) that speaks the standard MCP protocol over SSE.
- Your MCP client opens an SSE stream to OpenMemory's `/mcp/...` endpoints and calls methods like `add_memories()`, `search_memory()` or `list_memories()`.
- Everything else, including vector indexing, audit logs and access controls, is handled by the OpenMemory service (a rough sketch of what this maps to in code follows below).
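To make this concrete, here's a rough sketch of what those tool calls map to underneath, using the Mem0 Python library directly. The actual OpenMemory server wraps these calls with Postgres metadata, ACL checks and audit logging, so treat the exact names and arguments as illustrative:

```python
# Illustrative only: roughly what add_memories / search_memory do under the hood via Mem0.
# The real server also records relational metadata, enforces ACLs and writes audit logs.
import os
from mem0 import Memory

os.environ.setdefault("OPENAI_API_KEY", "your_api_key_here")  # used for embeddings + LLM calls

memory = Memory()  # OpenMemory configures this with Qdrant as the vector store

# add_memories(text): store a fact for a given user
memory.add("I prefer concise answers and work mostly in Python.", user_id="default_user")

# search_memory(query): semantic lookup over everything stored for that user
results = memory.search("how does this user like responses?", user_id="default_user")
print(results)
```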
2. Step-by-step guide to set up and run OpenMemory.
In this section, we will walk through how to set up OpenMemory and run it locally.
The project has two main components you will need to run:
- `api/` - Backend APIs and MCP server
- `ui/` - Frontend React application (dashboard)
Step 1: System Prerequisites
Before getting started, make sure your system has the following installed. I've attached the docs link so it's easy to follow.
- Docker and Docker Compose
- Python 3.9+ - required for backend development
- Node.js - required for frontend development
- OpenAI API Key - used for LLM interactions
- GNU Make
GNU Make is a build automation tool. We will use it for the setup process.
Please make sure Docker Desktop is running before proceeding to the next step.
Step 2: Clone the repo and set your OpenAI API Key
OpenMemory lives in the openmemory/ directory of the mem0 repository (github.com/mem0ai/mem0). Clone the repo and move into that directory:
git clone https://github.com/mem0ai/mem0.git
cd mem0/openmemory
Next, set your OpenAI API key as an environment variable.
export OPENAI_API_KEY=your_api_key_here

This sets the key only for your current terminal session; it lasts until you close that terminal window.
Step 3: Setup the backend
The backend runs in Docker containers. To start the backend, run these commands in the root directory:
# Copy the environment file and edit the file to update OPENAI_API_KEY and other secrets
make env
# Build all Docker images
make build
# Start Postgres, Qdrant, FastAPI/MCP server
make up
The `.env.local` file follows this format:
OPENAI_API_KEY=your_api_key


Once the setup is complete, your API will be live at http://localhost:8000.
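If you'd rather verify this from code than the browser, a quick sanity check like the following works, since FastAPI serves its interactive docs at /docs by default. This is purely a convenience check, not part of the setup:

```python
# Quick sanity check that the backend is reachable.
import requests

resp = requests.get("http://localhost:8000/docs")  # FastAPI's auto-generated docs page
print(resp.status_code)  # expect 200 once the containers are healthy
```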
You should also see the containers running in Docker Desktop.

There are also other useful backend commands available; check the project's Makefile for the full list of targets.
Step 4: Setup the frontend
The frontend is a Next.js application. To start it, just run:
# Installs dependencies using pnpm and runs Next.js development server
make ui

After successful installation, you can navigate to http://localhost:3000 to check the OpenMemory dashboard, which will guide you through installing the MCP server in your MCP clients.
Here's how the dashboard looks.

For context, your MCP client opens an SSE channel to `GET /mcp/{client_name}/sse/{user_id}`, which wires up two context variables (`user_id`, `client_name`).
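You never have to open this stream by hand (your MCP client does it for you), but a small sketch helps show the shape of the endpoint. The client name and user ID below are placeholders:

```python
# Illustrative only: the SSE handshake an MCP client performs against OpenMemory.
import httpx

url = "http://localhost:8000/mcp/cursor/sse/default_user"  # /mcp/{client_name}/sse/{user_id}
with httpx.stream("GET", url, timeout=None) as response:
    for line in response.iter_lines():
        if line:
            print(line)  # server-sent events carrying MCP messages
```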
On the dashboard, you will find the one-line command to install the MCP server for your client of choice (Cursor, Claude, Cline, Windsurf, and so on).
Let's install this in Cursor; the command looks something like this:
npx install-mcp i https://mcp.openmemory.ai/xyz_id/sse --client cursor
It will prompt you to install `install-mcp` if it isn't already installed, and then you just need to provide a name for the server.
I'm using a dummy ID in the command for now, so please ignore that. Open the Cursor settings and check the MCP option in the sidebar to verify the connection.

Open a new chat in Cursor and give it a sample prompt. Here, I've asked it to remember some things about me (which I grabbed from my GitHub profile).

This triggers the `add_memories()` call and stores the memory. Refresh the dashboard and go to the `Memories` tab to check all of these memories.
Categories, which act like optional tags, are automatically created for the memories (via GPT-4o categorization).

You can also connect other MCP clients like Windsurf.

Each MCP client can invoke one of four standard memory actions:
- `add_memories(text)`: stores the text in Qdrant and inserts/updates a `Memory` row and an audit entry
- `search_memory(query)`: embeds the query, runs a vector search with optional ACL filters, and logs each access
- `list_memories()`: retrieves all stored vectors for a user (filtered by ACL) and logs the listing
- `delete_all_memories()`: clears all memories
All responses stream over the same SSE connection. The dashboard shows all active connections, which apps are accessing memory and details of reads/writes.

3. Features available in the dashboard (and what’s behind the UI)
The OpenMemory dashboard includes three main routes:
- `/` - dashboard
- `/memories` - view and manage stored memories
- `/apps` - view connected applications
Let's briefly go over the features available in the dashboard so you get the basic idea.
1) Install OpenMemory clients
- Get your unique SSE endpoint or use a one-liner install command
- Switch between `MCP Link` and various client tabs (Claude, Cursor, Cline, etc.)
2) View memory and app stats
- See how many memories you’ve stored
- See how many apps are connected
- Type any text to live-search across all memories (debounced)
The code is available at `ui/components/dashboard/Stats.tsx`, which:
- reads from Redux (`profile.totalMemories`, `profile.totalApps`, `profile.apps[]`)
- calls `useStats().fetchStats()` on mount to populate the store
- renders “Total Memories count” and “Total Apps connected” with up to 4 app icons

3) Refresh or manually create a Memory
- `Refresh` button (re-calls the appropriate fetcher(s) for the current route)
- `Create Memory` button (opens the modal from CreateMemoryDialog)

5) You can open the filters panel to pick:
- Which apps to include
- Which categories to include
- Whether to show archived items
- Which column to sort by (Memory, App Name, Created On)
- Clear all filters in one click

6) You can inspect and manage individual Memories
Click on any memory to:
- archive, pause, resume or delete the memory
- check access logs & related memories

You can also select multiple memories and perform bulk actions.

4. Security, Access control and Architecture overview.
When working with the MCP protocol or any AI agent system, security is non-negotiable, so let's briefly discuss it.
🎯 Security
OpenMemory is designed with privacy-first principles. It stores all memory data locally in your infrastructure, using Dockerized components (`FastAPI`, `Postgres`, `Qdrant`).
Sensitive inputs are handled safely via SQLAlchemy with parameter binding to prevent injection attacks. Each memory interaction, including additions, retrievals and state changes, is logged for traceability through the `MemoryStatusHistory` and `MemoryAccessLog` tables.
While authentication is not built in, all endpoints require a `user_id` and are ready to be secured behind an external auth gateway (like OAuth or JWT).
CORS on FastAPI is wide open (`allow_origins=["*"]`) for local development, but for production you should tighten this default to restrict access to trusted clients.
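As a rough sketch, tightening that in FastAPI takes only a few lines with CORSMiddleware; the origin list below is just an example for a locally hosted dashboard:

```python
# Example of restricting CORS to trusted origins instead of "*".
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # only the dashboard, not every origin
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)
```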
🎯 Access Control
Fine-grained access control is one of OpenMemory's core focus areas. At a high level, the `access_controls` table defines allow/deny rules between apps and specific memories.
These rules are enforced via the `check_memory_access_permissions` function, which considers the memory state (`active`, `paused`, and so on), the app's activity status (`is_active`) and the ACL rules in place.
In practice, you can pause an entire app (disabling writes), archive or pause individual memories, or apply filters by category or user. Paused or non-active entries are hidden from tool access and searches. This layered access model ensures you can gate memory access at any level with confidence.
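To make the layering concrete, here is a hypothetical, simplified version of the kind of decision `check_memory_access_permissions` makes. It is not the project's actual code, just the logic described above:

```python
# Hypothetical sketch of a layered access check (not the real implementation).
def can_access(memory_state: str, app_is_active: bool, acl_allows: bool) -> bool:
    if memory_state != "active":   # paused/archived memories are hidden from tools
        return False
    if not app_is_active:          # pausing an app revokes its access entirely
        return False
    return acl_allows              # finally, the per-app / per-memory ACL rule decides
```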
As you can see, I've paused access to the memories, which results in an `inactive` state.

🎯 Architecture
Let's briefly walk through the system architecture. You can always refer to the codebase for more details.
1) Backend (FastAPI + FastMCP over SSE):
- exposes both a plain-old REST surface (`/api/v1/memories`, `/api/v1/apps`, `/api/v1/stats`) and an MCP “tool” interface (`/mcp/messages`, `/mcp/sse/<client>/<user>`) that agents use to call `add_memories`, `search_memory` and `list_memories` via Server-Sent Events (SSE)
- connects to Postgres for relational metadata and Qdrant for vector search
2) Vector Store (Qdrant via the mem0 client): All memories are semantically indexed in Qdrant, with user- and app-specific filters applied at query time.
3) Relational Metadata (SQLAlchemy + Alembic):
- tracks users, apps, memory entries, access logs, categories and access controls
- Alembic manages schema migrations
- the default DB is SQLite (openmemory.db), but you can point `DATABASE_URL` at Postgres (see the sketch below)
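The override follows the usual SQLAlchemy pattern; this is an illustrative sketch rather than the project's exact configuration code, and the connection string is an example:

```python
# Illustrative: how a DATABASE_URL override typically plugs into SQLAlchemy.
import os
from sqlalchemy import create_engine

# e.g. DATABASE_URL=postgresql+psycopg2://user:password@localhost:5432/openmemory
db_url = os.getenv("DATABASE_URL", "sqlite:///./openmemory.db")  # SQLite fallback
engine = create_engine(db_url)
```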
4) Frontend Dashboard (Next.js):
- Redux powers a live observability interface
- Hooks + Redux Toolkit manage state, and Axios talks to the FastAPI endpoints
- Live charts (Recharts), carousels and forms (React Hook Form) help you explore your memories
5) Infra & Dev Workflow
- `docker-compose.yml` (api/docker-compose.yml) defines the Qdrant and API services
- the `Makefile` provides shortcuts for migrations, testing and hot-reloading
- tests live alongside the backend logic (via `pytest`)
Together, this gives you a self-hosted LLM memory platform:
⚡ Store & version your chat memory in both a relational DB and a vector index
⚡ Secure it with per-app ACLs and state transitions (active/paused/archived)
⚡ Search semantically via Qdrant
⚡ Observe & control via dashboard.
In the next section, we will explore some advanced use cases and creative workflows you can build with OpenMemory.
5. Practical use cases with examples.
Once you are familiar with OpenMemory, you'll realize it can be used anywhere you want an AI to remember something across interactions, which makes the experience feel very personalized.
Here are some advanced and creative ways you can use OpenMemory.
✅ Multi-agent research assistant with memory layer
Imagine building a tool where different LLM agents specialize in different research domains (for example, one for academic papers, one for GitHub repos, another for news).
Each agent stores what it finds via `add_memories(text)`, and a master agent later runs `search_memory(query)` across all previous results.
The technical flow can be:
- Each sub-agent is an MCP client that:
  - adds summaries of retrieved data to OpenMemory
  - tags memories using auto-categorization (GPT)
- The master agent opens an SSE channel and uses `search_memory("latest papers on diffusion models")` to pull all related context.
- The dashboard shows which agent stored what, and you can restrict memory access between agents using ACLs (a minimal sketch of this pattern follows below).
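Here's a minimal sketch of that write/read pattern using the Mem0 library directly. The agent names, metadata keys and text are illustrative, and through OpenMemory the same calls would go through the `add_memories` / `search_memory` MCP tools:

```python
# Illustrative sketch: sub-agents write findings, a master agent searches across them.
from mem0 import Memory

memory = Memory()

# The "papers" sub-agent stores a summary it produced
memory.add(
    "Survey: diffusion models now rival GANs on image-synthesis benchmarks.",
    user_id="research_team",
    metadata={"agent": "papers"},
)

# The "github" sub-agent stores what it found
memory.add(
    "Repo X implements latent-diffusion fine-tuning with LoRA adapters.",
    user_id="research_team",
    metadata={"agent": "github"},
)

# The master agent pulls everything relevant before writing the report
hits = memory.search("latest work on diffusion models", user_id="research_team")
print(hits)
```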
If you're still curious, you can check this GitHub repo, which shows how to build a research multi-agent system (a design pattern overview with Gemini 2.0).
Tip: We can add a LangGraph orchestration layer, where each agent is a node and memory writes/reads are tracked over time, so we can visualize knowledge flow and origin per research thread.
✅ Intelligent meeting assistant with persistent cross-session memory
We can build something like a meeting note-taker (Zoom, Google Meet, etc.) that:
- Extracts summaries via LLMs.
- Remembers action items across calls.
- Automatically retrieves relevant context in future meetings.
Let's see what the technical flow looks like:
- `add_memories(text)` is called after each meeting with the transcript and action items.
- Next meeting: `search_memory("open items for Project X")` runs before the call starts.
- Related memories (tagged with the appropriate category) are shown in the UI, and audit logs trace which memory was read and when.
Tip: Integrate with tools (like Google Drive, Notion, GitHub) so that stored action items link back to live documents and tasks.
✅ Agentic coding assistant that evolves with usage
Your CLI-based coding assistant can learn how you work by storing usage patterns, recurring questions, coding preferences and project-specific tips.
The technical flow looks like:
- When you ask, “Why does my SQLAlchemy query fail?”, it stores both the error and the fix via `add_memories`.
- Next time you type, “Having issues with SQLAlchemy joins again,” the assistant auto-runs `search_memory("sqlalchemy join issue")` and retrieves the previous fix.
- You can inspect all stored memories via the `/memories` dashboard and pause any that are outdated or incorrect.
In each case, OpenMemory's combination of vector search (for semantic recall), relational metadata (for audit/logging) and live dashboard (for observability and on-the-fly access control) lets you build context-aware applications that just feel like they remember.
Now your MCP clients have real memory.
You can track every access, pause what you want, and audit everything in one dashboard. The best part is that everything is stored locally on your system.