How to Build Context Queries for AI Agents with Mem0

Context queries are the primary interface between an AI agent and everything it has seen before. They decide what the model can remember, how it reasons across sessions, and how it acts consistently.

Fig: How Mem0 sits between the agent and the LLM. The agent turns a user message into explicit context queries against Mem0, feeds the selected memories plus the message into the LLM, and writes new memories afterward.

Most production failures in agents look like context problems:

  • The agent forgets previous user preferences

  • It repeats the same questions across sessions

  • It ignores past API results or tool output

  • It hits context window limits and truncates something important

All of these point back to how context is queried, stored, and shaped before reaching the model.
This article walks through how context queries work, where they break, and how Mem0 provides a concrete memory layer for production agents.

What context queries are in practice

In theory, context is "everything relevant passed to the model". However, in practice, context queries are explicit calls from the agent to a memory source to fetch information that should influence the current decision.

Typical context queries look like:

  • "What has this user asked in the last 30 days?"

  • "What tools did I call for this task earlier in the workflow?"

  • "What did the CRM API return for this customer last week?"

  • "What steps did I take previously in this multi-step task?"

They fall into a few categories:

  1. User memory: History of user preferences, profile, and constraints across sessions.

  2. Task memory: Long-running workflows, intermediate outputs, and failures.

  3. Knowledge memory: Extracted facts from documents, tools, or external data.

  4. Agent memory: How the agent itself has been corrected or steered over time.

Each category has different query patterns and lifetime expectations. Confusing them often leads to bad retrieval and wasted tokens.
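One way to keep the categories from blurring is to tag every memory with its category and give each category its own lifetime. The sketch below is illustrative only: `MemoryCategory`, `DEFAULT_TTL`, and the lifetime values are assumptions made for this example, not Mem0 defaults.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class MemoryCategory(Enum):
    USER = "user"
    TASK = "task"
    KNOWLEDGE = "knowledge"
    AGENT = "agent"


# Hypothetical lifetimes; None means "keep until explicitly replaced".
DEFAULT_TTL = {
    MemoryCategory.USER: None,
    MemoryCategory.TASK: timedelta(days=7),
    MemoryCategory.KNOWLEDGE: timedelta(days=90),
    MemoryCategory.AGENT: None,
}


def is_expired(
    category: MemoryCategory,
    created_at: datetime,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when a memory has outlived its category's lifetime."""
    ttl = DEFAULT_TTL[category]
    if ttl is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - created_at > ttl
```

With a table like this, a task memory from a finished workflow ages out on its own, while a user preference stays until the user changes it.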

From chat logs to structured memory

Naive agents use raw conversation history as context. They pass the last N messages into the prompt and rely on the model to figure out what matters.

This breaks quickly:

  • Context exceeds model limits

  • Important facts are buried in irrelevant small talk

  • Relevance is hard to control or explain

  • Memory types with different lifecycles all get treated the same way

Production agents need an explicit memory layer that:

  • Stores atomic, structured memories, not just raw chat logs

  • Supports semantic and metadata-based queries

  • Controls what gets written and when it expires

  • Provides efficient read patterns for agents to use repeatedly

Mem0 is built specifically as that memory layer. It focuses on encoding user and task-level information into queryable memories that can be pulled back into context selectively.

Patterns for context queries in agents

Most agents combine several retrieval patterns.

1. Sequential history slices

The simplest pattern is sliced history:

  • Take the last K messages

  • Optionally summarize older ones

  • Attach to the prompt

This works for short-lived conversations but fails when:

  • Facts appear far back in history

  • Conversations span days or weeks

  • The same user interacts in different contexts or channels
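The sliced-history pattern is small enough to sketch in a few lines. In this example `slice_history` and its `summarize` callback are hypothetical names; the callback stands in for whatever summarizer an agent would use, such as an LLM call.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def slice_history(
    messages: List[Message],
    k: int,
    summarize: Optional[Callable[[List[Message]], str]] = None,
) -> List[Message]:
    """Keep the last k messages; optionally fold older ones into one summary."""
    if k <= 0:
        recent, older = [], list(messages)
    else:
        recent, older = messages[-k:], messages[:-k]
    if summarize and older:
        summary = {"role": "system", "content": summarize(older)}
        return [summary] + recent
    return recent


history = [{"role": "user", "content": f"message {i}"} for i in range(5)]
window = slice_history(
    history,
    k=2,
    summarize=lambda old: f"{len(old)} earlier messages omitted",
)
```

Here `window` holds one summary message followed by the last two turns. Anything the summarizer drops is gone for the rest of the conversation, which is the weakness the list above describes.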

2. Semantic search over memories

Here, each memory is a compact unit, for example:

  • "User prefers metric units."

  • "User's default project is 'infra-automation'."

  • "Last data export for this user failed due to a permission error."

The agent issues a semantic search query like:

"User preferences for notifications and formatting for this session."

Relevant memories are returned based on embeddings and metadata.
This reduces noise and allows cross-session retrieval.
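To make the ranking step concrete, here is a toy sketch that scores memories against a query with cosine similarity. The `embed` function is a deliberate simplification (word counts instead of a learned embedding model), and all function names are assumptions for this example rather than part of Mem0.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(
    query: str, memories: List[str], limit: int = 3
) -> List[Tuple[float, str]]:
    """Rank memories by similarity to the query, highest first."""
    q = embed(query)
    scored = [(cosine(q, embed(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]


memories = [
    "User prefers metric units.",
    "User's default project is 'infra-automation'.",
    "Last data export for this user failed due to a permission error.",
]
results = semantic_search("Which units does the user prefer?", memories, limit=1)
```

The top hit is the metric-units memory. A real embedding model would also match "prefer" with "prefers" and handle paraphrases, which word counts cannot do.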

3. Hybrid context: semantic and scoped filters

Better context queries combine:

  • Semantic similarity

  • Structured filters (user_id, session_id, topic, memory_type)

  • Time-based constraints (last 30 days, last N events)

This allows patterns like:

  • "All billing-related issues for this user in the last 90 days."

  • "Latest tool outputs for the current task_id."

  • "Corrections to the agent's behavior within this workspace."

Mem0 exposes this hybrid pattern as first-class behavior, rather than leaving it to custom database logic for each team.
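The sketch below shows the general shape of such a hybrid query over an in-memory list: structured and time filters narrow the candidates, then a relevance function ranks what is left. `hybrid_query`, `keyword_score`, and the sample data are hypothetical and only illustrate the pattern; they do not describe how Mem0 implements it.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List, Optional


def hybrid_query(
    memories: List[Dict[str, Any]],
    filters: Dict[str, Any],
    max_age: timedelta,
    score: Callable[[str], float],
    now: Optional[datetime] = None,
    limit: int = 5,
) -> List[Dict[str, Any]]:
    """Apply metadata and time filters first, then rank the survivors."""
    now = now or datetime.now(timezone.utc)
    candidates = []
    for m in memories:
        meta = m.get("metadata") or {}
        if any(meta.get(key) != value for key, value in filters.items()):
            continue  # fails a structured filter
        if now - m["created_at"] > max_age:
            continue  # outside the time window
        candidates.append(m)
    ranked = sorted(candidates, key=lambda m: score(m["memory"]), reverse=True)
    return ranked[:limit]


def keyword_score(text: str) -> float:
    """Crude relevance stand-in; a real system would use embeddings."""
    words = set(text.lower().rstrip(".").split())
    return float(len(words & {"invoice", "charged", "refund"}))


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
store = [
    {"memory": "Invoice for March was charged twice.",
     "created_at": now - timedelta(days=20),
     "metadata": {"user_id": "user_123", "topic": "billing"}},
    {"memory": "Refund request was delayed.",
     "created_at": now - timedelta(days=200),
     "metadata": {"user_id": "user_123", "topic": "billing"}},
    {"memory": "User prefers metric units.",
     "created_at": now - timedelta(days=5),
     "metadata": {"user_id": "user_123", "topic": "formatting"}},
]
hits = hybrid_query(
    store,
    filters={"user_id": "user_123", "topic": "billing"},
    max_age=timedelta(days=90),
    score=keyword_score,
    now=now,
)
```

Only the March invoice memory survives: the refund memory is too old and the units memory has the wrong topic. Filtering before ranking keeps the semantic step from ever seeing out-of-scope memories.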

How Mem0 structures memories for context queries

Mem0 treats memory as first-class data with identity, metadata, and lifecycle. Instead of storing raw chat logs, agents write atomic memories along with structured context.

A memory in Mem0 includes:

  • memory: Natural language text representing the fact or event

  • user_id: Identity for per-user memories

  • metadata: Arbitrary key-value data such as type, topic, source, session_id

  • created_at and updated_at: Timestamps for creation and last update

  • embedding: Vector representation computed behind the scenes and used for semantic search

For example, an agent might store:

{
  "memory": "User prefers CSV format for data exports.",
  "user_id": "user_123",
  "metadata": {
    "type": "preference",
    "topic": "export_format",
    "source": "chat"
  }
}

This design supports precise retrieval:

  • Fetch all type="preference" memories for a user

  • Filter by topic and sort by time

  • Combine preferences, past errors, and task context for a prompt
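As a rough illustration of the last point, the sketch below groups already-retrieved memories by their `type` metadata and renders them newest-first for a prompt. The function name, the `error` type, and the assumption that `created_at` is an ISO 8601 string are all choices made for this example.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def build_context_section(
    memories: List[Dict[str, Any]],
    order: Sequence[str] = ("preference", "error", "task"),
) -> str:
    """Group memories by metadata type and render each group newest-first.

    Types that are not listed in `order` are left out of the prompt.
    """
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for m in memories:
        mem_type = (m.get("metadata") or {}).get("type", "other")
        groups[mem_type].append(m)

    lines: List[str] = []
    for mem_type in order:
        # ISO 8601 strings in one timezone sort chronologically as text.
        items = sorted(
            groups.get(mem_type, []),
            key=lambda m: m.get("created_at", ""),
            reverse=True,
        )
        if not items:
            continue
        lines.append(f"{mem_type.title()}:")
        lines.extend(f"- {m['memory']}" for m in items)
    return "\n".join(lines)


mems = [
    {"memory": "Export job 17 is waiting on approval.",
     "created_at": "2024-06-02T10:00:00Z", "metadata": {"type": "task"}},
    {"memory": "User prefers CSV format for data exports.",
     "created_at": "2024-05-01T09:00:00Z", "metadata": {"type": "preference"}},
    {"memory": "User prefers UTC timestamps.",
     "created_at": "2024-06-01T09:00:00Z", "metadata": {"type": "preference"}},
]
section = build_context_section(mems)
```

Keeping the groups in a fixed order makes prompts easier to diff and review when retrieval quality is being debugged.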

Mem0 also provides automatic extraction helpers so agents do not have to manually decide what to store on every turn. It can convert long chat logs into discrete memories with stable identifiers.

Integrating Mem0 into an agent for context

The core pattern is:

  1. User sends a message

  2. Agent queries Mem0 for relevant memories

  3. Agent builds a prompt using the current message and memories

  4. LLM returns a response

  5. Agent writes new memories based on the interaction

Below is a minimal Python example that illustrates this loop.

💡 Before you start: you need a Mem0 API key

import os
from mem0 import MemoryClient
from openai import OpenAI

# Configure clients
MEM0_API_KEY = os.getenv("MEM0_API_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

mem_client = MemoryClient(api_key=MEM0_API_KEY)
llm_client = OpenAI(api_key=OPENAI_API_KEY)

def query_memories(user_id: str, query: str, limit: int = 5):
    """
    Query Mem0 for relevant memories for this user.
    """
    results = mem_client.search(
        query=query,
        user_id=user_id,
        limit=limit,
        filters={"type": ["preference", "task", "feedback"]},
    )
    # Depending on the client version, hits may come back wrapped
    # in {"results": [...]}; normalize to a plain list.
    if isinstance(results, dict):
        results = results.get("results", [])
    return results

def format_memories_for_prompt(memories):
    if not memories:
        return "No prior memories found."

    lines = []
    for m in memories:
        meta = m.get("metadata") or {}  # metadata may be missing or None
        prefix = meta.get("type", "memory")
        lines.append(f"- ({prefix}) {m['memory']}")
    return "\n".join(lines)

def call_llm(system_prompt: str, user_message: str):
    completion = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        temperature=0.2,
    )
    return completion.choices[0].message.content

def write_memory_from_turn(user_id: str, user_message: str, agent_response: str):
    """
    Very simple heuristic: extract preference sentences and store them.
    In production, use LLM-based extraction or Mem0's higher-level APIs.
    """
    if "I prefer" in user_message:
        mem_client.add(
            memory=user_message,
            user_id=user_id,
            metadata={"type": "preference", "source": "chat"},
        )
    # Store the full turn as task context
    mem_client.add(
        memory=f"User said: {user_message}\nAgent replied: {agent_response}",
        user_id=user_id,
        metadata={"type": "task", "source": "chat_turn"},
    )

def handle_user_message(user_id: str, message: str) -> str:
    # 1. Query memories relevant to this message
    memories = query_memories(user_id, query=message, limit=6)
    memories_text = format_memories_for_prompt(memories)

    # 2. Build a context-aware system prompt
    system_prompt = f"""
You are a production assistant for data exports.

Use the following memories about the user and ongoing tasks when responding.
If a memory conflicts with the current request, ask the user to clarify.

Memories:
{memories_text}
    """.strip()

    # 3. Call the LLM
    response = call_llm(system_prompt, message)

    # 4. Write new memories based on this turn
    write_memory_from_turn(user_id, message, response)

    return response

if __name__ == "__main__":
    uid = "user_123"
    msg = "Can you export last month's sales in CSV format? I prefer CSV over Excel."
    reply = handle_user_message(uid, msg)
    print("Agent:", reply)

This example shows the essence of context queries with Mem0:

  • Search with a natural language query and structured filters

  • Format memories into a compact section for the system prompt

  • Persist new memories after each turn

In a production setup, the extraction step is usually more sophisticated and uses the LLM or Mem0's own tools to detect stable preferences, identifiers, and long-lived facts.
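As a middle ground between the `"I prefer"` substring check and full LLM-based extraction, a rule-based extractor can at least store the preference clause on its own instead of the whole message. This is a sketch with a hypothetical pattern and function name; it is not how Mem0's extraction works, and a regex will miss most real phrasing.

```python
import re
from typing import Any, Dict, List

# Hypothetical trigger phrases; extend or replace with an LLM-based extractor.
PREFERENCE_PATTERN = re.compile(
    r"\bI (?:prefer|always use|usually want)\b[^.!?]*",
    re.IGNORECASE,
)


def extract_preference_memories(message: str) -> List[Dict[str, Any]]:
    """Return one atomic memory per preference clause found in the message."""
    memories = []
    for match in PREFERENCE_PATTERN.finditer(message):
        clause = match.group(0).strip()
        memories.append(
            {
                "memory": clause,
                "metadata": {"type": "preference", "source": "chat"},
            }
        )
    return memories


extracted = extract_preference_memories(
    "Can you export last month's sales in CSV format? I prefer CSV over Excel."
)
```

For the sample message, only the clause "I prefer CSV over Excel" is kept, so the export request itself does not end up stored as a long-lived preference.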

Context query patterns with Mem0

Mem0 supports multiple query shapes that map directly onto agent patterns.

Per-user preference context

Retrieve stable user preferences to enforce consistent behavior:

prefs = mem_client.search(
    query="User preferences, defaults, and constraints.",
    user_id="user_123",
    filters={"type": ["preference"]},
    limit=10,
)

Task-scoped context

Attach a task_id or workflow_id to memories as the agent executes steps, then filter on it to recall the steps of the current workflow:

task_id = "workflow_456"

steps = mem_client.search(
    query="Steps and results for the current workflow.",
    user_id="user_123",
    filters={"task_id": [task_id]},
    limit=20,
)

Tool output recall

Store and recall expensive tool outputs instead of recomputing:

mem_client.add(
    memory="CRM API returned 142 open tickets for account ACME-001.",
    user_id="user_123",
    metadata={"type": "tool_result", "tool": "crm_api", "account_id": "ACME-001"},
)

crm_memories = mem_client.search(
    query="Recent CRM API results for account ACME-001.",
    user_id="user_123",
    filters={"type": ["tool_result"], "account_id": ["ACME-001"]},
    limit=5,
)

Mem0's consistent metadata structure makes these patterns easy to implement repeatedly across agents and services.

Comparing context query strategies

Fig: Common strategies with Mem0 as a structured memory layer. The figure contrasts raw history, manual vector stores, summarization, external databases, and Mem0 as a hybrid layer, to support decisions about when to introduce Mem0 for context queries.

Different approaches to context queries have very different properties.
The table below compares common strategies with Mem0 as a structured memory layer.

| Strategy | Strengths | Weaknesses | Good for |
|---|---|---|---|
| Raw chat history window | Simple to implement, no extra infra | Breaks on long sessions, noisy, hard to control | Prototypes, short-lived sessions |
| Manual vector store per project | Flexible, full control over schema | Custom glue code, inconsistent across services | Single-team experimental agents |
| Summarized history only | Token-efficient, compresses long timelines | Summaries may omit critical details, hard to query precisely | High-level assistants |
| External relational DB only | Strong structure, joins, analytics | Poor semantic search, hard to map to natural language | Strict business entities |
| Mem0 memory layer (hybrid) | Semantic search with metadata, user-scoped memory, structured lifecycle | Requires explicit memory model design | Production agents with long-term context |

In practice, production setups often combine multiple strategies. Mem0 is built to be the dedicated layer that handles semantic and user-scoped memory while integrating cleanly with existing databases and tools.

Limitations of context queries as a pattern

Context queries solve a real memory gap, but they are not a universal fix.
There are structural limitations to this pattern that engineers should account for.

  1. Over-retrieval noise: Pulling too many memories into the prompt can confuse the model. Relevance scoring is imperfect, and agents can latch onto outdated or marginally relevant details.

  2. Stale or conflicting memories: Users change preferences, data changes, and workflows evolve. Without explicit invalidation or expiry, context queries may surface obsolete facts that lead to wrong actions.

  3. Latent dependencies on memory shape: The way memories are phrased and chunked affects retrieval. Two logically equivalent memories with different wording can have different embeddings and relevance behavior.

  4. Inference-time cost: Every memory query adds latency, and every extra token in a prompt adds cost. Aggressive use of context queries for every turn can hurt responsiveness in production.

  5. Ambiguous responsibility boundaries: If agents rely entirely on implicit context, it becomes harder to reason about behavior. Some constraints belong in business logic or tool implementations, not as soft memory hints.

Mem0 helps address several of these pain points through metadata filters, time-based querying, and structured memory. However, the core limitations come from how LLMs consume context and how agents are designed, so they must be treated as architectural concerns.
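One architectural guard against over-retrieval and prompt cost is to cap what reaches the prompt, whatever the memory layer returns. The sketch below assumes each retrieved memory carries a numeric `score` field and uses word count as a crude token estimate; the function name, threshold, and budget are hypothetical.

```python
from typing import Any, Dict, List


def select_within_budget(
    scored_memories: List[Dict[str, Any]],
    max_tokens: int,
    min_score: float = 0.3,
) -> List[Dict[str, Any]]:
    """Keep the highest-scoring memories that fit a rough token budget."""
    selected: List[Dict[str, Any]] = []
    used = 0
    ranked = sorted(scored_memories, key=lambda m: m["score"], reverse=True)
    for m in ranked:
        if m["score"] < min_score:
            break  # everything after this is even less relevant
        cost = len(m["memory"].split())  # crude proxy for token count
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; try smaller ones
        selected.append(m)
        used += cost
    return selected


candidates = [
    {"memory": "User prefers CSV format for data exports.", "score": 0.82},
    {"memory": "Last data export for this user failed due to a permission error.",
     "score": 0.64},
    {"memory": "User once asked about the weather.", "score": 0.12},
]
tight = select_within_budget(candidates, max_tokens=10)
roomy = select_within_budget(candidates, max_tokens=50)
```

With a budget of 10 only the CSV preference fits; with 50 the export failure is included as well, and the low-scoring weather memory is dropped in both cases.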

How Mem0 fits into production agent stacks

In production, Mem0 typically sits between the agent orchestrator and the rest of the stack:

  • Upstream, it integrates with chat frontends, workflow engines, and tool runtimes.

  • Downstream, it can reference identifiers from relational databases, object stores, and external APIs.

Common integration points include:

  • Storing user and workspace preferences as explicit memories

  • Recording corrections and feedback to adapt agent behavior

  • Persisting tool outputs or intermediate steps for reuse

  • Providing cross-session continuity for the same user across channels

Several best practices emerge for using Mem0 with context queries:

  1. Model memory types explicitly: Use type, topic, and source metadata to differentiate preferences, tasks, tools, and feedback.

  2. Keep memories atomic: Store single facts or small clusters of related information, not full transcripts.

  3. Design queries from the agent's perspective: Start from the information the agent needs to answer, then shape Mem0 queries accordingly.

  4. Control memory growth: Use expiry policies, soft deletion, and periodic cleanup for short-lived task memories.

  5. Monitor retrieval quality: Log Mem0 queries and LLM prompts, and review them for noise, omissions, and inconsistent behavior.

With these patterns, Mem0 turns context queries from an ad hoc retrieval step into a predictable, testable part of the agent architecture.

Frequently Asked Questions

What are context queries in AI agents?

Context queries are explicit retrieval calls that an agent makes to fetch relevant past information before responding. They shape what the model can "remember" from user history, tasks, and tool outputs at inference time.

When should an engineer introduce a memory layer like Mem0?

A dedicated memory layer becomes important when conversations span multiple sessions or when user-specific preferences and task states must persist. If agents are repeating questions or ignoring past corrections, it is time to introduce structured memory.

How does Mem0 differ from a basic vector store for context?

Mem0 stores memories with user-level identity, metadata, and lifecycle semantics, not just raw chunks. It provides first-class APIs for per-user and per-task retrieval, which simplifies agent integration and keeps memory behavior consistent across services.

How many memories should be loaded into context for each request?

Most production agents work well with a small set of highly relevant memories, often between 5 and 30 entries depending on the model and prompt design. It is better to focus on precise filters and relevance than to maximize the number of retrieved items.

Can context queries replace a relational database or source of truth?

No, context queries and memory layers are not a replacement for authoritative data stores. They complement them by capturing interaction-level and preference-level information that is impractical to model in rigid schemas.

How should engineers handle conflicting or outdated memories?

Agents should prefer explicit user statements in the current turn over past memories, and memory storage should include expiry or update logic. Mem0's metadata and timestamps make it easier to filter by recency and to override older entries with newer ones.
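A simple client-side version of "newer overrides older" is to keep only the most recent memory per type and topic before building the prompt. This sketch assumes each memory has an ISO 8601 `updated_at` string and `type` and `topic` metadata keys; the function name and sample data are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def keep_latest_per_topic(memories: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """For each (type, topic) pair, keep only the most recently updated memory."""
    latest: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for m in memories:
        meta = m.get("metadata") or {}
        key = (meta.get("type", ""), meta.get("topic", ""))
        current = latest.get(key)
        # ISO 8601 strings in one timezone compare chronologically as text.
        if current is None or m["updated_at"] > current["updated_at"]:
            latest[key] = m
    return list(latest.values())


retrieved = [
    {"memory": "User prefers CSV format for data exports.",
     "updated_at": "2024-03-01T09:00:00Z",
     "metadata": {"type": "preference", "topic": "export_format"}},
    {"memory": "User prefers Parquet format for data exports.",
     "updated_at": "2024-06-15T09:00:00Z",
     "metadata": {"type": "preference", "topic": "export_format"}},
    {"memory": "User prefers metric units.",
     "updated_at": "2024-01-10T09:00:00Z",
     "metadata": {"type": "preference", "topic": "units"}},
]
resolved = keep_latest_per_topic(retrieved)
```

After resolution the prompt sees the Parquet preference and the units preference, and the superseded CSV preference is left out.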

Further Reading

Mem0 is an intelligent, open-source memory layer designed for LLMs and AI agents to provide long-term, personalized, and context-aware interactions across sessions.

Get your free API key at app.mem0.ai, or self-host Mem0 from the open-source GitHub repository.
