Anthropic Claude Pricing: Subscription Plans and API Costs

Posted In

Miscellaneous

Posted On

February 22, 2026

You're building something on Claude. It's working. Then usage picks up, you check your bill, and the number is twice what you expected. Token costs compound fast - and if you don't understand exactly how Claude charges for input, output, caching, and long-context requests, you'll keep getting surprised.

This guide breaks down every Claude pricing tier: subscription plans for individuals and teams, API token costs for developers, and the specific mechanics behind caching, batch processing, and tool usage. It also covers where those costs come from at the code level - and what you can do to bring them down.

TLDR

  • Claude has two pricing models: flat-rate subscriptions for chat users, and pay-per-token API pricing for developers.

  • Free: Basic access with rate limits (roughly 10-15 messages per session, per community reports).

  • Pro: $20/month ($17 billed annually) - approximately 5x more usage than Free.

  • Max: $100/month (5x Pro usage) or $200/month (20x Pro usage).

  • Team: $25/seat/month standard ($20 annual); $125/seat/month premium ($100 annual).

  • API: Haiku 4.5 at $1/$5 per MTok, Sonnet 4.5/4.6 at $3/$15, Opus 4.5/4.6 at $5/$25. Prompt caching cuts costs by up to 70-90% on repeated context.

How Do Claude's Subscription Plans Compare at a Glance?


| | Free | Pro | Max | Team | Enterprise |
|---|---|---|---|---|---|
| Price | $0 | $17-$20/month | $100-$200/month | $20-$25/seat/month (Standard) or $100-$125/seat/month (Premium) | Custom; contact sales (usage billed at API rates) |
| Usage | Basic | More than Free | 5x or 20x more than Pro | More than Pro | Pooled across org |
| Key features | Chat (web, mobile, desktop), data visualization, web search, Slack/Google Workspace, MCP, extended thinking | Claude Code, Cowork, unlimited projects, Research access, cross-conversation memory, Claude in Excel and Chrome | Higher output limits, early feature access, priority at peak traffic, Claude in PowerPoint | Everything in Max, central billing, admin controls, org-wide search, no model training on your data by default | 500k context window, RBAC, audit logs, SCIM, HIPAA-ready option, compliance API |
| Models | Sonnet and Haiku | Sonnet, Opus, and Haiku | Sonnet, Opus, and Haiku | Sonnet, Opus, and Haiku | Sonnet, Opus, and Haiku |
| Best for | Simple tasks and lightweight use | Individual professionals and power users | Users who hit Pro limits daily | Small teams of 5-75 | Large organizations |

What Does Claude Cost for Individual Users?

Claude's individual pricing is subscription-based, tiered across Free, Pro, and Max. Here's what each level actually gives you.

Claude Free Plan

The Free plan covers basic usage: Sonnet and Haiku access via web, mobile, or desktop, along with image analysis, file creation, code execution, and web search. If you need a lightweight assistant for occasional tasks, this may be all you need.

The plan is rate-limited. Based on community reporting, users typically hit limits after roughly 10-15 messages per session, depending on message length, file sizes, and conversation depth. Anthropic does not publish exact limits, but the in-app usage monitor gives you a live read on where you stand.

Claude Pro Plan

Pro runs $20/month (or $17/month billed annually at $200 upfront). It includes access to Opus, Sonnet, and Haiku, along with Claude Code, Cowork, unlimited projects, Research access, cross-conversation memory, and Claude in Excel and Chrome.

The usage ceiling is meaningfully higher than Free. Community reports put it at around 45 prompts per session before throttling, with a reset window of approximately 5 hours. These figures are user-reported, not confirmed by Anthropic. The in-app usage monitor is the most reliable guide.

Claude Max Plan

Max comes in two tiers. At $100/month, you get 5x the usage of Pro. At $200/month, you get 20x. Both tiers include higher output limits, priority access during peak traffic, early access to new features, and Claude in PowerPoint.

For developers and researchers burning through Pro limits daily, Max is often cheaper than the equivalent API usage - especially at the 20x tier when you're running multiple long-context sessions.

What Does Claude Cost for Teams and Enterprises?

Claude offers three separate pricing plans for teams, enterprises, and educational institutions. Let's look at each.

Team Plan

The Team plan is built for organizations of 5 to 75 users who need centralized management without the overhead of a full enterprise deployment.

The Standard seat tier costs $25/seat/month ($20 billed annually) and includes everything in Max plus central billing, admin controls, org-wide enterprise search, and a default no-training-on-your-data policy. The Premium seat tier runs $125/seat/month ($100 annually) with 5x the usage of the standard tier.

Enterprise Plan

Enterprise is for large-scale deployments. It adds everything in Team plus an enhanced 500k context window, SCIM provisioning, audit logs, compliance API, custom data retention, IP allowlisting, role-based and network-level access controls, and a HIPAA-ready option.

Anthropic does not publish Enterprise pricing. Community reports suggest a minimum of around $60/seat with a 70-user floor - but treat those numbers as anecdotal. Contact the Anthropic sales team for actual figures.

Education Plan

Anthropic offers a discounted plan for universities and educational institutions, covering students, faculty, and staff. Details and access require reaching out to the Anthropic education team.

How Does Claude API Pricing Work?

API billing is token-based. A token is roughly 4 characters or 0.75 words in English. For practical reference:

  • 100 tokens ~ 75 words

  • 1-2 sentences ~ 30 tokens

  • 1 paragraph ~ 150 tokens

  • 1M tokens ~ 750,000 words

Input tokens (your prompt) and output tokens (Claude's response) are billed separately and both count toward Claude's context window. The standard context window is 200k tokens. Opus 4.5/4.6 and Sonnet 4.5/4.6/4 support up to 1M tokens in Team and Enterprise plans.

If you send 50k input tokens, Claude has up to 150k tokens available for output within the 200k standard window.
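As a rough sketch, the rules of thumb above can be wired into a quick estimator. The 4-characters-per-token ratio is approximate - real counts vary by content, and the API's `usage` field on each response gives the exact numbers.

```python
# Rough token estimator based on the ~4 characters/token rule of thumb.
# For billing-accurate counts, read the API response's usage field instead.

def estimate_tokens(text: str) -> int:
    """Approximate token count: English text averages ~4 chars/token."""
    return max(1, round(len(text) / 4))

def output_budget(input_tokens: int, context_window: int = 200_000) -> int:
    """Tokens left for Claude's response within the context window."""
    return max(0, context_window - input_tokens)

prompt = "Summarize the attached contract for key risk clauses."
print(estimate_tokens(prompt))   # rough count only; real tokenization differs
print(output_budget(50_000))     # 150000 -> matches the 50k-in example above
```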

What Is the Model Pricing Breakdown for Opus, Sonnet, and Haiku?

The table below shows input/output pricing for all current Claude models across standard (200k or fewer input tokens) and long-context (more than 200k input tokens) requests. MTok = million tokens.

| Model | Input (≤200k) | Input (>200k) | Output (≤200k) | Output (>200k) |
|---|---|---|---|---|
| Opus 4.5/4.6 | $5/MTok | $10/MTok | $25/MTok | $37.50/MTok |
| Sonnet 4.5/4.6 | $3/MTok | $6/MTok | $15/MTok | $22.50/MTok |
| Haiku 4.5 | $1/MTok | - | $5/MTok | - |

The 200k threshold is based on input tokens only. If your input exceeds 200k, all tokens in that request - input and output - shift to premium rates.

Example: A request using Sonnet 4.6 with 250k input tokens and 5k output tokens:

  • Input: 250k × $6/MTok = $1.50

  • Output: 5k × $22.50/MTok = $0.11

  • Total: $1.61

The same request capped at 200k input tokens:

  • Input: 200k × $3/MTok = $0.60

  • Output: 5k × $15/MTok = $0.08

  • Total: $0.68

Crossing the 200k input threshold more than doubled the total cost despite only a 25% increase in input size. Staying under the threshold where possible is one of the most effective cost controls available.
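The worked example above can be reduced to a small calculator. This sketch hard-codes the Sonnet 4.5/4.6 rates from the table; the long-context switch keys off input tokens only, which is what makes the jump so sharp.

```python
def sonnet_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost for one Sonnet 4.5/4.6 request.
    Exceeding 200k input tokens moves the entire request to premium rates."""
    long_context = input_tokens > 200_000
    in_rate = 6.00 if long_context else 3.00      # $/MTok
    out_rate = 22.50 if long_context else 15.00   # $/MTok
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

print(sonnet_cost(250_000, 5_000))  # 1.6125 -> the $1.61 example
print(sonnet_cost(200_000, 5_000))  # 0.675  -> the $0.68 example
```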

How Does Prompt Caching Reduce API Costs?

Prompt caching lets Claude reuse a stored prefix from prior requests instead of reprocessing it from scratch. This cuts both processing time and cost on repetitive tasks - repeated system prompts, multi-turn conversations, and document analysis pipelines all benefit significantly.

By default, cached content has a 5-minute TTL (time-to-live), refreshed for free each time the cached content is used. A 1-hour cache option is available for content accessed in longer intervals.
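In the Messages API, caching is opt-in: you attach a `cache_control` marker to the stable prefix you want reused. Below is a sketch of a request body based on Anthropic's documented block format - the model ID and the `ttl` field for the 1-hour option should be verified against current docs before use.

```python
# Sketch of a Messages API request body with a cached system prompt.
# The large, stable prefix gets the cache_control marker; the short
# per-query user message stays uncached.

large_document = "FULL CONTRACT TEXT GOES HERE..."  # placeholder, ~150k tokens

request_body = {
    "model": "claude-sonnet-4-5",   # illustrative model ID
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a contract-analysis assistant."},
        {
            "type": "text",
            "text": large_document,
            # 1-hour TTL; omit "ttl" for the default 5-minute cache
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        },
    ],
    "messages": [
        {"role": "user", "content": "List all termination clauses."}
    ],
}
```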

| Model | 5m Cache Write (1.25x input) | 1h Cache Write (2x input) | Cache Read (0.1x input) |
|---|---|---|---|
| Opus 4.5/4.6 | $6.25/MTok | $10/MTok | $0.50/MTok |
| Sonnet 4.5/4.6 | $3.75/MTok | $6/MTok | $0.30/MTok |
| Haiku 4.5 | $1.25/MTok | $2/MTok | $0.10/MTok |

Scenario: Legal document analysis. A law firm analyzes a 150k-token contract with 10 queries (2k tokens each) over 2 hours using Sonnet 4.6 with 1-hour caching. The first request costs $0.91 (150k × $6/MTok cache write + 2k × $3/MTok query). Each following request costs $0.05 (150k × $0.30/MTok cache read + 2k input). Total across 10 queries: $1.37, versus $4.56 without caching - a 70% reduction. The 1-hour TTL was the right choice here because a 30-minute gap between queries would have expired a 5-minute cache and forced a full rewrite.
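The scenario's arithmetic checks out as a quick sanity calculation. This sketch uses the Sonnet rates from the table above and, like the scenario, ignores output tokens to keep the comparison focused on context loading.

```python
def cached_analysis_cost(doc_tokens: int, queries: int, query_tokens: int) -> float:
    """Total USD: one 1h cache write, then cached reads (Sonnet 4.5/4.6 rates)."""
    write = doc_tokens * 6.00 / 1e6    # 1h cache write: 2x the $3 input rate
    read = doc_tokens * 0.30 / 1e6     # cache read: 0.1x the $3 input rate
    query = query_tokens * 3.00 / 1e6  # fresh input tokens on every query
    return (write + query) + (queries - 1) * (read + query)

def uncached_cost(doc_tokens: int, queries: int, query_tokens: int) -> float:
    """Same workload, resending the full document at $3/MTok every time."""
    return queries * (doc_tokens + query_tokens) * 3.00 / 1e6

cached = cached_analysis_cost(150_000, 10, 2_000)
plain = uncached_cost(150_000, 10, 2_000)
print(f"${cached:.2f} vs ${plain:.2f}")  # ≈ $1.37 vs $4.56, ~70% saved
```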

What Does Batch Processing Cost?

Batch processing handles large volumes of requests asynchronously through the Message Batches API. Instead of submitting requests one at a time, you submit them in bulk and receive responses when the full batch is complete. This suits content processing, data extraction, and classification tasks well.

Batch API pricing is 50% of standard API rates. For maximum savings, batch processing can be combined with prompt caching.
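A batch submission is just a list of independent requests, each tagged with your own ID for matching results later. The sketch below follows the shape of Anthropic's documented batch format; the field names and model ID should be checked against the current Message Batches API reference.

```python
# Sketch of a Message Batches payload: many requests submitted in bulk,
# billed at 50% of standard rates, with results returned asynchronously.

articles = ["First article text...", "Second article text..."]

batch_requests = [
    {
        "custom_id": f"summary-{i}",       # your key for matching results
        "params": {
            "model": "claude-haiku-4-5",   # illustrative model ID
            "max_tokens": 300,
            "messages": [
                {"role": "user", "content": f"Summarize:\n\n{text}"}
            ],
        },
    }
    for i, text in enumerate(articles)
]
```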


What Do Claude's Tools and Extras Cost?

Some tools carry additional costs on top of base API rates:

  • Fast mode for Claude Opus 4.6 delivers faster output at 6x standard rates.

  • Client-side tools add tokens automatically: bash (+245 tokens), text editor (+700 tokens), computer use (+735 tokens plus 466-499 system prompt tokens). All billed at standard base rates.

  • Web Fetch (server-side) has no additional cost. You pay standard rates for the fetched content.

  • Web search costs $10 per 1,000 searches, plus standard token costs for search-generated content.

  • Code execution includes 1,550 free hours per month. Beyond that, it's $0.05/hour per container with a 5-minute minimum billing window. Pre-loading files triggers billing even if the tool is never called.
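To see how these line items add up, here is a rough monthly estimate under assumed usage. The volumes below are illustrative, and the sketch ignores the 5-minute minimum billing window on containers.

```python
def tools_overhead(searches: int, container_hours: float) -> float:
    """Monthly extras in USD: web search at $10 per 1,000 searches, plus
    code execution at $0.05/hour per container beyond the 1,550 free hours."""
    search_cost = searches / 1_000 * 10.00
    billable_hours = max(0.0, container_hours - 1_550)
    return search_cost + billable_hours * 0.05

# Hypothetical month: 3,000 searches and 2,000 container-hours.
print(tools_overhead(3_000, 2_000))  # $30 search + $22.50 execution = 52.5
```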

Why Do Token Costs Compound at the Code Level?

Even with a solid understanding of the pricing tables, costs can spiral in ways that aren't immediately obvious. The problem is usually context. Every session loads system prompts, prior conversation state, and any persistent instructions - and all of that counts as input tokens before Claude generates a single word of output.

For developers building on Claude Code, this compounds further. Claude Code's auto-memory feature records learnings and patterns during task execution, and its Claude.md files accumulate instructions across conversations. Both are loaded at session startup. On large projects, these files grow large, and a significant portion of every session's token budget gets consumed before the actual work begins. As those files grow, so does your bill - silently, and on every session.

This is the core failure mode of stateless AI agents: without intelligent memory management, agents load everything they've ever known instead of retrieving what's actually relevant. The longer a project runs, the worse the overhead becomes.

How Does Mem0 Reduce Claude Token Usage?

Mem0 is a memory layer for AI applications that replaces full-context loading with targeted retrieval. Rather than storing full conversation transcripts and reloading them entirely each session, Mem0 extracts high-signal facts and stores them in a structured memory store. At query time, Mem0 retrieves only the memories relevant to that specific query - not everything, just what matters.

The result is that each session starts with a much smaller, more relevant context. Per Mem0's research paper, this approach reduces token usage by up to 90% compared to full-context retrieval methods - not by discarding information, but by being precise about what gets loaded and when.

For Claude Code specifically, Mem0 replaces the growing Claude.md and auto-memory files with a persistent, queryable memory store. You can set up persistent memory for Claude Code in about five minutes.

The pattern generalizes. Whether you're building a context-aware chatbot, a multi-turn agentic RAG system, or navigating the tradeoffs between short- and long-term memory across agent sessions, the same principle applies: load less, retrieve smarter, spend less.
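The load-less, retrieve-smarter principle fits in a few lines. This is a toy keyword-scored memory store - not Mem0's actual API - just to show why retrieving relevant facts beats reloading the full history every session.

```python
def retrieve(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Return up to k stored facts that share words with the query."""
    q = set(query.lower().split())
    relevant = [m for m in memories if q & set(m.lower().split())]
    relevant.sort(key=lambda m: len(q & set(m.lower().split())), reverse=True)
    return relevant[:k]

memories = [
    "User prefers TypeScript over JavaScript",
    "User's project uses PostgreSQL 16",
    "User is based in Berlin",
    "User wants concise code review comments",
]

# Only the relevant fact is loaded into context, not all four memories.
context = retrieve(memories, "write a database migration for the project")
print(context)  # ["User's project uses PostgreSQL 16"]
```

A real memory layer uses embeddings and structured extraction rather than word overlap, but the cost mechanics are the same: input tokens scale with what you retrieve, not with everything you have ever stored.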

What Does Mem0 Look Like in Production?

Three case studies show how this plays out at different scales.

OpenNote reduced token costs by 40% by replacing full conversation context with Mem0's selective retrieval. Users got more personalized responses. The platform spent less per query.

RevisionDojo saw a similar 40% token reduction, with the added benefit that the AI tutor retained user-specific learning patterns across sessions without reloading full history every time.

Sunflower scaled to 80,000 users on a recovery support platform where personalization was non-negotiable. Mem0 made per-user memory practical at that volume without the cost structure blowing out.

For teams evaluating memory solutions, Mem0's benchmark against OpenAI Memory, LangMem, and MemGPT on the LOCOMO dataset is the most rigorous head-to-head available.

How Does Claude Pricing Compare to ChatGPT and Gemini?

| Category | Claude | ChatGPT | Google Gemini |
|---|---|---|---|
| Basic subscription | Pro: $20/month | Plus: $20/month | Pro: $20/month |
| Premium subscription | Max 5x: $100/month; Max 20x: $200/month | Pro: $200/month | Ultra: ~$42/month ($125/3 months) |
| Mid-tier API | Sonnet 4.5/4.6: $3/$15 | GPT-5.2: $1.75/$14.00 | Gemini Flash: $0.50/$3.00 |
| Flagship API | Opus 4.6: $5/$25 | GPT-5.2 Pro: $21/$168 | Gemini Pro: $2/$12 |

Claude's flagship Opus 4.6 is substantially cheaper than OpenAI's flagship on the API. Sonnet is competitive in the mid-tier. Haiku undercuts most budget models. The 200k context pricing premium is specific to Claude - neither OpenAI nor Gemini structures long-context pricing the same way, so factor that in when modeling costs for long-document workloads.

How Do You Choose the Right Claude Plan?

For most individuals, the decision comes down to usage volume. The Free plan works for casual use. Pro handles most professional workloads. If you're hitting Pro's limits daily, Max 5x ($100/month) is the next step - and at heavy usage, Max 20x ($200/month) is often cheaper than the equivalent API spend.

For teams, the Standard Team seat ($25/month) is the entry point. The Premium seat tier ($125/month) makes sense when your team runs workloads that would otherwise require individual Max subscriptions.

For developers using the API: at moderate usage, Sonnet 4.5 or 4.6 at $3/$15 per MTok is the most cost-effective entry point for serious work. Combine it with prompt caching and batch processing, and the effective per-token cost drops substantially. For teams consistently processing millions of tokens per day, the Max 20x plan at $200/month frequently undercuts the API equivalent - run the math against your specific usage pattern before defaulting to API access.


Conclusion

Claude's pricing structure is straightforward in outline but has meaningful complexity in the details - particularly the 200k context threshold, caching mechanics, and the way session overhead accumulates at the code level. Subscription tiers run from Free to Max at $200/month for individuals, with Team and Enterprise plans for organizations. API pricing has dropped substantially with each model generation: Opus went from $15/$75 to $5/$25 per million tokens. Sonnet sits at $3/$15. Haiku at $1/$5.

The most overlooked cost driver is context loading. Every session that loads a full conversation history or a large instructions file spends tokens before doing any real work. Managing what gets loaded - through prompt caching, batch processing, and tools like Mem0 - is where meaningful cost reduction actually happens.

FAQs

What is the cheapest way to use Claude?

The Free plan costs nothing and covers basic chat access via web, mobile, and desktop. It includes Sonnet and Haiku models, web search, and file creation. The tradeoff is rate limits — most users hit them after roughly 10–15 messages per session based on community reports.

Is Claude Pro worth it at $20 a month?

For regular professional use, yes. Pro gives you access to all three model tiers (Haiku, Sonnet, and Opus), Claude Code, Research access, cross-conversation memory, and roughly 5x more usage than the Free plan. The annual billing option brings it down to $17/month.

What happens when you exceed the 200k token limit on the API?

All tokens in that request shift to premium pricing — both input and output. A request with 250k input tokens on Sonnet 4.6 costs $1.61 total versus $0.68 if you stay under 200k. Staying below the threshold where possible is one of the most effective cost controls available.

How much does Claude's API cost per month for heavy users?

It depends heavily on model choice and context length. At Sonnet 4.6 rates ($3/$15 per MTok), a developer processing 5 million input tokens and 1 million output tokens daily would spend roughly $900/month without caching ($450 on input and $450 on output over 30 days). With prompt caching on repeated context, that figure can drop by 70–90%. At that volume, the Max 20x plan at $200/month is often cheaper.

What is prompt caching and how much does it save?

Prompt caching stores a reusable prefix from your prompt so Claude does not reprocess it on every request. Cache reads cost 0.1x the base input rate — on Sonnet 4.6, that is $0.30/MTok versus $3/MTok for standard input. In real workloads like repeated document analysis, this cuts total costs by around 70%.

Does Anthropic train on my data?

On the Team plan and above, Anthropic does not train on your data by default. On the Free and Pro plans, Anthropic's standard data usage policies apply. Enterprise customers can negotiate custom data retention terms.

What is the difference between Max 5x and Max 20x?

Both Max tiers include the same features: higher output limits, priority access at peak traffic, early access to new features, and Claude in PowerPoint. The only difference is usage volume. Max 5x ($100/month) gives you five times the usage of Pro. Max 20x ($200/month) gives you twenty times. For users hitting Pro limits daily, Max 20x frequently costs less than the equivalent API spend.

Can I reduce token usage when building with Claude Code?

Yes. Claude Code's auto-memory and Claude.md files load at every session startup and grow over time, burning tokens before any real work begins. Tools like Mem0 replace full-context loading with targeted memory retrieval, reducing token usage by up to 90% compared to naive full-context methods according to Mem0's research.


© 2026 Mem0. All rights reserved.
