Stop running out of tokens in Claude

Why your Claude keeps slowing down

Quick story so the rest of this page makes sense. Every time you send Claude a message, it doesn't just read your new message — it re-reads the entire conversation from the beginning. Every previous question, every previous answer, every file you pasted in.

So a fresh chat with one short question = cheap. The same short question 30 messages deep into a long chat = wildly more expensive, because Claude is re-processing everything that came before it. Some people who have measured their own usage report that more than 90% of the "tokens" they burn go not to their actual question but to Claude re-reading old history that nobody needs anymore.
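If you like seeing the arithmetic, here is a rough sketch of why the re-reading adds up so fast. It is a toy model, not a measurement: the 500 and 50 token sizes are made-up assumptions, and it only counts what Claude reads, not what it writes.

```python
# Toy model: every number here is a made-up assumption, not a measurement.
TOKENS_PER_TURN = 500    # assumed size of one question plus its answer
QUESTION_TOKENS = 50     # assumed size of the new question on its own


def history_share(turns: int) -> float:
    """Fraction of all input tokens spent re-reading earlier turns."""
    # On turn k, Claude re-reads the k-1 turns that came before it.
    history = sum((k - 1) * TOKENS_PER_TURN for k in range(1, turns + 1))
    new_questions = turns * QUESTION_TOKENS
    return history / (history + new_questions)


for turns in (1, 5, 30):
    print(f"{turns:>2} messages: {history_share(turns):.0%} of input is old history")
```

Under these assumptions a one-message chat spends nothing on history, and by message 30 nearly all of the input is old conversation. Your real numbers will differ, but the shape of the curve is the point.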

The "tokens" thing in plain English

A token is just a chunk of text — roughly ¾ of a word. Claude's "context window" is the maximum amount of text it can hold in a single conversation. When you hit your usage limit or get the "Claude is slowing down for you" message, it's almost always because your conversation has gotten too long, not because Claude is broken.
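For a feel of the ¾-of-a-word rule, here is a minimal sketch that guesses a token count from a word count. The function name is hypothetical and the result is only a ballpark: Claude's real tokenizer splits text differently, especially for code, numbers, and non-English text.

```python
WORDS_PER_TOKEN = 0.75  # the "roughly 3/4 of a word" rule of thumb from above


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a word count. Real tokenizers will differ."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


print(estimate_tokens("Please summarise this meeting transcript for me"))
```

Flip it around and a 3,000-word document comes out at roughly 4,000 tokens every time Claude has to read it.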

Once you understand that, the playbook becomes obvious: don't let your chats get long. That's the whole game. Everything below is just five different ways to stop that from happening.

The Handoff (the single biggest move)

This is the move that changes everything. Once you've done it a couple of times, you'll wonder why nobody told you sooner.

The pattern: Every 15-20 messages — or any time a chat starts to feel slow — you ask Claude to summarise what you've worked on so far. Then you start a brand new chat and paste that summary in as the very first message.

That's it. That's the whole tactic. You've kept the bits that matter (decisions made, files created, context) and ditched the bits that don't (every back-and-forth, every "wait actually" course-correct, every tangent). New chat, fresh tokens, way faster Claude.

The exact prompt — copy this

Paste this at the end of your old chat. Claude will give you a tidy handoff doc. Then start a new chat and paste the doc in as message #1.

I want to start a fresh chat to save tokens. Please write me a handoff document I can paste into a new conversation, so the next chat picks up exactly where this one leaves off. Include:

1. The goal — what we're actually trying to do
2. What's already been decided (and any open questions)
3. Any files, links, or specific details the next chat will need
4. The very next step

Write it as if you're handing off to a teammate who has no context. Skip the chit-chat. No preamble.

Why this works (the 60-second version)

Long chat = Claude re-reads everything every turn = expensive and slow. Short chat with a tight summary = Claude only carries the bits that matter = cheap and fast. You're not losing your work — you're compressing it. The summary becomes your save file.
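To put rough numbers on that, here is a small sketch comparing one 40-message chat against two 20-message chats joined by a handoff summary. The turn size and summary size are assumptions picked for illustration, and the sketch ignores the one extra reply it takes to write the summary.

```python
TOKENS_PER_TURN = 500   # assumed size of one question plus its answer
SUMMARY_TOKENS = 300    # assumed size of the pasted handoff summary


def total_input_tokens(turns: int, starting_context: int = 0) -> int:
    """Input tokens Claude reads across a whole chat of `turns` turns."""
    # Turn k carries the starting context plus every turn so far.
    return sum(starting_context + k * TOKENS_PER_TURN for k in range(1, turns + 1))


one_long_chat = total_input_tokens(40)
two_short_chats = total_input_tokens(20) + total_input_tokens(20, SUMMARY_TOKENS)

print(one_long_chat, two_short_chats)
```

With these made-up sizes the long chat reads 410,000 tokens and the two short chats read 216,000 between them, so one handoff at the halfway mark cuts the reading roughly in half.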

When to do it

Every 15-20 messages — set yourself a mental timer. The longer you wait, the more painful the chat gets.
The moment Claude starts repeating itself or feels sluggish — that's the signal it's working too hard to remember.
Before any big task switch — finishing one piece of work and starting another? Handoff first, every time.
At the end of a working session — so tomorrow's chat starts strong instead of wading through yesterday's mess.

If you're using Claude Code

Same idea, less manual work. Claude Code has a built-in /compact command that does the handoff for you in the same chat: it summarises the older parts of the conversation into a tight version and frees up space. Type /compact, hit enter, keep going. (More on this in the quick wins below.)

The 4 quick wins (10 seconds each)

Once the Handoff is in your toolkit, these four are the rest of the kit. None of them require any technical knowledge — they're just habits you build over the next week.

For Claude Code

Use `/compact` instead of starting over

In Claude Code, /compact shrinks your conversation down to a summary so you can keep going in the same chat. Use it whenever your session feels heavy. Use /clear when you're switching to a totally unrelated task — it wipes the chat and starts fresh.

For Claude.ai

Start a new chat — sooner than feels natural

Most people stay in one chat way too long out of habit. New topic? New chat. Done with the current job? New chat. The "Claude is slowing down" message is almost always your fault for not starting one sooner. Build the muscle.

For Claude.ai

Put your reference docs in a Project

If you keep pasting the same brand voice doc, ICP (ideal customer profile) doc, or SOP into every chat, stop. Upload them once into a Claude Project. Then every chat inside that Project has them already, and they're way cheaper to reuse than re-pasting them every time.

Both

Use Sonnet for most things — not Opus

Opus is the "smartest" model but it's also the most expensive and the first one to hit limits. For 80% of tasks (writing, summarising, brainstorming), Sonnet is genuinely fine. Save Opus for the gnarly thinking — strategy, debugging weird logic, multi-step plans.

The rule of thumb

Default to Sonnet. Switch to Opus when Sonnet gets it wrong. You'll be amazed how rarely that happens. You can switch the model from the dropdown in Claude.ai or with /model in Claude Code.

The counterintuitive one nobody talks about

Here's the thing nobody mentions: token for token, Claude's replies cost about 5x more than your messages. Input tokens (your stuff) are cheap. Output tokens (Claude's stuff) are the expensive ones. So the single fastest way to save tokens isn't even about your prompts. It's about telling Claude to shut up faster.

When you ask a question, Claude's default is to be polite, restate your question, walk you through its reasoning, summarise its answer, and offer follow-up suggestions. That's a lot of words you didn't ask for. Each one costs you.

The fix is one line at the top of any prompt:

Skip the preamble. Don't restate my question. Give me the answer only.

You can also set this once-and-forget by going to Claude's settings and using a custom style or profile preference that says "Be concise. Skip pleasantries. No preambles or summaries unless I ask." Claude follows it across every new chat without you typing it again.

Real talk on what this saves

People who measure this report 30-60% fewer output tokens per response, just from cutting the preamble and self-summary. That's not nothing. At the high end, in chats where Claude's replies are most of the cost, that's potentially doubling how much you can do in a 5-hour window. It's the highest-leverage change for the lowest effort on this whole page.
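Here is a rough sketch of what those two numbers do together for a single short exchange. It treats the "about 5x" ratio as exact and uses made-up message sizes, so read it as an illustration rather than a bill.

```python
OUTPUT_MULTIPLIER = 5  # the "about 5x" ratio from the text, treated as exact


def relative_cost(input_tokens: int, output_tokens: int) -> int:
    """Cost of one exchange in units of one input token."""
    return input_tokens + OUTPUT_MULTIPLIER * output_tokens


def trimmed(output_tokens: int, percent_saved: int) -> int:
    """Reply length after cutting `percent_saved` percent of it."""
    return output_tokens * (100 - percent_saved) // 100


question, reply = 100, 400  # made-up sizes for a short question and a chatty reply
for saved in (0, 30, 60):
    print(saved, relative_cost(question, trimmed(reply, saved)))
```

In this example the exchange costs 2,100 units untrimmed, 1,500 with a 30% shorter reply, and 900 with a 60% shorter one. In a long chat the re-read history adds to the input side, so the real saving will be smaller than this best case.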

What you can safely ignore

If you read three Reddit threads on saving tokens, you'll see a hundred "tips" that don't actually matter for non-technical users. Here's the list of stuff to not waste your time on:

Prompt caching. You'll see this everywhere. It's a developer/API thing. Anthropic does it automatically when it makes sense, so you don't have to set it up. If you're using Claude.ai or Claude Code as a normal human, ignore it completely.
The exact token counts of every prompt. Nobody's actually tracking their tokens to four decimal places. The five habits on this page do 95% of the work — chasing the last 5% is a rabbit hole.
Special "token-saving" prompt formats. You don't need to write everything in markdown tables or learn arcane prompt syntax. Just write clearly. Long rambly prompts are wasteful, but a normal-length English question is totally fine.
"Tricking" Claude into being faster. Some posts swear that asking nicely or using emoji or saying "please be brief" works — and to be fair, "be brief" does help. But anything more elaborate than that is folklore.
Worrying about every image you upload. Yes, image attachments cost more tokens than text. No, you don't need to avoid them entirely. Just don't paste 12 screenshots into one chat when one would do. Crop tighter when you can.

The whole playbook in one sentence

Use the Handoff every 15-20 messages, default to Sonnet, tell Claude to skip the preamble, and put your reference docs in a Project. That's it. That's the playbook. You can stop here and you'll already be using Claude better than 90% of people on it.

Want community + support along the way?

If you want to be in a room with other women figuring this out — asking questions, sharing wins, getting unstuck together — come into the Wright Mode Membership. We share workflows, prompts, and skills like the ones in this playbook every week.

Claude Code Masterclass

The complete walkthrough for non-coders. Install, set up, and start building with Claude Code, from zero.

Follow me on IG

@wright_mode is where I drop the new prompts, skills, and workflows the moment I find them.
