
AIHOY AInauten,

Welcome to the latest edition of your favorite newsletter.

AI feels almost limitless right now. More models, more agents, more tools, more automations. At the same time, new limits are appearing: your context window is full, your subscription is throttled, your favorite model is unavailable, or access is suddenly being negotiated politically.

This issue is about that new scarcity. We show you how to use your token budget better, why Claude in the team chat changes more than Slack convenience, and why GPT-5.6 Sol could be a preview of how frontier models may be distributed from now on. Sounds good?

Here is what we have for you today:

  • 🐳 7 Token-Maxxing Rules: How to Get More Out of Your Subscription

  • 🤖 Claude @: Why the Next Agent Sits in the Team Chat

  • 🥲 GPT-5.6 Is Here. The Question Is When You Get to Use It.

Let’s go.

Keep up with AI in 5 minutes a day

MavSource aggregates updates from all major AI newsletters, podcasts, companies, AI labs, and hundreds of other sources, then summarizes the key updates and analyzes the trends shaping AI. Free. Daily.

🐳 7 Token-Maxxing Rules: How to Get More Out of Your Subscription

If your AI subscription hits the limit too early, it is rarely because of one big prompt. Most of the time, your setup eats the context window one small bite at a time.

In practice, we see two groups fighting the same problem:

  • Power users: if you pay for a large subscription, you want to actually use the token budget you paid for.

  • Occasional users: if you are on a free plan or a smaller subscription, you do not want to hit the limit exactly when the AI finally understands what needs to happen.

Your AI limit is rarely consumed by one single chat. It is slowly drained by old threads, huge uploads, unclear tasks, active tools, correction loops, and premium models doing tiny chores.

We pulled together the 7 most important rules so your subscription lasts longer and you get more value from it.

Note: Desktop apps such as Claude Desktop and Codex benefit most from these tips because they can work with local files, folders, and tools. This is the Files-over-Tools principle from rule 4.

1. Separate Planning and Execution

The mistake is not always the wrong model. Often it is using one model for everything.

If you have a lot of material and first need orientation, a cheaper model is often enough for scouting. It finds levers, sorts options, and evaluates effort.

If the decision itself is hard, flip the setup: let a strong model plan. A cheaper model can then execute the clearly described individual steps.

Short planning prompt:

Separate this task into planning and execution.
Tell me which layer needs a strong model.
Suggest one cheaper and one stronger model tier.
Do not execute until I approve the plan.

Model names change depending on the interface. As a rough guide:

  • Instant, Medium, or Sonnet-level models are often enough for scouting, structure, drafts, and routine work.

  • High, Extra-High, Opus, or Ultracode-level models belong at points where a wrong decision has consequences.

  • Pro Extended or Max modes are not the default gear. Use them when the input is difficult, the error would be expensive, or the result directly creates value.

Model selection in ChatGPT

Model selection in Codex

Model selection in Claude

Rule of thumb: think with the expensive model, execute with the cheaper one. Or scout cheaply, solve expensively.

The main thing: do not blast everything through the same model tier.
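
The rule of thumb can be written down as a tiny routing function. This is only a sketch: the tier labels, the Step fields, and the pick_tier helper are hypothetical names for illustration, and real model names differ per interface.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever your interface offers.
CHEAP, STRONG = "cheap-tier", "strong-tier"

@dataclass
class Step:
    name: str
    kind: str                  # "scout", "plan", or "execute"
    high_stakes: bool = False  # would a wrong result be expensive?

def pick_tier(step: Step) -> str:
    """Plan and high-stakes work go to the strong tier, everything else stays cheap."""
    if step.kind == "plan" or step.high_stakes:
        return STRONG
    return CHEAP

steps = [
    Step("skim 40 source documents", "scout"),
    Step("decide the migration strategy", "plan"),
    Step("rename variables per the plan", "execute"),
    Step("rewrite the billing logic", "execute", high_stakes=True),
]
for step in steps:
    print(f"{step.name}: {pick_tier(step)}")
```

The point is less the code than the habit: decide the tier per step, not per project.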

2. Give a Clear Goal, Not “Just Make It Better”

Weak assignments cost tokens, time, and nerves because the model first has to guess what matters. We admit it: when we reviewed our ChatGPT usage, “make this better” turned out to be one of our favorite prompts last year. But we learned from it.

So include these four lines and briefly force your own brain to formulate them properly. If you need help, ask the agent to give you suitable answer options for each field.

Goal: [What should get better?]
Material: [Which files or links count?]
Output: [What should exist at the end?]
Boundary: [What should not be touched?]
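
If you build prompts in a script, the same four lines can be enforced before anything is sent. A minimal sketch, assuming a hypothetical build_brief helper that simply refuses to produce a brief while a field is still empty:

```python
FIELDS = ("Goal", "Material", "Output", "Boundary")

def build_brief(**answers: str) -> str:
    """Return the four-line brief, or raise if any field is missing or blank."""
    missing = [f for f in FIELDS if not answers.get(f.lower(), "").strip()]
    if missing:
        raise ValueError(f"Fill in before sending: {', '.join(missing)}")
    return "\n".join(f"{f}: {answers[f.lower()].strip()}" for f in FIELDS)

print(build_brief(
    goal="Shorten the onboarding email to under 150 words",
    material="onboarding_v3.md only",
    output="One revised email plus a list of what was cut",
    boundary="Do not change the pricing paragraph",
))
```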

3. Smaller Context Beats Larger Context

Tokens are the model’s currency. Your question costs tokens. The answer costs tokens. Older messages, files, tools, and outputs cost tokens too.

In some setups, the entire chat history is sent along with every interaction, so every earlier message keeps costing tokens.

So ask yourself: what does this chat need before it can work? If the answer is “everything,” it will become slow, expensive, and less precise.

Better: provide only the relevant context, not your entire archive.
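
To see why long chats get expensive, here is a back-of-the-envelope calculation. It assumes the simplest case, where the full history is re-sent with every request and nothing is cached or trimmed; real products may handle this differently. The turn size of 500 tokens is an illustrative assumption.

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens if every request re-sends all earlier turns."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new turn joins the history
        total += history  # the whole history is sent again
    return total

turns = [500] * 20  # assumed: 20 turns of roughly 500 tokens each

print(cumulative_input_tokens(turns))            # one long chat: 105000
print(cumulative_input_tokens(turns[:10]) * 2)   # two fresh 10-turn chats: 55000
```

Under these assumptions the cost grows roughly with the square of the chat length, which is why splitting one long chat into two shorter ones nearly halves the total.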

4. Turn Files Into the Workspace

Storage is cheap. Context is expensive.

A file on your drive costs no tokens as long as the model does not read it. It becomes expensive only when you paste it into the chat, upload it, or actively ask the model to process it, because from then on it occupies the context window.

So: Files over Tools.

Pull your knowledge out of chats and save it locally in Markdown files. You can use the prompt from tip 5 for that.

Desktop and CLI agents such as OpenAI Codex or the Claude Desktop app can work very well with local Markdown files.
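
A small local script can help decide which Markdown files are worth handing to the model at all. The sketch below is an illustration, not a feature of any of the apps mentioned: pick_files and estimate_tokens are hypothetical helpers, and the four-characters-per-token figure is a rough assumption for English text, not a real tokenizer.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough assumption for English text, not a real tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN

def pick_files(folder: Path, keyword: str, budget: int) -> list[Path]:
    """Select Markdown files that mention the keyword until the token budget is spent."""
    chosen, used = [], 0
    for path in sorted(folder.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        cost = estimate_tokens(text)
        if keyword.lower() in text.lower() and used + cost <= budget:
            chosen.append(path)
            used += cost
    return chosen
```

Calling pick_files(Path("notes"), "pricing", budget=8000) would return only the notes that mention pricing and fit the budget. Reading the files locally like this costs no model tokens; only what you pass on does.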

5. Start Fresh Earlier or Summarize Your Chat Cleanly

Long chats feel convenient, but at some point they get sluggish.

When one chat contains several topics, old decisions, and correction loops, context usage goes up. At the same time, the model often answers more vaguely.

So make a clean handoff and start a new chat.

We put together this prompt to summarize any chat so you have the relevant context for a fresh start.

That saves tokens and prevents outdated knowledge from slipping in.

6. Load Tools Only When You Need Them

Connectors, MCPs, plugins, and browser tools are powerful. But they cost tokens in the context window.

Put differently: if you only want to write a text, you do not need a whole toolbox with 27 other tools active.

Check the base usage. In Claude Code, /context shows what an empty chat already consumes. In Codex, /status shows how much context is already occupied.

In ChatGPT on the web and Claude Cowork, you can manually check which tools, plugins, and connectors are active.

The rule stays the same: anything active should have a job. Disable what you do not need.
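
To get a feeling for the overhead, you can add up what idle tools occupy. All numbers below are hypothetical placeholders, including the per-tool sizes and the 200,000-token window; take the real values from your own client.

```python
def tool_overhead(tools: dict[str, int], needed: set[str], window: int):
    """Return (tokens spent on idle tools, share of the context window they occupy)."""
    idle = sum(cost for name, cost in tools.items() if name not in needed)
    return idle, idle / window

# Hypothetical per-tool definition sizes in tokens.
active = {"browser": 3000, "calendar": 1500, "crm": 4500, "files": 1000}

idle, share = tool_overhead(active, needed={"files"}, window=200_000)
print(idle, f"{share:.1%}")  # 9000 4.5%
```

A few percent sounds small, but it is paid in every chat before you have typed a single word.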

Advanced users can turn some MCPs into CLIs. Then the tool stays outside the chat and is called only when needed.

Review my active tools, MCPs, plugins, and connectors.
Sort them into: constantly needed, rarely needed, currently unnecessary.
Recommend what I should disable or move into a CLI
so more context remains for the actual task.

For that, look at MCPorter and mcp2cli. We explained the background in our AInauten Knowledge Base: CLI or MCP: when do you use what?

7. AI Is a Slot Machine. Use That.

AI is sometimes a slot machine: one run lands, the next one goes completely sideways.

Use that deliberately. If the model is clearly moving in the wrong direction, do not feed the chat with five repair rounds. That is more context you drag along, and worse, it is flawed context.

Edit a prompt in ChatGPT

Edit a prompt in Codex

Stop early, edit the original prompt, or start a new chat with a clean briefing. That helps you avoid context contamination: wrong assumptions, half-corrections, and failed attempts otherwise remain in the context and pull the next run off track.

You can also try different paths in parallel by using branch or fork features.

Create a branch in ChatGPT

Fork a chat in Claude Desktop

The Best Subscription Hack Is Better Direction

If you constantly hit your limit, memorize these tips. They are your checklist.

If you never hit your limit, they are your quality lever. Give it a try!

Six people doing the work. Your headcount is one.

Your finance close runs in #finance. Stripe and QuickBooks reconciled, runway updated, posted Sunday night without you asking.

Engineering review lands in #eng. Viktor pulled the open PRs, left comments on auth-refactor, flagged a dependency blocking api-pagination.

Campaign brief lands in #growth: Meta CPA up 18%, recommendation to pause broad match, a draft landing page already deployed for the variant test.

You hired him on day zero. He lives in Slack and Microsoft Teams alongside your contractors and investors, connects to 3,000+ tools, pushes back when you ship something dumb.

"Viktor is now an integral team member, and after weeks of use we still feel we haven't uncovered the full potential." Patrick, Director, Yarra Web.

🤖 Claude @: Why the Next Agent Sits in the Team Chat

The new Claude Tag feature looks at first like another Slack integration. It is in beta for Claude Team and Enterprise. A team owner or admin sets up Claude for Slack, and after that the agent can be mentioned in approved channels.

But it is more than a simple bot in chat, because something changes with that little @: the private prompt gets an audience.

The Chat Gets Witnesses

Until now, AI work was often invisible. Someone opens a chatbot, writes a prompt with context, has something built, and brings the result back to the team. The output is there. The path remains hidden.

With @Claude, that process suddenly becomes transparent. Prompting gets witnesses. Delegation gets witnesses. Errors get witnesses. The private super-assistant becomes a team member with chat history.

Andrej Karpathy, OpenAI co-founder and former Tesla AI lead, recently joined Anthropic to work on pre-training and is already excited about the new teammate.

You Are Sharing Your Workspaces With Permanent Agents

This fits a bigger trend: AI is moving out of the chat window and into the surfaces where we already work.

Claude Tag is the team-chat version of that. The agent sits where status, shortcuts, misunderstandings, and office politics already live.

Slack is not a neutral tool. Slack is the office hallway with search. When an agent shows up there, output, visibility, trust, and responsibility all land in the same thread.

In a private chat, Claude belongs to you. In a team channel, Claude belongs to the room.

Three people write along, two have data access, one person can see customer data, and someone else is only a guest. Whose context may Claude use? Whose boundaries apply? Who cleans up when the agent puts something wrong into the thread?

Anthropic addresses this with an agent identity model. Claude gets its own access, memories, and rules per workspace or channel. According to the docs, access follows the channel, not the person asking at that moment.

The new question is: who is allowed to bring AI into which workspace, and who is responsible when that room starts working differently?

Our Take: AI Disappears Into the Interface and Ends Up Everywhere

Claude Tag is a strong signal for where this is going: the most useful agent sits where the next action happens.

Chatbots get to know you and your company message by message. That is the value. It is convenient. And it is a shift in power.

That is also why some teams will feel weird about it.

🥲 GPT-5.6 Is Here. The Question Is When You Get to Use It.

Last week, it sounded like: GPT-5.6 is coming. Buckle up.

Now GPT-5.6 Sol is here. Just not for us.

Access starts in a limited way after consultation with the U.S. government. No public waitlist. No regular users. Only selected partners get to try it first.

That is more than bureaucracy. Until now, “partners first, everyone else later” was normal release logic. Now the U.S. government is visibly sitting near the lever. OpenAI itself says this process should not become the norm because it keeps the best tools away from users, developers, companies, and cyber defenders.

And OpenAI is not alone. After negotiations, Claude Mythos 5 may once again be provided to certain U.S. organizations in critical infrastructure and cyber defense. Fable 5 remains stuck. This is where the uncomfortable question starts: will regular users even see the strongest models in the future, or only the public, safely limited version?

The meme version: Anthropic CEO Dario Amodei ruined it for everyone. If labs themselves say their models could become dangerous, nobody should be surprised when officials eventually stand near the release button and the newest supermodels stay locked away for state-approved purposes.

But fine. If we do not get the new model, at least we get good AI drama and the hope that Fable 5 comes back before July 1, if you believe the gamblers on Polymarket.

Our Take: Your AI Stack Needs a Plan B

The most important question is not: when do I get GPT-5.6?

We are curious about GPT-5.6 and will test it as soon as we can. Of course. According to benchmarks, it is supposed to be ahead of Fable in some areas.

The more important question is: can my workflow switch if my favorite model is unavailable tomorrow?

So: store context in files and get it out of chatbots. Keep setups model-neutral. Build offline pipelines. See our post on the offline AI setup.
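
What model-neutral can mean in practice: your workflow talks to a plain function, not to one vendor. The sketch below uses stand-in functions instead of real API clients; Backend, Unavailable, and ask_with_fallback are hypothetical names for illustration, and in a real setup each backend would wrap the SDK or local runtime you actually use.

```python
from typing import Callable

Backend = Callable[[str], str]  # takes a prompt, returns the model's text

class Unavailable(Exception):
    """Raised by a backend that is throttled, down, or not accessible."""

def ask_with_fallback(prompt: str, backends: list[tuple[str, Backend]]) -> tuple[str, str]:
    """Try each backend in order and return (backend name, answer) from the first that works."""
    for name, call in backends:
        try:
            return name, call(prompt)
        except Unavailable:
            continue
    raise Unavailable("no backend available")

# Stand-ins for real clients, for illustration only.
def favorite_model(prompt: str) -> str:
    raise Unavailable("access restricted")

def local_model(prompt: str) -> str:
    return f"[local] answer to: {prompt}"

print(ask_with_fallback("Summarize notes.md", [
    ("favorite", favorite_model),
    ("local", local_model),
]))
```

Because the context lives in files and the prompt is plain text, swapping the order of the list is the whole migration.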

If we do not get a new model, at least we get more clarity about why Sam Altman was fired from OpenAI and how he ended up back in the CEO chair. “You should, unfortunately, be worried about Sam Altman” is a very worthwhile film with a lot of background.

Done. See you in the next issue.

Reto & Fabian from AInauten
