👨‍🚀 How to Find Real Jobs for Your Agents


Hello AInauts,

Welcome to the latest issue of your favorite newsletter!

Today we are not starting with the next tool. We are starting with the question: What specific task should your agent handle for you every morning? With our prompts, you will find your first practical agent in three steps and can test it directly in your everyday work.

After that, we look at Claude Fable 5, which briefly showed what the AI future can feel like before it disappeared overnight. At the end, we test the counter-idea: if the supermodel suddenly vanishes, can a model team step in?

Here is what we have packed for you today:

👷‍♂️ How to find real jobs for your agents
🚨 Claude Fable 5 shutdown makes local models more important
👌 OpenRouter Fusion: Fable-level output from multiple models? Almost...

Let's go!


👷‍♂️ How to find real jobs for your agents

In recent AInauten sessions we are seeing an interesting pattern: the technology is often no longer the bottleneck. Creating skills, connecting files, testing MCP servers, starting browser agents... all of that now works surprisingly well.

The real hurdle comes after that: what is the agent actually supposed to do for you?

An agent becomes useful once it can reliably handle a recurring task, with enough context and clear boundaries.

So today we want to look at how you can find concrete candidates from your real work or everyday life. For most people, this is where it gets interesting.

Step 1: Prompt "Find my best agent jobs"

Let's get straight into it, without a long preface: copy this prompt into the chatbot you use most.

Analyze my goals, working style, recurring tasks, and tools based on everything you know about me.
If you are not sure, ask me concrete multiple-choice questions.

Then find at least 10 concrete tasks that I could delegate to an AI agent, workflow, or automation.

Rate each candidate by:
- repetition
- time saved
- data access
- risk
- measurability

Choose the 3 best starting candidates.

For each candidate, describe:
- desired outcome
- required context
- allowed actions
- stop rules
- human review point
- success criterion after 7 days

Rules:
No vague ideas. Prefer recurring screen work.
Sending, deleting, paying, and customer contact stay with the human.
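
If you would rather sanity-check the chatbot's ranking yourself, the rating step can be sketched in a few lines of Python. This is only a minimal sketch under our own assumptions: the 1-to-5 scale, the equal weighting, and the example values are hypothetical and not part of the prompt above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    repetition: int     # 1 (rare) to 5 (daily)
    time_saved: int     # 1 (barely any) to 5 (hours per week)
    data_access: int    # 1 (hard to reach) to 5 (already in files or tools)
    risk: int           # 1 (harmless) to 5 (can cause real damage)
    measurability: int  # 1 (fuzzy) to 5 (clear pass/fail)

    def score(self) -> int:
        # Risk counts against a candidate, so it is inverted (5 becomes 1).
        return (self.repetition + self.time_saved + self.data_access
                + (6 - self.risk) + self.measurability)


def top_candidates(candidates: list[Candidate], n: int = 3) -> list[Candidate]:
    # Highest total first; sorted() is stable, so ties keep their input order.
    return sorted(candidates, key=lambda c: c.score(), reverse=True)[:n]


# Hypothetical ratings, purely for illustration.
candidates = [
    Candidate("Draft support replies", 5, 4, 4, 2, 4),
    Candidate("Reconcile open invoices", 3, 3, 4, 3, 5),
    Candidate("Collect webinar feedback", 2, 3, 3, 1, 3),
    Candidate("Pay supplier invoices", 3, 3, 2, 5, 4),
]

for c in top_candidates(candidates):
    print(f"{c.score():>2}  {c.name}")
```

With these made-up values, the risky payment job drops out of the top three, which matches the rule that paying stays with the human.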

Step 2: Prompt "Define the agent job"

Many people have never actively built an agent. Totally fine. It sounds complicated, but it does not have to be, so we will do it together today.

Now choose one agent job from the result of step 1.

Do not pick the biggest, coolest, or technically wildest suggestion. Instead, choose one where you immediately know what a good result would look like.

Good agent jobs often sound simple:

Prepare draft replies for new support tickets in your style every morning
Reconcile open invoices in a Google Sheets list every Friday
Collect questions and feedback from chat and transcript after a webinar
Prepare five social media posts from the bookmarks of the last few days

That is screen work with context. And that is exactly where the first test is worth it.

Use this prompt after you have selected the first candidate:

Take the best [or selected] candidate from the list.
Turn it into a clean briefing (prompt) for a desktop-agent test.

Work like this:
1. Describe a useful test run and the context required for it.
2. List which local files, folders, websites, or tools the agent needs.
3. Write the concrete task for a desktop agent such as Codex or Claude Code/Cowork.
4. Define stop rules: what may be done, and what may not?
5. Define the review result that shows whether the test worked.
6. Create a checklist to repeat the run cleanly.

Rules:
- Read and plan first.
- No action without approval. No deleting, no sending, no paying.
- Prefer a small test with evidence over a big automation without control.
- If something is unclear, ask instead of guessing.
- Explain all steps simply and traceably.
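
The stop rules from this briefing can also be pictured as a tiny gate in front of every action. The following is a minimal sketch, not how Codex or Claude Code actually enforce permissions; the action names and the three outcomes are our own hypothetical choices.

```python
# Hypothetical action names, chosen only to illustrate the stop rules.
READ_ONLY = {"read_file", "list_folder", "plan"}
FORBIDDEN = {"delete", "send", "pay"}


def check_action(action: str, approved: bool = False) -> str:
    """Return 'allow', 'ask', or 'block' for a proposed agent action."""
    if action in FORBIDDEN:
        # These stay with the human, even if someone clicked "approve".
        return "block"
    if action in READ_ONLY:
        # Reading and planning come first and need no approval.
        return "allow"
    # Everything else is an action and requires explicit approval.
    return "allow" if approved else "ask"


for action in ["read_file", "edit_sheet", "pay"]:
    print(action, "->", check_action(action))
```

The design choice worth copying is the default: anything the list does not know is treated as an action that needs approval, so a new tool never slips through as read-only.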

Step 3: Set up your first agent

Now it gets practical: to set up the agent, you need to install a desktop-agent app such as Codex by OpenAI (usable even with a free account) or Claude Code/Cowork (requires a Pro account or higher).

Once you have done that, start a new chat in the desktop app and paste in the briefing/prompt from step 2 to begin the agent setup process.

At its core, you will be asked to answer questions, grant access, and confirm actions so the agent can carry out the task you gave it.

You will probably also run into terms such as skills, tools, MCP, and so on. We will sort out what those mean in a moment.

Skill, tool, MCP: sorted out once and for all

The desktop agent needs access, but access alone does not create a clean workflow.

Tool, skill, MCP, and context can all show up in the same setup menu. That can create misunderstandings.

Rule of thumb: tools, MCPs, and connectors say what the agent may access. Skill and context say how it should work there.

A tool grants access. Read Gmail. Check CRM fields. Open a website.
A skill defines behavior. "Write an email" becomes: "Write an AInauten support reply so the customer understands what happens next."
MCPs and connectors are the sockets in between. They connect your agent to Drive, Slack, HubSpot, a database, or the browser.
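
To make the split tangible, here is a minimal sketch of how you could write it down for one agent. The class and field names are hypothetical and do not correspond to any real setup menu or config format; the example values come from the descriptions above.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSetup:
    # What the agent may access.
    tools: list[str] = field(default_factory=list)
    # The sockets the access runs through (MCP servers, connectors).
    connectors: list[str] = field(default_factory=list)
    # How the agent should work once it has access.
    skill: str = ""

    def briefing(self) -> str:
        return "\n".join([
            "Access: " + ", ".join(self.tools),
            "Via: " + ", ".join(self.connectors),
            "Behavior: " + self.skill,
        ])


support_agent = AgentSetup(
    tools=["Read Gmail", "Check CRM fields"],
    connectors=["HubSpot", "browser"],
    skill="Write an AInauten support reply so the customer "
          "understands what happens next.",
)
print(support_agent.briefing())
```

If the "Behavior" line is empty, the agent has access but no workflow, which is exactly the gap described above.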

Quick note here: if you install third-party skills, have them reviewed first with NVIDIA's SkillSpector or another tool so you do not install something questionable from a security perspective.

Our take: agents need clean instructions first

If you want to delegate AI work, you have to become uncomfortably precise. Only when the handoff is painfully clear do agents become useful.

So build one boring agent this week. One you can honestly evaluate after seven days: did it save me 20 minutes every morning? That is all it needs at the beginning.


🚨 Claude Fable 5 shutdown makes local models more important

For three days, Fable was the model that made us want to push every other model into the corner. Now it is gone, at least for now.

Fable is a preview of the future

Language note: the linked AInauten background article in this section is in German.

The appeal of Claude Fable 5 was that Anthropic made the new Mythos class broadly available for the first time: 1-million-token context, long autonomous tasks, better self-checking, more patience with coding and knowledge work. Exactly the kinds of improvements where many agents have stumbled so far.

Fable felt less like a chatbot and more like a colleague with a lot of coffee and even more IQ. For us, it was a preview of what AI feels like when it does not need to be led by the hand after every second sentence.

And then: the U.S. government pulled the plug.

On June 12, the Trump administration demanded that Fable 5 and Mythos 5 be suspended for all foreign nationals, even foreign Anthropic employees inside the U.S. Because that was practically impossible to implement cleanly, Anthropic had to shut down Fable and Mythos for all customers. WTF?!

The trigger was reportedly a jailbreak, which Amazon is said to have confirmed after a White House request. The issue was said to be a simple vulnerability that can also be exploited in other publicly available models such as GPT 5.5.

Because Amazon is also a major investor in Anthropic, we do not quite understand what is going on. Maybe it also has something to do with the rumors that Chinese actors had access to Mythos - who knows?

In any case, Anthropic apologized for the "disruption" and believes it is a misunderstanding. A discussion is taking place today, and maybe Fable is already running again by the time you read this.

Still: a precedent with an important warning.

Security is the most important topic, especially at the breakneck speed things are moving right now (see our post here).

Anthropic had positioned the Mythos class for months as dangerous, powerful, and security-critical, which justified guardrails, trusted access, and high prices.

Some critics say you should not keep telling everyone that your own AI is a biological-weapons and cybercrime factory if you do not want a government to eventually listen seriously.

There is a grain of truth in that...

But if a narrow jailbreak allegation that cannot be publicly verified is enough to rip a production model globally out of workflows, we all learn a hard lesson: the strongest tool in your stack can disappear tomorrow without you being able to do anything. A kill switch you do not control.


Imitating Fable-level behavior becomes a sport

The mood online quickly flipped from hype and cool use cases to dependency shock. The first reaction is the tinkerer path: take available models, build better harnesses, secure prompts, write loops more cleanly, and partially imitate Fable behavior (see our next piece below).

Jailbreaker Pliny had already broken into the model and extracted the system prompt. Tinkerers then used that prompt to rebuild Fable-like behavior elsewhere with the Claude Opus 4.8 model.

Sure, a system prompt does not bring the Fable model back, but the idea is not bad.

Sovereign infrastructure: a Plan B that actually runs

If you want to play it safe, you need more than workarounds and hope. And if you want control, you need to be able to run local models - or at least use open competing models such as Kimi 2.7, GLM 5.2, and others.

Sovereign AI infrastructure starts small. A local model for confidential standard tasks. A cloud top model for the heavy work. A simple rule for which task may go where. A folder where context, prompts, and checklists live. A test of whether your most important workflow continues if your favorite model disappears.
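
The "simple rule for which task may go where" really can be this small. A minimal sketch, assuming just two yes/no questions per task; the labels "local" and "cloud-top" are placeholders, and sending confidential heavy work to the local model is our own assumption, not a universal rule.

```python
def choose_model(confidential: bool, heavy: bool) -> str:
    """Decide where a task may run, based on two yes/no questions."""
    if confidential:
        # Assumption: confidential data never leaves the machine,
        # even if the local model is weaker at heavy work.
        return "local"
    if heavy:
        # Heavy, non-confidential work goes to the cloud top model.
        return "cloud-top"
    # Simple standard tasks also stay local.
    return "local"


tasks = {
    "Summarize salary spreadsheet": (True, False),
    "Refactor a large public codebase": (False, True),
    "Rename downloaded PDFs": (False, False),
}
for name, (confidential, heavy) in tasks.items():
    print(f"{choose_model(confidential, heavy):<10} {name}")
```

Store the rule in a file next to your prompts and checklists so it survives a change of model or tool.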

Tools like llmfit become more interesting in this moment because they ask the right question: What can run sensibly on your hardware?

The tool detects how much RAM, CPU, and GPU your computer has and rates models by quality, speed, fit, and context. It supports providers such as Ollama, LM Studio and others.

Curious what would run on your machine? See the agent section above and simply tell the Claude or Codex desktop app:

Install https://github.com/AlexsJones/llmfit locally.
Then start it for me.
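
If you want a feeling for the underlying arithmetic first, here is a rough back-of-the-envelope sketch. To be clear, this is not how llmfit works internally. It only uses the fact that model weights need roughly parameters times bits per weight divided by 8 bytes; the 20 percent overhead for context and runtime is our own assumption.

```python
def estimated_gb(params_billion: float, bits_per_weight: int,
                 overhead: float = 1.2) -> float:
    """Rough memory estimate in GB for running a model locally."""
    # 1 billion parameters at 8 bits per weight is about 1 GB of weights.
    weight_gb = params_billion * bits_per_weight / 8
    # Assumed overhead factor for context cache and runtime.
    return weight_gb * overhead


def fits(params_billion: float, bits_per_weight: int,
         available_gb: float) -> bool:
    return estimated_gb(params_billion, bits_per_weight) <= available_gb


# Hypothetical machine with 16 GB of usable memory.
for size in (7, 14, 70):
    need = estimated_gb(size, bits_per_weight=4)
    verdict = "fits" if fits(size, 4, available_gb=16) else "too big"
    print(f"{size}B at 4-bit: about {need:.1f} GB, {verdict}")
```

Real numbers depend on context length, quantization format, and the runtime, so treat this as a first filter and let a tool that actually inspects your hardware give the final answer.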

On the arena.ai open-source leaderboard you will find the open cloud models from Kimi, Qwen, and DeepSeek. They rank high and sit close to the closed cloud providers, close enough that you should seriously test them.

These tests matter, because a benchmark alone says very little about how well something works for your specific use case.

On the free arena.ai platform, not all frontier models are available. If you want to choose from all models for your tests, you can alternatively use OpenRouter (credit system).

arena.ai for model comparison (free)

openrouter.com for model comparison (all models, credit system)

Our take: Fable was a gift, the cutoff is the lesson

We see the shutdown as a dangerous precedent. Safety needs clear processes. Otherwise, caution (or some other motivation) turns into arbitrariness.

The current Fable drama shows that these rules are still missing. Until they exist, total dependence on closed frontier models is an operational risk.

Of course, we do not want to boycott cloud models. That is where the best models live, and the Fable shutdown hurt precisely because it was so good.

Our pragmatic lesson: store your prompts, skills, project rules, and acceptance criteria in files. Yes, once again: files over tools.

Models can disappear or become insanely expensive. Features get rebuilt. But if your operating mode lives in files, it can travel.

Our ideal setup is becoming clearer and clearer. It is a hybrid:

Frontier models for tasks where the best model matters.
Local models for confidential, recurring, and simple work.
Portable context should always be saved in files.
Prompts, skills, and evaluation rubrics should also live outside individual chats.
And a small emergency test just in case: what happens if your favorite model or account is gone tomorrow?
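
The emergency test from the last point can be rehearsed without touching a real account. A minimal sketch, assuming each model is wrapped in a plain Python function; the two functions below are stand-ins that simulate an outage, not real API calls.

```python
from typing import Callable


class ModelUnavailable(Exception):
    """Raised when a model or account cannot be reached."""


def run_with_fallback(
    prompt: str,
    models: list[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each model in order and return (model name, answer)."""
    for name, call in models:
        try:
            return name, call(prompt)
        except ModelUnavailable:
            continue  # this one is gone, try the next in the chain
    raise ModelUnavailable("no model in the chain answered")


# Stand-ins for illustration: the favorite model is "gone tomorrow".
def frontier(prompt: str) -> str:
    raise ModelUnavailable("suspended")


def local(prompt: str) -> str:
    return f"[local] {prompt}"


name, answer = run_with_fallback(
    "Summarize today's tickets",
    [("frontier", frontier), ("local", local)],
)
print(name, "->", answer)
```

The interesting part of the test is not the code but the result: is the answer from the fallback model good enough to keep your most important workflow alive?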

A few weeks ago, we already had local models on the table as an offline AI travel stack. After Fable, that feels less like a travel trick and more like basic equipment.

P.S. To close, we found a strong tip from Fable on X - in essence: "Sit down regularly and write honestly about your own life, your thoughts, and your wishes, not to react to something, but to see more clearly what you really want and do." 🙏


👌 OpenRouter Fusion: Fable-level output from multiple models? Almost...

While everyone is mourning Fable, OpenRouter is presenting an interesting thesis: Maybe one single model no longer has to do everything.

OpenRouter Fusion is the answer to that. At its core, it is a small model panel.

You ask a complex question, several models work on it in parallel, a judge model compares the answers and returns a structured analysis: consensus, contradictions, blind spots, individual good observations. Then the final answer is built from that.
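
The panel idea can be sketched in a few lines. This is a toy illustration under heavy assumptions, not OpenRouter's implementation or API: the three "models" are hard-coded stand-ins that return sets of claims, and the "judge" is a simple count instead of a judge model. Spotting contradictions needs a real model, so the toy judge leaves that part out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


# Stand-in "models": each returns a set of claims for the question.
def model_a(question: str) -> set[str]:
    return {"option X is cheaper", "option X needs a migration"}


def model_b(question: str) -> set[str]:
    return {"option X is cheaper", "option Y has better support"}


def model_c(question: str) -> set[str]:
    return {"option X is cheaper", "option X needs a migration"}


def ask_panel(question: str, models) -> list[set[str]]:
    # All panel members work on the same question in parallel.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        return list(pool.map(lambda m: m(question), models))


def judge(answers: list[set[str]]) -> dict[str, list[str]]:
    # Toy judge: count how many panel members made each claim.
    counts = Counter(claim for answer in answers for claim in answer)
    n = len(answers)
    return {
        "consensus": sorted(c for c, k in counts.items() if k == n),
        "partial_agreement": sorted(c for c, k in counts.items() if 1 < k < n),
        "single_observations": sorted(c for c, k in counts.items() if k == 1),
    }


analysis = judge(ask_panel("X or Y?", [model_a, model_b, model_c]))
for section, claims in analysis.items():
    print(section, claims)
```

The claim that only one model made lands under single observations, which is where the blind spots and the individual good observations tend to hide.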

Language note: the linked AInauten background articles in this paragraph are in German.

We have already written several times about Mixture of Experts (MoE), multi-persona prompts and LLM council.

This approach is much closer to good teamwork than classic routing.

Routing asks: Which model should get this task?
Fusion asks: Which models should work on this task together?

And all of that runs nicely in the background through the chat interface or the API. Nice.

OpenRouter is marketing the launch aggressively. In its post on X, OpenRouter calls Fusion the "smartest compound model in the market".

It also claims that Fusion reaches Fable level at half the price. Hmm...

In the benchmark post it becomes clear that these numbers need context. This is OpenRouter's own test on 100 deep-research tasks.

OpenRouter also writes that seven Fable tasks were not scored because of content filters.

So please do not immediately post "Fable has been replaced!" on LinkedIn 😄 ...


Our take: Fusion is another tool in your toolbox

We see Fusion as an important signal, and it fits today's issue. After Fable, a few things matter more from our perspective: local AI, sovereign hybrid AI setups, and model orchestration.

For short prompts, Fusion is overkill. But it is interesting for tasks where perspectives matter: research, decision prep, tool comparisons, market analysis, strategy questions, first-pass legal sorting, medical research, complex product decisions, and so on.

Fusion is useful wherever a single answer is dangerously convenient.

The best AI workflows increasingly remind us of a small editorial team. Several voices in, contradictions made visible, one good synthesis out.

That is how good teams work too.

Done. See you in the next issue.

Reto & Fabian from AInauten
