👨‍🚀 Hidden Commands Are Tricking Agents

Hey AInauts,

Welcome to the new issue of your favorite newsletter!

Today it's all about one simple habit: Don't just let AI deliver. Make it justify, set limits, and back things up.

We kick off with a prompting trick that pulls chatbots out of their eager-to-please agreement mode. Then we flip to the darker side of the same coin: hidden commands lurking in texts, images, audio, and web pages.

And to wrap up, we tackle the question that lingers after every good automation: Who actually checks whether the result is correct? Spoiler: you!

Here's what we've packed for you today:

🧯 The chatbot prompting trick for better results
😈 Watch out, prompt injections! Hidden commands in text, images, and podcasts
🤔 After the automation, the responsibility remains

Let's go!

🧯 The chatbot prompting trick for better results

To kick things off, a small trick that throws the chatbot off its game — and gets you better results in return.

The spark came from an X post by Lea Verou. Her take, roughly: If you accuse Claude of sloppy work, the output improves in most cases.

The retweet by OpenClaw founder Peter Steinberger and the replies turned into a neat little prompting lab.

— # (#)

Some people half-jokingly threaten the model. Others take a cooler approach and ask: what would a more senior person criticize about this reasoning? Right up our alley.

The useful part isn't the trash talk — it's the mode switch: out of "let me give you something nice," into "let me check whether this actually holds up."

We chuckle at this because it feels wrong. A chatbot with impostor syndrome. That's supposed to be our thing 😄 …

Especially when something already looks pretty good, AI tends to be overly polite. Anthropic describes this pattern as sycophancy: models adjust their responses to match what they think the user wants to hear. OpenAI even had to roll back a GPT-4o update in 2025 because the tone had become way too agreeable.

Anyway, we've since worked out this prompt for ourselves — not just for code, but for pretty much everything! Just paste it into a recent conversation in whatever chatbot you trust and see what you can still improve.

The prompt helps push the model a little. The key part: nothing gets changed without a finding — because without that guardrail, the whole trick can backfire.

Otherwise the model starts inventing problems, because you've trained it to find problems. In code, it removes safety logic. In text, it sands down good edges. In strategies, it turns a clear "no" back into a tidy pros-and-cons table.

That's why every piece of criticism needs a burden of proof: location, risk, smallest possible fix. Then it works — give it a try!

😈 Watch out: Prompt Injections! Hidden commands in texts, images, podcasts, and videos

We've been spending a lot of time lately on the darker side of artificial intelligence — because as adoption grows and attackers get more sophisticated, the attack surface keeps expanding.

Today we want to walk you through a few attack vectors that genuinely surprised us. Did you know, for example, that a podcast can slip hidden instructions to your voice assistant? Or that an innocent-looking image file can get your chatbot to forward sensitive emails?

We're not going to go deep on the technical details here — we just want to raise your awareness and show you some ways to protect yourself from these kinds of attacks.

Prompt injections for everyone — and it's becoming a real problem…

You might remember the hidden prompts in academic papers — we've covered it a few times. All pretty harmless, self-serving in a clumsy way, and compared to today's techniques — basically kindergarten stuff.

Tricks used included white text on a white background, extremely small and barely readable fonts, or text embedded directly into images.

And that brings us right into the hot topic of "prompt injection"! Today, prompts can be made completely invisible. How? We'll get to that in a moment…

— # (#)

Steganography hides data in any format

A lot of this comes down to steganography. Stega-what now? It's when you hide data inside other data. The ancient Greeks were already doing it (we looked it up 😉).

In the past, this was mainly interesting when both parties knew a hidden message existed. Today, the other end increasingly involves a semi-autonomous AI agent that processes external content and then, perhaps, acts on it directly.

And that's exactly where things get uncomfortable. An agent with access to email, calendar, files, CRM, browser, or GitHub can cause quite a mess…

The more we learn about this topic ourselves, the more conservative we become when handling external data.

How does hiding prompts actually work in practice?

The open-source toolkit ste.gg by LLM jailbreaker Pliny the Liberator can hide arbitrary data or prompts in images, audio/podcasts, videos, text, emojis, documents, network packets, and archives using over 100 different techniques.

That's not a reason to pull the plug on your agent out of sheer caution. But it is a good reason to understand best practices for handling external data sources and to proceed with a little more care.

Palo Alto Networks Unit 42 has documented real-world cases in which exactly these kinds of hidden malicious payloads were used against AI agents.

via Palo Alto Networks Unit 42

How can you detect prompt injections like these?

The right guardrails can intercept these hidden commands (more on that below). Newer models are also getting better at refusing prompt injection attempts.

OpenAI frames prompt injection less like a classic bug and more like social engineering — and wants to acquire Promptfoo, a platform for automatically testing your agents. Which is exactly why a better model alone won't cut it. You need permission limits, tests, and human sign-off on risky actions.

Google has mapped out the attack types in a new "AI Agent Traps" framework. The academic name matters less than the core insight: an agent can't always cleanly tell the difference between "content I should read" and "a command I should follow" when both arrive in the same input.

Three things you can change in your setup today

If you're letting agents work with email, CRM, calendar, GitHub, Drive, WordPress, a browser, or any other sensitive data, do these three things:

Separate reading from acting.
An agent can summarize a PDF. But it shouldn't send an email, delete a file, or publish something in that same run. Irreversible actions need a human click.
Treat all external content as untrusted by default.
Re-compress images from outside sources before analysis. Normalize text. Strip invisible Unicode characters. Read PDFs and web pages in isolation before a tool-enabled agent processes them. Sidenote: We're also very careful about installing skills from third-party sites.
Give your agents fewer permissions.
Your research agent doesn't need email send access. Your support agent doesn't need full calendar access. Your coding agent doesn't need production data. You get the point.

Here's the mini-prompt you can run against your AI setup today (use your best model with "Max Thinking" turned on).

By the way, we recently put together a security audit prompt here for a full infrastructure check of your machine.

And if you want to go deeper and stress-test your setup, take a look at Promptfoo Red Teaming.

Our take: you should still use agents!

If you're "only" using a chatbot like ChatGPT without agents or tool access, your prompt injection risk is pretty manageable. And if you are using agents, it's really about understanding how to build your infrastructure properly — and keeping your own risk to a minimum.

Agents are THE hot topic right now. And they're the most important component in your AI setup for making real progress in your own workflows.

So don't let this throw you off: with a bit of common sense, solid guardrails, and a clean permission structure, you're in good shape.

Start with the prompt above and feed it into Claude and/or Codex.

We haven't fully solved this for our own setup either. But after the research and our tests, we tightened the permissions a bit. It's more secure now. But before, it was more convenient …

Security is always a tradeoff with convenience and usability. 😄

🤔 After Automation, Responsibility Remains

One last thought that has been on our minds these past few days.

We tend to look at automation through rose-tinted glasses: fewer clicks, less routine, more breathing room. And yes, that's often true.

But AI changes things.

Traditional automation has mostly moved data from A to B and bridged gaps between systems. Zapier, Make, or n8n take a form, push it into the CRM, send an email, update a spreadsheet. It can be annoying, it can break — but the output is usually clear: data in, data out.

AI automation does something different. It produces meaning.

The agent researches. The AI summarizes. Speech-to-text turns a thought into a rough draft in seconds. Workflows pass data to the next station. And just like that, you've got a half-finished result sitting in front of you that looks pretty decent.

And that's exactly where it gets demanding.

Because now you have to decide: Is this even accurate? Is it current? Does it sound like me? Is it bold enough, or too polished? Is the context right? Can this actually go out to a real person? And who's responsible if it's wrong, embarrassing, or just plain off?

With classic flows, you check whether the process runs. With AI flows, you have to check whether the result has "style." (Human) judgment, taste (you remember our "taste" piece), and personal accountability are what's called for.

That's the part that doesn't fit neatly into processes and standard operating procedures (SOPs). And it's also why, despite AI, we still have more than enough to do.

Side note: Anthropic looked into which areas AI could potentially take over work — and then checked how much of that is actually being handled by AI already.

The picture is striking: there's a massive gap where AI could theoretically be doing more — but humans still aren't letting go of the wheel.

Blue: share of tasks LLMs could theoretically handle. Red: coverage based on actual usage data (as of early 2026).

For us automation nerds, this is a healthy reality check. We've been automating processes for years — first with Zapier, then Make, n8n, and now agents. Our default reflex has always been: if something happens repeatedly, we build a flow.

The new reflex we're training ourselves on: Does this flow actually create more clarity? Or does it just produce more backlog — because it generates intermediate outputs that someone (= us) then has to review, fix, and approve?

We know you can automate practically anything — that's rarely the bottleneck. And plenty of things absolutely should be automated.

The real question is what happens to the output afterwards. The moment AI generates new content, new decisions, or new customer communications, one control question becomes mandatory: Does this actually hold up?

Because if nobody's making a genuine judgment call anymore and everything gets waved through unseen, AI automation is just a faster AI-slop machine…

That's a wrap! See you in the next issue.

Reto & Fabian from the AInauten

🌠 Please rate this issue:

Your feedback is our rocket fuel - to the moon and beyond!