
👨‍🚀 OpenAI experiments with new model – and strikes gold!

PLUS: Should you be using the new ChatGPT agent?

 

Hello AInauts,

Welcome to the latest issue of your favorite newsletter!

Today, we have another OpenAI special that packs a punch. Between math wonders, agent hype and a pinch of AI entertainment, we ask ourselves: are we witnessing the birth of true superintelligence? Or are these just smart marketing stunts?

Here's what we have in store for you today:

  • 🐳 OpenAI's secret weapon: an experimental model wins gold

  • 🔥 OpenAI's ChatGPT agent: a real breakthrough or just hot air?

  • 😁 AI-Fun: The funniest home videos, AI edition

Let's dive in, it's getting wild!

🐳 OpenAI's secret weapon - a model experiment that happens to win the gold medal

OpenAI has been the talk of the town again.

But the International Mathematical Olympiad (IMO) is where it gets really exciting, because this time there was a juicy surprise: a new model from OpenAI achieved a gold-medal score!

What at first glance looks like just another checkmark in a long list of benchmarks actually has wider implications. Let's take a look…

The big bang: AI conquers the mathematical elite and wins gold!

Why is that so amazing? The Math Olympiad requires creative thinking and long written proofs, not just dull arithmetic.

AI models used to fail miserably at something like this. But OpenAI has stepped up with reinforcement learning and more "thinking time" (test-time compute scaling) - and thus made the leap from primary school math level to the Olympic podium with a gold medal in under two years.

This achievement is thanks to a new experimental reasoning model. The model competed under the same rules as the human participants: no internet, no tools, only 4.5 hours per session, and proofs written out in full natural language. It solved 5 out of 6 problems, and since each problem is worth 7 points, that makes 35 out of 42 points!

Sam Altman tweeted: "This is a milestone for general intelligence - we're releasing GPT-5 soon, but this is experimental and not coming right away."

Even the OpenAI researchers were surprised by the results. And if you want to check the math yourself, you can do so here on GitHub.

Example of a solved task …

“One model to rule them all …”

The impressive thing is that it is a new, non-public, non-specialized model - which means it can be applied to all kinds of domains! Not only is AI itself based on math, but math is also the cornerstone of STEM (Science, Technology, Engineering, Math).

An AI that masters this can theoretically also crack quantum physics, invent new math or analyze huge amounts of data from experiments - and help us to explore the secrets of the universe even faster and deeper.

Next year, we may very well see new theorems discovered by AI. Even AI critic Gary Marcus grudgingly admits: "This is impressive."

Prediction markets such as Manifold have been caught off guard: prior to the announcement, the estimate for a gold medal was just 20% - now it's 95%! This shows that AI is progressing quickly and often operates in the shadows.

via Manifold.markets

Google DeepMind was already well positioned last year and has also reached gold level this year. Grok-4, on the other hand, has not yet made it.

Our take: What's next? Exp...ohhhh...nentially!

This is not a niche victory. In our view, it accelerates the path to ASI (Artificial Superintelligence), possibly before 2027, and the curve only gets steeper from here. That's what's keeping us up at night …

It does not exactly help our peace of mind that over 40 leading researchers from OpenAI, Google DeepMind and Anthropic have joined forces to issue an important warning (PDF here): "We are losing the ability to understand AI!"

On the other hand, the cautionary voice in the back of our heads is then briefly silenced when we hear promising reports about a “universal cancer vaccine”.

via ChatGPT

🔥 OpenAI's ChatGPT agent: A real breakthrough or just hot air?

OpenAI has launched ChatGPT agent and (almost) everyone is freaking out. But honestly, is this really the game-changer they're selling it as? Spoiler: not really.

After weeks of hype about agents, OpenAI finally delivers. ChatGPT agent is supposed to be able to do everything: Operate websites, research, create presentations, even shop online. It doesn't sound bad - until you take a closer look.

What the agent can really do (and what it can't)

Yes, the system combines Operator, Deep Research and ChatGPT into one unit. Yes, it can theoretically process complex workflows. Yes, you will soon be able to test it in your ChatGPT Plus account.

The technical toolbox is quite impressive, and the system should automatically select the best tool for each task:

  • a visual browser for graphical interfaces,

  • a text-based browser for simple web queries,

  • terminal access,

  • direct API access, and

  • connectors to Gmail, GitHub and co.

[Demos from the OpenAI livestream: Research & Action, Slideshows, Spreadsheets]

It can do everything you can do with a keyboard and mouse. Credit card entries, banking, sending e-mails. And that, of course, also comes with an increased risk profile.

It almost seems like OpenAI was afraid of what they were building. Half of the announcement was about prompt injections, and the agent was categorized as "High Biological Risk" - was it a categorization to get more media attention, or are there real risks? Who knows…

Alternative agents such as Genspark and the like are much better

But reality still has a few surprises in store: if you search on X for "ChatGPT agent vs.." you'll quickly realize that the only agent it has outperformed so far is its predecessor, Operator, which was available on the $200 OpenAI plan.

ChatGPT agent is slow, unreliable, gets stuck and delivers sub-par results. Alternatives such as Genspark, Manus, etc. are currently clearly ahead.

Just two weeks ago, OpenAI highlighted Genspark's Super Agent, which made a whopping $36 million in sales in just 1.5 months. "Thanks for the publicity", Genspark must have thought - and is dialing up its marketing efforts.

They're betting around $1 million in prize money that their results will be better than ChatGPT agent's! We love guerrilla marketing campaigns like this - keep 'em coming, it's great entertainment.

P.S. We've long been fans of Genspark and ChatLLM. Ultimately, it's all about having proper processes - and building proven and reliable automations with AI.

ChatGPT agent is of course not yet available in the EU ...

Oh yes, of course the new feature is not available in the EU. Elsewhere, however, it will be rolled out to all Plus users in the next few days.

So if you have a paid ChatGPT account, the same applies as always in such cases: activate a VPN and test with a US IP address!

Or you can give Mistral a try. Deep Research is now an option, and you can use Magistral (Download) for reasoning, Devstral (Download) for coding and Voxtral (Download) for voice.

The next logical step for Mistral is also obvious: to pack everything into one agent!

Our take: wait and see

OpenAI's agent is an impressive tech demo. But a tool that makes you more productive? Hmm... it doesn't seem ready for prime time in all areas yet.

But it gives a taste of the future, even if that future is still a few months away for OpenAI. In any case, the positive aspect is that the large-scale rollout will provide OpenAI with an incredible amount of data that can be used to improve the system.

And it is also certain that the more ChatGPT can do by itself (👋 hello, GPT-5!), the less you need other tools. And it's no coincidence that OpenAI will soon be launching its own browser (codenamed "Aura"). Exciting times!

😁 AI-Fun: The funniest home videos, AI edition

To close out this issue, a bit of fun to show you that AI videos are definitely not boring!

We made it! But no need to be sad. The AInauts will be back soon, with new stuff for you.

Reto & Fabian from the AInauts

P.S.: Follow us on social media - it motivates us to keep going 😁!
X, LinkedIn, Facebook, Insta, YouTube, TikTok

Your feedback is essential for us. We read EVERY comment and every piece of feedback, so just reply to this email. Tell us what was (not) good and what is interesting for YOU.

Your feedback is our rocket fuel - to the moon and beyond!