
🍓 OpenAI o1-Special: Everything you (really) need to know

PLUS: Our take, prompting tips and visions for the future


AI-HOI AInauts,

Welcome to the latest issue of your favorite newsletter, today with a Strawberry Special. No, this isn't a recipe for a sweet dessert, but our attempt to digest and catch up on the past few days ourselves.

OpenAI has unveiled its latest o1 model, and it's ... impressive. But we didn't want to just parrot the news, we wanted to go deeper to understand what it means for us, and for you. Are you ready for this rollercoaster (plus a few extra memes)?

Here's what we have in store for you today:

  • 🍓 SPECIAL: OpenAI's o1 - Think before you speak!

  • 🫣 Our take: Holy sh*t, it's happening... are you ready?

  • 👀 o1 is smarter than most people

  • 💡 Prompting tips: When you should use which model

Here we go!

🍓 SPECIAL: OpenAI's o1 - Think before you speak!

OpenAI has unveiled its latest and long-awaited model, which had been the subject of rumors for months under the code name Strawberry 🍓. Now it's here! It goes by the rather boring name OpenAI o1-preview, but it is already available to all ChatGPT Plus subscribers (with a limited number of requests).

What is o1 and why is it important?

o1 is an AI model that doesn't just babble away, but first thinks for itself, fires its synapses in all directions and considers the response before answering. Sounds almost uncannily human, doesn't it?

It's like an ultra-smart expert, and therein lies the crux of the matter: perhaps you know those super-intelligent people for whom a simple question turns into a deep scientific rabbit-hole expedition. That's exactly how it is with o1!

It thinks longer and provides better answers. This also means that the model leaves traditional limitations behind. In the future, it will no longer just think for seconds, but meditate on a request for hours, days or even weeks.


And with a reported IQ of 120, it's not just a one-off improvement of an existing model, but a new paradigm for scaling AI performance! For the record, the average human IQ is 100.

Chain of Thought: How AI is suddenly really learning to think!

You may have heard that language models only predict the next token, which is often cited as proof that they have no real intelligence.

If you - like us - regularly work with ChatGPT and the like, you can probably confirm this. Sometimes the AI still seems a bit confused after the third attempt, going round in circles and not really getting anywhere ... But if you're busy nitpicking like that, you lose sight of the big picture! More on this below…

The o1 model was trained via reinforcement learning to think first and then answer. It uses an internal chain-of-thought process, which enables it to analyze complex problems and develop in-depth solutions.

And this is where the rules of the game change: AI uses human-like techniques - it breaks down complex problems, recognizes errors and tries out different approaches. OpenAI doesn't tell us exactly how all this works - but there are some theories out there.

Essentially, it's similar to the "think step by step" prompting trick that we've been using for a long time. And of course, this process takes more time. You might be thinking: "Nice, but who has time to wait for an AI that takes forever to think?" Don't worry, this is where OpenAI brings o1-mini into play!

o1-mini is the little brother of the o1-preview model. It is 80% cheaper and faster. It has been trained on math and coding tasks - and has already outgrown the preview stage!
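For the curious, here is a minimal sketch of the manual "think step by step" trick versus simply asking o1-mini, using the official OpenAI Python SDK. The prompt wording and the example question are our own illustrative assumptions, not OpenAI's recipe:

```python
# Manual "think step by step" prompting vs. o1-mini's built-in reasoning.
# Prompt wording and the example question are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

question = ("A bat and a ball cost $1.10 together. The bat costs $1.00 more "
            "than the ball. How much does the ball cost?")

# Classic trick: ask a regular chat model to show its reasoning before answering
manual_cot = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{question}\n\nThink step by step, then give your final answer on the last line.",
    }],
)

# With o1-mini you skip the trick - the (hidden) reasoning happens internally
built_in = client.chat.completions.create(
    model="o1-mini",
    messages=[{"role": "user", "content": question}],
)

print(manual_cot.choices[0].message.content)
print(built_in.choices[0].message.content)
```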

Jailbreaking - How well is "dangerous knowledge" secured?

OpenAI claims in the System Card that o1 is four times more resistant to jailbreaking attempts than its predecessors. Overall, OpenAI classifies the model's risk under its Preparedness Framework as "medium". However, this has not prevented the model from trying to hack its test environment (more "highlights" here).

If you follow Pliny, probably the best-known "Red Teamer" (= someone who tries to circumvent the guardrails), you can see for yourself whether and how well the model is secured. And if you are interested in this topic (like we are), you can learn more in this YouTube video or experiment for yourself.

As you can see, on the one hand, the model can develop a mind of its own, and on the other hand, of course, it can reveal things with targeted prompts ... How well OpenAI has done its homework will be shown in the coming weeks and months.

🫣 Our take: Holy sh*t, it's happening... are you ready?

The release of o1 is impressive, and it excels at complex questions - but for quick, everyday tasks (= 90% of cases) we will still rely on Claude 3.5 Sonnet or GPT-4o in our daily business.

What we find much more exciting, however, is that o1 shows us for the first time where the journey is heading: towards AI systems that don't just spit out data, but actually "think". This opens doors for applications in science, technology and wherever tricky problems need to be solved.

To pick out one pointed statement from X: "Soon we won't need doctors anymore!" Why should we, when AI reliably and consistently makes better diagnoses than humans? After all, you would want to go to the best specialist and not to a student who only has an incomplete picture - right?


There's another point we can't get out of our heads: imagine arguing with someone who is always right AND can explain it to you in a super plausible way. Dangerous or ingenious?

The more convincing AI becomes, the more important it will be to question critically and not trust blindly. How do we shape this new human-machine relationship? How do we deal with answers to ethical and moral questions?

Our role as a partner to the AI is suddenly shrinking: less thinking is required, and soon less intervention. Our previous role as a sparring partner is changing - we are becoming managers who set the pace.

Or perhaps it will soon be the AI that sets the pace for us? Because apparently ChatGPT has just initiated a conversation with a user, and not the other way around! (Source, Shared Chat Proof).

What's next?

Progress is increasing exponentially (not linearly!), and the accumulated knowledge of humankind is just a prompt away, at your disposal anytime.

But what do we do when we can suddenly have the answers to all questions?

Existential crisis, staring at the cursor … Ah! "How many Rs are there in the word Strawberry?" …

All joking aside. Imagine you give the AI a big task and the necessary resources (computing power, access to a development environment, Internet access, money, ...). There is no reason why an AI could not independently pursue and achieve the goals you set!

These can be simple things like: "Build me a side business that brings in 5000 euros a month passively", complex projects like "How do we solve the climate crisis?", or all the crazy ideas from Black Mirror that make the hairs on the back of our necks stand up just thinking about it ... And the possibilities of robots are not far behind.

Don't worry, we haven't gone from techno-optimists to "doomers" overnight. But we are not naive either, and well aware of the risks and potential dangers... That's why we're pulling the plug here to take a step back and acknowledge that we don't have any satisfactory answers to these big questions at the moment.

👀 o1 is smarter than most people

The release of o1 shows how unprepared we are for testing highly developed AIs. The model shines across the board in the standard evaluation tests. However, just because the numbers look good doesn't automatically mean that the difference feels like AGI when you actually use it.

Even for experts, it is often not easy to find the tasks where o1 performs better than GPT-4o. How can we figure out what the AI is good or bad at if we don't (or no longer) understand its capabilities ourselves?

This is why the model is currently available as a preview. OpenAI can use it to collect data and find out which use cases it is best suited for. And once you have found these "magical" tasks, you realize that something big is happening here!

The smartest model beats them all

In competitive programming contests, o1 outperformed 89% of human participants, while the light version, o1-mini, scored 70% on the AIME math exam - enough to place it among the top 500 US high school math students.

On a qualifying exam for the International Mathematics Olympiad (AIME), it solved an impressive 83% of the problems, while GPT-4o managed only 13%.

Both versions scored over 92% on the HumanEval coding benchmark and 78.2% on the MMLU test, making them true academic all-rounders.

Examples from practice

But what does this mean in real life? Here are some impressive examples that we found on X:

These examples show that o1 is not only impressive in theory, but also delivers amazing results in practice.

💡 Prompting tips: When to use which model

In most cases, GPT-4o will probably still be the more effective choice compared to the new models. It is fast, can handle images and files, and has web access.

o1-preview is ideal for solving complex problems, in-depth research and difficult questions as it provides thorough and thoughtful answers.

o1-mini specializes in quick, simple answers and creative brainstorming, perfect for clearly structured tasks or quick feedback.

Here are a few general tips for the o1 family:

  1. No ordinary chat model: Think of o1 as the expert you consult when there are complex problems to solve and you expect clean, thoughtful answers.

  2. Crisp prompt: You don't have to use prompt hacks or pack every last detail into your prompt. Just be direct and clear about what you expect and provide the appropriate context (see the example below this list).

  3. Use o1-mini for simpler tasks: For simpler tasks that require less world knowledge, o1-mini is your go-to - clearly structured and fast.

  4. Start with GPT-4o, but without uploads! Begin a conversation with GPT-4o and then switch to o1 when it gets down to the nitty-gritty. But be aware: do not upload any pictures or files, otherwise switching models mid-chat will not work!
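To make tip 2 concrete, here is an illustrative example of a crisp o1 prompt (our own made-up wording, not an official template): "I run a newsletter with 20,000 subscribers and a 45% open rate. Draft a concrete three-step plan to double the open rate within six months, state your assumptions, and flag the riskiest step."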

Important to know for developers: The API does not offer structured output, function calls, fine-tuning or streaming ... RAG is limited, and long response times and higher costs for the additional reasoning tokens are to be expected. It is currently only available if you are a Tier 5 user (= $1,000+ total API spend). Alternatively, you can access it via OpenRouter 😎.
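If you do have access, a call itself is deliberately plain. Here is a minimal sketch (our own illustrative code and prompt, based on the limitations above): just user messages in and text out, with no streaming, tools or structured output.

```python
# Minimal o1-preview API call - a sketch assuming Tier 5 access.
# Note what is deliberately missing: stream=True, tools/functions, response_format.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in your environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        # keep it to plain user messages - no images or file attachments
        "role": "user",
        "content": "Plan the migration of a 2-million-row MySQL database to "
                   "PostgreSQL with less than 5 minutes of downtime. "
                   "List the steps and the main risks.",
    }],
)

print(response.choices[0].message.content)
# The hidden reasoning is billed as output tokens - keep an eye on usage:
print(response.usage)
```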

o1-preview costs you $15 per million input tokens and a whopping $60 per million output tokens. o1-mini is 80% cheaper at $3 per million input tokens and $12 per million output tokens. Please note: the "thought processes" are billed as invisible output tokens, as already mentioned, and that can quickly add up!
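To get a feeling for how quickly that adds up, here is a tiny back-of-the-envelope calculation. The token counts are made-up example numbers, not measurements:

```python
# Rough cost sketch for a single o1-preview request (example numbers only)
PRICE_IN = 15 / 1_000_000    # $ per input token
PRICE_OUT = 60 / 1_000_000   # $ per output token

prompt_tokens = 2_000
visible_answer_tokens = 1_000
hidden_reasoning_tokens = 8_000  # billed as output tokens, even though you never see them

cost = (prompt_tokens * PRICE_IN
        + (visible_answer_tokens + hidden_reasoning_tokens) * PRICE_OUT)
print(f"~${cost:.2f} for this single request")  # ~$0.57
```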

We made it! But no need to be sad. The AInauts will be back soon, with fresh content in the usual format. 🍓🍓🍓

Reto & Fabian from the AInauts

P.S.: Follow us on social media - that motivates us to keep going 😁!
Twitter, LinkedIn, Facebook, Insta, YouTube, TikTok

Your feedback is essential for us. We read EVERY comment and piece of feedback - just reply to this email. Tell us what was (not) good and what is interesting for YOU.

🌠 Please rate this issue:

Your feedback is our rocket fuel - to the moon and beyond!
