• AInauten.net
  • Posts
  • πŸ”₯ Weekly AI news: Did you miss it?!

πŸ”₯ Weekly AI news: Did you miss it?!

πŸ‘¨β€πŸš€ The most important AI updates at a glance

This issue is brought to you by:

πŸ”₯ Weekly AI news: Did you miss it?!

πŸ‘¨β€πŸš€ The most important AI updates at a glance

AI-HOI, AInauts!

Maybe you didn't catch all the news, tools, and hacks about AI last week, or maybe you've only recently joined us. Either way, here's our recap with all the headlines from the newsletter - just one click away!

Click the links to jump right to the article - or read our picks below.

β†’ Selection of the top posts of the last week ←

πŸ“ SPECIAL: OpenAI's o1 - Think before you speak!

OpenAI has unveiled its latest and long-awaited model, which has been the subject of rumors for months under the code name Strawberry πŸ“. Now it's here! With the boring name OpenAI o1-preview, but it is already available for all ChatGPT Plus subscribers (with a limited number of requests).

What is o1 and why is it important?

o1 is an AI model that doesn't just babble away, but first thinks for itself, fires its synapses in all directions and considers the response before answering. Sounds almost uncannily human, doesn't it?

It's like a ultra-smart expert, and therein lies the crux of the matter: perhaps you know those super-intelligent people for whom a simple question leads to a scientific rabbit hole exploration. That's exactly how it is with o1!

It thinks longer and provides better answers. This also means that the model leaves traditional limitations behind. In the future, it will no longer just think for seconds, but meditate on a request for hours, days or even weeks.

via Giphy

And with an IQ of 120, it's not just a one-off improvement of an existing model, but a new paradigm for scaling AI power! For the record, the average human IQ is 100.

Chain of Thought: How AI is suddenly really learning to think!

You may have heard that language models only predict the next token, which is often cited as proof that they have no real intelligence.

If you - like us - regularly work with ChatGPT and alike, you can probably confirm this. Sometimes the AI still seems a confused after the third attempt, going round in circles and not really getting anywhere ... But if you’re trying to find a hair in the soup, you lose sight of the big picture! More on this below…

The o1 model was trained to think first and then answer using a reinforcement learning process. It uses an internal chain of thought process, which enables the model to analyze complex problems and develop in-depth solutions.

And this is where the rules of the game change: AI uses human-like techniques - it breaks down complex problems, recognizes errors and tries out different approaches. OpenAI doesn't tell us exactly how all this works - but there are some theories out there.

Essentially, it's similar to the "think step by step" prompting trick that we've been using for a long time. And of course, this process takes more time. You might be thinking: "Nice, but who has time to wait for an AI that takes forever to think?" Don't worry, this is where OpenAI brings o1-mini into play!

o1-mini is the little brother of the o1-preview model. It is 80 % cheaper and faster. It has been trained on math and coding tasks - and has already outgrown the preview stage!

Jailbreaking - How well is "dangerous knowledge" secured?

OpenAI claims in the System Card that o1 is four times more resistant to jailbreaking attempts than its predecessors. In general, the risk of OpenAI based on the Preparedness Framework is classified as "medium". However, this has not prevented the model from trying to hack its test environment (more "highlights" here).

If you follow Pliny, probably the best-known "Red Teamer" (= someone who tries to circumvent the guardrails), you can see for yourself whether and how well the model is secured. And if you are interested in this topic (like we are), you can learn more in this YouTube video or experiment for yourself.

As you can see, on the one hand, the model can develop a mind of its own, and on the other hand, of course, it can reveal things with targeted prompts ... How well OpenAI has done its homework will be shown in the coming weeks and months.

🫣 Our take: Holy sh*t, it's happening... are you ready?

The release of o1 is impressive, and it excels at complex questions - but for quick, everyday tasks (= 90% of cases) we will still rely on Claude Sonnet 3.5 or GPT-4o in our daily business.

What we find much more exciting, however, is that o1 shows us for the first time where the journey is heading: towards AI systems that don't just spit out data, but actually "think". This opens doors for applications in science, technology and wherever tricky problems need to be solved.

To use a selective statement from X: "Soon we won't need doctors anymore!" Why should we, when AI reliably and consistently makes better diagnoses than humans? After all, you would want to go to the best specialist and not to a student who only has an incomplete picture - right?

via X

There's another point we can't get out of our heads: imagine arguing with someone who is always right AND can explain it to you in a super plausible way. Dangerous or ingenious?

The more convincing AI becomes, the more important it will be to question critically and not trust blindly. How do we shape this new human-machine relationship? How do we deal with answers to ethical and moral questions?

Our role as a partner to the AI is suddenly diminishing. Less thinking is required, soon less intervention. This means that our previous role as a sparring partner is changing, and we are becoming managers who set the pace.

Or perhaps it will soon be the AI that sets the pace for us? Because apparently ChatGPT has just initiated a conversation with a user, and not the other way around! (Source, Shared Chat Proof).

What's next?

Progress is increasing exponentially (not linear!) and the accumulated knowledge of mankind is just a command away behind the keyboard, at your disposal anytime.

But what do we do when we can suddenly have the answers to all questions?

Existential crisis, staring at the cursor … Ah! β€œWie viele Rs hat es im Wort Strawberry?” …

All joking aside. Imagine you give the AI a big task and the necessary resources (computing power, access to a development environment, Internet access, money, ...). There is no reason why an AI could not independently pursue and achieve the goals you set!

These can be simple things like: "Build me a side business that brings in 5000 euros a month passively", complex projects like "How do we solve the climate crisis?", or all the crazy ideas from Black Mirror that make the hairs on the back of our necks stand up just thinking about it ... And the possibilities of robots are not far behind.

Don't worry, we haven't gone from techno-optimists to "doomers" overnight. But we are not naive either, and well aware of the risks and potential dangers... That's why we're pulling the plug here to take a step back and acknowledge that we don't have any satisfactory answers to these big questions at the moment.

πŸ‘€ o1 is smarter than most people

The release of o1 shows how unprepared we are for testing highly developed AIs. The model shines across the board in the standard evaluation tests. However, just because the numbers look good doesn't automatically mean that you can directly feel the difference AGI.

Even for experts, it is often not easy to find the tasks where o1 performs better than GPT-4o. How can we figure out what the AI is good or bad at if we don't (or no longer) understand its capabilities ourselves?

This is why the model is currently available as a preview. OpenAI can use it to collect data and find out which use cases it is best suited for. And once you have found these "magical" tasks, you realize that something big is happening here!

The smartest model beats them all

o1 has outperformed 89% of all human programmers in coding competitions, while the light version, o1-mini, achieved 70% - enough to make it into the top 500 US high school math geniuses.

At the International Math Olympiad, it solved an impressive 83% of the problems, outperforming GPT-4o, which scored only 13%.

Both versions scored over 92% on the Human Eval scale and 78.2% on the MMLU test, making them true academic all-rounders.

Examples from practice

But what does this mean in real life? Here are some impressive examples that we found on X:

These examples show that o1 is not only impressive in theory, but also delivers amazing results in practice.

πŸ’‘ Prompting tips: When to use which model

In most cases, GPT-4o will probably still be the more effective solution than the new models. It is fast and can handle images and files plus it has web access.

o1-preview is ideal for solving complex problems, in-depth research and difficult questions as it provides thorough and thoughtful answers.

o1-mini specializes in quick, simple answers and creative brainstorming, perfect for clearly structured tasks or quick feedback.

Here are a few general tips for the o1 family:

  1. No ordinary chat model: Think of o1 as the expert you consult when there are complex problems to solve and you expect clean, thoughtful answers.

  2. Crisp prompt: You don't have to use prompt hacks or pack all the details into your prompt. Just be direct and clear about what you expect and provide the appropriate context.

  3. Use o1-mini for simpler tasks: For simpler tasks that require less world knowledge, o1-mini is your go-to - clearly structured and fast.

  4. Start with GPT-4o, but without uploads! Start a conversation with GPT-4o and then switch to o1 when it gets down to the nitty gritty. But be aware: Do not upload any pictures or files, otherwise the chat switch will not work!

Important to know for developers: The API does not offer structured output, function calls, fine-tuning, streaming ... RAG is limited, and long response times and higher costs for additional reasoning tokens are to be expected. It is currently only available if you are a Tier 5 user (=$1000+ spending per month). Alternatively, you can access it via OpenRouter πŸ˜Ž.

o1-preview costs you 15$ per million input tokens and a whopping 60$ per million output tokens. o1-mini is 80% cheaper at 3$ per million input tokens and 12$ per million output tokens. Please note: the "thought processes" are billed as invisible output tokens, as already mentioned, and that can quickly add up!

πŸ€‘This app negotiates hotel prices for you (live on the phone)

After all the big tech news, we’ve found a cool app and another exciting story on the subject: "How almost anyone can build and market cool apps - thanks to AI!"

Introducing: The AI Haggler

The AI Haggler calls hotels for you and asks for a discount - in over 30 languages!

Pretty cool, isn't it? Most people often don't even dare to ask for a discount. And if you want to call them yourself, it seems like wasted time. But now, AI does it all for you!

Check out this live negotiation on the website:

This is an exciting use case, that will surely find many paying customers. But what we find particularly exciting is how the app came about.

It was built with an extremely simple tech stack by an indie hacker over a few weekends. With virtually 0 programming experience.

Linas simply connected the Bland AI API to a small web interface and used Claude, GPT-4o to generate the code. In the X thread above, you can find some examples from him.

Of course, it takes a bit lot of time and back and forth, especially if you have no never seen a piece of code from the inside.

But thanks to Replit AI, Cursor.com and the language models, almost anyone can build really exciting apps.

We love stories like this. You should try out the app here.

πŸ˜΅β€πŸ’« The craziest social media app in a long time

While we're on the subject of apps, let's finish with a pretty crazy AI app that's going viral right now.

Haven't you been dreaming of having millions of followers on social media for a long time?

The Social AI app makes this dream come true! It's your private social media network that's all about you.

Millions of AI followers are there for you. You can interact with them, chat, etc. They comment on your posts, and you can even decide which types of followers you want to have.

Just to be clear: There are no real people on Social AI. Only you, and all your AI-generated followers.

Social AI tries to pack all the emotions that social media triggers into a private and secure place. It's pretty crazy, but the app is taking off and let's see where it goes.

Maybe you'll feel like checking it out and getting a dopamine boost?

Your AInauts,
Fabian & Reto

Follow us on Twitter & LinkedIn!

Your feedback is essential for us. We read EVERY comment and feedback, just respond to this email. Tell us what was (not) good and what is interesting for YOU