OpenAI clones our voices
PLUS: These AI tools make video editing a walk in the park

Greetings, AInauts!
It's Thursday, and we've got fresh news, tools, and hacks about AI.
This is what we have in store for you today:
OpenAI clones our voices (SOON)
This is how China gets around the AI chip sanctions
These AI tools make video editing a walk in the park
AI news quickie: the HAI-lights
Here we go!
OpenAI clones our voices (SOON)
Our friends at OpenAI have once again introduced something fascinating: a new voice model called Voice Engine.
The idea is not new. Computers have been able to speak like humans for a long time. Hi Alexa!
The exciting thing about the current AI voice wave is that you can use it to clone any voice very quickly.
The voice clone can then be used to say anything based on text input, in almost any language imaginable. And the result is hardly distinguishable from the original.
This is also not new per se, but OpenAI impresses us once again: the Voice Engine model needs only 15 seconds of voice recording to clone a voice.
That's pretty powerful. You can find some examples in the OpenAI blog.

Unfortunately, we are not able to use Voice Engine yet. As with Sora, OpenAI has concerns, especially with the upcoming elections in the US and the spread of deepfakes.
How to clone your voice
But thankfully, you don't have to wait for OpenAI to create a clone of your voice.
We have been using AI voices and our own cloned voices for a long time. The best tool for this at the moment is probably ElevenLabs.
It gives you excellent results with a 1-2 minute recording of your voice, and we now have a whole pool of cloned voices.

You have two options for cloning a voice: Instant and Professional.

For an Instant voice, simply make a 1-2 minute recording of your voice and upload it.
The Professional voice is a little more complex, but also much better in quality. You need a good microphone and a quiet environment.
Inexpensive entry-level microphones include the Audio-Technica AT2020 or the Rode NT1. These should work, and you can also fine-tune the voice afterward.
By the way, ElevenLabs has solved the issue of security quite well. With a Professional voice clone, you have to read out a text live during the cloning process. This ensures that you can only clone your own voice.
"We first developed Voice Engine in late 2022"
Says it all folks.
- Jimmy Apples (@apples_jimmy)
7:57 PM • Mar 29, 2024
Our take: OpenAI is cooking up a lot of tricks that we don't know about.
You can tell that OpenAI really seems to be the most advanced in terms of technology. Voice Engine was already developed at the end of 2022 and has only now been publicly announced!
But OpenAI is now so well known that every update generates a lot of attention and buzz. As a result, they can no longer push new models onto the market as quickly as they used to. And that is an opportunity for start-ups with innovative technology.
This is how China gets around the AI chip sanctions

There are plenty of heated discussions about chips, computing power, NVIDIA, and power consumption. One revolves around the rivalry between the US and China.
China has been somewhat cut off from the latest high-performance chips from NVIDIA due to American sanctions.
But as is so often the case, growing pressure sparks the innovative spirit of the local industry to help itself out of its misery. One example is the new chips from Intellifusion.

They have nowhere near the power of NVIDIA's top models, but are incredibly affordable. To be more precise: 90% cheaper, starting at just under 140 dollars.
For comparison: for the new generation of Windows computers (which have Copilot pre-installed locally), Microsoft recommends 40 TOPS. (TOPS stands for tera operations per second and describes the computing power of a chip.)
The chips from Intellifusion achieve 48 TOPS and aim for 96 TOPS by the end of the year. Not bad!
Intellifusion's goal is not to build the fastest chips in the world - but rather to build "90% cheaper AI hardware for 90% of the relevant scenarios"!
Another example of market dynamics is "the fastest AI in the world", Groq.

Our take: This is an important development that shows China's resilience!
The hardware debate is a critical topic with far-reaching consequences - be it energy consumption, trade conflicts or the security of supply chains.
However, as is so often the case, we believe that innovative technologies will be the key to solving these challenges in the long term.
The example from China shows once again how adaptable people and the economy can be when the framework conditions change. On the other hand, the earthquake in Taiwan also exposes the world's dependency on NVIDIA, even if they say it will not affect chip supply.
These AI tools make video editing a walk in the park
We are currently in the process of improving our video output, since we are spending more and more time cutting and editing for social media and our online courses.
That's why we're playing around with various tools, which we'd like to briefly introduce here. And maybe you're interested too?
This is just a short introduction for now; more extensive tutorials will follow.
Ok, let's go! We start with:

The video editor from TikTok. Lots of templates, easy to use, lots of features. Oh yes, and it's even free!
Our favorite feature here is the background remover for videos.

Opus Clip turns your longer videos, such as interviews, into several short clips - suitable for TikTok, Insta Reels or YouTube Shorts.
It doesn't just randomly snip clips together, but picks out the most exciting moments, gives them subtitles, focuses on the respective speaker and more.
Then you get everything ranked according to the degree of probable virality. Really well done and helpful!

Let's move on to the last tool in this issue: Descript.
It's probably the most helpful tool for us at the moment for all the videos we're currently producing - and it's packed with AI features.

We can use it to edit our videos by simply rewriting the transcript. Or even use our voice clones to rephrase whole passages where we've talked nonsense. (Yes, we still don't use this enough …)
AI news quickie: the HAI-lights
Finally, here are a few easily digestible news tidbits - developers can rejoice, there are plenty of new models.
Meta's Ray-Ban glasses are getting smarter - they can now also recognize objects, animals, plants etc., but not yet record 3D! The NY Times has tested them.

Automation nerds take note! Zapier has introduced new drag-and-drop workflows and internal databases, as well as Zapier Central for bots - making it even more powerful. And have you tried Zapier Canvas yet?
Zapier feels too expensive? Then check out Make.com, they're going full throttle - or alternatively there's the AI-native solution n8n.io.
A cool voice-to-voice chatbot was introduced by Hume AI, which continuously evaluates and displays the emotion of voice input.

Sakana AI has introduced an exciting approach: "Model Merging" combines existing models to create a new model. This evolutionary approach produces hundreds of new generations of models, and the most successful ones are then the parents for the next generation.
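For the curious: the evolutionary loop described above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not Sakana AI's implementation. Here the "models" are just short lists of numbers, merging is a simple weighted average, and the names `merge`, `fitness` and `evolve` (as well as the scoring against a made-up target) are hypothetical.

```python
import random


def merge(parent_a, parent_b, alpha):
    """Blend two weight lists: alpha * a + (1 - alpha) * b, element by element."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]


def fitness(weights, target):
    """Toy score (assumption): the closer to the target, the higher the score."""
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def evolve(population, target, generations=50, children_per_gen=20, keep=4, seed=0):
    """Merge random pairs, keep the best candidates, repeat for several generations."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    for _ in range(generations):
        children = []
        for _ in range(children_per_gen):
            parent_a, parent_b = rng.sample(population, 2)
            children.append(merge(parent_a, parent_b, rng.random()))
        # Parents compete with their children; the top `keep` become the next parents.
        candidates = population + children
        candidates.sort(key=lambda w: fitness(w, target), reverse=True)
        population = candidates[:keep]
    return population[0]


# Three made-up starting "models" and a made-up target behaviour.
target = [1.0, 2.0, 3.0]
models = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0], [1.0, 0.0, 5.0]]
best = evolve(models, target)
print(best, fitness(best, target))
```

Because the parents stay in the pool each round, the best score can never get worse from one generation to the next. Real model merging works on billions of parameters and scores candidates on actual benchmark tasks, but the select-and-recombine loop has the same shape.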
Databricks DBRX is a new open source model that is ahead of other models such as Llama 2, Mistral and Grok in some areas.
The new Jamba model from AI21 impresses with a large context window, and Stability AI doubles up with Stable Code Instruct 3B.
So many models, and yet the world only uses ChatGPT - even in the business environment. And that costs significantly more than expected …
a16z just released an enterprise AI report after speaking to dozens of Fortune 500 companies.
Thanks @sarahdingwang for this.
Here are 8 insightful slides:
1. For production use cases, OpenAI still has dominant market share.
- Chief AI Officer (@chiefaioffice)
1:00 PM • Mar 24, 2024
With OpenFoundry.ai, you can select and fine-tune the right AI model and use it in your cloud in just a few minutes.
A new technology has been developed at MIT that creates high-quality images at lightning speed!
Diffusion models generate high-quality images but require hundreds of forward passes.
@MIT_CSAIL and @AdobeResearch introduce Distribution Matching Distillation (DMD), a distillation approach that converts costly multi-step diffusion models into fast one-step generators.
A… twitter.com/i/web/status/1…
- MIT CSAIL (@MIT_CSAIL)
4:41 PM • Mar 29, 2024
That's it! But no need to be sad. We will be back soon, with new food for you.
See you soon, your AInauten
Your feedback is essential for us. We read EVERY comment, so just reply to this email. Tell us what was (not) good and what is interesting for YOU.