You don’t need a PhD or VC backing to change the game.
This week, we’re diving into the story of Dia, a new open-source text-to-speech (TTS) model from Korean startup Nari Labs—founded by two undergraduate students with no funding.
And yet, they’ve created a voice model that beats industry giants like ElevenLabs and Sesame in side-by-side tests.
Let’s break it down. 👇
Dia is a 1.6B-parameter voice model with powerful capabilities:
- ✅ Emotionally expressive speech (happy, sad, angry, calm)
- ✅ Multi-speaker tagging
- ✅ Nonverbal vocalizations like laughter, coughing, and even screaming (see the quick sketch below)
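Curious what that looks like in practice? Here's a rough sketch of driving Dia from Python, based on its open-source release. Treat the `dia` package layout, the `Dia.from_pretrained` loader, the `nari-labs/Dia-1.6B` checkpoint ID, and the exact tag syntax as assumptions to double-check against the repo's quickstart rather than a definitive API reference.

```python
# Minimal sketch of generating multi-speaker audio with Dia.
# Assumptions (verify against https://github.com/nari-labs/dia):
# the `dia` package name, the from_pretrained()/generate() calls,
# the checkpoint ID, and the 44.1 kHz output sample rate.
import soundfile as sf  # pip install soundfile
from dia.model import Dia

# Load the 1.6B-parameter checkpoint.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# Speaker tags ([S1], [S2]) mark turns in a multi-speaker script;
# parenthesized cues like (laughs) produce nonverbal vocalizations.
script = (
    "[S1] Two undergrads just open-sourced a voice model. "
    "[S2] No way. (laughs) "
    "[S1] Seriously. Go try it."
)

# Generate a waveform and save it to disk.
audio = model.generate(script)
sf.write("dialogue.wav", audio, 44100)
```

Note that the script itself is the control surface: speakers and vocalizations come from inline tags, not a separate API.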
In benchmark tests, Dia outperformed ElevenLabs Studio and Sesame CSM-1B on:
- Timing precision
- Expressive depth
- Handling complex scripts with nonverbal elements
No lab. No cash. Just hustle.
The Nari Labs team:
- Was inspired by Google’s NotebookLM
- Used Google’s TPU Research Cloud (free compute credits)
- Trained Dia and released it fully open source
It’s one of the clearest examples yet that raw talent, paired with access to open tools, can match (or exceed) what the big players are doing.
According to founder Toby Kim, Nari Labs is now building a consumer app that will let people:
- Remix content
- Create social audio
- Use Dia to power dynamic, emotionally rich voiceovers
Imagine a TikTok-like platform, but for expressive AI voices.
This isn’t just a technical feat. It’s a cultural signal.
Sam Altman once tweeted: “You can just do things.”
This is what that looks like in action.
Two undergrads, no budget, no connections—and now they’re on the map with one of the most impressive open-source voice models in the world.
If you're thinking of building something, let this be your wake-up call:
The tools are out there. The moment is now.
Thanks for reading.
Catch you next time with more breakthroughs, big and small.
– The AIDB Today Team