
Sesame’s new AI voice technology has achieved something remarkable — it sounds so human that testers can’t distinguish it from real people. The startup’s recent demo is generating both amazement and discomfort among those who’ve experienced it.
Benj Edwards, Senior AI Reporter at Ars Technica, spent nearly half an hour conversing with Sesame’s Conversational Speech Model (CSM). “The synthesized voice was expressive and dynamic, imitating breath sounds, chuckles, interruptions, and even sometimes stumbling over words and correcting itself. These imperfections are intentional,” Edwards noted after his 28-minute conversation.
This level of realism isn’t accidental.
On their website, Sesame explains their ambition: “At Sesame, our goal is to achieve ‘voice presence’ — the magical quality that makes spoken interactions feel real, understood, and valued. We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding.”
The technology appears to be creating emotional responses that go beyond typical interactions with voice assistants. One tester on the Hacker News forum described their experience: “It was genuinely startling how human it felt. Apparently they are planning on open-sourcing some of their work as well as selling glasses (presumably with the voice assistant). I’m very excited to have a voice assistant like this and am almost a bit worried I will start feeling emotionally attached to a voice assistant with this level of human-like sound.”
Curious listeners can try the demo on Sesame's website.
Uncanny Valley of Voice
PCWorld senior editor Mark Hachman had a particularly strong reaction to the technology. “Fifteen minutes after ‘hanging up’ with Sesame’s new ‘lifelike’ AI, and I’m still freaked out,” he reported.
While the technology represents a breakthrough in voice synthesis, it also raises significant security concerns. The most obvious risk is scams. Edwards points out the troubling implication: “As synthetic voices become increasingly indistinguishable from human speech, you may never know who you’re talking to on the other end of the line.”
This development comes as voice cloning technology has already figured in several high-profile fraud cases over the past year, including incidents in which family members believed they were speaking with kidnapped relatives.
Sesame hasn’t yet announced when this technology might be commercially available.