Sesame, a San Francisco-based startup, has unveiled an artificial voice technology of unprecedented realism. This advancement in artificial intelligence (AI) is generating both excitement and concern among early adopters, with some reporting feeling "disturbed" when interacting with the system.
On February 27, 2025, Sesame unveiled its Conversational Speech Model, trained on nearly a million hours of English audio data. Two AI characters, Maya and Miles, are now publicly available for demonstration via the company's research blog. The technology aims to achieve what Sesame calls "vocal presence," a voice quality so natural that it becomes indistinguishable from a human voice.
Founded by Oculus co-founder Brendan Iribe, Sesame has focused its efforts on four key areas: emotional intelligence, conversational dynamics, contextual awareness, and personality consistency. The goal is to make voice interactions with computers as natural and fluid as conversations with a human being.
Mixed reactions to this innovation
Reactions from users and industry experts are mixed. Sean Hollister of The Verge called this technology "the first voice assistant that I want to talk to more than once." Shopify CEO Tobi Lutke has publicly hailed the innovation as "absolutely incredible." However, some users, like PCWorld's Mark Hachman, have reported an uneasy feeling about the uncanny realism of these artificial voices.
Sesame plans to pair this voice technology with lightweight AI glasses, providing "convenient access to a companion who can observe the world alongside you." The prospect raises questions about privacy and the pervasiveness of AI in our daily lives.
The potential applications for this technology are vast, from improving call centers to learning languages. Sesame plans to expand support to over 20 languages and to open-source some key components under the Apache 2.0 license. Sesame's breakthrough suggests that voice-centric interfaces could define the next wave of human-machine interaction, for better or worse.