Who Needs Humans Anymore: Spotify to Clone Podcasters’ Voices with AI

Technology September 25, 2023

Andrew Burton /Getty

Spotify has unveiled a new feature, powered by ChatGPT developer OpenAI, that clones podcasters’ voices using AI. The company claims the technology will be used for translation, allowing podcasters to present their shows, in their own voices, in languages they don’t speak.

The Verge reports that Spotify’s venture into AI has produced a voice translation feature that aims to redefine the podcasting landscape. The feature lets podcasters translate their English-language episodes into Spanish, with French and German translations planned for the near future. The initial rollout includes episodes from popular podcasters such as Dax Shepard, Lex Fridman, Bill Simmons, and Steven Bartlett, with more names to follow, including a forthcoming show from Trevor Noah.

LAS VEGAS, NEVADA – APRIL 03: Host Trevor Noah speaks onstage during the 64th Annual GRAMMY Awards at MGM Grand Garden Arena on April 03, 2022 in Las Vegas, Nevada. (Photo by Rich Fury/Getty Images for The Recording Academy)

The translation feature is built on Whisper, OpenAI’s voice transcription tool. Whisper can transcribe English speech and translate speech in various other languages into English. Spotify’s feature goes a step further: it not only translates the podcast content but also reproduces it in a synthesized version of the podcaster’s voice, allowing listeners around the globe to experience content in a more authentic manner.

Ziad Sultan, Spotify’s vice president of personalization, emphasized the transformative potential of this feature, stating, “By matching the creator’s own voice, Voice Translation gives listeners around the world the power to discover and be inspired by new podcasters in a more authentic way than ever before.” This innovation is poised to bridge the gap between creators and diverse audiences, fostering a sense of connection and understanding through the universal language of music and dialogue.

OpenAI has also been pivotal in the voice replication aspect of this feature. The company recently announced a tool that can generate human-like audio from text and a few seconds of sample speech. However, availability of the tool is intentionally restricted due to safety and privacy concerns, highlighting the ethical considerations inherent in deploying such advanced technologies.

Currently, Spotify is testing the translation technology with a select group of podcasters. The details regarding the wider availability and the timeline for the expansion of the feature have yet to be announced.