Voice cloning is a reality. That’s how Adrien Brody spoke with a Hungarian accent in The Brutalist, how opera singer Maria Callas got her voice back after illness, and how Richard Nixon delivered an alternative speech about the Moon landing: the one written in case the mission failed.
All of these are projects by the Ukrainian company Respeecher, which creates synthetic voices for film, TV, advertising, and music. Respeecher drew wide attention ahead of the Oscars, when it emerged that its technology had been used in two films competing for the award. And that is who we are talking to now.
Film critic Natalia Serebriakova sat down with Respeecher Business Development Executive Volodymyr Ovsiienko, who spoke about working with Hollywood, AI, and the ethics of voice cloning.
I first learned about Respeecher through the Oscar campaign, as many others likely did. Tell us, what exactly does your company do, and how was your technology used in The Brutalist and Emilia Pérez?
Respeecher develops technologies for the film industry. We worked on The Brutalist and helped Adrien Brody and Felicity Jones. Lately, there’s been a growing trend in Hollywood to use 100% authentic speech. The industry is tired of actors speaking foreign languages with American or British accents.
There are a few ways to approach this. The first is hiring a native speaker for the role. The second is having actors train extensively with a coach, which may take a year, two, or even three. This is especially relevant for languages like Hungarian, which is considered one of the hardest in the world; mastering it at a native level would take years. So the third option is artificial intelligence, which can deliver the desired result within a few weeks.
In The Brutalist we worked exclusively with Hungarian lines spoken off-screen — the rest was done by Adrien Brody and Felicity Jones. When we asked the director whether this technology might take jobs away from actors, he replied that, on the contrary, it creates new opportunities. We still need a native speaker at the microphone. So we trained a model of Brody’s voice, and then a native speaker recorded the lines in Hungarian.
How does voice cloning for an actor work?
Usually the process looks like this: we take a trained model and record another person — a performer at the microphone. Then we transfer that performance onto the model so it sounds as natural as possible.
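For readers who want a concrete picture, here is a minimal, purely illustrative Python sketch of that flow: a model trained on the target actor’s recordings receives a performer’s take and re-renders it in the target voice. All class and file names are placeholders invented for this example, and the “conversion” is a pass-through stub, not Respeecher’s actual pipeline. – ed.

```python
# Hypothetical sketch of a speech-to-speech conversion step.
# None of these names come from Respeecher; they only illustrate the flow
# described above: a trained target-voice model receives a source
# performance and re-renders it in the target voice.

from dataclasses import dataclass


@dataclass
class Performance:
    """Audio from the performer at the microphone (source speaker)."""
    samples: list[float]          # raw waveform, e.g. 44.1 kHz mono
    sample_rate: int = 44100


class TargetVoiceModel:
    """Stand-in for a model trained on the target actor's recordings."""

    def __init__(self, checkpoint_path: str):
        self.checkpoint_path = checkpoint_path  # weights learned from consented data

    def convert(self, performance: Performance) -> Performance:
        # A real system would extract content and prosody (intonation,
        # rhythm, accent) from the source and resynthesize them with the
        # target speaker's timbre. This stub just passes the audio through.
        return Performance(performance.samples, performance.sample_rate)


def clone_line(model: TargetVoiceModel, native_speaker_take: Performance) -> Performance:
    """Transfer the native speaker's performance onto the target voice."""
    return model.convert(native_speaker_take)


if __name__ == "__main__":
    model = TargetVoiceModel("target_voice.ckpt")   # hypothetical checkpoint name
    take = Performance(samples=[0.0] * 44100)       # one second of silence as a placeholder
    converted = clone_line(model, take)
    print(f"Converted {len(converted.samples) / converted.sample_rate:.1f} s of audio")
```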
With visual effects, it works differently: first, you film a person from different angles, capturing their facial muscles in motion. Then another actor performs in a special suit with cameras, and dozens of visual artists are needed to recreate the image precisely.
It’s a complex and expensive process. Even with modern technology, creating a realistic scene still requires a lot of time and resources.
When did you found the company, and how did it all start?
The company was officially founded in 2018, but the work on the technology began in 2016. If you were in Ukraine at that time, you may remember the buzz online — everyone was saying that Kuzma (Andriy Kuzmenko, frontman of Skryabin – ed.) was still alive. That was an early prototype of our technology: at a hackathon, the team recreated Skryabin’s voice.
The technology was developed by two Ukrainians. They built a prototype, recreated the recording, and when they saw the huge reaction online the next day, they realized the potential. Later, at a conference, they met their third co-founder, the American Grant Reaber, who also became passionate about the idea. In 2018, the three of them founded the company together.
Ethics has been part of the work from the very beginning: for every voice we train, we have written consent. Some simply take other people’s data and claim it as their own. But companies like ours have been working under ethical speech synthesis standards since 2016, before regulations even existed. Intuitively, we understood that it was wrong to use someone’s voice without permission, even for non-commercial or PR purposes. We’ve received hundreds of requests, for example, to create a Donald Trump impersonator for a comedy show. But we decline such projects; we don’t work with manipulative content.
What products do you provide for the film industry?
Our core specialization is speech synthesis technology. We developed a system that allows voice conversion in real time: when one person speaks into a microphone, their voice can sound like a completely different person, preserving all nuances — intonation, accent, rhythm. This became the foundation of our brand.
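As an illustration of what “real time” implies in practice, here is a rough, hypothetical sketch of a streaming loop that converts short microphone chunks one by one. The chunk size, sample rate, and stub converter are our own assumptions for the example, not Respeecher’s system. – ed.

```python
# Illustrative only: a conceptual real-time loop that reads short audio
# chunks and converts each one, so the listener hears the target voice
# with minimal delay. The converter here is a stub, not a real model.

import queue
import numpy as np

CHUNK_MS = 20            # small buffer, typical for low-latency streaming
SAMPLE_RATE = 16000
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000


class StreamingConverter:
    """Stub for a model that maps source-voice chunks to target-voice chunks."""

    def process_chunk(self, chunk: np.ndarray) -> np.ndarray:
        # A real model would keep internal state across chunks to preserve
        # intonation and rhythm; this stub just returns the input unchanged.
        return chunk


def run_stream(mic_queue: "queue.Queue[np.ndarray]", out_queue: "queue.Queue[np.ndarray]") -> None:
    converter = StreamingConverter()
    while True:
        chunk = mic_queue.get()
        if chunk is None:          # sentinel: stream finished
            break
        out_queue.put(converter.process_chunk(chunk))


if __name__ == "__main__":
    mic, speakers = queue.Queue(), queue.Queue()
    # Simulate 5 chunks (100 ms) of microphone input, then stop.
    for _ in range(5):
        mic.put(np.zeros(CHUNK_SAMPLES, dtype=np.float32))
    mic.put(None)
    run_stream(mic, speakers)
    print(f"Converted {speakers.qsize() * CHUNK_MS} ms of audio")
```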
We also create authentic voice clones that neither human listeners nor algorithms can distinguish from the original. Today we work with all major Hollywood studios, and our technology has been used in over 180 projects.
For example, in The Mandalorian we worked on young Luke Skywalker, and in Obi-Wan Kenobi we recreated the voice of Darth Vader. More recently, we worked on Here with Tom Hanks and created voices for characters in Alien: Romulus.
Most of our clients are from Hollywood, but we don’t limit ourselves to that. We also actively work with record labels. Singers usually have more concerns than actors, since for them the voice is the only medium. But AI can take over routine tasks. For instance, during world tours singers often have to record promotional texts and appearances for radio and TV — not the most enjoyable part of their job. With AI, this process can be automated or delegated to another performer.
Your company received an Emmy and is connected to the Oscars. Can you tell us more about these awards?
We received an Emmy in 2019 for a project where we recreated Richard Nixon’s voice. Few people know this, but for the Apollo 11 mission, two speeches were prepared: one for a successful Moon landing, and one for a potential failure. We recreated that “alternative” speech with our technology. The project won an Emmy, and everyone in the credits became an Emmy winner.
Now Ukraine has both an Oscar and an Emmy. Personally, I’d really like to add a Grammy.
How does the process of training and cloning a voice work?
We realized that achieving high quality requires human involvement, which is why around 20–30% of our team are sound engineers — we call them Synthetic Speech Artists.
When we get a client request, we first check it for ethics — whether there is permission to use the voice, or if it can be obtained. Then the client provides audio data to train the model.
The synth engineer carefully reviews the dataset and gives feedback:
— whether there’s enough material for a quality model
— whether additional audio is needed (for example, if the client wants the voice to sing or whisper)
Then we launch the training process, which takes one to two weeks. Importantly, we don’t just upload audio into a neural network and wait for the result — we constantly interact with the client, monitor progress, and adjust parameters.
Once training is complete, there are two options for using the model:
#1. Individual projects (for major studios like Warner Bros.). Clients provide a set of recorded lines → we convert them into the desired voice → we select several sound variations → then the post-production team works with the chosen version.
#2. Marketplace, where we host a library of ready-made voices. Available both as speech-to-speech (voice generation from audio) and text-to-speech (voice generation from text). We can assign a model exclusively to a client, so only they have access. They can independently work with the voice — record, edit, regenerate audio.
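To make the two access modes more tangible, here is a hypothetical client sketch: speech-to-speech takes audio in and returns converted audio, text-to-speech takes text and returns synthesized audio. The class, method names, and parameters are invented for illustration and are not Respeecher’s actual API. – ed.

```python
# A purely illustrative client interface for the two access modes mentioned
# above. These classes and method names are invented for this sketch and
# are not Respeecher's actual API.

from dataclasses import dataclass


@dataclass
class Audio:
    samples: bytes
    sample_rate: int = 44100


class VoiceLibraryClient:
    """Stand-in for a marketplace client with access to licensed voice models."""

    def __init__(self, api_key: str, voice_id: str):
        self.api_key = api_key    # per-client credentials: exclusive voices stay private
        self.voice_id = voice_id  # a voice the client is licensed to use

    def speech_to_speech(self, source: Audio) -> Audio:
        """Convert a recorded performance into the licensed voice."""
        # A real service call would upload `source` and return converted audio.
        return Audio(samples=source.samples, sample_rate=source.sample_rate)

    def text_to_speech(self, text: str) -> Audio:
        """Generate speech in the licensed voice directly from text."""
        # A real service call would return synthesized audio for `text`.
        return Audio(samples=b"\x00" * len(text))


if __name__ == "__main__":
    client = VoiceLibraryClient(api_key="demo-key", voice_id="narrator-01")
    promo = client.text_to_speech("Tonight at eight.")                 # text-to-speech mode
    redub = client.speech_to_speech(Audio(samples=b"\x00" * 44100))    # speech-to-speech mode
    print(len(promo.samples), len(redub.samples))
```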
By the way, did you know that the announcement voice in the Berlin metro belongs to a transgender person and was processed with AI?
That’s the first I’ve heard. It’s an interesting practice — training voices on neutral or non-contextual material. For speech-to-speech technology, context isn’t required. What matters is having data on the vocal range, pronunciation patterns, laughter or shouting — this makes it possible to recreate a person’s unique vocal traits. The more such recordings, the more accurate the model.
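As a toy illustration of that point about coverage, one could check a dataset manifest for how much material exists per vocal style before training. The categories and minimum durations below are invented for the example, not real requirements. – ed.

```python
# Toy illustration: model quality tracks dataset coverage, so check how much
# audio exists for each vocal style before training. Thresholds and style
# names here are invented placeholders.

from collections import defaultdict

REQUIRED_MINUTES = {        # illustrative minimums, not real requirements
    "speech": 60,
    "whisper": 5,
    "laughter": 2,
    "shouting": 2,
}


def coverage_report(clips: list[tuple[str, float]]) -> dict[str, float]:
    """Sum recorded minutes per style from (style, minutes) clip metadata."""
    totals: dict[str, float] = defaultdict(float)
    for style, minutes in clips:
        totals[style] += minutes
    return dict(totals)


def missing_styles(totals: dict[str, float]) -> list[str]:
    """Styles where the dataset falls below the illustrative minimum."""
    return [s for s, need in REQUIRED_MINUTES.items() if totals.get(s, 0.0) < need]


if __name__ == "__main__":
    dataset = [("speech", 75.0), ("whisper", 3.5), ("laughter", 2.5)]
    totals = coverage_report(dataset)
    print("Coverage (minutes):", totals)
    print("Needs more material for:", missing_styles(totals))
```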
Have you heard about the Chinese film What’s Next, made with AI, that was shown at this year’s Berlinale?
Yes, I attended a panel with the producer of that film. We had an interesting discussion, and she shared some details. To be honest, I didn’t find the film very compelling — it felt rather primitive. But she admitted they made it in a very short time — just six or seven days. The main goal wasn’t to create high-quality cinema, but to demonstrate how quickly a product could be made, even if imperfect.
You can make a film in six days, but the quality will match that. For truly good content, you need more time and resources: working with multiple models, months of editing and refinement.
What can you say about the competition between ChatGPT and DeepSeek?
I don’t have enough technical knowledge to compare them in depth, but as far as I know, DeepSeek operates with smaller and less powerful models than other companies. Their product can be described as a simplified version, with some drawbacks. They allow models to be hosted on private servers, which gives flexibility. We, for example, use Amazon’s cloud for our projects, which means higher costs but also greater reliability.

