A new YouTube channel called Vocal Synthesis is dedicated to publishing audio deepfakes: speech generated by artificial intelligence to mimic human voices. The voices are synthesized from text by a state-of-the-art neural network trained on recordings of the people it imitates.
The videos on Vocal Synthesis are remarkable, pairing famous voices with unlikely material, including Bob Dylan singing Britney Spears, Tucker Carlson reading the Unabomber Manifesto, Ayn Rand and Slavoj Zizek singing Sonny and Cher, Bill Clinton reciting “Baby Got Back,” and JFK touting the intellectual merits of Rick and Morty.
Fans have remixed some of the videos, adding music to create hilarious musical mashups.
Videos Were Taken Down
For the first time since the channel was created, Vocal Synthesis’s owner, who remains anonymous, received a copyright claim on YouTube. Two of his videos were taken down, both featuring deepfaked audio of Jay-Z: one reciting the “To Be or Not to Be” soliloquy from Hamlet, and the other performing Billy Joel’s “We Didn’t Start the Fire.”
The channel’s creator said that Roc Nation LLC filed the copyright claims, and that the stated reason for removal was that the videos unlawfully use artificial intelligence to impersonate the voice of Roc Nation LLC’s client.
Although YouTube removed both videos immediately, they can still be viewed on LBRY, an open-source, decentralized publishing platform. Meanwhile, other videos with deepfaked Jay-Z audio remain online, in which he raps the Book of Genesis and the Navy Seal copypasta.
The channel’s creator announced the removal in a creative manner: a video that uses the synthesized voices of U.S. presidents Donald Trump, Barack Obama, JFK, FDR, and Ronald Reagan.
The message in the video is summarized below:
Over the past several months, the creator of Vocal Synthesis has trained many speech synthesis models on the speech patterns of various celebrities and other well-known individuals. He has used these models to make over a hundred videos for the YouTube channel. The videos usually feature a celebrity’s synthetic voice narrating a speech or short text. Most of the time, the texts are chosen to provide a funny or entertaining contrast with the real-life persona of the featured celebrity.
The channel appears to be the work of a hobbyist with a lot of spare time and a strong interest in machine learning and AI technologies. He also stressed that all the videos he publishes are for entertainment purposes and that none of them has a malicious purpose. Furthermore, every video is labeled as speech synthesis in both its title and its description.
The synthetic voices in the video also expressed disappointment that Jay-Z and Roc Nation LLC were bullying a YouTuber by having his videos taken down. Viewers were likewise disappointed that the video-sharing platform would side with powerful individuals in stifling the creativity of a small content creator. On top of that, it seems ironic that YouTube accepted “AI impersonation” as a reason for taking down the Vocal Synthesis videos when Google, which owns YouTube, argued in the “Authors Guild v. Google” case that machine learning models trained on copyrighted material should be protected under fair use.
No Intention to Deceive
The controversy surrounding deepfakes centers on disinformation and deception. For instance, Facebook and Twitter have blocked misleading and harmful deepfakes that could affect this year’s elections.
However, the case of Vocal Synthesis is very different. As the creator said in his statement, all of the videos he publishes on his YouTube channel are clearly labeled as speech synthesis in both the title and the description. This suggests that the videos are not intended to deceive anyone and that they fall outside YouTube’s guidelines on manipulated media.
Fair Use and Copyright Claims
Roc Nation LLC made two claims in the Vocal Synthesis takedown. The first is that the videos infringe Jay-Z’s copyright. The second is that the videos unlawfully use artificial intelligence to impersonate the voice of its client.
The videos published on the Vocal Synthesis YouTube channel were created by training a model on a large number of audio samples and their text transcriptions. In Jay-Z’s case, the owner of the channel fed the artist’s songs and lyrics into Tacotron 2, a neural network architecture developed by Google.
For this reason, it is reasonable to assume that synthesized audio modeled on copyrighted recordings would be considered a derivative work. The real question, however, is whether it should be considered copyright infringement. In a world where almost everything is copyrighted, the answer depends on how the AI-generated audio is used and what its purpose is.
It is easy to imagine uses of today’s technology that the law would treat as infringing copyright and publicity rights. For instance, if a record producer had a synthesized Jay-Z or another artist guest on a single without informing them or asking for their permission, legal action would surely follow.
However, as the Vocal Synthesis creator pointed out, derivative works such as his videos should be protected under fair use. Fair use is difficult to define, but courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion taken, and the effect of the use on the potential market for the original work.
In the case of Vocal Synthesis, the videos are made solely for entertainment purposes and are not intended to deceive. For this reason, there is a strong case that they do not constitute copyright infringement.