Weird A.I. Yankovic, a cursed deep dive into the world of voice cloning

In the parallel universe of last year’s Weird: The Al Yankovic Story, Dr. Demento encourages a young Al Yankovic (Daniel Radcliffe) to move away from song parodies and start writing original songs of his own. During an LSD trip, Al writes “Eat It,” a 100% original song that’s definitely not based on any other song, which quickly becomes “the biggest hit by anybody, ever.”

Later, Weird Al’s enraged to learn from his manager that former Jackson 5 frontman Michael Jackson turned the tables on him, changing the words of “Eat It” to make his own parody, “Beat It.”

This got me thinking: what if every Weird Al song was the original, and every other artist was covering his songs instead? With recent advances in A.I. voice cloning, I realized that I could bring this monstrous alternate reality to life.

This was a terrible idea and I regret everything.

Covers of Weird Al's "Eat It" single and "Even Worse" albums

Of course, I started with Michael Jackson covering “Eat It,” the Grammy-winning 1984 single that made Weird Al a household name.

Michael Jackson’s song is pitched lower and sung much higher than Weird Al’s parody, so I pitched the vocals up an octave and lowered the entire song by half an octave to try to match the original.
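For anyone following along with the arithmetic, here is a rough sketch of what those shifts mean as frequency ratios. It assumes standard twelve-tone equal temperament and treats "half an octave" as six semitones. The function and variable names are made up for illustration and are not part of any tool mentioned in this post.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift, assuming equal temperament."""
    return 2.0 ** (semitones / 12.0)

SEMITONES_PER_OCTAVE = 12

# Vocals pitched up a full octave: every frequency doubles.
vocals_up = semitones_to_ratio(SEMITONES_PER_OCTAVE)

# Whole song lowered by half an octave (assumed here to mean 6 semitones).
song_down = semitones_to_ratio(-SEMITONES_PER_OCTAVE / 2)

print(f"vocals up an octave:      x{vocals_up:.3f}")
print(f"song down half an octave: x{song_down:.3f}")
```

A shift of zero semitones gives a ratio of exactly 1, which is a handy sanity check when chaining several shifts.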

Be warned: you can’t unhear this.

Artifacts aside, it sounds like Michael Jackson doing a Weird Al impression?! Every line has a distinctly “white and nerdy” vibe: it loses any seriousness and edge, exaggerating words for comic effect and enunciating lyrics really clearly so the punchlines can be heard.

I tried six different Michael Jackson A.I. voice models, including one trained on seven hours of vocals over 300 epochs — a fancy word for cycles through the training dataset — but it didn’t make much difference. (Generally, it isn’t necessary to use more than 15 minutes of clean audio for a good model.) The results were mostly the same unholy amalgamation: “Weird Michael” Jacksonkovic.
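To make "epochs" a little more concrete, here is a back-of-the-envelope sketch of how many training steps a run might involve. The clip length and batch size are hypothetical defaults chosen for illustration, not the settings of any model mentioned here, and the 15-minute run at 300 epochs is likewise an invented comparison.

```python
import math

def training_steps(audio_seconds: float, epochs: int,
                   clip_seconds: float = 10.0, batch_size: int = 8) -> int:
    """Total optimizer steps, where one epoch is one full pass over all clips."""
    clips = math.ceil(audio_seconds / clip_seconds)    # dataset cut into clips
    steps_per_epoch = math.ceil(clips / batch_size)    # batches per full pass
    return steps_per_epoch * epochs

# Seven hours of vocals over 300 epochs, under the assumed clip and batch sizes.
seven_hours = training_steps(7 * 3600, epochs=300)

# Hypothetical comparison: 15 minutes of clean audio over the same 300 epochs.
fifteen_minutes = training_steps(15 * 60, epochs=300)

print(seven_hours, fifteen_minutes)
```

Under these assumptions the larger dataset needs far more steps for the same number of epochs, which is one way to see why a small, clean dataset is so much cheaper to train on.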

Here’s the A.I. Michael Jackson covering “Fat,” using a model trained off songs from Destiny, Off The Wall, and Thriller.

But it’s not just Michael Jackson: Weird Al’s distinctive voice and pronunciation make it hard to replace his vocals with any other A.I.-generated voice.

No current artificial intelligence is powerful enough to hide the weirdness of Weird Al.

The center of the A.I. cover songs community is a massive 500,000+ member Discord called A.I. Hub, where members trade new tips, tools, techniques, and links to their original and cover songs. (Update: Three days after this article was published, Discord banned A.I. Hub over copyright complaints. See the update at the end of this article.)

Community members also upload the A.I. voice models they’ve trained, adding hundreds of new models daily to a growing database of Discord threads. Musicians are a popular category, but so are fictional characters, anime characters, YouTubers/streamers, and celebrities.

A glance at A.I. Hub’s recent voice model threads reveals a chaotic grab bag: Francoise Hardy, Donald Duck, every member of Korean girl group VCHA, Markiplier, Tom Waits, LeBron James, Knuckles, and, uh, Adolf Hitler.

Screenshot of recent voice model threads in AI Hub

Discussions and links to the models are on Discord, but the files themselves are almost universally found on Hugging Face, a prominent A.I. startup that raised $235M in a Series D round in August at a $4.5 billion valuation from some of tech’s biggest companies, including Google, Amazon, Nvidia, Salesforce, AMD, Intel, IBM, and Qualcomm.

Hugging Face plays a central role in the A.I. music community, providing free and reliable permanent hosting. A.I. Hub now requires a Hugging Face link to list a model, and the tool that I used to generate these samples, AICoverGen, suggests using direct links to Hugging Face models in its UI and examples.

Most users just upload models to their own accounts, but some upload hundreds or thousands of models made by others into enormous repositories of A.I. voices: this one account alone has nearly 4,000 voice models, from celebrities and musicians to cartoon characters and YouTube personalities.

The RIAA is very aware of A.I. Hub, and has targeted the community for uploading datasets — the original copyrighted songs used to train voice models — demanding in June that Discord shut it down, remove links to the infringing files, and reveal the identity of uploaders.

Despite those demands, A.I. Hub is still going strong, though it has put strict rules in place around linking to copyrighted datasets, particularly A.I.-processed vocal separations used to train new voice models.

But the RIAA hasn’t, as far as I can tell, taken any action against the A.I. models themselves or the people making them.

Continuing my descent into Weird A.I. hell, I next tried to get Madonna to cover “Like A Surgeon.”

According to the model’s creator, it was trained on “13 minutes of clean, studio quality acapellas from her 1984 album, Like a Virgin” over 500 epochs. Again, her singing pitch was much higher than Weird Al’s, so I pitch-shifted it up an octave.

It definitely sounds like a female vocalist, but not a very good one, and only vaguely like 1980s Madonna.

Moving into the 1990s, I made the questionable decision to have A.I. Kurt Cobain sing “Smells Like Nirvana,” Weird Al’s 1992 parody of “Smells Like Teen Spirit.” I tried several models, but the best was by a YouTuber named @Cleberslk, who wrote, “Fun fact: I made the model on my phone in a hurry.”

I’m not sure why he has a vaguely European accent, but that’s probably the least offensive thing about it.

Discord and Hugging Face are critical to the A.I. voice cloning community, but there’s another big tech company that plays an important role for many A.I. hobbyists: Google.

Generating audio with these models will work on most PCs with a decent video card, but if you don’t have a compatible GPU or are simply intimidated by a terminal, Google Colab allows anyone to quickly and easily run entire generative A.I. workflows on their servers for free, or upgrade to more powerful GPUs for a small hourly fee.

I’m on a Mac, which doesn’t have the Nvidia GPU required for running inference on these models locally, so I used the Colab notebook for AICoverGen, a powerful package that handles every step of generating A.I. covers from an existing model with a convenient web UI. It took a few minutes to start up, and then under a minute to generate each song.
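If you’re not sure which camp your own machine falls into, a quick check like the one below can help. This is a rough heuristic, not something AICoverGen does itself: it only looks for Nvidia’s `nvidia-smi` utility on the PATH, and the function name is invented for this sketch.

```python
import shutil

def has_nvidia_tooling() -> bool:
    """Heuristic only: True if the nvidia-smi utility is on the PATH."""
    return shutil.which("nvidia-smi") is not None

if has_nvidia_tooling():
    print("Nvidia driver tools found; running locally may be an option.")
else:
    print("No nvidia-smi found; a hosted notebook such as Colab may be easier.")
```

Finding `nvidia-smi` doesn’t guarantee the GPU is compatible or has enough memory, so treat a positive result as a hint rather than a green light.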

The simplified view of AICoverGen’s web UI for generating A.I. covers, hiding all the advanced options

This software isn’t difficult to use, but Colab and WebUI interfaces can be imposing for non-technical users. Like with Stable Diffusion and “magic avatars,” a number of startups have moved to launch paid products that fill the usability gap, including Kits AI, Voicify AI, Voiceflip, voicemy.ai, and covers.ai, making simple apps for generating vocal covers with officially licensed voices (or not) or training your own models. It’s only going to get faster and easier.

With his channel There I Ruined It, Dallas musician Dustin Ballard built a following of 3.1 million TikTok followers and 700k YouTube subscribers making absurdist song remixes and mashups. Over the last four months, he’s been experimenting with voice cloning, collaborating with a friend-of-a-friend in South America to change his vocal tracks to sound like other singers.

The results have been consistently inspired: The Beach Boys singing Nine Inch Nails’ “Hurt” to the tune of “Surfin’ USA,” Hank Williams doing a twangy “Straight Outta Compton”, and most recently, this ridiculous reworking of Red Hot Chili Peppers’ “Snow (Hey Oh)” with nonsensical lyrics.

Ballard achieves uncanny results by recording entirely new vocal tracks of his own, presumably doing a passable impression of each artist in their vocal range and style, before the A.I. voice cloning is applied.

This allows him to do things that would otherwise be challenging with today’s technology: changing the lyrics, melody, meter, or intonation to make something wildly different from the original.

At least for now, the best way to pull off this Weird A.I. project in a believable way, without every artist sounding vaguely like Weird Al, would be to get someone to sing Weird Al’s lyrics in a similar range and style as the parodied artist, and then apply the A.I. voice cloning.

But this likely won’t be necessary for long: Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC) are active fields of study that are moving very quickly, and even in the last six months, we’ve seen major improvements in quality, speed, and ease of use for vocal melody detection and voice changing. For example, the library that Ghostwriter used to mimic Drake and The Weeknd for “Heart on My Sleeve” last April was so-vits-svc, but it’s already largely defunct and archived by the repo owner, replaced by the now-ubiquitous RVC, or Retrieval-Based Voice Conversion.

Academic researchers have already demonstrated that it’s possible to use a neural network to “beautify” vocal tone and intonation, synthesize new vocals from text naturally, and transfer the style to another artist’s voice, opening the door to generating new songs from written lyrics in someone else’s style without any source song to base it off of, or any musical ability at all.

To end this godforsaken project, I made my way into the 2010s with Lady Gaga covering Weird Al’s “Perform This Way,” off his 2011 album, Alpocalypse. I used a model made by @udrivemecrazy, using only five minutes of “super clean acapellas.”

Finally, I chose a song off of Mandatory Fun, Al’s fourteenth and final studio album: Lorde covering “Foil,” Weird Al’s tribute to aluminum foil, loved by home cooks and conspiracy theorists everywhere.

I actually kind of like this one?? But it’s also possible I’m losing my grip on reality.

In addition to being the world’s most beloved song parodist and arguably the most famous accordion player in the world, Al Yankovic is a brilliant songwriter in his own right.

Many of my favorite songs of his are original “style parodies,” riffing off another artist’s style, but not directly parodying a particular song.

Unfortunately, many of the artists that inspired him are unavailable as pre-existing A.I. models. So as much as I’d love to hear synthetic versions of Devo’s Mark Mothersbaugh singing “Dare to Be Stupid,” David Byrne singing “Dog Eat Dog,” or James Taylor singing “Good Old Days,” none of these singers are on A.I. Hub, so each would require training a new voice model.

That shouldn’t be a big surprise: after spending some time in A.I. Hub, I get the sense that it skews young, and some of those older artists are maybe off their radar, just based on the voice models, covers, and requests they’re making. My guess is that many of those 500,000 users in A.I. Hub are enthusiastic and motivated teenagers.

The vast majority of what happens in A.I. Hub is non-commercial: the models are distributed freely and people are posting their YouTube-hosted A.I. covers constantly, though some people do take paid commissions to train voice models in the #request-a-model channel.

Like with so many conversations around generative A.I., I’m left with big questions around the ethics and legality of these tools. Some artists like Holly Herndon are excited about it and happy for others to use their voice in this way. Some, like Grimes, are okay with commercial use if they get a cut. Others want nothing to do with it, regardless of whether it’s free or not.

I first wrote about audio deepfakes here in April 2020, when Jay-Z asked YouTube to take several deepfake audio parodies of his voice offline. Those were obvious parodies, but back then I wrote:

“It’s easy to imagine a court finding that many uses of this technology would infringe copyright or, in many states, publicity rights. For example, if a record producer made Jay-Z guest on a new single without his knowledge or permission, or if a startup made him endorse their new product in a commercial, they would have a clear legal recourse.”

That’s now the situation artists are facing with pseudonymous producers like Ghostwriter, who use the names and voices of well-known artists to drive popularity for their own songs, without those artists’ knowledge, consent, or compensation. The music industry’s reaction to “Heart on My Sleeve” was swift: takedowns went out to every streaming platform it was uploaded to. Ghostwriter followed up with another song last month using A.I. versions of Travis Scott and 21 Savage, uploaded only to X and TikTok. (TikTok removed it quickly, but it’s still up on X.)

The recording industry seems likely to continue clamping down on commercial use of A.I. vocals, but ultimately, I don’t think it will do anything to stop them from being made.

Half a million excited kids are out there in Discord doing their thing, and more are joining every day. No copyright intended.

(Special thanks to Leonard Lin, Simon Willison, and Greg Knauss for their valuable feedback on early drafts of this post.)

Update

Yesterday, Discord permanently banned A.I. Hub, presumably because of repeated copyright violations, only two days after I published this article.

TorrentFreak’s Ernesto Van der Sar was first to report the story, citing an unconfirmed report from a newly-created server that claims “AI Hub was banned because of copyright, apparently someone did the trick of editing posts and added several links with copyrighted content, which left Discord with no option but to DMCA the server.”

Comments

I wish I had more to say than “thank you for this fantastic post,” but that’s all I’ve got.

All the models on aihub were backed up to https://www.weights.gg/

Also if you have a mac you can use their tool to run inference locally, https://www.tryreplay.io/

I find it easier to get the voice down by making two versions: one with more likeness to the new voice, and one with more likeness to the original song, with better and clearer pronunciation. Then I can fix the first one with the second one at times when the words get muddy. You can also add more range the same way, by doing multiple versions at slightly different pitches and taking the best parts from each.

Also adding some separate original voice lines from the replacing singer can have a big psychological effect too. Something like them saying “hello everyone”, or just “ok” added before the song starts for example.

I wish I could say this stuff is cool and I definitely get the fun that kids get from it. But it’s always at a cost. Cory Doctorow talks about this indirectly in Chokepoint Capitalism, but all of this comes at the expense of artists who were underpaid, never cited, and frankly just erased from history, and who are now even further removed from the art. I do get a chuckle out of hearing Patrick from Spongebob sing “Umbrella” by Rihanna but I could have also lived without it. And I’m the one who loses the least.

(Originally posted at https://jacky.wtf/2023/11/bwKT)
