Waxy.org

With questionable copyright claim, Jay-Z orders deepfake audio parodies off YouTube

Posted April 28, 2020 (updated April 29, 2020) by Andy Baio

On Friday, I linked to several videos by Vocal Synthesis, a new YouTube channel dedicated to audio deepfakes — AI-generated speech that mimics human voices, synthesized from text by training a state-of-the-art neural network on a large corpus of audio.

The videos are remarkable, pairing famous voices with unlikely dialogue: Bob Dylan singing Britney Spears, Ayn Rand and Slavoj Žižek dueting Sonny and Cher, Tucker Carlson reading the Unabomber Manifesto, Bill Clinton reciting “Baby Got Back,” or JFK touting the intellectual merits of Rick and Morty.

Many of the videos have been remixed by fans, adding music to create hilarious and surreal musical mashups. Six U.S. presidents from FDR to Obama rap N.W.A.’s “Fuck Tha Police,” George W. Bush covers 50 Cent’s “In Da Club,” Obama covers Notorious B.I.G.’s “Juicy,” and, my personal favorite, Sinatra slurs his way through the Navy Seal copypasta, a decade-old 4chan meme.

Videos Taken Offline

Over the weekend, for the first time, the anonymous creator of Vocal Synthesis received a copyright claim on YouTube. It took two of his videos offline: deepfaked audio of Jay-Z reciting the “To Be or Not To Be” soliloquy from Hamlet and Billy Joel’s “We Didn’t Start the Fire.”

According to the creator, the copyright claims were filed by Roc Nation LLC with an unusual reason for removal: “This content unlawfully uses an AI to impersonate our client’s voice.”

Both videos were immediately removed by YouTube, but can still be viewed on LBRY, a decentralized and open-source publishing platform. Two synthetic Jay-Z videos remained online, in which he raps the Book of Genesis and the Navy Seal copypasta. (Update: The videos were temporarily reinstated by Google. See updates below.)

The video’s creator announced the takedown in a creative way: using the voices of Barack Obama, Donald Trump, Ronald Reagan, JFK, and FDR.

Here’s an excerpt transcript from the video:

“Over the past few months, the creator of the channel has trained dozens of speech synthesis models based on the speech patterns of various celebrities or other prominent figures, and has used these models to generate more than one hundred videos for this channel. These videos typically feature a synthetic celebrity voice narrating some short text or a speech. Often, the particular text was selected in order to provide a funny or entertaining contrast with the celebrity’s real-life persona.

“For example, some of my favorites are George W. Bush performing a spoken-word version of “In Da Club” by 50 Cent, or Franklin Roosevelt’s powerful rendition of the Navy Seals Copypasta.

“The channel was created by an individual hobbyist with a huge amount of free time on his hands, as well as an interest in machine learning and artificial intelligence technologies. He would like to emphasize that all of the videos on this channel were intended as entertainment, and there was no malicious purpose for any of them.

“Every video, including this one, is clearly labeled as speech synthesis in both the title and description. Which brings us to the reason why we’re delivering this message.

“Over the past two days, several videos were posted to the channel featuring a synthetic Jay-Z rapping various texts, including the Navy Seals Copypasta, the Book of Genesis, the song “We Didn’t Start the Fire” by Billy Joel, and the “To Be Or Not To Be” soliloquy from Hamlet.

“Unfortunately, for the first time since the channel began, YouTube took down two of these videos yesterday as a result of a copyright strike. The strike was requested by Roc Nation LLC, with the stated reason being that it, quote, “unlawfully uses an AI to impersonate our client’s voice.”

“Obviously, Donald and I are both disappointed that Jay-Z and Roc Nation have decided to bully a small YouTuber in this way. It’s also disappointing that YouTube would choose once again to stifle creativity by reflexively siding with powerful companies over small content creators. Specifically, it’s a little ironic that YouTube would accept “AI impersonation” as a reason for a copyright strike, when Google itself has successfully argued in the case of “Authors Guild v. Google” that machine learning models trained on copyrighted material should be protected under fair use.”

No Intent to Deceive

At its core, the controversy over deepfakes is about deception and disinformation. Earlier this year, Facebook and Twitter banned deepfakes that could mislead or cause harm, largely motivated by their potential impact on the 2020 elections.

It’s worth noting, though, that the use of deepfakes for fake news is largely theoretical so far, as Samantha Cole covered for VICE, with most created for porn. (And, no, Joe Biden sticking his tongue out is not a deepfake.)

In this case, there’s no deception involved. As he wrote in his statement, every Vocal Synthesis video is clearly labeled as speech synthesis in the title and description, and falls outside of YouTube’s guidelines for manipulated media.

Copyright and Fair Use

With these takedowns, Roc Nation is making two claims:

  1. These videos are an infringing use of Jay-Z’s copyright.
  2. The videos “unlawfully uses an AI to impersonate our client’s voice.”

But is either of these true? With a technology this new, we’re in untested legal waters.

The Vocal Synthesis audio clips were created by training a model with a large corpus of audio samples and text transcriptions. In this case, he fed Jay-Z songs and lyrics into Tacotron 2, a neural network architecture developed by Google.
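
As a rough illustration of what pairing "audio samples and text transcriptions" can look like in practice, here is a minimal sketch that builds a training filelist from a folder of clips and a dictionary of transcripts. The creator's actual code is not public, so everything here is an assumption: the function name, the idea of keying transcripts by file name, and the pipe-separated `path|text` layout, which is borrowed from common open-source speech synthesis setups rather than anything described in this post.

```python
from pathlib import Path


def build_filelist(clips_dir, transcripts):
    """Pair each audio clip with its transcript as 'path|text' lines.

    `transcripts` maps a clip's file stem (a hypothetical naming scheme)
    to its text. Clips with a missing or empty transcript are skipped,
    since training needs both halves of every pair.
    """
    lines = []
    for wav in sorted(Path(clips_dir).glob("*.wav")):
        # Collapse runs of whitespace so each entry stays on one line.
        text = " ".join(transcripts.get(wav.stem, "").split())
        if text:
            lines.append(f"{wav.as_posix()}|{text}")
    return lines
```

Skipping clips without a transcript mirrors the problem the creator describes later in the interview: audio alone is not enough without accurate text to go with it.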

It seems reasonable to assume that a model and audio generated from copyrighted audio recordings would be considered derivative works.

But is it copyright infringement? Like virtually everything in the world of copyright, it depends—on how it was used, and for what purpose.

It’s easy to imagine a court finding that many uses of this technology would infringe copyright or, in many states, publicity rights. For example, if a record producer made Jay-Z guest on a new single without his knowledge or permission, or if a startup made him endorse their new product in a commercial, he would have clear legal recourse.

But, as the Vocal Synthesis creator pointed out, there’s a strong case to be made this derivative work should be protected as a “fair use.” Fair use can get very complicated, with different courts reaching different outcomes for very similar cases. But there are four factors judges use when weighing a fair use defense in federal court:

  1. The purpose and character of the use.
  2. The nature of the copyrighted work.
  3. The amount and substantiality of the portion taken.
  4. The effect of the use upon the potential market.

There’s a strong case for transformation with the Vocal Synthesis videos. None of the original work is used in any recognizable form. It’s not sampled in a traditional way; instead, the model draws on an undisclosed set of vocal samples, stripped of their instrumentals and context, to generate an amalgam of the speaker.

And in most cases, it’s clearly designed as parody with an intent to entertain, not deceive. Making politicians rap, philosophers sing pop songs, or rappers recite Shakespeare pokes fun at those public personas in specific ways.

Vocal Synthesis is an anonymous and non-commercial project: the channel isn’t monetized with advertising, there’s no clear financial benefit to the creator, and the impact on the market value of Jay-Z’s discography is non-existent.

There are questions about the amount and substantiality of the borrowed work. But even if the model was trained on everything Jay-Z ever produced, it wouldn’t necessarily rule out a fair use defense for parody.

Ultimately, there are two clear truths I’ve learned about fair use from my own experiences: only a court can determine fair use, and while it might be a successful defense, fair use won’t protect you from getting sued and the costs of litigating are high.

Interviewing the Creator

As far as I know, this is the most prominent example of a celebrity claiming copyright over their own deepfakes, the first example of a musician issuing a takedown of synthesized vocals, and according to the creator, the first time YouTube’s removed a video for impersonating a voice with AI. (Previously, Conde Nast took down a Kim Kardashian deepfake by claiming copyright over the source video, and Jordan Peterson ordered a voice simulator offline.)

I reached out to the anonymous creator of Vocal Synthesis to learn more about how he makes these videos, his reaction to the takedown order, and his concern over the future of speech synthesis. (Unfortunately, Roc Nation didn’t respond to a request for comment.)


How do you feel about the takedown order? Were you surprised to receive it?
I was pretty surprised to receive the takedown order. As far as I’m aware, this was the first time YouTube has removed a video for impersonating a voice using AI. I’ve been posting these kind of videos for months and have not had any other videos removed for this reason. There are also several other channels making speech synthesis videos similar to mine, and I’m not aware of any of them having videos removed for this reason.

I’m not a lawyer and have not studied intellectual property law, but logically I don’t really understand why mimicking a celebrity’s voice using an AI model should be treated differently than someone naturally doing an (extremely accurate) impression of that celebrity’s voice. Especially since all of my videos are clearly labeled as speech synthesis in both the title and description, so there was no attempt to deceive anyone into thinking that these were real recordings of Jay-Z.

Can you talk a little about the effort that goes into generating a new model? For example, how long does it typically take to gather and train a new model until it sounds good enough to publish?
Constructing the training set for a new voice is the most time-consuming (and by far the most tedious) part of the process. I’ve written some code to help streamline it, though, so it now usually takes me just a few hours of work (it depends on the quality of the audio and the transcript), and then there’s an additional 12 hours (approximately) needed to actually train the model.

Are you using Tacotron 2 for synthesis?
Yeah, I’m using fine-tuned versions of Tacotron 2.

I saw you’ve struggled getting enough dialogue to fully develop some models, like with Mr. Rogers. Have there been other voices you’ve wanted to synthesize, but it’s just too challenging to find a corpus to work from?
Yeah, several. Recently I tried to make one for Theodore Roosevelt, but there’s only about 30 minutes of audio that exists for him (and it’s pretty poor quality), so the model didn’t really come out well.

The Crocodile Hunter (Steve Irwin) is another one I really want to do, and I can find enough audio, but I haven’t been able to find any accurate transcripts or subtitles yet (it’s very tedious for me to transcribe the audio myself).

How do you decide the voices and dialogue to pair together?
I try to consistently have all my voices read the Navy Seals Copypasta and the first few lines of the Book of Genesis, since it’s easier to hear the nuances of each voice when I can compare them to other voices reading the same text. Other than that, there’s no real method to it. If I have an idea for voice/text combination that I think would be funny or interesting enough to be worth the effort of making the video, then I’ll do it.

What do these videos mean to you? Is it more of a technical demonstration or a form of creative expression?
I wouldn’t really consider my videos to be a technical demonstration, since I’m definitely not the first to make realistic speech synthesis impersonations of well-known voices, and also the models I’m using aren’t state-of-the-art anymore.

Mainly, I’m just making these videos for entertainment. Sometimes I just have an idea for a video that I really want to exist, and I know that if I don’t make it myself, no one else will.

On the more serious side, the other reason I made the channel was because I wanted to show that synthetic media doesn’t have to be exclusively made for malicious/evil purposes, and I think there’s currently massive amounts of untapped potential in terms of fun/entertaining uses of the technology. I think the scariness of deepfakes and synthetic media is being overblown by the media, and I’m not at all convinced that the net impact will be negative, so I hoped that my channel could be a counterexample to that narrative.

Are you worried about the legal future for creative uses of this technology?
Sure. I expect that this technology will improve even more over the next few years, both in terms of accuracy and ease of use/accessibility. Right now it seems to be legally uncharted waters in some ways, but I think these issues will need to be settled fairly soon. Hopefully the technology won’t be stifled by overly restrictive legal interpretations.

It seems inevitable that, at some point, an artist’s voice is going to be used for other uses against their will: guesting on a track without permission, promoting products they aren’t paid for, or maybe just saying things they don’t believe. What would you say to artists or other public figures who are worried that this technology will damage their rights and image?
There are always trade-offs whenever a new technology is developed. There are no technologies that can be used exclusively for good; in the hands of bad people, anything can be used maliciously. I believe that there are a lot of potential positive uses of this technology, especially as it gets more advanced. It’s possible I’m wrong, but for now at least I’m not convinced that the potential negative uses will outweigh that.


Thanks to the anonymous creator of Vocal Synthesis for their time. You can subscribe to the YouTube channel (for now) for new videos, follow updates and remixes in the /r/VocalSynthesis subreddit, and find the video mirror on LBRY.

Update: I just heard from Vocal Synthesis’s creator that the copyright strike was removed, and both videos are back on his channel. I initially suspected that Roc Nation dropped the copyright claim, but Nick Statt at The Verge reported that Google reviewed the DMCA takedowns.

“After reviewing the DMCA takedown requests for the videos in question, we determined that they were incomplete,” a Google spokesperson tells The Verge. “Pending additional information from the claimant, we have temporarily reinstated the videos.”

If Roc Nation provides the missing information to complete the DMCA requests, the videos will go offline again. Or, given the press coverage, they may choose to let it go. We’ll see!


Paste Parties: The Ephemeral, Chaotic Joy of Random Clipboards

Posted April 23, 2020 by Andy Baio

Yesterday was my birthday, and like I’ve done for the last four years, I posted a single tweet that instantly destroyed my mentions for over 24 hours.

Today's my birthday, and all I want is yᴏᴜʀ ᴄʟɪᴩʙᴏᴀʀᴅ! Hit reply and paste—NO EDITING! ✂️📋🎉

— Andy Baio (@waxpancake) April 22, 2020

That tweet kicked off a paste party with over 2,000 replies, a potpourri of pure chaos and joy.

Random strings from emails and chat, passwords and 2FA tokens to unknown apps, screenshots and photos, obscure Unicode characters, dollar amounts from spreadsheets, bits of text in languages from Python to Esperanto, and so many links to articles, songs, videos, tweets, and obscure web pages.

It’s a momentary snapshot of digital ephemera, to be used and immediately discarded, much of it never meant to be seen by anyone and stripped of all context.

| grep -v "assigned or assigning state" | grep -v "ppp" | grep -v "AT&T" | grep -v Modem | grep -v Simple | grep -v tty | grep waiting

— Gregory (calm) (@g_pass) April 22, 2020

*changes his password immediately*

— Burak Yigit Kaya (@madbyk) April 22, 2020

c̶o̶o̶k̶i̶e̶s̶

— Gretchen McCulloch (@GretchenAMcC) April 23, 2020

I first saw this idea in a private file-sharing/discussion community, and tried it on Twitter back in 2012, giving away copies of games and movies to people who replied with the contents of their clipboard. (Those attempts netted 14 and 24 replies, respectively, but Twitter won’t show threaded replies for older tweets.)

But the idea goes back much further. Discussion forums and message boards have played variations of the “Ctrl+V Game” (or “Ctrl+V Threads”) since at least the early 2000s. Some of them ran for years, like this 12-year-long thread from Ants Marching with 4,500 replies.

The earliest examples I found are this Usenet thread from May 2001 (thanks, Ben!) and this thread from October 2001, but pre-2001 digital archives are hard to search these days. I wouldn’t be surprised if this idea went back to forums, Usenet, and BBSes in the ’80s or ’90s. (Add a comment if you know more!)

pic.twitter.com/qufnYlPrfF

— Jeff Atwood (@codinghorror) April 22, 2020

It's mom's birthday, yeah? Something about the numbers always messes it up in my head, even though I have it in my calendar.

— Qathi Hart (@Qathi) April 22, 2020

pic.twitter.com/xntglu8ZuB

— Danielle (@djbaskin) April 23, 2020

Without context, everything seems more mysterious. You wonder what it meant, or why someone had it in their clipboard.

1. Sarah Paulson
2. Talullah Bankhead
3.

— S. M., Esq. 🖤🤟 (@ForkingSupreme) April 22, 2020
https://twitter.com/aidanz/status/1253069039875784719

https://t.co/vEYNCfEL8I

— Jolene (@jocamo1980) April 23, 2020

Its mind bending we can get yeast from practically the point we settled down as a species to grow food rather than forage

— Andrew Singleton (@singletona) April 22, 2020

MIDI Stuff?

— Cyber City Circuits 🐼 (@MakeAugusta) April 22, 2020

Less performative love making, please

— Rev. Dr. Uncle Steven (@swestdahl) April 22, 2020

5. Try to focus on the present. In my divorce I spent a lot of time and energy both running post mortems in my head, trying to figure out how things had gotten to this point, and worrying about what my life would look like when it was over.

— Jordan Running (@swirlee) April 23, 2020

It’s a great way to discover interesting links to music, video, articles, and web pages, because if it was in someone’s clipboard, it probably means they found it interesting enough to send to someone.

https://t.co/5ChGutokrT

— Callie (@calliesaurus) April 22, 2020

https://t.co/uzmIocUTOa

— Cassie M. (@cassmarketos) April 22, 2020

https://t.co/D0I2IE3Qhi

— Simone Giertz (@SimoneGiertz) April 22, 2020

https://t.co/sNsjoPJbxi

— ark patrol (@ArkPatrol) April 23, 2020

Our clipboards show temporary glimpses of work in progress, whether it’s art, design, or code.

🖌🥳🎨 pic.twitter.com/q17AlePudK

— Grafera (@Grafera) April 22, 2020

// colors
this.colors = new Float32Array([
1, 1, 1, 1, // white
1, 0, 0, 1, // red
0, 1, 0, 1, // grean
0, 0, 1, 1, // blue
1, 1, 0, 1, // yellow
1, 0, 1, 1 //

— Dylan Ascencio (@SageOfMirrors) April 22, 2020

pic.twitter.com/t7BFCCzdIW

— Richard Perez (@SkinnyShips) April 22, 2020

pic.twitter.com/WtIMGW1Rd8

— Joel Califa (@notdetails) April 22, 2020

And so many good videos.

https://t.co/dM0BZLOfIi

— Patrick Burke 🌹🔥 (@secretpeej) April 22, 2020

https://t.co/hZsQaxaije

— Ellen K. Pao (@ekp) April 22, 2020

https://t.co/3xyhfj68OL https://t.co/bTLrxruBnx

— Sean Masn (@seanmasn) April 22, 2020

https://t.co/1vUEljhppG

— fiona (@fioroco) April 22, 2020

https://t.co/q5027H07QP

— Scott Martin (@hex) April 23, 2020
https://twitter.com/billyterr/status/1253089776938311680

It’s also a snapshot of a moment in time: we’re at the height of a global pandemic, and our clipboards reflect it in the content we’re copying.

People should still limit interactions except with immediate household

— Laura Howe (@LauraAnneHowe) April 23, 2020

If everyone is wearing masks, can we start to take photos of crowds for publication without needing a signed model release? #randomthoughts

— Greg Rose (@gregrose) April 23, 2020

Wife just found out her uncle died from kidney failure after contracting the coronavirus. They were estranged and she's doing okay, but I think we're all going to end up knowing at least one person this killed.

— Ryan Acheson (@plagiarize) April 22, 2020

Clorox wipes

Hand soap

Toilet paper

— Chip Crary (@ccrary) April 23, 2020
https://twitter.com/aflawson/status/1253113690896949248

I'll have as many masks ready to ship down to BC on Friday as I can before I leave.

— big boops (@pajamatammy) April 23, 2020
https://twitter.com/angledge/status/1253068732030574592
https://twitter.com/bvac/status/1253142259861721099

https://t.co/eraMhD5fxX

— Tempest (@tempest9) April 23, 2020
https://twitter.com/failladrum/status/1253071195924123648

Is it 2021 yet?

— Saad (@iSaadSalman) April 22, 2020

This tiny peek into everyone’s lives — their work, interests, and concerns, or even just the mundane momentary ephemera that’s forgotten two seconds later — is the perfect birthday gift.

Thanks for the presents. See you next year. ✂️📋🎉


Kickstarting Flatter Me: A Compliment Battle Card Game

Posted April 14, 2020 by Andy Baio

Three years ago, my wife Ami designed and developed her first game, a charming conversational card game called You Think You Know Me, which went on to sell over 9,000 copies around the world and is now close to selling out its second print run.

I loved helping out with the package and card design for You Think You Know Me, a return to my pre-web career in desktop publishing and print production, as well as making the official homepage to support it. (The cards are all CSS!)

The followup to her first game is Flatter Me, a new game where you compete with friends to give compliments, with rules similar to the classic card game of War. It takes literally seconds to learn, explained in full in the project video below.

Each of the 250 cards has a unique compliment on it, which you can give away as a little token of affection.

Once again, I helped out with the packaging and card designs, and if it hits its goal, you can expect to see a site at flatterme.cards once it’s officially on sale.

I know I’m biased, but Ami’s games have a gentle sweetness that really resonates with me. They’re all designed to bring people together, whether it’s by learning more about people you love or simply by telling them how much they mean to you.

Her games have rules and win conditions like any other card game, but they’re so quick and easy to understand that they become a convenient framework to enrich the connections between friends, family, and partners.

Flatter Me is now funding on Kickstarter, currently at 95% funded (!) with three days to go, and I’d love it if you checked it out or helped spread the word. Thanks!


Bbbreaking News: Discovering Amateur News Videos by Monitoring Journalists on Twitter

Posted December 10, 2019 (updated April 14, 2020) by Andy Baio

If you’ve ever looked at the replies on any newsworthy amateur video posted to Twitter, you’ll see an inevitable chorus of news organizations and broadcast journalists, usually asking two questions:

  1. Did you shoot this video?
  2. Can we use it on all our platforms, affiliates, etc with credit?

That gave me an idea, which I posted to Twitter.

I bet you could make a great breaking news site that just monitors this Twitter search of media properties asking for permission to broadcast user videos, and scoops them by automatically posting the most active videos. https://t.co/xP3160ezHQ

— Andy Baio (@waxpancake) August 1, 2019

Within two days, a talented developer named Corey Johnson made it real by launching Bbbreaking News.
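
To make the idea in that tweet concrete, here is a minimal sketch of the core step: spot replies that look like permission requests, then rank the original videos by how many requests they attracted. This is not Corey's implementation, which isn't described here. The phrase list, the `rank_requested_videos` name, and the `in_reply_to` and `text` keys are all hypothetical, and fetching the replies from Twitter is left out entirely.

```python
import re
from collections import Counter

# Hypothetical phrases; real requests from newsrooms vary widely in wording.
REQUEST_PATTERN = re.compile(
    r"did you (shoot|take|film|record) this|"
    r"(may|can|could) (we|i) use (it|this|your)|"
    r"permission to use",
    re.IGNORECASE,
)


def rank_requested_videos(replies, top_n=5):
    """Count permission-request replies per original tweet, most requested first.

    `replies` is an iterable of dicts with hypothetical keys 'in_reply_to'
    (the ID of the tweet holding the video) and 'text' (the reply body).
    """
    counts = Counter(
        reply["in_reply_to"]
        for reply in replies
        if REQUEST_PATTERN.search(reply["text"])
    )
    return counts.most_common(top_n)
```

Counting requests per video is one simple way to read "most active"; a real site would likely also weigh recency and who is asking.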

I’ve returned regularly since Corey launched it and, as expected, it’s a powerful way of tracking a particular type of breaking news: visual stories with footage captured by normal people at the right place and right time.

Much of it is of interest only to local news channels: traffic accidents, subway mishaps, a wild animal on the loose, the occasional building fire.

But frequently, Bbbreaking News shows the impact of gun violence and climate change: a near-constant stream of active shooter scenarios, interspersed with massive brush fires, catastrophic flooding, and extreme weather events.

It’s a fascinating way to see the stories that broadcast media is currently tracking, and to view their sources before they can even report on them, captured by the people stuck in the middle.

I recommend checking it out. Thanks to Corey for running with the idea and saving me the effort of building it myself!


The Tools I Use: My Setup, 11 Years Later

Posted December 9, 2019 (updated April 14, 2020) by Andy Baio

On January 22, 2009, I linked to Daniel Bogan’s newly-launched Uses This (then called “The Setup”), an interview series where he asks interesting people about “the tools and techniques they use to get things done.”

Three days later, Daniel asked me on AOL Instant Messenger if I’d be open to doing an interview myself.

I happily agreed—and then waited nearly 11 years to get around to it, despite his occasional prodding.

Since he first asked me, Daniel’s published over 1,000 interviews with an incredibly interesting group of people spanning dozens of fields and professions.

So I finally sat down and wrote my answers, and the interview is now live.

I can’t say it’s particularly interesting or meaningful, but it might give you a glimpse into how I think about the tools I use to make things.

And Daniel? Thanks for your patience.
