YouTube's Content ID Disputes Are Judged by the Accuser

Posted March 2, 2012 by Andy Baio (updated August 25, 2023)

Last Friday, a YouTube user named eeplox posted a question to the support forums, regarding a copyright complaint on one of his videos. YouTube’s automated Content ID system flagged a video of him foraging a salad in a field, claiming the background music matched a composition licensed by Rumblefish, a music licensing firm in Portland, Oregon.

The only problem? There is no music in the video; only bird calls and other sounds of nature.

Naturally, he filed a dispute, explaining that the audio couldn’t possibly be copyrighted.

The next day, amazingly, his dispute was rejected. Not by YouTube itself (it’s unlikely that a Google employee ever saw the claim) but by a representative at Rumblefish, who reviewed the dispute and reported back to YouTube that their impossible copyright for nonexistent music was indeed violated.

Back at YouTube, eeplox found himself at a dead end. YouTube now stated, “All content owners have reviewed your video and confirmed their claims to some or all of its content.” No further disputes were possible; the case was closed.

Whether caused by a mistake or malice, Rumblefish was granted full control over eeplox’s video. They could choose to run ads on the video, mute the audio, or remove it entirely from the web.

A History of Screw-Ups

On Sunday night, Reddit took notice. Within hours, the thread was on the homepage, commenters were freaking out and, to his credit, Rumblefish CEO Paul Anthony was fielding questions in an IAmA interview until 2:30am.

His argument: One of Rumblefish’s Content ID reps made a mistake by denying the dispute, and they released the claim on Sunday night. “We review a substantial amount of claims every day and the number is increasing significantly,” said Anthony. “We have millions of videos now using our songs as soundtracks and keeping up is getting harder and harder.”

This is the latest in a long series of foibles or outright abuses of YouTube’s Content ID system. Content ID was intended to help copyright holders manage the chaos of YouTube. They’d provide copies of their audio and video for analysis, which YouTube would then algorithmically match against newly uploaded videos. If a match was found, rightsholders could automatically block the video or, increasingly, claim money from video advertising.

Content ID’s monetization was a huge boon for copyright holders. Uploaders could keep their videos online, while copyright holders profited from the creative reuse of their work.

But the last couple of years have seen a dramatic rise in Content ID abuse, with the system being used for purposes for which it was never intended. Scammers are using Content ID to steal ad revenue from YouTube video creators en masse, with some companies claiming content they don’t own, deliberately or not. The system’s inability to understand context and parody regularly leads to “fair use” videos getting blocked, muted or monetized.

Bypassing the DMCA

The problem is that media companies and scammers are using Content ID as an end run around the DMCA.

With the DMCA, the process worked like this. A rightsholder could file a claim against a video with YouTube, and YouTube would immediately take the video offline. If there was a mistake, the uploader could file a counter-notice. The video would then be restored by YouTube within 10-14 business days of the counter-notice, unless it went to court.

It wasn’t perfect, by any means, but it was fair. Disputes could always be appealed, and both parties were given equal power. And if a claimant lied about owning the copyright to the material in question, they could face perjury charges.

The current system, led by Content ID, tips the balance far in favor of the claimant.

Rumblefish never needed to prove they were the copyright holder, but were still given ultimate control over the video’s fate. Uploaders can dispute claims, but the only people reviewing claims are the Content ID partners that filed the claim in the first place, who are free to deny them wholesale.
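To make the asymmetry concrete, here’s a toy Python model of the two dispute paths described above. The enum and function names are made up for illustration, and the logic is a simplification of the process as described in this post, not a statement of how YouTube’s systems or the statute actually work.

```python
from enum import Enum


class Outcome(Enum):
    RESTORED = "video restored"
    CLAIM_STANDS = "claim stands"
    COURT = "decided in court"


def dmca_dispute(claimant_files_suit: bool) -> Outcome:
    """Simplified DMCA path: after a counter-notice, the video comes back
    unless the claimant takes the dispute to court."""
    return Outcome.COURT if claimant_files_suit else Outcome.RESTORED


def content_id_dispute(claimant_confirms_claim: bool) -> Outcome:
    """Simplified Content ID path: the party that filed the claim is also
    the party that reviews the dispute."""
    return Outcome.CLAIM_STANDS if claimant_confirms_claim else Outcome.RESTORED


if __name__ == "__main__":
    print(dmca_dispute(claimant_files_suit=False).value)
    print(content_id_dispute(claimant_confirms_claim=True).value)
```

In the first function, keeping a video down costs the claimant a lawsuit. In the second, it costs them one click.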

A Simple Fix

The solution is simple: if a copyright holder wants to pursue a disputed Content ID match, they should file a DMCA claim. That’s the only way to guarantee their rights, and make the copyright holder legally responsible for telling the truth.

In fact, this is exactly how YouTube says that Content ID “fair use” claims should work. In practice, this doesn’t appear to be true any longer. Content ID partners, of course, can file a DMCA notice at any time, but why bother if they can reject the counter-claims themselves?

(Preferred partners like Universal Music Group can go a step further and block videos directly without filing a claim.)

This problem has been on YouTube’s radar for at least two years, but it’s only getting worse as unsavory companies discover this nascent business model. Claim copyright on media you may or may not own, and let Content ID do the rest.

By letting Content ID partners have the final word, and not trusting their own users, YouTube is violating its trust with its community and damaging fair use in the process.

Update

I originally published this article over at Wired, where a commenter pointed out that this process may actually violate YouTube’s “safe harbor” granted through the DMCA. If they choose to ignore disputes, they’re effectively giving content providers an end run around fair use and the DMCA.

Selfish Crab wrote:

It seems like by providing the Content ID system, Youtube was trying to pre-emptively identify copyrighted material, like a first-pass dispute system. Their lawyers probably concluded that so long as the content ID system falls back onto DMCA takedown procedure, they are still in compliance with the DMCA sufficiently to retain their safe harbor.

So if Content ID claim disputes do not fall back onto DMCA takedown, as Andy’s article suggests, there’s a case to be made that YouTube no longer has liability protection from users. It is a whole another can of worms to analyze what a legal claim against youtube would look like. You’d have to look at the YouTube Terms of Service (i.e., the contract) to see if maybe they contracted around this problem already, you’d have to figure out damages, etc etc. Or I guess you can just raise a shitstorm and that’s enough of a moral victory.

In a Google+ comment last December, senior copyright counsel for Google and former EFF staff attorney Fred von Lohmann acknowledged the problem.

Yes, we’re aware of that problem in the Content ID dispute process and are looking at what we can do to fix it. It’s the result of a complicated collision of how to handle geographically limited Content ID claims, disputes, and global DMCA removals. Turns out to be a hard problem to figure out. But we’re thinking on it.

Virginia law student Patrick McKay got in touch with Annie Baxter, a public relations manager at YouTube, about this issue.

This is one of those corner-case outcomes that emerges from several different rules, none of which was intended to yield the result you’ve encountered (i.e., DMCA takedowns are global, but Content ID ownership claims are territorial). Unfortunately, addressing it YouTube-wide is going to take some time, both for pondering and implementing.

So while we can promise you that we’re thinking about this, we can’t promise you a fix or time-table. And feel free to tell the OVC we’re looking at it and trying to come up with something.

In the meantime, anyone in the Content ID program is offered free rein to claim copyright on your videos and profit directly from them. I’m hoping this gets cleared up soon.

Introducing Playfic

Posted February 15, 2012 by Andy Baio

So, I made a weird new thing with my 15-year-old nephew, Cooper McHatton. It’s experimental and has lots of rough edges, but quite frankly, I’m tired of working on it, so here you go.

Playfic is a community for writing, sharing, and playing interactive fiction games (aka “text adventures”) entirely from your browser, using a “natural language”-inspired language called Inform 7.

Inform 7 is incredibly awesome and weird. For example, this is a fully functional game:

East of the Garden is the Gazebo. Above is the Treehouse. A billiards table is in the Gazebo. On it is a trophy cup. A starting pistol is in the cup. In the Treehouse is a container called a cardboard box.

Type that into Playfic, and you end up with this simple game, ready to send to the world.

The official documentation is extensive, with a great manual and recipe book. I’ve collected a list of resources to help you get started.

For now, there’s very little documentation on Playfic itself, but you can click the “View game source” link on every game to see how it was made, and Cooper’s adding sample games from the official Recipe Book.

My hope is that Playfic opens up the world of interactive fiction to a much wider audience — young writers, fanfic authors, and culture remixers of all ages.

While the language can be tricky, building simple games is surprisingly easy. Cooper had never coded anything or made a game before trying Playfic, and within 30 minutes of futzing around, he’d made his first game.

Some stuff is broken and missing, but I’d love to hear what you make of it. Open to any and all feedback. Go make some games!

The Perpetual, Invisible Window Into Your Gmail Inbox

Posted February 11, 2012 by Andy Baio

The other day, I tried out Unroll.me, a clever new service that reads your inbox to let you unsubscribe from mailing lists and other unwanted e-mail flotsam with a single click.

As I was about to connect my Gmail account, my finger hovered over the “Grant access” button.

Wait a second. Who am I giving access to my Gmail account, anyway? There was no identifying information on their site: no company address, no team page listing the names of its team members, and only broken links to their privacy policy and terms of service.

For all I knew, it could be run by unscrupulous spammers or an Anonymous troll looking for lulz. And I was about to give them unfettered access to eight years of my e-mail history and, with password resets, the ability to access any of my online accounts?

I had to dig around online to find out who’s behind it, and fortunately, Unroll.me is a totally legit NYC-based startup providing a useful service. I spoke to Perri Blake Gorman, Unroll.me’s cofounder and CMO, who assured me they’ll add all the company information as they roll out their public beta.

But since Gmail added OAuth support in March 2010, an increasing number of startups are asking for a perpetual, silent window into your inbox.

I’m concerned OAuth, while hugely convenient for both developers and users, may be paving the way for an inevitable privacy meltdown.

The Road to OAuth

For most of the last decade, alpha geeks railed against “the password anti-pattern,” the common practice of web apps prompting you for your password to a third-party service, usually to scrape your e-mail address book to find friends on a social network. It was insecure and dangerous, effectively training users how to be phished.

The solution was OAuth, an open standard that lets you grant permission for one service to connect to another without ever exposing your username or password. Instead of passwords getting passed around, services are issued a token they can use to connect on your behalf.

If you’ve ever granted permission for a service to use your Twitter, Facebook, or Google account, you’ve used OAuth.

This was a radical improvement. It’s easier for users, taking a couple of clicks to authorize accounts, and passwords are never sent insecurely or stored by services who shouldn’t have them. And developers never have to worry about storing or transmitting private passwords.
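Here’s a rough Python sketch of the core idea: each app holds its own token instead of your password, so access can be listed and revoked one app at a time. This is a toy model, not the OAuth protocol itself; the class, method, and app names are all hypothetical, and a real provider also handles signing, scopes, and expiry.

```python
import secrets


class ToyAuthServer:
    """Hypothetical stand-in for a provider's token store (not real OAuth)."""

    def __init__(self):
        self._tokens = {}  # token -> (user, app)

    def grant(self, user, app):
        """Issue a random token to an app on the user's behalf; no password changes hands."""
        token = secrets.token_urlsafe(16)
        self._tokens[token] = (user, app)
        return token

    def can_access(self, token):
        """A token works only while the provider still has it on file."""
        return token in self._tokens

    def revoke_app(self, user, app):
        """Drop every token this user granted to this app; other apps are unaffected."""
        self._tokens = {
            t: owner for t, owner in self._tokens.items() if owner != (user, app)
        }

    def apps_for(self, user):
        """List the apps that currently hold a token for this user."""
        return sorted(app for (u, app) in self._tokens.values() if u == user)


if __name__ == "__main__":
    server = ToyAuthServer()
    token = server.grant("andy", "inbox-cleaner")
    print(server.can_access(token))   # True while the grant is on file
    server.revoke_app("andy", "inbox-cleaner")
    print(server.can_access(token))   # False after revocation
```

Note that nothing in this model expires on its own: a token stays valid until someone explicitly revokes it.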

But this convenience creates a new risk. It’s training people not to care.

It’s so simple and pervasive that even savvy users have no issue letting dozens of new services access their various accounts.

I’m as guilty as anyone, with 49 apps connected to my Google account, 80 to Twitter, and over 120 connected to Facebook. Others are more extreme. My friend Sam is a developer at Kickstarter, and he authorized 148 apps to use his Twitter account. Anil counted 88 apps using his Google account, with nine granted access to Gmail.

For Twitter, the consequences are unlikely to be serious since almost all activity is public. For Facebook, a mass leak of private Facebook photos could certainly be embarrassing.

But for Gmail, I’m very concerned that it opens a major security flaw that’s begging to be exploited.

The Privacy Danger

A long list of services, large and small, request indefinite access to your Gmail account.

I asked on Twitter and Google+ for people to check their Google app permissions to see who they’ve granted Gmail access to. The list includes a range of inbox organizers, backup services, email utilities, and productivity apps: TripIt, Greplin, Rapportive, Xobni, Gist, OtherInbox, Unsubscribe, Backupify, Blippy, Threadsy, Nuevasync, How’s My Email, ToutApp, ifttt, Email Game, Boomerang, Kwaga, Mozilla F1, 0boxer, Taskforce, and Cloudmagic.

Once granted, all of these services are issued a token that gives unlimited access to your complete Gmail history. And that’s where the danger lies.

You may trust Google to keep your email safe, but do you trust a three-month-old Y Combinator-funded startup created by three college kids? Or a side project from an engineer working in his 20 percent time? How about a disgruntled or curious employee of one of these third-party services?

Any of these services becomes the weakest link to access the e-mail for thousands of users. If one’s hacked or the list of tokens leaked, everyone who ever used that service risks exposing his complete Gmail archive.

The scariest thing? If the third-party service doesn’t discover the hack or chooses not to invalidate its tokens, you may never know you’re exposed.

In the past, Gmail’s issued security warnings to accounts being accessed from multiple IP addresses. I spoke to OtherInbox founder Joshua Baer, and he said that Google’s eased up on the warnings because of the prevalence of third-party services.

It’s entirely possible for someone with a stolen token to read, search, and download all your mail to their server for months, and you’d never find out unless they exposed themselves, or you were diligently auditing your “Last account activity” history.

Stay Safe

Clearly, we’re not going to stop using awesome new utilities just because there’s a privacy risk. But there are best practices you can follow to stay safe.

Clean up your app permissions. The best thing you could do, right now, is to log into each service you care about and revoke access to the apps you no longer use or care about, especially those that have access to Gmail. Finding the permissions pages can be tricky, but the nice folks at MyPermissions.org made a handy dashboard linking to every one.
Think before you authorize. Before authorizing an account, find out who you’re granting access to. Look for a staff page and a contact address, and read the privacy policy to make sure they’re not sharing your info with, or selling it to, third parties. Bonus points if they outline their security policies and offer a way to disconnect the service from within the app. If anything seems off, don’t do it.
When in doubt, change your password. Have a feeling that someone might be reading your mail, but not sure which app is to blame? Changing your password instantly invalidates all your Google and Facebook OAuth tokens, though Twitter tokens persist after password changes.

Google could improve, as well. Their permissions page is too hard to find, even for experienced users, and it’s impossible to see which apps have accessed your account recently.

Facebook does an excellent job with this, but Google only shows you the IP address and the protocol it used to connect. Surfacing this information, as a periodic e-mail or on-site notification, would go a long way to averting a potential disaster.

The Greatest Troll of All

So, I originally published everything above over on my Wired column yesterday, but I left off something else I’ve been thinking about.

While I think a compromised database is the most likely scenario, there’s another possibility that disturbs me more.

Imagine that a brand new app pops up, offering a simple, fun service that uses your Gmail account. Maybe a neat visualization like Tout’s Year in Review, or maybe something more practical like sending all your attachments to Dropbox.

But it’s all just a giant troll, where the app’s creators are silently running targeted searches, downloading your mail, and looking for compromising photos and sensitive documents behind-the-scenes. They could collect the documents for months or years, and then release it all online in an anonymous blast. Lulz!

You’d likely never find out where the data came from, and the perpetrators would never be caught. Hell, if you’ve Gmail-authed a questionable app, this could be happening to you right now and you’d never know. Whee!

Pirating the Oscars 2012: Ten Years of Data

Posted January 31, 2012 by Andy Baio

Every year, the MPAA tries desperately to stop Oscar screeners — the review copies sent to Academy voters — from leaking online. And every year, teenage boys battling for street cred always seem to defeat whatever obstacles Hollywood throws at them.

For the last 10 years, I’ve tracked the online distribution of Oscar-nominated films, going back to 2003. Using a number of sources (see below for methodology), I’ve compiled a massive spreadsheet, now updated to include 310 films.

This year, for the first time, I’m calling it: after three years of declines, the MPAA seems to be winning the battle to stop screener leaks. But why?

[Chart: number of films]

A record 37 films were nominated this year, and the studios sent out screeners for all but four of them. But, so far, only eight of those 33 screeners have leaked online, a record low that continues the downward trend from last year.

(Disclaimer: Any of this could change before the Oscar ceremony, and I’ll keep the data updated until then.)

They may be winning the battle, but they’ve lost the war.

While screeners declined in popularity, 34 of the nominated films (92 percent) were leaked online by nomination day, with 25 of them available as high-quality DVD or Blu-ray rips. Only three films — Extremely Loud & Incredibly Close, My Week with Marilyn and W.E. — haven’t leaked online in any form (yet!).

If the goal of blocking leaks is to keep the films off the internet, then the MPAA still has a long way to go.

There are a number of theories about what’s causing the decline.

It could be attributed to tighter controls — personalized watermarks, the aggressive prosecution of leakers, and greater awareness of the risks for Academy voters.

But the MPAA may have little to do with the decline. Oscar-nominated films could be coming out earlier in the year, making screeners less important.

Or maybe the interests of mainstream downloaders and the industry’s favorites are diverging? If the nominees are mostly arthouse fare and critical darlings with low gross receipts, they’ll be less desirable to leak online. It would be very interesting to track the historical box office performance of nominees to see how it affects downloading. (Maybe next year!)

The continuously shrinking window between theatrical and retail releases may be to blame. After all, once the retail Blu-ray or DVD is released, there’s no reason for pirate groups to release a lower-quality watermarked screener.

The chart below tracks the window between a film’s U.S. release and its first DVD/Blu-ray leak online, which shows how the window between theatrical and retail release dates has been slowly closing since 2003.

Whatever the reason, online movie releasing groups are taking longer to pirate movies than ever. When I first started tracking releases in the early- to mid-2000s, the median time between a film’s theatrical release and its first leak online was 1 to 2 days. Now, that number’s crept up to over three weeks.

The rise in leak time correlates with a dip in popularity for lower-quality sources, like camcorder-sourced footage. This year, only eight of the 37 nominees (about 22 percent) were sourced from camcorder footage. (This is likely because there are fewer blockbuster nominees than in the mid-2000s.)

As the industry slowly transitions from physical media to streaming video, it’ll be interesting to see if the downward trend continues, or if the ease of capturing streaming video spawns a new renaissance for screeners. Last year, Fox Searchlight distributed screeners with iTunes, and all were quickly and easily pirated.

The Data Dump

Skeptical of my results? Want to dig into it yourself? Good! Here’s the complete dataset, available on Google Spreadsheets or downloadable as an Excel spreadsheet or comma-separated text file.

Methodology

I include the full-length feature films in every category except documentary and foreign films (even music, makeup, and costume design).

I use Yahoo! Movies for the release dates, always taking the first available U.S. date, even if it was a limited release, and falling back to the first available U.S. date on IMDb.

All the cam, telesync, and screener leak dates are taken from VCD Quality, supplemented by dates in ORLYDB. I always use the first leak date, excluding unviewable or incomplete nuked releases.
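For anyone who wants to reproduce the numbers, here’s a minimal Python sketch of the rule above: take the earliest leak that isn’t nuked, measure the days from U.S. release, and take the median. The field names and the three sample records are invented for illustration; they are not rows from the spreadsheet, and the real columns may be laid out differently.

```python
from datetime import date
from statistics import median


def first_leak(leaks):
    """Earliest leak date, ignoring releases flagged as nuked; None if nothing usable."""
    usable = [leak["date"] for leak in leaks if not leak["nuked"]]
    return min(usable) if usable else None


def days_to_first_leak(release_date, leaks):
    """Days from U.S. release to the first usable leak, or None if the film never leaked."""
    leaked_on = first_leak(leaks)
    return None if leaked_on is None else (leaked_on - release_date).days


# Hypothetical records, made up for this example.
films = [
    {"release": date(2011, 11, 1), "leaks": [
        {"date": date(2011, 11, 3), "nuked": True},    # skipped: nuked release
        {"date": date(2011, 11, 20), "nuked": False},
    ]},
    {"release": date(2011, 12, 1), "leaks": [
        {"date": date(2011, 12, 31), "nuked": False},
    ]},
    {"release": date(2011, 12, 25), "leaks": []},      # never leaked
]

delays = [days_to_first_leak(f["release"], f["leaks"]) for f in films]
leaked = [d for d in delays if d is not None]
median_delay = median(leaked)
```

Films that never leaked are left out of the median rather than counted as zero, and a film that leaks before its release date would simply show up as a negative number of days.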

The official screener release dates are from Academy member Ken Rudolph, who kindly lists the dates he receives each screener on his personal homepage. Thanks again, Ken!

For previous years, see 2004, 2005, 2007, 2008 (part 1 and part 2), 2009, 2010, and 2011.

Why SOPA and PIPA Must Die

Posted January 18, 2012 by Andy Baio

Today, you’re going to hear a million solid reasons why SOPA and PIPA — the two proposed bills sponsored by the entertainment industry to censor the web — have to die. Wikipedia, Google, Reddit, craigslist, Metafilter, and many, many more have made their cases. Here’s mine.

Virtually every project I’ve ever worked on is threatened by this legislation:

Upcoming.org faced copyright complaints for event posters and listings that users added to the site.

Kickstarter gets DMCA takedowns from artists who find their work used in pitch videos, and from project founders quarreling with each other.

Supercut.org indexes hundreds of video remixes that reuse copyrighted content.

Kind of Bloop faced a lawsuit over the cover art.

And here on Waxy.org, I’ve had a number of battles over copyright. Among them, I received a cease-and-desist from EMI for being the first person to host DJ Danger Mouse’s Grey Album on the web, from Disney for hosting the Kleptones’ Night at the Hip-Hopera, and from Bill Cosby for hosting House of Cosbys, which was clearly fair use as a parody.

None of the cease-and-desists and DMCA requests I’ve received were fun to find in my inbox, but each one allowed me to deal with the issue directly with the copyright holder or through the due process of the court system.

Imagine, instead, a world where a bill like SOPA or PIPA passes. A copyright holder could bypass due process entirely, demanding that search engines stop linking to my sites, that ad providers drop me, and that DNS providers stop resolving my domain name. All in the name of stopping piracy.

The chilling effect would be huge.

Every online community that allows for community-contributed content — discussion forums, imageboards, Usenet newsgroups, photo sharing communities, video sites, and many more — would be forced to pre-emptively self-censor, shut down, or risk getting blown off the net entirely.

That fucking sucks.

Everything I love about the web requires the unfettered freedom to build new ways to let people express themselves, and with that comes the risk of copyright infringement.

Breaking the web isn’t a solution.

Please take 10 minutes today to call your representatives (or show up in person!) and let them know you won’t stand for this. SOPA and PIPA must die.