OpenAI’s Jukebox Opens the Pandora’s Box of AI-Generated Music

Today, research laboratory OpenAI announced Jukebox, a sophisticated neural network trained on 1.2 million songs with lyrics and metadata, capable of generating original music in the style of various artists and genres, complete with rudimentary singing and vocal mannerisms.

The Jukebox AI can generate new music in a genre or artist’s style, guided with lyrics and an optional audio prompt, or completely unguided.

Note that Jukebox doesn’t generate lyrics: it can only sing lyrics when they’re provided as input. Without lyrics for guidance, Jukebox generates nonsensical vocal utterances in the style of the original singer. (The lyrics in the Curated Samples section of the Jukebox announcement were generated with an unrelated language model, GPT-2, and used as playful sample input text.)

The resulting work is a clear leap forward in musical quality, though it comes with some limitations.

“While Jukebox represents a step forward in musical quality, coherence, length of audio sample, and ability to condition on artist, genre, and lyrics, there is a significant gap between these generations and human-created music.

For example, while the generated songs show local musical coherence, follow traditional chord patterns, and can even feature impressive solos, we do not hear familiar larger musical structures such as choruses that repeat.”

Even with those limitations, the results are just incredible to explore. I recommend starting with the featured samples from the blog post, and then diving into the uncurated library of over 7,100 song samples.

Highlights

Just digging around the sample library, I found so many intriguing examples. It’s the uncanny valley of music: machine-hallucinated melodies and nonsensical DeepDream-esque vocals, but often capturing the style and mannerisms of the artist it’s trying to mimic.

In this example, the Jukebox AI is fed the lyrics from Eminem’s “Lose Yourself” and told to generate an entirely new song in the style of Kanye West.

With no lyrics for guidance, the AI tries to generate an entirely new David Bowie song. Have fun making out the lyrics!

Again, with no lyrics to guide it, the AI tries to generate an entirely new Prince song. I asked Anil Dash about it, and he said it sounded like it was trained heavily on Prince’s 2000s-era work.

A neural network tries to write a Tori Amos song.

A.I.-generated Al Green is pretty listenable. If the audio fidelity were better, I’d put this on at a dinner party. The machine-generated vocal utterances (you can’t really call them lyrics) are nonsense, but it hardly matters.

In one of the stranger examples, the OpenAI researchers fed the lyrics of Avril Lavigne’s “Dumb Blonde” to the model—and told it to make a Talking Heads song, complete with David Byrne’s vocal mannerisms.

For the Continuations collection, researchers prompted the AI with the real lyrics and first 12 seconds of the original song, and then just… let it loose. Listen to this version of David Bowie’s “Space Oddity” that rapidly goes sideways once the leash is off.

I wonder if this is what Let It Be-era Beatles sounds like to people who hate the Beatles and/or don’t speak English.

Find any great ones in the collection? Post a comment with your favorites.

Unfortunately, making your own songs won’t be as easy. While the code is available, OpenAI says it takes three hours to render 20 seconds of audio on an NVIDIA Tesla V100, a $10,000 GPU. You can experiment with it on Google Colab for short, low-quality samples, but rendering times and memory limits may make it challenging.
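
To get a feel for what that rendering rate means in practice, here is a back-of-the-envelope sketch. It takes the one figure OpenAI quotes (three hours per 20 seconds of audio on a V100) and assumes, purely for illustration, that render time scales linearly with the length of the audio. The function name and the example durations are my own, not anything from the Jukebox code.

```python
# Figure quoted by OpenAI: about 3 hours to render 20 seconds of audio on one V100.
RENDER_HOURS = 3.0
AUDIO_SECONDS = 20.0


def estimated_render_hours(song_seconds: float) -> float:
    """Rough render-time estimate for a clip of the given length.

    ASSUMPTION: cost scales linearly with audio length. Real rendering
    times may differ; this is only arithmetic on the quoted figure.
    """
    if song_seconds < 0:
        raise ValueError("song_seconds must be non-negative")
    return song_seconds * RENDER_HOURS / AUDIO_SECONDS


if __name__ == "__main__":
    # Hypothetical clip lengths, chosen only for illustration.
    examples = [("20-second clip", 20), ("1-minute sample", 60), ("3-minute song", 180)]
    for label, seconds in examples:
        print(f"{label}: about {estimated_render_hours(seconds):.0f} hours")
```

Under that linear assumption, a one-minute sample works out to roughly nine hours and a three-minute song to roughly 27 hours on a single V100.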

Legality

Just two days ago, I wrote about how Jay-Z ordered two deepfaked audio parodies off YouTube, the first known example of someone claiming copyright over an AI voice impersonation and the first time YouTube removed a video for it.

One of the OpenAI researchers on the project addressed the legality question directly, stating that they believe training the AI on copyrighted material is fair use, but have sought clarification from the U.S. Patent and Trademark Office.

But what about using the AI to generate new music? If I make a new album of Britney Spears songs, in her style and in her voice, who owns the copyright for that work?

I’d refer to the discussion of copyright and fair use from my earlier post, which applies here across the board. In short, it depends on how it’s used.

New music generated from a corpus of copyrighted music by a single artist may be considered a derivative work, in which case, only the original elements would be protected by copyright—and what constitutes “original” in this context? Machine-generated melodies and lyrics? The vocal performance? We’re in untested legal waters.

While there’s no federal law for personality rights, many states have recognized the right to control your likeness for commercial use, either by common law or statutes. In one notable example from 1988, Bette Midler was able to win her case against Ford Motor for their use of a sound-alike singer in advertising.

But typically, personality rights statutes would only apply to commercial uses, and not the wide array of non-commercial use for creative remixing.

Even if it’s found to be copyright infringement, the use of AI-generated music for parody, criticism, and commentary should be protected under fair use, but only a court can decide that on a case-by-case basis.

The Future Is Here

In Robin Sloan’s first novella, Annabel Scheme, a quantum computer populates a massive file server with music that never existed in this dimension.

Until this year, Annabel Scheme’s file server was the stuff of science fiction.

With the release of OpenAI’s Jukebox, the future is here and the world of music just got much, much weirder.

Comments

    Fascinating. I love how this mirrors the low-quality technical artifacts of early gramophones or badly-transmitted radio signals — we can imagine that over time, just like these old technologies, quality will become crystal clear.

    I didn’t spot any Creative Commons license on OpenAI’s music, it would be nice. For starters, I’d be curious what Google’s Closed Captioning AI would interpret the lyrics as. One AI listening to another…
