Invasive Diffusion: How one unwilling illustrator found herself turned into an AI model

Posted November 1, 2022 (updated November 15, 2023) by Andy Baio

Last weekend, Hollie Mengert woke up to an email pointing her to a Reddit thread, the first of several messages from friends and fans, informing the Los Angeles-based illustrator and character designer that she was now an AI model.

The day before, a Redditor named MysteryInc152 posted on the Stable Diffusion subreddit, “2D illustration Styles are scarce on Stable Diffusion, so I created a DreamBooth model inspired by Hollie Mengert’s work.”

Using 32 of her illustrations, MysteryInc152 fine-tuned Stable Diffusion to recreate Hollie Mengert’s style. He then released the checkpoint under an open license for anyone to use. The model uses her name as the identifier for prompts: “illustration of a princess in the forest, holliemengert artstyle,” for example.

Artwork by Hollie Mengert (left) vs. images generated with Stable Diffusion DreamBooth in her style (right)

The post sparked a debate in the comments about the ethics of fine-tuning an AI on the work of a specific living artist, even as new fine-tuned models are posted daily. The most-upvoted comment asked, “Whether it’s legal or not, how do you think this artist feels now that thousands of people can now copy her style of works almost exactly?”

Great question! How did Hollie Mengert feel about her art being used in this way, and what did MysteryInc152 think about the explosive reaction to it? I spoke to both of them to find out — but first, I wanted to understand more about how DreamBooth is changing generative image AI.

Since its release in late August, I’ve written about the creative potential and complex ethical and legal debates unleashed by the open-source release of Stable Diffusion, explored the billions of images it was trained on, and talked about the data laundering that shields corporations like Stability AI from accountability.

By now, we’ve all heard stories of artists who have unwillingly found their work used to train generative AI models, the frustration of being turned into a popular prompt for people to mimic you, or how Stable Diffusion was being used to generate pornographic images of celebrities.

But since its release, Stable Diffusion could really only depict the artists, celebrities, and other notable people who were popular enough to be well-represented in the model training data. Simply put, a diffusion model can’t generate images with subjects and styles that it hasn’t seen very much.

When Stable Diffusion was first released, I tried to generate images of myself, but even though there are a bunch of photos of me online, there weren’t enough for the model to understand what I looked like.

Real photos of me (left) vs. Stable Diffusion output for the prompt “portrait of andy baio” (right)

That’s true of even some famous actors and characters: while it can make a spot-on Mickey Mouse or Charlize Theron, it really struggles with Garfield and Danny DeVito. It knows that Garfield’s an orange cartoon cat and Danny DeVito’s general features and body shape, but not well enough to recognizably render either of them.

On August 26, Google AI announced DreamBooth, a technique for introducing new subjects to a pretrained text-to-image diffusion model, training it with as few as three to five images of a person, object, or style.

Today, along with my collaborators at @GoogleAI, we announce DreamBooth! It allows a user to generate a subject of choice (pet, object, etc.) in myriad contexts and with text-guided semantic variations! The options are endless. (Thread 👇)
webpage: https://t.co/EDpIyalqiK
1/N pic.twitter.com/FhHFAMtLwS
— Nataniel Ruiz (@natanielruizg) August 26, 2022

Google’s researchers didn’t release any code, citing the potential “societal impact” risk that “malicious parties might try to use such images to mislead viewers.”

Nonetheless, 11 days later, an AWS AI engineer released the first public implementation of DreamBooth using Stable Diffusion, open-source and available to everyone. Since then, several dramatic optimizations in speed, usability, and memory requirements have made it fast and easy to fine-tune the model on multiple subjects.

Yesterday, I used a simple YouTube tutorial and a popular Google Colab notebook to fine-tune Stable Diffusion on 30 cropped 512×512 photos of me. The entire process, start to finish, took about 20 minutes and cost me about $0.40. (You can do it for free but it takes 2-3 times as long, so I paid for a faster Colab Pro GPU.)

The result felt like I opened a door to the multiverse, like remaking that scene from Everything Everywhere All at Once, but with me instead of Michelle Yeoh.

Sample generations of me as a viking, anime, stained glass, vaporwave, Pixar character, Dali/Magritte painting, Greek statue, muppet, and Captain America

Frankly, it was shocking how little effort it took, how cheap it was, and how immediately fun the results were to play with. Unsurprisingly, a bunch of startups have popped up to make it even easier to DreamBooth yourself, including Astria, Avatar AI, and ProfilePicture.ai.

But, of course, there’s nothing stopping you from using DreamBooth on someone, or something, else.

I talked to Hollie Mengert about her experience last week. “My initial reaction was that it felt invasive that my name was on this tool, I didn’t know anything about it and wasn’t asked about it,” she said. “If I had been asked if they could do this, I wouldn’t have said yes.”

She couldn’t have granted permission to use all the images, even if she wanted to. “I noticed a lot of images that were fed to the AI were things that I did for clients like Disney and Penguin Random House. They paid me to make those images for them and they now own those images. I never post those images without their permission, and nobody else should be able to use them without their permission either. So even if he had asked me and said, can I use these? I couldn’t have told him yes to those.”

She had concerns that the fine-tuned model was associated with her name, in part because it didn’t really represent what makes her work unique.

“What I pride myself on as an artist are authentic expressions, appealing design, and relatable characters. And I feel like that is something that I see AI, in general, struggle with most of all,” Hollie said.

Four of Hollie’s illustrations used to train the AI model (left) and sample AI output (right)

“I feel like AI can kind of mimic brush textures and rendering, and pick up on some colors and shapes, but that’s not necessarily what makes you really hireable as an illustrator or designer. If you think about it, the rendering, brushstrokes, and colors are the most surface-level area of art. I think what people will ultimately connect to in art is a lovable, relatable character. And I’m seeing AI struggling with that.”

“As far as the characters, I didn’t see myself in it. I didn’t personally see the AI making decisions that I would make, so I did feel distance from the results. Some of that frustrated me because it feels like it isn’t actually mimicking my style, and yet my name is still part of the tool.”

She wondered if the model’s creator simply didn’t think of her as a person. “I kind of feel like when they created the tool, they were thinking of me as more of a brand or something, rather than a person who worked on their art and tried to hone things, and that certain things that I illustrate are a reflection of my life and experiences that I’ve had. Because I don’t think if a person was thinking about it that way that they would have done it. I think it’s much easier to just convince yourself that you’re training it to be like an art style, but there’s like a person behind that art style.”

“For me, personally, it feels like someone’s taking work that I’ve done, you know, things that I’ve learned (I’ve been a working artist since I graduated art school in 2011), and is using it to create art that I didn’t consent to and didn’t give permission for,” she said. “I think the biggest thing for me is just that my name is attached to it. Because it’s one thing to be like, this is a stylized image creator. Then if people make something weird with it, something that doesn’t look like me, then I have some distance from it. But to have my name on it is ultimately very uncomfortable and invasive for me.”

I reached out to MysteryInc152 on Reddit to see if they’d be willing to talk about their work, and we set up a call.

MysteryInc152 is Ogbogu Kalu, a Nigerian mechanical engineering student in New Brunswick, Canada. Ogbogu is a fan of fantasy novels and football, comics and animation, and now, generative AI.

His initial hope was to make a series of comic books, but he knew that doing it on his own would take years, even if he had the writing and drawing skills. When he first discovered Midjourney, he got excited, realizing it could work well for his project, and then Stable Diffusion dropped.

Unlike Midjourney, Stable Diffusion was entirely free, open-source, and supported powerful creative tools like img2img, inpainting, and outpainting. It was nearly perfect, but achieving a consistent 2D comic book style was still a struggle. He first tried hypernetwork style training, without much success, but DreamBooth finally gave him the results he was looking for.

Before publishing his model, Ogbogu wasn’t familiar with Hollie Mengert’s work at all. He was helping another Stable Diffusion user on Reddit who was struggling to fine-tune a model on Hollie’s work and getting lackluster results. He refined the image training set, got to work, and published the results the following day. He told me the training process took about 2.5 hours on a GPU at Vast.ai, and cost less than $2.

Judging from the Reddit thread, his stance on the ethics seemed to border on fatalism: the technology is inevitable, everyone using it is equally culpable, and any moral line is completely arbitrary. There, he argued with commenters who saw a difference between using Stable Diffusion as-is and fine-tuning an AI on a single living artist:

There is no argument based on morality. That’s just an arbitrary line drawn on the sand. I don’t really care if you think this is right or wrong. You either use Stable Diffusion and contribute to the destruction of the current industry or you don’t. People who think they can use [Stable Diffusion] but are the ‘good guys’ because of some funny imaginary line they’ve drawn are deceiving themselves. There is no functional difference.

On our call, I asked him what he thought about the debate. His take was very practical: he thinks it’s legal to train and use, likely to be determined fair use in court, and you can’t copyright a style. Even though you can recreate subjects and styles with high fidelity, the original images themselves aren’t stored in the Stable Diffusion model, with over 100 terabytes of images used to create a tiny 4 GB model. He also thinks it’s inevitable: Adobe is adding generative AI tools to Photoshop, Microsoft is adding an image generator to their design suite. “The technology is here, like we’ve seen countless times throughout history.”

Toward the end of our conversation, I asked, “If it’s fair use, it doesn’t really matter in the eye of the law what the artist thinks. But having done this yourself and released a model, do you think the artist should have any say in how their work is used, if they don’t find it flattering?”

He paused for a few seconds. “Yeah, that’s… that’s a different… I guess it all depends. This case is rather different in the sense that it directly uses the work of the artists themselves to replace them.” Ultimately, he thinks many of the objections to it are a misunderstanding of how it works: it’s not a form of collage, it’s creating new images and clearly transformative, more like “trying to recall a vivid memory from your past.”

“I personally think it’s transformative,” he concluded. “If it is, then I guess artists won’t really have a say in how these models get written or not.”

As I was playing around with the model trained on myself, I started thinking about how cheap and easy it was to make. In the short term, we’re going to see fine-tuned models for anything you can imagine: there are over 700 models in the Concepts Library on Hugging Face so far, and trending on Reddit in the last week alone are models based on classic Disney animated films, modern Disney animated films, Tron: Legacy, Cyberpunk: Edgerunners, K-pop singers, and Kurzgesagt videos.

Images generated using the “Classic Animation” DreamBooth model trained on Disney animated films

Aside from the IP issues, it’s absolutely going to be used by bad actors: models fine-tuned on images of exes, co-workers, and, of course, popular targets of online harassment campaigns. Combining those with any of the emerging NSFW models trained on large corpuses of porn is a disturbing inevitability.

DreamBooth, like most generative AI, has incredible creative potential, as well as incredible potential for harm. Missing in most of these conversations is any discussion of consent.

The day after we spoke, Ogbogu Kalu reached out to me through Reddit to see how things went with Hollie. I said she wasn’t happy about it, that it felt invasive and she had concerns about it being associated with her name. If asked for permission, she would have said no, but she also didn’t own the rights to several of the images and couldn’t have given permission even if she wanted to.

“I figured. That’s fair enough,” he responded. “I did think about using her name as a token or not, but I figured since it was a single artist, that would be best. Didn’t want it to seem like I was training on an artist and obscuring their influence, if that makes sense. Can’t change that now unfortunately but I can make it clear she’s not involved.”

Two minutes later, he renamed the Hugging Face model from hollie-mengert-artstyle to the more generic Illustration-Diffusion, and added a line to the README, “Hollie is not affiliated with this.”

Two days later, he released a new model trained on 40 images by concept and comic book artist James Daly III.

Art by James Daly III (left) vs. images generated by Stable Diffusion fine-tuned on his work

Comments

The “It’s not a collage” argument:

“[…] He thinks many of the objections to it are a misunderstanding of how it works: it’s not a form of collage, it’s creating new images and clearly transformative, more like ‘trying to recall a vivid memory from your past.’”

I am so, so very tired of these cheap and lazy “arguments” that try to build their supposed “checkmate” moment on the idea that “people just don’t get AI. It’s not a collage. The data is not stored. It’s parameters.” They mislead less tech-savvy people into believing them, when this point does nothing to actually support their position.

The point is: It doesn’t matter.

Whether or not AI patches actual pixels from actual images together into a collage does not matter for determining whether using other people’s data as training and validation data to build an AI model is legal, at least from a commercial perspective.

It’s true, AI does not store the images. It analyzes them, derives patterns, rules, relationships etc., and stores its analysis results as abstract mathematical parameters.

It does not change the fact that you would not be able to build your tool without the data.

It is completely irrelevant that the data is discarded.

It is still a crucial component in your process of creating your software. Without it, it would literally not be possible.

This means you are dependent on the data to build your tool.

But if you depend on the property of other people, then you need to get their permission to use it for commercial purposes and compensate them if they demand it.

It’s not “transformative”. It is not “reliving a vivid memory from the past”. These have to be among the most misleading and incorrect “assessments” I have read on this issue.

Data is literally the heart of AI. Good data is one of the biggest treasures for AI creation. It is immensely valuable. That is why big tech companies are obsessed with collecting it and pay millions to acquire it. Sometimes they even buy an entire vacuum company (Amazon agreeing to buy iRobot, the maker of the Roomba) just to get access to its treasure trove of data.

It makes absolutely no sense, then, that in the case of art AI the very people who create these valuable components should have no rights to their property, no say in how it is used, and no right to compensation.

(“If it is [transformative], then I guess artists won’t really have a say in how these models get written or not.”)

(Based on his comments, I do not take him to want AI art programs that use copyrighted data without permission or compensation only for non-commercial purposes. He very explicitly talks about tools like Stable Diffusion destroying the industry, i.e., a commercial entity. He also said he wanted to make a comic series, so unless it’s a completely non-profit comic with no intention of ever generating commercial revenue, he sounds like he favors people creating content that can be commercialized with such art AI tools.

Additionally, his emphasis on the “transformative” nature of these tools makes little sense if applied exclusively to non-commercial content, as the debate around transformative work is generally tied to commercial copyright: whether something infringes it or is transformative enough to be legally in the clear.

Also, some programs like Midjourney offer payment plans, directly benefiting commercially from the data they used to train their tool, and people defend these tools with the very same arguments.)

Fair Use

The fair use argument is also unconvincing in my eyes. At least where I live, fair use has little to do with whether actual pictures are used or not. You can already use actual images – pixels – noncommercially under fair use in certain situations. So saying it’s fair use because the images are not actually stored makes no sense, because it completely misses what fair use actually is.

Images not being stored and something being fair use are not actually related.

The deciding factor should be again whether the outcome of whatever you do with that data is commercial or not.

“You can’t copy style.”

Lastly, the “you can’t copy style” argument is another appeal to a completely different topic that does not actually describe the problem at hand. It’s a distraction.

First, what he did is not just “copying a style” or “being inspired” in the way a human artist is. He literally took the property (actual image data) of other people to build a tool, a software, that needs said actual data as a component to be created.

He can’t use mere abstract concepts and ideas represented as thoughts in his brain to train and validate – to build – his model.

Again, he needs actual, “tangible” property. He is dependent on other people’s output, work results, in his own model building process.

(Also, if he wants to create a depiction in the style of Mickey Mouse (not Mickey Mouse himself) he will have to feed his AI model actual images of Mickey Mouse, image material created and licensed by Disney. He is still using “tangible” property (actual data) that is copyright protected in the creation of his model to imitate the Mickey Mouse style.)

I know, uncritical art AI supporters love to say “it’s inspiration!” but unfortunately humans and AI are still different entities. Humans are not AI models that are built by another agent and AI models are not sentient, self-contained minds.

I am aware that the argument from sentience is often dismissed with a scoff, perhaps because it is not technical but philosophical, and much harder to define neatly than technology and software. Despite the name “neural networks” given to certain data processing approaches in machine learning, the processes in such neural networks and in the sentient mind(!) of a person are not the same.

(See how I say “mind” instead of “brain”. Although I would also argue that currently the actual brain and artificial neural networks are not (yet) to be regarded as interchangeable either.)

Sentient beings “process” data first and foremost for the self-contained purpose of existing. Unfortunately, we need the input of our eyes and ears (and touch and smell etc.) to navigate this world, to survive.

We cannot shut down our perceptive (“collecting data” with our eyes, ears etc.) and higher-order cognitive processes (“analyzing that data” to classify it as usable information and to derive meaningful conclusions from it) because we are accidentally perceiving (“collecting the data” of) copyrighted material. It is literally not possible for us.

It IS possible for an AI tool because it is just that, a tool. It is used in specific situations and only fulfills its function when employed by a subject.

It is therefore not dependent on constant data collection and processing just to exist as a self-contained sentient being without any other purpose.

We are subjects with no purpose. And as soon as we use our perception and cognition (“data collection and data processing”) for a purpose that could infringe on the rights (e.g., property) of other people, we DO get problems. If my inspiration comes too close to another person’s original, it is entirely possible for me to face repercussions and reprimand. People who cite inspiration act as if, before AI, we could use “inspiration” as a permanent get-out-of-jail-free card, when that’s not the case.

Second, “You can’t copy style” is itself questionable. If I were to create commercial art based on one very distinct IP, I’m not sure I wouldn’t run into legal problems. E.g., if I created something that looks like it could be taken 1:1 from Kim Possible or SpongeBob SquarePants, copied Mickey Mouse’s style 1:1, or drew creatures whose style is indistinguishable from Pokémon: all franchises that don’t just have a “common” style like superhero comic style or Disney Princess style, but whose styles are perceived as an integral part of the franchise itself. I can’t speak with certainty, of course, but I would not be surprised if a big corporation could successfully argue that a very unique style was part of a distinct IP and therefore copyright protected.

(I’m not saying I’d agree with this particular decision but it doesn’t seem inconceivable to me.)

In the end, artists are told that they are “unreasonably afraid” or “stuck in the past” because they don’t want to be robbed of their own data so that others can build and profit off of tools that are advancing so fast that it is not unreasonable to assume they will soon be able to replace, in certain domains, the very artists whose works they are trained on.

Demanding fairness is not being stuck in the past or being anti-tech. I’m sure most artists would feel very differently about art AI if it had been created under different circumstances: without living in a society that cares very little about art and artists and that is hell-bent on economically optimizing and automating everything with no regard for its impact on humans.

This is a very long comment but at its core your argument is that because the original property is used as training input the artist must grant permission and be paid.

If you, as a human, look at artwork and then create art inspired by it, is that not the same?

Information flows from the artist’s brain onto their canvas, into your eyes, into your brain, onto your canvas.

One can argue that machine learning models are fundamentally different from humans and therefore different rules apply, but the information flow is comparable.

Rational arguments can be made for your position and against it, it is not as clear cut as you argue.

@Sebastian

“If you, as a human, look at artwork and then create art inspired by it, is that not the same? One can argue that machine learning models are fundamentally different from humans and therefore different rules apply, but the information flow is comparable.”

I have already laid out my argument why it is not the same:

“First, what he did is not just “copying a style” or “being inspired” in the way a human artist is. He literally took the !property! (actual image data) of other people to !build! a tool, a software, that needs said !actual data! as a !component! to be created.

He can’t use mere !abstract! concepts and ideas represented as thoughts in his brain to train and validate – to !build! – his model.

Again, he needs actual, “tangible” property. He is dependent on other people’s output, work results, in his own model building process.

(If he wants to create a depiction in the !style! of Mickey Mouse (not Mickey Mouse himself) he will have to feed his AI model actual images of Mickey Mouse, image material created and licensed by Disney. He is still using “tangible” property (actual data) that is copyright protected in the creation of his model to imitate the Mickey Mouse style.)”

And:

“Sentient beings “process” data first and foremost for the !self-contained! purpose of existing. Unfortunately, we need the input of our eyes and ears (and touch and smell etc.) to navigate this world, to !survive!.

We !cannot! shut down our perceptive (“collecting data” with our eyes, ears etc.) and higher-order cognitive processes (“analyzing that data” to classify it as usable information and to derive meaningful conclusions from it) because we are accidentally perceiving (“collecting the data” of) copyrighted material. It is literally not possible for us.

It !IS! possible for an AI tool because it is just that, a !tool!. It is !used! in specific situations and only fulfills its function when employed by a subject.

It is therefore not dependent on constant data collection and processing just to !exist! as a self-contained sentient being !without any other purpose!.

We are subjects with no purpose. But !as soon as we use our perception and cognition! (“data collection and data processing”) !for a purpose! that could infringe on the rights (e.g., property) of other people, we DO get problems. If my inspiration comes too close to another person’s original, it is entirely possible for me to face repercussions and reprimand. People who cite inspiration act as if, before AI, we could use “inspiration” as a permanent get-out-of-jail-free card, when that’s not the case.”

Sebastian’s reply is a classic one, but Adpah already pointed out that human minds and AI are not the same. Training data is as fundamental to these programs as the code that reads it. People are willing to respect copyright on (and pay for) that code, so why aren’t they willing to do the same for the training data itself?

@Sebastian

“This is a very long comment but at its core your argument is that because the original property is used as training input the artist must grant permission and be paid.”

Yes, just as you would need to in any other scenario where an image is used. Image data is not an abstract concept. It is a concrete entity or output that somebody has ownership over.

And in case you want to make the “image data is discarded” point again as to why this situation should be judged differently, I have already addressed that in the original comment, so I won’t repeat it.

Data is one of the most important components of AI. AI stands and falls with it. AI is a tool, and in a commercial context a product. That is why the data used to build it should be treated as a component of product development (commercially speaking), and not through a misplaced analogy to sentient human inspiration (which, as mentioned above, also has its boundaries and is not a get-out-of-jail-free card for any “inspired” human creation).

To argue that the creators of this crucial component of AI development should not have any say in it, or any claim to their own work and output, in the context of commercially applied (art) AI model building, so that everybody in the development chain can profit and decide except those who provide the actual data, is not rational; it’s just vile.

People will want to draw “like Disney”, and if the original data is “behind a paywall”, they will pay someone to draw 32 or 64 Disney-“like” pictures. They can pool their resources to do so; there will be a cottage industry of cheap artists around the world who can clone any style. Then the AI is fed these substitute images. Those will never be published, but the trained model will be. Since style isn’t copyrightable, the matter ends there.

There are already music-generation AIs that are fed the whole Mozart catalog and create wild music that sometimes sounds like Mozart. You can legitimately license the sheet music for a few dollars, since you will never recreate the actual pieces. But the result might sound like Michael Jackson wrote the song. This is just the beginning. Logically, nothing prevents us from creating whole movies this way, with actors who don’t exist. It is just a matter of computing power.

@Sebastian

Another point I forgot to mention is that even on a purely functional level, human inspiration/“data processing” and AI data processing are not the same.

Our brain, our “neural network”, works differently from AI. Hence, I’ll argue that your point that they are essentially the same (“artist sees image, image is processed in brain, processed information flows into the hand”) is incorrect.

Human perception and information processing are fundamentally holistic. We are not able to process something “information unit by information unit”: pixel by pixel, bit by bit, etc. Unless you are one of the very few exceptional savants on this planet, you cannot process that much information and consciously retrieve it.

We can only ever soak up a holistic impression at best. AI, on the other hand, is able to process images pixel by pixel. Hence it is capable of deriving patterns, rules and relationships on a pixel-by-pixel basis, something (most) humans would never be able to do.

AI models are reality “scanners”. We do not have the ability to “scan” data like that.

Another aspect that is often ignored is the influence of emotion and attention on human perception and memory formation. Because these factors thoroughly moderate these cognitive processes, our “data collection” (perception), “data storing/representation” (memories) and “data processing” work very differently compared to AI.

Attention probably needs little explanation: the more attention you dedicate to a given piece of information, the better and more completely you will process and remember it. This fundamentally determines how well you are able to form mental representations of external information, incl. the works of other artists. We are much more prone than AI to faulty, incomplete and incorrect representations of other artists’ work. This influences how we process their data (what information we have to work with) and the quality of that processing (how good our transformations and derivatives will be).

AI does not have that problem. AI is not stifled by lacking attention. Its performance does not fluctuate based on varying attention levels, a factor that in humans can lead to performance differences within the same hour.

Emotions are a lot more interesting, though. First, emotional states can color your interpretation of the data you perceive and memorize, giving you a unique version of reality that differs a) from the version of reality of another person AND b) from your own version of reality if you were in a different mood.

AI’s perception and processing of data is not altered by emotional responses. As long as the labels that are assigned to the data are stable, its “perception” of reality (input data) will always remain the same (regardless of whether this “reality” – the labels – is truthful or accurate). For the AI to have a different perception of the same data, external agents have to change the labels. AI doesn’t change its perception of the data itself based on any personal feelings.

Emotions can also influence memory. If you form a memory in a certain emotional state, you are more likely to remember it (“recall the data”) when you are in the same state again. On the flip side, if you are not in that state, your memory performance might decrease.

However, memory is essential for processing data and creating transformative content: you first need to activate memories in order to analyze and process them and derive any transformed, evolved, creative or logical conclusions from them.

AI’s “memory” is not influenced in this way. Its recall performance is not moderated by emotions. At this point in time, people are not even sure whether an AI model can forget at all (they might have to be retrained from scratch if you don’t want certain data to be included).

The most extreme (but also the most illustrative) example of emotions influencing memory is traumatic experience, which often leads to very fragmented, anachronistic, highly detailed but completely disordered memories. As a result, people often cannot actually remember what happened, yet they can relive very intense experiences of isolated details.

At the same time, there is emerging evidence that trauma victims may form less trauma-typical memories if they are distracted with a cognitive task immediately afterwards: the emotional response is altered, and hence the memory formation process is altered. (I remember a study where people were asked to play Tetris immediately after a traumatic experience, and it apparently provided first evidence for better-regulated memory formation.)

There is also very thorough psychological research on how information is represented in our active working memory. Depending on the input format (numbers, sounds, images, words), you can only keep a limited number of pieces of information in your active memory for short-term recall. E.g., most people can remember 3 to 5 short number strings, but considerably more words. The number of recallable images is different yet again. AI does not function like this at all.

You will most likely forget everything beyond that. Your active recall is also severely modulated by other factors, such as the position of the information: items given at the beginning and end of a sequence are generally remembered more easily than items in the middle.

And interestingly enough, once the information you have kept in your active working memory has served its purpose, you will often not be able to recall it with the same precision as before. For example, a waitress or waiter who was able to recite your order perfectly while still serving you will most likely not be able to do so an hour later, after your order has been completed and is no longer relevant.

I probably don’t need to mention yet again that AI does not have this limitation. AI does not immediately discard its memories once they are no longer useful. If activated again (e.g., by another prompt) it will be able to recall it.

And that is just our active working memory. Our long-term memory has its very own set of complexities and forgetting plays an incredibly important part in this.

The thing is: software (in general) does not have this problem. If software records information it does not automatically forget due to memory erosion as it happens in humans. It has to receive the order to delete the information. I am not saying forgetting is inherently bad. In fact, psychologists argue that it may fulfil a necessary function for our sentient minds. But when it comes to pure data handling, it’s a weakness software in general, and thus AI in particular, does not have.

Another important aspect is “processing power”. Perception and memory alone are not enough to analyze and transform information into creative derivatives; you also need the capacity to process it. Unlike AI, we cannot rely on powerful GPUs that will grow even more potent in the future.

Our ability to concentrate, how much information we are able to hold in our active mind at the same time, how we can represent the data (some people are better at visualizing than others, some people think purely in words, etc.) and how fast we can analyze the data (derive conclusions, make connections) all determine the quality of our output.

If we are under stress, if we have attention deficit disorders, if we are intoxicated or upset, if we are confronted with multiple stimuli at the same time: all of this can severely impact our processing performance. It can lead to us forgetting needed pieces of activated information more quickly, drawing false (or sub-par/uncreative) conclusions, or becoming slower, less efficient or biased.

Sometimes we have literal art block or need to wait till inspiration strikes.

AI’s performance does not suffer in a comparable and unpredictable way. AI does not have art block. AI does not need to wait for inspiration. AI is not stressed. AI is not biased due to internal factors. (As mentioned before it’s biased by external faulty inputs in the form of labels.)

AI’s output performance is consistent. If more people access it or it has to handle bigger amounts of data, then it might have longer processing times. But it will not be as prone as humans are to incomplete and “faulty” data analysis, nor do any flaws in the final output of an AI stem from the same processes as for humans.

In short: AI’s whole analysis process is completely different.

Lastly, you are also ignoring the idea-output gap. AI can create a perfect output of what it has internally computed.

Humans will often not be able to create a perfect replication of the visual representation in their mind. I might be able to think up the sickest character design or composition in my head but as soon as I start putting it on paper it will become inaccurate. An approximation at best. And sometimes, the image on the canvas strays away more and more compared to the (initial) image in the artist’s mind. It is a very flexible, dynamic process that is often unpredictable or at least hard to control. This effect is even bigger if you are not as skilled yet.

The same applies to copying (e.g., still life, portrait): you might be able to perfectly represent the information in front of your eyes in your head, but as soon as you try to convert it into actual output executed by your hand, inaccuracies will most likely occur.

The eye-brain-hand relationship is not as simple as you described it in your initial comment. There are a lot of disruptive factors in the human process that AI does not have to grapple with.

Adaph, you need to learn to pare down your arguments! However, I agree with you. One thing I’d add, as a rephrasing and expansion of your argument:

AI doesn’t patch pixels from actual images together, it’s true. But I argue that storing the data of the images it learns from as sets of parameters instead of pixel data isn’t fundamentally different. Vector graphics don’t store images the same way raster images do, but it’s still the image. It can basically recreate the images it was trained on if you prompt it correctly.

Anyone saying it’s no different from people learning from other artists and imitating their styles is wrong. Humans learn by understanding the subject. AI doesn’t understand anything. When I think about how to do something in another artist’s style, I am thinking about that artist’s style: how they do lines, or shading, or colours, or form. AI isn’t doing that; it’s just recreating parameters that it has lifted (not learned) by processing terabytes of data from other people’s work. I emphasize “not learned” because the AI doesn’t learn like humans do at all. There’s no understanding going on. The AI does not understand anything.

Your argument is based on the idea that intellectual property is a real thing, but it is not. There is no *property* involved, no matter how often you say it. That doesn’t mean I disagree with the existence of copyright or patents, but terminology matters in these arguments.

All of this argumentation is ultimately futile. The technology exists, and it will be used. The bell can’t be unrung.

There’s a big gap in your understanding of how Stable Diffusion works. It is absolutely transformative, at least in the literal sense, as what SD learns is the relationship between vectors and noise reduction patterns, and it’s not a 1:1 relationship even in a fine-tune (which is already heavily influenced by the base Stable Diffusion model).
Every additional image trained on does not “add” a new feature to the existing checkpoint; it “only modifies the existing weights” of those vectors. That’s why distributed training isn’t practical: you can’t synchronize the whole model for every single image outside of a data center environment.
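To make the fixed-size point above concrete, here is a toy sketch in Python. It is not Stable Diffusion or DreamBooth code: it assumes a hypothetical three-weight linear model trained with plain gradient descent, and the example "images" are made-up feature vectors. Each training step changes the values of the existing weights, but the number of weights never grows.

```python
# Toy illustration only (not Stable Diffusion): a model with a fixed set of weights.
# Training on more examples nudges the existing weights; it never appends new ones.

def train_step(weights, x, target, lr=0.1):
    """One gradient-descent step for a linear model y = w . x with squared error."""
    pred = sum(w * xi for w, xi in zip(weights, x))
    error = pred - target
    # The gradient of 0.5 * error**2 with respect to each weight is error * x_i.
    return [w - lr * error * xi for w, xi in zip(weights, x)]

weights = [0.0, 0.0, 0.0]
initial_count = len(weights)

# Hypothetical "images", each reduced to a 3-number feature vector plus a target value.
examples = [([1.0, 0.0, 0.5], 1.0), ([0.0, 1.0, 0.5], -1.0), ([1.0, 1.0, 0.0], 0.0)]

for x, target in examples:
    weights = train_step(weights, x, target)
    # The values change after every example, but the parameter count stays the same.
    print(len(weights), [round(w, 5) for w in weights])
```

A real diffusion model has on the order of a billion parameters rather than three, but the same property holds for ordinary fine-tuning: the checkpoint's size is fixed by the architecture, not by how many images it has seen.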

Technically, there’s no law against accessing pictures posted publicly on clearnet. They’re posted for the explicit purpose of being accessible. If you want to add terms and conditions to accessing your works, that’s your choice and I respect that. If you don’t want your name used in a training set, I think that’s perfectly reasonable although I have no idea if you have any actual legal right to that. Right now, the law isn’t built to handle this situation, and that’s a problem.

This is a legal gray zone, but it also seems to be an emotionally charged issue on which a lot of artists are missing the point. You’ve just been given an incredible new tool for your own work. Be it canvas and inks, tablet and illustrator, or simple diffusion, your art isn’t worse because other people can make imitations.

On a philosophical level, you’re right that if intellectual property law were a singular body of law that extended to all the limits of creative output, it wouldn’t be right to add copies of 30-plus works by an artist, and output more pictures in the style of that same artist.

The legal reality, though, at least in the U.S., is that there is ironclad separation of copyright and trademark law. Copyright is only what you do with copies of the individual pictures, and if your output picture isn’t recognizable as a copy of any one input picture, fair use probably is going to cover, at least for non-commercial use, copying those original pictures from the Internet onto your hard drive.

Your argument blurs the distinction between copyright and trademark law in a way that those bodies of law can’t currently accommodate. Trademark doesn’t protect artistic style. That’s why designer clothes that aren’t bespoke one-offs are emblazoned prominently with designers’ trademarks. And they have been since the late ’70s and early ’80s, when the courts ruled that trademarks didn’t protect the artistic style of clothes (or anything) the way that artists now want to protect pictures.

At least in the U.S. Twenty years ago, Congress bridged the gap between patent and copyright law with the Digital Millennium Copyright Act. And big players in the visual arts probably will lobby Congress for a similar law to bridge this gap between copyright and trademark. But that sort of law, because of its corporate backers, will probably make it illegal to create an image in the style of a Disney or Pixar cartoon using copied frame grabs from works for hire. Or a J.J. Abrams or Spielberg movie. Or a Pokémon, like you suggest. But that law will probably still leave out artists like Hollie who work across different publishers almost like a freelancer.

The problem with your logic is that if we apply it equally to all areas it would completely cripple the machine learning industry and all the products it produces. All modern machine learning methods rely on massive datasets collected from human generated content. Take spam detectors for example. They are trained on e-mail data created by people. Should spam detectors pay royalties to all the people whose e-mail they rely on?

It all seems very arbitrary. Going back to the original purpose of copyright, “To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries,” I tend to lean towards a morality that supports the freedom to reproduce, to mimic and copy.

But surely, others will want to stop one another from such mimicry when there’s a protectionist racket to keep dollars in one pocket and out of another.

Seems somewhat arbitrary to say your tool can’t duplicate aspects of a drawing but a human is fine to do so.

Why is it arbitrary to say a human can, but a software program can’t, and on an industrial scale? Why are you attributing human rights to a company’s software program, and where is there a precedent to do that? In fact, it’s the opposite of arbitrary if you think about it. It’s a clear and apparent dividing line.

Exactly. Artists have imitated the styles of other artists for generations. Artists have no moral or ethical or legal right to prevent others from doing that. They have the right to protect their TRADEMARKS; they do not have the right to sue others for looking like them. To draw an analogy with audio production, imagine if any artist could sue (successfully) any other artist for merely sounding like them.

As an artist and a habitual pessimist it’s hard not to see the bleakness of these AIs. I know I’m being dramatic with the following opinion.

All the creative jobs and hobbies of artist, musician, writer, etc., are going to be outsourced and given to these programs in the near future. What little space there is for making a living doing these things will shrink. Technology should be freeing us from tedium work, but instead it’s entrapping us from the freedom to create.

…from YOUR perspective. For me, Stable Diffusion has given me the ability to transform my ideas into actual images without being prohibited by not having drawing talent, money, and time, and like me, the vast majority of humanity will feel that way.

> Technology should be freeing us from tedium work, but instead it’s entrapping us from the freedom to create

You’ll still have the freedom to create. If you enjoy the process of painting, writing, or creating music in its own right, then nothing can take that away from you. What technology will do, and I think this is now inevitable, is make commercial art a much smaller industry. Some kind of artist will still be required, whether it’s to create original art that gets fed into models, or to pick which of the generated images should be used.

The relationship between art and commerce has been evolving since time immemorial, from roving bards to samples and streaming. Yes, anyone with any empathy will feel bad for artists who lose out, just as we feel bad for coal miners or factory workers.

I’m a writer and programmer, and I’m not feeling massively secure about either industry. But I’ll probably continue to be active in both, even if I’m forced to change profession, because I enjoy those endeavours at a fundamental level. I’m pessimistic in the sense that I think this is inevitable, but I’m optimistic because I think it will create opportunities that we haven’t even thought about yet.

The only way your work would be outsourced is if it’s so mechanical and reproducible that it hardly should qualify as art in the first place. Just because you work with images doesn’t make you an artist, just like merely working with sound doesn’t make you a musician. And how do you justify the fact that every single time technology has moved forward, artists who do not adapt are left in the dust studying their classic, outdated methods? How many people are actually upset that hand-drawn animation is a niche thing now? Do we blame computer animation for it? Some do. Others recognize that art lives on, artists live on, those who work will keep working, and the sky will stay in the sky.

This is likely to be ruled fair use, at least in the United States. It’s textbook transformativity—as the article points out, the images themselves aren’t being used; the data model is built off of them. In linguistics, copyrighted works are compiled en masse for use in corpora already and this is recognized as fair use.

AI art is the logical end of “information wants to be free”. You can’t have your copyright and end it, too.

I think we should be able to agree that by the laws on the books, AI creations based on style transfer are not copyright violations. The pixels do not have to be the same (e.g. Andy’s experience with Kind of Bloop) for something to be a copyright violation, but our laws are not set up for lifting the _idea_ of how to draw and drawing something else. Our laws on data storage also don’t call this a violation right now: we’re not storing the raw image, we can’t possibly get the raw image back out of the system, and arguments based on ephemeral storage are kind of weak these days (remember when people used to think that just loading an MP3 into RAM to play it was a copyvio? yeah, we’re not really doing that anymore).

So the argument here seems to be much more about morality/ethics, with a hint of a wish for legal protection. Let’s say this wish is granted and style transfer becomes illegal in approximately the same way that copyright makes straight-up copying illegal. What then? How does one enforce this styleright? Let’s say that some cases are clear: the artist in this example can clearly claim that their style was lifted wholesale and that the model is thus an illegal copying device. Note that even with copyright we do not attack the method but rather the infringing output (remember the Betamax case? is the model in this case like a VCR, or is it an actual tape? is the model _generation_ code like Betamax? these would all have to be litigated).

But in this future, all you need is to train a model with _two_ artists who are similar in style. So if you go after output then a dual-artist model has plausible deniability: any particular output is not provable to be your artist’s style, or is provable to be a remix of styles. So really the only thing to do is to go after the model.

And the code for making models is open source, widely available and used, and is a genie that we are quite unlikely to stuff back in the bottle. Remember DeCSS? Besides, all it requires is for someone to train a model and not disclose the origin of the content, and now you can’t even prove where it came from, even if it looks mighty familiar.

All this is to say, I sympathize with artists whose creations are being disrupted by models that take their work and can create convincingly similar output in seconds. I think that trying to fight a battle of ethics and morals against something that is literally thousands of times faster and cheaper, armed with no plausible legal defenses, is a terrible way to go out. Getting society to pass laws against style transfer is a decades-long quest with no guarantee of success. Relying on existing art communities to ban AI art will just encourage people to migrate to more permissive spaces. I think that the only choices are to stop doing this kind of creative work, or learn to live with it (perhaps by attaching more value to authentication of your original work and selling that authentication to clients who care about it). I am very sorry if people have to exit this profession after being replaced by machines, but this is hardly the first profession in which this has happened, and, actually, hardly the first time it has happened even in this profession.

One more thing to add. One creative way to live with models like this is to actually own your own model. Imagine if an artist doesn’t just create a new drawing or illustration, but instead releases a model for generating new creations in a particular style. If you don’t even release the original work, it’s hard to train a model off of it. You can create an example or two, and I guess people could train a downstream model off of it, but yours will likely be better. Just like you can’t stop _people_ from copying your idea once they see your output, you probably can’t stop people from creating downstream models, but those who want the original will still come to you because you control the source and can make it better.

Generative AI products based upon training data which is trademarked, copyrighted or patented need their rights defined by the highest court in each applicable jurisdiction.

In the USA, this issue should not be partisan; it is commercial, without precedent in religion or culture.

One can expect the winning side to offer the most net benefit to society.

On one hand unrestrained generative AI is much less expensive, more flexible, and more available than custom human creations.

On the other hand, with AI competition and no compensation for creating the required training data, the supply of new human creations will greatly diminish, hurting generative AI in the long run. Generative AI studios would have to hire teams of human creators of training data which is not made public – for ripping off.

In the USA, let’s get this case before our Supreme Court for a decision.

What Adpah said. The issue isn’t output images infringing on input images. It’s about making a software product out of stolen data. You stole specific images to produce a specific result, that you couldn’t get any other way. Or you could, but you chose not to.

Most people get that just because you can find an image on google doesn’t mean you can stick it on a t-shirt and sell it. You will probably get away with it on small scale, but that does not legitimize it on any scale. It’s still profiting off of someone else’s work without compensation. And releasing it publicly means enabling it at all scales.

The finer points of the technology don’t matter. You stole. In order to make something that robs the livelihood of those you stole from.

What articles on AI-generated art always seem to ignore are the huge limitations. At least currently, it takes a great many iterations to produce full images of living subjects that are mostly anatomically acceptable. Almost all of the images of people I have generated using the Stability AI model have gross defects, like 3 arms, seven fingers, hands that are melded together, or even limbs that float in the air unattached to anything. Try to make an image with more than one person close together, and the problems multiply. The AI cannot adequately distinguish between one object and another. Humans riding horses become like mutilated centaurs; two people hugging become conjoined twins.

You have to rely on luck to get usable images, and they may not actually be what you want. At least at present, your ability to describe in detail exactly what you need seems to be limited to about six or seven discrete details, with no real way to alter an image that is “almost right,” and that’s assuming you’re willing to generate A LOT of useless pics.

These are not the kind of circumstances that will allow you to create commercial images for clients, who are usually very specific about their desires, and even more precise about the changes they want to any given result.

If billions of images have been used to create these models, then I wonder just how much training would be required to build a model that could *consistently* generate an anatomically correct group of humans from head to foot in swim wear.

Why bother, when *any* illustrator of reasonable skill can produce such an image in minutes?

There are severe limitations to machine learning that make it a wholly inappropriate technology for commission based commercial work.

Maybe that will change if we’re willing to fry the planet by using massive amounts of energy to continue training machines on quadrillions of images. Or if someone invents a better learning model than the current “throw the kitchen sink” in data at the machine and then see what comes out.

But it’s fun to play with, and acceptable for casual art, or for art that an artist intends to repaint manually.

As a disabled, self-taught artist, I see so much opportunity for unique self-expression with AI art. I can train AI using three decades of my own images in different styles and media (realism, cartoon, pen, charcoal, digital, etc.). I can also generate reference “photographs” of subjects such as animals and buildings for free when getting out to take my own reference photographs or paying for copyrighted images is out of reach. It can help overcome art block with inspiring design detail or composition, and generate color tests and variations in moments, reducing trial and error. For a savvy artist, these tools can streamline workflow and help them spend more time on the expression, composition, and rendering of final details. Rather than creating a single brush to generate a single flower in their personal style, they can generate a field of flowers approximately in their style and rework the composition and details to their personal taste.

However, it is disturbing to see any artistic work used without permission. Ideally no artist’s work would be used without permission, even as a reference. This includes being data-mined by AI without permission, compensation or a statement of usage. Photographers and artists on Flickr, DeviantArt, and other sites allow and disallow reference, alteration, etc. based on their own preferences. Some artists and photographers make their work freely available for fair use, commercial-free use, or commercial licensing. Some don’t. I ask photographers whether I can reference their photographs unless they have expressly given permission. Some artists don’t ask. That ranges from rude at the least, even when the work is transformative, to a breach of copyright permissions at worst, when the art heavily references the original image.

Unless laws change (which they may, with giants like Disney and Pixar being heavily represented in AI prompts), I believe it will become more a matter of etiquette. Companies utilizing data to train AI bear the most responsibility, and individuals training AI without permission would be looked on unfavorably. But the average AI user’s choice to use styles is more akin to deciding to repost the original artist’s post instead of reposting the image with credit (the former directs more traffic to the artist directly). Since AI is currently lacking in many ways, it can actually increase traffic, interest, and visibility for various artists. Hopefully that translates into an actual increase in hiring, commissions, artblock sales, etc., because artists can’t live on exposure alone. But that remains to be seen.

Collectors, art enthusiasts, and big corporate media conglomerates don’t want randomized approximations of a similar product. They would rather have a signed original, support artists, or hire for precisely crafted images that fit their vision. Logo makers haven’t put designers out of work. Quality, precision, and the artist’s unique expression and touches are still valued. AI mimics that, but lacks a critical eye for understanding composition or emotion, or for responding directly to fine-tuned requests.

AI-generated coloring and images can be incredibly helpful tools for disabled artists, or for artists who are aging or otherwise facing a decline in output. They can inspire creativity and overcome art block, act as reference materials, and generate explorative variations and materials for rework, “photo-bashing”, and redraw/repaint/paint-overs, saving artists time and struggle.

Artists and AI both struggle with hands, but so far, only an artist can go in to fine-tune a hand and rework it to exact specifications with minimal randomization and maximum character, intent, and style.

Much like using mirror projections to trace during the Renaissance or Ctrl-Z for digital art, it is a useful tool that helps save time and may improve quality, but it cannot replace basic understanding, specialized training, or years of practice and hard work. It won’t replace artists, but those who don’t utilize it may be at a disadvantage.

Obviously plagiarism and copyright infringement should be avoided. However, within this new technology and gray areas, I hope artists and public that utilize AI will figure out best etiquette practice. Examples may be including multiple artists within AI prompts to not copy one style exclusively and respecting the wishes of living artists who do not wish to be included. Within AI art communities users are already complaining of Clone Images (exact regeneration of an AI result) which can be avoided by altering prompts, generation steps, collage composition, etc.

It is a scary time for many, but all new technologies present possibilities for the best and worst of human nature. With plenty of doomer editorials already, I’ll point out that AI art may lead to a resurgence of interest in the arts the way electricity led to renewed public interest in science: increased requests for tailored images, more visits to museums and exhibitions, even more interest in original works and artists with unique styles. It may help the public understand how much work actually goes into original art. For example, while AI can quickly generate dozens of random images, it takes hard work, practice, and understanding of human anatomy to repaint or edit human hands into something less Cronenberg.

Ultimately the core reason to become an artist, to practice and acquire skills, remains unchanged: to create the images we and others imagine and express them in a satisfactory way that makes us fulfilled. AI still hasn’t managed that level of intentionality. Yet.

Wonderful post, and a very interesting comment section.

I am building a program named Illuminarty that tries to detect AI-generated artworks, and the comments here show exactly why I decided to create this program.

I am siding with Adpah here.

The AI models, and thus the generated artworks, are dependent on the source images that are drawn by human artists. This dependency is a very valuable piece of information for potential consumers, and many people will want to know it, if available. To take an example from the classic argument of “being inspired”, I believe that many people will be interested in knowing if there was a source image that “inspired” the artwork.

Currently, as the others have mentioned, we do not have the regulations necessary to force the AI artists to show this dependency. This is a market mismatch. I believe there could be two ways to solve this, penalizing those that do not respect the dependency, and providing incentives to those that disclose this info. An example of the former would be the recent lawsuit against GitHub Copilot. Sharing and respecting the dependency between the AI artworks themselves (e.g. copied prompts) could be an example of the latter.

I think both methods are equally feasible and will try to support both movements through Illuminarty. Please let me know if anyone else is interested.
