
Midjourney ‘Styles’ Leak Controversially Reveals 16,000 Artist Names

The legal battle over training AI models on copyrighted material has been heating up, and it has been revealed that over 16,000 artist names have been linked with the non-consensual training of Midjourney’s image generation models. This disclosure was made partly in a 2023 class-action lawsuit and partly in a recently leaked public Google spreadsheet, part of which can be viewed in the Internet Archive.
The spreadsheet is believed to come from Midjourney’s development team and to encode artists’ work as ‘styles’ so that the model can efficiently recreate work in their style. The list mirrors the one found in the 2023 class-action lawsuit against Midjourney, which lends it credibility, and it is also consistent with leaked Discord chats from Midjourney developers, which allude to artists’ work being mapped to ‘styles.’
Jon Lam, a programmer and artist, shared screenshots from a Midjourney Discord chat in which developers discuss using artist names and styles from Wikipedia and other sources. He wrote, “Midjourney developers caught discussing laundering, and creating a database of Artists (who have been dehumanized to styles).” He also shared videos of lists of artists, including those used for Midjourney styles and another list of ‘proposed artists.’ Numerous X users stated that their names were on these lists.
One screenshot appears to show a statement by Midjourney CEO David Holz celebrating the addition of 16,000 artists to the training program. Another shows a Midjourney developer saying that you have to “launder it” through a “Codex,” though, without context, it’s tough to say whether this refers to artists’ work. Others in that same conversation (not Midjourney employees) suggest that processing artwork through an AI model essentially separates it from its copyright. One says, “all you have to do is just use those scraped datasets and the conveniently forget what you used to train the model. Boom legal problems solved forever.”

The legal processes surrounding copyright are slow, and both AI training and the technical process of generating outputs (e.g., text or images) from user inputs challenge the nature of intellectual property law. It’s hard to prove that AI models were trained on copyrighted material, and hard to prove that their outputs replicate that material closely enough to infringe.
There’s also the issue of accountability – AI companies like OpenAI and Midjourney at least partly used data harvested by others rather than harvesting it themselves. So, would it not be the original data scrapers that are liable for infringement?
In Midjourney’s case, the models, like other image generators, produce outputs that blend many of the works in their training data, so artists can’t easily prove which of their pieces were used.
For example, when a recent copyright case against Midjourney, Stability AI, and DeviantArt was dismissed (it’s since been resubmitted with new plaintiffs), Federal Judge Orrick identified several defects in the way the claims were framed, particularly in their understanding of how AI image generators function.
The original lawsuit alleged that Stability AI, in training its Stable Diffusion model, stored compressed copies of the images. Stability AI disputed this, stating that the training process involves extracting attributes such as lines, shades, and colors and developing parameters based on these attributes rather than storing copies of the images.
Orrick’s ruling highlighted the need for the plaintiffs to amend their claims to more accurately represent the operation of these AI models. This includes a need for a clearer explanation of whether the claim against Midjourney was due to its use of Stable Diffusion, its independent use of training images, or both (as Midjourney is also being accused of using Stability AI’s models, which allegedly use copyrighted works). Another challenge for the plaintiffs is demonstrating that Midjourney’s outputs are substantially similar to their original artworks.

The legal battle is still alive and well, with the court denying AI companies’ most recent attempts to dismiss the artists’ claims. Legal cases submitted against Midjourney and co. have also emphasized their potential use of the LAION-5B dataset, a compilation of 5.85 billion internet-sourced images, including copyrighted content.
Stanford researchers recently blasted LAION for containing illicit sexual images, including child sexual abuse material, along with various sexist, racist, and otherwise deplorable content, all of which now also ‘lives’ inside the AI models that society is starting to depend on for creative and professional uses.
The long-term implications are hotly debated, but the possibility that these AI models were trained first on stolen work and second on illegal content doesn’t reflect well on AI development in general.
Midjourney developer comments have been widely lambasted on social media and the Y Combinator forum. It’s very likely that 2024 will cook up more fiery legal debates, and the angry mob is perhaps closing in.

This is an incredibly concerning situation and shouldn’t be taken lightly. The rights of these artists must be defended, and the training of AI models on copyrighted material needs to be addressed! It’s time to take action and ensure that those who have been wronged get the justice they deserve. Let’s do our part to hold these companies accountable and help protect the rights of artists everywhere!
