The recent disclosure of a list of over 16,000 artists whose names were used, without their consent, to train the Midjourney AI image generation models has sparked a flurry of reactions! A leaked public Google spreadsheet, viewable in the Internet Archive, mirrors a list found in a 2023 class-action lawsuit against Midjourney, which lends it credibility. Artist Jon Lam from Riot Games shared screenshots of a chat in which Midjourney developers discussed using artist names and styles from Wikipedia and other sources, along with a statement by Midjourney CEO David Holz celebrating the addition of 16,000 artists to the training program.
One of the developers made an ironic comment about bypassing copyright issues, saying, “All you have to do is just use those scraped datasets and conveniently forget what you used to train the models. Boom legal problems solved forever.” But, as we all know, copyright law is still poorly defined in the era of AI, and it’s very hard for artists to prove which pieces of their work have been used.
When a recent copyright case against Midjourney, Stability AI, and DeviantArt was dismissed, Federal Judge Orrick identified several defects in the way the claims were framed, particularly in their understanding of how AI image generators function. He noted that it’s essentially impossible for artists to prove that their work is stored verbatim in the model, and that it’s also hard to prove that the model’s outputs replicate copyrighted material closely enough to infringe.
The plaintiffs now have the chance to respond, and it will be interesting to see how they amend their claims to more accurately represent the operation of these AI models and demonstrate that Midjourney’s outputs are substantially similar to their original artworks. Plus, Midjourney and other AI companies’ use of LAION-5B, a dataset of 5.85 billion internet-sourced images that includes copyrighted content, further underscores the need for a clearer understanding of copyright law in the era of AI.
It’s outrageous and disheartening to see AI developers openly mock copyright as a trivial matter, and it does nothing to help the industry’s already contentious image. We must remember that copyright is in place to protect people’s hard work, skills, and livelihoods, and the legal cases filed against Midjourney et al. will be a major step forward in understanding the implications of copyright in the AI era.