The Lost Penny Files – MidJourney’s Beginning
On September 5, 2023, Brian Penny was permanently banned from Twitter (now called X). This deleted his profile and content, leaving many artists struggling to find important investigations he completed related to Midjourney, Stability AI, Adobe, Shutterstock, deepfake apps, and the rise of generative artificial intelligence.
We reached out to Penny and obtained backups of his original tweets. He agreed to help us write this story to track an essential part of gAI history. Here are the original chat logs from Midjourney’s public Discord server as originally reported by Penny in August 2023.
TL;DR
- Midjourney founder David Holz worked directly with Stability AI CEO Emad Mostaque on the development of both apps.
- The founders discuss many technical, legal, and creative aspects of the development, including a timeline of models used and discussions around pricing models.
- During early development, moderation teams red-teamed the technology by creating illegal imagery, including child sexual abuse material (CSAM).
- Many of the original Google Colab notebooks used in the initial development are still online and publicly available.
- All of the conversations below are available in Midjourney’s public Discord server. Therefore, no caution will be exercised in revealing the names of those involved.
Background
About Midjourney
Midjourney is a San Francisco-based AI service that generates images from text prompts, led by David Holz, co-founder of Leap Motion (now Ultraleap). The company began in closed beta with v1 in February 2022. It entered open beta on July 12, 2022, and is already profitable, with an expected $200 million in revenue for 2023.
Users interact with Midjourney via Discord bot commands, and its server became the largest on Discord in Q4 2022. All images are stored in the cloud. There is a range of subscription options available, and users must pay an additional monthly fee for private mode to keep their image generations from being publicly searchable online. As it is not open source, it’s not entirely clear what datasets and technologies Midjourney utilizes, although we can guesstimate based on publicly available information.
The company continually updates its algorithms, releasing new versions every few months. It is known for having better quality images than most of the competition by default, although its strict moderation and usage of copyrighted materials in its datasets (along with encouraging the use of artist styles in prompts) make it a lightning rod in the AI industry.
About Stable Diffusion
Stable Diffusion is a text-to-image deep learning model released on August 22, 2022. It was developed by the CompVis Group and Runway and funded by UK-based Stability AI, led by Emad Mostaque. It’s written in Python and works on consumer hardware with at least 8GB of GPU VRAM. The most current version as of this writing is Stable Diffusion XL (SDXL), which is freely available online and can be locally installed.
The model uses latent diffusion techniques for various tasks such as inpainting, outpainting, and generating image-to-image translations based on text prompts. The architecture involves a variational autoencoder (VAE), a U-Net block, and an optional text encoder. It applies Gaussian noise to a compressed latent representation and then denoises it to generate an image. Text encoding is done through a pretrained CLIP text encoder.
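The noising and denoising step described above can be sketched in a few lines. This is only an illustration of the general idea, not Stable Diffusion's actual code: the schedule values, the function names, and the four-number "latent" are all assumptions chosen for readability, and the trained U-Net is replaced by a stand-in that already knows the noise that was added.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances. The start/end values are illustrative assumptions."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bars(betas):
    """Running product of (1 - beta): the fraction of signal left at each step."""
    alpha_bars = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(latent, alpha_bar_t, rng):
    """Forward process: blend the latent with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    noisy = [signal_scale * x + noise_scale * e for x, e in zip(latent, noise)]
    return noisy, noise


def remove_noise(noisy, predicted_noise, alpha_bar_t):
    """Invert the blend, given a prediction of the noise that was added."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(y - noise_scale * e) / signal_scale
            for y, e in zip(noisy, predicted_noise)]


# Toy "latent": in the real model this would be the VAE-compressed image.
rng = random.Random(0)
alpha_bars = cumulative_alpha_bars(linear_beta_schedule(1000))
latent = [0.5, -1.0, 2.0, 0.0]

noisy, true_noise = add_noise(latent, alpha_bars[500], rng)
# Stand-in for the U-Net: we hand back the exact noise instead of predicting it.
recovered = remove_noise(noisy, true_noise, alpha_bars[500])
print(noisy)
print(recovered)
```

In a real system the noise is not known in advance. The U-Net is trained to predict it from the noisy latent and the text embedding, and the image is recovered over many small steps rather than in one jump.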
Stable Diffusion was trained on the LAION-5B dataset, a publicly available dataset containing 5 billion image-text pairs. Stability AI raised $101 million in a funding round in October 2022. The model is considered lightweight by 2022 standards, with 860 million parameters in the U-Net and 123 million in the text encoder.
The Penny Files
You may not realize it, but before Midjourney kicked off the generative AI boom, it was a small team working in public in a small Discord server. Here are some screenshots of the public discussions had within that server between David Holz, Emad Mostaque, and several Midjourney developers, including Daniel Russell and Somnai.
It starts February 3, 2022, with developer Somnai referencing the “mutate” function, which was later renamed to “upscale.”
That same morning, MJ developer Jack mentions cc12 as the model being used in Midjourney v1. CC12M (Conceptual 12M) is a Conceptual Captions (CC) dataset from Google Research containing 12 million image-text pairs meant for vision-and-language pretraining. Do not let the name fool you: CC does not mean these images used a Creative Commons license.
Almost immediately (within a week), we’re introduced to the war-game channel (also referred to as the war room). This is the hidden channel where Midjourney staff and early testers generated disturbing and potentially illegal images related to gore, violence, drugs, politics, and not-safe-for-work (NSFW) content, including CSAM.
There are some deep discussions, especially related to the CSAM that could be (and, even by that point, had been) created using Midjourney.
Immediately upon learning illegal CSAM was generated by a user, David asks him to send the illegal images to him.
It’s a huge topic of discussion, and David struggles with how to resolve it.
He does understand the gravity, and he mostly seems worried about getting media attention for anything created by his software.
And they are very clear about the distinctions between the NSFW room (legal but distasteful) and the War Room (purposely generating illegal content).
As for why? Well, in the dev team’s own words, it’s because of the way the system is built (a series of yes/no questions) and the combinatory nature of language. In short, if it knows what a child is and what porn is, it can combine them. The obvious answer is to remove the porn, but they’re working harder to do it via the CLIP model in various pass/fail ways that work a lot better today (September 8, 2023) than they did a year ago (September 8, 2022).
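To make the pass/fail idea concrete, here is a minimal sketch of how an embedding-based check could work in general. It is not Midjourney's filter, which is not public. The three-number vectors stand in for real CLIP embeddings, and the concept names, the `passes_filter` function, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_filter(embedding, blocked_concepts, threshold=0.8):
    """Pass only if the embedding is not too close to any blocked concept.

    `blocked_concepts` maps a label to a reference embedding. The threshold
    is an assumed value; a real system would tune it on labeled data.
    """
    return all(
        cosine_similarity(embedding, reference) < threshold
        for reference in blocked_concepts.values()
    )


# Toy reference vectors; a real system would embed text or images with CLIP.
blocked = {
    "blocked_concept_a": [0.0, 1.0, 0.0],
    "blocked_concept_b": [0.0, 0.0, 1.0],
}

print(passes_filter([1.0, 0.0, 0.0], blocked))  # far from both blocked concepts
print(passes_filter([0.0, 1.0, 0.1], blocked))  # close to blocked_concept_a
```

A check like this only catches what its reference embeddings cover, which is one reason removing the underlying material from the training data is the more direct fix.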
It’s not long before Emad Mostaque joins the conversation, as early as February 11, 2022, to discuss the future with LAION 5B, which contains 5.8 billion image-text pairs (the name implies 5 billion, while Emad rounds up to 6 billion). And it’s brutally obvious from the start that neither Emad nor David has any regard for intellectual property.
As early as February 2022, David Holz responds to a question about training on ArtStation by stating, “It looks like we will have commercial deals for special datasets.”
David also says in mid February 2022 that open sourcing Midjourney’s models is important to him, as is open-source software in general. As we’ve learned from Stable Diffusion and the Twitter algorithm, however, open sourcing without the weights does little to help understand the training involved.
Soon an entire thread is created to discuss the possibilities of utilizing the LAION datasets versus others. We know from living in the future that they ultimately chose to integrate LAION, but there are also some other datasets in the mix.
In fact, David reiterates multiple times that they will use private datasets, showing that he indeed understands the difference between “publicly available” data and data that is part of the “public domain.” Disney movies, for example, may be viewed in public, but you can’t just go into business selling Disney content without a license from Disney to do so.
Emad even explains that his team previously worked on Marvel movies, and with other video game and movie studios on AI voices.
During these early stages, the technical aspects are discussed very heavily. David needs user feedback, and the users have a lot of questions about what is happening under the hood and what the software is capable of. Both David and Emad (along with the development team following their lead) are very forthcoming about how everything works, including the refinements on the CLIP models.
Because they’re moving in real time and building while people continue learning about the software and joining the server, the conversation moves between a lot of different channels and threads. Here’s some more in #off-topic:
This is where Emad discusses how he’s already training on the LAION 5B dataset versus the Google CC datasets they’re using at the time. Keep in mind that this is Stability AI’s CEO discussing this in Midjourney’s server as they worked together on the development of both.
By March, they’re fully in the weeds, and we get a lot of peeks into what’s to come over the next 18 months. March is also when the crowd is big enough that Emad and David fully introduce themselves.
By April 2022, Brad Templeton enters the chat. If you’re unfamiliar, Brad is the former chairman of the Electronic Frontier Foundation (EFF), so he knows a thing or two about technology, data, and IP laws. The discussion initially starts around monetization and business models. It starts as a debate on whether Midjourney is selling GPU time (he calls it CPU) or images.
During the conversation, David maintains an almost childlike innocence by insisting he’s just playing games and having fun and that’s what he wants to preserve. It comes off as shallow and hollow (much like his promises to open source) given what we now know about his company’s $200 million revenue for 2023.
And when pressed on his fantastical vision of the future with realistic business advice, David falls into e/acc talking points while Emad jumps in to help him describe how their junk image creator will help humanity.
And after all of that talk, they finally get into the nitty gritty of the copyright problem that Midjourney will have.
It takes a minute, because David and Emad are in full apples to oranges comparison mode talking about how their computer “knows” things, but Brad inevitably pulls them into the meat and potatoes of the issue.
But as you can imagine, David is not happy with this serious legal talk blowing his buzz. Deep down he already knows (having discussed commercially licensing datasets two months prior in February) that he is doing something wrong. He knows that Brad Templeton is correct in what he’s saying. But he wants to blow it off and focus more on playing.
The talk of being an independent research lab is difficult for many to stomach. We now know how much money both Stability AI and Midjourney have made/raised, and it’s clear these are both billion-dollar companies that were built on the back of stolen labor from millions (potentially billions) of artists, photographers, and graphic designers.
By August 2022, the Midjourney Discord server would be full of millions of people and well on its way to becoming the largest server on Discord. Emad would release his competing Stable Diffusion app, and the generative AI image market would soon be in full swing.
It all started in this small Discord server where a small group of developers had access to the right tools at the right time with a huge cache of compressed data scraped and stolen from across the internet.