The Lost Penny Files – MidJourney’s Beginning

On September 5, 2023, Brian Penny was permanently banned from Twitter (now X). The ban removed his profile and posts, leaving many artists unable to find the investigations he had published on Midjourney, Stability AI, Adobe, Shutterstock, deepfake apps, and the rise of generative artificial intelligence.

We reached out to Penny and obtained backups of his original tweets, and he agreed to help us write this story to document an essential part of gAI history. What follows are the chat logs from Midjourney’s public Discord server, as first reported by Penny in August 2023.

TL;DR

  • Midjourney founder David Holz worked directly with Stability AI CEO Emad Mostaque on the development of both products.
  • The founders discuss technical, legal, and creative aspects of that development, including a timeline of the models used and how the services should be priced.
  • During early development, moderation teams red-teamed the technology by creating illegal imagery, including child sexual abuse material (CSAM).
  • Many of the original Google Colab notebooks used in the initial development are still online and publicly available.
  • All of the conversations below are available in Midjourney’s public Discord server, so the names of those involved have not been redacted.

Background

About MidJourney

Midjourney is a San Francisco-based AI service that generates images from text prompts, led by David Holz, co-founder of Leap Motion (now Ultraleap). The company began in closed beta with v1 in February 2022. It entered open beta on July 12, 2022 and is already profitable, with an expected $200 million in revenue for 2023.

Users interact with Midjourney via Discord bot commands, and it became the largest Discord server in Q4 2022. All images are stored in the cloud. A range of subscription options is available, and users must pay an additional monthly fee for private mode to keep their generated images from being publicly searchable online. As it is not open source, it’s not entirely clear what datasets and technologies Midjourney utilizes, although we can guesstimate based on publicly available information.

The company continually updates its algorithms, releasing new versions every few months. It is known for producing better-quality images than most of the competition by default, although its strict moderation and its use of copyrighted materials in its datasets (along with its encouragement of artist styles in prompts) make it a lightning rod in the AI industry.

About Stable Diffusion

Stable Diffusion is a text-to-image deep learning model released on August 22, 2022. It was developed by the CompVis group and Runway and funded by UK-based Stability AI, led by Emad Mostaque. It’s written in Python and runs on consumer hardware with at least 8 GB of GPU VRAM. The most current version as of this writing is Stable Diffusion XL (SDXL), which is freely available online and can be installed locally.

The model uses latent diffusion techniques for various tasks such as inpainting, outpainting, and generating image-to-image translations based on text prompts. The architecture involves a variational autoencoder (VAE), a U-Net block, and an optional text encoder. It applies Gaussian noise to a compressed latent representation and then denoises it to generate an image. Text encoding is done through a pretrained CLIP text encoder.
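
To make those moving parts concrete, here is a minimal sketch of running a publicly released Stable Diffusion checkpoint with the Hugging Face diffusers library. The checkpoint ID, prompt, and output file name are examples only; this is not Midjourney’s pipeline, which remains closed.

    import torch
    from diffusers import StableDiffusionPipeline

    # The pipeline bundles the VAE, U-Net, and CLIP text encoder described above.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # example checkpoint ID
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")  # a consumer GPU with roughly 8 GB of VRAM suffices

    # CLIP encodes the prompt, the U-Net iteratively denoises a random latent,
    # and the VAE decodes the final latent into a pixel image.
    image = pipe(
        "a lighthouse on a cliff at sunset, oil painting",
        num_inference_steps=30,
        guidance_scale=7.5,
    ).images[0]
    image.save("lighthouse.png")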

Stable Diffusion was trained on LAION-5B, a publicly available dataset containing over 5 billion image-text pairs. Stability AI raised $101 million in a funding round in October 2022. The model is considered lightweight by 2022 standards, with 860 million parameters in the U-Net and 123 million in the text encoder.

The Penny Files

You may not realize it, but before Midjourney kicked off the generative AI boom, it was a small team working in public in a modest Discord server. Here are screenshots of the public discussions within that server between David Holz, Emad Mostaque, and several Midjourney developers, including Daniel Russell and Somnai.

Discord profiles for Daniel Russell, Emad Mostaque, and David Holz

It starts on February 3, 2022, with developer Somnai referencing the “mutate” function, which was later renamed “upscale.”

Somnai — 02/03/2022 1:31 AM
Could mutate reply to the original image so it's easy to see the change?
Screenshot of MidJourney Discord “Welcome to #Discussion!”

That same morning, MJ developer Jack mentions cc12 as the dataset used in MidJourney v1. CC12M is a Conceptual Captions (CC) dataset from Google Research containing 12 million image-text pairs meant for vision-and-language pretraining. Do not let the name fool you: CC does not mean these images carry a Creative Commons license.

jack — 02/03/2022 10:13 AM
seems like either cc12 or the current set of tweaks really like adding extra spindly tendrils to objects

Almost immediately (within a week), we’re introduced to the war-game channel (also referred to as the war room). This is the hidden channel where Midjourney staff and early testers generated disturbing and potentially illegal images involving gore, violence, drugs, politics, not-safe-for-work (NSFW) content, and CSAM.

Somnai — 02/09/2022 10:55 PM
How good are visual filters for images these days? Could you make the channel not have any previews and assess final images before posting?
DavidH — 02/09/2022 10:55 PM
they're pretty good at like porn / not porn
clip can be used for filters like gore / not gore
honestly, last time we did a war-game channel most of the worst images used obviously bad words
we need to do another war game this week where we see how easily people can make bad images without being allowed to use bad words
it's a interesting game
David Holz discussing the war-game channel where illegal images are made

There are some deep discussions, particularly about the CSAM that could be (and, even by that point, had been) created using MidJourney.

𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/09/2022 10:56 PM
This seems solvable with extracting sentiment from the phrase, if (sexy words) and (anything child), quit()
DavidH — 02/09/2022 10:56 PM
yea once we get to fuzzy filters we can in theory do that
we didn't implement a fuzzy filter yet
if anyone wants to help us do that we would be grateful and happy to use it
i think it's intellectually very interesting though to see if we can get it to make bad images without using obviously filterable words
i've only seen one example so far of someone doing this and it was very much by mistake
deKxi — 02/09/2022 10:58 PM
ive had it happen a few times accidentally in ⁠No Access with what I thought was more innocent of a phrase than CLIP
David discussing possibilities of censoring CSAM February 9, 2022
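
The “fuzzy filter” David mentions is never shown in the logs, but the general idea, catching banned concepts in a prompt even when they are misspelled or obliquely worded, can be illustrated with a short hypothetical sketch. The blocklist and threshold below are placeholders for illustration, not Midjourney’s actual rules.

    import difflib

    # Hypothetical blocklist -- placeholder terms for illustration only.
    BLOCKED_TERMS = {"gore", "nsfw", "violence"}

    def fuzzy_filter(prompt: str, threshold: float = 0.75) -> bool:
        """Return True if the prompt should be blocked.

        An exact-match word filter misses close misspellings (e.g. "g0re"),
        so each prompt token is compared to the blocklist with a similarity
        ratio instead of a strict equality check.
        """
        for token in prompt.lower().split():
            for term in BLOCKED_TERMS:
                if difflib.SequenceMatcher(None, token, term).ratio() >= threshold:
                    return True
        return False

    print(fuzzy_filter("a battlefield scene, extreme g0re"))  # True (fuzzy match)
    print(fuzzy_filter("a peaceful meadow at sunset"))        # False

As the chat makes clear, even a fuzzy prompt filter only catches wording, not intent, which is why the team keeps circling back to image-level filtering with CLIP.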

Immediately upon learning that CSAM had been generated by a user, David asks him to send the illegal images to him.

deKxi — 02/09/2022 10:58 PM
ive had it happen a few times accidentally in ⁠No Access with what I thought was more innocent of a phrase than CLIP
DavidH — 02/09/2022 10:58 PM
maybe you can send those to me?
danielrussruss — 02/09/2022 10:58 PM
I think it's relatively easy if you describe a scene, but I guess we'll find out next time
𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/09/2022 10:58 PM
what would need to be done
David Holz asking a user to send him illegal CSAM

It’s a huge topic of discussion, and David struggles with how to resolve it.

DavidH — 02/09/2022 11:00 PM
at some point we're going to have to do a image filter, but we should only filter images for things where we 'cant' easily filter them for their word content
so far when i tell people to "make it draw bad images on purpose" 98% of things are obviously filterable via wrods
and honestly a big issue here is filtering bad intent versus bad content
i mean bad content too to the extent that it can make people feel physically ill or mentally disturbed
Somnai — 02/09/2022 11:01 PM
Problem comes from bad actors learning from each other the same way we do when perfecting prompts when you open it up more
DavidH — 02/09/2022 11:02 PM
yea but like....
David Holz discussing CSAM censorship in the early days of Midjourney

He does understand the gravity of the situation, but he mostly seems worried about getting media attention for anything created by his software.

Somnai — 02/09/2022 11:18 PM
nsfw channel, cool whatever ill just post there if i want to try something edgy
war-room, im going hard
danielrussruss — 02/09/2022 11:19 PM
I think it's scary for David to think about leaving that around while he's asleep 😄
DavidH — 02/09/2022 11:19 PM
it should be scary right? 😅
we already got a buzzfeed article by mistake lol
Somnai states his intentions for the war room, and David expresses regret over a Buzzfeed article

And they are very clear about the distinction between the NSFW room (legal but distasteful) and the War Room (purposely generating illegal content).

𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/09/2022 11:20 PM
war room can be a special moment where we try to break the system
uncensored is just people not doing anything illegal
deKxi — 02/09/2022 11:20 PM
Ultimately this is why TOS and moderation exists, while people who are 'in' understand the context and nuance, its not the 'in' people that drag the media in with a sensationalized wrong take
DavidH — 02/09/2022 11:20 PM
i like the aspirational goal of a 24/7 room with moderation for people to poke the boundaries in some productive and positively vibed curious way
Reiterating the difference between NSFW and illegal

As for why? In the dev team’s own words, it’s because of the way the system is built (a series of yes/no questions) and the combinatory nature of language. In short, if the model knows what a child is and what porn is, it can combine them. The obvious answer is to remove the porn, but instead they work on filtering it via the CLIP model in various pass/fail ways, an approach that works far better today (September 8, 2023) than it did a year ago (September 8, 2022). A sketch of that contrastive filtering idea follows the chat log below.

danielrussruss — 02/11/2022 5:14 PM
@DavidH / @𝒿𝑜𝓊𝓁𝑒 ⚡ (gosh is there any easy way to type your name?): I think we'd want to use the sort of CLIP similarity comparison that it was intended for, contrastive, versus just looking at the raw similarity between a single word and nsfw
So provide a list of potential labels: SFW, NSFW, Suggestive, Graphic Injury
Here's how powerful it can be if you use contrastive labels: [image attached]
𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/11/2022 5:18 PM
so if match ["child pornography", "other illegal things"] && confidence > threshold, quit()
danielrussruss — 02/11/2022 5:18 PM
I believe this is the sort of system that they use to calculate tags at https://same.energy/ (Same Energy uses a model very similar to CLIP)
ColtonD — 02/11/2022 5:19 PM
that general stuff is already filtered out of Clip though right?
danielrussruss — 02/11/2022 5:19 PM
No
ColtonD — 02/11/2022 5:19 PM
oof
I mean i know NSFW isnt but damn
danielrussruss — 02/11/2022 5:19 PM
I mean, language is infinite and combinatory
𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/11/2022 5:19 PM
i get the feeling that these systems aren't....
beat me to it
danielrussruss — 02/11/2022 5:19 PM
if a model knows what a child is, and what porn is...
𝒿𝑜𝓊𝓁𝑒 ⚡ — 02/11/2022 5:19 PM
^
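
The contrastive labeling Daniel describes is essentially zero-shot classification with CLIP. Here is a minimal sketch using the openly released OpenAI CLIP weights through the Hugging Face transformers library; the label set and the file name are illustrative, and this is not Midjourney’s actual filter.

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    # Contrastive labels, as suggested in the chat log above (illustrative only).
    LABELS = ["SFW", "NSFW", "Suggestive", "Graphic Injury"]

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def classify(image_path: str) -> dict:
        """Score an image against every label at once; comparing the labels
        to each other is the 'contrastive' part Daniel is pointing at."""
        image = Image.open(image_path)
        inputs = processor(text=LABELS, images=image, return_tensors="pt", padding=True)
        logits = model(**inputs).logits_per_image  # shape: (1, len(LABELS))
        probs = logits.softmax(dim=-1)[0]
        return dict(zip(LABELS, probs.tolist()))

    scores = classify("generation.png")  # hypothetical output file
    if max(scores, key=scores.get) != "SFW":
        print("flag for review:", scores)

A production filter would also need calibrated thresholds and human review; the point of the sketch is only that a handful of contrastive labels goes much further than comparing an image to the single word "nsfw."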

It’s not long before Emad Mostaque joins the conversation, as early as February 11, 2022, to discuss the future with LAION-5B, which contains 5.8 billion image-text pairs (the name implies 5 billion, while Emad rounds up to 6 billion). And it’s brutally obvious from the start that neither Emad nor David has any regard for intellectual property.

Emad — 02/11/2022 9:34 AM
New LAION will be 6b images, good base we're putting together now :peepo_lightstick:
Should probably figure out somewhere else to put it than AWS given egress :ultraberk:
Playing with image correction, which Pratt adapted from bots-1
do you like? Soft or more rugged

As early as February 2022, David Holz responds to a question about training on ArtStation by stating, “It looks like we will have commercial deals for special datasets.”

DavidH — 02/11/2022 2:32 PM
It took a few months
Yah we can definitely train more models, we just need more people working on it
ColtonD — 02/11/2022 2:33 PM
you planning on training on artstation or are you steering clear of potential legal stuff there lol
DavidH — 02/11/2022 2:33 PM
It looks like we will have commercial deals for special datasets
Emad — 02/11/2022 2:33 PM
I should probably get everyone together given we're lining up cfg, dall-e, nuwa, clip and all sorts of models lol. Maybe best to roadmap it all
David Holz admitting they are working on commercial deals for special datasets

David also says in mid-February 2022 that open-sourcing Midjourney’s models is important to him, as is open-source software in general. As we’ve learned from Stable Diffusion and the Twitter algorithm, however, open-sourcing code without the weights does little to help us understand the training involved.

DavidH — 02/11/2022 2:36 PM
We're gonna open source a lot we just don't have exact timelines for what and when
the market is too dynamic to figure that out
ColtonD — 02/11/2022 2:37 PM
Better to have a product first then open source it than to open source it and have someone beat you to making a product haha
DavidH — 02/11/2022 2:37 PM
But I've open sourced like 500k-3mil lines of code in the last decade (depending on how you count it) so open source is def important to me
David discussing open sourcing MidJourney’s models (which they have yet to do)

Soon an entire thread is created to discuss the possibilities of utilizing the LAION datasets versus others. We know from living in the future that they ultimately chose to integrate LAION, but there are also some other datasets in the mix.

LAION filtering & other datasets thread started by MJ developer Jack, February 13, 2022

In fact, David reiterates multiple times that they will use private datasets, showing that he indeed understands the difference between “publicly available” data and data that is part of the “public domain.” Disney movies, for example, may be viewed in public, but you can’t just go into business selling Disney content without a license from Disney to do so.

Emad — 02/11/2022 2:38 PM
I think realistically market will be a mix of open and not open source models. From our side (stability.ai) we have a multi year commitment to open source all the models by default. You can do different/better models potentially if you don't have to open source as you don't need to worry so much about memorization when dealing with private datasets. ultimately it comes down to user experience and there is plenty of room
DavidH — 02/11/2022 2:38 PM
Yah we definitely are going to train on some private datasets
Emad and David discussing the future of generative AI art on February 11, 2022

Emad even explains that members of his team previously worked on Marvel movies, and that they have worked with video game and movie studios on AI voices.

DavidH — 02/11/2022 2:40 PM
Art tends to be understaffed always so this helps that
Emad — 02/11/2022 2:40 PM
Yeah we saw similar things working with video game & movie studios on AI voices. A bunch of our team worked on the Marvel movies etc and the budgets are huge for graphics, they'll still need loads of good people but they'll do less boring stuff
Emad discussing prior work with Marvel on AI voices

During these early stages, the technical aspects are discussed heavily. David needs user feedback, and the users have a lot of questions about what is happening under the hood and what the system is capable of. Both David and Emad (along with the development team following their lead) are very forthcoming about how everything works, including the refinements to the CLIP models.

DavidH — 02/15/2022 6:07 PM
yea i don't think anyone can quite imagine what this will feel like when we finally take all the duct tape out of the system
thank gosh for all these duct taped proof of concepts though
ColtonD — 02/15/2022 6:17 PM
what training data would u use for ur own clip? all of laion dataset? or ur own one
DavidH — 02/15/2022 6:19 PM
we have some private data partners as well as some open ones like laion
David Holz explaining they use a combination of private data partners and open ones like LAION

Because they’re moving in real time and building while people continue learning about the software and joining the server, the conversation moves between a lot of different channels and threads. Here’s some more in #off-topic:

I wrote a thread on representation in image models https://twitter.com/EMostaque/status/1495323912951021568?s=20&t=46km6npWisUWgaM2RPIw2w
Emad linking to his Twitter account

This is where Emad discusses how he’s already training on the LAION-5B dataset versus the Google CC datasets they’re using at the time. Keep in mind that this is Stability AI’s CEO discussing this in Midjourney’s server as they worked together on the development of both products.

Take it down to cc10m_1 or cc9m_1 😄
sorting LAION 5bn now, I reckon once we train on that you'll be able to distill it to like 90% proof
A100s go brrr
Emad discussing training on LAION models February 22, 2022
danielrussruss — 03/18/2022 11:43 PM
Yeah, and sure would be weird if all that content we deleted here in the Discord because we considered it not allowed, now can't be hidden/deleted by the users who (accidentally or not) created them
Emad — 03/18/2022 11:44 PM
The new LAION dataset we are tidying is 5bn, 243 Pb of data. I’m not sure more is more but we will find out soon
Emad and Daniel discussing model development

By March, they’re fully in the weeds, and we get a lot of peeks into what’s to come over the next 18 months. March is also when the crowd is big enough that Emad and David fully introduce themselves.

Emad — 03/19/2022 2:35 AM
I'll chuck in my 2c. Some of you know me, but I'm the biggest backer of LAION, Eleuther AI and large scale image models in particular funding both researchers, dataset creation and (very) large scale compute with output that is MIT-licensed/open source. I helped fund the beta expansion and have been speaking closely with the team and can attest that this is an experiment to try to figure out how to scale the deployment of these models and the better ones we are working on in a collaborative and aligned way versus an extractive one. Even with what there is now it would be easy to get VC capital and scale and sell to big tech, but I'm 100% that won't happen. Lots of unknowns but team I know is open to feedback and wants to build aligned, value-additive tools versus extractive ones
Emad introducing himself in Midjourney Discord March 19, 2022
mrdoob — 03/19/2022 2:39 AM
Wouldn't mind paying for a subscription.
DavidH — 03/19/2022 2:40 AM
We want to send a survey out soon to gather people's feelings about different possible ways of paying for things
Its very important to watch people use it for free to see what kind of usage patterns are "normal" in a unrestricted environment
Without that we would have no way to estimate cost
But there is significant cost per user right now
Our goal is to drive the price down to zero but at this moment it's impossible for this to be free for the number of people who deserve to use it
The start of the paid subscription talk
David Holz introducing himself following Emad’s lead March 19, 2022

By April 2022, Brad Templeton enters the chat. If you’re unfamiliar, Brad is the former chairman of the Electronic Frontier Foundation (EFF), so he knows a thing or two about technology, data, and IP laws. The discussion initially centers on monetization and business models, starting as a debate over whether Midjourney is selling GPU time (he calls it CPU) or images.

DavidH — 04/13/2022 4:46 PM
I would have to charge you like 2$ per high res image
Or 5$
Brad Templeton — 04/13/2022 4:46 PM
But what are you selling? Is it images or CPU? If it's images, charge for images. If it's CPU, charge for CPU. $2/high res might be reasonable. Or $1.50 plus 10 cents/low-res or whatever formula works.
DavidH — 04/13/2022 4:47 PM
70 percent of beta testers don't want incremental billing
They don't want to be metered
Brad Templeton — 04/13/2022 4:47 PM
My advice, though, is to think about what customers want, and sell it to them at a profit. If you have a 70/30 split you probably want to offer both.
EFF chair emeritus Brad Templeton talking to David Holz April 13, 2022

During the conversation, David maintains an almost childlike innocence by insisting he’s just playing games and having fun and that’s what he wants to preserve. It comes off as shallow and hollow (much like his promises to open source) given what we now know about his company’s $200 million revenue for 2023.

Brad Templeton — 04/13/2022 4:51 PM
Neither. I am looking at you as a business. If I were an investor, I would say, what is it you sell? What is it customers are coming to you for? I think it's images they can use, and a little bit of amusement and wonder. In beta, more of the latter. I don't think customers are coming to you (once you are in production) to do low-res imagines that do not produce an image they can use.
DavidH — 04/13/2022 4:53 PM
A lot of people are just having fun right now
Brad Templeton — 04/13/2022 4:53 PM
Absolutely, it's very fun.
DavidH — 04/13/2022 4:53 PM
If 10 million people want to just have fun the internet will literally run out of gpus
DavidH — 04/13/2022 4:54 PM
The internet will run out of computers before we get through the fun stage tbh
That's the strange reality to this atm
Brad Templeton — 04/13/2022 4:54 PM
So the question to ask yourselves is, is this a game, or a productive tool? I don't want to say it can't be a game, or that games can't be a business
DavidH — 04/13/2022 4:54 PM
I don't think it can be productive until people play with it first
Brad Templeton — 04/13/2022 4:55 PM
As a potential customer (not that every customer is like me) I like to play but in the end I want images.
DavidH — 04/13/2022 4:55 PM
And again, just getting people to play with it at scale will make the internet run out of servers, that's the actual math
I'm not making that up
The server utilization here is higher than any consumer toy ever
Brad Templeton — 04/13/2022 4:56 PM
So you may need to design a system which allows play and use. And I see you fear that if you only charge for images, you will waste endless CPU time on people playing. There can be ways to try to segment the market, one pricing plan for players, another for users.
DavidH — 04/13/2022 4:56 PM
It's more like.... If millions of people want to play with this, the cultural force of that washes everything else away

And when his fantastical vision of the future is met with realistic business advice, David falls into e/acc talking points while Emad jumps in to help him describe how their junk image creator will help humanity.

DavidH — 04/13/2022 4:59 PM
But once there's competition I think we are not interested in being in the stock photo or clip art biz
Brad Templeton — 04/13/2022 5:00 PM
No brain reader, but you could build an interface that helps people improve their query before sending it in. Not so free form (though that is fun.) Not stock photo of course, that's the point of your tool, it gives people the thing they are imagining, not what somebody else did. You probably have to remove all the "in the style of <artist>" stuff on artists that are still in copyright if you are doing it by sucking in a lot of their images, though. I don't know what exact technique you use, I am just guessing.

DavidH — 04/13/2022 5:01 PM
I think we're trying to be more playful ATM while things are new
A lot of people have never touched a large ai system before and this is a emotional thing for them
Brad Templeton — 04/13/2022 5:02 PM
That's perfectly cool. I like it as a game. But because it looked immediately useful I presumed your target was that. I may be wrong.
Emad — 04/13/2022 5:02 PM
Goal isn’t to be a company that extracts value from users. Goal is to scale this sustainably while not getting misaligned
Midjourney
An independant research lab. Exploring new mediums of thought. Expanding the imaginative powers of the human species.
David wants to be playful and Emad wants to advance the human race

Brad Templeton — 04/13/2022 5:03 PM
I would not express it as "extract value from users." I would express it as "provides value to users."
Emad — 04/13/2022 5:03 PM
https://www.midjourney.com/
DavidH — 04/13/2022 5:03 PM
I think this is not a picture maker
That's not the right way to think of it
Brad Templeton — 04/13/2022 5:03 PM
Then disregard my approach. But if it's not a picture maker, do you have a different high concept expression of what you want it to be?

DavidH — 04/13/2022 5:04 PM
And the world with cars is simply a different world we can't fully understand yet
Brad Templeton — 04/13/2022 5:05 PM
Understanding that different world is what I do for a living.
DavidH — 04/13/2022 5:05 PM
Its like saying Instagram is a photo filter app
Or Snapchat is for sending nudes
This is a new medium that no one really understands yet
Brad Templeton — 04/13/2022 5:06 PM
Well, I am not sure snapchat was ever very clear on what it was for...
DavidH — 04/13/2022 5:06 PM
Snapchat is a ephemeral first social network
I don't think most people still understand how it's probably the most humanistic in that regard

And after all of that talk, they finally get into the nitty-gritty of the copyright problem that Midjourney will have.

DavidH — 04/13/2022 5:13 PM
But it seems like we should probably avoid commercial use for the first stage
Brad Templeton — 04/13/2022 5:13 PM
You are going to have a copyright problem.

It takes a minute, because David and Emad are in full apples to oranges comparison mode talking about how their computer “knows” things, but Brad inevitably pulls them into the meat and potatoes of the issue.

DavidH — 04/13/2022 5:13 PM
There is no image database used in the process of making an image
danielrussruss — 04/13/2022 5:13 PM
These models run without internet, it's like Pandora's box sitting in your machine
DavidH — 04/13/2022 5:14 PM
Yah exactly. No internet. No database.
Emad — 04/13/2022 5:14 PM
Images that went into model all open access too
DavidH — 04/13/2022 5:14 PM
It just "knows" all these things.


But as you can imagine, David is not happy with this serious legal talk blowing his buzz. Deep down he already knows (having discussed commercial dataset licensing two months earlier, in February) that he is doing something wrong. He knows that Brad Templeton is correct in what he’s saying. But he wants to blow it off and focus more on playing.


The talk of being an independent research lab is difficult for many to stomach. We now know how much money both Stability AI and Midjourney have made/raised, and it’s clear these are both billion-dollar companies that were built on the back of stolen labor from millions (potentially billions) of artists, photographers, and graphic designers.

By August 2022, the Midjourney Discord server would be home to millions of people and well on its way to becoming the largest server on Discord. Emad would release his competing Stable Diffusion model, and the generative AI image market would soon be in full swing.

It all started in this small Discord server, where a handful of developers had access to the right tools at the right time, along with a huge cache of compressed data scraped and stolen from across the internet.

If you want to support independent journalism and research, please consider an optional paid subscription plan.