Post

Conversation

I recently participated in a NYT story about fair use and generative AI, and why I'm skeptical "fair use" would be a plausible defense for a lot of generative AI products. I also wrote a blog post (suchir.net/fair_use.html) about the nitty-gritty details of fair use and why I believe this. To give some context: I was at OpenAI for nearly 4 years and worked on ChatGPT for the last 1.5 of them. I initially didn't know much about copyright, fair use, etc. but became curious after seeing all the lawsuits filed against GenAI companies. When I tried to understand the issue better, I eventually came to the conclusion that fair use seems like a pretty implausible defense for a lot of generative AI products, for the basic reason that they can create substitutes that compete with the data they're trained on. I've written up the more detailed reasons for why I believe this in my post. Obviously, I'm not a lawyer, but I still feel like it's important for even non-lawyers to understand the law -- both the letter of it, and also why it's actually there in the first place. That being said, I don't want this to read as a critique of ChatGPT or OpenAI per se, because fair use and generative AI is a much broader issue than any one product or company. I highly encourage ML researchers to learn more about copyright -- it's a really important topic, and precedent that's often cited like Google Books isn't actually as supportive as it might seem. Feel free to get in touch if you'd like to chat about fair use, ML, or copyright -- I think it's a very interesting intersection. My email's on my personal website.