Sure, GitHub's AI-assisted Copilot writes code for you, but is it legal or ethical?
GitHub Copilot, Microsoft's AI pair-programming service, has been out for less than a month now, but it's already wildly popular. In projects where it's enabled, GitHub states nearly 40% of code is now being written by Copilot. That's over a million users and millions of lines of code.
Copilot pairs an editor extension with a back-end service that suggests code to developers right in their editors. It supports integrated development environments (IDEs) such as Microsoft's Visual Studio Code, Neovim, and the JetBrains IDEs. Within these, the AI suggests the next line of code as developers type.
The program can suggest complete methods and complex algorithms alongside boilerplate code and assistance with unit testing. For all intents and purposes, the back-end AI acts as a pair-programming assistant. Developers are free to accept, reject, or edit Copilot's suggestions. If you're a new programmer, Copilot can also interpret simple natural-language commands and translate them into any of a dozen programming languages, including Python, JavaScript, TypeScript, Ruby, and Go.
Microsoft, GitHub, and OpenAI collaborated to build the program, which is based on OpenAI's Codex. Codex was trained on billions of lines of publicly available source code -- including code in public repositories on GitHub -- and on natural language, which means it can understand both programming and human languages.
It sounds like a dream come true, doesn't it? There's a rather large fly in the ointment, though. There are legal questions about whether OpenAI had the right to use that open source code as the foundation of a proprietary service. And, even if it is legal, can Microsoft, OpenAI, and GitHub, and thus Copilot's users, ethically use the code it "writes?"
According to Nat Friedman, GitHub's CEO when Copilot was released in beta, GitHub is legally in the clear because "training ML systems on public data is fair use." But, he also noted, "IP [intellectual property] and AI will be an interesting policy discussion around the world in the coming years." You can say that again.
Others vehemently disagree. The Software Freedom Conservancy (SFC), a non-profit organization that provides legal services for open source software projects, holds that Codex was trained exclusively on GitHub-hosted projects, many of which are licensed under copyleft licenses. As Bradley M. Kuhn, the SFC's Policy Fellow and Hacker-in-Residence, stated, "Most of those projects are not in the 'public domain,' they are licensed under Free and Open Source Software (FOSS) licenses. These licenses have requirements including proper author attribution and, in the case of copyleft licenses, they sometimes require that works based on and/or that incorporate the software be licensed under the same copyleft license as the prior work. Microsoft and GitHub have been ignoring these license requirements for more than a year."
Therefore, the SFC bites the bullet and urges developers not only to avoid using Copilot but to stop using GitHub completely. They know that won't be easy. Thanks to Microsoft and GitHub's "effective marketing, GitHub has convinced Free and Open Source Software (FOSS) developers that GitHub is the best (and even the only) place for FOSS development. However, as a proprietary, trade-secret tool, GitHub itself is the very opposite of FOSS," added Kuhn.
Other people land between these extremes.
For example, Stefano Maffulli, executive director of the Open Source Initiative (OSI), the organization that oversees open source licenses, understands "why so many open source developers are upset: They have made their source code available for the progress of computer science and humanity. Now that code is being used to train machines to create more code -- something the original developers never envisioned nor intended. I can see how it's infuriating for some."
That said, Maffulli thinks, "Legally, it appears that GitHub is within its rights." However, it's not worth getting "lost in the legal weeds discussing if there is an open source license issue here or a copyright issue. This would miss the wider point. Clearly, there *is* a fairness issue that affects the whole of society, not just open source developers."
Maffulli argues:
Copilot has exposed developers to one of the quandaries of modern AI: the balance of rights between individuals participating in public activities on the internet and in social networks and the corporations using 'user-generated content' to train a new almighty AI. For many years we knew that uploading our pictures, our blog posts, and our code on public internet sites meant we'd be losing some amount of control over our creations. We created norms and licenses (open source and Creative Commons, for example) to balance control and publicity between creators and society as a whole. How many billions of Facebook users realized that their pictures and tags were being used to train a machine that would recognize them in the streets protesting or shopping? How many of those billions would choose to participate in this public activity if they understood that they were training a powerful machine with unknown reach into our private lives?
We can't expect organizations to use AI in the future with "goodwill" and "good faith," so it's time for a broader conversation about AI's impact on society and on open source.
That's an excellent point. Copilot is the tip of an iceberg of a much larger issue, and the OSI won't be ignoring it. The organization has spent several months building a virtual event called Deep Dive: AI, which it hopes will launch a conversation about the legal and ethical implications of AI and what's acceptable for AI systems to be "open source." It comprises a podcast series, which will launch soon, and a virtual conference, which will be held in October 2022.
Focusing more on the legal elements, well-known open-source lawyer and OSS Capital General Partner Heather Meeker believes Copilot is legally in the clear.
People get confused when a body of text like software source code -- which is a copyrightable work of authorship -- is used as data by other software tools. They might think that the results produced by an AI tool are somehow "derivative" of the body of text used to create it. In fact, the licensing terms for the original source code are probably irrelevant. AI tools that do predictive writing are, by definition, suggesting commonly used phrases or statements when the context makes them appropriate. This would likely fall under the fair use or scènes à faire defenses to copyright infringement -- if it were infringement in the first place. It's more likely that these commonly used artifacts are small code snippets that are entirely functional in nature and, therefore, when used in isolation, don't enjoy copyright protection at all.
Meeker noted that even the Free Software Foundation (FSF) doesn't claim that what Copilot does is copyright infringement. As John A. Rothchild, Professor of Law at Wayne State University, and Daniel H. Rothchild, Ph.D. candidate at the University of California at Berkeley, said in their FSF paper, "The use of Copilot's output by its developer-customers is likely not infringing." That, however, "does not absolve GitHub of wrongdoing, but rather argues that Copilot and its developer-customers likely do not infringe developers' copyrights." Instead, the FSF argues that Copilot is immoral because it is Software as a Service (SaaS).
Open source legal expert and Columbia law professor Eben Moglen thinks Copilot doesn't face serious legal problems, but GitHub and OpenAI do need to answer some concerns.
That's because, Moglen said, "like photocopiers, or scissors and paste, code recommendation programs can result in copyright infringement. Therefore, parties offering such recommendation services should proceed in a license-aware fashion so that users incorporating recommended code in their projects will be informed in a granular fashion of any license restrictions on recommended code. Ideally, users should have the ability to filter recommendations automatically to avoid the unintentional incorporation of code with conflicting or undesired license terms." At this time, Copilot doesn't do this.
Because many "free software programmers are uncomfortable with code they have contributed to free software projects being incorporated in a GitHub code database through which it is distributed as snippets by the Copilot recommendation engine at a price," said Moglen, GitHub should provide "a simple, persistent way to sequester their code from Copilot." If GitHub doesn't, it has given programmers a reason to move their projects elsewhere, as the SFC is suggesting. Moglen therefore expects GitHub to offer a way to protect concerned developers from having their code vacuumed into the OpenAI Codex.
So, what happens now? Eventually, the courts will decide. Besides open source and copyright issues, there are still larger legal issues over the use of "public" data by private AI services.
As Maffulli said, "We need to better understand the needs of all actors affected by AI in order to establish a new framework that will embed the value of open source into AI, providing the guardrails for collaboration and fair competition to happen at all levels of society."
Finally, it should be noted that GitHub isn't the only company using AI to help programmers. Google's DeepMind has its own AI developer system AlphaCode, Salesforce has CodeT5, and there's also the open-source PolyCoder. In short, Copilot isn't the only AI coder. The issue of how AI fits into programming, open-source, and copyright is much bigger than the simplistic "Microsoft is bad for open source!"
AI is getting scary good at finding hidden software bugs - even in decades-old code
Follow ZDNET: Add us as a preferred source on Google.
ZDNET's key takeaways
- AI is proving better than expected at finding old, obscure bugs.
- Unfortunately, AI is also good at finding bugs for hackers to exploit.
- In short, AI still isn't ready to replace programmers or security pros.
In a recent LinkedIn post, Microsoft Azure CTO Mark Russinovich said he used Anthropic's new AI model Claude Opus 4.6 to read and analyze assembly code he'd written in 1986 for the Apple II 6502 processor.
Claude didn't just explain the code; it performed what he called a "security audit," surfacing subtle logic errors, including one case where a routine failed to check the carry flag after an arithmetic operation.
That's a classic bug that had been hiding, dormant, for decades.
The good news and the bad news
Russinovich's experiment is striking because the code predates today's languages, frameworks, and security checklists. However, the AI was able to reason about low-level control flow and CPU flags to point out real defects. For veteran developers, it's a reminder that long-lived codebases may still harbor bugs that conventional tools and developers have learned to live with.
Yet despite the progress, some experts believe this experiment raises concerns.
As Matthew Trifiro, a veteran go-to-market engineer, said: "Oh, my, am I seeing this right? The attack surface just expanded to include every compiled binary ever shipped. When AI can reverse-engineer 40-year-old, obscure architectures this well, current obfuscation and security-through-obscurity approaches are essentially worthless."
Trifiro has a point. On the one hand, AI will help us find bugs so we can fix them. That's the good news. On the other hand, and here's the bad news, AI can also be used to break into programs that are still in use but no longer patched or supported.
As Adedeji Olowe, founder of Lendsqr, pointed out, "This is scarier than we're letting on. Billions of legacy microcontrollers exist globally, many likely running fragile or poorly audited firmware like this."
He continued: "The real implication is that bad actors can send models like Opus after them to systematically find vulnerabilities and exploit them, while many of these systems are effectively unpatchable."
LLMs complementing detector tools
Traditional static analysis tools such as SpotBugs, CodeQL, and Snyk Code scan source code for patterns associated with bugs and vulnerabilities. These tools excel at catching well-understood issues, such as null-pointer dereferences, common injection patterns, and API misuse, and they do so at scale across large Java and other-language codebases.
Now, it has become clear that large language models (LLMs) can complement those big detector tools. In a 2025 head-to-head study, LLMs like GPT-4.1, Mistral Large, and DeepSeek V3 were as good as industry-standard static analyzers at finding bugs across multiple open-source projects.
How do these models do it? Instead of asking, "Does this line violate rule X?", the LLM effectively asks, "Given what this system is supposed to do, where are the failure modes and attack paths?" Combined with traditional scanners, that makes a powerful pairing.
For example, Anthropic's Claude Opus 4.6 AI is helping clean up Firefox's open-source code. According to Mozilla, Anthropic's Frontier Red Team found more high-severity bugs in Firefox in just two weeks than people typically report in two months. Mozilla proclaimed, "This is clear evidence that large-scale, AI-assisted analysis is a powerful new addition to security engineers' toolbox."
Anthropic isn't the only organization using AI engines to find bugs in code. Black Duck's Signal product, for instance, combines multiple LLMs, Model Context Protocol (MCP) servers, and AI agents to autonomously analyze code in real time, detect vulnerabilities, and propose fixes.
Meanwhile, security consultancies, such as NCC Group, are experimenting with LLM-powered plugins for software reverse-engineering tools, like Ghidra, to help discover security problems, including potential buffer overflows and other memory-safety issues that can be hard for people to spot.
Passing security checks to AI
These successes don't mean we're ready to pass our security checks to AI. Far from it.
Researchers have found that LLM-driven bug finding is not a drop-in replacement for mature static analysis pipelines. Studies comparing AI coding agents to human developers show that while AI can be prolific, it also introduces security flaws at higher rates, including unsafe password handling and insecure object references.
CodeRabbit found "that there are some bugs that humans create more often and some that AI creates more often. For example, humans create more typos and difficult-to-test code than AI. But overall, AI created 1.7 times as many bugs as humans. Code generation tools promise speed but get tripped up by the errors they introduce. It's not just little bugs: AI created 1.3-1.7 times more critical and major issues."
You can also ask Daniel Stenberg, creator of the popular open-source data transfer program cURL. He's loudly and legitimately complained that his project has been flooded with bogus, AI-written security reports that drown maintainers in pointless busywork.
The moral of the story
AI, in the right hands, makes a great assistant, but it's not ready to be a top programmer or security checker. Maybe someday, but not today. So, use AI with existing tools carefully, and your programs will be far more secure than they are currently.
As for old code, well, that's a real worry. I foresee people replacing firmware-powered devices due to realistic fears that they'll soon be compromised.
90% of AI projects fail - here are 3 ways to ensure yours doesn't
ZDNET's key takeaways
- Boards are starting to ask tougher questions about money sunk into AI.
- Interrogations into the value of AI projects are an opportunity to re-focus.
- Concentrate on capacity building, strong partnerships, and co-development.
The amount of money that organizations invest in AI shows no signs of abating. Worldwide spending on AI is forecast to reach $2.52 trillion in 2026, a 44% year-over-year increase, according to tech analyst Gartner.
However, there's a twist in the tale. With AI sliding into the trough of Gartner's Hype Cycle for Emerging Technologies, boards are starting to ask tougher questions about the money spent on AI explorations, and digital and business professionals will be expected to turn that spending into tangible benefits.
ZDNET reported last year that several areas of AI have slipped into the Trough of Disillusionment, where interest in a technology wanes because explorations fail to deliver promised returns. That's exactly where generative AI finds itself right now, with hype fading and business leaders questioning the ROI.
Many organizations have barely found a way to make the most of the technology. Now, interest in gen AI appears to be waning, and the bubble surrounding the emerging technology could be about to burst. Sounds like bad news, right?
Yet John-David Lovelock, chief forecaster and distinguished VP analyst at Gartner, told ZDNET in a one-to-one interview that the slide should be seen as a sign of hope. Slipping into the trough allows everyone to think much more carefully about their investments in gen AI. In short, business and digital professionals should embrace the opportunity.
"They probably should be looking for AI to slip into the ditch," he said. "The trough is all about expectations being at their lowest. And the problems we have seen with AI in the last two years are connected to these over-the-top moonshot projects."
With MIT research suggesting that 95% of gen AI projects fail to deliver value, Lovelock said a new approach is required to ensure AI investments are focused on the right targets. He suggested the following three areas should be priorities through 2026.
1. Focus on capacity building
Gartner reports that a massive build-out of AI infrastructure will characterize emerging tech investments through 2026.
Building AI foundations alone will drive a 49% increase in spending on AI-optimized servers, accounting for 17% of AI spending this year. AI infrastructure, meanwhile, will add $401 billion in spending in 2026, as technology providers build out their foundations.
Lovelock said this investment by IT companies will be crucial, even as AI drops into the Trough of Disillusionment. "They are building the capacity needed to run all the AI that's coming," he said.
"This area is where we have the hyperscalers, tech providers, and even software companies buying AI-optimized servers to build data centers that provide the capacity to train new models, train agents, and run agents."
Lovelock gave the example of a finance organization that's looking to find the capacity to run a model that automates credit card approvals.
The organization has several choices -- it could run its own standalone data center; work with a big-name cloud provider like AWS, Microsoft, or Google; focus on a platform provider that manages compute; or make an API call to a large language model from a specialist like OpenAI.
The key to success, said Lovelock, is deciding how the provider's capacity-building approach suits your organization's resources and priorities.
"You need to ask, 'How deeply do I need to own this technology? How much can I deal with it as a commodity? And how much of our approach is about differentiating AI that we must own, operate, and create?'"
2. Create strong partnerships
Finding suitable answers to those kinds of questions will involve building close relationships with technology providers.
Lovelock suggested that these partnerships will be crucial for business and digital professionals who want to improve AI ROI through 2026.
"This year, most people should be looking for the technology coming from their established partner stack," he said. "It's only the leaders, the visionaries, who should be looking to self-develop AI solutions or push the envelope."
With AI in the Trough of Disillusionment throughout 2026, it will most often be sold to companies by their incumbent software providers rather than bought for a moonshot project.
Rather than spending time and money on developing bespoke solutions, Lovelock agreed that most companies should focus this year on making good bets on solid tech partners across the digital and data stack.
"That's exactly right," he said. "It's about finding your technology partners to take you on your path, whether that's simple use of AI or you're going to push toward being an autonomous business."
3. Avoid random explorations
With gen AI sliding into the Trough of Disillusionment, Gartner suggests professionals should avoid broad-brush explorations into emerging tech and instead focus on ensuring that the best of their moonshot projects reach the stars.
So, how can digital leaders and their business peers ensure that exploratory projects turn into valuable initiatives? Lovelock suggested focusing on three areas: "Partners, data, and processes."
Another crucial element, he added, is bringing along internal stakeholders for the ride from the moon to the stars.
"Success is all about line-of-business functions as well," he said. "How well are you focused on defined business outcomes? How well can your partners help you with meeting these requirements? What level of investiture do they have?"
Lovelock said the best relationships will ensure you and your supplier benefit from turning moonshots into valuable production services.
"If you're doing time-and-materials billing, your provider has no skin in the game. If you're doing value-based pricing, they have some. If you're doing outcome-based pricing, they have more. If you're doing co-development, that's great," he said.
"The best approach is about tying their reward to your outcome. Now, that is not easily accomplished. It's a difficult approach to sell across the organization. It's also a very deep and tricky relationship to maintain over time. But when it works, it's incredibly and deeply rewarding for both participants."