nondeterministic.computer is one of the many independent Mastodon servers you can use to participate in the fediverse.

Matthew Garrett

Free software people: A major goal of free software is for individuals to be able to cause software to behave in the way they want it to
LLMs: (enable that)
Free software people: Oh no not like that

When I write code I am turning a creative idea into a mechanical embodiment of that idea. I am not creating beauty. Every line of code I write is a copy of another line of code I've read somewhere before, lightly modified to meet my needs. My code is not intended to evoke emotion. It does not change how people think about the world. The idea→code pipeline in my head is not obviously distinguishable from the prompt→code process in an LLM.

Look, coders, we are not writers. There's no way to turn "increment this variable" into life changing prose. The creativity exists outside the code. It always has done and it always will do. Let it go.

(Yes ok there are cases where code is beauty and embodies an idea that could make a grown man cry and:

(1) your code is not that code
(2) you would think nothing of copying the creative aspect of that code if you needed to. Don't fucking lie to me)

Personally I'm not going to literally copy code from a codebase under an incompatible license because that is what the law says, but have I read proprietary code and learned the underlying creative aspect and then written new code that embodies it? Yes! Anyone claiming otherwise is lying!

Clearly my most unpopular thread ever, so let me add a clarification: submitting LLM-generated code you don't understand to an upstream project is absolute bullshit and you should never do that. Having an LLM turn an existing codebase into something that meets your local needs? Do it. The code may be awful, it may break stuff you don't care about, and that's what all my early patches to free software looked like. It's ok to solve your problem locally.

@mjg59 Perhaps, but anyone claiming an LLM has "learned the underlying creative aspect" is also lying.

@mjg59 What you propose is actually illegal, even if the law doesn’t make much sense. I wonder if you ever had the cops sent after you on a corp-run IP case… maybe it would make you feel different?

@promovicz Information wants to be free

@promovicz
Let's hope the AI lobby will (in any combination of purposely and inadvertently) make that law obsolete.
@mjg59

@promovicz
That completely oversimplifies what's being discussed here. Every math book you ever studied is copyrighted; that does not mean you cannot use what you learned to solve math problems.

@mjg59

@ck @mjg59 Science works for the public domain. What you describe is explicitly exempt from copyright. If you look at proprietary source code and use its methods, that's a legally distinct situation. Landmark case: "IBM BIOS reverse-engineering".

@mjg59 I agree with this last statement 😁

@mjg59 I agree not all code is art, and often not even craft. But contrary to optimizing compilers, we're not yet at a point where the generated code only needs to be read/modified by a handful of optimization experts, as it is with ASM. The generated code isn't even reliably identical between 2 prompts.

The AIGen'd code I've seen can be quite elegant taken in isolation, but looks a lot like a Frankenstein'd behemoth when I look at "large" (beyond toy project) code bases.

@jenesuispasgoth I mean kind of the point of free software is that people get to modify it to their own ends and that doesn't mean it has to be good - when I first started hacking things to meet my needs I was definitely writing stuff that couldn't be upstreamed, but it worked for me, and making it easier for others to do that is a win

@mjg59 @jenesuispasgoth
There are people who analyse, design and then implement as code. Those are programmers. LLMs can't replace that.
If you only ever tweak someone else's design, you may not have learned to program, only learned a language, framework or library APIs. So maybe an LLM might help, because it's a plagiarism machine. It ignores licences, and the companies building LLMs have violated IP, copyright, copyleft/GPL etc. on a massive scale through so-called "training" (= copying). Theft.

@raymaccarthy I agree with the sentiment wrt IP theft (at least, insofar as, since proprietary code requires us to respect its license, the bare minimum would be to respect FLOSS licenses). If you take ethical concerns (including ecological ones) into consideration, I think there is no conversation left whatsoever.

I took @mjg59's comments from a more technical point of view, not so much an ethical one in that sense.

@jenesuispasgoth @mjg59
Some people think they can recycle FOSS from one licence to another using an LLM, such as GPL2 to MIT or whatever. They are IP thieves.
All FOSS code, under any so-called copyleft licence, is actually copyrighted. Public domain code is a special case and in reality rare for anything written in the last 50 years. All of AT&T UNIX is still under copyright.
Even programs or OSes whose source has been made public with limitations on use are mostly still under some sort of copyright.

@raymaccarthy @jenesuispasgoth @mjg59 I don't much like the answer, but the assessment in the US seems to be that, yes, this laundering works if the new code is different enough.
If you sidestep the question of whether the output can be copyrighted (such as chardet did in the end) and you rename it, you're probably "good".
(Again. Me no like. And maybe different in the EU.)

@larsmb @jenesuispasgoth @mjg59
The US is the country that on the one hand has the draconian DMCA (unfair) and on the other hand said it's fine for Google to scan entire copyrighted works (a totally paid-for decision that isn't "fair use").
The USPTO has been broken since Edison.

It's not a clean-room re-implementation. It's automated plagiarism. I can do that in Perl or WP to a novel, changing places and people. Copyright violation.
Even if you also manually transpose to a different era it might be.

@raymaccarthy @jenesuispasgoth @mjg59 I think it morally is a copyright violation too.

I also have come to the conclusion (including an explanation by Fontana in the chardet issue) that unless you can identify persistent copyrightable expression from prior art, your new work isn't a violation.

If you don't care whether it's copyrightable, you're probably in the clear.

Exposure is a problem if you're under NDA or trade secrets are involved, yes. Or maybe patents.

@jenesuispasgoth @raymaccarthy Agreed. In a purely technical sense @mjg59 makes sense. But it ignores the massive ethical and environmental problems and these cannot be decoupled.

The quality of the output is not relevant. It is going to get better. The ethical problems are not. Too many people opposed to LLMs concentrate on the ai-slop problem when they should be shouting about the ethical issues.

@raymaccarthy @jenesuispasgoth I'm not arguing that LLMs replace the need for humans who understand how code works, or that people who use them are becoming programmers in the process.

@jenesuispasgoth @mjg59 This is not AI endorsement, but given a sufficiently large problem / codebase, I would wager you wouldn't get a reliably identical result from having a human write code for the same problem twice either.
We expect determinism from LLMs because "it's computers", not because it's necessary for good results.

@ck right. I was thinking more in terms of, within the same prompt requesting code generation, identical needs which should generate a single function called at different sites become several functions with slight variations (sometimes only in the function name) being generated, which happens far less often in large codebases tackling the same issue.

@mjg59

@mjg59 Yeah, as soon as there’s an ethically sourced and trained free LLM that’s not controlled by very shitty companies I’m totally on board with you.

Until then we shouldn’t let that shit near our projects.

@chris_evelyn That is a coherent position that I have no fundamental disagreement with

@chris_evelyn
What do you mean by "ethically sourced and trained"?
@mjg59

@light At minimum that:

  • all input material is legit - either public domain or fairly paid for
  • all labeling/curating is done under good labor conditions

@mjg59

@chris_evelyn @mjg59 … and that doesn't boost global warming and slurp up much needed water in order to train and run ...

@chris_evelyn

It's my belief that Mistral's models fit that bill.

@mjg59

@troed Shitty company, non-transparent model sourcing

@mjg59

@chris_evelyn

What an interesting claim. Has it got anything to do with reality?

@mjg59

@troed Venture funded by (among others) Andreessen Horowitz and Salesforce, no truly open models. Bye!

@troed @chris_evelyn @mjg59 last time I checked, Mistral models were merely open weight, with no training dataset available nor training pipeline released as FOSS. Has that changed?

@zacchiro I understood the ask I replied to was regarding ethical training. Mistral, as an EU company, has to abide by EU regulations that AI companies in the US, China etc. don't have to.

artificialintelligenceact.eu/a

@chris_evelyn @mjg59

artificialintelligenceact.eu: Article 53: Obligations for Providers of General-Purpose AI Models | EU Artificial Intelligence Act

@troed I see. I don't know either what @chris_evelyn had in mind, so I'll leave it to them. But for what it's worth the EU AI Act applies equally to all companies having access to the EU market. Mistral is not special in that respect, unless the other players decide to leave the EU market (which is unlikely). @mjg59

@pkal @mjg59 Looks interesting at first glance, I will take a look, thanks!

@chris_evelyn @mjg59 I haven't taken a proper look at it either, so I don't know if it is open-washing as has been the case with a lot of other models, but if this means anything RMS has stated that it appears to "be free".

@mjg59 @pkal @chris_evelyn

[…] e.g. robots.txt. The Apertus LLM was trained on web documents crawled by CommonCrawl while respecting standard machine-readable opt-out by websites. In addition, data from websites which have recently opted out by specifying at least one of the common AI crawlers, at the time of January 2025, was removed. Crucially, such removals were also applied retroactively in all earlier crawls since 2013

https://huggingface.co/swiss-ai/Apertus-70B-2509/blob/main/Apertus_EU_Code_of_Practice.pdf


@mjg59 @pkal @chris_evelyn

Ugh, above post starred by a fucking ai “artist” that generates slop from stolen artwork. And a techbro that licks boot at Google Chrome. Fuck you and everything you stand for.

@mjg59 thankfully there are plenty of other reasons to despise LLMs, so we don’t really have to have this discussion :)

@mjg59 So the big thing is that all art belongs to society. To promote creation, society grants limited exclusivity, mostly to fund the work.

This means that, in a utopia, copyright wouldn't exist because everyone could stand on everyone else's shoulders.

The biggest problem is the tail wagging the dog. It's not about promoting creation. It's about giving power plays in the game of life to a selected few. That's literally oppression.

@mjg59 Of course somewhat ironic because you'll sometimes get oppressor on oppressor conflicts... but, like Alien vs Predator, whoever wins, humans lose.

@mjg59 you might be missing a few of people's issues with LLMs. Our programmer standpoint is quite niche.

What happens to people with jobs that are affected by LLMs? They either start using LLMs to match the competition's performance, or get obsoleted... What if they can't actually afford using LLMs to stay competitive?...

And then there's art.

On top of all of that LLMs are energy and resource-hungry, ruining the environment and making everything more expensive...

@mjg59 heh, one of the new ideas in a project I'm doing virtualization work for is to have a fully local LLM generate bespoke apps and instantly summon them directly on the desktop.

I don't think current local LLMs are actually "ethical" either, all my "fuck that entire industry" concerns are always present, and personally I wouldn't like using straight up fuzzy statistically magically inferred apps at all. But I do like the idea of empowering people to locally just do bespoke things like that, as long as there's always a big disclaimer about it being made that way and so on.

@mjg59 The LLM-hate reminds me of the backlash against computers themselves. People insisted they were 100% worthless because someone got a bill for $0, and then a notice they were in arrears when it was not paid. Many projects either failed outright or people had to do their work twice, first the old pen and paper way which worked, and then also put it into the computer never to be seen again...

@mjg59 i think submitting it back is the part people are angry about, not that it is possible

@mjg59 I think the negativity comes from the fact that a lot of FLOSS developers have other reasons why they work on projects besides scratching their own itch - "meeting the local needs" as you put it.

Those reasons are expanding their knowledge and, sometimes, even the enjoyment of the programming act itself.

So if you treat open source development as a learning experience and an artistic expression, you're automatically going to balk at something that would take that away.

@mariusor I should be clear that I write code by hand and enjoy the process, and also agree that the only way you're going to get high quality human developers is through doing that. But I also think that the world is probably better if more people are able to modify code to meet their needs, even if those people never turn into high quality human developers as a result.

@mjg59 strictly local needs, you do you.

If using a giant model like Claude, you might want to consider what remodelling that code will cost the planet in terms of direct carbon output, electricity generation, water pollution, amortised environmental cost of building the Pollution Centres and the ongoing damage to local communities of the Pollution Centres.

If you can live with all that? Sure, use your magic autocomplete. Just don't expect others to not judge you for it. Not saying I would, btw, but that's the argument.

@dgold @mjg59 Yea, but, like.. A functioning biosphere, human rights, a functioning democracy, respect for small peoples' creative rights, and code quality just aren't relevant to my specific use case

@dgold No disagreement whatsoever

Thank you for expressing the argument eloquently, succinctly, and without aggression.

I confess, I often tire of reading information that's repetitive. My feelings go way beyond ennui when predictability is coupled with writing that's selfish, sloppy, and divisive. This wrong style of writing has become a norm for some of the people who are right to be concerned.

I'm amongst the countless people who are, quietly, deeply concerned about the impact on Earth's resources, the environment, and so on.

@dgold you're amongst the people who can inspire mutual respect. I wish more people could be like you. It saddens me that you're in a minority, not in what you think, but in the way that you choose to write.

I'm tired, but not so tired that I can't spend ten minutes of my day thinking about how to thank you for writing nicely. I'm not sure how.

@mjg59 I completely agree, but I'd add a couple of things... if you understand the code, then LLMs are just providing acceleration to your efforts. Also, solving problems locally for yourself is great, but there's no reason why you shouldn't share the solution in case it helps someone else. Just be transparent about the possible quality concerns.
