
Jürgen Hubert

My stance on LLMs:

1. There _might_ be some useful use cases with this technology that could be worth exploring.

2. However, it is glaringly obvious that, as of now, their main purpose is to power the mother of all investment bubbles.

3. Which leads us to the present trillion dollar business case for "we must build energy- and water-wasting data centers everywhere so that we can scrape every single website a thousand times a month for new training data!"

4. Thus, there is currently pretty much no ethical way of using LLMs.

5. Any ethical exploration of LLM use cases will thus have to wait until the bubble has burst, the investors have moved on to the next scam, and we can sort through the rubble to check what is left.

@juergen_hubert

There might be some useful use cases with this technology that could be worth exploring.

heavy af emphasis of might. none have been found yet

@hsza

_If_ they are trained in highly specific contexts, with carefully curated (and legal) training data, then I can see a use for them.

The current approach of: "Let's feed the total sum of human output into it in the hopes that AGI will arise from it!" is obviously nonsense.

@juergen_hubert

highly specific contexts, with carefully curated training

yup! though i tend to figure context-specific specialized ML does not fall under generative “AI” slop. there are plenty of such technologies that’ve been used in very valid ways since way before the slop generation bubble.
the issue is with software used for generating aggregate sludge from a vague prompt; that's what the bubble is being blown around, and that's what's at best a toy with no reasonable use cases

@juergen_hubert it's 2000 all over again. Everybody is laying dark fiber that nobody will need for decades.

@juergen_hubert AI is really good at making trump look like a moron. That's pretty useful.

@juergen_hubert

The challenge is primarily for academic researchers in this space (if there is anyone left who hasn't been bribed into becoming an "AI" bro with multi-millions).

The commercial chatbot / codingbot endpoints with the gigantic models behind them are carefully orchestrated to support the hype of extreme competence that can help fire half the workforce etc.

What is needed is a deconstruction of the useful bits and a design of low-level tools that stops this charade.

@juergen_hubert

'ethical generative AI' is like 'fair-trade cocaine' - the words make sense, you can put them together, and imagine how it could be, in the abstract, possible.

In practice, though, this would require a massive change in how these things are produced and consumed.

There's no there there.

@juergen_hubert "The investors have moved to the next scam".

Well, by this logic, there is no ethical existence in our current society.

@danbrotherston

It's difficult, to be sure.

However, a major focus should be: "Do not participate in the latest scam that will wreck our economy when it pops."

@danbrotherston @juergen_hubert "Investing" is THE SCAM!

When one privileged person with money decides to become a parasite by making money without doing labor, then the solution is death to capitalism.

Block me if you would like, but capitalism, AI, LLMs, all of it must die, forever!

But people who invest? Bloody parasites, useless breeding rich people.

We don't need them anymore.

@csgraves @juergen_hubert and why do you equate LLMs, AI, and capitalism?

Even if I agreed with you on capitalism (which I'm closer to than you think, though I think you oversimplify things), LLMs and AI are not inherently investments. The business around them is the problem today.

@danbrotherston @juergen_hubert no, that isn't the problem, nor the primary problem. They simply use too many natural resources. Yes, they use too much water and power.

And honestly, I would prefer they be forced to be closed down forever.

But no, let us just give all water rights to billionaire tech bros.

This is my issue, that and how they all steal the works of other people to "train."

It's one hundred percent speculative bullshit, and it's wasteful.

@csgraves @juergen_hubert

I disagree that these things are intrinsically linked.

And more, the "steal the work of others" is an inherently capitalist concept.

But there's really no point in arguing over it, I feel.

@danbrotherston @juergen_hubert oh? So, as an example, I write and record music. They have indeed been scraping websites, and yes, including music made by independent musicians, and big artists alike.
I don't think you understand capitalism.

My written music being just taken willy-nilly and used to "train" LLMs, over my objection, is absolutely theft!

So you wouldn't object to me stealing from you? Say you are a visual artist.

You can't be mad when I copy and steal your work?

@csgraves @juergen_hubert

All this comment has done is show that you haven't listened to anything I've said.

You don't really understand LLMs either for that matter.

@danbrotherston @juergen_hubert you act like I want to understand them, and I don't. They are not needed.

@danbrotherston @juergen_hubert

There is no argument here. Your position is absolutely incorrect, immoral and unethical.

I can just plagiarize any of your work and you can't complain? That's ridiculous, idiotic even.

@csgraves @juergen_hubert

Do you think that "ownership of a story" is not a concept created by capitalism for the purpose of financializing human creativity?

Because I have a few oral traditions to sell you if you think otherwise.

If I create a story, the idea that you should not be able to retell it because I own it and I am granted a monopoly on it, is an inherently capitalist concept.

By the way, if you want to have an actual conversation, I recommend you don't begin by calling my position "absolutely incorrect, immoral, and unethical" and "idiotic". Because if you're opening with that, then you really suggest that there is no point in me continuing a discussion with someone who has a firmly closed mind, and someone who isn't interested in a discussion.

It's doubly so because you have ended with a straw man... because I certainly didn't say anything you claimed. Plagiarism is not what LLMs do, nor have I proposed that (or anything even remotely like it).

@danbrotherston @juergen_hubert I am not interested necessarily in a conversation. LLMs do nothing but scrape the web and yes, they indeed plagiarize.

@danbrotherston @juergen_hubert okay then. My mind remains closed to the "benefits" of this tech boondoggle.

@danbrotherston @juergen_hubert You see the problem with our current society.

@danbrotherston increasingly thinking there might not be, that's the worrying problem

@juergen_hubert

💯 this. Although I wonder if we’ll be able to train a new frontier model ever again once the bubble has burst (and frankly I don’t care if we can’t)

@juergen_hubert

But if I run the LLM on a computer I own and I use it for tasks I find useful, does the argument still hold?

Is it unethical to use an open model today in the same way I use LibreOffice or Plex or Linux?

@gatesvp how was your open model trained?

@juergen_hubert

@ced @juergen_hubert Sounds like you're suggesting that there is a specific model training regimen that you would consider to be ethical?

What does that look like?

@gatesvp @ced

One that's trained on specifically public domain/Creative Commons Zero source material.

As opposed to, say, the entirety of the World Wide Web.

@juergen_hubert @gatesvp @ced Which has been done! You have to put up with the model's capabilities being a couple years behind the state of the art, but then, if the state of the art can only be achieved by labor theft, maybe it shouldn't be the state of the art. engadget.com/ai/it-turns-out-y

Engadget: "It turns out you can train AI models without copyrighted material" by Will Shanklin

@clayote except that 2023 state of the art was really bad, and the progress made since then is in huge part due to making the training set bigger and bigger. Public domain/CC0 sources are finite…
And once the bubble has burst, will there be enough money even for that?

@juergen_hubert @gatesvp

@ced @juergen_hubert @gatesvp New material enters the public domain every year. And if the LLM companies were merely lobbying to reduce copyright terms, I'd be fine with it

@juergen_hubert @ced and this is actually a reasonable space for discussion, because I don't actually think that that's the extent of an ethical set.

For example, assume that we formed a society to build the next LLM. We negotiate a contract with a publisher to include their books in exchange for payment from the society, I would consider that to be ethical.

I think that the inclusion of government reports in our data set would be ethical. Though in many places they are neither public domain nor CC0. For example, Canadian publications have Crown Copyright.

Research papers funded by government grants feels like an ethical inclusion to me, but again they don't fall under either of those categories.

Is there flexibility here? Or do you think my proposed sources are purely unethical as inclusions?

@gatesvp @ced

Depends on a number of factors.

Such as: Is the resulting model freely available - or do you have to pay a fee to use it?

@juergen_hubert
would they still be useful, though? (for some value of useful, at least…)

@gatesvp

@gatesvp nope, that's the point ;-) – and I could perfectly well be wrong about that, but for large models, I'm really doubtful.

@juergen_hubert

@gatesvp @juergen_hubert yes, if that open model scrapes from the web, then it is unethical. Ask it a question? Sure.

But "making music" with it, or any art, is pretty much the same thing as saying that you're a talentless hack fool who hates art and music.

@juergen_hubert anyone with this commonsense stance is worth a follow. Thank you.

@juergen_hubert 100%! Science, accessibility, cobbling disparate data, definitely some good use cases, but absolutely… we’ll need to wait for the crash as everyone seeks hyper capitalism first.

@juergen_hubert There was a moment when the tech could have benefited millions of people, but tech-bros are not in that business; they are in the business of making themselves stupidly rich with no morals.

I worked at an AI company until recently, and have some in-depth, hands-on experience with AI. It has some legit and very interesting uses. But techbros/greed has ruined it for everyone, and the damage it's doing makes it really difficult to justify at all.

@juergen_hubert I have to say that every use case I can imagine for them would be perfectly 100% OK on smaller, local-only models.

Actually, I would go so far as to say even most of the use cases they're pretending they can handle could be done locally. They don't really need to be 300+ GB models. That's just a way to avoid actually acquiring properly clean data (e.g. just steal everything and cram it all together rather than paying people to create quality sets).

So yeah, every single use-case basically blows away the current system anyway.

As you said, the current system exists only for supporting the bubble.

My hope is that when this is all over, communities make community-collated data sets that are clean and lighter, and never push them as being "general AI", which they are not.

@nazokiyoubinbou @juergen_hubert yeah it seems to me like they’re too preoccupied with chasing ever more marginal performance improvements instead of maybe considering how to make the models more efficient (bar that context compression thing recently). Evil tongues would claim it’s a competitive advantage for those with deeper pockets

@flying_saucers @nazokiyoubinbou

Also, the current approach places a disproportionate burden on those who maintain websites, as these get constantly scraped for new content and thus see drastically increased page loads.

Speaking as the maintainer of such a website. 😡

@juergen_hubert @nazokiyoubinbou wouldn’t it be nice if people respected robots.txt 🙃

@flying_saucers @nazokiyoubinbou

Instead, we get anonymous bot-nets with a changing roster of IP addresses.

Seriously. Within the span of six hours, my wiki once received 3,800 requests for "Special:RecentChanges". This is not something most readers will do.

@juergen_hubert that’s messed up. Is there a way to rate limit just that page in particular?

@flying_saucers

Still trying to figure that out.
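
One possible starting point, sketched here as an illustration only: if nginx sits in front of the wiki (an assumption; the thread doesn't say what the stack is), its `limit_req` module can throttle per-IP requests to that one path. Zone names and paths below are made up, and per-IP limits won't stop the rotating-IP botnets mentioned above, only naive scrapers.

```nginx
# Goes in the http {} context: a 10 MB shared zone tracking client IPs,
# allowing ~6 requests per minute each to the throttled location.
limit_req_zone $binary_remote_addr zone=recentchanges:10m rate=6r/m;

server {
    # Illustrative path; MediaWiki may also expose this page via
    # index.php?title=Special:RecentChanges, which would need its own rule.
    location ~* /Special:RecentChanges {
        limit_req zone=recentchanges burst=5 nodelay;
        limit_req_status 429;  # tell well-behaved clients to back off
        # ... proxy_pass / fastcgi_pass as configured for the rest of the wiki
    }
}
```

MediaWiki's own `$wgRateLimits` setting is another lever, though it mostly covers edits and other write actions rather than anonymous page views.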

@flying_saucers @nazokiyoubinbou @juergen_hubert

the whole AI-as-deep learning thing is predicated on amounts of compute and training data only available to a few actors in the space, and this is before 'generative' AI.

Meredith Whittaker and Kate Crawford have been tapping that sign for almost a decade now.

It's always structurally favoured the Big Pocket crowd.

What's ironic is that your frontier, agentic chatbot of 2026 would give a competent literature review on this to anyone bothering to ask.
