mtkd's comments

I expect in a few days there will be a new tool launched that returns word frequency/velocity in recent biomedical papers ... so next year's PhDs can level things using an MCP function

Will there? Is someone working on it?

The same was said about DejaNews, Stack Overflow etc., and IntelliSense

Stack Overflow didn't create a positive feedback loop where the solution to dealing with an obscure, badly written, incomprehensible code base is creating even more incomprehensible, sloppy code to glue it all together.

Neither did intellisense. If anything, it encouraged structuring your code better so that intellisense would be useful.

Intellisense does little for spaghetti code. And it was my #1 motivation to document the code in a uniform way, too.

The most important impact of tools is that they change the way we think and see the world, and this shapes the world we create with these tools.

When you hold a hammer, everything is a nail, as the saying goes.

And when you hold a gun, you're no longer a mere human; you're a gunman. And the solution space for all sorts of problems starts looking very different.

The AI debate is not dissimilar to the gun debate.

Yes, both guns and AI are powerful tools that we have to deal with now that they've been invented. And people wielding these tools have an upper hand over those who don't.

The point that people make in both debates that tends to get ignored by the proponents of these tools is that excessive use of the tools is exacerbating the very problem these tools are ostensibly solving.

Giving guns to all schoolchildren won't solve the problem of high school shootings — it will undeniably make it worse.

And giving AI to all software developers won't solve the problem of bad, broken code that negatively impacts people who interact with it (as either users or developers).

Finally, a note. Both gun technology and AI have been continuously improved since their invention. The progress is undeniable.

Anyone who is thinking about guns in 1850 terms is making a mistake; the Maxim was a game changer. And we're not living in ChatGPT 2.0 times either.

But with all the progress made, the solution space that either tool created hasn't changed in nature. A problem that wasn't solvable with a flintlock musket or several remains intractable for an AK-74 or an M16.

Improvements in either tech certainly did change the scale at which the tools were applied to resolve all sorts of problems.

And the first half of the 20th century, to this day, provides many of the most brilliant, masterful examples of using guns at scale.

What is also true is that the problems never went away. Nor did better guns make the life of the common soldier any better.

The work of people like nurse Nightingale did.

And the core of that work was recognizing that the solution to increasingly devastating battlefield casualties and dropping battlefield effectiveness wasn't giving every soldier a Maxim gun; it was better hygiene and living conditions. Washing hands.

The Maxim gun was a game changer, but it wasn't a solution.

The solution was getting out of the game with stupid prizes (like dying of cholera or typhoid fever). And it was an organizational issue, not a technological one.

* * * * *

To end on a good note, an observation for the AI doomers.

Genocides predate guns by millennia, and more people have died by the machete and the bayonet than by any other weapon, even in the 20th century. Perhaps the 21st too.

Add disease and famine, and deaths by gun are a drop in the bucket.

Guns aren't a solution to violence, but they're not, in themselves, a cause of it on a large enough scale.

Mass production of guns made it possible to turn everyone into a soldier (and a target), but the absolute majority of people today have never seen war.

And while guns, by design, are harmful —

— they're also hella fun.


Unnecessarily critical take on a quality write-up

Much of the criticism of AI on HN feels driven by devs who have not fully ingested what is going on with MCP, tools etc. right now, as they have not looked deeper than making API calls to an LLM


OP's comment also seems to be firmly stuck in 2023 when you'd prompt ChatGPT or whatever. The fact that LLMs today, when strapped into an agentic harness, can do or help with all of these things (ideation, architecture, use linters, validate code, evaluate outputs, and a million other things) seems to elude them.

Do they do requirements gathering? Like talking to stakeholders and getting their input on what the feature should be, translating business jargon into domain terms?

No.

Do they do the analysis? Removing specs that conflict with each other, validating what's possible in the technical domain and in the business domain?

No.

Do they help with design? Helping come up with changes that impact the current software the least, fit the current architecture, and stay maintainable in the future.

All they do is pattern matching on your prompt and the weights they have. Not a true debate or weighing options based on the organization context.

Do they help with coding?

A lot if you're already experienced with the codebase and the domain. But that's the easiest part of the job.

Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools and producing a cohesive report of the defects?

I don't know as I haven't seen any demo on that front.

Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependency updates and bug fixes?

No demo so far.


Why do you think any of these should be a challenge for, say, O3/O3 pro?

You pretty much just have to ask and give them access for these things. Talking to a stakeholder and translating jargon and domain terms? Trivial. They can churn through specs and find issues, none of that seems particularly odd to ask of a decent LLM.

> Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools and producing a cohesive report of the defects?

This is pretty standard in agentic coding setups. They'll fix up broken tests, and fix up code when it doesn't pass the test. They can add debug statements & run to find issues, break down code to minimal examples to see what works and then build back up from there.

> Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependency updates and bug fixes?

Yes - dependency updates are probably the easiest. Have it read the changelogs and new API docs, look at the failing tests, and iterate until they pass.
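To make that concrete, here's a rough sketch of the kind of update-and-iterate loop being described, assuming a Python/pytest project; llm_propose_fix and apply_patch are hypothetical stand-ins for whatever agent harness you actually use, not any specific product's API:

    # Bump the dependency, run the tests, feed failures plus the changelog back
    # to the model, apply its suggested patch, repeat until green or we give up.
    import subprocess

    def run_tests() -> tuple[bool, str]:
        """Run the test suite and return (passed, combined output)."""
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def update_dependency(package, changelog, llm_propose_fix, apply_patch, max_iters=5):
        """Upgrade one package, then let the model iterate until the tests pass."""
        subprocess.run(["pip", "install", "--upgrade", package], check=True)
        for _ in range(max_iters):
            passed, output = run_tests()
            if passed:
                return True
            # Hand the model the changelog and the failing output, get a patch back.
            patch = llm_propose_fix(package=package, changelog=changelog, test_output=output)
            apply_patch(patch)
        return False

The interesting part isn't the loop itself, it's that the failing tests give the model a concrete, verifiable target to iterate against.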

These things are progressing surprisingly quickly so if your experience of them is from 2024 then it's quite out of date.


As for a demo on that front, there's OpenAI's Codex: see https://openai.com/index/introducing-codex/ and the docs overview at https://platform.openai.com/docs/codex/overview

> Do they do requirements gathering? Like talking to stakeholders and getting their input on what the feature should be, translating business jargon into domain terms? No.

Why not? This is a translation problem so right up its alley.

Give it tool access to communicate directly with stakeholders (via email or chat) and put it in a loop to work with them until the goal is reached (stakeholders are happy). Same as a human would do.

And of course it will still need some steering by a "manager" to make sure it's building the right things.
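For illustration, a minimal sketch of what that tool access could look like, using an OpenAI-style function/tool definition; the tool name and the surrounding plumbing are hypothetical, and any chat or email integration would do:

    # Illustrative only: one tool the model could call to reach stakeholders.
    tools = [{
        "type": "function",
        "function": {
            "name": "message_stakeholder",
            "description": "Send a clarifying question or a draft requirement "
                           "to a named stakeholder and return their reply.",
            "parameters": {
                "type": "object",
                "properties": {
                    "stakeholder": {"type": "string"},
                    "message": {"type": "string"},
                },
                "required": ["stakeholder", "message"],
            },
        },
    }]

    # The harness then loops: call the model with the conversation so far ->
    # execute whatever tool call it requests -> append the result -> repeat,
    # until the model (and a human reviewer) agree the spec is complete.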


> Why not? This is a translation problem so right up its alley.

Translating a sign can be done with a dictionary. Translating a document is often a huge amount of work due to cultural differences, so you cannot make a literal translation of the sentences. And sometimes terms don't map to each other. That's when you start to use metaphors (and footnotes).

Even in the same organization, the same term can mean different things. As humans we don't mind when terms have several definitions and the correct one is contextual. But software is always context-free. Meaning everything is fixed at its inception and the variables govern flow, not the instructions themselves (the "eval" instruction (data as code) is dangerous for a reason).

So the whole process is going from something ambiguous and context dependent to something that isn't. And we do this by eliminating incorrect definitions. Tell me how an LLM is going to help with that when it has no sense of what is correct and what is not (aka judging truthfulness).


> Tell me how an LLM is going to help with that when it has no sense of what is correct and what is not (aka judging truthfulness).

Same way it works with humans: someone tells it what "correct" means until it gets it right.


> Do they do requirements gathering?

This is true, but they have helped prepare me with good questions to ask during those meetings!

> Do they do the analysis? Removing specs that conflict with each other, validating what's possible in the technical domain and in the business domain?

Yes, I have had LLMs point out missing information or conflicting information in the spec. See above about "good questions to ask stakeholders."

> Do they help with design? Helping come up with changes that impact the current software the least, fit the current architecture, and stay maintainable in the future.

Yes.

I recently had a scenario where I had a refactoring task that I thought I should do, but didn’t really want to. It was cleaning up some error handling. This would involve a lot of changes to my codebase, nothing hard, but it would have taken me a while, and been very boring, and I’m trying to ship features, not polish off the perfect codebase, so I hadn’t done it, even though I still thought I should.

I was able to ask Claude: "hey, how expensive would this refactoring be? How many methods would it change? What would the before/after diffs look like in a simple affected place, and in one of the more complex affected places?"

Previously, I had to use my hard-won human intuition to make the call about implementing this or not. It’s very fuzzy. With Claude, I was able to very quickly quantify that fuzzy notion into something at least close to accurate: 260 method signatures. Before and after diffs look decent. And this kind of fairly mechanical transformation is something Claude can do much more quickly and just as accurately as I can. So I finally did it.

That I shipped the refactoring is one point. But the real point is that I was able to quickly focus my understanding of the problem, and make a better, more informed decision because of it. My gut was right. But now I knew it was right, without needing to actually try it out.

> Not a true debate or weighing options based on the organization context.

This context is your job to provide. They will take it into account when you provide it.

> Do they help with coding?

Yes.

> Do they help with testing? Coming up with test plans, writing test code, running them, analysing the output of the various tools and producing a cohesive report of the defects?

Yes, absolutely.

> Do they help with maintenance? Taking the same software and making changes to keep it churning on new platforms, through dependencies updates and bug fixes?

See above about refactoring to improve quality.


+1. Some refactorings are important but just not urgent enough compared to features. Letting CC do these refactorings makes quite a difference.

At least in my case there's a lot of automated test coverage and a typed language (Go), so it can work independently and efficiently.


I mean, if you "program" (prompt) them to do those things, then yeah, they'll do that. But you have to consider the task just like if you handed it over to a person with absolutely zero previous context, and explain what you need from the "requirements gathering", and how it should handle that.

None of the LLMs handle any of those things by themselves, because that's not what they're designed for. They're programmable things that output text, that you can then program to perform those tasks, but only if you can figure out exactly how a human would handle it, and you codify all the things we humans can figure out by ourselves.


> But you have to consider the task just like if you handed it over to a person with absolutely zero previous context,

Which no one does. Even when hiring someone, there's the basic premise that they know how they should do the job (interns are there to learn, not to do). And then they are trained for the particular business context, with a good incentive to learn well and then do the job well.

You don't just suddenly wake up and find yourself at an unknown company being asked to code something for a Jira task. And if you do find yourself in such a situation, the obvious thing is to figure out what's going on, not "Sure, I'll do it".


I don't understand the argument. I haven't said humans act like that; what I said is how you have to treat LLMs if you want to use them for things like that.

If you're somehow under the belief that LLMs will (or should) magically replace a person, I think you've built the wrong understanding of what LLMs are and what they can do.


I interact with tools and with people. With people, there's a shared understanding of the goal and the context (aka alignment, as some people like to call it). With tools, there's no such context needed. Instead I need reproducible results and clear output. And if it's something that I can automate, that it will follow my instructions closely.

LLMs are obviously tools, but their parameter space is so huge that it's difficult to provide enough to ensure reliable results. With prompting, we have unreliable answers, but with agents, you have actions being taken upon those unreliable answers. We had that before with people copying and pasting from LLM output, but now the same action is being automated. And then there's the feedback loop, where the agent is taking input from the same thing it has altered (often wrongly).

So it goes like this: ambiguous query -> unreliable information -> agents acting -> unreliable result -> unreliable validation -> final review (which is often skipped). And then the loop.

While with normal tools: ambiguous requirement -> detailed specs -> formal code -> validation -> report of divergence -> review (which can be skipped). There are issues in the process (which give us bugs), but we can pinpoint where we went wrong and fix the issue.


I'm sorry, I'm very lost here, are you responding to the wrong comment or something? Because I don't see how any of that is connected to the conversation from here on up?

>>> But you have to consider the task just like if you handed it over to a person with absolutely zero previous context, and explain what you need from the "requirements gathering", and how it should handle that

The most similar thing is software, which is a list of instructions we give to a computer alongside the data that forms the context for this particular run. Then it processes that data and gives us a result. The basic premise is that these instructions need to be formal so that they become context-free. The whole context is the input to the code, and you can use the code whenever.

Natural language is context dependent. And the final result depends on the participants. So what you want is a shared understanding so that instructions are interpreted the same way by every participant. Someone (or the LLM) coming in with zero context is already a failure scenario. But even with the context baked in every participant, misunderstandings will occur.

So what you want is formal notation which removes ambiguity. It's not as flexible as natural language or as expressive, but it's very good at sharing instructions and information.


No, they don't do requirements gathering; they also don't cook my food or wash my clothing. Some things are out of scope for an LLM.

Yes, they can do analysis, identify conflicting specs, etc. especially with a skilled human in the loop

Yes, they help with design, though this works best if the operator has sufficient knowledge.

The LLM can help significantly by walking through the code base, explaining parts of it in variable depth.

Yes, agentic LLMs can easily write tests, run them, validate the output (again, best used with an experienced operator so that anti-patterns are spotted early).

From your posts I gather you have not yet worked with a strong LLM in an agentic harness, which you can think of as almost a general purpose automation solution that can either handle, or heavily support most if not all of your points that you have mentioned.


This is the crypto discussion again.

"All our critics are clueless morons who haven't realised the one true meaning of things".

Have you once considered that critics have tried these tools in all these combinations and found them lacking in more ways than one?


The huge gap between the people who claim "It helps me some/most of the time" and the other people who claim "I've tried everything and it's all bad" is really interesting to me.

Is it a problem of knowledge? Is it a problem of hype that makes people over-estimate their productivity? Is it a problem of UX, where it's hard to figure out how to use these tools correctly? Is it a problem of the user's skills, where low-skilled developers see lots of value but high-skilled developers see no value, or even negative value sometimes?

The experiences seem so different, that I'm having a hard time wrapping my mind around it. I find LLMs useful in some particular instances, but not all of them, and I don't see them as the second coming of Jesus. But then I keep seeing people saying they've tried all the tools, and all the approaches, and they understand prompting, yet they cannot get any value whatsoever from the tools.

This is maybe a bit out there, but would anyone (including parent) be up for sending me a screen recording of exactly what you're doing, if you're one of the people that get no value whatsoever from using LLMs? Or maybe even a video call sharing your screen?

I'm not working in the space and have no products or services to sell; I'm only curious why this vast gap seemingly exists, and my only motive is to understand whether I'm the one who is missing something, or whether there are more effective ways to help people understand how they can use LLMs and what they can use them for.

My email is on my profile if anyone is up for it. Invitation open for anyone struggling to get any useful responses from LLMs.


I think it's going to be personal, because people define value in different ways, and the definition depends on the current context. I've used LLMs for things like shell scripts, plotting with pyplot, explanations, ... But always taking the output with a huge grain of salt. What I'm looking for is not the output itself, but the direction it can give me. But the only value is when I'm pressed for time and can't use a more objective and complete approach.

When you read the manual page for a program, or the documentation for a library, the things described always (99.99999...%) exist. So I can take it as objective truth. The description may be lacking, so I don't have a complete picture, but it's not pure fantasy. And if it turns out that it is, the solution is to drop it and turn back.

So when I act upon it, and the result comes back, I question my approach, not the information. And often I find the flaw quickly. It's slower initially, but the final result is something I have good confidence in.


> And often I find the flaw quickly. It's slower initially, but the final result is something I have good confidence in.

I guess what I'm looking for are people who don't have that experience, because you seem to be getting some value out of using LLMs at least, if I understand you correctly?

There are others out there who have tried the same approach, and countless of other approaches (self-declared at least) yet get 0 value from them, or negative value. These are the people I'm curious about :)


> The experiences seem so different, that I'm having a hard time wrapping my mind around it.

Because we only see very disjointed descriptions, with no attempt to quantify what we're talking about.

For every description of how LLMs work or don't work we know only some, but not all of the following:

- Do we know which projects people work on? No

- Do we know which codebases (greenfield, mature, proprietary etc.) people work on? No

- Do we know the level of expertise the people have? Is the expertise in the same domain, codebase, language that they apply LLMs to?

- How much additional work did they have reviewing, fixing, deploying, finishing etc.?

Even if you have one person describing all of the above, you will not be able to compare their experience to anyone else's because you have no idea what others answer for any of those bullet points.

And that's before we get into how all these systems and agents are completely non-deterministic, and what works now may not work even 1 minute from now for the exact same problem.

And that's before we ask the question of how a senior engineer's experience with a greenfield project in React with one agent and model can even be compared to a non-coding designer in a closed-source proprietary codebase in OCaml with a different agent and model (or even the same, because of non-determinism).


> And that's before we get into how all these systems and agents are completely non-deterministic,

And that is the main issue. For some the value is reproducible results, for others, as long as they got a good result, it's fine.

It's like coin tossing. You may want tails all the time, because that's your chosen bet. You may prefer tails, but don't mind losing money if it's heads. You may not be interested in either, but you're doing the tossing and want to know the techniques that work best for getting tails. Or you're just trying, and if it's tails, your reaction is only "That's interesting".

The coin itself does not matter and the tossing is just an action. The output is what get judged. And the judgment will vary based on the person doing it.

So software engineering used to be the pursuit of tails all the time (by putting the coin on the ground, not tossing it). Then LLM users say it's fine to toss the coin, because you'll get tails eventually. And companies are now pursuing the best coin-tossing techniques to get tails. And for some, when the coin toss gives tails, they only say "that's a nice toss".


> And companies are now pursuing the best coin-tossing techniques to get tails.

With the only difference that the techniques for throwing coins can be verified by comparing the results of the tosses. More generally it's known as forcing https://en.wikipedia.org/wiki/Forcing_(magic)

What we have instead is companies (and people) saying they have perfected the toss not just for a specific coin, but for any objects in general. When it's very hard to prove that it's true even for a single coin :)

That said, I really like your comment :)


>> is going on with MCP, tools etc.

all these are just tools. there is nothing more to it. there is no etc.


In 2023 Huawei surprised everyone with the Kirin 9000S in the Mate 60; this seems to get forgotten when talking about GPU moats and sanction effectiveness


It's very different to blockchain hype

I had similar skepticism initially, but I would recommend you dip a toe in the water before making a judgement

The conversational/voice AI tech now dropping + the current LLMs + MCP/tools/functions to mix in vendor APIs and private data/services etc. really feels like a new frontier

It's not 100% but it's close enough for a lot of use cases now, and it's going to change a lot of the ways we build apps going forward


Probably my judgement is a bit fogged. But if I get asked about building AI into our apps just one more time I am absolutely going to drop my job and switch careers

That's likely because OG devs have been seeing the hallucination stuff, unpredictability etc. and questioning how that fits with their carefully curated perfect system

What blocked me initially was watching NDA'd demos a year or two back from a couple of big software vendors on how Agents were going to transform enterprise ... what they were showing was a complete non-starter to anyone who had worked in a corporate because of security, compliance, HR, silos etc. so I dismissed it

This MCP stuff solves that: it gives you (the enterprise) control in your own walled garden, whilst getting the gains from LLMs, voice etc. ... the sum of the parts is massive

It more likely wraps existing apps than integrates directly with them, the legacy systems becoming data or function providers (I know you've heard that before ... but so far this feels different when you work with it)
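As a rough illustration of that wrapping pattern, here's a minimal MCP server exposing a legacy system as tools. This assumes the FastMCP helper from the official MCP Python SDK (as in its quickstart); legacy_erp is a hypothetical in-house client standing in for whatever internal API the enterprise already has:

    # Minimal sketch: wrap a legacy system as an MCP "function provider".
    from mcp.server.fastmcp import FastMCP

    import legacy_erp  # hypothetical wrapper around the existing system

    mcp = FastMCP("erp-gateway")

    @mcp.tool()
    def lookup_order(order_id: str) -> dict:
        """Return the status of an order from the legacy ERP (read-only)."""
        return legacy_erp.get_order(order_id)

    @mcp.tool()
    def list_open_tickets(customer_id: str) -> list[dict]:
        """List open support tickets for a customer, enforcing existing ACLs."""
        return legacy_erp.open_tickets(customer_id)

    if __name__ == "__main__":
        # The model never touches the ERP directly; it only sees these tools,
        # which is where the walled-garden control comes from.
        mcp.run()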


There are 2 kinds of use cases that software automates: 1) those that require accuracy and 2) those that don't (social media, ads, recommendations).

Further, there are 2 kinds of users that consume the output of software: a) humans, and b) machines.

Where LLMs shine is in the 2a use cases, i.e. use cases where accuracy does not matter and humans are the end users. There are plenty of these use cases.

The problem is that LLMs are being applied to the 1a and 1b use cases, where there is going to be a lot of frustration.



How does MCP solve any of the problems you mentioned? The LLM still has to access your data, still doesn't know the difference between instructions and data, and still gives you hallucinated nonsense back – unless there's some truly magical component to this protocol that I'm missing.

The information returned by the MCP server is what makes it not hallucinate. That's one of the primary use cases.

> That's likely because OG devs have been seeing the hallucination stuff, unpredictability etc. and questioning how that fits with their carefully curated perfect system

That is the odd part. I am far from being part of that group of people. I'm only 25; I joined the industry in 2018 as part of a training program in a large enterprise.

The odd part is, many of the promises are a bit déjà vu even for me. "Agents are going to transform the enterprise" and other promises do not seem that far off the promises that were made during the low-code hype cycle.

Cynically, the more I look at the AI projects as an outsider, the more I think AI could fail in enterprises largely for the same reason low code did. Organizations are made of people, and people are messy; as a result the data is often equally messy.


Rule of thumb: the companies building the models are not selling hype. Or at least the hype is mostly justified. Everyone else, treat with extreme skepticism.

Is there anything new that's come out in conversational/voice? Sesame's Maya and Miles were kind of impressive demos, but that's still in 'research preview'. Kyutai presented a really cool low-latency open model, but I feel like we're still closer to Siri than actually usable voice interfaces.

It's moving very fast:

https://elevenlabs.io/

https://layercode.com/ (https://x.com/uselayercode has demos)

Have you used the live mode on the Gemini App (or stream on AI Studio)?


SaaS companies tended to need material engineering resources due to the software stacks and squad-style team structures in place -- they also leaned heavily on costly metered infra

I'm not seeing anything like the same level of heads or stack complexity in this wave (Vercel, Firebase etc.), and the vendors involved get cheaper every day ... along with increasing ability to run models locally with no metered costs at all


If you're in the AI space you need roughly the same infrastructure as any other SaaS PLUS your LLM costs. Take a look at AWS Bedrock costs [1] and you'll soon realize your costs can escalate rapidly, unlike traditional SaaS infra, whose costs are easier (er, less difficult) to predict.

[1] https://aws.amazon.com/bedrock/pricing/


The dirty secret of an awful lot of these LLM SAAS companies is that AWS are giving them tens of thousands of dollars to bootstrap, which they are paid back for with 8 figure investments from VCs. Anyone who is putting their own money on the line for anything other than the very first $100 or so for a PoC is being conned.


How?

Now you need to deal with all the traditional infra, plus a bunch of specific infra dealing with LLM apps, even if you’re just a wrapper using vendor APIs.

How are things in any way simplified? I only see more layers of complexity.


It's deeper than the security issue

You could have two different packages in a build doing similar things -- one uses less memory but is slower to compute than the other -- so they're used selectively by scenario, based on previous experience in production

If someone unfamiliar with the build makes a change and the assistant swaps the package used in the change -- which goes unnoticed because the package itself is already visible and the naming is only slightly different -- it's easy to see how surprises can happen

(I've seen o3 do this every time the prompt was re-run in this situation)


Migration to fly.io is simple enough ... it is much closer to the original Heroku both technically and as a company (if you ever need to contact them)


For me the Heroku database and Heroku add-ons were essential, if I was going to use Heroku. Without those I may as well use an IaaS.


Those are different things though -- you can replace functions with AI tech without needing AI to write all the code, and that is happening in a big way right now -- Krishna was talking about reducing customer-facing roles by around 30% over 5 years in that earlier statement

As of right now the code assistants mostly just make existing coders more productive -- predicting that 20 or 30% of new code will be generated in the near term is not unreasonable -- 90% is a stretch, as has been discussed here many times


Do you really think LLM vendors that download 80TB+ of data over torrents are going to be labeling their crawler agents correctly and running them out of known datacenters?


The ones I noticed in my logfiles behave impeccably: retrieve robots.txt every week or so and act on it.

(I noticed Claude, OpenAI and a couple of others whose names were less familiar to me.)


Apparently they use smart appliances to scrape websites from residential accounts.



