More

mikeocool · 2025-07-05T16:39:04 1751733544

When I sold some shares in my company, it sure was nice to not pay any taxes thanks to QSBS. But it’s sort of an absurd handout to rich people — I have a hard time believing investment money would flee the US if early stage investors/founders had to pay long term cap gains on their first $10M of gains (after all, we’d still have carried interest to keep the VCs happy).

It’s also already really easy to multiply the limit, by gifting stock to your spouse, kids, or a trust — all of which can be done just before you sell and keep the benefit. So raising that limit just makes it more absurd.

Though, if you’re an employee at an early stage startup and you can afford it/stomach the risk, QSBS is a good reason to exercise your options early.

jedberg · 2025-07-05T17:56:04 1751738164

It is certainly a handout to rich people, but it does serve a purpose. If you have a choice to invest in a startup vs a more stable investment, the $10M (or now $15M) in tax free gains is a strong incentive to choose the startup investment over something else.

And at the end of the day, small businesses usually drive the most innovation, so getting rich people to direct their money into startups instead of big companies is good for the country as a whole.

mikeocool · 2025-07-03T00:50:07 1751503807

Given all that university students are asked to pay for already, it would seem rather odd to ask them to also pay for the world's cancer research.

mikeocool · 2025-06-30T19:00:12 1751310012

What exactly would this report reveal? Laptops that have some sort gremlin in them resulting in lots of repairs over time?

If I worked for an organization that deployed or sold large numbers of used PCs -- and that problem cropped up frequently enough to matter, I think my take away would be: "let's stop deploying/selling used HP laptops and switch to a more reliable brand," not "let's try to use this fancy reporting to identify them before they get deployed."

mikeocool · 2025-06-26T02:42:55 1750905775

> keys are just hallucinated based on expectations of what the keys would be called and are either wrong or they flat out don't exist

To be fair this is also how I generally try to write IAM configs without AI.

mikeocool · 2025-06-21T19:54:29 1750535669

Personally, the best soups I’ve ever had were not made in kitchens that were optimized for efficiency or automation, they were optimized for quality.

They weren’t cheap soups, but they sure were good.

awb · 2025-06-21T21:04:51 1750539891

Luxury goods and staple goods have distinct optimizations, both viable for generating profits and economic utility.

A high end soup and an affordable soup might be serving two different markets.

dyauspitr · 2025-06-21T20:56:49 1750539409

Quality is a function of the ingredients used and the correct preparation. Neither of these things are something machines can’t do.

mikeocool · 2025-06-19T22:40:46 1750372846

How about release 2.0 and then don’t release 2.1 for a LONG time.

I get that in the early days such a fast paced release/EOL schedule made sense. But now something that operates at such a low level shouldn’t require non-security upgrades every 3 months and have breaking API changes at least once a year.

mikeocool · 2025-06-18T13:30:21 1750253421

This very much aligns with my experience — I had a case yesterday where opus was trying to do something with a library, and it encountered a build error. Rather than fix the error, it decided to switch to another library. It then encountered another error and decided to switch back to the first library.

I don’t think I’ve encountered a case where I’ve just let the LLM churn for more than a few minutes and gotten a good result. If it doesn’t solve an issue on the first or second pass, it seems to rapidly start making things up, make totally unrelated changes claiming they’ll fix the issue, or trying the same thing over and over.

Workaccount2 · 2025-06-18T14:09:51 1750255791

They poison their own context. Maybe you can call it context rot, where as context grows and especially if it grows with lots of distractions and dead ends, the output quality falls off rapidly. Even with good context the rot will start to become apparent around 100k tokens (with Gemini 2.5).

They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.

Right now I work around it by regularly making summaries of instances, and then spinning up a new instance with fresh context and feed in the summary of the previous instance.

simonw · 2025-06-18T23:17:45 1750288665

I think you just coined "context rot", what an excellent term! Quoted you on my blog https://simonwillison.net/2025/Jun/18/context-rot/

lovich · 2025-06-19T00:01:25 1750291285

I don’t know why, but going out of your way to make sure the coining of this is attributed to a random user on the internet made me incredibly nostalgic for what the pre Web 2.0 internet was like sans the 4chans, liveleak, and their forebears on usenet

gcr · 2025-06-19T02:06:53 1750298813

maybe someday soon, LLMs will learn to smoothly forget their own irrelevant context

imagine instead of predicting just the next token, the LLM predicts a mask over the previous tokens, that is then thresholded and only “relevant” tokens are kept in the next inference

one key distinction between humans and LLMs is that humans are excellent at forgetting irrelevant data. we forget tens of times a second and only keep what's necessary

bhl · 2025-06-18T23:24:36 1750289076

I always referred it more as context degradation, but rot is more visceral.

jerpint · 2025-06-19T00:45:18 1750293918

History in the making

hnhn34 · 2025-06-19T03:03:35 1750302215

Include me in the screenshot.

codeflo · 2025-06-18T14:42:58 1750257778

I wonder to what extent this might be a case where the base model (the pure token prediction model without RLHF) is "taking over". This is a bit tongue-in-cheek, but if you see a chat protocol where an assistant makes 15 random wrong suggestions, the most likely continuation has to be yet another wrong suggestion.

People have also been reporting that ChatGPT's new "memory" feature is poisoning their context. But context is also useful. I think AI companies will have to put a lot of engineering effort into keeping those LLMs on the happy path even with larger and larger contexts.

potatolicious · 2025-06-18T20:31:40 1750278700

I think this is at least somewhat true anecdotally. We do know that as context length increases, adherence to the system prompt decreases. Whether that de-adherence is reversion to the base model or not I'm not really qualified to say, but it certainly feels that way from observing the outputs.

Pure speculation on my part but it feels like this may be a major component of the recent stories of people being driven mad by ChatGPT - they have extremely long conversations with the chatbot where the outputs start seeming more like the "spicy autocomplete" fever dream creative writing of pre-RLHF models, which feeds and reinforces the user's delusions.

Many journalists have complained that they can't seem to replicate this kind of behavior in their own attempts, but maybe they just need a sufficiently long context window?

steveklabnik · 2025-06-18T15:29:36 1750260576

> They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.

In Claude Code you can use /clear to clear context, or /compact <optional message> to compact it down, with the message guiding what stays and what goes. It's helpful.

libraryofbabel · 2025-06-18T15:49:48 1750261788

Also in Claude Code you can just press <esc> a bunch of times and you can backtrack to an earlier point in the history before the context was poisoned, and re-start from there.

Claude has some amazing features like this that aren’t very well documented. Yesterday I just learned it writes sessions to disk and you can resume them where you left off with -continue or - resume if you accidentally close or something.

drewnick · 2025-06-18T19:45:15 1750275915

Thank you! This just saved me after closing laptop and losing a chat in VS Code. Cool feature and always a place where Clause Code UX was behind chat - being able to see history. "/continue" saved me ~15 minutes of re-establishing the planning for a new feature.

Also loving the shift + tab (twice) to enter plan mode. Just adding here in case it helps anyone else.

Aeolun · 2025-06-18T22:20:58 1750285258

Claude code should really document this stuff in some kind of tutorial. There’s too much about code that I need to learn from random messages on the internet.

steveklabnik · 2025-06-18T19:57:59 1750276679

> Claude has some amazing features like this that aren’t very well documented.

Yeah, it seems like they stealth ship a lot. Which is cool, but can sometimes lead to a future that's unevenly distributed, if you catch my drift.

darrenspencer95 · 2025-06-30T02:42:51 1751251371

Super helpful! Thanks!

no_wizard · 2025-06-18T19:45:12 1750275912

I feel like as an end user I’d like to be able to do more to shape the LLM behavior. For example, I’d like to flag the dead end paths so they’re properly dropped out of context and not explored again, unless I as a user clear the flag(s).

I know there is work being done on LLM “memory” for lack of a better term but I have yet to see models get more responsive over time with this kind of feedback. I know I can flag it but right now it doesn’t help my “running” context that would be unique to me.

I have a similar thought about LLM “membranes”, which combines the learning from multiple users to become more useful, I am keeping a keen eye on that as I think that will make them more useful on a organizational level

nomel · 2025-06-19T00:58:40 1750294720

Any good chat client will let you not only modify previous messages in place, but also modify the LLM responses, and regenerate from any point.

xwolfi · 2025-06-19T01:41:44 1750297304

At some point, shouldn't these things start understanding what they're doing ?

voxelghost · 2025-06-19T02:04:44 1750298684

I often think that context should be a tree, where you can say - lets prune this whole branch, it didn't lead to anything good.

darepublic · 2025-06-18T18:03:56 1750269836

There are just certain problems that they cannot solve. Usually when there is no clear example in its pretraining or discoverable on the net. I would say the reasoning capabilities of these models are pretty shallow, at least it seems that way to me

dingnuts · 2025-06-18T18:08:35 1750270115

They can't reason at all. The language specification for Tcl 9 is in the training data of the SOTA models but there exist almost no examples, only documentation. Go ahead, try to get a model to write Tcl 9 instead of 8.5 code and see for yourself. They can't do it, at all. They write 8.5 exclusively, because they only copy. They don't reason. "reasoning" in LLMs is pure marketing.

nomel · 2025-06-19T01:09:07 1750295347

It becomes clear that it's just statistics once you get near a statistically significant "attractor".

A silly example is any of the riddles where you just simplify it to an obvious degree and the LLM can't get it (mostly gone with recent big models), like: "A man, a sheep, and a boat need to get across a river. How can they do this safely without the sheep being eaten".

A more practically infuriating example is when you want to do something slightly different than a very common problem. The LLM might eventually get it right, after too much guidance, but then it'll slowly revert back to the "common" case. For example, replacing whole chunks of code with whatever common thing when you tell it add comments. This happens frequently to me with super basic vector math.

kossae · 2025-06-18T14:43:44 1750257824

This is my experience as well, and for now comes down to a workflow optimization. As I feel the LLM getting off track, I start a brand new session with useful previous context pasted in from my previous session. This seems to help steer it back to a decent solution, but agreed it would be nice if this was more automated based off of user/automated feedback (broken unit test, "this doesn't work", etc.)

kazinator · 2025-06-18T17:02:03 1750266123

"Human Attention to the Right Subset of the Prior Context is All You Need"

joshstrange · 2025-06-21T19:06:10 1750532770

Great term, I never put it into words but I feel this deeply. I rarely go back and forth more than 2-3 times with an LLM before ejecting to a new conversation. I’ve just been burned so much by old context informing the conversation later to my detriment. As soon as it gets something wrong I know the “rot” has set in and I need to start over (bringing over the best parts).

But, OpenAI and friends should let me purge my questions and, more importantly, the LLM response from the chat. More often than not, it’s poisoning itself with bad ideas, flip-flopping, etc. I hate having to pick up and move to a new chat but if I don’t the conversation will only go downhill.

HeWhoLurksLate · 2025-06-18T14:50:15 1750258215

I've found issues like this happen extremely quickly with ChatGPT's image generation features - if I tell it to put a particular logo in, the first iteration looks okay, while anything after that starts to look more and more cursed / mutant.

nojs · 2025-06-18T15:56:45 1750262205

https://www.astralcodexten.com/p/the-claude-bliss-attractor

rvnx · 2025-06-18T14:52:41 1750258361

I've noticed something, even if you ask to edit a specific picture, it will still used the other pictures in the context (and this is somewhat unwanted)

vunderba · 2025-06-18T15:14:38 1750259678

gpt-image-1 is unfortunately particular vulnerable to this problem. The more you want to change the initial image - the better off you'd honestly be just starting an entirely new conversation.

OtherShrezzing · 2025-06-18T15:26:07 1750260367

>They really need to figure out a way to delete or "forget" prior context, so the user or even the model can go back and prune poisonous tokens.

This is possible in tools like LM Studio when running LLMs locally. It's a choice by the implementer to grant this ability to end users. You pass the entire context to the model in each turn of the conversation, so there's no technical reason stopping this feature existing, besides maybe some cost benefits to the inference vendor from cache.

5mv2 · 2025-06-19T11:35:06 1750332906

Context summarization will be natively added soon.

It's already the case on tools like block.github.io/goose:

```

Summarize Conversation This will summarize your conversation history to save context space.

Previous messages will remain visible but only the summary will be included in the active context for Goose. This is useful for long conversations that are approaching the context limit.

```

iLoveOncall · 2025-06-18T22:22:47 1750285367

> They really need to figure out a way to delete or "forget" prior context

This is already pretty much figured out: https://www.promptingguide.ai/techniques/react

We use it at work and we never encounter this kind of issues.

barapa · 2025-06-18T23:57:09 1750291029

How does ReAct address this? Unless one of the actions is deleting part of the message history...

qwertox · 2025-06-19T09:14:00 1750324440

My same experience with Gemini 2.5.

It mostly happens when you pass it similar but updated code, for some reason it then doesn't really see the newest version and reasons over obsolete content.

I've had one chat recover from this, though.

dr_dshiv · 2025-06-18T22:53:14 1750287194

Context rot! Love it. One bad apple spoils the barrel.

I try to keep hygiene with prompts; if I get anything bad in the result, I try to edit my prompts to get it better rather than correcting in conversation.

rxzzh · 2025-06-19T09:30:37 1750325437

What a good concept! Maybe there will be researchs focus on let LLM effectively forget things!

autobodie · 2025-06-18T17:59:08 1750269548

If so, that certainly fits with my experiences.

eplatzek · 2025-06-18T18:29:23 1750271363

Honestly that feels a like a human.

After hitting my head against a wall with a problem I need to stop.

I need to stop and clear my context. Go a walk. Talk with friends. Switch to another task.

vadansky · 2025-06-18T14:27:24 1750256844

I had a particularly hard parsing problem so I setup a bunch of tests and let the LLM churn for a while and did something else.

When I came back all the tests were passing!

But as I ran it live a lot of cases were still failing.

Turns out the LLM hardcoded the test values as “if (‘test value’) return ‘correct value’;”!

ffsm8 · 2025-06-18T14:58:21 1750258701

Missed opportunity for the LLM, could've just switched to Volkswagen CI

https://github.com/auchenberg/volkswagen

EGreg · 2025-06-18T15:17:27 1750259847

This is gold lol

artursapek · 2025-06-18T21:41:28 1750282888

lmfao

mikeocool · 2025-06-18T17:04:39 1750266279

Yeah — I had something like this happen as well — the llm wrote a half decent implementation and some good tests, but then ran into issues getting the tests to pass.

It then deleted the entire implementation and made the function raise a “not implemented” exception, updated the tests to expect that, and told me this was a solid base for the next developer to start working on.

bluefirebrand · 2025-06-18T14:44:37 1750257877

This is the most accurate Junior Engineer behavior I've heard LLMs doing yet

vunderba · 2025-06-18T15:17:11 1750259831

I've definitely seen this happen before too. Test-driven development isn't all that effective if the LLM's only stated goal is to pass the tests without thinking about the problem in a more holistic/contextual manner.

matsemann · 2025-06-18T17:53:12 1750269192

Reminds me of trying to train a small neural net to play Robocode ~10+ years ago. Tried to "punish" it for hitting walls, so next morning I had evolved a tanks that just stood still... Then punished it for standing still, ended up with a tanks just vibrating, alternating moving back and forth quickly, etc.

vunderba · 2025-06-18T18:23:59 1750271039

That's great. There's a pretty funny example of somebody training a neural net to play Tetris on the Nintendo entertainment system, and it quickly learned that if it was about to lose to just hit pause and leave the game in that state indefinitely.

amlib · 2025-06-18T20:38:01 1750279081

I guess it came to the same conclusion as the computer in War Games, "The only way to win is not to play"

insane_dreamer · 2025-06-18T21:17:35 1750281455

While I haven't run into this egregious of an offense, I have had LLMs either "fix" the unit test to pass with buggy code, or, conversely, "fix" the code to so that the test passes but now the code does something different than it should (because the unit test was wrong to start with).

FuckButtons · 2025-06-19T03:27:28 1750303648

Seems like property based tests would be good for llms, it’s a shame that half the time coming up with a good property test can be as hard as writing the code.

peacebeard · 2025-06-18T14:10:29 1750255829

Very common to see in comments some people saying “it can’t do that” and others saying “here is how I make it work.” Maybe there is a knack to it, sure, but I’m inclined to say the difference between the problems people are trying to use it on may explain a lot of the difference as well. People are not usually being too specific about what they were trying to do. The same goes for a lot of programming discussion of course.

alganet · 2025-06-18T14:57:40 1750258660

> People are not usually being too specific about what they were trying to do. The same goes for a lot of programming discussion of course.

In programming, I already have a very good tool to follow specific steps: _the programming language_. It is designed to run algorithms. If I need to be specific, that's the tool to use. It does exactly what I ask it to do. When it fails, it's my fault.

Some humans require algorithmic-like instructions too. Like cooking a recipe. However, those instructions can be very vague and a lot of humans can still follow it.

LLMs stand on this weird place where we don't have a clue in which occasions we can be vague or not. Sometimes you can be vague, sometimes you can't. Sometimes high level steps are enough, sometimes you need fine-grained instructions. It's basically trial and error.

Can you really blame someone for not being specific enough in a system that only provides you with a text box that offers anthropomorphic conversation? I'd say no, you can't.

If you want to talk about how specific you need to prompt an LLM, there must be a well-defined treshold. The other option is "whatever you can expect from a human".

Most discussions seem to juggle between those two. LLMs are praised when they accept vague instructions, but the user is blamed when they fail. Very convenient.

peacebeard · 2025-06-18T17:47:09 1750268829

I am not saying that people were not specific in their instructions to the LLM, but rather that in the discussion they are not sharing specific details of their success stories or failures. We are left seeing lots of people saying "it worked for me" and "it didn't work for me" without enough information to assess what was different in those cases. What I'm contending is that the essential differences in the challenges they are facing may be a primary factor, while these discussions tend to focus on the capabilities of the LLM and the user.

alganet · 2025-06-18T19:02:40 1750273360

> they are not sharing specific details of their success stories or failures

Can you blame them for that?

For other products, do you think people contact customer support with an abundance of information?

Now, consider what these LLM products promise to deliver. Text box, answer. Is there any indication that different challenges might yield difference in the quality of outcome? Nope. Magic genie interface, it either works or it doesn't.

heyitsguay · 2025-06-18T14:23:45 1750256625

I've noticed this a lot, too, in HN LLM discourse.

(Context: Working in applied AI R&D for 10 years, daily user of Claude for boilerplate coding stuff and as an HTML coding assistant)

Lots of "with some tweaks i got it to work" or "we're using an agent at my company", rarely details about what's working or why, or what these production-grade agents are doing.

Aeolun · 2025-06-18T22:25:31 1750285531

I ask it to build it to 3d voxel engine in Rust, and it just goes off and do it. Same for a vox file parser.

Sure, it takes some creative prompting, and a lot of turns to get it to settle on the proper coordinate system for the whole thing, but it goes ahead and does it.

This took me two days so far. Unfortunate, the scope of the thing is now so large that the quality rapidly starts to degrade.

SchemaLoad · 2025-06-18T23:35:31 1750289731

Building something from scratch where there are plenty of examples public on github seems to be the easiest case. Put these agents on a real existing codebase and ask them to fix a bug and they become useless.

twosdai · 2025-06-19T01:10:26 1750295426

I think this would vary a lot between "real" code basis. I have had a lot of success when using somewhat stricter frameworks, with typed interfaces, and requiring well defined unit tests, and modules which ecapsulate a lot of logic.

Basically like Java Spring Boot or NestJS type projects.

Aeolun · 2025-06-19T22:28:34 1750372114

I like that skeptical people always first ask for examples, then when someone gives them, they immediately switch to “well, sure, but that one is easy!”

nico · 2025-06-18T14:55:03 1750258503

> Rather than fix the error, it decided to switch to another library

I’ve had a similar experience, where instead of trying to fix the error, it added a try/catch around it with a log message, just so execution could continue

accrual · 2025-06-18T14:15:55 1750256155

I've had some similar experiences. While I find agents very useful and able to complete many tasks on its own, it does hit roadblocks sometimes and its chosen solution can be unusual/silly.

For example, the other day I was converting models but was running out of disk space. The agent decided to change the quantization to save space when I'd prefer it ask "hey, I need some more disk space". I just paused it, cleared some space, then asked the agent to try the original command again.

skerit · 2025-06-18T13:53:55 1750254835

> I don’t think I’ve encountered a case where I’ve just let the LLM churn for more than a few minutes and gotten a good result.

Is this with something like Aider or CLine?

I've been using Claude-Code (with a Max plan, so I don't have to worry about it wasting tokens), and I've had it successfully handle tasks that take over an hour. But getting there isn't super easy, that's true. The instructions/CLAUDE.md file need to be perfect.

nico · 2025-06-18T14:44:00 1750257840

> I've had it successfully handle tasks that take over an hour

What kind of tasks take over an hour?

aprilthird2021 · 2025-06-18T17:09:55 1750266595

You have to give us more about your example of a task that takes over an hour with very detailed instruction. That's very intriguing

mtalantikite · 2025-06-18T15:46:55 1750261615

I had Claude Code deep inside a change it was trying to make, struggling with a test that kept failing, and then decided to delete the test to make the test suite pass. We've all been there!

I generally treat all my sessions with it as a pairing session, and like in any pairing session, sometimes we have to stop going down whatever failing path we're on, step all the way back to the beginning, and start again.

nojs · 2025-06-18T15:52:07 1750261927

> decided to delete the test to make the test suite pass

At least that’s easy to catch. It’s often more insidious like “if len(custom_objects) > 10:” or “if object_name == ‘abc’” buried deep in the function, for the sole purpose of making one stubborn test pass.

Aeolun · 2025-06-18T22:28:42 1750285722

Hah, I got “All this async stuff looks really hard, let’s just replace it with some synchronous calls”.

“Claude, this is a web server!”

“My apologies… etc.”

akomtu · 2025-06-18T16:13:13 1750263193

Claude Doctor will hopefully do better.

enraged_camel · 2025-06-18T13:41:53 1750254113

I've actually thought about this extensively, and experimented with various approaches. What I found is that the quality of results I get, and whether the AI gets stuck in the type of loop you describe, depends on two things: how detailed and thorough I am with what I tell it to do, and how robust the guard rails I put around it are.

To get the best results, I make sure to give detailed specs of both the current situation (background context, what I've tried so far, etc.) and also what criteria the solution needs to satisfy. So long as I do that, there's a high chance that the answer is at least satisfying if not a perfect solution. If I don't, the AI takes a lot of liberties (such as switching to completely different approaches, or rewriting entire modules, etc.) to try to reach what it thinks is the solution.

prmph · 2025-06-18T14:09:30 1750255770

But don't they keep forgetting the instructions after enough time have passed? How do you get around that? Do you add an instruction that after every action it should go back and read the instructions gain?

enraged_camel · 2025-06-18T15:30:02 1750260602

They do start "drifting" after a while, at which point I export the chat (using Cursor), then start a new chat and add the exported file and say "here's the previous conversation, let's continue where we left off". I find that it deals with the transition pretty well.

It's not often that I have to do this. As I mentioned in my post above, if I start the interaction with thorough instructions/specs, then the conversation concludes before the drift starts to happen.

Wowfunhappy · 2025-06-18T17:15:52 1750266952

> I don’t think I’ve encountered a case where I’ve just let the LLM churn for more than a few minutes and gotten a good result.

I absolutely have, for what it's worth. Particularly when the LLM has some sort of test to validate against, such as a test suite or simply fixing compilation errors until a project builds successfully. It will just keep chugging away until it gets it, often with good overall results in the end.

I'll add that until the AI succeeds, its errors can be excessively dumb, to the point where it can be frustrating to watch.

civilian · 2025-06-18T17:20:03 1750267203

Yeah, and I have a similar experience watching junior devs try to get things working-- their errors can be excessively dumb :D

qazxcvbnmlp · 2025-06-18T13:41:21 1750254081

when this happens I do thew following

1) switch to a more expensive llm and ask it to debug: add debugging statements, reason about what's going on, try small tasks, etc 2) find issue 3) ask it to summarize what was wrong and what to do differently next time 4) copy and paste that recommendation to a small text document 5) revert to the original state and ask the llm to make the change with the recommendation as context

rurp · 2025-06-18T14:02:01 1750255321

This honestly sounds slower than just doing it myself, and with more potential for bugs or non-standard code.

I've had the same experience as parent where LLMs are great for simple tasks but still fall down surprisingly quickly on anything complex and sometimes make simple problems complex. Just a few days ago I asked Claude how to do something with a library and rather than give me the simple answer it suggested I rewrite a large chunk of that library instead, in a way that I highly doubt was bug-free. Fortunately I figured there would be a much simpler answer but mistakes like that could easily slip through.

ziml77 · 2025-06-18T15:33:39 1750260819

Yeah if it gets stuck and can't easily get itself unstuck, that's when I step in to do the work for it. Otherwise it will continue to make more and more of a mess as it iterates on its own code.

qazxcvbnmlp · 2025-06-22T12:20:33 1750594833

it is slower than doing it yourself in the following scenarios 1) working in a language you are familiar in but has speed advantages in other areas 1) working in languages and or frameworks you are not familiar in 2) by documenting where the llm went wrong, you can append that to its rules and avoid the issue next time

nico · 2025-06-18T14:48:08 1750258088

> 1) switch to a more expensive llm and ask it to debug

You might not even need to switch

A lot of times, just asking the model to debug an issue, instead of fixing it, helps to get the model unstuck (and also helps providing better context)

onlyrealcuzzo · 2025-06-18T13:54:20 1750254860

> If it doesn’t solve an issue on the first or second pass, it seems to rapidly start making things up, make totally unrelated changes claiming they’ll fix the issue, or trying the same thing over and over.

Sounds like a lot of employees I know.

Changing out the entire library is quite amusing, though.

Just imagine: I couldn't fix this build error, so I migrated our entire database from Postgres to MongoDB...

KronisLV · 2025-06-18T22:07:24 1750284444

That is amusing but I remember HikariCP in an older Java project having issues with DB connections against an Oracle instance. Thing is, no settings or debugging could really (easily) narrow it down, whereas switching to DBCP2 both fixed whatever was the stability issue in that particular pairing, as well as has nice abandoned connection tracking too. Definitely the quick and dirty solution that still has good results.

mathattack · 2025-06-18T14:11:28 1750255888

It may be doing the wrong thing like an employee, but at least it's doing it automatically and faster. :)

butterknife · 2025-06-18T14:08:47 1750255727

Thanks for the laughs.

didgeoridoo · 2025-06-18T14:28:44 1750256924

Probably had “MongoDB is web scale” in the training set.

ninetyninenine · 2025-06-19T03:16:23 1750302983

I got an idea. Context compression. Once the context reaches a certain size threshold have the LLM summarize it into bulletpoints then start a new session with that summary as the context.

Humans as well don’t remember the entire context either. For your case the summary already says tried library A and B and it didn’t work, it’s unlikely the LLM will repeat library A given that the summary explicitly said it was attempted.

I think what happens is that if the context gets to large the LLM sort of starts rambling or imitating rambling styles it finds online. The training does not focus on not rambling and regurgitation so the LLM is not watching too hard for that once the context gets past a certain length. People ramble too and we repeat shit a lot.

insane_dreamer · 2025-06-18T18:49:29 1750272569

I have the same experience. When it starts going down the wrong path, it won't switch paths. I have to intervene, put my thinking cap on, and tell the agent to start over from scratch and explore another path (which I usually have to define to get them started in the right direction). In the end, I'm not sure how much time I've actually saved over doing it myself.

the__alchemist · 2025-06-18T13:32:20 1750253540

This is consistent with my experience as well.

smeeth · 2025-06-18T22:00:18 1750284018

I suspect its something to do with the following:

When humans get stuck solving problems they often go out to acquire new information so they can better address the barrier they encountered. This is hard to replicate in a training environment, I bet its hard to let an agent search google without contaminating your training sample.

fcatalan · 2025-06-18T13:39:05 1750253945

I brought over the source of the Dear imgui library to a toy project and Cline/Gemini2.5 hallucinated the interface and when the compilation failed started editing the library to conform with it. I was all like: Nono no no no stop.

BoiledCabbage · 2025-06-18T18:11:38 1750270298

Oh man that's good - next step create a PR to push it up stream! Everyone can benefit from its fixes.

Aeolun · 2025-06-18T22:19:00 1750285140

I feel like they might eventually arrive at the right solution, but generally, interrupting it before it goes off on a wild tangent saves you quite a bit of time.

reactordev · 2025-06-18T16:22:27 1750263747

This happens in real life too when a dev builds something using too much copy pasta and encounters a build error. Stackoverflow was founded on this.

EnPissant · 2025-06-18T16:53:30 1750265610

Where you using Claude Code or something else? I've had very good luck with Claude Code not doing what you described.

dylan604 · 2025-06-18T16:01:53 1750262513

> it encountered a build error.

does this mean that even AI gets stuck in dependency hell?

hhh · 2025-06-18T16:13:39 1750263219

of course, i’ve even had them actively say they’re giving up after 30 turns

mikeocool · 2025-06-08T20:16:35 1749413795

RAID isn’t backup - but in my years running computers at my house I’ve been lucky enough to lose zero machines to theft, water damage, fire, etc. but I have had many hard drives fail.

Way more convenient to just swap out a drive then to swap out a drive and restore from backup.

PhilipRoman · 2025-06-08T20:50:22 1749415822

Interesting, I've had the exact opposite experience. My oldest HDD from 2007 is still going strong. Haven't had even a single micro SD card fail in a RPI. I built some fancy backup infrastructure for myself based on a sharded hash addressed database but so far have only used the backups to recover from "Layer 8" issues :)

I had a look at my notes and so far the only unexpected downtime has been due to 1x CMOS battery running out after true power off, 1x VPS provider randomly powering off my reverse proxy, 2x me screwing around with link bonding (connections always started to fail a few hours later, in middle of night).

mikeocool · 2025-06-02T16:18:43 1748881123

Though arguably cloud infra made it so that a lot more companies who never would have built out a data center or leased a chunk of space in one were spinning up some serious infra in AWS or Azure -- and thus hiring at least 1-2 devops engineers.

Before the end of zero interest rate policy, all the sysadmins I knew who the made the transition to devops were never stuck looking for a job for long.

mikeocool · 2025-06-01T18:17:24 1748801844

The Aurora app on iOS can set to send a critical notification when you’re likely to see it in your location.

It alerted me (in New York) this morning at about 4AM — though I slept through it.