Interactivity and liveness in programming deserve to be discussed far more often than they are on the front page of Hacker News, but I'm excited there are multiple ongoing threads!
I'm a very strong supporter of interactive blog posts as well. Obviously https://ciechanow.ski/ is the leader here - being able to mess with something to build intuition is huge.
Lisp is very well-suited to live development due to code being data, but live development doesn't need to be lispy.
I built a live development extension for Love2D which lets you do graphics livecoding (both Lua and GLSL) in real time - every keystroke updates the output (if the program is valid).
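Not the actual extension (that's Lua + LSP), but a minimal Python sketch of the core idea, assuming a hypothetical `sketch.py` being edited: reload on every change and only swap the new code in when it loads cleanly.

```python
import importlib.util
import os
import time

SRC = "sketch.py"  # hypothetical file being live-edited

def try_load(path):
    """Compile and run the file; return the module only if it's a valid program."""
    try:
        spec = importlib.util.spec_from_file_location("live_sketch", path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        return mod
    except Exception as err:   # syntax or runtime error: keep the previous version
        print("not swapping:", err)
        return None

current = try_load(SRC)
last_mtime = os.path.getmtime(SRC)

while True:                          # stand-in for the engine's frame loop
    mtime = os.path.getmtime(SRC)
    if mtime != last_mtime:          # file changed (e.g. on every save)
        last_mtime = mtime
        new = try_load(SRC)
        if new is not None:
            current = new            # hot-swap only when the new code is valid
    if current is not None and hasattr(current, "draw"):
        current.draw()               # call into whatever the live code defines
    time.sleep(1 / 60)
```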
So many cool things become possible once you break down the barrier between the editor and the running program.
I've also asked myself what a language would look like if it were natively built for live development, and built out a prototype - though it's definitely a sandbox, so I haven't posted it anywhere yet other than demos on Mastodon.
Oh wow, just had to log in and give you a high-five for livelove because this is the first I've heard of it and it sounds like the sort of thing I absolutely need to try out.
I remember giving Love2D a go a couple of years ago with Fennel and the lack of such a thing sent me grumbling back to Common Lisp. I'd never even have thought of building that functionality in Love/Lua myself - assuming it's something that the runtime just didn't support - and it absolutely would never have occurred to me to use LSP to set it up. I've not even used it yet and it's already doing things to my brain, so thanks!
I guess the prevailing worldview is that "recompile everything and re-run" is good enough if it takes 2 seconds. But agreed that it just "feels" different when you're doing it live in Lisp... I miss Emacs!
Recompile and hot reload, maybe. 2 seconds if you're very lucky. Many setups are much slower. I've seen some really cool projects over the last few years - things like TCC + hot reload that have really good turnaround times.
But "live" is a whole different thing. And most languages aren't built with the expectation that you'll be patching it while it's running - at least as standard operating procedure and without nuking state.
And that's a key part.
I think you should be able to build an app and develop a game without ever having to restart them.
Well, for me it's not enough, because I need to get back to where I was, repeating the same actions to reach the same state. With live dev I don't need that, or a history-replay mechanism. I only update the running code. Heck, I could also update the in-memory vars too if I want.
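A toy Python sketch (not any particular tool) of what that means in practice: the state survives every patch, only the behaviour gets swapped, and you can poke the in-memory data directly.

```python
# Toy illustration: state persists, only the code is replaced.
state = {"x": 0, "score": 42}          # in-memory state, never reset

def update(state):                      # version 1 of the live code
    state["x"] += 1

for _ in range(100):                    # the app has been running for a while
    update(state)

def patched_update(state):              # version 2, pushed in from the editor
    state["x"] += 1
    state["score"] += 10

update = patched_update                 # running code replaced in place
update(state)                           # no restart, no replaying 100 steps

state["score"] = 9000                   # and the in-memory var can be updated too
print(state)
```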
Parent comment isn't asking how data is requested from the back-end.
GP comment is (seemingly) describing keeping an entirely client-side snapshot of the back-end database (data stored locally / in memory).
Parent comment is asking how the two are kept in sync.
It's hard to believe it would be the method you're describing and take 25ms.
If you're doing HTTP range requests, that suggests you're reading from a file, which means object storage or disk.
I have to assume there is something getting triggered when the back end updates that tells the client to update its instance. (Which could well just be telling it to execute some SQL to fetch the new / updated information it needs.)
Or the data is entirely in memory on the back end, in an in-memory DuckDB instance with the latest data, and it just needs to be retrieved / returned from memory.
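For reference, the range-request path being speculated about would look roughly like this (URL and byte offsets are made up; the `requests` library is assumed):

```python
import requests

# Hypothetical file sitting on object storage / disk behind an HTTP server.
url = "https://example.com/data/events.parquet"

# Ask for just the first 64 KiB instead of the whole file.
resp = requests.get(url, headers={"Range": "bytes=0-65535"}, timeout=5)

print(resp.status_code)    # 206 Partial Content if the server honours Range
print(len(resp.content))   # only the requested slice comes over the wire
```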
I saw pretty good reasoning quality with phi-4-mini. But alright - I’ll still run some tests with qwen2.5-coder and plan to add support for it next. Would be great to compare them side by side in practical shell tasks. Thanks so much for the pointer!
Is this the widely used term? Do you know of any open source models fine-tuned as an "inner loop" / native agentic llm? Or what the training process looks like?
I don't see why any model couldn't be fine-tuned to work this way - i.e. tool use doesn't need to be followed by an EOS token or anything; it could just wait for the output (or even continue, knowing there's an open request, and act on the result when it comes back).
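A toy sketch of that loop (everything here is a stand-in, no real model or tool API): the tool call doesn't end the turn; the result is appended to the context and generation simply continues.

```python
def fake_generate(context):
    """Stand-in for the model: requests one tool call, then finishes."""
    if "TOOL_RESULT" not in context:
        return "Let me check the repo.", {"tool": "ls", "args": "."}
    return "Done: the repo has three files.", None   # normal stop, no more tools

def fake_run_tool(call):
    """Stand-in for actually executing the tool."""
    return f"TOOL_RESULT({call['tool']}): a.py b.py c.py"

def agent_loop(prompt):
    context = prompt
    while True:
        text, tool_call = fake_generate(context)
        context += "\n" + text
        if tool_call is None:       # only an ordinary stop ends the turn
            return context
        # No EOS after the tool call: inject the output and keep generating.
        context += "\n" + fake_run_tool(tool_call)

print(agent_loop("List the repo contents."))
```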
Looks like someone used a tool which generates a landing page for you (the product they used left an image that advertises that company). There's no product here / violates Show HN.
If you look at the Privacy Policy or Terms pages, they call the site ActionFigureGenerator, and they lack the styling of the homepage. There's even a leftover image from the other site on the landing page. Both sites are similar in design.
So either both sites are run by the same person and they copied their own template, or they stole the website design outright. Given the broken styling on the privacy and terms pages, I'm leaning towards stolen.
Surprised that "controlling cost" isn't a section in this post. Here's my attempt.
---
If you get the hang of controlling costs, it's much cheaper. If you're exhausting the context window, I would not be surprised if you're seeing high costs.
Be aware of the "cache".
Tell it to read specific files (and only those!); if you don't, it'll read unnecessary files, repeatedly re-read sections of files, or even search through files.
Avoid letting it search - even halt it when it does. find / rg can produce thousands of tokens of output depending on the search.
Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT.
The cache also goes away after 5-15 minutes or so (not sure) - so avoid leaving sessions open and coming back later.
Never use /compact (that'll bust cache, if you need to, you're going back and forth too much or using too many files at once).
Don't let files get too big (it's good hygiene anyway); it keeps the context window smaller.
Have a clear goal in mind and keep sessions to as few messages as possible.
Write / generate markdown files with the documentation you need using claude.ai, save them as files in the repo, and tell it to read the relevant file as part of a question.
I'm at around $0.50-0.75 for most "tasks" I give it. I'm not a super heavy user, but it definitely helps me (it's like having a super focused smart intern that makes dumb mistakes).
If I need to feed it a ton of docs etc. for some task, it'll be more like a few dollars rather than < $1. But I really only do this to try some prototype with a library Claude doesn't know about (or knows an outdated version of).
For hobby stuff, it adds up - totally.
For a company, massively worth it. Insanely cheap productivity boost (if developers are responsible / don't get lazy / don't misuse it).
If I have to be so cautious while using a tool, I might as well write the code myself lol.
I've used Claude Code extensively and it is one of the best AI IDEs. It just gets things done.
The only downside is the cost. I was averaging $35-$40/day. At this cost, I’d rather just use Cursor/Windsurf.
Not having to specify files is a humongous feature for me. Having to remember which file code is in is half the work once you pass a certain codebase size.
That sometimes works, sometimes doesn't, and takes 10x the time. Same with Codex. I would have both and switch between them depending on which you feel will get it right.
Yeah, I tried CC out and quickly noticed it was spending $5+ for simple, LLM-capable tasks. I rarely break $1-2 a session using aider. Aider feels like more of a precision tool. I like having the ability to manually specify files.
I do find Claude Code to be really good at exploration though - like checking out a repository I'm unfamiliar with and then asking questions about it.
Aider is a great tool. I do love it. But I find I have to do more with it to get the same output as Claude Code (no matter what LLM I used with Aider). Sure it may end up being cheaper per run, but not when my time is factored in.
The flip side is I find Aider much easier to limit.
After switching to Aider, I realized the other tools have been playing elaborate games to choose cheaper models and to limit files and messages in context, both of which increase their bills.
Get an OpenRouter account and you can play with almost all providers. I was burning money on Claude, then tried V3 (I blocked the DeepSeek provider for being flaky - let the laypeople mock them) and the experimental and GA Gemini models.
The cost of the task scales with how long it takes, plus or minus.
Substitute “cost” with “time” in the above post and all of the same tips are still valuable.
I don't do much agentic LLM coding, but the speed (or lack thereof) was one of my least favorite parts. Any tricks that narrow the scope, prevent reprocessing files over and over again, or avoid searching through the codebase are helpful even if you don't care about the dollar amount.
Hard agree. Whether it's 50 cents or 10 dollars per session, I'm using it to get work done for the sake of quickly completing work that aims to unblock many orders of magnitude more value. But in so far as cheaper correct sessions correlate with sessions where the problem solving was more efficient anyhow, they're fairly solid tips.
I agree, but optimisation often reveals implementation details, which helps you understand the limits of the current tech. It might not be worth the time, but part of engineering is optimisation and another part is deep understanding of the tech. It is sometimes worth optimising anyway if you want to take the engineering discipline to the next level within yourself.
I myself didn't think about not running linters, but it makes obvious sense now and gives me insight into how Claude Code works - insight I can use in related engineering work.
Exactly. I've been using the ChatGPT desktop app not because of the model quality but because of the UX. It basically integrates seamlessly with my IDEs (IntelliJ and VS Code). Mostly I just do stuff like select a few lines, hit Option+Shift+1, and say something like "fix this". Nice short prompt and I get the answer relatively quickly. Option+Shift+1 opens ChatGPT with the open file already added to the context. It sees what lines are selected. And it also sees the output of any test runs in the console. So just me saying "fix this" now has a rich context that I don't need to micromanage.
Mostly I just use the 4o model instead of the newer better models because it is faster. It's good enough mostly and I prefer getting a good enough answer quickly than the perfect answer after a few minutes. Mostly what I ask is not rocket science so perfect is the enemy of good here. I rarely have to escalate to better models. The reasoning models are annoyingly slow. Especially when they go down the wrong track, which happens a lot.
And my cost is a predictable $20/month. The downside is that the scope of what I can ask is more limited. I'd like it to be able to "see" my whole code base instead of just one file, and for me to not have to micromanage what the model looks at. Claude can do that if you don't care about money. But if you do, you are basically micromanaging context. That sounds like monkey work that somebody should automate. And it shouldn't require an Einstein-sized artificial brain to do it.
There must be people that are experimenting with using locally running more limited AI models to do all the micromanaging that then escalate to remote models as needed. That's more or less what Apple pitched for Apple AI at some point. Sounds like a good path forward. I'd be curious to learn about coding tools that do something like that.
In terms of cost, I don't actually think it's unreasonable to spend a few hundred dollars per month on this stuff. But I question the added value over the $20 I'm spending. I don't think the improvement is 20x better - more like 1.5x. And I don't like the unpredictability, and having to think about how expensive a question is going to be.
I think a lot of the short term improvement is going to be a mix of UX and predictable cost. Currently the tools are still very clunky and a bit dumb. The competition is going to be about predictable speed, cost and quality. There's a lot of room for improvement here.
It usually does, just with a time delay and a strict condition that the firm you work at can actually commercialize your productivity. Apply your systems thinking skills to compensation and it will all make sense.
It's interesting that this is a problem for people because I have never spent more than about $0.50 on a task with Claude Code. I have pretty good code hygiene and I tell Claude what to do with clear instructions and guidelines, and Claude does it. I will usually go through a few revisions and then just change anything myself if I find it not quite working. It's exactly like having an eager intern.
I don't think about controlling cost because I price my time at US$40/h and virtually all models are cheaper than that (with the exception of o1 or Gemini 2.5 pro).
If I spend $2 instead of $0.50 on a session but had to spend 6 minutes thinking about context, I haven't gained any money: 6 minutes at $40/h is $4, more than the $1.50 saved.
If your expectation is to produce the same amount of output, you could argue when paying for AI tools, you're choosing to spend money to gain free time.
4 hours coding project X or 3 hours and a short hike with your partner / friends etc
If what I'm doing doesn't have a positive expected value, the correct move isn't to use inferior dev tooling to save money, it's to stop working on it entirely.
I assume they use a conversation, so if you compress the prompt immediately you should only break cache once, and still hit cache on subsequent prompts?
I pretty much one-shotted a scraper from an old Joomla site with 200+ articles to a new WP site, including all users and assets, and converting all the PDFs to articles. It cost me like $3 in tokens.
I guess the question then is: can't VS Code Copilot do the same for a fixed $20/month? It even has access to all the SOTA models like Claude 3.7, Gemini 2.5 Pro and GPT o3.
VS Code's agent mode in Copilot (even in the Insiders nightly) is a bit rough in my experience: lots of 500 errors, stalls, and outright failures to follow tasks (as if there's a mismatch between what the UI says it will include in context vs what gets fed to the LLM).
I would have thought so, but somehow no. I have a cursor subscription with access to all of those models, and I still consistently get better results from claude code.
No, it's a few hundred lines of Python to parse weird and inconsistent HTML into JSON files and CSV files, and then a sync script that calls the WP API to create all the authors as needed, update the articles, and migrate the images.
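Not the poster's actual script, just a hedged sketch of the kind of call such a sync script makes against the WordPress REST API (site URL, credentials, and author ID are placeholders):

```python
import requests

WP_API = "https://example-new-site.com/wp-json/wp/v2"
AUTH = ("sync-bot", "application-password-here")   # placeholder application password

article = {
    "title": "Migrated article title",
    "content": "<p>Body converted from the old Joomla HTML...</p>",
    "status": "draft",
    "author": 7,            # author ID created in an earlier pass of the sync
}

resp = requests.post(f"{WP_API}/posts", json=article, auth=AUTH, timeout=10)
resp.raise_for_status()
print("created post", resp.json()["id"])
```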
Some tools take more effort to hold properly than others. I'm not saying there's not a lot of room for improvement - or that the UX couldn't hold the user's hand more and force things like this in some "assisted mode" - but at the end of the day, it's a thin, useful wrapper around an LLM, and LLMs require effort to use effectively.
I definitely get value out of it - more than any other tool like it that I've tried.
Think about what you would do in an unfamiliar project with no context and the ticket
"please fix the authorization bug in /api/users/:id".
You'd start by grepping the code base and trying to understand it.
Compare that to, "fix the permission in src/controllers/users.ts in the function `getById`. We need to check the user in the JWT is the same user that is being requested"
On a shorter timeline than you'd think, none of the work with these tools will look like this.
You'll be prompting, evaluating, and iterating on entirely finished pieces of software, and you'll be able to see multiple attempts at each solve at once - none of this deep-in-the-weeds bug-fixing stuff.
We're rapidly approaching a world where a lot of software will be made without an engineering hire at all - maybe not the hardest, most complex, or most novel software, but a lot of software that previously required a team of 3-15 won't have a single dev.
> So, AIs are overeager junior developers at best, and not the magical programmer replacements they are advertised as.
This may be a quick quip or a rant. But the things we say have a way of reinforcing how we think. So I suggest refining until what we say cuts to the core of the matter. The claim above is a false dichotomy. Let's put aside advertisements and hype. Trying to map between AI capabilities and human ones is complicated. There is high quality writing on this to be found. I recommend reading literature reviews on evals.
Don’t be a dismissive dick; that’s not appropriate for this forum. The above post is clearly trying to engage thoughtfully and offers genuinely good advice.
I'm thinking you might be the kind of person who requires very direct feedback. Your flagged comment was unkind and unhelpful. Your follow-up response seems to suggest that you feel justified in being rude?
You also mischaracterize my comment two levels up. It didn’t wave you away by saying “just google it”. It said — perhaps not directly enough — that your comment was off track and gave you some ideas to consider and directions to explore.
> There is high quality writing on this to be found. I recommend reading literature reviews on evals.
This is, quite literally, "just google it".
And yes, I prefer direct feedback, not vague philosophical and pseudo-philosophical statements and vague references. I'm sure there's high quality writing to be found on this, too.
We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.
> not vague philosophical and pseudo-philosophical statements and vague references
If you stop being so uncharitable, more people might be inclined to engage you. Try to interpret what I wrote as constructive criticism.
Shall we get back to the object level? You wrote:
> AIs are overeager junior developers at best
Again, I'm saying this isn't a good framing. I'm asking you to consider you might be wrong. You don't need to hunker down. You don't need to counter-attack. Instead, you could do more reading and research.
> We have very different ideas of what "literal" means. You _interpreted_ what I wrote as "just Google it". I didn't say those words verbatim _nor_ do I mean that. Use a search engine if you want to find some high-quality papers. Or use Google Scholar. Or go straight to Arxiv. Or ask people on a forum.
Aka "I will make some vague references to some literature, go Google it"
> Instead, you could do more reading and research.
Instead of a vague "just google it" and vague ad hominems, you could actually provide constructive feedback.
The grandparent is talking about how to control cost by focusing the tool. My response was to a comment about how that takes too much thinking.
If you give a junior an overly broad prompt, they are going to have to do a ton of searching and reading to find out what they need to do. If you give them specific instructions, including files, they are more likely to get it right.
I never said they were replacements. At best, they're tools that are incredibly effective when used on the correct type of problem with the right type of prompt.
I have been quite skeptical of AI tools, and my experiences using them to develop software have been frustrating, but power tools usually come with a learning curve, while a "good product" with a clean, simplified interface often results in reduced capability.
Vim, Emacs, and Excel are obvious power tools which may require you to think, but they often produce unrivalled productivity for power users.
So I don't think the verdict that the product has a bad UI is fair. Natural language interfaces are such a step up from old-school APIs with countless flags and parameters.
Mh. Like, I'm deeply impressed by what these AI assistants can do by now. But the list in the parent comment is very similar to my mental checklist for pair-programming / pair-admin'ing with less experienced people.
I guess "context length" in AIs is what I've intuitively been tracking with people already. It can be a struggle to connect the Zabbix alert, the ticket, and the situation on the system, even if you don't track down all the Zabbix code and scripts. And then we throw in Ansible configuring the thing, and then the business requirements from more, or less, controlled dev teams. And then you realize dev is controlled by impossible sales terms.
These are scope -- or I guess context -- expansions that cause people to struggle.
> Never edit files manually during a session (that'll bust cache). THIS INCLUDES LINT
Yesterday I gave up and disabled my format-on-save config within VSCode. It was burning way too many tokens with unnecessary file reads after failed diffs. The LLMs still have a decent number of failed diffs, but it helps a lot.
GitHub copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.
> GitHub copilot follows your context perfectly. I don't have to tell it anything about files. I tried this initially and it just screwed up the results.
Just to make sure we're on the same page. There are two things in play. First, a language model's ability to know what file you are referring to. Second, an assistant's ability to make sure the right file is in the context window. In your experience, how does Claude Code compare to Copilot w.r.t (1) and (2)?