Totally possible. In general I believe that, while more powerful in their best outputs, Sonnet/Opus 4 are in other ways (alignment, consistency) a regression from Sonnet 3.5v2 (often called Sonnet 3.6), as Sonnet 3.7 was. Also, models are complex objects, and sometimes in a given domain a model that is weaker on paper will work better. And, on top of that: interactive use and agentic use require different reinforcement learning training, which may not always aim at an aligned target... so using the model in one way or the other can also change how good it is.
Thanks. Also, depending on the coding rig you use, a model may not match the performance of the same model served via the web. Or it may not be as cheap. For instance, the $20 Gemini 2.5 Pro account is very hard to saturate with queries.
Terminal with vim on one side, the official web interface of the model on the other, and the pbcopy utility to pass stuff through the clipboard. I believe models should be used in their native interface: when there are other layers, sometimes the model being served is not exactly the same, other times it misbehaves because of RAG, and in general you have no exact control of the context window.
This seems like a lot of work depending upon the use case. e.g. the other day I had a bunch of JSON files with contact info. I needed to update them with more recent contact info on an internal Confluence page. I exported the Confluence page to a PDF, then dropped it into the same directory as the JSON files. I told Claude Code to read the PDF and use it to update the JSON files.
It tried a few ways to read the PDF before settling on installing PyPDF2, used that to parse the PDF, then updated all the JSON files. It took about 5 minutes, but it ended up 100% correct, updating 7 different fields across two dozen JSON files.
(The reason for the PDF export was to get past the Confluence page being behind Okta authentication. In retrospect, I probably should've saved the HTML and/or let Claude Code figure out how to grab the page itself.)
How would I have done that with Gemini using just the web interface?
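For what it's worth, the core of what the agent ends up generating for a task like that is only a few lines. A minimal sketch, assuming the records are keyed by email and that the contact fields look roughly like this (the field names and the parsed-from-PDF mapping are illustrative, not the actual data; in the real run PyPDF2's PdfReader/extract_text produced the raw text the agent parsed):

```python
import json
from pathlib import Path

# Hypothetical contact info as parsed out of the PDF text by the agent.
updates = {
    "alice@example.com": {"phone": "555-0100", "title": "Staff Engineer"},
}

def update_contacts(directory: str, updates: dict) -> int:
    """Merge newer contact fields into every JSON file in `directory`.

    Returns the number of files that were changed."""
    changed = 0
    for path in Path(directory).glob("*.json"):
        record = json.loads(path.read_text())
        fresh = updates.get(record.get("email"))
        if fresh:
            record.update(fresh)
            path.write_text(json.dumps(record, indent=2))
            changed += 1
    return changed
```

The interesting part of the story is that the agent figured out the PDF-extraction step and wrote something like this loop on its own.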
It’s not that bad: K2 and DeepSeek R1 are at the level of frontier models from one year ago (K2 may be even better: I have enough experience only with V3/R1). We will see more coming, since LLMs are incredibly costly to train but very simple in their essence (it’s as if their fundamental mechanics were built into the physical nature of computation itself), so the barrier to entry is large but not insurmountable.
Please send your thoughts and prayers to Gemini 2.5 Pro; hopefully they can recover and get well soon. I hope Google lets them out of the hospital and discharges them soon; the last 3 weeks have been hell for me without them.
OP, as a free user of Gemini 2.5 Pro via AI Studio, my friend was hit by the equivalent of a car crash approximately 3 weeks ago. I hope they can recover soon; it is not easy for them.
Agree with that. Read it as expert-level knowledge without all the other stuff LLMs can’t do as well as humans. The way LLMs express knowledge is kind of alien, as it is different, so indeed those are all poor simplifications. For instance, an LLM can’t code as well as a top human coder, but it can write a non-trivial program from the first to the last character without iterating.
What sticks out to me is Gemini catching bugs before production release; I was hoping you’d give a little more insight into that.
The reason is that we expect AI to create bugs and we catch them, but if Gemini is spotting bugs by acting as a kind of QA (not just by writing and passing tests), then that piques my interest.
Our team has pretty aggressively started using LLMs for automated code review. They look at our PRs and post comments. We keep adding more material for them to consider, from a summarized version of our API guidelines to general prompts like, "You are an expert software engineer and QA professional. Review this PR and point out any bugs or other areas of technical risk. Make concise suggestions for improvement where applicable." It catches a ton of stuff.
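The setup side of this is mostly prompt assembly. A sketch of roughly what that looks like (the system prompt text is from the comment above; the function name and the guideline/diff plumbing are illustrative, not our actual tooling):

```python
REVIEW_SYSTEM_PROMPT = (
    "You are an expert software engineer and QA professional. "
    "Review this PR and point out any bugs or other areas of technical risk. "
    "Make concise suggestions for improvement where applicable."
)

def build_review_messages(diff: str, guidelines: str) -> list:
    """Assemble a chat-style message list for an LLM code-review call.

    The summarized API guidelines ride along with the diff so the model
    can flag violations, not just generic bugs."""
    return [
        {"role": "system", "content": REVIEW_SYSTEM_PROMPT},
        {
            "role": "user",
            "content": f"API guidelines (summary):\n{guidelines}\n\nPR diff:\n{diff}",
        },
    ]
```

The message list then goes to whatever chat-completion API you use, and the response gets posted back as PR comments.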
Another thing we've started doing is having it look at build failures and write a report on suggested root causes before a human even looks at them. It saves time.
Or (and we haven't rolled this out automatically yet but are testing a prototype) having it triage alarms from our metrics, with access to the logs and codebase to investigate.
I have been surprised more folks have not rolled these out as paid products. I have been getting tons of use out of systems like Cursor's Bugbot. The signal-to-noise ratio is high, and while it's not always right, it catches a lot of bugs I would have missed.
Yep, when I use agents I go for Claude Code. For example, I needed to buy too many Commodore 64 than appropriate lately, and I let it code a Telegram bot advising me when popular sources had interesting listings. It worked (after a few iterations); then I looked at the code base and wanted to puke, but who cares in this case? It worked, it was much faster, and I had zero to learn in the process of doing it myself: I published a Telegram library for C in the past and know how it works, how to do scraping, and so forth.
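The bot itself is mostly plumbing (poll the sources, then hit Telegram's sendMessage endpoint for each match); the only interesting decision is which listings to flag. A sketch of that filter, with made-up keywords and price threshold since the real criteria weren't shown:

```python
def is_interesting(listing: dict,
                   keywords=("commodore 64", "c64"),
                   max_price: float = 150.0) -> bool:
    """Decide whether a scraped listing is worth a Telegram notification:
    the title must match one of the keywords and the price must be sane."""
    title = listing.get("title", "").lower()
    price = listing.get("price")
    return (
        any(k in title for k in keywords)
        and price is not None
        and price <= max_price
    )
```

The agent-written version was doubtless uglier, which is the point of the comment: for a throwaway bot, ugly-but-working is fine.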
For example I needed to buy too many Commodore 64 than appropriate lately
Been there, done that!
For those one-off small things, LLMs are rather cool, especially Claude Code and Gemini CLI. I was given an archive of some really old movies recently, but the files bore their titles in Croatian instead of the originals (mostly English). So I ran claude --dangerously-skip-permissions in the directory with the movies, and in a two-sentence prompt I asked it to rename the files into a given format (one I tend to use in my archive), and for each title to find the original name and year of release and use it in the filename... but, before committing the renames, to give me a before-and-after list for approval. It took, what, a minute of writing a prompt.
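The plan-then-approve pattern the prompt asks for boils down to something like this sketch (the title lookup is the part the LLM actually did; here it's stubbed, and the filenames are hypothetical examples):

```python
import os

def plan_renames(filenames, lookup):
    """Map each original filename to 'Title (Year).ext'.

    `lookup` resolves a Croatian title to its original name and year;
    in the real run, the LLM played that role. Files it can't resolve
    are left out of the plan."""
    plan = {}
    for name in filenames:
        stem, ext = os.path.splitext(name)
        info = lookup(stem)
        if info:
            plan[name] = f"{info['title']} ({info['year']}){ext}"
    return plan

# Print the before/after pairs for approval, and only then run
# os.rename(old, new) for each entry in the approved plan.
```

Splitting planning from execution is what makes the "show me the list before renaming" part of the prompt cheap to honor.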
Now, for larger things, I'm still exploring an angle on what to do and how. I've tried everything from yolo prompting to structured and uber-structured approaches, all the way to mimicking product/PRD - architecture - project management/tasks - developer/agents... So far, unless it's a rather simple project, I don't see it happening that way. I've had the most luck with "some structure" as context and inputs, then guided prompting during sessions and reviewing the output. Almost pair programming.
You can use an LLM to help document a codebase, but it's still an arduous task, because you do need to review and fix up the generated docs. It will make mistakes, sometimes glaring, sometimes subtle. And you want your documentation to provide accuracy rather than double down on, or even introduce, misunderstanding.
This fact is one of the most pleasant surprises I’ve had during this AI wave. Finally, a concrete reason to care about your docs and your code quality.
Excellent article. It is very hard to understand a few things about Prusa, lately:
1. The Nextruder looks 5 years behind Bambulab nozzle switching, not to mention the cost of a new nozzle. A clogged nozzle is a non-issue on a Bambulab printer, but it means a big cost and more work with my MK4 (which has the same extruder as the Core One).
2. How is it possible that these printers still lack at least a cheap webcam?
3. One of Prusa's strengths should be support. It used to be very good, years ago. Now the issue the OP reports, the app failing to detect the principal component in the sound of the belt, is an example of a broader problem that shows up in many ways, especially in the MK4 / Core One documentation, which is particularly lacking.
In general, the OP here is doing the work Prusa should be doing to provide a better experience, not to mention all the design issues they are not fixing before shipping their printers. I'm also a Bambulab user, and my A1 cost a fraction of my MK4 and is the printer I always reach for because of the zero issues. It just works.
Now, companies may have ups and downs, but there is some deeper problem at Prusa: they still don't understand what's really happening and where their problems are.
1. I think even on Bambu printers people swap the extruder to get switchable nozzles. At least that's what I did on my P1S. There is a huge aftermarket for these if you need them.
2. The reasoning I heard is that Prusa printers are used a lot by print farms that don't want cameras for security reasons, and that aftermarket cams will be a lot better than anything Prusa could deliver. The cam on the Bambu P1S is pretty bad too, so if you like the feature you end up replacing it, and because the chip in the P1S is pretty low-powered, you end up adding a whole separate camera system.
3. This is a very good point. I guess they were under pressure to release ASAP and the docs are rushed / still in progress. The upside is that they have a track record of long support for their products.
I am not sure that they are so clueless.
I’ve had a P1S for some time. It’s great. I wanted to focus on 3D printing, not on the 3D printer.
But now? Bambu is one update away from not being able to print outside their cloud. There is zero openness. They do everything to stop any kind of reverse engineering or alternative firmware. Afaik they might just decide tomorrow to stop supporting some of the older models, and those simply stop printing.
I still ended up messing, modding, tweaking and learning about the Bambu printer anyway.
But I also found a lot of uses for a 3D printer. So the idea of buying a 3x more expensive printer kit that will take me 20 hours to assemble, and then even more time to tweak until it prints as well as the Bambu... is OK? Almost intriguing? I will know that, with care, it will work for a looong time, I will know how it works, and that will be valuable knowledge.
It seems a lot like Linux vs. Mac. At some point you bite the bullet and never look back. Or you do, and go back.
I definitely understand how the printer works better than if I bought it assembled. It'll definitely save me time troubleshooting/maintaining/repairing it later. But I spent more time building it than I could ever save during teardown/rebuilds.
I bought a kit because I like building stuff, the Core One kit had the same appeal as a Lego model or a model car. If that's not appealing to you, do yourself a favor and buy the completed printer.
If you do get the kit, get a couple of ice cube trays to use to organize fasteners; keeping them organized in the bags Prusa sends was a battle I wasn't interested in fighting.
I assembled the MK3S from a kit. It took a long time, but at least I have a mental model of where things are and what to remove first if I need to exchange parts.
The Buddy3D camera got a firmware update recently; now it can stream RTSP inside your LAN and you can force day/night detection. Also, it saves timelapses to a local microSD. Still not super cheap, but yeah.