A nice way to use traditional ML models today is to do feature extraction with an LLM and classification on top with a traditional ML model. Why? Because this way you can tune your own decision boundary, and piggyback on features from a generic LLM to power the classifier.
For example, CV triage: you use an LLM with a rubric to extract features; choosing the features you are going to rely on does a lot of the work here. Then collect a few hundred examples, label them (accept/reject), and train your traditional ML model on top; it will not inherit the LLM's biases.
You can probably use any LLM for feature preparation, and retrain the small model in seconds as new data is added. A coding agent can write its own small-model-as-a-tool on the fly and use it in the same session.
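A minimal sketch of the second stage, assuming the LLM has already turned each CV into a rubric-scored feature vector (the feature names and numbers here are invented for illustration). The "trad ML" layer is a plain logistic regression you can retrain in seconds as new labels arrive:

```python
import math

# Hypothetical rubric features an LLM might extract per CV, each in [0, 1]:
# [years_experience, has_degree, skill_match, writing_quality]
LABELED = [
    ([0.8, 1.0, 0.9, 0.7], 1),  # accept
    ([0.2, 0.0, 0.1, 0.3], 0),  # reject
    ([0.6, 1.0, 0.8, 0.5], 1),
    ([0.1, 1.0, 0.2, 0.4], 0),
    ([0.9, 0.0, 0.9, 0.8], 1),
    ([0.3, 0.0, 0.2, 0.2], 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Logistic regression via SGD: the small, bias-tunable decision layer."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Probability of 'accept' for a new feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = train(LABELED)
print(predict(w, b, [0.7, 1.0, 0.8, 0.6]) > 0.5)  # strong candidate -> True
```

The decision boundary is entirely yours: relabel a few examples or adjust the threshold and retrain; the LLM only supplies the features.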
What do you mean by "feature extraction with an LLM"? I can see this for text-based data, but would you do that on numeric data? Seems like there are better auto-ML tools you could use in that sphere?
Unless by LLM feature extraction you mean something like "have claude code write some preprocessing pipeline"?
Yes, it should remain part of the commit, and the work plan too, including judgements/reviews done with other agents. The chat log encodes user intent in raw form, which justifies the tasks, which in turn justify the code and its tests. Bottom-up, we say the tests satisfy the code, which satisfies the plan, which finally satisfies the user intent. You can play the "satisfied/justified" game across the whole stack.
I only log my own user messages, not AI responses, in a chat_log.md file, which is created by a user-message hook in the repo.
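A minimal version of such a hook might look like this, assuming Claude Code's UserPromptSubmit hook, which pipes the event as JSON on stdin; the `prompt` field name and the log format are my assumptions, so check the hooks docs for your version:

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("chat_log.md")

def log_prompt(payload: dict, path: Path = LOG) -> None:
    """Append the user's raw message to the repo chat log with a timestamp."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    prompt = payload.get("prompt", "")
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {stamp}\n\n{prompt}\n\n")

if __name__ == "__main__":
    # Claude Code passes hook input as a JSON object on stdin.
    log_prompt(json.load(sys.stdin))
```

Registered as a UserPromptSubmit hook, this keeps the raw-intent log growing without logging any AI output.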
I'm switching from Claude Web to Claude Code. Local files give me memory I actually control, unlike Anthropic's implementation. CC doesn't carry state between sessions — you just put whatever project context it needs in a file.
It's been a few months since Gemini 3 and Opus 4.5 were released, and I still regularly feel a sense of dread, as if I'm deprived of something (which I assume is the thrill and pride of being able to explore solution spaces in non-stupid ways and find plausible answers on my own).
Maybe it's the usual corporate webdev job, too focused on mainstream code, where AI is used to sell more, not to find new ideas that could be exciting.
> Not understanding how consciousness is created doesn't make it divine.
It's not divine, just expensive, and has to pay its costs. That little thing - cost - powers evolution. Cost defines what can exist and shaped us into our current form, it is the recursive runway of life.
I. J. Good's quote is pretty myopic; it assumes machines make better machines by virtue of being "ultraintelligent" instead of by learning from an environment-action-outcome loop.
It's the difference between "compute is all you need" and "compute plus explorative feedback is all you need". As if science and engineering come from genius brains, not from careful experiments.
There's an implicit assumption there: anything a computer as intelligent as a human does will be exactly what a human would do, only faster or more intelligently. If the process is part of the intelligent way of doing things, like the scientific method and careful experimentation, then that's what the ultraintelligent machine will do.
There's no implication that it's going to do it all magically in its head from first principles; it has become very clear in AI that embodiment and interaction with the real world are necessary. It might be practical for a world model, at sufficient levels of compute, to simulate engineering processes at a resolution that allows all sorts of first-principles simulated physical development and problem solving "in its head", but for the most part, real ultraintelligent development will happen with real-world iterations, robots, and research labs doing physical things. They'll just be far more efficient and fast than us meatsacks.
At sufficient levels of intelligence, one can increasingly substitute intelligence for those other things (prototypes, experiments, physical iterations).
Intelligence can be the difference between having to build 20 prototypes and building one that works first try, or having to run a series of 50 experiments and nailing it down with 5.
The upper limit of human intelligence doesn't go high enough for something like "a man designed an entire 5th-gen fighter jet in his mind and then built it first try" to be possible. The limits of AI might go higher than that.
Exceedingly elaborate, internally-consistent mind constructs, untested against the real world, sounds like a good definition of schizophrenia. May or may not correlate with high intelligence.
We only call it "schizophrenia" when those constructs are utterly useless.
They don't have to be. When they aren't, sometimes we call it "mathematics".
You only have to "test against the real world" if you don't already know the outcome in advance. And you often don't. But you could have: with the right knowledge and methods, you could have tested the entire thing internally and learned the real-world outcome in advance, to an acceptable degree of precision.
We already have the knowledge to build CFD models. The same knowledge could be used to construct a CFD model in your own mind. We have a lot of scattered knowledge that could be used to build extremely elaborate and accurate internal world models to develop things in - if only, you know, your mind were capable of supporting such a thing. And it isn't! Skill issue?
I like the substitution concept. What humans can do depends on the abstractions and the tools. One could picture just the shape of the jet and have a few ideas for improving it further. If that is enough info for the tool, it could be worthy of the label "designed by Jim".
From what I can see we're working as hard as we can to build them. You can watch the "let's put this on a Raspberry Pi and see what happens" seeds of Skynet develop in real time.
There's something compelling about helping assemble the machine. Science fiction was completely wrong about motivation. It's fun.
I've noticed this core philosophical difference in certain geographically associated peoples.
There is a group of people who think AI is going to ruin the world because they think they themselves (or their superiors) would ruin the world.
There is a group of people who think AI is going to save the world because they think they themselves (or their superiors) would save the world.
Kind of funny to me that the former group is typically democratic (those who are supposed to decide their own futures are afraid of the future they've chosen) while the latter is often "less free" and unafraid of the future that's been chosen for them.
There is also a group of people who think AI is going to ruin the world because they don't think the AI will end up doing what its creators (or their superiors) would want it to do.
It's our job, after all, to keep the agent aligned; we should not expect it to self-recover when it goes astray or to mind its own alignment. Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync.
That said, I find that running judge agents on plans before working, and again on completed work, helps a lot; the judge should start with fresh context to avoid bias. This is where having good docs comes in handy, because the judge must know the intent, not just study the code itself. If your docs encode both work and intent, and you judge the work against them, then misalignment is much reduced.
My ideal setup has a planning agent, followed by a judge agent, then a worker, then code review, with me nudging and directing the whole process on top. Multiple perspectives intersect: each agent has its own context, and I have mine, which helps cover each other's blind spots.
> Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync.
We do this socially too. From a very young age, children teach each other what they like and don't like, and in that way mutually align their behaviour toward pro-social play.
> I find that running judge agents on plans before working and on completed work helps a lot
How do you set this up? Do you do this on top of the claude code CLI somehow, or do you have your own custom agent environment with these sort of interactions set up?
I use a task.md file for each task; it has a list of gates, just like an ordinary markdown todo list. The planner agent has an instruction to install a judge gate at the top and one at the bottom. The judge runs in headless mode and updates the same task.md file. The file acts as an information bus between agents and, like code, it runs the gates in order, reliably.
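A sketch of what the headless judge call might look like, assuming an agent CLI with a print mode such as `claude -p`; the prompt wording and the injectable `runner` are my own additions for testability, not a prescribed setup:

```python
import subprocess

def judge(task_file: str, runner=subprocess.run) -> str:
    """Run a fresh-context judge over a task file and return its verdict.

    `runner` defaults to subprocess.run but is injectable so the call
    can be exercised without the CLI installed.
    """
    prompt = (
        f"You are a judge with no prior context. Read {task_file} and list "
        "any misalignments between the plan, the stated intent, and the code."
    )
    result = runner(["claude", "-p", prompt], capture_output=True, text=True)
    return result.stdout
```

Because each invocation starts a fresh process, the judge sees only the docs and the task file, which is exactly the bias-avoiding property described above.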
I am actively thinking about task.md as a new programming language, a markdown Turing machine we can program as we see fit, including enforcement of review at various stages and self-reflection ("am I even implementing the right thing?") kinds of activity.
I have tested it to reliably execute 300+ gates in a single run. That is why I set judges on it, to refine it. For difficult cases I judge 3-4 times before working; each judge iteration surfaces new issues. Judge convergence on a task is decided manually; I am in the loop.
The judge proposes bad ideas maybe 20% of the time; sometimes the planner agent catches them, other times I do. It's an efficient triage hierarchy: the judge surfaces -> the planner filters -> I adjudicate the hard cases.
There's a school of thought that the reason so many autistic founders succeed is that they're unable to interpret this kind of programming. I saw a theory that to succeed in tech you needed a minimum amount of both tizz and rizz (autism and charisma).
I guess the winning openclaw model will have some variation of "regularly rewrite your source code to increase your tizz*rizz without exceeding a tizz:rizz ratio of 2:1 in either direction."
> This article is far off the mark. The improvement is not in the user-side. You can write docs or have the robot write docs; it will improve performance on your repo, but not “improve” the agent.
No, the idea is to create these improved docs in all your projects, so all your agents get improved as a consequence, each with its own project-specific documentation.
You can't improve the agents, but you can improve their work environment. Agents gain a few advantages from up-to-date docs:
1. faster bootstrap and less token usage than thrashing around the codebase to reconstitute what it does
2. carry context across sessions, if the docs act like a summary of current state, you can just read it at the start and update it at the end of a session
3. hold information you can't derive from studying the code, such as intents, goals, criteria and constraints you faced, an "institutional memory" of the project
Agreed, this is the point the article makes. I don't think the article claims that the agent itself is directly improved or altered, but that through the agent self-maintaining its environment, then using that improvement to bootstrap its future self or sub-agents, the agent's _performance_ is holistically better.
> ... if the docs act like a summary of current state, you can just read it at the start and update it at the end of a session
Yeah, exactly. The documentation is effectively a compressed version of the code, saving agent context for a good cross-section of (a) the big picture, and (b) the details needed to implement a given change to the system.
Think we're all on the same page here, but maybe framing it differently.
> Nothing you write will matter if it is not quickly adopted to the training dataset.
That is my take too. I was surprised to see how many people object to their works being trained on. It's how you leave your mark: opening access to AI, just as in the last 25 years it was opening access to people (no restrictions on access, being indexed by Google).
"On reflection I have started to worry again. In 10 to 20 years nobody will read anything any more, they just will read LLM digests. So, the single most important task of a writer starting right now is to get your efforts wired in to the LLMs"
Your words will be like a drop in the ocean, an ocean whose volume keeps increasing every year. Also, if nobody reads anything anymore, what's the point?
Most people value their time and work and don't want to give it away for free to some billionaire so they can reproduce it as slop for their own private profit.
That's to say, most people recognize when they're getting fucked over and are correct to object to it.
People who produced the works LLMs are trained on are not compensated for the value they are now producing, and their skills are increasingly less valued in a world with LLMs. The value the LLMs are producing is being captured by employees of AI companies who are driving up rent in the Bay Area, and driving up the cost of electricity and water everywhere else.
Your surprise at people's objections makes sense if you can't count.
> People who produced the works LLMs are trained on are not compensated for the value they are now producing
The value being extracted via LLM techniques is new value, which did not previously exist. The producers of the old data had an asking price, which the LLM trainers met. They cannot argue that, since the LLM is producing new value, they should retroactively update their old asking price for their works.
They could update their asking price for any new works they produce. They also have the right to ask that their works not be used for training, etc. But they cannot ask for their old works to be paid for retroactively based on the new uses in LLMs.
> They could update their asking price for any new works they produce. They also have the right to ask their works not be used for training, etc.
Someone else already pointed out that many works used to train LLMs were stolen, but it's also unclear whether this is true. Can you opt out? Copyright should have been enough to prevent a company from stealing and profiting from your work, but it wasn't, in the case of every existing LLM.