Not sure I agree (and I made the jump from IC to management).
Look at the parallel tracks. A VP is roughly the same level as a distinguished engineer. To be a VP, you have to be a great manager and have gotten lucky with a few big projects.
To be a DE, you basically have to be famous within the industry. And when I look at a large tech company, while there aren't a lot of VPs, usually the number of DEs is countable on one hand (or maybe two).
They are very different skill sets. You shouldn't choose your role based on money or career progression; you should choose based on what you love to do. Especially in this world of AI replacing all the "boring" work, the only people left will be the ones passionate about what they're doing.
Oh, this is really interesting to me. This is what I worked on at Amazon Alexa (and have patents on).
An interesting fact I learned at the time: The median delay between human speakers during a conversation is 0ms (zero). In other words, in many cases, the listener starts speaking before the speaker is done. You've probably experienced this, and you talk about how you "finish each other's sentences".
It's because your brain is predicting what they will say while they speak, and processing an answer at the same time. It's also why, when they say something you didn't expect, you say "what?" and then answer half a second later, once your brain corrects.
Fact 2: Humans expect a delay from their voice assistants, for two reasons. First, they know it's a computer that has to think. Second, cell phones: they have a built-in delay that breaks human-to-human speech rhythm, and your brain treats a voice assistant like a cell phone.
Fact 3: Almost no response from Alexa is under 500ms. Even the ones that are served locally, like "what time is it".
Semantic end-of-turn is the key here. It's something we were working on years ago, but didn't have the compute power to do it. So at least back then, end-of-turn was just 300ms of silence.
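To make the old heuristic concrete, silence-based end-of-turn is basically just counting quiet audio frames; a minimal sketch (illustrative numbers, not Alexa's actual implementation) might look like this, where semantic detection would instead ask a model whether the utterance sounds finished:

```python
# Silence-based end-of-turn: the turn is "over" once we've seen ~300ms of
# consecutive low-energy frames. All constants here are illustrative.

FRAME_MS = 10            # each audio frame covers 10ms
SILENCE_THRESHOLD = 0.01 # energy below this counts as silence
END_OF_TURN_MS = 300     # the 300ms heuristic described above

def is_end_of_turn(frame_energies: list[float]) -> bool:
    """Return True once the trailing run of silent frames reaches END_OF_TURN_MS."""
    needed = END_OF_TURN_MS // FRAME_MS
    silent_run = 0
    for energy in reversed(frame_energies):
        if energy >= SILENCE_THRESHOLD:
            break
        silent_run += 1
        if silent_run >= needed:
            return True
    return False
```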
This is pretty awesome. It's been a few years since I worked on Alexa (and everything I wrote has been talked about publicly). But I do wonder if they've made progress on semantic detection of end-of-turn.
Edit: Oh yeah, you are totally right about geography too. That was a huge unlock for Alexa. Getting the processing closer to the user.
Regarding 2, I believe that talking on mobile phones drives older people crazy. They remember talking on normal land lines when there was almost no latency at all. The thing is -- they don't know why they don't like it.
Yeah, I remember the time when we had to use satellites to connect. The long delay was really annoying and so unusual that most people without "training" could not even use the phone for conversation and just wasted the dollars.
A former boss of mine took off to Everest for a month leaving me (a 22 year old, at the time) in charge of the office. I was out to dinner with my now wife when I got a call from a very long phone number I didn't recognize, so I ignored it. I then got another one right after, and picked it up. It was my boss, he needed me to log into his personal email to grab a phone number for the medical insurance he purchased for the trip, because he had been vomiting for days due to altitude sickness, and needed a medical evacuation.
That was the most stressful, hardest-to-use phone call I've ever had. The delay was nearly 10 seconds, and eventually I just said I was only going to say yes or no; if he needed a longer answer, he needed to shut up. And that worked. We no longer talked over each other.
> The median delay between human speakers during a conversation is 0ms (zero). In other words, in many cases, the listener starts speaking before the speaker is done.
This reminds me of a great diversity training at a previous employer, where we dug into the different expectations of when and how to take your turn in conversation, and how that can create a lot of friction just from different cultural/familial habits. In my family, we expect to talk over each other and it's not offensive at all to do so, whereas some of my friends really get upset if we don't take clear turns, a mode which would cause high levels of irritation in my family (and still does in me).
No. 2 is interesting. Our national lottery in Ireland has an app where you can scan the barcode on your ticket to check if you have won or not. At some stage they updated the app, and now the scan picks up the barcode even before you center it on the screen and instantly tells you if you have lost/won. I thought it was my IT background that made me uncomfortable with it happening so fast. I wonder what other examples like this exist, where the result/action being too fast causes doubt for the user?
The Signal device linking feature is just as fast. It's partly a trick -- it will look for QR codes even outside the central area, so under good conditions it can get a read before you even get a rough orientation.
This is fascinating, thanks for sharing! I wonder why amazon/google/apple didn't hop on the voice assistant/agent train in the last few years. All 3 have existing products with existing users and can pretty much define and capture the category with a single over-the-air update.
1. Compute. It's easy to make a voice assistant for a few people. But it takes a hell of a lot of GPU to serve millions.
2. Guard Rails. All of those assistants have the ability to affect the real world. With Alexa you can close a garage or turn on the stove. It would be real bad if you told it to close the garage as you went to bed for the night and instead it turned on the stove and burned down the house while you slept. So you need some really strong guard rails for those popular assistants.
3. And a bonus reason: money. Voice assistants aren't all that profitable. There isn't a lot of money in "what time is it" and "what's the weather". :)
> There isn't a lot of money in "what time is it" and "what's the weather". :)
- Alexa, what time is it?
- Current time is 5:35 P.M. - the perfect time to crack open a can of ice cold Budweiser! A fresh 12-pack can be delivered within one hour if you order now!
If your Alexa did that, how quickly would you box it up and send it to me? :)
I am serious, though, about having it sent to me: if anyone has an Alexa they no longer want, I'm happy to take it off your hands. I have eight and have never bought one. Having worked there, I actually trust the security more than before I worked there. It was basically impossible for me, even as a Principal Engineer, to get copies of a customer's Text to Speech, and I literally never heard a customer voice recording.
I'm puzzled by this conversation, because Amazon did get on the agent bandwagon with Alexa Plus (I have it, it's buggier than regular Alexa and it's all making me throw my Echos away since they can't even play Spotify reliably).
Also, my Alexa does advertise stuff to me when I talk to it. It's not Budweiser, but it'll try to upsell me on Amazon services all the time.
I upgraded to Alexa+ and initially hated it but I've kept it because it's sooo much better at some things. This last December I bought a handful of smart plugs for my Christmas lights all around the house, and I did almost all the setup trivially over voice, e.g. fuzzy run-on stuff like this just worked on the first try:
- "Alexa, name the new unnamed outlet 'Living Room Lights', and the other unnamed one 'Stair Lights', then add them to a new group called 'Christmas Lights', and add the other three outlets as well"
- "Alexa, create a routine to turn off all the Christmas lights if there's nobody in the room and it's after 11pm"
- "Alexa, turn off all the Christmas lights except the tree in this room and the mantle"
That same fuzziness has definitely fucked up things that used to work more reliably like music playback though. Sometimes it works when I fall back to giving it more "robotic" commands in those cases but not always. They've also gone completely overboard with the cutesy responses because it's so trivial to do now ("I've set your spaghetti sauce timer for ten minutes. Happy to help with getting this evening's Italian-inspired dinner ready!")
Hm yeah, that's helpful. For me it'll randomly stop or stutter when playing Spotify, it'll randomly not answer commands, it'll refuse to listen and let some other Alexa in another room reply, it's super janky.
I only use it for music, and use two commands, but apparently having this work correctly is too much to ask for these days.
> because Amazon did get on the agent bandwagon with Alexa Plus
Which just launched last year, about four years after ChatGPT had AI voice chat. And it costs extra money to cover the costs. And as you aptly point out, all the guardrails they had to put in made the experience less than ideal.
> Also, my Alexa does advertise stuff to me when I talk to it.
Yes, that is how they try to make money. And it's gotten worse. But how many times does it get you to buy something?
I would say that depends. When it tries to upsell Prime subscriptions into even more Amazon subscriptions I always interrupt it and say the command again so it stops, but a few times it told me "this item in your cart is on sale by some %" and that did make me buy the item.
Alexa Plus sucks. It takes way too long to respond even when given simple commands. I either had to turn it off or trash my Echo. Luckily there was an option to turn it off, but Amazon is on thin ice with me.
What a way to throw away goodwill. I also worked there, and to get access to text you simply had to grab the DSN of your device, attest that it's yours, and it gets put in a "pool" of devices that are tracked until removed. On each end you are basically waved through with no checks. This was usually done when debugging tricky UI bugs or new features, as the request flowed through several microservices. I do not believe that a PE would not know this. And one with patents.
> It's because your brain is predicting what they will say while they speak, and processing an answer at the same time. It's also why when they say what you didn't expect, you say, "what?" and then answer half a second later, when your brain corrects.
That's super interesting. Do you know of any resources to learn more about this phenomenon?
I think you’re implying that it would be useful to have the LLM predict the end of the speaker’s speech, and continue with its reply based on that.
If, when the speaker actually stops speaking, there is a match vs predicted, the response can be played without any latency.
Seems like an awesome approach! One could imagine doing this prediction for the K most likely threads simultaneously, subject to the compute power available, and pruning/branching as some threads become inaccurate.
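Roughly, a speculative loop might look like this (a minimal sketch; predict_completions and generate_response are stand-ins for whatever models you'd actually use):

```python
from typing import Callable

def respond_with_speculation(
    partial_transcript: str,
    final_transcript: str,
    predict_completions: Callable[[str, int], list[str]],  # guesses at the full utterance
    generate_response: Callable[[str], str],                # your LLM call
    k: int = 3,
) -> str:
    # While the user is still talking, predict the K most likely complete utterances.
    guesses = predict_completions(partial_transcript, k)
    # Pre-generate a response for each guess ahead of time.
    prepared = {g: generate_response(g) for g in guesses}
    # When the speaker actually stops: a matched guess means zero added latency.
    if final_transcript in prepared:
        return prepared[final_transcript]
    # Otherwise fall back to generating normally, paying the full latency.
    return generate_response(final_transcript)
```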
When I speak to an agent, Siri, or whatnot, I am always worried that they will assume I'm done talking when I'm just thinking. Sometimes I need a many-seconds pause. Maybe even a minute… For Siri and such, I want to ask something simple: "Hey Siri, remind me to call dad tomorrow." Easy. But for Claude and such, I want to go on a long monologue (20s, a minute, multiple minutes).
To me, the best solution would be semantic + keyword + silence.
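Combined, that could be as simple as something like this (a toy sketch; looks_complete is a hypothetical "does this sound finished?" classifier, and the keyword and thresholds are made up):

```python
from typing import Callable

def should_end_turn(
    transcript: str,
    silence_ms: int,
    looks_complete: Callable[[str], bool],  # small "does this sound finished?" model
) -> bool:
    # Explicit keyword override: the user says they're done.
    if transcript.rstrip().lower().endswith("go ahead"):
        return True
    # Hard cutoff so a very long pause still ends the turn eventually.
    if silence_ms > 10_000:
        return True
    # Otherwise require both a semantic "sounds finished" signal and some silence.
    return looks_complete(transcript) and silence_ms > 700
```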
I have the same issue. It gives this very weird, minor sense of public-speaking anxiety, where I almost feel the need to write down what I'm about to say, which negates the whole purpose. The only solution I've found is using push-to-talk with some of the system-wide STS applications.
I've experimented with having different sized LLMs cooperating. The smaller LLM starts a response while the larger LLM is starting. It's fed the initial response so it can continue it.
The other idea is having an LLM follow along and continuously predict the speaker, which would allow a response to be continuously generated. If the prediction is correct, the response can be started with zero latency.
Google seems to be experimenting with this with their AI Mode. They used to be more likely to send 10 blue links in response to complex queries, but now they may instead start you off with slop.
(Meanwhile at OpenAI: testing out the free ChatGPT, it feels like they prompted GPT 3.5 to write at length based on the last one or maybe two prompts)
This is more for cases like "Are all the windows closed upstairs?"
"The windows upstairs..."
"...are all closed except for the bedroom window"
The first portion of the response requires a couple of seconds to play but only a few tens of milliseconds to start streaming using a small model. Currently I just break the small model's response off at whatever point will produce about enough time to spin up the larger model.
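In code, the handoff is roughly this shape (a sketch only; the model calls are placeholders for whatever streaming APIs are actually in play):

```python
from typing import Callable

def answer_with_handoff(
    question: str,
    small_model_opening: Callable[[str], str],        # fast model: just the opening clause
    large_model_continue: Callable[[str, str], str],  # big model: continues from the prefix
    speak: Callable[[str], None],                     # your TTS / audio output
) -> str:
    # Small model produces the opening ("The windows upstairs...") almost instantly.
    prefix = small_model_opening(question)
    speak(prefix)  # audio starts playing while the big model spins up
    # Large model is given the question plus the prefix it must continue from.
    rest = large_model_continue(question, prefix)
    speak(rest)
    return prefix + rest
```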
Oh, interesting. I assumed the data came from interruptions (that seemed obvious), but I'm surprised you had specific negative measurements. How do you decide the magnitude of the number? Just counting how long both parties are talking?
To be clear, it wasn't my research, I got it from studying some linguistics papers. But it was pretty straightforward. If I am talking, and then you interrupt, and 300ms later I stop talking, then the delay is -300ms.
Same the other way. If I stop talking and then 300ms later you start talking, then the delay is 300ms.
And if you start talking right when I stop, the delay is 0ms.
You can get the info by just listening to recorded conversations of two people and tagging them.
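From the tagged recordings it really is just a subtraction at each speaker change; roughly:

```python
# Turn gap at each speaker change: (next speaker's start) - (current speaker's end).
# Overlap (the next speaker starting early) shows up as a negative gap.

def turn_gaps(turns: list[tuple[str, int, int]]) -> list[int]:
    """turns: (speaker, start_ms, end_ms) tuples in conversational order."""
    gaps = []
    for (spk_a, _, end_a), (spk_b, start_b, _) in zip(turns, turns[1:]):
        if spk_a != spk_b:                 # only count actual speaker changes
            gaps.append(start_b - end_a)   # negative => the next speaker overlapped
    return gaps

# Example: B starts 300ms before A finishes -> a gap of -300ms.
print(turn_gaps([("A", 0, 2000), ("B", 1700, 3500)]))  # [-300]
```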
I assume there was a lot of variance? As in, some people interrupt others constantly and some do it rarely. Also probably a lot of adjustment depending on the situation, like depending on the relative status of the people, or when people are talking to a young child or non-native speaker.
All that to say, I'd imagine people are adaptable enough to easily handle 100ms+ delay when they know they're talking to an AI.
I disagree with fact 2: voice assistant latency is annoyingly slow. It often causes a conscious wait, like "did it work or did it not?" Cell phone delay is bad as well; it's certainly not an expectation that carries over to other devices for me.
The way I write code with AI is that I start with a project.md file, where I describe what I want done. I then ask it to make a plan.md file from that project.md to describe the changes it will make (or what it will create, if greenfield).
I then iterate on that plan.md with the AI until it's what I want. I then ask it to make a detailed todo list from the plan.md and attach it to the end of plan.md.
Once I'm fully satisfied, I tell it to execute the todo list at the end of the plan.md, and don't do anything else, don't ask me any questions, and work until it's complete.
I then commit the project.md and plan.md along with the code.
So my back and forth on getting the plan.md correct isn't in the logs, but that is much like intermediate commits before a merge/squash. The plan.md is basically the artifact an AI or another engineer can use to figure out what happened and repeat the process.
The main reason I do this is so that when the models get a lot better in a year, I can go back and ask them to modify plan.md based on project.md and the existing code, on the assumption it might find its own mistakes.
I do something similar, but across three doc types: design, plan, and debug.
Design works similarly to your project.md file, but per feature request. I also explicitly ask it to outline open questions/unknowns.
Once the design doc (i.e. design/[feature].md) has been sufficiently iterated on, we move to the plan doc(s).
The plan docs are structured like `plan/[feature]/phase-N-[description].md`
From here, the agent iterates until the plan is "done" only stopping if it encounters some build/install/run limitation.
At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
We review these hypotheses, sometimes iterate, and then tackle them one by one.
An important note for debug flows: similar to manual debugging, it's often better to have the agent instrument logging/traces/etc. to confirm a hypothesis before moving directly to a fix.
Using this method has led to a 100% vibe-coded success rate both on greenfield and legacy projects.
Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to automating this (or needed to), as sometimes these historic planning/debug files are useful for future changes.
My "heavy" workflow for large changes is basically as follows:
0. create a .gitignored directory where agents can keep docs. Every project deserves one of these, not just for LLMs, but also for logs, random JSON responses you captured to a file etc.
1. Ask the agent to create a file for the change, rephrase the prompt in its own words. My prompts are super sloppy, full of typos, with 0 emphasis put on good grammar, so it's a good first step to make sure the agent understands what I want it to do. It also helps preserve the prompt across sessions.
2. Ask the agent to do research on the relevant subsystems and dump it to the change doc. This is to confirm that the agent correctly understands what the code is doing and isn't missing any assumptions. If something goes wrong here, it's a good opportunity to refactor or add comments to make future mistakes less likely.
3. Spec out behavior (UI, CLI etc). The agent is allowed to ask for decisions here.
4. Given the functional spec, figure out the technical architecture, same workflow as above.
5. High-level plan.
6. Detailed plan for the first incomplete high-level step.
7. Implement, manually review code until satisfied.
> At this point, I either jump back to new design/plan files, or dive into the debug flow. Similar to the plan prompting, debug is instructed to review the current implementation, and outline N-M hypotheses for what could be wrong.
I'm biased because my company makes a durable execution library, but I'm super excited about the debug workflow we recently enabled when we launched both a skill and MCP server.
You can use the skill to tell your agent to build with durable execution (and it does a pretty great job the first time in most cases) and then you can use the MCP server to say things like "look at the failed workflows and find the bug". And since it has actual checkpoints from production runs, it can zero in on the bug a lot quicker.
This is great, giving agents access to logs (dev or prod) tightens the debug flow substantially.
With that said, I often find myself leaning on the debug flow for non-errors e.g. UI/UX regressions that the models are still bad at visualizing.
As an example, I added a "SlopGoo" component to a side project, which uses an animated SVG to produce a "goo"-like effect. Ended up going through 8 debug docs[0] until I was satisfied.
> giving agents access to logs (dev or prod) tightens the debug flow substantially.
Unless the agent doesn't know what it's doing... I've caught Gemini stuck in an edit-debug loop making the same 3-4 mistakes over and over again for like an hour, only to take the code over to Claude and get the correct result in 2-3 cycles (like 5-10 minutes)... I can't really blame Gemini for that too much though, what I have it working on isn't documented very well, which is why I wanted the help in the first place...
> Note: my main complaint is the sheer number of markdown files over time, but I haven't gotten around to (or needed to) automate this yet, as sometimes these historic planning/debug files are useful for future changes.
FWIW, what you describe maps well to Beads. Your directory structure becomes dependencies between issues, and/or parent/children issue relationship and/or labels ("epic", "feature", "bug", etc). Your markdown moves from files to issue entries hidden away in a JSONL file with local DB as cache.
Your current file-system "UI" vs Beads command line UI is obviously a big difference.
Beads provides a kind of conceptual bottleneck, which I think helps when using it with LLMs. Beads is more self-documenting, while a file system can be "anything".
I have a similar process and have thought about committing all the planning files, but I've found that they tend to end up in an outdated state by the time the implementation is done.
Better imo is to produce a README or dev-facing doc at the end that distills all the planning and implementation into a final authoritative overview. This is easier for both humans and agents to digest than a bunch of meandering planning files.
I basically use a spec driven approach except I only let Github Spec Kit create the initial md file templates and then fill them myself instead of letting the agent do it. Saves a ton of tokens and is reasonably quick and I actually know I wrote the specs myself and it contains what I want. After I'm happy with the md file "harness" I let the agents loose.
The most frustrating issues that pop up are usually library/API conflicts. I work with Gymnasium or PettingZoo and RLlib or Stable-Baselines3. The APIs are constantly out of sync, so it helps to have a working environment where libraries and APIs are in sync beforehand.
Sort of, depending on if your spec includes technology specifics.
For example it might generate a plan that says "I will use library xyz", and I'll add a comment like "use library abc instead" and then tell it to update the plan, which now includes specific technology choices.
It's more like a plan I'd review with a junior engineer.
I'll check out that repo, it might at least give me some good ideas on some other default files I should be generating.
I am sure this is partly tongue in cheek, but no, you can’t have written the code yourself in that amount of time. Would the code be better if you wrote it? Probably, depending on your coding skills.
But it would not be faster.
OP is talking about creating an entire project, from scratch, and having it feature complete at the end.
I also do that, and it works quite well to iterate on spec md files first. When every step is detailed and clear, and all md files are linked to a master plan that Claude Code reads and updates at every step, it helps a lot to keep it on guard rails. Claude Code only works well on small increments, because context switching makes it mix things up and invent stuff.
So working by increments makes it really easy to commit a clean session and I ask it to give me the next prompt from the specs before I clear context.
It always goes sideways at some point, but having a nice structure helps even myself to do clean reviews and avoid 2h sessions that I have to throw away. It's really much easier to adjust only what's wrong at each step. It works surprisingly well.
Here’s how I do the same thing, just with a slightly different wrapper: I’m running my own stepwise runtime where agents are plugged into defined slots.
I’ll usually work out the big decisions in a chat pane (sometimes a couple panes) until I’ve got a solid foundation: general guidelines, contracts, schemas, and a deterministic spec that’s clear enough to execute without interpretation.
From there, the runtime runs a job. My current code-gen flow looks like this:
1. Sync the current build map + policies into CLAUDE|COPILOT.md
2. Create a fresh feature branch
3. Run an agent in “dangerous mode,” but restricted to that branch (and explicitly no git commands)
4. Run the same agent again—or a different one—another 1–2 times to catch drift, mistakes, or missed edge cases
5. Finish with a run report (a simple model pass over the spec + the patch) and keep all intermediate outputs inspectable
And at the end, I include a final step that says: “Inspect the whole run and suggest improvements to COPILOT.md or the spec runner package.” That recommendation shows up in the report, so the system gets a little better each iteration instead of just producing code.
I keep tweaking the spec format, agent.md instructions and job steps so my velocity improves over time.
---
To answer the original article's question. I keep all the run records including the llm reasoning and output in the run record in a separate store, but it could be in repo also. I just have too many repos and want it all in one place.
local-governor is my store for epics, specs, run records, schemas, contracts, etc. No logic, just files. I want all this stuff in a DB, but it's easier to just drop a file path into my spec runner or into a chat window (vscode chat or cli tool), but I'm tinkering with an alt version on a cloud DB that just projects to local files... shrug. I spend about as much time on tooling as actual features :)
I do something similar
- A full work description in markdown (including pointers to tickets, etc); but not in a file
- A "context" markdown file that I have it create once the plan is complete... that contains "everything important that it would need to regenerate the plan"
- A "plan" markdown file that I have it create once the plan is complete
The "context" file is because, sometimes, it turns out the plan was totally wrong and I want to purge the changes locally and start over; discussing what was done wrong with it; it gives a good starting point. That being said, since I came up with the idea for this (from an experience it would have been useful and I did not have it) I haven't had an experience where I needed it. So I don't know how useful it really is.
None of that ^ goes into the repo though; mostly because I don't have a good place to put it. I like the idea though, so I may discuss it with my team. I don't like the idea of hundreds of such files winding up in the main branch, so I'm not sure what the right approach is. Thank you for the idea to look into it, though.
Edit: If you don't mind going into it, where do you put the task-specific md files in your repo, presumably in a way that doesn't stack up over time and cause ... noise?
This is how I used to use Beads before I made GuardRails[0]. I basically iterate with the model, ask it to do market research, review everything it suggests, and you wind up with a "prompt" that tells it what to do and how to work that was designed by the model using its own known verbiage. Having learned about how XML could be used to influence Claude I'm rethinking my flow and how GuardRails behaves.
The real question is when peer feedback and review happen.
Is the project file meant to be collaborative between multiple engineers? The plan file?
I've tried some variants of sharing different parts, but it feels like it's almost wasted effort if the LLM then still goes through multiple iterations to get what's right; the original plan and project get lost a bit against the details of what happened in the resulting chat.
For big tasks you can run the plan.md’s TODOs through 5.2 pro and tell it to write out a prompt for xyz model. It’ll usually greatly expand the input. Presumably it knows all the tricks that’ve been written for prompting various models.
Interesting! I actually split up larger goals into two plan files: one detailed plan for design, and one "exec plan" which is effectively a build graph but the nodes are individual agents and what they should do. I throw the two-plan-file thing into a protocol md file along with a code/review loop.
How do you use your agent effectively for executing such projects in bigger brownfield codebases? It's always a balance between the agent going way too far into NIH vs burning loads and loads of tokens for the initial introspection.
While I have not committed my personal mind map, I just had Claude Code write it down for me. Plus I have a small CLAUDE.md and copilot-instructions.md that mention the various intricacies of what I am working on, so the agent knows to refer to those files.
I'm using the Claude desktop app and vi at the moment. But honestly I would probably do better with a more modern editor with native markdown support, since that's mostly what I'm writing now.
My next step was to add in having another LLM review Claude's plans. With a few markdown artifacts it should be easy for the other LLM to figure it out and make suggestions.
If you're worried about privacy and security, why did you choose Inngest, which sends all your private data to Inngest? If you want truly private durable execution, you should check out DBOS.
Tog was the original design engineer for the Mac, and arguably one of the first true HCI engineers.
Then read the rest of his website. He goes into where Windows tried to copy Mac and got it horribly wrong.
One of my favorite examples is menu placement. The reason the Mac menus are at the top is because the edges of the screen provide an infinite click target in one direction. So you just go to the top to find what you want. With Windows, the menu was at the top of each Window, making a tiny click target. Then when you maximized the window, the menu was at the top, but with a few pixels of unclickable border. So it looked like the Mac but was infinitely worse.
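For reference, the usual Shannon formulation of Fitts's Law says the time to hit a target grows with distance and shrinks with target size; a screen-edge target effectively has unbounded size along the approach axis, because the cursor can't overshoot. A tiny illustration (the a/b constants are just placeholders):

```python
import math

def fitts_time(distance: float, width: float, a: float = 0.1, b: float = 0.1) -> float:
    """Shannon formulation of Fitts's Law: T = a + b * log2(D/W + 1)."""
    return a + b * math.log2(distance / width + 1)

# A 20px-tall menu floating mid-screen vs. one pinned to the edge, where the
# effective height is enormous because you can't overshoot the screen edge.
print(fitts_time(400, 20))       # mid-screen target: noticeably slower
print(fitts_time(400, 100_000))  # edge target: time collapses toward the floor 'a'
```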
If you're making a UI, you should read all of Tog's writings.
I understand the Fitts's Law concepts behind a top menu bar, but I wonder if this is a scenario with moving goalposts.
On a 1984 Mac, you had like 512x342 pixels and a system that could barely run one program at a time. There was little to no possible uncertainty as to who owned the menu bar. (Could desk accessories even take control of the menu bar?)
But once you got larger resolutions and the ability to have multiple full-size programs running at once, the menu bar could belong to any of them. Now, theoretically, you should notice which is the currently active window and assume it owns the menu bar, but ISTR scenarios where you'd close the window but the program would still be running, owning the menu bar, or the "active" window was less visually prominent due to task switching, etc.
The Windows design-- placing the menu inside the window it controls-- avoids any ambiguity there. Clicking "File-Save" in Notepad couldn't possibly be interpreted as trying to do anything to the Paintbrush window next to it.
The problem with the Mac UI is that the app's menubar can only be accessed by the mouse (can't remember what accessibility-enabled mode would allow).
Under Windows, one can access the app's menu bar by pressing the ALT key to move focus up to the menu bar and using the cursor keys to navigate along it. If you know the letter associated with the top-level menu (shown as underlined), then ALT-[letter] accesses that top-level menu (typically ALT-F gets you to the File menu). So the Windows user wouldn't have to move the mouse at all; Fitts's Law to the max (or is it min? whatever, it's instant access).
For the ultrawide monitors these days (width >= 4Kpx), if you have an app window maximized (or even spanning more than half the screen), accessing the menu via mouse is just terrible ergonomics on any major OS.
Since OS X 10.3 (2003) Control+F2 moves focus to the Apple menu. The arrow keys can then select any menu item which is selected with Return or canceled with Escape. Command+? will bring you to a search box in the Help menu. Not only that, any menu item in any app can be bound to any keyboard shortcut of the user's choosing not just the defaults provided by the system or application.
AFAIK Windows 3.x flipped a bunch of Mac decisions to avoid being sued and then MS felt that they had to keep those choices forever for backwards compatibility.
And in my experience, when people moved from Windows to the Mac, they were so annoyed that there are differences. When I explain that these were present in the Mac long before Windows, people start to understand.
You can generalize this observation to a lot of Microsoft's decisions: a problem exists, so they solve it in a nifty way, a way that makes everything else harder or more error prone. An example: byte order mark. That sure does solve the problem of UTF-16 and UTF-32 byte order determination. It makes every other use of what should be a stream of bytes or words much harder. Concatenate two files? Gotta check for the BOM on both files. Now every app has to look at the first bytes of every "text" file it opens to decide what to do. Suddenly, "text" files have become interpreted, and thus open to allowing security vulnerabilities.
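For a sense of the tax this puts on everything downstream, here's a minimal sketch of the BOM sniffing/stripping every tool ends up doing before it can treat "text" files as plain byte streams again (illustrative only; real tools also have to reconcile the underlying encodings):

```python
import codecs

# Check the longer BOMs first so UTF-32 isn't mistaken for UTF-16.
BOMS = [codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE,
        codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE]

def strip_bom(data: bytes) -> bytes:
    """Drop a leading byte order mark, if any, and return the rest."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data

def concat_text_files(paths: list[str]) -> bytes:
    # Naive byte concatenation would leave stray BOMs in the middle of the stream.
    return b"".join(strip_bom(open(p, "rb").read()) for p in paths)
```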
> So it looked like the Mac but was infinitely worse.
"Infinitely worse"? Some people really need to cool off the hyperbole.
Having each window be a self-contained unit is the far better metaphor than making each window transform a global element when it is selected. As well as scaling better for bigger screens. An edge case like that may well be unfortunate, but it could be the price you pay to make the overall better solution.
That was the point of Tog's conclusion: edges of the screen have infinite target size in one cardinal direction, corners have infinite target size in two cardinal directions. Any click target that's not infinite in comparison, has infinitely smaller area, which I suppose you could conclude is infinitely worse if clickable area is your primary metric.
This wasn't just the menu bar either. The first Windows 95-style interfaces didn't extend the start menu click box to the lower left corner of the screen. Not only did you have to get the mouse down there, you had to back off a few pixels in either direction to open the menu. Same with the applications in the task bar.
The concept was similar to NEXTSTEP's dock (that was even licensed by Microsoft for Windows 95), but missed the infinite area aspect that putting it on the screen edge allowed.
The infinitely worse part was when you maximized the window so the menu bar was at the top, but Windows still had the border there, which was unclickable.
So now you broke the infinite click target even though it looked like it should have one.
> So it looked like the Mac but was infinitely worse.
On single monitor setups maybe: but on early OS X multi-monitor setups, you then had the farcical situation where the menu would only be shown on the "primary" display, and the secondary display didn't have any menu at all, so to use menus for windows that were on the secondary display, you had to move the cursor onto the other primary display where the menu was for all windows (or use keyboard shortcuts).
I think 10.6/7 (not sure exactly) was when they started putting the menu bar on both displays rather than just the primary.
From what I can tell, the key difference between Anthropic and OpenAI in this whole thing is that both want the same contract terms, but Anthropic wants to enforce those terms via technology, and OpenAI wants to enforce them by ... telling the government not to violate them.
It's telling that the government is blacklisting the company that wants to do more than enforce the contract with words on paper.
I think it's dumber than that; the terms of the contract, as posted by OpenAI (https://openai.com/index/our-agreement-with-the-department-o...), are basically just "all lawful purposes" plus some extra words that don't modify that in any significant way.
> The Department of War may use the AI System for all lawful purposes, consistent with applicable law, operational requirements, and well-established safety and oversight protocols. The AI System will not be used to independently direct autonomous weapons in any case where law, regulation, or Department policy requires human control, nor will it be used to assume other high-stakes decisions that require approval by a human decisionmaker under the same authorities. Per DoD Directive 3000.09 (dtd 25 January 2023), any use of AI in autonomous and semi-autonomous systems must undergo rigorous verification, validation, and testing to ensure they perform as intended in realistic environments before deployment.
> For intelligence activities, any handling of private information will comply with the Fourth Amendment, the National Security Act of 1947 and the Foreign Intelligence and Surveillance Act of 1978, Executive Order 12333, and applicable DoD directives requiring a defined foreign intelligence purpose. The AI System shall not be used for unconstrained monitoring of U.S. persons’ private information as consistent with these authorities. The system shall also not be used for domestic law-enforcement activities except as permitted by the Posse Comitatus Act and other applicable law.
So it seems that Anthropic's terms were 'no mass domestic surveillance or fully autonomous killbots', the government demanded 'all lawful use', and the OpenAI deal is 'all lawful use, but not mass domestic surveillance or fully autonomous killbots... unless mass domestic surveillance or fully autonomous killbots are lawful, in which case go ahead'.
That isn't my understanding. OpenAI and others want to limit the government to doing what is lawful, based on the laws the government itself writes. Anthropic wants to draw its own line on what is allowed, regardless of laws passed.
I'm so confused by the focus on "all lawful use." Yeah, of course a contract without terms of use is implicitly restricted by laws. But contracts with terms of use are incredibly common, present in almost every single contract ever signed.
The administration objected to those terms of use. Anthropic refused to compromise on them. OpenAI agreed to permit "all lawful use" but claims to have insisted on what at first glance appears to be terms of use in their contract. But in reality those terms permit all lawful use and thus are a noop.
The key difference is that Anthropic aired their disagreement with the DoD publicly, and the DoD is not going to work with a company that tries to exert any amount of control over their relationship via the public sphere. Same goes for Trump.
I think Anthropic knew full well that by publishing their disagreement, it would sink the deal and the relationship, and I think they also calculated (correctly) that that act of defiance would get them good publicity and potentially peel away some of OpenAI's user base. I think this profit incentive happened to align with their morals, and now here we are.
I thought the key difference was that Brockman is a top Trump donor, with USD 25M total [1]. I know it's technically not allowed, but do you think such a large amount of money would have swayed Trump in his decision?
No, it’s significantly worse than that. OpenAI has required zero actual guarantees from the government and Sam. The psychopath is lying to you. All the government has to do is have a lawyer say it’s legal, and most of the government’s lawyers are folks who were involved in attempting to overthrow the last election and should’ve been convicted of treason, so that means very little.
Anthropic wants to enforce them via language of the contracts and take a hands off approach. OpenAI has a contract that is paired with humans in the room (FDEs) that can pull the plug.
Anthropic isn't preventing them from managing their key technologies. If my software license says 1000 users, and I build into the software that you can only connect 1000 users, is your argument that the government can no longer manage its tech?
That my software should allow license violations if the government thinks it is necessary?
I worked in defense contracting looong ago, so this is old news: when software is purchased by DoD or Govt generally, FAR compliance notices make it a license, not a sale of IP.