Imagine the power goes out while sqlite is in the middle of writing a transaction to the WAL (before the write has been confirmed to the application). What do you want to happen when power comes back, and you reload the database?
If the transaction was fully written, then you'd probably like to keep it. But if it was not complete, you want to roll it back.
How does sqlite know if the transaction was complete? It needs to see two things:
1. The transaction ends with a commit frame, indicating the application did in fact perform a `COMMIT TRANSACTION`.
2. All the checksums are correct, indicating the data was fully synced to disk when it was committed.
If the checksums are wrong, the assumption is that the transaction wasn't fully written out. Therefore, it should be rolled back. That's exactly what sqlite does.
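In pseudocode, the recovery decision looks something like this (a minimal sketch, not SQLite's actual code or exact on-disk frame layout; `Frame`, `checksumOk`, and `framesToReplay` are illustrative names):

```ts
// Illustrative sketch of WAL recovery logic. Real SQLite stores a cumulative
// checksum in each frame header; here that detail is abstracted into a
// per-frame `checksumOk` flag.
interface Frame {
  pageNumber: number;
  isCommit: boolean;    // a commit frame marks the end of a transaction
  checksumOk: boolean;  // cumulative checksum matched what's on disk
}

// Decide how many WAL frames are safe to replay after a crash.
function framesToReplay(frames: Frame[]): number {
  let lastValidCommit = 0;
  for (let i = 0; i < frames.length; i++) {
    if (!frames[i].checksumOk) break;                 // torn/partial write: stop scanning
    if (frames[i].isCommit) lastValidCommit = i + 1;  // this transaction is fully present
  }
  // Frames after the last fully-checksummed commit frame are ignored,
  // i.e. the incomplete transaction is rolled back.
  return lastValidCommit;
}
```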
This is not "data loss", because the transaction was not ever fully committed. The power failure happened before the commit was confirmed to the application, so there's no way anyone should have expected that the transaction is durable.
The checksum is NOT intended to detect when the data was corrupted by some other means, like damage to the disk or a buggy app overwriting bytes. Myriad other mechanisms should be protecting against those already, and sqlite is assuming those other mechanisms are working, because if not, there's very little sqlite can do about it.
Why is the commit frame not sufficient to determine whether the transaction was fully written or not? Is there a scenario where the commit frame is fsynced to disk but the preceding data isn't?
The disk controller may decide to write out blocks in a different order than the logical layout in the log file itself, and be interrupted before completing this work.
It’s worth noting this is also dependent on filesystem behavior; most copy-on-write filesystems will not suffer from this issue regardless of drive behavior, even if they don’t do their own checksumming.
NVMe drives do their own manipulation of the datastream. Wear leveling, GC, trying to avoid rewriting an entire block for your 1 bit change, etc. NVMe drives have CPUs and RAM for this purpose; they are full computers with a little bit of flash memory attached. And no, of course they're not open source even though they have full access to your system.
ZFS isn’t viable for SQLite unless you turn off fsyncs in ZFS, because otherwise you will have the same experience I had for years: SQLite may randomly hang for up to a few minutes with no visible cause if there aren’t enough writes to fill up a txg in the background. If your app depends on SQLite, it’ll randomly die.
Btrfs is a better choice for sqlite; I haven’t seen that issue there.
The latest comment seems to be a nice summary of the root cause, with earlier comments in the thread pointing to ftruncate (rather than fsync) as the trigger:
>amotin
>I see. So ZFS tries to drop some data from pagecache, but there seems to be some dirty pages, which are held by ZFS till them either written into ZIL, or to disk at the end of TXG. And if those dirty page writes were asynchronous, it seems there is nothing that would nudge ZFS to actually do something about it earlier than zfs_txg_timeout. Somewhat similar problem was recently spotted on FreeBSD after #17445, which is why newer version of the code in #17533 does not keep references on asynchronously written pages.
You know what's even easier than doing that? Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.
> Neglecting to do it or meaning to do it then getting pulled in to some meeting (or other important distraction) and then imagining you did it.
If your job is to make sure your file system and your database—SQLite, Pg, MySQL/MariaDB, etc—are tuned together, and you don't tune it, then you should be called into a meeting. Or at least the no-fault RCA should bring up remediation methods to make sure it's part of the SOP so that it won't happen again.
The alternative the GP suggests is using Btrfs, which I find even more irresponsible than your non-tuning situation. (Heck, if someone on my sysadmin team suggested we start using Btrfs for anything I would think they were going senile.)
Facebook is apparently using it at scale, which surprised me. Though that’s not necessarily an endorsement, and who knows what their kernel patchset looks like.
Apropos this use case, ZFS is usually not recommended for databases. Competent database storage engines have their own strong corruption detection mechanisms regardless. What filesystems in the wild typically provide for this is weaker than what is advisable for a database, so databases should bring their own implementation.
Actually, bad news: most popular filesystems and filesystem configurations have limited and/or weak checksums, certainly much worse than you'd want for a database. 16-bit and 32-bit CRCs are common in filesystems.
This is a major reason databases implement their own checksums. Unfortunately, many open source databases have weak or non-existent checksums too. It is sort of an indefensible oversight.
So when checksums are enabled and the DB process restarts or the host reboots, does the DB run the checksum over all the stored data? Sounds like it would take forever for the database to come online. But if it doesn't, it may not detect bitrot in time...?
On the other hand, I've heard people recommend running Postgres on ZFS so you can enable on-the-fly compression. This increases CPU utilization on the postgres server by quite a bit, and read latency of uncached data a bit, but it decreases the necessary write IOPS a lot. And as long as the compression is happening a lot in parallel (which it should, if your database has many parallel queries), it's much easier to throw more compute threads at it than to speed up the write-speed of a drive.
And after a certain size, you start to need atomic filesystem snapshots to be able to get a backup of a very large and busy database without everything exploding. We already have the more efficient backup strategies from replicas struggling on some systems, and we're at our wits' end over how to create proper backups and archives without reducing the backup frequency to weeks. ZFS has mature mechanisms and zfs-send to move this data around with limited impact to the production dataflow.
Is an incremental backup of the database not possible? Pgbackrest etc. can do this by creating a full backup followed by incremental backups from the WAL.
On the big product clusters, we have incremental pgbackrest backups running for 20 minutes. Full backups take something between 12-16 hours. All of this from a sync standby managed by patroni. Archiving all of that takes 8-12 hours. It's a couple of terabytes of noncompressible data that needs to move. It's fine though, because this is an append-log-style dataset and we can take our time backing this up.
We also have decently sized clusters with very active data on them, and rather spicy recovery targets. On some of them, a full backup from the sync standby takes 4 hours, and we need to pull an incremental backup at most 2 hours afterwards, but the long-term archiving process needs 2-3 hours to move the full backup to the archive. This is the first point at which filesystem snapshots (admittedly, of the pgbackrest repo) become necessary to meet SLOs as well as keep the system functioning.
We do all of the high-complexity, high-throughput things recommended by postgres, and it's barely enough on the big systems. These things are getting to the point of needing a lot more storage and network bandwidth.
No, competent systems just need to have something that, taken together, prevents data corruption.
One possible instance of that is a database providing its own data checksumming, but another perfectly valid one is running a database that doesn't on top of a lower layer with a sufficiently low data corruption rate.
Say more? I've heard people say that ZFS is somewhat slower than, say, ext4, but I've personally had zero issues running postgres on zfs, nor have I heard any well-reasoned reasons not to.
> What filesystems in the wild typically provide for this is weaker than what is advisable for a database, so databases should bring their own implementation.
The corruption was likely present for months or years, and postgres didn't notice.
ZFS, on the other hand, would have noticed during a weekly scrub and complained loudly, letting you know a disk had an error, letting you attempt to repair it if you used RAID, etc.
It's stuff like in that post that are exactly why I run postgres on ZFS.
If you've got specifics about what you mean by "databases should bring their own implementation", I'd be happy to hear it, but I'm having trouble thinking of any sorta technically sound reason for "databases actually prefer it if filesystems can silently corrupt data lol" being true.
SQLite on ZFS needs fsync behaviour to be off; otherwise SQLite will randomly hang the application, as the fsync will wait for the txg to commit. This can take a minute or two, in my experience.
Btw this concern also applies to other databases, although it probably manifests worst in SQLite. Essentially, you're doing a WAL on top of the filesystem's own WAL-like recovery mechanism.
The point is that a database cannot rely on being deployed on a filesystem with proper checksums.
Ext4 uses 16-/32-bit CRCs, which is very weak for storage integrity in 2025. Many popular filesystems for databases are similarly weak. Even if they have a strong option, the strong option is not enabled by default. In real-world Linux environments, the assumption that the filesystem has weak checksums is usually true.
Postgres has (IIRC) 32-bit CRCs but they are not enabled by default. That is also much weaker than you would expect from a modern database. Open source databases do not have a good track record of providing robust corruption detection generally, nor do the filesystems they often run on. It is a systemic problem.
ZFS doesn't support features that high-performance database kernels use and is slow, particularly on high-performance storage. Postgres does not use any of those features, so it matters less if that is your database. XFS has traditionally been the preferred filesystem for databases on Linux and Ext4 will work. Increasingly, databases don't use external filesystems at all.
I don't know but LLMs seem to think it uses a 32-bit CRC like e.g. Postgres.
In fairness, 32-bit CRCs were the standard 20+ years ago. That is why all the old software uses them and CPUs have hardware support for computing them. It is a legacy thing that just isn't a great choice in 2025.
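To put numbers on "weak": an n-bit checksum lets a randomly corrupted block slip past undetected with probability about 2^-n, so the gap between 16, 32, and 64 bits is enormous. A back-of-envelope sketch, where the number of corruption events is purely an assumption (with the caveat that real corruption, e.g. misdirected or lost writes, is often structured rather than random; IIRC that's partly why Postgres mixes the page's block number into its checksum):

```ts
// Back-of-envelope only. An n-bit checksum misses a *random* corruption with
// probability about 2^-n. The event count below is a made-up assumption,
// just to show how detection strength scales with checksum width.
const assumedCorruptionEvents = 1_000_000; // over a large fleet's lifetime

for (const bits of [16, 32, 64]) {
  const missProbability = 2 ** -bits;
  const expectedUndetected = assumedCorruptionEvents * missProbability;
  console.log(`${bits}-bit: ~${expectedUndetected.toExponential(1)} expected undetected corruptions`);
}
// Prints roughly: 16-bit ~1.5e+1, 32-bit ~2.3e-4, 64-bit ~5.4e-14
```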
It's not great for databases that do updates in place. Log-structured merge databases (which most newer DB engines are) work fine with its copy-on-write semantics.
The Workers platform uses Cap'n Proto extensively, as one might expect (with me being the author of Cap'n Proto and the lead dev on Workers). Some other parts of Cloudflare use it (including the logging pipeline, which used it before I even joined), but there are also many services using gRPC or just JSON. Each team makes their own decisions.
I have it on my TODO list to write a blog post about Workers' use of Cap'n Proto. By far the biggest wins for us are from the RPC system -- the serialization honestly doesn't matter so much.
That said, the ecosystem around Cap'n Proto is obviously lacking compared to protobuf. For the Cloudflare Workers Runtime team specifically, the fact that we own it and can make any changes we need to balances this out. But I'm hesitant to recommend it to people at other companies, unless you are eager to jump into the source code whenever there's a problem.
A while ago I talked to some team that was planning a migration to GraphQL, long after this was generally thought to be a bad idea. The lead seemed really attached to the "composable RPCs" aspect of the thing, and at the time it seemed like nothing else offered this. It would be quite cool if capnproto became a more credible option for this sort of situation. At the time users could read about the rpc composition/promise passing/"negative latency" stuff, but it was not quite implemented.
This makes me really sad. Protobufs are not all that great, but they were there first and “good enough”.
It’s frustrating when we can’t have nice things because a mediocre Google product has sucked all the air out of the room. I’m not only talking about Protobufs here either.
Heh, of all arguably-valid definitions of "LAN Party" I think this one is as far away from mine as you can get.
Traditional LAN party: Everyone brings their computers to one place to connect via a LAN, where they play games, swap files, demo stuff to each other, etc.
My LAN party: All my friends come over to my house and use the computers that I have already set up for them. Nobody brings their own. The point is to interact face-to-face, with video games as a catalyst. Swapping files and demos doesn't really happen since nobody brought their own computer. (My house: https://lanparty.house)
The Promised LAN Party: The LAN is extended, virtually, across multiple houses, so that the participants can play games, swap files, and demo stuff without actually leaving home. It's arguably no longer "local" but functionally it enables the same activities as a LAN party, other than the face-to-face interaction part.
I wonder who gets told their definition is "wrong" more. :)
This is a pretty amazing setup! I think in 2025 I would definitely prefer something like this. However, I think back in "the day" part of what made LAN parties fun was that everyone's PC was so individualized. I remember all of my high school friends and I coming of age and building our PCs. I helped a lot of my friends build their PCs and we all chose different things (such as the amount of RGB LEDs, which I thought were tacky...). I remember a friend of a friend had a water cooling system and I was so excited about checking it out. Also, things like the desktop wallpaper you chose, etc, contributed to this. There was something very magical about it all. Lugging our PCs to each others houses was a real labor of love.
And a real risk of a shattered CRT screen! I remember carting my bougie 17” Viewsonic around in the back of my Hyundai Excel and wondering if it would pick up a crack along the voyage…
CRTs might be tougher than we gave them credit for. I once dropped a Sony Trinitron from shoulder height when it hit a low ceiling. Didn't crack. Still worked. (And yes, this was at a LAN party.)
When I was a kid we threw out our old B&W tv. I wanted to smash the CRT but had heard that they could explode so from a distance I fired several .22 bullets at the screen. They had no effect. IIRC the screen wasn't damaged at all? I can hardly believe what I'm writing but it was true
CRTs have to maintain a near-vacuum inside IIRC. So it's probably a matter of safety to make them strong; if they're too delicate and get mishandled, they implode and some hapless consumer gets a face full of glass.
Wouldn't imploding rather than exploding prevent the face full of glass? But I suppose it has to be pretty strong to maintain that vacuum even if they assumed no one ever touched, moved, or got near it.
Back in the heyday of lan parties in like 1995-1997 my only monitor was an absolute boulder of a 21" viewsonic (this is pre-flatscreen, or rather pre-decent-flatscreens; you could get like 15-17s but they were expensive and absolute trash). One night coming home from the bars, half drunk, my friend and I found an abandoned (maybe..) horizontal-able handtruck in an alley. Made the lan party load/unload so much better.
In the previous century I visited many lan parties with my absolute beast of a PC case (an old Siemens 4U 19" metal monster where I stuffed an AMD Athlon setup in with a bunch of harddrives) that I got for free from somewhere. Then I carried the huge CRT screen and placed it on top of it. It was insane, I was young (and insane), but I got it all dirt cheap. Most people loved it. And even back then, repurposing discarded or super cheap hardware for as long as possible for as many functions as possible gave me much joy and saved me a great bunch of money.
If I had to do a "lan party" these days I'd just connect my Steam Deck to some HDMI beamer and play Jackbox games with a bunch of people.
The Promised LAN is a bit of a WAN party, but I would say that "LAN party" can certainly be assumed to include virtual LANs.
I'm even willing to say that a get-together of friends in the same location playing the same online game (perhaps using laptops / handhelds / tablets / smartphones / etc.) still fits the spirit of the LAN party, even though it might technically be over a WAN. (Former LAN game series like Diablo have evolved in this direction, for better or worse, and MMOs were always in this space. It's still a blast to play them with people in the same room.)
The best LAN party is the one that you are part of.
For my definition, I totally don't care whether the server is local or in the cloud. I have a fast enough internet connection that it won't make a difference. (I mean. I would _like_ servers to be local, but I'm not going to refuse a game just because it uses cloud servers.)
The important thing is only that the players are local.
That website was a very fun read :) what a cool place and so awesome to have so many friends to play with.
This line made me chuckle:
> I suggested to Jade: Should we move to Austin? Jade initially said no, because she wanted our kids to benefit from Palo Alto's school district. At the time, it was rated #12 in the nation. But, looking closer at the rankings revealed a surprise: The Eanes school district in Austin was #8. When I showed this to Jade, she changed her mind.
Could tell your wife was Chinese without even seeing the name. Chinese parents will make radical housing decisions for their children, even just to move from #12 to #8, lol. Love this.
In 1999 or so, there was an exclusive demo of Unreal Tournament you could download and play if you had a 3dFX video card. However, someone found out that if you created a text file called "glide2.dll" in the game binary directory, you could run the demo in a software-rendered mode.
At the time, I worked for a company with a large training room full of computers. The room had locking doors, and a small, narrow window in only one door. We made a cardboard cutout that fit into that window perfectly, and painted it flat black. If you put it in the inside of the window, it appeared as if the room was empty and dark. We called it the "beat-down screen".
We loaded up the UT demo on every machine in that room, and used to get a bunch of like-minded gamers to come down at the end of the work day and we'd play the three demo maps for hours. We eventually added Half-Life deathmatch (I loved the snark pit map) and Counterstrike. None of those machines had discrete video cards, so we had to run in software rendering mode on all games, at something ridiculous like 320x200, but it was glorious.
I was in high school in the late 90s to early aughts. The school system used Novell NetWare with Windows NT workstations. At the time, their security was lax. In fact, they set up the directory so that by default, every user logged in using the first four letters of their first name coupled with the last four letters of their last name as the username, and the last four digits of their phone number as their password. I realized this also applied to school employees. Most of whom never changed their password. All of whom were local administrators for computers. Some of whom had network administration rights.
I used multiple school officials' accounts to log in, push a copy of UT99, filled with custom maps, to a network share. We would then copy that folder to the hard drive of the school computers and play UT99 on them. We had amazing LAN parties where we would find empty computer labs after school and play games for hours.
They had BNC networking in that building at the time. It took "forever" in my mind to copy the game from the network share to the local hard drive. Totally worth it.
In those days they even let us maintain the high school website using Dreamweaver...
I was a sophomore in college in '99-00 and they had just brought in new Power Mac G4 machines with ATI Rage 128 16MB AGP cards. They were faster and better than anything I had used to that point and it also was around when Unreal Tournament was released which was a big deal. I was the administrator of this lab and was supposed to oversee students working on video and audio projects. But instead, we had epic UT tournaments and better yet, I got paid to be there. I also had keys to get into the lab so we would catch a buzz and play for hours.
My high school computer lab in 1998-99 was full of Windows 98 machines running a program called "Fortress" which was meant to lock it down and prevent tampering.
I made a custom boot disk (floppy) that would boot Windows bypassing Fortress. It was pretty easy.
During computer programming class I'd install Worms on the machine and we'd all play.
The instructor was a cool guy and said it was fine as long as we were getting our work done.
On one of the tests he included a question: "Who is the master of the ninja rope?"
Honestly I think interviews are just not something Linus does a lot of. He has a particular workflow where he writes a script for himself and then acts it out for the camera. Changing it up would be going outside his comfort zone and there wasn't a lot of time.
That said there is a "behind the scenes" video where we initially toured him through the house which is a lot more conversational, but it's on their paid subscription service.
Funnily enough the (only) LAN parties I ever experienced "back in the day" were pretty similar to yours:
1. there was one smallish computer lab tucked under the stairs in the science department in university, in which all of the computers had been "compromised" in some fashion & games installed for student LAN parties. Mainly after hours for those living on campus.
2. In the first tiny little company I ever worked for we'd have them in the office on occasion.
For your "traditional" types - how did people transport their computers? Laptops?
You just loaded up your enormous full tower PC and 17 inch CRT monitor in the back of your friend's brother's cousin's friend's station wagon, and made it happen. I had a Rubbermaid tub that I would use to lug the tower and all the necessary cables and accessories. A properly gaming-specced laptop would have been absurdly expensive (they still are) and a bit like cheating anyway.
I had a lanboy? case that came with a carry strap. The case was also mostly aluminum so it was pretty light. Not many people had laptops back then. Managers at work maybe had one. Most people had a proper desktop.
Laptops??? There weren't gaming-capable laptops in the 90's, and besides that, the ultimate status symbol at a LAN party was lugging in your 80-pound 20" Sony Trinitron CRT.
I vividly remember the desk holding my 15" Trinitron slowly bending under the load, but I don't know what it would have weighed. I'd imagine that 20" was a bear to get out of the car and make your way inside with.
Similarly, I'm not sure how 13 or 14 year old me got a 27" Trinitron TV downstairs by myself. 34 year old me would need an entire bottle of Advil for sure.
I did bring my Dell craptop that I scrounged together from parts of 3 nonworking used business ones... it had the ATi Mobility 1 (Rage 128?) and could do Half-Life 1, if you really tweaked things, or Quake 3.
That aside, it sucked performance-wise even with a Pentium 3. My main PC and 19in monitors were what we drug around to LANs all over.
There used to be an old movie theater in North Branch MN that was converted to basically a permanent LAN Party where people would just come and go.
Movies and bands would play on the stage on weekends or something, too. Best time of my life.
This is what I thought which is what confused me - I guess lugging around CRTs just seemed a little much.
I don't live in the US though so perhaps we just missed out on that rite of passage not living somewhere where kids are more likely to have access to a car.
Big fan of both your LAN houses! One thing I noticed is that you don't seem to have any art/pictures/decorations on any of your walls. Is that an intentional choice?
At the time the pictures were taken, we hadn't gotten around to populating the walls much. Now we've hung up a lot of our kids' art, nicely framed. Amusingly a lot of it looks sort of like abstract modern art, like Jackson Pollock or Rothko, enough so to confuse guests. :)
Abstract modern art seems to have some things in common with some kids' art: focus on materials, color, texture, perception; and representational art may focus on symbolism rather than realistic rendering. There's also an authenticity to art that is created for personal expression without worrying what other people will think about it.
It's hard to capture three-dimensional physical art in two dimensions and/or digitally, even more so when the art is abstract. The context and interaction with the physical environment can also be important.
Indoor plants are tricky with cats. They will chew on them, and many plants can be poisonous to them. Also keeping them watered is work, and they can create a mess if they grow in the wrong direction.
Do y'all get mosquitos on the roof? My back patio is screened in and some are still sneaking through. Would love to know the wired-ethernet-certainty type of approach to dealing with this.
Hybrid LANs are also pretty good. During the Christmas-New Year break, old school friends would often have a LAN party with ZeroTier or similar set up, so people who couldn't make it could still drop in and out. You'd get a great LAN party vibe from the room full of PCs and ethernet cables everywhere, but you'd also get much higher player counts. Everyone wins.
I think we can welcome virtual LAN parties into the fold, especially post-pandemic. Also games like Diablo 4 have given up on LAN and are WAN-only (though there is couch co-op?) but still fun to play in person, as are MMOs, online team FPS, etc.
Interesting! I grew up before network cards were a thing in home computers (Commodore 64 and Amiga), but a group of my friends organized what we called «meetings», which I would characterize as your traditional LAN party. I remember at some point that we hooked up two Amigas over a fairly long parallel cable and were able to send data across. Cannot recall if we actually were able to copy larger files between them though. Fun times!
To me, one of the defining features of a LAN party is a single broadcast domain. I thought that is what this was, at first, but it is actually an L3 overlay network with DNS and BGP and the whole nine yards. Somewhat a stretch for a LAN. :D
Absolutely. I think while plenty of games now can rely on UDP hole punching or file sharing can be done with centralized cloud storage, part of the earlier LAN appeal was sharing a segment together and all the ease that brought.
OS's like Windows can easily share folders and printers, games (particularly older ones) run LAN discovery off of broadcasts, and the lot. Sure, sometimes you can route it, but when I think LAN, I think back to the wireless bridges in a neighborhood LAN between houses we would set up - ARPs and all, in a big messy broadcast domain that worked well enough.
Today I think I'd reach for GRE tunnels to add that functionality if I was them. Otherwise, this is just the Internet with more steps.
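For a concrete sense of why the shared broadcast domain matters: old-school LAN discovery is usually just a UDP datagram to the segment's broadcast address, which routers won't forward. A minimal sketch of the pattern (assumed for illustration, not taken from any particular game):

```ts
// Minimal sketch of UDP broadcast discovery, the kind of traffic that only
// works when everyone sits in the same L2 broadcast domain.
import dgram from "node:dgram";

const DISCOVERY_PORT = 27900; // illustrative port, not any real game's

const socket = dgram.createSocket({ type: "udp4", reuseAddr: true });

socket.on("message", (msg, rinfo) => {
  // Every host on the same segment hears this and can answer directly.
  console.log(`heard "${msg.toString()}" from ${rinfo.address}:${rinfo.port}`);
});

socket.bind(DISCOVERY_PORT, () => {
  socket.setBroadcast(true);
  // A "who's hosting a game?" probe sent to the local broadcast address.
  socket.send("FIND_SERVERS", DISCOVERY_PORT, "255.255.255.255");
});
```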
It doesn’t capture the spirit of a group of friends getting together to play video games in a shared space? Or there’s a different definition of the LAN party spirit that somehow entirely precludes that aspect?
Like I could understand saying it misses out on the aspect of literally bringing your individual PCs, missing out on the neatness of everyone’s individuality as another commenter pointed out, but I don’t think they’d agree that the in person, gaming in the same place aspect is entirely precluded from “the spirit”
You are trying to deconstruct something, keeping part of it and calling it the same. You seem to be making my argument for me, so I have nothing to add.
Ah yes, spending an hour getting set up, then at least two people's computers don't boot or the GPU doesn't work... then no one owns or has installed any of the games you're hosting. Good memories
Obviously, as you predicted, the first reaction is "how do you afford all of that", which is a silly question, because the answer is "just be in the right place in the right moment".
Now, the second question is how do you get to actually organize a big party? My experience is that in modern times it's very difficult to maintain an extensive social network. First, people live far away from each other, so visiting someone becomes a journey. Second, people have shit to do, and when you invite them for a beer it usually means asking them to give up something else in that time (like taking care of their kids). Third, in the age of hyperindividualism it's difficult to meet people you vibe with, because everyone has their own distinct personality and the era of shared values and hobbies seems to be gone.
That allowed us to play Warcraft II with strangers on the Internet.
Wild times!
P.S: my very best LAN that said was a coax cable going through the window directly to my neighbours' house. Brothers in each house made for nice Warcraft II games.
Intel already has a great value GPU. Everyone wants them to disrupt the game, destroy the product niches. Its general-purpose compute performance is quite ass, alas, but maybe that doesn't matter for AI?
I'm not sure if there are higher-capacity GDDR6 & 7 RAM chips to buy. I semi-doubt you can add more without more channels, to some degree, but also, AMD just shipped the R9700, based on the RX 9070 but with double the RAM. But something like Strix Halo, an APU with more LPDDR channels, could work. Word is that Strix Halo's 2027 successor Medusa Halo will go to 6 channels, and it's hard to see a significant advantage without that win; the processing is already throughput-constrained-ish and a leap in memory bandwidth will definitely be required. Dual-channel 128b isn't enough!
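For a rough sense of the arithmetic (a back-of-envelope sketch with dense-model assumptions; every number here is illustrative, not a claim about any specific product):

```ts
// Why a 128-bit (dual-channel) memory bus is limiting for local LLM inference.
// All figures below are assumptions for illustration.
const busWidthBits = 128;          // typical dual-channel consumer setup
const megaTransfersPerSec = 8000;  // e.g. LPDDR5X-8000
const bandwidthGBs = (busWidthBits / 8) * megaTransfersPerSec / 1000; // = 128 GB/s

// Decode speed is roughly bounded by streaming the active weights once per token.
// Assume a ~70B-parameter dense model quantized to ~4 bits per weight.
const weightGB = (70e9 * 0.5) / 1e9;                    // ~35 GB of weights
const tokensPerSecUpperBound = bandwidthGBs / weightGB; // ~3.7 tokens/s ceiling

console.log({ bandwidthGBs, weightGB, tokensPerSecUpperBound });
```

Doubling the channels roughly doubles that ceiling, which is why the memory bus matters more than compute here.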
There's also the MRDIMM standard, which multiplexes multiple chips. That promises a doubling of both capacity and throughput.
Apple's definitely done two brilliant, costly things: putting very wide (but not really fast) memory on package (Intel had dabbled in something similar with regular-width RAM in the consumer space a while ago with Lakefield), and then tiling multiple cores together, making it so that if they had four perfect chips next to each other they could ship it as one. Incredibly brilliant maneuver to get fantastic yields, and to scale very big.
It's not faster at running Qwen3-Coder, because Qwen3-Coder does not fit in 96GB, so can't run at all. My goal here is to run Qwen3-Coder (or similarly large models).
Sure you can build a cluster of RTX 6000s but then you start having to buy high-end motherboards and network cards to achieve the bandwidth necessary for it to go fast. Also it's obscenely expensive.
This is a style issue. Different people can have different definitions and cultures around TODOs.
My codebases tend to use TODO exactly as described here. A TODO is really just a comment documenting the implementation -- specifically, documenting something that is missing, but could be useful. It doesn't mean it actually needs to be done.
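A hypothetical example of that convention (the names here are made up): the TODO documents a known gap, not a commitment to fix it.

```ts
// Hypothetical illustration of a TODO-as-documentation style.
interface Config {
  verbose: boolean;
}

// TODO: We re-parse the raw config on every call. Caching the parsed result
// would save a little work, but profiling hasn't shown it to matter yet.
function loadConfig(raw: string): Config {
  return JSON.parse(raw) as Config;
}
```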
IMO it doesn't make sense to use comments in the codebase itself as an actual task list, because priorities are constantly changing. Things that seemed important when you wrote the code may turn out not to be, and things that you didn't think of when writing turn out to be needed. Are you constantly submitting PRs just to update the TODO comments to reflect current thinking? I mean, sure, I guess you could do that, but I think it makes more sense to maintain that list in a bug tracker or even just a text document that can be updated with less overhead.
It's not vibe coded, though. "Vibe coding" means taking the AI's code with no review at all. Whereas I carefully reviewed the AI's output for workers-oauth-provider.
> and when they make a change that breaks the existing API contract it always makes for several miserable days for me
If we changed an API in Workers in a way that broke any Worker in production, we consider that an incident and we will roll it back ASAP. We really try to avoid this but sometimes it's hard for us to tell. Please feel free to contact us if this happens in the future (e.g. file a support ticket or file a bug on workerd on GitHub or complain in our Discord or email kenton@cloudflare.com).
Thank you! To clarify it's been API contracts in the DNS record setting API that have hit me. I'm going from memory here and it's been a couple years I think so might be a bit rusty, but one example was a slight change in data type acceptance for TTL on a record. It used to take either a string or integer in the JSON but at some point started rejecting integers (or strings, whichever one I was sending at the time stopped being accepted) so the API calls were suddenly failing (to be fair that might not have technically been a violation of the contract, but it was a change in behavior that had been consistent for years and which I would not have expected). Another one was regarding returning zone_id for records where the zone_id stopped getting populated in the returned record. Luckily my code already had the zone_id because it needs that to build the URL path, but it was a rough debugging session and then I had to hack around it by either re-adding the zone ID to the returned record or removing zone ID from my equality check, neither of which were preferred solutions.
If we start using workers though I'll definitely let you know if any API changes!
> some technical decisions are absurd, such as the worker's cache.delete method, which only clears the cache contents in the data center where the Worker was invoked!!!
The Cache API is a standard taken from browsers. In the browser, cache.delete obviously only deletes that browser's cache, not all other browsers in the world. You could certainly argue that a global purge would be more useful in Workers, but it would be inconsistent with the standard API behavior, and also would be extraordinarily expensive. Code designed to use the standard cache API would end up being much more expensive than expected.
With all that said, we (Workers team) do generally feel in retrospect that the Cache API was not a good fit for our platform. We really wanted to follow standards, but this standard in this case is too specific to browsers and as a result does not work well for typical use cases in Cloudflare Workers. We'd like to replace it with something better.
>cache.delete obviously only deletes that browser's cache, not all other browsers in the world.
To me, it only makes sense if the put method creates a cache only in the datacenter where the Worker was invoked. Put and delete need to be related, in my opinion.
Now I'm curious: what's the point of clearing the cache contents in the datacenter where the Worker was invoked? I can't think of any use for this method.
My criticisms aren't about functionality per se or about the developers. I don't doubt the developers' competence, but I feel like there's something wrong with the company culture.
> To me, it only makes sense if the put method creates a cache only in the datacenter where the Worker was invoked. Put and delete need to be related, in my opinion.
That is, in fact, how it works. cache.put() only writes to the local datacenter's cache. If delete() were global, it would be inconsistent with put().
> Now I'm curious: what's the point of clearing the cache contents in the datacenter where the Worker was invoked? I can't think of any use for this method.
Say you read the cache entry but you find, based on its content, that it is no longer valid. You would then want to delete it, to save the cost of reading it again later.
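A sketch of that pattern with the Workers Cache API (the staleness check itself is hypothetical; it stands in for whatever application-level criterion tells you the cached entry is no longer valid):

```ts
// Read from this datacenter's cache, drop the entry if it turns out to be
// stale by our own criteria, and repopulate the local cache on a miss.
export default {
  async fetch(request: Request): Promise<Response> {
    const cache = caches.default;

    let response = await cache.match(request);
    if (response && response.headers.get("x-data-version") !== "v2") {
      // Invalid by application-level criteria: delete it from this
      // datacenter's cache so later requests here don't keep re-reading it.
      await cache.delete(request);
      response = undefined;
    }

    if (!response) {
      response = await fetch(request);            // regenerate / fetch from origin
      await cache.put(request, response.clone()); // cache locally, in this datacenter only
    }
    return response;
  },
};
```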
> There was also a review of that code about a week later [0] which highlights the problems with LLM-generated code.
Not really. This "review" was stretching to find things to criticize in the code, and exaggerated the issues he found. I responded to some of it: https://news.ycombinator.com/item?id=44217254
Unfortunately I think a lot of people commenting on this topic come in with a conclusion they want to reach. It's hard to find people who are objectively looking at the evidence and drawing conclusions with an open mind.
Thanks for responding. I read that dude's review, and it kind of pissed me off in an "akshually I am very smart" sort of way.
Like his first argument was that you didn't have a test case covering every single MUST and MUST NOT in the spec?? I would like to introduce him to the real world - but more to the point, there was nothing in his comments that specifically dinged the AI, and it was just a couple pages of unwarranted shade that was mostly opinion with 0 actual examples of "this part is broken".
> Unfortunately I think a lot of people commenting on this topic come in with a conclusion they want to reach. It's hard to find people who are objectively looking at the evidence and drawing conclusions with an open mind.
Couldn't agree more, which is why I really appreciated the fact that you went to the trouble to document all of the prompts and make them publicly available.
Thank you for answering, I hadn't seen your rebuttal before. It does seem that any issues, if there even were any (your arguments about the CORS headers sound convincing to me, but I'm not an expert on the subject - I study them every time I need to deal with this), were not a result of using an LLM but a conscious decision. So either way, the LLM helped you achieve this result without introducing any bugs that you missed and Mr. Madden found in his review, which sounds impressive.
I won't say that you have converted me, but maybe I'll give LLMs a shot and judge for myself if they can be useful to me. Thanks, and good luck!
You can certainly make the argument that this demonstrates risks of AI.
But I kind of feel like the same bug could very easily have been made by a human coder too, and this is why we have code reviews and security reviews. This exact bug was actually on my list of things to check for in review, I even feel like I remember checking for it, and yet, evidently, I did not, which is pretty embarrassing for me.
> There's no technical enforcement of the policies in the file, it's up to the client to honor them.
That's incorrect. Cloudflare does in fact enforce this at a technical level. Cloudflare has been doing bot detection for years and can pretty reliably detect when bots are not following robots.txt and then block them.