
dgacmu · 2025-11-08T15:30:40 1762615840

I'm not sure I understand Q1 - that's exactly the point: If you withdraw _from your account_ and customer B withdraws from _their_ account, then the two events are unrelated and can be executed in either order (and, in fact, replicas would still have the same state even if some executed AB and some BA).
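
To make the "either order" point concrete, here is a minimal sketch in Python. The account names, balances, and the `withdraw` / `apply_all` helpers are all made up for illustration; this is not code from the paper or from any implementation.

```python
def withdraw(state, account, amount):
    """Return a new state with `amount` withdrawn from `account`."""
    new_state = dict(state)
    new_state[account] -= amount
    return new_state


def apply_all(state, ops):
    """Apply a list of (account, amount) withdrawals in the given order."""
    for account, amount in ops:
        state = withdraw(state, account, amount)
    return state


# Hypothetical starting balances for two unrelated customers.
initial = {"A": 100, "B": 100}
op_a = ("A", 30)  # customer A withdraws from their own account
op_b = ("B", 50)  # customer B withdraws from their own account

# One replica executes A then B, another executes B then A.
state_ab = apply_all(initial, [op_a, op_b])
state_ba = apply_all(initial, [op_b, op_a])
```

Both replicas end up with the same balances because the two operations touch disjoint keys, which is the property that lets them skip being ordered against each other.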

The replay is part of what the authors fixed in the original protocol. I believe (but need to read their protocol in more detail on Monday) that the intuition for this is that when there's an outage and you bring a new node online, the system commits a Nop operation that conflicts with everything. This effectively creates a synchronization barrier that forces re-reading all of the previous commits.

But I'm confused about the phrasing of your question because the actor isn't clear here when you say "I re-read events 1-100" -- which actor is "I"? Remember that a client of the system doesn't read "events", it performs operations, such as "read the value of variable X". In other words, clients perform operations that observe _state_, and the goal of the algorithm is to ensure that the state at the nodes is consistent according to a specific definition of consistency.

So if a client is performing operations that involve a replacement node, the client contacts the node to read the state, and the node is responsible for synchronizing with the state as defined by the graph of operations conflicting with the part of the state requested by the client, which will include _all_ operations prior to the replacement of the node due to the no-op.
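
A rough sketch of that barrier intuition, in Python. It only illustrates the idea of "a no-op that conflicts with everything" and is not the authors' recovery protocol; the `NOOP_KEYS` sentinel, the `conflicts` and `dependencies` helpers, and the command ids are all hypothetical.

```python
NOOP_KEYS = None  # hypothetical sentinel: a command that conflicts with every key


def conflicts(keys_x, keys_y):
    """Two commands conflict if either is the no-op or they share a key."""
    if keys_x is NOOP_KEYS or keys_y is NOOP_KEYS:
        return True
    return bool(keys_x & keys_y)


def dependencies(committed, new_keys):
    """Ids of earlier commands that a new command must be ordered after."""
    return {cmd_id for cmd_id, keys in committed.items() if conflicts(keys, new_keys)}


# Made-up history: command id -> set of keys it touches.
committed = {1: {"x"}, 2: {"y"}, 3: {"x", "z"}}

# An ordinary read of x only depends on commands that touched x.
read_x_deps = dependencies(committed, {"x"})

# The no-op depends on everything committed before it.
noop_deps = dependencies(committed, NOOP_KEYS)

# Once the no-op (id 4) is committed, any later command depends on it,
# and through it on the whole earlier history.
committed[4] = NOOP_KEYS
read_y_deps = dependencies(committed, {"y"})
```

Here a later read of `y` picks up the no-op as a dependency, so the replacement node has to catch up on everything before the no-op before it can answer.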

mjevans · 2025-11-08T16:30:37 1762619437

I forget the term, it might be Dependency Graph.

Hypothetically, let's say there's a synchronized quantum every 60 seconds. Order of operations might not matter if transactions within that window do not touch any account referenced by other transactions.

However every withdrawal is also a deposit. If Z withdraws from Y, and Y withdraws from X, and X also withdraws from Z there's a related path.

Order also matters if any account along the chain would reach an 'overdraft' state. The profitable thing for banks to do would be to synchronously deduct the withdrawals first, then apply them to maximize the overdraft fees. A kind thing would be the inverse, assume all payments succeed and then go after the sources. Specifying the order of applied operations, including aborts, in the case of failures is important.

dgacmu · 2025-11-08T17:29:32 1762622972

Those transfers would be represented as having dependencies on both accounts they touch, and so would be forced to be ordered.

Transfer(a, b, $50)

And

Transfer(b, c, $50)

Are conflicting operations. They don't commute because of the possibility that b could overdraft. So the programmer would need to list (a, b) as the dependencies of the first transaction and (b, c) as the second. Doing so would prevent concurrent submission of these transactions from being executed on the fast path.
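
A small Python sketch of why those two transfers don't commute and how listing dependencies exposes the conflict. The `transfer`, `conflicts`, and `run` helpers and the starting balances are assumptions for illustration, including the choice to reject a transfer that would overdraft; a real system might instead charge a fee or abort differently.

```python
def transfer(balances, src, dst, amount):
    """Apply a transfer, rejecting it if `src` would overdraft (assumed policy)."""
    if balances[src] < amount:
        return balances, False
    updated = dict(balances)
    updated[src] -= amount
    updated[dst] += amount
    return updated, True


def conflicts(deps_x, deps_y):
    """Two operations conflict if their declared dependency sets overlap."""
    return bool(set(deps_x) & set(deps_y))


def run(start, order):
    """Execute transfers in the given order, recording which ones succeed."""
    balances, results = start, []
    for src, dst, amount in order:
        balances, ok = transfer(balances, src, dst, amount)
        results.append(ok)
    return balances, results


# Hypothetical balances: b has nothing until a pays it.
start = {"a": 50, "b": 0, "c": 0}
t1 = ("a", "b", 50)
t2 = ("b", "c", 50)

final_12, ok_12 = run(start, [t1, t2])  # t1 then t2
final_21, ok_21 = run(start, [t2, t1])  # t2 then t1

shared_account = conflicts(("a", "b"), ("b", "c"))
disjoint_accounts = conflicts(("a", "b"), ("c", "d"))
```

Executed as t1 then t2, both transfers succeed; executed as t2 then t1, the second transfer is rejected because `b` is still empty. Since the dependency sets overlap on `b`, the two operations have to be ordered the same way everywhere.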

mrkeen · 2025-11-08T19:03:34 1762628614

> I'm not sure I understand Q1 - that's exactly the point: If you withdraw _from your account_ and customer B withdraws from _their_ account

Same account.

> the actor isn't clear here when you say "I re-read events 1-100" -- which actor is "I"?

The fundamental purpose of Paxos is that different actors will come to a consensus. If different actors see different facts, no consensus was reached, and Paxos wasn't necessary.

dgacmu · 2025-11-08T19:08:00 1762628880

If it's the same account, the two operations will have the same dependencies, and thus the system will be forced to order them the same at all replicas.

Izkata · 2025-11-08T17:55:28 1762624528

Between their two questions, I'm guessing more directly what they're getting at is if events 100 and 101 can be reordered, what's the guarantee that reconnecting doesn't end up giving you event 100 twice and skipping 101?

[Edit, rereading] Shortened down, just this part is probably it:

> which will include _all_ operations prior to the replacement of the node due to the no-op.

Sounds like a graph merge, not actually a replay.

dgacmu · 2025-11-08T13:28:45 1762608525

otrack et al.: Thank you and congratulations! It's gratifying seeing the wheels of research make progress.

My appreciation of formal and machine-checked proofs has grown since we wrote the original EPaxos paper; I was delighted at the time at the degree to which Iulian was able to specify the protocol in TLA+, but now in hindsight wish we (or a later student) had made the push to get the recovery part formalized as well, so perhaps we'd have found these issues a decade ago. Kudos for finding and fixing it.

Have you yourselves considered formalizing your changes to the protocol in TLA+? I wonder if the advances the formal folks have made over the last decade or so would ease this task. Or, perhaps better yet -- one could imagine a joint protocol+implementation verification in a system like Ironfleet or Verus, which would be tremendously cool and also probably a person-year of work. :)

Edited to add: This would probably make a great masters thesis project. If y'all are not already planning on going there, I might drop the idea to Bryan Parno and see if we find someone one of these years who would be interested in verifying/implementing your fixed version in Verus. Let me know (or if we start down the path I'll reach out).

_benedict · 2025-11-08T14:12:55 1762611175

I can’t speak for the authors, but I have been lucky enough to be collaborating with them on behalf of the Apache Cassandra project, to refine and prove the correctness of the Accord protocol - a derivative of EPaxos we have integrated into the database.

It would be fantastic if such a project could be pursued for this variant, which has the distinction of being the only “real world” implementation.

Either way, thank you for the original EPaxos paper - it has been a privilege to convert its intuitions into a practical system.

ALLTaken · 2025-11-09T00:02:42 1762646562

msc thesis only left. happy to do it with a twist. training an ml model to speed up proof generation and verification.

can you share more?

dgacmu · 2025-11-08T13:16:26 1762607786

In practice, almost every implementation of Paxos uses multi-paxos. Even the "Paxos Made Simple" paper notes:

> In normal operation, a single server is elected to be the leader, which acts as the distinguished proposer (the only one that tries to issue proposals) in all instances of the consensus algorithm.

because otherwise you don't have a mechanism for ordering; the more basic Paxos protocol only discusses how to arrive at consensus for a single proposal, not how to assign numbers to them in a reasonable way that preserves ordering.

dgacmu · 2025-11-08T13:13:07 1762607587

The original Paxos paper was titled "The Part-Time Parliament" (Lamport), and was explained -- I'm serious here -- not as a distributed systems protocol, but as a discussion about how legislators on a Greek island could vote despite wandering in and out of the room. It set the stage for a series of papers using that theme. We continued on that theme when picking the title for the EPaxos paper, and these folks built on that. So yeah, it's a bit of a thing specifically in the Paxos literature.

And wait until I tell you about the Byzantine Generals Problem. :-)

tremon · 2025-11-08T15:26:33 1762615593

https://lamport.azurewebsites.net/pubs/byz.pdf , for those who lack patience.

lovelearning · 2025-11-09T04:08:56 1762661336

Thank you for explaining!

dgacmu · 2025-11-06T01:06:01 1762391161

The post you're replying to didn't explain it well, but: LFP batteries don't use cobalt (or nickel).

LFP production is starting to pass NMC (lithium + nickel manganese cobalt oxide). LFP has slightly lower energy density but a lot of advantages: it is much less prone to catching fire, it has a longer lifetime, and it contains no cobalt. LFP (LiFePO4) is the battery chemistry of choice today for solar installations, where the longer lifetime and increased safety are a big win and the slightly lower density doesn't matter, unlike in mobile applications.

aeonfox · 2025-11-06T02:52:06 1762397526

I suppose I could have been clearer, but I figured it was an easy connection to make from talking about chemistry to the question of whether cobalt is even relevant.

dgacmu · 2025-11-05T17:56:25 1762365385

These have all been stated as goals by various machine learning research efforts. And -- they're actually all examples in which a better search heuristic through an absolutely massive configuration space is helpful.

dgacmu · 2025-11-05T00:35:19 1762302919

I've switched 3 of my 4 nests to a z-wave thermostat and I'm really happy[1] with it so far. The Honeywell T6 pro. I got them used for about $56 each and they were in near-perfect condition.

[1] With one exception, which is really niche to me: The T6 has what looks like a PID-style control algorithm hiding in it, and instead of specifying a deadband between on/off, you can only specify a max number of cycles per hour. I already have a home-brewed PID algorithm controlling the temperature target of my boiler, so I actually _want_ a stupider thermostat that will stay on/off a little longer. But this is purely because I'm weird. The T6 is really good at keeping the temperature on target, and the homeassistant integration was fast and easy and has been totally solid. I recommend - I'm just waiting for the last one to arrive and I will have completely replaced my Nests (gen2 + gen3).

I'll also add that the local UI on the T6 is much better than the one on the nest. And the installation process was really simple -- Honeywell clearly learned from Nest on this one, and then beat them with the UI. I'm really happy with the upgrade, even though I'm totally annoyed with Google for wrecking my perfectly functional thermostat.
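
For anyone unfamiliar with the deadband behaviour mentioned in the footnote, here is a minimal Python sketch of an on/off thermostat with hysteresis. It is a generic illustration, not the T6's algorithm or anything from its firmware; the `deadband_step` function, the setpoint, the band width, and the readings are all made up.

```python
def deadband_step(temp, setpoint, deadband, heating):
    """Return whether heat should be on after observing `temp`.

    Heat turns on at or below setpoint - deadband/2, turns off at or above
    setpoint + deadband/2, and otherwise keeps its previous state.
    """
    if temp <= setpoint - deadband / 2:
        return True
    if temp >= setpoint + deadband / 2:
        return False
    return heating


# Hypothetical temperature readings in degrees C.
readings = [19.0, 19.6, 20.2, 20.6, 20.3, 19.8, 19.4]

heating = False
states = []
for temp in readings:
    heating = deadband_step(temp, setpoint=20.0, deadband=1.0, heating=heating)
    states.append(heating)
```

A wider `deadband` means fewer, longer on/off cycles at the cost of a larger temperature swing, which is the trade-off a cycles-per-hour limit controls only indirectly.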

dgacmu · 2025-10-27T21:10:28 1761599428

I think that it's more useful to think of all defenses against physical intrusion as increasing the cost of intrusion in some way, be that time, skill, risk of being caught, access to specialized devices, etc.

Most "normal" locks don't increase the cost too much but they do raise it - perhaps enough for a thief to pick another target, or perhaps enough for the thief to choose another method of entry such as kicking in the door (which itself comes with additional risk of detection).

LogicHound · 2025-10-28T00:59:27 1761613167

Exactly, it is about layers. It is the same with computer security. Is my network "unhackable"? No. But I've put up enough layers of basic security that script kiddies and the like won't be able to get in.

dgacmu · 2025-10-26T13:05:20 1761483920

We had our first when I was 37 and our second when I was 43. It wasn't so bad - it's tiring but I'm also a lot more emotionally mature than I was in my 20s and early 30s. And I have an absolute ton more money and stability (which also helps pay for things like nannies and school).

The thing I'm a little sad about is that I'm unlikely to be there for too long when my kids have kids.

bee_rider · 2025-10-26T16:31:58 1761496318

I dunno, medical science keeps advancing. Maybe you’ll live forever.

dgacmu · 2025-10-24T18:19:16 1761329956

I think you're misinterpreting: "with the advent of transformers, (many) people doing NLP with pre-transformers techniques had to salvage their shit"

rootnod3 · 2025-10-25T17:29:12 1761413352

I guess. That's why I added the "unless I am mis-interpreting", still got downvoted for it because I guess it was against AI. The wording was confusing but so was my understanding of it as a non-native speaker. Shit happens.

dgacmu · 2025-10-26T14:49:15 1761490155

I agree that the wording was a bit confusing (as a native speaker).