wwilson · 2025-05-30T17:02:34 1748624554

There have historically been two giant adoption challenges for DST.

(1) Previously, you had to build your entire system around one of the simulation frameworks (and then not take any dependencies).

(2) It’s way too easy to fool yourself with weak search/input generation, which makes all your tests look green when actually you aren’t testing anything nontrivial.
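To illustrate point (2), here is a minimal Python sketch. Everything in it is hypothetical: the `has_bug` system, the operation names, and both generators are invented for illustration and are not taken from any real framework. The weak generator always syncs right after writing, so a crash never lands on dirty state and every run passes, even though the bug is trivially reachable.

```python
import random

def has_bug(ops):
    """Hypothetical system under test: data is lost if a crash
    arrives after a write that has not been synced yet."""
    dirty = False
    for op in ops:
        if op == "write":
            dirty = True
        elif op == "sync":
            dirty = False
        elif op == "crash" and dirty:
            return True
    return False

def weak_generator(rng, length):
    # Always emits write immediately followed by sync, so any
    # crash only ever hits clean state. Looks busy, tests nothing.
    ops = []
    while len(ops) < length:
        ops.extend(["write", "sync"])
        if rng.random() < 0.2:
            ops.append("crash")
    return ops

def stronger_generator(rng, length):
    # Interleaves operations freely, so crashes can land mid-write.
    return [rng.choice(["write", "sync", "crash"]) for _ in range(length)]

def count_failures(generator, runs=1000, seed=0):
    # One seeded RNG drives every run, so the count is reproducible.
    rng = random.Random(seed)
    return sum(has_bug(generator(rng, 20)) for _ in range(runs))

weak_failures = count_failures(weak_generator)
strong_failures = count_failures(stronger_generator)
print("weak generator failures:", weak_failures)
print("stronger generator failures:", strong_failures)
```

The weak generator reports zero failures by construction, which is exactly the all-green result that can fool you.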

As you say, Antithesis is trying to solve both of these problems, but they are very challenging.

I don’t know of anybody else who has a reliable way of retrofitting determinism onto arbitrary software. Facebook’s Hermit project tried to do this with a deterministic Linux userspace, but is abandoned. (We actually tried the same thing before we wrote our hypervisor, but found it didn’t work well).

A deterministic computer is a generically useful technology primitive beyond just testing. I’m sure somebody else will create one someday, or we will open-source ours.

wwilson · 2025-05-30T16:10:38 1748621438

DST was i̶n̶v̶e̶n̶t̶e̶d̶ popularized at FoundationDB a little over a decade ago, and has been quietly gathering steam ever since. If you’re interested in the technique, the FDB paper has some good info in section 4 and section 6.2: https://www.foundationdb.org/files/fdb-paper.pdf

(Disclosure: I am an author.)

I also gave an overview talk on it at Strange Loop back in 2015, but don’t have the YouTube link handy.

If you’re interested in trying DST, we have a new company that aims to make it a much easier lift, especially for existing projects that can’t be retrofitted onto one of the many simulation frameworks: https://antithesis.com

Happy to answer any questions about the approach, here or over email.

mjb · 2025-05-30T17:57:44 1748627864

Hey Will. I'm a huge fan of the work you all are doing, and of FoundationDB, but I don't believe it's accurate that DST was invented at FoundationDB (or, maybe it was, but was also used in other places around the same time or before).

For example, the first implementations of AWS's internal lock service (Alf) used DST as a key part of the testing strategy, sometime around 2009. Al Vermeulen was influential in introducing it at AWS, and I believe it built on some things he'd worked on before.

Still, Antithesis is super cool, and I really admire how you all are changing the conversation around systems correctness. So this is a minor point.

we6251 · 2025-05-30T20:30:00 1748637000

Also a huge proponent of Antithesis and their current work, but there definitely were some notable precedents at or around that time e.g. MODIST from 2009 (https://www.usenix.org/legacy/event/nsdi09/tech/full_papers/...), which similarly tried to build a "model checker for distributed systems".

As another interesting historical side note, I have wondered about the similarities between Antithesis and "Corensic", a startup spun out of UW around a similar time period (circa 2009). Apparently, they "built a hypervisor that could on-demand turn a guest operating system into deterministic mode" (see "Deterministic Multiprocessing" at https://homes.cs.washington.edu/~oskin/). My impression is that their product was not a significant commercial success, and the company was acquired by F5 Networks in 2012 (https://comotion.uw.edu/startups/corensic/).

Overall, I don't over-index on novelty, and think it is generally good for ideas to be recycled/revived/re-explored, with updated, modern perspectives. I believe that most rigorous systems designers/engineers likely converge to similar ideas (model checking, value of determinism, etc.) after dealing with these types of complex systems for long enough. But, it is nevertheless interesting to trace the historical developments.

wwilson · 2025-05-30T21:01:57 1748638917

Corensic was impressive tech. I actually debriefed with one of their founders years ago. IIRC, their product was focused on finding single-process concurrency bugs.

Deterministic hypervisors are by no means new. Somebody once told me that VMware used to support a deterministic emulation mode (mostly used for internal debugging). Apparently they lost the capability some time ago.

wwilson · 2025-05-30T18:06:37 1748628397

Hi Marc, thank you for the correction! We started doing it around 2010, and were not aware of any prior art. But I am not surprised to hear that others had the idea before us. I will give Al credit in the future.

joshstrange · 2025-05-30T16:24:02 1748622242

I think this is the link? https://www.youtube.com/watch?v=4fFDFbi3toc

wwilson · 2024-10-02T01:54:44 1727834084

My guess is rather that he's conflating the US with US + Western Europe.

ccppurcell · 2024-10-02T05:12:14 1727845934

And then you can make the appropriately similar conflation of the USSR with the Warsaw Pact countries.

wwilson · 2024-09-10T21:40:54 1726004454

No, we don’t require any paravirtualization at all, and nothing needs to be manually rewritten. I’m not sure where you got that impression.

It also is not in any sense a replay engine. We don’t need to record anything except the inputs!

Veserv · 2024-09-10T22:22:12 1726006932

At timestamp 23:40 in the video by Alex Pshenichkin from 2024-06-10 it says data ingestion comes via VMCALL interactions. As such a call is literal nonsense if you are not virtualized, any such call inherently means you are using a paravirtualized interface. Now maybe FreeBSD has enough standardized paravirtualized drivers similar to virtio that you can just link it up, but that would still be a paravirtualization solution with manual rewrites, just one where somebody else already did the manual rewrites. Has the fundamental design changed in the last 3 months?

This is exactly a replay engine (or I guess you could say replay engines are deterministic simulators). How do you think you replay a recording except with a deterministic execution system that injects the non-deterministic inputs at precise execution points? This is literally how all replay engines work. Furthermore, how do you think recordings work except by recording the inputs? That is literally how all recording systems designed to feed replay engines work. The only distinction is what constitutes non-determinism in a given context. At the whole hypervisor level, it is just I/O into the guest; at the process level, it is just system calls that write into the process; at the threading level, it is all writes into the process. These distinctions are somewhat interesting at an implementation level, but do not change the fundamental character of the solution, which is that they are all a replay engine or deterministic simulator, whatever you want to call it.

wwilson · 2024-09-10T15:29:11 1725982151

This makes the mapping "injective": https://antithesis.com/blog/deterministic_hypervisor/

The "onto" direction doesn't really matter.

nynx · 2024-09-10T15:35:34 1725982534

How can it reverse time? Does it record a stack of every decision point?

intuitionist · 2024-09-10T15:38:49 1725982729

You don’t need to reverse time if you can deterministically reproduce everything that led up to the point of interest. (In practice we save a snapshot of your system at some intermediate point and replay from there.)
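As a rough illustration of the snapshot-and-replay idea, here is a toy Python sketch. The `ToySystem` class and the `state_at` helper are hypothetical names made up for this example, and they say nothing about how Antithesis implements snapshots. The point is only that a deterministic system plus a saved intermediate state lets you reach any later point without replaying from the beginning.

```python
import copy
import random

class ToySystem:
    """Hypothetical deterministic system: all entropy comes from one seeded RNG."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.step = 0
        self.state = 0

    def tick(self):
        # Each step consumes entropy and mutates the state.
        self.state += self.rng.randint(-5, 5)
        self.step += 1

    def snapshot(self):
        # Deep copy captures the state and the RNG's internal position.
        return copy.deepcopy(self)

def state_at(checkpoint, target_step):
    """Replay forward from a snapshot until target_step is reached."""
    system = checkpoint.snapshot()  # leave the checkpoint itself untouched
    while system.step < target_step:
        system.tick()
    return system.state

# Original run: 100 steps, with one snapshot saved at step 60.
system = ToySystem(seed=42)
history = []
checkpoint = None
for _ in range(100):
    if system.step == 60:
        checkpoint = system.snapshot()
    system.tick()
    history.append(system.state)  # history[i] is the state after step i + 1

# "Go back" to step 95 by replaying 35 steps from the snapshot,
# instead of rewinding or re-running all 95 steps.
assert state_at(checkpoint, 95) == history[94]
```

Because the replay is deterministic, the checkpoint can be reused any number of times and always yields the same result.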

wwilson · 2024-09-10T14:30:19 1725978619

Any third party service does need to be mocked or stubbed out. We have a partnership with Localstack that lets us provide very polished AWS mocks that require zero configuration on your part (https://antithesis.com/docs/using_antithesis/environment.htm...).

If you need something else, reach out and ask us about it, because we have a few of them in the pipeline.

wwilson · 2024-09-10T14:25:37 1725978337

You're right that if you tried to do something like this using record/replay, you would pay an enormous cost. Antithesis does not use record/replay, but rather a deterministic hypervisor (https://antithesis.com/blog/deterministic_hypervisor/). So all we have to remember is the set of inputs/changes to entropy that got us somewhere, not the result of every system operation.
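A minimal Python sketch of why remembering only the entropy is enough, assuming a toy system whose every random choice flows from a single seeded generator. The `run_simulation` function and its event names are hypothetical and are not Antithesis's API. Nothing about the outcome of each operation is stored, yet the full trace can be regenerated from the seed alone.

```python
import random

def run_simulation(seed, steps):
    """Toy deterministic system: the seed is the only source of entropy."""
    rng = random.Random(seed)
    balance = 0
    trace = []
    for step in range(steps):
        event = rng.choice(["deposit", "withdraw", "fault"])
        amount = rng.randint(1, 10)
        if event == "deposit":
            balance += amount
        elif event == "withdraw":
            balance -= amount
        # "fault" leaves the balance unchanged in this toy model.
        trace.append((step, event, amount, balance))
    return trace

# The only thing we need to keep in order to get back to any state is the seed.
saved_seed = 1234
first_run = run_simulation(saved_seed, steps=50)
second_run = run_simulation(saved_seed, steps=50)
assert first_run == second_run  # identical trace, nothing else recorded
```

A record/replay design would instead have to log the result of each nondeterministic operation, which is where the cost comes from.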

slippy · 2024-09-10T15:07:45 1725980865

The classic time-space tradeoff question: if I run Antithesis for X time, say 4 hours, do you take periodic snapshots or deltas of state, so that I don't have to re-run the capture from scratch for O(4 hours) just to go back 5 seconds?

wwilson · 2024-09-10T15:11:31 1725981091

Yes! See Alex's talk here: https://www.youtube.com/watch?v=0E6GBg13P60

In fact, we just made a radical upgrade to this functionality. Expect a blog post about that soon.

wwilson · 2024-09-10T14:24:21 1725978261

Yes, unfortunately we have not figured out how to rewind time in the real world yet. When we do, there are a lot of choices I'm going to revisit...

abeppu · 2024-09-10T14:36:10 1725978970

... but the intro makes it sound like this system is valuable in investigating bugs that occurred in prod systems:

> I’ve been involved in too many production outages and emergencies whose aftermath felt just like that. Eventually all the alerts and alarms get resolved and the error rates creep back down. And then what? Cordon the servers off with yellow police tape? The bug that caused the outage is there in your code somewhere, but it may have taken some outrageously specific circumstances to trigger it.

So practically, if a production outage (where I think "production" means it cannot be in a simulated environment, since the customers you're serving are real) is caused by very specific circumstances, and your production system records some, but not every attribute of its inputs and state ... how does one make use of antithesis? Concretely, when you have a fully-deterministic system that can help your investigation, but you have only a partial view of the conditions that caused the bug ... how do you proceed?

I feel like this post is over-promising but perhaps there's something I just don't understand since I've never worked with a tool set like this.

jackschu · 2024-09-10T18:09:30 1725991770

(I work at Antithesis)

I think you're right that the framing leans towards providing value in prod issues, but we left out how we provide value there. I think you're also right that we're just used to experiencing the value here, but it needs some explanation.

Basically this is where guided, tree-based fuzzing comes in. If something in the real world is caused by very specific circumstances, we're well positioned to have also generated those specific circumstances. This is thanks to parallelism, intelligent exploration, fault injection, our ability to revisit interesting states in the past with fast snapshots, etc.

We've had some super notable instances where a customer finds a bug in prod, recalls that it's the weird bug they'd been ignoring that we surfaced a month ago, and then uses this approach to debug.

The best docs on this are probably here: https://antithesis.com/docs/introduction/how_antithesis_work...

yellow_lead · 2024-09-10T16:04:00 1725984240

This was my thinking as well. Prod environments can be extremely complicated and issues often come down to specific configuration or data issues in production. So I had a lot of trouble understanding how the premise is connected to the product here.

qarl · 2024-09-10T14:29:02 1725978542

> Yes, unfortunately we have not figured out how to rewind time in the real world yet.

10 bucks says you get complaints for not implementing the "real world" feature.

wwilson · 2024-09-10T14:16:24 1725977784

The simulation is a completely generic Linux system, so we can run anything (including NodeJS). If your build tool can produce Docker containers, then it will work with us.

We don't run this on your production server, but in the same simulation that we use to find your bugs. See also: https://antithesis.com/product/how_does_antithesis_work/

wwilson · 2024-07-10T17:27:11 1720632431

Biggest advantages I know of for dynamic linking:

* You can use the LD_PRELOAD trick to override behavior at runtime.

* You can run with entirely different implementations of the dynamically linked library in different places.

* Software can pick up interface-compatible upgrades to its dependencies without being re-compiled and distributed again.

We use all three of these tricks in our SDKs, FWIW. But it is still a giant pain in the ass.