False. Fil-C secures C and C++. It’s more comprehensively safe than Rust (Fil-C has no escape hatches). And it’s compatible enough with C/C++ that you can think of it as an alternate clang target.
One of these days, a project will catch on that's vastly simpler than any memory-safety solution today, yet solves all the same problems, and more robustly too, just as it took humanity thousands of years to realize how to use levers to build complex machines. The solution is probably sitting right under our noses. I'm not sure it's your project (maybe it is) but I bet this will happen.
That’s a really great attitude! And I think you’re right!
I think that, in addition to possibly being the solution to safety for someone, Fil-C is helping to elucidate what memory-safe systems programming could look like, and that might lead to someone building something even better.
> Fil-C achieves this using a combination of concurrent garbage collection and invisible capabilities (each pointer in memory has a corresponding capability, not visible to the C address space)
In almost all uses of C and C++, the language already has a runtime. In the Gnu universe, it's the combination of libgcc, the loader, the various crt entrypoints, and libc. In the Apple version, it's libcompiler_rt and libSystem.
Fil-C certainly adds more to the runtime, but it's not like there was no runtime before.
It makes it a lot less performant, and there is no avoiding or mitigating that downside. In practice, C++ is often chosen over safer options precisely for its performance characteristics, which are unusual even among systems languages.
Fil-C is not a replacement for C++ generally, that oversells it. It might be a replacement for some C++ software without stringent performance requirements or a rigorously performance-engineered architecture. There is a lot of this software, often legacy.
Is that it's literally what we software optimization engineers do. We keep writing optimizations until we find one that is a statistically significant speed-up.
Hence we are running experiments until we get a hit.
The only defense I know against this is to have a good perf CI. If your patch seemed like a speed-up before committing, but perf CI doesn't see the speed-up, then you just p-hacked yourself. But even that isn't foolproof.
You just have to accept that statistics lie and that you will fool yourself. Prepare accordingly.
> Is that it's literally what we software optimization engineers do. We keep writing optimizations until we find one that is a statistically significant speed-up.
I don't think that is what it is saying. It is saying you would write one particular optimization (your hypothesis), and then you would run the experiment (measuring speed-up) multiple times until you see a good number.
It's fine to keep trying more optimizations and use the ones that have a genuine speedup.
Of course the real world is a lot more nuanced -- oftentimes measuring the performance speed-up involves a hypothesis as well ("Does this change to the allocator improve network packet transmission performance?"). You might find that it does not, but you might run the same change on disk IO tests to see if it helps that case. That is presumably okay too, if you're careful.
"Multiple times" doesn't have to mean "no modifications". Suppose the software is currently on version A. You think that changing it to a version B might make it more performant, so you implement and profile it. You find no difference, so you figure that your B implementation isn't good enough, and write a slight variation B', perhaps moving around some loops or function calls. If that makes no difference, you keep writing variations B'', B''', B'''', etc., until one of them finally comes out faster than version A. You finally declare that version B (when properly implemented) is better than version A, when you've really just tried a lot more samples.
Well, it does mean "no modifications" to the hypothesis, the hypothesis being about the performance of code A versus code B. Code B' would be a change.
It's just semantics, but the point is that the article wasn't saying the same thing OP was worried about. There's nothing wrong with testing B, B', B'', etc. until you find a significant performance improvement. You just wouldn't test B several times and take the last set of data when it looks good. Almost goes without saying really.
Sure, it may not be precise repetition, but my idea here is that none of B', B'', etc. are really different than B (they may even compile down to the exact same bytecode), they're just the same thing but written differently. And in fact, none of these are really faster than A, even if they're all "changes". But it's the same issue as any other form of p-hacking, where you keep trying more and more trivial B-variations until you eventually get the result that you're looking for, by random chance. (Cf. the example in xkcd 882, which does change the experimental protocol each time, but only trivially.)
There is, in fact, "something wrong" with this, which is what GP was pointing out. It's literally covered under "Playing with multiple comparisons" in TFA.
(Personally, to combat this, I've ignored the fancy p-values and resorted to the eyeball test of whether it very consistently produces a noticeable speedup.)
Perf, though? If a perf optimization changes the UI noticeably other than by making it smoother or otherwise less janky, someone is lying to someone about what "performance" means. Likely though that be, we needn't embarrass ourselves by following the sad example.
No, UIs churn because when they get good and stay that way, PMs start worrying no one will remember what they're for. Cf. 90% of UI changes in iOS since about version 12.
I thought languages such as Rust, plus flamegraphs and so on, were supposed to help us avoid doing all this testing and optimization, right? I use the built-in analysis tools that come with cargo and whatever ships with my OS, plus tools like Cutter and other reverse-engineering tools. Even in Python I stick to the default or standard profiling and optimization tools. I sometimes wonder whether I'm not doing enough, or whether the recommended default tools should cover most edge cases and performance cases.
And software ultimately fails at perfect composability. So if you add code that purports to be an optimization then that code most likely makes it harder to add other optimizations.
Well, what is the test you are using to measure performance? Maybe the optimizations help performance in some cases and hurt it in others... your test might not fully match all real-world workloads.
These seem like two different things. Testing many different optimizations is not the same experiment; it's many different experiments. The SE equivalent of the practice being described would be repeatedly benchmarking code without making any changes and reporting results only from the favorable runs.
Doesn’t matter if it’s the same experiment or not.
Say I’m after p<0.05. That means that if I try 40 different purported optimizations that are all actually neutral duds, one of them will seem like a speedup and one of them will seem like a slowdown, on average.
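To make that concrete, here's a minimal simulation sketch in C (my own toy code; the numbers are illustrative): 40 dud "optimizations", each "benchmarked" with noisy samples drawn from the same distribution for the baseline and patched builds, then judged by a two-sided test at p < 0.05. On average roughly one comes out looking like a significant speedup and one like a significant slowdown.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Crude standard-normal noise via the central limit theorem. */
    static double noise(void) {
        double s = 0.0;
        for (int i = 0; i < 12; i++) s += (double)rand() / RAND_MAX;
        return s - 6.0;
    }

    int main(void) {
        const int experiments = 40;  /* 40 dud "optimizations"         */
        const int n = 30;            /* benchmark samples per build    */
        int fake_speedups = 0, fake_slowdowns = 0;
        srand(42);
        for (int e = 0; e < experiments; e++) {
            double base = 0.0, patched = 0.0;  /* same distribution: no real effect */
            for (int i = 0; i < n; i++) { base += noise(); patched += noise(); }
            /* z statistic for the difference of means (variance 1 per sample). */
            double z = (base / n - patched / n) / sqrt(2.0 / n);
            if (z > 1.96)  fake_speedups++;    /* "the patch made it faster" */
            if (z < -1.96) fake_slowdowns++;   /* "the patch made it slower" */
        }
        printf("fake speedups: %d, fake slowdowns: %d (out of %d duds)\n",
               fake_speedups, fake_slowdowns, experiments);
        return 0;
    }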
That's not p hacking. That's just the nature of p values. P hacking is when you do things to make a particular experiment more likely to show as a success.
There's another cheeky example of this where you select a pseudo-random seed that makes your result significant. I have a personal seed, I use it in every piece of research that uses random number generation. It keeps me honest!
What they’re referring to might be better put as applying a patch once and then running it 500 times until you get a benchmark result that’s better than baseline for some reason.
My own perf comparison: when I switched from Fil-C running on my system’s libc (recent glibc) for yololand to my own build of musl, I got a 1-2% perf regression. My best guess is that it’s because glibc’s memcpy/memmove/memset are better. Couldn’t have been the allocator since Fil-C’s runtime has its own allocator.
- Userland: the place where your C code lives. Like the normal userland you're familiar with, but everything is compiled with Fil-C, so it's memory safe.
- Yololand: the place where Fil-C's runtime lives. Fil-C's runtime is about 100,000 lines of C code (almost entirely written by me), which currently has libc as a dependency (because the runtime makes syscalls using the normal C functions for syscalls rather than using assembly directly; also the runtime relies on a handful of libc utility functions that aren't syscalls, like memcpy).
So Fil-C has two libcs. The yololand libc (compiled with a normal C compiler, only there to support the runtime) and the userland libc (compiled with the Fil-C compiler like everything else in Fil-C userland; this is what your C code calls into).
Why does yololand need to use libc’s memcpy? Can’t you just use __builtin_memcpy?
On Linux, if all you need is syscalls, you can just write your own syscall wrapper, like Go does.
That doesn’t work on some other operating systems (e.g. Solaris/Illumos, OpenBSD, macOS, Windows), where the system call interface is private to the system shared libraries.
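For the Linux case, here is a minimal sketch of the kind of raw wrapper being suggested (x86-64 only, my own illustrative code; real code would handle errno, more arguments, and other architectures):

    #include <stddef.h>

    /* Raw three-argument Linux syscall, no libc involved. The syscall
       instruction clobbers rcx and r11; the result comes back in rax. */
    static long raw_syscall3(long nr, long a1, long a2, long a3) {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                          : "rcx", "r11", "memory");
        return ret;
    }

    int main(void) {
        static const char msg[] = "hello from a raw syscall\n";
        raw_syscall3(1 /* __NR_write */, 1 /* stdout */,
                     (long)msg, sizeof msg - 1);
        raw_syscall3(60 /* __NR_exit */, 0, 0, 0);
    }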
> Why does yololand need to use libc’s memcpy? Can’t you just use __builtin_memcpy?
Unless you do special things, the compiler turns __builtin_memcpy into a call to memcpy. :-)
There is __builtin_memcpy_inline, but then you're at the compiler's whims. I don't think I want that.
A faithful implementation of what you're proposing would have the Fil-C runtime provide a memcpy function so that whenever the compiler wants to call memcpy, it will call that function.
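Concretely, that would look something like this (my sketch of the idea, not how the Fil-C runtime actually implements it): the runtime ships its own memcpy so that compiler-emitted memcpy calls resolve inside the runtime instead of in an external libc.

    #include <stddef.h>

    /* Sketch only: a runtime-provided memcpy. A real implementation would
       be vectorized, and would be built with something like
       -ffreestanding/-fno-builtin so the compiler doesn't turn this loop
       right back into a memcpy call. */
    void *memcpy(void *dst, const void *src, size_t n) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;           /* plain byte-at-a-time copy */
        return dst;
    }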
> On Linux, if all you need is syscalls, you can just write your own syscall wrapper, like Go does.
I could do that. I just don't, right now.
You're totally right that I could remove the yolo libc. This is one of like 1,000 reasons why Fil-C is slower than it needs to be right now. It's a young project so it has lots of this kind of "expedient engineering".
Thanks. I went looking and saw this in the Fil-C manifesto:
> It's even possible to allocate memory using malloc from within a signal handler (which is necessary because Fil-C heap-allocates stack allocations).
Hmm, really? All stack allocations are heap-allocated? Doesn't that make Fil-C super slow? Is there no way to do stack allocation? Or did I misread what you meant by 'stack allocations'?
It’s a GC allocation, not a traditional malloc allocation. So slower than stack allocation but substantially faster than a malloc call.
And that GC allocation only happens if the compiler can’t prove that it’s nonescaping. The overwhelming majority of what look like stack allocations in C are proved nonescaping.
Consequently, while Fil-C does have overheads, this isn’t the one I worry about.
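A tiny illustration of that distinction (my own example; the "GC allocation" behavior for the escaping case is as described above, which I haven't independently verified):

    #include <stdio.h>

    static int *g_last_reading;          /* pointer that outlives the caller's frame */

    static void record(int *p) { g_last_reading = p; }

    static int sum_three(int a, int b, int c) {
        int local[3] = { a, b, c };      /* provably non-escaping: stays a
                                            genuine stack allocation */
        return local[0] + local[1] + local[2];
    }

    static void sample(void) {
        int reading = 42;
        record(&reading);                /* address escapes via g_last_reading,
                                            so the compiler can't prove
                                            non-escape; under the scheme
                                            described above this one becomes
                                            a GC allocation */
    }

    int main(void) {
        sample();
        printf("%d\n", sum_three(1, 2, 3));
        return 0;
    }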
I see! Thanks for that answer. I'm sure I'll have lots of questions, like these:
You say you don't have to instrument malloc(), but somehow you must learn of the allocation size. How?
Are aliasing bugs detected?
I assume that Fil-C is a whole-program-only option. That is, that you can't mix libraries not compiled with Fil-C and ones compiled with Fil-C. Is that right?
So one might want a whole distro built with Fil-C.
How much are you living with Fil-C? How painful is it, performance-wise?
BTW, I think your approach is remarkable and remarkably interesting. Of course, to some degree this just highlights how bad C (and C++) is (are) at being memory-safe.
Malloc is just a wrapper for zgc_alloc and passes the size through. "Not instrumenting malloc" just means that the compiler doesn't have to detect that you're calling malloc and treat it specially (this is important as many past attempts to make C memory safe did require malloc instrumentation, which meant that if you called malloc via a wrapper, those implementations would just break; Fil-C handles that just fine).
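A sketch of what "just a wrapper" means here (the zgc_alloc stub and the fil_malloc name are my assumptions, only there so the snippet compiles on its own): the size reaches the runtime allocator directly, and a user-written malloc wrapper needs no special treatment at the call site.

    #include <stddef.h>
    #include <stdlib.h>

    /* Stand-in for the runtime allocator named above; the real zgc_alloc
       lives in the Fil-C runtime. */
    static void *zgc_alloc(size_t size) { return calloc(1, size); }

    /* The malloc that Fil-C userland sees is then essentially: */
    static void *fil_malloc(size_t size) {
        return zgc_alloc(size);          /* allocation size is known right here */
    }

    /* ...so a wrapper in user code keeps working, because nothing special
       happens at the call site: */
    static void *xmalloc(size_t size) {
        void *p = fil_malloc(size);
        if (!p) abort();
        return p;
    }

    int main(void) {
        int *a = xmalloc(16 * sizeof(int));
        a[0] = 1;
        return a[0] - 1;
    }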
Not sure exactly what you mean by aliasing bugs. I'm assuming strict aliasing violations. Fil-C allows a limited and safe set of strict aliasing optimizations, which end up having the effect of loads/stores moving according to a memory model that is weaker than maybe you'd want. So, Fil-C doesn't detect those. Like in any clang-based compiler, Fil-C allows you to pass `-fno-strict-aliasing` if you don't want those optimizations.
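For anyone unfamiliar with what those optimizations are allowed to assume, here's the classic strict-aliasing illustration (ordinary C, nothing Fil-C-specific; the cast is deliberately the kind of code -fno-strict-aliasing exists to keep working):

    #include <stdio.h>

    int value = 1;

    int read_after_float_store(float *f) {
        value = 2;
        *f = 3.0f;     /* under strict aliasing, a float store can't alias an int */
        return value;  /* so the compiler may legally return the constant 2 */
    }

    int main(void) {
        /* Passing a pointer that really does alias 'value' makes the result
           depend on whether the compiler applied that assumption. */
        printf("%d\n", read_after_float_store((float *)&value));
        return 0;
    }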
That's right, you have to go all in on Fil-C. All libs have to be compiled with Fil-C. That said, separate compilation of those modules and libraries just works. Dynamic linking just works. So long as everything is Fil-C.
Yes you could build a distro that is 100% Fil-C. I think that's possible today. I just haven't had the time to do that.
All of the software I've ported to Fil-C is fast enough to be usable. You don't notice the perf "problem" unless you deliberately benchmark compute workloads (which I do - I have a large and ever-growing benchmark suite). I wrote up my thoughts about this in a recent twitter discussion: https://x.com/filpizlo/status/1920848334429810751
A bunch of us PL implementers have long "joked" that the only thing unsafe about C is the implementations of C. The language itself is fine. Fil-C sort of proves that joke true.
> Not sure exactly what you mean by aliasing bugs.
I meant that if the same allocation were accessed as different kinds of objects, as if through a union, ... I guess what I really meant to ask is: does Fil-C know the types of objects being pointed to by a pointer, and therefore also the number of elements in arrays?
So, if you store a pointer to a location in memory and then load from that location using a pointer type, then you get the capability that was last stored. But if the thing stored at the location was an integer, you get an invalid capability.
So Fil-C’s “type” for an object is ever evolving. The memory returned from malloc will be nothing but invalid capabilities for each pointer-width word in that allocation, but as soon as you store pointers to it, the locations you stored those pointers to will be understood as being pointer locations. This makes unions and weird pointer casts just work. But you can never type-confuse an int with a pointer, or different pointer types, in a manner that would let you violate the capability model (i.e. achieve the kind of weird state where you can access any memory you like).
Lots of tricks under the hood to make this thread safe and not too expensive.
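An attempt to illustrate that model with a concrete snippet (my own example, based on the explanation above rather than on the implementation): storing an integer over a location that used to hold a pointer invalidates its capability, so the later dereference is what gets stopped.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        void *slot;                     /* one pointer-width location in memory */

        int x = 42;
        slot = &x;                      /* pointer store: slot now carries x's capability */
        printf("%d\n", *(int *)slot);   /* fine: the load gets the capability back */

        *(uintptr_t *)&slot = 0x1000;   /* integer store over the same location:
                                           the capability becomes invalid */
        int *q = (int *)slot;           /* loading it as a pointer is allowed... */
        printf("%d\n", *q);             /* ...but dereferencing the forged pointer is
                                           what Fil-C stops (in ordinary C this is
                                           undefined and likely just crashes) */
        return 0;
    }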
When I was working with Envoy Proxy, it was known that perf was worse with musl than with glibc. We went through silly hoops to have a glibc Envoy running in an Alpine (musl) container.
Not sure, but it’s likely because right now I want to use the same libc in userland (the Fil-C-compiled part) and yololand (the part compiled by a normal C compiler that sits below the runtime), and the userland libc is musl.
Having them be the same means that if there is any libc function that is best implemented by having userland call a Fil-C runtime wrapper for the yololand implementation (say, because what it’s doing requires platform-specific assembly), then I can be sure that the yololand libc really implements that function the same way, with all the same corner cases.
But there aren’t many cases of that, and they’re hacks that I might someday remove. So I probably won’t have this “libc sandwich” forever.
> Sugiarto found a favorable cost-benefit analysis. The study estimated each crossing structure could save society between $235,000 and $443,000 annually through collision reductions. The savings varied based on structure size, design and location.
The issues with homebuilding are entirely to do with regulation, zoning, etc. Developers will build homes if they're allowed to. Banks will give them the loans (homes don't require government money). The problem is them not being allowed to. Because you're not allowed to build multi-family homes in a neighborhood, you're not allowed to build tall apartment buildings, approvals take forever, etc.
This is a really great article about binary compatibility!
I disagree with their idea for fixing it by splitting up glibc. I think it's a bad idea because it doesn't actually fix the problems that lead to compat breakage, and it's bad because it's harder than it seems.
They cite these compat bugs as part of their reasoning for why glibc should be split up:
I don't see how a single one of these would be fixed by splitting up glibc. If their proposed libdl or libthread were updated and had one of these regressions, it would cause just as much of a bug as if a monolithic libc updated with one of these regressions.
So, splitting up glibc wouldn't fix the issue.
Also, splitting up glibc would be super nasty because of how the threading, loading, and syscall parts of libc are coupled (some syscalls are implemented with deep threading awareness, like the setxid calls, threads need to know about the loader and vice-versa, and other issues).
I think the problem here is how releases are cut. In an ideal world, glibc devs would have caught all three of those bugs before shipping 2.41. Big corpos like Microsoft manage that by having a binary compatibility team that runs All The Apps on every new version of the OS. I'm guessing that glibc doesn't have (as much of) that kind of process.
I think this paper overestimates the benefit of what I call isoheaps (partitioning allocations by type). I wrote the WebKit isoheap implementation so it’s something I care about a lot.
Isoheaps can mostly neutralize use-after-free bugs. But that’s all they do. Moreover, they don’t scale well. If you isoheap a select set of stuff then it’s fine, but if you try to deploy isoheaps for every allocation you get massive memory overhead (2x or more) and substantial time overhead too. I know because I tried that.
If an attacker finds a type confusion or heap buffer overflow then isoheaps won’t prevent the attacker from controlling heap layout. All it takes is that they can confuse an int with a ptr and it’s game over. If they can read ptr values as ints then they can figure out how the heap is laid out (no matter how weirdly you laid it out). If they can also write ptr values as ints then they control the whole heap. At that point it doesn’t even matter if you have control flow integrity.
To defeat attackers you really need some kind of 100% solution where you can prove that the attacker can’t use a bug with one pointer access to control the whole heap.
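For readers unfamiliar with the idea, here's a minimal sketch of what "partitioning allocations by type" means (the names iso_heap/iso_alloc/iso_free are hypothetical, not WebKit's API): a freed slot is only ever reused for the same type, so a dangling pointer to a T can only ever alias another T, never an unrelated object.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct iso_slot { struct iso_slot *next; } iso_slot;

    typedef struct {
        size_t size;           /* every slot in this heap has this size */
        iso_slot *free_list;   /* freed slots, reusable only within this heap */
    } iso_heap;

    static void *iso_alloc(iso_heap *h) {
        if (h->free_list) {              /* reuse a previously freed slot... */
            iso_slot *s = h->free_list;  /* ...but only from this heap       */
            h->free_list = s->next;
            return s;
        }
        return malloc(h->size);          /* fresh memory otherwise */
    }

    static void iso_free(iso_heap *h, void *p) {
        iso_slot *s = (iso_slot *)p;     /* the slot goes back to its own heap */
        s->next = h->free_list;
        h->free_list = s;
    }

    int main(void) {
        static iso_heap int_heap = { sizeof(iso_slot), NULL };  /* a heap just for ints */
        int *a = iso_alloc(&int_heap);
        *a = 7;
        iso_free(&int_heap, a);
        int *b = iso_alloc(&int_heap);   /* reuses the freed slot: ints only */
        *b = 8;
        return 0;
    }

The memory overhead mentioned above falls out of this design: every type keeps its own pool of free slots that can never be shared with any other type.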
Yes, having coded quite a few years in C++ (on the Firefox codebase) before migrating to Rust, I believe that many C and C++ developers mistakenly assume that
1/ memory safety is the unachievable holy grail of safety;
2/ there is a magic bullet somewhere that can bring about the benefits of memory safety, without any of the costs (real or expected).
In practice, the first assumption is wrong because memory-safety is just where safety _starts_. Once you have memory-safety and type-safety, you can start building stuff. If you have already expended all your cognitive budget on reaching this point, you have lost.
As for the magic bullets, all those I've seen suggested are of the better-than-nothing variety rather than the it-just-works variety they're often touted as. Doesn't mean that there won't ever be a solution, but I'm not holding my breath.
And of course, I've seen people claim more than once that AI will solve code safety & security. So far, that's not quite what's written on the wall.
Well, GC is very close to that magic bullet (comparatively, spatial safety via bounds checking is easy). It does have some costs of course, especially in a language like C++ that is GC-hostile.
C++ isn't hostile toward garbage collection — it's more the programmers using C++ who are. C++ is the only language that can have an optional, totally pause-less, concurrent GC engine (SGCL). No other programming language, not even Java, offers such a collector.
Lots of pauseless concurrent GCs have shipped for other languages. SGCL is not special in that regard. Worse, SGCL hasn’t been shown to actually avoid disruptions to program execution while the shipping concurrent GCs for Java and other languages have been shown to really avoid disruptions.
(I say disruptions, not pauses, because avoiding “pauses” where the GC “stops” your threads is only the first step. Once you tackle that you have to fix cases of the GC forcing the program to take bursts of slow paths on pointer access and allocation.)
SGCL is a toy by comparison to other concurrent GCs. For example, it has hilariously complex pointer access costs that serious concurrent GCs avoid.
There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.
And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.
> There isn’t a single truly pause-less GC for Java — and I’ve already proven that to you before. If such a GC exists for any other language, name it.
You haven't proven that. If you define "pause" as "the world stops", then no, state of the art concurrent GCs for Java don't have that. If you define "pause" as "some thread might sometimes take a slow path due to memory management" then SGCL has those, as do most memory management implementations (including and especially malloc/free).
> And no, SGCL doesn’t introduce slow paths, because mutators never have to synchronize with the GC. Pointer access is completely normal — unlike in other languages that rely on mechanisms like read barriers.
The best concurrent GCs have no read barriers, only extremely cheap write barriers.
You have allocation slow paths, at the very least.
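To make "extremely cheap write barrier" concrete, here's a minimal sketch of a Dijkstra-style insertion barrier (my own toy code, not any particular collector's implementation): every pointer store the mutator performs goes through a check, and the store itself is unchanged.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct obj {
        bool marked;              /* has the collector already seen this object? */
        struct obj *field;        /* some pointer field in the object            */
    } obj;

    /* Stand-ins so the sketch compiles on its own; a real collector has a
       marking-phase flag and a proper worklist. */
    static bool gc_is_marking = true;
    static obj *worklist[64];
    static size_t worklist_len;

    static void gc_mark_later(obj *o) {
        o->marked = true;
        if (worklist_len < 64) worklist[worklist_len++] = o;
    }

    /* The write barrier: if marking is in progress and the new target hasn't
       been seen yet, tell the collector about it, so a reachable object can't
       be hidden from the concurrent marker. */
    static inline void write_field(obj *src, obj *dst) {
        if (gc_is_marking && dst && !dst->marked)
            gc_mark_later(dst);
        src->field = dst;         /* the store itself is unchanged */
    }

    int main(void) {
        obj a = { true, NULL }, b = { false, NULL };
        write_field(&a, &b);      /* b gets queued for marking */
        return (int)(worklist_len - 1);
    }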
First, there are no Java GCs that completely eliminate stop-the-world pauses. ZGC and Shenandoah reduce them to very short, sub-millisecond windows — but they still exist. Even the most concurrent collectors require STW phases for things like root scanning, final marking, or safepoint synchronization. This is documented in OpenJDK sources, benchmarks, and even in Oracle’s own whitepapers. Claiming Java has truly pause-less GC is simply false.
Second, you’re suggesting there are moving GCs that don’t use read barriers and don’t stop mutator threads at all. That’s technically implausible. Moving collectors by definition relocate objects, and unless you stop the world or have some read barrier/hazard indirection, you can’t guarantee pointer correctness during concurrent access. You must synchronize with the mutator somehow — either via stop-the-world, read barriers, or epoch/hazard-based coordination. It’s not magic, it’s basic memory consistency.
SGCL works without moving anything. That’s why it doesn’t need synchronization, read barriers, or even slow-path allocation stalls. That’s not a limitation — that’s a design goal. You can dislike the model, but let’s keep the facts straight.
It is hostile in the sense that it allows hiding and masking pointers, so it is hard to have an exact moving GC.
SGCL, as impressive as it is, AFAIK requires pointers to be annotated, which is problematic for memory safety, and I'm not sure that it is a moving GC.
SGCL introduces the `tracked_ptr` smart pointer, which is used similarly to `shared_ptr`. The collector doesn't move data, which makes it highly efficient and — perhaps surprisingly — more cache-friendly than moving GCs.
Folks who make claims about the cache friendliness of copying GCs have millions of lines of credible test code that they’ve used to demonstrate that claim.
Compaction doesn't necessarily guarantee cache friendliness. While it does ensure contiguity, object layout can still be arbitrary. True cache performance often depends on the locality of similar objects — for example, memory pools are known for their cache efficiency. It's worth noting that Go deliberately avoids compaction, which suggests there's a trade-off at play.
As I mentioned earlier, take a look at Golang. It's newer than Java, yet it uses a non-moving GC. Are you assuming its creators are intentionally making the language slower?
I feel that the lack of GC is one of the key differentiators that remain to C++. If a group of C++ developers were to adopt a GC, they'd be well on their way to abandoning C++.
This would be an option if the Linux userland wasn't a mish-mash of unconnected developers with their own practices and release cadence. It's why we have LTS distros where the company will put in the massive amount of work to preserve binary compatibility.
But the trade-off is that the software you have in your repos will be really old. At the end of your RHEL support cycle, the libs will be a decade out of date.
I don't understand the use of JS benchmarks to compare Servo to Ladybird.
Ladybird indeed has its own JS engine, and it's very young and still interpreted.
On the other hand, Servo uses SpiderMonkey (i.e. shipping C++ code from Mozilla) as its JS engine.
So a Servo vs Ladybird comparison using JS benchmarks is really a comparison of Mozilla's shipping JS engine to Ladybird's from-scratch JS engine. And they're both written in C++.
I don't think there are any plans for Ladybird to have a JIT compiler (they used to have one but decided to remove it) [1, 2], so it's not clear to me that this performance gap will be improved anytime soon (if ever).
> I don't think there are any plans for Ladybird to have a JIT compiler (they used to have one but decided to remove it) [1, 2], so it's not clear to me that this performance gap will be improved anytime soon (if ever).
How does that make this comparison make any sense?
Point is, this isn't a Servo vs Ladybird comparison. It's a Mozilla vs Ladybird comparison.
Yes, here[0]. Although, it's not anywhere close to being used for everyday things. There are blockers listed in their GitHub issues and various issues posted to the Swift forums.
Thank you! I'd imagine performance-sensitive components in the engine need to remain in C++ (or a similar systems language) right? However, I'm not privy to Swift's runtime benchmarks.
> imagine performance-sensitive components in the engine need to remain in C++
I'd imagine so, yes. I think the vision is to use Swift in "risky" areas like parsing data for example. Probably much more too, but the big hitters would be safety critical areas I think.