Most of our tools that adjust files work really well offline. systemd-tmpfiles, systemd-sysusers, systemctl and so on all support --root= and --image=.
I really don't know what the problem is supposed to be.
The 64bit issue is certainly an issue, but very much overblown.
First of all, in systemd, which is a heavy D-Bus user, the only time we send integers > 2^53 in practice is when we use UINT64_MAX as a special "niche" marker meaning "unlimited"/"unset"/"undefined". But JSON has a proper concept for this: the "null" value (or simply not specifying the JSON field at all). Hence, in reasonably clean APIs this is mostly a non-issue.
That said, sd-varlink/sd-json (i.e. systemd's implementation of it) is of course 64-bit signed and unsigned integer clean when it processes JSON. Moreover, it automatically handles what the various specs on the internet suggest you do if you have an integer > 2^53: encode it as a decimal value in a string.
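For illustration, the two conventions look like this on the wire (field names made up):

    {"tasks_max": null}                   <- "unlimited"/"unset", instead of 18446744073709551615
    {"file_offset": "9007199254740993"}   <- an integer > 2^53, passed as a decimal string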
Would it be better if JSON had been more precise on this? Yes. Is it a big issue? No, not at all.
If you want more than 1-second precision, 64 bits are not enough. (Hmm, do all C++ std::chrono implementations utilize 128-bit integers for nanosecond precision?)
The marshalling cost for JSON is negligible. Yes, it might be a bit slower than GVariant for example, but only by some fractional linear factor. And on small messages (which D-Bus messages currently always are, due to the message size constraints enforced by the broker) the difference is impossible to measure. To the point that it really doesn't matter, in particular as JSON parsers have been ridiculously well optimized in this world.
What does matter though are roundtrips. In Varlink, typical operations require far fewer of them than in D-Bus. That's partly because D-Bus implies a broker (which doubles the number of roundtrips), but also because D-Bus forces you into a model of sending small "summary" messages when enumerating and then querying "details" for each listed object: it enforces transfer rate limits on everything (if you hit them, you are kicked off the bus), which means you have to refrain from streaming overly large data.
Or in other words: marshalling is a fairly irrelevant detail when it comes to performance; you must look at roundtrips, and the context switches they cause, instead.
Using JSON for this has two major benefits: the whole world speaks JSON, and modern programming languages typically speak it pretty natively. And it's directly readable in tools such as strace. With a simple strace I can now reasonably trace my programs, which a binary serialization will never allow you to do. And if you tell me that that doesn't matter, then you apparently live in an entirely different world than I do, because in mine debuggability does matter. A lot. Probably more than most other things.
To save others the click: their issues were simply that Swift has no fast JSON impl, and that in Rust, using serde (the most popular library for JSON marshalling) makes binaries a bunch bigger. That's it. So yeah, same perspective -- unless either of the above matters in your case (in 90%+ of cases they don't), JSON is just fine from a perf perspective.
Serde is a rather chunky dependency, it's not just a matter of binaries getting bigger, but also compile times being dramatically slower.
IMO CBOR would be a better choice: you aren't limited to IEEE 754 floats for your numeric types. Yeah, some JSON (de/en)coders can handle integer types, but many won't; it's strictly out of spec. I don't think building something as fundamental to an OS on out-of-spec behavior is a great idea. It will result in confusion and many wasted hours sooner or later.
> CBOR would be a better choice, you aren't limited to IEEE 754 floats for your numeric types.
The other side of this coin, of course, is that now you have to support those other numeric types :) My usual languages of choice somehow don't support "negative integers in the range -2^64..-1 inclusive".
I mean, you don't have to support those. Something on the other end would have to actually produce that datatype, and the interface can document that it never will: you're defining an interface anyway. The problem is when you literally don't have the option to represent common datatypes; that's a real problem. An encoding layer merely being able to represent exotic types you'll never use is a hypothetical one. Those are different problems.
And JSON, technically, allows unlimited-precision numbers, but also allows implementations to set arbitrary limits (it actually does; you're not required to parse JSON numbers as doubles). So the situation is not really different from CBOR, is it? Just™ make both sides agree to stick to some common subset (e.g. integers-in-int64_t-range-only for some fields) and you're done; no need to support double-precision floating point numbers.
Huh, I went and checked the ECMA JSON spec, and you're right: it treats numbers only as sequences of digits, which would make these effectively the same problem.
I read the slides, and I found it refreshing that you said at the end: don't create per-language bindings for the libraries shipped with systemd, but simply use a JSON parser for your language. That underlined that you've specified a simple protocol.
Also, there have clearly also been several attempts over the years to make a faster D-Bus implementation (kdbus, BUS1), which were never accepted into the kernel. It makes a lot of sense to instead design a simpler protocol.
There is clearly also a cautionary tale here about how microbenchmarks (here, for serialisation) can mask systemic flaws (lots of context switches with D-Bus, especially once polkit had to be involved).
The danger I see is that JSON has lots of edge behavior around deserialization; some languages will deserialize a 100-digit number differently than others. If the main benefit is removing the broker and the need for rate limiting, that could have been accomplished without using JSON.
You are writing this as if JSON was a newly invented thing, and not a language that has become the lingua franca of the Internet when it comes to encoding structured data. Well understood, and universally handled, since the early 2000s.
A 100-digit number cannot be encoded losslessly in D-Bus btw, nor in the far majority of IPC marshallings in this world.
Having done systems-level OS development for 25 years or so, I have never felt the burning urge to send a 100-digit number over local IPC.
Not that 100-digit numbers aren't useful, even in IPC, but typically that's a cryptography thing, and those generally use their own serializations anyway.
You are writing this as if security was a newly invented thing. Having done systems-level security development for 12 years: anything that can be produced maliciously will be. By using JSON, you've invented a new vulnerability class of malicious deserialization attacks.
Actually, not new. Earliest CVE I found was from 2017, which feels a decade later than it should be. I guess no one thought of pushing JSON over trusted interfaces, and probably for good reason.
> A 100-digit number cannot be encoded losslessly in D-Bus btw
I think the concern is that large numbers can in fact be encoded in JSON, but there is no guarantee that they will be decoded correctly by a receiver, as the format is underspecified. So you have to cater for the ill-defined common denominator.
Honestly, the only thing that surprises me is that you're being pedantic and encoding int64s as strings.
I know you know JSON is nominally only 53-bit safe, because JS numbers are doubles. But in practice I'd wager most JSON libraries can handle 64-bit integers.
What if varlink supported both JSON and a subset of cbor with the "cbor data follows" tag at the beginning (so the server can determine if it is json or cbor based on the beginning of the message)?
It would add a little complexity to the server, but then clients can choose if they want to use a human readable format that has more available libraries or a binary format.
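For what it's worth, that dispatch would be cheap: self-described CBOR (RFC 8949's "cbor data follows" tag 55799) starts with the bytes 0xd9 0xd9 0xf7, while a Varlink JSON message is an object and so starts with '{'. A minimal sketch in C (detect_wire_format is a hypothetical helper, not any actual Varlink API):

    /* Distinguish self-described CBOR from a JSON object by peeking at
     * the first bytes of an incoming datagram. */
    #include <stddef.h>
    #include <stdint.h>

    enum wire_format { WIRE_JSON, WIRE_CBOR, WIRE_UNKNOWN };

    static enum wire_format detect_wire_format(const uint8_t *buf, size_t len) {
            /* RFC 8949 self-describe tag 55799 encodes as 0xd9 0xd9 0xf7 */
            if (len >= 3 && buf[0] == 0xd9 && buf[1] == 0xd9 && buf[2] == 0xf7)
                    return WIRE_CBOR;
            /* A JSON message is an object: first non-whitespace byte is '{' */
            for (size_t i = 0; i < len; i++) {
                    if (buf[i] == ' ' || buf[i] == '\t' || buf[i] == '\n' || buf[i] == '\r')
                            continue;
                    return buf[i] == '{' ? WIRE_JSON : WIRE_UNKNOWN;
            }
            return WIRE_UNKNOWN;
    }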
As for strace, tooling could probably be added to automatically decode cbor to json, either as part of strace, or in a wrapper.
There could also be a varlink proxy (similar to varlink bridge) that could log or otherwise capture requests in a human readable format.
Not really. We use two text-based formats for logging: BSD syslog, and systemd's structured logging (which is basically an env block, i.e. a key-value set, with some tweaks). Programs generate text logs, hence journald reads text logs. Programs that read from the journal get the text-based key/value stuff back, usually.
(Yes, we then store the structured log data on disk in a binary format. Lookup indexes are just nasty in text-based formats.)
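For illustration, the key-value stuff is literally just fields like these (field names as documented in systemd.journal-fields(7); values made up):

    MESSAGE=Accepted publickey for root from 192.0.2.1
    PRIORITY=6
    SYSLOG_IDENTIFIER=sshd
    _PID=1234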
Hence, not sure what the Journal has to do with Varlink, but any IPC that the journal does is text-based, and very nicely strace'able in fact, I do that all the time.
[Maybe, when trying to be a smartass, try to be "smart", and not just an "ass"?]
Sure, the interface with the log might be text-based, but my understanding is that the at-rest format is binary and you need specialized tools to read it; standard Unix grep is not going to cut it.
Although I use strace all the time, I hardly ever look at the payload of read and write calls, although I could see why it would be useful. But given a binary protocol it wouldn't be terribly hard to build a tool that parses the output of strace.
> [Maybe, when trying to be a smartass, try to be "smart", and not just an "ass"?]
thanks for the kind words and elevating the tone of the discussion.
The marshalling cost might be negligible for some use cases, but the bandwidth usage definitely is not. I think the best interface description protocol is one where the serialization format is left unspecified. Instead, the protocol describes how to specify the structure, exchange sequences, and pre/post-conditions. A separate document describes how to implement that specification with a particular over-the-wire format. That way the JSON folks can use JSON when they want (unless they are using large longs), and other folks can use what they want (I like CBOR).
Is this true of future desktop use cases, where every basic function will cause a torrent of traffic over this channel? Or are you talking only from the point of view of a server starting/stopping services?
I’ve worked with profiling code where the marshaling cost for JSON was the biggest cost. Namely it involved a heap allocation and copying a ton more data than was actually needed, and I ended up fixing it by turning the JSON into a static string and dropping the values in manually.
The systemd maintainers have probably done their due diligence and concluded that it isn't an issue for their foreseeable use cases, but it does lock everything into doing string processing when interfacing with systemd, which is probably unnecessary. And you can't trivially swap systemd out for something else.
systemd is so pervasive that it would be fine to add a binary-format-to-JSON translation ability into strace. That shifts the cost of debugging to the debug tools, rather than slowing down production code.
Doing any string processing tends to require a lot of branching, and branch mispredictions are among the most likely things to slow down code. It also turns every 1-cycle load/store instruction into N cycles.
String processing in C, which is what systemd and a lot of system tools are written in, is pretty abysmal.
systemd is also non-optional, so if it turns out that it’s causing cache thrashing by dint of something generating a lot of small events, it’s not something you can do something about without digging into the details of your lowlevel system software or getting rid of systemd.
And it’s potentially just that much more waste on old or low-power hardware. Sure, it’s probably “negligible”, but the effort required to do anything more efficient is probably trivial compared to the aggregate cost.
And yeah, it may be better than D-Bus, but “it’s not as bad as the thing that it replaced” is pretty much the bare minimum expectation for such a change. I mean, if you’re swapping out things for something that’s even worse, what are you even doing?
I see there’s a TCP sidechannel, but why increase the complexity of the overall system by having two different channels when you could use one?
Dunno. This isn’t really an area that I work in, so I can’t say for sure it was the wrong decision, but the arguments I hear being made for it don’t seem great. For something fundamental like systemd, I’d expect it to use a serialization format that prioritizes being efficient and strongly-typed with minimal dependencies, rather than interoperability within the application layer with weakly-typed interpreted languages. This feels like a case of people choosing something they’re more personally familiar with than what’s actually optimal (and again, the reason I’d consider being optimal in this case being worth it is because this is a mandatory part of so many devices).
EDIT: Also, the reason that binary serialization is more efficient is because it’s simpler - for machines. JSON looks simpler to humans, but it’s actually a lot more complex under the hood, and for something fundamental having something simple tends to be better. Just because there’s an RFC out there that answers every question you could possibly have about JSON still doesn’t mean it’s as good as something for which the spec is much, much smaller.
JSON’s deceptive simplicity also results in people trying to handroll their own parsing or serialization, which then breaks in edge cases or doesn’t quite 100% follow the spec.
And just because you're using JSON doesn't force C/++ developers to validate it; someone can still use atoi() on an incoming string because "we only need one thing and it avoids pulling in an extra dependency for a proper JSON parser", which then breaks when a subsequent version of systemd changes the message. Etc. If the goal is to avoid memory safety issues in C/++, using more strings is not the answer.
I honestly don't really get the angle of debugging via strace. I'd much prefer something more Wireshark-like, where I can see all the messages processes are sending to each other, since that would make it easier to decipher cases where sending a message to a service causes it to send other messages to its backends.
Uh. systemd documents the protocol in various places and the protocol is trivial: a single text datagram sent to an AF_UNIX socket whose path you get via the $NOTIFY_SOCKET environment variable. That's trivial to implement for anyone with some basic Unix programming knowledge. And I tell pretty much anyone who wants to listen that they should just implement the proto on their own if that's the only reason for a libsystemd dep. In particular, non-C environments really should do their own native impl and not bother wrapping libsystemd just for this.
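To make "trivial" concrete, a standalone implementation is roughly this (a sketch following the documented protocol, not the actual libsystemd code):

    /* Send "READY=1" to the service manager: one text datagram to the
     * AF_UNIX socket named by $NOTIFY_SOCKET. */
    #include <errno.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    static int notify_ready(void) {
            const char *path = getenv("NOTIFY_SOCKET");
            if (!path)
                    return 0;               /* not running under a service manager */

            size_t len = strlen(path);
            struct sockaddr_un sa = { .sun_family = AF_UNIX };
            if (len == 0 || len >= sizeof(sa.sun_path))
                    return -EINVAL;
            memcpy(sa.sun_path, path, len);
            if (sa.sun_path[0] == '@')      /* abstract namespace socket */
                    sa.sun_path[0] = 0;

            int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
            if (fd < 0)
                    return -errno;

            const char *msg = "READY=1";
            ssize_t n = sendto(fd, msg, strlen(msg), 0,
                               (struct sockaddr *) &sa,
                               offsetof(struct sockaddr_un, sun_path) + len);
            close(fd);
            return n < 0 ? -errno : 0;
    }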
But let me stress two other things:
libselinux pulls in liblzma too and gets linked into tons more programs than libsystemd. And it will end up in sshd too (at the very least via libpam/pam_selinux). And most of the really big distros tend to support SELinux at least to some level. Hence, systemd or not, sshd remains vulnerable to this specific attack.
With that in mind, libsystemd git has actually dropped the dep on liblzma: all compressors are now dlopen deps and thus only pulled in when needed.
> And I tell pretty much anyone who wants to listen that they should just implement the proto on their own if that's the only reason for a libsystemd dep
Deferring the load of the library often just makes things harder to analyze, not necessarily more secure. I imagine many of the comments quoting `ldd` are wrongly forgetting about `dlopen`.
(I really wish there were a way to link such that the library isn't actually loaded but it still shows in the metadata, so you can get the performance benefits of doing less work but can still analyze the dependency DAG easily)
It would make things more secure in this specific backdooring case, since sshd only calls a single function of libsystemd (sd_notify), and that one would not trigger the dlopen of liblzma; hence the specific path chosen by the backdoor would not work (unless libselinux fucks it up after all, see other comments).
Dlopen has drawbacks but also major benefits. We decided the benefits relatively clearly outweigh the drawbacks, but of course people may disagree.
I have proposed a mechanism before that would expose the list of libs we potentially load via dlopen in an ELF section or ELF note. This could be consumed by things such as package managers (for auto-dep generation) and ldd. However, there was no interest from anyone else in getting this landed, so I dropped it.
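To sketch the idea (purely illustrative: the note name, note type and JSON payload are made up here, nothing like this ever landed):

    /* Emit a note into a ".note.dlopen" ELF section listing the sonames
     * this binary may dlopen(), so readelf -n, ldd-like tools or package
     * managers could discover them statically. */
    #include <stdint.h>

    #define DLOPEN_JSON "[{\"soname\":\"liblzma.so.5\"}]"

    __attribute__((used, section(".note.dlopen"), aligned(4)))
    static const struct {
            uint32_t namesz, descsz, type;
            char name[4];                                   /* "FDO" + NUL */
            char desc[((sizeof(DLOPEN_JSON) + 3) / 4) * 4]; /* padded to 4 bytes */
    } dlopen_note = {
            .namesz = 4,
            .descsz = sizeof(DLOPEN_JSON),
            .type   = 1,                                    /* made-up note type */
            .name   = "FDO",
            .desc   = DLOPEN_JSON,
    };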
Note that there are various cases where people use dlopen not on hardcoded lib names, but on dynamically configured ones, where this would not help. I.e. things like glibc NSS, or PAM, or anything else plugin-based. But PAM in particular kinda matters, since that tends to be loaded into almost any kind of security-relevant software, including sshd.
The plugin-based case can be covered by the notion of multiple "entry points": every library that is intended to be dlopen()ed is tagged with the name of the interface it provides, and every library that does such dlopen()ing mentions the names of those interfaces rather than the names of libraries directly. Of course, your ldd tool has to scan all the libraries on the system to know what might be loaded, but ldconfig already does that for libraries not in a private directory.
This might sound like a lot of work for a package-manager-less language ecosystem at first, but if you read "tag" as "exports a symbol with this name", it is in fact already how most C plugin systems work (a few use an incompatible per-library computed name, though, or rely entirely on global constructors). So really only the loading programs need to be modified, just like with fixed-name dlopen; see the sketch below.
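As a sketch of that convention (using PAM's session entry point as the example "tag"; the module path would come from configuration, and is made up here):

    /* A loader validating a module by the presence of a well-known
     * entry-point symbol, as most C plugin systems do. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
            void *h = dlopen("/usr/lib/security/pam_selinux.so", RTLD_NOW);
            if (!h) {
                    fprintf(stderr, "dlopen: %s\n", dlerror());
                    return 1;
            }
            /* The "tag": the exported symbol name identifies the interface. */
            int provides = dlsym(h, "pam_sm_open_session") != NULL;
            printf("module %s the pam session interface\n",
                   provides ? "provides" : "lacks");
            dlclose(h);
            return 0;
    }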
> And I tell pretty much anyone who wants to listen that they should just implement the proto on their own if that's the only reason for a libsystemd dep.
That's what I think too. Do the relevant docs point this out? Ages ago they didn't. I think we should try to avoid the situation where people google "implement systemd notify daemon" and end up on a page that says "link to libsystemd and call sd_notify()".
The correct thing to do would be to put different, unrelated APIs into their own libraries, instead of everything into libsystemd0. This has always been one of my biggest issues with it. It makes it hard to replace just one API from that library, because on a binary distribution only one package can provide it. And as a nice side effect, surprises like this one could then be avoided.
systemd developers have already rejected that approach, so I guess we will end up with lots of reimplementations, both in individual projects and third-party libsystemd-notify style libraries.
Different clients implemented in different languages will need different client libraries, and maintaining all of those is not something a core project is going to do. But if using the raw protocol instead of the convenience of libsystemd is a (commonly ignored) recommendation that makes a lot of sense in terms of segmentation, providing at least one reference implementation would point systemd users in the right direction.
Recommending that each client just implement the (trivial) protocol itself does not make so much sense to me.
You also need sshd configured to use PAM, and your sshd PAM stack has to include pam_selinux. Then it will be dynamically loaded only when sshd starts a PAM session.
With systemd you can enroll any string you want as the "PIN" for the TPM. There are no restrictions. It can be long, can be alphanumeric, can contain weird chars; up to you.
systemd has a similar logic, i.e. a recovery key concept, but we made sure you can type it in wherever a LUKS password would work too, even on systems where systemd is not available but LUKS is. The recovery key is output in Yubikey's modhex alphabet, which means you can type it in on many keyboards even without setting a keymap first, and it will work. We also output it as a QR code, in case you want to scan it off the screen. All in all it should be as robust as a recovery key can be.
Yes, a TPM2 enrollment takes up one slot, the recovery key another, a FIDO2 key yet another, a PKCS#11 key yet another, and a password yet another, in any combination/subset you like.
That's a highly unusual attitude for systemd. Most of the systemd architecture requires you to run systemd for everything if you use it at all. What changed?
Still would love it if desktop terminal emulators implemented the zmodem receiver side, so that you can ssh into some host and just type "sz" to copy arbitrary files of your choice onto your local system.
It’s just not the same as having transfers available on the remote command line. I have a program I wrote that maintains a local server with a port forwarded over SSH. Then on the remote side I have a client program that sends a file(s) (or folder) back to my local computer. It can either save the file or open it, depending on the command I ran.
This type of automatic transfer makes it very convenient to generate figures on a server (where the data lives) and then view them locally. It’s a much better workflow than having to use sshfs or scp.
What I wrote is really quite similar to an old transfer program like zmodem with the added feature of auto opening a file if I choose.
Once you mount everything as local with rclone, transfer protocols make no sense; everything should be part of a filesystem. Plan 9 did it right: there's no difference between local and remote once you get the grasp of 'bind'.
But if the analysis programs are on a different server, or you don't want to transfer 100s of GB of data to the local machine for processing, you still want to have the ability to selectively transfer individual files.
It doesn’t matter if you mount a remote server locally… unless your data is of a trivial size, you still want to do the processing remotely.
I see this as a theory vs practice difference. In theory having one unified file system is great and the way to go. In practice… there are issues.
screen supports zmodem if you have lrzsz installed. Just put "zmodem catch" into your ~/.screenrc, then ssh while in screen and execute sz/rz on the remote end. screen will automatically recognize the control sequence and prompt with a completed sz/rz command; enter executes it and sends/receives the file.
We actually use something like the above. But that's not sufficient, since we cannot set up the PAM session fully if $HOME is not accessible, because we can't acquire a password for it...